Datasets

Boas Pucker edited this page Feb 9, 2021 · 4 revisions

While the concept and usage of KIPEs are documented in the README, the purpose of this wiki is to describe the fundamental basis of KIPEs: the datasets. The sequence collection of enzymes involved in the flavonoid biosynthesis pathway covers most of the described sequences and will be updated once more sequences are characterized. Since KIPEs is also helpful for the identification of gene families, it could be useful for investigating families of transcription factors (MYB, bHLH, WD40, WRKY, MADS-box, bZIP, and more).

Comprehensive MYB dataset

Currently, the available collections of MYB sequences comprise only a few selected species. This collection is manually curated and based on several publications, each of which examined the MYB gene family in a specific species. However, I performed a more comprehensive analysis and identified R2R3 MYB sequences (and others) for most plant species with available genome sequences. Including this large number of additional MYB sequences could slightly improve the identification of MYBs in a new species, but would substantially increase the run time of that analysis. Therefore, I have decided not to include all sequences in the public dataset. However, all these sequences are available upon request. Please get in touch if you are interested: Dr. Boas Pucker (email).

Here is a list of species that were investigated based on annotations provided by NCBI:

Actinidia chinensis, Aegilops tauschii, Amborella trichopoda, Ananas comosus, Apostasia shenzhenica, Aquilegia coerulea, Arabidopsis lyrata, Arabidopsis thaliana, Arachis duranensis, Arachis ipaensis, Artemisia annua, Asparagus officinalis, Auxenochlorella protothecoides, Bathycoccus prasinos, Beta vulgaris, Brachypodium distachyon, Brassica napus, Brassica oleracea, Brassica rapa, Cajanus cajan, Camelina sativa, Capsella rubella, Capsicum annuum, Capsicum baccatum, Capsicum chinense, Carica papaya, Chenopodium quinoa, Chlorella sorokiniana, Chlorella variabilis, Cicer arietinum, Citrus clementina, Citrus sinensis, Coccomyxa subellipsoidea, Corchorus capsularis, Corchorus olitorius, Cucumis melo, Cucumis sativus, Cucurbita maxima, Cucurbita pepo, Cynara cardunculus, Daucus carota, Dendrobium officinale, Dichanthelium oligosanthes, Dorcoceras hygrometricum, Durio zibethinus, Elaeis guineensis, Erythranthe guttata, Eucalyptus grandis, Eutrema salsugineum, Fragaria vesca, Genlisea aurea, Glycine max, Gonium pectorale, Gossypium arboreum, Gossypium hirsutum, Gossypium raimondii, Handroanthus impetiginosus, Helianthus annuus, Herrania umbratica, Hevea brasiliensis, Ipomoea nil, Jatropha curcas, Juglans regia, Lactuca sativa, Lupinus angustifolius, Macleaya cordata, Malus domestica, Manihot esculenta, Marchantia polymorpha, Medicago truncatula, Micractinium conductrix, Micromonas commoda, Micromonas pusilla, Momordica charantia, Monoraphidium neglectum, Morus notabilis, Musa acuminata, Nelumbo nucifera, Nicotiana attenuata, Nicotiana sylvestris, Nicotiana tabacum, Nicotiana tomentosiformis, Olea europaea, Oryza brachyantha, Oryza sativa, Ostreococcus lucimarinus, Ostreococcus tauri, Panicum hallii, Phalaenopsis equestris, Phaseolus vulgaris, Phoenix dactylifera, Physcomitrella patens, Populus euphratica, Populus trichocarpa, Prunus avium, Prunus mume, Prunus persica, Punica granatum, Pyrus x bretschneideri, Quercus suber, Raphanus sativus, Ricinus communis, Rosa chinensis, Selaginella moellendorffii, Sesamum indicum, Setaria italica, Solanum lycopersicum, Solanum pennellii, Solanum tuberosum, Sorghum bicolor, Spinacia oleracea, Tarenaya hassleriana, Tetrabaena socialis, Theobroma cacao, Vigna angularis, Vigna radiata, Vitis vinifera, Volvox carteri, Zea mays, Ziziphus jujuba, Zostera marina

Here is a list of species that were investigated based on annotations provided by Phytozome/JGI:

Acomosus_321_v3, Ahalleri_264_v1_1, Ahypochondriacus_459_v2_1, Ahypogaea_530_v1_0, Alinifolium_472_v1_1, Alyrata_384_v2_1, Aoccidentale_449_v0_9, Aofficinalis_498_V1_1, Athaliana_447_Araport11, Atrichopoda_291_v1_0, Bbraunii_502_v2_1, Bdistachyon_556_v3_2, Boleraceacapitata_446_v1_0, BrapaFPsc_277_v1_3, Bstacei_316_v1_1, Bstricta_278_v1_2, Bsylvaticum_490_v1_1, Camplexicaulis_470_v1_1, Carietinum_492_v1_0, Ccitriodora_507_v2_1, Cclementina_182_v1_0, Cgrandiflora_266_v1_1, Chispanica_488_v1_1, Ckanehirae_531_v3, Cmaritima_481_v1_1, CpurpureusGG1_539_v1_1, CpurpureusR40_538_v1_1, Creinhardtii_281_v5_6, Crubella_474_v1_1, Csinensis_154_v1_1, Cviolacea_484_v1_1, Czofingiensis_461_v5_2_3_2, Dalata_550_v2_1, Dcarota_388_v2_0, Dsophioides_482_v1_1, Dstrictus_489_v1_1, Egrandis_297_v2_0, Esalsugineum_173_v1_0, Evesicaria_487_v1_1, Fvesca_501_v2_0_a2, Gbarbadense_526_v1_1, Gdarwinii_529_v1_1, Ghirsutum_527_v2_1, GmaxLee_510_v1_1, Graimondii_221_v2_1, Gsoja_509_v1_1, Hannuus_494_r1_2, Hvulgare_462_r1, Iamara_485_v1_1, Itinctoria_475_v1_1, Kfedtschenkoi_382_v1_1, Klaxiflora_309_v1_1, Lannua_476_v1_1, Lsativa_467_v5, Lsativum_478_v1_1, Mdomestica_491_v1_1, Mesculenta_520_v7_1, Mguttatus_256_v2_0, MguttatusNONTOL_553_v4_0, MguttatusTOL_551_v5_0, Mmaritima_477_v1_1, Mpolymorpha_320_v3_1, MpusillaCCMP1545_228_v3_0, Msinensis_497_v7_1, Mtruncatula_285_Mt4_0v1, Olucimarinus_231_v2_0, Osativa_204_v7_0, Othomaeum_386_v1_0, PdeltoidesWV94_445_v2_1, Ppatens_318_v3_3, Ppersica_298_v2_1, PtrichocarpaStettler14_532_v1_1, Pvirgatum_516_v5_1, Pvulgaris_442_v2_1, PvulgarisUI111_534_v1_1, Rislandica_473_v1_1, Salba_480_v1_1, Sbicolor_454_v3_1_1, Sitalica_312_v2_2, Slycopersicum_514_ITAG3_2, Smoellendorffii_91_v1_0, Spolyrhiza_290_v2, Spurpurea_519_v5_1, SpurpureaFishCreek_518_v3_1, Stuberosum_448_v4_03, Sviridis_500_v2_1, Taestivum_296_v2_2, Tarvense_479_v1_1, Tcacao_523_v2_1, Tpratense_385_v2, Vcarteri_317_v2_1, Vvinifera_457_v2_1, Zmarina_324_v2_2, Zmays_284_Ensembl-18, ZmaysPH207_443_v1_1
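
The Phytozome/JGI identifiers above all appear to follow a three-part pattern separated by underscores. The following is a minimal sketch, not part of KIPEs, for splitting such an identifier into its parts when filtering or sorting the list. It assumes that the first field is an abbreviated species (or accession) name, the second a numeric dataset ID, and the remainder an annotation version; this reading is inferred from the list itself and is not stated on this page. The names `PhytozomeDataset`, `parse_identifier`, and `parse_identifier_list` are hypothetical.

```python
from typing import List, NamedTuple


class PhytozomeDataset(NamedTuple):
    """Parts of an identifier such as 'Athaliana_447_Araport11' (assumed layout)."""
    prefix: str      # assumed: abbreviated species/accession name
    numeric_id: int  # assumed: numeric dataset ID
    version: str     # assumed: annotation version (may contain underscores)


def parse_identifier(identifier: str) -> PhytozomeDataset:
    """Split an identifier at its first two underscores."""
    parts = identifier.strip().split("_", 2)
    if len(parts) != 3 or not parts[1].isdigit():
        raise ValueError(f"unexpected identifier format: {identifier!r}")
    prefix, numeric_id, version = parts
    return PhytozomeDataset(prefix, int(numeric_id), version)


def parse_identifier_list(text: str) -> List[PhytozomeDataset]:
    """Parse a comma-separated list of identifiers, skipping empty entries."""
    return [parse_identifier(item) for item in text.split(",") if item.strip()]


if __name__ == "__main__":
    example = "Athaliana_447_Araport11, Czofingiensis_461_v5_2_3_2, Zmays_284_Ensembl-18"
    for dataset in parse_identifier_list(example):
        print(dataset.prefix, dataset.numeric_id, dataset.version)
```

Splitting at only the first two underscores keeps version strings such as `v5_2_3_2` intact. Identifiers that do not match the assumed layout raise a `ValueError` rather than being silently misparsed.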

If you have sequenced the genome/transcriptome of a new species and would like to have your dataset integrated, please get in touch. I am also happy to run KIPEs on new assemblies.