Ensure correctness of selected biotypes and check for missing ones:
include_biotypes: 'protein_coding,polymorphic_pseudogene,non_stop_decay,nonsense_mediated_decay,IG_C_gene,IG_D_gene,IG_J_gene,IG_V_gene,TR_C_gene,TR_D_gene,TR_J_gene,TR_V_gene,TEC,mRNA'
Below is the full list of biotypes, with counts, derived from the Ensembl GTF file:
"protein_coding": 211475
"lncRNA": 191378
"retained_intron": 34239
"protein_coding_CDS_not_defined": 26573
"nonsense_mediated_decay": 21949
"processed_pseudogene": 9490
"misc_RNA": 2216
"unprocessed_pseudogene": 1949
"snRNA": 1910
"miRNA": 1879
"transcribed_unprocessed_pseudogene": 1589
"transcribed_processed_pseudogene": 1149
"TEC": 1108
"snoRNA": 942
"rRNA_pseudogene": 497
"transcribed_unitary_pseudogene": 201
"IG_V_pseudogene": 187
"IG_V_gene": 146
"TR_V_gene": 107
"non_stop_decay": 105
"unitary_pseudogene": 89
"TR_J_gene": 79
"protein_coding_LoF": 74
"rRNA": 53
"scaRNA": 49
"IG_D_gene": 37
"TR_V_pseudogene": 33
"IG_C_gene": 23
"Mt_tRNA": 22
"artifact": 19
"IG_J_gene": 18
"processed_transcript": 12
"IG_C_pseudogene": 9
"ribozyme": 8
"TR_C_gene": 6
"sRNA": 5
"TR_D_gene": 5
"pseudogene": 4
"TR_J_pseudogene": 4
"vault_RNA": 4
"IG_J_pseudogene": 3
"Mt_rRNA": 2
"translated_processed_pseudogene": 2
"IG_pseudogene": 1
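
One way to make the check concrete is to compare the configured `include_biotypes` value against the biotype names in the list above. The snippet below is only a sketch: the variable names (`configured`, `observed`, `not_in_gtf`, `not_in_config`) are illustrative and not part of pgatk, and the biotype names are copied by hand from this issue rather than read from the config or the GTF.

```python
# Biotypes currently selected in the config (copied from include_biotypes above).
include_biotypes = (
    "protein_coding,polymorphic_pseudogene,non_stop_decay,nonsense_mediated_decay,"
    "IG_C_gene,IG_D_gene,IG_J_gene,IG_V_gene,TR_C_gene,TR_D_gene,TR_J_gene,"
    "TR_V_gene,TEC,mRNA"
)
configured = set(include_biotypes.split(","))

# Biotype names observed in the GTF (from the list above, counts omitted).
observed = {
    "protein_coding", "lncRNA", "retained_intron", "protein_coding_CDS_not_defined",
    "nonsense_mediated_decay", "processed_pseudogene", "misc_RNA",
    "unprocessed_pseudogene", "snRNA", "miRNA", "transcribed_unprocessed_pseudogene",
    "transcribed_processed_pseudogene", "TEC", "snoRNA", "rRNA_pseudogene",
    "transcribed_unitary_pseudogene", "IG_V_pseudogene", "IG_V_gene", "TR_V_gene",
    "non_stop_decay", "unitary_pseudogene", "TR_J_gene", "protein_coding_LoF",
    "rRNA", "scaRNA", "IG_D_gene", "TR_V_pseudogene", "IG_C_gene", "Mt_tRNA",
    "artifact", "IG_J_gene", "processed_transcript", "IG_C_pseudogene", "ribozyme",
    "TR_C_gene", "sRNA", "TR_D_gene", "pseudogene", "TR_J_pseudogene", "vault_RNA",
    "IG_J_pseudogene", "Mt_rRNA", "translated_processed_pseudogene", "IG_pseudogene",
}

# Configured names that never occur in the GTF (possibly outdated or misspelled).
not_in_gtf = sorted(configured - observed)

# GTF biotypes that are not selected; candidates to review, not necessarily to add.
not_in_config = sorted(observed - configured)

# Unselected biotypes whose name starts with "protein_coding" are worth a closer look.
protein_coding_like = [b for b in not_in_config if b.startswith("protein_coding")]

print("configured but absent from GTF:", not_in_gtf)
print("protein_coding-like but not configured:", protein_coding_like)
```

With the names as listed in this issue, `polymorphic_pseudogene` and `mRNA` are configured but do not appear in the GTF list, while `protein_coding_LoF` and `protein_coding_CDS_not_defined` appear in the GTF but are not selected. Whether any of these should be added or removed is a separate decision.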
Config reference: pgatk/pgatk/config/ensembl_config.yaml, line 14 in b9f8d17