generateEvents option returning empty .ioe files #178

Open
ArnavBharti opened this issue Jan 3, 2024 · 5 comments

Comments

@ArnavBharti

For input:

  1. I downloaded a GFF file from PlasmoDB: "PlasmoDB-66_PvivaxP01.gff"
  2. I converted it to a GTF file with AGAT: agat_convert_sp_gff2gtf.pl --gff ../PlasmoDB-66_PvivaxP01.gff -o PlasmoDB-66_PvivaxP01.gtf
  3. I ran SUPPA: suppa.py generateEvents -i ../PlasmoDB-66_PvivaxP01.gtf -o AlternativeSplicing/localAS/local -f ioe -e {SE,SS,MX,RI,FL}

Output:
Empty .ioe files that contain only the header line, e.g.:

seqname	gene_id	event_id	alternative_transcripts	total_transcripts

GTF file:

PvP01_API_v2	VEuPathDB	ncRNA_gene	9	63	.	+	.	gene_id "PVP01_API00100"; ID "PVP01_API00100"; description "tRNA Threonine"; ebi_biotype "tRNA";
PvP01_API_v2	VEuPathDB	tRNA	9	63	.	+	.	gene_id "PVP01_API00100"; transcript_id "PVP01_API00100.1"; ID "PVP01_API00100.1"; Parent "PVP01_API00100"; description "tRNA Threonine"; gene_ebi_biotype "tRNA";
PvP01_API_v2	VEuPathDB	exon	9	63	.	+	.	gene_id "PVP01_API00100"; transcript_id "PVP01_API00100.1"; ID "exon_PVP01_API00100.1-E1"; Parent "PVP01_API00100.1";
PvP01_API_v2	VEuPathDB	protein_coding_gene	90	704	.	+	.	gene_id "PVP01_API00200"; ID "PVP01_API00200"; Name "RPS4"; description "apicoplast ribosomal protein S4, putative"; ebi_biotype "protein_coding";
PvP01_API_v2	VEuPathDB	mRNA	90	704	.	+	.	gene_id "PVP01_API00200"; transcript_id "PVP01_API00200.1"; ID "PVP01_API00200.1"; Parent "PVP01_API00200"; description "apicoplast ribosomal protein S4, putative"; gene_ebi_biotype "protein_coding";
PvP01_API_v2	VEuPathDB	exon	90	704	.	+	.	gene_id "PVP01_API00200"; transcript_id "PVP01_API00200.1"; ID "exon_PVP01_API00200.1-E1"; Parent "PVP01_API00200.1";
PvP01_API_v2	VEuPathDB	CDS	90	704	.	+	0	gene_id "PVP01_API00200"; transcript_id "PVP01_API00200.1"; ID "PVP01_API00200.1-p1-CDS1"; Parent "PVP01_API00200.1"; protein_source_id "PVP01_API00200.1-p1";
PvP01_API_v2	VEuPathDB	ncRNA_gene	716	787	.	+	.	gene_id "PVP01_API00300"; ID "PVP01_API00300"; description "tRNA Histidine"; ebi_biotype "tRNA";
@EduEyras
Member

EduEyras commented Jan 4, 2024 via email

@ArnavBharti
Author

ArnavBharti commented Jan 4, 2024

I managed to get the GTF file (instead of converting from GFF).

A few lines looked like this:

NC_009911.1 RefSeq exon 441005 441169 . - . gene_id "PVX_110843"; transcript_id "XR_003001228.1"; db_xref "GeneID:5471288"; locus_tag "PVX_110843"; note "LSU 5.8S rRNA; O-type"; orig_transcript_id "gnl|WGS:AAKM|mrna.PVX_110843-RA"; product "5.8S ribosomal RNA"; transcript_biotype "rRNA"; exon_number "1";

Here the "; " inside the note value "LSU 5.8S rRNA; O-type" was treated as an attribute separator, so the value was split in two, causing an IndexError while parsing. Would replacing "; " with ";" (removing the space) wherever it occurs inside quotation marks affect the output? Doing this makes the IndexError go away.

But the generated .ioe file is still empty.

genomic_data.gtf.zip

This line is causing the IndexError:

att_dict = dict(map(lambda x: (x[0], x[1].strip('"')), attributes))
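
To illustrate why a semicolon inside a quoted value can trigger this kind of IndexError, here is a minimal Python sketch. It is not SUPPA's actual parser: `naive_parse` and `parse_gtf_attributes` are hypothetical helpers, and `naive_parse` only assumes that the attribute column is split on ";" before the key/value pairs are built. The regex-based version treats the quotes, not the semicolons, as the value delimiters, so the original annotation would not need to be edited. It deliberately ignores unquoted values.

```python
import re

# Matches one GTF attribute of the form: key "value"
# The value may contain semicolons, because only the quotes delimit it.
ATTRIBUTE_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_gtf_attributes(field):
    """Return a dict of key -> value from a GTF attribute column (column 9)."""
    return {key: value for key, value in ATTRIBUTE_RE.findall(field)}


def naive_parse(field):
    """Hypothetical parser that splits on ';' first (assumed failure mode)."""
    pairs = [item.strip().split(" ", 1) for item in field.split(";") if item.strip()]
    # A fragment such as 'O-type"' has no key/value pair, so x[1] does not exist.
    return dict(map(lambda x: (x[0], x[1].strip('"')), pairs))


# Shortened version of the attribute column quoted in this thread.
field = ('gene_id "PVX_110843"; transcript_id "XR_003001228.1"; '
         'note "LSU 5.8S rRNA; O-type"; product "5.8S ribosomal RNA";')

attrs = parse_gtf_attributes(field)
print(attrs["note"])  # LSU 5.8S rRNA; O-type

try:
    naive_parse(field)
except IndexError as error:
    print("naive split fails:", error)
```

Since only the note value changes when the space after the inner semicolon is removed, the workaround should not alter gene_id or transcript_id, but that is an inference from the line shown here rather than something verified against SUPPA.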

@EduEyras
Member

EduEyras commented Jan 4, 2024 via email

@ArnavBharti
Author

Even with the --pool-genes flag, the output is still empty.

$ suppa.py generateEvents -i ~/Downloads/genomic.gtf -o local -f ioe -e {SE,SS,MX,RI,FL} --pool-genes

INFO:eventGenerator:Reading input data.
INFO:eventGenerator:Pooling genes
INFO:eventGenerator:Calculating events
INFO:eventGenerator:Done

$ cat local*.ioe > biglocal.ioe
$ cat biglocal.ioe

seqname	gene_id	event_id	alternative_transcripts	total_transcripts
seqname	gene_id	event_id	alternative_transcripts	total_transcripts
seqname	gene_id	event_id	alternative_transcripts	total_transcripts
seqname	gene_id	event_id	alternative_transcripts	total_transcripts
seqname	gene_id	event_id	alternative_transcripts	total_transcripts
seqname	gene_id	event_id	alternative_transcripts	total_transcripts
seqname	gene_id	event_id	alternative_transcripts	total_transcripts
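
One possible explanation, which this thread does not confirm, is that the annotation contains few or no genes with more than one transcript. Local events are defined by comparing alternative transcripts, so a gene with a single transcript would not be expected to contribute any. The Python sketch below counts transcripts per gene to check this. The function names are hypothetical, and the sketch assumes that only lines whose third column is "exon" matter and that gene_id and transcript_id are quoted.

```python
import re
from collections import defaultdict

# \b keeps these from matching inside names such as orig_transcript_id.
GENE_RE = re.compile(r'\bgene_id "([^"]*)"')
TRANSCRIPT_RE = re.compile(r'\btranscript_id "([^"]*)"')


def transcripts_per_gene(gtf_lines):
    """Map each gene_id to the set of transcript_ids seen on its exon lines."""
    genes = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip comment and header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue  # assumption: only exon features are relevant
        gene = GENE_RE.search(cols[8])
        transcript = TRANSCRIPT_RE.search(cols[8])
        if gene and transcript:
            genes[gene.group(1)].add(transcript.group(1))
    return genes


def multi_transcript_genes(genes):
    """Keep only genes with more than one annotated transcript."""
    return {gene: ids for gene, ids in genes.items() if len(ids) > 1}


# Two exon lines taken from the GTF excerpt in this thread.
example = [
    'PvP01_API_v2\tVEuPathDB\texon\t9\t63\t.\t+\t.\t'
    'gene_id "PVP01_API00100"; transcript_id "PVP01_API00100.1";\n',
    'PvP01_API_v2\tVEuPathDB\texon\t90\t704\t.\t+\t.\t'
    'gene_id "PVP01_API00200"; transcript_id "PVP01_API00200.1";\n',
]
counts = transcripts_per_gene(example)
print(len(counts), "genes,", len(multi_transcript_genes(counts)), "with >1 transcript")

# For a real file, pass the open handle instead of the example list:
# with open("genomic.gtf") as handle:
#     counts = transcripts_per_gene(handle)
```

If the second number is zero for the full annotation, header-only .ioe files would be consistent with the input rather than a sign of a parsing failure.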

@EduEyras
Member

EduEyras commented Jan 5, 2024 via email
