Mikado util stats error on NCBI gff3 #226

Closed
bbista opened this issue Oct 3, 2019 · 3 comments
@bbista

bbista commented Oct 3, 2019

I was trying to compute the stats for a GFF3 file I downloaded from NCBI, and I get this error message:
mikado util stats GCF_000241765.genomic.gff genomic.stats
/home/bbista/.local/lib/python3.6/site-packages/Mikado/configuration/configurator.py:529: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  scoring = yaml.load(scoring_file)
2019-10-02 19:04:43,336 - main - __init__.py:124 - ERROR - main - MainProcess - Mikado crashed, cause:
2019-10-02 19:04:43,336 - main - __init__.py:125 - ERROR - main - MainProcess - gene-LOC112059410
{}
Traceback (most recent call last):
  File "/home/bbista/.local/lib/python3.6/site-packages/Mikado/__init__.py", line 110, in main
    args.func(args)
  File "/home/bbista/.local/lib/python3.6/site-packages/Mikado/subprograms/util/stats.py", line 711, in launch
    calculator()
  File "/home/bbista/.local/lib/python3.6/site-packages/Mikado/subprograms/util/stats.py", line 335, in __call__
    self.parse_input()
  File "/home/bbista/.local/lib/python3.6/site-packages/Mikado/subprograms/util/stats.py", line 324, in parse_input
    current_gene.add_exon(record)
  File "/home/bbista/.local/lib/python3.6/site-packages/Mikado/loci/reference_gene.py", line 165, in add_exon
    raise AssertionError("{}\n{}".format(parent, self.transcripts, row))
AssertionError: gene-LOC112059410
{}
Do you have any idea what is going wrong?

Best,
Basanta

@lucventurini
Collaborator

lucventurini commented Oct 3, 2019

Dear @bbista,
thank you for reporting this bug. I just inspected the GFF you mentioned (which I presume is from this folder: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/241/765/GCF_000241765.3_Chrysemys_picta_bellii-3.0.3/ ) and the problem stems from the fact that gene-LOC112059410 is a pseudogene without any transcript feature associated with it. That is why the AssertionError prints the parent ID followed by an empty transcript dictionary ({}). That's kinda invalid under the GFF ontology, and Mikado was explicitly written not to accommodate such a case.

Looking at the GFF file in more detail, there honestly seem to be a lot of similar problems, such as coding genes without mRNAs, or tRNAs without a gene parent. All of these break the GFF ontology and Mikado's model of what a GFF should look like.
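
To get a feel for how widespread these cases are before running Mikado, a small standalone script can list them. The following is only a rough sketch and not part of Mikado: the function names, the set of feature types treated as transcripts, and the report keys are all assumptions made for illustration. It reads the ninth GFF3 column for `ID` and `Parent` and reports genes with no transcript child, transcript-like features with no parent, and parents that are referenced but never defined.

```python
from collections import defaultdict

# Assumption: which column-3 types count as genes and as transcripts.
# Extend these sets to match the feature types in your own file.
GENE_TYPES = {"gene", "pseudogene"}
TRANSCRIPT_TYPES = {"mRNA", "transcript", "tRNA", "rRNA", "lnc_RNA", "ncRNA"}


def parse_attributes(column9):
    """Turn 'ID=a;Parent=b,c' into {'ID': 'a', 'Parent': 'b,c'}."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def find_hierarchy_problems(lines):
    """Scan GFF3 lines and report features that break gene > transcript nesting."""
    feature_type = {}               # feature ID -> column-3 type
    child_types = defaultdict(set)  # parent ID -> types of its direct children
    orphan_transcripts = []         # transcript-like features with no Parent

    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            continue  # skip comments, directives and malformed rows
        ftype = cols[2]
        attrs = parse_attributes(cols[8])
        fid = attrs.get("ID")
        parents = [p for p in attrs.get("Parent", "").split(",") if p]
        if fid:
            feature_type[fid] = ftype
        for parent in parents:
            child_types[parent].add(ftype)
        if ftype in TRANSCRIPT_TYPES and fid and not parents:
            orphan_transcripts.append(fid)

    # Genes whose direct children include no transcript-like feature at all.
    genes_without_transcript = sorted(
        fid for fid, ftype in feature_type.items()
        if ftype in GENE_TYPES
        and not (child_types.get(fid, set()) & TRANSCRIPT_TYPES)
    )
    # Parent IDs that some feature points to but that never appear as an ID.
    dangling_parents = sorted(p for p in child_types if p not in feature_type)

    return {
        "genes_without_transcript": genes_without_transcript,
        "orphan_transcripts": orphan_transcripts,
        "dangling_parents": dangling_parents,
    }
```

Passing an open file handle (for example the downloaded GCF_000241765 GFF) to `find_hierarchy_problems` returns the three lists; a gene such as gene-LOC112059410 would be expected to show up under `genes_without_transcript`.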

I also tried using the GTF, cleaning it up first with GffRead, but to no avail. The only solutions are:

  • cleaning up the GTF aggressively
  • modifying Mikado to just ignore these cases (but that's easier said than done, given the tree-like structure of GTFs/GFFs).
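
As an illustration of what the first option could look like, here is a minimal sketch of an aggressive filter. It is written for GFF3 rather than GTF, it is not what Mikado or GffRead do, and every name in it (`clean_gff3`, the type sets) is hypothetical. It keeps only genes that have at least one transcript child, those transcripts, and the features hanging off them; everything else, including pseudogenes with bare exons and parentless tRNAs, is discarded.

```python
# Assumption: which column-3 types count as genes and as transcripts.
GENE_TYPES = {"gene", "pseudogene"}
TRANSCRIPT_TYPES = {"mRNA", "transcript", "tRNA", "rRNA", "lnc_RNA", "ncRNA"}


def parse_attributes(column9):
    """Turn 'ID=a;Parent=b,c' into {'ID': 'a', 'Parent': 'b,c'}."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def clean_gff3(lines):
    """Return only the lines that form complete gene > transcript > child chains."""
    records = []  # (raw line, type, ID, parent IDs); type is None for non-feature lines
    for line in lines:
        raw = line.rstrip("\n")
        cols = raw.split("\t")
        if raw.startswith("#") or len(cols) != 9:
            records.append((raw, None, None, ()))
            continue
        attrs = parse_attributes(cols[8])
        parents = tuple(p for p in attrs.get("Parent", "").split(",") if p)
        records.append((raw, cols[2], attrs.get("ID"), parents))

    # First pass: a transcript is kept only if it has an ID and a known gene parent;
    # a gene is kept only if at least one such transcript points to it.
    gene_ids = {fid for _, ftype, fid, _ in records if ftype in GENE_TYPES and fid}
    kept_transcripts, kept_genes = set(), set()
    for _, ftype, fid, parents in records:
        gene_parents = gene_ids.intersection(parents)
        if ftype in TRANSCRIPT_TYPES and fid and gene_parents:
            kept_transcripts.add(fid)
            kept_genes.update(gene_parents)

    # Second pass: emit comments unchanged, then features that belong to a kept chain.
    cleaned = []
    for raw, ftype, fid, parents in records:
        if ftype is None:
            keep = True
        elif ftype in GENE_TYPES:
            keep = fid in kept_genes
        elif ftype in TRANSCRIPT_TYPES:
            keep = fid in kept_transcripts
        else:
            keep = bool(kept_transcripts.intersection(parents))
        if keep:
            cleaned.append(raw)
    return cleaned
```

Note that this throws data away: any feature without a kept transcript as parent (for instance top-level region lines) is dropped too, so the output is only suitable when losing those records is acceptable.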

@lucventurini
Collaborator

Dear @bbista, I have started fixing the problems you found.

With the latest commit, mikado util stats is now able to parse the file appropriately. I will now work on making mikado compare compatible as well.

Changes will be reflected in Mikado2 (and live in Mikado 2.0rc6).

Kind regards

lucventurini added a commit to lucventurini/mikado that referenced this issue Oct 9, 2019
lucventurini added a commit to lucventurini/mikado that referenced this issue Oct 9, 2019
…lly broke the parsing for other strange GFF3 cases)
lucventurini added a commit to lucventurini/mikado that referenced this issue Oct 9, 2019
@lucventurini
Collaborator

Current status: mikado now supports this problematic GFF in util stats, util convert, compare, and prepare.
The only utility left is mikado util grep; once that is fixed, this issue can be closed.

lucventurini added a commit to lucventurini/mikado that referenced this issue Oct 14, 2019
lucventurini added a commit to lucventurini/mikado that referenced this issue Oct 14, 2019
@lucventurini lucventurini added this to Closed in Version 2 Oct 15, 2020
lucventurini added a commit to lucventurini/mikado that referenced this issue Feb 11, 2021