mikado 20190606_6c8d542 serialise ValueError: Invalid frame specified #181

Closed
gemygk opened this issue Jun 10, 2019 · 7 comments

@gemygk
Collaborator

gemygk commented Jun 10, 2019

Hi @lucventurini,

There seems to be an issue at the Mikado serialise stage when using a Prodigal GFF file. I also got this error with an earlier stable Mikado commit, mikado-20190325_c940de1, which we are using for the wheat accessions.

Please see below the error and logs.

CMD:

Mikado serialise command:
singularity exec /ei/software/cb/mikado/20190606_6c8d542/x86_64/mikado-20190606_6c8d542.img mikado serialise --json-conf mikado.configuration.yaml --xml blastx/split_blast_output --orfs mikado_prepared.fasta.prodigal.gff --blast_targets cross_species_all.protein.fasta

Prodigal command:
prodigal -i mikado_prepared.fasta -c -f gff -o mikado_prepared.fasta.prodigal.gff

WD:

/ei/workarea/group-ga/Projects/CB-GENANNO-444_Myzus_persicae_clone_O_v2_annotation/Analysis/mikado-20190606_6c8d542/trans_run1/mikado_long_reads

ERROR:

2019-06-10 02:31:41,402 - main - __init__.py:123 - ERROR - main - MainProcess - Mikado crashed, cause:
2019-06-10 02:31:41,402 - main - __init__.py:124 - ERROR - main - MainProcess - Invalid frame specified for 38_4: -49. Must be None or 0, 1, 2
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/Mikado/__init__.py", line 109, in main
    args.func(args)
  File "/usr/local/lib/python3.7/site-packages/Mikado/subprograms/serialise.py", line 329, in serialise
    load_orfs(args, logger)
  File "/usr/local/lib/python3.7/site-packages/Mikado/subprograms/serialise.py", line 144, in load_orfs
    serializer()
  File "/usr/local/lib/python3.7/site-packages/Mikado/serializers/orf.py", line 316, in __call__
    self.serialize()
  File "/usr/local/lib/python3.7/site-packages/Mikado/serializers/orf.py", line 277, in serialize
    for row in self.bed12_parser:
  File "/usr/local/lib/python3.7/site-packages/Mikado/parsers/bed12.py", line 1030, in __next__
    return self.gff_next()
  File "/usr/local/lib/python3.7/site-packages/Mikado/parsers/bed12.py", line 1071, in gff_next
    table=self.__table)
  File "/usr/local/lib/python3.7/site-packages/Mikado/parsers/bed12.py", line 212, in __init__
    self.__check_validity(transcriptomic, fasta_index, sequence)
  File "/usr/local/lib/python3.7/site-packages/Mikado/parsers/bed12.py", line 436, in __check_validity
    self._adjust_start(sequence, orf_sequence)
  File "/usr/local/lib/python3.7/site-packages/Mikado/parsers/bed12.py", line 503, in _adjust_start
    self.phase = self.end - self.thick_end
  File "/usr/local/lib/python3.7/site-packages/Mikado/parsers/bed12.py", line 769, in phase
    self.name, val))
ValueError: Invalid frame specified for 38_4: -49. Must be None or 0, 1, 2
Command exited with non-zero status 1
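
For readers following the traceback: the last frame is a setter that rejects any phase other than None, 0, 1 or 2. Below is a minimal sketch of that kind of check. The class name `OrfRecord` is hypothetical and this is not Mikado's actual code; only the wording of the error message is taken from the log above.

```python
class OrfRecord:
    """Hypothetical stand-in for an ORF record (not Mikado's BED12 class)."""

    def __init__(self, name, phase=None):
        self.name = name
        self.phase = phase  # routed through the validating setter below

    @property
    def phase(self):
        return self._phase

    @phase.setter
    def phase(self, val):
        # Accept only None or 0, 1, 2; reject everything else with a clear message.
        if val not in (None, 0, 1, 2):
            raise ValueError(
                "Invalid frame specified for {}: {}. Must be None or 0, 1, 2".format(
                    self.name, val))
        self._phase = val
```

Since the traceback shows the phase being reassigned inside `_adjust_start`, a valid phase in the input GFF does not by itself rule out this error.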

When I look at the line with id '38_4' (see below, taken from prodigal output - mikado_prepared.fasta.prodigal.gff), the phase is 0 (which is valid) and it looks fine to me.

polished_LQ_sampleWzA5U6ZI|cb28468_c1/f1p0/1824.mrna1   Prodigal_v2.6.3 CDS     1343    1795    27.7    -       0       ID=38_4;partial=00;start_type=GTG;rbs_motif=None;rbs_spacer=None;gc_cont=0.269;conf=99.83;score=27.70;cscore=29.86;sscore=-2.16;rscore=0.76;uscore=-1.43;tscore=-1.48;
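
To double-check a record like this one without reading it by eye, a small parser is enough. This is only a sketch using the standard nine GFF columns; the function name `parse_gff_line` is made up for illustration, and the attribute string in the example is truncated to its first three entries.

```python
def parse_gff_line(line):
    """Split one GFF feature line into typed fields (illustrative helper)."""
    cols = line.split(None, 8)  # 9 whitespace-separated columns
    if len(cols) != 9:
        raise ValueError("expected 9 GFF columns, found {}".format(len(cols)))
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    # Attributes are ';'-separated key=value pairs; skip empty trailing items.
    attributes = dict(
        item.split("=", 1) for item in attrs.strip().split(";") if "=" in item)
    return {
        "seqid": seqid,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "phase": None if phase == "." else int(phase),
        "attributes": attributes,
    }


# The record quoted above, with the attribute column shortened.
EXAMPLE = ("polished_LQ_sampleWzA5U6ZI|cb28468_c1/f1p0/1824.mrna1\t"
           "Prodigal_v2.6.3\tCDS\t1343\t1795\t27.7\t-\t0\t"
           "ID=38_4;partial=00;start_type=GTG;")

record = parse_gff_line(EXAMPLE)
print(record["phase"], record["strand"], record["attributes"]["start_type"])
```

For this record the parsed phase is 0, the strand is `-` and the start type is GTG, which matches what is described above.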

Can you please look into this?

Thanks,
Gemy

@lucventurini
Collaborator

Hi @gemygk,
I will have a look immediately.
Thank you

Luca

@lucventurini
Collaborator

Hi @gemygk ,
update: the bug is triggered when Prodigal reports a non-ATG start (GTG here). Mikado normally only considers ATG a valid start (although this can be controlled), so it tries to adjust the start position. I had fixed the start finding on the positive strand but not on the negative one, which is why this ORF on the - strand crashed.
I am pushing the fix now.
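
For anyone curious what strand-aware start adjustment can look like, here is a rough sketch. It is not the code in Mikado's `bed12.py`; the function names are hypothetical, coordinates are assumed to be 1-based and inclusive on the transcript, and the rule "take the nearest in-frame upstream ATG, stop at the first stop codon" is an assumption made for illustration. The idea is to map a - strand ORF onto the reverse complement so that both strands go through the same search.

```python
STOPS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a nucleotide string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extend_to_atg(seq, start, end, strand):
    """Return (start, end) with the ORF start moved to an upstream in-frame ATG.

    Coordinates are 1-based inclusive on `seq`. If the ORF already starts with
    ATG, or no ATG is found before a stop codon or the sequence edge, the
    coordinates are returned unchanged.
    """
    length = len(seq)
    if strand == "-":
        work = revcomp(seq).upper()
        first = length - end        # 0-based index of the start codon on `work`
    else:
        work = seq.upper()
        first = start - 1
    if work[first:first + 3] == "ATG":
        return start, end
    probe = first - 3
    while probe >= 0:               # walk upstream one codon at a time
        codon = work[probe:probe + 3]
        if codon in STOPS:
            return start, end       # blocked by an in-frame stop
        if codon == "ATG":
            if strand == "-":
                return start, length - probe   # only the end moves on "-"
            return probe + 1, end              # only the start moves on "+"
        probe -= 3
    return start, end


# Toy transcript: CC ATG AAA GTG CCC TAA GG, ORF called from the GTG (9-17).
plus = "CC" + "ATG" + "AAA" + "GTG" + "CCC" + "TAA" + "GG"
print(extend_to_atg(plus, 9, 17, "+"))
print(extend_to_atg(revcomp(plus), 3, 11, "-"))
```

On the - strand only the end coordinate changes, because the start codon sits at the high-coordinate end of the feature.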

@cschu
Contributor

cschu commented Jun 10, 2019 via email

@cschu
Contributor

cschu commented Jun 10, 2019 via email

@lucventurini
Collaborator

lucventurini commented Jun 10, 2019

Hi @cschu ,
yes, that was the correct one.
The image is here:

/ei/software/testing/mikado/20190610_94160dd/x86_64/

The test finished successfully, see the log and database:

/ei/workarea/group-ga/Projects/CB-GENANNO-444_Myzus_persicae_clone_O_v2_annotation/Analysis/mikado-20190606_6c8d542/trans_run1/mikado_long_reads

The run as a whole crashed, but only because the Blast database is not formatted properly (the FASTA lines all have different lengths). The ORF loading itself was successful. See e.g. the ORF 38_4 for polished_LQ_sampleWzA5U6ZI|cb28468_c1/f1p0/1824.mrna1 that was crashing Mikado earlier:

68|38|1|1821|38_1|-|13|657|50.1|0|1|645|0
69|38|1|1821|38_2|-|739|906|8.1|1|1|168|0
70|38|1|1821|38_3|-|910|1101|3.9|1|1|192|0
71|38|1|1821|38_4|-|1343|1821|27.7|0|1|479|0

It has now been correctly extended from 1343-1795 (with a GTG start) to 1343-1821 (ATG start).

I will close down the issue as the problem seems patched.

@cschu
Contributor

cschu commented Jun 10, 2019 via email

@lucventurini
Collaborator

Samtools faidx, through Pysam. I am using it to index all FASTA files and/or read the FAI index, as that is much, much faster than any of the alternatives (BioPython, PyFaidx), even though BioPython would nominally be more robust.

If this is an issue, I can add a fallback to BioPython and spit out a warning if such an error is encountered. I would do this only for serialising the Blastx index, though; in other cases when I index a file in Mikado I generally need to be able to access the sequence data quickly (e.g. for padding, which is apparently already slow as it is).
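
As a side note, this kind of problem can be spotted before indexing. samtools faidx expects every sequence line within a record, except the last one, to have the same length. The sketch below checks that with the standard library only; `ragged_records` is a hypothetical helper, not part of Mikado or Pysam, and it deliberately ignores other problems such as duplicated names.

```python
def _is_ragged(widths):
    """True if the line widths of one record would not satisfy faidx."""
    if len(widths) < 2:
        return False
    body, last = widths[:-1], widths[-1]
    # All lines but the last must match; the last may only be shorter or equal.
    return len(set(body)) > 1 or last > body[0]


def ragged_records(lines):
    """Return the names of FASTA records with inconsistent line lengths.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    bad = []
    name, widths = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if name is not None and _is_ragged(widths):
                bad.append(name)
            name = (line[1:].split() or [""])[0]
            widths = []
        elif name is not None:
            widths.append(len(line))
    if name is not None and _is_ragged(widths):
        bad.append(name)
    return bad


example = [">ok", "ACGT", "ACGT", "AC", ">bad", "ACGT", "AC", "ACGT"]
print(ragged_records(example))
```

A caller could use the returned names to emit a warning and switch to a slower parser only for the affected file.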

lucventurini added a commit that referenced this issue Jun 18, 2019
* This should address #173 (both configuration file and docs) and #158

* Fix #181 and small bug fix for parsing Mikado annotations.

* Progress for #142 - this should fix the wrong ORF calculation for cases when the CDS was open at the 5' end.

* Fixed previous commit (always for #142)

* #142: corrected and tested the issue with one-off exons, for padding.

* This should fix and test #142 for good.

* Removed spurious warning/error messages

* #142: solved a bug which caused truncated transcripts at the 5' end not to be padded.

* #142: solved a problem which caused a false abort for transcripts on the - strand with changed stop codon.

* #142: fixing previous commit

* Pushing the fix for #182 onto the development branch

* Fix #183

* Fix #183 and previous commit

* #183: now Mikado configure will set a seed when generating the configuration file. The seed will be explicitly mentioned in the log.

* #177: made ORF loading slightly faster with pysam. Also made XML serialisation much faster using SQL sessions and multiprocessing.Pool instead of queues.

* Solved annoying bug that caused Mikado to crash with TAIR GFF3s.
lucventurini added a commit that referenced this issue Jun 18, 2019
* Solved a small bug in the Gene class

* This commit should fix some of the performance issues found in Mikado compare when testing in the all vs all (issue #166).

* Updated the CHANGELOG.

* Slight improvements to the generic GFLine class and to the to_gff wrapper

* Solved some assorted bugs, from stop_codon parsing in GTF2 (for Augustus) to avoiding a very costly pragma check on MIDX databases.

* Now Mikado util stats will only return one value for the mode, making the table parsable

* Solved some small bugs introduced by changing the mode for mikado util stats

* Dropping automated support for Python3.5. The conda environment cannot be created successfully, too many packages have not been updated in the original repositories.

* Updating the conda environment to reflect that only Python>=3.6 is now accepted

* Various fixes for managing correctly BED12 files.

* Fix for the previous commit breaking TRAVIS

* Switched to PySam for loading and fetching from genome files. Also, improved massively the speed of tests.

* Fixed previous commit

* Fixed travis bug

* Refactoring of check_index for Mikado compare (#166) and fix for #172

* Now Mikado will merge touching (NOT overlapping) exons coming from BED12 files. This should fix an issue with halLiftover

* This commit should fix a bunch of tests for when Mikado is installed with SUDO privileges (#137) potentially also fixing #172.

* Corrected a bug in the printing of transcriptomic BED12 files, corrected a bug in the serialisation of ORFs

* Fixed previous breakage

* Moved the code for checking the index into gene_dict. Also, now GeneDict allows access to positions as well.

* Minor edit to assigner

* Fixing previously broken commit

* Solving a bug which rendered the exclude_utr/protein_coding flags of mikado compare useless.

* Adding the GZI index to the tests directory to avoid permission errors. Addressing #175

* Corrected some testing. Moreover, now Mikado supports the BED12+1 format (ie gffread --bed output)

* Adding a maximum intron length for the default scoring configuration files.

* BROKEN. Proceeding on #142. Now the padding algorithm is aware of where a transcript finishes (intron vs exon). Moreover, we need to change the data structure for padding to a *directional* graph and keep in mind the distance needed to pad a transcript, to solve ambiguous cases in a deterministic (rather than random) way.

* Issue #174: modification to the abstractlocus.py file, to try to solve the issue found by @cschuh.

* #174: this should provide a solution to the issue, which is however only temporary. To be tested.

* #174: making the implicit "for" cycle explicit. Hopefully this should help pinpoint the error better.

* #174: peppered the failing block with try-except statements.

* #174: this should solve it. Now missing external scores in the database will cause Mikado to explicitly fail.

* Fixed #176

* BROKEN. Progress on #142, the code runs, but the tests are broken. **This might be legitimate as we changed the behaviour of the code.**

* Closing #155.

* #174: Now Mikado pick will die informatively if the SQLite3 database has not been found.

* #166: fixed some issues with self-compare

* BROKEN. We have to verify that the padding functions also on the 5' end, but we need to make a new test for that. The test development is in progress.

* The padding now should be tested and correct.

* Fixed previous commit. This should fix #142.

* Update Singularity.centos.def

Changed python to python3 during %post, otherwise it will use the system python2.7...

* Fixed small bug in external metrics handling

* Update Singularity.centos.def

lucventurini added a commit that referenced this issue Jun 19, 2019
@lucventurini lucventurini added this to Closed in Version 2 Oct 15, 2020
lucventurini added a commit to lucventurini/mikado that referenced this issue Feb 11, 2021
lucventurini added a commit to lucventurini/mikado that referenced this issue Feb 11, 2021
* Solved a small bug in the Gene class

* This commit should fix some of the performance issues found in Mikado compare when testing in the all vs all (issue EI-CoreBioinformatics#166).

* Updated the CHANGELOG.

* Slight improvements to the generic GFLine class and to the to_gff wrapper

* Solved some assorted bugs, from stop_codon parsing in GTF2 (for Augustus) to avoiding a very costly pragma check on MIDX databases.

* Now Mikado util stats will only return one value for the mode, making the table parsable

* Solved some small bugs introduced by changing the mode for mikado util stats

* Dropping automated support for Python3.5. The conda environment cannot be created successfully, too many packages have not been updated in the original repositories.

* Updating the conda environment to reflect that only Python>=3.6 is now accepted

* Various fixes for managing correctly BED12 files.

* Fix for the previous commit breaking TRAVIS

* Switched to PySam for loading and fetching from genome files. Also, improved massively the speed of tests.

* Fixed previous commit

* Fixed travis bug

* Refactoring of check_index for Mikado compare (EI-CoreBioinformatics#166) and fix for EI-CoreBioinformatics#172

* Now Mikado will merge touching (NOT overlapping) exons coming from BED12 files. This should fix an issue with halLiftover

* This commit should fix a bunch of tests for when Mikado is installed with SUDO privileges (EI-CoreBioinformatics#137) potentially also fixing EI-CoreBioinformatics#172.

* Corrected a bug in the printing of transcriptomic BED12 files, corrected a bug in the serialisation of ORFs

* Fixed previous breakage

* Moved the code for checking the index into gene_dict. Also, now GeneDict allows access to positions as well.

* Minor edit to assigner

* Fixing previously broken commit

* Solving a bug which rendered the exclude_utr/protein_coding flags of mikado compare useless.

* Adding the GZI index to the tests directory to avoid permission errors. Addressing EI-CoreBioinformatics#175

* Corrected some testing. Moreover, Mikado now supports the BED12+1 format (i.e. gffread --bed output)

* Adding a maximum intron length for the default scoring configuration files.

* BROKEN. Proceeding on EI-CoreBioinformatics#142. Now the padding algorithm is aware of where a transcript finishes (intron vs exon). Moreover, we need to change the data structure for padding to a *directional* graph and keep in mind the distance needed to pad a transcript, to solve ambiguous cases in a deterministic (rather than random) way.

* Issue EI-CoreBioinformatics#174: modification to the abstractlocus.py file, to try to solve the issue found by @cschuh.

* EI-CoreBioinformatics#174: this should provide a solution to the issue, which is however only temporary. To be tested.

* EI-CoreBioinformatics#174: making the implicit "for" cycle explicit. Hopefully this should help pinpoint the error better.

* EI-CoreBioinformatics#174: peppered the failing block with try-except statements.

* EI-CoreBioinformatics#174: this should solve it. Now missing external scores in the database will cause Mikado to explicitly fail.

* Fixed EI-CoreBioinformatics#176

* BROKEN. Progress on EI-CoreBioinformatics#142, the code runs, but the tests are broken. **This might be legitimate as we changed the behaviour of the code.**

* Closing EI-CoreBioinformatics#155.

* EI-CoreBioinformatics#174: Now Mikado pick will die informatively if the SQLite3 database has not been found.

* EI-CoreBioinformatics#166: fixed some issues with self-compare

* BROKEN. We still have to verify that the padding also works at the 5' end, which needs a new test. The test development is in progress.

* The padding now should be tested and correct.

* Fixed previous commit. This should fix EI-CoreBioinformatics#142.

* Development (EI-CoreBioinformatics#178)

* Update Singularity.centos.def

Changed python to python3 during %post, otherwise it will use the system python2.7...

* Fixed small bug in external metrics handling

* Update Singularity.centos.def

* This should address EI-CoreBioinformatics#173 (both configuration file and docs) and EI-CoreBioinformatics#158

* Fix EI-CoreBioinformatics#181 and small bug fix for parsing Mikado annotations.

* Progress for EI-CoreBioinformatics#142 - this should fix the wrong ORF calculation for cases when the CDS was open at the 5' end.

* Fixed previous commit (always for EI-CoreBioinformatics#142)

* EI-CoreBioinformatics#142: corrected and tested the issue with one-off exons, for padding.

* This should fix and test EI-CoreBioinformatics#142 for good.

* Removed spurious warning/error messages

* EI-CoreBioinformatics#142: solved a bug which caused truncated transcripts at the 5' end not to be padded.

* EI-CoreBioinformatics#142: solved a problem which caused a false abort for transcripts on the - strand with changed stop codon.

* EI-CoreBioinformatics#142: fixing previous commit

* Pushing the fix for EI-CoreBioinformatics#182 onto the development branch

* Fix EI-CoreBioinformatics#183

* Fix EI-CoreBioinformatics#183 and previous commit

* EI-CoreBioinformatics#183: now Mikado configure will set a seed when generating the configuration file. The seed will be explicitly mentioned in the log.

* EI-CoreBioinformatics#177: made ORF loading slightly faster with pysam. Also made XML serialisation much faster using SQL sessions and multiprocessing.Pool instead of queues.

* Solved annoying bug that caused Mikado to crash with TAIR GFF3s.

* Development (EI-CoreBioinformatics#184)
