
Releases: EI-CoreBioinformatics/mikado

Total multiprocessing

02 Mar 22:37

This release brings Mikado prepare up to speed with the other parts of the suite; it is now completely multiprocessed.

WARNING: starting from this release, the Mikado library is called "Mikado", not "mikado_lib".

Correct boundaries

01 Mar 18:25

BF release:

  • Fixed a very nasty bug that led to the creation of bogus CDSs and split transcripts, especially on the negative strand. This release also contains some tests to prevent regressions.
  • Additionally, after splitting transcripts Mikado now checks that the internal ORFs are coherent with the original transcript.
  • Serialise now relies on a Process rather than a Pool implementation.
  • Reworked prepare to avoid keeping all the GFF lines in memory: common information is now kept aside and only the intervals are stored explicitly, which should massively decrease memory usage (a sketch of the idea follows this list).
    WARNING: as a result, Mikado now requires input files to contain valid exon entries. A file containing only CDS/UTR entries will be completely ignored.
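As an illustration of the interval-based storage, here is a minimal sketch; the data model (a per-transcript record plus bare exon intervals) is hypothetical and does not reflect Mikado's actual internal structures:

```python
# Hypothetical sketch: keep one compact record plus bare exon
# intervals per transcript, instead of retaining every parsed line.
from collections import namedtuple

TranscriptRecord = namedtuple("TranscriptRecord",
                              ["tid", "chrom", "strand", "exons"])

def compact(parsed_lines):
    """Collapse parsed GFF exon lines (dicts here) into compact records."""
    transcripts = {}
    for line in parsed_lines:
        if line["feature"] != "exon":   # only exon features are kept
            continue
        tid = line["transcript_id"]
        if tid not in transcripts:
            # Common information is stored once per transcript...
            transcripts[tid] = TranscriptRecord(
                tid, line["chrom"], line["strand"], [])
        # ...while each line contributes only its (start, end) interval
        transcripts[tid].exons.append((line["start"], line["end"]))
    return transcripts
```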

True multiprocessing

26 Feb 22:18

New in this release, which marks a real milestone:

  • Added Travis CI testing
  • Possibility to choose the preferred multiprocessing start method from the configuration file
  • Added the key "only_confirmed_introns"
  • Moved picker and the new loci_processer module to a new subpackage, "picking"
  • AS events now have to be valid against all transcripts in the locus, not just the primary one
  • Faster retrieval of verified introns
  • Bug fix for the printing of BED12 objects
  • In multiprocessing mode, each process now writes to a temporary file; at the end of the run, the files are merged (see the sketch after this list).
  • Switched to simple Queues instead of Manager-derived ones, which is dramatically faster.
  • Switched to pyfaidx for ORF loading in the database
  • ORFs are now loaded before the BLAST hits
  • Mikado prepare now also keeps the information of the original transcript, when present.
  • BLAST files should be opened with the new BlastOpener class; "create_opener" is gone. The class can be used in "with" statements, which prevents the process from having too many files open at once.
  • We now output monoloci_scores/metrics and loci_scores/metrics files
  • In the loci scores/metrics files, transcripts with more than one ORF are reported multiple times, once per ORF, to allow for better filtering using the provided tables.
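As a rough illustration of the temporary-file pattern mentioned above, here is a minimal, self-contained sketch; the function names and file handling are hypothetical and do not reflect Mikado's actual code:

```python
# Minimal sketch of the per-process temporary file pattern: each
# worker writes its results to its own file, and the parent merges
# the files once every worker has finished.
import multiprocessing as mp
import shutil
import tempfile

def worker(chunk, out_name):
    # Hypothetical worker: "process" a chunk and write to its own file
    with open(out_name, "w") as out:
        for item in chunk:
            out.write(f"{item}\n")  # stand-in for real processing

def run(chunks, final_name):
    tmp_names = [tempfile.NamedTemporaryFile(delete=False, suffix=".txt").name
                 for _ in chunks]
    procs = [mp.Process(target=worker, args=(chunk, name))
             for chunk, name in zip(chunks, tmp_names)]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    # Merge the per-process files into the final output
    with open(final_name, "wb") as final:
        for name in tmp_names:
            with open(name, "rb") as tmp:
                shutil.copyfileobj(tmp, final)

if __name__ == "__main__":
    run([range(0, 5), range(5, 10)], "merged.txt")
```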

After all these modifications, Mikado pick now completes AT in ~13 minutes; Chr1 finishes in under 3. The 3B scaffolds in the TGAC assembly finished in less than one hour.

Light connections

23 Feb 02:21

Major changes for this version:

  • Switched away from clique-based algorithms for finding the communities. By using NetworkX's "connected_components" function, even the toughest loci can be analysed in a few seconds (see the sketch after this list);
  • Modified the SQL query retrieval for BLAST data, switching away from the ORM in favour of direct SQL queries. The result is a massive speedup, allowing for real multiprocessing;
  • Bug fix for awk_gtf;
  • Mikado pick will no longer crash on an invalid transcript; it will instead emit an error in the log and ignore the offending record;
  • The reduction heuristics introduced in the previous version proved unnecessary with the new community-finding algorithm; they are effectively disabled by raising the thresholds to 1000 nodes, with the most connected node at 1000 edges.
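For reference, a minimal sketch of community finding via NetworkX's connected_components; the transcript-overlap graph shown here is a toy stand-in for Mikado's actual locus graph:

```python
# Toy sketch: find communities as connected components of a graph
# whose nodes are transcripts and whose edges mark compatibility.
import networkx as nx

graph = nx.Graph()
graph.add_edges_from([
    ("tA", "tB"), ("tB", "tC"),  # one community: tA, tB, tC
    ("tD", "tE"),                # another community: tD, tE
])
graph.add_node("tF")             # a singleton community

# connected_components yields one set of nodes per community,
# without any expensive clique enumeration.
for community in nx.connected_components(graph):
    print(sorted(community))
```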

NP reduction

20 Feb 00:23

When faced with a complex locus (more than 250 nodes, or a most-connected node with more than 200 edges), Mikado will now employ the following algorithm:

  • First approximation: remove all redundant intron chains (i.e. those completely contained within another compatible intron chain)
  • Second approximation: remove all transcripts completely contained within another (class code "c")
  • Third approximation: use the "source" field in the original files to collect transcripts from different sources until the limit is reached.

This ensures that even the most complex loci can be solved relatively quickly and painlessly.
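A minimal sketch of the first two reduction steps, under a deliberately simplified data model (each transcript is a (start, end) span plus a tuple of intron coordinates; this is not Mikado's real representation, and the containment tests are cruder than the actual class-code logic):

```python
# Toy sketch of the first two reduction approximations. A transcript
# is modelled here as ((start, end), introns), where introns is a
# tuple of (start, end) pairs sorted by position.

def chain_contained(chain_a, chain_b):
    """True if intron chain A is a contiguous sub-chain of chain B."""
    if not chain_a or len(chain_a) >= len(chain_b):
        return False
    width = len(chain_a)
    return any(chain_b[i:i + width] == chain_a
               for i in range(len(chain_b) - width + 1))

def span_contained(span_a, span_b):
    """True if span A lies completely within span B."""
    return span_b[0] <= span_a[0] and span_a[1] <= span_b[1]

def reduce_locus(transcripts):
    """Drop transcripts made redundant by another, still-kept transcript."""
    kept = set(transcripts)
    for tid in sorted(transcripts):
        span, chain = transcripts[tid]
        for other in kept - {tid}:
            ospan, ochain = transcripts[other]
            # First approximation: redundant intron chain;
            # second approximation: completely contained transcript.
            if chain_contained(chain, ochain) or (
                    span != ospan and span_contained(span, ospan)):
                kept.discard(tid)
                break
    return {tid: transcripts[tid] for tid in kept}

locus = {
    "t1": ((100, 900), ((200, 300), (400, 500))),
    "t2": ((100, 1200), ((200, 300), (400, 500), (600, 700))),
    "t3": ((150, 450), ()),  # monoexonic and contained in t2
}
print(sorted(reduce_locus(locus)))  # only 't2' survives
```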

Approximate clique finding

18 Feb 19:24

Introduced an approximate method of clique finding for complex loci (more than 350 transcripts). In such cases, Mikado will iteratively find the maximum clique, remove its nodes from the graph, and repeat until the graph is small enough (350 nodes or fewer) for the classic Bron-Kerbosch algorithm. The method is approximate but much faster than the previous implementation, while using only a fraction of the memory.
Complex loci such as these will be flagged as "approximate" at the superlocus level.
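A minimal sketch of the iterative strategy described above, using NetworkX's approximate max_clique for the large-graph phase and find_cliques (Bron-Kerbosch) once the graph is small enough; the threshold mirrors the release notes, but the rest is illustrative rather than Mikado's actual code:

```python
# Sketch: peel off (approximate) maximum cliques until the graph
# is small enough for exact Bron-Kerbosch enumeration.
import networkx as nx
from networkx.algorithms.approximation import max_clique

BK_THRESHOLD = 350  # size at which exact enumeration becomes viable

def peel_cliques(graph):
    graph = graph.copy()
    cliques = []
    while graph.number_of_nodes() > BK_THRESHOLD:
        clique = max_clique(graph)        # approximate maximum clique
        cliques.append(set(clique))
        graph.remove_nodes_from(clique)   # peel it off and repeat
    # The remaining graph is small: run classic Bron-Kerbosch on it
    cliques.extend(set(c) for c in nx.find_cliques(graph))
    return cliques
```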

Another change is that mikado serialise is now multi-threaded, an advantage that will be useful only when multiple BLAST files have been created.

New community finding

17 Feb 17:58

Main changes:

  • Switched to the [Reid/Daid/Hurley algorithm](http://arxiv.org/pdf/1205.0038.pdf) for community finding; much more efficient for complex regions.
  • Now compare does not penalize fusions in the refmap.

Lower underscore

15 Feb 16:34

Class codes of "_" now indicate a nucleotide F1 of 80%.
Serialise has been changed to make it leaner when reading the XML and FASTA files.
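For context, nucleotide F1 is the harmonic mean of nucleotide precision and recall; a small illustrative calculation (the numbers here are made up):

```python
# Nucleotide F1 as the harmonic mean of precision and recall.
def f1(precision, recall):
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Made-up example: 85% of predicted bases are correct (precision),
# ~76% of reference bases are recovered (recall) -> F1 of ~0.80
print(round(f1(0.85, 0.7555), 3))
```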

PyFAIDXing and yielding the reference

11 Feb 14:33

Changes to mikado prepare, which now uses generators and pyfaidx to make it faster and lighter.
Moreover, Mikado compare now returns a richer output in the statistics file.
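For reference, a minimal example of random access to a FASTA file with pyfaidx, which avoids loading the whole reference into memory (the file and sequence names are placeholders):

```python
# pyfaidx provides indexed, lazy access to FASTA sequences,
# so only the requested slices are read from disk.
from pyfaidx import Fasta

genome = Fasta("genome.fa")        # placeholder file name; indexed on first use
chunk = genome["Chr1"][999:1100]   # 0-based slice, read lazily from disk
print(chunk.seq[:10], len(chunk))
```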

New tags and badges

02 Feb 13:22

Greatest modification: changed the class codes, introducing "J" and "C" and modifying the meaning of "n". This could have repercussions on Pick as well.

Reverted to best bit score as the default BLAST scoring.

Bug fix in the calculation of metrics in sublocus.

Bug fix for SQL DELETE statements in the serialisation library.