Scaffolding contigs with transcripts

SCUBAT2 (Scaffolding Contigs Using BLAST And Transcripts v2) uses transcriptome or proteome information to scaffold the genome. It was inspired by the original SCUBAT algorithm by Ben Elsworth.
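The general idea behind transcript-based scaffolding can be sketched as follows. If one transcript aligns partly to contig A and partly to contig B, the order of the aligned blocks along the transcript suggests that A and B are adjacent, and the hit strands give their relative orientation. This is a minimal sketch under stated assumptions, not SCUBAT2's actual code. The `Hsp` class, `junction_gap`, `contig_links` and the rule that rejects a join when the implied intron exceeds the maximum intron size are all hypothetical illustrations.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter


@dataclass(frozen=True)
class Hsp:
    """One aligned block: part of a transcript (query) matched to a contig (subject)."""
    transcript: str
    contig: str
    q_start: int  # 1-based start on the transcript
    q_end: int
    s_start: int  # 1-based contig coordinates; s_start > s_end means minus strand
    s_end: int

    @property
    def strand(self) -> str:
        return "+" if self.s_start <= self.s_end else "-"


def junction_gap(a: Hsp, b: Hsp, lengths: dict[str, int]) -> int:
    """Unaligned contig bases after block a (on contig a) plus those before block b
    (on contig b), read in transcript order: the intron implied by joining them."""
    lo_a, hi_a = sorted((a.s_start, a.s_end))
    lo_b, hi_b = sorted((b.s_start, b.s_end))
    # Bases of contig a lying after block a in the transcript's direction.
    tail = lengths[a.contig] - hi_a if a.strand == "+" else lo_a - 1
    # Bases of contig b lying before block b in the transcript's direction.
    head = lo_b - 1 if b.strand == "+" else lengths[b.contig] - hi_b
    return tail + head


def contig_links(hsps: list[Hsp], lengths: dict[str, int], max_intron: int):
    """Return (transcript, contig_a, strand_a, contig_b, strand_b) for each pair of
    consecutive blocks of one transcript that fall on different contigs and whose
    implied junction intron is at most max_intron (hypothetical rule)."""
    links = []
    by_transcript = attrgetter("transcript")
    for transcript, group in groupby(sorted(hsps, key=by_transcript), key=by_transcript):
        ordered = sorted(group, key=attrgetter("q_start"))  # blocks in transcript order
        for a, b in zip(ordered, ordered[1:]):
            if a.contig != b.contig and junction_gap(a, b, lengths) <= max_intron:
                links.append((transcript, a.contig, a.strand, b.contig, b.strand))
    return links
```

For example, a transcript whose first 300 bp match the last 300 bp of a 10 kb contig `ctgA` (plus strand) and whose next 300 bp match the first 300 bp of `ctgB` yields the link `("t1", "ctgA", "+", "ctgB", "+")`. Real data also needs handling of conflicting links and repeated contigs, which this sketch ignores.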


Python Libraries

Biopython - to parse the BLAST XML file

NumPy - to calculate statistics


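Requires a BLAST XML file (-outfmt 5) of the transcripts aligned to the assembly contigs. Build a nucleotide BLAST database from the contigs first, then run blastn:

makeblastdb -in contigs.fa -dbtype nucl
blastn -query transcripts.fa -db contigs.fa -evalue 1e-25 -outfmt 5 -out blast.xml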
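SCUBAT2 reads this file with Biopython. To show what information the XML report carries, here is a dependency-free sketch that uses the standard library's `xml.etree.ElementTree` instead of Biopython. The element names (`Iteration_query-def`, `Hit_def`, `Hit_len`, `Hsp_hit-from`, and so on) are standard BLAST XML tags; the function name `parse_blast_xml` and the dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_blast_xml(path_or_file):
    """Yield one dict per HSP from a BLAST XML (-outfmt 5) report."""
    root = ET.parse(path_or_file).getroot()
    for iteration in root.iter("Iteration"):
        # Full FASTA header of the query transcript.
        transcript = iteration.findtext("Iteration_query-def")
        for hit in iteration.iter("Hit"):
            # Hit_def holds the contig's FASTA header (or its description part,
            # depending on how the database was built).
            contig = hit.findtext("Hit_def")
            contig_len = int(hit.findtext("Hit_len"))
            for hsp in hit.iter("Hsp"):
                yield {
                    "transcript": transcript,
                    "contig": contig,
                    "contig_len": contig_len,
                    "q_start": int(hsp.findtext("Hsp_query-from")),
                    "q_end": int(hsp.findtext("Hsp_query-to")),
                    # For minus-strand hits hit-from > hit-to.
                    "s_start": int(hsp.findtext("Hsp_hit-from")),
                    "s_end": int(hsp.findtext("Hsp_hit-to")),
                    "identity": int(hsp.findtext("Hsp_identity")),
                    "align_len": int(hsp.findtext("Hsp_align-len")),
                    "evalue": float(hsp.findtext("Hsp_evalue")),
                }
```

Usage: `hsps = list(parse_blast_xml("blast.xml"))`. With Biopython the equivalent starting point is `Bio.Blast.NCBIXML.parse`.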

When the transcripts come from the same species as the assembly, the default identity cutoff should be adequate.
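As an illustration of what an identity cutoff does, the sketch below computes percent identity for each HSP from BLAST's identical-position count and alignment length, and keeps only HSPs at or above a threshold. The 95% value is a placeholder, not SCUBAT2's actual default, and the function names and dictionary keys are hypothetical. For transcripts from a related but different species, a lower cutoff would presumably be needed.

```python
def percent_identity(identity: int, align_len: int) -> float:
    """Identical positions as a percentage of alignment length."""
    if align_len <= 0:
        raise ValueError("alignment length must be positive")
    return 100.0 * identity / align_len


def filter_by_identity(hsps, min_identity=95.0):
    """Keep HSP dicts whose percent identity is at least min_identity.
    95.0 is a placeholder threshold, not SCUBAT2's default."""
    return [h for h in hsps if percent_identity(h["identity"], h["align_len"]) >= min_identity]
```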

The user must specify the maximum allowed intron size (e.g. ~20000 bp for nematode species). Alternatively, run the program with --intron_size_run, which creates a file named intron_size containing the intron sizes calculated from the mapped transcripts.
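One way to infer intron sizes from mapped transcripts is to take consecutive aligned blocks of the same transcript on the same contig and measure the contig sequence between them. The sketch below does this and then picks a high percentile of the observed sizes as a candidate maximum. It is an illustration only: the dictionary keys and function names are hypothetical, it assumes consecutive blocks are exons of one gene copy, and choosing the maximum intron size by percentile is a suggestion, not something SCUBAT2 documents. `numpy.percentile` could replace the hand-written percentile.

```python
import math
from collections import defaultdict


def intron_sizes(hsps):
    """Gaps on the contig between consecutive blocks of one transcript on one contig."""
    groups = defaultdict(list)
    for h in hsps:
        groups[(h["transcript"], h["contig"])].append(h)
    sizes = []
    for group in groups.values():
        group.sort(key=lambda h: h["q_start"])  # blocks in transcript order
        for a, b in zip(group, group[1:]):
            a_lo, a_hi = sorted((a["s_start"], a["s_end"]))
            b_lo, b_hi = sorted((b["s_start"], b["s_end"]))
            # b lies after a on the plus strand, before a on the minus strand.
            gap = max(b_lo - a_hi, a_lo - b_hi) - 1
            if gap > 0:  # skip overlapping or directly adjacent blocks
                sizes.append(gap)
    return sizes


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile (pct in 0..100) of a non-empty list."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

For example, `nearest_rank_percentile(intron_sizes(hsps), 95)` returns the 95th-percentile intron size, which could serve as a starting value for the maximum intron size.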

Example command (-b: BLAST XML file, -f: assembly file, -max: maximum intron size in bp):

-b [blast.xml] -f [assembly.file] -max 20000
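The options above can be summarized as an argparse interface. This is a hypothetical re-creation for illustration, not SCUBAT2's actual argument parser: the flags come from this README, while the `dest` names, help strings, and the rule that -max may be omitted only when --intron_size_run is given are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Scaffold contigs with transcripts (illustrative interface)")
    p.add_argument("-b", dest="blast_xml", required=True, help="BLAST XML report (-outfmt 5)")
    p.add_argument("-f", dest="assembly", required=True, help="assembly file whose contigs are scaffolded")
    p.add_argument("-max", dest="max_intron", type=int, help="maximum allowed intron size in bp, e.g. 20000")
    p.add_argument("--intron_size_run", action="store_true",
                   help="write intron sizes inferred from the mapped transcripts to the file intron_size")
    return p


def parse_args(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed rule: a maximum intron size is needed unless only intron sizes are requested.
    if args.max_intron is None and not args.intron_size_run:
        parser.error("-max is required unless --intron_size_run is given")
    return args
```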