Scaffolding contigs with transcripts

SCUBAT2

Overview

SCUBAT2 (Scaffolding Contigs Using BLAST And Transcripts v2) uses transcriptome or proteome information to scaffold the genome. It was inspired by the original SCUBAT algorithm by Ben Elsworth.

Requirements

Python Libraries

Biopython - to parse the BLAST XML file

NumPy - to calculate some statistics

Details

SCUBAT2 requires a BLAST XML file (-outfmt 5) of the transcripts aligned against the contigs, for example:

blastn -query transcripts.fa -db contigs.fa -evalue 1e-25 -outfmt 5 -out blast.xml
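To get a feel for what the XML file contains, the sketch below pulls the per-HSP coordinates out of it. It is an illustration only: SCUBAT2 itself relies on Biopython for parsing, whereas this example swaps in the standard-library `xml.etree.ElementTree` so that it runs without extra dependencies. The function name `read_hsps`, the dictionary keys, and the embedded `SAMPLE` record are hypothetical; the element names (`Iteration_query-def`, `Hit_def`, `Hsp_query-from`, and so on) are those of the BLAST XML format.

```python
import io
import xml.etree.ElementTree as ET

# Tiny hand-written stand-in for blast.xml (hypothetical data).
SAMPLE = b"""<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>transcript_1</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_def>contig_A</Hit_def>
          <Hit_hsps>
            <Hsp>
              <Hsp_evalue>1e-50</Hsp_evalue>
              <Hsp_query-from>1</Hsp_query-from>
              <Hsp_query-to>300</Hsp_query-to>
              <Hsp_hit-from>5000</Hsp_hit-from>
              <Hsp_hit-to>5299</Hsp_hit-to>
              <Hsp_identity>297</Hsp_identity>
              <Hsp_align-len>300</Hsp_align-len>
            </Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_def>contig_B</Hit_def>
          <Hit_hsps>
            <Hsp>
              <Hsp_evalue>1e-40</Hsp_evalue>
              <Hsp_query-from>301</Hsp_query-from>
              <Hsp_query-to>600</Hsp_query-to>
              <Hsp_hit-from>100</Hsp_hit-from>
              <Hsp_hit-to>399</Hsp_hit-to>
              <Hsp_identity>300</Hsp_identity>
              <Hsp_align-len>300</Hsp_align-len>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def read_hsps(xml_source):
    """Yield one dict per HSP from a BLAST XML file path or binary file object."""
    for _, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "Iteration":
            continue  # wait until a whole query record has been read
        query = elem.findtext("Iteration_query-def")
        for hit in elem.iter("Hit"):
            contig = hit.findtext("Hit_def")
            for hsp in hit.iter("Hsp"):
                matches = int(hsp.findtext("Hsp_identity"))
                length = int(hsp.findtext("Hsp_align-len"))
                yield {
                    "query": query,
                    "contig": contig,
                    "q_start": int(hsp.findtext("Hsp_query-from")),
                    "q_end": int(hsp.findtext("Hsp_query-to")),
                    "s_start": int(hsp.findtext("Hsp_hit-from")),
                    "s_end": int(hsp.findtext("Hsp_hit-to")),
                    "evalue": float(hsp.findtext("Hsp_evalue")),
                    "identity": matches / length,  # fraction of identical columns
                }
        elem.clear()  # free memory for large BLAST outputs


hsps = list(read_hsps(io.BytesIO(SAMPLE)))
for h in hsps:
    print(h["query"], h["contig"], h["q_start"], h["q_end"], h["s_start"], h["s_end"])
```

In this sample one transcript aligns to two different contigs with adjacent query coordinates, which is the kind of evidence a transcript-based scaffolder can use to link contigs.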

When the transcripts and the assembly come from the same species, the default identity cutoff should be adequate.

The user must specify the maximum allowed intron size (e.g. ~20000 bp for nematode species). Alternatively, the user can run the program with --intron_size_run, which creates the file intron_size containing the intron sizes calculated from the mapped transcripts.
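One way to think about intron sizes derived from mapped transcripts is sketched below. This is an assumption about the general idea, not SCUBAT2's actual implementation: when consecutive pieces of one transcript align to the same contig, the unaligned stretch of contig between them approximates an intron. The function name `intron_gaps`, the tuple layout, and the example coordinates are all hypothetical, and the sketch assumes forward-strand hits only.

```python
from statistics import median


def intron_gaps(hsps):
    """Return the contig gaps between consecutive HSPs of one transcript.

    hsps: list of (q_start, q_end, s_start, s_end) tuples, all on the same
    contig and on the forward strand (an assumption of this sketch).
    """
    ordered = sorted(hsps, key=lambda h: h[0])  # order by position in the transcript
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        gap = nxt[2] - prev[3] - 1  # bases on the contig between the two alignments
        if gap > 0:  # ignore overlapping or abutting alignments
            gaps.append(gap)
    return gaps


# Hypothetical transcript split into three exons on one contig.
example = [(1, 300, 5000, 5299), (301, 650, 7300, 7649), (651, 900, 7900, 8149)]
gaps = intron_gaps(example)
print(gaps)          # [2000, 250]
print(max(gaps))     # 2000
print(median(gaps))  # 1125.0
```

Looking at the distribution of such gaps across many transcripts can help in choosing a sensible value for -max.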

Example command

SCUBAT_v2.py -b [blast.xml] -f [assembly.file] -max 20000