Reference-based tool for simplifying de novo assemblies
Python Shell
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
test
.gitignore
README.md
asm-unifier.py
fastareader.py
lavintervals.py
lavreader.py
quicksect.py

README.md

Assembly Unifier

This tool can simplify a messy assembly with the help of a reference. It will find where contigs overlap and either merge or discard them. For every overlapping region, it can decide which is the best contig to use. Run ./asm-unifier.py -h to see usage.

Requirements: Requires the lastz command to be available on your $PATH.

Caveats:

  • This tool is currently under active development and should be considered in alpha status. Double-check your results on your data.
  • This tool was designed for use on small genomes, so a very large or complex assembly could overwhelm it. But it has been tested on assemblies with thousands of (short) contigs, and handles them well.
  • Currently, it requires adjacent or overlapping contigs with no gaps between them. I.e. it does not fill gaps with N's, though this is planned.
  • Currently, it only works on a single reference sequence at a time, so if you have multiple chromosomes you'll have to split the FASTA file and run once per chromosome.