Scaffolding draft assemblies using reference assemblies and minimizer graphs
Description of the algorithm
ntJoin takes a target assembly and one or more 'reference' assemblies as input, and uses information from the reference(s) to scaffold the target assembly. The 'reference' assemblies can be true reference assembly builds or different draft genome assemblies.
Instead of using costly alignments, ntJoin uses a lightweight approach based on minimizer graphs to yield a mapping between the input assemblies.
Main steps in the algorithm:
- Generate an ordered minimizer sketch for each contig of each input assembly
- Filter the minimizers to only retain minimizers that are:
- Unique within each assembly
- Found in all assemblies (target + all references)
- Build a minimizer graph
- Nodes: minimizers
- Edges: between minimizers that are adjacent in at least one of the assemblies. Edge weights are the sum of weights of the assemblies that support an edge.
- Filter the graph based on the minimum edge weight (
- For each node that is a branch node (degree > 2), filter the incident edges with an increasing edge threshold
- Each linear path is converted to a list of oriented target assembly contig regions to scaffold together
- Target assembly scaffolds are printed out
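As a rough illustration of the sketching, filtering, and graph-building steps listed above, here is a minimal Python sketch. It is not ntJoin's implementation, and it makes simplifying assumptions: k-mers are ordered lexicographically on the forward strand only, the window w is counted in k-mers, and the branch-node filtering and path-to-scaffold steps are left out. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def minimizer_sketch(seq, k, w):
    """Ordered (minimizer, position) list for one contig.

    Assumption: the minimizer of a window is its lexicographically
    smallest k-mer, and a window spans w consecutive k-mers.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    sketch = []
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        offset = min(range(w), key=lambda i: window[i])
        entry = (window[offset], start + offset)
        if not sketch or sketch[-1] != entry:  # skip repeats from overlapping windows
            sketch.append(entry)
    return sketch


def sketch_assembly(contigs, k, w):
    """contigs: dict of contig name -> sequence."""
    return {name: minimizer_sketch(seq, k, w) for name, seq in contigs.items()}


def shared_unique_minimizers(sketches_by_assembly):
    """Minimizers that occur exactly once in every assembly."""
    keep = None
    for sketches in sketches_by_assembly.values():
        counts = Counter(m for sk in sketches.values() for m, _ in sk)
        unique = {m for m, c in counts.items() if c == 1}
        keep = unique if keep is None else keep & unique
    return keep or set()


def build_graph(sketches_by_assembly, weights, keep):
    """Edge weight = sum of the weights of the assemblies supporting the edge."""
    edges = defaultdict(int)
    for assembly, sketches in sketches_by_assembly.items():
        supported = set()
        for sk in sketches.values():
            kept = [m for m, _ in sk if m in keep]
            for a, b in zip(kept, kept[1:]):  # adjacent after filtering
                supported.add(frozenset((a, b)))
        for edge in supported:
            edges[edge] += weights[assembly]
    return dict(edges)


def filter_edges(edges, n):
    """Drop edges whose weight is below the minimum edge weight n."""
    return {edge: weight for edge, weight in edges.items() if weight >= n}


# Toy input: the target is split into two contigs, the reference is contiguous.
sketches = {
    "target": sketch_assembly({"t1": "AACCG", "t2": "CGGTT"}, k=3, w=2),
    "ref": sketch_assembly({"r1": "AACCGGTT"}, k=3, w=2),
}
keep = shared_unique_minimizers(sketches)
edges = build_graph(sketches, {"target": 1, "ref": 2}, keep)
```

In this toy input, the edge between ACC and CGG is supported only by the reference, so it carries the reference's weight alone; it is the edge that links the two target contigs, and it survives only while the minimum edge weight does not exceed that weight.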
Original concept: Rene Warren
Design and implementation: Lauren Coombe
If you use ntJoin in your research, please cite:
Lauren Coombe, Vladimir Nikolic, Justin Chu, Inanc Birol, Rene L Warren: ntJoin: Fast and lightweight assembly-guided scaffolding using minimizer graphs. bioRxiv (2020) doi: https://doi.org/10.1101/2020.01.13.905240.
Usage: ntJoin assemble target=<target scaffolds> references='List of reference assemblies' reference_weights='List of weights per reference assembly'

Options:
target              Target assembly to be scaffolded in fasta format
references          List of reference files (separated by a space, in fasta format)
target_weight       Weight of target assembly
reference_weights   List of weights of reference assemblies
prefix              Prefix of intermediate output files [out.k<k>.w<w>.n<n>]
t                   Number of threads
k                   K-mer size for minimizers
w                   Window size for minimizers (bp)
n                   Minimum graph edge weight
g                   Minimum gap size (bp)
m                   Minimum percentage of increasing/decreasing minimizer positions to orient contig
mkt                 If True, use Mann-Kendall Test to predict contig orientation (computationally-intensive, overrides 'm') [False]
agp                 If True, output AGP file describing output scaffolds [False]

Notes:
- Ensure the lists of reference assemblies and weights are in the same order, and that both are space-separated
- Ensure all assembly files are in the current working directory
ntJoin help prints the help documentation.
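To make the 'm' option more concrete, the following Python sketch shows one way a percentage of increasing or decreasing minimizer positions could be turned into an orientation call. This is an assumption about the general idea, not ntJoin's actual code: the function name, the return values, and the handling of ties are all hypothetical.

```python
def orient_contig(positions, min_percent):
    """Guess a contig's orientation from minimizer positions.

    positions: positions of the minimizers on the target contig, listed in
    the order they are visited along a graph path (hypothetical input).
    Returns "+" if enough consecutive steps increase, "-" if enough
    decrease, and None if neither direction reaches min_percent.
    """
    steps = [b - a for a, b in zip(positions, positions[1:])]
    if not steps:
        return None  # a single minimizer gives no direction
    increasing = 100 * sum(s > 0 for s in steps) / len(steps)
    decreasing = 100 * sum(s < 0 for s in steps) / len(steps)
    if increasing >= min_percent and increasing > decreasing:
        return "+"
    if decreasing >= min_percent and decreasing > increasing:
        return "-"
    return None


forward = orient_contig([10, 50, 90, 130], min_percent=90)
reverse = orient_contig([130, 90, 50, 10], min_percent=90)
unclear = orient_contig([10, 50, 30, 60, 20], min_percent=90)
```

Here the first contig would be called forward, the second reverse, and the third left unoriented because its positions alternate between increasing and decreasing.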
- Target assembly to scaffold: my_scaffolds.fa
- Two assemblies to use as 'references': assembly_ref1.fa, assembly_ref2.fa
- Giving the target assembly a weight of '1' and each reference assembly a weight of '2'
- Using k=32, w=500
- Ensure that all input assembly files are in or have soft-links to the current working directory
ntJoin assemble target=my_scaffolds.fa target_weight=1 references='assembly_ref1.fa assembly_ref2.fa' reference_weights='2 2' k=32 w=500
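When the command is generated from a script, it can help to check that the reference and weight lists line up before running it. The helper below is a hypothetical convenience wrapper, not part of ntJoin; it only builds the argument list, using the option names from the usage text above.

```python
import shlex


def ntjoin_command(target, references, reference_weights, target_weight=None, **options):
    """Build the argument list for 'ntJoin assemble' (hypothetical helper).

    references and reference_weights must have the same length and order,
    as the usage notes require.
    """
    if len(references) != len(reference_weights):
        raise ValueError("need exactly one weight per reference assembly")
    args = ["ntJoin", "assemble", f"target={target}"]
    if target_weight is not None:
        args.append(f"target_weight={target_weight}")
    # Both lists are passed as single space-separated values.
    args.append("references=" + " ".join(references))
    args.append("reference_weights=" + " ".join(str(x) for x in reference_weights))
    args.extend(f"{name}={value}" for name, value in options.items())
    return args


cmd = ntjoin_command(
    "my_scaffolds.fa",
    ["assembly_ref1.fa", "assembly_ref2.fa"],
    [2, 2],
    target_weight=1,
    k=32,
    w=500,
)
print(shlex.join(cmd))  # shell-quoted form of the command
```

The resulting list can be handed to subprocess.run(cmd, check=True) from the directory that holds the assembly files.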
- Scaffolded targeted assembly (
- Path file describing how target assembly was scaffolded (
- Unfiltered minimizer graph in dot format (
- If agp=True specified, AGP describing how target assembly was scaffolded (
Installing ntJoin using Brew
brew install brewsci/bio/ntjoin
Installing ntJoin using Conda
conda install -c bioconda ntjoin
Installing ntJoin from the source code
git clone https://github.com/bcgsc/ntJoin.git
cd ntJoin/src
make
Python dependencies can be installed with:
pip3 install -r requirements.txt
Testing your Installation
Run tests/test_installation.sh to test your ntJoin installation and see an example command.
ntJoin Copyright (c) 2020 British Columbia Cancer Agency Branch. All rights reserved.
ntJoin is released under the GNU General Public License v3
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 3.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
For commercial licensing options, please contact Patrick Rebstein email@example.com