Correct misassemblies using linked reads
Cut sequences at positions with few spanning molecules.
Written by Shaun Jackman, Lauren Coombe, and Justin Chu.
Shaun D. Jackman, Lauren Coombe, Justin Chu, Rene L. Warren, Benjamin P. Vandervalk, Sarah Yeo, Zhuyi Xue, Hamid Mohamadi, Joerg Bohlmann, Steven J.M. Jones and Inanc Birol (2018). Tigmint: correcting assembly errors using linked reads from large molecules. BMC Bioinformatics, 19(1). doi:10.1186/s12859-018-2425-6
Tigmint identifies and corrects misassemblies using linked reads from 10x Genomics Chromium. The reads are first aligned to the assembly, and the extents of the large DNA molecules are inferred from the alignments of the reads. The physical coverage of the large molecules is more consistent and less prone to coverage dropouts than that of the short read sequencing data. The sequences are cut at positions that have insufficient spanning molecules. Tigmint outputs a BED file of these cut points, and a FASTA file of the cut sequences.
Each window of a specified fixed size is checked for a minimum number of spanning molecules. Sequences are cut at positions where a window with sufficient coverage is followed by a run of windows with insufficient coverage, which is in turn followed by another window with sufficient coverage.
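The window check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Tigmint's own implementation: the function names `spanning_counts` and `low_coverage_regions` are hypothetical, molecules are assumed to be 0-based half-open intervals on a single sequence, and the sketch reports the whole low-coverage region rather than choosing an exact cut position inside it.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # assumed 0-based, half-open (start, end), as in BED


def spanning_counts(molecules: List[Interval], seq_len: int, window: int) -> List[int]:
    """Count the molecules that span each consecutive fixed-size window."""
    counts = []
    for start in range(0, seq_len - window + 1, window):
        end = start + window
        # A molecule spans the window if it covers it from start to end.
        counts.append(sum(1 for s, e in molecules if s <= start and e >= end))
    return counts


def low_coverage_regions(counts: List[int], window: int, span: int) -> List[Interval]:
    """Return regions of insufficient windows flanked on both sides by sufficient ones."""
    regions = []
    run_start = None   # index of the first window in the current low-coverage run
    seen_good = False  # True once a window with sufficient coverage has been seen
    for i, count in enumerate(counts):
        if count >= span:
            if run_start is not None:
                regions.append((run_start * window, i * window))
                run_start = None
            seen_good = True
        elif seen_good and run_start is None:
            run_start = i
    return regions


# Two groups of three molecules with a gap between 3000 and 5000 bp.
molecules = [(0, 3000)] * 3 + [(5000, 9000)] * 3
counts = spanning_counts(molecules, seq_len=9000, window=1000)
regions = low_coverage_regions(counts, window=1000, span=2)
print(regions)  # [(3000, 5000)]
```

Low-coverage windows at the very start or end of a sequence are not reported, because they are not flanked on both sides by windows with sufficient coverage.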
Install Tigmint using Brew
brew install tigmint
Install Tigmint using Conda
conda install -c bioconda tigmint
Install Tigmint using PyPI
pip3 install tigmint
Run Tigmint using Docker
docker run -it bcgsc/tigmint
Install Tigmint from the source code
Download and extract the source code using either one of the following two commands. Compiling is not needed.
git clone https://github.com/bcgsc/tigmint && cd tigmint
curl -L https://github.com/bcgsc/tigmint/archive/master.tar.gz | tar xz && mv tigmint-master tigmint && cd tigmint
Install Python package dependencies
pip3 install intervaltree pybedtools pysam statistics
Install the dependencies of Tigmint
brew install bedtools bwa samtools
Install the dependencies of ARCS (optional)
brew tap brewsci/bio
brew install arcs links-scaffolder
Install the dependencies for calculating assembly metrics (optional)
brew install abyss seqtk
To run Tigmint on the draft assembly draft.fa with the reads reads.fq.gz, which have been run through
samtools faidx draft.fa
bwa index draft.fa
bwa mem -t8 -p -C draft.fa reads.fq.gz | samtools sort -@8 -tBX -o draft.reads.sortbx.bam
tigmint-molecule draft.reads.sortbx.bam | sort -k1,1 -k2,2n -k3,3n >draft.reads.molecule.bed
tigmint-cut -p8 -o draft.tigmint.fa draft.fa draft.reads.molecule.bed
bwa mem -C is used to copy the BX tag from the FASTQ header to the SAM tags.
samtools sort -tBX is used to sort first by barcode and then by position.
Alternatively, you can run the Tigmint pipeline using the Makefile driver script tigmint-make. To run Tigmint on the draft assembly myassembly.fa with the reads myreads.fq.gz, which have been run through
tigmint-make tigmint draft=myassembly reads=myreads
To run both Tigmint and scaffold the corrected assembly with ARCS:
tigmint-make arcs draft=myassembly reads=myreads
To run Tigmint, ARCS, and calculate assembly metrics using the reference genome
tigmint-make metrics draft=myassembly reads=myreads ref=GRCh38 G=3088269832
tigmint-make is a Makefile script, and so any make options may also be used with tigmint-make, such as
- The file extension of the assembly must be .fa and that of the reads .fq.gz, and the extension is not included in the parameters reads. These specific file name requirements result from implementing the pipeline in GNU Make.
tigmint: Run Tigmint, and produce a file named
arcs: Run Tigmint and ARCS, and produce a file named
metrics: Run Tigmint and ARCS, calculate assembly metrics using abyss-samtobreak, and produce TSV files.
Parameters of Tigmint
draft: Name of the draft assembly,
reads: Name of the reads,
span=20: Number of spanning molecules threshold
window=1000: Window size (bp) for checking spanning molecules
minsize=2000: Minimum molecule size
as=0.65: Minimum AS/read length ratio
nm=5: Maximum number of mismatches
dist=50000: Maximum distance (bp) between reads to be considered the same molecule
mapq=0: Mapping quality threshold
trim=0: Number of bases to trim off contigs following cuts
t=8: Number of threads
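To make the roles of dist and minsize concrete, here is a minimal Python sketch of grouping barcoded read alignments into molecules. It is an illustration under stated assumptions, not the code of tigmint-molecule: the function name `infer_molecules` and the tuple layout are hypothetical, the distance is assumed to be measured from the end of the molecule so far to the start of the next read, and the read filters (as, nm, mapq) are omitted.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Read = Tuple[str, str, int, int]           # (barcode, contig, start, end), hypothetical layout
Molecule = Tuple[str, int, int, str, int]  # (contig, start, end, barcode, number of reads)


def infer_molecules(reads: Iterable[Read], dist: int = 50000,
                    minsize: int = 2000) -> List[Molecule]:
    """Group reads sharing a barcode and contig into molecules."""
    by_key = defaultdict(list)
    for barcode, contig, start, end in reads:
        by_key[(contig, barcode)].append((start, end))

    molecules = []
    for (contig, barcode), alignments in sorted(by_key.items()):
        alignments.sort()
        groups = []  # each group is [start, end, number of reads]
        for start, end in alignments:
            if groups and start - groups[-1][1] <= dist:
                # Close enough to the previous read: extend the current molecule.
                groups[-1][1] = max(groups[-1][1], end)
                groups[-1][2] += 1
            else:
                # Too far away (or first read): start a new molecule.
                groups.append([start, end, 1])
        for start, end, n_reads in groups:
            if end - start >= minsize:  # discard molecules shorter than minsize
                molecules.append((contig, start, end, barcode, n_reads))
    return molecules


reads = [
    ("A", "c1", 0, 150),
    ("A", "c1", 1000, 1150),
    ("A", "c1", 3000, 3150),
    ("A", "c1", 100000, 100150),  # more than dist away: starts a new molecule
    ("B", "c1", 500, 650),        # a single read: shorter than minsize
]
print(infer_molecules(reads))  # [('c1', 0, 3150, 'A', 3)]
```

Raising dist merges more distant reads into one molecule, and raising minsize discards more of the short molecules that are supported by only a few reads.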
Parameters of ARCS
Parameters of LINKS
Parameters for calculating assembly metrics
ref: Reference genome, ref.fa, for calculating assembly contiguity metrics
G: Size of the reference genome, for calculating NG50 and NGA50
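As a reminder of why G is needed, NG50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the genome size G, whereas N50 uses the total assembly size instead. The Python sketch below shows this standard definition. The function name `ng50` is hypothetical, and returning 0 when the assembly covers less than half of G is a convention chosen here, not necessarily what the metrics tools report.

```python
from typing import Iterable


def ng50(lengths: Iterable[int], genome_size: int) -> int:
    """Return the NG50 of the sequence lengths for the given genome size."""
    total = 0
    for length in sorted(lengths, reverse=True):
        total += length
        if 2 * total >= genome_size:  # reached at least half of G
            return length
    return 0  # the sequences cover less than half of G


contigs = [50, 40, 30, 20, 10]
print(ng50(contigs, genome_size=200))           # 30
print(ng50(contigs, genome_size=sum(contigs)))  # 40, which is the N50
```

NGA50 follows the same calculation, applied to the lengths of the aligned blocks obtained after breaking the sequences at misassemblies relative to the reference.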
- If your barcoded reads are in multiple FASTQ files, the initial alignments of the barcoded reads to the draft assembly can be done in parallel and merged prior to running Tigmint.
- When aligning with BWA-MEM, use the -C option to include the barcode in the BX tag of the alignments.
- Sort by BX tag using samtools sort -tBX.
- Merge multiple BAM files using samtools merge -tBX.
Using stLFR linked reads
To use stLFR linked reads with Tigmint, you will need to re-format the reads to have the barcode in a BX:Z: tag in the read header.
For example, this format
@V100002302L1C001R017000000#0_0_0/1 0 1
TGTCTTCCTGGACAGCTGACATCCCTTTTGTTTTTCTGTTTGCTCAGATGCTGTCTCTTATACACATCTTAGGAAGACAAGCACTGACGACATGATCACC
+
FFFFFFFGFGFFGFDFGFFFFFFFFFFFGFFF@FFFFFFFFFFFF@FFFFFFFFFGGFFEFEFFFF?FFFFGFFFGFFFFFFFGFFEFGFGGFGFFFGFF
should be changed to:
@V100002302L1C001R017000000 BX:Z:0_0_0
TGTCTTCCTGGACAGCTGACATCCCTTTTGTTTTTCTGTTTGCTCAGATGCTGTCTCTTATACACATCTTAGGAAGACAAGCACTGACGACATGATCACC
+
FFFFFFFGFGFFGFDFGFFFFFFFFFFFGFFF@FFFFFFFFFFFF@FFFFFFFFFGGFFEFEFFFF?FFFFGFFFGFFFFFFFGFFEFGFGGFGFFFGFF
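One way to do this re-formatting is with a short Python script such as the sketch below. It is not part of Tigmint. The function names are hypothetical, and it assumes four-line FASTQ records whose headers follow the pattern shown above (read name, `#`, barcode, `/1` or `/2`, then optional extra fields).

```python
import re
from typing import Iterable, Iterator

# Assumed stLFR header layout: @<name>#<barcode>/<1 or 2>[ extra fields]
_STLFR_HEADER = re.compile(r"^@([^#\s]+)#([^/\s]+)/[12](?:\s.*)?$")


def stlfr_header_to_bx(header: str) -> str:
    """Rewrite one stLFR FASTQ header so the barcode is in a BX:Z: tag."""
    match = _STLFR_HEADER.match(header.rstrip("\n"))
    if match is None:
        raise ValueError(f"Unrecognized stLFR header: {header!r}")
    name, barcode = match.groups()
    return f"@{name} BX:Z:{barcode}"


def reformat_fastq(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ lines with every header (each fourth line) rewritten."""
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        yield stlfr_header_to_bx(line) if i % 4 == 0 else line


print(stlfr_header_to_bx("@V100002302L1C001R017000000#0_0_0/1 0 1"))
# @V100002302L1C001R017000000 BX:Z:0_0_0
```

To convert a whole file, pass the lines of the opened FASTQ file (for example from `gzip.open(path, "rt")`) to `reformat_fastq` and write each yielded line followed by a newline.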
After first looking for an existing issue at https://github.com/bcgsc/tigmint/issues, please report a new issue at https://github.com/bcgsc/tigmint/issues/new. Please report the names of your input files, the exact command line that you are using, and the entire output of Tigmint.