NinjaMap is no longer supported; please consider using nf-ninjamap (https://github.com/FischbachLab/nf-ninjamap) instead.
NinjaIndex is no longer supported; please consider using nf-ninjaindex (https://github.com/FischbachLab/nf-ninjaindex) instead.
ninjaMap is a software tool to calculate strain abundance for a given microbial database.
This tool runs in two steps, ninjaIndex and ninjaMap. It accepts a directory of your reference genomes (one genome per file), calculates the uniqueness of each genome in the database along with other contig-related metadata, and returns a binmap file together with a concatenated fasta file of your references.
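The two outputs described above (a concatenated fasta plus a binmap mapping each contig back to its strain) can be sketched as follows. This is an illustration of the file relationship only, not the actual ninjaIndex implementation; the `build_binmap` helper and its input layout are hypothetical.

```python
# Sketch: given genomes as {genome_name: {contig_name: sequence}}, emit the
# concatenated FASTA lines and the binmap rows (contig -> strain) that
# ninjaIndex is described as returning. Illustrative only.

def build_binmap(genomes):
    """Return (fasta_lines, binmap_rows); binmap_rows are
    (contig_name, genome_name) pairs, one per contig."""
    fasta_lines = []
    binmap_rows = []
    for genome, contigs in sorted(genomes.items()):
        for contig, seq in sorted(contigs.items()):
            fasta_lines.append(f">{contig}")
            fasta_lines.append(seq)
            binmap_rows.append((contig, genome))
    return fasta_lines, binmap_rows

genomes = {
    "strainA": {"A_contig1": "ACGT", "A_contig2": "GGCC"},
    "strainB": {"B_contig1": "TTAA"},
}
fasta, binmap = build_binmap(genomes)
```

Every contig in the concatenated fasta appears exactly once in the binmap, which is what lets ninjaMap assign each alignment to a strain.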
- Nextflow (https://www.nextflow.io/)
- Python 3.7
- numpy
- scipy
- numba
- pysam
- SeqIO
- samtools
- bowtie2
- bbmap
Step 1. The ninjaIndex pipeline is built using Nextflow and processes data using the following steps:
- [ART] - Generate synthetic short reads for each genome
- [Bowtie2] - Align reads to all reference genomes
- [ninjaIndex] - Generate the ninja index for a given synthetic community
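The steps above boil down to asking, for each genome, what fraction of its simulated reads align only to that genome when mapped against the whole database. A conceptual sketch of that uniqueness idea (the function and its inputs are hypothetical, not the actual ninjaIndex scoring code):

```python
# Conceptual sketch: simulate reads per genome (ART), align against all
# genomes together (Bowtie2), then score each genome by the fraction of its
# reads that hit ONLY itself. Hypothetical helper for illustration.

def genome_uniqueness(read_hits):
    """read_hits: list of (source_genome, {genomes the read aligned to}).
    Returns {genome: fraction of its reads aligning uniquely to it}."""
    total, unique = {}, {}
    for source, hits in read_hits:
        total[source] = total.get(source, 0) + 1
        if hits == {source}:
            unique[source] = unique.get(source, 0) + 1
    return {g: unique.get(g, 0) / n for g, n in total.items()}

hits = [
    ("strainA", {"strainA"}),             # read from a region unique to A
    ("strainA", {"strainA", "strainB"}),  # read from a region shared with B
    ("strainB", {"strainB"}),
]
scores = genome_uniqueness(hits)
```

Genomes with many shared regions score lower, which is exactly the metadata ninjaMap later needs to weight ambiguous alignments.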
Step 2. The ninjaMap pipeline accurately quantifies the abundance of each strain.
The input to ninjaIndex is a list of genome files in fasta format.
nextflow run ./nf-core-ninjaindex/main.nf --genomes 's3://bucket/input/*.fna' --outdir 's3://bucket/output/' -profile aws
OR
aws batch submit-job \
--profile maf \
--job-name nf-ninjaindex \
--job-queue default-maf-pipelines \
--job-definition nextflow-production \
--container-overrides command=s3://nextflow-pipelines/ninjaindex,\
"--genomes","'s3://dev-scratch/ReferenceDBs/NinjaMap/Index/12Com/fasta/*.fna'",\
"--outdir","s3://genomics-workflow-core/Results/NinjaIndex/12Com/db"
The main input to ninjaMap is the binmap file generated in step 1, plus a sorted BAM file; its index (bam.bai) file must be present in the same directory.
./ninjaMap/scripts/ninjaMap.py -threads 16 -bam sample_sorted.bam -bin binmap.tsv -prefix mycommunity
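Before launching the command above, it is worth verifying that the sorted BAM and its index really sit side by side; a missing .bai is the most common reason the run fails at startup. A small (hypothetical) pre-flight check; the sorting and indexing themselves would be done with samtools, one of the listed dependencies (`samtools sort` followed by `samtools index`):

```python
# Hypothetical pre-flight helper: ninjaMap needs sample_sorted.bam and its
# index in the same directory. Accepts either naming convention for the
# index (file.bam.bai or file.bai).
import os

def bam_ready(bam_path):
    """True if the BAM exists and an index file sits next to it."""
    if not os.path.isfile(bam_path):
        return False
    return (os.path.isfile(bam_path + ".bai")
            or os.path.isfile(os.path.splitext(bam_path)[0] + ".bai"))
```

Checking both `file.bam.bai` and `file.bai` matters because different samtools versions and wrappers emit either spelling.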
A wrapper script ninjaMap/ninjaMap_index.sh is provided to run ninjaMap on AWS via Batch.
The generic command to run a ninjaMap Docker container:
docker container run \
-v /host/path/to/indata/:/input_data/ \
-v /host/path/to/outdata/:/output_data/ \
fischbachlab/ninjamap \
python ./scripts/ninjaMap.py \
-bin /input_data/binmap.tsv \
-bam /input_data/input.sorted.bam \
-outdir /output_data/summary \
-prefix mycommunity
python ./ninjaMap/scripts/ninjaMap.py --help
Description:
This script will calculate the abundance of a strain in a defined microbial community.
Usage: ninjaMap.py -bam sorted.bam -bin binmap.tsv -prefix my_community
optional arguments:
-h, --help show this help message and exit
-bam BAMFILE sorted bam file; its index (bam.bai) file must be present in the same directory.
-bin BINMAP tab-delimited file with Col1= contig name and Col2=Bin/Strain name
-outdir OUTDIR output directory
-prefix PREFIX output prefix
-threads THREADS number of threads available for this job and subprocesses
-debug save intermediate false positives bam file
-truth TRUTH If using debug, please provide one strain name that you would like to track.
-mbq MIN_BASE_QUAL minimum read base quality to consider for coverage calculations.
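The -bin argument is the two-column, tab-delimited binmap described above (Col1 = contig name, Col2 = bin/strain name). A minimal parser sketch, using only the standard library; this is illustrative, not the script's own loader:

```python
# Sketch: read a binmap TSV into a {contig: strain} lookup.
# Col1 = contig name, Col2 = bin/strain name (tab-delimited).
import csv
import io

def load_binmap(handle):
    """Return {contig: strain} from an open binmap TSV handle."""
    return {row[0]: row[1]
            for row in csv.reader(handle, delimiter="\t") if row}

example = "A_contig1\tstrainA\nA_contig2\tstrainA\nB_contig1\tstrainB\n"
binmap = load_binmap(io.StringIO(example))
```

With a real file, pass `open("binmap.tsv")` instead of the in-memory example.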
The output files are organized into 4 folders.
The alignment file of all input reads aligned to the defined community database, in BAM format
The running logs of various scripts
- *.ninjaMap.abundance.csv: this file shows the statistics of the abundance, coverage and depth of each strain in the defined community
- Strain_Name: strain name
- Read_Fraction: the abundance in the defined community in percentage
- Percent_Coverage: the average coverage per strain in percentage
- Coverage_Depth: the average coverage depth
- *.ninjaMap.read_stats.csv: this file shows the statistics of input reads
- File_Name: sample name
- Reads_Aligned: the number of aligned reads
- Reads_wPerfect_Aln: the number of perfectly aligned reads
- Reads_wSingular_Votes: the number of reads voted as singular
- Reads_wEscrowed_Votes: the number of reads voted as escrow
- Discarded_Reads_w_Perfect_Aln: the number of discarded perfectly aligned reads
- *.ninjaMap.strain_stats.csv: this file shows various statistics for each strain
- *.ninjaMap.votes.csv.gz: the statistics of read voting (singular or escrow)
- adapter_trimming_stats_per_ref.txt: this file shows the statistics of adapter trimming
- read_accounting.csv: this file shows the total number of reads, the number of reads after trimming and the number of aligned reads
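The per-sample abundance table can be consumed with the standard csv module. The column names below follow the descriptions above (Strain_Name, Read_Fraction, Percent_Coverage, Coverage_Depth); the example values are made up for illustration:

```python
# Sketch: read *.ninjaMap.abundance.csv and sanity-check that the
# Read_Fraction column (a percentage) sums to ~100 across strains.
import csv
import io

example = (
    "Strain_Name,Read_Fraction,Percent_Coverage,Coverage_Depth\n"
    "strainA,60.0,95.2,12.5\n"
    "strainB,40.0,88.7,8.1\n"
)

rows = list(csv.DictReader(io.StringIO(example)))
# Read_Fraction is reported in percent, so the strains should sum to ~100.
total_fraction = sum(float(r["Read_Fraction"]) for r in rows)
```

A total well below 100 usually means many reads were discarded or unaligned; cross-check against *.ninjaMap.read_stats.csv.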
The aggregated output files are organized into 6 files.
- *.covDepth.csv: this file shows the average coverage depth per strain by samples
- *.host_contaminants.csv: this file shows the detected host contaminants (Human or Mouse) by samples if the unaligned read rate exceeds 5%
- *.long.csv: this is the long format of three files (*.readFraction.csv, *.covDepth.csv and *.percCoverage.csv)
- *.percCoverage.csv: this file shows the average coverage per strain in percentage by samples
- *.reads_stats.csv: this file shows the read statistics (in read counts) by samples
- *.readFraction.csv: this file shows the abundance in the defined community in percentage by samples
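Since *.long.csv is described as the long (tidy) form of the readFraction, covDepth and percCoverage tables, it can be pivoted back into one wide table per metric. The column names (Sample, Strain, Metric, Value) are an assumption here, used only to show the shape of such a pivot:

```python
# Sketch: pivot an assumed long-format table back into nested dicts,
# one wide table per metric. Column names are hypothetical.
import csv
import io

example = (
    "Sample,Strain,Metric,Value\n"   # assumed header
    "s1,strainA,readFraction,60.0\n"
    "s1,strainB,readFraction,40.0\n"
    "s2,strainA,readFraction,55.0\n"
)

wide = {}  # {metric: {sample: {strain: value}}}
for row in csv.DictReader(io.StringIO(example)):
    metric = wide.setdefault(row["Metric"], {})
    metric.setdefault(row["Sample"], {})[row["Strain"]] = float(row["Value"])
```

Check the actual header of your *.long.csv and adjust the keys accordingly.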
To be added
[GNU GPL](https://www.gnu.org/licenses/gpl-3.0.html)
- Sunit Jain
- Xiandong Meng (xdmeng at stanford.edu)
- PI: Michael Fischbach (fischbach at fischbachgroup.org)