Quickstart

Installation

Kaiju can be downloaded and compiled from source, or easily installed via the bioconda channel:

conda install -c bioconda kaiju

Obtaining a Kaiju index

Kaiju requires an index file created from a reference database of protein sequences.

You can either create such an index locally or download a pre-built index.

For example, to use the pre-built Kaiju index for the NCBI BLAST nr database, download the index file with

wget https://kaiju.binf.ku.dk/database/kaiju_db_nr_2023-05-10.tgz

and unpack the tar archive with:

tar xzf kaiju_db_nr_2023-05-10.tgz

which yields these three files:

kaiju_db_nr.fmi
nodes.dmp
names.dmp

The Kaiju index itself is in the file kaiju_db_nr.fmi, which contains the Burrows-Wheeler transform and the FM-index of the protein sequences, whereas nodes.dmp and names.dmp contain the taxonomic tree and the taxon names from the NCBI taxonomy.
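Before starting a long run, it can help to confirm that the archive unpacked completely. The following is a minimal sketch, not part of Kaiju: `check_kaiju_db` is a hypothetical helper name, and it assumes the three file names listed above sit together in one directory.

```shell
# Hypothetical helper (not shipped with Kaiju): succeed only if the three
# unpacked files exist in the given directory (default: current directory)
# and are non-empty.
check_kaiju_db() {
    dir="${1:-.}"
    for f in kaiju_db_nr.fmi nodes.dmp names.dmp; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            return 1
        fi
    done
}
```

Calling `check_kaiju_db .` in the directory where the archive was unpacked returns a non-zero status and names the first file that is missing or empty.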

Running Kaiju

To run Kaiju with the downloaded and unpacked files, use:

kaiju -t nodes.dmp -f kaiju_db_nr.fmi -i sequencing_reads.fastq.gz

For paired-end reads use:

kaiju -t nodes.dmp -f kaiju_db_nr.fmi -i sequencing_reads_R1.fastq.gz -j sequencing_reads_R2.fastq.gz

Note: The reads must be in the same order in both files!
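One way to check the ordering before a run is to compare the read names of both files. The sketch below is an illustration only: `read_names` and `same_read_order` are hypothetical helpers, and the code assumes gzip-compressed FASTQ files with exactly four lines per record, where mates share a name apart from an optional `/1` or `/2` suffix.

```shell
# Hypothetical helper: print the read name of every record in a
# gzip-compressed four-line FASTQ file, without the leading "@", any
# comment after the first whitespace, or a trailing "/1" or "/2".
read_names() {
    gzip -cd "$1" | awk 'NR % 4 == 1 {
        name = substr($1, 2)
        sub(/\/[12]$/, "", name)
        print name
    }'
}

# Hypothetical helper: succeed only if both files list the same read
# names in the same order.
same_read_order() {
    tmpdir=$(mktemp -d) || return 2
    read_names "$1" > "$tmpdir/r1.names"
    read_names "$2" > "$tmpdir/r2.names"
    cmp -s "$tmpdir/r1.names" "$tmpdir/r2.names"
    status=$?
    rm -rf "$tmpdir"
    return $status
}
```

For example, `same_read_order sequencing_reads_R1.fastq.gz sequencing_reads_R2.fastq.gz || echo "read order differs"` prints a warning when the two files disagree. The name lists are written to temporary files, so very large inputs need matching free space in the temporary directory.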

Kaiju can read input files in FASTQ or FASTA format, which may be gzip-compressed.

By default, Kaiju will print the output to the terminal (STDOUT). The output can also be written to a file using the -o option:

kaiju -t nodes.dmp -f kaiju_db_nr.fmi -i sequencing_reads.fastq.gz -o kaiju.out

Kaiju can use multiple parallel threads, which can be specified with the -z option. For example, to use 25 parallel threads:

kaiju -z 25 -t nodes.dmp -f kaiju_db_nr.fmi -i sequencing_reads.fastq.gz -o kaiju.out

Multiple samples can be processed at once using kaiju-multi.

Kaiju has two run modes and several command-line parameters that influence the classification accuracy; see the original paper and the README for details.

Output

Kaiju will print one line for each read or read pair. The default output format contains three columns separated by tabs:

  1. either C or U, indicating whether the read is classified or unclassified.
  2. name of the read
  3. NCBI taxon identifier of the assigned taxon
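Because the default output is plain tab-separated text, a quick summary can be produced with standard tools. The sketch below is not one of Kaiju's own programs: `summarize_kaiju_output` is a hypothetical helper that only relies on the first column being C or U, as described above.

```shell
# Hypothetical helper: count classified (C) and unclassified (U) lines
# in a Kaiju output file in the default three-column format.
summarize_kaiju_output() {
    awk -F '\t' '
        $1 == "C" { classified++ }
        $1 == "U" { unclassified++ }
        END {
            total = classified + unclassified
            printf "classified\t%d\nunclassified\t%d\n", classified, unclassified
            # Report the percentage only when the file contained any reads.
            if (total > 0)
                printf "percent_classified\t%.1f\n", 100 * classified / total
        }' "$1"
}
```

Running `summarize_kaiju_output kaiju.out` prints one tab-separated line per statistic, which gives a first impression of how much of a sample was classified before building a full summary table.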

The -v option enables verbose output, which prints additional columns.

The included program kaiju2table converts Kaiju's output file(s) into a summary table for a given taxonomic rank, and kaiju2krona creates a file for making a Krona visualisation.