A tool for viewing the contents of metagenomic samples
Files in this repository
- format_input.py: takes a file that contains a single BLAST query and taxon ID per line and formats it correctly as input for Keanu
- format is
contig_query_name taxonID_1 [counts], taxonID_2 [count], ...
- format is
- make_db.py: creates Keanu taxonomy database and merged/deleted database from NCBI taxonomy database . Select the taxdmp file and decompress it.
- keanu.py: creates the visualization by taking the output of format_input.py and make_db.py and parsing them
In order to reduce sequence duplication, the reads can be assembled with some assembler. ABySS was used as Keanu was develeoped. This step is optional.
Use BLAST to align the reads (or the assembled reads if the reads were assembled) to a database like the BLAST Nucleotide database. While the choice of output format is up to the user, using
outfmt '6 std staxids' to include the standard output format 6 with the additional subject taxon ID field is recommended if the BLAST results will be used in with other tools. Then, the query ID field and the subject taxon ID can be extracted from the BLAST results using
cut -f1,13 blast.results.txt > query.staxids.txt. If the BLAST results aren't needed for anything else, use
outfmt '6 qseqid staxids' to get the proper format.
If the BLAST alignments have already been completed, the subject taxon ID can be extracted from the database used for the alignments with the
blastdbcmd from BLAST. To get the file in the proper format, a combination of
blastdbcmd must be used. The command is
paste <(cut -f1 blast.results.txt) <(cut -f2 blast.results.txt | blastdbcmd -db /path/to/db -entry_batch - -outfmt '%T') > query.staxids.txt.
Making the database
The following command is used to create the
merged_deleted.dat databases necessary for running Keanu. There are no optional parameters. The input files - names.dmb, nodes.dmp, delnodes.dmp, and merged.dmp - come from the taxdmp file located here
python3 make_db.py -names names.dmp -nodes nodes.dmp -out_db taxonomy.dat -deleted delnodes.dmp -merged merged.dmp -out_md_db merged_deleted.dat
Formatting the input
The following command is used to format data from BLAST into the input file for Keanu.
query.staxids.txt should contain a single sequence ID and a single taxon ID per line, with the two IDs separated by a tab.
python3 format_input.py -in query.staxids.txt -out sample_name.keanu.txt
python3 keanu.py -db taxonomy.dat -md_db merged_deleted.dat -in input/sample_name.keanu.txt -view bilevel -out output/sample.bilevel.html
python3 keanu.py -db taxonomy.dat -md_db merged_deleted.dat -in input/sample_name.keanu.txt -view tree -out output/sample.tree.html
The paper describing Keanu can be found here.