VirKraken
An extension to identify viral elements in Kraken 2 outputs
About VirKraken
This project was created to identify viral contigs using Kraken 2 in the publication Cite. The extension extracts the headers of viral elements from Kraken 2 output and, if the sequencing file is provided, can either retain only the viral sequences or remove them.
Getting Started
To get a local copy up and running, follow these simple steps.
Prerequisites
VirKraken requires Python 3 and the following libraries (when installing through pip, these libraries are installed automatically):
- pandas
- scikit-learn
- biopython
- importlib-resources
Installation
VirKraken is available on PyPI and can be forked from this repository. The easiest way to install VirKraken is to use pip.
pip install virkraken
Usage
VirKraken works as a command line script. Once installed via pip, the virkraken command is available. To get the help screen, type:
virkraken -h
The parameters of VirKraken are:
- -f: Kraken output file [required]
- -c: Sequencing file to parse [optional]
- -r: Remove viral elements flag
- -o: Rename output files [optional]
Running VirKraken
VirKraken without a sequencing file and renaming the output
virkraken -f Kraken_Output.txt -o Viral_Sequences
The command above will return Viral_Sequences.csv, which lists sequence headers and their NCBI TaxIDs. All returned sequence headers are viral.
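To make the header-extraction step concrete, here is a minimal sketch, not VirKraken's actual implementation. It assumes the default Kraken 2 per-sequence output (tab-delimited: classification status, sequence ID, taxonomy ID, length, LCA mapping) and a caller-supplied set of viral TaxIDs. The function name `viral_headers` and the `viral_taxids` set are hypothetical; how VirKraken itself decides that a TaxID is viral is not described here.

```python
def viral_headers(kraken_lines, viral_taxids):
    """Yield (header, taxid) for classified sequences whose TaxID is in viral_taxids.

    kraken_lines: iterable of lines from a Kraken 2 output file (assumed default format).
    viral_taxids: set of TaxID strings treated as viral (hypothetical input).
    """
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed lines and unclassified sequences (status "U").
        if len(fields) < 3 or fields[0] != "C":
            continue
        header, taxid = fields[1], fields[2]
        if taxid in viral_taxids:
            yield header, taxid


# Illustrative input: 10239 is the NCBI TaxID for Viruses, 562 for Escherichia coli.
sample = [
    "C\tcontig_1\t10239\t1500\t10239:40\n",
    "C\tcontig_2\t562\t2300\t562:60\n",
    "U\tcontig_3\t0\t900\t0:20\n",
]
rows = list(viral_headers(sample, {"10239"}))
print(rows)
```

In practice the set of viral TaxIDs would need to cover every taxon under Viruses, not just the root ID used in this toy input.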
VirKraken to filter a sequence file
virkraken -f Kraken_Output.txt -c final.contigs.fa -o Viral_Sequences
The command above will return Viral_Sequences.csv and Viral_Sequences.fasta. All returned sequences are viral. VirKraken filters the input FASTA for sequence headers matching the predicted viral headers.
VirKraken to remove viral sequences
virkraken -r -f Kraken_Output.txt -c final.contigs.fa -o Filtered_Sequences
Sequences whose headers are assigned a viral designation are removed from the resulting FASTA output file.
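The retain and remove modes described above can be sketched with one small function. This is an illustration under stated assumptions rather than VirKraken's own code: it uses a plain-Python FASTA reader instead of Biopython, matches on the sequence ID (the header text up to the first whitespace), and the names `read_fasta`, `filter_fasta`, and `remove` are hypothetical.

```python
def read_fasta(lines):
    """Yield (sequence_id, sequence) pairs from FASTA-formatted lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            parts = line[1:].split()
            # Keep only the ID; any description after the first space is dropped.
            seq_id, chunks = (parts[0] if parts else ""), []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def filter_fasta(lines, viral_ids, remove=False):
    """Keep viral records by default; with remove=True keep the non-viral ones."""
    for seq_id, sequence in read_fasta(lines):
        if (seq_id in viral_ids) != remove:
            yield seq_id, sequence


fasta = [">contig_1 len=8\n", "ACGTACGT\n", ">contig_2\n", "GGCC\n", "TTAA\n"]
viral_only = list(filter_fasta(fasta, {"contig_1"}))
viral_removed = list(filter_fasta(fasta, {"contig_1"}, remove=True))
print(viral_only)
print(viral_removed)
```

Using a single flag to invert the membership test keeps the two modes symmetric, so every input record ends up in exactly one of the two outputs.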
Roadmap
Current Version: 0.0.5
Improvements to be made:
- Fix .gz contig fasta outfile
- Allow paired .fastq input
- Integrate into Kraken 2 codebase
See the open issues for a list of proposed features (and known issues).
Contributing
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
License
Distributed under the MIT License. See LICENSE for more information.
Contact
Cody Glickman - @glickman_Cody - glickman.cody@gmail.com
Project Link: https://github.com/Strong-Lab/VirKraken
Acknowledgements
- Jo Hendrix