kopilot is a suite of scripts meant to make output from the KOBAS pathway annotation pipeline human readable and easily imported into statistical software, such as R. It also allows for quick subsetting of reads.
License
chevrm/kopilot
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
kopilot v0.1 released 2014-02-09 Copyright 2014 by Marc Chevrette under the GNU GPLv2 Direct questions and bug reports to: chevrm at gmail dot com README last updated 2014-02-09 NOTE: THIS SOFTWARE HAS BEEN RELEASED WITH THE HOPE THAT ALL WHO IMPLIMENT IT INTO WORKFLOWS THAT LEAD TO PUBLICATIONS, CITE APPROPRIATELY. Collaborations requesting custom implementations of kopilot are welcome; contact via email above. Manuscript in preparation. Please contact via above email to use kopilot in publications. Summary: kopilot is a suite of scripts meant to make output from the KOBAS pathway annotation pipeline human readable and easily imported into statistical software, such as R. It also allows for quick sub- setting of reads. Installation: - Download master.zip from github.com/chevrm/kopilot - Unpack and run ./install.sh Dependencies (installed by install-mm if necessary) - bioperl Programs and databases mentioned that are not directly a part of kopilot - ncbi-blast - KOBAS - KEGG - R Optional: Add kopilot dir to your path and .bashrc. OR Make a symbolic link in your bin (or another dir in $PATH) > ln -s /full/path/to/kopilot-program ~/bin/kopilot-program What is kopilot? KOBAS is an annotation webserver that can be used to elucidate biochemical pathways in NGS sequence data using KEGG orthology (KO). kopilot can be used in conjuciton with KOBAS to simplify workflow and prepare data for statistical analysis (in R, for example). It is a suite of scripts to allow for standardized analysis and pipeline creation. Currently, it is not a one-command pipeline. Refer to the example workflow below. What isn't kopilot? kopilot is not a replacement for KOBAS, or a pipeline that uses KOBAS stand-alone (...yet; Perhaps in v1.0 blast and KOBAS stand- alone will be incorporated to make kopilot all inclusive). As of v0.1, BLAST and KOBAS still need to be processed outside of kopilot. Refer to the example workflow below. External resources: - KEGG pathways http://www.genome.jp/kegg-bin/get_htext?ko00001.keg - KOBAS annotate webserver http://kobas.cbi.pku.edu.cn/program.inputForm.do?program=Annotate - KOBAS protein fastas for making blast database(s) http://kobas.cbi.pku.edu.cn/site/download_fasta.jsp Example workflow from raw fastq: > fq2fa example.fastq > example.fna > fanameshorten example.fna r > blastx -query example.renamed.fa -db ko.pep.db -out bx.ko.example.fa -evalue 1e-5 -num_threads 8 -culling_limit 10 -outfmt 6 # Upload bx.ko.example.fa to the KOBAS annotate webserver http://kobas.cbi.pku.edu.cn/program.inputForm.do?program=Annotate # Download KOBAS results, here example.annotate > kocount example.annotate > example.kc.tab > kopilot r example.annotate > example.kp.tab # load example.kc.tab or example.kp.tab into R Suite: - fq2fa Simple fastq to fasta converter. usage: fq2fa input.fastq > new.fasta - fanameshorten Renames fasta to indexed reads in order to keep file sizes under the limit of the KOBAS webserver. usage: fanameshorten example.fna prefix - kombiner For the combination of multiple KOABS annotate files into one. This is a useful preprocessing step for other kopilot scripts if, for example, you have split up your KO blast queries into multiple files. usage: kombiner 1.annotate 2.annotate 3.annotate > combo.annotate - kocount Produces tab table of pure KO-hit counts from KOBAS annotate. Can take many annotate files. usage: kocount readprefix 1.annotate 2.annotate > out.kc.tab - kopilot Produces tab table of categorized KO-hits from KOBAS annotate. usage: kopilot readprefix kobas.annotate > out.kp.tab Test: test.annotate is included in ./test/ to allow for testing of post-KOBAS scripts. Citations: Kanehisa, M., & Goto, S. (2000). KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Research, 28(1), 27–30. Xie, C., Mao, X., Huang, J., Ding, Y., Wu, J., Dong, S., … Wei, L. (2011). KOBAS 2.0: a web server for annotation and identification of enriched pathways and diseases. Nucleic Acids Research, 39(Web Server issue), W316–22.
About
kopilot is a suite of scripts meant to make output from the KOBAS pathway annotation pipeline human readable and easily imported into statistical software, such as R. It also allows for quick subsetting of reads.
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published