README

kopilot v0.1 released 2014-02-09
Copyright 2014 by Marc Chevrette under the GNU GPLv2
Direct questions and bug reports to:
	chevrm at gmail dot com

README last updated 2014-02-09

NOTE:   THIS SOFTWARE IS RELEASED IN THE HOPE THAT ALL WHO INCORPORATE IT
        INTO WORKFLOWS THAT LEAD TO PUBLICATIONS WILL CITE IT APPROPRIATELY.

        Collaborations requesting custom implementations of kopilot are
        welcome; contact the author at the email above.

        Manuscript in preparation.  Please contact the author at the email
        above before using kopilot in publications.


Summary:
	kopilot is a suite of scripts that makes output from the KOBAS
	pathway annotation pipeline human-readable and easy to import into
	statistical software, such as R.  It also allows for quick
	subsetting of reads.

Installation:
	- Download master.zip from github.com/chevrm/kopilot
	- Unpack and run ./install.sh 

Dependencies (installed by install.sh if necessary):
	- bioperl

Programs and databases mentioned that are not directly a part of kopilot:
	- ncbi-blast
	- KOBAS
	- KEGG
	- R

Optional:
        Add the kopilot directory to your PATH in .bashrc.
                OR
        Make a symbolic link in ~/bin (or another directory in $PATH):
        > ln -s /full/path/to/kopilot-program ~/bin/kopilot-program
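
        A minimal sketch of the first option, assuming kopilot was unpacked
        to $HOME/kopilot (a hypothetical location; substitute your own):

```shell
# Assumption: kopilot was unpacked to $HOME/kopilot; adjust as needed.
KOPILOT_DIR="$HOME/kopilot"

# Make the kopilot scripts visible in the current shell session.
export PATH="$PATH:$KOPILOT_DIR"

# To persist across sessions, add the same export line to ~/.bashrc:
#   echo 'export PATH="$PATH:$HOME/kopilot"' >> ~/.bashrc
```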

What is kopilot?
	KOBAS is an annotation webserver that can be used to elucidate
	biochemical pathways in NGS sequence data using KEGG orthology (KO).
	kopilot can be used in conjunction with KOBAS to simplify the
	workflow and prepare data for statistical analysis (in R, for
	example).  It is a suite of scripts that allows for standardized
	analysis and pipeline creation.  Currently, it is not a one-command
	pipeline.  Refer to the example workflow below.

What isn't kopilot?
	kopilot is not a replacement for KOBAS, nor is it a pipeline that
	runs KOBAS standalone (yet; perhaps in v1.0 BLAST and KOBAS
	standalone will be incorporated to make kopilot all-inclusive).  As
	of v0.1, BLAST and KOBAS still need to be run outside of kopilot.
	Refer to the example workflow below.

External resources:
	- KEGG pathways
	  http://www.genome.jp/kegg-bin/get_htext?ko00001.keg

	- KOBAS annotate webserver
	  http://kobas.cbi.pku.edu.cn/program.inputForm.do?program=Annotate

	- KOBAS protein fastas for making blast database(s)
	  http://kobas.cbi.pku.edu.cn/site/download_fasta.jsp
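
	As a sketch of how the downloaded protein fastas might be turned
	into the database used in the example workflow (ko.pep.db): this
	assumes ncbi-blast+ is installed, and both the helper name
	build_ko_db and the input file name ko.pep.fasta are hypothetical.

```shell
# Hypothetical helper, not part of kopilot: build a protein BLAST database.
# $1 = protein FASTA downloaded from KOBAS, $2 = name of the database to create
build_ko_db() {
    makeblastdb -in "$1" -dbtype prot -out "$2"
}

# Example call (the input file name is an assumption):
#   build_ko_db ko.pep.fasta ko.pep.db
```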

Example workflow from raw fastq:
	> fq2fa example.fastq > example.fna
	> fanameshorten example.fna r
	> blastx -query example.renamed.fa -db ko.pep.db \
	  -out bx.ko.example.fa -evalue 1e-5 -num_threads 8 \
	  -culling_limit 10 -outfmt 6
	# Upload bx.ko.example.fa to the KOBAS annotate webserver
	  http://kobas.cbi.pku.edu.cn/program.inputForm.do?program=Annotate
	# Download KOBAS results, here example.annotate
	> kocount r example.annotate > example.kc.tab
	> kopilot r example.annotate > example.kp.tab
	# load example.kc.tab or example.kp.tab into R
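
	For orientation, the FASTQ-to-FASTA conversion in the first step
	can be approximated with awk.  This is a sketch, not the fq2fa
	implementation; it assumes strict four-line FASTQ records, and the
	function name fastq_to_fasta is hypothetical.

```shell
# Hypothetical stand-in for the conversion step, not the fq2fa source.
# Assumes every FASTQ record is exactly four lines:
#   @header / sequence / + / qualities
fastq_to_fasta() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }   # header: swap "@" for ">"
         NR % 4 == 2 { print }                     # sequence line as-is
        ' "$1"
}

# Example call:
#   fastq_to_fasta example.fastq > example.fna
```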

Suite:
	- fq2fa
		Simple fastq to fasta converter.
		usage:
		fq2fa input.fastq > new.fasta
	
	- fanameshorten
		Renames fasta reads to short, indexed names in order to keep
		file sizes under the limit of the KOBAS webserver.
		usage:
		fanameshorten example.fna prefix

	- kombiner
		Combines multiple KOBAS annotate files into one.  This is a
		useful preprocessing step for other kopilot scripts if, for
		example, you have split up your KO blast queries into
		multiple files.
		usage:  
		kombiner 1.annotate 2.annotate 3.annotate > combo.annotate

	- kocount
		Produces a tab-delimited table of pure KO-hit counts from
		KOBAS annotate.  Can take many annotate files.
		usage:
		kocount readprefix 1.annotate 2.annotate > out.kc.tab

	- kopilot
		Produces a tab-delimited table of categorized KO-hits from
		KOBAS annotate.
		usage:
		kopilot readprefix kobas.annotate > out.kp.tab
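
	To illustrate what the renaming done by fanameshorten amounts to,
	the sketch below replaces each fasta header with a prefix plus a
	running index.  The exact naming scheme kopilot uses is not
	documented here, so the prefix-then-number format and the function
	name rename_fasta_reads are assumptions.

```shell
# Hypothetical illustration, not the fanameshorten source.
# $1 = input FASTA, $2 = prefix for the new read names
# Assumed naming scheme: prefix followed by a 1-based index (r1, r2, ...).
rename_fasta_reads() {
    awk -v prefix="$2" '
        /^>/ { n++; print ">" prefix n; next }   # replace each header line
             { print }                           # keep sequence lines as-is
    ' "$1"
}

# Example call:
#   rename_fasta_reads example.fna r > example.short.fna
```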

Test:
	test.annotate is included in ./test/ to allow for testing of
	post-KOBAS scripts.

Citations:
	Kanehisa, M., & Goto, S. (2000). KEGG: Kyoto encyclopedia of
	genes and genomes. Nucleic Acids Research, 28(1), 27–30.

	Xie, C., Mao, X., Huang, J., Ding, Y., Wu, J., Dong, S., … Wei, L. 
	(2011). KOBAS 2.0: a web server for annotation and identification 
	of enriched pathways and diseases. Nucleic Acids Research, 
	39(Web Server issue), W316–22.