kopilot v0.1 released 2014-02-09
Copyright 2014 by Marc Chevrette under the GNU GPLv2
Direct questions and bug reports to:
chevrm at gmail dot com
README last updated 2014-02-09
NOTE: THIS SOFTWARE HAS BEEN RELEASED IN THE HOPE THAT ALL WHO IMPLEMENT
IT IN WORKFLOWS THAT LEAD TO PUBLICATIONS WILL CITE IT APPROPRIATELY.
Collaborations requesting custom implementations of kopilot are
welcome; contact via email above.
Manuscript in preparation. Please contact via above email to use
kopilot in publications.
Summary:
kopilot is a suite of scripts that makes output from the KOBAS
pathway annotation pipeline human-readable and easy to import into
statistical software such as R. It also allows for quick subsetting
of reads.
Installation:
- Download master.zip from github.com/chevrm/kopilot
- Unpack and run ./install.sh
Dependencies (installed by install.sh if necessary)
- bioperl
Programs and databases mentioned that are not directly a part of kopilot
- ncbi-blast
- KOBAS
- KEGG
- R
Optional:
Add the kopilot directory to your $PATH (for example, in your .bashrc)
OR
Make a symbolic link in your bin (or another directory in $PATH)
> ln -s /full/path/to/kopilot-program ~/bin/kopilot-program
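For the first option, a minimal sketch, assuming kopilot was unpacked to
~/kopilot-master (a hypothetical location; substitute your own):

```shell
# Assumption: kopilot was unpacked here; adjust to your actual location.
KOPILOT_DIR="$HOME/kopilot-master"

# Make the scripts available in the current shell session.
export PATH="$PATH:$KOPILOT_DIR"

# To make this permanent, add the same export line to ~/.bashrc, e.g.:
#   echo 'export PATH="$PATH:$HOME/kopilot-master"' >> ~/.bashrc
```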
What is kopilot?
KOBAS is an annotation webserver that can be used to elucidate
biochemical pathways in NGS sequence data using KEGG orthology (KO).
kopilot can be used in conjunction with KOBAS to simplify workflow
and prepare data for statistical analysis (in R, for example).
It is a suite of scripts to allow for standardized analysis and
pipeline creation. Currently, it is not a one-command pipeline.
Refer to the example workflow below.
What isn't kopilot?
kopilot is not a replacement for KOBAS, nor is it a pipeline that
runs KOBAS stand-alone (yet; perhaps in v1.0, BLAST and KOBAS
stand-alone will be incorporated to make kopilot all-inclusive). As
of v0.1, BLAST and KOBAS still need to be run outside of kopilot.
Refer to the example workflow below.
External resources:
- KEGG pathways
http://www.genome.jp/kegg-bin/get_htext?ko00001.keg
- KOBAS annotate webserver
http://kobas.cbi.pku.edu.cn/program.inputForm.do?program=Annotate
- KOBAS protein fastas for making blast database(s)
http://kobas.cbi.pku.edu.cn/site/download_fasta.jsp
Example workflow from raw fastq:
> fq2fa example.fastq > example.fna
> fanameshorten example.fna r
> blastx -query example.renamed.fa -db ko.pep.db \
    -out bx.ko.example.fa -evalue 1e-5 -num_threads 8 \
    -culling_limit 10 -outfmt 6
# Upload bx.ko.example.fa to the KOBAS annotate webserver:
#   http://kobas.cbi.pku.edu.cn/program.inputForm.do?program=Annotate
# Download KOBAS results, here example.annotate
> kocount r example.annotate > example.kc.tab
> kopilot r example.annotate > example.kp.tab
# load example.kc.tab or example.kp.tab into R
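Since kopilot is not yet a one-command pipeline, the local steps on either
side of the KOBAS upload could be wrapped in two small shell functions. This
is only a sketch: pre_kobas and post_kobas are hypothetical names, the file
names follow the example above, the read prefix r is passed to kocount as in
its usage line under Suite, and the kopilot scripts, blastx and the ko.pep.db
database are assumed to be available.

```shell
# Sketch only: wraps the commands from the example workflow.
# Assumes fq2fa, fanameshorten, kocount, kopilot and blastx are on $PATH
# and that a BLAST database named ko.pep.db exists in the working directory.

# Steps before the KOBAS upload. $1 = sample name, e.g. "example".
pre_kobas() {
    sample="$1"
    fq2fa "$sample.fastq" > "$sample.fna" &&
    fanameshorten "$sample.fna" r &&
    blastx -query "$sample.renamed.fa" -db ko.pep.db \
        -out "bx.ko.$sample.fa" -evalue 1e-5 -num_threads 8 \
        -culling_limit 10 -outfmt 6
}

# Steps after downloading the KOBAS results as $1.annotate.
post_kobas() {
    sample="$1"
    kocount r "$sample.annotate" > "$sample.kc.tab" &&
    kopilot r "$sample.annotate" > "$sample.kp.tab"
}
```

Usage would then be "pre_kobas example", followed by the manual upload and
download through the KOBAS webserver, followed by "post_kobas example".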
Suite:
- fq2fa
Simple fastq to fasta converter.
usage:
fq2fa input.fastq > new.fasta
- fanameshorten
Renames the reads in a fasta file to short, indexed names in
order to keep file sizes under the limit of the KOBAS webserver.
usage:
fanameshorten example.fna prefix
- kombiner
Combines multiple KOBAS annotate files into one. This is a
useful preprocessing step for other kopilot scripts if, for
example, you have split up your KO blast queries into
multiple files.
usage:
kombiner 1.annotate 2.annotate 3.annotate > combo.annotate
- kocount
Produces a tab-delimited table of pure KO-hit counts from
KOBAS annotate output. Can take many annotate files.
usage:
kocount readprefix 1.annotate 2.annotate > out.kc.tab
- kopilot
Produces a tab-delimited table of categorized KO-hits from
KOBAS annotate output.
usage:
kopilot readprefix kobas.annotate > out.kp.tab
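To illustrate what the two preprocessing scripts (fq2fa and fanameshorten)
do conceptually, here is a rough awk-based sketch. It is NOT the kopilot
source: the function names are hypothetical, the fastq input is assumed to
use strict 4-line records, and the "prefix plus running index" naming is an
assumption about how the indexed read names look.

```shell
# Sketch only: illustrates the idea, not the actual kopilot scripts.

# fastq -> fasta, assuming exactly 4 lines per record.
fq2fa_sketch() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }   # header: replace leading @ with >
         NR % 4 == 2 { print }                      # sequence line, copied as-is
        ' "$1"
}

# Rename fasta headers to <prefix><index>: >r1, >r2, ... (assumed scheme).
shorten_names_sketch() {
    awk -v prefix="$2" '/^>/ { n++; print ">" prefix n; next }
                        { print }' "$1"
}

# Tiny made-up input for demonstration.
workdir=$(mktemp -d)
printf '@HWI-ST:1:1101 1:N:0\nACGT\n+\nIIII\n@HWI-ST:1:1102 1:N:0\nGGCC\n+\nIIII\n' \
    > "$workdir/demo.fastq"

fq2fa_sketch "$workdir/demo.fastq" > "$workdir/demo.fna"
shorten_names_sketch "$workdir/demo.fna" r > "$workdir/demo.renamed.fa"
cat "$workdir/demo.renamed.fa"
```

The long instrument-style headers are replaced by short names, which is what
keeps the file (and the resulting blast output) small.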
Test:
test.annotate is included in ./test/ to allow for testing of
post-KOBAS scripts.
Citations:
Kanehisa, M., & Goto, S. (2000). KEGG: kyoto encyclopedia of
genes and genomes. Nucleic Acids Research, 28(1), 27–30.
Xie, C., Mao, X., Huang, J., Ding, Y., Wu, J., Dong, S., … Wei, L.
(2011). KOBAS 2.0: a web server for annotation and identification
of enriched pathways and diseases. Nucleic Acids Research,
39(Web Server issue), W316–22.