Bioinformatics Toolbox (BTBox)
A multi-purpose collection of bioinformatics tools written in Perl, R, and C.

Contributors: Bryan P. White

White, B.P., Pilgrim E.M., Boykin L.M., Stein E.D., Mazor R.D. 2014. Comparing 
four species delimitation methods applied to a DNA barcode data set of insect 
larvae for use in routine bioassessment. Freshwater Science 33(1), 338-348.

Commonly used commands:

Genbank Downloader:

	Download all COI sequences for Gastropoda from NCBI: -query COI_full -batch-cap 500 -term Gastropoda

	Only download sequences with a voucher ID: -query COI_full -voucher-only 1 -batch-cap 500 
	-term Gastropoda

	Use a list of taxa: -list Gastropoda_list.txt -query COI_full 
	-voucher-only 1 -batch-cap 500

	Include pubmed information such as abstracts: -list Gastropoda_list.txt -query ND2_full 
	-batch-cap 500 -pubmed 1 -outp OutputFile
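
For orientation only: a batched download like the ones above maps naturally onto NCBI's E-utilities "esearch" endpoint, which pages results with the retstart and retmax parameters. The sketch below only builds the request URLs and does not contact NCBI. It is not BTBox's own code. The function name esearch_urls, the idea that -batch-cap corresponds to retmax, and the example search term are all assumptions made for illustration.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_urls(term, total, batch_cap=500, db="nuccore"):
    """Yield one esearch URL per batch of at most batch_cap records.

    Hypothetical helper: 'total' is the number of hits you expect to page
    through, and 'batch_cap' plays the role assumed for -batch-cap.
    """
    for start in range(0, total, batch_cap):
        params = {
            "db": db,
            "term": term,
            "retstart": start,                        # offset of first record
            "retmax": min(batch_cap, total - start),  # records in this batch
        }
        yield ESEARCH + "?" + urlencode(params)


# Assumed search term; the real expansion of "-query COI_full" is not
# documented in this README.
urls = list(esearch_urls("Gastropoda[Organism] AND COI[Gene]", total=1200))
for url in urls:
    print(url)
```

With 1200 expected hits and a cap of 500, this yields three URLs covering offsets 0, 500, and 1000.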

Process genbank files:
*Requires MAFFT to be in the path already

	Automatically quality check sequences downloaded using the Genbank Downloader: -gb Gastropoda_COI.csv -out Gastropoda_COI_qc 
	-match COI_Match.fas -otu-cutoff 0.01

	Use a specific match sequence and 3 threads during MAFFT alignment: -gb Gastropoda_COI.csv -out Gastropoda_COI_qc 
	-match COI_Gastropoda_Match.fas -otu-cutoff 0.01 -threads 3

Clustering/OTU Delimitation:

	Delimit clusters at a 2% genetic distance cutoff (Kimura 2-parameter): -aln1 sample_baetis_seqs.fas -cutoff 0.02
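
As background for the cutoff above, the Kimura 2-parameter (K2P) distance between two aligned sequences is d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)), where P is the proportion of transitions and Q the proportion of transversions. The following is a minimal sketch of that formula, not the toolbox's implementation. The function name and the choice to skip gaps and ambiguous bases are assumptions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: ignore gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the K2P correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


d = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")  # one transition in ten sites
print(d, d <= 0.02)  # compare against a 2% cutoff
```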

	Skip calculating intra-OTU pairwise distances: -shortcut-freq 0.05 -skip-intra-dist 1 
	-aln1 sample_baetis_seqs.fas

	Just count OTUs, skip intra-OTU distances, and randomly splice the alignment: -aln1 sample_baetis_seqs.fas -shortcut-freq 0.05 
	-ran-splice 1 -skip-intra-dist 1 -pseudo-reps 1

	Various bootstrapping methods (one command per line):
	-shortcut-freq 0.05 -ran-splice 1 -skip-intra-dist 1 -pseudo-reps 1 -aln1 sample_baetis_seqs.fas
	-shortcut-freq 0.05 -skip-intra-dist 1 -bootstrap 1 -bootstrap-size 500 -pseudo-reps 10 -ran-splice 1 -aln1 sample_baetis_seqs.fas
	-shortcut-freq 0.05 -skip-intra-dist 1 -bootstrap 1 -bootstrap-size 500 -pseudo-reps 100 -ran-splice 1 -aln1 sample_baetis_seqs.fas
	-shortcut-freq 0.05 -skip-intra-dist 1 -bootstrap 1 -bootstrap-size 500 -pseudo-reps 2000 -specific-splice 1:50 -aln1 sample_baetis_seqs.fas

	Pseudo-repping correspondence: -shortcut-freq 0.05 -skip-intra-dist 1 -bootstrap 1
	 -bootstrap-size 500 -pseudo-reps 100 -skip-nn 1 -aln1 sample_baetis_seqs.fas

	Pseudo-repping splicing: -shortcut-freq 0.05 -skip-intra-dist 1 -bootstrap 1 
	-bootstrap-size 500 -pseudo-reps 1000 -skip-nn 1 -min-aln-length 25 
	-ran-splice 1 -aln1 sample_baetis_seqs.fas

	Specific splice for primer: -shortcut-freq 0.05 -skip-intra-dist 1 -bootstrap 1 
	-bootstrap-size 500 -pseudo-reps 1000 -skip-nn 1 -min-aln-length 25 
	-specific-splice 1:135 -aln1 sample_baetis_seqs.fas
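
To illustrate what splicing an alignment means here, the sketch below cuts the same column window out of every sequence, either at a fixed range or at a random position. It is an illustration under assumptions, not BTBox's code: the function names are hypothetical, and reading "1:135" as a 1-based, inclusive start:end column range is a guess about how -specific-splice is interpreted.

```python
import random


def specific_splice(seqs, spec):
    """Keep columns start..end (assumed 1-based, inclusive) of each sequence."""
    start, end = (int(x) for x in spec.split(":"))
    if start < 1 or end < start:
        raise ValueError("invalid splice range: " + spec)
    return {name: seq[start - 1:end] for name, seq in seqs.items()}


def random_splice(seqs, size, rng=None):
    """Keep one randomly placed window of 'size' columns, shared by all sequences."""
    rng = rng or random.Random()
    length = min(len(seq) for seq in seqs.values())
    if size > length:
        raise ValueError("window is longer than the alignment")
    start = rng.randrange(0, length - size + 1)
    return {name: seq[start:start + size] for name, seq in seqs.items()}


# Toy alignment; real input would be read from a FASTA file.
alignment = {"seq1": "ACGTACGTAC", "seq2": "ACGTTCGTAC"}
print(specific_splice(alignment, "1:4"))
print(random_splice(alignment, 5, random.Random(1)))
```

Using one window for all sequences keeps the columns aligned, so distances computed on the spliced alignment remain comparable between sequences.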

	Printing short-read simulation (short splice), one command per line:
	-shortcut-freq 0.05 -skip-intra-dist 1 -skip-nn 1 -min-aln-length 654 -aln1 sample_baetis_seqs.fas -print-spliced-aln 1 -spliced-aln-size 400 -print-ref-seq 0
	-shortcut-freq 0.05 -skip-intra-dist 1 -skip-nn 1 -aln1 sample_baetis_seqs.fas -print-spliced-aln 1 -spliced-aln-size 135 -print-ref-seq 1

454 Pipeline:
*Requires NCBI Standalone BLAST+ to be in the path already

	Parse raw 454 output: -aln 454_output -out 454_output.fas -clean 1

	Create a standalone BLAST database to check sequences against:
	makeblastdb -in metazoan_db.fas -dbtype nucl

	BLAST unknown sequences against a BLAST database: -aln 454_output.fas -out 454_output_labeled.fas 
	-db metazoan_db.fas -max_target_seqs 10

	Summarize results: -aln TD90_1_test1.fas -out TD90_1_test1.csv