Steve Bond edited this page Jan 26, 2017 · 8 revisions

--blast, -bl

Description

BLAST is a local alignment algorithm commonly used to search large collections of sequences for likely homologs of a query sequence. The SeqBuddy blast tool can search a pre-existing BLAST database or a set of subject sequences in any supported format, and it returns each matching sequence in its entirety. This is a departure from normal BLAST output, which generally returns only local alignment fragments. For reference, the actual BLAST statistics are streamed to standard error so you can judge how significant and extensive each match is.
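
To illustrate the difference between local alignment fragments and whole-sequence output, here is a minimal Python sketch. It is not SeqBuddy's implementation: the function name `full_length_matches`, the toy hit rows, and the toy subject sequences are all made up for illustration. It only assumes that the second tab-separated column of each hit row holds the subject ID.

```python
def full_length_matches(hit_rows, subject_records):
    """Return the complete subject sequence for every subject ID in the hit table.

    hit_rows: iterable of tab-separated hit lines (subject ID in column 2).
    subject_records: dict mapping subject ID -> full-length sequence.
    """
    matches = {}
    for row in hit_rows:
        subject_id = row.split("\t")[1]
        # Several local hits may point at the same subject; keep it once.
        if subject_id not in matches:
            matches[subject_id] = subject_records[subject_id]
    return matches


# Toy data (hypothetical): two local hits against the same subject sequence.
hits = [
    "query1\tsubjA\t50.0\t10\t5\t0\t1\t10\t3\t12\t1e-5\t40",
    "query2\tsubjA\t45.0\t8\t4\t0\t2\t9\t5\t12\t1e-3\t30",
]
subjects = {"subjA": "MKVLAAGIVGLLLAQ", "subjB": "MSTNPKPQRK"}

result = full_length_matches(hits, subjects)
print(result)
```

Each hit covers only part of `subjA`, but the returned record is the whole sequence, and `subjB` is omitted because nothing matched it.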

Dependencies

The blastn, blastp, makeblastdb, and blastdbcmd binaries from the [NCBI C++ toolkit](http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/) must be present in your system PATH.
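
If you are unsure whether the binaries are visible, a quick check along these lines may help. This is a rough sketch using Python's standard `shutil.which`; the helper name `missing_binaries` is an assumption made for this example and is not part of SeqBuddy.

```python
import shutil

# The four programs named above.
REQUIRED_BINARIES = ["blastn", "blastp", "makeblastdb", "blastdbcmd"]


def missing_binaries(names=REQUIRED_BINARIES):
    """Return the names that cannot be found in the current PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = missing_binaries()
if missing:
    print("Not found in PATH: " + ", ".join(missing))
else:
    print("All BLAST binaries found")
```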

If searching pre-made BLAST databases, the -parse_seqids option must have been used when calling the makeblastdb program, for example:

$: makeblastdb -in path/to/fasta_file -out db_name -dbtype {nucl, prot} -parse_seqids 

Arguments

Path to BLAST database ( str )

BLAST databases consist of several separate files that share a base name; provide a relative or absolute path to any one of these files, or to the shared base name.
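
The idea that any component file identifies the same database can be sketched as follows. This is an illustration only, not SeqBuddy's own logic: the helper `database_base_name` is hypothetical, and it assumes that component files end in a three-letter extension beginning with "p" (protein) or "n" (nucleotide), such as .phr, .pin, .psq, or .nhr. A base name that itself ends in such a pattern would be trimmed incorrectly.

```python
import re

# Assumption: database component files end in ".p??" or ".n??".
DB_EXTENSION = re.compile(r"\.[pn][a-z]{2}$")


def database_base_name(path):
    """Strip a BLAST database file extension, if present, leaving the base name."""
    return DB_EXTENSION.sub("", path)


print(database_base_name("/path/to/blastdb/Abacion_magnum.psq"))
print(database_base_name("/path/to/blastdb/Abacion_magnum"))
```

Both calls print the same base name, which is the form the BLAST programs expect for their `-db` parameter.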

BLAST arguments ( str )

SeqBuddy calls blastn/p with the following parameters:

$: blastn -db database -query in_file.fa -out temp.txt -num_threads 4 -evalue 0.01 -outfmt 6

The BLAST programs have a large number of optional parameters, however, and these can be injected into the command used by SeqBuddy. The only parameters that you cannot change are "db", "query", "subject", "out", and "outfmt"; pass everything else as a single argument enclosed in double quotes (see example 3).
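
One way to picture the restriction on injected parameters is the sketch below. It is an assumption-laden illustration, not SeqBuddy's code: the function `check_blast_args` is hypothetical, and raising an error is just one possible response (the tool may handle protected parameters differently).

```python
import shlex

# The five parameters listed above as not changeable.
PROTECTED = {"-db", "-query", "-subject", "-out", "-outfmt"}


def check_blast_args(arg_string):
    """Split a quoted argument string and reject protected parameters."""
    tokens = shlex.split(arg_string)
    blocked = [tok for tok in tokens if tok in PROTECTED]
    if blocked:
        raise ValueError("Protected parameter(s): " + ", ".join(blocked))
    return tokens


print(check_blast_args("-max_target_seqs 1 -gapopen 10"))
```

Passing the whole string as one quoted argument keeps the shell from splitting it, so the tool receives the parameters together and can inspect them before building the final command.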

Examples

In the following examples, an assembled transcriptome from the millipede species Abacion magnum is being searched for pannexin sequences, using the known Drosophila sequences as a query.

Input file: Drosophila.nex

#NEXUS
begin data;
    dimensions ntax=8 nchar=316;
    format datatype=protein missing=? gap=-;
matrix
'Dme-Panxδ3' -----GFI---K----IDNMVFRCHYRITAILFTC-CIIVTANNLIGDPISCI--IPMHVINTFCWITYTYTV---A--GPGLE-K--HSYYQWVPFVLFFQGLMFYVPHWVWKM-D-GKIRMITG--VDDRDRIL-KYFVNNT--HNGYSFYFFCELLNFINVIVNIFMVDKFLGGAFMSYGTDVLKFSNMDQ-DRFDPMIEIFPRLTKCTFHKFGPSGSVQKHDTLCVLALNILNEKIYIFLWFWFIILATISGVAVLYSVVI---TR-TIR----------K--EGDFLILHFLSQNLSTRSYSDML-Q----
'Dme-Panxδ7' --L--SV----R-Q-RIDNIVFKLHYRWTVILLVA-TLLITSRQYIGEHIQCL--VVSPVINTFCFFTPTF-VD--P---PGI--D-RHAYYQWVPFVLFFQALCFYIPHALWKW-EGGRIKALVK--LG-MERVKD---IRDM--RLNWG-HVFAEVLNLINLLLQITWTNRFLGGQFLTLG------HALKN-RSDEVV---FPKITKCKFHKFGDSGSIQMHDALCVMALNIMNEKIYIILWFWYAFLLIVTVLGLLWRLCF---VR-WSL----------P-LASNWMFLFFLRSNLS-----E-L----DN
'Dme-Panxδ2' MDVFGSVKGLLKIDQV-DNNVFRMHYKATVIILIAFSLLVTSRQYIGDPIDCIVEIPLGVMDTYCWIYSTFTVPEGRDVQP--GSEKYHKYYQWVCFVLFFQAILFYVPRYLWKSWEGGRLKMLVDLSVNDKDRKIVDYFG-NLNRHNFYAFFFVCEALNFVNVIGQIYFVDFFLDGEFSTYGSDVLKFTELEPDERIDPMARVFPKVTKCTFHKYGPSGSVQTHDGLCVLPLNIVNEKIYVFLWFWFIILSIMSI-SLIYRIAVAPKLRHLLLRARSRAESEVEVAIGDWFLLYQLGKNIDPLIYKEVISDLEMG
'Dme-Panxδ5' MSAVKPLSKYLQFKIRIYDSVFTIHSRCTVVILLTCSLLLSARQYFGDPIQCI-S-EEKNIESYCWTMGTYYNEASIAE--GVEIRQYLRYYQWVIILLLFQSFVFYFPSCLWKVWEGRRLKQLCEVDNTRRM--LVKYFDMHFC----YMAYVFCEVLNFLISVVNIIVLEVFLNGFWSKYLRALW-------DRWV-SV---FPKIAKCELKF-GGSGTANVMDNLCILPLNILNEKIFVFLWAWFL-LALMSGLNLLCRLAICSRLREQMIRTKRHVKRALDLTIGDWFLMMKVSVNVNPMLFRDLMQEL---
'Dme-Panxδ6' MAAVKPLSNYLRLKVRIYDPIFTLHSKCTIVILLTCTFLLSAKQYFGEPILCL-S-SERQADSYCWTMGTYWNEQSIAE--GVETRMYLRYYQWVFMILLFQSLLFYFPSFLWKVWEGQRMEQLCEVDRTRQM--LTRYFPIHWC----YSIYAFCELLNVFISILNFWLMDVVFNGFWYKYIHALW-------NLWM-RV---FPKVAKCEFVY-GPSGTPNIMDILCVLPLNILNEKIFAVLYVWFL-FALLAIMNILYRLLICCPLRLQLLNPKSHVREVLSAGYGDWFVLMCVSINVNPTLFRELLEQL--D
'Dme-Panxδ4' MAAVKPLSKYLQFKVHIYDAIFTLHSKVTVALLLACTFLLSSKQYFGDPIQCF-G-D-KDMDAFCWIYGAYL-QCAVSK--VVEN--YITYYQWVVLVLLLESFVFYMPAFLWKIWEGGRLKHLCDFKRTHRV--LVNYFETHFR----YFVYVFCEILNLSISILNFLLLDVFFGGFWGRYRNALY-------NQWI-AV---FPKCAKCEYKG-GPSGSSNIYDYLCLLPLNILNEKIFAFLWIWFI-LAMLISLKFLYRLAVLYPMRLQLLRPKKHLQVALNCSFGDWFVLMRVGNNISPELFRKLLEEL---
'Dme-Panxδ1' YKLLGSLKSYLKWQIQTDNAVFRLHNSFTTVLLLTCSLIITATQYVGQPISCIVGVP-HVVNTFCWIHSTFTMPDRREVHPGVDF-KYYTYYQWVCFVLFFQAMACYTPKFLWNKFEGGLMRMIVGLNITRKRDALLDYLIKHVKRHKLY-AYWACEFLCCINIIVQMYLMNRFFDGEFLSYGTNIMKLSDVPQEQRVDPMVYVFPRVTKCTFHKYGPSGSLQKHDSLCILPLNIVNEKTYVFIWFWFWILLVLLGL--VFRCIIFPKFRPRLLNASNRIPMECRLDIGDWWLIYMLGRNLDPVIYKDVMSEFQVP
'Dme-Panxδ8' LDIFRGLKNLVKVSVKTDSIVFRLHYSITVMILMSFSLIITTRQYVGNPIDCVTDIP-DVLNTYCWIQSTYTLKSLVSVYPGIGNKKHYKYYQWVCFCLFFQAILFYTPRWLWKSWEGGKIHALIDLDISEKKKLLLDYLWENLRYHNWW-AYYVCELLALINVIGQMFLMNRFFDGEFITFGLKVIDYMETDQEDRMDPMIYIFPRMTKCTFFKYGSSGEVEKHDAICILPLNVVNEKIYIFLWFWFILLTFLTLLTLIYRVIIFPRMRVYLFRMRFRVRRDIEIKMGDWFLLYLLGENIDTVIFRDVVQDLRL-
;
end;

Database directory

$: ls path/to/blastdb
>>> Abacion_magnum.pex  Abacion_magnum.phr  Abacion_magnum.pin  Abacion_magnum.pog
>>> Abacion_magnum.psd  Abacion_magnum.psi  Abacion_magnum.psq

Usage example 1

Use a pre-made blast database to search for matches

$: sb Drosophila.nex -bl /path/to/blastdb/Abacion_magnum

Output

Running...
blastp -num_threads 4 -evalue 0.01

# ######################## BLAST results ######################## #
Dme-Panxδ3	4086	38.344	326	129	10	3	256	18	343	3.12e-73	226
Dme-Panxδ3	5440	50.279	179	71	5	74	234	1	179	5.55e-57	180
Dme-Panxδ7	4086	39.273	275	107	9	7	221	20	294	8.45e-56	180
Dme-Panxδ7	5440	45.506	178	65	7	75	221	1	177	6.20e-38	130
Dme-Panxδ2	4086	46.821	346	140	11	2	303	3	348	1.12e-105	310
Dme-Panxδ2	5440	51.055	237	95	6	94	309	1	237	2.66e-78	236
Dme-Panxδ5	4086	32.448	339	168	10	8	285	10	348	1.41e-49	166
Dme-Panxδ5	5440	33.755	237	115	7	94	290	2	236	5.12e-32	116
Dme-Panxδ6	4086	32.378	349	171	12	8	291	10	358	2.48e-48	162
Dme-Panxδ6	5440	35.169	236	115	9	94	291	2	237	1.65e-30	112
Dme-Panxδ4	4086	33.038	339	162	11	8	281	10	348	4.12e-48	162
Dme-Panxδ4	5440	36.596	235	111	10	90	286	2	236	1.85e-33	120
Dme-Panxδ1	5440	50.211	237	96	7	95	309	1	237	4.23e-80	241
Dme-Panxδ1	4086	38.040	347	171	10	1	303	2	348	2.17e-79	243
Dme-Panxδ8	4086	42.486	346	158	9	2	306	3	348	5.37e-99	293
Dme-Panxδ8	5440	51.271	236	95	4	96	311	1	236	6.26e-84	251
# ############################################################### #

Warning: Alignment format detected but sequences are different lengths. Format changed to fasta to accommodate proper printing of records.

>4086 comp4411_c0_seq1|m.4086
MFDVLGSLKSVFLRLKTISVDNSIFKLHYRLTTIILAVFSILVTSKQYLGDPIDCTTSST
TIRAELLDQYCWVSSTYSLPKAFDQKVGRFGHVSHPGIATYHEGDQVIYHQYYQWVCFVL
FLQSMMFYLPHYLWKIWECGRLKALADDIQGPLTSDETKKGKLAAISAYFSTSLFHHNFY
ATRYSICEVLNFANVVGQMFLTNRFLGGTFLTYGTEVIEFSESNQLNRTDPMIKVFPRVT
KCSFFTYGSSGDMQNHDALCVLPVNIINEKIYIVLWFWFIILAVLSGLAIIYRLIVTFSV
RARYLALRSRANSVSRSEIEKIAYNTEFGDWFVLYLLSKNVNSYVFKEVVDVVVKQLDNS
DYVPKEKHGLFKKLPL*
>5440 comp6054_c0_seq1|m.5440
FVLFFQAMLFYIPRFLWKMWEGKRLETIVLGMHVGILTEEEKNNRKKVLLEYLTRHFRRH
TFYAIKYYICELLCLVNVIGQMYLMNKFLGGEFMDYGSRVLEFSEQNQDSRTDPMIYVFP
RMTKCTFHKFGTSGDIQRHDALCVLPLNIVNEKIYIFLWFWFIILATLTALVLCYRILII
AFPKFRPQILHARCRLTPMKTINSVLRNADLGDWFLFYLLGKNMDPCIFREVCIELSKKL
ETAESNNP*
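
The rows in the BLAST results block above are in the tabular `-outfmt 6` format, whose twelve default columns are query ID, subject ID, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value, and bit score. If you capture standard error and want to work with those numbers, a small parser along these lines may be useful. The `Hit` tuple and `parse_hit` function are names invented for this sketch.

```python
from collections import namedtuple

# Default column order of BLAST tabular output (-outfmt 6).
Hit = namedtuple("Hit", [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
])


def parse_hit(line):
    """Convert one tab-separated result row into a typed Hit record."""
    f = line.rstrip("\n").split("\t")
    return Hit(
        f[0], f[1], float(f[2]),
        int(f[3]), int(f[4]), int(f[5]),
        int(f[6]), int(f[7]), int(f[8]), int(f[9]),
        float(f[10]), float(f[11]),
    )


# One row copied from the results above.
hit = parse_hit("Dme-Panxδ2\t4086\t46.821\t346\t140\t11\t2\t303\t3\t348\t1.12e-105\t310")
print(hit.sseqid, hit.pident, hit.evalue)
```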

Usage example 2

Search a plain sequence file for BLAST matches (note that 'Abacion_magnum.fa' contains thousands of sequences, so it is not provided here).

$: sb Drosophila.nex -bl /path/to/Abacion_magnum.fa

Output

Building a new DB with makeblastdb, current time: 01/26/2017 10:43:02
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 7193 sequences in 0.650279 seconds.

Running...
blastp -num_threads 4 -evalue 0.01

# ######################## BLAST results ######################## #
Dme-Panxδ3	4086	38.344	326	129	10	3	256	18	343	3.12e-73	226
Dme-Panxδ3	5440	50.279	179	71	5	74	234	1	179	5.55e-57	180
Dme-Panxδ7	4086	39.273	275	107	9	7	221	20	294	8.45e-56	180
Dme-Panxδ7	5440	45.506	178	65	7	75	221	1	177	6.20e-38	130
Dme-Panxδ2	4086	46.821	346	140	11	2	303	3	348	1.12e-105	310
Dme-Panxδ2	5440	51.055	237	95	6	94	309	1	237	2.66e-78	236
Dme-Panxδ5	4086	32.448	339	168	10	8	285	10	348	1.41e-49	166
Dme-Panxδ5	5440	33.755	237	115	7	94	290	2	236	5.12e-32	116
Dme-Panxδ6	4086	32.378	349	171	12	8	291	10	358	2.48e-48	162
Dme-Panxδ6	5440	35.169	236	115	9	94	291	2	237	1.65e-30	112
Dme-Panxδ4	4086	33.038	339	162	11	8	281	10	348	4.12e-48	162
Dme-Panxδ4	5440	36.596	235	111	10	90	286	2	236	1.85e-33	120
Dme-Panxδ1	5440	50.211	237	96	7	95	309	1	237	4.23e-80	241
Dme-Panxδ1	4086	38.040	347	171	10	1	303	2	348	2.17e-79	243
Dme-Panxδ8	4086	42.486	346	158	9	2	306	3	348	5.37e-99	293
Dme-Panxδ8	5440	51.271	236	95	4	96	311	1	236	6.26e-84	251
# ############################################################### #

Warning: Alignment format detected but sequences are different lengths. Format changed to fasta to accommodate proper printing of records.

>4086 comp4411_c0_seq1|m.4086
MFDVLGSLKSVFLRLKTISVDNSIFKLHYRLTTIILAVFSILVTSKQYLGDPIDCTTSST
TIRAELLDQYCWVSSTYSLPKAFDQKVGRFGHVSHPGIATYHEGDQVIYHQYYQWVCFVL
FLQSMMFYLPHYLWKIWECGRLKALADDIQGPLTSDETKKGKLAAISAYFSTSLFHHNFY
ATRYSICEVLNFANVVGQMFLTNRFLGGTFLTYGTEVIEFSESNQLNRTDPMIKVFPRVT
KCSFFTYGSSGDMQNHDALCVLPVNIINEKIYIVLWFWFIILAVLSGLAIIYRLIVTFSV
RARYLALRSRANSVSRSEIEKIAYNTEFGDWFVLYLLSKNVNSYVFKEVVDVVVKQLDNS
DYVPKEKHGLFKKLPL*
>5440 comp6054_c0_seq1|m.5440
FVLFFQAMLFYIPRFLWKMWEGKRLETIVLGMHVGILTEEEKNNRKKVLLEYLTRHFRRH
TFYAIKYYICELLCLVNVIGQMYLMNKFLGGEFMDYGSRVLEFSEQNQDSRTDPMIYVFP
RMTKCTFHKFGTSGDIQRHDALCVLPLNIVNEKIYIFLWFWFIILATLTALVLCYRILII
AFPKFRPQILHARCRLTPMKTINSVLRNADLGDWFLFYLLGKNMDPCIFREVCIELSKKL
ETAESNNP*

Usage example 3

Inject several additional blast parameters

$: sb Drosophila.nex -bl /path/to/blastdb/Abacion_magnum "-max_target_seqs 1 -gapopen 10"

Output

Running...
blastp -num_threads 4 -evalue 0.01 -max_target_seqs 1 -gapopen 10

# ######################## BLAST results ######################## #
Dme-Panxδ3	4086	38.344	326	129	10	3	256	18	343	2.68e-69	210
Dme-Panxδ7	4086	39.273	275	107	9	7	221	20	294	3.49e-53	168
Dme-Panxδ2	4086	46.821	346	140	11	2	303	3	348	4.11e-99	287
Dme-Panxδ5	4086	33.333	339	165	13	8	285	10	348	5.58e-48	156
Dme-Panxδ6	4086	32.951	349	169	14	8	291	10	358	7.03e-47	153
Dme-Panxδ4	4086	33.923	339	159	14	8	281	10	348	8.81e-47	153
Dme-Panxδ1	4086	38.329	347	170	11	1	303	2	348	4.45e-75	226
Dme-Panxδ8	4086	42.486	346	158	9	2	306	3	348	9.06e-93	271
# ############################################################### #

#NEXUS
begin data;
	dimensions ntax=1 nchar=377;
	format datatype=protein missing=? gap=-;
matrix
4086 MFDVLGSLKSVFLRLKTISVDNSIFKLHYRLTTIILAVFSILVTSKQYLGDPIDCTTSSTTIRAELLDQYCWVSSTYSLPKAFDQKVGRFGHVSHPGIATYHEGDQVIYHQYYQWVCFVLFLQSMMFYLPHYLWKIWECGRLKALADDIQGPLTSDETKKGKLAAISAYFSTSLFHHNFYATRYSICEVLNFANVVGQMFLTNRFLGGTFLTYGTEVIEFSESNQLNRTDPMIKVFPRVTKCSFFTYGSSGDMQNHDALCVLPVNIINEKIYIVLWFWFIILAVLSGLAIIYRLIVTFSVRARYLALRSRANSVSRSEIEKIAYNTEFGDWFVLYLLSKNVNSYVFKEVVDVVVKQLDNSDYVPKEKHGLFKKLPL*
;
end;
