Test case for row of dashes in plain text BLAST
Based on GitHub issue #554 reported by @breviata
peterjc committed Jun 4, 2015
1 parent d03b5e4 commit 8f4bac5
Showing 2 changed files with 217 additions and 0 deletions.
3 changes: 3 additions & 0 deletions Tests/Blast/README.txt
@@ -5,6 +5,9 @@ This directory contains various data files for testing the
BLAST-related code in Biopython. All files are grouped by BLAST
release version, from the most recent first.

BLAST+ 2.2.30
-------------
text_2230_blastp_001 single query, match with full line of dashes

BLAST 2.2.28+
-------------
214 changes: 214 additions & 0 deletions Tests/Blast/text_2230_blastp_001.txt
@@ -0,0 +1,214 @@
BLASTP 2.2.30+


Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A.
Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs", Nucleic Acids Res. 25:3389-3402.


Reference for composition-based statistics: Alejandro A. Schaffer,
L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge, Yuri
I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001),
"Improving the accuracy of PSI-BLAST protein database searches with
composition-based statistics and other refinements", Nucleic Acids
Res. 29:2994-3005.



Database: subject.fasta
1 sequences; 1,853 total letters



Query= TR11080zzzc0_g2_i2_0

Length=1691
Score E
Sequences producing significant alignments: (Bits) Value

gi|308468219|ref|XP_003096353.1| CRE-AMA-1 protein [Caenorhabdi... 1901 0.0


> gi|308468219|ref|XP_003096353.1| CRE-AMA-1 protein [Caenorhabditis
remanei]
Length=1853

Score = 1901 bits (4924), Expect = 0.0, Method: Compositional matrix adjust.
Identities = 975/1817 (54%), Positives = 1275/1817 (70%), Gaps = 149/1817 (8%)

Query 11 VKEVSYLQFGVLSPDEMKEMSVCKIEFPQTYENGVPKEGGLSDPRLGTMDRTQLCRTCNS 70
++ V +QFG+L P+E+K MSV +EFP+ YENG PK GGL DPR G +DR C TC
Sbjct 12 LRTVCRVQFGILGPEEIKRMSVAHVEFPEIYENGKPKLGGLMDPRQGVIDRRGRCMTCAG 71

Query 71 DARECPGHFGHIVLAKPMYHVGFLPFVLKILRCVCFFCSKLLVDSNDPRLKSILAQ--NH 128
+ +CPGHF H+ LAKP++H+GFL LKILRCVCF+C +LL+D ++PR+ IL + +
Sbjct 72 NLTDCPGHFAHLELAKPVFHIGFLTKSLKILRCVCFYCGRLLIDKSNPRVIDILKKTSGN 131

Query 129 HRRRIQSMMNLCRTKKICE-AGDDADDLAEHETEP-EKQRKPHGGCGNFQPNITKDGLRL 186
++R+ + +LC++K +CE A + + L + +P ++K GCG +QP+ + G+ +
Sbjct 132 AKKRLALIYDLCKSKSVCEGAAEKEEGLPDDVDDPMSGEKKIPAGCGRYQPSYRRVGIDI 191

Query 187 LAEFK-HVSDESIEKKQVLSAEKVYEILKKITDEDCRMMGLDPKFARPDWMVLTIFPVPP 245
AE+K +V++++ E+K +L+AE+V E+ K+ITDED ++G+DP+FARP+WM+ T+ PVPP
Sbjct 192 NAEWKKNVNEDTQERKIMLTAERVLEVFKQITDEDILVIGMDPQFARPEWMICTVLPVPP 251

Query 246 PPVRPSILMDSSSRGEDDLTCKLADIIKSNHALRQQELSGSPAHIITEFTQILQYHVATY 305
VRP+++ S++ +DDLT KL+DIIK+N L++ E +G+ AH++T+ ++LQYHVAT
Sbjct 252 LAVRPAVVTFGSAKNQDDLTHKLSDIIKTNQQLQRNEANGAAAHVLTDDVRLLQYHVATL 311

Query 306 LDNELPGLPPAIQRSGRPLKSIRQRLRGKNGRVRGNLMGKRVDFSARTVITPDANIGIDE 365
+DN +PGLP A Q+ GRPLKSI+QRL+GK GR+RGNLMGKRVDFSARTVIT D N+ ID
Sbjct 312 VDNCIPGLPTATQKGGRPLKSIKQRLKGKEGRIRGNLMGKRVDFSARTVITADPNLPIDT 371

Query 366 VGVPRSIALNLTFPDIVTPLNIDRMYEYVRNGPREYPGAKYIVRDDGSRLDLRYIRKPSD 425
VGVPR+IA NLTFP+IVTP N+D++ E V G +YPGAKYI+R++G+R+DLRY + +D
Sbjct 372 VGVPRTIAQNLTFPEIVTPFNVDKLQELVNRGDTQYPGAKYIIRENGARVDLRYHPRAAD 431

Query 426 LHLDYGYKVERHLRDGDFVLFNRQPSLHKMSIMAHRVKLLPFSTFRLNLSVTSPYNADFD 485
LHL GY+VERH++DGD ++FNRQP+LHKMS+M HRVK+LP+STFR+NLSVTSPYNADFD
Sbjct 432 LHLQPGYRVERHMKDGDIIVFNRQPTLHKMSMMGHRVKILPWSTFRMNLSVTSPYNADFD 491

Query 486 GDEMNLHVPQSLEAIAEAQELMLVPRQIISPQANKPVIGLVQDVLIGARNMTKRDTFIEL 545
GDEMNLH+PQSLE AE +E+ +VPRQ+I+PQANKPV+G+VQD L R MTKRD FI+
Sbjct 492 GDEMNLHLPQSLETRAEIEEICMVPRQLITPQANKPVMGIVQDTLCAVRMMTKRDVFIDW 551

Query 546 DTVMNILMCTENFDGRIPMPAILKPKKLWTGKQLFSLILP-NVNLIRFTSTHPDGE---- 600
+M++LM +DG++P PAILKPK LWTGKQLFSLI+P NVN++R STHPD E
Sbjct 552 SFMMDLLMYLPTWDGKVPQPAILKPKPLWTGKQLFSLIIPGNVNVLRTHSTHPDSEDSGP 611

Query 601 YTHISPGDTKVLIENGDLISGILCKRTLGTSGGSLIHIICNEHGHDTARLFLNQAQKVVN 660
Y ISPGDTKVLIE+G+L+SGI+C +T+G S G+L+H++ E G++ A F + Q V+N
Sbjct 612 YKWISPGDTKVLIEHGELLSGIVCSKTVGKSAGNLLHVVTLELGYEIAANFYSHIQTVIN 671

Query 661 NWLVNIGFSIGIGDTIADEATMEQINKTIASAKNQVKELVLQAQQNILECQPGRTLHESF 720
WL+ +G +IGIGDTIAD AT I TI AK V +++ +A + LE PG TL ++F
Sbjct 672 AWLLRVGHTIGIGDTIADHATYLDIQNTIKKAKQDVVDIIEKAHNDDLEPTPGNTLRQTF 731

Query 721 ENKVNKVLNTARDTAGTSAQNSLKESNNVKSMVTAGSKGSFINISQMIACVGQQNVEGKR 780
ENKVN++LN ARD G+SAQ SL E NN KSMV +GSKGS INISQ+IACVGQQNVEGKR
Sbjct 732 ENKVNQILNDARDRTGSSAQKSLSEFNNFKSMVVSGSKGSKINISQVIACVGQQNVEGKR 791

Query 781 IPYGFGHRTLPHFTKDDYGPESRGFVENSYLRGLTPQEFFFHAMGGREGLIDTAVKTSET 840
IP+GF HRTLPHF KDDYGPES+GFVENSYL GLTP EFFFHAMGGREGLIDTAVKT+ET
Sbjct 792 IPFGFRHRTLPHFIKDDYGPESKGFVENSYLAGLTPSEFFFHAMGGREGLIDTAVKTAET 851

Query 841 GYIQRRLVKAMEDLMVRYDGTVRNSLGCIIQFSYGEDGMDGAFVESQKLEILRLGDKAFQ 900
GYIQRRL+KAME +MV YDGTVRNSL ++Q YGEDG+DG +VE Q + ++ + F+
Sbjct 852 GYIQRRLIKAMESVMVNYDGTVRNSLAQMVQLRYGEDGLDGMWVEDQNMPTMKPNNAVFE 911

Query 901 TLYFMDPTKANFGEDFLDQEVVEDVRNSSEAYELLSAEYEELKRCRRVLRQEVIPSGDDT 960
+ MD T F ++VV +++ S + L+ +E+ +L+ RR+LR+ + P GD
Sbjct 912 RDFRMDLTDNKFLRKNYSEDVVREIQESEDGISLVESEWSQLEEDRRLLRK-IFPRGDAK 970

Query 961 WPLPVNLRRLIWNAQVIFRLDTRKSTDLSPVKIIKGVKTLLSRLIVVKGDDEVSLEAQES 1020
LP NL+RLIWNAQ IF++D RK +LSP+ +I GV+ L +LI+V G+DE+S +AQ +
Sbjct 971 IVLPCNLQRLIWNAQKIFKVDLRKPVNLSPLHVINGVRELSKKLIIVSGNDEISKQAQYN 1030

Query 1021 CTMLFGILLKSTLASKRVLKEFRLNTVAFDWVLGEIESRFMQAIVQPSEAVGAIAAQSIG 1080
T+L ILL+STL +K++ +LNT AFDW+LGE+ESRF QAI QP E VGA+AAQS+G
Sbjct 1031 ATLLMNILLRSTLCTKKMCTSAKLNTEAFDWLLGEVESRFQQAIAQPGEMVGALAAQSLG 1090

Query 1081 EPATQMTLNTFHYAGVSSKNVTLGVPRLKELINVAKKVKTPSLTVYLLPHCAKDSERAKS 1140
EPATQMTLNTFHYAGVS+KNVTLGVPRLKE+INV+K++KTPSLTV+L AKD+E+AK
Sbjct 1091 EPATQMTLNTFHYAGVSAKNVTLGVPRLKEIINVSKQLKTPSLTVFLTGAAAKDAEKAKD 1150

Query 1141 VQCQLEHATLNTVTASTEIFYDPDPTTTVIEEDREFVQAYFEMPDEDISLEKISPWLLRI 1200
V C+LEH TL VT +T I+YDPDP TVI ED E+V ++EMPD D+S + SPWLLRI
Sbjct 1151 VLCKLEHTTLKKVTLNTAIYYDPDPKNTVIAEDEEWVSIFYEMPDHDLS--RTSPWLLRI 1208

Query 1201 VLNREMMTDKKLSMADIAEKINIEFGCDTLCIFNDDNAEKLVLHVRIMNDQD--PKHEET 1258
L+R+ M DKKL+M IA++I+ FG D I+ DDNAEKLV +RI + EE
Sbjct 1209 ELDRKRMVDKKLTMEMIADRIHGGFGNDVHTIYTDDNAEKLVFRLRIAGEDKGADTQEEQ 1268

Query 1259 ISEEATTTFLKQLEANMLSEMTLKGIDQIRKVYMRE-----ARKTFFDVDGRIATENEWI 1313
+ + FL+ +EANMLS++TL+GI I KVYM + ++ +G +WI
Sbjct 1269 VDKMEDDVFLRCIEANMLSDLTLQGIPAISKVYMNQPNTDDKKRIIITPEGGFKAVADWI 1328

Query 1314 LETDGCNLLQVMSCPEVDFSRTTSNDIVEIIQVLGIEAARAALLKEIRDVISFDGSYVNY 1373
LETDG LL+V++ ++D RTTSNDI EI +VLGIEA R A+ +E+ +VISFDGSYVNY
Sbjct 1329 LETDGTALLRVLAERQIDPVRTTSNDICEIFEVLGIEAVRKAIEREMDNVISFDGSYVNY 1388

Query 1374 RHLAILVDFMTYRGYLMSITRHGINRNVTGPLMRCSFEETVEILMESAAFAEADHLRGVT 1433
RHLA+L D MT +G+LM+ITRHGINR G LMRCSFEETV+ILME++ AE D ++GV+
Sbjct 1389 RHLALLCDVMTAKGHLMAITRHGINRQEVGALMRCSFEETVDILMEASVHAEVDPVKGVS 1448

Query 1434 ENIILGQLGKFGTGSFDVFLNEKMLREAVDIPLPDGLEGQELFG--------DTSPTHQM 1485
ENI+LGQL + GTG FD+ L+ + + ++IP + G + SP+H
Sbjct 1449 ENIMLGQLARCGTGCFDLVLDVEKCKYGMEIPQNVVMGAGYYGGFAGSPNAHEFSPSH-- 1506

Query 1486 TPFETMGTP--GGA--------------FSPSTPNDGAMFSPFN--GYSNEATFSPSG-- 1525
+P+ + TP GGA FSP+ DG SPFN G+S + P G
Sbjct 1507 SPWNSGVTPSYGGASWSPGAGGMSPSAGFSPAGNMDGGA-SPFNEGGWSPASPGDPLGAL 1565

Query 1526 SPSSP----FSP--YTPASPGYSPSSPAYSPSSPAYSPTSP------------------- 1560
SP +P SP Y+P SP +S +SP YSP+SP+YSPTSP
Sbjct 1566 SPRTPAYGGMSPGAYSPTSPQFSMTSPHYSPTSPSYSPTSPAAGQSPASPSYSPTSPSYS 1625

Query ------------------------------------------------------------

Sbjct 1626 PTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPSSPRYSPTSP 1685

Query 1561 AYSPTSPAYSPTSPAYSPTSPAYSPTSPAYS----------PTSPAYSPTSPAYSPTSPA 1610
YSPTSP YSPTSP YSPTSP YSPTSP+Y P+SP YSPTSP+YSPTSP
Sbjct 1686 TYSPTSPTYSPTSPTYSPTSPTYSPTSPSYEGYSPSSPKYSPSSPTYSPTSPSYSPTSPQ 1745

Query 1611 YSPTSPAYSPTSPAYSPTSPAYSPTSPAYSPTSPAYSPTSPAYTPTSPAYTPTSPAYTPT 1670
YSPTSP YSP+SP Y+P+SP Y+PTSP SP YSPTSP Y+PTSP+YTP+SP Y+PT
Sbjct 1746 YSPTSPQYSPSSPTYTPSSPTYNPTSP--RAFSPQYSPTSPTYSPTSPSYTPSSPQYSPT 1803

Query 1671 SPAYTPTSPAYTPTSPA 1687
SP YTP SP+ P + A
Sbjct 1804 SPTYTP-SPSDQPGTSA 1819


Score = 166 bits (419), Expect = 1e-44, Method: Compositional matrix adjust.
Identities = 88/133 (66%), Positives = 101/133 (76%), Gaps = 13/133 (10%)

Query 1568 AYSPTSPAYSPTSPAYSPTSPAYSPTSPAYS----------PTSPAYSPTSPAYSPTSPA 1617
YSPTSP YSPTSP YSPTSP YSPTSP+Y P+SP YSPTSP+YSPTSP
Sbjct 1686 TYSPTSPTYSPTSPTYSPTSPTYSPTSPSYEGYSPSSPKYSPSSPTYSPTSPSYSPTSPQ 1745

Query 1618 YSPTSPAYSPTSPAYSPTSPAYSPTSPAYSPTSPAYTPTSPAYTPTSPAYTPTSPAYTPT 1677
YSPTSP YSP+SP Y+P+SP Y+PTSP SP Y+PTSP Y+PTSP+YTP+SP Y+PT
Sbjct 1746 YSPTSPQYSPSSPTYTPSSPTYNPTSP--RAFSPQYSPTSPTYSPTSPSYTPSSPQYSPT 1803

Query 1678 SPAYTPTSPAYTP 1690
SP YTP SP+ P
Sbjct 1804 SPTYTP-SPSDQP 1815


Score = 140 bits (352), Expect = 9e-37, Method: Compositional matrix adjust.
Identities = 73/110 (66%), Positives = 83/110 (75%), Gaps = 15/110 (14%)

Query 1597 YSPTSPAYSPTSPAYSPTSPAYSPTSPAYS----------PTSPAYSPTSPAYSPTSPAY 1646
YSPTSP YSPTSP YSPTSP YSPTSP+Y P+SP YSPTSP+YSPTSP Y
Sbjct 1687 YSPTSPTYSPTSPTYSPTSPTYSPTSPSYEGYSPSSPKYSPSSPTYSPTSPSYSPTSPQY 1746

Query 1647 SPTSPAYTPTSPAYTPTSPAYTPT-----SPAYTPTSPAYTPTSPAYTPT 1691
SPTSP Y+P+SP YTP+SP Y PT SP Y+PTSP Y+PTSP+YTP+
Sbjct 1747 SPTSPQYSPSSPTYTPSSPTYNPTSPRAFSPQYSPTSPTYSPTSPSYTPS 1796



Lambda K H a alpha
0.317 0.134 0.394 0.792 4.96

Gapped
Lambda K H a alpha sigma
0.267 0.0410 0.140 1.90 42.6 43.6

Effective search space used: 2948400


Database: subject.fasta
Posted date: Jun 3, 2015 4:53 PM
Number of letters in database: 1,853
Number of sequences in database: 1



Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Neighboring words threshold: 11
Window for multiple hits: 40
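
The edge case this test file exercises is the Query row made up entirely of dashes, which carries no start or end coordinates, unlike every other Query/Sbjct row in the alignment. As a rough illustration only (this is not Biopython's actual parser code; the `ROW` pattern and the `parse_row` helper are hypothetical names invented for this sketch), a row reader that tolerates missing coordinates could look like this:

```python
import re

# Hypothetical pattern, not taken from Biopython: a row label, an optional
# start coordinate, the aligned residues (letters, '*' or '-'), and an
# optional end coordinate.
ROW = re.compile(r"^(Query|Sbjct)\s+(?:(\d+)\s+)?([A-Za-z*-]+)(?:\s+(\d+))?\s*$")


def parse_row(line):
    """Return (label, start, sequence, end) for an alignment row.

    start and end are None when the row carries no coordinates, as in a
    Query row consisting only of dashes. Lines that are not alignment
    rows (for example the match line between Query and Sbjct) give None.
    """
    match = ROW.match(line)
    if match is None:
        return None
    label, start, seq, end = match.groups()
    return (
        label,
        int(start) if start is not None else None,
        seq,
        int(end) if end is not None else None,
    )


# An ordinary row and an all-dash row, shaped like those in the file above.
print(parse_row("Query  11    VKEVSYLQFG  20"))
print(parse_row("Query        ----------"))
```

A parser that instead assumes every Query row has four whitespace-separated fields would fail on the all-dash row, which is the kind of behaviour a fixture like this is meant to catch.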
