xmlBLASTparser

About

xmlBLASTparser is a lightweight PHP library that parses XML-formatted NCBI BLAST output and renders it as a color-coded HTML page. Each database accession number or ID on the page is hyperlinked to its external source database, and each entry in the description summary carries an anchor link to the corresponding alignment section. The complete list of NCBI standard sequence identifiers is tabulated below:

Tag and Identifier Syntax           Identifier Source Description
----------------------------------  ------------------------------------------------
bbm|integer                         NCBI GenInfo Backbone database identifier
bbs|integer                         NCBI GenInfo Backbone database identifier
dbj|coll-accession|locus            DNA Database of Japan
emb|coll-accession|entry            EBI EMBL Database
gb|coll-accession|locus             NCBI GenBank database
gi|integer                          NCBI GenInfo Integrated Database ("jee-aye")
gim|integer                         NCBI GenInfo Import identifier
gnl|database|idstring               General (user-definable) database and identifier
gp|coll-accession|locus_cds#        GenPept (GenBank protein) identifier
lcl|integer                         Local (user-definable) identifier
oth|accession|name|release          Other (user-definable) identifier*
pat|country|patentid|serialno       Patent sequence identifier
pdb|entry|chainid                   Brookhaven Protein Database
pir|accession|entry                 Protein Information Resource International
prf|accession|name                  Protein Research Foundation
ref|coll-accession|locus            NCBI RefSeq
sp|coll-accession|locus             SWISS-PROT database
tpd|coll-accession|name             Third party annotation, DDBJ
tpe|coll-accession|name             Third party annotation, EMBL
tpg|coll-accession|name             Third party annotation, GenBank

*The NCBI has discontinued support for "oth" identifiers, but support for them is maintained in xdformat/xdget.
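
As an illustration of how the identifier syntax in the table can be used, the following minimal sketch splits a Hit_id string into its tagged parts. The helper names parseSeqId and sourceUrl are hypothetical and are not part of xmlBLASTparser. The two URL patterns are assumptions chosen for protein hits and may differ from the links the library actually generates.

```php
<?php
// Hypothetical helper (not part of xmlBLASTparser): split a Hit_id such as
// "gi|109158070|pdb|2GTS|A" into an array of tag => fields.
function parseSeqId(string $hitId): array
{
    // Number of fields that follow each tag, taken from the table above.
    $fieldCount = [
        'bbm' => 1, 'bbs' => 1, 'gi' => 1, 'gim' => 1, 'lcl' => 1,
        'dbj' => 2, 'emb' => 2, 'gb' => 2, 'gnl' => 2, 'gp' => 2,
        'pdb' => 2, 'pir' => 2, 'prf' => 2, 'ref' => 2, 'sp' => 2,
        'tpd' => 2, 'tpe' => 2, 'tpg' => 2,
        'oth' => 3, 'pat' => 3,
    ];
    $parts = explode('|', $hitId);
    $ids = [];
    $i = 0;
    while ($i < count($parts)) {
        $tag = $parts[$i];
        if (!isset($fieldCount[$tag])) {
            $i++; // skip anything that is not a known tag
            continue;
        }
        $ids[$tag] = array_slice($parts, $i + 1, $fieldCount[$tag]);
        $i += 1 + $fieldCount[$tag];
    }
    return $ids;
}

// Hypothetical mapping from a tag to an external URL (assumed patterns,
// covering only two tags; returns null for everything else).
function sourceUrl(string $tag, array $fields): ?string
{
    switch ($tag) {
        case 'gi':
            return 'https://www.ncbi.nlm.nih.gov/protein/' . rawurlencode($fields[0]);
        case 'pdb':
            return 'https://www.rcsb.org/structure/' . rawurlencode($fields[0]);
        default:
            return null;
    }
}

$ids = parseSeqId('gi|109158070|pdb|2GTS|A');
foreach ($ids as $tag => $fields) {
    echo $tag, ': ', implode(', ', $fields), ' -> ', sourceUrl($tag, $fields) ?? '(no link)', "\n";
}
```

Unknown tags are skipped rather than rejected, so an identifier in an unexpected format yields an empty or partial result instead of an error.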

Usage

xmlBLASTparser parses the XML output of an NCBI BLAST sequence alignment obtained through either of the following methods:

  • NCBI BLAST - Download the sequence alignment in XML format from the NCBI BLAST result page and load it into the xmlBLASTparser PHP file, or fetch it directly by request ID (RID). For example,
// Option 1: load an XML file downloaded from the result page
$xml = simplexml_load_file("V07E2YXG014-Alignment.xml") or die("Error: cannot load the XML file");
// Option 2: fetch the XML by request ID; $rid must already hold a valid RID
$out = file_get_contents("https://blast.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Get&FORMAT_TYPE=XML&FORMAT_OBJECT=Alignment&RID=$rid");
$xml = new SimpleXMLElement($out);
  • Standalone NCBI BLAST - Run one of the standalone NCBI BLAST executables (blastn.exe, blastp.exe, blastx.exe, tblastn.exe, tblastx.exe, etc.) with XML output (-outfmt 5) and load the resulting file into the xmlBLASTparser PHP file. For example,
// Run a remote blastp search against pdb and write the XML result to out.xml
exec('blastp.exe -db pdb -query seq.fa -remote -outfmt 5 -out out.xml');
// Load the result as a SimpleXMLElement, as in the NCBI BLAST example
$xml = simplexml_load_file("out.xml");
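
When the database name or file names come from user input, the standalone command is safer to assemble with quoted arguments. The sketch below shows one way to do that and to check the exit status before loading the result. The function names buildBlastCommand and runBlast are hypothetical. The flags are the ones used in the example above.

```php
<?php
// Hypothetical helper: build the standalone BLAST command line with
// shell-quoted arguments. The flags mirror the example above.
function buildBlastCommand(string $program, string $db, string $query, string $out): string
{
    return implode(' ', [
        escapeshellarg($program),
        '-db', escapeshellarg($db),
        '-query', escapeshellarg($query),
        '-remote',
        '-outfmt', '5', // 5 = XML output
        '-out', escapeshellarg($out),
    ]);
}

// Hypothetical helper: run the command and load the XML file it wrote.
// Throws if BLAST exits with a non-zero status or the file cannot be parsed.
function runBlast(string $cmd, string $outFile): SimpleXMLElement
{
    exec($cmd, $outputLines, $status);
    if ($status !== 0) {
        throw new RuntimeException("BLAST exited with status $status");
    }
    $xml = simplexml_load_file($outFile);
    if ($xml === false) {
        throw new RuntimeException("Could not parse $outFile");
    }
    return $xml;
}

$cmd = buildBlastCommand('blastp', 'pdb', 'seq.fa', 'out.xml');
echo $cmd, "\n";
// $xml = runBlast($cmd, 'out.xml'); // requires BLAST+ to be installed
```

The exact quoting produced by escapeshellarg depends on the platform, so the printed command differs between Windows and Unix-like systems.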

Input

<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.dtd">
<BlastOutput>
  <BlastOutput_program>blastp</BlastOutput_program>
  <BlastOutput_version>BLASTP 2.7.0+</BlastOutput_version>
  <BlastOutput_reference>Stephen F. Altschul, Thomas L. Madden, Alejandro A. Sch&amp;auml;ffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), &quot;Gapped BLAST and PSI-BLAST: a new generation of protein database search programs&quot;, Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
  <BlastOutput_db>pdb</BlastOutput_db>
  <BlastOutput_query-ID>Query_93791</BlastOutput_query-ID>
  <BlastOutput_query-def>KDG85104.1 hypothetical protein AE17_03267, partial [Escherichia coli UCI 58]</BlastOutput_query-def>
  <BlastOutput_query-len>82</BlastOutput_query-len>
  <BlastOutput_param>
    <Parameters>
      <Parameters_matrix>BLOSUM62</Parameters_matrix>
      <Parameters_expect>10</Parameters_expect>
      <Parameters_gap-open>11</Parameters_gap-open>
      <Parameters_gap-extend>1</Parameters_gap-extend>
      <Parameters_filter>F</Parameters_filter>
    </Parameters>
  </BlastOutput_param>
<BlastOutput_iterations>
<Iteration>
  <Iteration_iter-num>1</Iteration_iter-num>
  <Iteration_query-ID>Query_93791</Iteration_query-ID>
  <Iteration_query-def>KDG85104.1 hypothetical protein AE17_03267, partial [Escherichia coli UCI 58]</Iteration_query-def>
  <Iteration_query-len>82</Iteration_query-len>
<Iteration_hits>
<Hit>
  <Hit_num>1</Hit_num>
  <Hit_id>gi|109158070|pdb|2GTS|A</Hit_id>
  <Hit_def>Chain A, Structure Of Protein Of Unknown Function Hp0062 From Helicobacter Pylori</Hit_def>
  <Hit_accession>2GTS_A</Hit_accession>
  <Hit_len>86</Hit_len>
  <Hit_hsps>
    <Hsp>
      <Hsp_num>1</Hsp_num>
      <Hsp_bit-score>25.0238</Hsp_bit-score>
      <Hsp_score>53</Hsp_score>
      <Hsp_evalue>6.53601</Hsp_evalue>
      <Hsp_query-from>52</Hsp_query-from>
      <Hsp_query-to>74</Hsp_query-to>
      <Hsp_hit-from>20</Hsp_hit-from>
      <Hsp_hit-to>42</Hsp_hit-to>
      <Hsp_query-frame>0</Hsp_query-frame>
      <Hsp_hit-frame>0</Hsp_hit-frame>
      <Hsp_identity>9</Hsp_identity>
      <Hsp_positive>16</Hsp_positive>
      <Hsp_gaps>0</Hsp_gaps>
      <Hsp_align-len>23</Hsp_align-len>
      <Hsp_qseq>QFKSLMLKELNFVMNYVFTLETW</Hsp_qseq>
      <Hsp_hseq>RFKELLREEVNSLSNHFHNLESW</Hsp_hseq>
      <Hsp_midline>+FK L+ +E+N + N+   LE+W</Hsp_midline>
    </Hsp>
  </Hit_hsps>
</Hit>
<Hit>
  <Hit_num>2</Hit_num>
  <Hit_id>gi|970842266|pdb|5FCD|A</Hit_id>
  <Hit_def>Chain A, Crystal Structure Of Mccd Protein &gt;gi|970842267|pdb|5FCD|B Chain B, Crystal Structure Of Mccd Protein</Hit_def>
  <Hit_accession>5FCD_A</Hit_accession>
  <Hit_len>267</Hit_len>
  <Hit_hsps>
    <Hsp>
      <Hsp_num>1</Hsp_num>
      <Hsp_bit-score>25.409</Hsp_bit-score>
      <Hsp_score>54</Hsp_score>
      <Hsp_evalue>8.26162</Hsp_evalue>
      <Hsp_query-from>61</Hsp_query-from>
      <Hsp_query-to>81</Hsp_query-to>
      <Hsp_hit-from>174</Hsp_hit-from>
      <Hsp_hit-to>194</Hsp_hit-to>
      <Hsp_query-frame>0</Hsp_query-frame>
      <Hsp_hit-frame>0</Hsp_hit-frame>
      <Hsp_identity>10</Hsp_identity>
      <Hsp_positive>14</Hsp_positive>
      <Hsp_gaps>0</Hsp_gaps>
      <Hsp_align-len>21</Hsp_align-len>
      <Hsp_qseq>LNFVMNYVFTLETWYSFFVLR</Hsp_qseq>
      <Hsp_hseq>INFRPNPLWTLEYWHQFFSER</Hsp_hseq>
      <Hsp_midline>+NF  N ++TLE W+ FF  R</Hsp_midline>
    </Hsp>
  </Hit_hsps>
</Hit>
<Hit>
  <Hit_num>3</Hit_num>
  <Hit_id>gi|257097223|pdb|3FX7|A</Hit_id>
  <Hit_def>Chain A, Crystal Structure Of Hypothetical Protein Of Hp0062 From Helicobacter Pylori &gt;gi|257097224|pdb|3FX7|B Chain B, Crystal Structure Of Hypothetical Protein Of Hp0062 From Helicobacter Pylori</Hit_def>
  <Hit_accession>3FX7_A</Hit_accession>
  <Hit_len>94</Hit_len>
  <Hit_hsps>
    <Hsp>
      <Hsp_num>1</Hsp_num>
      <Hsp_bit-score>25.0238</Hsp_bit-score>
      <Hsp_score>53</Hsp_score>
      <Hsp_evalue>9.03233</Hsp_evalue>
      <Hsp_query-from>52</Hsp_query-from>
      <Hsp_query-to>74</Hsp_query-to>
      <Hsp_hit-from>20</Hsp_hit-from>
      <Hsp_hit-to>42</Hsp_hit-to>
      <Hsp_query-frame>0</Hsp_query-frame>
      <Hsp_hit-frame>0</Hsp_hit-frame>
      <Hsp_identity>9</Hsp_identity>
      <Hsp_positive>16</Hsp_positive>
      <Hsp_gaps>0</Hsp_gaps>
      <Hsp_align-len>23</Hsp_align-len>
      <Hsp_qseq>QFKSLMLKELNFVMNYVFTLETW</Hsp_qseq>
      <Hsp_hseq>RFKELLREEVNSLSNHFHNLESW</Hsp_hseq>
      <Hsp_midline>+FK L+ +E+N + N+   LE+W</Hsp_midline>
    </Hsp>
  </Hit_hsps>
</Hit>
</Iteration_hits>
  <Iteration_stat>
    <Statistics>
      <Statistics_db-num>93500</Statistics_db-num>
      <Statistics_db-len>23509168</Statistics_db-len>
      <Statistics_hsp-len>0</Statistics_hsp-len>
      <Statistics_eff-space>0</Statistics_eff-space>
      <Statistics_kappa>0.041</Statistics_kappa>
      <Statistics_lambda>0.267</Statistics_lambda>
      <Statistics_entropy>0.14</Statistics_entropy>
    </Statistics>
  </Iteration_stat>
</Iteration>
</BlastOutput_iterations>
</BlastOutput>
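
For orientation, here is a minimal sketch of walking this structure with SimpleXML. It is not the code of xmlBLASTparser itself: the function name summarizeHits is hypothetical, and the embedded sample is a trimmed copy of the first hit from the input above. Element names that contain a hyphen must be accessed with the brace syntax. Percent identity is computed as Hsp_identity divided by Hsp_align-len.

```php
<?php
// Trimmed copy of the first hit from the input above.
$sample = <<<XML
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_num>1</Hit_num>
          <Hit_id>gi|109158070|pdb|2GTS|A</Hit_id>
          <Hit_accession>2GTS_A</Hit_accession>
          <Hit_hsps>
            <Hsp>
              <Hsp_evalue>6.53601</Hsp_evalue>
              <Hsp_identity>9</Hsp_identity>
              <Hsp_align-len>23</Hsp_align-len>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
XML;

// Hypothetical helper: collect one summary row per HSP.
function summarizeHits(SimpleXMLElement $xml): array
{
    $rows = [];
    foreach ($xml->BlastOutput_iterations->Iteration as $iteration) {
        foreach ($iteration->Iteration_hits->Hit as $hit) {
            foreach ($hit->Hit_hsps->Hsp as $hsp) {
                $identity = (int) $hsp->Hsp_identity;
                // Hyphenated element names need the brace syntax.
                $alignLen = (int) $hsp->{'Hsp_align-len'};
                $rows[] = [
                    'accession'    => (string) $hit->Hit_accession,
                    'evalue'       => (float) $hsp->Hsp_evalue,
                    'pct_identity' => $alignLen > 0 ? round(100 * $identity / $alignLen, 1) : 0.0,
                ];
            }
        }
    }
    return $rows;
}

$rows = summarizeHits(simplexml_load_string($sample));
foreach ($rows as $row) {
    printf("%s\tE=%g\t%.1f%% identity\n", $row['accession'], $row['evalue'], $row['pct_identity']);
}
```

The sketch assumes every iteration contains an Iteration_hits element, as in the input above.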

Output

xmlBLASTparser_v1.1 Output
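
The anchor linking between the description summary and the alignment section can be pictured roughly as follows. This is only a sketch under assumptions: the function name renderHit, the id scheme ("hit" plus the hit number), and the markup are hypothetical and do not reproduce the HTML that xmlBLASTparser actually emits.

```php
<?php
// Hypothetical helper: build a summary link and the alignment block it
// points to, sharing one anchor id per hit.
function renderHit(int $num, string $accession, string $definition, array $alignment): array
{
    $anchor = 'hit' . $num;
    // Escape all text taken from the BLAST output before placing it in HTML.
    $esc = function (string $s): string {
        return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
    };
    $summary = sprintf('<a href="#%s">%s</a> %s', $anchor, $esc($accession), $esc($definition));
    $detail  = sprintf('<pre id="%s">%s</pre>', $anchor, $esc(implode("\n", $alignment)));
    return ['summary' => $summary, 'alignment' => $detail];
}

// Values taken from the first hit of the input above.
$html = renderHit(
    1,
    '2GTS_A',
    'Chain A, Structure Of Protein Of Unknown Function Hp0062 From Helicobacter Pylori',
    [
        'QFKSLMLKELNFVMNYVFTLETW', // Hsp_qseq
        '+FK L+ +E+N + N+   LE+W', // Hsp_midline
        'RFKELLREEVNSLSNHFHNLESW', // Hsp_hseq
    ]
);
echo $html['summary'], "\n", $html['alignment'], "\n";
```

Wrapping the alignment in a pre element keeps the query, midline, and hit sequences aligned column by column.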

Support

Please feel free to send your queries, suggestions, and comments about xmlBLASTparser to ashok.bioinformatics@gmail.com or ashok@biogem.org.

License

xmlBLASTparser is made available under version 3 of the GNU Lesser General Public License.