# The SearchIO object model

SearchIO parses a search output file’s contents into a hierarchy of four nested objects: QueryResult, Hit, HSP, and HSPFragment. Each of them models a part of the search output file:

* **QueryResult** represents a single search query. It is the main object returned by the input functions (such as `SearchIO.read` and `SearchIO.parse`) and contains all the other objects.

* **Hit** represents a database hit.

* **HSP** represents the high-scoring alignment region(s) in the hit.

* **HSPFragment** represents a single contiguous alignment within the HSP.
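
Because each level contains the next, a whole result can be walked with three nested loops. The sketch below assumes the `hits`, `hsps` and `fragments` properties of QueryResult, Hit and HSP (each returns a plain list) and the `query_range`/`hit_range` attributes of HSPFragment; the function name `iter_fragments` is our own, not part of Biopython.

```python
def iter_fragments(qresult):
    """Yield one tuple per HSPFragment in a SearchIO QueryResult.

    Each tuple is (hit id, HSP index, fragment index, query range, hit range).
    Only attributes are read, so any objects shaped like the SearchIO
    hierarchy can be passed in.
    """
    for hit in qresult.hits:                                    # QueryResult -> Hit
        for hsp_index, hsp in enumerate(hit.hsps):              # Hit -> HSP
            for frag_index, frag in enumerate(hsp.fragments):   # HSP -> HSPFragment
                yield (hit.id, hsp_index, frag_index,
                       frag.query_range, frag.hit_range)
```

For example, `list(iter_fragments(SearchIO.read("my_blat.psl", "blat-psl")))` would list every fragment of every HSP of every hit in one flat sequence.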

# HSPFragment

```python
from Bio import SearchIO

blast_qresult = SearchIO.read("my_blast.xml", "blast-xml")
blast_frag = blast_qresult[0][0][0]  # first hit, first HSP, first fragment
print(blast_frag)
```

```
      Query: ABL1_a MLEICLKLVGCKSKKGLSSSSSCYLEEALQRPVASDFEPQGLSE
        Hit: gi|568815589|ref|NC_000009.12| Homo sapiens chromosome 9, GRCh38...
Query range: [514:1086] (0)
  Hit range: [130883964:130885680] (1)
  Fragments: 1 (572 columns)
     Query - SDPLDHEPAVSPLLPRKERGPPEGGLNEDERLLPKDKKTNLFSALIXXXXXXXXXXXXR~~~DIVQR
             SDPLDHEPAVSPLLPRKERGPPEGGLNEDERLLPKDKKTNLFSALIKKKKKTAPTPPKR~~~DIVQR
       Hit - SDPLDHEPAVSPLLPRKERGPPEGGLNEDERLLPKDKKTNLFSALIKKKKKTAPTPPKR~~~DIVQR
```

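SearchIO stores coordinates as zero-based, end-exclusive ranges (Python slice style), so each span is simply `end - start`. The helper below (a hypothetical name, not a Biopython function) reads the `query_start`, `query_end`, `hit_start`, `hit_end` and `aln_span` attributes of an HSPFragment. With the values printed above, the query span is 1086 - 514 = 572, matching the 572 alignment columns, and the hit span is 130885680 - 130883964 = 1716 = 3 × 572, which is consistent with a protein query aligned against a translated nucleotide sequence.

```python
def fragment_spans(frag):
    """Return (query span, hit span, alignment columns) for an HSPFragment.

    Coordinates are zero-based and end-exclusive, so a span is end - start.
    aln_span counts alignment columns, including any gap columns.
    """
    query_span = frag.query_end - frag.query_start
    hit_span = frag.hit_end - frag.hit_start
    return query_span, hit_span, frag.aln_span
```

Calling `fragment_spans(blast_frag)` on the fragment above should therefore return `(572, 1716, 572)`.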

```python
blat_qresult = SearchIO.read("my_blat.psl", "blat-psl")
blat_frag = blat_qresult[0][0][0]  # first hit, first HSP, first fragment
print(blat_frag)
```

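A BLAT HSP can contain more than one fragment: a PSL alignment is made of one or more ungapped blocks, and SearchIO stores each block as an HSPFragment. Iterating over an HSP yields its fragments:

```python
blat_qresult = SearchIO.read("my_blat.psl", "blat-psl")
blat_hsp = blat_qresult[0][1]  # first hit, second HSP

for frag in blat_hsp:
    print(frag)
```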
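
To summarise the blocks of one HSP at a glance, a small helper (the name `describe_blocks` is hypothetical) can turn its fragments into one line each and add up their alignment columns. It assumes the `fragments` property of HSP and the `query_range`, `hit_range` and `aln_span` attributes of HSPFragment; for BLAT output this typically gives one line per aligned block.

```python
def describe_blocks(hsp):
    """Return (lines, total_columns) describing every HSPFragment of an HSP.

    Fragments are listed in the order SearchIO stores them; ranges are
    zero-based and end-exclusive.
    """
    lines = []
    total_columns = 0
    for i, frag in enumerate(hsp.fragments):
        q_start, q_end = frag.query_range
        h_start, h_end = frag.hit_range
        lines.append(
            f"block {i}: query [{q_start}:{q_end}] -> hit [{h_start}:{h_end}] "
            f"({frag.aln_span} columns)"
        )
        total_columns += frag.aln_span
    return lines, total_columns
```

For instance, `describe_blocks(blat_hsp)` would describe each fragment printed by the loop above in a single line.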

# Exercise 1:

* Retrieve from NCBI the first 5 entries matching the search term `starch AND Malus domestica[Organism]` and save them in a FASTA file.
* Write a Python function that aligns the sequences in the FASTA file against the NCBI nr database, restricting hits to the organism *Malus domestica*, and prints the following information for each HSP:

    * The hit title;

    * The score and E-value;

    * The number of identities, the number of positives, and the alignment length.
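
For the reporting step, a minimal sketch could look like the function below (the name `hsp_report` is our own). It assumes the attributes that the `blast-xml` parser fills in on each HSP: `bitscore_raw` (the raw score), `bitscore`, `evalue`, `ident_num`, `pos_num` and `aln_span`. Taking the title to be the hit ID followed by its description is also an assumption. Retrieving the sequences and running the search are left to the exercise.

```python
def hsp_report(qresult):
    """Return one formatted line per HSP in a BLAST QueryResult.

    Assumes attributes set by SearchIO's "blast-xml" parser: bitscore_raw
    (raw score), bitscore, evalue, ident_num, pos_num and aln_span.
    The title is assumed to be the hit id followed by its description.
    """
    lines = []
    for hit in qresult.hits:
        title = f"{hit.id} {hit.description}".strip()
        for hsp in hit.hsps:
            lines.append(
                f"{title} | score={hsp.bitscore_raw} bits={hsp.bitscore} "
                f"evalue={hsp.evalue:.2g} | identities={hsp.ident_num} "
                f"positives={hsp.pos_num} length={hsp.aln_span}"
            )
    return lines
```

Each line can then be printed, for example with `print("\n".join(hsp_report(qresult)))` for every QueryResult obtained from the BLAST XML output.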

# Exercise 2:

* Write a Python function that reads all the entries (queries) of a BLAST XML file and outputs every hit that has at least one HSP with bitscore > B, alignment length > A, and percent identity > I, where B, A, and I are input thresholds.

* Test the function on an appropriate BLAST XML file.
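
A minimal sketch of the filtering logic, assuming the `bitscore`, `aln_span` and `ident_num` attributes set by the `blast-xml` parser, with B, A and I passed as `min_bitscore`, `min_length` and `min_identity`. Percent identity is computed here as 100 × `ident_num` / `aln_span`, which is one common definition (others divide by the query length). Both function names are our own.

```python
def filter_hits(qresults, min_bitscore, min_length, min_identity):
    """Return (query id, hit id) pairs for hits having at least one HSP with
    bitscore > min_bitscore, alignment length > min_length and
    percent identity (100 * ident_num / aln_span) > min_identity.
    """
    selected = []
    for qresult in qresults:
        for hit in qresult.hits:
            if any(
                hsp.bitscore > min_bitscore
                and hsp.aln_span > min_length  # checked before dividing by aln_span
                and 100.0 * hsp.ident_num / hsp.aln_span > min_identity
                for hsp in hit.hsps
            ):
                selected.append((qresult.id, hit.id))
    return selected


def filter_blast_xml(path, min_bitscore, min_length, min_identity):
    """Apply filter_hits to every query in a BLAST XML file."""
    # Imported here so that filter_hits can be used without Biopython installed.
    from Bio import SearchIO
    return filter_hits(SearchIO.parse(path, "blast-xml"),
                       min_bitscore, min_length, min_identity)
```

Separating the thresholds test from the file reading keeps `filter_hits` easy to check on hand-built data before running `filter_blast_xml` on a real file.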