# Running BLAST searches via BioPython
In this practical we will replicate within a Jupyter notebook what we did earlier via the NCBI BLAST webpage. 

## Installation

We can use the BioPython package for this task. BioPython is not another programming language, but a Python package that provides a lot of functionality for handling all sorts of Bioinformatics tasks. It is mostly focussed on sequence analysis, but also has a protein structure module. If you face one of those tedious Bioinformatics tasks, such as extracting/filtering data from a bigger dataset or converting from one format to another, it (often) comes in very handy.

BioPython is not installed yet, so let's get it via the pip command. Remember what the "!" does?

In [None]:
!pip install biopython

## Loading
A slightly weird feature is that BioPython is not loaded via "import biopython", as you might assume; the package to import is called "Bio". It is also rather big, so normally we load just the parts we actually need. So ...

In [None]:
# load webBLAST and sequence modules
from Bio.Blast import NCBIWWW
from Bio import SeqIO

BioPython modules are fairly well documented. If you want to find out about available methods, parameters, etc., have a look at [NCBIWWW](https://biopython.org/docs/1.81/api/Bio.Blast.NCBIWWW.html) and [SeqIO](https://biopython.org/docs/1.81/api/Bio.SeqIO.html).

## Do stuff
Now let's start by reading the PHOSPHO1 sequence that we downloaded earlier from the file we (hopefully) saved.

In [None]:
# generate a sequence record object from the FASTA file
record = SeqIO.read("Phospho1.fasta", format="fasta")

In [None]:
record

Now we have our input sequence ready. The next line of code does all the work. 

In [None]:
# run BLAST and store the result handle. This will take a while (note the * next to the cell). Don't continue before the search is completed.
result = NCBIWWW.qblast("blastp", "swissprot", record.seq, expect=10, format_type="XML")

In [None]:
result

Finally, let's save the output in a file.

In [None]:
# save blast results in file
with open("Phospho1_blast.XML", "w") as out_handle:
    out_handle.write(result.read())

# and close the result handle
result.close()


Let's quickly check whether the output file has indeed been generated.

In [None]:
!ls -ltr
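
The same check can also be done from Python, which is handy on systems where `ls` is not available. This is a minimal sketch using only the standard library. The helper name `describe_file` is made up for this example, and the file name is the one used in the save step above.

```python
from pathlib import Path


def describe_file(path):
    """Return (exists, size_in_bytes) for a path; size is 0 if the file is missing."""
    p = Path(path)
    if not p.is_file():
        return (False, 0)
    return (True, p.stat().st_size)


# File name assumed from the save step above
exists, size = describe_file("Phospho1_blast.XML")
print(f"exists: {exists}, size: {size} bytes")
```

A size of 0 bytes would suggest that the search had not finished, or that nothing was written.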

All done :-)
But it is always a good idea to look at your data. Typically I'd use the *head* and *tail* commands in a terminal to check the content of the output file, but it's a small file, so let's use *cat* here to display the full content. The cell below uses the file name from the save step above, "Phospho1_blast.XML"; if you ran the search with format_type="Text" and saved the result as "Phospho1_blast.text", use that name instead. If we look at the content of the file, do you see anything suspicious?

In [None]:
!cat Phospho1_blast.XML

With plain-text output, the first few lines look odd and are likely to break any BLAST output parser that is expecting a standard BLAST text output. Hence (and for a few other reasons) it is better to go for XML-formatted output. So if you used text output, let's fix that and re-run.
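
To illustrate why XML output is convenient downstream, here is a minimal sketch that pulls hits out of BLAST XML with the standard library only. The embedded `TOY_XML` is a made-up stand-in with invented values, cut down to the few tags the sketch reads, and `list_hits` is a hypothetical helper name. BioPython also ships its own parser for this format, `Bio.Blast.NCBIXML`, which is the more usual choice.

```python
import xml.etree.ElementTree as ET

# Toy stand-in for a BLAST XML file: one made-up hit with invented values,
# reduced to the tags this sketch actually reads.
TOY_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_accession>TOY001</Hit_accession>
          <Hit_def>hypothetical example protein</Hit_def>
          <Hit_hsps>
            <Hsp>
              <Hsp_evalue>1e-50</Hsp_evalue>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def list_hits(xml_text):
    """Return (accession, description, best E-value) for each hit."""
    root = ET.fromstring(xml_text)
    hits = []
    for hit in root.iter("Hit"):
        # A hit can have several HSPs; keep the smallest E-value
        evalues = [float(e.text) for e in hit.iter("Hsp_evalue")]
        best = min(evalues) if evalues else None
        hits.append((hit.findtext("Hit_accession"), hit.findtext("Hit_def"), best))
    return hits


for accession, description, evalue in list_hits(TOY_XML):
    print(accession, description, evalue)
```

To try it on your own result, read the saved file into a string and pass that to `list_hits` instead of `TOY_XML`.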

## Try for yourself
We did a few more BLAST searches over the web earlier. Can you replicate them with BioPython? <BR>
An interactive notebook is great for showing what's going on, but in a production environment you would want to run such searches non-interactively. Can you condense this notebook into a script and run it from the terminal?
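
If you get stuck, one possible starting point is sketched below. It is not the only way to do it. The function names, the option names and the script name `run_blast.py` are assumptions made for this sketch; the BioPython calls are the same ones used in the notebook above.

```python
"""Sketch of a non-interactive BLAST script (hypothetical file name: run_blast.py)."""
import argparse


def build_parser():
    """Define the command-line arguments of the script."""
    parser = argparse.ArgumentParser(
        description="Run a web BLAST search on one FASTA sequence.")
    parser.add_argument("fasta", help="input FASTA file with a single sequence")
    parser.add_argument("output", help="file to write the XML result to")
    parser.add_argument("--program", default="blastp")
    parser.add_argument("--database", default="swissprot")
    parser.add_argument("--expect", type=float, default=10.0)
    return parser


def run_blast(args):
    """Read the sequence, run the web search and save the XML result."""
    # Imported here so that argument parsing works without BioPython installed
    from Bio import SeqIO
    from Bio.Blast import NCBIWWW

    record = SeqIO.read(args.fasta, format="fasta")
    result = NCBIWWW.qblast(args.program, args.database, record.seq,
                            expect=args.expect, format_type="XML")
    with open(args.output, "w") as out_handle:
        out_handle.write(result.read())
    result.close()


def main(argv=None):
    run_blast(build_parser().parse_args(argv))


# Example: parse a command line without contacting NCBI
args = build_parser().parse_args(["Phospho1.fasta", "Phospho1_blast.XML"])
print(args.program, args.database, args.expect)
```

In a real script, you would replace the last two lines with the usual `if __name__ == "__main__": main()` and then run something like `python run_blast.py Phospho1.fasta Phospho1_blast.XML` from the terminal.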