<a href="https://colab.research.google.com/github/christiam/jupyter-notebooks/blob/master/BLAST_taxonomic_filtering.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Taxonomic filtering in BLAST+
This notebook demonstrates how to download the latest BLAST+ release and use its taxonomic filtering feature, which is available starting with BLAST+ 2.8.0.



## Download the latest BLAST+ release
As of this writing (December 5, 2018), the latest release is BLAST+ 2.8.0. We also need to patch `update_blastdb.pl` and install its dependencies.

In [0]:
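# Download the BLAST+ tarball and unpack it into the current directory
! curl -s ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.8.0alpha/ncbi-blast-2.8.0-alpha+-x64-linux.tar.gz | tar -zxf -
# Replace the bundled update_blastdb.pl with the patched copy and make it executable
! curl -s ftp://ftp.ncbi.nlm.nih.gov/blast/temp/update_blastdb.pl -o ncbi-blast-2.8.0+/bin/update_blastdb.pl && chmod +x ncbi-blast-2.8.0+/bin/update_blastdb.pl
# Install the Perl JSON module that update_blastdb.pl depends on
! apt-get install -qqy libjson-perl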

Selecting previously unselected package libcommon-sense-perl.
(Reading database ... 26397 files and directories currently installed.)
Preparing to unpack .../libcommon-sense-perl_3.74-2build2_amd64.deb ...
Unpacking libcommon-sense-perl (3.74-2build2) ...
Selecting previously unselected package libjson-perl.
Preparing to unpack .../libjson-perl_2.97001-1_all.deb ...
Unpacking libjson-perl (2.97001-1) ...
Selecting previously unselected package libtypes-serialiser-perl.
Preparing to unpack .../libtypes-serialiser-perl_1.0-1_all.deb ...
Unpacking libtypes-serialiser-perl (1.0-1) ...
Selecting previously unselected package libjson-xs-perl.
Preparing to unpack .../libjson-xs-perl_3.040-1_amd64.deb ...
Unpacking libjson-xs-perl (3.040-1) ...
Setting up libcommon-sense-perl (3.74-2build2) ...
Setting up libtypes-serialiser-perl (1.0-1) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Setting up libjson-perl (2.97001-1) ...
Setting up libjson-xs-perl (3.040-1) ...


## Add the BLAST+ applications to your PATH

In [0]:
import os
os.environ['PATH'] += ":/content/ncbi-blast-2.8.0+/bin/"
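
Re-running the cell above appends the same directory to `PATH` each time. A minimal sketch of one way to avoid that and to confirm that `blastp` can be found is shown here; `add_to_path` is a hypothetical helper introduced only for illustration, and the directory is the one assumed by the cell above.

```python
import os
import shutil

def add_to_path(directory, environ=os.environ):
    """Append directory to PATH unless it is already listed (hypothetical helper)."""
    entries = [e for e in environ.get("PATH", "").split(os.pathsep) if e]
    # Compare without trailing slashes so "/x/bin" and "/x/bin/" count as the same entry
    if directory.rstrip("/") not in (e.rstrip("/") for e in entries):
        entries.append(directory)
    environ["PATH"] = os.pathsep.join(entries)
    return environ["PATH"]

# Directory assumed from the cell above (Colab unpacks into /content)
add_to_path("/content/ncbi-blast-2.8.0+/bin/")

# shutil.which returns the full path to blastp, or None if it is not on PATH
print(shutil.which("blastp"))
```

If the last line prints `None`, the download step most likely did not unpack into the expected directory.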


## Download a BLAST database from NCBI
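Next we will download the (relatively small) swissprot database to the current working directory, along with `taxdb`, which BLAST uses to report taxonomic names such as the common names shown in the results later on.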

In [0]:
! update_blastdb.pl --blastdb_version 5 --decompress swissprot taxdb

Connected to NCBI
Downloading swissprot.tar.gz... [OK]
Downloading taxdb.tar.gz... [OK]
Decompressing swissprot.tar.gz ... [OK]
Decompressing taxdb.tar.gz ... [OK]


## Run BLAST
In this example we restrict the search of the swissprot database to mouse sequences (NCBI taxonomy ID 10090). The query sequence is identified by accession P38398 (breast cancer type 1 susceptibility protein, BRCA1).

In [0]:
! echo P38398 | blastp -taxids 10090 -db swissprot -outfmt '7 std scomname' -out blast-results.tab
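
The same search could also be launched from Python, which makes it easier to vary the taxonomy IDs. The sketch below is only an illustration: `build_blastp_command` and `run_blastp` are hypothetical helpers, and it assumes that `-taxids` accepts several IDs separated by commas, as the BLAST+ help text describes. Taxonomy ID 10116 (Norway rat) is added purely as an example of a second ID.

```python
import shlex
import subprocess

def build_blastp_command(taxids, db="swissprot", out="blast-results.tab"):
    """Return the blastp argument list for a taxonomy-restricted search."""
    # Assumption: multiple taxonomy IDs are passed as one comma-separated string
    taxid_arg = ",".join(str(t) for t in taxids)
    return [
        "blastp",
        "-taxids", taxid_arg,
        "-db", db,
        "-outfmt", "7 std scomname",
        "-out", out,
    ]

def run_blastp(accession, taxids):
    """Run blastp, sending the query accession on stdin like `echo ... | blastp`."""
    cmd = build_blastp_command(taxids)
    subprocess.run(cmd, input=accession + "\n", text=True, check=True)

# Show the command line that would be executed for mouse plus rat
print(shlex.join(build_blastp_command([10090, 10116])))
```

Calling `run_blastp("P38398", [10090])` should reproduce the cell above, provided `blastp` is on your `PATH` and the swissprot database is in the working directory.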

## Inspect the results
Let's take a look at the top hits from this BLAST search. The last column of the BLAST report below (subject common name) shows that every hit displayed comes from house mouse, as the taxonomic filter requires.

In [0]:
! head blast-results.tab

# BLASTP 2.8.0+
# Query: P38398.2 RecName: Full=Breast cancer type 1 susceptibility protein; AltName: Full=RING finger protein 53; AltName: Full=RING-type E3 ubiquitin transferase BRCA1
# Database: swissprot
# Fields: query acc.ver, subject acc.ver, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score, subject com names
# 141 hits found
P38398.2	P48754.3	55.985	1863	752	32	1	1858	1	1800	0.0	1845	house mouse
P38398.2	Q3UWZ0.1	30.986	71	49	0	14	84	6	76	2.88e-08	57.8	house mouse
P38398.2	Q6PCN7.1	38.235	68	36	1	14	75	738	805	5.59e-08	57.8	house mouse
P38398.2	Q61510.2	37.333	75	46	1	15	88	4	78	1.12e-07	56.2	house mouse
P38398.2	Q1XH17.1	31.081	74	49	1	18	89	8	81	1.84e-07	55.5	house mouse
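
Because `head` shows only the first few hits, it can be worth checking the whole report programmatically. The following is a minimal sketch of how the tabular output could be parsed in Python. The column labels in `FIELDS` are our own names for the columns listed in the `# Fields:` line, and the sample text reuses two rows from the output above so that the block runs without the results file.

```python
import io

# Our own labels for the 13 columns of '-outfmt "7 std scomname"'
FIELDS = ["qacc", "sacc", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore", "scomname"]

def parse_blast_tab(handle):
    """Return one dict per hit, skipping comment (#) and blank lines."""
    hits = []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        hits.append(dict(zip(FIELDS, line.split("\t"))))
    return hits

# Two rows copied from the report above; for the real file, use
# open("blast-results.tab") in place of io.StringIO(sample)
sample = (
    "# BLASTP 2.8.0+\n"
    "# 141 hits found\n"
    "P38398.2\tP48754.3\t55.985\t1863\t752\t32\t1\t1858\t1\t1800\t0.0\t1845\thouse mouse\n"
    "P38398.2\tQ3UWZ0.1\t30.986\t71\t49\t0\t14\t84\t6\t76\t2.88e-08\t57.8\thouse mouse\n"
)

hits = parse_blast_tab(io.StringIO(sample))

# Collect the distinct common names; a filtered search should yield only one
common_names = {hit["scomname"] for hit in hits}
print(common_names)  # {'house mouse'}
```

Applied to the full `blast-results.tab`, a set containing anything other than `house mouse` would indicate that the filter was not applied as intended.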
