Tutorial from tutorialspoint.com/biopython/biopython_overview_of_blast.htm

BLAST stands for Basic Local Alignment Search Tool. It finds regions of similarity between biological sequences. Biopython provides the Bio.Blast module for working with NCBI BLAST. You can run BLAST either locally or over the Internet.

# Running over Internet
Biopython provides the Bio.Blast.NCBIWWW module to call the online version of BLAST. To use it, import the module:

In [16]:
from Bio.Blast import NCBIWWW

The NCBIWWW module provides the qblast function to query the online version of BLAST at https://blast.ncbi.nlm.nih.gov/Blast.cgi. qblast supports all the parameters supported by the online version.

To see the parameters that qblast accepts and their default values, use Python's built-in help function:

In [10]:
help(NCBIWWW.qblast)

Help on function qblast in module Bio.Blast.NCBIWWW:

qblast(program, database, sequence, url_base='https://blast.ncbi.nlm.nih.gov/Blast.cgi', auto_format=None, composition_based_statistics=None, db_genetic_code=None, endpoints=None, entrez_query='(none)', expect=10.0, filter=None, gapcosts=None, genetic_code=None, hitlist_size=50, i_thresh=None, layout=None, lcase_mask=None, matrix_name=None, nucl_penalty=None, nucl_reward=None, other_advanced=None, perc_ident=None, phi_pattern=None, query_file=None, query_believe_defline=None, query_from=None, query_to=None, searchsp_eff=None, service=None, threshold=None, ungapped_alignment=None, word_size=None, short_query=None, alignments=500, alignment_view=None, descriptions=500, entrez_links_new_window=None, expect_low=None, expect_high=None, format_entrez_query=None, format_object=None, format_type='XML', ncbi_gi=None, results_file=None, show_overview=None, megablast=None, template_type=None, template_length=None)
    BLAST search using NCBI's
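
Most of these parameters can be left at their defaults. A minimal sketch of overriding a few of them follows, using only keyword names that appear in the signature above (`expect`, `hitlist_size`, `format_type`). The helper names `build_search_options` and `run_search` and the chosen values are illustrative assumptions, not part of Biopython. `run_search` needs Biopython and network access, so it is defined here but not called.

```python
def build_search_options(e_value=1e-20, max_hits=10):
    """Collect optional qblast keyword arguments in one dictionary."""
    return {
        "expect": e_value,         # E-value cutoff applied to the search
        "hitlist_size": max_hits,  # maximum number of hits to return
        "format_type": "XML",      # the default; XML is what NCBIXML parses
    }


def run_search(sequence, program="blastn", database="nt", **options):
    """Submit one query to NCBI BLAST (requires Biopython and network access)."""
    from Bio.Blast import NCBIWWW  # imported here so the sketch loads without Biopython

    return NCBIWWW.qblast(program, database, sequence, **options)


options = build_search_options(max_hits=5)
print(options)
# Example call (not executed here):
# result_handle = run_search(sequence_data, **options)
```

Keeping the options in a dictionary makes it easy to reuse the same settings across several queries.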

## Simple example

Read the sequence file, blast_example.fasta, using Python's built-in open function.

In [12]:
with open("blast_example.fasta") as fasta_file:
    sequence_data = fasta_file.read()
sequence_data

'Example of a single sequence in FASTA/Pearson format: \n>sequence A ggtaagtcctctagtacaaacacccccaatattgtgatataattaaaattatattcatat\ntctgttgccagaaaaaacacttttaggctatattagagccatcttctttgaagcgttgtc \n\n>sequence B ggtaagtcctctagtacaaacacccccaatattgtgatataattaaaattatattca\ntattctgttgccagaaaaaacacttttaggctatattagagccatcttctttgaagcgttgtc'
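
The string above begins with a descriptive line that is not part of any FASTA record. As a rough check on what a file contains before submitting it, the records can be split out with the standard library alone. This is only a sketch: `split_fasta` and the sample text are hypothetical, and it assumes the usual layout of a `>` header line followed by one or more sequence lines.

```python
def split_fasta(text):
    """Return a list of (header, sequence) pairs found in FASTA-formatted text.

    Any lines before the first '>' header are ignored.
    """
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical sample text, shaped like a FASTA file but much shorter.
sample = "A leading comment line\n>seq1\nggtaag\ntcctct\n\n>seq2\nacgt\n"
for header, sequence in split_fasta(sample):
    print(header, len(sequence))
```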

Now call the qblast function. The first argument is the BLAST program (blastn), the second is the database to search (refseq_rna in the call below), and the third is the sequence data to query.

In [13]:
result_handle = NCBIWWW.qblast("blastn", "refseq_rna", sequence_data) 
result_handle

<_io.StringIO at 0x1d86d4a6678>

In [21]:
result_handle.seek(0, 0)

0

In [22]:
with open('results.xml', 'w') as save_file: 
    blast_results = result_handle.read() 
    save_file.write(blast_results)
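
The `seek(0, 0)` call matters because reading a handle consumes it: a second `read()` returns an empty string unless the handle is rewound first. The sketch below shows this with an `io.StringIO` object standing in for the handle returned by qblast. The XML text is a placeholder, not real BLAST output.

```python
import io

# Stand-in for the handle returned by qblast (placeholder text, not real output).
result_handle = io.StringIO("<BlastOutput>placeholder</BlastOutput>")

first_read = result_handle.read()    # consumes the whole buffer
second_read = result_handle.read()   # nothing left to read, so this is ""

result_handle.seek(0)                # rewind to the start of the buffer
third_read = result_handle.read()    # the full text is available again

print(repr(second_read))
print(third_read == first_read)
```

Saving the text to a file, as done above, avoids this issue for later parsing because the file can be reopened as many times as needed.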

### Parsing Result

In [18]:
from Bio.Blast import NCBIXML

In [23]:
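E_VALUE_THRESH = 1e-20  # report only HSPs with an E-value below this cutoff
with open("results.xml") as result_file:
    # NCBIXML.parse yields one record per query sequence
    for record in NCBIXML.parse(result_file):
        if record.alignments:
            print("\n")
            print("query: %s" % record.query[:100])
            for align in record.alignments:
                for hsp in align.hsps:
                    if hsp.expect < E_VALUE_THRESH:
                        print("match: %s " % align.title[:100])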



query: sequence A ggtaagtcctctagtacaaacacccccaatattgtgatataattaaaattatattcatat


query: sequence B ggtaagtcctctagtacaaacacccccaatattgtgatataattaaaattatattca
match: gi|1853088208|gb|CP054431.1| Chlamydia trachomatis strain CH2_mutant_L2/434/Bu(i) plasmid unnamed 
match: gi|1853087206|gb|CP054433.1| Chlamydia trachomatis strain CH1_mutant_L2/434/Bu(i) plasmid unnamed 
match: gi|1853086301|gb|CP054429.1| Chlamydia trachomatis strain CH3_mutant_L2/434/Bu(i) plasmid unnamed 
match: gi|1853085398|gb|CP054427.1| Chlamydia trachomatis strain CH5_mutant_L2/434/Bu(i) plasmid unnamed 
match: gi|1834305694|gb|MT241513.1| Cloning vector pREF100, complete sequence 
match: gi|575868421|gb|KF790910.1| Cloning vector pBOMB4-Tet-mCherry, complete sequence 
match: gi|575868419|gb|KF790909.1| Cloning vector pBOMB4R-MCI, complete sequence 
match: gi|575868417|gb|KF790908.1| Cloning vector pBOMB4R, complete sequence 
match: gi|575868414|gb|KF790907.1| Cloning vector pBOMB4-MCI, complete sequence 
match: g