# BLAST Analysis

### Overview

This script performs offline BLASTX analysis of multiple sequences against a local SwissProt database. The results are parsed and stored in a Pandas DataFrame for further analysis.

### Key Steps

1. Offline BLASTX Execution

   - Input: cleaned_sequences.fasta
   - Database: local SwissProt DB
   - Output: results_multi.xml (BLAST XML format)


2. Parsing BLAST Results

   - Parse the XML output with Bio.Blast.NCBIXML
   - Extract for each query:
     - Hit title
     - Alignment length
     - E-value
     - Query and subject alignment (first 50 residues)
   - Store the parsed data in a Pandas DataFrame for analysis



3. Results

   - The DataFrame contains one row per HSP (high-scoring pair) of each alignment
   - A query with no hits gets a single row with None in every column except Query
   - Ready for downstream analysis and plotting
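
Step 1 runs outside the parsing cell, so the BLASTX call itself is not shown in this notebook. A minimal sketch of how it could be issued from Python, assuming the NCBI BLAST+ `blastx` executable is installed and on the PATH: the helper names `build_blastx_command` and `run_blastx` are hypothetical, and `"swissprot"` is a placeholder for wherever the local database actually lives.

```python
import subprocess


def build_blastx_command(query, db, out, evalue=None):
    """Return the argument list for an offline BLASTX run with XML output."""
    cmd = [
        "blastx",
        "-query", query,
        "-db", db,
        "-outfmt", "5",  # 5 = BLAST XML, the format Bio.Blast.NCBIXML reads
        "-out", out,
    ]
    if evalue is not None:
        # Optional E-value cutoff; omitted means the BLAST+ default is used
        cmd += ["-evalue", str(evalue)]
    return cmd


def run_blastx(cmd):
    """Run the command and raise CalledProcessError if BLAST exits with an error."""
    subprocess.run(cmd, check=True)


# "swissprot" is a placeholder for the path or name of the local database.
cmd = build_blastx_command("cleaned_sequences.fasta", "swissprot", "results_multi.xml")
print(" ".join(cmd))

# run_blastx(cmd)  # uncomment once BLAST+ and the database are in place
```

Building the command as a list rather than a single string avoids shell quoting problems with paths that contain spaces.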

In [24]:
# Parse offline BLASTX results for multiple query sequences

import pandas as pd
from Bio.Blast import NCBIXML


result_for_pd = []
with open("C:/Users/User/Documents/blast_results/results_multi.xml") as blast_use:
    # NCBIXML.parse yields one record per query sequence
    blast_records = list(NCBIXML.parse(blast_use))

print(f"Number of queries: {len(blast_records)}")

for record in blast_records:
    query_id = record.query

    if not record.alignments:
        # Placeholder row so queries without hits still appear in the DataFrame
        result_for_pd.append({
            "Query": query_id,
            "Hit": None,
            "Length": None,
            "E-value": None,
            "Query alignment": None,
            "Subject alignment": None,
        })
    else:
        for alignment in record.alignments:
            for hsp in alignment.hsps:
                result_for_pd.append({
                    "Query": query_id,
                    "Hit": alignment.title,
                    # hsp.align_length is the alignment length;
                    # alignment.length would be the full subject length
                    "Length": hsp.align_length,
                    "E-value": hsp.expect,
                    # Keep only the first 50 residues of each aligned sequence
                    "Query alignment": hsp.query[:50] + "...",
                    "Subject alignment": hsp.sbjct[:50] + "...",
                })

Number of queries: 3


In [25]:
df= pd.DataFrame(result_for_pd)
print(df)

            Query   Hit Length E-value Query alignment Subject alignment
0  Human_sequence  None   None    None            None              None
1  Mouse_sequence  None   None    None            None              None
2  Plant_sequence  None   None    None            None              None
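
In the run above every query came back without a hit, so there is nothing to rank yet. As a sketch of the downstream step, the snippet below separates no-hit queries from HSP rows and keeps the lowest E-value row per query. The rows in `toy` are made up purely for illustration and are not results of this analysis; `split_hits` and `best_hit_per_query` are hypothetical helper names, and the column names match the DataFrame built above.

```python
import pandas as pd

# Made-up rows for illustration only; not output from the BLASTX run.
toy = pd.DataFrame([
    {"Query": "q1", "Hit": "hit_A", "E-value": 1e-30},
    {"Query": "q1", "Hit": "hit_B", "E-value": 1e-5},
    {"Query": "q2", "Hit": None, "E-value": None},
])


def split_hits(df):
    """Return (queries without hits, rows that describe an HSP)."""
    no_hit = df.loc[df["Hit"].isna(), "Query"].tolist()
    hits = df.dropna(subset=["Hit"])
    return no_hit, hits


def best_hit_per_query(hits):
    """Keep the row with the lowest E-value for each query."""
    ordered = hits.sort_values("E-value")
    return ordered.drop_duplicates(subset="Query", keep="first").reset_index(drop=True)


no_hit, hits = split_hits(toy)
best = best_hit_per_query(hits)
print("Queries without hits:", no_hit)
print(best)
```

The same two helpers can be called on `df` directly once a run produces hits.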
