# Sequence Homology

**Authors:** [Tony Kabilan Okeke](mailto:tko35@drexel.edu), [Ifeanyi Osuchukwu](mailto:imo27@drexel.edu)  
**Date:** 01.31.2022

In [15]:
%%sh
# Query BLAST
./blast_pdb_query.sh query_protein.faa ptn_query_report.xml

## Parse `BLAST` Query Results

In [31]:
# Import modules
from Bio.Blast import NCBIXML
import re

# Load Query Report
with open("ptn_query_report.xml", 'r') as file:
  # Retrieve the first BLAST record (the parser yields one record per query sequence)
  records = next(NCBIXML.parse(file))

### Top Protein Hit

Print the name of the top protein hit.

In [26]:
# Print Title
#? What exactly counts as the 'name' - for now, print ID
print( "Top Hit: {}".format(re.findall(r'^\S+', records.descriptions[0].title)[0]) )

Top Hit: pdb|6XA4|A


### Unique Species Names of All Returned Hits

Print a unique list of species names of all the hits.

In [36]:
hit_species = []

for descr in records.descriptions:
  # Extract the species name from the square brackets in the hit title
  species = re.search(r'\ \[(.*?)\]\ ?', descr.title).group(1)
  # Append to list
  hit_species.append(species)
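
The loop above assumes every hit title contains a bracketed species name. If one does not, `re.search` returns `None` and `.group(1)` raises an `AttributeError`. A minimal sketch of a more defensive variant follows; the helper name `extract_species`, the `"unknown"` placeholder, and the sample titles are illustrative assumptions, not part of the original analysis.

```python
import re

def extract_species(title, default="unknown"):
    """Return the first bracketed species name in a BLAST hit title."""
    match = re.search(r" \[(.*?)\]", title)
    # Fall back to a placeholder so the list stays aligned with the hits
    return match.group(1) if match else default

# Hypothetical titles, for illustration only
sample_titles = [
    "pdb|XXXX|A Chain A, Example protein [Mus musculus]",
    "pdb|YYYY|B Chain B, Example protein",
]
sample_species = [extract_species(t) for t in sample_titles]
print(sample_species)
```

Returning a placeholder rather than skipping the hit keeps one entry per description, so list positions still correspond to positions in `records.descriptions`.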

In [40]:
# Select unique species
unique_species = list( set(hit_species) )
print("The following species were present in the hits:", *unique_species, sep='\n  ')

The following species were present in the hits:
  Human coronavirus HKU1 (isolate N1)
  Escherichia coli K-12
  Porcine transmissible gastroenteritis coronavirus strain Purdue
  Feline infectious peritonitis virus (strain 79-1146)
  Human coronavirus NL63
  Paenibacillus glycanilyticus
  Transmissible gastroenteritis virus
  SARS coronavirus BJ162
  unidentified
  Mus musculus
  Severe acute respiratory syndrome coronavirus
  Infectious bronchitis virus
  Murine hepatitis virus strain A59
  Porcine epidemic diarrhea virus CV777
  Tylonycteris bat coronavirus HKU4
  SARS coronavirus BJ01
  Porcine epidemic diarrhea virus
  Shewanella oneidensis MR-1
  SARS coronavirus Sino1-11
  Human coronavirus 229E
  Severe acute respiratory syndrome-related coronavirus
  Mycolicibacterium smegmatis
  Severe acute respiratory syndrome coronavirus 2
  Middle East respiratory syndrome-related coronavirus
  Feline infectious peritonitis virus
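
Because a `set` is unordered, the species above are printed in an arbitrary order that can change between runs. If the order in which BLAST returned the hits should be preserved, `dict.fromkeys` is one option. This is a sketch only; `sample_hit_species` is a shortened, illustrative stand-in for `hit_species`.

```python
# Illustrative stand-in for hit_species (shortened)
sample_hit_species = [
    "Severe acute respiratory syndrome coronavirus 2",
    "Severe acute respiratory syndrome coronavirus",
    "Severe acute respiratory syndrome coronavirus 2",
    "Mus musculus",
]

# dict keys are unique and keep insertion order (Python 3.7+)
ordered_unique = list(dict.fromkeys(sample_hit_species))
print("Species in hit order:", *ordered_unique, sep="\n  ")
```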


### Top Scoring Mouse Protein Alignment

Find the top scoring hit with a mouse protein. Print the sequence alignment of the query with this mouse protein.

In [None]:
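# Retrieve the indices of hits from mouse; BLAST lists hits best-first,
# so the first index is taken as the top-scoring mouse hit
hits = [i for (i, x) in enumerate(hit_species) if x == "Mus musculus"]

# Print the alignment of the query with the top-scoring mouse protein
hsp = records.alignments[hits[0]].hsps[0]
print(hsp.query, hsp.match, hsp.sbjct, sep='\n')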
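
Each HSP in a Biopython `NCBIXML` record stores its alignment as three equal-length strings: `query`, `match`, and `sbjct`. One way to display them as a readable pairwise alignment is to print them in fixed-width blocks. The sketch below assumes that layout; the helper `format_alignment_blocks` and the toy strings are hypothetical and not real BLAST output.

```python
def format_alignment_blocks(query, match, sbjct, width=60):
    """Format three aligned, equal-length strings in blocks of `width` columns."""
    lines = []
    for start in range(0, len(query), width):
        end = start + width
        lines.append("Query  " + query[start:end])
        lines.append("       " + match[start:end])
        lines.append("Sbjct  " + sbjct[start:end])
        lines.append("")  # blank line between blocks
    return "\n".join(lines).rstrip()

# Toy strings, not real BLAST output
toy_text = format_alignment_blocks("MKTAYIAK", "MK+A IAK", "MKSA-IAK", width=4)
print(toy_text)
```

With a real record, the same helper could be called with the `query`, `match`, and `sbjct` attributes of the selected HSP.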