### MEDC0106: Bioinformatics in Applied Biomedical Science

<p align="center">
  <img src="../../resources/static/Banner.png" alt="MEDC0106 Banner" width="90%"/>
  <br>
</p>

---------------------------------------------------------------

# 08 - Introduction to Biopython - Proteins Exercises

*Written by:* Mateusz Kaczyński

**This notebook contains exercises covering basic protein analysis and sequence search.**

## Contents
1. [Plotting the relative mutability](#Plotting-the-relative-mutability)
2. [BLAST and analyse](#BLAST-and-analyse)
-----


**Remember to save your results!**

#### Imports

Some imports you may or may not need to complete the tasks (run this cell before attempting the exercises).

In [None]:
%matplotlib notebook
import matplotlib.pyplot as plt

from urllib.request import urlretrieve 

from Bio import SeqIO
from Bio.Seq import Seq

## Plotting the relative mutability

In this exercise, we will use the relative amino acid mutability scale described in *Dayhoff M.O., Schwartz R.M., Orcutt B.C. In "Atlas of Protein Sequence and Structure", Vol.5, Suppl.3 (1978).* Their work includes a table of empirically derived mutabilities for each amino acid, expressed relative to alanine (Ala = 100).

1. Obtain a FASTA file for any protein of interest (e.g. using [UniProt](https://uniprot.org)). *You can type in the sequence by hand if downloading is too slow.*
2. Plot the relative mutability (Ala = 100) along the protein sequence, averaged over a sliding window 15 residues wide.
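Step 1 can be handled with `SeqIO.read`, which parses a handle containing exactly one record. The sketch below is a minimal example that assumes you already have the FASTA text as a string; the record shown is a hypothetical placeholder, not a real UniProt entry. If you download a file instead (for example with `urlretrieve` and a UniProt REST URL of the form `https://rest.uniprot.org/uniprotkb/<accession>.fasta`, which is worth checking against UniProt's current documentation), pass the saved file path to `SeqIO.read` in place of the `StringIO` handle.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical placeholder record: replace with the FASTA text of your chosen protein.
fasta_text = """>sp|EXAMPLE|EXAMPLE_HUMAN Hypothetical example protein
MKTAYIAKQRQISFVKSHFSRQ
LEERLGLIEVQAPILSRVGDGT
"""

# SeqIO.read expects exactly one record and raises ValueError otherwise.
record = SeqIO.read(StringIO(fasta_text), "fasta")

# The wrapped sequence lines are joined, so the string has no line breaks.
protein = str(record.seq)
print(record.id, len(protein))
```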


In [None]:
aminoacid_relative_mutability = {
    "A": 100, "C": 20,  "D": 106, "E": 102, "F": 41, 
    "G": 49,  "H": 66,  "I": 96,  "K": 56,  "L": 40,
    "M": 94,  "N": 134, "P": 56,  "Q": 93,  "R": 65,
    "S": 120, "T": 97,  "V": 74,  "W": 18,  "Y": 41
}
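The sliding-window step in task 2 can be sketched as a function that averages the scale over every full window. This is one possible approach, not the expected solution: `windowed_mutability` is a hypothetical helper name, windows are assumed to lie entirely within the sequence (so the profile has `window - 1` fewer values than the protein has residues), and residues missing from the scale (such as `X` or `U`) are assumed to be skipped.

```python
def windowed_mutability(sequence, scale, window=15):
    """Mean relative mutability for every full window of `window` residues.

    Entry i averages residues i .. i + window - 1 (0-based), so the result has
    len(sequence) - window + 1 entries (empty if the sequence is too short).
    Residues absent from `scale` are skipped inside each window (an assumption).
    """
    sequence = sequence.upper()
    means = []
    for start in range(len(sequence) - window + 1):
        values = [scale[aa] for aa in sequence[start:start + window] if aa in scale]
        # A window made only of unknown residues has no defined mean.
        means.append(sum(values) / len(values) if values else float("nan"))
    return means
```

In the notebook you would pass your protein as a plain string together with `aminoacid_relative_mutability` and `window=15`. Recomputing each window costs O(n·w), which is negligible for protein-length sequences; a running sum would be O(n) but makes skipping unknown residues fiddlier.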

In [None]:
# Write your solution here, adding more cells if necessary.
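To plot the profile, each window mean can be placed at the residue in the middle of its window. The sketch below assumes `profile` is a list of window means, one per full window of `window` residues, with the first window starting at residue 1; `plot_mutability_profile` is a hypothetical name, and the dashed line at 100 simply marks the alanine reference value.

```python
import matplotlib.pyplot as plt


def plot_mutability_profile(profile, window=15):
    """Plot window means against the 1-based residue at the centre of each window."""
    # Window i (0-based) covers residues i+1 .. i+window (1-based),
    # so its centre is i + (window + 1) / 2.
    positions = [i + (window + 1) / 2 for i in range(len(profile))]
    fig, ax = plt.subplots(figsize=(8, 3))
    ax.plot(positions, profile, label="Window mean")
    ax.axhline(100, linestyle="--", color="grey", label="Ala = 100")
    ax.set_xlabel("Residue position (window centre)")
    ax.set_ylabel("Mean relative mutability")
    ax.set_title(f"Relative mutability, {window}-residue sliding window")
    ax.legend()
    return ax
```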

## BLAST and analyse

1. Run a BLAST search (`blastp`) of the provided protein sequence against the NCBI non-redundant protein sequences (`nr`) database.
2. Download the first available protein sequence hits (don't worry if you don't know how to do this in Python; you can copy and paste the sequences by hand).
3. Calculate the molecular weights of the two proteins. Which one is heavier?
4. Using the `aminoacid_relative_mutability` dictionary from the previous exercise, determine which of the two is more prone to mutation.
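Tasks 1 and 2 can also be scripted with Biopython's `NCBIWWW.qblast` and `Entrez.efetch`. This is a sketch under several assumptions: it needs network access, NCBI's public BLAST service can take minutes per query and limits heavy use, the helper names are hypothetical, and the top hit may be the query itself or a longer protein that contains it. The cell only defines functions, so nothing is sent to NCBI until you call them.

```python
from Bio import Entrez, SeqIO
from Bio.Blast import NCBIWWW, NCBIXML


def blast_hit_accessions(protein_sequence, n=1):
    """Run BLASTP against NCBI nr and return the accessions of the first `n` hits."""
    # qblast submits the query over the network and returns XML results by default.
    result_handle = NCBIWWW.qblast("blastp", "nr", protein_sequence)
    blast_record = NCBIXML.read(result_handle)
    # Alignments are listed in rank order; the list is empty if nothing matched.
    return [alignment.accession for alignment in blast_record.alignments[:n]]


def fetch_protein_record(accession, email):
    """Download one protein record from NCBI Protein in FASTA format."""
    Entrez.email = email  # NCBI asks for a contact email with Entrez requests
    with Entrez.efetch(db="protein", id=accession, rettype="fasta", retmode="text") as handle:
        return SeqIO.read(handle, "fasta")
```

A typical call would be `blast_hit_accessions("".join(sequence.split()))`, followed by `fetch_protein_record(accession, email="you@example.com")` for each returned accession, with the placeholder email replaced by your own. The `"".join(sequence.split())` step strips the line breaks from the triple-quoted `sequence` string.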

In [None]:
sequence = """
EVSIIQSMGYRNRAKRLLQSEPENPSLQETSLSVQLSNLGTVRTLRTKQRIQPQKTSVYI
ELGSDSSEDTVNKATYCSVGDQELLQITPQGTRDEISLDSAKKAACEFSETDVTNTEHHQ
PSNNDLNTTEKRAAERHPEKYQGSSVSNLHVEPCGTNTHASSLQHENSSLLLTKDRMNVE
KAEFCNKSKQPGLARSQHNRWAGSKETCNDRRTPSTEKKVDLNADPLCERKEWNKQKLPC
"""
# Write your solution here, adding more cells if necessary.
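Tasks 3 and 4 reduce to two numbers per protein. The sketch below assumes both sequences contain only the 20 standard residues (`ProteinAnalysis.molecular_weight` raises a `ValueError` for ambiguous letters such as `X`), and the helper names are hypothetical. For mutability it compares the per-residue mean rather than the total: a total grows with length and would mostly reflect which protein is longer, so averaging is one reasonable reading of "more prone to mutation", not the only one.

```python
from Bio.SeqUtils.ProtParam import ProteinAnalysis


def clean_protein(text):
    """Drop whitespace and line breaks (e.g. from a triple-quoted string) and upper-case."""
    return "".join(text.split()).upper()


def molecular_weight(protein):
    """Average molecular weight in daltons, computed by Biopython's ProteinAnalysis."""
    return ProteinAnalysis(clean_protein(protein)).molecular_weight()


def mean_mutability(protein, scale):
    """Mean relative mutability per residue; residues absent from `scale` are skipped."""
    values = [scale[aa] for aa in clean_protein(protein) if aa in scale]
    if not values:
        raise ValueError("no residues found in the mutability scale")
    return sum(values) / len(values)
```

Calling both functions on the cleaned query and on each downloaded hit, then comparing the results, answers the two questions.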