### MEDC0106: Bioinformatics in Applied Biomedical Science

<p align="center">
  <img src="../../resources/static/Banner.png" alt="MEDC0106 Banner" width="90%"/>
  <br>
</p>

---------------------------------------------------------------

# 10 - Introduction to Biopython - Sequences Exercises

*Written by:* Mateusz Kaczyński

**This notebook contains exercises covering the sequences and alignment session of the Biopython workshop. They aim to give you more exposure to how these tools are used in bioinformatics work.**

## Contents
1. [Following Central Dogma](#Following-Central-Dogma)
2. [SARS-CoV-2 variants alignment](#SARS-CoV-2-variants-alignment)
-----


**Remember to save your results!**

#### Imports

Some imports you may or may not need to complete the tasks (run this cell before you attempt the exercises).

```python
from urllib.request import urlretrieve

from Bio import SeqIO
from Bio import pairwise2
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.Align import MultipleSeqAlignment, AlignInfo
```

## Following Central Dogma
For the provided sequence:
 1. Transcribe it to RNA 
 2. Translate the RNA to a protein sequence
 3. Replace the G at position 12 (counting from 1) with an A
 4. Translate the new DNA sequence to a protein sequence 
 5. Comment on the results and possible caveats


<details>
    <summary><strong>Hint</strong></summary>
    <em>Start by creating a Seq object with Seq(sequence_string), then follow the relevant code in the previous module.</em>
</details>

<details>
    <summary><strong>Another hint</strong></summary>
    <em>new_sequence = sequence_string[:11]+"A"+sequence_string[12:]</em>
</details>
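
The steps above can be tried out first on a short toy sequence. The sketch below assumes Biopython is installed; `toy_dna` and the substituted position are made up for illustration and are not the exercise data. Note that positions in the task are counted from 1, while Python slices count from 0.

```python
from Bio.Seq import Seq

# Hypothetical toy sequence (not the exercise sequence): ATG GCC GGA AAA TAA
toy_dna = Seq("ATGGCCGGAAAATAA")

# Transcription: the coding strand is copied with T replaced by U.
toy_rna = toy_dna.transcribe()

# Translation with the standard codon table; "*" marks a stop codon.
toy_protein = toy_rna.translate()

# Substitute the base at 1-based position 8 (0-based index 7) with "A".
position = 8
mutated_dna = toy_dna[: position - 1] + "A" + toy_dna[position:]
mutated_protein = mutated_dna.translate()

print(toy_rna)
print(toy_protein)
print(mutated_protein)
```

Comparing `toy_protein` with `mutated_protein` shows how a single base change can alter one residue. When a stop codon appears early, `translate(to_stop=True)` truncates the protein at that point.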

```python
sequence_string = '''\
ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCTGACCCAG\
CCGCAGCCTTTGTGAACCAACACCTGTGCGGCTCACACCTGGTGGAAGCTCTCTACCTAGTGTG\
CGGGGAACGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGACCTGCAGGTGGGG\
CAGGTGGAGCTGGGCGGGGGCCCTGGTGCAGGCAGCCTGCAGCCCTTGGCCCTGGAGGGGTCCC\
TGCAGAAGCGTGGCATTGTGGAACAATGCTGTACCAGCATCTGCTCCCTCTACCAGCTGGAGAA\
CTACTGCAAC\
'''
# Write your solution here, adding more cells if necessary.
```

## SARS-CoV-2 variants alignment

Below are URLs for the sequence resources related to the S (Spike) protein of the SARS-CoV-2 virus.

#### Reference SARS-CoV-2 sequence
Protein: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=YP_009724390.1&rettype=fasta

#### Alpha variant
Protein: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=QWE88920.1&rettype=fasta

#### Delta variant
Protein: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=QWK65230.1&rettype=fasta


Using the protein sequences:
 1. Download the data and save them as FASTA files. 
 2. Read them into Biopython sequence objects (SeqIO returns SeqRecord objects, each wrapping a Seq).
 3. How long are the sequences?
 4. Find and print the (global) pairwise alignments and the scores between the reference and one of the variants.
 5. Find the alignment between the Alpha and Delta variants. Compare it against the alignments with the reference. What does this tell us about those two lineages?
 6. *(Optional) Run a multiple sequence alignment on the sequences. Remember that they need to be the same length for MSA.*
 
 
<details>
    <summary><strong>Hint</strong></summary>
    <em>urlretrieve("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=YP_009724390.1&rettype=fasta", "data/reference.fasta")</em>
</details>
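
The download step for all three sequences can be sketched as a small loop. This is one possible layout, not the required solution: the helper names `efetch_url` and `download_all`, the `ACCESSIONS` dictionary, and the output file names are assumptions chosen for illustration. Only the base URL and the accession IDs come from the links listed above.

```python
import os
from urllib.parse import urlencode
from urllib.request import urlretrieve

# Base URL of the NCBI efetch endpoint used in the links above.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Accession IDs from the links above; the dictionary keys are arbitrary labels.
ACCESSIONS = {
    "reference": "YP_009724390.1",
    "alpha": "QWE88920.1",
    "delta": "QWK65230.1",
}


def efetch_url(accession):
    """Build the FASTA download URL for one protein accession."""
    query = urlencode({"db": "protein", "id": accession, "rettype": "fasta"})
    return f"{EFETCH}?{query}"


def download_all(out_dir="data"):
    """Download every sequence in ACCESSIONS (hypothetical helper, needs network)."""
    # urlretrieve does not create missing directories, so make sure one exists.
    os.makedirs(out_dir, exist_ok=True)
    paths = {}
    for name, accession in ACCESSIONS.items():
        path = os.path.join(out_dir, f"{name}.fasta")
        urlretrieve(efetch_url(accession), path)
        paths[name] = path
    return paths
```

Calling `download_all()` would write `data/reference.fasta`, `data/alpha.fasta` and `data/delta.fasta`. It needs network access, so it is left uncalled here.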

<details>
    <summary><strong>Another hint</strong></summary>
    <em>reference = next(SeqIO.parse("data/reference.fasta", "fasta"))</em>
</details>
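
Reading FASTA records, checking their lengths, and scoring global alignments can be rehearsed on toy data before using the real files. The sketch below assumes Biopython is installed. The three short peptides and the scoring values are invented for illustration and are not the downloaded spike sequences. It uses `Bio.Align.PairwiseAligner` rather than `pairwise2`, since recent Biopython releases mark `pairwise2` as deprecated; either can be used for the exercise.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Align import PairwiseAligner

# Hypothetical toy records standing in for the downloaded FASTA files.
toy_fasta = """>toy_reference
MFVFLVLLPLVSSQ
>toy_substitution
MFVFLVLLPLVSIQ
>toy_deletion
MFVFLVLPLVSSQ
"""

# SeqIO.parse yields SeqRecord objects; the sequence itself is in record.seq.
records = {rec.id: rec for rec in SeqIO.parse(StringIO(toy_fasta), "fasta")}
lengths = {name: len(rec.seq) for name, rec in records.items()}

# Global alignment with simple, arbitrarily chosen scores (an assumption).
aligner = PairwiseAligner()
aligner.mode = "global"
aligner.match_score = 1
aligner.mismatch_score = -1
aligner.open_gap_score = -2
aligner.extend_gap_score = -1

reference = records["toy_reference"].seq
scores = {
    name: aligner.score(reference, rec.seq)
    for name, rec in records.items()
    if name != "toy_reference"
}

# Retrieve one optimal alignment for display.
best = aligner.align(reference, records["toy_deletion"].seq)[0]

print(lengths)
print(scores)
print(best)
```

A substitution keeps the length unchanged and costs one mismatch, while a deletion shortens the sequence and shows up as a gap in the printed alignment. The same comparison applies when contrasting each variant with the reference.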



```python
# Write your solution here, adding more cells if necessary.
```