### MEDC0106: Bioinformatics in Applied Biomedical Science

<p align="center">
  <img src="../../resources/static/Banner.png" alt="MEDC0106 Banner" width="90%"/>
  <br>
</p>

---------------------------------------------------------------

# 04 - Exercises (Session 1)

*Written by:* Oliver Scott

**This notebook contains exercises based around the material provided in Session 1 of the Python workshop**.

Try to complete the exercises before the next session. Feel free to refer back to the content in the previous notebooks to help you complete the tasks.

You should work through the tasks consecutively.

Save your results so we can go through the answers in the next session.

----

## Contents

1. [Task 1](#Task-1) - Basic statistics
2. [Task 2](#Task-2) - Base counting
3. [Task 3](#Task-3) - Sequence translation
4. [Task 4](#Task-4) - Sequence mutation
5. [Task 5](#Task-5) - 'Rock, paper, scissors' game

----

## Task 1

#### Basic statistics

A scientist in your lab has conducted a survey asking members of the public whether they think the lab's research will benefit them. The scientist has asked you to calculate some basic statistics:

- the total number of participants,
- the mean age of the participants,
- the ratio of male ('M') to female ('F') participants,
- and the percentage of 'yes' ('Y') answers.
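
As a warm-up, the built-ins `len` and `sum` and the list method `count` cover most of what these statistics need. The sketch below uses a small made-up data set (`heights` and `colours` are hypothetical names, not part of the survey data) and is only one of several reasonable approaches:

```python
# Hypothetical toy data, unrelated to the survey
heights = [150, 160, 170, 180]
colours = ['red', 'blue', 'red', 'red']

n_items = len(heights)                    # number of elements in a list
mean_height = sum(heights) / n_items      # arithmetic mean
n_red = colours.count('red')              # how often a value occurs
percent_red = 100 * n_red / len(colours)  # share of 'red' as a percentage

print('Items:', n_items)
print('Mean height:', mean_height)
print('Percent red:', percent_red)
```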

In [None]:
# Data from the survey

age = [45, 35, 37, 49, 31, 45, 43, 30, 48, 36, 38, 44, 39, 36, 40, 42, 38, 30, 31, 39]
gender = ['F', 'F', 'F', 'M', 'F', 'F', 'F', 'M', 'F', 'M', 'F', 'F', 'F', 'M', 'F', 'F', 'F', 'M', 'F', 'F']
answer = ['Y', 'N', 'N', 'Y', 'Y', 'Y', 'N', 'N', 'N', 'N', 'Y', 'N', 'N', 'N', 'Y', 'N', 'Y', 'N', 'N', 'Y']

# Calculate the required statistics

## Task 2

#### Base counting

In this task, we’ve provided a probe nucleotide sequence and some incomplete code. We would like you to complete the function `count_atcg` so that it returns the count of each nucleotide (A, T, C, G) in the sequence as a dictionary. The first line has already been written for you.

<details>
<summary>Click here for a hint!</summary>
<em>Strings can be iterated over just like lists with a simple for loop!</em>
</details>
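
If updating dictionary values is unfamiliar, the following minimal sketch shows the `+=` operator applied to dictionary entries. The `tally` dictionary and its keys are made up for illustration and are unrelated to the probe sequence:

```python
# Hypothetical tally, unrelated to the probe sequence
tally = {'cat': 0, 'dog': 0}

tally['cat'] += 1  # add one to the value stored under 'cat'
tally['dog'] += 2  # add two to the value stored under 'dog'
tally['cat'] += 1

print(tally)
```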

In [None]:
# Complete this function to complete the task!

def count_atcg(sequence):
    counts = {'A': 0, 'T': 0, 'C': 0, 'G': 0}
    # Complete the function to return the counts
    return counts

# Ignore the backslashes '\'; they won't appear when iterating over the string
probe_sequence = """\
ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCTGACCCAG\
CCGCAGCCTTTGTGAACCAACACCTGTGCGGCTCACACCTGGTGGAAGCTCTCTACCTAGTGTG\
CGGGGAACGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGACCTGCAGGTGGGG\
CAGGTGGAGCTGGGCGGGGGCCCTGGTGCAGGCAGCCTGCAGCCCTTGGCCCTGGAGGGGTCCC\
TGCAGAAGCGTGGCATTGTGGAACAATGCTGTACCAGCATCTGCTCCCTCTACCAGCTGGAGAA\
CTACTGCAAC"""

atcg_content = count_atcg(probe_sequence)

print('Counts for probe sequence:')
print('A:', atcg_content['A'])
print('T:', atcg_content['T'])
print('C:', atcg_content['C'])
print('G:', atcg_content['G'])

## Task 3

#### Sequence translation

In this task we would like you to translate the above `probe_sequence` (Task 2) into an amino acid sequence using the provided [codon table](https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?chapter=tgencodes#SG1) in the form of a Python dictionary. Complete the `translate` function below to return the fully translated sequence.

The [range function](https://www.w3schools.com/python/ref_func_range.asp) `range(first, last, step)` might come in handy!  
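
As a quick illustration of the `step` argument, the sketch below uses arbitrary numbers chosen only for the example. Note that `last` itself is never included in the values produced:

```python
# range(first, last, step): start at 0, stop before 10, move in steps of 3
for start in range(0, 10, 3):
    print(start)

# Collect the same values in a list to inspect them together
starts = list(range(0, 10, 3))
print(starts)
```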

After you have translated the sequence, use web tools to identify the protein.

##### NCBI Codon Table:
```
TTT F Phe      TCT S Ser      TAT Y Tyr      TGT C Cys  
TTC F Phe      TCC S Ser      TAC Y Tyr      TGC C Cys  
TTA L Leu      TCA S Ser      TAA * Ter      TGA * Ter  
TTG L Leu i    TCG S Ser      TAG * Ter      TGG W Trp  

CTT L Leu      CCT P Pro      CAT H His      CGT R Arg  
CTC L Leu      CCC P Pro      CAC H His      CGC R Arg  
CTA L Leu      CCA P Pro      CAA Q Gln      CGA R Arg  
CTG L Leu i    CCG P Pro      CAG Q Gln      CGG R Arg  

ATT I Ile      ACT T Thr      AAT N Asn      AGT S Ser  
ATC I Ile      ACC T Thr      AAC N Asn      AGC S Ser  
ATA I Ile      ACA T Thr      AAA K Lys      AGA R Arg  
ATG M Met i    ACG T Thr      AAG K Lys      AGG R Arg  

GTT V Val      GCT A Ala      GAT D Asp      GGT G Gly  
GTC V Val      GCC A Ala      GAC D Asp      GGC G Gly  
GTA V Val      GCA A Ala      GAA E Glu      GGA G Gly  
GTG V Val      GCG A Ala      GAG E Glu      GGG G Gly
```
***Note:***  '*  Ter' indicates a termination codon, and 'i' marks a codon that can act as an initiation codon.

- Try not to use the hints unless you really need to
- Remember that you can use assignment operators (`+=`) with strings
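
A minimal sketch of building up a string with `+=`, using a made-up list of fragments (`parts` and `greeting` are hypothetical names):

```python
parts = ['Hel', 'lo', '!']  # hypothetical fragments

greeting = ''
for part in parts:
    greeting += part  # append each fragment to the end of the string

print(greeting)
```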

<details>
<summary>Click here for a hint</summary>
<em>The range(first, last, step) function can be used with a for loop to iterate over a range of values with an optional step.</em>
</details>

<details>
<summary>Click here for another hint</summary>
<em>Strings can be sliced like lists, e.g. sequence[0:3].</em>
</details>

<details>
<summary>Click here for another hint</summary>
<em>"for ix in range(0, len(sequence), 3)"</em>
</details>

<details>
<summary>Click here for a final hint</summary>
<em>"sequence[ix : ix + 3]"</em>
</details>

In [None]:
ncbi_codon_table = {
    'TTT': 'F', 'TCT': 'S', 'TAT': 'Y', 'TGT': 'C',
    'TTC': 'F', 'TCC': 'S', 'TAC': 'Y', 'TGC': 'C',
    'TTA': 'L', 'TCA': 'S', 'TAA': '*', 'TGA': '*',
    'TTG': 'L', 'TCG': 'S', 'TAG': '*', 'TGG': 'W',
    'CTT': 'L', 'CCT': 'P', 'CAT': 'H', 'CGT': 'R',
    'CTC': 'L', 'CCC': 'P', 'CAC': 'H', 'CGC': 'R',
    'CTA': 'L', 'CCA': 'P', 'CAA': 'Q', 'CGA': 'R',
    'CTG': 'L', 'CCG': 'P', 'CAG': 'Q', 'CGG': 'R',
    'ATT': 'I', 'ACT': 'T', 'AAT': 'N', 'AGT': 'S',
    'ATC': 'I', 'ACC': 'T', 'AAC': 'N', 'AGC': 'S',
    'ATA': 'I', 'ACA': 'T', 'AAA': 'K', 'AGA': 'R',
    'ATG': 'M', 'ACG': 'T', 'AAG': 'K', 'AGG': 'R',
    'GTT': 'V', 'GCT': 'A', 'GAT': 'D', 'GGT': 'G',
    'GTC': 'V', 'GCC': 'A', 'GAC': 'D', 'GGC': 'G',
    'GTA': 'V', 'GCA': 'A', 'GAA': 'E', 'GGA': 'G',
    'GTG': 'V', 'GCG': 'A', 'GAG': 'E', 'GGG': 'G'
}

# Complete this function to complete the task!
def translate(sequence):
    translated = ""
    # Complete the function to return the translation
    return translated

aa_sequence = translate(probe_sequence)
print('Translated:', aa_sequence)

## Task 4

#### Sequence mutation

Why would a mutation from 'G' to 'A' at position 12 in the probe sequence from Task 2 likely cause disease? You should be able to create some code to find out.

- Complete the `mutate_sequence` function to replace the nucleotide at position 12, returning a new sequence.
- Use the `translate` function you defined in Task 3 to translate the mutated sequence.
- Work out why this mutation may cause disease.

***Note:*** *To clarify, position 12 means the 12th position in the sequence. This is not the same as the 12th index! Remember that indexing in Python starts at 0!*
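
The difference between a position and an index can be seen in this short sketch, which uses a made-up string rather than the probe sequence:

```python
letters = 'ABCDEFG'  # hypothetical example string

position = 3          # the 3rd character, counting from 1
index = position - 1  # Python counts from 0, so subtract 1

print('Position', position, 'is index', index, 'and holds', letters[index])
```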

<details>
<summary>Click here for a hint</summary>
<em>Strings are immutable/unchangeable so you will need to create a new string</em>
</details>

<details>
<summary>Click here for another hint</summary>
<em>Strings are sliceable, e.g. sequence[:i-1]</em>
</details>

<details>
<summary>Click here for a final hint</summary>
<em>Strings can be concatenated with '+', e.g. new_string = sequence[:i-1] + mutation</em>
</details>

In [None]:
def mutate_sequence(sequence, position, mutation):
    # Complete this function to mutate the sequence
    return 

mutated_sequence = mutate_sequence(probe_sequence, 12, 'A')

# Now use your translate function to see why this mutation may cause disease!

## Task 5

#### 'Rock, paper, scissors' game

In this task we would like you to complete the 'rock, paper, scissors' game defined in the first cell below. Complete the `determine_winner` function so that it prints the result of the game ('Player Wins!', 'Computer Wins!' or 'Draw!').

If you are not familiar with the rules of the game:

- Rock beats Scissors,
- Scissors beats Paper,
- Paper beats Rock,
- and the same choice results in a draw.
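
One possible way to write these rules down in Python is a dictionary that maps each choice to the choice it beats. This is only a sketch of the idea, not necessarily the intended solution; the name `beats` is hypothetical, and a chain of `if`/`elif` statements works just as well:

```python
# Each key beats the value it maps to (hypothetical helper dictionary)
beats = {'Rock': 'Scissors', 'Scissors': 'Paper', 'Paper': 'Rock'}

first, second = 'Paper', 'Rock'

# True when the first choice beats the second one
first_wins = beats[first] == second
print(first, 'beats', second, ':', first_wins)
```

Handling a draw and printing the final message are left for you to add in `determine_winner`.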

Running the second cell calls the `play_game` function defined in the first cell, which in turn calls your completed `determine_winner` function to print the result to the screen. When you run the code, you will need to type your choice into the input box that appears!

You do **not** need to edit any code in the second cell below.

In [None]:
import random

choice_list = ['Rock', 'Paper', 'Scissors']

# Function to get a random choice from the computer
def get_cpu_choice():
    choice = random.choice(choice_list)  # Randomly select 'Rock', 'Paper', or 'Scissors'
    return choice

# Function to get the player's choice
def get_player_choice():
    while True:
        # Prompt player to enter their choice, removing any leading/trailing whitespace
        player_choice = input('Choose Rock, Paper or Scissors:').strip()
        # Check if the choice is valid
        if player_choice not in choice_list:
            print(player_choice, 'is not a valid choice, pick from:', choice_list)
        else:
            print('\nYou chose:', player_choice)
            break  # Exit the loop once a valid choice is made
    return player_choice

def determine_winner(player_choice, computer_choice):
    # Complete the function to determine whether the player or the CPU wins, and print the result to the screen
    print()

# Main function to play the game
def play_game():
    print('Rock, Paper, Scissors Game!')
    print('-' * 27, '\n')
    # Get choices from both player and computer
    user_choice = get_player_choice()
    cpu_choice = get_cpu_choice()
    print('Computer chose:', cpu_choice, '\n')  # Display the computer's choice
    determine_winner(user_choice, cpu_choice)  # Determine and display the winner

In [None]:
play_game()

----

Remember to save your answers! 