# Review lists and loops

In the previous notebook, we explored how to work with lists, use loops, and make decisions with if-statements. Let’s put those skills to the test with a hands-on challenge! Below, you’ll find a code snippet designed to construct a random DNA sequence of 100 base pairs, starting with the initial codon ATG.

Your Tasks:
* Identify and correct any errors in the code
* Add comments to explain what is going on line-by-line

In [None]:
from numpy import random

my_sequence = 'ATG'
final_sequence_length = "one hundred"
nucleotides = [A, T, G, C]
nucleotides_probs = [1/2, 1/3, 1/4, 1/5]

while length_of_my_sequence < final_sequence_length:
next_nucleotide = random.choice(nucleotides, p = nucleotides_probabilities)
my_sequence = my_sequence + next_nucleotide

if the_last_three_nucleotides_of_my_sequence == 'TAA'
print("My DNA sequence ends with a stop codon")
elif the_last_three_nucleotides_of_my_sequence == 'TAG'
print("My DNA sequence ends with a stop codon")
elif  the_last_three_nucleotides_of_my_sequence == 'TGA'
print("My DNA sequence ends with a stop codon")
otherwise
print("My DNA sequence does NOT end with a stop codon")

print(f'>random_sequence (length: {len(my_sequence)})\n{my_sequence}')

****

# Dictionaries

One way to store a collection of items in a variable is to use a `list`. Now we are going to learn a new data type that is very important for structuring data in Python: the dictionary (`dict`)!

A dictionary in Python is a collection of key-value pairs. Each key is connected to a value, and you can use the key to access the value associated with it. Dictionaries are incredibly useful for storing data that can be easily retrieved by a unique identifier, much like looking up a word in a dictionary to find its definition.

Dictionaries are enclosed in curly braces `{}` with each item being a pair in the form `key: value`. Here's how we can create and use dictionaries:

In [None]:
my_dictionary = {}
print(type(my_dictionary))

Dictionaries have some properties in common with lists and strings with some key differences:

* iterable
* insertion-ordered (since Python 3.7), but **not** accessible by position
* indexed (by keys)
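
As a quick illustration of what "iterable" and "indexed by keys" mean in practice, here is a minimal sketch. The dictionary `sample_to_reads` and its values are made up for this example and are not part of the mouse data used below:

```python
# Hypothetical example data: sample name -> number of sequencing reads
sample_to_reads = {'s1': 120, 's2': 95, 's3': 143}

# Iterable: looping over a dictionary visits its keys one by one
keys_seen = []
for sample in sample_to_reads:
    keys_seen.append(sample)
print(keys_seen)

# Indexed by keys: use a key (not a position like 0 or 1) to get a value
reads_s2 = sample_to_reads['s2']
print(reads_s2)
```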

Suppose we want to make a dictionary that lets us look up the number of mice in each group:

|Group|Number of Mice|Average Mass(g)|Group Id|
|-----|--------------|---------------|--------|
|alpha|3|17.0|CGJ28371|
|beta|5|16.4|SJW99399|
|gamma|6|17.8|PWS29382|


In [None]:
group_to_num_mouse = {'alpha': 3,
                      'beta': 5}

print(group_to_num_mouse)

You can also add a new key-value pair to an existing dictionary:

In [None]:
group_to_num_mouse['gamma'] = 6
print(group_to_num_mouse)

One important property of a dictionary is that you look up entries by key rather than by positional indices like 0, 1, or 2. The general structure of a dictionary is `dictionary = {key: value}`.

You can call a specific value stored in a dictionary by giving its key:

In [None]:
group_to_num_mouse['beta']
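
Looking up a key that is not in the dictionary raises a `KeyError`. The sketch below shows two common ways to guard against that, assuming a group name `'delta'` that is hypothetical and not in the table. The dictionary is redefined here only so the example runs on its own:

```python
# Same contents as the dictionary built above, repeated to be self-contained
group_to_num_mouse = {'alpha': 3, 'beta': 5, 'gamma': 6}

# `in` checks whether a key exists before you look it up
has_beta = 'beta' in group_to_num_mouse
has_delta = 'delta' in group_to_num_mouse   # 'delta' is a made-up group
print(has_beta, has_delta)

# .get() returns a fallback value (here 0) instead of raising KeyError
num_delta = group_to_num_mouse.get('delta', 0)
print(num_delta)
```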

You can view all the keys in a dictionary with `.keys()`:

In [None]:
group_to_num_mouse.keys()

You can view all the values in a dictionary with `.values()`:

In [None]:
group_to_num_mouse.values()
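
Keys and values can also be retrieved together. The following sketch uses `.items()`, a standard dictionary method not shown above, to loop over each key-value pair. The dictionary is redefined so the example runs on its own:

```python
# Same contents as the dictionary built above, repeated to be self-contained
group_to_num_mouse = {'alpha': 3, 'beta': 5, 'gamma': 6}

total_mice = 0
# .items() yields (key, value) pairs, unpacked here into two loop variables
for group, num_mice in group_to_num_mouse.items():
    print(f'{group}: {num_mice} mice')
    total_mice += num_mice

print(f'Total: {total_mice} mice')
```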

**Challenge**: Create a dictionary where keys are `Group Id` and values are `Average Mass`

In [None]:
### Write your code here ###




****

## Translating RNA to Protein

In translation, the sequence of nucleotides in messenger RNA (mRNA) is translated into a sequence of amino acids, which come together to form a protein. During this process, cells read the mRNA nucleotides in groups of three, known as codons. The specific associations between these codons and their corresponding amino acids are defined by the genetic code, as outlined in the table below:

![from: http://scienceblogs.com/digitalbio/wp-content/blogs.dir/460/files/2012/04/i-39185d84268023fb77b43bbf9dba06c7-standard%20genetic%20code.png](img/rna_protein_code.png)

We can store the genetic code in a dictionary!

In [None]:
# Dictionary of {codon: AA (Amino Acid)}
codon_to_AA = {
    'AUA':'I', 'AUC':'I', 'AUU':'I', 'AUG':'M',
    'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACU':'T',
    'AAC':'N', 'AAU':'N', 'AAA':'K', 'AAG':'K',
    'AGC':'S', 'AGU':'S', 'AGA':'R', 'AGG':'R',
    'CUA':'L', 'CUC':'L', 'CUG':'L', 'CUU':'L',
    'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCU':'P',
    'CAC':'H', 'CAU':'H', 'CAA':'Q', 'CAG':'Q',
    'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGU':'R',
    'GUA':'V', 'GUC':'V', 'GUG':'V', 'GUU':'V',
    'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCU':'A',
    'GAC':'D', 'GAU':'D', 'GAA':'E', 'GAG':'E',
    'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGU':'G',
    'UCA':'S', 'UCC':'S', 'UCG':'S', 'UCU':'S',
    'UUC':'F', 'UUU':'F', 'UUA':'L', 'UUG':'L',
    'UAC':'Y', 'UAU':'Y', 'UAA':'_', 'UAG':'_',
    'UGC':'C', 'UGU':'C', 'UGA':'_', 'UGG':'W'
}

In [None]:
### Change codon and see it give you a corresponding AA ###
codon = 'AUA'
AA = codon_to_AA[codon]
print(f'{codon} encodes {AA}')

**Challenge**: Using the dictionary and what we have learned so far, translate the following RNA string to a protein sequence!

<details>
  <summary>Hint 1</summary>
  
   *hint 1*: Each codon consists of three nucleotides and we can get the start index of each codon by `range(0, len(rna_sequence), 3)`
</details>


<details>
  <summary>Hint 2</summary>
  
   *hint 2*: `codon = rna_sequence[i:i+3]` and we can get the corresponding amino acid using the dictionary
 
</details>

<details>
  <summary>Hint 3</summary>  
    
   *hint 3*: Each time you get an amino acid, make sure to add it to your protein sequence by `protein_sequence += `
</details>

In [None]:
rna = 'AUGCAUGCGAAUGCAGCGGCUAGCAGACUGACUGUUAUGCUGGGAUCGUGCCGCUAG'
protein_sequence = ''

### Write your code here ###




print(protein_sequence)

When using loops in programming, you might occasionally want to exit the loop before it has completed all its iterations, especially if you have already achieved your desired result. For example, in the translation of an RNA sequence, the process should stop as soon as a stop codon is encountered. To get out of a loop, you can use the `break` statement as shown below:

In [None]:
# range(10) would loop from 0 to 9, but the break below stops the loop early
for i in range(10):
    print(i)
    if i == 3:
        print("Breaking the loop!!")
        break

Adjust your previous code so that the translation stops immediately when a stop codon appears. After modifying the code, run it with a new sequence that includes a stop codon in the middle, and print out the protein sequence up to that point.

In [None]:
rna = 'AUGCAAGACAGGGAUCUAUUUACGAUCAGGCAUCGAUCGAUCGAUGCUAGCUAGCGGGAUCGCACGAUACUAGCCCGAUGCUAGCUUUUAUGCUCGUAGCUGCCCGUACGUUAUUUAGCCUGCUGUGCGAAUGCAGCGGCUAGCAGACUGACUGUUAUGCUGGGAUCGUGCCGCUAG'
protein_sequence = ''

### Write your code here ###




print(protein_sequence)

**Bonus Challenge**: Can you translate the given sequence in all three reading frames? Does each reading frame produce a different protein sequence, and does each one contain a start codon and a stop codon?

In [None]:
rna = 'AUGCAAGACAGGGAUCUAUUUACGAUCAGGCAUCGAUCGAUCGAUGCUAGCUAGCGGGAUCGCACGAUACUAGCCCGAUGCUAGCUUUUAUGCUCGUAGCUGCCCGUACGUUAUUUAGCCUGCUGUGCGAAUGCAGCGGCUAGCAGACUGACUGUUAUGCUGGGAUCGUGCCGCUAG'
protein_sequence = ''

### Write your code here ###




print(protein_sequence)

****

# FUNctions

Now that we've learned how to perform several tasks together, let's streamline our work by wrapping them into functions. You've already seen and used various Python functions, like `print()`, `len()`, and `randint()`, which perform specific tasks. Just like these, we can create our own custom functions to make our code more organized, reusable, and efficient.

Here is how you can define a simple function:

In [None]:
def greet():
    print("Hello there")

This function, named `greet`, simply prints "Hello there" when it is called. Run the following cell and make sure you see the output:

In [None]:
greet()

Defining a function in Python follows a specific syntax:

```
def function_name():
    # code to execute
```

* `def`: This special keyword indicates that you are defining a new Python function
* `function_name`: An arbitrary name for your function, followed by parentheses `()` and a colon `:`
* indent: The code inside the function must be indented (by convention, four spaces). Proper indentation is crucial as it defines the scope of the function's code block.

After you define a function, you can run it by calling `function_name()`.

Fix the following code block and call the function twice:

In [None]:
### Fix the code below ###

greet_evening()

def greet_evening():
    print("Good evening!")


## Functions with parameters

Functions can accept one or more **parameters**, which let them operate on different inputs and make them more flexible. (The values you pass in when calling the function are called **arguments**, although the two terms are often used interchangeably.) Here's how you can define a function with multiple parameters:

```
def function_name(parameter1, parameter2, ..., parameterN):
    # code to execute
```

Just like function names, parameters can be named anything, but it's important to choose names that are meaningful and descriptive, just as you would for any other variable:

In [None]:
def greet_name(name):
    print(f"Hello there, {name}!")

greet_name('Alice')

Change the `name` parameter value and make sure the function performs the same operation with different names.

In [None]:
### Change the name value ###

my_name = ''
greet_name(my_name)  # Here you are passing a value stored in my_name

Sometimes you might not want to provide a value for every parameter every time you call a function. In Python, you can make function parameters optional by setting a default value when defining them. Common default values include an empty string `''`, the keyword `None`, which represents a null value, or any value you wish:

In [None]:
def greet_time(name, time='Morning'):
    if time == 'Morning':
        print(f'Good morning, {name}!')
    elif time == 'Evening': 
        print(f'Good evening, {name}!')
    else:
        print(f'Hi, {name}!')

greet_time('Alice')
greet_time('Bob', 'Evening')
greet_time('Mike', 'Afternoon')
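
The example above uses a string as the default. As a minimal sketch of the `None` default mentioned earlier, the hypothetical function `describe_sample` below (its name and parameters are made up for illustration) treats `None` as "no value was given". It also shows that an optional parameter can be passed by name:

```python
# Hypothetical function for illustration; `tissue` is optional
def describe_sample(name, tissue=None):
    if tissue is None:
        # No tissue was provided by the caller
        print(f'{name} (tissue not recorded)')
    else:
        print(f'{name} from {tissue}')

describe_sample('mouse_01')                  # uses the default, None
describe_sample('mouse_02', tissue='liver')  # passes the value by name
```

Checking `tissue is None` is the usual way to test for a missing optional value.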

## Returning values from Functions

Functions can also return values back to us using the `return` keyword. This is useful when you want to continue using the result of the function in your code:

In [None]:
def dna_to_rna(dna):
    rna = dna.replace('t', 'u')
    return rna

my_dna = 'agcttttacgtcgatcctgcta'
my_rna = dna_to_rna(my_dna)
print(my_rna)

In this example, we define a function called `dna_to_rna` that converts a DNA sequence into an RNA sequence. This function takes a single parameter, `dna`, and replaces all instances of `t` with `u`, effectively transcribing DNA to RNA. After defining the function, we use it to convert the DNA sequence stored in `my_dna` to RNA, storing the result in `my_rna`. 
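
To see why `return` matters, the sketch below compares two variants of the function above. The names `dna_to_rna_printed` and `dna_to_rna_returned` are made up for this comparison. A function that only prints gives nothing back to the caller, so the variable that stores its result holds `None`:

```python
def dna_to_rna_printed(dna):
    # Only displays the result; nothing is handed back to the caller
    print(dna.replace('t', 'u'))

def dna_to_rna_returned(dna):
    # Hands the result back so it can be stored and reused
    return dna.replace('t', 'u')

printed_result = dna_to_rna_printed('atg')    # displays the RNA, stores None
returned_result = dna_to_rna_returned('atg')  # stores the RNA string

print(printed_result)
print(returned_result.upper())  # the returned string can be used further
```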

**Challenge**: Write functions to do the following:

1. Calculates the GC content of a given DNA string
<details>
  <summary>Hint</summary>
  
   `string.count(value)` gives you the number of times `value` appears in the string
</details>

In [None]:
### Write your code here ###
def calculate_GC(dna):





2. Generates a random string of DNA of a given length where each nucleotide has the same likelihood
<details>
  <summary>Hint</summary>
  
   `random.choice(list_of_outcomes, p = likelihood_of_outcomes)`
</details>

In [None]:
### Write your code here ###
def generate_DNA():  # any parameter?






If you have successfully defined each function, you should be able to run the following code and make sure your GC content is around 50%:

In [None]:
DNA_generated = generate_DNA(100)
GC_content = calculate_GC(DNA_generated)
print(DNA_generated)
print(f'GC content of my randomly generated DNA sequence is: {GC_content*100:.2f}%')

## Local Variables vs. Global Variables

In programming, it's important to understand where variables can be used once they are defined. Let's explore the difference between **local** and **global** variables.

Variables created inside a function are called **local variables**. This means they can only be accessed from within the function where they were defined. Here’s an example to illustrate this:

In [None]:
def print_local_variable():
    my_local_variable = "I am a local variable"
    print(my_local_variable)

print_local_variable()

In the above example, `my_local_variable` is a local variable to the function `print_local_variable()`. It gets created when the function is called and then it can be used within this function. What do you think will happen if we try to access a local variable outside of its function? Let's see what happens by running the next cell:

In [None]:
print(my_local_variable)

This will result in an error message because `my_local_variable` does not exist outside the function `print_local_variable()`.

On the other hand, **global variables** are defined outside of functions and can be accessed from any part of the code. Here is how you can use a global variable:

In [None]:
my_global_variable = "You can call me from anywhere"

def print_global_variable():
    print(my_global_variable)

print_global_variable()
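
One related point is worth a short sketch. Assigning to a name inside a function creates a new local variable, even if a global variable with the same name already exists. The names `message` and `show_message` are made up for this illustration:

```python
message = "I am global"

def show_message():
    # This assignment creates a separate local variable named `message`;
    # it hides the global one only inside this function
    message = "I am local"
    print(message)

show_message()   # prints the local value
print(message)   # the global value is unchanged
```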

****

# Last Challenge!!