# Introduction to Python
MiCM Workshop - February 11, 2025

Benjamin Z. Rudski, PhD Candidate, Quantitative Life Sciences, McGill University

Dear `Reader | Workshop Attendee`,  
Welcome! In this interactive Jupyter Notebook, I will introduce you to the [Python](https://www.python.org) programming language. In this workshop, we'll journey from the beginnings of storing data in variables and doing simple calculations to writing powerful functions to perform repeatable operations.

This notebook is the **solution version**, which contains the answers. There is also a [**student version**](../scripts/IntroToPythonBZR.ipynb) in the `scripts` folder. The **student version** contains several blanks where I will write code during the live workshop and where you can fill out exercises. I recommend trying the exercises yourself before looking at the solutions. There is often more than one way to answer a programming question, so you should focus on understanding the code that you are writing instead of just copying my answers. You may come up with an answer better than the one I've provided!

Here's the outline of this workshop:

1. Module 1 – Python Basics (1 hour 15 minutes)
    1. Foundations of Python - A Brief Overview of Types and Variables
        1. Primitive Data Types (int, float, bool, string)
        2. Variables
        3. Collection Data Types (tuples, lists, dictionaries)
        4. Introduction to Functions (Function as a Machine)
    2. Numbers and Comparisons
        1. Mathematical Operations
        2. Booleans
    3. Intro to Control Flow and Loops (if, while and for)
        1. Control Flow: the if Statement
        2. while Loops
        3. Iteration with for Loops
    4. Exercise: Numbers and Loops for Unit Conversion
2. Module 2 – Strings and Collections: An Object Primer (1 hour)
    1. Introducing Objects
        1. What is an Object?
    2. Introducing the String!
        1. String Slicing
        2. String Methods (concatenation and string formatting, converting strings to numbers, find and replace)
    3. Introduction to Tuples, Lists and Dictionaries
        1. Tuples and Tuple Unpacking
        2. Lists and List Methods (adding, removing, slicing)
        3. Dictionaries (Key-Value storage, accessing, adding, removing)
    4. Exercise: Working with Strings and Collections for DNA and Protein Processing
3. Module 3 – Introduction to Functions (45 minutes)
    1. Function Overview
        1. What is a function?
    2. Writing Custom Functions
        1. Basic function definitions
        2. Passing inputs: Defining parameters
        3. Producing outputs: Return values
    3. Documenting Functions
        1. Defining function docstrings
        2. How to get help from your IDE: Type annotations (optional)
    4. Exercise: Writing functions for biological sequences.
4. Module 4 – Where to go from here (10 minutes)
    1. What to learn next? How?
    2. How to get help and how not to get help
        1. Your code editor
        2. Documentation
        3. Books
        4. Tutorials
        5. Stack Overflow (and pitfalls)
        6. ChatGPT (and pitfalls)
    3. Glimpse of other cool programming topics


When this workshop is over, you should be able to write simple Python scripts. More importantly, I am hoping to give you *the tools* so that you can learn new Python skills and read documentation to find what you need. **In my opinion, the most important part of programming is knowing how to get help when you need it.**

# Module 1 - Python Basics

In this section, we'll see the basic, foundational concepts of programming in Python. We'll start with the basics of mathematical operations and we'll see variables for storing data. Then, we'll also start seeing how to get things done in Python. Along the way, I'll also point out possible places where users of different programming languages need to pay special attention.

**Topics:**

1.	Foundations of Python - A Brief Overview of Types and Variables
    1.	Primitive Data Types (int, float, bool, string)
    2.	Variables
    3.	Collection Data Types (tuples, lists, dictionaries)
    4.	Introduction to Functions (Function as a Machine)
2.	Numbers and Comparisons
    1.	Mathematical Operations
    2.	Booleans
3.	Intro to Control Flow and Loops (if, while and for)
    1.	Control Flow: the if Statement
    2.	while Loops
    3.	Iteration with for Loops
4.	*Exercise*

## Foundations of Python

This section explores the most basic ways that we store small amounts of data. We'll see what types of data we can store and how we can combine pieces of data.

But first! It's conventional that the first program we write in a new language is the "Hello, World!" program. This is a simple program that writes the text "Hello, World!" to the screen. In Python, it's quite easy to do:

In [1]:
# Your exciting first line of Python code here!
# In this line of code, we'll display, or print, the text string "Hello, World!"

print("Hello, World!")

Hello, World!


This very simple program introduces a few important ideas. You'll notice that the first two lines don't really look like code. Actually, they're not! They are **comments**. We (and the computer) know this because each of these lines starts with the symbol `#`. Python ignores that symbol and everything that comes after it on the same line, letting you write notes about what your code is doing. It's very important to put comments in your code, especially if you're going to need to come back to it after a few weeks or if you're going to share it with other people.

On the last line, we have two things:
* the `print` *function*
* the *string* "Hello, World!"

The `print` function displays output to the screen or to the console. While it's not necessary in this Jupyter notebook (which automatically outputs the result of the last line of code), it's very helpful if you're ever writing code in a different program, like *PyCharm* or *Spyder*. We'll discuss functions in more detail later, but the idea is that functions take inputs, known as **arguments**, do operations on them and optionally return some sort of modified result. Here, the `print` function takes in the **string** of text "Hello, World!", writes it onto the screen and doesn't return any new data.

The **argument** that we pass to this function is the text **string** `"Hello, World!"`. We'll discuss strings in more detail later. The important thing is that a string is a group of characters surrounded by quotation marks (either single quotes `'Hello'` or double quotes `"Hello"`).

Don't worry if this doesn't make sense! We'll explain each part of it as we go along. By the end of this workshop, you'll understand what this line does!

Now that we've passed this important milestone, let's dive into basic Python!

### Primitive Data Types

Working with the computer is all about processing information. Yay! That's great! Except... what does this information look like? In Python, information can be stored in many different **types**. These types represent numbers, text, and more! Let's start by discussing the **building blocks of everything**, known as the **primitive types**.

In Python, there are four different **primitive types** (see [here](https://realpython.com/python-data-types/)):

1. Integers - `int`
2. Decimal numbers - `float`
3. True or false Boolean values - `bool`
4. Text strings - `str`

Let's take a look at each of these in a bit more detail. We can get the type of a value using the `type` function.
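As a quick preview, here is a minimal sketch that calls `type` on one value of each primitive type. The example values and variable names are arbitrary and only there for illustration.

```python
# Ask Python for the type of one example value of each primitive type.
# The values and variable names are arbitrary illustrations.
int_type = type(4)
float_type = type(4.0)
bool_type = type(True)
str_type = type("four")

print(int_type)    # <class 'int'>
print(float_type)  # <class 'float'>
print(bool_type)   # <class 'bool'>
print(str_type)    # <class 'str'>
```

When a type is shown with `print`, it appears as `<class 'int'>`. When it is the result of the last line of a notebook cell, Jupyter shows the shorter form `int`.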

#### Integers

No big surprise, integers are whole numbers without a decimal. We write them... well, as a whole number without a decimal. For example, to represent the number 4 in Python, we'd write:


In [2]:
# Your code here to represent the number 4
4

4

There's an extra trick if you're working with really big numbers. To avoid confusion, you can add underscores to make the number easier to read:

In [3]:
# Your code here for a really big number
5_192_435_235

5192435235

We'll see what we can do with these numbers soon.

#### Decimal Numbers

In programming, a decimal number is known as a **floating-point number** or a **`float`**. These numbers are, unsurprisingly, written as numbers with decimals:

In [4]:
# Your code here to write 4 as a float
4.0

4.0

Another way of writing floating-point numbers is to use **scientific notation**. Let's see an example:

In [5]:
# Your code here to write 5 * 10^3 in scientific notation
5e3

5000.0

Note that even if we don't have a decimal here, our value is a floating point number. We can check this using the `type` function:

In [6]:
# Your code here to check the type of what we wrote before.
type(5e3)

float

We'll see more operations on these values very soon.

**Note:** For anyone who has used C, Java or Swift (or many other languages), Python does ***not*** have a separate `double` type. Python also doesn't have type modifiers like `long` or `unsigned`. If you wind up working with NumPy, then you may have to think about different types of integers and floating point numbers (but, we won't talk about that today).
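One practical consequence of this, shown in the small sketch below: since there is no `long` type, a plain Python `int` simply grows to hold very large values. The variable names are made up for this illustration.

```python
# A Python int is not limited to 32 or 64 bits; it grows as needed.
# This number (a 1 followed by 24 zeros) is far too big for a 64-bit integer.
big_number = 1_000_000_000_000_000_000_000_000
print(type(big_number))  # <class 'int'>

# There is only one built-in floating-point type, called float.
small_decimal = 0.5
print(type(small_decimal))  # <class 'float'>
```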

#### Boolean Values

A **Boolean** represents one of two states: True or False. In Python, these values are represented using the names `True` and `False`.

In [7]:
# Your code here to show a Boolean value
True

True

**Note:** You **must** write `True` and `False`. Python does **not** recognise `true`, `false`, `T` or `F`. You **must** write the full word, with the first letter capitalised. Otherwise, Python will get mad at you.
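To see what "getting mad" looks like, here is a small sketch. It uses a `try`/`except` block, which we don't cover in this workshop, only so that the error can be caught and displayed instead of stopping the cell. The variable names are hypothetical.

```python
# `True` is a built-in name, so this works.
is_workshop_fun = True
print(type(is_workshop_fun))  # <class 'bool'>

# Lowercase `true` is just an undefined name, so Python raises a NameError.
# The try/except block only catches that error so the example keeps running.
try:
    is_workshop_fun = true
except NameError as error:
    error_message = str(error)
    print("Python complained:", error_message)
```

Because the second assignment fails, `is_workshop_fun` keeps its original value of `True`.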

#### Strings

In programming, we refer to text as a **string**. When writing a string, we **must** put it inside quotation marks. In Python, these can be single quotation marks (''), or double quotation marks (""). The important thing is to use the same quotation mark to open and close your string. Let's see some examples:

In [8]:
# Your code here for working with strings.
"Hello, Python!"

'Hello, Python!'

We'll see that there's a lot more that we can do with these strings. Let's say you want to have a long string that spans multiple lines and you want to keep the line breaks. Well, you can use a *triple-quoted* string to do just this.

In [9]:
# Your code here for example of triple-quoted string.
"""
This is my long
triple-quoted
string over
many, many,
many, many,
many, many,
lines!
"""

'\nThis is my long\ntriple-quoted\nstring over\nmany, many,\nmany, many,\nmany, many,\nlines!\n'

We have a whole section on strings coming up! So, stay tuned!

**Note:** It is ***super important*** to remember to put quotation marks around your string. Otherwise, Python will get mad at you.

### Variables

So, we've seen basic data types, but they're not really useful if we can't store the values. To store data, we use **variables**. A **variable** gives a *name* to a piece of data stored in memory so that you can easily access it later. The information stored in a variable can change (or **vary**).

**Note:** Python has no constants. Only variables. If you come from a language that has constants, I'm sorry.

#### Variable Names
There are rules for naming a variable:
* Variable names are **case-sensitive**.
* A variable name must contain only letters, numbers and underscores.
* A variable name cannot start with a number.
* A variable name cannot be the same as a reserved word in Python (see [here](https://docs.python.org/3/reference/lexical_analysis.html#keywords) for list).

A variable name may consist of multiple words combined. There are a few different conventions for putting words together. Two common ones are known as `snake_case` and `camelCase`:
* In `snake_case`, all letters are lowercase and words are separated by underscores.
* In `camelCase`, different words are combined with no spaces, and the first letter of a new word is put as a capital.

Different people use different conventions. Your code editor may suggest one over the other (for example, PyCharm prefers `snake_case`). The choice depends on your project setup and any existing code you may be adding to.

**Notes:** 

* Although you can combine words together, try to keep variable names reasonably concise.
* Although Python has no constants, `ALL_CAPS_NAMES` are sometimes used to denote variables that shouldn't change.
* Variable names *can* start with underscores, but this often has a special meaning.
* By **strongly encouraged** convention, we start variable names with a **lowercase** letter (or an underscore followed by a lowercase letter).

Let's see some examples of valid and invalid variable names:

| Invalid | Valid |
| ------- | ----- |
| `my_variable12.3` | `my_variable12_3` |
| `-myVariableName2`| `myVariableName2` |
| `@myVariable`| `myVariable` |
| `my-variable&` | `my_variable` | 
| `my+variable` | `my_variable` | 
| `23variable` | `my_variaBle_32` |
| `myV#ariable` | `myVariable` |
| `import` | `my_import` |
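If you're ever unsure about a name, you can ask Python itself. The sketch below checks a few names from the table using the string method `isidentifier` and the standard library function `keyword.iskeyword`. It relies on a list, a dictionary and a `for` loop, which we'll only meet later, so treat it as a preview rather than something you need to understand right now. The names `candidates` and `results` are made up for this example.

```python
import keyword

# A few candidate names taken from the table above.
candidates = ["my_variable12.3", "my_variable12_3", "23variable", "import", "my_import"]

results = {}
for name in candidates:
    # A usable name must be a valid identifier AND must not be a reserved word.
    results[name] = name.isidentifier() and not keyword.iskeyword(name)
    print(name, "->", results[name])
```

Note that `"import".isidentifier()` is `True` on its own, which is why the separate keyword check is needed.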

#### Variable Assignment

The way that we assign a variable is easy. We just use the `=` sign. That's it. We can also change the value of a variable by just assigning a new value using the equal sign (and so, the value **varies**).

Now, let's do a few examples of variable assignment. Here, we'll make use of the `print` function to track the value of the variables.

1. **Assignment**  
 Let's create a variable called `my_variable` with the value `42`.

In [10]:
# Your code here for variable assignment
my_variable = 42

my_variable

42

2. **Reassignment**  
 We can easily reassign a value using the equal sign again. Let's re-assign our variable `my_variable` to have the value `16`.

In [11]:
# Your code here for variable reassignment
my_variable = 16

my_variable

16

3. **Changing type**  
 There's no requirement for the new value to be of the same type as the original. Let's assign the string `"Hello"` to our variable `my_variable`:

In [12]:
# Your code here to assign a string
my_variable = "Hello"

my_variable

'Hello'

### Collection Types

So, we've seen how to store individual numbers and bits of text. Well, let's say we want to store a lot of values. If we have a small number, we can just create a bunch of variables:

In [13]:
# Your code here to create three variables with numbers
number1 = 3
number2 = 5
number3 = 16

Ok... so, this is a bit hard to manage for even small numbers of variables. Instead of working with multiple variables, we use **collection types**. A **collection** stores *multiple* values. We'll see three main types of collections:

1. Tuples - store a small fixed number of values.
2. Lists - store multiple values; can add or remove elements.
3. Dictionaries - store multiple values based on keys; can add or remove elements.

We'll go into more detail about each later, but first let's see some examples. Each collection type has specific notation:

1. Tuples - Values separated by commas between parentheses `(a, b)`
2. Lists - Values separated by commas between square brackets `[a, b]`
3. Dictionaries - Key-value pairs, each written with a colon, separated by commas between curly brackets `{a:b, c:d}`

Confused? Let's see some examples.

First, a tuple example:

In [14]:
# Your code here to create a tuple
my_tuple = (3, 5, 16)

my_tuple

(3, 5, 16)

Now, let's see a list:

In [15]:
# Your code here to create a list
my_list = [3, 5, 16]

my_list

[3, 5, 16]

And now a dictionary:

In [16]:
# Your code here to create a dictionary
my_dictionary = {"Milk": 3, "Apples": 5, "Eggs": 16}

my_dictionary

{'Milk': 3, 'Apples': 5, 'Eggs': 16}

We'll see all of these in **much** more detail later, including how to use them. For now, what's important is that you know they exist.
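As a small sanity check, the sketch below asks Python for the type of each collection and for the number of elements it holds. It uses the built-in `len` function, which hasn't been introduced yet, so take it as a preview.

```python
# The same three collections as in the cells above.
my_tuple = (3, 5, 16)
my_list = [3, 5, 16]
my_dictionary = {"Milk": 3, "Apples": 5, "Eggs": 16}

# type tells us which kind of collection we have; len counts its elements.
print(type(my_tuple), len(my_tuple))            # <class 'tuple'> 3
print(type(my_list), len(my_list))              # <class 'list'> 3
print(type(my_dictionary), len(my_dictionary))  # <class 'dict'> 3
```

For a dictionary, `len` counts the key-value pairs.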

### Introduction to Functions

So far, we've seen how to store data... but our code hasn't actually done anything. We'll see soon how to start writing code that makes decisions and does calculations. But first, let's talk about **functions**.

We can think of a **function** as a sort of machine. It takes **inputs**, does some sort of operation on them, and produces **outputs**. People often represent functions as a little black box.

![black box](../assets/function/Function.png)

When we use a function, this is known as **calling** the function. When we *call a function*, we tell it what data to use to perform the operations and where to store the result (typically in a variable). The syntax to call a function and store its result in a variable `x` is as follows:

```python
x = function_name(arguments_here)
```

We've actually already seen a function! We used the `print` function a while ago. This function takes a string that we want to display, shows it on the screen and doesn't return any output. Let's call this function:

In [17]:
# Your code here to call the `print` function
print("Hello!!!!")

Hello!!!!


Python has other **built-in functions** available. There's a list available at [this link](https://docs.python.org/3/library/functions.html). 

For example, we can use the `abs` function to take the absolute value of a number:

In [18]:
# Your code here to call the absolute value function on some input
my_number = -5

# Call the absolute value function and store the result in a new variable
my_abs_number = abs(my_number)

# Display that new variable
my_abs_number

5

Functions can also take multiple inputs. For example, we can round numbers using the `round` function, described [here](https://docs.python.org/3/library/functions.html#round):

In [19]:
# Your code here to call the round function on a decimal number 2.95 to 1 decimal place.
my_number = 2.95

round(my_number, ndigits=1)

3.0

Here, `ndigits` is an optional **keyword argument**. Functions often have many of these, which have default values. To specify what value a keyword argument should take, you simply write it like you would a variable assignment. In this case, since the keyword argument is called `ndigits` in the documentation, and we want to set it to `1`, in the function call, we must write `ndigits=1`.
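To make this concrete, here is a short sketch comparing three ways of calling `round` on the same number. The variable names for the results are made up for this example.

```python
my_number = 2.95

# Keyword form, as in the cell above.
rounded_keyword = round(my_number, ndigits=1)

# Positional form: the second argument is matched to ndigits by its position.
rounded_positional = round(my_number, 1)

# Leaving ndigits out uses its default, which rounds to the nearest integer.
rounded_default = round(my_number)

print(rounded_keyword)     # 3.0
print(rounded_positional)  # 3.0
print(rounded_default)     # 3
```

Notice that the last result is an `int`, not a `float`. Writing the keyword explicitly is a bit longer, but it makes the call easier to read.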

The important thing to remember about functions is that they are **defined packets of behaviour** that **encapsulate** certain operations. We call them on inputs and they produce outputs without us having to worry about the internal details.

We'll see how to **define functions** a bit later on in this workshop. Stay tuned!

## Numbers and Comparisons

We've seen how we can store data in variables. But just storing the data is boring! In this section, we'll start talking about things that we can *do* with the data. Specifically, we'll see operations that we can do on numbers and Booleans.

### Mathematical Operations

Python gives users the ability to perform simple mathematical operations on numbers. The following operations that you know very well can be easily done:
* **Addition** is performed using the `+` operator
* **Subtraction** is performed using the `-` operator
* **Multiplication** is performed using the `*` operator
* **Division** is performed using the `/` operator (does not round)

Python offers a few other operations as well:
* **Exponents** can be taken using the `**` operator (**NOT** `^`)
* **Modulus** (remainder) can be taken using the `%` operator (**Warning:** for anyone who uses MATLAB, this is **not** a comment!)
* **Integer division** (dividing and rounding down) can be performed using the `//` operator (**Warning:** for anyone who knows Java or C or any number of other languages, this is **not** a comment in Python!)
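Here's a quick sketch of the less familiar operators from the lists above, using small numbers so that you can check the results by hand. The variable names are only for illustration.

```python
true_division = 7 / 2       # division does not round
integer_division = 7 // 2   # divide, then round down
negative_division = -7 // 2 # "rounding down" means towards negative infinity
remainder = 7 % 2           # what is left over after integer division
power = 2 ** 3              # 2 to the power of 3
not_a_power = 2 ^ 3         # ^ is a different operator (bitwise XOR)

print(true_division)      # 3.5
print(integer_division)   # 3
print(negative_division)  # -4
print(remainder)          # 1
print(power)              # 8
print(not_a_power)        # 1
```

The last line is why the warning about `^` matters: Python won't give you an error, it will quietly compute something else.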

To perform a basic mathematical operation, all you need to do is type in the numbers, along with the operator, in the same way that you'd write the expression on paper. For example, to add 5 and 4, we would write the following:

In [20]:
# Put your code here
5 + 4

9

We can also chain operations together. Remember that the usual order of operations (**BEMDAS**: brackets, exponents, multiplication and division, addition and subtraction) applies. Let's do an example to show this.

Write code that computes and prints the following results: $4+5\times 3$ and $(4+5) \times 3$. 

**Hint:** Remember the `print` function from above and use it to show the result of two different calculations in the same Jupyter notebook cell.

In [21]:
# Put your code here
print(4 + 5 * 3)
print((4 + 5) * 3)

19
27


These examples contained integers, known in Python as `int`s. We can also do calculations that involve decimal numbers, known as **floating point numbers** or simply, `float`s. We can also mix the two different types of numbers.
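Here is a minimal sketch of what happens to the type of the result when we mix the two. The variable names are made up for this example.

```python
whole = 3
decimal = 0.5

# Mixing an int and a float gives a float.
mixed_sum = whole + decimal
print(mixed_sum, type(mixed_sum))  # 3.5 <class 'float'>

# Dividing with / gives a float, even when both numbers are ints.
quotient = 4 / 2
print(quotient, type(quotient))    # 2.0 <class 'float'>

# Multiplying two ints stays an int.
product = 4 * 2
print(product, type(product))      # 8 <class 'int'>
```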

These rules don't only apply when working with numbers. We can also plug in **variables** that hold numeric values. We can also **store** the result in a new variable using the assignment operator `=`.

For example, let's set `a=5`, `b=4`, `c=2`. Let's compute the following:

* $a\times b / c$
* $\text{floor}(b^2 / a)$
* $(a + b) \mod c$

In [22]:
# Your code here
a = 5
b = 4
c = 2

# Example 1
ans1 = a * b / c
print("First answer:", ans1)

# Example 2
ans2 = b ** 2 // a
print("Second answer:", ans2)

# Example 3
ans3 = (a + b) % c
print("Third answer:", ans3)

First answer: 10.0
Second answer: 3
Third answer: 1


Well, let's say we want to update the original variable...

Let's create a variable called `my_variable` with the value `35`. Let's then multiply it by `2` and store this result in the same `my_variable` variable.

In [23]:
# Your code here
my_variable = 35

# Multiply by 2 and store in same variable
my_variable = my_variable * 2

# Show the result
my_variable

70

This assignment looked a bit bulky! For some of these operations, we have a shortcut so that we don't have to write the variable name twice. For each operation, we can use a combined assignment operator:
* We replace assignment and `+` with `+=`
* We replace assignment and `-` with `-=`
* We replace assignment and `*` with `*=`
* We replace assignment and `/` with `/=`
* We replace assignment and `**` with `**=`
* We replace assignment and `%` with `%=`
* We replace assignment and `//` with `//=`

So, we can rewrite the last example we did:

In [24]:
# Your code here
my_variable = 35

# Use the assignment operator
my_variable *= 2

# Show the result
my_variable

70
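The other shortcut operators work the same way. Here's a small sketch that applies several of them in a row to a hypothetical variable called `counter`. The comments track the value after each line.

```python
counter = 10

counter += 5   # 10 + 5  -> 15
counter -= 3   # 15 - 3  -> 12
counter //= 5  # 12 // 5 -> 2
counter **= 3  # 2 ** 3  -> 8
counter %= 5   # 8 % 5   -> 3
counter /= 2   # 3 / 2   -> 1.5 (division gives a float)

print(counter)  # 1.5
```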

For more information on `int`s and `float`s and the numeric types in Python, see [this page](https://docs.python.org/3/library/stdtypes.html#typesnumeric) from the official Python documentation.

### Booleans

A **boolean** represents a value that is either `True` or `False`. In this section, we'll see how to generate them, and then we'll see fun things we can do with them!

#### Comparisons

Think back to when you were starting to learn math... What was one of the first things they taught you? For me, it was **comparisons** and **inequalities**. We had two numbers, and we had to put the correct sign, `>,<,=` in between (some of you were maybe also told to think of a crocodile opening its mouth to the bigger number...).

Well, this is an important idea in programming too! We can use the following operations to generate boolean values. Let's say that `a` and `b` are both numbers (either `int`s or `float`s):
* `a > b` -- **greater than**, evaluates to `True` if `a` is bigger than `b`, otherwise evaluates to `False`
* `a >= b`-- **greater than or equal to**
* `a < b` -- **less than**
* `a <= b` -- **less than or equal to**
* `a == b` -- **equal** -- ***NOTE:*** there are ***TWO*** equal signs!!!!!
* `a != b` -- **not equal**

Again, I want to emphasize that for the equals comparison, you must must must put two equal signs `==`! Otherwise, Python will think you're trying to assign a variable and it will get mad at you and give you an error!

Also, for `>=` and `<=`, the order of the two signs matters! Do **NOT** write `=>` or `=<`! If you forget, remember that the order is the same as we read it. **Less than or equal to** is first *less than*, so `<`, and then *equal to*, so `=`, so the order is `<=`.

Now, let's see some examples:

In [25]:
# Your code here
a = 92
b = 43

# Complete these lines: # Your code here
print("a is greater than b:", a > b)
print("a is less than b:", a < b)
print("a is equal to b:", a == b)
print("a is not equal to b:", a != b)
print("a is greater than or equal to b:", a >= b)
print("a is less than or equal to b:", a <= b)

a is greater than b: True
a is less than b: False
a is equal to b: False
a is not equal to b: True
a is greater than or equal to b: True
a is less than or equal to b: False


Feel free to change the values of `a` and `b` and see how the output changes!

These operations don't only work on numbers! We can use `==` and `!=` on just about any other data. Let's see some examples on strings:

In [26]:
# Your code here for string comparisons
user_password = "Password"
actual_password = "passWorD"

user_password == actual_password

False

These types of comparisons are very important. We'll see why in a bit... But first, let's see some other cool things we can do with Booleans.

#### Boolean Operations

We've seen how to generate booleans using numbers and strings. We can also perform operations on booleans to get... more booleans! These three operations are **logical operations**:
* `and`
* `or`
* `not`

#### The `and` operation
The `and` operation takes **two** boolean values `a` and `b`. If **both** `a` and `b` are `True`, then `a and b` is also `True`. Otherwise, `a and b` is `False`. People coming from other programming languages may know `and` as `&&` or `&`. We can represent this operation using a **truth table**:

| `a` | `b` | `a and b` |
| --- | --- | --- |
| `False` | `False` | `False` |
| `False` | `True` | `False` |
| `True` | `False` | `False` |
| `True` | `True` | `True` |

In practice, you'll often work with Booleans that you've generated using comparisons. Now, let's see a more complicated example. Let's set `a=4`, `b=5` and `c=6` and evaluate `(a < b) and (c > b)`:

In [27]:
# Your code here for the example
a = 4
b = 5
c = 6

# Perform the operation
(a < b) and (c > b)

True

Let's think about that last example: we have `a=4`, `b=5`, `c=6`. We're looking at the logical expression
```python
a < b and c > b
```

So, we start by breaking it up into the two parts:
* `a < b`
* `c > b`

Now, we look at each part separately:
* `a < b`: well, we have `a=4` and `b=5`, so we have `4 < 5`, which is `True`
* `c > b`: we have `c=6` and `b=5`, so we have `6 > 5`, which is `True`

Now, we can put these two back together: for `a < b and c > b` both the left and the right are `True`, which makes the whole expression `True`!
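The same step-by-step reasoning can be written out in code by storing each half in its own variable. The names `left_side`, `right_side` and `whole_expression` are made up for this sketch.

```python
a = 4
b = 5
c = 6

# Evaluate each comparison separately and store the resulting booleans.
left_side = a < b    # 4 < 5
right_side = c > b   # 6 > 5

# Combine the two booleans with `and`.
whole_expression = left_side and right_side

print("Left side:", left_side)                # Left side: True
print("Right side:", right_side)              # Right side: True
print("Whole expression:", whole_expression)  # Whole expression: True
```

Breaking a long condition into named pieces like this can also make your code easier to read.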

#### The `or` operation

The `or` operation also takes **two** boolean values `a` and `b`, but it evaluates to `True` if **at least one** of `a` or `b` is `True`. If both values are `False`, then `a or b` is `False`. Otherwise, `a or b` is `True`. In other programming languages, the `or` operation is represented as `a || b` or `a | b`.

To help visualise, here's the truth table:

| `a` | `b` | `a or b` |
| --- | --- | --- |
| `False` | `False` | `False` |
| `False` | `True` | `True` |
| `True` | `False` | `True` |
| `True` | `True` | `True` |

Once again, you'll often work directly with numbers and comparisons. Let's do an example where we have `a = 5`, `b = 6`, `c = 7` and let's evaluate `a > b or c > b`:

In [28]:
# Your code here for a numeric example:
a = 5
b = 6
c = 7

# Perform the comparisons and logical operation
a > b or c > b

True

Let's go through that last example. We have `a=5`, `b=6`, `c=7`. Let's again break up our expression into two parts:
* `a > b`
* `c > b`

Let's look at each one:
* `a > b` --> `5 > 6` --> `False`
* `c > b` --> `7 > 6` --> `True`

Since at least one of the two boolean values is `True`, then `a > b or c > b` is `True`.

We don't have to compare variables that are all of the same type. Let's set `n1 = 5`, `n2 = 6` and `password = "Hello"`. Let's now check to see if the product of the two numbers is less than 30 **or** the password is equal to `"World"`:

In [29]:
# Your code here for a more complicated example:

# Define our variables
n1 = 5
n2 = 6
password = "Hello"

# Perform our comparisons
(n1 * n2 < 30) or password == "World"

False

Let's now change the password to `"World"` and try again:

In [30]:
# Your code here...

# Define our variables
n1 = 5
n2 = 6
password = "World"

# Perform our comparisons
(n1 * n2 < 30) or password == "World"

True

Hopefully, you're starting to see that we can use these booleans to make decisions. We'll come back to this idea **really soon**.

#### The `not` operation

The `not` operation only takes in **one** boolean value `a` and flips its value. If `a` is `True`, then `not a` is `False` and if `a` is `False`, then `not a` is `True`. In other languages, it may be represented by `!a` or `~a`.

Here's the truth table:

| `a` | `not a` |
| --- | --- |
|`False` | `True`|
|`True` | `False` |

The easy way to understand it is that it's opposite day! When you add the `not` operator, everything that is usually `True` becomes `False` and everything that is usually `False` becomes `True`.


Let's do a numeric example. Let's set `a=6` and `b=8`. Let's evaluate `not a > b`:

In [31]:
# Your code here for a numeric example

# Define our variables
a = 6
b = 8

# Perform the logical operation
not a > b

True

Let's look a bit more closely at this last example. We have `a=6` and `b=8`.

The value of `a > b` is `6 > 8`, which is `False`. But the `not` operation flips this from `False` to `True`.

**Note:** When you want to invert equality, *DO NOT* do `not a == b`. We have an operation that does this in one step, called `!=`. So, you should do `a != b` instead. It's cleaner and simpler.
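To convince yourself that the two forms give the same answer, here is a short sketch using the same values as above. The result variable names are made up for this example.

```python
a = 6
b = 8

# Python reads this as not (a == b): compare first, then flip the result.
inverted_equality = not a == b

# The dedicated operator does the same thing in one step.
not_equal = a != b

print(inverted_equality)  # True
print(not_equal)          # True
```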

Now that we have a basic understanding of booleans, let's see one of their most practical uses...

## Intro to Control Flow and Loops

So far, our code has just run line-by-line. Everything we've written has run. But, we have ways of making decisions and repeating certain lines. In this section, we'll see how to do this using:

* Control Flow
* `while` Loops
* `for` Loops 

### Control Flow: the `if` Statement

Let's say you're coming to this workshop. You take the metro and get off at Peel. You get out at the corner of Metcalfe and de Maisonneuve and look around. In your head, you're thinking, `if` Metcalfe is open, I'll walk up there, otherwise (`else`), I'll go to Peel. **Congratulations!!!** You've just done control flow!

Control flow is about **making decisions** using boolean values. The important keyword here is `if`. Here is the structure of control flow in Python:

```python
if some_boolean:
    do_something
elif some_other_boolean:
    do_something_else
elif yet_another_boolean:
    do_another_something_else
...
else:
    all_else_has_failed_so_lets_do_this

some_other_code_here
```

Here's the idea: 

* If the value of `some_boolean` is `True`, then the line `do_something` runs.
* If the first branch isn't run, then we test to see if `some_other_boolean` is `True`. If it is, then we run `do_something_else`.
* We keep checking all the branches until one of the conditions is met (evaluates to `True`).
* If all conditions are `False`, then the code under `else` runs.

Here are a few things to note about the syntax:
* there is a **colon** (:) after the boolean value.
* the line `do_something` only runs if `some_boolean` evaluates to `True`.
* the line `do_something` is **indented**. In other languages, you might be used to curly brackets. Python **DOES NOT** use these. In Python, different blocks of code are indented. Also, note that in Python, we don't need to write `end` when we're done! It's enough to stop indenting.
* the line `some_other_code_here` runs *regardless* of whether `some_boolean` is `True`. We can tell because it's **not** indented.

And now, some notes about the different branches:

* The `if` branch is **required** (otherwise, there's not much of a point here...).
* There is no limit to the number of `elif` clauses you can have. You can have zero, one, or as many as you want.
* There is no requirement to add an `else` clause. You can have lots of `elif` clauses without a final `else`.
* You can have at most **one** `else` clause.

It's **SUPER IMPORTANT** to remember that **only one branch is run**. Once Python finds a condition that matches, it stops checking all the other branches.
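Here is a minimal sketch of what that means in practice. The grading thresholds and the variable names `score` and `result` are invented for this illustration. Both the first and the second conditions are `True`, but only the first matching branch runs.

```python
score = 95

if score >= 50:
    # This condition is True, so this branch runs...
    result = "pass"
elif score >= 90:
    # ...and this one is never checked, even though 95 >= 90 is also True.
    result = "excellent"
else:
    result = "fail"

print(result)  # pass
```

So the order of the branches matters: to ever see `"excellent"`, the `score >= 90` check would have to come first.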

Let's see some examples to help illustrate.

In [32]:
# Your code here

metcalfe_is_open = True
peel_is_open = True

print("I'm out of the metro...")

if metcalfe_is_open:
    print("Yay! I can walk up Metcalfe.")
elif peel_is_open:
    print("Construction, great... I'm heading to Peel.")
else:
    print("No streets are open... How am I going to do this?")

print("I'm heading up to the Education Building.")


I'm out of the metro...
Yay! I can walk up Metcalfe.
I'm heading up to the Education Building.


Try changing the variable `metcalfe_is_open` to `False` and see what happens... Try playing with both boolean values.

You may have noticed that the lines under the `if` statement are indented. That tells Python that they will only run when the `if` condition is met. The lines underneath that aren't indented tell Python that they run no matter what.

You may have also noticed that I didn't write:
```python
if metcalfe_is_open == True
```

This isn't necessary, since we already have a boolean. Putting in the extra comparison makes our code less clean. Also, just try reading the code like it's a sentence. It even sounds like a conversation:
"If Metcalfe is open, [print] I'm going up Metcalfe".

If you think about all the branches, it still seems intuitive:
* If Metcalfe is open, I'm taking Metcalfe.
* If it's not open, but Peel is open, then I'm taking Peel.
* Otherwise, find another way...

We typically won't plug in a raw boolean value. To see something more practical, let's replace the boolean with one of the comparisons we have above...

Let's write a simple program that takes the current outdoor temperature in a variable, and tells us if we're below freezing, above freezing or at freezing.

In [33]:
# Freezing point example: Your code here

current_temperature = 100

# Give an introductory message
print("We're taking the temperature...")

# Check if the temperature is below zero
if current_temperature < 0:
    print("We're below freezing!")
elif current_temperature == 0:
    print("We're at freezing!")
else:
    print("We're above freezing!")

# Give a concluding message
print("Done taking the temperature")

We're taking the temperature...
We're above freezing!
Done taking the temperature


In this example, we put an expression that evaluates to a boolean after the `if`. Try setting the value of `current_temperature` to zero, or to a value below zero, and see what happens.

### `while` loops

So, control flow is great for choosing which lines of code to run, but what if we want to run a line more than once? To do this, we can use **loops**. There are two main kinds of loops in Python:
* `while` loops
* `for` loops

They are similar, but `for` loops run a predetermined number of times, while `while` loops keep running for as long as a condition stays `True`. We'll start with `while` loops.

Syntax:

```python
    while some_boolean:
        do_some_code
    
    code_after_loop...
```

Now, you'll pretty much **NEVER** want to put a raw boolean value in the `while`. You'll instead want to use some sort of operation that returns a boolean. This operation usually involves a variable that you update in the loop. Again, notice the indent!

Sticking with our temperature theme... Let's write an example where the temperature starts at -15° and increases by 2° at every iteration until it reaches 10° or more. At each iteration, we print a message saying the current temperature and whether we are below, at or above freezing:

In [34]:
current_temperature = -15

# Your code here
while current_temperature < 10:
    # Print our message
    if current_temperature > 0:
        message = "is above zero."
    elif current_temperature == 0:
        message = "is at zero."
    else:
        message = "is below zero."
    print("Current temperature", current_temperature, message)
    
    # Increase the temperature
    current_temperature += 2

Current temperature -15 is below zero.
Current temperature -13 is below zero.
Current temperature -11 is below zero.
Current temperature -9 is below zero.
Current temperature -7 is below zero.
Current temperature -5 is below zero.
Current temperature -3 is below zero.
Current temperature -1 is below zero.
Current temperature 1 is above zero.
Current temperature 3 is above zero.
Current temperature 5 is above zero.
Current temperature 7 is above zero.
Current temperature 9 is above zero.


Try changing the increment or the starting value to see the differences in the output.

**Remember:** It is **CRITICALLY IMPORTANT** to update the variable in the loop. Otherwise, the condition will always be true and the loop will run forever.

### Iteration with `for` loops

`for` loops are a bit simpler, since they involve running for a pre-determined number of times. To use a `for` loop, we need something to iterate over. One basic iterable uses the `range` function.

The `range` function takes **up to** three arguments:
```python
    range(a,b,c)
```

The behaviour changes depending on how many arguments you give:

* `range(a)` - produce all numbers from `0` up to, but *excluding* `a`.
* `range(a, b)` - produce all numbers from `a` up to, but *excluding* `b`.
* `range(a, b, c)` - produce all numbers from `a` up to, but *excluding* `b`, incrementing by `c`.

**Note:** In the last case, the numbers can be decreasing if `b < a` and `c < 0`.
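
Here is a quick sketch of the three forms. A `range` doesn't show its numbers when printed, so this example wraps each one in `list(...)` to display them (lists are covered in Module 2). The variable names are just for illustration:

```python
one_argument = list(range(5))            # 0 up to, but excluding, 5
two_arguments = list(range(2, 6))        # 2 up to, but excluding, 6
three_arguments = list(range(0, 10, 3))  # 0 up to 10, in steps of 3
counting_down = list(range(10, 0, -2))   # 10 down to, but excluding, 0

print(one_argument)     # [0, 1, 2, 3, 4]
print(two_arguments)    # [2, 3, 4, 5]
print(three_arguments)  # [0, 3, 6, 9]
print(counting_down)    # [10, 8, 6, 4, 2]
```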

Now that we know about ranges, let's look at the `for` loop!

Here is the `for` loop syntax:
```python

    for var_name in iterable:
        some_code
    
    code_when_finished

```

At each step, the next item from our iterable is stored in `var_name`. If we have a list, then the next list item is considered. If we have a `range`, then the next number in the `range` is considered.

Let's see an example where we're calculating the squares of all numbers between 1 and 10 (excluding 10):

In [35]:
# Your code here

for i in range(1, 10):
    i_squared = i * i
    print(i, "squared is", i_squared)


1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
6 squared is 36
7 squared is 49
8 squared is 64
9 squared is 81


We can use many different types of objects in the `for` loop. We'll see examples where we can provide a `list` or a `str` instead.

#### Interrupting Loops

Sometimes, you may want to interrupt a loop early, or skip one iteration. For this, we have the keywords `break` and `continue`.

We use `break` if we want to stop going through a loop. For example, let's say we are using a `for` loop to calculate squares, but we don't want to go above 50:

In [36]:
# Your code here
for i in range(10):
    i_squared = i * i
    print(i, "squared is", i_squared)
    
    # Check if above 50
    if i_squared > 50:
        print("We're above 50! Stopping!")
        break

0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
6 squared is 36
7 squared is 49
8 squared is 64
We're above 50! Stopping!


If you have disregarded my advice and you've put a pure boolean into a `while` loop, then **make sure** that you have a `break` somewhere to get out of the infinite loop.

**Hint for later:** Think about a common biological system where this ability to interrupt might be helpful...

The `continue` keyword skips the current iteration and moves on to the next step in the loop. For an example with `continue`, let's say we have a list and we only want to compute squares of all even numbers:

In [37]:
# Here's our list
my_list = [5, 23, 4, 1, 2, 2, 6, 7, 8, 5, 3, 4, 8]

# Your code here

# Iterate over the list
for n in my_list:
    # Check if n is odd
    if n % 2 == 1:
        print("Skipping odd number...")
        continue
    n_squared = n * n
    print(n, "squared is", n_squared)


Skipping odd number...
Skipping odd number...
4 squared is 16
Skipping odd number...
2 squared is 4
2 squared is 4
6 squared is 36
Skipping odd number...
8 squared is 64
Skipping odd number...
Skipping odd number...
4 squared is 16
8 squared is 64


These two keywords can also be used in `while` loops.
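
As a minimal sketch of `break` in a `while` loop, here is the warming temperature again, this time with a raw `True` as the condition. The starting value and the limit are picked for illustration; the `break` is the only thing that stops the loop:

```python
current_temperature = -15

while True:
    # Leave the loop once we reach 10 degrees or more
    if current_temperature >= 10:
        break
    current_temperature += 2

print("The loop stopped at", current_temperature)  # The loop stopped at 11
```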

## Exercises

We have reached the end of this module!!!

Here are a few exercises to work on based on what we saw this module.

### Advanced Freezing Point Thermometer

We've seen a lot of examples of control flow using numbers. Well, we can also use strings.

For this exercise, let's update our temperature detector to be helpful for people who use Fahrenheit or Kelvin. Write code that takes the current temperature and a variable `units`. The `units` can be equal to `C`, `F` or `K`. Using these two variables and some control flow, determine whether the provided temperature is above, below or at freezing and print a message for each case.

In [38]:
# Store the current temperature and the units
current_temperature = 10
units = "F"

# Your code here for exercise with units

# Give an introductory message
print("We're taking the temperature...")

# Get the value for the freezing point
if units == "C":
    freezing_point = 0
elif units == "F":
    freezing_point = 32
elif units == "K":
    freezing_point = 273.15
else:
    print("Invalid units! Assuming Celsius!")
    freezing_point = 0

# Compare the temperature to the freezing point
if current_temperature < freezing_point:
    print("We're below freezing!")
elif current_temperature == freezing_point:
    print("We're at freezing!")
else:
    print("We're above freezing!")

# Give a concluding message
print("Done taking the temperature")

We're taking the temperature...
We're below freezing!
Done taking the temperature


### Temperature Conversions

Let's stick with the temperature theme for a bit longer (we'll get into more biological examples in the next section). In the United States, the temperature is commonly reported in Fahrenheit. But, here in Canada (and in much of the rest of the world), the temperature is recorded in Celsius. The conversion to Fahrenheit from Celsius is given by:
$$
    \text{F} = \frac{9}{5}\text{C} + 32
$$

To convert from Fahrenheit back to Celsius, we use the equation:
$$
    \text{C} = \frac{5}{9}(\text{F} - 32)
$$

(P.S. If you ever forget, here's an easy way to remember: the relationship is linear, the two scales read the same value at -40, and we know that water freezes at 32°F and 0°C and boils at 212°F and 100°C. With any two of these three points, you can definitely find the line.)
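
As a quick sanity check, we can plug these reference points into the two formulas. This is only a sketch, and the variable names are made up for the example:

```python
# Celsius to Fahrenheit at the three reference points
crossover_f = 9 / 5 * -40 + 32
freezing_f = 9 / 5 * 0 + 32
boiling_f = 9 / 5 * 100 + 32

# We expect values close to -40, 32 and 212
print(crossover_f, freezing_f, boiling_f)

# Fahrenheit back to Celsius for the boiling point
boiling_c = 5 / 9 * (212 - 32)

# We expect a value close to 100
print(boiling_c)
```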

First, let's write code to convert between the two units. Users will include an input temperature and input unit (either as `"F"` or `"C"`) and the code should convert to the other unit.

In [39]:
# Your code here

input_temperature = 60
input_units = "F"

if input_units == "C":
    output_units = "F"
    converted_temperature = 9 / 5 * input_temperature + 32
else:
    output_units = "C"
    converted_temperature = 5 / 9 * (input_temperature - 32)

print("Unit conversion complete!", input_temperature, input_units, "equals", converted_temperature, output_units)

Unit conversion complete! 60 F equals 15.555555555555557 C


Now, for another example, let's find the temperature in Fahrenheit for all Celsius temperatures from $-40^\circ \text{C}$ to $+35^\circ \text{C}$ (inclusively), incrementing by $5^\circ$.

**BONUS:** Write this code twice: once using a `for` loop and once using a `while` loop.

In [40]:
# Put your code here...

# For loop solution
print("===== FOR LOOP RESULTS ======")

for c in range(-40, 36, 5): # Notice that we have to go above 35, since 35 is excluded
    f = 9 / 5 * c + 32
    print(c, "C equals", f, "F.")

# While loop solution
print("\n\n===== WHILE LOOP RESULTS ======")

# Note that here we need to set the initial temperature
c = -40

while c <= 35:
    f = 9 / 5 * c + 32
    print(c, "C equals", f, "F.")

    # We need to explicitly increment
    c += 5

-40 C equals -40.0 F.
-35 C equals -31.0 F.
-30 C equals -22.0 F.
-25 C equals -13.0 F.
-20 C equals -4.0 F.
-15 C equals 5.0 F.
-10 C equals 14.0 F.
-5 C equals 23.0 F.
0 C equals 32.0 F.
5 C equals 41.0 F.
10 C equals 50.0 F.
15 C equals 59.0 F.
20 C equals 68.0 F.
25 C equals 77.0 F.
30 C equals 86.0 F.
35 C equals 95.0 F.


-40 C equals -40.0 F.
-35 C equals -31.0 F.
-30 C equals -22.0 F.
-25 C equals -13.0 F.
-20 C equals -4.0 F.
-15 C equals 5.0 F.
-10 C equals 14.0 F.
-5 C equals 23.0 F.
0 C equals 32.0 F.
5 C equals 41.0 F.
10 C equals 50.0 F.
15 C equals 59.0 F.
20 C equals 68.0 F.
25 C equals 77.0 F.
30 C equals 86.0 F.
35 C equals 95.0 F.


### BONUS: Replacing `for` Loops with `while` Loops

Any time that you use a `for` loop, you can actually use a `while` loop instead. It's just not always as nice and clean:

In [41]:
# Done using a `for` loop
for i in range(10):
    print("The value of i is now", i)
    # print("The operation 2 * i gives us:", 2 * i)


# Done using a `while` loop.
i = 0

while i < 10:
    print("The value of i is now", i)
    i += 1

The value of i is now 0
The value of i is now 1
The value of i is now 2
The value of i is now 3
The value of i is now 4
The value of i is now 5
The value of i is now 6
The value of i is now 7
The value of i is now 8
The value of i is now 9
The value of i is now 0
The value of i is now 1
The value of i is now 2
The value of i is now 3
The value of i is now 4
The value of i is now 5
The value of i is now 6
The value of i is now 7
The value of i is now 8
The value of i is now 9


## Module Summary

Congratulations! You've made it through the basics! In this module we've seen:

* How to *store* different *types* of data in **variables**.
* How to perform *basic mathematical operations* on **integers** and **floating-point numbers**.
* How to perform **boolean operations** and apply these to **control flow** through **`if` statements**.
* How to repeat tasks using **`for` and `while` loops**.

# Module 2 - Strings and Collections: An Object Primer

In this module, we'll take things up to a new level. We've seen how to write code that does stuff with basic data, such as numbers and booleans. We've also played a bit with strings. Now, let's go into a bit more depth on strings and collections.

Here's the outline for this section:
1. Introducing Objects
    1. What is an Object?
2. Introducing the String!
    1. String Slicing
    2. String Methods (concatenation and string formatting, converting strings to numbers, find and replace)
3. Introduction to Tuples, Lists and Dictionaries
    1. Tuples and Tuple Unpacking
    2. Lists and List Methods (adding, removing, slicing)
    3. Dictionaries (Key-Value storage, accessing, adding, removing)
4. *Exercises*

## Introducing Objects

Python is **object-oriented**... What does that mean?

Well, at a basic level, it means that everything is an *object*. Does that help? Probably not.

Let's take a step back...

### What is an Object?

An **object** is a combination of **variables** (data) and of **functions** (behaviours) that can modify the data. The data contained in an object are known as **attributes** and the functions associated with the object are known as **methods**.

All objects have a **type**, which tells you what kind of object they are. All objects of the same type have the **same** attributes (although not necessarily the same *value*) and the same available methods.

For example, let's say we're talking about cars. All cars have *attributes*, like colour, year and model. They also have *methods*, like turning on, turning off, switching gear, and activating the headlights. Your car may be red while mine may be blue, and they may be from different manufacturers and have different ages, but they're still both cars. They both *have* these attributes and they both can do the same things (methods).

For a biological example, let's say we're looking at mice for an experiment. All mice have certain basic biological parameters, such as age and sex. They may also have attributes related to your experiment, such as a specific genotype or phenotype. All these define the **attributes** of each mouse. There are also certain experimental manipulations that can be performed on the mice, such as feeding, or exercising (I don't work with mice, but bear with me...). These define the mouse **methods**.

In Python, **everything is an object**. This doesn't really do much for the primitive types that we talked about above. But, it gets more important when we start discussing more complicated types of data, like **strings** and **collections**.
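
Here's a small preview of what that means in practice. The built-in `type` function tells us what kind of object we have, and even a plain integer comes with methods. The dot syntax used to call them is explained later in this module, and the variable names here are made up for the example:

```python
my_number = 42
my_text = "mouse"

print(type(my_number))  # <class 'int'>
print(type(my_text))    # <class 'str'>

# Methods are functions that belong to the object
print(my_number.bit_length())  # 6, since 42 is 101010 in binary
print(my_text.upper())         # MOUSE
```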

## Introducing the String!

A **string** is a sequence of text characters, surrounded by quotation marks. We saw an example above when we wrote the "Hello, World!" program. We can use either single quotes or double quotes:

In [42]:
# Your code here
print("This is a string")
print('This is also a string')

This is a string
This is also a string


We can also use triple-quotes to have a longer string that has line breaks in it, like we saw above.

**Remember:** It's very very very important that you remember the quotation marks! Otherwise, Python will think you're talking about variables.

What if we want to indent, or to add a new line? Well, we can add special characters through **escape sequences**:

* `\t` - Tab, indent.
* `\n` - New line.
* `\\` - Backslash.
* `\uXXXX` - Insert the Unicode character with hexadecimal code `XXXX`.
* `\'` - Insert an apostrophe (useful if your string is defined with `'`).
* `\"` - Insert a quotation mark (useful if your string is defined with `"`).

If you look at the value of the string, you will see these escape sequences, but if you print the string, they get converted into their actual meaning:

In [43]:
# Your code here for escape sequences
my_string = "This \"string\"\nhas\n\ta\n\t\tlot\n\t\t\tof\n\t\t\t\tlines."

my_string

'This "string"\nhas\n\ta\n\t\tlot\n\t\t\tof\n\t\t\t\tlines.'

In [44]:
# Your code here to print that string
print(my_string)

This "string"
has
	a
		lot
			of
				lines.


See these pages for more information about escape sequences:

* [Python documentation](https://docs.python.org/3/reference/lexical_analysis.html#grammar-token-python-grammar-stringescapeseq)
* [W3Schools tutorial](https://www.w3schools.com/python/gloss_python_escape_characters.asp)

So, we can print strings... But what else?

There's lots of stuff that we can do with these strings. Let's discuss a few operations on strings.

The absolute most basic thing we can do is get the **length**, or number of characters, in a string. We can use the built-in `len` function to get this information, passing the string as the argument:

In [45]:
# Your code here
my_string = "I like Python!"
string_length = len(my_string)
print("The length of my string is", string_length, "characters")

The length of my string is 14 characters


**Note:** `""` defines the **empty string**. This string has no characters in it, and so it has a length of zero.

In [46]:
# Your code here to check the length of the empty string
empty_string_length = len("")

print("The length of the empty string is", empty_string_length, "characters")

The length of the empty string is 0 characters


### String Slicing

We can also access individual characters or substrings using the **bracket operator** `[]`. But first, we need to talk about **indexing**. In a Python string, every character has a numbered position. It's **extremely** important to remember that in Python, the first position is indexed with the number **0**.

Again, I'll repeat that...

***The first character in a Python string has index 0.***

So, you can also figure out that the last character in a string with *n* characters has index *n-1*, **not** *n*.

This diagram should help clarify it:

![string indexing](../assets/StringIndexingPositive.png)

Note that blank spaces are counted! To get the character at an index, stored in variable `i`, we'd write the following:

```python
character_of_interest = my_string[i]
```

To get a substring starting at index `i` and going to the character at index `j` (**excluding** that character), we write:
```python
my_substring = my_string[i:j]
```

If we omit `i`, then we get everything from the beginning up to (but **excluding**) `j`. If we omit `j`, then we get the substring starting at index `i`.

We can even take only every `k`-th character (skipping the ones in between) by adding a third number:
```python
my_substring = my_string[i:j:k]
```

Now, let's see some examples of string indexing and taking substrings. In Python, this process is commonly referred to as *slicing*.

In [47]:
my_string = "my string text"

# Your code here

# Let's look at single characters
print("The first character in the string is:", my_string[0])
print("The last character in the string is:", my_string[len(my_string) - 1])

# Now, let's look at substrings
print("The substring from index 3 to index 12 is:", my_string[3:12])

# Now, let's skip a few characters
print("The substring from 5 to the end, skipping every 2 is:", my_string[5::2])

The first character in the string is: m
The last character in the string is: t
The substring from index 3 to index 12 is: string te
The substring from 5 to the end, skipping every 2 is: rn et


Python also has a great feature where we can use **negative** indices! The last character has an index of -1 and the values go back to -n, where n is the length of the string. Here's an updated diagram:

![Negative indices](../assets/StringIndexingNegative.png)

Now, it's your turn! Let's do some string indexing with negative indices. **Note:** We *can* combine positive and negative indices.

In [48]:
# Reproduce the above strings using negative indexing where convenient
my_string = "my string text"

# Your code here

# Let's look at single characters
print("The last character in the string is:", my_string[-1])

# Now, let's look at substrings
print("The last 3 characters in the string are:", my_string[-3:])
print("The first 7 characters in the string are:", my_string[:7])

The last character in the string is: t
The last 3 characters in the string are: ext
The first 7 characters in the string are: my stri


One last note on string slicing and indexing: Strings are **immutable**, meaning that you can't change any of the individual characters or substrings. You can create a new string using existing strings, but you **cannot** change the content of a string.
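
Here's a short sketch of what immutability looks like. Trying to assign to an index raises a `TypeError`. The `try`/`except` block, which isn't covered in this workshop, is only there so that the example keeps running after the error:

```python
my_string = "my string text"

try:
    my_string[0] = "M"  # Not allowed: strings are immutable
except TypeError as error:
    print("Python says:", error)

# Instead, build a new string out of pieces of the old one
capitalized_string = "M" + my_string[1:]
print(capitalized_string)  # My string text
```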

**Note:** These indexing rules are **super important** to remember! They aren't just useful for strings. When we get to lists, we'll see these slicing tools again.

### String Operations and Methods

#### Concatenation and Formatting
A common operation on strings is **concatenation**, or combining strings. We can combine strings with the `+` sign:

In [49]:
string_1 = "Hello,"
string_2 = "World!"

# Your code here
concatenated_string = string_1 + string_2

print("Concatenated string is:", concatenated_string)

Concatenated string is: Hello,World!


This example shows something very important! Concatenation does **NOT** add in any spaces. It just takes the two strings and combines them together. If you want there to be spaces, you need to make sure to add them in!

Also, concatenation only works on **strings**! Let's look at this example:

In [50]:
string_1 = "The meaning of life, the universe and everything is "
meaning_of_life = 42

# This gives an error!
# print(string_1 + meaning_of_life)

This is very important to remember if you know JavaScript! If you uncomment the last line and run the cell, Python gives us an error! We can't concatenate an integer and a string. If we want to add the two together, we **must convert the `int` to a string** using `str`:

In [51]:
string_1 = "The meaning of life, the universe and everything is "
meaning_of_life = 42

# Your code here
complete_sentence = string_1 + str(meaning_of_life)

print(complete_sentence)

The meaning of life, the universe and everything is 42


But, there's a shortcut using **string formatting**, or **f-strings**, which let you put a variable directly into a string:
```python
my_formatted_string = f"The meaning of life, the universe and everything is... {meaning_of_life}"
```

In [52]:
# Your code here for string formatting
my_formatted_string = f"The meaning of life, the universe and everything is {meaning_of_life}."

print(my_formatted_string)

The meaning of life, the universe and everything is 42.


Notice that there is an `f` before the opening quotation mark and that the variable goes in curly braces. This tool makes life **much** easier! There are also cool ways of formatting numbers with extra zeros and spaces... but we won't see them today.

#### Converting Strings to Numbers

Let's say, you've gotten some data from a file or the internet and it contains a number. You want to do some sort of mathematical operation on it... and you rush to Python and you do this:

```python
    my_number_from_file = "32.3"

    my_answer = 3 * my_number_from_file

    print("The answer to my computation is:", my_answer)
```

What do you think will print?

In [53]:
my_number_from_file = "32.3"

# Your code here to multiply by 3
my_answer = my_number_from_file * 3

print("The answer to my computation is:", my_answer)

The answer to my computation is: 32.332.332.3


The answer may surprise you. Depending on which operation you're doing, you'll either get:
* a complete nonsense answer
* an error.
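
To see both outcomes side by side, here's a minimal sketch. Multiplying a string by an integer *repeats* the string, while adding a number to a string raises a `TypeError`. The `try`/`except` block isn't covered in this workshop; it's only there to catch the error so that the example can finish:

```python
my_number_from_file = "32.3"

# "Nonsense" answer: the text is repeated three times
repeated = my_number_from_file * 3
print(repeated)  # 32.332.332.3

# Error: Python won't add a string and a number
try:
    total = my_number_from_file + 1
except TypeError as error:
    print("Python says:", error)
```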

There's an important step that we need to do before we can do any mathematical operations: we must convert the strings to numeric types. This is very easy:
* To convert a string to a `float`, just call the `float()` function with the string as the argument.
* To convert a string to an `int`, just call the `int()` function with the string as the argument.

For example:

In [54]:
my_string_float = "32.3"
my_string_int = "41"

# Fill in the blanks to perform the type conversions
# Your code here

my_int = int(my_string_int)
my_float = float(my_string_float)

print("The product of 32.3 and 41 is:", my_float * my_int)

The product of 32.3 and 41 is: 1324.3


**Fun fact**: The `int` function can also be used on numbers that are not base-10!
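
For instance, `int` accepts an optional second argument giving the base. A quick sketch, with made-up example values:

```python
from_binary = int("1010", 2)  # Interpret the text as a base-2 number
from_hex = int("ff", 16)      # Interpret the text as a base-16 number

print(from_binary)  # 10
print(from_hex)     # 255
```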

### Finding a Substring - Intro to Methods and Objects
And now, for a string exercise! Remember that I said you can't change the contents of a string. Well, let's now create a new string that has a single character that is different. And, since this is a QLS-MiCM workshop, let's use DNA as an example.

In [55]:
dna_sequence = "AAGGACCTTAGAAGGGGACCATTATTAAATTCCCGCA"

There are more things that we can do with strings. In Python, strings are a type of **object**. An **object** is a grouping of variables, known as **attributes**, and functions, known as **methods** that all relate to one thing. String objects have various methods that we can use, or **call**, to do different things with the text contents. To call a method, we use the syntax

```python
    variable_name.method_name(arguments)
```

***This syntax will look quite familiar to anyone coming from Java or a C-based language. It may be a bit confusing for people coming from R or MATLAB. Remember, in Python, the dot `.` is NOT part of the variable name. It is an operator that lets us access functions and variables that belong to certain objects.***

Remember from earlier that **functions** may take inputs, or **arguments**, perform calculations, and then **return** outputs. Let's see a few examples of methods that we can use on strings.

For example, one method we can use on strings is `find`. Let's look at the documentation to see what this method does: https://docs.python.org/3/library/stdtypes.html#str.find

The `find` method looks for a specified substring within a whole string, or part of a string, and returns the index where it is located.

In [56]:
dna_sequence = "AAGGACCTTAGAAGGGGACCATTATTAAATTCCCGCA"

# Put in your code to find the index of the first T nucleotide
index_of_first_t = dna_sequence.find("T")

print("The first thymine nucleotide is located at index", index_of_first_t)
print(dna_sequence[index_of_first_t])

The first thymine nucleotide is located at index 7
T
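
Two more details about `find` are worth knowing, both described in the documentation linked above: it returns `-1` when the substring isn't there, and it takes an optional second argument saying where to start searching. A small sketch using the same sequence:

```python
dna_sequence = "AAGGACCTTAGAAGGGGACCATTATTAAATTCCCGCA"

# DNA has no uracil, so there is nothing to find
index_of_u = dna_sequence.find("U")
print(index_of_u)  # -1

# Start searching just after the first T (which is at index 7)
index_of_second_t = dna_sequence.find("T", 8)
print(index_of_second_t)  # 8
```

Be careful with that `-1`: used as an index, it refers to the *last* character, so it's a good idea to check for it before indexing or slicing.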


### Replacing Characters

Well, let's say we want to replace this `T` nucleotide with a `G` nucleotide. We can use another useful method: `replace`. As the name suggests, this method replaces specified characters or substrings with the provided new ones. Its documentation is [here](https://docs.python.org/3/library/stdtypes.html#str.replace).

The syntax is:
```python
    new_string = my_string.replace("old", "new", optional_count)
```

Let's go back to our DNA sequence and replace only the first `T` with `G`:

In [57]:
# Your code here
mutated_dna_sequence = dna_sequence.replace("T", "G", 1)

print("Our modified sequence is:", mutated_dna_sequence)

Our modified sequence is: AAGGACCGTAGAAGGGGACCATTATTAAATTCCCGCA


There are many more methods we can call for strings. To learn more, see the `str` reference on the Python documentation website (https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str).
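
As a small taste, here's a sketch using three other `str` methods, `count`, `lower` and `startswith`, on a short made-up sequence:

```python
short_sequence = "AAGGACCTTAG"

number_of_g = short_sequence.count("G")       # How many times "G" appears
lowercase_sequence = short_sequence.lower()   # A new, lowercase string
starts_with_aag = short_sequence.startswith("AAG")

print(number_of_g)         # 3
print(lowercase_sequence)  # aaggaccttag
print(starts_with_aag)     # True
```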

### String Iteration and the `for` Loop

Remember, earlier we saw the `for` loop. Well, we can do fun things with the `for` loop in strings! We can iterate over each character in the string.

Here's the syntax:

```python
    for c in my_string:
        do_something
```

Here, `c` is a single character in the string. Let's see an example:

In [58]:
my_dna_sequence = "ACGGACAGGAGCGAGATTTGACAGCATTA"

number_of_purines = 0
number_of_pyrimidines = 0

# Your code here
for nucleotide in my_dna_sequence:
    if nucleotide == "C" or nucleotide == "T":
        number_of_pyrimidines += 1
    elif nucleotide == "A" or nucleotide == "G":
        number_of_purines += 1
    else:
        print("Invalid nucleotide! Skipping!")

print(f"In our sequence there are {number_of_purines} purines and {number_of_pyrimidines} pyrimidines.")

In our sequence there are 19 purines and 10 pyrimidines.


There's actually an easy way to clean up our boolean conditions. Instead of using string equality, we can check if the nucleotide is contained in a string using the `in` keyword:

In [59]:
my_dna_sequence = "ACGGACAGGAGCGAGATTTGACAGCATTA"

number_of_purines = 0
number_of_pyrimidines = 0

for nucleotide in my_dna_sequence:
    # Your code here to simplify
    if nucleotide in "CT":
        number_of_pyrimidines += 1
    elif nucleotide in "AG":
        number_of_purines += 1
    else:
        print("Invalid nucleotide! Skipping!")

print(f"In our sequence there are {number_of_purines} purines and {number_of_pyrimidines} pyrimidines.")

In our sequence there are 19 purines and 10 pyrimidines.
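
One caveat with this shortcut: for strings, `in` checks whether the left side is a *substring* of the right side, not just a single character. That's fine here because the loop gives us one character at a time, but it's worth keeping in mind. A quick sketch:

```python
single_match = "C" in "CT"     # "C" is one of the characters
single_miss = "A" in "CT"      # "A" does not appear
whole_string = "CT" in "CT"    # A string is a substring of itself
empty_string = "" in "CT"      # The empty string is "in" every string

print(single_match)  # True
print(single_miss)   # False
print(whole_string)  # True
print(empty_string)  # True
```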


### String Summary

We've reached the end of the section on strings. Here are the main points:
* Strings represent **text** in Python.
* We can use **slicing** to access individual characters or substrings.
* Strings are **objects** and we can use string **methods** to create new, modified strings.

Now that we've seen strings, let's look at some collection types!

## Collection Types - Introduction to Tuples, Lists and Dictionaries

We've seen that we can store data in basic types, like strings, `int`s and `float`s. But, let's say we want to store many of these at a time. For example, let's say we have 100 DNA sequences that we want to store and process. Well, for this we have **collection types**. In this section, we'll see three important collection types:
* Tuples
* Lists
* Dictionaries

For more information on tuples and lists, see [this page](https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range) of the Python documentation. For more info about dictionaries, see [here](https://docs.python.org/3/library/stdtypes.html#mapping-types-dict).

### Tuples

A tuple is a way of packaging a fixed number of values together. The number of values can't be changed, and neither can the values themselves. Tuples are **immutable**, like strings. Remember, though, we can always assign a new tuple to the same variable. Tuples are represented using multiple values separated by commas within round brackets (parentheses) -- `()`.

In [60]:
# Your code here
my_tuple = ("Error", 404)

my_tuple

('Error', 404)

#### Accessing Elements
There are two different ways to access individual elements in a tuple:
* Slicing
* Unpacking

When working with tuples, **slicing** works the *exact same way* that it did with strings, described above. And since it works the same way, we won't do an example right here.

But, even without an example, I'll give you my usual reminder... ***INDEXING STARTS AT ZERO***.

#### Tuple Unpacking
**Unpacking** is a different process. Let's say we have a tuple with 2 elements in it. We can assign each one of these elements to a variable, like this:

In [61]:
my_point = (-3, 5)

# Your code here to assign x and y
x, y = my_point

print("The value of my point is:", my_point)
print("The value of x is:", x)
print("The value of y is:", y)

The value of my point is: (-3, 5)
The value of x is: -3
The value of y is: 5


**NOTE:** You **MUST** have the same number of variables as there are elements in the tuple. Otherwise, unpacking won't work and you'll get an error from Python.
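To see what that error looks like, here's a small sketch. The variable names are made up for illustration, and the `try`/`except` wrapper (which we haven't covered) is only there so that the code keeps running after the error.

```python
# Hypothetical example: a tuple with 3 elements
my_3d_point = (1, 2, 3)

try:
    # Only 2 variables for 3 elements, so unpacking fails
    first, second = my_3d_point
    unpacking_worked = True
except ValueError as error:
    unpacking_worked = False
    print("Unpacking failed:", error)

# With the right number of variables, unpacking works
px, py, pz = my_3d_point
print("The three coordinates are:", px, py, pz)
```

Python raises a `ValueError` in the first case because the number of variables doesn't match the number of elements.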

Finally, like with strings, we can concatenate tuples using the `+` operation.

In [62]:
my_tuple = (4, 5, "Hello", "World!", 12, True, 4.5)

# Your code here to concatenate tuples
my_combined_tuples = my_tuple + my_point

my_combined_tuples

(4, 5, 'Hello', 'World!', 12, True, 4.5, -3, 5)

#### Tuple Summary

Here are the main take-aways about tuples:
* Tuples hold a **small number** of values.
* The values can have different types.
* We can use **unpacking** to extract the elements.
* We **can't change** the elements.

### Lists and List Methods

Lists are more exciting than tuples. Lists are **mutable**! So, we can add entries to a list, remove entries from a list, and change the entries in a list. Lists are represented as comma-separated values in square brackets -- `[]`. Like tuples, lists *can* be unpacked, but since their length can change, we usually don't. Lists also *usually* contain elements of the same or similar type (although they don't have to).

In [63]:
# Your code here for an example list
my_list = ["The", "quick", "brown", "fox"]

my_list

['The', 'quick', 'brown', 'fox']

**Note:** While you *can* put elements of different types in a list, ask yourself whether in a given scenario you *should* and make sure that your code is prepared to handle the different types of data.

Now, I've told you all these great things that we can do with lists... but how do we do them?

#### Length of a List
Well, let's start with the simplest thing... taking the **length** of a list. We do this in the exact same way that we took the length of a string! We use the `len` function.

In [64]:
# Your code here
my_squares = [1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]
print("My squares list has length:", len(my_squares))

My squares list has length: 11


#### List Slicing

We can obtain individual items and sublists through *slicing*, exactly the same way that we did with strings and tuples.

And, in case you forgot, indexing starts at zero 😉.

Here's an exercise to test your skills with this...

I'm giving you this list: `[1, 1, 2, 3, 5, 8, 13, 21, 34]`

Using slicing, find:
* the last element
* the values `3, 5, 8`
* the values `1, 2, 5, 13`

In [65]:
my_list = [1, 1, 2, 3, 5, 8, 13, 21, 34]

# Your code here
print("The last element in the list is:", my_list[-1])
print("The sublist is:", my_list[3:6])
print("The sublist is:", my_list[0:-1:2])

The last element in the list is: 34
The sublist is: [3, 5, 8]
The sublist is: [1, 2, 5, 13]


But, there's more that we can do with slicing! Since lists are mutable, we can update values using the `=` sign! We can do this for both individual elements and for sublists!

Let's take this example: `[1, 2, 4, 9, 16, 32, 64, 129, 257]`

Any idea what this sequence is? There are three mistakes that we need to correct!

So... Where are the mistakes? How do we correct them?

In [66]:
# Here is our error-filled list:
powers_of_two = [1, 2, 4, 9, 16, 32, 64, 129, 257]

# Your code here to correct
powers_of_two[3] = 8
powers_of_two[-2:] = [128, 256]


print("The corrected list is:", powers_of_two)

The corrected list is: [1, 2, 4, 8, 16, 32, 64, 128, 256]


So, we can replace single elements by assigning a new value, or an entire sub-range by assigning a list!
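One detail worth knowing: the replacement list doesn't have to be the same length as the range it replaces. Here's a small sketch with a made-up list (the name `countdown` is just for illustration):

```python
countdown = [5, 4, 0, 0, 0, 1]

# Replace three elements (indices 2, 3 and 4) with two new ones
countdown[2:5] = [3, 2]

print("The corrected countdown is:", countdown)
print("It now has length:", len(countdown))
```

The list shrinks from 6 elements to 5, because we swapped three elements for two.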

#### Adding Elements

Now for the fun part! Let's insert new items! Remember that the list is **mutable**, so when we add new items, we are actually *changing* the list. We are **not** creating a new list and we are **not** creating a new variable (there's no `=` sign). To change the list, we use **methods** from the list object.

Let's start with adding a new item at the **end** of the list. This process is known as *appending* to a list. So, naturally, the method to do this is called `append`:

In [67]:
# Example using our powers of two
# Your code here to continue the list
powers_of_two.append(512)

print("Powers of two is now:", powers_of_two)

Powers of two is now: [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]


We can also insert at any index `i` using the method called... [`insert`](https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range:~:text=(6)-,s.insert(i%2C%20x),-inserts%20x%20into)! This method takes **two** arguments: the index `i` *before which* we want to insert the new element and the new element that we want to insert:

```python
my_list.insert(i, new_element)
```

***NOTE:*** You must respect this order of arguments.

Here's an example:

In [68]:
days_of_the_week = ["Sunday", "Tuesday", "Wednesday", "Thursday", "Saturday"]

# Your code here to add Monday in the correct spot
days_of_the_week.insert(1, "Monday")

# Your code here to add Friday in the right spot (hint: negative indexing)
days_of_the_week.insert(-1, "Friday")


print(f"The {len(days_of_the_week)} days of the week are: {days_of_the_week}")

The 7 days of the week are: ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']


How can we learn more about these methods? We can check out the [documentation](https://docs.python.org/3/library/stdtypes.html#list). We can also see other methods, like `index`, which we can use to find the position of an element.
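As a quick sketch of `index`, here's how we could look up a position in the completed list from above. The list is written out again so that the example stands on its own.

```python
days_of_the_week = ["Sunday", "Monday", "Tuesday", "Wednesday",
                    "Thursday", "Friday", "Saturday"]

# index returns the position of the first matching element
wednesday_index = days_of_the_week.index("Wednesday")

print("Wednesday is at index", wednesday_index)
```

Keep in mind that `index` raises a `ValueError` if the element isn't in the list.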

#### Removing Elements

Sometimes, we want to delete elements from a list. There are a few ways to do this:
- using the `del` keyword
- using an assignment
- using the `pop` method
- using the `clear` method

Here are the details:
* The `del` keyword can be used to get rid of single elements or a range. `del` is **not** a function, so we **don't** use brackets. 
* To remove a range, we can alternatively just use slicing and assign an empty list to the desired range (see [here](https://docs.python.org/3/library/stdtypes.html#mutable-sequence-types)).
* We can use the `pop` method without an argument to remove the last item from a list, or with an index as argument to remove the item at index `i`. The `pop` method returns the removed element, so it can be stored in a variable.
* We can use the `clear` method to remove **all** items from a list.

In [69]:
test_list = [2, 3, 5, 7, 9, 11, 13, 17, 19, 23]

# Your code here to remove the number which doesn't belong using `pop`...
my_index = test_list.index(9)
removed_element = test_list.pop(my_index)


print("Test list is now:", test_list, "since we removed item:", removed_element)

Test list is now: [2, 3, 5, 7, 11, 13, 17, 19, 23] since we removed item: 9


In [70]:
test_list_2 = [1, 2, 2, 1, 2, 3, 1, 2, 2, 1, 1, 2, 2]

# Your code here to remove the numbers that disrupt the pattern using assignment.
test_list_2[4:6] = []

print("Test list 2 is now:", test_list_2)

Test list 2 is now: [1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2]


In [71]:
# Your code here to remove all elements using `clear`
test_list_2.clear()

print("The test list 2 is now:", test_list_2)

The test list 2 is now: []


**Extra:** Using `del` to delete elements. Remember *not* to put brackets.

In [72]:
test_list = [2, 3, 5, 7, 9, 11, 13, 17, 19, 23]

# Your code here

# Get index of 9
my_index = test_list.index(9)

# Remove the number which doesn't belong using `del`...
del test_list[my_index]

print("Test list is now:", test_list)

Test list is now: [2, 3, 5, 7, 11, 13, 17, 19, 23]


In [73]:
test_list_2 = [1, 2, 2, 1, 2, 3, 1, 2, 2, 1, 1, 2, 2]

# Your code here to remove the numbers that disrupt the pattern using `del`.
del test_list_2[4:6]


print("Test list 2 is now:", test_list_2)

Test list 2 is now: [1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2]


#### List Concatenation

One last operation: lists can be concatenated using the `+` operator like strings and tuples. Remember that **both** the left and the right must be lists! You can't add a number to a list by concatenation! If you want to add a new element without appending, you must first embed it in a list.

In [74]:
list_a = [1, 4, 6]

# Your code here to add 3 to the end of list_a to create list_b
list_b = list_a + [3]

print("Modified list a is:", list_b)

Modified list a is: [1, 4, 6, 3]


**Remember,** concatenation makes a new list! After concatenating our two lists, the original lists are not modified!
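Here's a small sketch contrasting the two approaches. The variable names are made up so that we don't overwrite `list_a` and `list_b` from above.

```python
original = [1, 4, 6]

# Concatenation builds a brand new list...
combined = original + [3]

print("original is still:", original)
print("combined is:", combined)

# ...while append changes the list itself
appended = [1, 4, 6]
appended.append(3)

print("appended is now:", appended)
```

After concatenation, `original` still has three elements. After `append`, the list we called the method on has four.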

#### List Iteration

Remember how we went through each character in a string? Well, we can do the exact same thing with a list!

```python
    for item in my_list:
        do_something...
```

We saw examples of this earlier, so I'll skip the simple examples. Let's see something a bit more complicated...

Let's say we want to loop through the list **and** get the index of the element... Well, we can use the `enumerate` function. On each pass through the loop, it gives us a **tuple** containing the index and the item from the list.

**Note:** In the `for` loop, we can **immediately unpack** the tuple!

In [75]:
my_list = [2, 4, 6, 5, 8, 7, 1, 3, 5, 7, 8, 9, 10, 22, 11, 95]

number_of_even = 0
number_of_odd = 0

last_even_index = -1
last_odd_index = -1

# Your code here to extract the number of odd and even and get the final indices of each
for i, n in enumerate(my_list):
    if n % 2 == 0:
        number_of_even += 1
        last_even_index = i
    else:
        number_of_odd += 1
        last_odd_index = i


print("Our list has", number_of_even, "even numbers and", number_of_odd, "odd numbers.")
print("The last even number was at index", last_even_index, "and the last odd number was at index", last_odd_index)


Our list has 7 even numbers and 9 odd numbers.
The last even number was at index 13 and the last odd number was at index 15


#### List Summary

That's all we'll discuss for lists. Here are the main takeaways:

* A list contains an **unfixed number** of values, *typically* of the **same** type.
* We can access individual elements through **slicing** and **indexing**.
* Lists are **mutable**, so we can **add** and **remove** elements easily.
* We can **iterate** over lists and use **`enumerate`** to get the index of each element.

Now that we're done with our first look at lists, let's look at our last major collection type: the **dictionary**.

### Dictionaries

So... How many of you can remember using a paper dictionary? What's the idea behind them?

#### Key-Value Storage

Well, we're not going to be defining words... but think about the **structure** of a dictionary. You look up a word and you get an associated valuable piece of information, a definition. Let's call the word a **key** and the associated information a **value**. A **dictionary** is a collection that stores **Key-Value** pairs.

Now, for the syntax... Well, tuples involved round brackets, and lists involved square brackets... so it's only natural that the syntax for dictionaries uses curly brackets, or brace brackets `{}`. But, there's another twist here. 

We need both keys and values! The **values** can be any type, but the **keys** must be **immutable**. So, the keys can be numbers, tuples or strings (or booleans, I guess, but that may not be useful), but they **cannot** be lists. In addition, keys **cannot** be duplicated, but values can. If you try to duplicate a key, only the last value assigned to it is kept.
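To make these rules concrete, here's a short sketch. All of the dictionaries are made up for illustration, and the `try`/`except` wrapper is only there so that the error doesn't stop the code.

```python
# A tuple works as a key because it is immutable
grid_labels = {(0, 0): "origin", (1, 0): "right"}
print("The label at (0, 0) is:", grid_labels[(0, 0)])

# A repeated key: the later value replaces the earlier one
repeated_key_counts = {"confocal": 36, "confocal": 39}
print("Dictionary with a repeated key:", repeated_key_counts)

# A list can't be a key, so Python raises a TypeError
try:
    bad_dictionary = {[0, 0]: "origin"}
    list_key_worked = True
except TypeError as error:
    list_key_worked = False
    print("Can't use a list as a key:", error)
```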

Let's make a simple dictionary with keys for `microCT`, `FIB-SEM`, `confocal`, `STORM` and `cryoTEM`:

In [76]:
# Your code here: dictionary example for image_counts

image_counts = {"microCT": 12, "FIB-SEM": 5, "confocal": 36, "STORM": 6, "cryoTEM": 2}

image_counts

{'microCT': 12, 'FIB-SEM': 5, 'confocal': 36, 'STORM': 6, 'cryoTEM': 2}

Now, there are lots of operations that we can do on dictionaries!

#### Accessing and Modifying Dictionary Entries

Recall that in strings, tuples and lists we used the square brackets `[]` for indexing. We're still going to use them here, but instead of using a *numeric* index, we put a key in the brackets instead. We can then perform our usual operations of retrieving and replacing values.

In [77]:
# Your code here to access the number of microCT scans and store it in micro_ct_scans
micro_ct_scans = image_counts["microCT"]

print(f"We have {micro_ct_scans} microCT scans in our database!")

# Your code here to modify the number of confocal images
image_counts["confocal"] = 39

print("Imaging database now has the following datasets:", image_counts)


We have 12 microCT scans in our database!
Imaging database now has the following datasets: {'microCT': 12, 'FIB-SEM': 5, 'confocal': 39, 'STORM': 6, 'cryoTEM': 2}


**Note:** Remember, unless your keys are numbers, you **cannot** use numerical indexing to select elements. You **must** use a valid key.

**Warning:** You **cannot** do slicing on a dictionary!
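So, what happens if we use a key that isn't there? Here's a sketch using a small stand-in dictionary (named `example_counts` so that we don't overwrite `image_counts`). It uses the `in` operator to check for a key, and a `try`/`except` wrapper to catch the error.

```python
example_counts = {"microCT": 12, "FIB-SEM": 5, "confocal": 39}

# The `in` operator checks whether a key exists
has_confocal = "confocal" in example_counts
has_mri = "MRI" in example_counts

print("Is there a confocal key?", has_confocal)
print("Is there an MRI key?", has_mri)

# Asking for a missing key raises a KeyError
try:
    mri_scans = example_counts["MRI"]
    lookup_worked = True
except KeyError:
    lookup_worked = False
    print("There is no 'MRI' key in the dictionary!")
```

The same `KeyError` appears if we try a numerical index like `example_counts[0]`, since `0` isn't one of the keys.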

#### Adding Keys

Adding new elements to a dictionary is easy! We just need the new key and the new value, and then we write:
```python
    my_dictionary[new_key] = new_value
```

For example:

In [78]:
# Your code here to add TEM to our imaging database
image_counts["TEM"] = 10

print(f"Our imaging database now has the following datasets available: {image_counts}")

Our imaging database now has the following datasets available: {'microCT': 12, 'FIB-SEM': 5, 'confocal': 39, 'STORM': 6, 'cryoTEM': 2, 'TEM': 10}


#### Removing Entries

To remove an entry, we can again use the `del` keyword, or we can use `pop`. Like with lists, `pop` returns the value that we removed, so we can store it in a variable.
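Here's a quick sketch of the `del` version, using a stand-in dictionary (named `example_counts` so that `image_counts` stays untouched):

```python
example_counts = {"microCT": 12, "FIB-SEM": 5, "confocal": 39, "STORM": 6}

# del removes the entry, but doesn't give us the removed value back
del example_counts["STORM"]

print("After del, the dictionary is:", example_counts)
```

If we need the removed value afterwards, `pop` is the better choice.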

In [79]:
# Your code here to remove the STORM datasets and store them in a variable storm_datasets
storm_datasets = image_counts.pop("STORM")

print("The number of STORM datasets was:", storm_datasets)

print("Our dictionary is now:", image_counts)

The number of STORM datasets was: 6
Our dictionary is now: {'microCT': 12, 'FIB-SEM': 5, 'confocal': 39, 'cryoTEM': 2, 'TEM': 10}


#### Other Operations

Much of the expected behaviour of dictionaries is similar to lists. There are a few methods that are exclusively used by dictionaries:
* The `keys` method returns the keys in the dictionary.
* The `values` method returns the values in the dictionary.
* The `items` method returns tuples containing `(key, value)` pairs.
* The `update` method can be used for combining dictionaries (Concatenation doesn't work!). **This method updates the current dictionary and does not produce a new one!**

In [80]:
# Your code here
my_keys = image_counts.keys()
my_values = image_counts.values()
my_items = image_counts.items()

print("The keys are:", my_keys)
print("The values are:", my_values)
print("The items are:", my_items)

The keys are: dict_keys(['microCT', 'FIB-SEM', 'confocal', 'cryoTEM', 'TEM'])
The values are: dict_values([12, 5, 39, 2, 10])
The items are: dict_items([('microCT', 12), ('FIB-SEM', 5), ('confocal', 39), ('cryoTEM', 2), ('TEM', 10)])


In [81]:
new_datasets = {
    "synchrotron": 3,
    "STEM": 4,
}

# Your code here to update the dictionary
image_counts.update(new_datasets)

print("Imaging catalogue now has data:", image_counts)

Imaging catalogue now has data: {'microCT': 12, 'FIB-SEM': 5, 'confocal': 39, 'cryoTEM': 2, 'TEM': 10, 'synchrotron': 3, 'STEM': 4}


#### Dictionary Iteration

To do things with all data stored in the dictionary, we don't usually iterate over indices. Instead, we can iterate over the keys, or the values, or the `items` which contain both. To iterate over the keys, we can just do the following:

```python
for k in my_dictionary:
    do_something
```

To iterate over the keys, values or item tuples explicitly, just put the appropriate method call in the `for` loop. For example:

```python
for v in my_dictionary.values():
    do_something
```
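When we iterate over `items`, we can unpack each `(key, value)` tuple right in the `for` loop, just like we did with `enumerate`. Here's a sketch with a small stand-in dictionary (`example_counts` and `summary_lines` are made-up names):

```python
example_counts = {"microCT": 12, "FIB-SEM": 5, "confocal": 36}

summary_lines = []

# Each item is a (key, value) tuple, which we unpack immediately
for modality, count in example_counts.items():
    line = f"{modality}: {count} datasets"
    summary_lines.append(line)
    print(line)
```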

As an example, let's find the average of our imaging catalogue counts from above:

In [82]:
image_counts = {
    "microCT": 12,
    "FIB-SEM": 5,
    "confocal": 36,
    "STORM": 6,
    "cryoTEM": 2
}

# Your code here to compute the average number of datasets for the modalities and store it in average_count
number_of_datasets = 0

for modality in image_counts:
    n = image_counts[modality]
    number_of_datasets += n

average_count = number_of_datasets / len(image_counts)

print("The average number of image datasets is", average_count)

The average number of image datasets is 12.2


#### Dictionary Summary

We've reached the end of the dictionary section. Here are the highlights:

* Dictionaries store **key-value** pairs.
* Instead of using a numerical index, a **key** must be used to look up a value.
* Dictionaries are **mutable**, so we can easily add and remove keys, and change the values associated with each key.

Now that we're done covering our basic collection types, let's do some exercises!

## Exercises: Working with Strings and Collections for DNA and Protein Processing

Earlier, we were looking at temperatures. Now, let's do some more biological exercises. DNA, RNA and proteins can be easily represented using Python collections. In these exercises, we're going to implement the fundamental gene expression steps: **transcription** and **translation**.

DNA and RNA are both **nucleic acids** that are composed of sequences of **nucleotides**. We won't go into the chemical details here (there are plenty of biology textbooks and Wikipedia pages for that), but here are the important ideas:

* DNA is composed of **adenine** (`A`), **thymine** (`T`), **guanine** (`G`) and **cytosine** (`C`).
  * `A` and `G` are known as **purines** and `T` and `C` are known as **pyrimidines**.
* DNA is double-stranded. Each strand consists of a sequence of nucleotides. These two strands interact with each other through **base pairing**. The rules for base pairing are:
  * `A` always pairs with `T`.
  * `C` always pairs with `G`.
* DNA and RNA share *most* of their nucleotides, but they differ in one of the pyrimidines. DNA has thymine `T` while RNA has uracil `U`.
* RNA is single-stranded.

DNA and RNA are (of course) more complicated, but these are the basics that will be helpful in these exercises.

### Transcription

**Transcription** is the process by which messenger RNA (mRNA) is produced based on DNA. Recall that DNA is **double-stranded**. One strand serves as the **template** for the mRNA, and the new nucleotides forming the mRNA base-pair with this template strand. To obtain an mRNA sequence based on a DNA sequence, we have two possibilities:

* If we are considering the *template* strand, we go backwards along the sequence, base-pairing each nucleotide to build up a sequence.
* If we are considering the *non-template* strand, we go along the sequence, replacing each `T` with a `U`.

Let's consider the following sequence:

```
AGCAGATGCATTAGCCATTAGTTTGCACCAGTATATGCAGAGTTTAGGAGACCATAATTAACGAGAGCCGATAGCTAGA
```

1. Assume that this is the **non-template strand**. Transcribe this sequence into mRNA.

In [83]:
dna_sequence = "AGCAGATGCATTAGCCATTAGTTTGCACCAGTATATGCAGAGTTTAGGAGACCATAATTAACGAGAGCCGATAGCTAGA"

# Put your code here
rna_sequence = dna_sequence.replace("T", "U")

rna_sequence

'AGCAGAUGCAUUAGCCAUUAGUUUGCACCAGUAUAUGCAGAGUUUAGGAGACCAUAAUUAACGAGAGCCGAUAGCUAGA'

2. Now, let's assume it is the **template strand**. Perform the transcription.

In [84]:
dna_sequence = "AGCAGATGCATTAGCCATTAGTTTGCACCAGTATATGCAGAGTTTAGGAGACCATAATTAACGAGAGCCGATAGCTAGA"

# Put your code here

# Solution 1
reversed_strand = dna_sequence[::-1]

print(reversed_strand)
print("-"*len(reversed_strand))

rna_sequence = ""

for nucleotide in reversed_strand:
    if nucleotide == "A":
        rna_sequence += "U"
    elif nucleotide == "T":
        rna_sequence += "A"
    elif nucleotide == "C":
        rna_sequence += "G"
    elif nucleotide == "G":
        rna_sequence += "C"

# Solution 2
rna_sequence = ""

for i in range(len(dna_sequence) - 1, -1, -1):
    nucleotide = dna_sequence[i]
    if nucleotide == "A":
        rna_sequence += "U"
    elif nucleotide == "T":
        rna_sequence += "A"
    elif nucleotide == "C":
        rna_sequence += "G"
    elif nucleotide == "G":
        rna_sequence += "C"

print(rna_sequence)

# Solution 3
pairings = {"A": "U", "T": "A", "C": "G", "G": "C"}

rna_sequence = ""

for i in range(len(dna_sequence) - 1, -1, -1):
    nucleotide = dna_sequence[i]
    rna_sequence += pairings[nucleotide]

print(rna_sequence)

# Solution 4
pairings = {"A": "U", "T": "A", "C": "G", "G": "C"}

rna_sequence = "".join([pairings[dna_sequence[i]] for i in range(len(dna_sequence) - 1, -1, -1)])

print(rna_sequence)

# Solution 5
pairings = {"A": "U", "T": "A", "C": "G", "G": "C"}

rna_sequence = "".join([pairings[nt] for nt in reversed(dna_sequence)])

print(rna_sequence)


AGATCGATAGCCGAGAGCAATTAATACCAGAGGATTTGAGACGTATATGACCACGTTTGATTACCGATTACGTAGACGA
-------------------------------------------------------------------------------
UCUAGCUAUCGGCUCUCGUUAAUUAUGGUCUCCUAAACUCUGCAUAUACUGGUGCAAACUAAUGGCUAAUGCAUCUGCU
UCUAGCUAUCGGCUCUCGUUAAUUAUGGUCUCCUAAACUCUGCAUAUACUGGUGCAAACUAAUGGCUAAUGCAUCUGCU
UCUAGCUAUCGGCUCUCGUUAAUUAUGGUCUCCUAAACUCUGCAUAUACUGGUGCAAACUAAUGGCUAAUGCAUCUGCU
UCUAGCUAUCGGCUCUCGUUAAUUAUGGUCUCCUAAACUCUGCAUAUACUGGUGCAAACUAAUGGCUAAUGCAUCUGCU


### Translation (Part I)

After the DNA is transcribed into mRNA, the mRNA travels to the ribosomes, where it is translated into an amino acid sequence. This translation occurs by **codons** of 3 nucleotides each. But, remember, we need to look for a **start codon** `AUG`.

I've given you an mRNA sequence. To prepare for translation, convert the mRNA sequence into a list of codons **starting with the start codon**. Print the number of codons you've found.

In [85]:
my_rna = "AGCAGCAUGACCGAGUCAGUCAGCUUGCGGCUACGUACUGGCCAUUAGCAGUACAGU"

# Your code here

In [86]:
my_rna = "AGCAGCAUGACCGAGUCAGUCAGCUUGCGGCUACGUACUGGCCAUUAGCAGUACAGU"

# Your code here

# Here are a few hints ...

# 1. Create an empty codon list
my_codons = []

# 2. Find the start codon
start_codon_index = my_rna.find("AUG")

# 3. Iterate over the string
for i in range(start_codon_index, len(my_rna) - 2, 3):
    # 4. Get the codon...
    new_codon = my_rna[i: i + 3]

    # 5. Add codon to list
    my_codons.append(new_codon)
    

print("We found", len(my_codons), "codons")
print(my_codons)

We found 17 codons
['AUG', 'ACC', 'GAG', 'UCA', 'GUC', 'AGC', 'UUG', 'CGG', 'CUA', 'CGU', 'ACU', 'GGC', 'CAU', 'UAG', 'CAG', 'UAC', 'AGU']


### Translation (Part II)

Now that we have a list of codons, we can convert them to amino acids using a codon table. To make things a bit more interesting, I've given you the inverse codon table from https://en.wikipedia.org/wiki/DNA_and_RNA_codon_tables. This table has the amino acids as keys and the list of corresponding codons as the values.

**Hint:** As a first step, you may want to create the forward dictionary, with the codons as keys and the amino acid as value. This step isn't *necessary* but it will make your code more efficient (and look nicer).

**Recall:** Your list of codons from the mRNA sequence earlier should still be in the variable `my_codons`.

In [87]:
amino_acid_to_codon_table = {
    "F": ["UUU", "UUC"],
    "L": ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"],
    "I": ["AUU", "AUC", "AUA"],
    "M": ["AUG"],
    "V": ["GUU", "GUC", "GUA", "GUG"],
    "S": ["UCU", "UCC", "UCA", "UCG", "AGU", "AGC"],
    "P": ["CCU", "CCC", "CCA", "CCG"],
    "T": ["ACU", "ACC", "ACA", "ACG"],
    "A": ["GCU", "GCC", "GCA", "GCG"],
    "Y": ["UAU", "UAC"],
    "STOP": ["UAA", "UAG", "UGA"],
    "H": ["CAU", "CAC"],
    "Q": ["CAA", "CAG"],
    "N": ["AAU", "AAC"],
    "K": ["AAA", "AAG"],
    "D": ["GAU", "GAC"],
    "E": ["GAA", "GAG"],
    "C": ["UGU", "UGC"],
    "W": ["UGG"],
    "R": ["CGU", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "G": ["GGU", "GGC", "GGA", "GGG"]
}
# Your code here

# Start by creating a new dictionary where the codons are the keys
forward_codon_table = {} # This creates an empty dictionary

# Use iteration to create the opposite table: Codons to Amino Acids
for amino_acid in amino_acid_to_codon_table:
    for codon in amino_acid_to_codon_table[amino_acid]:
        forward_codon_table[codon] = amino_acid

# Perform the translation on the provided codon list:
my_codons = ['AUG', 'ACC', 'GAG', 'UCA', 'GUC', 'AGC', 'UUG', 'CGG',
          'CUA', 'CGU', 'ACU', 'GGC', 'CAU', 'UAG', 'CAG', 'UAC', 'AGU']

my_protein = ""

for codon in my_codons:
    # Get the corresponding amino acid for the codon
    new_amino_acid = forward_codon_table[codon]
    
    # Check if it is the stop codon
    if new_amino_acid == "STOP":
        print("STOP CODON!")
        break # End the loop

    # Add the new amino acid to the protein
    my_protein += new_amino_acid
    print(f"Added new amino acid {new_amino_acid} for codon {codon}!")

print("Our protein has amino acid sequence:", my_protein)

Added new amino acid M for codon AUG!
Added new amino acid T for codon ACC!
Added new amino acid E for codon GAG!
Added new amino acid S for codon UCA!
Added new amino acid V for codon GUC!
Added new amino acid S for codon AGC!
Added new amino acid L for codon UUG!
Added new amino acid R for codon CGG!
Added new amino acid L for codon CUA!
Added new amino acid R for codon CGU!
Added new amino acid T for codon ACU!
Added new amino acid G for codon GGC!
Added new amino acid H for codon CAU!
STOP CODON!
Our protein has amino acid sequence: MTESVSLRLRTGH


## Module Summary

Yay! We've made it through our second content module! Here, we've explored the basics of strings and collection types. Here are the main points that we saw:

* An **object** groups together variables, known as **attributes**, and functions, known as **methods**.
* A **string** represents *text* in Python. We can use **slicing** to access its elements. We can also perform operations, like **concatenation and string formatting**, and use **methods** to get extra info about a string or create modified versions of it.
* A **tuple** represents a *small number of objects grouped together*. To access elements, we can either use slicing, or we can **unpack** its contents into the corresponding number of variables. Tuples can't be modified.
* A **list** represents a *variable-length collection* of objects. We can add or remove objects from the list using **list methods**, such as `append`, `insert` and `pop`. We can also iterate over all elements of a list using a `for` loop.
* A **dictionary** represents *key-value storage*. Instead of having a numeric index, we access **values** using a **key**. We can add or remove elements using keys and we can modify the dictionary using **dictionary methods**, such as `pop` and `update`. We can also use the `keys`, `values` and `items` methods to get different pieces of information.
* All of these are **objects**, which means that they store information and have functions, or **methods** associated with them.
* We can iterate over all these types of objects to process individual elements.

For more information about any of these objects, check out the official Python documentation. There's a lot of detail about each type:
* Strings: https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str
* Tuples: https://docs.python.org/3/library/stdtypes.html#tuple
* Lists: https://docs.python.org/3/library/stdtypes.html#list
* Dictionaries: https://docs.python.org/3/library/stdtypes.html#mapping-types-dict

Finally, there's another collection type that I didn't discuss, called a *set*. If you want to learn about it, check out this page: https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset.

# Module 3 - Introduction to Functions

In this module, we'll explore functions. By now, we've used existing functions, like `abs` or `round` or, of course, `print`, as well as methods like `str.replace`. Here, we'll see how to *define* new functions.

Here's the outline for this module:

1. Function Overview
    1. What is a function?
2. Writing Custom Functions
    1. Basic function definitions
    2. Passing inputs: Defining parameters
    3. Producing outputs: Return values
3. Documenting Functions
    1. Defining function docstrings
4. *Exercises*


## Function Overview

We saw functions earlier when we were getting a flavour for Python. Let's do a quick recap.

### What is a Function?

We can think of functions as **machines** that take in **inputs**, run code (do calculations, magic or a bit of both), and then produce an **output** that can be used.

The inputs are known as *parameters* or *arguments* and the outputs are known as *return values*.

Here's a diagram to illustrate this.

![Function as a machine](../assets/function/Function.png)

Like anything in Python, a function has a **name**. To run the function, we must **call it** by writing its name, and then including the arguments in brackets.

**Remember! Even if the function has no arguments, you must put the brackets!**

If the function **returns** a value, we can store it in a variable using the typical `=` assignment.

Let's explore the built-in [`round`](https://docs.python.org/3/library/functions.html#round) function:

In [88]:
# Your code here

my_float = 3.4

my_rounded_number = round(my_float)

print(my_rounded_number)

3


We can learn more about any function using the built-in `help` function:

In [89]:
# Your code here

help(round)

Help on built-in function round in module builtins:

round(number, ndigits=None)
    Round a number to a given precision in decimal digits.

    The return value is an integer if ndigits is omitted or None.  Otherwise
    the return value has the same type as the number.  ndigits may be negative.



This help documentation, known as a **docstring**, tells us important information about the function. It describes the parameters and return values, as well as any quirks that the function may have.

In addition to using the `help` function, we can also read the docstring online, at the official Python documentation: https://docs.python.org/3/library/functions.html#round.
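The docstring mentions a second parameter, `ndigits`. Here's a short sketch of what it does, based on the description above (the variable names are made up for illustration):

```python
my_pi_estimate = 3.14159

# ndigits omitted: the result is an integer
rounded_to_int = round(my_pi_estimate)

# ndigits = 2: keep two decimal places, the result is still a float
rounded_to_two = round(my_pi_estimate, 2)

# ndigits can be negative: round to the nearest hundred
rounded_to_hundreds = round(1234.5678, -2)

print(rounded_to_int, rounded_to_two, rounded_to_hundreds)
```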

## Writing Custom Functions

Now that we've reviewed what functions are and how to *use* them, let's dive into **defining** our own.

### Why write our own functions?

It's all good and fun to write all the steps you want to do line-by-line. But, let's say you want to run the same set of steps multiple times, potentially on different inputs. You could just copy-paste the code... but what happens if you have to change it? You'll have to change all the copies!

Instead of copying the code, we can write new **functions**.

### A Bit of Syntax

In Python, functions are defined using the `def` keyword. The syntax is:

```python

def function_name(argument1, argument2, argument3, ..., argumentN):
    """
    documentation here
    """

    your_code_here...

    return some_value

```
Here are the important elements to notice when **defining** a function:

* The function definition begins with the `def` keyword. This is similar to the `function` keyword in JavaScript, or the `func` keyword in Swift.
* The **function name** follows the same rules as variable names. There are different naming conventions for names that consist of multiple words (`snake_case` vs `camelCase`). By common convention, the function name starts with a **lowercase** letter.
* After the function name, you can include a list of parameters in parentheses. **If your function takes no arguments, you must still put the brackets.** Each argument in the list must have a valid variable name. We'll discuss these in more detail later.
* After closing the argument list bracket, we put a **colon** (`:`).
* After the first line, we must **indent**. This tells Python where the function body begins and ends.
* We can start the body with a **docstring**, which describes the function. We'll discuss these more later.
* Then, you write your code as normal. In this function body, treat the arguments **like normal variables**.
* To **output** a result that can be used later, use the keyword `return`, followed by the result. We'll discuss this more later.
* Once you've finished defining the function, simply stop indenting. There's no need to close any brackets or type `end`.

To demonstrate, let's write a function with no arguments that simply prints a string onto the screen:

In [90]:
# Your code here

def do_nothing():
    print("Not much going on...")

Wait! What happened? Or well, what didn't happen? We didn't see any string... What's going on?

Well, we only **defined** the function. To actually run the function we must *call* it. To call a function, simply write the name of the function, followed by the desired arguments in brackets. **If the function takes no arguments, you must still type the empty brackets.**

Let's call our function we just defined:

In [91]:
# Your code here

do_nothing()

Not much going on...


### Function Parameters

This function worked, but we didn't really put the *fun* in *function*.

We said that a function takes input and produces output... This does neither!!! So, let's create a function with some parameters! Let's look at the specific syntax:

```python

def my_function(arg1, arg2, arg3, ..., argN):
    my_code...

```

We separate each parameter using **commas (,)**. We can then refer to these as variables in the function body. In this case, in the function body's code, you can refer to `arg1` just as you would any other variable.

As an example, let's write a function that takes a DNA sequence as input and prints the transcribed RNA. To make it more interesting, let's add an extra parameter that indicates whether we are considering the sequence to be on the template strand or not. For simplicity, let's ignore the directionality of DNA.

Remember, if the DNA is on the template strand, we must perform base-pairing!

In [92]:
# Your code here

def transcribe(dna, is_template_strand):
    if not is_template_strand:
        rna_sequence = dna.replace("T", "U")
    else:
        base_pairs = {"A": "U", "C": "G", "T": "A", "G": "C"}
        rna_sequence = ""
        for nt in dna:
            rna_sequence += base_pairs[nt]
    print(rna_sequence)

And now, let's call this function using a specific sequence.

In [93]:
my_sequence = "AATTAGCGAGCCGAATATATAGCCGCGATTCAGACAGTTCCAGCGCA"

# Your code here
transcribe(my_sequence, True)

UUAAUCGCUCGGCUUAUAUAUCGGCGCUAAGUCUGUCAAGGUCGCGU


This works well! Except, what if most of the time, we're going to call the function on the template strand? It would be nice if we didn't have to specify this argument every time we call the function and if we could give it a default value.

#### Keyword Arguments

Good news! We can set default values for function arguments. These are known as *keyword* arguments. Arguments without a default value are known as *positional* arguments. To specify the default value, simply assign the value with `=`:

```python

def my_function(my_positional_arg, my_kw_arg=default_value):
    ...

```

Let's extend our transcription example to set a default value for the `is_template_strand` parameter:

In [94]:
# Your code here to modify the function

def transcribe(dna, is_template_strand=True):
    if not is_template_strand:
        rna_sequence = dna.replace("T", "U")
    else:
        base_pairs = {"A": "U", "C": "G", "T": "A", "G": "C"}
        rna_sequence = ""
        for nt in dna:
            rna_sequence += base_pairs[nt]
    print(rna_sequence)

So, now we can call the function without having to specify a value for the second parameter:

In [95]:
transcribe(my_sequence)

UUAAUCGCUCGGCUUAUAUAUCGGCGCUAAGUCUGUCAAGGUCGCGU


There are a few **important rules** to remember about positional and keyword arguments:

1. Positional arguments **always** come first, both when defining and when calling functions.
2. When calling a function, you **must** include **all** positional arguments, but you can omit keyword arguments (since they have default values).
3. Keyword arguments can be passed in **any order**, but positional arguments must be kept in the same order.
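
To make these rules concrete, here is a small sketch. The function `describe_sequence` and its `label` parameter are made up for this illustration and are not part of the workshop code.

```python
def describe_sequence(dna, is_template_strand=True, label="unnamed"):
    # Hypothetical helper, used only to illustrate the calling rules
    if is_template_strand:
        strand = "template"
    else:
        strand = "coding"
    return f"{label}: {len(dna)} nt on the {strand} strand"


# Rule 2: the positional argument is required, the keyword arguments are optional
print(describe_sequence("ATGC"))

# Rule 3: keyword arguments can be passed by name, in any order
print(describe_sequence("ATGC", label="demo", is_template_strand=False))

# Leaving out the positional argument raises a TypeError
try:
    describe_sequence(label="demo")
except TypeError as error:
    print("TypeError:", error)
```

Passing keyword arguments by name also makes the call easier to read, since the reader doesn't have to remember what each position means.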

### Function Return Values

So, we've seen how to pass information into functions, but now, how do we get information out? The answer is **return values**. These return values let us capture the result of a function, which we can then use like a normal variable in code. To return a value, we simply type `return` followed by the value we want to return.

Here's the syntax:
```python

def my_function(...):
    ...

    my_result = ...

    ...

    return my_result

```

Let's now switch our previous transcription function to *return* the mRNA instead of simply printing it:

In [96]:
# Your code here to modify the function to return a result

def transcribe(dna, is_template_strand=True):
    if not is_template_strand:
        return dna.replace("T", "U")
    else:
        base_pairs = {"A": "U", "C" : "G", "T": "A", "G": "C"}
        rna_sequence = ""
        for nt in dna:
            rna_sequence += base_pairs[nt]
        return rna_sequence

So, this is how to return the value. Now, let's see how to capture and use it. To capture the value, we simply assign it to a variable, like normal, using the equal sign `=`.

In [97]:
# Your code here
my_rna_sequence = transcribe(my_sequence)

print(my_rna_sequence)

UUAAUCGCUCGGCUUAUAUAUCGGCGCUAAGUCUGUCAAGGUCGCGU


**Note:** If your code has multiple branches, you can put multiple return statements in your code. **But**, once your code reaches the `return` line, the function **stops** and returns to the code that called it. Any code that you've written after the `return` statement **will not run**.

Let's just repeat that again: **Code underneath a `return` statement WILL NOT RUN.**

If you're using a good code editor, it will give you a warning about this "dead code".
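
Here is a minimal sketch of what "dead code" looks like. The function `classify_nucleotide` is a made-up example, not something we wrote earlier.

```python
def classify_nucleotide(nt):
    # Hypothetical example: classify a single DNA nucleotide
    if nt == "A" or nt == "G":
        return "purine"
        # The function has already returned, so the next line never runs
        print("This message is never printed")

    # Only reached when the `if` branch above did not return
    return "pyrimidine"


print(classify_nucleotide("A"))
print(classify_nucleotide("C"))
```

Running this prints `purine` and then `pyrimidine`. The message inside the `if` branch never appears.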

We can also return *multiple* values using tuples, lists or dictionaries. For example, let's say we want to count the number of each type of nucleotide in a sequence of DNA:

In [98]:
# Your code here

def count_nucleotides(dna_sequence):
    number_of_a = 0
    number_of_t = 0
    number_of_c = 0
    number_of_g = 0

    dna_sequence = dna_sequence.upper()

    for nt in dna_sequence:
        if nt == "A":
            number_of_a += 1
        elif nt == "T":
            number_of_t += 1
        elif nt == "C":
            number_of_c += 1
        else:
            number_of_g += 1
    
    return number_of_a, number_of_t, number_of_c, number_of_g

Now, let's run this code on our example sequence:

In [99]:
# Your code here
my_counts = count_nucleotides(my_sequence)

print(my_counts)

(15, 9, 12, 11)
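
Because the function returns a tuple, we can also unpack the result straight into separate variables. Here is a minimal sketch. To keep it self-contained, it uses a shortened stand-in called `count_nucleotides_short` (a name made up here), built on the string method `count`.

```python
def count_nucleotides_short(dna_sequence):
    # Stand-in for count_nucleotides, written with str.count for brevity
    dna_sequence = dna_sequence.upper()
    return (dna_sequence.count("A"), dna_sequence.count("T"),
            dna_sequence.count("C"), dna_sequence.count("G"))


# Unpack the four returned values into four variables, in order
a_count, t_count, c_count, g_count = count_nucleotides_short("aacgt")

print(a_count, t_count, c_count, g_count)  # prints: 2 1 1 1
```

The number of variables on the left must match the number of values returned, and the order matters.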


This is great! But let's say you get this function from someone else to import and use in your own code. You don't want to have to find this function and read all the code just to use it... So, how do we know what parameters this function takes and what values it returns?

## Documenting Functions

The answer to this question is **documentation**. Remember how we looked at the `help` for the `round` function earlier? We can do the same thing for our custom functions!

### Defining Function Docstrings

When defining a function, we can provide a *docstring*, which describes the important information about a function in a **human-readable** form. The docstring is just a string that a person can read to learn more about a function. If you're using a code editor or IDE, like VS Code or PyCharm, this string appears when you hover your mouse over a function. The information contained in this docstring can include:

* A brief description of the function.
* A longer description of the function. If you're implementing an existing approach, it could be good to include a citation here. You can also include equations here.
* A description of the function parameters, including their types.
* A description of the function return values, as well as their types. This is especially useful if you are returning multiple values and need to include their order.

Let's clarify our previous example by adding a docstring:


In [100]:
# Your code here to add a docstring to our function

def count_nucleotides(dna_sequence):
    """Count nucleotides in a DNA sequence.

    Parameters
    ----------
    dna_sequence: str
        String containing a DNA sequence.

    Returns
    -------
    a : int
        Number of A nucleotides
    t : int
        Number of T nucleotides
    c : int
        Number of C nucleotides
    g : int
        Number of G nucleotides

    Warnings
    --------
    This function only works on DNA sequences.
    
    """
    
    number_of_a = 0
    number_of_t = 0
    number_of_c = 0
    number_of_g = 0

    dna_sequence = dna_sequence.upper()

    for nt in dna_sequence:
        if nt == "A":
            number_of_a += 1
        elif nt == "T":
            number_of_t += 1
        elif nt == "C":
            number_of_c += 1
        else:
            number_of_g += 1
    
    return number_of_a, number_of_t, number_of_c, number_of_g

Now that we have a docstring, we can actually read it using the `help` function!

In [101]:
# Your code here to look at the help for count_nucleotides.

help(count_nucleotides)

Help on function count_nucleotides in module __main__:

count_nucleotides(dna_sequence)
    Count nucleotides in a DNA sequence.

    Parameters
    ----------
    dna_sequence: str
        String containing a DNA sequence.

    Returns
    -------
    a : int
        Number of A nucleotides
    t : int
        Number of T nucleotides
    c : int
        Number of C nucleotides
    g : int
        Number of G nucleotides

    Warnings
    --------
    This function only works on DNA sequences.



While there are not many rules for how to write docstrings, there are some guidelines laid out in the Python documentation in [PEP 257](https://peps.python.org/pep-0257/). There are also a number of common conventions used. One is the **numpydoc** style, which is used by the developers of the NumPy project. This style is described online [here](https://numpydoc.readthedocs.io/) and is integrated into some code editors.
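
As a small illustration of the one-line docstring form described in PEP 257, here is a sketch with a made-up function, `gc_fraction`. The docstring is stored on the function itself, in its `__doc__` attribute, which is what `help` displays.

```python
def gc_fraction(dna_sequence):
    """Return the fraction of G and C nucleotides in a DNA sequence."""
    # Hypothetical example; assumes the sequence is not empty
    dna_sequence = dna_sequence.upper()
    gc_count = dna_sequence.count("G") + dna_sequence.count("C")
    return gc_count / len(dna_sequence)


# The docstring is available as an ordinary string
print(gc_fraction.__doc__)

print(gc_fraction("ATGC"))  # prints: 0.5
```

For short functions like this one, a single descriptive line is often enough. Longer functions benefit from the fuller numpydoc layout shown above.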

## Exercises: Writing Functions for Biological Sequences

### Amino Acid Properties

Proteins are composed of amino acids, arranged in polypeptide chains. There are 20 common amino acids, which have different properties. We'll focus on polarity and charge. Amino acids are grouped into four categories:
1. Non-polar
2. Polar
3. Acidic
4. Basic

Let's write a function called `compute_amino_acid_properties` that takes a peptide sequence and returns the number of amino acids falling into each category. I've given you a dictionary with the amino acids and their properties as a starting point (obtained from [Wikipedia](https://en.wikipedia.org/wiki/DNA_and_RNA_codon_tables)).

In [102]:
AMINO_ACID_PROPERTIES = {
    "NON_POLAR": ["F", "L", "I", "M", "V", "P", "A", "W", "G"],
    "POLAR": ["S", "T", "Y", "Q", "N", "C"],
    "ACIDIC": ["D", "E"],
    "BASIC": ["H", "K", "R"]
}

# Your code here
def compute_amino_acid_properties(seq: str) -> dict[str, int]:
    """Compute the number of amino acids having different properties.

    Parameters
    ----------
    seq : str
        A peptide sequence, with amino acids represented as
        single-letter symbols.

    Returns
    -------
    dict[str, int]
        Dictionary with keys representing the amino acid properties,
        ``NON_POLAR``, ``POLAR``, ``ACIDIC`` and ``BASIC``.
    
    """

    # Create the dictionary to store the amino acid counts
    my_counts_dictionary = {}

    # Initialise the dictionary keys
    for aa_property in AMINO_ACID_PROPERTIES.keys():
        my_counts_dictionary[aa_property] = 0

    # Flip the amino acids properties dictionary
    amino_acid_properties_flipped = {}

    for aa_property in AMINO_ACID_PROPERTIES.keys():
        for amino_acid in AMINO_ACID_PROPERTIES[aa_property]:
            amino_acid_properties_flipped[amino_acid] = aa_property

    # Iterate over the amino acids
    for amino_acid in seq:
        # Get the property associated with the current amino acid
        aa_property = amino_acid_properties_flipped[amino_acid]

        # Increment the count in the dictionary
        my_counts_dictionary[aa_property] += 1
        
    # Return the dictionary
    return my_counts_dictionary
    

In [103]:
# Here's an artificial amino acid sequence to test with:
test_peptide = ("EDEQLPAMFYDHSRMGQDCTIQYRAFFKFKCDEVVICPRMCRFDM"
                "GYLSCNWPDQWQFWPPNPHTDSTWVSLDYPLRWDCCRKPHTFEPY"
                "TMHASWCTERDPDIWACIKDSWMSPFEPQGSWGSTELVKEDPGFF"
                "SVFALRPCVWAAPTT")

test_peptide_properties = compute_amino_acid_properties(test_peptide)

print(test_peptide_properties)

{'NON_POLAR': 70, 'POLAR': 42, 'ACIDIC': 21, 'BASIC': 17}


### Translation

Earlier we wrote code to perform translation. This code worked well, but it would be more helpful if we wrapped it into a function. In this exercise, write and document functions for translation based on the code from the previous module. Then test this function on some artificial mRNA sequences.

**BONUS:** Make the function a bit more robust to the input. Use string methods to make the function case-insensitive.

**Note:** You may choose to break down the process into *multiple* functions.

In [104]:
# Your code here

# Create the codon table
amino_acid_to_codon_table = {
    "F": ["UUU", "UUC"],
    "L": ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"],
    "I": ["AUU", "AUC", "AUA"],
    "M": ["AUG"],
    "V": ["GUU", "GUC", "GUA", "GUG"],
    "S": ["UCU", "UCC", "UCA", "UCG", "AGU", "AGC"],
    "P": ["CCU", "CCC", "CCA", "CCG"],
    "T": ["ACU", "ACC", "ACA", "ACG"],
    "A": ["GCU", "GCC", "GCA", "GCG"],
    "Y": ["UAU", "UAC"],
    "STOP": ["UAA", "UAG", "UGA"],
    "H": ["CAU", "CAC"],
    "Q": ["CAA", "CAG"],
    "N": ["AAU", "AAC"],
    "K": ["AAA", "AAG"],
    "D": ["GAU", "GAC"],
    "E": ["GAA", "GAG"],
    "C": ["UGU", "UGC"],
    "W": ["UGG"],
    "R": ["CGU", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "G": ["GGU", "GGC", "GGA", "GGG"]
}

# Start by creating a new dictionary where the codons are the keys
forward_codon_table = {} # This creates an empty dictionary

# Use iteration to create the opposite table: Codons to Amino Acids
for amino_acid in amino_acid_to_codon_table:
    for codon in amino_acid_to_codon_table[amino_acid]:
        forward_codon_table[codon] = amino_acid

def convert_mrna_to_codons(mrna):
    """Convert mRNA into a list of codons.
    
    Parameters
    ----------
    mrna : str
        A string representing an mRNA sequence, as a
        sequence of uppercase letters.

    Returns
    -------
    list of str
        List of three-letter codons, starting with
        the start codon (AUG).
    """

    # Create an empty codon list
    my_codons = []
    
    # Find the start codon
    start_codon_index = mrna.find("AUG")

    # If there is no start codon, there is nothing to translate
    if start_codon_index == -1:
        return my_codons
    
    # Iterate over the string, three nucleotides at a time
    for i in range(start_codon_index, len(mrna) - 2, 3):
        # Get the codon...
        new_codon = mrna[i: i + 3]
    
        # Add codon to list
        my_codons.append(new_codon)

    # Return the list of codons
    return my_codons
    

def translate(mrna):
    """Translate mRNA to a peptide sequence.
    
    Parameters
    ----------
    mrna : str
        A string representing an mRNA sequence, as a
        sequence of uppercase letters.

    Returns
    -------
    str
        A peptide sequence, as a string of uppercase
        single letter codes.
    """

    my_codons = convert_mrna_to_codons(mrna)
    
    my_protein = ""
    
    for codon in my_codons:
        # Get the corresponding amino acid for the codon
        new_amino_acid = forward_codon_table[codon]
        
        # Check if it is the stop codon
        if new_amino_acid == "STOP":
            break # End the loop
    
        # Add the new amino acid to the protein
        my_protein += new_amino_acid

    return my_protein

And now for the testing...

In [105]:
my_sequence = "AGCAGATGCATTAGCCATTAGTTTGCACCAGTATATGCAGAGTTTAGGAGACCATAATTAACGAGAGCCGATAGCTAGA"

# Read as the template strand, this sequence gives an mRNA with no start codon
# (the DNA contains no "TAC"), so we treat it as the coding strand instead.
my_mrna = transcribe(my_sequence, is_template_strand=False)
my_peptide = translate(my_mrna)

print(my_peptide)

MH
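
For the bonus, one possible approach is to convert the input to uppercase with the string method `upper` before doing anything else. Here is a minimal, self-contained sketch. The name `translate_any_case` and the three-entry `CODON_SUBSET` table are made up for this illustration. A real solution would use the full codon table from above.

```python
# Tiny subset of the codon table, just enough for this illustration
CODON_SUBSET = {"AUG": "M", "CAU": "H", "UAG": "STOP"}


def translate_any_case(mrna):
    """Translate mRNA to a peptide, ignoring the case of the input."""
    # Normalise the input so that "aug" and "AUG" are treated the same
    mrna = mrna.upper()

    start_codon_index = mrna.find("AUG")
    if start_codon_index == -1:
        return ""  # No start codon, so nothing to translate

    protein = ""
    for i in range(start_codon_index, len(mrna) - 2, 3):
        amino_acid = CODON_SUBSET[mrna[i: i + 3]]
        if amino_acid == "STOP":
            break
        protein += amino_acid

    return protein


print(translate_any_case("agcagaugcauuag"))  # prints: MH
print(translate_any_case("AGCAGAUGCAUUAG"))  # prints: MH
```

Normalising the input once, at the top of the function, means the rest of the code doesn't need to think about case at all.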


## Module Summary

In this module, we've explored **functions**. Specifically, we've seen:

* What functions are and how to **call** them.
* How to **define new functions**, which take in **parameters** and **return** results.
* How to **document** functions using **docstrings** to make them easier to understand and reuse.

Now, let's take a look at our exercise!

# Module 4 - Where to Go From Here

We're just about at the end of our workshop! Over the course of these few hours, we've seen the basics of variables and numbers, Booleans and strings, as well as more complicated collection types. We've also seen how to package up our code into functions to have repeatable units of behaviour.

So... what comes next?

## What to Learn Next? How?

What great questions! Well, there are still a bunch of topics that I didn't cover today.

We saw functions today. Functions are great and fun. You can write so much stuff... but what if you don't want to reinvent the wheel and rewrite everything from scratch?

Good news! You don't have to! With Python, it's very easy to install **packages** and import code from **modules**.
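
As a small taste, Python's standard library (which comes with Python, so there's nothing to install) already includes a `Counter` class in the `collections` module. It can do the kind of nucleotide counting that we wrote by hand earlier. A minimal sketch, using a short made-up sequence:

```python
# Import the Counter class from the collections module
from collections import Counter

# Counter tallies how many times each character appears in the string
nucleotide_counts = Counter("AATTAGCG")

print(nucleotide_counts["A"])  # prints: 3
print(nucleotide_counts["T"])  # prints: 2
```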

These topics will be covered in my upcoming workshop **Data Processing in Python (Part 2)**.

How can you learn about other Python topics? There are plenty of resources out there. Keep your eyes open for other workshops! And check online for tutorials and videos. I'll talk a bit more about these soon.

## How to Get Help and How NOT to Get Help?

This section is based on the corresponding sections in two of my previous workshops (see [Intro to Python - Summer 2024](https://github.com/bzrudski/micm_intro_to_python_summer_2024) and [Intermediate Python - Summer 2024](https://github.com/bzrudski/micm_intermediate_python_summer_2024)).

When writing code, there are a bunch of resources that can help you!

### Your Code Editor

Yes! That's right! The software you're using to write code can give you lots of help. It can suggest completions, tell you when there are errors, and even help you reformat your files and restructure your code. So, please, please, please, **DO NOT** write your code in a simple text editor that has no additional features. And ***PLEASE*** don't use word processing software. Use software that is made for coding!

### Documentation

I mentioned this one earlier. Documentation isn't just something that you should do. Big established projects have big documentation. Take a look at their guides for getting started. For example, [Pandas](https://pandas.pydata.org/) has a [10 minutes to pandas](https://pandas.pydata.org/docs/user_guide/10min.html) tutorial. Other packages, like [NumPy](https://numpy.org) and [Matplotlib](https://matplotlib.org) have very thorough guides and/or examples. Use these resources! If you want to learn how to use a function, **look it up** and read the paragraph about it. It will tell you how to use the arguments, any quirks to expect, and in some cases it will give you references about the papers behind the function. This is especially true in image processing and other fields that rely heavily on algorithms. So, the documentation will tell you not only how to use the code, but also **where it comes from**. And make sure to check out the Official Python docs at https://docs.python.org/3/.

### Books

Books, books, books! There are tons and tons of books out there! For example, there are a few general books that are free online:
* *Think Python 2e* by Allen B. Downey (FREE book): https://greenteapress.com/wp/think-python-2e/
* *Data Structures and Information Retrieval in Python* also by Allen B. Downey (FREE book): https://greenteapress.com/wp/data-structures-and-information-retrieval-in-python/
* *Introduction to Python Programming* by Udayan Das et al., published by OpenStax: https://openstax.org/details/books/introduction-python-programming
* *The Hitchhiker's Guide to Python* by Kenneth Reitz and Tanya Schlusser: https://docs.python-guide.org/

There are also books online about more specialised topics, such as:

* Package development: *Python Packages* by Tomas Beuzen and Tiffany Timbers -- https://py-pkgs.org/
* Data science:
    * *Python for Data Analysis, 3E* by Wes McKinney -- https://wesmckinney.com/book/
    * *Python Data Science Handbook* by Jake VanderPlas -- https://jakevdp.github.io/PythonDataScienceHandbook/

Another book, which covers software development for research more generally and puts more emphasis on the tools used, is:

* *Research Software Engineering with Python* by Damien Irving, et al.: https://third-bit.com/py-rse/index.html

Through the databases at the McGill Library, we also have access to lots of books **for free**. Check out the library's online catalogue to see more.

### Tutorials

Tutorials are also great! And very much abundant! From more formal ones on sites like [freeCodeCamp](https://www.freecodecamp.org/) and [W3Schools](https://www.w3schools.com/python/default.asp) to less formal ones on [DEV](https://dev.to/), you can get lots of insight from these. There are also lots posted on Medium that you can check out. In addition to text-based tutorials, there are also videos on YouTube. And don't forget the official tutorials in the documentation! Tutorials are a very valuable resource that can help you see how to put pieces of code together in real-world examples.

### Stack Overflow (and Pitfalls)

If you have a Python question, chances are that someone, somewhere has asked it on [Stack Overflow](https://stackoverflow.com/). Stack Overflow is a **great** resource for finding answers to real questions about programming. **But** make sure that you're using it properly. Try the other resources before going to Stack Overflow. The answer may turn out to be on the documentation page for the function you're looking for. If there's a link to the docs in a Stack Overflow answer, **use it**. Check it out in more detail. Make sure that you understand the code that you're about to add to your project and don't just copy-paste it. Coding is a thinking game. Make sure that you have thought about all the code that you're putting in and that you understand why it's there. And use your judgement and intuition when borrowing that code. If it looks sketchy, it could very well be sketchy and there may be a better way.

### ChatGPT (and Pitfalls)

Everything I said above about Stack Overflow. And more. Answers on Stack Overflow are written by humans who have written the code, tested it, and run the results. Be careful when using ChatGPT for code (if you're allowed to at all). Make extra sure that it makes sense, and test it. Don't just trust it because AI wrote it for you. You need to make extra sure that it actually makes sense and runs properly, because you don't have that same guarantee that a human has used this exact code in their own experience. Use your coding judgement and intuition.

Again, ALWAYS remember to **read the documentation**. Often, if you're stuck, the answer is **right there**. If it's not, then it's probably on Stack Overflow. It's often a good idea to check the documentation **first** to see if there's an official explanation or an official example. And don't just copy a Stack Overflow answer or sample code. Think about what the code is doing. Does it make sense? Is there a better way? Try to look line by line to understand what is going on (play around in the IPython interpreter or in a Jupyter notebook!).

## Other Cool Programming Topics

So, I talked a bit about functions and classes, but there's much more that you can look into to help build your programming skills and write code that others will want to use.

### Writing Packages

We've seen how to install and use packages. But, you can also **write your own packages**. There are many great resources online about writing packages. The one that I most recommend is [this free online book](https://py-pkgs.org/): *Python Packages* by Tomas Beuzen and Tiffany Timbers. It's an easy read and helps you learn not only how to organise your code, but how to publish it, too. The authors also walk through how to render your own nice-looking documentation and host that online.

### Object-Oriented Programming

Writing code with loops and control flow is fun, but it's even better when we can combine everything into functions and classes and work in an **object-oriented** manner. This paradigm helps you organise your code differently, constructing building blocks that can work together to build elaborate programs.

### Developing Graphical User Interfaces

Jupyter notebooks and command line scripts are powerful, but they aren't accessible for people who don't know how to code. Solution: build a graphical user interface! Using PyQt, the process is quite straightforward. Check out [this online tutorial series](https://www.pythonguis.com/) by Martin Fitzpatrick to learn about developing GUIs in Python.

### Hosting Projects on GitHub

What fun is a project if other people can't use it? By hosting your project on GitHub, you let others easily contribute to your project and build on it. Learning Git and GitHub is essential! And so are a few other skills along the way, like writing documents in Markdown. MiCM often has Git and GitHub workshops, so check out their workshop schedule!

## The End

We've reached the end of our workshop! For those of you who have previous programming experience, congratulations on adding another language to your repertoire. For those of you who are new, welcome to the world of programming! Just remember, programming is like art: you start with an empty text file and soon enough, you have hundreds (or thousands) of lines of code!

Don't hesitate to reach out if you have any further questions. Happy coding!

In [106]:
from time import sleep


print("Good luck with your programming future!", end=" ")

i = 1
s = "/-\\|"

print(s[0], end="")

while i < 10:
    print("\b" + s[i % len(s)], end="")
    i += 1
    sleep(0.5)

print("\b🎉")

Good luck with your programming future! 🎉
