# Introduction to Python
MiCM Workshop - August 1, 2023

Benjamin Rudski, PhD Student, Quantitative Life Sciences, McGill University

Dear `Reader | Workshop Attendee`,  
Welcome! In this interactive Jupyter Notebook, I will introduce you to the [Python](https://www.python.org) programming language. In this journey, we'll go from understanding the basics of computers to variables, functions and package management.

This notebook is the **student version**, which contains several blanks where I will write code during the workshop and where you can fill out exercises. There is a **solution version** in the [`solutions`](../solutions/) folder. I recommend trying the exercises yourself before looking at the solutions. There is often more than one way to answer a programming question, so you should focus more on understanding the code that you are writing, instead of just copying my answers.

Here's the outline of this workshop:

1.	Module 1 – Introduction to Programming (30 minutes)
    1.	Basic Concepts and Definitions
        *	What is a computer? 
        *   What is a program?
        *	What are programming languages?
    2.	Welcome to Python
        *	What is Python?
        *	How to install Python
        *	Tools for using Python
2.	Module 2 – Python Basics (1 hour)
    1.	Foundations of Python
        *	Mathematical Operations
        *	Variables
    2.	Numbers and Comparisons
        *	Integers and Floating-Point Numbers
        *	Booleans
    3.	Intro to Control Flow and Loops
        *	Control Flow: the `if` Statement
        *	`while` Loops
        *	Basic `for` Loops
    4.	Exercise: Numbers and Loops
3.	Module 3 – Strings (40 minutes)
    1.	String slicing
    2.	String Operations and Methods
        *	Concatenation
        *   Converting Strings to Numbers
        *	Finding a Substring
        *	Replacing Characters
    3.	String Iteration and the `for` Loop
    4.	Exercise: DNA transcription and mRNA processing
4.	Module 4 - Collection Types (45 minutes)
    1.	Tuples
        *	Accessing Elements
        *	Tuple Unpacking
    2.	Lists
        *   Length of a List
        *	List Slicing
        *	Adding Elements
        *	Removing Elements
        *	List Iteration
        *   List Exercise
    3.	Dictionaries
        *	Key-Value Storage
        *	Accessing Elements
        *	Adding Keys
        *	Removing Keys
        *	Dictionary Iteration
    4.	Exercise: Translation from mRNA to protein
5.	Module 5 – Functions (35 minutes)
    1.	Intro to Functions
        *	What is a Function?
        *	Function Parameters and Return Values
        *	Documentation
    2.	Exercise: Write a function to perform transcription and translation
    3. Sneak Peek: Object-Oriented Programming
        * Quick introduction to classes and objects
6.	Module 6 – Modules and Packages (20 minutes)
    1.	Using Modules
        *	Importing a Module
        *	Importing Specific Functions
    2.	Package Management
        *	Installing Packages using `conda`
        *   Installing Packages using `pip`
        *   Other Installation Tips
        *	Reading documentation
7.	Where to go from here (5 minutes)
    1.	Where to go for help
    2.	Closing remarks
        *	Really important: Documentation
        *	Other important skills to learn: packages, markdown, GitHub

When this workshop is over, you should be able to write simple Python scripts. More importantly, I am hoping to give you the tools so that you can learn new Python skills and read package documentation to find what you need. In my opinion, the most important part of programming is knowing how to get help when you need it.

# Module 1 - Introduction to Programming
## 1. Basic Concepts and Definitions

## What is a Computer? What is Programming?
In this section, we'll briefly see what a computer is and how programming helps us do what we want to do with it.

This is the inside of a computer:

![Computer inside](assets/Dell_G5_5000_motherboard.jpg)

**Image credit:** [Dell G5 5000 motherboard.jpg](https://commons.wikimedia.org/wiki/File:Dell_G5_5000_motherboard.jpg), by [Project Kei](https://commons.wikimedia.org/wiki/User:Keita.Honda), licensed under the Creative Commons [Attribution-Share Alike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/deed.en) license.

For our intents and purposes, a computer is a machine that has **two parts**:
1. RAM (memory)
2. CPU (processor)

These parts each do very specific tasks:
* The **memory** stores information that we want to process.
* The **processor** performs operations on data, using inputs to produce an output.

In reality, computers are much more complicated than these two parts, but these are the most important for us.

## What is a Program?

We have this hardware... but what can we do with it? We must provide it with a set of **instructions** to tell the hardware what to do. These instructions are a *program*. The job of *programming* involves writing instructions that tell the computer what *operations* the CPU should perform, and which *data* it should operate on.

A program is a **text file**. That's it. Well, actually, not exactly, but that's part of it. We'll discuss more about this later.

But... how are these instructions actually written? Introducing...

## What are Programming Languages?

**Programming languages** provide the rules for writing programs.

Who are programming languages designed to help: you (the programmer) or the computer?

Let's pause to think about it...

The answer: **you**

A program contains code that **you** write using a programming language. The computer only sees this as text. That's it. There's absolutely nothing special about the file. The computer doesn't run this text file. The computer has to turn this file into something that it can understand.

Let's make a biology analogy. Let's think about DNA, RNA and Proteins. Let's ignore non-coding RNA and ribozymes and let's just focus on the classical paradigm of the central dogma. DNA encodes instructions to build proteins. But, the DNA itself doesn't do the same function as the protein. It only tells the cell how to make the protein. The DNA must be transcribed to RNA and then **translated** using the ribosome to form a functional protein. 

Well, computers are similar! The program is a text file containing instructions. **You** write code using a programming language, like Python, C++, Java, Kotlin, Swift, etc. All the computer sees is a text file, and it must process this file before it can run it. The computer needs to turn these *human-readable* instructions into *machine-readable* instructions. This is done either using an **interpreter** that runs the code line-by-line, for languages like Python, or a **compiler** that translates the entire program before it is run, for languages like C++.

## 2. Welcome to Python

## What is Python?
For more on the history, see: https://en.wikipedia.org/wiki/History_of_Python

Python was introduced by Guido van Rossum in 1991. It has a number of features:
* Free and open source
* Interpreted language
* Object-oriented language

### Free and Open Source
Being a free and open source language, anyone can download and use Python, but also more! Anyone can distribute Python and even modify Python, or contribute to its development. Python is developed by the **community** that uses it.

### Interpreted
Python is an *interpreted* language, **not** a *compiled* language. This means that the entire program file is not translated into machine code before it is run. Instead, only parts of the program are translated when they need to be. This means that when you make a change to one part of a program, you don't need to rebuild your entire project. You often just need to restart your interpreter. This also means that you can open Python in your terminal, type in a single line, and it will work!

### Object-Oriented
In Python, we use **objects** to represent things. Yeah... I know that doesn't really help. So, objects are a way of representing something, and they group together **data** about that thing (in the form of *attributes*) and **manipulations** that you can do with that thing (in the form of *methods*). Lots of things in Python are represented using objects, from strings of text to lists, and you can even make your own (we won't cover that today).
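As a small illustration of attributes and methods, here's a sketch using two built-in types. The names `sequence`, `shouted` and `number` are made up for this example, and I'm only using a complex number because it happens to have easy-to-read attributes.

```python
# A string is an object: it has methods (manipulations) such as upper()
sequence = "atgc"
shouted = sequence.upper()  # call a method with a dot and parentheses
print(shouted)

# A complex number is an object with attributes (data) and methods
number = 3 + 4j
print(number.real)          # attribute: a dot, but no parentheses
print(number.conjugate())   # method: a dot and parentheses
```

Don't worry about the details yet. The thing to notice is the dot, which is how we reach the data and manipulations that belong to an object.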

## How to install Python

You've probably done this in the setup for the workshop. There are a number of ways to download Python. If you're on macOS or Linux, lucky you! You might already have Python installed. To see if you do, just open up a terminal window (on macOS, it should be under `Applications > Utilities > Terminal` and on Linux you may be able to just press `Ctrl+Alt+T` to open a terminal window). Once you're there, type `which python`. If you see something that isn't an error message, you already have Python. You may also have to try `which python3`.
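If Python is already running (for example, in this notebook), you can also ask Python itself which interpreter and version you're using. This is a small sketch using the standard `sys` module. It complements the terminal check rather than replacing it, and the variable names are just for illustration.

```python
import sys

# Full path of the Python interpreter that is running this code
interpreter_path = sys.executable
print(interpreter_path)

# Version of that interpreter, as (major, minor, micro)
version = sys.version_info[:3]
print(version)
```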

Now, if you don't have Python installed, you can install it from Python's website at https://www.python.org or get it from the Windows Store or your local software repository on Linux. But, there's another way that is quite helpful: using a Python distribution such as **Anaconda** or **miniconda**. 

Why use Anaconda? Well, it comes with **many** pre-installed packages which are very helpful in science, such as NumPy, SciPy, Matplotlib, Pandas, and more! It's a bit of a big download and the install takes a bit of time, but it is definitely worth it. To get Anaconda, just go to https://www.anaconda.com/ and hit the big green `Download` button. There is a graphical installer for Windows and macOS and a text-based installer for Windows, macOS and Linux. If you don't want to perform a 600 MB download, you can opt for miniconda instead. Go to https://docs.conda.io/en/latest/miniconda.html and click on one of the download links. Unlike Anaconda, miniconda doesn't come with the packages pre-installed, but it provides you with the `conda` tool to help you install them. We'll discuss packages more later.

Finally, if you don't want to install Python, then good news! If you have a Google Account, then you can use Python on the web. I'll discuss this more very soon.

## Tools for Programming in Python

There are many tools out there for programming in Python:
* **Jupyter Notebooks:** Let's start with the tool that we're using now! This tool lets us combine code, explanations and figures. This is really good if you want to share your code with extra details. With a Google Account, you can use Jupyter notebooks remotely via **Google Colab**.
* **`python` shell:** This is the most basic way of running a script in the command line or using an interpreter to run one line at a time.
* **`ipython` shell:** Similar to the regular Python shell, but with better auto-complete and syntax highlighting.
* **Microsoft Visual Studio Code:** code editor that can also be used for debugging and running Jupyter notebooks. Python extension necessary.
* **Spyder:** Fully-fledged integrated development environment (IDE). Write and debug code, view figures.
* **PyCharm:** Fully-fledged IDE developed by JetBrains. Community edition is open-source.

![vscode](assets/vscode.png)  
A familiar Jupyter notebook opened in Microsoft Visual Studio Code.

![pycharm](assets/pycharm.png)  
Working with a Python file in PyCharm Community Edition.

![spyder](assets/spyder.png)
A sample Spyder window. This may look familiar if you're used to working with MATLAB or RStudio.

# Module 2 - Python Basics

In this section, we'll see the basic, foundational concepts of programming in Python. We'll start with the basics of mathematical operations and we'll see variables for storing data. Then, we'll also start seeing how to get things done in Python. Along the way, I'll also point out possible places where users of different programming languages need to pay special attention.

**Topics:**
1. Foundations of Python
    * Mathematical Operations
    * Variables
2. Numbers and Comparisons
    * Integers and Floating-Point Numbers
    * Booleans
3. Intro to Control Flow and Loops
    * Control Flow: the `if` statement
    * `while` loops
    * Basic `for` loops

## Foundations of Python

This section explores the most basic things that we can do: 
* mathematical calculations
* store small amounts of data

But first! It's conventional that the first program we write in a new language is the "Hello, World!" program. This is a simple program that writes the text "Hello, World!" to the screen. In Python, it's quite easy to do:

In [None]:
# Your exciting first line of Python code here!
# In this line of code, we'll display, or print, the text string "Hello, World!"


This very simple program introduces a few important ideas. You'll notice that the first line doesn't really look like code. Actually, it's not! The first line is a **comment**. We (and the computer) know this because the line starts with the symbol `#`. Python ignores that symbol and everything that comes after it, letting you write notes about what your code is doing. It's very important to put comments in your code, especially if you're going to need to come back to it after a few weeks or if you're going to share it with other people.

On the second line, we have two things:
* the `print` function
* the string "Hello, World!"

The `print` function displays output to the screen or to the console. While it's not necessary in this Jupyter notebook (which automatically outputs the result of the last line of code), it's very helpful if you're ever writing code in a different program, like *PyCharm* or *Spyder*. We'll discuss functions in more detail later, but the idea is that functions take inputs, known as **arguments**, do operations on them and optionally return some sort of modified result. Here, the `print` function takes in the **string** of text "Hello, World!", writes it onto the screen and doesn't return any new data.

The **argument** that we pass to this function is the text **string** `"Hello, World!"`. We'll discuss strings in more detail later. The important thing is that a string is a group of characters surrounded by quotation marks (either single quotes `'Hello'` or double quotes `"Hello"`).
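Putting these pieces together, here's a minimal sketch showing a comment, a call to the `print` function, and the two kinds of quotation marks:

```python
# This whole line is a comment, so Python ignores it
print("Hello, World!")  # a comment can also follow code on the same line
print('Hello, World!')  # single quotes create exactly the same string
```

Both `print` lines display the same text, because the quotation marks only mark where the string starts and ends. They are not part of the string itself.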

Now that we've passed this important milestone, let's dive into basic Python!

### Mathematical Operations

Python gives users the ability to perform simple mathematical operations on numbers. The following operations that you know very well can be easily done:
* **Addition** is performed using the `+` operator
* **Subtraction** is performed using the `-` operator
* **Multiplication** is performed using the `*` operator
* **Division** is performed using the `/` operator (does not round)

Python offers a few other operations as well:
* **Exponents** can be taken using the `**` operator (**NOT** `^`)
* **Modulus** (remainder) can be taken using the `%` operator (**Warning:** for anyone who uses MATLAB, this is **not** a comment!)
* **Integer division** (dividing and rounding down) can be performed using the `//` operator (**Warning:** for anyone who knows Java or C or any number of other languages, this is **not** a comment in Python!)
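Here's a quick sketch of all seven operators, using the numbers 7 and 2 (chosen only for illustration). The comment at the end of each line shows the value that gets printed.

```python
print(7 + 2)   # addition: 9
print(7 - 2)   # subtraction: 5
print(7 * 2)   # multiplication: 14
print(7 / 2)   # division: 3.5 (not rounded)
print(7 ** 2)  # exponent: 49
print(7 % 2)   # modulus (remainder): 1
print(7 // 2)  # integer division: 3
```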

To perform a basic mathematical operation, all you need to do is type in the numbers, along with the operator, in the same way that you'd write the expression on paper. For example, to add 5 and 4, we would write the following:

In [None]:
# Put your code here


We can also chain operations together. Remember that the rules of **BEMDAS** apply. Let's do an example to show this. 
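To see the order of operations at work, here's a small sketch with numbers that I picked just for this illustration:

```python
print(2 + 3 * 4)    # multiplication happens first: 2 + 12 = 14
print((2 + 3) * 4)  # brackets happen first: 5 * 4 = 20
print(2 * 3 ** 2)   # the exponent happens before multiplication: 2 * 9 = 18
```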

Write code that computes and prints the following results: $4+5\times 3$ and $(4+5) \times 3$. 

**Hint:** Remember the `print` function from above and use it to show the result of two different calculations in the same Jupyter notebook cell.

In [None]:
# Put your code here



These examples contained integers, known in Python as `int`s. We can also do calculations that involve decimal numbers, known as **floating point numbers** or simply, `float`s. We can also mix the two different types of numbers.

**NOTE:** For anyone who has used C, Java or Swift (or many other languages), Python does ***not*** have a separate `double` type. Python also doesn't have type modifiers like `long` or `unsigned`. If you wind up working with NumPy, then you may have to think about different types of integers and floating point numbers (but, we won't talk about that today).
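As a rough sketch of what happens when we mix the two kinds of numbers, look at which results are printed with a decimal point:

```python
print(2 * 3)    # int times int gives an int: 6
print(2 * 3.0)  # int times float gives a float: 6.0
print(3 + 0.5)  # int plus float gives a float: 3.5
print(4 / 2)    # division always gives a float: 2.0
print(4 // 2)   # integer division of two ints gives an int: 2
```

In general, as soon as a `float` is involved in one of these operations, the result is a `float`.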

Now, it's your turn! Write code to perform the following calculations:
* $3\times 4 - 6 \div 2$
* $(3.23 + 5.2) \times 4.3^2$
* $\textup{floor}(\frac{5}{2}) \times (6 \mod 4)$

In [None]:
# Put your code here




You'll notice some placeholder text that I've written in the box above. Remember, this is a **comment**. Python ignores everything on a line that comes after the `#` symbol. This sort of text is designed to help anyone reading your code understand what it's doing.

### Variables

So, we've seen basic mathematical operations, but they're not really useful if we can't store the values that we're calculating. To store data, we use **variables**. A **variable** gives a name to a piece of data stored in memory so that you can easily access it later. The information stored in a variable can change (or **vary**).

**NOTE:** Python has no constants. Only variables.

#### Variable Names
There are rules for naming a variable:
* Variable names are **case-sensitive**.
* A variable name must contain only letters, numbers and underscores.
* A variable name cannot start with a number.
* A variable name cannot be the same as a reserved word in Python (see [here](https://docs.python.org/3/reference/lexical_analysis.html#keywords) for list).

A variable name may consist of multiple words combined. There are a few different conventions for putting words together. Two common ones are known as `snake_case` and `camelCase`:
* In `snake_case`, all letters are lowercase and words are separated by underscores.
* In `camelCase`, different words are combined with no spaces, and the first letter of a new word is put as a capital.

Different people use different conventions. Your code editor may suggest one over the other (for example, PyCharm prefers `snake_case`). The choice depends on your project setup and any existing code you may be adding to.

**Notes:** 

* Although Python has no constants, `ALL_CAPS_NAMES` are sometimes used to denote variables that shouldn't change.
* Variable names *can* start with underscores, but this often has a special meaning.
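If you're ever unsure about a name, Python can check it for you. This is a sketch using the string method `isidentifier` and the standard `keyword` module, neither of which we cover today. The names being tested are made up for this example.

```python
import keyword

# isidentifier() checks the letters/numbers/underscores rules
print("sample_count".isidentifier())   # snake_case name: True
print("sampleCount".isidentifier())    # camelCase name: True
print("_hidden_value".isidentifier())  # a leading underscore is allowed: True
print("2nd_sample".isidentifier())     # False: starts with a number
print("sample count".isidentifier())   # False: contains a space

# Reserved words pass isidentifier(), so they need a separate check
print(keyword.iskeyword("while"))      # True: cannot be used as a name
```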

Now, to test your skills, find the incorrect variable names and what the problems are:
* `my_variable12.3`
* `-myVariableName2`
* `@myVariable`
* `my-variable&`
* `my+variable`
* `23variable`
* `my_variable2`
* `myVariable`
* `myV#ariable`
* `my_variaBle_32`
* `import`

#### Variable Assignment

The way that we assign a variable is easy. We just use the `=` sign. That's it. We can also change the value of a variable by just assigning a new value using the equal sign (and so, the value **varies**).
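As a minimal sketch, here's what assignment and re-assignment look like with a made-up variable called `sample_count`:

```python
sample_count = 10     # create the variable and give it a value
print(sample_count)   # 10

sample_count = 25     # assign again: the old value is replaced
print(sample_count)   # 25
```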

Now, let's do a few examples of variable assignment. Here, we'll make use of the `print` function to track the value of the variables.

1. **Assignment and re-assignment**  
 Let's create a variable called `my_variable` with the value `42` and then re-assign it to have the value `16`.

In [None]:
# Put your code here
my_variable = ...
print("The value of my_variable is:", my_variable)

my_variable = ...
print("The value of my_variable is:", my_variable)

2. **Assignment with Mathematical operations on variables**  
Not only can we use mathematical operations, but we can also store mathematical results and even use the variables in the calculation!

In [None]:
my_variable = 35
print("The value of my variable is", my_variable)

# Put your code here to multiply `my_variable` by 2
my_variable = ...
print("The value of my variable is", my_variable)

For some of these operations, we have a shortcut so that we don't have to rewrite the variable name twice. For each operation, we can use a new assignment operator:
* We replace assignment and `+` with `+=`
* We replace assignment and `-` with `-=`
* We replace assignment and `*` with `*=`
* We replace assignment and `/` with `/=`
* We replace assignment and `**` with `**=`
* We replace assignment and `%` with `%=`
* We replace assignment and `//` with `//=`
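Here's a sketch that runs through all seven shortcuts on a made-up variable called `counter`. Each comment shows the value of `counter` after that line has run.

```python
counter = 10
counter += 5    # same as counter = counter + 5   -> 15
counter -= 3    # same as counter = counter - 3   -> 12
counter *= 2    # same as counter = counter * 2   -> 24
counter /= 4    # same as counter = counter / 4   -> 6.0 (division gives a float)
counter **= 2   # same as counter = counter ** 2  -> 36.0
counter %= 10   # same as counter = counter % 10  -> 6.0
counter //= 4   # same as counter = counter // 4  -> 1.0
print(counter)
```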

So, we can rewrite the last example we did:

In [None]:
my_variable = 35
print("The value of my variable is", my_variable)

# Put your code here to multiply `my_variable` by 2 using the `*=` shortcut (replace the placeholder line below)
my_variable = ...
print("The value of my variable is", my_variable)

## Numbers and Comparisons

Different types of data are stored and represented differently by Python. We've mostly been working with numbers, either `int`s or `float`s, but there are other types of data that we can store in Python variables (see [here](https://www.datacamp.com/tutorial/data-structures-python) for more details and discussion):
* Integer numbers (`int`)
* Floating-point decimal numbers (`float`)
* Boolean values (`bool`) - `True` or `False`
* Text strings (`str`) -- next section
* Collection types -- coming up later

The first four are basic, or "primitive" data types (see [here](https://www.datacamp.com/tutorial/data-structures-python)). For now, we'll focus on the first three, but we'll get to the others later...
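If you want to know which type a value has, you can ask Python. This sketch uses the built-in `type` function, which isn't covered elsewhere in this notebook. The values are arbitrary.

```python
print(type(42))       # <class 'int'>
print(type(4.2))      # <class 'float'>
print(type(True))     # <class 'bool'>
print(type("forty"))  # <class 'str'>
```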

### Integers and Floating-Point numbers

We've already seen `int`s and `float`s. An `int` is a whole number and a `float` is a decimal number. There are many mathematical operations that can be performed on `int`s and `float`s. For more information on the numeric types in Python, see [this page](https://docs.python.org/3/library/stdtypes.html#typesnumeric) from the official Python documentation.

### Booleans

A **boolean** represents a value that is either `True` or `False`. In this section, we'll see how to generate them, and then we'll see fun things we can do with them!

#### Comparisons

Think back to when you were starting to learn math... What was one of the first things they taught you? For me, it was **comparisons** and **inequalities**. We had two numbers, and we had to put the correct sign, `>,<,=` in between (some of you were maybe also told to think of a crocodile opening its mouth to the bigger number...).

Well, this is an important idea in programming too! We can use the following operations to generate boolean values. Let's say that `a` and `b` are both numbers (either `int`s or `float`s):
* `a > b` -- **greater than**, evaluates to `True` if `a` is bigger than `b`, otherwise evaluates to `False`
* `a >= b`-- **greater than or equal to**
* `a < b` -- **less than**
* `a <= b` -- **less than or equal to**
* `a == b` -- **equal** -- ***NOTE:*** there are ***TWO*** equal signs!!!!!
* `a != b` -- **not equal**

Again, I want to emphasize that for the equals comparison, you must must must put two equal signs `==`! Otherwise, Python will think you're trying to assign a variable and it will get mad at you and give you an error!

Also, for `>=` and `<=`, the order of the two signs matters! Do **NOT** write `=>` or `=<`! If you forget, remember that the order is the same as we read it. **Less than or equal to** is first *less than*, so `<` and then *equal to*, so `=`, so the order is `<=`.
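To make the difference between `==` and `=` concrete, here's a short sketch with two made-up variables, `x` and `y`:

```python
x = 3
y = 7

print(x < y)   # True
print(x >= y)  # False
print(x == y)  # False: two equal signs compare the values
print(x != y)  # True

x = y          # one equal sign assigns: x now holds 7
print(x == y)  # True
```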

Now, let's see some examples:

In [None]:
a = 92
b = 43

# Complete these lines: # Your code here
print("a is greater than b:", ...)
print("a is less than b:", ...)
print("a is equal to b:", ...)
print("a is not equal to b:", ...)
print("a is greater than or equal to b:", ...)
print("a is less than or equal to b:", ...)

Feel free to change the values of `a` and `b` and see how the output changes!

#### Boolean Operations

We've seen how to generate booleans using numbers. We can also perform operations on booleans to get... more booleans! These three operations are *logical operations*:
* `and`
* `or`
* `not`

#### The `and` operation
The `and` operation takes **two** boolean values `a` and `b`. If **both** `a` and `b` are `True`, then `a and b` is also `True`. Otherwise, `a and b` is `False`. People coming from other programming languages may know `and` as `&&` or `&`. We can represent this operation using a **truth table**:

| `a` | `b` | `a and b` |
| --- | --- | --- |
| `False` | `False` | `False` |
| `False` | `True` | `False` |
| `True` | `False` | `False` |
| `True` | `True` | `True` |

Let's also see some examples:


In [None]:
a = True
b = False

print("The value of a and b is:", a and b)

In [None]:
# Your code here

a = 4
b = 5
c = 6

print("The value of (a < b) and (c > b) is:", ...)

Let's think about that last example: we have `a=4`, `b=5`, `c=6`. We're looking at the logical expression
```python
a < b and c > b
```

So, we start by breaking it up into the two parts:
* `a < b`
* `c > b`

Now, we look at each part separately:
* `a < b`: well, we have `a=4` and `b=5`, so we have `4 < 5`, which is `True`
* `c > b`: we have `c=6` and `b=5`, so we have `6 > 5`, which is `True`

Now, we can put these two back together: for `a < b and c > b` both the left and the right are `True`, which makes the whole expression `True`!

#### The `or` operation

The `or` operation also takes **two** boolean values `a` and `b`, but it evaluates to `True` if **at least one** of `a` or `b` is `True`. If both values are `False`, then `a or b` is `False`. Otherwise, `a or b` is `True`. In other programming languages, the `or` operation is represented as `a || b` or `a | b`.

To help visualise, here's the truth table:

| `a` | `b` | `a or b` |
| --- | --- | --- |
| `False` | `False` | `False` |
| `False` | `True` | `True` |
| `True` | `False` | `True` |
| `True` | `True` | `True` |

Now, let's do some examples:

In [None]:
a = True
b = False

print("The value of a or b is:", a or b)

In [None]:
# Now, for a numeric example:
a = 5
b = 6
c = 7

# Your code here
print("The value of a > b or c > b is:", ...)

Let's go through that last example. We have `a=5`, `b=6`, `c=7`. Let's again break up our expression into two parts:
* `a > b`
* `c > b`

Let's look at each one:
* `a > b` --> `5 > 6` --> `False`
* `c > b` --> `7 > 6` --> `True`

Since at least one of the two boolean values is `True`, then `a > b or c > b` is `True`.

#### The `not` operation

The `not` operation only takes in **one** boolean value `a` and flips its value. If `a` is `True`, then `not a` is `False` and if `a` is `False`, then `not a` is `True`. In other languages, it may be represented by `!a` or `~a`.

Here's the truth table:

| `a` | `not a` |
| --- | --- |
|`False` | `True`|
|`True` | `False` |

And here are a couple of examples:

In [None]:
a = True

print("The value of not a is:", not a)

In [None]:
a = 6
b = 8

print("The value of not a > b is:", ...)

For the last example, let's look a bit more closely. We have `a=6` and `b=8`.

The value of `a > b` is `6 > 8`, which is `False`. But the `not` operation flips this from `False` to `True`.
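One detail worth spelling out: Python evaluates the comparison first and only then applies `not`. Here's a small sketch with made-up values `p` and `q`, where the brackets in the second line just make that order explicit:

```python
p = 2
q = 9

print(not p > q)    # True: p > q is False, then `not` flips it
print(not (p > q))  # True: exactly the same expression, written with brackets
```

If you find the first form hard to read, adding the brackets never hurts.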

Now that we have a basic understanding of booleans, let's see one of their most practical uses...

## Intro to Control Flow and Loops

So far, our code has just run line-by-line. Everything we've written has run. But, we have ways of making decisions and repeating certain lines. In this section, we'll see how to do this using:

* Control Flow
* `while` Loops
* `for` Loops 

### Control Flow: the `if` Statement

Let's say you're heading down to campus. You take the metro and get off at Peel. You get out at the corner of Peel and de Maisonneuve and look around. In your head, you're thinking, `if` Peel is open, I'll walk up there, otherwise (`else`), I'll go to Metcalfe. **Congratulations!!!** You've just done control flow!

Control flow is about **making decisions** using boolean values. The important keyword here is `if`. Our basic control flow has the structure:
```python
    if boolean_value:
        do_something

    some_other_code_here...
```

Here are a few things to note:
* there is a **colon** (:) after the boolean value.
* the line `do_something` only runs if the `boolean_value` evaluates to `True`. 
* the line `do_something` is **indented**. In other languages, you might be used to curly brackets. Python **DOES NOT** use these. In Python, different blocks of code are indented. Also, note that in Python, we don't need to write `end` when we're done! It's enough to stop indenting.
* the line `some_other_code_here` runs *regardless* of whether the `boolean_value` is `True`. We can tell because it's **not** indented.
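Here's a minimal sketch of this structure, using a made-up situation with the hypothetical variables `is_raining` and `packed`:

```python
is_raining = True
packed = "nothing"

if is_raining:
    # Indented: these lines only run when is_raining is True
    packed = "an umbrella"
    print("It's raining!")

# Not indented: this line runs no matter what
print("I'm leaving the house with", packed)
```

Try changing `is_raining` to `False`. The indented lines are skipped, but the last line still runs.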

Let's see an example to help illustrate.

In [None]:
# Your code here


Try changing the variable `peel_is_closed` to `False` and see what happens...

You may have noticed that the lines under the `if` statement are indented. That tells Python that they will only run when the `if` condition is met. The lines underneath that aren't indented tell Python that they run no matter what.

You may have also noticed that I didn't write:
```python
if peel_is_closed == True:
```

This isn't necessary, since we already have a boolean. Putting in the extra comparison makes our code less clean. Also, just try reading the code like it's a sentence. It even sounds like a conversation:
"If Peel is closed, [print] I'm taking Metcalfe".

Now, we can also replace the boolean with one of the comparisons we have above...

In [None]:
# Freezing point example: Your code here

In this example, we put an expression that evaluates to a boolean after the `if`. Try setting the value of `current_temperature` to be above zero and see what happens.

In the last example, it would've been nice if we could print a different message if we were above freezing... or if we're in some different temperature range. Well, we can do this with `elif` clauses and a final `else` clause! We can extend the structure to be:

```python
    if some_boolean:
        do_something
    elif some_other_boolean:
        do_something_else
    elif yet_another_boolean:
        do_another_something_else
    ...
    else:
        all_else_has_failed_so_lets_do_this
```

So, if the `some_boolean` is `True`, then the line `do_something` runs. If it isn't `True`, then we test to see if `some_other_boolean` is `True`. If it is, then we run `do_something_else`. Otherwise, we keep going down the list of conditions until one of them is `True`. If all conditions are `False`, then the code under `else` runs.

**Notes:**
* There is no limit to the number of `elif` clauses you can have. You can have zero, one, or as many as you want.
* There is no requirement to add an `else` clause. You can have lots of `elif` clauses without a final `else`.
* You can have at most **one** `else` clause.
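To see `if`, `elif` and `else` working together, here's a sketch on a different topic: classifying a solution by its pH. The variable names `ph` and `label` are made up for this example.

```python
ph = 8.2

if ph < 7:
    label = "acidic"
elif ph == 7:
    label = "neutral"
else:
    # Neither of the conditions above was True
    label = "basic"

print("A solution with pH", ph, "is", label)
```

Only one of the three branches ever runs. Python checks the conditions from top to bottom and stops at the first one that is `True`.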

Now, for practice, let's write code that again takes a temperature, and this time tells you specifically if you are:
* below freezing
* at freezing
* above freezing

In [None]:
# Freezing point example: Your code here

current_temperature = -250


### `while` loops

So, control flow is great for choosing which lines of code to run, but what if we want to run a line more than once? To do this, we can use **loops**. There are two main kinds of loops in Python:
* `while` loops
* `for` loops

They are similar, but `for` loops run for a predetermined number of times and `while` loops run for an arbitrary number of iterations. We'll start with `while` loops.

Syntax:

```python
    while some_boolean:
        do_some_code
    
    code_after_loop...
```

Now, you'll pretty much **NEVER** want to put a raw boolean value in the `while`. You'll instead want to use some sort of operation that returns a boolean. This operation usually involves a variable that you update in the loop. Again, notice the indent!
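As a minimal sketch of this pattern, here's a countdown. The variable `countdown` is the one that we test in the `while` condition and update inside the loop.

```python
countdown = 3

while countdown > 0:
    print("Counting down:", countdown)
    countdown -= 1  # update the variable, or the loop would never end

# Not indented: runs once, after the loop has finished
print("Lift-off!")
```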

Sticking with our temperature theme... Let's write an example where the temperature starts at -15 and increases by 2° at every iteration. We then print a message saying that we're either at or above freezing:

In [None]:
current_temperature = -15

# Your code here


### `for` loops

`for` loops are a bit simpler, since they involve running for a pre-determined number of times. To use a `for` loop, we need something to iterate over. One basic iterable uses the `range` function.

The `range` function takes up to three arguments:
```python
    range(a,b,c)
```

This function gives you the whole numbers going from `a` up to but excluding `b`, in steps of `c`. If you leave out the last argument, it will give you every whole number from `a` up to (but excluding) `b`. If you only give one argument, it will give you every whole number from 0 up to (but excluding) that number.
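
To check these three forms, here's a small sketch. The call to `list` is only there so that we can see all the numbers at once (we'll meet lists properly in Module 4).

```python
one_argument = list(range(5))
two_arguments = list(range(2, 6))
three_arguments = list(range(1, 10, 3))

print(one_argument)     # [0, 1, 2, 3, 4]
print(two_arguments)    # [2, 3, 4, 5]
print(three_arguments)  # [1, 4, 7]
```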

Here is the `for` loop syntax:
```python

    for var_name in iterable:
        some_code
    
    code_when_finished

```

At each step, we get a new value stored in `var_name`.

Let's see an example where we're calculating the squares of all numbers between 1 and 10 (excluding 10):

In [None]:
# Your code here



Sometimes, you may want to interrupt a loop early, or skip one iteration. For this, we have the keywords `break` and `continue`.

We use `break` if we want to stop going through a loop. For example, let's say we are using a `for` loop to calculate squares, but we don't want to go above 50:

In [None]:
# Your code here


For an example with `continue`, let's say we want to compute the squares of all numbers, but the cubes of only the multiples of three:
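
Here is one possible sketch of such a loop. The range of 1 to 9 and the counter `cubes_printed` are assumptions chosen for illustration.

```python
cubes_printed = 0  # only used to count how many cubes we computed

for i in range(1, 10):
    print(i, "squared is", i ** 2)

    # Skip the rest of this iteration unless i is a multiple of three
    if i % 3 != 0:
        continue

    print(i, "cubed is", i ** 3)
    cubes_printed = cubes_printed + 1

print("We computed", cubes_printed, "cubes")
```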

These two keywords can also be used in `while` loops.

## Exercise: Temperature Conversions

We have reached the end of this module!!!

Here's a mini-project to work on based on what we saw this module:

In the United States, the temperature is commonly reported in Fahrenheit. But, here in Canada (and in much of the rest of the world), the temperature is recorded in Celsius. The conversion to Fahrenheit from Celsius is given by:
$$
    \textup{F} = \frac{9}{5}\textup{C} + 32
$$

(P.S. if you ever forget, here's an easy way to remember: the relationship is linear -- the two scales agree at -40 -- and we know that water freezes at 32°F and 0°C and boils at 212°F and 100°C; with any two of these three points, you can definitely find the line).

Find the temperature in Fahrenheit for all Celsius temperatures from $-40^\circ \textup{C}$ to $+35^\circ \textup{C}$ (inclusively), incrementing by $5^\circ$.

**BONUS:** Write this code twice: once using a `for` loop and once using a `while` loop.

In [None]:
# Put your code here...


### BONUS: Replacing `for` Loops with `while` Loops

Any time that you use a `for` loop, you can actually use a `while` loop instead. It's just not always as nice and clean:

In [None]:
# Done using a `for` loop
for i in range(10):
    print("The value of i is now", i)
    # print("The operation 2 * i gives us:", 2 * i)


# Done using a `while` loop.
i = 0

while i < 10:
    print("The value of i is now", i)
    i += 1

# Module 3 - Text Strings

A **string** is a sequence of text characters, surrounded by quotation marks. We saw an example above when we wrote the "Hello, World!" program. We can use either single quotes or double quotes:

In [None]:
print("This is a string")
print('This is also a string')

We can also use triple-quotes to have a longer string that has line breaks in it.

In [None]:
print("""
This is a much longer string.

It spans multiple lines.

Look at all this text.
""")

These types of strings will be useful later on. It's very very very important that you remember the quotation marks! Otherwise, Python will think you're talking about variables:

In [None]:
# This line produces an error
# print(Not a valid string!)

In [None]:
print("Valid string!")

There's lots of stuff that we can do with these strings. Let's discuss a few operations on strings.

First, we can store a string in a variable! For example:

In [None]:
# Your code here to store a string in a variable


Notice that we didn't have to put the quotation marks around `my_string`. Python knows that it's a string and just outputs the value.

Printing strings is great, but we want to actually process them. We can use the `len` function to get the number of characters in a string:

In [None]:
# Your code here
string_length = ...
print("The length of my string is", string_length, "characters")

**NOTE**: Those of you who have learned Java have undoubtedly seen that you can't compare strings with the `==` operator. Well, good news! In Python, you **CAN**.

In [None]:
string1 = "Hello"
string2 = str("Hello")

print("Checking string equality:", string1 == string2)

Even though we created the second string with a call to `str`, the equality still holds, because `==` compares the contents of the strings!

## String Slicing

We can also access individual characters or substrings using the **bracket operator** `[]`. But first, we need to talk about **indexing**. In a Python string, every character has a numbered position. It's **extremely** important to remember that in Python, the first position is indexed with the number **0**.

Again, I'll repeat that...

***The first character in a Python string has index 0.***

So, you can also figure out that the last character in a string with *n* characters has index *n-1*, **not** *n*.

This diagram should help clarify it:

![string indexing](assets/StringIndexingPositive.png)

Note that blank spaces are counted! To get the character at an index, stored in variable `i`, we'd write the following:

```python
character_of_interest = my_string[i]
```

To get a substring starting at index `i` and going to the character at index `j` (**excluding** that character), we write:
```python
my_substring = my_string[i:j]
```

If we omit `i`, then we get everything from the beginning up to (but **excluding**) `j`. If we omit `j`, then we get the substring starting at index `i` and going to the end of the string.

We can even take only every `k`-th character by adding a third number:
```python
my_substring = my_string[i:j:k]
```
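
Here's a short sketch of each form on a made-up string (the sequence and variable names are just for illustration):

```python
sequence = "ATGCGTAC"  # 8 characters, so valid indices are 0 to 7

first_base = sequence[0]        # "A"
last_base = sequence[7]         # "C"
middle = sequence[2:5]          # "GCG" (indices 2, 3 and 4)
start = sequence[:3]            # "ATG"
rest = sequence[3:]             # "CGTAC"
every_other = sequence[0:8:2]   # "AGGA" (indices 0, 2, 4 and 6)

print(first_base, last_base, middle, start, rest, every_other)
```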

Now, let's see some examples of string indexing and taking substrings. In Python, this process is commonly referred to as *slicing*.

In [None]:
my_string = "my string text"

# Your code here

# Let's look at single characters
print("The first character in the string is:", ...)
print("The last character in the string is:", ...)

# Now, let's look at substrings
print("The substring from index 3 to index 12 is:", ...)

# Now, let's skip a few characters
print("The substring from 5 to the end, skipping every 2 is:", ...)

Python also has a great feature where we can use **negative** indices! The last character has an index of -1 and the values go back to -n, where n is the length of the string. Here's an updated diagram:

![Negative indices](assets/StringIndexingNegative.png)
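
As a quick sketch on a made-up string, negative indices count from the end, and they can be mixed with positive ones:

```python
sequence = "ATGCGTAC"

last_base = sequence[-1]       # "C"
last_three = sequence[-3:]     # "TAC"
without_ends = sequence[1:-1]  # "TGCGTA"

print(last_base, last_three, without_ends)
```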

Now, it's your turn! Let's do some string indexing with negative indices. **Note:** We *can* combine positive and negative indices.

In [None]:
# Reproduce the above strings using negative indexing where convenient
my_string = "my string text"

# Your code here

# Let's look at single characters
print("The first character in the string is:", ...)
print("The last character in the string is:", ...)

# Now, let's look at substrings
print("The substring from index 3 to index 12 is:",  ...)
print("The substring from the beginning to index 6 is:", ...)
print("The substring from index 7 to the end is:", ...)

One last note on string slicing and indexing: Strings are **immutable**, meaning that you can't change any of the individual characters or substrings. You can create a new string using existing strings, but you **cannot** change the content of a string.

In [None]:
# This code produces an error:
# my_string[3] = 'b'

## String Operations and Methods

### Concatenation and Formatting
A common operation on strings is **concatenation**, or combining strings. We can combine strings with the `+` sign:

In [None]:
string_1 = "Hello,"
string_2 = "World!"

# Your code here
concatenated_string = ...

print("Concatenated string is:", concatenated_string)

This example shows something very important! Concatenation does **NOT** add in any spaces. It just takes the two strings and combines them together. If you want there to be spaces, you need to make sure to add them in!

Also, concatenation only works on **strings**! Let's look at this example:

In [None]:
string_1 = "The meaning of life, the universe and everything is "
meaning_of_life = 42

# This gives an error!
# print(string_1 + meaning_of_life)

This is very important to remember if you know JavaScript! Running this gives us an error! We can't concatenate an integer and a string. If we want to add the two together, we **must convert the `int` to a string** using `str`:

In [None]:
string_1 = "The meaning of life, the universe and everything is "
meaning_of_life = 42

# Your code here
complete_sentence = ...

print(complete_sentence)

But, there's a shortcut using **string formatting**, or **f-strings**, which let you put a variable directly into a string:
```python
    my_formatted_string = f"The meaning of life, the universe and everything is... {meaning_of_life}"
```

Notice that there is an **f** before the opening quotation mark and that the variable goes in curly braces. This tool makes life **much** easier!
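
Here's a small sketch with more than one variable. The names and values are made up, and the `:.2f` part is an optional format specifier that rounds the displayed number to two decimal places.

```python
sample_name = "sample_A"
replicate_count = 3
mean_value = 0.4167

report = f"{sample_name}: {replicate_count} replicates, mean = {mean_value:.2f}"

print(report)  # sample_A: 3 replicates, mean = 0.42
```

Notice that we didn't need to call `str` on the numbers. The f-string takes care of the conversion.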

### Converting Strings to Numbers

Let's say, you've gotten some data from a file or the internet and it contains a number. You want to do some sort of mathematical operation on it... and you rush to Python and you do this:

```python
    my_number_from_file = "32.3"

    my_answer = 3 * my_number_from_file

    print("The answer to my computation is:", my_answer)
```

What do you think will print?

In [None]:
my_number_from_file = "32.3"

# Your code here to multiply by 3
my_answer = ...

print("The answer to my computation is:", my_answer)

The answer may surprise you. Depending on which operation you're doing, you'll either get:
* a complete nonsense answer
* an error
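
To illustrate both cases on a different (made-up) value: multiplying a string by an integer *repeats* the string, while adding a number to a string raises a `TypeError`.

```python
digit_text = "7"

# Multiplication repeats the text instead of doing arithmetic
repeated = digit_text * 3
print(repeated)  # 777

# Addition with a number is an error, so this line is commented out:
# digit_text + 1
```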

There's an important step that we need to do before we can do any mathematical operations: we must convert the strings to numeric types. This is very easy:
* To convert a string to a `float`, just call the `float()` function with the string as the argument.
* To convert a string to an `int`, just call the `int()` function with the string as the argument.

For example:

In [None]:
my_string_float = "32.3"
my_string_int = "41"

# Fill in the blanks to perform the type conversions
# Your code here

my_int = ...
my_float = ...

print("The product of 32.3 and 41 is:", my_float * my_int)

**Fun fact**: The `int` function can also be used on numbers that are not base-10!
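
As a quick sketch, `int` accepts the base as an optional second argument (the example strings here are made up):

```python
binary_value = int("1010", 2)  # base 2
hex_value = int("ff", 16)      # base 16

print(binary_value)  # 10
print(hex_value)     # 255
```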

### Finding a Substring
And now, for a string exercise! Remember that I said you can't change the contents of a string. Well, let's now create a new string that has a single character that is different. And, since this is an MiCM workshop, let's use DNA as an example.

In [None]:
dna_sequence = "AAGGACCTTAGAAGGGGACCATTATTAAATTCCCGCA"

There are more things that we can do with strings. In Python, strings are a type of **object**. An **object** is a grouping of variables, known as **attributes**, and functions, known as **methods**, that all relate to one thing. String objects have various methods that we can use, or **call**, to do different things with the text contents. To call a method, we use the syntax

```python
    variable_name.method_name(arguments)
```

***This syntax will look quite familiar to anyone coming from Java or a C-based language. It may be a bit confusing for people coming from R or MATLAB. Remember, in Python, the dot `.` is NOT part of the variable name. It is an operator that lets us access functions and variables that belong to certain objects.***

We'll introduce functions formally later. But, remember from earlier that functions may take inputs, perform calculations, and then return outputs. Let's see a few examples of methods that we can use on strings.

For example, one method we can use on strings is `find`. Let's look at the documentation to see what this method does: https://docs.python.org/3/library/stdtypes.html#str.find

The `find` method looks for a specified substring within a whole string, or part of a string, and returns the index where it first occurs. If the substring isn't found, it returns `-1`.
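
Here's a short sketch of `find` on a made-up RNA string. The optional second argument tells `find` where to start searching.

```python
rna = "GGAUGCCAUGA"

first_start = rna.find("AUG")                    # first occurrence
second_start = rna.find("AUG", first_start + 1)  # search again, after the first hit
missing = rna.find("TTT")                        # not present in the string

print(first_start)   # 2
print(second_start)  # 7
print(missing)       # -1
```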

In [None]:
dna_sequence = "AAGGACCTTAGAAGGGGACCATTATTAAATTCCCGCA"

# Put in your code to find the index of the first T nucleotide


print("The first thymine nucleotide is located at index", index_of_first_t)
print(dna_sequence[index_of_first_t])

### Replacing Characters

Well, let's say we want to replace this `T` nucleotide with a `G` nucleotide. We can use another useful method: `replace`. As the name suggests, this method replaces specified characters or substrings with the provided new ones. Its documentation is [here](https://docs.python.org/3/library/stdtypes.html#str.replace).

The syntax is:
```python
    new_string = my_string.replace("old", "new", optional_count)
```
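
To see what the optional count does, here's a sketch on a made-up word. Remember that strings are immutable, so `replace` gives back a **new** string and leaves the original alone.

```python
word = "banana"

all_replaced = word.replace("a", "o")    # replace every "a"
first_only = word.replace("a", "o", 1)   # replace only the first "a"

print(all_replaced)  # bonono
print(first_only)    # bonana
print(word)          # banana (unchanged)
```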

Let's go back to our DNA sequence and replace only the first `T` with `G`:

In [None]:
# Your code here
mutated_dna_sequence = ...

print("Our modified sequence is:", mutated_dna_sequence)

There are many more methods we can call for strings. To learn more, see the `str` reference on the Python documentation website (https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str).

## String Iteration and the `for` Loop

Remember, earlier we saw the `for` loop. Well, we can do fun things with the `for` loop and strings! We can iterate over each character in the string.

Here's the syntax:

```python
    for c in my_string:
        do_something
```

Here, `c` is a single character in the string. Let's see an example:

In [None]:
my_dna_sequence = "ACGGACAGGAGCGAGATTTGACAGCATTA"

number_of_purines = 0
number_of_pyrimidines = 0

# Your code here


print(f"In our sequence there are {number_of_purines} purines and {number_of_pyrimidines} pyrimidines.")

There's actually an easy way to clean up our boolean conditions. Instead of using string equality, we can check if the nucleotide is in a sequence using the `in` keyword:
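
As a quick sketch of what `in` does with strings (the example strings are made up), it gives back a boolean telling us whether the text on the left appears inside the text on the right:

```python
is_purine = "G" in "AG"            # is "G" one of the characters in "AG"?
has_start = "AUG" in "GGAUGCC"     # also works for longer substrings
has_thymine = "T" in "GGAUGCC"

print(is_purine)    # True
print(has_start)    # True
print(has_thymine)  # False
```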

In [None]:
my_dna_sequence = "ACGGACAGGAGCGAGATTTGACAGCATTA"

number_of_purines = 0
number_of_pyrimidines = 0

for nucleotide in my_dna_sequence:
    # Your code here to simplify
    ...

print(f"In our sequence there are {number_of_purines} purines and {number_of_pyrimidines} pyrimidines.")

## Exercise: DNA transcription and mRNA processing

Now that we're done discussing variables, numeric types and strings, let's do a few exercises!

1. Ahhh, the joys of transcription! Remember that DNA and RNA share *most* of their nucleotides, but they differ in one of the pyrimidines. DNA has thymine while RNA has uracil. I'm giving you the **non-template** strand of the DNA. Recall that the non-template strand is identical to the produced mRNA, with the exception that the thymine is replaced by uracil.

Replace all the thymine nucleotides with uracil to get the result of transcription.

(a) The non-template strand is `AGCAGATGCATTAGCCATTAGTTTGCACCAGTATATGCAGAGTTTAGGAGACCATAATTAACGAGAGCCGATAGCTAGA`.

In [None]:
dna_sequence = "AGCAGATGCATTAGCCATTAGTTTGCACCAGTATATGCAGAGTTTAGGAGACCATAATTAACGAGAGCCGATAGCTAGA"

# Put your code here


(b) (Time Permitting) Now, let's say the **template** strand is `AGCAGATGCATTAGCCATTAGTTTGCACCAGTATATGCAGAGTTTAGGAGACCATAATTAACGAGAGCCGATAGCTAGA`. Now, you must find the complementary nucleotides to transcribe to mRNA.

**Remember:** we first need to find the complementary strand!

In [None]:
dna_sequence = "AGCAGATGCATTAGCCATTAGTTTGCACCAGTATATGCAGAGTTTAGGAGACCATAATTAACGAGAGCCGATAGCTAGA"

# Put your code here



2. In eukaryotes, mRNA must be processed before it is translated by the ribosome. This processing involves three steps:
* Capping
* Splicing
* Polyadenylation

We'll skip the capping, but let's now do some splicing! We won't deal with actual splice sites. Instead, let's say that there are introns at the following positions (the start and end positions are both part of the intron):
* Starting at the ninth **nucleotide** and ending at the 17th nucleotide
* Starting 20 nucleotides from the end of the sequence and ending 12 nucleotides from the end

Splice out these introns and stick the exons together.

In [None]:
# Put your splicing code here


Now, for polyadenylation, add a sequence of 15 `A` nucleotides. Hint: you can use the `*` operator to repeat a string!

In [None]:
# Polyadenylation code goes here


3. What is the maximum number of codons we could fit in a sequence with the same length as the original? How many nucleotides would be left over?

In [None]:
# Put your calculations here.
maximum_number_of_codons = ...
print(f"We can fit a maximum of {maximum_number_of_codons} codons in the original.")

remainder = ...
print(f"We would have {remainder} nucleotides left.")

# Module 4 - Collection Types

We've seen that we can store data in basic types, like strings, `int`s and `float`s. But, let's say we want to store many of these at a time. For example, let's say we have 100 DNA sequences that we want to store and process. Well, for this we have **collection types**. In this section, we'll see three important collection types:
* Tuples
* Lists
* Dictionaries

For more information on tuples and lists, see [this page](https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range) of the Python documentation. For more info about dictionaries, see [here](https://docs.python.org/3/library/stdtypes.html#mapping-types-dict).

## Tuples

A tuple is a way of packaging a fixed number of values together. The number of values can't be changed, and neither can the values themselves. Tuples are **immutable**, like strings. Remember, though, we can always assign a new tuple to the same variable. Tuples are represented using multiple values separated by commas within round brackets (parentheses) -- `()`.
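
Here's a small sketch of what tuples look like. The values are made up. One quirk worth knowing: a tuple with a single element needs a trailing comma, otherwise Python just sees a value in parentheses.

```python
point = (3, 4)
mixed = ("ATG", 3, True)   # elements can have different types
single = (5,)              # one-element tuple: note the trailing comma
empty = ()

print(point, mixed, single, empty)
print("Number of elements in mixed:", len(mixed))
```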

In [None]:
# Your code here


In [None]:
# Your code here


In [None]:
# Your code here


### Accessing Elements
There are two different ways to access individual elements in a tuple:
* Slicing
* Unpacking

When working with tuples, **slicing** works the *exact same way* that it did with strings, described above.

Sorry to be pedantic and repetitive, but remember that **_INDEXING STARTS AT ZERO_**.

Fill in the following example to confirm that.

We have the tuple `(4, 5, "Hello", "World!", 12, True, 4.5)`.

1. Use slicing to isolate the sub-tuple containing "Hello" and "World!".
2. Use slicing to get the last two elements.

In [None]:
my_tuple = (4, 5, "Hello", "World!", 12, True, 4.5)

In [None]:
# Put your code here for question 1.
tuple_1 = ...
print("Question 1:", tuple_1)

In [None]:
# Put your code here for question 2.
tuple_2 = ...
print("Question 2:", tuple_2)

### Tuple Unpacking
**Unpacking** is a different process. Let's say we have a tuple with 2 elements in it. We can assign each one of these elements to a variable, like this:

In [None]:
my_point = (-3, 5)

# Your code here to assign x and y


print("The value of my point is:", my_point)
print("The value of x is:", x)
print("The value of y is:", y)

**NOTE:** You **MUST** have the same number of variables as there are elements in the tuple. Otherwise, unpacking won't work and you'll get an error from Python.
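
As a sketch with a made-up three-element tuple, here is unpacking with a matching number of variables, along with a mismatched version left commented out because it raises a `ValueError`:

```python
codon = ("A", "U", "G")

# Three elements, three variables: this works
first, second, third = codon
print(first, second, third)

# Three elements, two variables: this would raise a ValueError
# first, second = codon
```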

Finally, like with strings, we can concatenate tuples using the `+` operation.

In [None]:
# Your code here to concatenate tuples

## Lists

Lists are more exciting than tuples. Lists are **mutable**! So, we can add entries to a list, remove entries from a list, and change the entries in a list. Lists are represented as comma-separated values in square brackets -- `[]`. Like tuples, lists can be unpacked into variables, although this is less common. Lists also *usually* contain elements of the same or similar type (although they don't have to).

In [None]:
# Your code here
# Here's an example of a list


In [None]:
# Your code here for another example list


In [None]:
# Your code here for yet another example list


Now, I've told you all these great things that we can do with lists... but how do we do them?

### Length of a List
Well, let's start with the simplest thing... taking the **length** of a list. We do this in the exact same way that we took the length of a string! We use the `len` function.

In [None]:
# Your code here
print("My squares list from before had length:", ...)

### List Slicing

We can obtain individual items and sublists through *slicing*, exactly the same way that we did with strings and tuples.

Here's an exercise to test your skills with this...

I'm giving you this list: `[1, 1, 2, 3, 5, 8, 13, 21, 34]`

Using slicing, find:
* the last element
* the values `3, 5, 8`
* the values `1, 2, 5, 13`

In [None]:
my_list = [1, 1, 2, 3, 5, 8, 13, 21, 34]

# Your code here
print("The last element in the list is:", ...)
print("The sublist is:", ...)
print("The sublist is:", ...)

But, there's more that we can do with the slicing! We can now update values using the `=` sign! We can do this for both individual elements and for sublists!
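
Here's a short sketch of both kinds of update on made-up lists, one replacing a single element and one replacing a whole sublist:

```python
weekdays = ["Mon", "Tue", "Wed", "Thr", "Fri"]
weekdays[3] = "Thu"      # replace a single element
print(weekdays)          # ['Mon', 'Tue', 'Wed', 'Thu', 'Fri']

counts = [0, 1, 20, 30, 4]
counts[2:4] = [2, 3]     # replace the elements at indices 2 and 3
print(counts)            # [0, 1, 2, 3, 4]
```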

Let's take this example: `[1, 2, 4, 9, 16, 32, 64, 129, 257]`

Any idea what this sequence is? There are three mistakes that we need to correct!

So... Where are the mistakes? How do we correct them?

In [None]:
# Here is our error-filled list:
powers_of_two = [1, 2, 4, 9, 16, 32, 64, 129, 257]

# Your code here to correct


print("The corrected list is:", powers_of_two)

### Adding Elements

Now for the fun part! Let's insert new items! Remember that the list is **mutable**, so when we add new items, we are actually changing the list. We are **not** creating a new list. To change the list, we use **methods** from the list object.

Let's start with adding a new item at the **end** of the list. This process is known as *appending* to a list. So, naturally, the method to do this is called `append`:

In [None]:
# Example using our powers of two
# Your code here to continue the list

print("Powers of two is now:", powers_of_two)

We can also insert at any index `i` using the method called... `insert`! This method takes **two** arguments: the index `i` *before which* we want to insert the new element and the new element that we want to insert. 

***NOTE:*** You must respect this order of arguments.
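
As a quick sketch of both methods on a made-up list:

```python
letters = ["a", "c", "d"]

letters.append("e")      # add at the end
print(letters)           # ['a', 'c', 'd', 'e']

letters.insert(1, "b")   # index first, then the new element
print(letters)           # ['a', 'b', 'c', 'd', 'e']
```

Both methods change `letters` itself, so we don't assign the result to a new variable.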

Here's an example:

In [None]:
days_of_the_week = ["Sunday", "Tuesday", "Wednesday", "Thursday", "Saturday"]

# Your code here to add Monday in the correct spot


# Your code here to add Friday in the right spot (hint: negative indexing)


print(f"The {len(days_of_the_week)} days of the week are: {days_of_the_week}")

### Removing Elements

Sometimes, we want to delete elements from a list. There are a few ways to do this:
- using the `del` keyword
- using an assignment
- using the `pop` method
- using the `clear` method

Here are the details:
* The `del` keyword can be used to get rid of single elements or a range. `del` is **not** a function, so we **don't** put round brackets after it. We just write `del` followed by the element or slice to remove.
* To remove a range, we can alternatively just use slicing and assign an empty list to the desired range (see [here](https://docs.python.org/3/library/stdtypes.html#mutable-sequence-types)).
* We can use the `pop` method without an argument to remove the last item from a list, or with an index `i` as argument to remove the item at that index. The `pop` method returns the removed element, so it can be stored in a variable.
* We can use the `clear` method to remove **all** items from a list.
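
The four approaches above can be sketched on made-up lists like this:

```python
numbers = [10, 20, 30, 40, 50, 60]

del numbers[0]            # remove one element   -> [20, 30, 40, 50, 60]
del numbers[1:3]          # remove a range       -> [20, 50, 60]
last = numbers.pop()      # remove and return the last item (60)
first = numbers.pop(0)    # remove and return the item at index 0 (20)
print(numbers, last, first)

numbers.clear()           # remove everything    -> []
print(numbers)

letters = ["a", "b", "c", "d"]
letters[1:3] = []         # assigning an empty list removes that range
print(letters)            # ['a', 'd']
```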

In [None]:
test_list = [2, 3, 5, 7, 9, 11, 13, 17, 19, 23]

# Your code here

# Get index of 9
my_index = ...

# Remove the number which doesn't belong using `del`...


print("Test list is now:", test_list)

In [None]:
test_list = [2, 3, 5, 7, 9, 11, 13, 17, 19, 23]

# Remove the number which doesn't belong using `pop`...


print("Test list is now:", test_list, "since we removed item:", removed_element)

In [None]:
test_list_2 = [1, 2, 2, 1, 2, 3, 1, 2, 2, 1, 1, 2, 2]

# Remove the numbers that disrupt the pattern using `del`.


print("Test list 2 is now:", test_list_2)

In [None]:
test_list_2 = [1, 2, 2, 1, 2, 3, 1, 2, 2, 1, 1, 2, 2]

# Remove the numbers that disrupt the pattern using assignment.


print("Test list 2 is now:", test_list_2)

In [None]:
# Remove all elements using `clear`


print("The test list 2 is now:", test_list_2)

### List Concatenation

One last operation: lists can be concatenated using the `+` operator. Remember that **both** the left and the right must be lists! You can't add a number to a list by concatenation! You must first embed it into a list.
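
Here's a small sketch with made-up lists. Notice that `+` builds a **new** list and leaves the originals untouched.

```python
evens = [2, 4]
odds = [1, 3]

combined = evens + odds    # both sides are lists
with_extra = evens + [8]   # a single number must be wrapped in a list first

print(combined)    # [2, 4, 1, 3]
print(with_extra)  # [2, 4, 8]
print(evens)       # [2, 4]

# This would be an error, since 8 is not a list:
# evens + 8
```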

In [None]:
# Your code here to define list_a and list_b and concatenate the two lists


print("The joined list is:", joined_list)

In [None]:
# Your code here to add 6 to the end of list_a


print("Modified list a is:", list_a)

### List Iteration

Remember how we went through each character in a string? Well, we can do the exact same thing with a list!

```python
    for item in my_list:
        do_something...
```

Here's an example:

In [None]:
my_list = [2, 4, 6, 5, 8, 7, 1, 3, 5, 7, 8, 9, 10, 22, 11, 95]

# Your code here to extract the even and odd numbers from the sequence

even_numbers = []
odd_numbers = []



print(f"Our list has {len(even_numbers)} even numbers and {len(odd_numbers)} odd numbers.")

Now, let's say we also want to get the index of each element... Well, we can use the `enumerate` function. At each step of the loop, it gives us a tuple containing the index and the item from the list.
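
As a sketch on a made-up list of codons, we can unpack the index and the item right in the `for` line:

```python
codons = ["AUG", "GCC", "UAA"]
stop_index = -1  # placeholder until we find the stop codon

for index, codon in enumerate(codons):
    print("Codon", codon, "is at index", index)
    if codon == "UAA":
        stop_index = index

print("The stop codon is at index", stop_index)
```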

In [None]:
my_list = [2, 4, 6, 5, 8, 7, 1, 3, 5, 7, 8, 9, 10, 22, 11, 95]

number_of_even = 0
number_of_odd = 0

last_even_index = -1
last_odd_index = -1

# Your code here to extract the number of odd and even and get the final indices of each



print("Our list has", number_of_even, "even numbers and", number_of_odd, "odd numbers.")
print("The last even number was at index", last_even_index, "and the last odd number was at index", last_odd_index)


### List Exercise

Now, time to practice lists! Let's take a string of RNA and turn it into a list of codons. At the end, print the number of codons.

In [None]:
my_rna = "AGCAGCAUGACCGAGUCAGUCAGCUUGCGGCUACGUACUGGCCAUUAGCAGUAC"

# Your code here

In [None]:
my_rna = "AGCAGCAUGACCGAGUCAGUCAGCUUGCGGCUACGUACUGGCCAUUAGCAGUAC"

# Here are a few hints ...

# 1. Create an empty codon list
my_codons = ...

# 2. Find the start codon
start_codon_index = ...

# 3. Iterate over the string
for i in range(..., ..., ...):
    # 4. Get the codon...
    new_codon = ...

    # 5. Add codon to list
    

print("We found", len(my_codons), "codons")
print(my_codons)

### BONUS CONTENT: Quirk of Lists... Passing by Reference

Lists are big and complicated objects. So, when you assign multiple variables to the same list, Python doesn't make copies: they all actually refer to the same list in memory. Let me illustrate with an example:

In [None]:
list_a = [1,2,3,4]

list_b = list_a

print("List A is:", list_a)
print("List B is:", list_b)

list_a.pop()

print("List A is:", list_a)
print("List B is:", list_b)

Wait! What!!?!?!?!??! So, we expected list A to be missing the last value, but we weren't expecting it to disappear from list B also! (Anyone who has any experience in C or C++ should be laughing now...)

So, we need to be careful when working with lists. We may want to create a copy before doing anything destructive. Thankfully, there are two ways of copying a list!
* The `copy` method
* Slicing! (No, it's not the answer to everything, but it is useful!)

The copy method is very easy... I'll show it in the example. What about slicing? Well, remember that we had a shortcut for getting everything up until an index: `my_list[:i]`. We also had a shortcut for getting a sublist for everything after an index `i`: `my_list[i:]`. Well, let's just forget the index! We can copy a list simply by writing `my_list[:]`.

In [None]:
# Let's try this again...
list_a = [1,2,3,4]

list_b = list_a.copy() # PUT IN CODE HERE!

print("List A is:", list_a)
print("List B is:", list_b)

list_a.pop()

print("List A is:", list_a)
print("List B is:", list_b)

In [None]:
# And now using slicing...
list_a = [1,2,3,4]

list_b = list_a[:]

print("List A is:", list_a)
print("List B is:", list_b)

list_a.pop()

print("List A is:", list_a)
print("List B is:", list_b)

Hey! This works!!! The two lists aren't tied together anymore!

## Dictionaries

So... How many of you can remember using a paper dictionary? What's the idea behind them?

### Key-Value Storage

Well, we're not going to be defining words... but think about the **structure** of a dictionary. You look up a word and you get an associated piece of information, a definition. Let's call the word a **key** and the associated information a **value**. A **dictionary** is a collection that stores **Key-Value** pairs.

Now, for the syntax... Well, tuples involved round brackets, and lists involved square brackets... so it's only natural that the syntax for dictionaries uses curly brackets, or brace brackets `{}`. But, there's another twist here. 

We need both keys and values! The **values** can be any type, but the **keys** must be **immutable**. So, the keys can be numbers, tuples or strings (or booleans, I guess, but that may not be useful), but they **cannot** be lists. In addition, keys **cannot** be duplicated, but values can. If you try to duplicate a key, only the last value given for that key is kept.
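
Here's a sketch of the syntax with made-up data: each entry is written as `key: value`, and the entries are separated by commas.

```python
codon_counts = {"AUG": 1, "GCC": 4, "UAA": 1}   # string keys, repeated values are fine
labelled_points = {(0, 0): "origin"}            # a tuple can be a key

# The key "a" appears twice, so only the last value survives
duplicate = {"a": 1, "a": 2}

print(codon_counts)
print(labelled_points)
print(duplicate)  # {'a': 2}
```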

In [None]:
# Your code here: dictionary example for exam_scores


Note: the keys and the values don't have to be in order. I just put them in order for the example.

Now, there are lots of operations that we can do on dictionaries!

### Accessing and Modifying Dictionary Entries

Recall that in strings, tuples and lists we used the square brackets `[]` for indexing. We're still going to use them here, but instead of using a *numeric* index, we put a key in the brackets instead. We can then perform our usual operations of retrieving and replacing values.
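
As a quick sketch with a made-up dictionary:

```python
stock = {"apples": 3, "pears": 5}

apple_count = stock["apples"]   # retrieve the value stored under a key
stock["pears"] = 6              # replace the value stored under a key

print(apple_count)  # 3
print(stock)        # {'apples': 3, 'pears': 6}
```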

In [None]:
# Your code here to access Alex's score and store it in alex_score

print("Alex has a score of", alex_score, "on the exam!")

# Your code here to modify Dorothy's score


print("Exam scores are now:", exam_scores)


### Adding Keys

Adding new elements to a dictionary is easy! We just need the new key and the new value, and then we write:
```python
    my_dictionary[new_key] = new_value
```

For example:

In [None]:
# Your code here to add exam score of 87 for Evan


print("Exam scores are now:", exam_scores)

### Removing Entries

To remove an entry, we can again use the `del` keyword, or we can use `pop`. Like with lists, `pop` returns the value that we removed, so we can store it in a variable.
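
Here's a sketch of both options on a made-up dictionary. Note that for dictionaries, `pop` needs the key of the entry to remove.

```python
stock = {"apples": 3, "pears": 5, "plums": 2}

del stock["plums"]                 # remove the entry, discard the value
pear_count = stock.pop("pears")    # remove the entry, keep the value

print(pear_count)  # 5
print(stock)       # {'apples': 3}
```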

In [None]:
# Your code here to remove Evan's score and store it in evan_score


print("Evan's score was:", evan_score)

print("Our dictionary is now:", exam_scores)

### Other Operations

Much of the expected behaviour of dictionaries is similar to lists. There are a few methods that are exclusively used by dictionaries:
* The `keys` method returns the keys in the dictionary.
* The `values` method returns the values in the dictionary.
* The `items` method returns tuples containing `(key, value)` pairs.
* The `update` method can be used for combining dictionaries (Concatenation doesn't work!).
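
These four methods can be sketched on a made-up dictionary as follows. The calls to `list` are only there to make the printed output easier to read.

```python
ages = {"Ann": 31, "Bob": 27}

names = list(ages.keys())
numbers = list(ages.values())
pairs = list(ages.items())

print(names)    # ['Ann', 'Bob']
print(numbers)  # [31, 27]
print(pairs)    # [('Ann', 31), ('Bob', 27)]

# update adds new keys and overwrites the values of existing ones
ages.update({"Bob": 28, "Cy": 40})
print(ages)     # {'Ann': 31, 'Bob': 28, 'Cy': 40}
```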

In [None]:
my_keys = ...
my_values = ...
my_items = ...

print("The keys are:", my_keys)
print("The values are:", my_values)
print("The items are:", my_items)

In [None]:
new_scores = {
    "Benjamin": 84,
    "Alan": 93,
}

# Your code here to update the dictionary



print("New scores are:", exam_scores)

### Dictionary Iteration

To do things with all data stored in the dictionary, we don't usually iterate over indices. Instead, we can iterate over the keys, or the values, or the `items` which contain both. To iterate over the keys, we can just do the following:

```python
    for k in my_dictionary:
        do_something
```
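
To iterate over keys and values together, we can combine `items` with tuple unpacking. Here's a sketch that finds the longest gene in a made-up dictionary of gene lengths:

```python
gene_lengths = {"geneA": 900, "geneB": 1200, "geneC": 450}

longest_name = ""
longest_length = 0

for name, length in gene_lengths.items():
    if length > longest_length:
        longest_name = name
        longest_length = length

print(f"The longest gene is {longest_name} with {longest_length} nucleotides.")
```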

As an example, let's find the average of our exam scores from above:

In [None]:
exam_scores = {
    "Alex"  : 96,
    "Beverly": 89,
    "Cameron": 75,
    "Dorothy": 62
}

# Your code here to compute the average exam score and store it in average_score


print("The average exam score is", average_score)

## Exercise: Translation

So... We made a list of codons before. Now, let's take it a step farther. In this exercise, we will write code to translate the mRNA to proteins. I'll provide you with a codon table... but backwards! You need to start by creating the table that goes from codon to amino acid. Codon table from here: https://en.wikipedia.org/wiki/DNA_and_RNA_codon_tables.

**Recall:** Your list of codons from the RNA sequence earlier should still be in the variable `my_codons`.

In [None]:
amino_acid_to_codon_table = {
    "F": ["UUU", "UUC"],
    "L": ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"],
    "I": ["AUU", "AUC", "AUA"],
    "M": ["AUG"],
    "V": ["GUU", "GUC", "GUA", "GUG"],
    "S": ["UCU", "UCC", "UCA", "UCG", "AGU", "AGC"],
    "P": ["CCU", "CCC", "CCA", "CCG"],
    "T": ["ACU", "ACC", "ACA", "ACG"],
    "A": ["GCU", "GCC", "GCA", "GCG"],
    "Y": ["UAU", "UAC"],
    "STOP": ["UAA", "UAG", "UGA"],
    "H": ["CAU", "CAC"],
    "Q": ["CAA", "CAG"],
    "N": ["AAU", "AAC"],
    "K": ["AAA", "AAG"],
    "D": ["GAU", "GAC"],
    "E": ["GAA", "GAG"],
    "C": ["UGU", "UGC"],
    "W": ["UGG"],
    "R": ["CGU", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "G": ["GGU", "GGC", "GGA", "GGG"]
}
# Your code here

# Start by creating a new dictionary where the codons are the keys


# Use iteration to create the opposite table: Codons to Amino Acids


# Perform the translation


print("Our protein has amino acid sequence:", ...)

### BONUS CONTENT: Copying Dictionaries

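Similar to lists, dictionaries are handled by reference: assigning a dictionary to a second variable doesn't copy the data, so both variables refer to the same dictionary. To be able to make a change through one variable without affecting the original data, we must make a copy of the dictionary.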

In [None]:
my_dictionary = {
    "Python": 3,
    "Swift": 5,
    "PHP": 8,
    "HTML": 5
}

other_dictionary = my_dictionary

other_dictionary["Dart"] = 2

print("Original dictionary:", my_dictionary)
print("Second dictionary:", other_dictionary)


In [None]:
my_dictionary = {
    "Python": 3,
    "Swift": 5,
    "PHP": 8,
    "HTML": 5
}

other_dictionary = my_dictionary.copy()

other_dictionary["Dart"] = 2

print("Original dictionary:", my_dictionary)
print("Second dictionary:", other_dictionary)


# Module 5 - Functions

So far, we've run bits of code, had some decision points, and used a few loops. But, every time we want to run our code on a new input, we have to copy the entire block of code. It would be nice if we could define reusable chunks of code that we can call on any input...

Good news! We can!

## What is a Function?

A **function** takes **inputs**, does some sort of operation on them, and produces **outputs**. People often represent functions as a little black box.

![black box](assets/Function.png)

The following is the syntax for defining a function in Python:

```python
    def function_name(parameters):
        do_stuff
        ...
        return some_output
```

There are a few important details for us to notice:
* The function definition starts with the `def` keyword.
* The function parameters are given as a comma-separated list in parentheses. As we'll discuss, parameters are optional.
* The function **body** is indented. We'll discuss the function body more later.
* If the function returns output, the last line begins with the `return` keyword.

Now, we also need to discuss how to **call** a function. **Calling** a function refers to running it on a certain set of arguments. The syntax to call a function and store its result in a variable `x` is as follows:

```python
    x = function_name(arguments_here)
```
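
To make the two pieces of syntax concrete, here is a minimal sketch with one definition and one call. The function `gc_fraction` is a hypothetical example, not one of the workshop functions.

```python
# Definition: running this part alone produces no output
def gc_fraction(seq):
    # Count the G and C nucleotides, then divide by the sequence length
    gc_count = 0
    for nt in seq:
        if nt in "GC":
            gc_count += 1
    return gc_count / len(seq)

# Call: the argument "GGCATA" is passed in for the parameter `seq`
x = gc_fraction("GGCATA")
print(x)  # 0.5
```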

We'll now discuss function definitions in more detail.

### Function Names

Remember how we spent all that time talking about variable names? Good news! The same rules apply to function names! Usually, though, the function name should say something about what the function **does**.

### Simple Function

Let's start with a trivial example:

In [None]:
# Here, we define a useless function
# Your code here



You may be wondering why there is no output from the previous cell. Well, it's important to remember that this was the function **definition**. We were just ***defining*** the function, but we weren't ***calling*** it.

It's very important to remember that ***DEFINING A FUNCTION DOESN'T RUN THE FUNCTION***.

To run the function, we actually call the function, like so:

In [None]:
# Your code here to call the function that does nothing


## Function Parameters and Return Values

It's all fun to make a function that does nothing... but usually we want functions to take some inputs, process them and produce an output that we can use later. To add parameters to a function, we put in variables between the parentheses. To get a result from the function, we use a `return` statement.

```python
    def my_function_with_parameters(a, b, c):
        ...

        result = do_something...

        return result
```

Then, when calling the function, we must provide the arguments and we can store the result in a new variable:

```python
    my_a = ...
    my_b = ...
    my_c = ...

    my_result = my_function_with_parameters(my_a, my_b, my_c)
```

The function result is stored in the new variable called `my_result`. We can then do fun things with the result.

**Note:** to access a variable that you've defined inside a function from outside, you **must** use a return statement. Functions define their own *scope* and the variables defined in the function only live inside the function and **cannot** be accessed later.
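
A small sketch of this scoping rule, using a hypothetical function `count_purines`: the value comes back through `return`, but the variable defined inside the function is not available afterwards.

```python
def count_purines(seq):
    # `purine_number` is created inside the function, so it is local to it
    purine_number = 0
    for nt in seq:
        if nt in "AG":
            purine_number += 1
    return purine_number

# The returned value is available because we store it in a new variable
result = count_purines("AGCTAG")
print(result)  # 4

# The local variable itself no longer exists once the function has finished
try:
    print(purine_number)
    local_is_visible = True
except NameError:
    local_is_visible = False

print(local_is_visible)  # False
```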

Here's an example to help illustrate:

In [None]:
# Your code here to define a function `pyrimidine_counter` that counts the number of pyrimidines in a nucleotide sequence `seq` of type `seq_type`.



The function we've defined takes two arguments:
* `seq` - a nucleotide sequence
* `seq_type` - a variable that says if the sequence is DNA or RNA

The function also returns a number, `pyrimidine_number`. When you call the function, you can call the variable that you are storing the result in *anything* (well, at least any legal variable name). You don't have to use the same name as in the function.

Let's look at an example of calling the function:

In [None]:
# Your code here to call our pyrimidine counter on DNA and RNA sequences
my_dna_sequence = "CAGCTGCTAGTCGTAGCGATCGTAGCTGCTAGCGTATCGATGCTAGCTAGCTAGCTTGCTAGCTGACTAG"


print(f"The DNA sequence has {pyrimidine_count} pyrimidines.")

my_rna_sequence = "CAUGCAUGCAUGCUAGCUAGCUGACUUUAGCAGCAGCUAGCUAGCUGACGAGGAUGGAGAGGGAGGGA"


print(f"The RNA sequence has {pyrimidine_count} pyrimidines.")

See how easily we were able to run the same code on two different sets of arguments?

Now, what if we wanted to give a default value for a variable? We can do this using **keyword arguments**. After the variable name, we put `=` and then the default value.

**Important:** the **keyword** arguments must go **after** the *positional* arguments (the ones without default values).

In [None]:
def pyrimidine_counter(seq, seq_type="DNA"):
    pyrimidine_number = 0
    
    if seq_type == "RNA":
        pyrimidines = "CU"
    else:
        pyrimidines = "CT"

    for nt in seq:
        if nt in pyrimidines:
            pyrimidine_number += 1

    return pyrimidine_number

The following function calls are **all equivalent**:

In [None]:
pyrimidine_count = pyrimidine_counter("CAGCTGCTAGTCGTAGCGATCGTAGCTGCTAGCGTATCGATGCTAGCTAGCTAGCTTGCTAGCTGACTAG")
print(f"Our sequence has {pyrimidine_count} pyrimidines.")

pyrimidine_count = pyrimidine_counter("CAGCTGCTAGTCGTAGCGATCGTAGCTGCTAGCGTATCGATGCTAGCTAGCTAGCTTGCTAGCTGACTAG", "DNA")
print(f"Our sequence has {pyrimidine_count} pyrimidines.")

pyrimidine_count = pyrimidine_counter("CAGCTGCTAGTCGTAGCGATCGTAGCTGCTAGCGTATCGATGCTAGCTAGCTAGCTTGCTAGCTGACTAG", seq_type="DNA")
print(f"Our sequence has {pyrimidine_count} pyrimidines.")

pyrimidine_count = pyrimidine_counter(seq="CAGCTGCTAGTCGTAGCGATCGTAGCTGCTAGCGTATCGATGCTAGCTAGCTAGCTTGCTAGCTGACTAG")
print(f"Our sequence has {pyrimidine_count} pyrimidines.")

pyrimidine_count = pyrimidine_counter(seq="CAGCTGCTAGTCGTAGCGATCGTAGCTGCTAGCGTATCGATGCTAGCTAGCTAGCTTGCTAGCTGACTAG", seq_type="DNA")
print(f"Our sequence has {pyrimidine_count} pyrimidines.")

pyrimidine_count = pyrimidine_counter(seq_type="DNA", seq="CAGCTGCTAGTCGTAGCGATCGTAGCTGCTAGCGTATCGATGCTAGCTAGCTAGCTTGCTAGCTGACTAG")
print(f"Our sequence has {pyrimidine_count} pyrimidines.")


# Don't work:
#pyrimidine_counter("DNA", seq="CAGCTGCTAGTCGTAGCGATCGTAGCTGCTAGCGTATCGATGCTAGCTAGCTAGCTTGCTAGCTGACTAG")
#pyrimidine_counter(seq="CAGCTGCTAGTCGTAGCGATCGTAGCTGCTAGCGTATCGATGCTAGCTAGCTAGCTTGCTAGCTGACTAG","DNA")

Note that you can even put keywords for the positional arguments. Keyword arguments can go in any order.
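
To see why a call can fail, here is a minimal sketch with a hypothetical function `describe_sequence` that has the same parameter layout as our counter. It catches the error raised when a parameter receives two values.

```python
def describe_sequence(seq, seq_type="DNA"):
    # Hypothetical helper with one required parameter and one default value
    return f"{seq_type} sequence of length {len(seq)}"

# Keyword arguments in any order give the same result
first = describe_sequence(seq="ACGU", seq_type="RNA")
second = describe_sequence(seq_type="RNA", seq="ACGU")
print(first == second)  # True

# Here "RNA" is taken as the first positional argument (`seq`),
# so passing seq= as a keyword as well gives `seq` two values
try:
    describe_sequence("RNA", seq="ACGU")
    error_name = None
except TypeError as error:
    error_name = type(error).__name__

print(error_name)  # TypeError
```

Writing a positional argument *after* a keyword argument is a different kind of problem: Python rejects it as a `SyntaxError` before any code runs, so it can't be caught with `try` in the same cell.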

### Returning Multiple Values

We can also use **tuples** to return more than one value. Let's say we want to return the number of `A`, `T`, `G`, `C` in a sequence (without using a dictionary):

In [None]:
def count_nucleotides(seq):
    a_count = 0
    t_count = 0
    g_count = 0
    c_count = 0

    for nt in seq:
        if nt == "A":
            a_count += 1
            continue
        elif nt == "T":
            t_count += 1
            continue
        elif nt == "G":
            g_count += 1
            continue
        elif nt == "C":
            c_count += 1
    
    # Your code here to return a tuple of (A, T, G, C)
    return ...

**Note:** when returning a tuple, we don't need to surround it by brackets. It's enough just to separate the values by commas.

Let's call the function:

In [None]:
my_sequence = "CAGCTGCTAGTCGTAGCGATCGTAGCTGCTAGCGTATCGATGCTAGCTAGCTAGCTTGCTAGCTGACTAG"

# Your code here to use tuple unpacking to get the return value!


print("Our sequence has an A count of:", a)
print("Our sequence has a T count of:", t)
print("Our sequence has a C count of:", c)
print("Our sequence has a G count of:", g)


In this example, we see unpacking in action!
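
The same pattern works for any function that returns several values. As a further sketch, here is a hypothetical function `shortest_and_longest` (not part of the workshop code) that returns two values, used both with and without unpacking.

```python
def shortest_and_longest(sequences):
    # Start with the first sequence, then compare against the others
    shortest = sequences[0]
    longest = sequences[0]
    for seq in sequences:
        if len(seq) < len(shortest):
            shortest = seq
        if len(seq) > len(longest):
            longest = seq
    # The commas create the tuple
    return shortest, longest

# Keep the whole tuple in one variable...
both = shortest_and_longest(["ATG", "ATGCCC", "AT"])
print(both)  # ('AT', 'ATGCCC')

# ...or unpack it into one variable per value
short_seq, long_seq = shortest_and_longest(["ATG", "ATGCCC", "AT"])
print(short_seq)  # AT
print(long_seq)   # ATGCCC
```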

## Function Documentation

Let's say I give you this function name:

In [None]:
def find_amino_acid_properties(protein):
    ...

What does the return value look like? How do we know what we can do with this function if we don't have the entire function code? The answer: **documentation**!

Let's say I give you this instead:


In [None]:
def find_amino_acid_properties(protein):
    """
    Find the number of amino acids by property.

    This function takes in a peptide sequence and returns a dictionary with the number of
    amino acids that are:
    * non_polar
    * polar
    * basic
    * acidic
    """
    ...

Now, even though we don't know what the function itself looks like, we know that we can expect that it will return a dictionary with the keys:
* `'non_polar'`
* `'polar'`
* `'basic'`
* `'acidic'`

The big string right under the first line is called a **docstring**. The **docstring** is used to provide *documentation*, by telling the **user** what the function does, what to pass as arguments and what the function returns. Let's write a docstring for our nucleotide counter from before:

In [None]:
def count_nucleotides(seq):
    
    # Your code here to put in the docstring...

    a_count = 0
    t_count = 0
    g_count = 0
    c_count = 0

    for nt in seq:
        if nt == "A":
            a_count += 1
            continue
        elif nt == "T":
            t_count += 1
            continue
        elif nt == "G":
            g_count += 1
            continue
        elif nt == "C":
            c_count += 1
    
    # Returning a tuple
    return a_count, t_count, g_count, c_count

The docstring makes it much easier to know what the function returns without having to look at all the code. If we didn't have the docstring, we wouldn't know what data structure the result would be in, and how the bases would be ordered.

This will be very helpful when we are using code that others have written.

Now, how do we see the docstring? In Python, we can use the `help` function:

In [None]:
# call the help function for our count_nucleotides function
help(count_nucleotides)

Note that we can do this for **any** function!

In [None]:
# Call help for the print function!
help(print)
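
The docstring is also stored on the function itself, in an attribute called `__doc__`. Here is a minimal sketch with a hypothetical function `reverse_sequence`. The `Args`/`Returns` layout is just one common way to organise a docstring, not a requirement.

```python
def reverse_sequence(seq):
    """
    Return the sequence in reverse order.

    Args:
        - seq: A nucleotide sequence, as a string.
    Returns:
        - A new string with the characters of `seq` in reverse order.
    """
    return seq[::-1]

# The docstring is an ordinary string attached to the function
print(reverse_sequence.__doc__)

print(reverse_sequence("ATGC"))  # CGTA
```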

## Exercises

1. We've gone from the basics of replacing all the `T`s with `U`s to perform transcription, we've processed the mRNA and we've translated it. Now, let's combine all the steps we've done into a function that takes DNA and returns a protein sequence (ignoring splicing and poly-adenylation). Let's write functions for each step, and then one overall function for the whole process.

    As an extra challenge, I've added an argument that dictates whether or not the DNA provided is on the template strand. Think about how we'll handle this if we need to deal with the template strand... Also, I may have mixed up the 5' and 3' ends... so let's ignore the directionality of DNA for this exercise.

In [None]:
# Put your code here
def perform_transcription(dna_sequence, is_template_strand=False):
    pass

def get_codons_for_mrna_sequence(mrna_sequence):
    pass

def translate_sequence_of_codons(codon_list):
    pass

def translate_dna_to_protein(dna_sequence, is_template_strand=False):
    pass

Test it on random sequences:

In [None]:
# Let's make random sequences
import random

def generate_random_sequence(number_of_codons):
    """
    Generate a random DNA sequence of length `number_of_codons`.

    Args:
        - number_of_codons: The desired number of codons in the sequence to generate (not all will be coding).
    Returns:
        - DNA sequence containing `number_of_codons`, plus an additional start codon and stop codon.
    """
    random_sequence = random.choices(list(codon_table.keys()), k=number_of_codons)
    start_codon_position = random.randint(0, number_of_codons-2)
    random_sequence.insert(start_codon_position, "ATG")
    random_sequence = "".join(random_sequence)
    first_start_codon_position = random_sequence.find("ATG")
    stop_codon_position = first_start_codon_position + 3 * random.randint(0, (len(random_sequence) - first_start_codon_position) //  3 - 1)
    stop_codon = random.choices(amino_acid_to_codon_table["STOP"], k=1)[0]
    random_sequence = random_sequence[:stop_codon_position] + stop_codon + random_sequence[stop_codon_position:]
    
    return random_sequence


# Run this code to see your translation results:

random_sequence = generate_random_sequence(25)
print(f"My random DNA sequence is: {random_sequence}")

amino_acid_sequence = translate_dna_to_protein(random_sequence, is_template_strand=False)

print(f"Translated to: {amino_acid_sequence}")

2. In the docstring introduction, we discussed a hypothetical function to find amino acid properties. Now, let's write that function that matches the docstring provided. If we don't have time to get to this question, feel free to try it on your own and compare with the solutions. Here's a skeleton of the function for you to fill in (amino acid properties from https://en.wikipedia.org/wiki/DNA_and_RNA_codon_tables):

In [None]:
def find_amino_acid_properties(protein):
    """
    Find the number of amino acids by property.

    This function takes in a protein sequence and returns a dictionary with the number of
    amino acids that are:
    * non_polar
    * polar
    * basic
    * acidic
    """
    # As a starting point, here are the amino acids that fit each category.
    non_polar = "FLIMVPAWG"
    polar = "STYQNC"
    basic = "HKR"
    acidic = "DE"

    # Fill in the rest...



## Sneak Peek: Object-Oriented Programming

In object-oriented programming, everything is an **object**. *Objects* are collections of variables, known as **attributes**, and functions, known as **methods**. These objects are often representations of real objects. To create new objects, we define **classes**. A **class** is a *template* that is used to create new objects. For example, if we want to create a class to represent a DNA sequence, we could write something like:

In [None]:
class DnaSequence:

    def __init__(self, seq):
        # This is the initialiser. It creates the instance of 
        # the specific class and assigns the first internal variables, known as attributes.
        # Here, `self` refers to that individual instance. We're setting a new attribute of `self` called `seq`
        # and we're assigning the sequence to this value.
        self.seq = seq

    def transcribe(self):
        # Here, we describe a method that we'll run. The first parameter is pretty much always `self` 
        # so that we can refer to the attributes of the instance in the function.
        ...

    def translate(self):
        ...

We can then create sequences using the **constructor** or **initialiser**:

In [None]:
my_sequence = DnaSequence("AAGTTTGAAAGAGGGTGGTCCTGCACACCCTGACCCCAGTC")
my_rna_sequence = my_sequence.transcribe()
my_protein_sequence = my_sequence.translate()
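
In the class above, the method bodies are left as `...`, so calling them returns `None`. To make the idea more concrete, here is a minimal sketch of a filled-in class. The name `SimpleDnaSequence` and the `length` method are assumptions made for illustration, and transcription is simplified to replacing `T` with `U` on the coding strand.

```python
class SimpleDnaSequence:

    def __init__(self, seq):
        # Store the sequence as an attribute of this particular instance
        self.seq = seq

    def transcribe(self):
        # Simplified transcription: assume `seq` is the coding strand
        return self.seq.replace("T", "U")

    def length(self):
        # Methods can use the attributes of the instance through `self`
        return len(self.seq)


# Each instance keeps its own attributes
first_sequence = SimpleDnaSequence("ATGTTT")
second_sequence = SimpleDnaSequence("ATGCC")

print(first_sequence.transcribe())  # AUGUUU
print(first_sequence.length())      # 6
print(second_sequence.seq)          # ATGCC
```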

# Module 6 - Modules and Packages

We've been looking at how to write functions, but you'll often need to use code that others have written. Code written by others is arranged in *modules* which are easy to distribute. Python comes with lots of built-in modules that you can import for extra functionality. But, there's often functionality that you'll need that *isn't* included in Python. Thankfully, other people distribute their code in packages that you can easily install. In the first part of this section, we'll see how to work with modules, and then we'll see how to install new packages.

## Using modules

I said that Python comes with a lot of modules. When you install Anaconda, there are many others that come pre-installed. To see a list of all available modules, we can type this:

In [None]:
help("modules")

You won't usually do this, but I wanted to show you the scale of how many modules there are.

### Importing Modules

To import a module so that we can use it in our code, here's the syntax:
```python
    import module_name
```

We've come full-circle... Let's go back to a math example. Let's calculate the factorial of a number... We could manually write the function, or we can take advantage of Python's `math` module:

In [None]:
# Your code here... Calculate factorial using `math`


# We will calculate the factorial of 7


print("The value of 7! is:", ...)

The two important lines here are:
* `import math` --> we imported the `math` module
* `x = math.factorial(7)` --> we use the `factorial` function from the `math` module

Note that we have to write `module_name.function_name`. This is because we are importing the ***whole module***.
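
The same `module_name.function_name` pattern applies to everything else in the module. Here is a small sketch with two other functions and a constant from `math`, along with what happens if the module name is left out.

```python
import math

# Functions and constants are reached through the module name
root = math.sqrt(16)
rounded_up = math.ceil(2.3)

print(root)        # 4.0
print(rounded_up)  # 3
print(math.pi)

# Without the `math.` prefix, the name is not defined
try:
    sqrt(16)
    bare_name_works = True
except NameError:
    bare_name_works = False

print(bare_name_works)  # False
```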

There's a common trick that we do for some modules. For modules that we use often, we may not want to write the entire module name. We can instead import it using a shorter name:
```python
    import package_name as short_name
```

You'll see this commonly for the NumPy package:

In [None]:
import numpy as np

Now that we have done this, we don't write `numpy` before all functions. Instead, we write `np`:

In [None]:
my_arr = np.arange(8)

For more details on [NumPy](https://numpy.org/), check out its [website](https://numpy.org/) and [documentation](https://numpy.org/doc/stable/).

Note that we can also have submodules within modules. A common example of this is the `matplotlib` plotting library. We commonly use the `pyplot` submodule. To import it, you'll commonly see this line:

In [None]:
import matplotlib.pyplot as plt

Here, we've imported a submodule and given it a much shorter name.

### Importing Specific Functions

Sometimes, we don't want to import an entire module. We may want to just import a specific function. For this, the syntax is:
```python
    from module_name import function_name
```

Then, when we call the function, we **don't** need to write the module name. We only need to write the function name. Let's see this with our `factorial` example from before:

In [None]:
# Your code here: Change the import to a from ... import
import math

# We will calculate the factorial of 7
x = math.factorial(7)

print("The value of 7! is:", x)

Notice that we were able to call the `factorial` function directly. We can also import multiple functions by separating them with a comma:

In [None]:
from math import cos, sin

print("cos(180) is", cos(180))
print("sin(180) is", sin(180))

Wait! What!?!?!? This isn't what we expected! Quick! Check the docstring or the [online docs](https://docs.python.org/3/library/math.html#math.cos)!

In [None]:
help(cos)

This is an example of important documentation! The docstring tells us that if we're using the `cos` function, the angle **must be in radians**. So, let's change our code... and import a **constant**, too!

In [None]:
from math import pi

print("cos(pi) is", cos(pi))
print("sin(pi) is", sin(pi))

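This makes more sense! The weird result for `sin(pi)` (1.22e-16 instead of exactly 0) comes from the fact that computers store decimals with limited precision: the stored value of `pi` is only an approximation of π, so its sine is a tiny number rather than exactly zero.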
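
As a small follow-up sketch, the `math` module also provides `radians` for converting degrees and `isclose` for comparing decimals with a tolerance. The variable names here are just for illustration.

```python
from math import cos, sin, pi, radians, isclose

# radians() converts an angle from degrees to radians
angle = radians(180)
angle_matches_pi = isclose(angle, pi)
print(angle_matches_pi)  # True

# The sine of this angle is a tiny number, not exactly 0
sine_is_exactly_zero = sin(angle) == 0
print(sine_is_exactly_zero)  # False

# isclose() compares with a tolerance; abs_tol is needed when comparing with 0
sine_is_close_to_zero = isclose(sin(angle), 0, abs_tol=1e-9)
cosine_is_close_to_minus_one = isclose(cos(angle), -1)
print(sine_is_close_to_zero)         # True
print(cosine_is_close_to_minus_one)  # True
```

In general, it is safer to compare decimal results with a tolerance than with `==`.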

## Package Management

All this has helped us see how to import modules that are already installed, but what if we don't already have a package? The good news is that Python makes it **very, very easy** to install new packages. There are two main tools that you'll use:
* `conda` -- available if you've installed Anaconda or miniconda
* `pip` -- generally available, even if not using Anaconda

### Installing Packages using `conda`

In general, to install a package with `conda`, at the **command prompt** in a terminal, **NOT IN A PYTHON SHELL**, you would write:
```bash
$ conda install package_name
```

Press enter, wait for it to prompt you, type `y` and hit enter again to install! If you don't want to be prompted, then you can just add `-y` to the command so that it automatically answers "yes" to the prompt for installation.

Sometimes, the package that you want isn't available in the main channel, so you may have to specify an additional option (see [here](https://docs.conda.io/projects/conda/en/latest/commands/install.html) for more details). For example, packages may come from `conda-forge`, so you would have to specify:
```bash
$ conda install -c conda-forge package_name
```

You can also add additional channels, such as [**bioconda**](https://bioconda.github.io/).

**Note:** On Windows and macOS, if you installed Anaconda, you can perform package management graphically using the "Anaconda Navigator". Also, on Windows, if you are installing packages from the command line, make sure you open the **Anaconda Prompt** from the Start Menu and you **don't** just go in through `cmd` or PowerShell.

For reproducibility, you may want to store specific versions of packages. For this, you can export your `conda` environment to a file. This will let other users recreate your setup. The process is discussed briefly in the `conda` documentation [here](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html).

At the command prompt (with your `conda` environment active), type the following (after the dollar sign). The exported file is in YAML format, so it is conventionally named `environment.yml`:

```bash
    $ conda env export -f environment.yml
```

To create a new `conda` environment called `new_env` based on the file, type this:
```bash
    $ conda env create --file environment.yml --name new_env
```

To activate this new environment, you would type:
```bash
    $ conda activate new_env
```

Environments help you keep different versions of Python, and of your packages, separate from one another.

### Installing Packages using `pip`

To install packages using `pip`, again you must open the command line. At the prompt, you write:
```bash
    $ pip install package_name
```

**Note:** `pip` should come with just about any installation of Python. If you didn't install Anaconda, things may get a bit messy. There are two major versions of Python that you may encounter: 2.7 and 3.*. On some operating systems, typing `python` or `pip` on the command line uses Python 2, while you must use `python3` or `pip3` to use the more up-to-date and supported version of Python. When you install Anaconda, you no longer need to deal with this issue.

Now, let's say you're doing research and you have all your packages installed and you want someone to be able to reproduce your environment with the exact same versions. We can do this easily using `pip`. The package information is stored in a **requirements** file, commonly called `requirements.txt`. To create this file, at the command line, you use `pip freeze`:

```bash
    $ pip freeze > requirements.txt
```

Then, on another computer, if you want to install all these packages, you just write:
```bash
    $ pip install -r requirements.txt
```
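
Once packages are installed, you can also check from within Python which version you have. The following is a minimal sketch using the standard library's `importlib.metadata` module (available in Python 3.8 and later). The helper name `installed_version` is an assumption made for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package_name):
    # Hypothetical helper: return the installed version as a string,
    # or None if the package is not installed in this environment
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None

# The output depends on what is installed in your environment
print(installed_version("numpy"))
print(installed_version("a-package-that-is-not-installed"))  # None
```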

You can also create multiple environments using `pip` and `virtualenv`. If you have both `conda` and `pip` installed, the `conda` [documentation](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) recommends trying to install packages with `conda` first.

### Other Installation Tips

Most packages give you information in the **documentation** about how to install them. For example, NumPy provides the following [page](https://numpy.org/install/). Matplotlib provides [this page](https://matplotlib.org/stable/users/getting_started/index.html#installation-quick-start). Another interesting example is CuPy, which allows performing NumPy and SciPy operations on the GPU. This package **does not** work on all systems. It requires an NVIDIA GPU and CUDA, which is not available on macOS. In these cases, it's very important to read the [installation instructions](https://docs.cupy.dev/en/stable/install.html).

### Reading Documentation

When in doubt, **consult the documentation!** The documentation usually provides instructions on installation, but also basic usage, and it often has a collection of all the docstrings so that you can understand what all the functions, classes and methods do. For example, let's look at [NumPy](https://numpy.org).

# Where to Go From Here

In this workshop, we've seen the basics of Python, from mathematical operations and variables to defining functions and importing packages. But this is just the beginning! There are still many more concepts to cover. To learn more, there are plenty of online tutorials, videos and books (some of them even open-source).

* Python official documentation: https://docs.python.org/3/
* W3Schools Tutorial: https://www.w3schools.com/python/default.asp
* *Think Python 2e* by Allen B. Downey (FREE book): https://greenteapress.com/wp/think-python-2e/
* *Data Structures and Information Retrieval in Python* also by Allen B. Downey (FREE book): https://greenteapress.com/wp/data-structures-and-information-retrieval-in-python/
* Books available from the McGill Library
* YouTube videos
* Specific questions can be answered on [Stack Overflow](https://stackoverflow.com/).

The most important thing to remember is to **read the documentation**. Often, if you're stuck, the answer is **right there**. If it's not, then it's probably on Stack Overflow. It's often a good idea to check the documentation **first** to see if there's an official explanation or an official example. And don't just copy a Stack Overflow answer or sample code. Think about what the code is doing. Does it make sense? Is there a better way? Try to look line by line to understand what is going on (play around in the IPython interpreter!).

There are also many other skills that you can learn to help with your programming and development skills. The other MiCM workshops should provide you with some of these skills. It's great to learn how to use GitHub and write documents in Markdown.

For those of you who have previous programming experience, congratulations on adding another language to your repertoire. For those of you who are new, welcome to the world of programming! Just remember, programming is like art: you start with an empty text file and soon enough, you have hundreds (or thousands) of lines of code!

Don't hesitate to reach out if you have any further questions. Happy coding!

In [None]:
from time import sleep


print("Good luck with your programming future!", end=" ")

i = 1
s = "/-\\|"  # the backslash must be escaped inside a normal string

print(s[0], end="")

while i < 10:
    print("\b" + s[i % len(s)], end="")
    i += 1
    sleep(0.5)

print("\b🎉")