# Week 1: Introduction to Python programming for biology students 🧪

Welcome! This is the first of many modules in this course designed to help you learn various programming tools for biologists. Over the course of this semester, you will learn the following:
1. Python programming
2. Data science with Pandas
3. Machine Learning fundamentals

The goal of learning programming as a biologist is not to become an expert software developer (although you certainly can!), but to *understand how to use programming as a tool* to:
1. automate and accelerate computation
2. organize and visualize data
3. employ statistical/machine learning models to reach conclusions about datasets

...all within a scientific context. Whether you will work in a research lab or a corporate setting, these skills will complement your biology background. We hope you find these modules helpful as you continue your undergraduate studies.



# Premodule: Jupyter and Python Basics
This week's focus is on how to use the Python programming language. In this first part, you will learn about:
1. Jupyter Notebooks
2. Variables and datatypes in Python

We recommend spending some time to really understand the first few weeks of content, as subsequent modules will build upon fundamental Python skills. We highly encourage you to look up the documentation for certain functions or do quick Google searches for additional help as we go along. The goal here is not to know every feature and function of Python by heart, but to be able to find useful information and apply it to the problem you are trying to solve.

## Jupyter Notebooks 📓

Before we get into learning Python, let's explore the page you are seeing right now, a Jupyter Notebook.
<span style="background-color: #AFEEEE">**Jupyter Notebooks**</span> **are documents that blend interactive computation and its outputs, explanatory text, mathematics, images, and more.** They are widely used in the scientific computing community and research labs.



### Cells
This rectangular block is called a **cell**. There are two types of cells in Jupyter Notebooks:

* <span style="background-color: #AFEEEE">**Text cell**</span>: an editable cell where you can write text (and insert other elements like images) in Markdown. (See Further Reading at the bottom of this notebook.)
* <span style="background-color: #AFEEEE">**Code cell**</span>: an editable cell where you can write and execute code.

#### Text cell
This cell contains Markdown*, which is a way of describing text with formatting. If you click on this text, you will see that the tab for this rectangular box is highlighted <span style="color:green">green</span> and you are able to edit the contents. This is edit mode, which you'll need to be in to edit code or Markdown.

If you click anywhere outside this editor area, the tab will turn <span style="color:blue">blue</span>. This is command mode; you should be in this mode to run cells.
You can switch between the two modes by pressing *Enter* for edit mode and *Esc* for command mode.

#### Code cell
Notebooks can also have code cells. Jupyter Notebook lets you run the currently selected code cell by clicking "Run" (with a sideways triangle icon) on the top toolbar or by pressing *shift+Enter*. You can find more run settings under "Cell".

### Important ❗
As you follow along in these modules, please read each text cell and follow the directions to complete/run the code cells that follow before moving on to the next portion. Skipping sections may cause issues later on, as some cells depend on previous cells. You will be writing code directly into code cells marked with a "TODO:" or something similar, and sometimes providing answers in text format (in Markdown cells).

*For an overview of Jupyter Notebooks, go to `Help -> Notebook Tour` and follow the prompts.*

### Running cells

Below is an example of a code cell to perform basic arithmetic operations and print the result in Python. Don't worry about understanding the code yet, but we will run some cells to get an idea of how this works.


<span style="background-color: #FFD700">**Run the cell below by selecting it, then clicking "Run" in the toolbar OR using the shortcut *shift+Enter*.**</span>

In [None]:
x = 5
y = 10
z = x + y
z

You should see a number in square brackets appear to the left of the cell. This is the execution number. The notebook tracks which cells you have run so far, because execution order matters (discussed later).

You also should see "15" appear below the code cell you just ran. This is the result or *output* of running the code. The result we see after execution is from the last line of the code block. We see whatever **z** represents, which is 15.
Note how the code and the results of running the code are bundled together (the result is directly under the code block).

<span style="background-color: #FFD700">**Run the cell below**</span>

In [None]:
print(x)
print(y)
z

There are three values in the output now. Outputs will be displayed in the order that the lines of code are executed, from top to bottom. The first two outputs, 5 and 10, are outputs from the first two lines of code. The last output is the value of z, as we saw previously.

### Comments
Comments help the reader understand the intention behind a code block or a specific line of code. This helps not only other programmers, but also you as the author as you develop your code. Comments are not executed but will be visible in the code file.

You can start a **line comment** with the hash symbol (#). Everything written on the same line after this symbol will be part of the comment.

A useful keyboard shortcut to comment/uncomment a line of code is **Ctrl + /** (Windows) or **Cmd + /** (Mac). You can <span style="background-color: #FFD700">**try out this shortcut on the code cell below**</span>.

In [None]:
# This is a line comment
# Whatever you write AFTER the # sign on this line will not get executed.
some_code = 10  # Notice that code written before the # sign is NOT COMMENTED and WILL be executed.
some_code

Now that we know how this notebook works, let's hop into our first topic in Python.

## Python Basics: Variables and datatypes 👾

<span style="background-color: #AFEEEE">**Variables**</span> are entities with a <span style="background-color: #AFEEEE">**value**</span>, which is of a <span style="background-color: #AFEEEE">**data type**</span> (explained later in this notebook). We declare variables with a unique name of our choosing in our program. Declaring a variable in Python includes these simple steps:
1. We write a new, unique variable name (this is the left hand side)
2. We write a ```=``` sign
3. We write some value (this is the right hand side)

This assigns the value on the right hand side of the ```=``` sign to the variable name on the left hand side. For example, in this line of code:

```
my_variable = 3
```
We have declared a variable called ```my_variable```, and it has a value of 3.

A few rules about naming variables:
1. must not start with a digit (0-9)
2. must not be a keyword
3. variable names are case-sensitive
4. no spaces and no special characters such as a #

Keywords are reserved words in Python that are intended to do special things. Some examples are "for", "in", and "continue". There are many more keywords, so don't worry about them for now. You will learn some of them as we go through these modules.
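
The naming rules above can be checked from Python itself. The sketch below is only an illustration: it uses the standard-library `keyword` module and the built-in string method `isidentifier()`, and the names being tested (`sample_count`, `2nd_sample`, `gene name`) are made up for this example.

```python
import keyword

# Each check returns True or False.
ok_name = "sample_count".isidentifier()           # follows the naming rules
starts_with_digit = "2nd_sample".isidentifier()   # breaks rule 1
has_space = "gene name".isidentifier()            # breaks rule 4
reserved = keyword.iskeyword("for")               # rule 2: "for" is a keyword

print(ok_name)            # True
print(starts_with_digit)  # False
print(has_space)          # False
print(reserved)           # True, so "for" cannot be used as a variable name

# Rule 3: names are case-sensitive, so these are two different variables.
sample_count = 3
Sample_Count = 7
print(sample_count, Sample_Count)   # 3 7
```

You do not need these checks in everyday code; Python will report an error on its own if a name breaks the rules.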

In Python, we must assign a value to the variable when it is declared. **=** is the assignment operator. On the left of the operator is the variable we wish to assign the value to. On the right of the operator is the actual value we wish to assign. Try running the cells below.

<span style="background-color: #FFD700">**Run the 2 code cells below and see what happens.**</span>

In [None]:
# Declaring a variable without a value [X]
var                  # var is not defined

In [None]:
# Declaring a new variable
var = 10
# Declaring another variable; note the uppercase V
Var = 5
print(f"var: {var}, Var: {Var}")

Only the last cell ran without errors. Why?

The first cell failed with an error that says "name 'var' is not defined". This means that you tried to use a variable before specifying what it holds. While declaring a variable without a value may be possible in other programming languages, in Python you must **define** your variable by **assigning** it a value.
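
One way to look at this error without stopping the rest of the code is sketched below. It uses `try`/`except`, which these modules have not introduced yet, so treat it as a preview rather than something you need to write yourself. The name `colony_count` is invented for this example.

```python
# colony_count has not been assigned yet, so reading it raises a NameError.
try:
    print(colony_count)
except NameError as error:
    error_message = str(error)
    print(f"NameError: {error_message}")

# After assignment, the same name can be used without an error.
colony_count = 42
print(colony_count)   # 42
```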

Some data types are summarized below:

| Data type |      Example       |
| :---------| :------------------|
| int       | 0, -1, 99, 12345   |
| float     | 1.1, -0.9, 123.4   |
| string    | "hello", "h", "i"  |
| boolean   | False (0), True (1)|


Other data types (revisited later):
* list
* tuple
* set
* dict

### Int
int stands for **integer**: a whole number (positive, negative, or zero) with no decimal point.

### Float
Float represents fractional (decimal) values.
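
As a small illustration (the variable names here are invented for this example), the built-in `type()` function reports the data type of a value. Note that dividing with `/` gives a float even when both numbers are ints.

```python
cell_count = 12          # int: a whole number
od600 = 0.45             # float: a number with a decimal point

print(type(cell_count))  # <class 'int'>
print(type(od600))       # <class 'float'>

half = cell_count / 2    # "/" always produces a float
print(half)              # 6.0
print(type(half))        # <class 'float'>
```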

### String
A string is a sequence of characters. Characters can be alphanumerical (a-z and 0-9), symbols (@,#,$), spaces, etc. This datatype can represent basically any letter, word, or phrase you want. The string must be wrapped in quotation marks (single or double).
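
A brief sketch of strings in practice, using made-up values. The built-in `len()` function and joining strings with `+` are not covered in this notebook; they appear here only to show that a string is treated as text rather than as a number.

```python
dna = "ATGCGT"                 # double quotes
label = 'sample A, plate #3'   # single quotes work the same way

print(type(dna))               # <class 'str'>
print(len(dna))                # 6 characters
print(dna + "AA")              # ATGCGTAA (the two strings are joined)
print(label)                   # sample A, plate #3
```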

### Boolean
A boolean is one of two values: True or False. This datatype is used to evaluate conditions, which we will cover in the next module. If you ask a question such as "Is val1 equal to val2?", the answer can only be either True or False. We can assign this boolean value to a variable as well.
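
The `(0)` and `(1)` next to `False` and `True` in the data type table reflect that Python treats booleans as the integers 0 and 1 in arithmetic. A short sketch, with invented variable names:

```python
gc_count = 4
at_count = 2

is_gc_rich = gc_count > at_count   # a comparison produces a boolean
print(is_gc_rich)                  # True
print(type(is_gc_rich))            # <class 'bool'>

# Booleans behave like the integers 1 and 0 in arithmetic.
print(True + True)                 # 2
print(False == 0)                  # True
```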

In [None]:
is_true = True
is_false = False
val1 = 1
val2 = 2

print(f"Value of the boolean variable is_true: {is_true}")
print(f"Value of the boolean variable is_false: {is_false}")
print(f"Value of the integer variable val1: {val1}")
print(f"Value of the integer variable val2: {val2}")
print(f"Is val1 equal to val2? {val1 == val2}")


### print()
So far we have been printing values in the code cells to demonstrate how variables work, but we haven't explained how this is done. Printing to the output is possible using Python's built-in ```print()``` function. Printing output can be helpful for checking the state of our program (values of certain variables at a point in time) as we write our code, so that we can see results and spot errors in our code if we see something we don't expect.

To use this function, simply put the message you want to print as a string inside the parentheses.

In [None]:
print("Hello World!")

We can also print the values of variables:

In [None]:
greeting = "Hello World!"
print(greeting)

We can print strings and variables together in one line by separating them with commas.

In [None]:
print(greeting, "My name is Sophie!")

Sometimes you will see, in the code given to you, the following format:

In [None]:
print(f"{greeting} My name is Sophie!")

The ```f``` in front of the opening quotation mark indicates that the string is a formatted string (often called an "f-string"): any variable name written inside curly braces is replaced by that variable's value. It is just another way to print strings and variables together. Feel free to use whichever method you find easier. (See Further Reading for more on input/output formatting.)
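
A short sketch comparing the two printing styles described above. The variable names and values are invented for this example, and the `:.2f` format specifier is an extra feature of f-strings that this notebook does not otherwise cover.

```python
gene = "BRCA1"
expression = 2.34567

# Commas: print() puts a space between the items.
print("Gene:", gene, "expression:", expression)

# f-string: variables go inside curly braces within the string.
summary = f"Gene: {gene} expression: {expression}"
print(summary)

# A format specifier after a colon controls how the value is shown,
# here rounding to two decimal places.
rounded = f"{expression:.2f}"
print(rounded)   # 2.35
```

Both of the first two `print()` calls produce the same line of text; the f-string version is often easier to read when several variables are mixed into one sentence.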

Now it's your turn! Let's write your first line of code to execute. Below, you will write a string that says: "Hello World! My name is _____" with your own name in the space. You should assign the string to the variable ```message```, where it says ```# TODO: COMPLETE THIS LINE```. Be sure to write your code after the ```=``` sign but before the comment (better yet, you can remove the comment but we would advise you to keep it there until it's clear to you which part of the code you are supposed to complete).

<span style="background-color: #FFD700">**Complete the code in the cell below and run it**</span>

In [None]:
"""
Write a string that says:

Hello World! My name is _(name)_

Replace the space with your name
and assign the string to message.
"""
message = # TODO: COMPLETE THIS LINE
print(message)

### Cell execution order
Variables persist throughout the notebook. Recall that in an earlier example, we added variables x and y together and assigned the result to z. Even though x, y, and z were assigned in a previous cell, we can access and reassign (overwrite) them in subsequent cells. Then, the effect of that action will be reflected when you run/re-run a cell with that variable anywhere in the notebook. You can see the execution order by referencing the number in square brackets to the left of each cell (if there is no number, that cell hasn't been executed yet). Typically notebooks are intended to be executed top to bottom.

For example, run CELL 1 below to see what the value of z is currently. Then, if you reassign variable z in CELL 2 and re-run CELL 1 above it, you will see that z is now the new reassigned value.

In [None]:
# CELL 1
print(z)

In [None]:
# CELL 2
z = 20
z
# after running this cell, try running the cell above that says print(z). it should output 20 now instead of 15.

The takeaway: do not confuse the *visual order* of notebook cells with the *execution order*. CELL 1 comes before CELL 2 in the notebook, but by running it again you will see the execution number of CELL 1 is now higher than that of CELL 2.

To avoid confusion, you should strive to keep your notebook organized linearly so that execution follows visual order from top to bottom. You should also be wary of accidentally running the same cell repeatedly, as this could have undesired results (think about a code cell that adds 1 to a variable. If you ran this twice, you would actually be adding 2). You can go to Cell > Run All to run all cells in the notebook from top to bottom.
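
The repeated-run pitfall can be sketched in an ordinary script by writing the same statement twice, which has the same effect as running one cell two times. The variable name `generation` is invented for this example.

```python
generation = 0

# Imagine this line lives in its own cell:
generation = generation + 1
print(generation)   # 1 after the first run

# Running that same cell a second time executes the line again:
generation = generation + 1
print(generation)   # 2, even though the cell only "adds 1"
```

If you are ever unsure what state your variables are in, re-running the notebook from the top restores a predictable order.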

## Graded Exercise

**GQ: In the final cell below, fill in the blank with the final value of my_var, assuming all cells are executed in order.**

In [None]:
my_var = 10

my_var = (my_var + 5) / 2


In [None]:
my_var = 20

In [None]:
# print out the value of my_var
print(___)