# **Introduction to Python Basics**

Welcome to this introduction to Python for the Developmental Genetics Lab Course. If you have no prior experience with programming, do not worry; that is our expectation as we move through these modules. If you do have prior experience with coding, we're looking forward to seeing what creative solutions you come up with to address the problems we'll be going over.

So why are you learning Python at all? This is a Biology course, after all. For one, it sure beats having to use Excel for data analysis and making plots. More importantly, as massive biological datasets become more common, familiarity with computational tools and approaches is a vital skill for biologists to have.

Python in general offers a great way to make your life easier. Over the course of the next three modules, you'll be programming a tool that generates the nucleotide sequence of the reagents you'll need to add any tag you like to any gene you like. 

## Variables and Basic Data Types

### Variables

In order to manipulate data in Python, we'll need to have a way to track and modify that data. We do that by assigning pieces of data to variables. **A variable is essentially a name that you give to some piece of data in Python.** The name can be whatever you want, with some restrictions. It can't have any spaces, can't start with a number, and shouldn't be the same as one of Python's built-in variables or functions (don't worry about this just yet). Notably, **variables are case-sensitive**. Usually, you want to name the variable in a way that makes sense for the data assigned to that variable. For example, I could create a variable called `my_age`, to store my age.

The data that you store in a variable can be an integer (e.g. `3`), a float (e.g. `3.1` - a number with a decimal), a string of characters (`'Hello world'`, called a "string"), or any other data type Python can handle. We'll talk about some other examples of data types later.

We can assign our variables using the `=` operator. So I can set the value of `my_age` to be the integer `20` by doing the following:

`my_age = 20`

The spaces next to the `=` aren't actually necessary, and in general, Python doesn't care too much about spaces (assuming those spaces are not part of a string or at the beginning of a line). There are some exceptions, but don't worry about that right now.

Once we assign a variable, we can access the associated value by just typing the variable name instead. For all intents and purposes, the variable *IS* that value.

### Integers, Strings, and Floats

Integers, strings, and floats are the three most basic data types that Python can work with and they serve as the building blocks of other, more complex data types. We discussed them above, but we'll reiterate what each of the three are here.

**Integers**: Integers are counting (i.e. whole) numbers. They can be positive or negative. Notably, *they do not have a decimal point*. So in Python `3` is an integer, but `3.0` is not. 

**Floats**: Like integers, floats (or floating point numbers) are numbers. The distinction is that floats contain decimal points; they don't have to be whole numbers. Floats are just your everyday decimal numbers and are used as such. `3.0`, `3.1`, and `4.675` are all examples of floats. 

**Strings**: Strings are just sequences of characters. This can include spaces, tabs, line breaks and numbers; we're not limited to letters. Strings are case-sensitive. You define a string using quotation marks, either single `'this is a string'` or double `"this is also a string"`. Numbers that are enclosed in quotation marks are part of a string, so this `"4.0"` is a string, whereas this `4.0` is a float.

Let's create some variables that store our name, age, and our favorite approximation of pi. 

**Name**: Our name should be a string. Strings should be enclosed in quotes. Either double quotes `"like this"` or single quotes, `'like this'`.

**Age**: Our age should be an integer, so we just write the number without a decimal point.

**Pi**: Our approximation of pi should be a float, and should include a decimal point.


In [None]:
my_age = 20
my_name = "Johnathan"
my_pi = 3.14

To check the value of any of our variables, we can use the `print` function. To do so, we write `print(<value>)`. This is just a quick way of checking our variables and what they are. So if we want to check the value of the `my_name` variable we just assigned, we would run the following code:

In [None]:
print(my_name)



We can also modify the values of variables. Let's say we want to make ourselves 5 years older, change our name to a nickname, and make our approximation of pi more accurate. Then let's check that our changes worked.

In [None]:
my_age = 25
print(my_age)

my_name = 'John'
print(my_name)

my_pi = 3.1415
print(my_pi)

Note that there are multiple `print()` statements in this cell. Python runs the cell top to bottom, so the first thing shown will be from the first print statement (here the first is `my_age`, then `my_name`).

<font color='red'>**BIG NOTE:**</font> Google Colab, and Python notebooks in general, do not necessarily run top to bottom. Within a cell, code is run top to bottom, but across the whole notebook, values are stored in the order in which you run the cells. What does that mean? If you go back and run the cell above that simply says `print(my_name)` it should print out "John" instead of "Johnathan". The notebook remembers that you changed the value stored in that variable, and when you print it out (even _above_ where you assigned that variable), it will print whatever the last value assigned to that variable was.

### Basic Operators

Like any programming language, Python has built-in operators that let us manipulate our data. Some of the most basic ones are:<br/>
**Addition**: `+`<br/>
**Subtraction**: `-`<br/>
**Multiplication**: `*`<br/>
**Division**: `/`<br/>

These operators should work as expected. Let's look at some examples. Note which of these return integers as opposed to floats.

In [None]:
print(6+2)
print(6+3.4)
print(6-2)
print(6*2)
print(6/2)
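
Once you've made your own prediction, one way to check which results are integers and which are floats is the built-in `type()` function, which is covered in more detail later in this tutorial. This is only a small sketch, and the variable names are made up for illustration.

```python
# Store each result so we can inspect its data type
sum_of_ints = 6 + 2        # integer + integer
sum_with_float = 6 + 3.4   # integer + float
quotient = 6 / 2           # division with /

print(sum_of_ints, type(sum_of_ints))
print(sum_with_float, type(sum_with_float))
print(quotient, type(quotient))
```

In Python 3, `/` always gives back a float, even when the division comes out even, so `6 / 2` is `3.0` rather than `3`.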

Remember that in Python, we'll frequently be using values stored in variables as opposed to raw values. So let's see how we can use these operators with variables.

In [None]:
var1 = 4
var2 = 3

print(var1+var2)
print(var2/var1)

We can also store the output of using these operators to new variables, or even modify existing variables using these operators. Try to figure out what the below code will print before running it.

In [None]:
var3 = var1*var2
print(var3)

var3 = var3 + 2.7
print(var3)

Notice how we modified the value of `var3` above using the `+` operator. There's actually a shortcut to do the same thing using the `+=` operator. To modify the value of `var3` in the same way as we did above, we could have written: `var3 += 2.7`. The operator takes the value on the right and adds it to the variable on the left. The `-=`, `*=`, and `/=` operators all also exist.

We can also use the `+` or `+=` operators to combine strings (the technical word is "concatenate"); for example, if you combined "Hello " and "class" to make "Hello class", you'd be concatenating the strings. You can imagine this will be incredibly useful when we're manipulating nucleotide sequences.

Let's see how we could add our last name to our `my_name` variable.

In [None]:
my_name = my_name + ' Doe'
print(my_name)

Note that we have to include the space between the first and last name in the added string. If we don't include it, it won't end up in our final string.

In [None]:
my_sisters_name = 'Jane'
my_sisters_name += 'Doe'
print(my_sisters_name)

### Lists

A list is... well a list. It can contain variables of any type: one can have a string, followed by an integer, followed by a float... it doesn't matter. Lists are ordered, meaning the first variable you put in the list will be first; the second variable will be second. This allows you to search a list by position. 

You define a list using square brackets `[]` and values in the list are separated by commas.

Let's define a simple list and store it to a variable.

In [None]:
sample_list = ["my", "dog", "is", 1]

Here we have three string values and one integer value in our list. Cool. But how can we access the values inside our list?

We can access the items in our list by **indexing**. This means that we use a value's position inside the list to access it.

<font color='red'>**BIG NOTE:**</font> Python indexes from zero. If we want the value our eyes see first (i.e. "my"), that is actually the zero-th item.

To index the list, we just use the variable name of the list, followed by `[]` with the index we want. So for the "zero-th" item:

In [None]:
print(sample_list[0])

This is one of the times the distinction between integers and floats is important. Indexing lists requires that we use integers, specifically, because we're interested in the position inside the list.

Sometimes we need to know how long our list actually is. Here, we created the list ourselves by hand, so we know how long the list is, but if we use some automated process to generate a list from some other data, we might not know how long the list is. To find the length of a list, we use the `len()` function. You can either just query the length of the list and use `print()` to see it, or you can assign that length value to a variable to be used later.

The format is `len(<list to check>)`

In [None]:
print(len(sample_list))

In [None]:
len_sample_list = len(sample_list)

print(len_sample_list)
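
Because Python indexes from zero, the last item in a list sits at position `len(<list>) - 1`, not `len(<list>)`. Here is a small sketch of that idea; the list `bases` and the other variable names are made up for illustration.

```python
bases = ["A", "C", "G", "T"]

number_of_bases = len(bases)       # 4 items in the list
last_index = number_of_bases - 1   # valid positions are 0, 1, 2, 3

print(bases[last_index])           # T
```

Asking for `bases[number_of_bases]` would raise an `IndexError`, because position 4 does not exist in a four-item list.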

### Dictionaries

And one last data storage method: Dictionaries.

A dictionary essentially allows you to store pairs of data. Each pair has one value, called the "key" that you can use to "look up" the associated value, called (perhaps somewhat confusingly) the "value". A dictionary is just a set of key/value pairs.

<font color='red'>**NOTE:**</font> Keys cannot be repeated in a dictionary. This should make sense, because if you had different values associated with the same key, Python would not know which value to grab when "looking up" that key in the dictionary. 

So let's say you wanted to count and store the number of times each two-basepair-long DNA sequence occurred in some input DNA sequence. You could create a dictionary that had the 16 possible two-basepair-long sequences as keys with the corresponding counts as values.

A dictionary is defined using curly brackets `{}`, each key/value pair is written `<key> : <value>`, and pairs are separated by commas.

Let's say we want to encode a mapping of each base to its complementary base, which is the building block for computing a reverse complement. This will be useful when generating CRISPR constructs.

So if we wanted to encode 'A', 'C', 'G', and 'T' as 'T', 'G', 'C', and 'A' respectively, we could create the below dictionary.

In [None]:
encode_reverse = {
    'A' : 'T',
    'C' : 'G',
    'G' : 'C',
    'T' : 'A'
}
print(encode_reverse)

We noted above that you can "look up" the value associated with a given key in a dictionary, so how do we do that? The syntax is actually the same as indexing a list.

To look up a key in a dictionary, we just use the variable name of the dictionary, followed by `[]` with the key we want. So to find the reverse complement of `'C'` we could do the following.

In [None]:
print(encode_reverse['C'])

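<font color='red'>**NOTE:**</font> Dictionaries, unlike lists, are not indexed by position. There is no "zeroth" or "first" pair that you can grab by its position (Python 3.7 and later do remember the order in which pairs were added, but you still can't look up a pair by where it sits). If you're interested in the value assigned to a specific key, you simply look up that key.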
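
Dictionary lookups give back ordinary values, so they can be combined with the operators from earlier. As a rough sketch (the dictionary is redefined here so the cell runs on its own, and `complement_of_acg` is a made-up name), this builds the complement of the short sequence `ACG` one base at a time:

```python
encode_reverse = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}

# Look up each base of 'ACG' and concatenate the results
complement_of_acg = encode_reverse['A'] + encode_reverse['C'] + encode_reverse['G']
print(complement_of_acg)  # TGC
```

Looking up a key that is not in the dictionary, such as `encode_reverse['N']`, raises a `KeyError`.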

## Modifying variables

Above, we discussed different data types and how to assign different data to variables. We also *briefly* discussed how you could modify the value assigned to some variable. We'll discuss a few more ways of modifying variables here.

You can always just change the value assigned to a variable by assigning that variable a completely different value. So I can assign a variable called `var4` some value, and then completely change it. It doesn't even need to be the same data type.

In [None]:
var4 = 'This is the starting value'
print(var4)

var4 = 7.3
print(var4)

Here, we're a little bit more interested in modifying the values already assigned to variables in meaningful ways, rather than just completely changing them.

### Modifying Strings

We discussed above how we can concatenate strings using the `+` or `+=` operators.

Be careful using the `+=` operator, because it only concatenates strings onto the right end of the string currently assigned to the variable. If you want to concatenate a string to the left of another string, you'll have to use the `+` operator.

Take a look at the examples below, and see if you can figure out what each `print()` statement will print.

In [None]:
sample_string = 'My name is'
print(sample_string)

sample_string += ' John'
print(sample_string)

sample_string = 'Hello, ' + sample_string
print(sample_string)

Note that you can't modify a single position in a string. You can add to it as above, and we'll learn how to separate and join it. But you can't change letters within it. The following code block will give an error. 

In [None]:
sample_string[0] = 'A'

### Modifying Integers and Floats

As discussed above, we can use the `+=`, `-=`, `*=`, and `/=` operators to modify numerical values in Python. As above, see if you can figure out what each `print()` statement will print. Note that some operators return a float (number with a decimal) rather than an integer (no decimal). 


In [None]:
sample_number = 7
print(sample_number)

sample_number += 2
print(sample_number)

sample_number *= 3
print(sample_number)

sample_number /= 9
print(sample_number)

sample_number -= 2
print(sample_number)

### Modifying Lists

Because lists are ordered sequences of values, there are actually a few different ways we can modify them. 

Firstly, we can "append" or add new values onto the end of existing lists. We can do this with the `append()` method.

What's a method? Methods are essentially functions, like `print()` or `len()`, but they only work on specific data types. `append()` is a **list** method. When using methods, the syntax is as follows: `<variable_name>.method(<argument>)`.

So for the `append()` method, the structure would be as follows: `<list_variable>.append(<thing_to_append>)`

This is pretty confusing, so let's take a look at an example below.

In [None]:
sample_list = [1, 2.0, '3']
print(sample_list)

sample_list.append('four')
print(sample_list)

[1, 2.0, '3']
[1, 2.0, '3', 'four']


The value `'four'` is appended to the end of our `sample_list`, changing it from a length 3 list to a length 4 list. 

Notice that we don't actually have to reassign the variable name (i.e. we don't have to do `sample_list = sample_list.append('four')`). This is because, like the `+=` operator, this method modifies the list *in place*.

As with strings, we can actually concatenate multiple lists together (also useful if you want to append multiple values to a list). To do so, we use either the `+` operator or the `+=` operator, as with strings. Check out an example below.

In [None]:
sample_list += [4, 5.0, '6', 'seven']
print(sample_list)
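
It is worth seeing how `append()` and `+=` differ when the thing being added is itself a list. The sketch below uses two made-up lists, `appended` and `extended`, to show the difference.

```python
appended = [1, 2]
appended.append([3, 4])   # the whole list [3, 4] becomes ONE new element
print(appended)           # [1, 2, [3, 4]]
print(len(appended))      # 3

extended = [1, 2]
extended += [3, 4]        # each value is added as its own element
print(extended)           # [1, 2, 3, 4]
print(len(extended))      # 4
```

So if you want to add several values at once, `+=` with a list is usually what you want; `append()` is for adding exactly one item.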

#### **Modifying list elements**

In addition to appending additional values onto the end of lists, we can also modify the values of specific elements within a list.

Recall how we used square brackets `[]` to index a specific position within the list. For example, to print the element at index `3` in our list (the fourth element we see, since Python indexes from zero), we could do the following.

In [None]:
print(sample_list[3])

We can actually use the same syntax to modify specific elements in our list. Let's say we wanted to add `13` to the value stored in position `0` in our list. We could do so using the following syntax.

In [None]:
sample_list[0] += 13
print(sample_list)

What if instead of adding 13 to `sample_list[0]`, we just want to make that zeroth position in sample_list the number 13? 

In [None]:
sample_list[0] = 13
print(sample_list)

Each element in a list can be modified in any way that its own data type allows. So we can modify the strings in our list as well.

In [None]:
sample_list[3] += ' is the best number'
print(sample_list)

Note that we can also simply just replace the elements with new values.

In [None]:
sample_list[3] = 4
print(sample_list)

### Modifying Dictionaries

Adding new key/value pairs to a dictionary and modifying existing key/value pairs actually uses the same syntax, and it should be familiar.

Recall the way that we grab the value associated with some key in a dictionary. `<dict_name>[<key>]`.

For example, when we wanted to check the reverse complement of "C" in our `encode_reverse` dictionary, we printed `encode_reverse['C']`

We use this same syntax to modify and add key/value pairs in dictionaries. If we want to add/modify a key/value pair in a dictionary, we use the following syntax: `<dict_name>[<key>] = <value>`. **If the key you used is already in the dictionary, the associated value will be changed. If not, it will add this key/value pair to the dictionary.**

Let's take a look at an example below.


In [None]:
sample_dict = {'AT' : 3,
               'GC' : 2}
print(sample_dict)


sample_dict['AG'] = 4
print(sample_dict)

sample_dict['AT'] = 7
print(sample_dict)
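
Since a dictionary lookup can sit on the left side of `+=`, a dictionary works well as a tally, like the two-basepair counts described earlier. The sketch below is deliberately done by hand (only three of the 16 possible pairs are included, and `dinucleotide_counts` is a made-up name); later modules can automate this kind of counting.

```python
# Start every count we care about at zero (only three of the 16 pairs shown)
dinucleotide_counts = {'AT': 0, 'GC': 0, 'TA': 0}

# Reading the sequence 'ATAT' by eye, it contains AT, then TA, then AT
dinucleotide_counts['AT'] += 1
dinucleotide_counts['TA'] += 1
dinucleotide_counts['AT'] += 1

print(dinucleotide_counts)  # {'AT': 2, 'GC': 0, 'TA': 1}
```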

## Subsetting Strings and Lists

Recall that we were able to grab specific elements from lists by using square brackets `[]` and using the index of the element we were interested in.

We can actually do the same thing with strings. If I wanted to grab the character at index `4` of some string (so the fifth character we see), I could do the following:

In [None]:
sample_string = 'Hello there'

print(sample_string[4])

So we can use this indexing to grab single elements of lists or single characters of strings, but we can actually use a slightly more complex syntax to grab *multiple* elements in a list, or multiple characters from a string.

If we want to grab a list's elements or a string's characters between some start and end positions, we can use the following syntax: `<list_or_string>[<start_pos>:<end_pos>]`

<font color='red'>**NOTE:**</font> Subsetting in Python is "half open", meaning that the element or character at the end position is not included. When we subset in this way, we grab the elements from the start position up to *but not including* the end position.

Let's take a look at some examples below. Try to figure out what will be printed before running the cell.

In [None]:
sample_string = 'Hello there'

print(sample_string[1:4])
print(sample_string[3:7])

In [None]:
sample_list = [3,1,4,1,5,9,2,6,5]

print(sample_list[0:3])
print(sample_list[3:7])
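
One handy consequence of half-open subsetting is that the number of items you get back is simply `end - start`, and two subsets that share a boundary never overlap. A small sketch, using a made-up nine-base sequence:

```python
sequence = 'ATGGCCTAA'

first_codon = sequence[0:3]    # positions 0, 1, 2
second_codon = sequence[3:6]   # positions 3, 4, 5

print(first_codon, len(first_codon))   # ATG 3
print(second_codon)                    # GCC

# Gluing the two pieces back together gives the first six bases
print(first_codon + second_codon)      # ATGGCC
```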

There are some shortcuts built into the syntax for this subsetting operation that may make your life a little easier.

If you don't include a start position before the colon `:`, Python will assume you just want to start at the beginning of whatever you're subsetting (i.e. index `0`). So if you wanted to grab the first five characters of some string, you could use the following: `<my_string>[:5]`

Similarly, if you don't include an end position *after* the colon `:`, Python will assume you want to grab everything after (and including) the start position. Essentially, this just sets the end position to be the length of whatever you're subsetting. So if you wanted to grab everything *except* the first five characters of some string, you could use the following: `<my_string>[5:]`

Let's take a look at some examples.

In [None]:
sample_list = [3,1,4,1,5,9,2,6,5]

print(sample_list[:4])
print(sample_list[4:])
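
These shortcuts also give a workaround for the fact that a single position in a string can't be changed directly: subset the pieces you want to keep and concatenate them with the new character. This builds a *new* string rather than editing the old one. The names `new_string` and `patched_string` are made up for this sketch.

```python
sample_string = 'Hello there'

# New first character + everything from index 1 onward
new_string = 'J' + sample_string[1:]
print(new_string)       # Jello there

# Replace the character at index 4 by gluing together the pieces on either side
patched_string = sample_string[:4] + '0' + sample_string[5:]
print(patched_string)   # Hell0 there

print(sample_string)    # Hello there (the original is unchanged)
```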

#### **Step Size**

When we're subsetting lists or strings, we can also choose a "step size" at which we subset.

Recall that when we're subsetting a list or a string, we choose a start position and an end position, and then we can grab all of the elements or characters between those positions.

But Python also allows us to grab every *other* (or every *third*, etc.) element or character between these two positions. We do this using the following syntax: `<list_or_string>[<start_pos>:<end_pos>:<step_size>]`

This should look very similar to the base syntax used for subsetting. The difference is that we've included an additional colon `:` within the square brackets `[]` and our desired step size. The start position and end position work the same as before.

Let's take a look at an example. Try to figure out what the cells will print before running them.

In [None]:
sample_string = 'schooled'

print(sample_string[0:8:2])

In [None]:
sample_string = 'schooled'

print(sample_string[1:8:2])

We can also use the same shortcuts we discussed earlier in regard to the start position and the end position.

<font color='red'>**NOTE:**</font> If your square brackets `[]` contain one colon `:`, Python will assume you are giving it the start and end positions (in that order), and will use the default step size of `1`. If your square brackets `[]` contain two colons `:`, Python will assume you are giving it the start and end positions and the step size (in that order).

Check out some examples below, as always, try to predict what will be printed before running the cells.

In [None]:
sample_string = 'calliope'

print(sample_string[::2])
print(sample_string[1::2])
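
Step sizes come in handy with coding sequences, where every third base occupies the same position within a codon. The sketch below uses a made-up nine-base sequence and pulls out the first and third codon positions.

```python
coding_sequence = 'ATGGCCTAA'   # three codons: ATG, GCC, TAA

first_positions = coding_sequence[0::3]   # indices 0, 3, 6
third_positions = coding_sequence[2::3]   # indices 2, 5, 8

print(first_positions)   # AGT
print(third_positions)   # GCA
```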

## Commenting your Code

Not everything you write in your code needs to be actual _code_. In Python, we can add comments to our code that might help other people understand our code, or even help us keep track of what our code is doing at some point.

Comments are not interpreted by Python as actual code and will be ignored when running your code.

There are two ways to add comments to code.

Firstly, you can add a single-line comment to code using the hash `#` character. On any given line, anything after a `#` will be treated as a comment and will be ignored by Python.

In [None]:
# This is a demonstration of how comments work
print('Hello')

In [None]:
print('Hello') # This is another demonstration of how comments work

In [None]:
my_dict = {'Name': 'John', # I can even do this
           'Pet' : 'Dog'}

print(my_dict['Name'])

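You can also add multi-line comments using triple quotes `"""` or `'''`. Strictly speaking, text between one set of triple quotes and another is a string that is never assigned to a variable, so Python evaluates it and moves on; in practice it behaves like a comment. One caveat in notebooks: if that string is the very last thing in a cell, the notebook will display it as the cell's output.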

In [None]:
"""This is a
multi-line
comment"""
print('Hello')

<a name="built-ins"></a>
## Built-in Python Functions (and Methods)

Functions in Python, like functions in math, take inputs, and transform them into outputs. Consider the following:

$f(x,y)=x+y$

Here, $f$ is a function that takes two inputs or <font color='green'>**arguments**</font>, and adds them together to produce an output.

In Python, functions take some argument or arguments and *do something* with them. 

![picture](https://drive.google.com/uc?export=view&id=1hGX0Jv-zGTvR0wt0b1Wc1__t3pMyxSgY)

Many functions <font color='green'>**return**</font> some output (like the function above), but many simply use those inputs to perform some operation, rather than returning some specific output.

If a function returns an output, we should be able to assign that output to a variable. If it does not return an output, trying to assign the output of the function to a variable will result in the variable being assigned a `None` value.

We'll discuss how to build our own functions later, but for now, let's go over some functions that Python already has *built-in*. We'll take a look at some functions that return outputs and some that don't.

### print()

The `print()` function is one we've already been using a ton, but let's look into how it works a little deeper.

At its core, the `print()` function takes whatever argument you pass it and prints it to the screen, so that the user can see it. This works for all of the different data types we've learned. 

In [None]:
sample_string = 'Hello there'

print(sample_string)

But does the `print()` function actually return an output? Let's try assigning the output of the `print()` function to a variable.

In [None]:
sample_string = 'Hello there'

print_output = print(sample_string) # you'll notice that this still prints "Hello there" to the screen, because we're still calling the function.

print(print_output) # This is the line that will tell us what the value of the print_output variable is.

So the `print()` function does NOT return an output; the variable ends up holding `None`. Let's take a look at some more functionality.

The `print()` function can actually accept multiple arguments. If we call the `print()` function with multiple arguments, separated by commas, it will print those arguments to the screen, separated by spaces. Let's take a look at an example.

In [None]:
name = 'John Doe'

print('Name:', name)

### append()

We previously went over the `append()` method. This adds elements to lists.

Because this is a method, it is specific to a certain data type, and is called using the syntax `<variable_name>.method(<argument>)`.

`append()` takes the single argument passed to it and adds it to the end of the list it is called on. Let's go over some examples.

In [None]:
sample_list = [1,2,3]

sample_list.append(4)
print(sample_list)

Does the `append()` method return an output? Test it out in the code cell below.

In [None]:
sample_list = [1,2,3]

print(sample_list.append(4))
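
A common slip follows from this behavior: writing `<my_list> = <my_list>.append(<value>)` replaces your list with `None`. The sketch below keeps the returned value in a separate, made-up variable called `returned_value` so that both results can be inspected.

```python
sample_list = [1, 2, 3]

# append() changes sample_list in place and returns None
returned_value = sample_list.append(4)

print(sample_list)      # [1, 2, 3, 4]
print(returned_value)   # None
```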

### len()

Another function we've seen already, the `len()` function returns the length of the data structure passed to it. It works on pretty much any data structure, including strings, lists, and dictionaries.

Check out some examples in the code cell below. Try to guess the output before running the code cell.

In [None]:
sample_string = 'Hello world'
print(len(sample_string))

sample_list = [1,'two', 3.0]
print(len(sample_list))

sample_dictionary = {'Name': 'John',
                    'Pet' : 'Dog'}
print(len(sample_dictionary))

### type()

The `type()` function is useful whenever you're unsure what kind of data a variable holds, so let's look at what it does. 

The `type()` function checks the argument you pass it and returns an object that describes the data type of the passed argument. The type function actually returns a data type that isn't a string, but you can print the output of calling the `type()` function and it should be interpretable.

The `type()` function is great for checking the data type of some variable, especially as your program gets more complicated.

Let's check some examples below.

In [None]:
sample_str = 'Hi!'
sample_dict = {'Name' : 'Bob',
               'Age' : 32}

print(type(sample_str))
print(type(sample_dict))

In [None]:
# If you're interested in seeing what data type the `type()` function returns, you can run this cell

type_output = type('This is a test')
print(type(type_output))
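
`type()` is also a quick way to tell apart values that look alike when printed, such as a number and the same number inside quotation marks. A small sketch with made-up variable names:

```python
whole_number = 4
decimal_number = 4.0
number_in_quotes = '4.0'

print(type(whole_number))       # <class 'int'>
print(type(decimal_number))     # <class 'float'>
print(type(number_in_quotes))   # <class 'str'>
```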

### upper() and lower()

`upper()` and `lower()` are both string methods that take an input string and *return* the same string, except all characters are either upper case or lower case.

The syntax here is a little bit different from the methods we've seen previously. For a method like `append()`, the argument in the parentheses `()` was added to the list the method was called on.

Here, `upper()` and `lower()` don't take arguments within the parentheses `()`. Instead, the string that the method is called on acts as a kind of input, and the method *returns* an output string.

This might be a little bit confusing, so let's take a look at an example below.

In [None]:
starter_string = 'Hello there'

uppercase_string = starter_string.upper()
print(uppercase_string)

To recap, we called the `upper()` method on the string assigned to the `starter_string` variable. We then assigned the output of this method to the `uppercase_string` variable.

These methods will be incredibly useful with the type of gene annotations we'll be working with.
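
As one illustration, sequence files sometimes mix upper- and lower-case bases. The sketch below uses a made-up mixed-case sequence to show both methods, and to show that the original string is left untouched because each method returns a new string.

```python
mixed_sequence = 'ATGcctgaATG'

uppercase_sequence = mixed_sequence.upper()
lowercase_sequence = mixed_sequence.lower()

print(uppercase_sequence)   # ATGCCTGAATG
print(lowercase_sequence)   # atgcctgaatg
print(mixed_sequence)       # ATGcctgaATG (unchanged)
```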

### split() and join()

`split()` and `join()` are both string methods with somewhat opposite functionality.

`split()` is called on a string, and returns a list that is created by *splitting* that string at whatever character or characters you choose. The syntax is as follows: `<str_to_split>.split(<delimiter>)`

Let's take a look at an example using space ` ` as a delimiter.

In [None]:
sample_string = 'Hello my name is John'

output_list = sample_string.split(" ")
print(output_list)

So here, we split our `sample_string` variable at each of the spaces, and stored those entries in a list, which we then assigned to the variable `output_list`. Notice that when doing this, the spaces were omitted from the output list. We can also use multicharacter delimiters.

In [None]:
sample_string = 'TCA, AGT, TCC, TCG, TCT, AGC'

output_list = sample_string.split(', ') # This is a two-character delimiter (comma followed by a space)
print(output_list)

You can split your string using nearly any delimiter. Here we'll split our string into a new list object any time we have the letter e. 

In [None]:
sample_string = 'Genomics graduate students like computers'

output_list = sample_string.split("e")
print(output_list)

`join()` on the other hand, is called on a string, and returns a string that is created by joining a list *with* the string the method is called on. To simplify, the string that you call `join()` on will be placed between each of the elements of the list you are joining. The syntax is as follows: `<delimiter>.join(<list_to_join>)`

<font color='red'>**NOTE:**</font> You are welcome to use an empty string `""` with `join()` if you simply want to concatenate the elements of your list together.

<font color='red'>**ANOTHER NOTE:**</font> The list that you are attempting to join can *only* contain string elements. This method will fail if any of the list elements are not strings.

Let's take a look at an example below.

In [None]:
starter_list = ["chr1", "100000", "110000", "0.00156"]

output_string = "  ".join(starter_list)
print(output_string)

So, we called `join()` on the string `"  "`, and passed our `starter_list` variable as the argument. This will join the elements of `starter_list` with `"  "` connecting neighboring elements.
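
Because `split()` and `join()` undo each other when given the same delimiter, a line of text can be taken apart and rebuilt. The sketch below assumes a made-up tab-separated line (`\t` is how a tab character is written inside a string) and uses `==`, which checks whether two values are equal and is not otherwise covered in this section.

```python
tab_separated_line = 'chr1\t100000\t110000'

fields = tab_separated_line.split('\t')
print(fields)           # ['chr1', '100000', '110000']

rebuilt_line = '\t'.join(fields)
print(rebuilt_line == tab_separated_line)   # True
```

Note that `fields[1]` is the string `'100000'`, not the integer `100000`; `split()` always gives back a list of strings.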