# (Slightly more) Advanced Python

Now that we've gone over variables and some of the basic data types available in Python (integers, floats, strings, lists, and dictionaries), as well as some of the basic things we can do with these data types, we'd like to discuss some more advanced operations you can do in Python.

## For Loops

Frequently, it is useful to walk through an entire set of data and perform the same (or similar) actions on that data. That's a fairly broad statement, so let's provide some context.

Imagine you have a list of genes, as DNA sequences, and you want to convert all of those DNA sequences into amino acid sequences. You need to be able to move through your list and "translate" each DNA sequence string into an amino acid string.

Furthermore, to be able to translate your DNA sequences, you need to be able to move through each of those sequences, converting each three-character-long codon into an amino acid.

We need to be able to step or **loop** through our data, and perform the same operation at each step.

For this purpose we can use something called a `for` loop.

Let's start simple. Say we have the following list:

In [None]:
my_list = [1,2,3,4,5,6]

We want to run through this list and print each element of the list. We can print the whole list at once by simply running this command `print(my_list)`, but here we want to print each element individually. To do so we use the following structure:

In [None]:
for x in my_list:
    print(x)

There are a few things to note here.

The `x` is a temporary variable. Each time you iterate through the list (i.e., each time you "move" from one element of the list to the next one), `x` gets assigned to the next element in the list. As with other variables, you can (mostly) name this variable whatever you want. So the below code cell will print exactly the same thing as the loop above.

In [None]:
for number in my_list:
    print(number)


The structure of the for loop is:

```
for <temporary_variable> in <thing_to_loop_through>:
    {do something}
```

The instructions for what to do on each iteration come after the first line and are indented by a single `tab` (or four spaces).<br/><br/>

Let's break this down a little bit more.

In our example above, the `for` loop starts at the beginning of the list, setting `x` to the zeroth element of the list (in this case, the number `1`). Then it runs the instructions we gave it. Here, that is `print(x)`, which prints `1` because the temporary variable `x` is currently assigned the value `1`. Because our instructions are now over, it "moves" to the next element in the list, setting `x` to `2`, and so on. It continues until it reaches the end of the list.
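
One way to picture this is to write out by hand what the loop does on each pass. The sketch below is only an illustration of the idea, not how Python actually runs a loop internally; the shortened list `short_list` is made up for this example.

```python
short_list = [1, 2, 3]

# What the for loop does
for x in short_list:
    print(x)

# The same steps written out by hand, one element at a time
x = short_list[0]
print(x)
x = short_list[1]
print(x)
x = short_list[2]
print(x)
```

Both halves print the same three numbers. The `for` loop simply saves us from writing the repeated lines ourselves, which matters a lot once a list has hundreds of elements.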

What if we wanted to print each element of the list multiplied by 2?

In [None]:
for x in my_list:
    print(2*x)

Once we un-indent, we're no longer in the `for` loop. What should the below print?

In [None]:
for x in my_list:
    print(x)

print('Hello')

<font color='red'>**NOTE:**</font> The `for` loop does not necessarily modify the thing that we are looping through. Even if we do `print(2*x)`, the values in `my_list` remain unchanged. You **can** use a for loop to modify something, but we won't do that for this class.

In [None]:
print(my_list) # note that it's still the same list as before, even though we've put it through a couple of `for` loops.

`for` loops also work with strings and dictionaries.

In the case of strings, you loop through each character in the string. Take a look at the example below. (Recall that the temporary variable can be anything; it doesn't have to be "x".)

In [None]:
my_string = 'ATGGCA'

for char in my_string:
    print(char)

You'll notice that for both lists and strings, the `for` loop moves through sequentially, from beginning to end.

In older versions of Python, dictionaries were not ordered, and looping through them could happen in an arbitrary order. Since Python 3.7, dictionaries remember the order in which their keys were added, and looping through them follows that order. Either way, each element in the dictionary will only be "visited" once. When looping through dictionaries, the temporary variable is assigned to the **keys** in the dictionary. We can then look up the assigned key in the dictionary to grab the associated value.

In [1]:
my_dict = {'ATGACAG' : 4,
           'GGACATG' : 2,
           'GAATACT' : 1}

for sequence in my_dict:
    print(sequence)

ATGACAG
GGACATG
GAATACT


In [2]:
my_dict = {'ATGACAG' : 4,
           'GGACATG' : 2,
           'GAATACT' : 1}

for sequence in my_dict:
    sequence_value = my_dict[sequence]
    print(sequence, sequence_value)

ATGACAG 4
GGACATG 2
GAATACT 1


Alternatively, we can use the `.items()` method on our dictionary to loop through the keys and values at the same time. Here, we pass the for loop two temporary variables, `sequence` and `sequence_value`. Within the loop, we can use either or both of those temporary variables.

The `items()` method grabs the key-value pairs within the dictionary. Our two temporary variables apply names to the key and the value, so we can do operations on them. Here's an example where we just print both the key and the value:

In [4]:
my_dict = {'ATGACAG' : 4,
           'GGACATG' : 2,
           'GAATACT' : 1}

for sequence, sequence_value in my_dict.items():
    print(sequence, sequence_value)

ATGACAG 4
GGACATG 2
GAATACT 1


Here's the same idea, but we'll treat the two variables differently; we'll print the value plus two: 

In [6]:
for sequence, sequence_value in my_dict.items():
    print(sequence)
    print(sequence_value + 2)

ATGACAG
6
GGACATG
4
GAATACT
3


So for the first key-value pair in the dictionary, it prints its sequence and then the value plus two; then it moves on to the next key-value pair. 

### The range() function

The `range()` function returns something like a list of sequential integers. The basic syntax is `range(start, stop)`. Like with indexing lists, the stop is *not* included in the sequence.

Also like indexing lists, we can add a third argument when calling this function that defines the *step size* at which the sequential integers are generated. In this case, the syntax is `range(start, stop, step)`

<font color='red'>**NOTE:**</font> `range()` only works with integers. `start`, `stop`, and `step` (if included) must all be integers.
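
Because `range()` returns "something like" a list rather than an actual list, printing it directly doesn't show the individual values. One quick way to peek at them is to convert the range with the built-in `list()` function. The variable names below are just for illustration.

```python
first_range = range(2, 6)
print(first_range)          # prints range(2, 6)
print(list(first_range))    # prints [2, 3, 4, 5]; the stop (6) is not included

stepped_range = range(0, 10, 3)
print(list(stepped_range))  # prints [0, 3, 6, 9]
```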

In [None]:
# We have a series of numbers from 1 through 10 (not including 10) 
# We want to print every other number (step in groups of 2)

sample_range = range(1, 10, 2)
for value in sample_range:
    # What should this show us?
    print(value)
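
Returning to the motivating example of stepping through a DNA sequence one codon at a time, `range()` with a step size of 3 can give us the starting position of each codon. This is a minimal sketch: the sequence `dna` and the list `codons` are invented for illustration, and the code assumes the sequence length is a multiple of 3.

```python
dna = 'ATGGCATAA'

codons = []  # we'll collect each three-character codon here
for start in range(0, len(dna), 3):
    codon = dna[start:start + 3]  # slice out the three characters beginning at `start`
    codons.append(codon)
    print(codon)
```

Here `start` takes the values 0, 3, and 6, so the loop prints `ATG`, `GCA`, and `TAA`. Actually converting each codon into an amino acid would need a lookup table, which is not shown here.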

## Conditionals

Sometimes, we want to check the value of a variable and act differently based on whether a specific condition is met. Our code would carry out one set of instructions if the variable meets a certain condition, and another set of instructions if it doesn't (or meets another condition).

We can do so using the `if`, `else` structure.

```
if {some conditional}:
    {do something}
else:
    {do something else}
```

Like with the `for` loop, whatever code we want to run if a given conditional is met is indented (using the `tab` key, or four spaces) under that condition.

There are tons of conditions we can check, but the basic ones are:

`x < y`  is x less than y<br>
`x <= y` is x less than or equal to y<br>
`x > y`  is x greater than y<br>
`x >= y` is x greater than or equal to y<br>
`x == y` is x equal to y (note that this is the mathematical operation for checking whether two variables are equal; it is different from `=` which is used for variable assignment)<br>
`x != y` is x NOT equal to y

We can also check whether a certain element appears in a list or set, whether a certain key appears in a dictionary, or whether a certain substring is present in a string. We do so using the following conditional: `x in y`

We first check a condition using `if`. For example, we could check whether our variable is smaller than 20, and print something if it is.

In [None]:
my_var = 12

if my_var < 20:
    print("This is a small number")

As it is now, if our variable is greater than or equal to 20, our code doesn't do anything and will just move on. If we want to do something else if this is the case, we can use an `else` statement.

In [None]:
my_var = 12

if my_var < 20:
    print("This is a small number")
else:
    print("This is a big number")
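
The `x in y` check described earlier can be used in an `if` statement in exactly the same way as the comparisons. The example below is a small sketch; the list `gene_list` and the string `sequence` are made up for illustration.

```python
gene_list = ['BRCA1', 'TP53', 'EGFR']
sequence = 'ATGGCA'

# Is a certain element in a list?
if 'TP53' in gene_list:
    print("TP53 is in the list")
else:
    print("TP53 is not in the list")

# Is a certain substring present in a string?
if 'TTT' in sequence:
    print("Found TTT")
else:
    print("No TTT in this sequence")
```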

Let's say we're looping through a list using the `range()` function. We want to print each value of the list *unless* it's the first element, in which case we want to print that value multiplied by 2. Try to figure out what the below code will print before running it.

In [None]:
my_list = [5,4,3,2,1]

for i in range(len(my_list)):
    # `i` is the position in the list; `my_list[i]` is the value at that position
    if i == 0:
        print(2*my_list[i])
    else:
        print(my_list[i])

### Booleans

When we're checking conditionals, we're actually evaluating another data type: **booleans**. 

Very basically, booleans can have one of two values: `True` or `False`. Let's check what happens if we *print out* one of our conditions:

In [None]:
print(2 < 4)

When we're checking conditionals, we're checking whether the expression after the `if` evaluates to `True` or `False`.

One thing we can also do is reverse a boolean value by writing the word `not` before the value.

In [None]:
print(not (2 < 4))

We can also check multiple booleans using the `and` and `or` operators.

When using `and`, if **both** of the two expressions on either side of the `and` evaluate to `True`, the whole expression will evaluate to `True`. If only one evaluates to `True` while the other is `False` (or, intuitively, if both are `False`), the whole expression will evaluate to `False`.

In [None]:
print((2<4) and (3>4))

In [None]:
print((2<4) and (3<4))

When using `or`, if **either** or both of the two expressions on either side of the `or` evaluate to `True`, the whole expression will evaluate to `True`.

In [8]:
print((2 < 4) or (3 > 4))

True


In [None]:
print((2 < 4) or (3 < 4))
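
Since an `if` statement just checks whether an expression evaluates to `True` or `False`, expressions built with `and` and `or` can be used as conditions too. A short sketch, reusing the `my_var` name from the earlier examples (the messages are invented):

```python
my_var = 12

# Both comparisons must be True for the first message to print
if (my_var > 10) and (my_var < 20):
    print("Between 10 and 20")
else:
    print("Outside that range")

# Only one of the comparisons needs to be True here
if (my_var < 0) or (my_var > 100):
    print("Unusual value")
else:
    print("Ordinary value")
```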

## Building Functions

The last topic we need to discuss is functions. We've seen several examples of built-in functions already (like `split()` or `upper()`), but now we're going to learn how to make our own.

At its most basic, a function takes some number of inputs and *returns* some output or outputs (typically based on those inputs). As discussed above, when we say a function **returns** an output, we mean that we can store the output of the function to a variable. Some functions do not return an output, but instead do some operation (for example, the `print()` function, as we saw, does not return an output; assigning the output of `print(my_var)` to a variable will result in a variable with the value of `None`).

See the [Built-in Python Functions](#built-ins) section for more information. 
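
To make the difference between returning and not returning concrete, the short sketch below stores the result of `upper()` and the result of `print()` in variables. The variable names are chosen just for this example.

```python
my_var = 'atggca'

upper_result = my_var.upper()  # upper() returns a new string, which we can store
print(upper_result)

print_result = print(my_var)   # print() displays my_var but does not return an output
print(print_result)            # prints None
```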

A function has the following structure:

```
def my_function(<arg1>, <arg2>, ...):
    {do something}
    return <output1>, <output2>, ...
```

First, we name our function. Above, we named it `my_function`. Then, we decide on the inputs (called <font color='green'>**arguments**</font>) we want, and we put those in parentheses, separated by commas.

The body of our function contains information (or instructions) on what we want the function to do; this often involves manipulating the inputs in some way.

At the end of the body of our function, we have a `return` statement that will "output" what we tell it to. This output can be stored as a variable, used in a conditional, added to a list, etc.

Let's write a simple function that takes two numbers (`x` and `y`) and adds them together.

In [None]:
def our_add_function(x, y):
    added_value = x + y
    return added_value

Now we <font color='green'>**call**</font> our function. To call a function simply means to run it; we do so by writing the name of the function followed by parentheses, with the arguments inside the parentheses. Let's call `our_add_function()` on two numbers and store that output in a variable we call `added_number`.

In [None]:
added_number = our_add_function(3, 5.5)

print(added_number)

Let's explore a little more deeply what's happening here. Like with `for` loops, the argument names chosen when defining your function are *temporary* variable names. When defining our function, we named our arguments `x` and `y`, and it is these variable names we use within the body of our function.

When we call our function, the arguments we pass the function are stored in those temporary variables. For the above example, when we call `our_add_function(3, 5.5)`, the `x` variable in our function is set to `3`, and the `y` variable is set to `5.5`. 

We can also write a function that returns two (or more) outputs. When we do so, the output is stored in a data type called a **tuple**, which is similar to a list. Like a list, we can index a tuple to grab an element at a specific position.

Let's write a function that takes two numbers (`x` and `y`) as inputs and returns their sum, and also returns their product.

In [None]:
def our_double_function(x, y):
    return x+y, x*y

As before, let's call our function on two numbers and store the output in a variable.

In [None]:
results = our_double_function(2, 3)

print(results[0])
print(results[1])
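
If you are curious what the stored output looks like as a whole, you can print it and check its type. This sketch repeats the function definition so that it runs on its own.

```python
def our_double_function(x, y):
    return x+y, x*y

results = our_double_function(2, 3)

print(results)        # prints (5, 6); tuples are shown with parentheses
print(type(results))  # prints <class 'tuple'>
print(len(results))   # prints 2, one element per returned output
```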

Python is actually smart enough to store both outputs to variable names at the same time, if you want to.

In [None]:
our_sum, our_product = our_double_function(2, 3)

print(our_sum)
print(our_product)

# Bonus Materials 


## Sets

Sets are the last data type we'll discuss in this notebook. Sets have similarities with both lists and dictionaries, and are an incredibly powerful tool in Python.

Sets are like lists in that they comprise a group of individual elements. Like in lists, these elements can be different data types: integers, floats, or strings.

<font color='red'>**NOTE:**</font> Unlike lists, sets cannot contain other sets, lists, or dictionaries. The reason why is far beyond the scope of this tutorial. For now, just try to remember that sets can contain the three most basic data types (integers, floats, and strings) and none of the more "complex" data types (lists, dictionaries, sets) that are built on the more basic data types.

Sets are like dictionaries in that:
1.   Elements cannot be repeated within a set, just as keys cannot be repeated within a dictionary
2.   Elements cannot be looked up by position. Sets are not ordered, so there is no "zeroth" or "first" element in a set

A set is defined using curly brackets `{}` surrounding the elements, and elements are separated with commas `,`. Let's take a look at an example below.

In [None]:
sample_set = {1,2,3}

<font color='red'>**NOTE:**</font> If you want to make an empty dictionary, you can simply just use curly brackets with nothing inside of them `{}`. If you want to make an empty set, you can't do the same thing, because Python will assume that it is a dictionary. Instead, you can simply write `set()`. Take a look below.

In [None]:
# Recall that `type()` is a built-in function that tells you the data type of whatever argument you pass it
empty_dict = {}
print(type(empty_dict)) 

empty_set = set()
print(type(empty_set))

If you want to add an element to a set, you can use the `add()` method (this works the same way as the `append()` method does for a list). Let's take a look at the example below.

In [None]:
sample_set = {1,2,3}
print(sample_set)

sample_set.add(4) 
print(sample_set)

Note that if you try to add an element that's already in the set, nothing happens; each element can only be in a set once.

In [None]:
# What happens if you try to add an element that's already in the set?
sample_set.add(2)
print(sample_set)
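
Because each element can only appear once, a set is a handy way to collect the distinct items in some data. The sketch below uses only a `for` loop and `add()`; the list `observed_bases` is invented for illustration.

```python
observed_bases = ['A', 'T', 'A', 'G', 'T', 'A']

unique_bases = set()  # start with an empty set
for base in observed_bases:
    unique_bases.add(base)  # adding a base that's already present changes nothing

print(unique_bases)       # the three distinct bases; the order may vary
print(len(unique_bases))  # prints 3
```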

Finally, as with dictionaries, we can loop through sets, though recall that sets are not ordered (so the output may not match the order in which you initially defined the set). Let's take a look at an example:

In [12]:
sample_set = {"one", 2.0, 3}

for num in sample_set:
    print(num)

3
one
2.0


## The setdefault() Method

The `setdefault()` method acts similarly to the `append()` and `add()` methods but is specific to *dictionaries*.

The `setdefault()` method takes two arguments: a key, and an associated value. When calling the method on a dictionary, it does a few things. First, the method checks whether the key passed as the first argument is present in the dictionary. If the key *is* already in the dictionary, the dictionary is left unchanged. However, if the key *is not* in the dictionary, it adds that key to the dictionary, with the value you also passed to the method.

The syntax is as follows: `<dict_name>.setdefault(<key>,<value>)`

Let's take a look at an example below.

In [13]:
sample_dict = {'Name' : 'Alice',
               'Age' : 25}

sample_dict.setdefault('Favorite Animal', 'Dog')
print(sample_dict) # How does this change the dictionary?

sample_dict.setdefault('Favorite Animal', 'Cat')
print(sample_dict) # How does this change the dictionary?

{'Name': 'Alice', 'Age': 25, 'Favorite Animal': 'Dog'}
{'Name': 'Alice', 'Age': 25, 'Favorite Animal': 'Dog'}


See if you can figure out whether the `setdefault()` method returns an output in the cell below.

In [None]:
# Use this space to check whether `setdefault()` returns an output
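
One common use of `setdefault()` is counting things inside a `for` loop: we make sure a key exists (starting at zero) before adding to its value. The example below is a sketch of that idea; the names `sequence` and `base_counts` are made up for illustration.

```python
sequence = 'ATGGCA'

base_counts = {}  # start with an empty dictionary
for base in sequence:
    base_counts.setdefault(base, 0)            # add the base with a count of 0 if it's new
    base_counts[base] = base_counts[base] + 1  # then increase its count by one

print(base_counts)  # prints {'A': 2, 'T': 1, 'G': 2, 'C': 1}
```

Without the `setdefault()` line, looking up `base_counts[base]` the first time a base appears would cause an error, because that key would not be in the dictionary yet.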

## List comprehension

This is a fairly advanced concept, and you are not required to understand it for this course. That being said, it is a useful skill to have if you are interested in doing any sort of programming in the future.<br/><br/>

Let's say that we have some list of numbers and we want to create a new list where the elements are the elements of the initial list multiplied by 2. We should be able to do that without any new functionality by using a for loop.

Before you look at the code cell below, try to figure out how you might do this.

In [None]:
starting_list = [1,2,3,4,5,6]

new_list = [] # Start with an empty list. We'll be filling this list with our new values.
for x in starting_list:
    new_list.append(2*x)

print(new_list)

Python actually has built-in functionality to do this more efficiently, called <font color='green'>**list comprehension**</font>. To generate a new list by applying an operation (or operations) to each of the elements in a starting list, you can use the following syntax:

`<new_list> = [{operations on <temporary variable>} for <temporary_variable> in <starting_list>]`

Here, `operations` just represents the set of operations you're doing to the elements in `starting_list`.

So let's see how we would replicate our "multiply by two" `for` loop above using list comprehension.

In [None]:
starting_list = [1,2,3,4,5,6]

new_list = [2*x for x in starting_list]

Here, the `2*x` is the `operations` discussed above. As you can imagine, we could include quite complex operations if we wished.
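
As a sketch of what slightly more involved operations might look like, the two comprehensions below combine arithmetic steps and call a string method on each element. The lists `squares_plus_one`, `gene_names`, and `upper_names` are invented for this example.

```python
starting_list = [1,2,3,4,5,6]

# Square each number, then add one
squares_plus_one = [x*x + 1 for x in starting_list]
print(squares_plus_one)  # prints [2, 5, 10, 17, 26, 37]

# List comprehension works on lists of strings too
gene_names = ['brca1', 'tp53', 'egfr']
upper_names = [name.upper() for name in gene_names]
print(upper_names)       # prints ['BRCA1', 'TP53', 'EGFR']
```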