# Lesson 3: Tests, Loops and Escapes <a name='home' />

Last time we used Python to learn about types of data and ways to work with classic data types such as strings, lists, and dictionaries. We also explored libraries such as `pandas` and `matplotlib`, so you could see how quickly datasets can be analyzed and developed into useful visualizations. 

Table of Contents:
- <a href=#bookmark0>0. Online sources for more Python and Overview.</a> 
- <a href=#bookmark1>1. The Syntax of IF.</a> 
- <a href=#bookmark2>2. The Nature of Truth.</a> 
- <a href=#bookmark3>3. for loops.</a> 
- <a href=#bookmark4>4. Informative Interlude: Mutable and Immutable Objects.</a> 
- <a href=#bookmark5>5. enumerate().</a> 
- <a href=#bookmark6>6. while loops.</a> 
- <a href=#bookmark7>7. Escaping Loops.</a> 
- <a href=#bookmark8>8. List and dictionary comprehensions</a> 
- <a href=#bookmark9>9. Nested Structures</a> 

# 0. Online sources for more Python and Overview  <a name='bookmark0' />

There are many resources online for learning Python, and we go very quickly through the basics. If you want more places to look for Python tutorials and lessons, these are links that others have previously shared with us. If you have one to share, let us know and we will add it. 
1. [Python3](https://docs.python.org/3/tutorial/index.html) - This is the official Python3 tutorial, and documentation on Python 3 can be found on this website.
2. [ChatGPT](https://realpython.com/chatgpt-coding-mentor-python/) - ChatGPT is supposed to be great for troubleshooting code, and this article describes how to use ChatGPT as a mentor. 
3. https://www.tutorialspoint.com/python/
4. https://wiki.python.org/moin/BeginnersGuide/Programmers (many links)
5. https://wiki.python.org/moin/ChineseLanguage (links to Chinese tutorials)

The main goal of today's lesson is to give you the tools to automate your code so it can run over large amounts of data, and to introduce conditional statements that let you do different things to your data depending on your conditions. The major grammatical structures we will use are the following:

1. **if**
2. **for**
3. **while**

The objective, when you begin programming, is to make the computer do your work for you. Instead of normalizing that microarray data or computing that crystal structure fit with a pocket calculator and a bucket of coffee, you want to be able to say, 'Computer, do my work,' and then have it done. Of course, you'd have to say it a little more clearly than that -- for instance, to pick out over-expressed membrane proteins from microarray data, you could tell the computer that for each gene in the genome:

"If the protein is located in the membrane, tell me if that protein's expression level under a given set of experimental conditions [stored in column 2, for example] is 5-fold or more greater than the expression level under control conditions [stored in column 4, for example]."

Or:

"If the [National Land Cover Dataset](https://www.usgs.gov/centers/eros/science/national-land-cover-database#data) pixel has a value between 21-24, reclassify it as simply, 'developed.'"

    Q: What's the magic word in each of these questions?

    A: if

The **if statement** is one of the most fundamental components of programming, and it is found in some form or another in every general-purpose programming language (and nearly all specialized ones, too). It simply tells the computer to execute a block of code if a given statement is true. If the statement is false, the code is skipped. Although the concept is simple, conditional statements like **if** allow your programs to make decisions on their own.
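
As a minimal sketch of the land-cover example above, assume a single pixel value is stored in a hypothetical variable `pixel_value` (reading a real raster needs a geospatial library, which is not shown here). The default label `"not developed"` is also just an assumption for illustration, and the **and** operator used here is covered in section 2.1.

```python
# Hypothetical NLCD class code for one pixel (21-24 are the "developed" classes)
pixel_value = 23
land_cover = "not developed"  # assumed default label for this sketch

# The indented line runs only when the condition is True
if pixel_value >= 21 and pixel_value <= 24:
    land_cover = "developed"

print(f"Pixel value {pixel_value} is classified as: {land_cover}")
```

Try changing `pixel_value` to 41 (a forest class) to see the **if** block get skipped.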

Now, how can we ask the above question about every protein in our dataset of 5, or 20, or 10,000 proteins? Or about every pixel in a raster with over 10,000 pixels? 

The really great thing about computers is their ability to do things over and over again really quickly, without getting bored. It's easy for us to do stuff on a handful of data points, but with more than a couple dozen (or two, depending on how much patience we have) we'd really start wishing the computer was doing it for us. (Prof Spera's favorite mental game with ArcGIS is: "Do I click all the buttons 40 times, or do I take the time to code it?" And usually, when she chooses 'click the buttons 40 times', she wishes she had coded it, because she will inevitably be redoing whatever she did more than once.)

Every programming language has a way to do this, looping over a section of code multiple times. Python has two kinds of loops: the **for** loop and the **while** loop.

In principle, anything you can do with one you should be able to do with the other somehow, but it's often a lot cleaner to pick the appropriate one for the situation. Hopefully once you've seen how each one works, it will be reasonably obvious which one is best for any given job at hand.
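
To make that claim concrete, here is a small sketch that adds up the same list twice, once with each kind of loop. The list `values` and the variable names are made up for illustration; the details of both loops are explained later in this lesson.

```python
values = [2, 4, 6]

# Version 1: a for loop visits each item in turn
total_for = 0
for value in values:
    total_for += value

# Version 2: a while loop does the same job, but we manage the index ourselves
total_while = 0
index = 0
while index < len(values):
    total_while += values[index]
    index += 1  # without this line the loop would never end

print(total_for, total_while)
```

Both totals come out the same, but the **for** version needs less bookkeeping, which is why it is usually the cleaner choice for walking through a list.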

With the **if** statement, you can now introduce logic into your scripts, and with **for** and **while** loops, you introduce repetition into your scripts. By the end of today's lesson, you will have the foundational core to write code for almost anything you can imagine (though perhaps not with the greatest efficiency...that will come!). 
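
As a rough sketch of how logic and repetition combine, the membrane-protein question from above could look something like this. The gene names, locations, and expression levels are invented for illustration; real microarray data would be read from a file.

```python
# Hypothetical records: (gene name, location, experimental level, control level)
genes = [
    ("geneA", "membrane", 50.0, 5.0),
    ("geneB", "cytoplasm", 80.0, 4.0),
    ("geneC", "membrane", 12.0, 6.0),
]

overexpressed = []
for name, location, experiment, control in genes:
    # Keep only membrane proteins with at least a 5-fold increase
    if location == "membrane" and experiment >= 5 * control:
        overexpressed.append(name)

print(overexpressed)
```

The same few lines would work unchanged whether the list holds 3 genes or 10,000.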

# 1. The Syntax of *if*  <a name='bookmark1' />
In Python, we construct our **if** statements with this general form:

```python
if < whatever statement whose condition must be True>:
    execute code
    execute more code
```

The key pieces of syntax here are the **if**, the condition that we would like to evaluate (note: the condition is expressed within angle brackets **<>** for emphasis in this example, but angle brackets are not used in real code), the colon **:** that signifies the end of the logical test, and the indentation for the things to do below the first line of the statement.

In [1]:
temperature = 30

if temperature > 25:
    print(f"{temperature}. that's hot.")

30. that's hot.


We will discuss what makes things **True** in a minute, but for now we should focus on the syntax. Look at the code above. When the condition between the **if** and the **":"** is **True**, Python executes all of the indented code that follows. Indentation is the signal to the interpreter that this block of code is under the control of the **if** statement. 

Try changing the temperature in the code above to 5. Note that the **print** statement does not run, because it is executed *only if* the temperature is above 25C. 

Speaking of decision making, we could use this syntax to write a program that spares me the trouble of deciding where I can eat each day. The scenario is "I need a script that will help me decide what I can eat today." To plan out this code, I think about the steps I need.

1. A method of indicating how much money I have.
2. Different outcomes for each of my conditions.
3. A means to communicate the outcome.

Below is the first go of my code. It is going to throw an error at me.

In [None]:
wallet = input("How much money is in my wallet? ") 

##What is my error? Try a few numbers, what happens?
##What did I change in the next cell to avoid this error?

print (f"I have ${wallet}.")

if wallet < 15:
    fooditem = 'D-Hall'
    
if wallet >= 15:
    fooditem = 'Organic Krush'
 
print (f"We are eating at {fooditem} today.")

What's the error above? It should say:

```python
TypeError: '<' not supported between instances of 'str' and 'int'
```

This is because the input you give looks like a number, but `input()` always returns it as a string (`str`). Python can't use `<` to compare a string to a number, so it raises an error saying something is wrong with the data types. 

Brainstorm how you would fix this code to make it work before moving on. 

Below is another version where I convert the input to an integer before testing it. 

In [None]:
wallet = input("How much money is in my wallet? ") 

wallet=int(wallet)

print (f"I have ${wallet}.")

if wallet < 15:
    fooditem = 'D-Hall'
    
if wallet >= 15:
    fooditem = 'Organic Krush'
 
print (f"We are eating at {fooditem} today.")

Hopefully you made the same or a similar fix! We converted whatever was typed into the script to the `int` type, a number that can now be compared against 15. Note that while this works better, it's not perfect: if you type in letters (or a decimal such as 12.50), you'd now get an error because they can't be converted to integers. But we won't worry about that and just put in whole numbers!
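
If you are curious, one possible way to guard against bad input with the tools from this lesson is to test the string before converting it. This is only a sketch: `raw_entry` is a hypothetical stand-in for what `input()` would return, and the string method `isdigit()` only accepts plain digits, so it also rejects decimals and negative numbers.

```python
# Stand-in for what input() would return; input() always gives back a string
raw_entry = "twenty"

if raw_entry.isdigit():
    # Safe to convert: the string contains only digits
    wallet = int(raw_entry)
    message = f"I have ${wallet}."
else:
    # Not convertible: report the problem instead of crashing
    wallet = None
    message = f"'{raw_entry}' is not a whole number, please use digits only."

print(message)
```

Change `raw_entry` to `"20"` to see the other branch run.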

Notice that though all the code above works great, the result is nonsensical - if you had $20, you most certainly could eat anywhere on campus - this type of error is a **logical error**. Logical errors are errors that do not break the code - that is, the rules of coding are not broken, but your output is not what you intended. The mistake in these cases is often due to a flaw in the reasoning about how the program works. These errors are much harder to solve because there are no handy error messages to let us know something is wrong! We'll learn more about how to build in error troubleshooting in a later lesson, but a good first step is printing your variables out a lot, and testing whether what gets printed matches what you would have expected based on your understanding of the code. 

**The key to coding is a deep understanding of what you are coding.** 

Bio kids: If you need to go from the DNA sequence to the translated amino acid sequence, you need to have a deep understanding of complementary base pairing, the redundancy for amino acid coding, how alternative splicing works, and how to recognize a gene! You may be converting from one sequence to another in code, but whether your final result is BIOLOGICALLY MEANINGFUL will always be rooted in your understanding of biology. Thus, computation is a methodological skill that assists in answering your biological questions - don't forget that! While you may spend a lot of time figuring out code, it's just as important to remember you spent valuable time developing the question and background the code is being developed for. 

Geog kids: If you need to iterate through hundreds of satellite data rasters and do complex raster calculations, like calculating NDVI to back out a rough estimate of land-surface emissivity so you can look at thermal Landsat data because you're interested in the urban heat island, you need to fully understand how NDVI is connected to emissivity, and how that connects to measured temperature. Otherwise, you could write all this code and not know if your results are meaningful.


Let's go back to our food example. We might want this to be a bit more generalized. For instance, what if D-hall became more expensive (inflation!)? We would have to change multiple numbers here. A good catchall is the **else** statement, which says that if the **if** condition is not satisfied, then run this other block of code instead:

```python
if wallet > 0: 
    print ("I have money to eat!")
else: 
    print ("My eyes feast but my stomach growls.")
```

Furthermore, we might have more conditions we want to add than just two. For this we use the keyword **elif** (short for "else if"), which allows the inclusion of multiple conditional statements. When using **elif**, the conditions are checked in order, and only the block under the first condition that is **True** is executed. Thus, if two conditions overlap (e.g., x<5 and x<6), only the commands under the one that appears first in the script are run. 
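
A tiny sketch of the overlapping case, using a made-up value of `x`:

```python
x = 3

# Both conditions are True for x = 3, but only the first matching branch runs
if x < 5:
    result = "less than 5"
elif x < 6:
    result = "less than 6"
else:
    result = "6 or more"

print(result)
```

Because the conditions are checked from top to bottom, the order you write them in matters: put the most specific test first.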

In [1]:
wallet = float(input("How much money is in my wallet? "))

print (f"I have ${wallet:.2f}.")

if wallet < 10:
    fooditem = 'a banana stolen from d-hall yesterday'
elif wallet < 15:    
    fooditem = 'at D-Hall'
elif wallet < 100:
    fooditem = 'at Organic Krush'
else: 
    fooditem = "off campus"
 
print (f"We are eating {fooditem} today.")

How much money is in my wallet? 300
I have $300.00.
We are eating off campus today.


Now we have a script we can use every morning to determine what we can eat! All our decision making problems are solved, right?

## Knowledge Check

Write code below that will print "it is above freezing out" if the temperature is above freezing, "it is below freezing out" if the temperature is below freezing, and "it's literally freezing out" if the temperature is 0C.  

In [3]:
# write your freezing code here
temp = int(input("What's the temperature - IN C"))

if temp > 0: 
    print("it is above freezing out")
elif temp == 0:
    print("it's literally freezing out")
else:
    print("it is below freezing out")


What's the temperature - IN C-40
it is below freezing out


<a href=#home>Return to Top</a> 

# 2. The Nature of Truth  <a name='bookmark2' />

What is truth, anyway? So far, we have used some simple conditional statements that the program checks for "truth".

In [None]:
wallet = float(input("How much money is in my wallet? "))

print (f"I have ${wallet:.2f}.")

if wallet < 10: print ("less than")
if wallet >= 10: print ("greater than or equal to")
if wallet != 10: print ("not equal to")
if wallet == 10: print ("equal to")

These comparison operators (**==, !=, >=, <=, >, <**) compare the values of the two operands on either side, and if the comparison holds, the expression evaluates to **True**; otherwise it evaluates to **False**. To learn more about operators, [go here](http://www.tutorialspoint.com/python/python_basic_operators.htm).

In [None]:
wallet = 5

x1 = wallet < 10
x2 = wallet == 15

print ('x1', x1)
print ('x2', x2)

These comparison expressions evaluate to either **True** or **False**, and if you assign one to a variable, the variable holds the value **True** or **False**.

Truth has a formal meaning in Python, as it does in all programming languages, and it extends to more than just equality statements. 

In short:

Any nonzero number, nonempty object, and the special object **True** are things Python believes to be **True**.
Zero numbers, empty objects, and the special objects **None** and **False** are things Python believes to be **False**. 

The function **bool** will tell you whether Python treats your object as **True** or **False**:

```python
bool(object)
```
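
As a quick illustration of the rules above, here is a sketch that runs `bool` over a handful of values (chosen arbitrarily for this example) and prints what Python makes of each one.

```python
# A few values and the truth value Python assigns to each
examples = [3.14, -1, "False", (1, 2), 0.0, (), None]

truth_values = []
for item in examples:
    truth_values.append(bool(item))
    print(repr(item), "->", bool(item))
```

Note that the string `"False"` counts as true: it is a non-empty string, and Python does not look at what the text says.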


In [None]:
## After running the code, 
## try substituting 1 with each of the following: ['a'], '123', 0, []

myobject = 1
if bool(myobject) == False: print (myobject, "is False!")
if bool(myobject) == True: print (myobject, "is True!")

## 2.1 *and*, *or*, and *not*

Finally, what if we want to test more than one condition in a single **if** statement? The Boolean operators **and**, **or**, and **not** can help us out. An **and** expression is true only if both accompanying statements are true, while an **or** expression is true if at least one accompanying statement is true. **not** returns the inverse of the logical value of the statement given.
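
These operators are most often used to join comparisons. A small sketch, where `has_meal_swipe` is a hypothetical flag invented for this example:

```python
wallet = 12
has_meal_swipe = True  # hypothetical flag for this sketch

# or: one true side is enough
can_buy_lunch = wallet >= 15 or has_meal_swipe
# and: both sides must be true
can_buy_dinner = wallet >= 15 and has_meal_swipe
# not: flips the truth value of the comparison
is_broke = not (wallet > 0)

print(can_buy_lunch, can_buy_dinner, is_broke)
```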


In [None]:
true_statement = 'truth itself' # non-empty strings are true
false_statement = '' # empty strings are false
 
print (true_statement, bool(true_statement))
print (false_statement, bool(false_statement))
print ()

if true_statement and false_statement: ##if both are True
    print ("'and', This should not print.")
    print ()
else:
    print ("'and', This should print because one is False and both need to be True.")
    print ()
 
if true_statement or false_statement: ##if at least one is True
    print ("'or', This should print because one variable is True.")
    print ()
 
if not false_statement:  ##false_statement is False, so 'not false_statement' is True
    print ("'not', This should print because 'not' false_statement is True.")
    print ()
 
if not (true_statement and false_statement): ##if none or one is False, then the "and" statement is False, so "not" of it is True
    print ("'not' + 'and', This should print because one variable is False, so 'not' False is True.")
    print ()

**if** statements (and the **for** loops below) are so vital to writing code that we snuck in an introduction last lesson, because it is so hard to construct exercises without them. Do you remember the following?

In [4]:
topic = {'Jennifer':'crispr','Barbara':'transposons','Ruth':'blood groups',
           'Janaki':'plants','Ruby':'cancer'}

if 'Barbara' in topic: print (topic['Barbara'])
if 'Charles' in topic: print (topic['Charles'])

transposons


Here, we used the **if** statement with **in** to test membership in a list or dictionary. For a dictionary as above, it asks whether 'Barbara' or 'Charles' is among the keys of `topic`. Those keys are:

```
topickeys = ['Jennifer', 'Barbara', 'Ruth', 'Janaki', 'Ruby']
'Charles' is not in the topickeys list but 'Barbara' is in the list, and thus

bool('Charles' in topic)  = ??
bool('Barbara' in topic) = ??

```

Thus, in lists and dictionaries, you use **in** or **not in** for your conditional statements.

In [5]:
topic = {'Jennifer':'crispr','Barbara':'transposons','Ruth':'blood groups',
           'Janaki':'plants','Ruby':'cancer'}

print ("using in")
print (bool('Charles' in topic))
print (bool('Barbara' in topic))
print ()

print ("using not in")
print (bool('Charles' not in topic))
print (bool('Barbara' not in topic))
print ()

print ('using in and not in with if statements')
if 'Charles' not in topic: print ('Charles does not exist in topic') 
elif 'Charles' in topic: print ('Charles does exist in topic')
else: print ('This should not print anything, as the top two catches all conditions')

using in
False
True

using not in
True
False

using in and not in with if statements
Charles does not exist in topic
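
One detail worth seeing in a sketch: with a list, **in** looks at the items, but with a dictionary it looks only at the keys. The short list `pis` below is made up for this example; `values()` is the dictionary method that returns the stored values.

```python
pis = ['Jennifer', 'Barbara', 'Ruth']
topic = {'Jennifer': 'crispr', 'Barbara': 'transposons'}

in_list = 'Ruth' in pis                    # 'Ruth' is an item of the list
key_found = 'Barbara' in topic             # 'Barbara' is a key
value_as_key = 'crispr' in topic           # 'crispr' is a value, not a key
value_found = 'crispr' in topic.values()   # search the values explicitly

print(in_list, key_found, value_as_key, value_found)
```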


## Knowledge Check
Use the following format to evaluate whether the objects presented below (x) are True or False. Before testing it in a script, try to guess for yourself whether it is True or False. 
```python
query = x #fill in blank
if query:
    print ('Query',query,'is true')
else:
    print ('Query',query,'is false')
```

```python
query = "" ##the empty string
query = []
query = {}
query = [[]]
query =  0
query = [0]
query = [0][0] ##also -- what is this? Print out the value of [0][0] and [1][0] before doing the truth test
query = [[ ]] [0]
```

In [18]:
query = [1] #fill in blank
if query:
    print ('Query',query,'is true')
else:
    print ('Query',query,'is false')


Query [1] is true


## Knowledge Check x2

Let's go back to the weather. Write code below that helps us decide what to do. If it's raining, or the wind is greater than 8 m/s (8 m/s  [17 mph] is the limit for a "fresh breeze", or *navakka tuuli* in Finnish - they're a robust people), print, "Just stay home." But, if it's not raining or windy, print, "Go out and enjoy the weather."

The code has been started for you below

In [None]:
# useful variables
weather = "rain"
wind_speed = 9

<a href=#home>Return to Top</a> 

# 3. *for* loops  <a name='bookmark3' />

The **for** loop allows you to perform the same actions multiple times, but with different data each time (some languages have an equivalent construct and call it a *for each* loop). We used this concept before when talking about lists and dictionaries; we just didn't explain what was going on:

In [None]:
names = {'Jennifer': 'Doudna', 'Barbara': 'McClintock', 'Ruth': 'Moore', 'Janaki': 'Ammal', 'Ruby': 'Hirose'}
topic = {'Jennifer':'crispr','Barbara':'transposons','Ruth':'blood groups',
           'Janaki':'plants','Ruby':'cancer'}

keys = names.keys()

for x in keys: 
    print (f"{x} {names[x]}'s topic of study is {topic[x]}.")

The general syntax here is that 
1. **for** goes through each item in the list (or tuple or dictionary) after the **in**, 
2. the current item is stored in the variable named before **in**, 
3. the code after the colon (usually on the next lines) is executed with that variable holding the current item, 
4. once all of that code has run, the loop gets the next element from the list and repeats.

Just like an **if** statement, if you want to do more than one thing inside the loop, you can start a new block of indented lines after the colon, and then when you're done with the code you want to run every time, go back to the original indentation. 

```python
for VARIABLE_NAME in CONTAINER:
    DO_SOMETHING
    DO_SOMETHING_ELSE
    ....
 
# is the same as
 
VARIABLE_NAME = CONTAINER[0]
DO_SOMETHING
DO_SOMETHING_ELSE
 
VARIABLE_NAME = CONTAINER[1]
DO_SOMETHING
DO_SOMETHING_ELSE
 
VARIABLE_NAME = CONTAINER[2]
DO_SOMETHING
DO_SOMETHING_ELSE
...
# and so on
```

I'm sure you can imagine how frustrating it is to manually write out each set of commands again and again for each element in your list. The **for loop** greatly simplifies and shortens your code, allowing for easier reading, editing, and avoidance of errors. Combined with the **if** statement, this allows for easy organization of code and short scripts with a powerful ability to run through large blocks of data. 

The following is an example of a script including ***for loops and if statements***. See if you can comment what is happening with the for loop and each conditional statement. 

In [19]:
##based on wikipedia (https://en.wikipedia.org/wiki/Dynasties_in_Chinese_history)
earlydynasties = ["Xia","Shang","Zhou","Qin","Han","Three Kingdoms","Jin",
                  "Northern and Southern","Sui","Tang"]
startdate = [-2070,-1600,-1046,-221,-206,220,265,420,581,618]

for dynasty in earlydynasties:
    dynastyindex = earlydynasties.index(dynasty)
    
    if startdate[dynastyindex] < 0: 
        startdatevalue = -startdate[dynastyindex]
        datename = "BCE"
    else: 
        startdatevalue = startdate[dynastyindex]
        datename = "AD"
        
    if dynasty == "Three Kingdoms": whattype = "period"
    elif dynasty == "Northern and Southern": whattype = "dynasties"
    else: whattype = "dynasty"
        
    print (f"The {dynasty} {whattype} began {startdatevalue} {datename}.")

The Xia dynasty began 2070 BCE.
The Shang dynasty began 1600 BCE.
The Zhou dynasty began 1046 BCE.
The Qin dynasty began 221 BCE.
The Han dynasty began 206 BCE.
The Three Kingdoms period began 220 AD.
The Jin dynasty began 265 AD.
The Northern and Southern dynasties began 420 AD.
The Sui dynasty began 581 AD.
The Tang dynasty began 618 AD.


Finally, `range()` is a useful function for iterating over numbers.

In [20]:
mylst=range(4)
print (mylst)

for x in mylst:
    print ('number:', x)

range(0, 4)
number: 0
number: 1
number: 2
number: 3
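
`range()` also accepts a start value and a step. A short sketch (wrapping each range in `list()` just so the numbers are visible when printed):

```python
up_to_four = list(range(4))           # stop only: starts at 0
two_to_five = list(range(2, 6))       # start and stop (stop is not included)
every_third = list(range(0, 10, 3))   # start, stop, step

print(up_to_four)
print(two_to_five)
print(every_third)
```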


## 3.1 Mutating lists with loops
When we were discussing lists, we said that they were mutable. We showed that in a couple of ways, such as using the **sort()** method. 

It is reasonable to think that code such as
```python
li = [1,2,3]
for x in li:
    x = x + 42
```
would change the list **li** to contain [43,44,45]. However, ***this is not the case***, as shown in the cell below.

In [22]:
li = [1,2,3]
print ('before loop:', li)
for x in li:
    x = x + 42
    print(x)
print ('after loop:', li)

before loop: [1, 2, 3]
43
44
45
after loop: [1, 2, 3]


To actually update the list inside the **for loop**, we can instead do: 
```python
li1 = [1,2,3]
for x in range( len(li1) ):
    li1[x] = li1[x] + 42
```
Now, we are directly updating the list!

In [23]:
li1 = [1,2,3]
print ("'li1' before loop:", li1)
for x in range( len(li1) ):
    li1[x] = li1[x] + 42
print ("'li1' after loop:", li1)

'li1' before loop: [1, 2, 3]
'li1' after loop: [43, 44, 45]


In the above, it is important to think about what you are updating. In the first example, you were reassigning the loop variable `x`, which never touches the list, and `x` was reset on each pass through the **for loop**. In the second example, you were updating the element in the list at the index of interest, so your list changed. 
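
The same index-based pattern works for any in-place calculation. As a sketch, here is a made-up list of temperatures converted from Celsius to Fahrenheit without creating a second list:

```python
temps = [0, 25, 100]  # hypothetical temperatures in Celsius

for i in range(len(temps)):
    # Overwrite the element at index i with its Fahrenheit equivalent
    temps[i] = temps[i] * 9 / 5 + 32

print(temps)
```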

## Knowledge Check

Write a for loop with an **elif statement** below to change the list of students' first names to include their home department

In [None]:
students = ["eli", "flora", "gloria", "katie", "marian", "marlo", "matteo", "mia", "michelle"]
#so many m names, btw

#goal output
#students = ["eli-geo", "flora-bio", "gloria-bio-geo", "katie-geo", "marian-bio", "marlo-bio", "matteo-geo", "mia-geo", "michelle-bio"]

<a href=#home>Return to Top</a> 

# 4. Informative Interlude: Mutable and Immutable Objects  <a name='bookmark4' />
When we say that something is *mutable*, we mean that we can change the value without destroying the original data structure. To understand what this means, let's first discuss things that are *immutable*, such as strings and integers.

In [None]:
x=5
print(x)
## We can't make 5=10, but we can reassign x so x=10.
x=10
print(x)

The string or integer or float is immutable. For each variable where we assign a string or integer, you cannot change the string or integer - instead, when you make a change, you are reassigning a 'new' string or integer to the same variable. 

For lists and dictionaries, which can get very large, it can be inefficient to always recreate the entire list or dictionary with a few elements changed. Python in fact does not delete the list or dictionary and overwrite a variable with a new similar, but slightly updated list or dictionary. Instead, when we do 
```python
x=[1,2,3]
x[1]='a'
# x is now [1, 'a', 3]
```
Python is only changing the portion of the list that was updated from a 2 to an 'a'. This means that instead of having to make resource-consuming copies of our data, we can change the mutable variables directly. Lists and dictionaries are mutable; tuples and strings are not.
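
One way to see this difference for yourself is the built-in `id()` function, which returns a number identifying the object a variable currently refers to. The sketch below uses made-up variable names and keeps a second reference to the original string so both objects exist at the same time.

```python
# Mutable: changing an item keeps the very same list object
numbers = [1, 2, 3]
id_before = id(numbers)
numbers[1] = 'a'
list_same_object = id(numbers) == id_before

# Immutable: "changing" a string really builds a new string object
word = 'Spam'
original_word = word
word = 'z' + word[1:]
string_same_object = id(word) == id(original_word)

print(list_same_object, string_same_object)
```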

In [None]:
S = 'Spam'
print (S)
 
S = 'z' + S[1:]
print (S)

In [None]:
# Let's make a list
a = ['a', 'b', 'c', 'd']
b = a
print (a, "is the original value of 'a'")
print (b, "is the original value of 'b'")
print () # prints an empty line just so it looks nice below

b[1] = 'Y'
print (a, "is the value of 'a'")
print (b, "is the value of 'b'")


Above, we only changed an index in the list `b` but both `a` and `b` were updated. What happened? 

When any object is instantiated (created) and assigned to a variable, what is actually happening is that the variable 'points' to where the object is located. It remembers the 'address' of the object and when you refer to a variable, the variable points to where the object is stored using the saved 'address' and retrieves the object for use. For immutable objects like strings and tuples, this subtlety does not matter much, as each time they are altered, you are actually overwriting the entire variable, and the variable points to an entirely new object (that may look identical to the previous one). 

For mutable objects, however, this subtlety is important. For 
```python
a = ['a', 'b', 'c', 'd']
b = a
```
what is actually happening is that you are saying `b` will remember the same address to an object as `a`. That is, `a` and `b` both point to the exact same object! Thus, if you use one variable to update the mutable object, then all variables pointing to the same object will display the updated object.

A metaphor is thinking of Python's memory as a giant warehouse full of stuff, and in the front office of that warehouse is a catalog telling the names of all the stuff, and where each thing is. When you make a new variable, Python does two things: it puts that new thing into the warehouse (it instantiates an object), and it updates the catalog with a new entry (it stores the address of the object in your variable). When you make a copy of a variable (`b = a`), Python simply puts a new card in the warehouse catalog with the address of the same object.
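
If what you want is a second, independent list rather than a second name for the same list, one common approach is to copy it with a full slice. The sketch below contrasts the two; note that this kind of copy is shallow, so lists nested inside the list would still be shared.

```python
a = ['a', 'b', 'c', 'd']
b = a        # b points to the same list as a
c = a[:]     # c is a new list with the same items (a.copy() would also work)

b[1] = 'Y'   # changes the shared list, so a sees it too
c[2] = 'Z'   # changes only the copy

print(a)
print(b)
print(c)
```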

In the following example, we use **is**, which is different from **==** in that it asks whether the two variables are referring to the exact same object. 



## Knowledge Check

**Try running the following code through this [code visualization](http://people.csail.mit.edu/pgbovine/python/tutor.html#mode=visualize), which may help to wrap your head around what is happening here.**


In [None]:
# Lists are mutable
a = [1, 2, 3]
b = [1, 2, 3]
c = a

print ('a is b', a is b)
print ('a is c', a is c)
print ()

c[0]='o'
print (a)
print (b)
print (c)

<a href=#home>Return to Top</a> 

# 5. enumerate()  <a name='bookmark5' />

Before we continue, I wanted to point out a helpful function, `enumerate`. Above, to get the correct **range**, we used a somewhat awkward function within a function:
`for x in range( len(li1) ):`

In [None]:
li1 = [1,2,3]
print ("'li1' before loop:", li1)
for x in range( len(li1) ):
    li1[x] = li1[x] + 42
print ("'li1' after loop:", li1)

We can change this to: `for x,val in enumerate(li1):`

`enumerate` is a useful function that acts on lists (and other sequences). On each pass through the loop it gives you two objects: the index of the element in the list, and the element itself. See below for two examples.
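
As a quick sketch of what `enumerate` produces, using a made-up list of colors: wrapping it in `list()` shows the (index, element) pairs directly, and the optional `start` argument changes the number the count begins at.

```python
colors = ['red', 'green', 'blue']

# Each pair is (index, element)
pairs = list(enumerate(colors))
print(pairs)

# start=1 begins counting at 1 instead of 0
numbered = []
for position, color in enumerate(colors, start=1):
    numbered.append(f"{position}. {color}")
print(numbered)
```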

In [None]:
li1 = [1,2,3]
print ("'li1' before loop:", li1)
for x,val in enumerate(li1):
    li1[x] = val + 42
print ("'li1' after loop:", li1)

The following code from the start of the for loop section on Chinese dynasties has been updated to use `enumerate`. 

In [None]:
##switched to using enumerate!

##based on wikipedia (https://en.wikipedia.org/wiki/Dynasties_in_Chinese_history)
earlydynasties = ["Xia","Shang","Zhou","Qin","Han","Three Kingdoms","Jin",
                  "Northern and Southern","Sui","Tang"]
startdate = [-2070,-1600,-1046,-221,-206,220,265,420,581,618]

for dynastyindex,dynasty in enumerate(earlydynasties):
    #dynastyindex = earlydynasties.index(dynasty)
    
    if startdate[dynastyindex] < 0: 
        startdatevalue = -startdate[dynastyindex]
        datename = "BCE"
    else: 
        startdatevalue = startdate[dynastyindex]
        datename = "AD"
        
    if dynasty == "Three Kingdoms": whattype = "period"
    elif dynasty == "Northern and Southern": whattype = "dynasties"
    else: whattype = "dynasty"
        
    print (f"The {dynasty} {whattype} began {startdatevalue} {datename}.")           

## Knowledge Check


Take your list of student names (copied into your code block below to get you started) and print them out so that you list them in order of appearance in the list. Your output should look like the following:

    0 eli
    1 flora
    2 gloria
    3 katie
    4 marian
    5 marlo
    6 matteo
    7 mia
    8 michelle

In [None]:
students = ["eli", "flora", "gloria", "katie", "marian", "marlo", "matteo", "mia", "michelle"]    

<a href=#home>Return to Top</a> 

# 6. *while* loops  <a name='bookmark6' />
The rough format of a **while** loop is similar to a **for** loop: there's a colon at the end of the first line, and an indented block of code that gets run every time through. Despite this general similarity, **while** acts a little differently than a **for** loop. Instead of being given a list of items to iterate over, a **while** loop keeps running until the statement between **while** and the colon is no longer truthy (e.g. it evaluates to False, (), [], or '').
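
A **while** loop is a natural fit when you don't know in advance how many passes you need. As a sketch with made-up numbers, this counts how many doublings it takes a population to reach 1000:

```python
population = 100   # hypothetical starting number of cells
generations = 0

# Keep doubling until the population reaches at least 1000
while population < 1000:
    population = population * 2
    generations += 1

print(f"{generations} generations, final population {population}")
```

The condition is re-checked before every pass, so the loop stops as soon as `population < 1000` is no longer true.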

Be careful as it is VERY VERY EASY to make a while loop that never ends - in these cases, the only way to break the script is to interrupt the kernel (Ctrl+C when working on the Terminal, the square symbol at the top when on Jupyter Notebook). Sometimes if you were printing a lot to the screen or adding a lot to a list or dictionary, this may cause your notebook to crash or freeze - this is because the memory has filled up. If this happens on Terminal, use Ctrl+C to get the command to stop running. If this happens on Jupyter Notebook, click the square symbol at the top, and you may want to reset the kernel using the circular arrow symbol. 

The following script uses a **while** loop to remove the first dynasty in the list on every pass. The loop continues until the list is empty; an empty list counts as `False`, which ends the while loop. 

In [None]:
earlydynasties = ["Xia","Shang","Zhou","Qin","Han","Three Kingdoms","Jin",
                  "Northern and Southern","Sui","Tang"]
startdate = [-2070,-1600,-1046,-221,-206,220,265,420,581,618]

numdynasties = 0
while earlydynasties:
    print (earlydynasties)
    earlydynasties = earlydynasties[1:]
    numdynasties += 1
    
print ()
print (f"There are {numdynasties} dynasties listed.")

## Knowledge Check
The following code shows a while loop in action. Note the question in the comment, and explain why that line is important. Try commenting out that line - what happens? Be ready to use our tips above to use your notebook again. 

In [None]:
num=0

while num<5:
    print (num)
    num += 1 ##THIS LINE IS VERY VERY IMPORTANT TO NOT CRASH YOUR NOTEBOOK! Why?


<a href=#home>Return to Top</a> 

# 7. Escaping loops  <a name='bookmark7' />
Occasionally, you might want to get out of a loop before the condition becomes False (with a **while** loop) or before you've gone through every element (with a **for** loop). In fact, some loops are designed so that the control condition at the top of the loop never becomes False! You can modify the default flow of the loop using **break** and **continue**. The keyword **break** ends the loop right where you are, while the keyword **continue** skips the rest of the current pass and goes back to the top of the loop (bringing in the next item in the list if it's a **for** loop).
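As a small illustration of the difference between the two keywords, here is a minimal sketch using a made-up list called `numbers`: negative values are skipped with **continue**, and the loop stops entirely at the first zero with **break**.

```python
# Illustrative data (made up for this example)
numbers = [3, -1, 4, 0, 5, 9]
kept = []

for n in numbers:
    if n < 0:
        continue   # skip negative numbers and move on to the next item
    if n == 0:
        break      # stop the whole loop at the first zero
    kept.append(n)

print(kept)   # [3, 4]
```

Note that 5 and 9 are never looked at, because **break** ended the loop before reaching them.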

In [None]:
while True:
    number = input("Number to test for primeness: ")
    
    # Quit if nothing is entered.
    if number == '':
        break
    else:
        # Convert the entry into a float from a string
        number = float(number)
    
    # Prime numbers are >1 by definition.
    # If a number <= 1 is entered, stop and start over.
    if number <= 1:
        print ('Please enter a number greater than 1')
        continue
    
    prime = True
    x = 2
    while x < number:
        # Use modulo (%) to test if x is a divisor of number
        # if so, the number is not prime, stop the search
        if number % x == 0:
            print ('Not prime,', x, 'is a factor')
            prime = False
            break
        x = x + 1
        
    if prime:
    #else:
        print (number, 'is prime!')

The example above has two loops. The top **while** loop keeps asking the user for numbers to test until the user enters a blank input. If the number entered is <=1, we don't even bother checking for divisors, and **continue** sends the loop back to the **while** logical expression.

The second (inner) loop is reached only if the user enters a number >1. We start by assuming the number is prime, then check every integer from 2 up to (but not including) 'number' to see if it's a divisor. If we find a divisor, we know that 'number' is not prime, so we set 'prime' to **False**, then use **break** to stop checking the rest of the integers. Lastly, if 'prime' is still set to **True**, we report that the number is prime.

You can also use **else** to check whether a **for** or **while** loop finished normally without hitting a **break**. Just like with an **if** statement, the **else** should be at the same level of indentation as the **for** or **while**, end with a colon, and be followed by a block of code to run. In the example above, this would eliminate the need for the 'prime' flag.
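A minimal sketch of the loop **else** idea is shown here. To keep it self-contained, it tests a fixed, made-up value instead of asking for input, and it stores the answer in a variable called `result` (a name chosen just for this example).

```python
number = 97   # hypothetical value to test; try changing it

x = 2
while x < number:
    if number % x == 0:
        result = 'not prime'
        break
    x = x + 1
else:
    # This block runs only if the while loop ended without hitting break
    result = 'prime'

print(number, 'is', result)
```

If you change `number` to 91, the loop hits **break** at x = 7 and the **else** block is skipped.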

## Knowledge Check

Can you think of something that can break the code? Do you have ideas for how to fix the code to account for more possible inputs?

<a href=#home>Return to Top</a> 

# 8. List and dictionary comprehensions  <a name='bookmark8' />

Often we'll want to change every item in a list or dictionary in a systematic way. 

## 8.1 **List Comprehensions**

For example, we may want to add one to each of a list of integers:
```python
a = list(range(10)) ## a = [0,1,2,3,4,5,6,7,8,9]
for i, x in enumerate(a):
    a[i] = (x + 1)
## a = [1,2,3,4,5,6,7,8,9,10]
```
This is a totally acceptable way to solve this problem. However, because it is very common to perform the same operation on every member of a list, Python provides a shortcut that is more concise and usually faster. It's called a list comprehension, and it works like this:

```python
a = list(range(10)) ## a = [0,1,2,3,4,5,6,7,8,9]
a = [ x + 1 for x in a ]
## a = [1,2,3,4,5,6,7,8,9,10]
```
A list comprehension is bracketed by [] and has the command to run on each item at the beginning, followed by the for loop. It can also include a condition:

```python
[ my_command for x in my_initial_list ]
# or, to keep only the items that satisfy a condition
[ my_command for x in my_initial_list if satisfies_condition ]
# or, to choose between two commands for every item
[ my_command if satisfies_condition else my_new_command for x in my_initial_list ]
```
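The placement of the condition matters. The sketch below, using a short made-up list called `nums`, contrasts a trailing `if` (which filters items out) with an `if ... else ...` placed before the `for` (which keeps every item but changes what is computed for it).

```python
nums = list(range(6))   # [0, 1, 2, 3, 4, 5]

# A trailing `if` filters, so the result can be shorter than the input
evens_plus_one = [x + 1 for x in nums if x % 2 == 0]

# `if ... else ...` before `for` handles every item,
# so the result has the same length as the input
plus_one_or_zero = [x + 1 if x % 2 == 0 else 0 for x in nums]

print(evens_plus_one)     # [1, 3, 5]
print(plus_one_or_zero)   # [1, 0, 3, 0, 5, 0]
```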

In [None]:
a = list(range(10)) ## a = [0,1,2,3,4,5,6,7,8,9]

a = [ x + 1 for x in range(10) if x % 2==0]
#a = [ x + 1 if x % 2==0 else 0 for x in range(10)]

print (a)

## 8.2 **Dictionary Comprehensions**

By extension, dictionary comprehensions follow a parallel syntax. This can be very handy for creating 'empty' dictionaries that you want to fill later:
```python
# Without a comprehension
base_frequencies = {}
for base in 'ATCG':
    base_frequencies[base] = 0

# base_frequencies is now {'A': 0, 'T': 0, 'C': 0, 'G': 0}
```

However, as with list comprehensions, there is shorter code we can write:

```python
##With a comprehension
base_frequencies = {base:0 for base in 'ATCG'}

# base_frequencies is now {'A': 0, 'T': 0, 'C': 0, 'G': 0}
```

How would we count up the number of each nucleotide base within a sequence? What fraction of the sequence is each nucleotide base type?

In [None]:
seq = ('TTACCCGGGGTTAAAGTTGAATATTAAGGAGTGTAGTAGTACGAATAAACCGGCTCGAAC'
       'TATTACCTTTAGGAAATTTAAGTTTAAGTAGGAAGAAAAAATAAAAAAGTTAAAGAAGAA'
       'GGAGGATATAATTAAAGTTTTATAAATATAGAGAAGGTAAAAGAAGCGTTAGAAAAATGG'
       'ATATTAATCTTAGAAAAGATTAATATAATAAAAGACTTTAAATTTACCTTTCGCAAACTT'
       'AATAGAGAATTTATAAGTATTGTAAAGGAATTCGTCCTATATTAAAAGAAAAATTGAAGG'
       'AGGAAAATAAATTAATAAATATTAGAATATAAAATAATTATAAAAGAAGTATAGAAGATA'
       'TACTAGGAGTAGTTTAAGTACCGAATAGTATCGAATTAAAGGGAATTTATTAAAGCTATA'
       'ATAAAAAAGAAGAGGATTATTATAAAGGCTTAATAGGCTACGTAGTACAATAGTATTACC'
       'GAAGCTTCGAAGAATCTAAATCGATTCTAATTATTAGAATAATAGGCTTGGATCCGAAGT'
       )

base_counts = {base:0 for base in 'ATCG'}
print (base_counts)

# Count up all the bases
for base in seq:
    base_counts[base] += 1
print ('base counts:', base_counts)

# Let's calculate the fraction of each base in the sequence - here we get the total number of bases.
total_bases = float(len(seq)) # In Python 3, / always gives a decimal result, so float() is optional here.
print ('total bases:', total_bases)


In [None]:
##No comprehension used
base_percents = {}
for base, counts in base_counts.items():
    base_percents[base] = (counts / total_bases) # for a rounded percent instead: round((counts / total_bases) * 100, 2)
print ('base percents:', base_percents)
print (sum(base_percents.values()))

In [None]:
# With a comprehension
base_percents = {base: (counts / total_bases) for base, counts in base_counts.items()}
print ('base percents:', base_percents)

The above three cells illustrate how you can loop through the sequence and count the number of each nucleotide, then calculate the frequency, and, in the last cell, how to write the code in a more compact and often more efficient way through a dictionary comprehension. 

Note that comprehensions can get very complicated - I suggest writing the logic out first within a regular **for loop**, and then as you get more comfortable with loops, start looking to see if you can turn them into comprehensions. 
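One way to practice this habit is to write both versions and check that they agree. The sketch below uses a made-up list called `words` and builds a dictionary of word lengths, first with a regular loop and then with a dictionary comprehension.

```python
words = ["loop", "break", "continue"]   # made-up example data

# Step 1: write the logic as a regular for loop
lengths_loop = {}
for w in words:
    if len(w) > 4:
        lengths_loop[w] = len(w)

# Step 2: the same logic as a dictionary comprehension
lengths_comp = {w: len(w) for w in words if len(w) > 4}

print(lengths_comp)                   # {'break': 5, 'continue': 8}
print(lengths_loop == lengths_comp)   # True
```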

## Knowledge Check

Below, I will give you a regular for loop construction where we take the `students` list and identify all students whose name starts with 'm'. Your task is to rewrite lines 2-4 of the code as a single list comprehension statement. 

In [None]:
students = ["eli", "flora", "gloria", "katie", "marian", "marlo", "matteo", "mia", "michelle"] 
mstudents=[]
for i in students:
    if i[0]=='m': mstudents.append(i)
print (mstudents)

<a href=#home>Return to Top</a> 

# 9. Nested Structures and Loops  <a name='bookmark9' />

## 9.1 Nested Structures

1. Lists of lists
2. Dictionaries of dictionaries
3. Lists of dictionaries
4. Dictionaries of lists

etc...

We have been showing examples with simple lists and dictionaries, where the elements within have been strings or numbers. However, these lists and dictionaries can get very complicated very fast. 

Consider
```python
earlydynasties = ["Xia","Shang","Zhou","Qin","Han","Three Kingdoms","Jin",
                  "Northern and Southern","Sui","Tang"]
startdate = [-2070,-1600,-1046,-221,-206,220,265,420,581,618]
```

We placed these into two separate lists, but we could instead have made a single list of lists:
```python
dynastyinfo = [["Xia","Shang","Zhou","Qin","Han","Three Kingdoms","Jin",
                  "Northern and Southern","Sui","Tang"],
                  [-2070,-1600,-1046,-221,-206,220,265,420,581,618]]
```
All the usual list operations still apply, but it might take a bit more logical thinking to make sure you're referring to the list you want. 
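To make the double indexing concrete, here is a small sketch using a shortened, illustrative version of the list (only the first three dynasties, stored under the made-up name `info`). The first index picks the inner list, and the second index picks an item within it.

```python
# Shortened, illustrative version of dynastyinfo (first three entries only)
info = [["Xia", "Shang", "Zhou"],
        [-2070, -1600, -1046]]

names = info[0]           # the whole first inner list
first_date = info[1][0]   # first item of the second inner list

# Use the position from enumerate() to look up the matching start date
for i, name in enumerate(info[0]):
    print(name, "started around", info[1][i])
```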


In [None]:
dynastyinfo = [["Xia","Shang","Zhou","Qin","Han","Three Kingdoms","Jin",
                  "Northern and Southern","Sui","Tang"],
                  [-2070,-1600,-1046,-221,-206,220,265,420,581,618]]
print (dynastyinfo[0])
print ()

dynastyinfo.append(['this','is','a','completely','random','list','i','am','adding'])
print (dynastyinfo)
print ()

for mylst in dynastyinfo:
    print (mylst)
print ()

dynastyinfo[0][0] = "Xia???"
print (dynastyinfo)

Similarly, you can put dictionaries as elements of a list. For dictionaries, you can put dictionaries or lists as values.

For instance, 
```python
a = [{},{}, ]  # a is a list of empty dictionaries

b = {'l1': [], 'l2': []} # b is a dictionary of empty lists

mut_types = { "transitions":["AG","GA","CT","TC"],
              "transversions":["GC","CG","GT","TG","AC","CA","AT","TA"] }
```

Remember to keep the order of your square brackets and curly braces (and parentheses if using sets or tuples!) correct, or you'll get a syntax error. 
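As a rough sketch of how a dictionary of lists might be used, the code below looks up which mutation type a given change belongs to. The variable `mutation` and its value are made up for illustration.

```python
mut_types = {"transitions": ["AG", "GA", "CT", "TC"],
             "transversions": ["GC", "CG", "GT", "TG", "AC", "CA", "AT", "TA"]}

mutation = "CT"    # hypothetical mutation to classify
found_in = None    # stays None if the mutation is in neither list

for mut_type, changes in mut_types.items():
    if mutation in changes:
        found_in = mut_type
        break

print(mutation, "is listed under", found_in)

# A key followed by an index reaches a single item inside the nested list
print(mut_types["transversions"][0])   # GC
```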

You do not need to keep everything in a list or dictionary the same data type. For instance, 
```python
a = [[1,2,3],'a',{'k':'l','m':['a','b','c']},0]
```
is completely valid. However, this is something rarely done, and *is not recommended*, as you might want to systematically apply particular functions that can only work on certain data types. Do-able, but you have to be very careful or you may get lots of errors!
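To see why mixed types call for extra care, consider applying `len()` to every item of the list above: it works for lists, strings and dictionaries, but raises a `TypeError` for the integer. One possible way to cope, sketched here with the built-in `isinstance()`, is to check the type before acting.

```python
a = [[1, 2, 3], 'a', {'k': 'l', 'm': ['a', 'b', 'c']}, 0]

lengths = []
for item in a:
    # len() works on lists, strings and dictionaries, but not on integers
    if isinstance(item, (list, str, dict)):
        lengths.append(len(item))
    else:
        lengths.append(None)   # placeholder for items that have no length

print(lengths)   # [3, 1, 2, None]
```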

## 9.2 Nested Loops

Once you have nested data structures, it is very easy to make nested loops!

Below are two examples using nested loops. The first is simple, for you to see how they are constructed. 

In [None]:
##Example 1: Replace odd numbers with 'odd' and even numbers with 'even'
mymatrix = [[1,2,3],
            [4,5,6],
            [7,8,9]]

for indrow,myrow in enumerate(mymatrix):
    for indcol,mycol in enumerate(myrow):
        myval = mymatrix[indrow][indcol]
        if myval % 2 == 0: mymatrix[indrow][indcol] = 'even'
        else: mymatrix[indrow][indcol] = 'odd'
print (mymatrix)

The second is more complicated, showing how quickly thinking with nested loops can get confusing. In this second example, the goal is to ask, using the sequence of nucleotide bases from before: what is the longest distance (number of base pairs) separating two consecutive occurrences of the same nucleotide base?

In [None]:

##Example 2: using the sequence of nucleotide bases from before, 
##what is the longest unbroken distance (number of base pairs) separating 
##two of the exact same nucleotide base. 

seq = ('TTACCCGGGGTTAAAGTTGAATATTAAGGAGTGTAGTAGTACGAATAAACCGGCTCGAAC'
       'TATTACCTTTAGGAAATTTAAGTTTAAGTAGGAAGAAAAAATAAAAAAGTTAAAGAAGAA'
       'GGAGGATATAATTAAAGTTTTATAAATATAGAGAAGGTAAAAGAAGCGTTAGAAAAATGG'
       'ATATTAATCTTAGAAAAGATTAATATAATAAAAGACTTTAAATTTACCTTTCGCAAACTT'
       'AATAGAGAATTTATAAGTATTGTAAAGGAATTCGTCCTATATTAAAAGAAAAATTGAAGG'
       'AGGAAAATAAATTAATAAATATTAGAATATAAAATAATTATAAAAGAAGTATAGAAGATA'
       'TACTAGGAGTAGTTTAAGTACCGAATAGTATCGAATTAAAGGGAATTTATTAAAGCTATA'
       'ATAAAAAAGAAGAGGATTATTATAAAGGCTTAATAGGCTACGTAGTACAATAGTATTACC'
       'GAAGCTTCGAAGAATCTAAATCGATTCTAATTATTAGAATAATAGGCTTGGATCCGAAGT')

basepos = {base:[] for base in "AGCT"}

for pos,base in enumerate(seq): 
    basepos[base].append(pos)

diffs = {base:{} for base in "AGCT"}

for base in basepos:
    for index,pos in enumerate(basepos[base]):
        if index == 0: continue
        numposbtwn = pos-basepos[base][index-1]
        if numposbtwn in diffs[base]:
            diffs[base][numposbtwn]+=1
        else:
            diffs[base][numposbtwn]=1

print ("The total number of basepairs is:", len(seq))

for i in diffs: 
    print (f"For base {i}, the maximum distance between {i}s is {max(diffs[i].keys())}")
    

To state the question more concretely, if we were counting the number of bases between every adenine (A), what is the longest distance between two As? Two Gs/Cs/Ts? Which of these four sets shows the longest distance? This in and of itself might not seem useful, but it is a great exercise to demonstrate the logic involved and how to use nested loops. And if we think about the biology just a little bit, we can see potential applications - for instance, it is very relevant to think about the distance between two genes, perhaps by locating the distance between two gene promoters.
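It can help to trace the same logic on a sequence short enough to check by hand. The sketch below is a simplified variant of Example 2, not a replacement for it: `short_seq` is a made-up sequence, and instead of tallying every distance it keeps only the largest gap for each base.

```python
short_seq = "ATTGACA"   # made-up sequence, short enough to trace by hand

# Record the positions of each base
positions = {base: [] for base in "AGCT"}
for pos, base in enumerate(short_seq):
    positions[base].append(pos)

# Largest gap between neighbouring occurrences of the same base
max_gap = {}
for base in positions:
    gaps = []
    for index, pos in enumerate(positions[base]):
        if index == 0:
            continue   # the first occurrence has no earlier neighbour
        gaps.append(pos - positions[base][index - 1])
    if gaps:           # a base seen fewer than twice has no gap at all
        max_gap[base] = max(gaps)

print(positions)   # {'A': [0, 4, 6], 'G': [3], 'C': [5], 'T': [1, 2]}
print(max_gap)     # {'A': 4, 'T': 1}
```

G and C each appear only once in this short sequence, so they have no entry in `max_gap`.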

## Knowledge Check x 2

First, go through Example 2 above and comment on what each line of code is doing. At the nested **for loop**, can you keep track of what is happening? Write out to yourself what is occurring for the first few iterations. Use **print()** and **break** to print what each variable is for the first few iterations and check it against what you thought to make sure you really understand. 

Second - let's make a nested for loop ourselves! Take `students` and write a nested for loop pairing every set of students. Only keep pairs if they both start with the letter 'm'. It is okay to have the same pair twice, but if you want a bit more of a challenge, consider what you might modify so each unique pair is only printed once.  

Below, I show what you should print out, keeping all pairs, even non-unique. 
```
marian and marian
marian and marlo
marian and matteo
marian and mia
marian and michelle
marlo and marian
marlo and marlo
marlo and matteo
marlo and mia
marlo and michelle
matteo and marian
matteo and marlo
matteo and matteo
matteo and mia
matteo and michelle
mia and marian
mia and marlo
mia and matteo
mia and mia
mia and michelle
michelle and marian
michelle and marlo
michelle and matteo
michelle and mia
michelle and michelle
```

In [None]:
students = ["eli", "flora", "gloria", "katie", "marian", "marlo", "matteo", "mia", "michelle"] 

<a href=#home>Return to Top</a> 