In [None]:
# Summary from the last class: lines starting with a # are comments

an_integer_number = 1
a_float = 0.33

a_string_of_characters = 'abcd'
another_string         = "dcba"

my_list = [ "apple", "banana", "coconut", "durian" ]

my_dictionary = {
                  "apple":"red",
                  "banana":"yellow",
                  "cherry":"red",
                  "durian":"varies"
                }


# Learning Python: Iterations and control statements

# Section 1. For Loops

Last time we ended on collections. Often we want to do something with every element in a collection. To do this we __iterate__ over the collection. To iterate means to start at the beginning of a collection, do something with the first value, then continue with the next one until we run out of values. A dictionary is iterable too: since Python 3.7 it is iterated in the order in which its keys were inserted (in older versions the order was not guaranteed).

We can imagine iterating over a list and then transforming the values in that list. Imagine you have a list of words and you want to find out the average word length ( $\bar{x}$, the __arithmetic mean__). You would first sum ( $\sum$ ) the lengths $w_i$ of all the words and then divide this sum by the number of words $n$. As a formula, we sum all the elements and divide by $n$:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} w_i $$

In this case, that big letter $\sum$ means add all the things afterwards in the specified range. The upper end of the range is given on top (in this case $n$, the number of words), and the $i = 1$ below means that we start from the first word.

Now we can translate this into computer code:


~~~ python
words = ["apple","banana","chocolate","dumpling"]

word_length_sum = 0

for i in words:
    word_length_sum += len(i) # notice we use += here. We'll explain it in a bit.
    
n = len(words) 

average_length = word_length_sum / n
~~~ 

Above, the ```for i in words:``` is what defines the loop. It's pretty similar to English. ```i``` is our loop variable (often loosely called the iterator). We could have named it anything, but traditionally when you don't care what it is called, we use ```i``` for the first loop, ```j``` for an inner loop and ```k``` for a third inner loop. More than three nested loops and you really should rethink your program design.

You can notice that ```i``` and ```words``` are variables because they are not given special colors. These colors help us understand which words are system words and which words are user-defined.  This is called __syntax highlighting__. 

We also use a shortcut above. Recall that in the past we have seen 

~~~ python 
result = x + y
~~~

But if we wanted to add something to an existing variable, like when we add some numbers to a ```total```, we can do this: 

~~~ python
total += y 
~~~ 

This is sometimes called __syntactic sugar__ as it is just a way to simplify code. In some languages, there's an additional idiom of saying ```x++``` for ```x += 1```. This is where C++ gets its name. Python doesn't have this feature. It's just ```x += 1``` for us. However, later we will see some very clever syntactic sugar in Python...all in due course.
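
As a quick sketch of the shortcut (the variable names and numbers here are made up for illustration), the two forms below do the same thing, and the same pattern works with the other arithmetic operators:

```python
total = 10
y = 5

total = total + y   # the long way: total is now 15
total += y          # the shortcut: total is now 20

count = 3
count -= 1          # same as count = count - 1, so count is now 2
count *= 4          # same as count = count * 4, so count is now 8

print(total, count)
```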

Below we will practice the concept of a loop and see some of its features. 

In [4]:
# As a matter of style I like to define variables before doing my loops. 

food_list = ["apple","banana","chocolate","dumpling"]

word_length_sum = 0

n = len(food_list) 

for j in food_list:
    word_length_sum += len(j) 

average_length = word_length_sum / n

print(average_length)

7.0


## Question: But what if I wanted to iterate over all elements, and print them?

In [18]:
for foodname in food_list:
    print(foodname)

apple
banana
chocolate
dumpling


In [15]:
foodname = 2+2
print(foodname)

4


## Iterating through a dictionary

It's pretty obvious how you iterate through a list. One element after the other. But dictionaries have keys and values. What are you iterating over then? It depends on what you ask of python. Try the default: 

In [19]:
# You can also write dictionaries in many lines, to make it easier to read. Look!

food_dict = {'salmon': 'fish',
             'enoki':  'mushroom',
             'apple':  'fruit',
             'potato': 'vegetable'}

for something in food_dict:
    print(something)


salmon
enoki
apple
potato


In [21]:
food_dict.values()

dict_values(['fish', 'mushroom', 'fruit', 'vegetable'])

That's not really what we wanted. Let's try to list the keys instead, and then print the value corresponding to each key.

In [11]:
list( food_dict.keys() )

['potato', 'salmon', 'apple', 'enoki']

In [27]:
food_list = list( food_dict.keys() )

print(food_list)

['salmon', 'enoki', 'apple', 'potato']


In [30]:
for k in food_list:
    print( k,  '----->', food_dict[k] )

salmon -----> fish
enoki -----> mushroom
apple -----> fruit
potato -----> vegetable


So the default loop over a dictionary goes through its keys. We can ask for the keys explicitly with ```food_dict.keys()```, for the values with ```food_dict.values()```, or for the complete items (i.e., key-value pairs) with ```food_dict.items()```. Observe: 

In [23]:
food_dict = {'salmon': 'fish',
             'enoki':  'mushroom',
             'apple':  'fruit',
             'potato': 'vegetable'}

for x in food_dict.keys():
    print(x)

salmon
enoki
apple
potato


In [25]:
food_dict.values()

dict_values(['fish', 'mushroom', 'fruit', 'vegetable'])

In [26]:
print() # just so I have an empty line between them

for x in food_dict.values():
    print(x)


fish
mushroom
fruit
vegetable


In [31]:
for x in food_dict.items():
    print(x)

('salmon', 'fish')
('enoki', 'mushroom')
('apple', 'fruit')
('potato', 'vegetable')


In [36]:
my_tuple = ('salmon','fish')

print(my_tuple[0], my_tuple[1])
my_tuple.append('veg')

salmon fish


AttributeError: 'tuple' object has no attribute 'append'

### Slight diversion: the tuple

Notice how it prints the items as 
~~~ python 
('salmon','fish') 
~~~
What is that? Well, it's actually a new kind of collection: a __tuple__. A tuple (I pronounce it like two-ple) is basically a list except it's immutable and has ```()``` instead of ```[]```. So with a list we could go ```my_list[2] = "grasshopper"``` and it would replace the _third_ element in the list with grasshopper (assuming there's already a third element). With a tuple, you cannot. You can query for the third item in a tuple with ```my_tuple[2]``` but you can't assign a new value. See below (it gives an error).  

In [37]:
my_list = ["ant","ladybug","beetle"]

print(my_list[2])

beetle


In [39]:
my_list[2] = "grasshopper"

print(my_list)

['ant', 'ladybug', 'grasshopper']


In [40]:
my_tuple = ("ant","ladybug","beetle")
print(my_tuple[2])

beetle


In [41]:
my_tuple[2] = "grasshopper"

print(my_tuple[2])

TypeError: 'tuple' object does not support item assignment

One of the nice things about the fact that ```dict.items()``` gives us each item as a tuple is that we can make use of this in the ```for``` loop. Instead of ```for i in dict.items():``` where ```i``` would be ```(key, value)```, we can literally go ```for i,j in dict.items():``` and then do things with these values directly, using ```i``` for the key and ```j``` for the value.

In [None]:
food_dict = {'salmon': 'fish',
             'enoki':  'mushroom',
             'apple':  'fruit',
             'potato': 'vegetable'}

for key,value in food_dict.items():
    print(key, "is a", value)
    
print()    
    
# Reminder: We don't need to use the words 'key' and 'value' 
for food,foodtype in food_dict.items():
    print(food, "is a", foodtype)
    
print()

In [44]:
for t in food_dict.items():
    print(t)

for t0, t1 in food_dict.items():
    
    print(t0, t1)
  


('salmon', 'fish')
('enoki', 'mushroom')
('apple', 'fruit')
('potato', 'vegetable')
salmon fish
enoki mushroom
apple fruit
potato vegetable


In [48]:
peoplelist = [ 
    ("chico","brazil","são paulo"),
    ("mary","usa","philadelphia"),
    ("bernie","canada","toronto")
]

# Each element has three items, so it cannot be unpacked into just two names.
for person in peoplelist:
    print(person)

('chico', 'brazil', 'são paulo')
('mary', 'usa', 'philadelphia')
('bernie', 'canada', 'toronto')


In [49]:
print()    
    
for element in peoplelist:
    print(element[2])


são paulo
philadelphia
toronto
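
Each tuple in ```peoplelist``` has three elements, so unpacking it in the ```for``` line needs three names; asking for only two raises a ```ValueError```. A minimal sketch, where the names ```name```, ```country```, ```city``` and the list ```cities``` are just illustrative choices:

```python
people = [
    ("chico", "brazil", "são paulo"),
    ("mary", "usa", "philadelphia"),
    ("bernie", "canada", "toronto"),
]

cities = []
for name, country, city in people:   # one name per element of each tuple
    print(name, "lives in", city)
    cities.append(city)
```

This does the same job as ```element[2]``` above, but the names make it clearer which piece of the tuple we are using.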


# Section 2. If statements and boolean logic. 

Boolean logic is very useful and really important to computation. If a language can implement the basics of ```not```, ```and``` and ```or``` it can do pretty much any computation with enough memory and time. We use boolean logic to evaluate the truth of a statement. Then if a statement is true, we will ask the computer to do something. We can also ask it to do something else if the statement is false. 

In python these are the boolean operators: 

- ```==``` is used for comparison. Does X equal Y? ```x == y```
- ```and``` is used to ask if two things are both true. ```x and y```
- ```or``` is used to ask if either thing is true. ```x or y```
- ```not``` is used for negation: ```not x```. Python has no standalone ```!``` operator, but ```!=``` means "not equal to": ```x != y```
- ```>``` is used for left side greater than right side. 
- ```<``` is used for left side less than right side. 
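
Here is a small sketch that tries each of the operators listed above on three numbers. The dictionary ```results``` is only a convenient way to collect the answers for printing; it is not something Python requires.

```python
x = 4
y = 5
z = 5

results = {
    "z == y": z == y,                    # True: both are 5
    "x != y": x != y,                    # True: 4 is not 5
    "x < y and y < z": x < y and y < z,  # False: 5 < 5 fails
    "x < y or y < z": x < y or y < z,    # True: the first part is enough
    "not (x > y)": not (x > y),          # True: 4 > 5 is False
}

for expression, value in results.items():
    print(expression, "->", value)
```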

In [None]:
# Booleans are capitalized: True and False. Lowercase true/false are just undefined names.
print(True, False)

In [55]:
x = 4
y = 5
z = 5 

answer = not(z == y)

print( answer )

False


Python does comparisons all over the place. Any time you use one of the operators it will evaluate them. But sometimes you want to use these operators to __control the flow__ of a program. For example, if you get some data and it includes a URL you might want to do something with that URL, whereas if it doesn't contain a URL you might want to do something else. For this we use ```if``` and ```else``` statements.

In [57]:
x = 5 
y = 3
z = x + y 

print(z)

if (z != 7): 
    print("Yes, z is different from 7.")
else:
    print("My math is not good today.")

8
Yes, z is different from 7.


You can chain several conditions with ```elif```, which means ```else if```. It's a way to try another conditional statement when the previous one was false:

In [62]:
#x = 5 
#y = 5
#z = x + y 

z = 11

if z > 10: 
    print("Hmm...should this be? ")
elif z > 7:
    print("Okay, I was worried for a second there.")    
elif z > 4:
    print("another one")    
else: 
    print("I give up.", z)


Hmm...should this be? 

In [69]:
n = 7

food_list =  ['bla', 'ble']

for food in food_list:
    print(food)
    if n > 10:
        print("Yay! n > 10")

        if (n % 2 == 0):
            print("Yay! n is even")
        else:
            print("n is odd")

    else:
        print("I was doing nothing")

bla
I was doing nothing
ble
I was doing nothing


## Important notes on comparisons 

### Note 1: _You can compare strings._ 
String encodings have code points. These are used to evaluate whether one string is greater than another. So you can ask if ```a > b```. The behavior can be a bit unexpected so I would only use this with caution. For example, what's greater: ```A, a,``` or ```B```? 

In [71]:
# String comparisons 
print("Is a < b?")
print('a' < 'b')

Is a < b?
True


In [73]:
print("\nIs A > b?")
print('A' > 'b')


Is A > b?
False


In [72]:
print("\nIs a > A?")
print('a' > 'A')


Is a > A?
True


The reason ```'a' > 'A'``` is that characters are represented as numbers in the computer's memory, and lowercase letters have larger numbers than uppercase ones.

You shouldn't need this anytime soon, but if you are curious, here are those characters, in order: http://www.asciitable.com/
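
If you'd rather check from Python itself, the built-in ```ord()``` gives the number behind a single character. A short sketch (the list ```code_points``` is just there to collect the results):

```python
code_points = []
for character in ["A", "B", "a", "b"]:
    code_points.append(ord(character))
    print(character, ord(character))

# Sorting strings uses the same numbers, so uppercase comes before lowercase.
print(sorted(["a", "A", "B"]))
```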

### Note 2. _Zero is False, One is True and the rest don't evaluate well_

This is the same for a great many programming languages. ```0``` is equal to ```False``` and ```1``` is equal to ```True```. No other number is _equal_ to ```True``` or ```False```, which is what the ```==``` comparisons below show. Be aware, though, that an ```if``` statement treats every non-zero number as true.

In [None]:
print("What evaluates to 'True'?")
print("-1\t", -1 == True)
print("0\t",   0 == True)
print("1\t",   1 == True)
print("2\t",   2 == True)

In [75]:
print("\nWhat evaluates to 'True'?")

print(-1 == True)
print(0 == True)
print(1 == True)
print(2 == True)


What evaluates to 'True'?
False
False
True
False
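
Note that ```==``` and "counts as true in an ```if```" are two different questions. The sketch below uses the built-in ```bool()``` to show how Python treats the same four numbers when they are used as a condition; the list ```truthy``` only collects the answers.

```python
values = [-1, 0, 1, 2]

truthy = []
for value in values:
    truthy.append(bool(value))
    if value:
        print(value, "is treated as True in an if statement")
    else:
        print(value, "is treated as False in an if statement")
```

Only ```0``` takes the ```else``` branch, even though ```-1 == True``` and ```2 == True``` are both ```False```.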


### Note 3. Not everything that is empty...is False.

There are a number of ways of expressing _nothing_ in python.
- There's the notion of a variable being ```None``` or empty.
- There's a numeric variable that isn't actually a number, called ```nan```, for _Not A Number_, and it is used for things that don't compute or are missing
- There's the empty string ``` "" ```
- And there are empty collections too, such as the empty list ```[]```.

Be extra careful when evaluating these. For example, ```None``` is not equal to ```False```, but an ```if``` statement still treats it as false. 


An example of why you should be careful: see how any ```==``` comparison with ```nan``` returns ```False```, even though ```nan``` itself counts as ```True``` when used as a condition:

In [77]:
import numpy as np # The python numeric package 'numpy'; we will be using this more later.

print(np.nan == True)

print(np.nan == False)


False
False
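
To see the other half of that claim without numpy, here is a small sketch using the built-in ```float("nan")``` and the standard ```math``` module. The variable name ```not_a_number``` is just for illustration.

```python
import math

not_a_number = float("nan")   # a plain Python nan, no numpy needed

print(not_a_number == not_a_number)  # False: nan is not equal to anything, even itself
print(bool(not_a_number))            # True: nan is a non-zero float, so it counts as true
print(math.isnan(not_a_number))      # True: the reliable way to test for nan
```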


The comparisons with ```None``` below return ```False```, and ```None``` itself returns ```False```:

In [76]:
1 / 0  # dividing by zero raises an error; it does not give None or nan

ZeroDivisionError: division by zero

In [82]:
a = None
print(a+1)

TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'

In [78]:
print(None == True)

print(None == False)

False
False
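
Because ```None``` is not equal to ```False``` but still behaves as false in a condition, a common and safer habit is to test for it explicitly with ```is None```. A minimal sketch (the variable ```message``` is only for illustration):

```python
a = None

print(a is None)    # True
print(a == False)   # False
print(bool(a))      # False

if a is None:
    message = "a has no value yet"
else:
    message = "a has a value"

print(message)
```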


The comparisons with ```""``` below return ```False```, and ```""``` itself returns ```False```:

In [28]:
print("" == True)
print("" == False)

if "": 
    print("Empty quotes are True")
else:
    print("Empty quotes are False")

False
False
Empty quotes are False


Yeah, it is a bit confusing. This is more to just remind you to be careful. 

# Section 3. Combining Loops and control statements

Often we want to do something under certain conditions. For example, you might loop through a list of email addresses and add the domain name (e.g., gmail.com, yahoo.com, etc.) to a set of domain names if it hasn't appeared before. This means that within each loop you want to include an ```if``` statement. 
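
A minimal sketch of that idea, assuming a short made-up list of addresses (the addresses and the variable names are invented for this example):

```python
emails = ["ana@gmail.com", "bob@yahoo.com", "cy@gmail.com", "dee@example.org"]

domains = set()
for email in emails:
    domain = email.split("@")[1]   # the part after the @ sign
    if domain not in domains:      # only act the first time we see a domain
        domains.add(domain)
        print("New domain:", domain)

print(len(domains), "different domains")
```

Adding to a set twice would be harmless anyway, but the ```if``` lets us do something extra (here, print a message) only for domains we have not seen before.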

Doing this might involve looping through an awful lot of data, and you might also want a way to report on progress along the way. For example, if you are examining a million emails, you could report every 20,000 emails just to remind you that the program isn't stuck. Here we introduce a function called __enumerate__. On every pass through the loop it hands you a counter (starting at 0) together with the current element. See the two examples below, one with a manual counter and one with enumerate: 

In [84]:
food_list = ["apple","banana","chocolate","dumpling"]

counter = 0
for foodname in food_list:
    print("Food number", counter, "is", foodname)
    
    counter += 1
    
    print('At this iteration, counter =', counter)
    

Food number 0 is apple
At this iteration, counter = 1
Food number 1 is banana
At this iteration, counter = 2
Food number 2 is chocolate
At this iteration, counter = 3
Food number 3 is dumpling
At this iteration, counter = 4


In [29]:
print()

for i,foodname in enumerate(food_list):
    print("Food number", i, "is", foodname)


Food number 0 is apple
Food number 1 is banana
Food number 2 is chocolate
Food number 3 is dumpling


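So here is how you would do something every 'nth' time through a loop, using the remainder operator ```%```.

Here we'll use the function ```range(N)```, which produces the ```N``` numbers from ```0``` to ```N-1```. In Python 3 it does not build the whole list in memory, so we wrap it in ```list()``` to see the values.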

In [87]:
list( range(50)  )

[0,
 1,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10,
 11,
 12,
 13,
 14,
 15,
 16,
 17,
 18,
 19,
 20,
 21,
 22,
 23,
 24,
 25,
 26,
 27,
 28,
 29,
 30,
 31,
 32,
 33,
 34,
 35,
 36,
 37,
 38,
 39,
 40,
 41,
 42,
 43,
 44,
 45,
 46,
 47,
 48,
 49]

In [94]:
sum_of_all_numbers = 0

for i in range(100000000):
    sum_of_all_numbers += i
    
    if (i % 10000000) == 0:
        print("Added the first", i, "numbers")

            
average_of_all = sum_of_all_numbers / 100000000
print(average_of_all)

Added the first 0 numbers
Added the first 10000000 numbers
Added the first 20000000 numbers
Added the first 30000000 numbers
Added the first 40000000 numbers
Added the first 50000000 numbers
Added the first 60000000 numbers
Added the first 70000000 numbers
Added the first 80000000 numbers
Added the first 90000000 numbers
49999999.5
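
The same "every nth time" trick works with ```enumerate``` when you are looping over a collection instead of a range of numbers. A small sketch, assuming a made-up list ```items``` and a hypothetical reporting interval ```report_every```:

```python
items = ["apple", "banana", "chocolate", "dumpling", "eggplant", "fig"]
report_every = 2   # hypothetical interval; with real data this might be 20000

reported = []
for i, item in enumerate(items):
    if i % report_every == 0:
        print("Processing item", i, ":", item)
        reported.append(i)
```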


In [103]:
mylist = []

for i in range(1,10,2):
    square = i*i
    mylist.append(square)
    
mynewlist = [ i*i for i in range(1,10,2) ]
mynewlist = [ food.upper() for food in food_list ]
print(mynewlist)

['APPLE', 'BANANA', 'CHOCOLATE', 'DUMPLING']


# Section 4. List comprehensions

The list comprehension is literally my favorite syntactic sugar in Python. You will encounter it all over the place in my code and in other people's code, so it is worth understanding it now. It also will help us think about operating on a full list at a time. This is important as we will be doing this a lot with data later on, for example by transforming a column of data. 

The list comprehension is very much like a for loop but is very condensed. 

Here is an example in the traditional way:

~~~ python 
my_list = ["allspice","basil","cumin"]

new_list = []
for i in my_list:
    i = i.upper()
    new_list.append(i)
~~~

Now here it is as a list comprehension: 

~~~ python
my_list = ["allspice","basil","cumin"]

new_list = [ i.upper() for i in my_list ]  
~~~

We have condensed it to one line. But it gets better. You can append a control statement at the end, so it will only include that value if it meets the condition. For example, only do something if the words are of length 5.

~~~ python 
my_list = ["allspice","basil","cumin"]

new_list = []
for i in my_list:
    i = i.upper()
    if len(i) == 5:
        new_list.append(i)
~~~

Now here is the same outcome using a list comprehension: 

~~~ python
my_list = ["allspice","basil","cumin"]

new_list = [ i.upper() for i in my_list if len(i) == 5 ]  
~~~

The second way is much more condensed and yet it still reads in an intelligible way. Try them out below: 

In [39]:
my_list = ["allspice", "basil", "cumin"]

new_list = []
for i in my_list:
    i = i.title()
    if len(i) == 5:
        new_list.append(i)

print(new_list)

['Basil', 'Cumin']


In [105]:
my_list = ["allspice", "basil", "cumin"]

new_list = [ i.upper() for i in my_list if len(i) == 5 ]  

print(new_list)

['BASIL', 'CUMIN']
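
As one more sketch, here is the average word length from Section 1 again, this time with a list comprehension. It uses the built-in ```sum()``` to add up the list; the name ```lengths``` is just for illustration.

```python
words = ["apple", "banana", "chocolate", "dumpling"]

lengths = [len(word) for word in words]
average_length = sum(lengths) / len(lengths)

print(lengths)          # [5, 6, 9, 8]
print(average_length)   # 7.0
```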


# Section 5. While Loops

While loops are useful when you want to continue looping through a program until something happens. They are especially useful for opening and closing files as we will see in the next lecture. Today, however, we will simply use a while loop that keeps drawing random numbers. When the number is the one we are waiting for, the code **breaks** out of the loop. The basic syntax is: 

~~~ python 
while <condition is True>: 
    do_something
~~~

Often we just say ```while True``` or ```while 1```. This is an example of an infinite loop and it will not end on its own.

To leave such a loop we must explicitly break it by clicking on the stop button on top of the screen.

( Hint: there's also a shortcut for that. Jupyter has many shortcuts. If you want to see all of them, press **Esc**, and then **H** )

Now here's an example of another ```while``` loop, this time one that will ```break``` when something happens:

In [49]:
import numpy.random as random

n_tries = 0

while True:
    n_tries += 1
    random_number = random.randint(0,5)  # numpy's randint excludes the upper bound: 0 to 4
    
    print("After", n_tries, "tries:", random_number)
    
    if random_number == 3:
        break

After 1 tries: 1
After 2 tries: 4
After 3 tries: 4
After 4 tries: 4
After 5 tries: 0
After 6 tries: 1
After 7 tries: 4
After 8 tries: 1
After 9 tries: 0
After 10 tries: 3
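
A ```while``` loop does not have to rely on ```break```. When the stopping rule is simple, it can go directly in the ```while``` line, and the loop ends as soon as the condition becomes false. A minimal sketch with made-up variable names:

```python
countdown = 3
steps = 0

while countdown > 0:          # checked before every pass through the loop
    print("countdown =", countdown)
    countdown -= 1
    steps += 1

print("Done after", steps, "steps")
```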


In [None]:
while True:
    print("no")

In [110]:
for i in range(3):
    print(i)
    print(food_list[i])

0
apple
1
banana
2
chocolate
