**PySDS Week 1. Lecture 2. V1.1** Author: Bernie Hogan

# Learning Python: Iterations and control statements

# Section 1. For Loops

Last time we ended on collections. Often we want to do something with every element in a collection. To do this we __iterate__ over the collection. To iterate means to start at the beginning of a collection, do something with the first value, then move on to the next value, and continue until we run out of values. Dictionaries are iterable too, although (as we will see) you should be careful about relying on the order in which you iterate through them.

We can imagine iterating over a list and then transforming the values in that list. Imagine you have a list of words and you want to find out the average word length ($\bar{x}$, the __arithmetic mean__). You would first sum ($\sum$) the lengths of all the words, writing $w_i$ for the length of the $i$th word, and then divide this sum by the number of words $n$. As a formula:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} w_i $$

In this case, that big E ($\sum$, actually the Greek capital letter sigma) means add up all the things after it over the specified range. The $i=1$ at the bottom says we start with the first word, and the $n$ at the top says we stop at the last (the $n$th) word. In between, $i$ goes up by one each time, so every word is counted exactly once.
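As a worked check, take the four foods used in the code below (apple, banana, chocolate, dumpling). Their lengths are 5, 6, 9 and 8 letters, so with $n = 4$:

$$\bar{x} = \frac{1}{4}\sum_{i=1}^{4} w_i = \frac{5 + 6 + 9 + 8}{4} = \frac{28}{4} = 7$$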

Now we can translate this into computer code:


~~~ python
word_list = ["apple","banana","chocolate","dumpling"]

word_length_sum = 0

for i in word_list:
    word_length_sum += len(i) # notice we use += here.
    
n = len(word_list) 

average_length = word_length_sum / n
~~~ 

Above, the ```for i in word_list:``` is what defines the loop. It's pretty similar to English. ```i``` is our loop variable: each time through the loop it holds the next element of the list. We could have named it anything. Traditionally, though, when you don't care what it is called, we use ```i``` for the first loop, ```j``` for an inner loop and ```k``` for a third inner loop. More than three nested loops and you really should rethink your program design.

In a notebook or code editor you can tell that ```i``` and ```word_list``` are variables because, unlike keywords such as ```for``` and ```in```, they are not given special colors. These colors help us tell which words are system words and which are user-defined. This is called __syntax highlighting__.
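Here is a minimal sketch of the ```i```/```j``` convention. It assumes two small made-up lists (```fruits``` and ```sizes``` are hypothetical names). The inner loop over ```j``` runs all the way through for every single value of ```i```.

```python
fruits = ["apple", "banana"]
sizes = ["small", "large"]

combos = []
for i in fruits:        # outer loop: one pass per fruit
    for j in sizes:     # inner loop: runs fully for each fruit
        combos.append(j + " " + i)

print(combos)
# ['small apple', 'large apple', 'small banana', 'large banana']
```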

In the averaging code at the start of this section we also used a shortcut. Recall that in the past we have seen

~~~ python 
result = x + y
~~~

But if we wanted to add something to an existing variable, like when we add some numbers to a ```total```, we can do this: 

~~~ python
total += y 
~~~ 

This is sometimes called __syntactic sugar__, as it is just a way to simplify code. Some languages have an additional idiom, writing ```x++``` for ```x += 1```; this is where C++ gets its name. Python doesn't have this feature: for us it's just ```x += 1```. Later, however, we will see some very clever syntactic sugar in python, all in due course.
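A short sketch of what the shortcut is equivalent to. The same pattern works for other operators such as ```-=```, and ```+=``` also works on strings, where it joins them together. The variable names are just illustrative.

```python
total = 0
total += 5        # same as total = total + 5
total += 3
print(total)      # 8

count = 10
count -= 1        # same as count = count - 1
print(count)      # 9

greeting = "hello"
greeting += " world"   # for strings, += concatenates
print(greeting)        # hello world
```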

Below we will practice the concept of a loop and see some of its features. 

~~~ python
# As a matter of style I like to define variables before doing my loops.

food_list = ["apple","banana","chocolate","dumpling"]
word_length_sum = 0
n = len(food_list)

for i in food_list:
    word_length_sum += len(i)

average_length = word_length_sum / n
print(average_length)
~~~

~~~
7.0
~~~


## Iterating through a dictionary

It's pretty obvious how you iterate through a list. One element after the other. But dictionaries have keys and values. What are you iterating over then? It depends on what you ask of python. Try the default: 

~~~ python
food_dict = {'salmon': 'fish', 'enoki': 'mushroom', 'apple': 'Fruit', 'potato': 'Vegetable'}

for something in food_dict:
    print(something)
~~~

~~~
salmon
enoki
apple
potato
~~~


In this case, iterating through a dictionary gives us one key at a time: first ```salmon```, then the other foods. Since Python 3.7, dictionaries remember the order in which keys were inserted, so the keys come out in that order. In older versions of Python the order was not guaranteed. So if your code might run on an older version, or you simply want to be explicit, don't rely on a dictionary's order.

So iterating over a dictionary gives us its keys. We can also be explicit about it. ```food_dict.keys()``` gives the keys, ```food_dict.values()``` gives the values, and ```food_dict.items()``` gives the complete items (i.e., key-value pairs). Observe:

~~~ python
food_dict = {'salmon': 'fish', 'enoki': 'mushroom', 'apple': 'Fruit', 'potato': 'Vegetable'}

for key in food_dict.keys():
    print(key)

print()

for value in food_dict.values():
    print(value)

print()

for item in food_dict.items():
    print(item)
~~~

~~~
salmon
enoki
apple
potato

fish
mushroom
Fruit
Vegetable

('salmon', 'fish')
('enoki', 'mushroom')
('apple', 'Fruit')
('potato', 'Vegetable')
~~~


### Slight diversion: the tuple

Notice how it prints the items as 
~~~ python 
('salmon','fish') 
~~~
What is that? Well, it's actually a new kind of collection: a __tuple__. A tuple (I pronounce it like couple) is basically a list, except that it's immutable (it can't be changed after it is created) and it uses ```()``` instead of ```[]```. So with a list we could write ```my_list[2] = "grasshopper"``` and it would replace the third element in the list with grasshopper. That assumes there's already a third element, and it is index 2 because Python counts from 0. With a tuple, you cannot do this. You can query for the third item in a tuple with ```my_tuple[2]```, but you can't assign a new value. See below (it gives an error).

~~~ python
my_list = ["ant","ladybug","beetle"]
print(my_list[2])
my_list[2] = "grasshopper"
print(my_list[2])

my_tuple = ("ant","ladybug","beetle")
print(my_tuple[2])
my_tuple[2] = "grasshopper"  # raises a TypeError: tuples are immutable
print(my_tuple[2])           # never runs, because of the error above
~~~

~~~
beetle
grasshopper
beetle

TypeError: 'tuple' object does not support item assignment
~~~

One of the nice things about ```dict.items()``` giving us tuples is that we can make use of this in the ```for``` loop. We could write ```for i in food_dict.items():``` and then pull the key and value out of ```i``` with ```i[0]``` and ```i[1]```. Instead, we can literally write ```for thekey, thevalue in food_dict.items():``` and then use these two names directly. This is called __tuple unpacking__.

~~~ python
food_dict = {'salmon': 'fish', 'enoki': 'mushroom', 'apple': 'fruit', 'potato': 'vegetable'}

for key,value in food_dict.items():
    print(key, "is a", value)

print()
# Reminder: We don't need to use the words 'key' and 'value'
for food,foodtype in food_dict.items():
    print(food, "is a", foodtype)

print()
# Without unpacking, each item is a two-element tuple that we index into.
# (We call it 'pair' rather than 'tuple' so we don't hide Python's built-in tuple type.)
for pair in food_dict.items():
    print(pair[0], "is a", pair[1])
~~~
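The unpacking in those loops is ordinary tuple unpacking, and it also works outside a loop. A minimal sketch, reusing one item from ```food_dict```. Note that the number of names on the left has to match the number of elements in the tuple.

```python
pair = ("salmon", "fish")
food, foodtype = pair           # unpack the tuple into two names
print(food, "is a", foodtype)   # salmon is a fish

# Asking for three names from a two-element tuple fails
try:
    a, b, c = pair
except ValueError as err:
    print("ValueError:", err)
```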

A tuple is not limited to two things. Below, we create a list of three-element tuples and unpack all three elements as we iterate through the list.

~~~ python
peoplelist = [
    ("bernie","canada","toronto"),
    ("sian","uk","portsmouth"),
]
for person,place,city in peoplelist:
    print(place)

# same data, just indexed differently
for element in peoplelist:
    print(element[1])
~~~

~~~
canada
uk
canada
uk
~~~


# Section 2. If statements and boolean logic. 

Boolean logic is very useful and really important to computation. With just ```not```, ```and``` and ```or``` you can express any logical condition, which is why these operations sit at the heart of how computers work. We use boolean logic to evaluate the truth of a statement. Then, if a statement is true, we can ask the computer to do something. We can also ask it to do something else if the statement is false.

In python these are the main comparison and boolean operators:

- ```==``` is used for comparison. Does X equal Y? ```x == y```
- ```and``` is used to ask if two things are both true. ```x and y```
- ```or``` is used to ask if either thing is true. ```x or y```
- ```not``` is used to negate a statement. ```not x``` (Python does not use ```!``` on its own for this; it only appears in ```!=```, which means "not equal".)
- ```>``` is used for left side greater than right side. ```x > y```
- ```<``` is used for left side less than right side. ```x < y```

~~~ python
x = 4
y = 5
z = 5

print( x == y )
print( y == z )
print( x == y )
print( not (x == y) )
~~~

~~~
False
True
False
True
~~~
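The cell above only uses ```==``` and ```not```. Here is a short sketch of the other operators from the list, using the same values of ```x```, ```y``` and ```z```.

```python
x = 4
y = 5
z = 5

print(x < y)               # True: 4 is less than 5
print(x > y)               # False
print(x != y)              # True: != means "not equal"
print(y == z and x == y)   # False: and needs both sides to be True
print(y == z or x == y)    # True: or needs at least one side to be True
```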


Python does comparisons all over the place. Any time you use one of the operators it will evaluate them. But sometimes you want to use these operators to __control the flow__ of a program. For example, if you get some data and it includes a URL you might want to do something with that URL, whereas if it doesn't contain a URL you might want to do something else. For this we use ```if``` and ```else``` statements.

~~~ python
x = 5
y = 2
z = x + y

if z == 7:
    print("Yes, Z equals 7.")
else:
    print("My math is not good today.")
~~~

~~~
Yes, Z equals 7.
~~~
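The URL example mentioned above might look something like the following minimal sketch. It assumes each piece of data is a plain string. The ```records``` list is made up, and checking for the text ```"http"``` is a crude stand-in for real URL detection.

```python
records = ["see https://www.oii.ox.ac.uk for details", "no link in this one"]

with_url = []
without_url = []
for record in records:
    if "http" in record:          # crude check: does the text mention http?
        with_url.append(record)
    else:
        without_url.append(record)

print(len(with_url), "with a URL,", len(without_url), "without")
```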


You can chain extra conditions with ```elif```, which is short for ```else if```. Python checks each condition in turn and runs only the first branch whose condition is true.

~~~ python
x = 5
y = 5
z = x + y

if z == 10:
    print("Hmm...should this be? ")
elif z == 7:
    print("Okay, I was worried for a second there.")
else:
    print("I give up.")
~~~

~~~
Hmm...should this be?
~~~


## Important notes on comparisons 

### Note 1. You can compare strings. 
Strings can be compared with ```>``` and ```<```. Python compares them character by character, using each character's Unicode __code point__ (the number the encoding assigns to that character). So you can ask if ```'a' > 'b'```. The behavior can be a bit unexpected, so I would only use this with caution. For example, which is greatest: ```A```, ```a```, or ```B```?

~~~ python
# String comparisons
print("Is a > b?")
print('a' > 'b')

print("\nIs A > b?")
print('A' > 'b')

print("\nIs a > A?")
print('a' > 'A')
~~~

~~~
Is a > b?
False

Is A > b?
False

Is a > A?
True
~~~
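To answer the question about ```A```, ```a``` and ```B```, we can look at the code points directly with the built-in ```ord()```. The uppercase ASCII letters (65 to 90) all come before the lowercase ones (97 to 122). Longer strings are compared one character at a time until the first difference.

```python
# ord() returns the Unicode code point of a single character
print(ord("A"), ord("B"), ord("a"))   # 65 66 97

print("B" > "A")   # True: 66 > 65
print("a" > "B")   # True: 97 > 66, so lowercase beats uppercase

# The first differing character decides the comparison
print("apple" < "banana")    # True: 'a' < 'b'
print("apple" < "apricot")   # True: first difference is 'p' vs 'r'
```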


### Note 2. Zero equals False, One equals True, and other numbers equal neither

Many programming languages behave this way. In python, ```True == 1``` and ```False == 0```. So if you compare a number with ```== True``` or ```== False```, only ```1``` and ```0``` match, and every other number (such as ```-1``` or ```2```) is equal to neither. This is different from how numbers behave in an ```if``` statement, where any non-zero number counts as true.

~~~ python
print("What equals 'True'?")
print("-1\t", -1 == True)
print("0\t",   0 == True)
print("1\t",   1 == True)
print("2\t",   2 == True)


print("\nWhat equals 'False'?")

print("-1\t", -1 == False)
print("0\t",   0 == False)
print("1\t",   1 == False)
print("2\t",   2 == False)
~~~

~~~
What equals 'True'?
-1	 False
0	 False
1	 True
2	 False

What equals 'False'?
-1	 False
0	 True
1	 False
2	 False
~~~
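Comparing with ```== True``` is not the same as asking whether Python treats a value as true. Inside an ```if```, Python uses "truthiness": zero is false and every other number is true. The built-in ```bool()``` shows how a value will behave in an ```if``` statement.

```python
# bool() shows how Python treats a value inside an if statement
for number in [-1, 0, 1, 2]:
    print(number, bool(number))
# -1 True
# 0 False
# 1 True
# 2 True

if 2:
    print("2 counts as true in an if statement")
```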


### Note 3. Not everything that is empty...is False.

There are a number of ways of expressing _nothing_ in python. A variable can be ```None```, meaning it holds no value at all. There's a numeric value that isn't actually a number (```nan```, for _Not A Number_), which is used for things that don't compute or are missing. There's the empty string ```""```, and there are empty collections such as ```[]``` and ```{}```. Be extra careful when evaluating these. Because you might not be sure whether they evaluate to true or false, you should be explicit in your comparisons.

~~~ python
import numpy as np # The python numeric package 'numpy'; we will be using this more later.

print(np.nan == True)
print(np.nan == False)

if np.nan:
    print("Nan is True")
else:
    print("Nan is False")

print()
print(None == True)
print(None == False)

if None:
    print("None is True")
else:
    print("None is not True")

print()
print("" == True)
print("" == False)

if "":
    print("Empty quotes are True")
else:
    print("Empty quotes are not True")
~~~

~~~
False
False
Nan is True

False
False
None is not True

False
False
Empty quotes are not True
~~~

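Yeah, it is a bit confusing. To summarize: ```nan``` counts as true in an ```if``` statement, while ```None``` and ```""``` count as false. Yet none of the three is ```==``` to either ```True``` or ```False```. This is mostly a reminder to be careful.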
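Here is one way to be explicit, as a sketch. The helper ```describe``` is a hypothetical name used for illustration. It tests each kind of "nothing" directly rather than relying on truthiness. ```np.nan``` is an ordinary Python float, so the standard library's ```math.isnan``` works on it too. Note that ```nan``` is not even ```==``` to itself, so ```value == np.nan``` can never find it.

```python
import math

def describe(value):
    """Classify a 'nothing-like' value using explicit checks."""
    if value is None:                                    # the idiomatic test for None
        return "None"
    if isinstance(value, float) and math.isnan(value):  # nan is not even == to itself
        return "not a number"
    if value == "":                                     # explicit empty-string check
        return "empty string"
    return "has a value"

for value in [None, float("nan"), "", "salmon"]:
    print(repr(value), "->", describe(value))
```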

# Section 3. Combining Loops and control statements

Often we want to do something only under certain conditions. For example, you might loop through a list of email addresses and add the domain name (e.g., gmail.com, yahoo.com, oii.ox.ac.uk, etc.) to a set of domain names if it hasn't appeared before. This means that within each loop you want to include an ```if``` statement.
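A minimal sketch of the email example, assuming well-formed addresses with exactly one ```@```. The addresses are made up. A set already ignores duplicates, so the ```if``` is not strictly needed here. It is included to mirror the description, and it is the place you would put any extra work for newly seen domains.

```python
emails = ["ann@gmail.com", "bo@yahoo.com", "cy@gmail.com", "di@oii.ox.ac.uk"]

domains = set()
for email in emails:
    domain = email.split("@")[1]    # the part after the @
    if domain not in domains:       # only act on domains we haven't seen yet
        domains.add(domain)

print(sorted(domains))
# ['gmail.com', 'oii.ox.ac.uk', 'yahoo.com']
```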

Doing this might involve looping through an awful lot of data, and you might also want a way to report on progress along the way. For example, if you are examining a million emails, you might print a message every 20,000 emails, just to remind yourself that the program isn't stuck in a loop. For this we introduce a function called __enumerate__. Each time through the loop, it gives you a count (starting from 0) along with the current element. See these two examples below, one with a manual counter and one with enumerate:

~~~ python
food_list = ["apple","banana","chocolate","dumpling"]

counter = 0
for i in food_list:
    print("Food number",counter,"is",i)
    counter += 1

print()

for c,i in enumerate(food_list):
    print("Food number",c,"is",i)
~~~

~~~
Food number 0 is apple
Food number 1 is banana
Food number 2 is chocolate
Food number 3 is dumpling

Food number 0 is apple
Food number 1 is banana
Food number 2 is chocolate
Food number 3 is dumpling
~~~
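Under the hood, ```enumerate``` produces ```(count, element)``` tuples, which the ```for c,i``` loop unpacks just like the dictionary items earlier. Its optional ```start``` argument changes the first number, which is handy when you want counting to begin at 1.

```python
food_list = ["apple", "banana", "chocolate", "dumpling"]

# enumerate yields (index, item) tuples
print(list(enumerate(food_list)))
# [(0, 'apple'), (1, 'banana'), (2, 'chocolate'), (3, 'dumpling')]

# start=1 makes the count human-friendly
for c, i in enumerate(food_list, start=1):
    print("Food number", c, "is", i)
```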


So here is how you would use enumerate to do something every 'nth' time. 

~~~ python
example_str = "Print every third character of this line as upper case"
outstr = ""
for c,i in enumerate(example_str):
    if c%3==0:
        outstr += i.upper()
    else:
        outstr += i

print(outstr)
~~~

~~~
PriNt EveRy ThiRd ChaRacTer of thIs LinE aS uPpeR cAse
~~~


~~~ python
for i in range(1000):
    if i%50 == 0:
        print(i)
~~~

~~~
0
50
100
150
200
250
300
350
400
450
500
550
600
650
700
750
800
850
900
950
~~~
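Putting these together gives the progress report described at the start of this section. This is a scaled-down sketch so that it runs quickly: 100,000 stand-in items instead of a million emails, with a report every 20,000. The ```items``` list and the ```report_every``` name are hypothetical.

```python
items = list(range(100_000))   # stand-in for, e.g., a large list of emails
report_every = 20_000

reported = []
for c, item in enumerate(items):
    # ... the real work on each item would go here ...
    if c % report_every == 0:
        print("Processed", c, "items so far")
        reported.append(c)
```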


# Section 4. List comprehensions

The list comprehension is literally my favorite syntactic sugar in python. You will encounter it all over the place in my code and in other people's code, so it is worth understanding it now. It will also help us think about operating on a full list at a time. This is important, as later on we will do this a lot with data, for example by transforming a column of data.

The list comprehension is very much like a for loop but is very condensed. 

Here is an example in the traditional way:

~~~ python 
my_list = ["allspice","basil","cumin"]

new_list = []
for i in my_list:
    i = i.upper()
    new_list.append(i)
~~~

Now here it is as a list comprehension: 

~~~ python
my_list = ["allspice","basil","cumin"]

new_list = [i.upper() for i in my_list]  
~~~

We have condensed it to one line. But it gets better. You can append a control statement at the end, so that a value is only included if it meets the condition. For example, we might only keep the words that are five letters long. First, the traditional way:

~~~ python 
my_list = ["allspice","basil","cumin"]

new_list = []
for i in my_list:
    i = i.title()
    if len(i) == 5:
        new_list.append(i)
~~~

Now here is the same outcome using a list comprehension: 

~~~ python
my_list = ["allspice","basil","cumin"]

new_list = [i.title() for i in my_list if len(i) == 5]  
~~~

The second way is much more condensed and yet it still reads in an intelligible way. Try them out below: 

~~~ python
my_list = ["allspice","basil","cumin"]

new_list = []
for i in my_list:
    i = i.title()
    if len(i) == 5:
        new_list.append(i)

print(new_list)
~~~

~~~
['Basil', 'Cumin']
~~~


~~~ python
my_list = ["allspice","basil","cumin"]

new_list = [i.title() for i in my_list if len(i) == 5]

print()
print(new_list)
~~~

~~~

['Basil', 'Cumin']
~~~


In these two cells, the for loop and the list comprehension produce exactly the same list.
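As one more comparison, here is the average-word-length calculation from Section 1, rewritten as a sketch with a list comprehension and the built-in ```sum()```. It uses the same four foods, so it should give the same answer as before.

```python
food_list = ["apple", "banana", "chocolate", "dumpling"]

# Build the list of word lengths in one line, then average it
lengths = [len(i) for i in food_list]
average_length = sum(lengths) / len(lengths)

print(lengths)          # [5, 6, 9, 8]
print(average_length)   # 7.0
```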

# Section 5. While Loops

While loops are useful when you want to keep looping until something happens. They are especially useful when working with files, as we will see tomorrow. Today, however, we will simply use a while loop that keeps drawing random numbers. When it draws the number we are waiting for, it **breaks** out of the loop. The basic syntax is:

~~~ python 
while <condition is True>: 
    do.Something()
~~~
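Here is a minimal sketch of that template with a real condition (```total``` and ```step``` are just illustrative names). Python checks the condition before each pass, and the loop stops as soon as the condition is false, so no ```break``` is needed.

```python
# Keep adding 1, 2, 3, ... until the running total reaches at least 20
total = 0
step = 0
while total < 20:      # checked before every pass through the loop
    step += 1
    total += step

print(step, total)     # 6 21
```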

Often we simply write ```while True``` or ```while 1```. On its own this is an infinite loop: the condition never becomes false, so the loop will not end by itself. To leave such a loop we must explicitly ```break``` out of it.

~~~ python
import random

while True:
    random_number = random.randint(0, 5)  # a random whole number from 0 to 5, inclusive
    print(random_number)

    if random_number == 1:  # If you comment this out, the loop will run indefinitely
        break               # and you will have to interrupt it: press I twice (in command
                            # mode) or use Kernel -> Interrupt from the menu. Kernel -> Restart
                            # Kernel also works, but it will lose memory of everything,
                            # so please try to avoid infinite loops.
~~~

~~~
1
~~~
