## Loops and Conditionals

We started working with loops in the Introduction notebook. They are one of the most powerful ideas in programming. Here's Mickey Mouse, one of the world's first programmers, realising that if you can make a computer (or a broomstick) do something once, it can do it a million times, sometimes with unpredictable results: https://vimeo.com/7878564

It's good to develop a mindset where, if you find yourself doing something more than once, you think "how can I put this in a loop?" Even if you're just doing it 2 or 3 times, there's a good chance that you'll end up doing it many more times later on.

#### Objectives

*   Explain what a for loop does.
*   Correctly write for loops to repeat simple calculations.
*   Trace changes to a loop variable as the loop runs.
*   Trace changes to other variables as they are updated by a for loop.
*   Explain what a list is.
*   Create and index lists of simple values.
*   Use a library function to get a list of filenames that match a simple wildcard pattern.
*   Use a for loop to process multiple files.

### For Loops

Suppose we want to print each character in the word "lead" on a line of its own.
One way is to use four `print` statements:

In [26]:
print('l')
print('e')
print('a')
print('d')


l
e
a
d


If we wanted to be able to print any word, we would want to put this into a function (more on those soon!), which might look like this:

In [27]:
# Create a function (more on this later!)
def print_characters(element):
    print( element[0] )
    print( element[1] )
    print( element[2] )
    print( element[3] )

# Now call your function with a string
print_characters('lead')

l
e
a
d


This is a terrible idea! There are two reasons:

1.  It doesn't scale:
    if we want to print the characters in a string that's hundreds of letters long,
    we'd have to add hundreds of print statements, which would be confusing; we would be better off just typing the letters in ourselves.

1.  It's fragile:
    if we give it a longer string,
    it only prints part of the data,
    and if we give it a shorter one,
    it produces an error because we're asking for characters that don't exist.

In [28]:
print_characters('tin')

t
i
n


IndexError: string index out of range

Here's a better approach:

In [29]:
def print_characters(element):
    for char in element:
        print( char )

print_characters('lead')

l
e
a
d


Rather than asking for each element by number, we are telling the computer: "do something for *each* letter".

This is shorter---certainly shorter than something that prints every character in a hundred-letter string---and
more robust as well:

In [30]:
print_characters('oxygen')

o
x
y
g
e
n


Try calling this with some different values to check that it always works:

In [31]:
# Try a few values
print_characters('some arbitrary string here')

s
o
m
e
 
a
r
b
i
t
r
a
r
y
 
s
t
r
i
n
g
 
h
e
r
e


The improved version of `print_characters` uses a [for loop](./gloss.html#for-loop)
to repeat an operation---in this case, printing---once for each thing in a collection.
The general form of a loop is:

<pre>
<strong>for</strong> <em>variable</em> <strong>in</strong> <em>collection</em><strong>:</strong>
    <em>do things with variable</em>
</pre>

Some things to note:
- we can call the "loop variable" anything we like, so long as we use that name in the rest of the code. For example, these do the same thing:

```
for char in element:
    print( char )

for letter in element:
    print( letter )
```

- there must be a colon at the end of the line starting the loop
- we must indent the body of the loop, so that Python knows where the loop starts and ends
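
A small sketch of the indentation rule (the `events` list and its contents are illustrative, not part of the lesson): only the indented line belongs to the loop body, so it runs once per character, while the unindented line runs once, after the loop has finished.

```python
# Collect what happens so the order of execution is easy to inspect.
events = []
for char in 'tin':
    events.append('inside: ' + char)   # indented: part of the loop body
events.append('outside')               # not indented: runs once, after the loop

for event in events:
    print(event)
```

Moving the `'outside'` line four spaces to the right would make it part of the loop body, and it would then be recorded three times instead of once.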

Here's another loop that repeatedly updates a variable:

In [32]:
length = 0
for vowel in 'aeiou':
    length = length + 1
print( 'There are', length, 'vowels' )

There are 5 vowels


It's worth tracing the execution of this little program step by step.
Since there are five characters in `'aeiou'`,
the statement on line 3 will be executed five times.
The first time around,
`length` is zero (the value assigned to it on line 1)
and `vowel` is `'a'`.
The statement adds 1 to the old value of `length`,
producing 1,
and updates `length` to refer to that new value.
The next time around,
`vowel` is `'e'` and `length` is 1,
so `length` is updated to be 2.
After three more updates,
`length` is 5;
since there is nothing left in `'aeiou'` for Python to process,
the loop finishes
and the `print` statement on line 4 tells us our final answer.
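
One way to check this trace is to print the variables on every pass. The sketch below is the same loop with one extra `print` added inside the body; the wording of the message is just an illustration.

```python
length = 0
for vowel in 'aeiou':
    length = length + 1
    # Report the state at the end of each pass through the loop body
    print('vowel is', vowel, 'so length becomes', length)
print('There are', length, 'vowels')
```

Adding a temporary `print` like this is often the quickest way to see what a loop is doing when its result is not what we expect.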

Note that a loop variable is just a variable that's being used to record progress in a loop.
It still exists after the loop is over,
and we can re-use variables previously defined as loop variables as well:

In [33]:
letter = 'z'
for letter in 'abc':
    print( letter )
print( 'after the loop, letter is', letter )

a
b
c
after the loop, letter is c


Note also that finding the length of a string is such a common operation
that Python actually has a built-in function to do it called `len`:

In [34]:
print( len('aeiou') )

5


`len` is much faster than any function we could write ourselves,
and much easier to read than a two-line loop;
it will also give us the length of many other things that we haven't met yet,
so we should always use it when we can.
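
As a quick illustration of that last point, `len` also works on lists and on the result of `range`, both of which appear later in this lesson. The variable names are illustrative.

```python
word_length = len('oxygen')          # number of characters in a string
list_length = len([1, 3, 5, 7])      # number of elements in a list
range_length = len(range(2, 10, 3))  # number of values the range produces
print(word_length, list_length, range_length)
```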

#### Challenges

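1.  Python has a built-in function called `range` that produces a sequence of numbers:
    `range(3)` produces `0, 1, 2`, `range(2, 5)` produces `2, 3, 4`, and `range(2, 10, 3)` produces `2, 5, 8`
    (in Python 3 the result is a range object rather than a list, but a `for` loop steps through it in the same way).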
    Using `range`,
    write a function called print_N that prints the first $N$ natural numbers, starting at 1:
    
    ~~~python
    print_N(3)
    1
    2
    3
    ~~~

In [36]:
def print_N(value):
    for n in range(1,value+1):
        print(n)

print_N(3)
print_N(10)

1
2
3
1
2
3
4
5
6
7
8
9
10
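
To see exactly which numbers a `range` call produces, it can be wrapped in `list`. This is a small sketch that assumes Python 3, where `range` on its own returns a range object rather than a list; the variable names are illustrative.

```python
first = list(range(3))          # counts from 0 up to, but not including, 3
second = list(range(2, 5))      # starts at 2, stops before 5
third = list(range(2, 10, 3))   # starts at 2, stops before 10, in steps of 3
print(first)
print(second)
print(third)
```

The stop value is never included, which is why `print_N` above uses `range(1, value+1)`.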


### Lists

Just as a `for` loop is a way to do operations many times,
a list is a way to store many values.
Lists are built into the language, so we don't need to load a library to use them.
We create a list by putting values inside square brackets:

In [37]:
# Make a list of some odd numbers
odds = [1, 3, 5, 7]
print( 'odds are:', odds )

odds are: [1, 3, 5, 7]


We select individual elements from lists by indexing them:

In [38]:
print( 'first and last:', odds[0], odds[-1] )

first and last: 1 7


and if we loop over a list,
the loop variable is assigned elements one at a time:

In [39]:
for number in odds:
    print( number )

1
3
5
7
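
Lists can also be sliced with the same `start:stop` notation that works on strings. A brief sketch, using a fresh list (`primes` is an illustrative name) so that `odds` is left untouched:

```python
primes = [2, 3, 5, 7, 11]
first_two = primes[0:2]    # elements at positions 0 and 1
last_two = primes[-2:]     # the final two elements
print('first two:', first_two)
print('last two:', last_two)
print('second to last:', primes[-2])
```

A slice produces a new list; the original `primes` list is not changed by it.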


There is one important difference between lists and strings:
we can change the values in a list,
but we cannot change the characters in a string.
For example:

In [40]:
names = ['Newton', 'Darwing', 'Turing'] # typo in Darwin's name
print( 'names is originally:', names )
names[1] = 'Darwin' # correct the name
print( 'final value of names:', names )

names is originally: ['Newton', 'Darwing', 'Turing']
final value of names: ['Newton', 'Darwin', 'Turing']


works, but:

In [41]:
name = 'Bell'
name[0] = 'b'

TypeError: 'str' object does not support item assignment

does not.
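
Because a string cannot be changed in place, the usual workaround is to build a new string and assign it to the same name. A minimal sketch:

```python
name = 'Bell'
# Build a new string from a lower-case 'b' plus everything after the first character.
name = 'b' + name[1:]
print(name)
```

The original string `'Bell'` is never modified; `name` simply refers to a different string afterwards.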

> #### Ch-Ch-Ch-Changes
>
> Data that can be changed is called [mutable](./gloss.html#mutable),
> while data that cannot be is called [immutable](./gloss.html#immutable).
> Like strings,
> numbers are immutable:
> there's no way to make the number 0 have the value 1 or vice versa
> (at least, not in Python&mdash;there actually *are* languages that will let people do this,
> with predictably confusing results).
> Lists and arrays,
> on the other hand,
> are mutable:
> both can be modified after they have been created.
>
> Programs that modify data in place can be harder to understand than ones that don't
> because readers may have to mentally sum up many lines of code
> in order to figure out what the value of something actually is.
> On the other hand,
> programs that modify data in place instead of creating copies that are almost identical to the original
> every time they want to make a small change
> are much more efficient.

There are many ways to change the contents of lists besides assigning to elements:

In [42]:
odds.append(11)
print( 'odds after adding a value:', odds )

odds after adding a value: [1, 3, 5, 7, 11]


In [43]:
del odds[0]
print( 'odds after removing the first element:', odds )

odds after removing the first element: [3, 5, 7, 11]


In [44]:
odds.reverse()
print( 'odds after reversing:', odds )

odds after reversing: [11, 7, 5, 3]
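
A few other list methods follow the same pattern of changing the list in place. The sketch below uses a separate list (`evens`, an illustrative name) and three standard methods: `insert`, `pop`, and `sort`.

```python
evens = [8, 2, 6]
evens.insert(1, 4)        # put 4 at position 1, giving [8, 4, 2, 6]
last = evens.pop()        # remove and return the final element (6)
evens.sort()              # sort in place, smallest first
print('evens is now:', evens, 'and the removed value was', last)
```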


#### Challenges

1.  Write a function called `total` that calculates the sum of the values in a list.
    (Python has a built-in function called `sum` that does this for you.
    Please don't use it for this exercise.)

In [45]:
def total(values):
    # Start from zero and add each value in turn
    current_total = 0
    for value in values:
        current_total += value
    # Use return to pass the result back to the caller
    return current_total

print(total([3,5,8]))  # Should print 16
print(total(range(1,200))) 

16
19900


### Nesting

Another thing to realize is that `if` statements can be combined with loops
just as easily as they can be combined with functions.
For example,
if we want to sum the positive numbers in a list,
we can write this:

In [46]:
numbers = [-5, 3, 2, -1, 9, 6]
total = 0
for n in numbers:
    if n >= 0:
        total = total + n
print('sum of positive values:', total)

sum of positive values: 20


We could equally well calculate the positive and negative sums in a single loop:

In [47]:
pos_total = 0
neg_total = 0
for n in numbers:
    if n >= 0:
        pos_total = pos_total + n
    else:
        neg_total = neg_total + n
print('negative and positive sums are:', neg_total, pos_total)

negative and positive sums are: -6 20


We can even put one loop inside another:

In [48]:
for consonant in 'bcd':
    for vowel in 'ae':
        print(consonant + vowel)

ba
be
ca
ce
da
de


As the diagram below shows,
the [inner loop](./gloss.html#inner-loop) runs from start to finish
each time the [outer loop](./gloss.html#outer-loop) runs once:

<img src="img/python-flowchart-nested-loops.svg" alt="Execution of Nested Loops" />
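
To make the flow concrete, the following sketch counts how often each loop body runs (the counter names are illustrative). With three consonants and two vowels, the outer body should run 3 times and the inner body 3 × 2 = 6 times.

```python
outer_runs = 0
inner_runs = 0
for consonant in 'bcd':
    outer_runs = outer_runs + 1
    for vowel in 'ae':
        inner_runs = inner_runs + 1
print('outer body ran', outer_runs, 'times')
print('inner body ran', inner_runs, 'times')
```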

#### Challenges

1.  Will changing the nesting of the loops in the code above
    (i.e., iterating over the vowels first and then the consonants) change the output?
    Why or why not? Demonstrate it!

2.  Python (like most languages in the C family) provides [in-place operators](./gloss.html#in-place-operator)
    that work like this:
    
    ~~~python
    x = 1  # original value
    x += 1 # add one to x, assigning result back to x
    x *= 3 # multiply x by 3
    print(x)
    6
    ~~~
    
    Rewrite the code that sums the positive and negative numbers in a list
    using in-place operators.
    Do you think the result is more or less readable than the original?
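
One possible answer to the second challenge, offered as a sketch rather than the only correct solution. It reuses the `numbers` list from the earlier example and swaps each `x = x + n` for `x += n`:

```python
numbers = [-5, 3, 2, -1, 9, 6]
pos_total = 0
neg_total = 0
for n in numbers:
    if n >= 0:
        pos_total += n   # same effect as pos_total = pos_total + n
    else:
        neg_total += n   # same effect as neg_total = neg_total + n
print('negative and positive sums are:', neg_total, pos_total)
```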

### Processing Multiple Files

We now have a lot of tools for looping over many things. The last thing we are going to loop over is files. This is very useful for data analysis, where you often have hundreds of files to deal with.

To find the files we want to work with, we use a library called `glob`:

In [49]:
import glob

The `glob` library provides a function, also called `glob`,
that finds files whose names match a pattern.
We provide those patterns as strings:
the character `*` matches zero or more characters,
while `?` matches any one character.
We can use this to get the names of all the IPython Notebooks we have created so far:

In [50]:
print( glob.glob('*.ipynb') )

['Loops and Conditions-clean.ipynb', 'Loops and Conditions-solved.ipynb', 'Simple Functions.ipynb']


or to get the names of all our CSV data files:

In [51]:
print( glob.glob('*.csv') )

['a.csv', 'b.csv', 'd.csv']


As these examples show,
`glob.glob`'s result is a list of strings,
which means we can loop over it
to do something with each filename in turn.
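
The wildcard rules can be tried out without any files on disk by using `fnmatch.fnmatchcase` from the standard library, which tests a single name against the same style of pattern. The filenames below are made up for illustration, and this is only a sketch of the matching rules, not of everything `glob.glob` does.

```python
import fnmatch

names = ['a.csv', 'b.csv', 'ab.csv', 'notes.txt']  # hypothetical filenames

star_matches = []
question_matches = []
for name in names:
    # '*' matches zero or more characters, so every .csv name matches.
    if fnmatch.fnmatchcase(name, '*.csv'):
        star_matches.append(name)
    # '?' matches exactly one character, so 'ab.csv' is excluded.
    if fnmatch.fnmatchcase(name, '?.csv'):
        question_matches.append(name)

print(star_matches)
print(question_matches)
```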

To start with, write a `for` loop that prints the name of each file, without its `.csv` extension, on its own line:

In [56]:
filenames = glob.glob('*.csv')
for n in filenames:
    print(n[:-4])  # drop the last four characters, i.e. '.csv'

a
b
d


Now, write a `for` loop that opens each file and prints each line in it. Remember from last week that you can use:
```
with open('myfile.csv', 'r') as csvfile:
```
to open the file, and then
`for line in csvfile:` to iterate over each line.

In [57]:
for n in filenames:
    with open(n, 'r') as csvfile:
        for r in csvfile:
            print(r)

4,5,6

3,5,6

3,5,6

2,9,1



Now, update this to add up all the values in each file. You have two choices (both from last week):

* Open each file using the `csv` file reader. Then convert each cell in each row to an int, and add them together.
* Open each file using the `read_csv` function in Pandas, and get the sum of each row.

You should print the sum of each row, the sum of each file, and the total sum. Your output should be something like:
```
Opening file: a.csv
Row total: 15
Total in file a.csv is 15
Opening file: b.csv
Row total: 14
Total in file b.csv is 14
Opening file: d.csv
Row total: 14
Row total: 12
Total in file d.csv is 26
Overall total is 55
```

In [58]:
overallTotal = 0
for n in filenames:
    with open(n, 'r') as csvfile:
        print('Opening file:', n)
        total = 0
        for r in csvfile:
            rowTotal = 0
            values = r.split(',')
            for v in values:
                total = total + int(v)
                rowTotal = rowTotal + int(v)
            print('Row total: ', rowTotal)
        print('Total in file', n, 'is', total)
        overallTotal = overallTotal + total
print('Overall total is', overallTotal)
    

Opening file: a.csv
Row total:  15
Total in file a.csv is 15
Opening file: b.csv
Row total:  14
Total in file b.csv is 14
Opening file: d.csv
Row total:  14
Row total:  12
Total in file d.csv is 26
Overall total is 55


Finally, convert some of the code inside your loop into a function that takes a filename and returns the total count of that file. Rewrite your loop to use that function, and check that you still get the same total:

In [59]:
def file_count(filename):
    with open(filename, 'r') as csvfile:
        # Use the function's own parameter, not a variable left over from an earlier cell
        print('Opening file:', filename)
        total = 0
        for r in csvfile:
            rowTotal = 0
            values = r.split(',')
            for v in values:
                total = total + int(v)
                rowTotal = rowTotal + int(v)
            print('Row total: ', rowTotal)
        print('Total in file', filename, 'is', total)
        return total

overallTotal = 0
for file in glob.glob('*.csv'):
    overallTotal = overallTotal + file_count(file)
print('Overall total is', overallTotal)

Opening file: a.csv
Row total:  15
Total in file a.csv is 15
Opening file: b.csv
Row total:  14
Total in file b.csv is 14
Opening file: d.csv
Row total:  14
Row total:  12
Total in file d.csv is 26
Overall total is 55
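
The solutions above split each line by hand. The first option mentioned in the exercise, the standard `csv` module, could look like the sketch below. To keep it self-contained it writes a small temporary file first; `row_totals` is a hypothetical helper name, and the file contents mirror the two rows of `d.csv` printed earlier.

```python
import csv
import os
import tempfile

def row_totals(filename):
    """Return a list with the sum of the integers in each row of a CSV file."""
    totals = []
    with open(filename, 'r', newline='') as csvfile:
        for row in csv.reader(csvfile):
            row_total = 0
            for cell in row:
                row_total = row_total + int(cell)
            totals.append(row_total)
    return totals

# Create a throwaway file so the example runs anywhere.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write('3,5,6\n2,9,1\n')
    path = tmp.name

totals = row_totals(path)
os.remove(path)  # clean up the temporary file
print('Row totals:', totals, 'file total:', sum(totals))
```

Letting `csv.reader` do the splitting means each `row` is already a list of cell strings, so the only conversion left is `int(cell)`.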


#### Key Points

*   Use `for variable in collection` to process the elements of a collection one at a time.
*   The body of a for loop must be indented.
*   Use `len(thing)` to determine the length of something that contains other values.
*   `[value1, value2, value3, ...]` creates a list.
*   Lists are indexed and sliced in the same way as strings and arrays.
*   Lists are mutable (i.e., their values can be changed in place).
*   Strings are immutable (i.e., the characters in them cannot be changed).
*   Use `glob.glob(pattern)` to create a list of files whose names match a pattern.
*   Use `*` in a pattern to match zero or more characters, and `?` to match any single character.

#### Next Steps

We can now analyze any number of data files with a single command.
More importantly,
we have met two of the most important ideas in programming:

1.  Use functions to make code easier to re-use and easier to understand.
1.  Use lists and arrays to store related values, and loops to repeat operations on them.


## Making Choices

Our previous lessons have shown us how to manipulate data,
define our own functions,
and repeat things.
However,
the programs we have written so far always do the same things,
regardless of what data they're given.
We want programs to make choices based on the values they are manipulating.
To help us see what decisions they're making,
we'll start by looking at how computers manipulate images.

#### Objectives

*   Create a simple "image" made out of colored blocks.
*   Explain how the RGB model represents colors.
*   Explain the similarities and differences between tuples and lists.
*   Write conditional statements including `if`, `elif`, and `else` branches.
*   Correctly evaluate expressions containing `and` and `or`.
*   Correctly write and interpret code containing nested loops and conditionals.
*   Explain the advantages of putting frequently-modified code in a function.

### Conditionals

The other thing we need in order to create a heat map of our own
is a way to pick a color based on a data value.
The tool Python gives us for doing this is called a [conditional statement](./gloss.html#conditional-statement),
and looks like this:

In [60]:
num = 37
if num > 100:
    print('greater')
else:
    print('not greater')
print('done')

not greater
done


The second line of this code uses the keyword `if` to tell Python that we want to make a choice.
If the test that follows it is true,
the body of the `if`
(i.e., the lines indented underneath it) is executed.
If the test is false,
the body of the `else` is executed instead.
Only one or the other is ever executed:

<img src="img/python-flowchart-conditional.svg" alt="Executing a Conditional" />

Conditional statements don't have to include an `else`.
If there isn't one,
Python simply does nothing if the test is false:

In [61]:
num = 53
print('before conditional...')
if num > 100:
    print('53 is greater than 100')
print('...after conditional')

before conditional...
...after conditional


We can also chain several tests together using `elif`,
which is short for "else if".
This makes it simple to write a function that returns the sign of a number:

In [62]:
def sign(num):
    if num > 0:
        return 1
    elif num == 0:
        return 0
    else:
        return -1

print('sign of -3:', sign(-3))

sign of -3: -1


One important thing to notice in the code above is that we use a double equals sign `==` to test for equality
rather than a single equals sign
because the latter is used to mean assignment.
This convention was inherited from C,
and while many other programming languages work the same way,
it does take a bit of getting used to...
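
A short sketch of the difference: `=` stores a value under a name, while `==` compares two values and produces `True` or `False`. The variable names are illustrative.

```python
num = 0                 # single equals: assign 0 to num
is_zero = (num == 0)    # double equals: compare, giving True or False
is_one = (num == 1)
print('num == 0 is', is_zero)
print('num == 1 is', is_one)
```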

We can also combine tests using `and` and `or`.
`and` is only true if both parts are true:

In [63]:
if (1 > 0) and (-1 > 0):
    print('both parts are true')
else:
    print('one part is not true')

one part is not true


while `or` is true if either part is true:

In [64]:
if (1 < 0) or ('left' < 'right'):
    print('at least one test is true')

at least one test is true


In this case,
"either" means "either or both", not "either one or the other but not both".

#### Challenges

1.  `True` and `False` aren't the only values in Python that are true and false.
    In fact, *any* value can be used in an `if` or `elif`.
    After reading and running the code below,
    explain what the rule is for which values are considered true and which are considered false.
    (Note that if the body of a conditional is a single statement, we can write it on the same line as the `if`.)
    
    ~~~python
    if '': print('empty string is true')
    if 'word': print('word is true')
    if []: print('empty list is true')
    if [1, 2, 3]: print('non-empty list is true')
    if 0: print('zero is true')
    if 1: print('one is true')
    ~~~

2.  Write a function called `near` that expects two floating point numbers, and returns `True` if its first parameter is within 10% of its second
    and `False` otherwise.
    Compare your implementation with your partner's:
    do you return the same answer for all possible pairs of numbers?

In [65]:
def near(a,b):
    # True if a is within 10% of b; abs(b) keeps the tolerance positive when b is negative
    c = abs(b-a)
    return c < abs(b)*.1
    
print(near(1.1,1.2))
print(near(1.0,0.9))
print(near(100,90))
print(near(100,91))

True
False
False
True
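
For the first challenge, the built-in `bool` function offers a direct way to inspect how Python treats a value in a test. The sketch below uses the same six values as the challenge; the `truthy` and `falsy` lists are illustrative names.

```python
values = ['', 'word', [], [1, 2, 3], 0, 1]
truthy = []
falsy = []
for value in values:
    # An `if` treats the value exactly as bool(value) reports it
    if value:
        truthy.append(value)
    else:
        falsy.append(value)
    print(repr(value), 'is treated as', bool(value))
```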


#### Key Points

*   Use `if condition` to start a conditional statement, `elif condition` to provide additional tests, and `else` to provide a default.
*   The bodies of the branches of conditional statements must be indented.
*   Use `==` to test for equality.
*   `X and Y` is only true if both X and Y are true.
*   `X or Y` is true if either X or Y, or both, are true.
*   Zero, the empty string, and the empty list are considered false; all other numbers, strings, and lists are considered true.
*   Nest loops to operate on multi-dimensional data.
*   Put code whose parameters change frequently in a function, then call it with different parameter values to customize its behavior.