## Iteration And Arrays

The true power of programming is iteration. Without iteration, programming would not be what it is today.

With iteration we've:

* Cracked the German Enigma Machine (during World War II)
* Got a man back from space
* And literally everything else that is of worth or note involving a computer

## Ways to do iteration

* while loop
* for loop (most common)
* recursion
* list comprehension

In [1]:
x = 0
while x < 10:
    print("The current value of 'x' is", x)
    x += 1


The current value of 'x' is 0
The current value of 'x' is 1
The current value of 'x' is 2
The current value of 'x' is 3
The current value of 'x' is 4
The current value of 'x' is 5
The current value of 'x' is 6
The current value of 'x' is 7
The current value of 'x' is 8
The current value of 'x' is 9


Syntax of a while loop:

```
while [CONDITION TO CHECK]:
....indented block
....indented block
....indented block
....indented block
```

The key property of a while loop is that it keeps going until a given condition is no longer true. In that sense, looping makes our code "dynamic": it updates its own state on each pass, without our intervention.

Let's look at code that does the same thing, without a while loop.  

In [2]:
x = 0
print("The current value of 'x' is", x)
x += 1
print("The current value of 'x' is", x)
x += 1
print("The current value of 'x' is", x)
x += 1
print("The current value of 'x' is", x)
x += 1
print("The current value of 'x' is", x)
x += 1
print("The current value of 'x' is", x)
x += 1
print("The current value of 'x' is", x)
x += 1
print("The current value of 'x' is", x)
x += 1
print("The current value of 'x' is", x)
x += 1
print("The current value of 'x' is", x)
x += 1

The current value of 'x' is 0
The current value of 'x' is 1
The current value of 'x' is 2
The current value of 'x' is 3
The current value of 'x' is 4
The current value of 'x' is 5
The current value of 'x' is 6
The current value of 'x' is 7
The current value of 'x' is 8
The current value of 'x' is 9


As you can see, the code prints out the same thing. This time, however, we had to write out the code for every single step. It should be clear that the while loop syntax is much, much cleaner. And if we want to iterate over a piece of code a million times, it's as easy as changing one line! Truly a powerful thing.
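To make the "changing one line" point concrete, here is a minimal sketch. The name `limit` and the running `total` are illustrative additions, not part of the cells above.

```python
# Sum the integers below `limit` with a while loop.
# `limit` is a hypothetical name chosen for this sketch.
limit = 1_000_000  # change only this line to loop more or fewer times

x = 0
total = 0
while x < limit:
    total += x  # the body runs once per value of x
    x += 1

print("Looped", x, "times; the total is", total)
```

Writing this out by hand, as in the copy/paste version, would take two million lines.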

## The problem with while loops

So while loops are great! They make our lives easier and mean no more copy/paste. However, they have a serious drawback: what happens when the condition never becomes false? Let's look at a simple example of this:

In [None]:
while 5 < 7:
    print("This code is still running")
    

If I run the above code, it will never complete; the cell keeps running until I interrupt or restart the kernel. All because the condition never resolves to false! It's obvious in this example that the condition will never become false (because 5 is _always_ less than 7), but it's not always so clear. In fact, for most real cases it is not obvious from the syntax alone when a while loop will terminate.

This is because most people, for better or worse, cannot easily think in boolean logic.
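One defensive habit, sketched here under the assumption that we can name a generous upper bound on the number of passes, is to count the passes and leave the loop with `break` once a cap is reached. The names `max_steps` and `steps` are hypothetical, and the halve-or-triple rule is just an example of a condition whose end is hard to predict by reading it.

```python
# A while loop whose stopping point is not obvious from its condition,
# protected by a cap on the number of passes.
max_steps = 1000   # hypothetical safety cap
steps = 0
value = 27

while value != 1:
    if steps >= max_steps:
        print("Giving up after", steps, "steps")
        break                      # leave the loop immediately
    if value % 2 == 0:
        value = value // 2         # halve even numbers
    else:
        value = 3 * value + 1      # triple odd numbers and add one
    steps += 1

print("Stopped with value", value, "after", steps, "steps")
```

With the cap in place, a mistake in the condition costs at most `max_steps` passes instead of hanging the notebook.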

## Enter the for loop, mainstay of iteration

Luckily, there is a solution for this that tends to make matters much easier to reason about - the for loop.  Let's see an example!

In [3]:
for x in range(0, 10):
    print("The current value of 'x' is", x)

The current value of 'x' is 0
The current value of 'x' is 1
The current value of 'x' is 2
The current value of 'x' is 3
The current value of 'x' is 4
The current value of 'x' is 5
The current value of 'x' is 6
The current value of 'x' is 7
The current value of 'x' is 8
The current value of 'x' is 9


Notice a few notational conveniences here versus the while loop -

No variable initialization.  The variable 'x' is created by the for loop while it runs.  It also gets updated automatically!  Of course, if a variable x already existed, it would get overwritten with whatever value is in the for loop - let's see an example of this:

In [4]:
x = 255
for x in range(0, 10):
    print("The current value of 'x' is", x)

The current value of 'x' is 0
The current value of 'x' is 1
The current value of 'x' is 2
The current value of 'x' is 3
The current value of 'x' is 4
The current value of 'x' is 5
The current value of 'x' is 6
The current value of 'x' is 7
The current value of 'x' is 8
The current value of 'x' is 9


This is because the for loop implicitly assigns to the variable x on every pass, which overwrites any previous value of x. So we need to pick the name of our loop variable with care (basically, it shouldn't be a name already in use elsewhere in the same scope).
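A short sketch of the pitfall and the fix. The variable names here (`total`, `count`, `step`) are made up for illustration.

```python
# The loop variable is an ordinary variable in the surrounding scope.
total = 255                 # a value we intend to keep
for total in range(0, 3):   # oops: reuses the name `total`
    pass                    # `pass` is a placeholder that does nothing

print(total)  # the original 255 is gone; total holds the last loop value

# Using a fresh name for the loop variable keeps the earlier value intact.
count = 255
for step in range(0, 3):
    pass

print(count, step)
```

Note that the loop variable also stays around after the loop ends, holding the last value it was assigned.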

## Data Structures

In general, a data structure applies some extra structure to our data. All the data structures we see throughout the course will be compositions of simpler pieces of data. For now, we'll assume that only the data we've introduced can be stored in these structures. However, we'll relax this assumption over the course.

## Lists

Now that we've touched on for loops, let's dive a little deeper into how the range function works.  In order to understand that we'll need to introduce lists, our first data structure.

In [32]:
listing = [1, 2, 3]
print("The first index of the list", listing[0])
print("The second index of the list", listing[1])
print("The third index of the list", listing[2])

The first index of the list 1
The second index of the list 2
The third index of the list 3


As you can see, the list data structure allows us to store multiple values together in one structure, or "object". We'll formally define an object in a future lecture, but for now you can informally think of an object as a collection of data and other functionality.

The above list allows us to also select values from our structure via a convention called indexing.

So getting the first element of our list requires us to request the zeroth index via the index (square bracket) notation. The general syntax is:

`list_data_structure[INDEX_OF_INTEREST]`

Notice the brackets at the end of the list. If this feels like a function call, that's on purpose. The syntax here is intended to be similar. In fact, if it's helpful, you can think of the indexing operator as a function that takes in a natural number (counting from zero) and returns its corresponding value in the given list.
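A minimal sketch of indexing in practice. The list contents are arbitrary; the negative index and the error handling go slightly beyond what the text above covers and are included only to show where the valid indices end.

```python
listing = [10, 20, 30]

first = listing[0]       # indices start at 0
last = listing[2]        # a list of 3 elements has valid indices 0, 1, 2
also_last = listing[-1]  # negative indices count back from the end

try:
    listing[3]           # there is no index 3 in a 3-element list
except IndexError as error:
    message = str(error)

print(first, last, also_last)
print("Asking for index 3 fails with:", message)
```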

In addition to initializing the list with all the desired elements and then accessing them via indexing, we have other ways of adding elements to our list data structure:

In [33]:
listing = []
listing.append(1)
listing.append(2)
listing.append(3)
print("The first index of the list", listing[0])
print("The second index of the list", listing[1])
print("The third index of the list", listing[2])

The first index of the list 1
The second index of the list 2
The third index of the list 3


This notation of `[OBJECT].[FUNCTION_NAME]([PARAMETERS])` is called dot notation and allows us to make calls to functions, called methods, associated with Python objects.  All Python objects have different associated methods, pieces of functionality that can augment, transform or give information about the associated stored data.

This convention is not specific to Python; it is used across most modern programming languages. More on this later.
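`append` is only one of the methods a list provides. The sketch below tries a few others through the same dot notation; the values are arbitrary and the comments track the list after each call.

```python
listing = [3, 1, 2]

listing.append(4)            # add to the end        -> [3, 1, 2, 4]
listing.insert(0, 0)         # insert 0 at index 0   -> [0, 3, 1, 2, 4]
removed = listing.pop()      # remove the last item  -> [0, 3, 1, 2]
listing.sort()               # sort in place         -> [0, 1, 2, 3]
position = listing.index(2)  # index of the value 2

print(listing, removed, position)
```

Some methods change the list (`append`, `insert`, `sort`), some hand back information (`index`), and `pop` does both.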

Now that we have the beginnings of an understanding of lists, let's see if we can recover the `range` function we've been making use of.

First we need a definition of the functionality we are after. But rather than having me explain it, let's let Python explain itself!

In [34]:
help(range)

Help on class range in module builtins:

class range(object)
 |  range(stop) -> range object
 |  range(start, stop[, step]) -> range object
 |  
 |  Return an object that produces a sequence of integers from start (inclusive)
 |  to stop (exclusive) by step.  range(i, j) produces i, i+1, i+2, ..., j-1.
 |  start defaults to 0, and stop is omitted!  range(4) produces 0, 1, 2, 3.
 |  These are exactly the valid indices for a list of 4 elements.
 |  When step is given, it specifies the increment (or decrement).
 |  
 |  Methods defined here:
 |  
 |  __bool__(self, /)
 |      self != 0
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(self, key, /)
 |      Return self[key].
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __hash__(self, /)
 |

The `help` function can be passed any Python object, be that a function, a piece of data, or a data structure, and it will try to give you more semantic information about the object. Part of writing good code is making sure it is well documented. As we'll see in a later lecture, it is possible for the programmer to provide this documentation so that the `help` function can make use of it.
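As a small preview, and assuming the documentation is supplied as a docstring (a string literal placed first in the function body), a sketch might look like this. The function `double` is a hypothetical example.

```python
def double(value):
    """Return twice the given value."""
    return value * 2

# The docstring is stored on the function object itself.
documentation = double.__doc__
print(documentation)

# Calling help(double) would display the signature followed by this text.
```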

Now that we have a reasonable definition of the range function we can write our own!

In [35]:
def range_function(start, stop, step):
    iterator = start
    while iterator < stop:
        yield iterator
        iterator += step

for i in range_function(0,10,1):
    print(i)

0
1
2
3
4
5
6
7
8
9


There is really only one new piece here: the `yield` keyword. It acts like a return statement, except that the function remembers where it left off, so each time the loop asks for the next value, execution resumes right after the `yield`. This allows for something rather surprising: we can choose not to specify a final stopping point for our iteration. Of course, a loop over such a function never finishes on its own, so we still have to decide when to stop asking for values. But now we can go as high as we like without having to set a ceiling in advance.
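Here is a sketch of iteration with no stopping point built in. The generator `count_up` is a hypothetical variant of `range_function` with the `stop` argument removed; `itertools.islice` from the standard library is one way to take only the first few values.

```python
from itertools import islice

def count_up(start, step):
    # No stop value: this can keep producing numbers indefinitely.
    current = start
    while True:
        yield current      # hand back a value and pause here
        current += step    # resume from this line on the next request

# Take only the first five values.
first_five = list(islice(count_up(0, 3), 5))
print(first_five)

# Or decide when to stop inside the loop itself.
below_twenty = []
for number in count_up(0, 3):
    if number >= 20:
        break              # without this, the loop would never end
    below_twenty.append(number)
print(below_twenty)
```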

## Looping of a different kind - recursion

Unlike the explicit looping we've been talking about thus far, there is a completely different kind of iteration called recursion, where function calls string together one after another, forming a sort of implicit iteration.

Let's look at a simple example:

In [5]:
def factorial(x):
    if x == 0:
        return 1
    if x == 1:
        return 1
    else:
        return x*factorial(x-1)
    
factorial(5)

120

It may not be clear how this could possibly be iteration.  So let's look at our return statements.  Notice we have 3 of them instead of one.  This comes to us from the definition of a factorial -

factorial(0) is defined as 1

factorial(1) is also defined as 1

The factorial of any other positive integer n is defined as `n*(n-1)*(n-2)*...*3*2*1`

So that's why the code looks this way, at least sort of. What's tricky at first is why we only need `x*factorial(x-1)`. Why should that lead to iteration?

The answer turns out to be fairly simple once you think it through:

We called the function inside its own definition. Since `n*(n-1)*...*2*1` is just `n` times the factorial of `n-1`, each call hands a smaller value to the next call, again and again, until the value passed in is either 0 or 1.
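To watch the chain of calls unfold, here is a sketch of the same function with print statements added. The name `factorial_traced` and the `depth` parameter are additions for illustration only; `depth` just controls the indentation of the trace.

```python
def factorial_traced(x, depth=0):
    indent = "    " * depth
    print(indent + "factorial(" + str(x) + ") called")
    if x == 0 or x == 1:
        print(indent + "returning 1")
        return 1
    # The call below must finish before this multiplication can happen.
    result = x * factorial_traced(x - 1, depth + 1)
    print(indent + "returning " + str(result))
    return result

answer = factorial_traced(4)
print("Final answer:", answer)
```

The trace goes steadily deeper until it reaches 1, then the multiplications happen on the way back out.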

We could of course write the factorial function as a for-loop as follows:

In [10]:
def factorial(x):
    if x == 0:
        return 1
    if x == 1:
        return 1
    result = x
    for i in range(x-1, 1, -1):
        result *= i
    return result

factorial(7)

5040

While recursion, once you are used to it, can appear more elegant than iteration, it is usually more costly, especially in Python. Some languages optimize recursion, but Python does not, and it also caps how deeply calls can nest. In the typical case, for or while loops are faster than recursion. They are also more memory efficient, because every function call that has not yet returned takes up memory.
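The memory cost shows up as a hard limit on how deep recursion can go. The sketch below uses `sys.getrecursionlimit()` from the standard library; `depth_reached` is a hypothetical helper, and the exact limit may differ between Python installations.

```python
import sys

def depth_reached(n):
    # Recurse n levels deep and report how deep we went.
    if n == 0:
        return 0
    return 1 + depth_reached(n - 1)

limit = sys.getrecursionlimit()  # often 1000, but this can vary
print("Recursion limit:", limit)

shallow = depth_reached(50)      # well within the limit

try:
    depth_reached(limit + 100)   # deeper than the interpreter allows
    hit_limit = False
except RecursionError:
    hit_limit = True

print("Reached depth", shallow, "and hit the limit:", hit_limit)
```

A for loop has no such ceiling, because it does not stack up unfinished function calls.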

Let's do a quick timing test to see how much faster the for loop version is than the recursive version.

In [14]:
import time


def factorial_recursive(x):
    if x == 0:
        return 1
    if x == 1:
        return 1
    else:
        return x*factorial(x-1)

    
def factorial_for_loop(x):
    if x == 0:
        return 1
    if x == 1:
        return 1
    result = x
    for i in range(x-1, 1, -1):
        result *= i
    return result

for i in range(10000, 10050):
    start_recursive = time.time()
    factorial_recursive(i)
    running_time_recurse = time.time() - start_recursive
    start_for_loop = time.time()
    factorial_for_loop(i)
    running_time_for_loop = time.time() - start_for_loop
    print("For i=", i)
    print("the running time for recursive=", running_time_recurse)
    print("the running time for for-loop=", running_time_for_loop)

For i= 10000
the running time for recursive= 0.023797988891601562
the running time for for-loop= 0.02331686019897461
For i= 10001
the running time for recursive= 0.023462295532226562
the running time for for-loop= 0.02345871925354004
For i= 10002
the running time for recursive= 0.023214340209960938
the running time for for-loop= 0.02341318130493164
For i= 10003
the running time for recursive= 0.023720502853393555
the running time for for-loop= 0.023316144943237305
For i= 10004
the running time for recursive= 0.023775577545166016
the running time for for-loop= 0.02316594123840332
For i= 10005
the running time for recursive= 0.023030519485473633
the running time for for-loop= 0.023274660110473633
For i= 10006
the running time for recursive= 0.026077985763549805
the running time for for-loop= 0.023395299911499023
For i= 10007
the running time for recursive= 0.023053407669067383
the running time for for-loop= 0.023135900497436523
For i= 10008
the running time for recursive= 0.0230138301849

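Interestingly, the two versions look about equally fast here. Part of the reason is most likely that `factorial_recursive`, as written, calls `factorial`, which at this point is the for-loop version defined earlier, so only its outermost call does any extra work. A fully recursive version would also exceed Python's default recursion limit for inputs this large. Let's see whether the picture changes when a function makes two recursive calls per function call.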

In [30]:
def fib(x):
    if x == 0:
        return 1
    if x == 1:
        return 1
    else:
        return fib(x-1) + fib(x-2)

    
def fib_two(x):
    fib_seq = [1, 1]
    if x <= 1:
        return 1  # fib(0) and fib(1) are both 1, matching fib above
    else:
        for i in range(2, x+1):
            fib_seq.append(fib_seq[i-1] + fib_seq[i-2])
    return fib_seq[x]


for i in range(10, 40):
    start_recursive = time.time()
    fib(i)
    running_time_recurse = time.time() - start_recursive
    start_for_loop = time.time()
    fib_two(i)
    running_time_for_loop = time.time() - start_for_loop
    print("For i=", i)
    print("the running time for recursive=", running_time_recurse)
    print("the running time for for-loop=", running_time_for_loop)

For i= 10
the running time for recursive= 2.2172927856445312e-05
the running time for for-loop= 2.6226043701171875e-06
For i= 11
the running time for recursive= 3.457069396972656e-05
the running time for for-loop= 3.5762786865234375e-06
For i= 12
the running time for recursive= 5.2928924560546875e-05
the running time for for-loop= 2.86102294921875e-06
For i= 13
the running time for recursive= 8.440017700195312e-05
the running time for for-loop= 2.86102294921875e-06
For i= 14
the running time for recursive= 0.00016999244689941406
the running time for for-loop= 3.814697265625e-06
For i= 15
the running time for recursive= 0.0002167224884033203
the running time for for-loop= 2.86102294921875e-06
For i= 16
the running time for recursive= 0.0003476142883300781
the running time for for-loop= 4.291534423828125e-06
For i= 17
the running time for recursive= 0.0005626678466796875
the running time for for-loop= 3.337860107421875e-06
For i= 18
the running time for recursive= 0.0009059906005859375
t

As you can see, when multiple recursive calls need to be made per function call, the iterative version far exceeds the recursive version in performance.
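One way to see where the time goes is to count the function calls. The sketch below adds a counter to the recursive version; `fib_counted` and `call_count` are names introduced here, and `global` simply lets the function update a variable defined outside it.

```python
call_count = 0

def fib_counted(x):
    global call_count   # update the counter defined outside the function
    call_count += 1
    if x == 0 or x == 1:
        return 1
    return fib_counted(x - 1) + fib_counted(x - 2)

result = fib_counted(20)
print("fib(20) =", result)
print("function calls made:", call_count)
```

The recursive version recomputes the same values over and over, so the number of calls roughly doubles with each step, whereas the loop in `fib_two` runs only 19 times for the same input.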

## List comprehensions

The last looping technique we will look at today is really just syntactic sugar. That means it looks pretty but doesn't do anything new. Let's see an example!

In [41]:
list_comp = [elem for elem in range(100)]
print(list_comp)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]


The general syntax for a list comprehension is:

`[[ITERATOR VALUE] for [ITERATOR VALUE] in [ITERABLE OBJECT]]`

Here the `[ITERATOR VALUE]` is the value being iterated over. The `[ITERABLE OBJECT]` is any object that can be iterated over. Two examples of iterables are the lists we saw above and the result of the `range` function.

In [42]:
list_for = []
for elem in range(100):
    list_for.append(elem)
print(list_for)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]


In [43]:
list_for == list_comp

True

As you can see the two pieces of code are equivalent.  The only real difference is that we can accomplish what typically takes 3 lines in 1 line.

List comprehensions are really great for expressing a single idea in a loop. If the loop brings several ideas together, it's probably best to use a traditional for loop. The more complex the code inside the for loop, the harder it is to express as a list comprehension.
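A single idea, such as squaring each element, fits comfortably in one line, because the first slot of a comprehension can hold any expression built from the loop variable. A minimal sketch, with illustrative names:

```python
# One idea per element: square it.
squares = [elem * elem for elem in range(5)]

# The same thing as a traditional for loop.
squares_loop = []
for elem in range(5):
    squares_loop.append(elem * elem)

print(squares)
print(squares == squares_loop)
```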

That said, there are some extra things you can do with list comprehensions to make them a bit more flexible.

In [45]:
evens = [elem for elem in range(101) if elem % 2 == 0]
print(evens)

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100]


As you can see, the evens list has all the even numbers between 0 and 100 inclusive.  The reason only the evens are stored is because of the code at the end - `if elem % 2 == 0`.

This code checks whether the element is divisible by 2 with no remainder, which is basically the definition of an even number. In general, adding an if clause of this kind to a list comprehension acts as a filter, restricting our resultant list to those elements for which the boolean condition is true.
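For comparison, the filter can be written out as a traditional for loop. The name `evens_loop` is introduced here for illustration.

```python
# The filtering comprehension...
evens = [elem for elem in range(101) if elem % 2 == 0]

# ...and the equivalent for loop with an if statement inside it.
evens_loop = []
for elem in range(101):
    if elem % 2 == 0:            # the filter condition
        evens_loop.append(elem)  # only reached when the condition is true

print(evens == evens_loop)
```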

In [50]:
evens_and_sevens = [elem if elem % 2 == 0 else elem % 7 
                    for elem in range(101)]
print(evens_and_sevens)

[0, 1, 2, 3, 4, 5, 6, 0, 8, 2, 10, 4, 12, 6, 14, 1, 16, 3, 18, 5, 20, 0, 22, 2, 24, 4, 26, 6, 28, 1, 30, 3, 32, 5, 34, 0, 36, 2, 38, 4, 40, 6, 42, 1, 44, 3, 46, 5, 48, 0, 50, 2, 52, 4, 54, 6, 56, 1, 58, 3, 60, 5, 62, 0, 64, 2, 66, 4, 68, 6, 70, 1, 72, 3, 74, 5, 76, 0, 78, 2, 80, 4, 82, 6, 84, 1, 86, 3, 88, 5, 90, 0, 92, 2, 94, 4, 96, 6, 98, 1, 100]


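If you want to change elements conditionally rather than filter them out, you'll need to put the if/else expression in front of the for clause. Every element is kept, and the condition only decides which value is stored for it. This is usually best accompanied by a line split, as shown above.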
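The two placements of `if` do different jobs and can be combined. A small sketch, with made-up names and small ranges so the results are easy to check by eye:

```python
# if/else in front: every element is kept, but the stored value depends
# on the condition.
labels = ["even" if elem % 2 == 0 else "odd" for elem in range(5)]

# if at the end: some elements are dropped, none are changed.
odds = [elem for elem in range(5) if elem % 2 == 1]

# Both together: drop multiples of 3, then label what remains.
combined = ["even" if elem % 2 == 0 else "odd"
            for elem in range(7) if elem % 3 != 0]

print(labels)
print(odds)
print(combined)
```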