# Loops and avoiding loops

There are several ways to loop, explicitly or implicitly.

For numerical problems, the fastest way is usually to avoid an explicit loop by using a **NumPy array**. We'll look at this last.

For general problems, the **`for` loop** is probably the best way to go. If the contents of the loop are fairly simple, then a **list comprehension** might be slightly better. Occasionally, it might make more sense to use a **`while` loop**.

If you don't actually need to manifest the new object in memory (for example, because you're only going to loop over it), then a **generator expression** or **`map`** is a good choice, because neither performs the operation until it is required to yield the next item. This is called 'lazy evaluation'.

----

Let's solve the same problem &mdash; collecting the squares of the numbers 0, 10, 20... 100 &mdash; a few different ways. 

We could use something like `range(0, 101, 10)` to generate the list of input numbers on the fly, but to keep the amount of code I need to explain down to a minimum, I'll just define the input variable `numbers` literally:

In [56]:
numbers = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
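For reference, the `range` approach mentioned above might look like the sketch below. The name `generated` is only illustrative and is not used elsewhere in this notebook.

```python
numbers = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# range(start, stop, step) excludes the stop value,
# so we need 101 (not 100) to include 100 itself.
generated = list(range(0, 101, 10))

print(generated == numbers)  # True
```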

## `while`

In [57]:
squares = []
i = 0

while i < len(numbers):
    n = numbers[i]
    squares.append(n**2)
    i += 1

squares

[0, 100, 400, 900, 1600, 2500, 3600, 4900, 6400, 8100, 10000]

In practice, it's relatively rare to see `while` loops in Python, and you almost never see counters used in this way (to index into a sequence). The `while` loop is sometimes useful, though, for example when you are iterating some process (repeating it until a condition is met) rather than iterating over an array or collection of items. For iterating over a collection, a `for` loop is usually the better way to go.
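As a small sketch of the kind of task where `while` fits naturally, here we keep halving a value until it drops below 1 and count the steps. We don't know in advance how many iterations are needed, so there is no collection to loop over. The names `value` and `steps` are just illustrative.

```python
value = 100.0
steps = 0

# Repeat the process until the condition is no longer met.
while value >= 1.0:
    value /= 2
    steps += 1

print(steps, value)  # 7 0.78125
```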

## `for`

In [58]:
squares = []
for n in numbers:
    squares.append(n**2)
    
squares

[0, 100, 400, 900, 1600, 2500, 3600, 4900, 6400, 8100, 10000]
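If you do need the position of each item as well as its value, the built-in `enumerate` provides it without a manual counter. This is a minimal sketch; `indexed_squares` is a made-up name for illustration.

```python
numbers = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

indexed_squares = []
for i, n in enumerate(numbers):
    # i is the position in the list, n is the item at that position.
    indexed_squares.append((i, n**2))

print(indexed_squares[:3])  # [(0, 0), (1, 100), (2, 400)]
```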

### Exercise

- Use a `for` loop and string methods to process this string into a list of rock names.

In [59]:
rocks = '# Sandstone: 2300 m/s\n# Limestone: 3500 m/s\n# Shale: 2450 m/s\n# Salt: 4500 m/s'
print(rocks)

# Sandstone: 2300 m/s
# Limestone: 3500 m/s
# Shale: 2450 m/s
# Salt: 4500 m/s


In [60]:
# Your code here.

In [61]:
names = []
for rock in rocks.split('\n'):
    name = rock[:rock.find(':')]
    names.append(name.strip('# '))
    
names

['Sandstone', 'Limestone', 'Shale', 'Salt']

In fact, we often use [regular expressions](https://docs.python.org/3/library/re.html) for tasks like this. 

In [62]:
import re

# Capture the word between '# ' and ':' on every line of the string.
names = re.findall(r'# (\w+):', rocks)
names

['Sandstone', 'Limestone', 'Shale', 'Salt']
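With more than one capture group, `re.findall` returns a list of tuples, which makes it easy to pull out the velocities too. The sketch below assumes every line follows the exact `# Name: 1234 m/s` pattern shown above; the name `velocities` is illustrative.

```python
import re

rocks = '# Sandstone: 2300 m/s\n# Limestone: 3500 m/s\n# Shale: 2450 m/s\n# Salt: 4500 m/s'

# Two groups: the rock name and the integer velocity.
pairs = re.findall(r'# (\w+): (\d+) m/s', rocks)

# The captured values are strings, so convert the velocity to int.
velocities = {name: int(v) for name, v in pairs}

print(velocities)
# {'Sandstone': 2300, 'Limestone': 3500, 'Shale': 2450, 'Salt': 4500}
```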

## List comprehension

A list comprehension is a compact way to write a `for` loop that builds a list. It's also usually a little faster than the equivalent loop with `append`.

In [64]:
squares = [n**2 for n in numbers]

squares

[0, 100, 400, 900, 1600, 2500, 3600, 4900, 6400, 8100, 10000]
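One way to check the speed claim on your own machine is the standard library's `timeit` module. This is only a rough sketch: the helper functions `with_loop` and `with_comprehension` are made up for the comparison, and the timings depend on your hardware and Python version, so no expected numbers are shown.

```python
import timeit

numbers = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

def with_loop():
    squares = []
    for n in numbers:
        squares.append(n**2)
    return squares

def with_comprehension():
    return [n**2 for n in numbers]

# Run each version many times and report the total elapsed time.
loop_time = timeit.timeit(with_loop, number=100_000)
comp_time = timeit.timeit(with_comprehension, number=100_000)

print(f"for loop:           {loop_time:.3f} s")
print(f"list comprehension: {comp_time:.3f} s")
```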

Sometimes, we don't actually need to instantiate the entire list, or, if it is very large, we might not be able to without running out of memory. If we're going to pass this new object on to some other loop or function, we might be able to use a **generator expression** instead:

In [92]:
(n**2 for n in numbers)

<generator object <genexpr> at 0x7f64b4d69138>

If we use this in the next step of our process, the new list of squares never actually gets instantiated, so we avoid taking up a chunk of memory.

In [93]:
for item in (n**2 for n in numbers):
    print("Result:", item)

Result: 0
Result: 100
Result: 400
Result: 900
Result: 1600
Result: 2500
Result: 3600
Result: 4900
Result: 6400
Result: 8100
Result: 10000
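Generator expressions can also be passed straight to functions that consume an iterable, such as `sum`. The sketch below also uses `next` to show that items are only computed when asked for; the names `total`, `gen`, `first` and `second` are illustrative.

```python
numbers = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# When a generator expression is the only argument,
# the extra pair of parentheses can be dropped.
total = sum(n**2 for n in numbers)
print(total)  # 38500

# Nothing is computed until we ask for the next item.
gen = (n**2 for n in numbers)
first = next(gen)
second = next(gen)
print(first, second)  # 0 100
```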


### Exercise

- Use a list comprehension to perform the string processing task in the previous exercise.

In [65]:
# Your code here.

In [66]:
[re.sub(r'# ([a-zA-Z]+).*', r'\1', rock) for rock in rocks.split('\n')]

['Sandstone', 'Limestone', 'Shale', 'Salt']

## `map`

We often perform tasks on sequences by writing a function that takes a list as an argument. We pass our list to the function and it gets processed, and we get some result back.

You can think of `map` as 'sending' a function to an object, rather than the other way around. This approach is an example of what is known as 'functional programming'. 

We start by defining a function to map to the object. The function should take a single argument, which represents the element of the sequence at each step of the iteration:

In [67]:
def square(x):
    return x**2

In [68]:
squares = map(square, numbers)

squares

<map at 0x7f64b4d42be0>

Notice that this doesn't give us a list; we get a `map` object. We can loop over this without doing anything to it, but if we just want a list, like we got from our previous methods, we'll have to convert it with `list()`.

In [30]:
list(squares)

[0, 100, 400, 900, 1600, 2500, 3600, 4900, 6400, 8100, 10000]

There is an alternative to this. Since Python 3.5 we are also allowed to use unpacking with a strange-looking comma, to get a tuple, but I think this barely looks like Python code:

In [38]:
*map(square, range(0, 101, 10)),

(0, 100, 400, 900, 1600, 2500, 3600, 4900, 6400, 8100, 10000)

### A note about partials

Notice that we don't get to send any extra parameters to the function in `map`; it essentially just sends each element of the collection, one at a time. It is possible to use `map` with functions that take multiple arguments by using something called ["partial application"](https://en.wikipedia.org/wiki/Partial_application).

Suppose that instead of a function `square()`, we happen to have a more general function `power()`, that also takes the power to which we want to raise a number:

In [69]:
def power(x, y):
    return x ** y

power(30, 2)

900

We'd like to use this in our `map` with `y = 2` in every case. Naively, we might try this:

In [74]:
list(map(power(y=2), numbers))

TypeError: power() missing 1 required positional argument: 'x'

To satisfy `map`, we need to create a new function that only requires `x` as input. We can do this in a couple of ways &mdash; by wrapping our function in another function, or by the use of Python's `functools` library.

First, let's wrap our function. Notice that this new function just returns the result of sending its input to `power` with `y = 2`.

In [75]:
def square(x):
    return power(x, 2)

list(map(square, numbers))

[0, 100, 400, 900, 1600, 2500, 3600, 4900, 6400, 8100, 10000]

Alternatively, we can use the `partial()` function from the `functools` library to do the same job. It takes `power` as the first argument, and the arguments of `power()` that we want to fix as subsequent arguments (by default, it takes them in order, so we have to specify that we're setting `y`).

In [76]:
from functools import partial

square = partial(power, y=2)

list(map(square, numbers))

[0, 100, 400, 900, 1600, 2500, 3600, 4900, 6400, 8100, 10000]
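A third option, not used elsewhere in this notebook, is that `map` accepts more than one iterable and passes one item from each to the function on every step. The sketch below uses `itertools.repeat` to supply `y = 2` each time; `map` stops as soon as the shortest iterable (here `numbers`) runs out.

```python
from itertools import repeat

def power(x, y):
    return x ** y

numbers = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# Each call receives one item from numbers as x and a 2 from repeat(2) as y.
squares = list(map(power, numbers, repeat(2)))

print(squares)
# [0, 100, 400, 900, 1600, 2500, 3600, 4900, 6400, 8100, 10000]
```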

### A note about `lambda`

Notice that in the previous sections, we define the function `square()`, and immediately use it. It may turn out that we don't need to use it again &mdash; it was a one-time need.

In practice, we often don't define a function in this kind of situation. If we don't want to use it again, and this is the only thing we needed it for, then we often make a `lambda` instead. 

Lambdas are just unnamed functions. So our `square()` function is equivalent to this:

In [81]:
lambda x: x**2

<function __main__.<lambda>(x)>

We can use a `lambda` anywhere we would normally use a function. Here's how we might define the `map` with a lambda:

In [82]:
squares = map(lambda x: x**2, numbers)

list(squares)

[0, 100, 400, 900, 1600, 2500, 3600, 4900, 6400, 8100, 10000]

By the way, here's a gotcha: we can only cast it once. The map is an iterator, and once it has iterated it is 'used up' so to speak.

In [83]:
list(squares)

[]
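The same thing happens with any function that consumes the iterator, not just `list`. If you need the results more than once, a simple fix is to store them in a list first. A minimal sketch, with `lazy` and `stored` as illustrative names:

```python
numbers = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

lazy = map(lambda x: x**2, numbers)
first_sum = sum(lazy)   # consumes every item
second_sum = sum(lazy)  # nothing left to consume
print(first_sum, second_sum)  # 38500 0

# Materialize the results once, then reuse them as often as needed.
stored = list(map(lambda x: x**2, numbers))
print(sum(stored), sum(stored))  # 38500 38500
```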

### Exercise

- Perform the string processing task as a `map`.

In [84]:
# Make a function to process each item. Call it 'process'.
# Your code here.

rock_list = rocks.split('\n')
list(map(process, rock_list))

['Sandstone', 'Limestone', 'Shale', 'Salt']

This should give:

    ['Sandstone', 'Limestone', 'Shale', 'Salt']

In [85]:
def process(rock):
    return rock[:rock.find(':')].strip('# ')

list(map(process, rock_list))

['Sandstone', 'Limestone', 'Shale', 'Salt']

## `numpy` (and `pandas`)

We can also avoid the loop by using NumPy's n-dimensional array or 'ndarray'. Many mathematical operations involving arrays are automatically 'vectorized', meaning that the operation is carried out either elementwise or 'broadcast' over one or more dimensions. We can write very compact code, and it's almost always much faster than trying to do math on Python's native objects.

Pandas `Series` objects (columns) are built on NumPy arrays, so this works on them too.

In [88]:
import numpy as np

numbers = np.array(numbers)

numbers

array([  0,  10,  20,  30,  40,  50,  60,  70,  80,  90, 100])

Now we can do math directly on this array object:

In [89]:
numbers**2

array([    0,   100,   400,   900,  1600,  2500,  3600,  4900,  6400,
        8100, 10000])
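To make 'elementwise' and 'broadcast' a little more concrete, here is a small sketch using a shorter array. The names `small`, `shifted`, `summed`, `column` and `table` are made up for this illustration.

```python
import numpy as np

small = np.array([0, 10, 20, 30])

# Broadcasting a scalar: the 5 is applied to every element.
shifted = small + 5
print(shifted)  # [ 5 15 25 35]

# Elementwise: arrays of the same shape are combined item by item.
summed = small + shifted
print(summed)  # [ 5 25 45 65]

# Broadcasting across dimensions: a (3, 1) column times a (4,) row
# gives a (3, 4) table of every pairwise product.
column = np.array([[1], [2], [3]])
table = column * small
print(table.shape)  # (3, 4)
```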

So our problem becomes very easy:

In [91]:
squares = numbers**2

That's it! 

You can probably see why for numerical problems &mdash; iterating over a list of numbers and transforming them into other numbers &mdash; NumPy arrays are almost always the way to go. 

Normally we'd stop there, since having an array is just as good as (or better than!) having a list. That is, we can use it in many of the places we can use a list. But to have strictly the same result as the other blocks here, we need to cast to a list:

In [87]:
list(squares)

[0, 100, 400, 900, 1600, 2500, 3600, 4900, 6400, 8100, 10000]
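One subtlety: `list()` on an array produces a list whose elements are still NumPy scalars, whereas the array's own `tolist()` method converts them to native Python numbers. Depending on your NumPy version, the two lists may also print differently. A short sketch, using a smaller array and illustrative names:

```python
import numpy as np

squares = np.array([0, 10, 20]) ** 2

as_list = list(squares)       # elements are NumPy integer scalars
as_native = squares.tolist()  # elements are plain Python ints

print(isinstance(as_list[0], np.integer))  # True
print(type(as_native[0]))                  # <class 'int'>
print(as_native)                           # [0, 100, 400]
```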