## Comprehensions

Python provides syntax for list comprehensions, dictionary comprehensions, and set comprehensions.

List comprehensions are by far the most common, so let's start with those.


#### List comprehensions

A list comprehension is a concise way to create a list.

Before their introduction, the most common way to create a list was to use a 'for-loop'.

```
>>> my_list = []
>>> for _ in range(10):  # '_' is normally a 'throw-away' name, but it is a valid variable
...     my_list.append(_)
...
>>> my_list
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The syntax for a list comprehension is as follows (the part in parentheses is optional):

```
my_list = [expression for member in iterable (if condition)]
```

So the above example can be written as -

```
>>> my_list = [_ for _ in range(10)]
>>> my_list
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

In this example, 'expression' is the member itself. However, you can use any valid Python expression.

```
>>> my_list = [_**2 for _ in range(10)]
>>> my_list
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

```
>>> my_list = [str(_)*2 for _ in range(10)]
>>> my_list
['00', '11', '22', '33', '44', '55', '66', '77', '88', '99']
```

You can add a condition to a list comprehension.

```
>>> my_list = [_ for _ in range(20) if not _%3]
>>> my_list
[0, 3, 6, 9, 12, 15, 18]
```

```
>>> my_list = [_ for _ in range(100) if not _%2 and not _%5]
>>> my_list
[0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
```
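The trailing 'if' filters members out. If you want to keep every member but choose between two values, the test goes in the 'expression' part instead, as a conditional expression ('x if test else y'), and then the 'else' is required. A short sketch of the difference:

```python
# Filtering: the trailing 'if' drops members that fail the test
evens = [n for n in range(10) if n % 2 == 0]

# Choosing a value: 'x if test else y' is part of the expression, so nothing is dropped
labels = ['even' if n % 2 == 0 else 'odd' for n in range(5)]

print(evens)   # [0, 2, 4, 6, 8]
print(labels)  # ['even', 'odd', 'even', 'odd', 'even']
```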


You can nest loops inside a list comprehension by writing more than one 'for' clause. A typical use is to 'flatten' a 2-dimensional list.

If you need this, it can help to code it in 'for-loop' form first.

```
>>> matrix = [(12, 20, 34), (45, 65, 43), (26, 25, 62)]

>>> new_list = []
>>> for row in matrix:
...   for element in row:
...     new_list.append(element)
...
>>> new_list
[12, 20, 34, 45, 65, 43, 26, 25, 62]
```

This is how you would do it using a list comprehension.

```
>>> new_list = [element for row in matrix for element in row]
>>> new_list
[12, 20, 34, 45, 65, 43, 26, 25, 62]
```

The important point is to write the 'for' clauses in exactly the same order as the loops in the 'for-loop' form.
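A comprehension written *inside* another comprehension behaves differently: the inner comprehension builds a new list each time, so the result stays 2-dimensional. As an illustrative sketch, here the same matrix is transposed (rows become columns) and compared with the flattened version:

```python
matrix = [(12, 20, 34), (45, 65, 43), (26, 25, 62)]

# Two 'for' clauses in one comprehension: one flat list
flat = [element for row in matrix for element in row]

# A comprehension inside a comprehension: one inner list per column index
transposed = [[row[i] for row in matrix] for i in range(len(matrix[0]))]

print(flat)        # [12, 20, 34, 45, 65, 43, 26, 25, 62]
print(transposed)  # [[12, 45, 26], [20, 65, 25], [34, 43, 62]]
```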

#### Dictionary comprehensions

These are not used as often as list comprehensions, but can be very useful in the right circumstances.

The syntax for a dictionary comprehension is

```
my_dict = {key: value for element in iterable (if condition)}
```

Assume you want to create a dictionary from the even numbers between 1 and 10, where the key is the number and the value is the number squared.

```
>>> my_dict = {n: n**2 for n in range(1, 11) if not n%2}
>>> my_dict
{2: 4, 4: 16, 6: 36, 8: 64, 10: 100}
```


As a practical example, assume you are reading a CSV file, where the first row contains column headings and the following rows contain the data.

```
col_head = ['acno', 'name', 'town']

rows = [
    ['A001', 'ABC Ltd', 'Jhb'],
    ['B001', 'BCD Ltd', 'Pta'],
    ['C001', 'CDE Ltd', 'Dbn'],
]
```

Now assume you want to process each row as a dictionary.

```
>>> for row in rows:
...     print({k: v for k, v in zip(col_head, row)})
...
{'acno': 'A001', 'name': 'ABC Ltd', 'town': 'Jhb'}
{'acno': 'B001', 'name': 'BCD Ltd', 'town': 'Pta'}
{'acno': 'C001', 'name': 'CDE Ltd', 'town': 'Dbn'}
```
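Because each key/value pair comes straight from zip(), the comprehension above is equivalent to dict(zip(col_head, row)). For a real file, the standard library's csv.DictReader builds these dictionaries for you, taking the keys from the first row. A minimal sketch, with io.StringIO standing in for an open file:

```python
import csv
import io

# io.StringIO stands in for a real file opened with open(path, newline='')
data = io.StringIO(
    "acno,name,town\n"
    "A001,ABC Ltd,Jhb\n"
    "B001,BCD Ltd,Pta\n"
    "C001,CDE Ltd,Dbn\n"
)

# DictReader uses the first row as keys and yields one mapping per data row;
# dict() makes each one a plain dictionary on every Python 3 version
records = [dict(row) for row in csv.DictReader(data)]

for record in records:
    print(record)
```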


Here is another example using pandas. This arose from a question on RocketChat.

Let's say you have a DataFrame with a number of columns, and you want to calculate and store the mean and standard deviation of each column, keyed by column name.

```
import statistics  # df is assumed to be an existing pandas DataFrame

col_means = {col_name: df[col_name].mean() for col_name in df.columns}
col_stds = {col_name: statistics.stdev(df[col_name]) for col_name in df.columns}
```
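The same pattern works without pandas. As a sketch, assuming the data is held in a plain dictionary that maps each column name to a list of numbers (a hypothetical layout, not the dataframe from the original question):

```python
import statistics

# Hypothetical data: column name -> list of values
columns = {
    'a': [1, 2, 3],
    'b': [2, 4, 6],
}

col_means = {name: statistics.mean(values) for name, values in columns.items()}
col_stds = {name: statistics.stdev(values) for name, values in columns.items()}

print(col_means)  # {'a': 2, 'b': 4}
print(col_stds)   # {'a': 1.0, 'b': 2.0}
```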


#### Set comprehensions

A set is similar to a list, except that it is unordered and will not store duplicate values. If you try to add a value that already exists, it will be ignored. Instead of square brackets, it is enclosed in braces (curly brackets).

The syntax for a set comprehension is

```
my_set = {expression for member in iterable (if condition)}
```


Here is a simplified example of a real-world use case. Assume a list of 2-part tuples, where the first element represents a group code. We want to know which distinct group codes there are.

```
>>> values = [(1, 23), (1, 35), (1, 13), (2, 33), (2, 23), (2, 45), (3, 76), (3, 54), (3, 24)]
>>> {group for group, _ in values}
{1, 2, 3}
```
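To get the number of distinct codes, take len() of the set. One related trap: an empty pair of braces creates a dictionary, not a set, so an empty set has to be written as set(). A short sketch using the same values:

```python
values = [(1, 23), (1, 35), (1, 13), (2, 33), (2, 23), (2, 45), (3, 76), (3, 54), (3, 24)]

groups = {group for group, _ in values}  # duplicates are discarded automatically
print(len(groups))  # 3

# Careful: empty braces create a dict, so an empty set must be written set()
print(type({}))     # <class 'dict'>
print(type(set()))  # <class 'set'>
```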


## Iterators

You may have heard the terms 'iterator' and 'iterable'. What do they mean, and what is the difference?

In simple terms, an 'iterable' is something that can be iterated (or 'looped') over. Typically it is a 'container' for a number of objects that allows a caller to access the objects one at a time.

If you loop over a list, a tuple, or a set, you get its elements. If you loop over a dictionary, you get its keys. If you loop over a string, you get the characters in the string. They are all 'iterable'.

An 'iterator' is what is used to actually return the objects. When you iterate over a container, Python internally calls the \_\_iter\_\_() method of the object, which returns an iterator.

An iterator must have a method called \_\_next\_\_() (it also has an \_\_iter\_\_() method that returns the iterator itself). The first time \_\_next\_\_() is called it returns the first object in the container. It keeps track of the last one returned, so on each subsequent call it returns the next object. Once the last object has been returned, the next time you call \_\_next\_\_() it will raise StopIteration.
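To make the protocol concrete, here is a minimal hand-written iterator. The class name Countdown is purely illustrative. Its \_\_next\_\_() returns one value per call and raises StopIteration at the end, and its \_\_iter\_\_() simply returns self, which is what lets a for-loop use it directly:

```python
class Countdown:
    """Illustrative iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__()
        return self

    def __next__(self):
        if self.current <= 0:
            # No more items: signal the end of the sequence
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


for n in Countdown(3):
    print(n)  # prints 3, then 2, then 1
```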

You rarely have to call \_\_next\_\_() in your own code. If you iterate over the object using a 'for-loop' or a list comprehension, Python handles calling \_\_iter\_\_() and \_\_next\_\_(), and catching StopIteration, without you having to worry about the details.

If you do need to manage it yourself, Python has two built-in functions to make it easier.

iter(object) returns an iterator for the object.

next(iterator) requests the iterator to return the next item.

```
>>> my_list = ['A001', 'B001', 'C001']
>>> my_iter = iter(my_list)
>>> next(my_iter)
'A001'
>>> next(my_iter)
'B001'
>>> next(my_iter)
'C001'
>>> next(my_iter)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
```
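Putting iter() and next() together, a for-loop behaves roughly like the following while-loop. This is a simplified sketch of the idea, not CPython's actual implementation:

```python
my_list = ['A001', 'B001', 'C001']
received = []

# Roughly what 'for item in my_list: received.append(item)' does
my_iter = iter(my_list)
while True:
    try:
        item = next(my_iter)
    except StopIteration:
        break  # the iterator is exhausted, so the loop ends
    received.append(item)

print(received)  # ['A001', 'B001', 'C001']
```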


An iterator can only start at the beginning and finish at the end. It cannot go backwards, skip items, or find a particular item. Once it has reached the end, it cannot be restarted. If you do want to re-run the iteration, you have to request a fresh iterator.
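A short sketch of that behaviour: once an iterator is exhausted it stays exhausted, while calling iter() on the original list gives a fresh iterator that starts from the beginning. It also shows that next() accepts an optional default, which is returned instead of raising StopIteration:

```python
my_list = ['A001', 'B001', 'C001']

my_iter = iter(my_list)
first_pass = list(my_iter)   # consumes every item
second_pass = list(my_iter)  # already exhausted, so nothing is left

fresh_pass = list(iter(my_list))  # a new iterator starts from the beginning

print(first_pass)   # ['A001', 'B001', 'C001']
print(second_pass)  # []
print(fresh_pass)   # ['A001', 'B001', 'C001']

# With a default, next() returns it instead of raising StopIteration
print(next(my_iter, None))  # None
```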


## Generators

A generator **is** an iterator. It returns a sequence of values, one at a time, on receiving a call to its \_\_next\_\_() method.

In an iterator over a container such as a list, the sequence of values already exists, and the iterator simply returns each one in turn.

In a generator, you can use the full power of Python to 'generate' each value in the sequence.

Generators come in two forms - generator expressions and generator functions.

#### Generator expression

A generator expression looks almost the same as a list comprehension -

```
my_gen = (expression for member in iterable (if condition))
```

The only syntactic difference is that it is enclosed in round brackets, not square brackets.

As you have seen in the explanation of list comprehensions, using a very simple syntax you can achieve a lot.
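A small sketch of how a generator expression is typically consumed. Nothing is computed until something asks for values, and when the generator expression is the only argument to a function such as sum(), the extra round brackets can be dropped. No list of squares is ever built:

```python
# Generator expression: nothing is computed until something asks for values
squares = (n * n for n in range(10))
print(squares)  # <generator object <genexpr> at 0x...>

total = sum(squares)  # sum() pulls the values one at a time
print(total)          # 285

# As the sole argument to a call, the extra brackets can be omitted
print(sum(n * n for n in range(10)))  # 285

# The generator is now exhausted, like any other iterator
print(sum(squares))   # 0
```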

#### Generator function

If the logic required to generate the value cannot be handled in a generator expression, you can write a generator function.

A generator function looks almost identical to a regular function. The only difference is that, somewhere in the function, the keyword 'yield' appears.

When first introduced, some people felt that generator functions were sufficiently different from regular functions to merit their own keyword, such as using 'gen' instead of 'def'. For various reasons this was not accepted. If you are interested, read the PEP at https://www.python.org/dev/peps/pep-0255/


A generator function looks something like this -

```
def my_gen(*args, **kwargs):
    # perform any setup, such as opening a file or connecting to a database
    while True:  # there will always be some kind of looping mechanism
        if condition:  # there may be some kind of test
            yield value  # this is what makes it a generator function
        else:
            break  # something has to end the sequence, or it will continue forever!
    # perform any wrap-up, such as closing the database connection
```
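As a concrete, runnable sketch of that template, here is a hypothetical generator that yields non-blank lines from a file-like object and stops at the first blank line. It varies the template in one way: the loop is wrapped in try/finally. If the caller stops early, the generator is left suspended at its 'yield', so wrap-up code placed after the loop never runs; a 'finally' block does run when the generator is closed, either explicitly with close() or when it is garbage-collected. The log list is only there to make the setup and wrap-up visible, and io.StringIO stands in for a real file:

```python
import io

def non_blank_lines(f, log):
    """Hypothetical example: yield stripped lines from f until a blank line."""
    log.append('setup')  # e.g. opening a file or connecting to a database
    try:
        while True:
            text = f.readline().strip()
            if text:          # the test: a non-blank line
                yield text
            else:
                break         # a blank line (or end of input) ends the sequence
    finally:
        log.append('wrap-up')  # e.g. closing the connection; also runs on close()


log = []
print(list(non_blank_lines(io.StringIO("a\nb\n\nc\n"), log)))  # ['a', 'b']
print(log)  # ['setup', 'wrap-up']

log = []
g = non_blank_lines(io.StringIO("x\ny\n"), log)
print(next(g))  # x
g.close()       # stop early; the finally block still runs
print(log)      # ['setup', 'wrap-up']
```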

The sequence of events is different from calling a regular function.

When a generator function is called, it returns a generator. Nothing in the function is executed yet.

On the first call to \_\_next\_\_(), the function is executed line-by-line from the top, until it reaches the 'yield' statement.

At that point, it will 'yield' (return) the value, and be suspended. The function remains alive, and all variables retain their current value.

On the next call to \_\_next\_\_(), the function continues from the statement following the 'yield', and executes line-by-line until it reaches the same or another 'yield' statement (there is nothing to stop you having many 'yield' points).

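This continues until the code reaches a 'return' statement, or naturally reaches the end of the function. At that point, the generator automatically raises StopIteration. Unlike a regular function, the caller does not receive a return value in the usual way. Since Python 3.3 you may write 'return value', but the value is only attached to the StopIteration exception (as its 'value' attribute), and a for-loop ignores it.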
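A minimal sketch of what happens to a returned value (Python 3.3 and later). The function name is illustrative. Calling next() by hand lets you see the value on the StopIteration exception, whereas a for-loop or list() simply stops:

```python
def countdown_with_result(n):
    """Illustrative generator: yields n down to 1, then returns a message."""
    while n > 0:
        yield n
        n -= 1
    return 'done'  # ends the generator; 'done' is attached to StopIteration


g = countdown_with_result(2)
print(next(g))  # 2
print(next(g))  # 1
try:
    next(g)
except StopIteration as exc:
    result = exc.value
print(result)  # done

# A for-loop (or list()) just stops; the returned value is discarded
print(list(countdown_with_result(3)))  # [3, 2, 1]
```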

```
>>> def my_gen(limit):
...     print('starting execution')
...     for i in range(limit):
...         print('yielding', i)
...         yield i
...     print('wrapping up')
...

>>> g = my_gen(5)
>>> g
<generator object my_gen at 0x00000187DF3E4B30>

>>> for x in g:  # or simply 'for x in my_gen(5):'
...   print('received', x)
...
starting execution
yielding 0
received 0
yielding 1
received 1
yielding 2
received 2
yielding 3
received 3
yielding 4
received 4
wrapping up
>>>
```


#### Example

We will simulate 'filtering' a sequence by passing it through 3 filters before arriving at the sequence that we are interested in.

#### Using list comprehensions

```
>>> list_1 = [_ for _ in range(100) if not _%2]
>>> list_2 = [_ for _ in list_1 if not _%3]
>>> list_3 = [_ for _ in list_2 if not _%5]
>>> my_iter = iter(list_3)
>>> next(my_iter)
0
>>> next(my_iter)
30
>>> next(my_iter)
60
>>> next(my_iter)
90
>>> next(my_iter)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
```

#### Using generator expressions

```
>>> gen_1 = (_ for _ in range(100) if not _%2)
>>> gen_2 = (_ for _ in gen_1 if not _%3)
>>> gen_3 = (_ for _ in gen_2 if not _%5)
>>> my_iter = iter(gen_3)  # not strictly necessary, as gen_3 is an iterator
>>> next(my_iter)
0
>>> next(my_iter)
30
>>> next(my_iter)
60
>>> next(my_iter)
90
>>> next(my_iter)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
```


#### How to visualise what is happening under the hood

Using list comprehensions and using generator expressions give identical results.

But the way they achieve it is very different.

How can we see the difference?

```
def mod_2(n):
    print(f"Mod 2 of {n} = {n%2}")
    return n%2

def mod_3(n):
    print(f"Mod 3 of {n} = {n%3}")
    return n%3

def mod_5(n):
    print(f"Mod 5 of {n} = {n%5}")
    return n%5
```

These 3 functions return exactly the same results as \_%2, \_%3 and \_%5, but the print() call lets us see when each one is called.



#### Visualise list comprehensions

```
>>> list_1 = [_ for _ in range(100) if not mod_2(_)]
Mod 2 of 0 = 0
Mod 2 of 1 = 1
[snip 96 lines]
Mod 2 of 98 = 0
Mod 2 of 99 = 1

>>> list_1
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98]
>>>

>>> list_2 = [_ for _ in list_1 if not mod_3(_)]
Mod 3 of 0 = 0
Mod 3 of 2 = 2
Mod 3 of 4 = 1
[snip 44 lines]
Mod 3 of 94 = 1
Mod 3 of 96 = 0
Mod 3 of 98 = 2

>>> list_2
[0, 6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66, 72, 78, 84, 90, 96]
>>>

>>> list_3 = [_ for _ in list_2 if not mod_5(_)]
Mod 5 of 0 = 0
Mod 5 of 6 = 1
Mod 5 of 12 = 2
[snip 11 lines]
Mod 5 of 84 = 4
Mod 5 of 90 = 0
Mod 5 of 96 = 1

>>> list_3
[0, 30, 60, 90]
>>>
```


#### Visualise generator expressions

```
>>> gen_1 = (_ for _ in range(100) if not mod_2(_))
>>> gen_2 = (_ for _ in gen_1 if not mod_3(_))
>>> gen_3 = (_ for _ in gen_2 if not mod_5(_))

>>> gen_1
<generator object <genexpr> at 0x0000021544243C10>
>>> gen_2
<generator object <genexpr> at 0x0000021544243DD0>
>>> gen_3
<generator object <genexpr> at 0x0000021544243CF0>
>>>
```

As you can see, we have set up our 3 generator expressions, but nothing has happened yet.

Now let's get the first item in the sequence.

```
>>> next(gen_3)
Mod 2 of 0 = 0  # gen_1 called next, it satisfied the 'if' condition, so it yielded it
Mod 3 of 0 = 0  # gen_2 called next, it satisfied the 'if' condition, so it yielded it
Mod 5 of 0 = 0  # gen_3 called next, it satisfied the 'if' condition, so it yielded it
0
>>>
```

That one was easy, as they all returned the first item in the sequence.

Now let's get the next one -

```
>>> next(gen_3)
Mod 2 of 1 = 1  # gen_1 called next, it did not satisfy the 'if' condition, so it continued
Mod 2 of 2 = 0  # gen_1 called next, it satisfied the 'if' condition, so it yielded it
Mod 3 of 2 = 2  # gen_2 received next from gen_1, it did not satisfy the 'if' condition, so it asked gen_1 for next
Mod 2 of 3 = 1  # gen_1 called next, it did not satisfy the 'if' condition, so it continued
Mod 2 of 4 = 0  # gen_1 called next, it satisfied the 'if' condition, so it yielded it
Mod 3 of 4 = 1  # gen_2 received next from gen_1, it did not satisfy the 'if' condition, so it asked gen_1 for next
Mod 2 of 5 = 1  # gen_1 called next, it did not satisfy the 'if' condition, so it continued
Mod 2 of 6 = 0  # gen_1 called next, it satisfied the 'if' condition, so it yielded it
Mod 3 of 6 = 0  # gen_2 received next from gen_1, it satisfied the 'if' condition, so it yielded it
Mod 5 of 6 = 1  # gen_3 received next from gen_2, it did not satisfy the 'if' condition, so it asked gen_2 for next

[snip 39 lines]

Mod 5 of 30 = 0  # gen_3 received next from gen_2, it satisfied the 'if' condition, so it yielded it
30
>>>
```

#### Summary

I won't show the rest (you can run it yourself), but here is a summary of the results -

list_1 called mod_2() 100 times, list_2 called mod_3() 50 times, and list_3 called mod_5() 17 times.

Total for list comprehensions - 167 calls.
All of these calls were made before we requested the first item, and three lists were created.

gen_1, gen_2, gen_3 made no calls before we requested the first item, and created no lists.

The first call to next() resulted in 3 calls to the mod functions.
The second call to next() resulted in 50 calls.
The third call to next() resulted in 50 calls.
The fourth call to next() resulted in 50 calls.
The fifth call to next() resulted in 14 calls, and then raised StopIteration.

Total for generator expressions - 167 calls.
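You can check these totals without reading through the printed output. The sketch below uses a hypothetical helper mod(), a counting variation on mod_2, mod_3 and mod_5: instead of printing, it records each call in a collections.Counter keyed by divisor. It then measures how many calls each next() triggers, using next()'s default argument to detect the end:

```python
from collections import Counter

calls = Counter()

def mod(n, d):
    """Hypothetical helper: return n % d and count the call per divisor."""
    calls[d] += 1
    return n % d

# List comprehensions: all the work happens immediately
list_1 = [x for x in range(100) if not mod(x, 2)]
list_2 = [x for x in list_1 if not mod(x, 3)]
list_3 = [x for x in list_2 if not mod(x, 5)]
list_total = sum(calls.values())
print(dict(calls), list_total)  # {2: 100, 3: 50, 5: 17} 167

# Generator expressions: work happens only when a value is requested
calls.clear()
gen_1 = (x for x in range(100) if not mod(x, 2))
gen_2 = (x for x in gen_1 if not mod(x, 3))
gen_3 = (x for x in gen_2 if not mod(x, 5))
print(sum(calls.values()))  # 0, nothing has run yet

per_call = []
while True:
    before = sum(calls.values())
    value = next(gen_3, None)  # None signals that gen_3 is exhausted
    per_call.append(sum(calls.values()) - before)
    if value is None:
        break

print(per_call)             # [3, 50, 50, 50, 14]
print(sum(calls.values()))  # 167
```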

So the amount of work done by both methods is the same. The differences are that generator expressions do not create intermediate lists, and they return each result as soon as it has been evaluated.

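If you are dealing with large amounts of data, these differences can be significant: memory use no longer grows with the size of the intermediate results, and you can start working with the first result without waiting for the whole input to be processed.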
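One rough way to see the memory difference is sys.getsizeof(), which reports the size of the container object itself (not of the integers it refers to). The exact numbers depend on the Python version and platform, so treat this as an illustration rather than a benchmark:

```python
import sys

n = 100_000
as_list = [x * 2 for x in range(n)]  # all n values are stored up front
as_gen = (x * 2 for x in range(n))   # values are produced on demand

# getsizeof() measures the object itself, not the integers it refers to
print(sys.getsizeof(as_list))  # grows roughly in proportion to n
print(sys.getsizeof(as_gen))   # small, and the same whatever n is

# Both still produce the same values
print(sum(as_list) == sum(as_gen))  # True
```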


#### David Beazley on generators

If you want to learn more about generators, and practical uses for them, I thoroughly recommend reading the presentations by David Beazley.

http://www.dabeaz.com/generators/