# NB15: Comprehensions

## Programming Fundamentals

## L.EIC/2022-23

#### Nuno Macedo$^{1}$, João Correia Lopes$^{1}$, Pedro Vasconcelos$^{2}$
$^{1}$FEUP/DEI & INESC TEC\
$^{2}$FCUP/DCC & LIACC

> “Beware of bugs in the above code; I have only proved it correct, not tried it.”

Donald E. Knuth

## Goals

By the end of this class, the student should be able to:

- Simplify common list processing patterns using Comprehensions
- Describe the use of iterators
- Describe the use of List, Set and Dictionary comprehensions
- Describe the use of Generator comprehensions

## Bibliography

- A. M. Kuchling, *Functional Programming HOWTO*, Release 0.32 [[HTML]](https://docs.python.org/3/howto/functional.html)

- Python Course, *List Comprehension*, [Python3 Tutorial](https://python-course.eu/python3_list_comprehension.php)

# 15 Comprehensions

## 15.1 Introduction

- "List comprehensions" are Guido van Rossum's preferred way to do *list processing*; he dislikes `lambda`, `map()`, `filter()` and `reduce()`.

- In his article from May 2005 [All Things Pythonic: The fate of reduce() in Python 3000](http://www.artima.com/weblogs/viewpost.jsp?thread=98196), he gives his reasons for dropping `lambda`, `map()`, `filter()` and `reduce()`.

  - List comprehension is more evident and easier to understand

  - Having both list comprehension and "filter, map, reduce and lambda" goes against the Python motto "There should be one-- and preferably only one --obvious way to do it"


### Comprehensions

- Essentially, comprehensions are Python's way of implementing the well-known set-builder notation used by mathematicians.
  - In mathematics the square numbers of the natural numbers are, for example, created by
  $\{ x^2 | x ∈ ℕ \}$
  - or the set of complex integers
  $\{ (x,y) | x ∈ ℤ ∧ y ∈ ℤ \}$
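
As a rough sketch of how this notation carries over to Python, both sets can be written as set comprehensions. Since ℕ and ℤ are infinite, the example assumes small finite ranges as stand-ins; the names `squares` and `pairs` are illustrative only.

```python
# Finite analogue of { x^2 | x in N }, restricted to 0 <= x < 10
squares = {x**2 for x in range(10)}

# Finite analogue of { (x, y) | x in Z and y in Z }, restricted to -1..1
pairs = {(x, y) for x in range(-1, 2) for y in range(-1, 2)}

print(sorted(squares))
print(len(pairs))
```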

> Using comprehensions is often a way both to make code more compact and to **shift our focus from the “how” to the “what”**


## 15.2 Iterators

### Lazy evaluation

> A powerful feature of Python is its **iterator** protocol (which we will get to shortly).

> This capability is only loosely connected to functional programming *per se*, since Python does not quite offer lazy data structures in the sense of a language like Haskell.

> However, the use of the iterator protocol — and Python’s many built-in or standard library iteratables — accomplish much the same effect as an actual lazy data structure.

David Mertz, *Functional Programming in Python*, O'Reilly Media, 2015

### Iterators

- An iterator is an object that represents a stream of data and returns that data one element at a time

- Several of Python's built-in data types support iteration, the most common being lists and dictionaries

- An object is called **iterable** if you can **get an iterator for it**

You can experiment with the iteration interface manually:

In [None]:
it = iter(range(1,5))
it

In [None]:
next(it)

In [None]:
next(it)

In [None]:
it.__next__()  # same as next(it)

In [None]:
next(it)

### The for iteration

- Python expects iterable objects in several different contexts, the most notable being the `for` statement

In [None]:
it = iter(range(1,5))
for i in it:
    print(i)

Given an iterable, `for` calls `iter()` on it implicitly:

In [None]:
for i in range(1,5):
    print(i)
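
To make the implicit steps visible, here is a minimal sketch of what a `for` loop roughly does behind the scenes, written with `iter()`, `next()` and the `StopIteration` exception. The list `collected` is a hypothetical helper added only to record the values seen.

```python
numbers = range(1, 5)

# Roughly what `for i in numbers: print(i)` does behind the scenes
it = iter(numbers)          # ask the iterable for an iterator
collected = []
while True:
    try:
        i = next(it)        # fetch the next element
    except StopIteration:   # raised when the iterator is exhausted
        break
    collected.append(i)
    print(i)
```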

### iterators

- Iterators can be **materialised** as lists, tuples or sets by using the `list()`, `tuple()` or `set()` constructor functions



In [None]:
it = iter(range(1,5))
t = tuple(it)
print(t)


- Built-in functions such as `max()` and `min()` can take a single iterator argument as well

- The `in` and `not in` operators also support iterators

- Note that you can **only go forward in an iterator**; there's no way to get the previous element, reset the iterator, or make a copy of it

In [None]:
it = iter(range(1,5))
next(it) # discard the first value
for i in it:
  print(i)  # print the remaining values

What happens if you ask for another value after consuming the iterator? You get a **runtime error** (a `StopIteration` exception):

In [None]:
next(it)
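
The remarks above about `max()`, `in` and moving only forward can be sketched as follows. The variable names are illustrative; the second argument of `next()` is a default value that is returned instead of raising `StopIteration`.

```python
it = iter(range(1, 5))
found = 3 in it           # the search consumes 1, 2 and 3
rest = list(it)           # only the element after the match is left
print(found, rest)        # True [4]

it = iter(range(1, 5))
largest = max(it)         # max() consumes the whole iterator
after = next(it, None)    # a default avoids the StopIteration error
print(largest, after)     # 4 None
```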

### Iterator algebra

There are two main reasons why working with iterators is useful:
  - it avoids repeated code (DRY - "Don't Repeat Yourself")
  - it improves memory efficiency (by not keeping the whole collection of values in memory at once)

- To see this, have a look at [What Is Itertools and Why Should You Use It?](https://realpython.com/python-itertools/#what-is-itertools-and-why-should-you-use-it)

Iterator usage is best viewed as a collection of building blocks that can be combined to form specialized “data pipelines”

In [None]:
list(zip(range(1,5), range(2,6)))

In [None]:
list(map(sum, zip(range(1,5), range(2,6))))
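
As a small sketch of such a pipeline, the following combines `itertools.count`, `map`, `filter` and `itertools.islice`. The source is infinite, so nothing could be stored up front; values are produced only when the final `list()` asks for them. The stage names are illustrative.

```python
from itertools import count, islice

naturals = count(1)                                   # 1, 2, 3, ... (infinite)
squares = map(lambda x: x * x, naturals)              # lazily squared
odd_squares = filter(lambda x: x % 2 == 1, squares)   # keep the odd ones
first_five = list(islice(odd_squares, 5))             # only now are values computed
print(first_five)  # [1, 9, 25, 49, 81]
```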

## 15.3 List Comprehensions

### List Displays

- For constructing a list, a set or a dictionary, Python provides special syntax called "displays"<sup>1</sup>

- The most common list *display* is the simple literal value:

```
    [ expression < , ... > ]
```

- For example:

```
    fruit = ["Apples", "Peaches", "Pears", "Bananas"]
```

- But Python has a second kind of list *display*, based on a list comprehension

<sup>1</sup>[The Python Language Reference](https://docs.python.org/3/reference/expressions.html#displays-for-lists-sets-and-dictionaries)

### List Comprehensions

- A list comprehension is an expression that combines an expression to evaluate, a `for` clause, and an optional `if` clause

- This allows a simple, clear expression of the processing that will build up an iterable sequence

- The most important thing about a list comprehension is that **it is an iterable that applies a calculation to another iterable**

- A list display can use a list comprehension iterable to create a new list

![comprehension](https://raw.githubusercontent.com/fp-leic/public/main/notebooks/15/comprehension.png)

```
   even = [2*x for x in range(18)]
```

Try it here:

In [None]:
evens = [2*x for x in range(10)]
print(evens)

### Example 1: Temperature conversion

We can use a list comprehension to convert Celsius temperatures into Fahrenheit.

Recall the conversion formula $F = \frac{9}{5}C + 32$ where $C$ is the Celsius temperature and $F$ is the Fahrenheit temperature.


In [None]:
celsius = [39.2, 36.5, 37.3, 37.5]
fahrenheit = [ ((9/5)*c + 32) for c in celsius ]
print(fahrenheit)

Alternatively, we could also have written this conversion using `map` and a lambda.

However, the list comprehension above is (probably) more readable. What do you think?

In [None]:
celsius = [39.2, 36.5, 37.3, 37.5]
fahrenheit = list(map(lambda c: (9/5)*c+32, celsius))
print(fahrenheit)

### Example 2: Pythagorean triples

A *Pythagorean triple* consists of three positive integers $a$, $b$, and $c$, such that $a^2 + b^2 = c^2$.

For example: $(3,4,5)$ is a Pythagorean triple because $3^2 + 4^2 = 9 + 16 = 25 = 5^2$.

Let us write a list comprehension to find all Pythagorean triples whose values are below 30:

In [None]:
[(a,b,c) for a in range(1,30) for b in range(1,30) for c in range(1,30) if a**2 + b**2 == c**2]

Note that all triples appear in pairs: if $(a,b,c)$ is a Pythagorean triple then $(b,a,c)$ is also a Pythagorean triple.

Can you find a way to avoid this duplication and list only one of such triples?
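
One possible answer, offered as a sketch rather than the only solution, is to require $a \le b \le c$ by starting each inner range at the value of the previous variable. The name `triples` is illustrative.

```python
triples = [(a, b, c)
           for a in range(1, 30)
           for b in range(a, 30)      # b starts at a, so a <= b
           for c in range(b, 30)      # c cannot be smaller than b
           if a**2 + b**2 == c**2]
print(triples)
```

A side effect of the tighter ranges is that fewer combinations are tested.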

### List Comprehension Semantics

- When we write a list comprehension, we will provide an iterable, a variable and an expression

- Python will process the iterator as if it was a for-loop, iterating through a sequence of values

- It evaluates the expression, once for each iteration of the for-loop

- The resulting values can be collected into a fresh, new list, or used anywhere an iterator is used


In [None]:
string = "Hello 12345 World"
print([int(x) for x in string if x.isdigit()])

for n in [int(x) for x in string if x.isdigit()]:
    print(n*n)

### List Comprehension Syntax

- A list comprehension is, technically, a complex expression

- It's often used in list displays, but can be used in a variety of places where an iterator is expected

```
   expr <for-clause>
```

- The `expr` is any expression, usually including the `for` loop variable

- It can be a simple constant, or any other expression (including a nested list comprehension)

- The `for-clause` mirrors the `for` statement

```
   for variable in sequence
```

### Comprehension in a List Display

- For example:

```
   # a list of values [0, 2, 4, ..., 34]
   even = [2*x for x in range(18)]

   # list of 2-tuples, each built from the values in the given sequence
   hardways = [(x,x) for x in (2, 3, 4, 5)]

   # a list of 10 random numbers (requires `import random`)
   samples = [random.random() for _ in range(10)]
```

- A list display that uses a list comprehension behaves like the
    following loop:

```
   r = []
   for variable in sequence:
      r.append(expr)
```

$\Rightarrow$
<https://github.com/fp-leic/public/tree/master/lectures/15/for_comp.py>

The expression in a comprehension does not need to depend on the iteration variable. In such cases it is common to use `_` (underscore) for the iteration variable. For example, here is a way to construct a list of 5 zeros:

In [None]:
zeros = [0 for _ in range(5)]
print(zeros)

We can use a similar trick to construct a list of lists i.e. a matrix. Here's a $3\times 5$ zero matrix:

In [None]:
matrix = [[0 for _ in range(5)] for _ in range(3)]
print(matrix)

Note that this is a comprehension whose expression is also a comprehension!
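
A brief sketch of why the nested comprehension is a safe way to build a matrix: each row is a fresh list, whereas the tempting shortcut with list repetition reuses the same inner list. The name `aliased` is illustrative only.

```python
# Comprehension: each row is a separate list
matrix = [[0 for _ in range(5)] for _ in range(3)]
matrix[0][0] = 1
print(matrix[1][0])   # 0: the other rows are unaffected

# Repetition: the SAME inner list is referenced three times
aliased = [[0] * 5] * 3
aliased[0][0] = 1
print(aliased[1][0])  # 1: every "row" is the same object
```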

An expression that depends on the iteration variable:

In [None]:
squares = [x**2 for x in range(10)]
print(squares)

### The if Clause

- A list comprehension can also have an **if-clause**

```
   expr <for-clause> <if-clause>
```

- The `if-clause` includes a boolean expression

```
   if filter
```

- Here is an example of a complex list comprehension in a list display

```
   hardways = [(x,x) for x in range(1,7) if 2*x not in {2, 12}]
```

- This more complex list comprehension behaves like the following
    loop:

```
   r = []
   for variable in sequence:
      if filter:
         r.append(expr)
```

In [None]:
hardways = [(x,x) for x in range(1,7) if 2*x not in {2, 12}]
print(hardways)
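
To check the correspondence on a concrete case, this sketch writes the `hardways` example both as the explicit loop and as the comprehension, then compares the results. The name `r` follows the loop template above.

```python
# Loop form
r = []
for x in range(1, 7):
    if 2*x not in {2, 12}:
        r.append((x, x))

# Comprehension form
hardways = [(x, x) for x in range(1, 7) if 2*x not in {2, 12}]

print(r == hardways)  # True
print(hardways)       # [(2, 2), (3, 3), (4, 4), (5, 5)]
```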

### Another example

```
   >>> [(x, 2*x+1) for x in range(10) if x % 3 == 0]
```

- This works as follows:

    1.  The for-clause iterates through the 10 values given by
        `range(10)`, assigning each value to the local variable `x`

    2.  The if-clause evaluates the filter expression, `x % 3 == 0`. If it
        is `False`, the value is skipped; if it is `True`, the
        expression `(x, 2*x+1)` is evaluated and retained

    3.  The sequence of 2-tuples is assembled into a list


In [None]:
[(x, 2*x+1) for x in range(10) if x % 3 == 0]

## 15.4 Nested List Comprehensions

### Nested List Comprehensions

- A list comprehension can have any number of *for-clauses* and
    *if-clauses*, freely-intermixed

- A *for-clause* must be first

- The clauses are evaluated from left to right

$\Rightarrow$
[The Python Tutorial](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions)

Let A and B be two sets. The cross product (or **Cartesian product**) of A and B, written A×B, is the set of all pairs in which the first element is a member of A and the second element is a member of B.

In [None]:
colours = [ "red", "green", "yellow", "blue" ]
things = [ "house", "car", "tree" ]
coloured_things = [ (x,y) for x in colours for y in things ]
print(coloured_things)
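
The left-to-right rule also applies when `for` and `if` clauses are intermixed. The following sketch uses made-up ranges and conditions and shows the equivalent nested statements; the names `pairs` and `unrolled` are illustrative.

```python
pairs = [(x, y)
         for x in range(5)
         if x % 2 == 0          # filters x before the inner loop runs
         for y in range(x)
         if y % 2 == 1]         # filters y for each surviving x

# Equivalent nested statements, read top to bottom
unrolled = []
for x in range(5):
    if x % 2 == 0:
        for y in range(x):
            if y % 2 == 1:
                unrolled.append((x, y))

print(pairs)  # [(2, 1), (4, 1), (4, 3)]
```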

### Example of Matrix transposition

- Given a 3x4 matrix implemented as a list of 3 lists of length 4

In [None]:
matrix = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12]]

print(matrix)

The following list comprehension transposes rows and columns:

In [None]:
n_col = len(matrix[0])
transposed = [[row[i] for row in matrix] for i in range(n_col)]
print(transposed)

Unroll the nested list comprehension:


In [None]:
transposed = []
for i in range(n_col):
    # the following 3 lines implement the nested listcomp
    transposed_row = []
    for row in matrix:
        transposed_row.append(row[i])
    transposed.append(transposed_row)

print(transposed)
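
As an alternative sketch (not the approach used above), the built-in `zip()` with argument unpacking gives the same transposition. It yields tuples, so each column is converted back to a list here.

```python
matrix = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12]]

# zip(*matrix) groups the i-th elements of every row into one tuple
transposed = [list(column) for column in zip(*matrix)]
print(transposed)  # [[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]
```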

## 15.5 Generator Expressions

### Building Generators With Generator Expressions

- Using a syntax similar to list comprehensions, generator expressions let you create a generator object in a single line of code

```
n_squared_g = (n**2 for n in range(5))

```

- Unlike list comprehensions, generator expressions do NOT build and hold the entire object in memory before iteration

- Remember that a generator is lazy: each value is computed only when it is requested

See the differences here:

In [None]:
n_squared_c = [n**2 for n in range(5)]
print(n_squared_c)

In [None]:
n_squared_g = (n**2 for n in range(5))
print(n_squared_g)

In [None]:
print(list(n_squared_g))

Let's inspect the memory usage of the resulting objects in both cases:

In [None]:
import sys
print(sys.getsizeof([i * 2 for i in range(10000)]))
print(sys.getsizeof((i * 2 for i in range(10000))))
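
Besides size, laziness can be observed directly. In this sketch, `noisy_square` is a hypothetical helper that records every call in a list, so we can see that the generator computes nothing until a value is requested.

```python
def noisy_square(n, log):
    """Hypothetical helper: records each call so we can see when work happens."""
    log.append(n)
    return n ** 2

log = []
gen = (noisy_square(n, log) for n in range(5))
print(log)          # []: nothing has been computed yet
first = next(gen)
print(first, log)   # 0 [0]: only the first element was computed
rest = list(gen)
print(rest, log)    # [1, 4, 9, 16] [0, 1, 2, 3, 4]
```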

### Comprehensions Outside List Displays

- We can use a comprehension without the square brackets (a generator expression) in other contexts that
    expect an iterable

$\Rightarrow$
<https://github.com/fp-leic/public/tree/master/lectures/15/out_comp.py>

In [None]:
square = sum((2*a+1) for a in range(10))
print(square)

In [None]:
column_1 = tuple(3*b+1 for b in range(12))
print(column_1)

### Generator expressions and list comprehensions

- Two common operations on an iterator’s output are
  1. performing some operation for every element
  2. selecting a subset of elements that meet some condition

- List comprehensions and generator expressions (short form: “listcomps” and “genexps”) are a concise notation for such operations<sup>1</sup>

<sup>1</sup> borrowed from the functional programming language Haskell

For example, you can strip the leading and trailing whitespace from a stream of strings with the following code:

In [None]:
line_list = ['  line 1\n', 'line 2  \n', 'line 3  \n']  # , ...

# Generator expression -- returns iterator
stripped_iter = (line.strip() for line in line_list)
print(stripped_iter)

In [None]:
# List comprehension -- returns list
stripped_list = [line.strip() for line in line_list]
print(stripped_list)
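
The second operation, selecting a subset of elements, is done by adding an `if` clause. A minimal sketch, assuming a variant of `line_list` that also contains blank lines (the data and the name `non_blank` are made up for illustration):

```python
line_list = ['  line 1\n', '\n', 'line 2  \n', '   \n', 'line 3  \n']

# Keep only the lines that are non-empty after stripping
non_blank = [line.strip() for line in line_list if line.strip() != ""]
print(non_blank)  # ['line 1', 'line 2', 'line 3']
```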

### Some more generator expressions

Create a generator object that will iterate over 100 values

In [None]:
import random
rolls = ((random.randint(1,6), random.randint(1,6)) for u in range(100))
print(rolls)

In [None]:
hardways = any(d1 == d2 for d1, d2 in rolls)
print(hardways)
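
Note that `any()` stops at the first true value, so it may consume only part of the iterator. The sketch below uses a fixed sequence of rolls instead of random ones (an assumption made so the result is predictable) to show what is left over.

```python
# Fixed data instead of random rolls, wrapped in an iterator
rolls = iter([(1, 2), (3, 3), (4, 5), (6, 6)])

hardways = any(d1 == d2 for d1, d2 in rolls)
print(hardways)        # True

remaining = list(rolls)
print(remaining)       # [(4, 5), (6, 6)]: any() stopped at the first match
```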

### Generator internal state

The generator, like other lazy constructs, has an internal state: **it can only be used once**

In [None]:
import random
rolls = ((random.randint(1,6), random.randint(1,6)) for u in range(100))
for t in rolls:
    print(t)

What happens if you try to use it again?

In [None]:
for t in rolls:
    print(t)

The iterator has been consumed, hence nothing more is printed!

 **Iterators can only be used once!**
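
If the values are needed more than once, one option (a sketch, assuming the data fits in memory) is to materialise them with a list comprehension instead of a generator expression. The seed is an arbitrary choice made only so that the run is repeatable.

```python
import random

random.seed(0)  # assumed seed, only to make the run repeatable
rolls = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(5)]

first_pass = [d1 + d2 for d1, d2 in rolls]
second_pass = [d1 + d2 for d1, d2 in rolls]   # a list can be traversed again
print(first_pass == second_pass)  # True
```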

### The prime numbers between 1 and 100

- Calculation of the prime numbers between 1 and 100 using the
[sieve of Eratosthenes](https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes):

In [None]:
composites = [j for i in range(2, 8) for j in range(i*2, 100, i)]
primes = [p for p in range(2, 100) if p not in composites]
print(primes)

- We want to bring the previous example into a more general form, so that we can calculate the list of prime numbers up to an arbitrary number $n$
- It is enough to examine the multiples of the prime numbers up to the square root of $n$

In [None]:
from math import sqrt

n = 100
limit = 1 + int(sqrt(n))
composites = [j for i in range(2, limit) for j in range(i*2, n, i)]
primes = [p for p in range(2, n) if p not in composites]
print(primes)

- If we have a look at the content of `composites`, we can see that there are lots of duplicate entries contained in this list.

In [None]:
composites.sort()
print(composites)

## 15.6 Set Comprehensions

### Set Comprehensions

- A set comprehension is similar to a list comprehension, but returns a set and not a list

- Syntactically, we use curly brackets instead of square brackets to create a set

- A set comprehension is the right tool to solve our problem from the previous subsection
  - We are able to create the set of composites without duplicates:

In [None]:
from math import sqrt
n = 100
limit = 1+int(sqrt(n))
composites = {j for i in range(2, limit) for j in range(i*2, n, i)}
print(composites)
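
To complete the picture, here is a sketch of the prime calculation using the set of composites. Apart from removing duplicates, a set also makes the `p not in composites` test fast on average, whereas a list has to be scanned element by element.

```python
from math import sqrt

n = 100
limit = 1 + int(sqrt(n))

# Set of composite numbers below n, without duplicates
composites = {j for i in range(2, limit) for j in range(i*2, n, i)}

# Every number from 2 to n-1 that is not a composite is prime
primes = [p for p in range(2, n) if p not in composites]
print(primes)
```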

### Recursive Function to Calculate the Primes

- The following recursive Python function calculates the prime numbers up to a given limit

In [None]:
from math import sqrt
def primes(n):
    if n <= 1:
        return set()  # note: {} would create an empty dict, not an empty set
    else:
        lessthansqrt = primes(int(sqrt(n)))
        composites = {j for i in lessthansqrt for j in range(i*2, n+1, i)}
        result = {p for p in range(2, n + 1) if p not in composites}
        return result

In [None]:
for i in range(1,20):
    print(f"{i}\t{primes(i)}")

## 15.7 Dictionary comprehensions

- With dict comprehension or dictionary comprehension, one can easily create dictionaries

```
>>> letters = {k: v for k, v in zip(['a', 'b', 'c'], [1, 2, 3])}
>>> print(letters)
```

A dict comprehension that creates a dict with numbers as values

In [None]:
# avoid the name `dict`, which would shadow the built-in type
numbers = {str(i): i for i in [1, 2, 3, 4, 5]}
print(numbers)

Create a list of fruits

In [None]:
fruits = ['apple', 'mango', 'banana','cherry']

In [None]:
lengths = {f: len(f) for f in fruits}
print(lengths)

A dict comprehension example using the `enumerate()` function

In [None]:
f_dict = {f:i for i,f in enumerate(fruits)}
print(f_dict)

A dict comprehension to reverse the `key:value` pair in a dictionary

In [None]:
# f_dict maps fruit -> index; swap each pair to map index -> fruit
inverted = {i: f for f, i in f_dict.items()}
print(inverted)

Do you know what will happen if we provide repeated keys?

In [None]:
print({x:x**x for x in [1, 1, 2, 2, 3, 3]})
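
A further sketch of the same effect with different values per key: when a key is produced more than once, the value assigned last is the one that remains. The word list and the name `by_initial` are made up for illustration.

```python
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# Key: first letter; later words overwrite earlier ones with the same letter
by_initial = {w[0]: w for w in words}
print(by_initial)  # {'a': 'avocado', 'b': 'blueberry', 'c': 'cherry'}
```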

# Further reading

### `itertools` module

- This module implements a number of iterator building blocks inspired by constructs from APL, Haskell, and SML.

- Each has been recast in a form suitable for Python.

- The module standardizes a core set of fast, memory efficient tools that are useful by themselves or in combination.

- Together, they form an “iterator algebra” making it possible to construct specialized tools succinctly and efficiently in pure Python.

$\Rightarrow$
<https://docs.python.org/3/library/itertools.html>

Creating new iterators:

In [None]:
import itertools
list(itertools.combinations([1, 2, 3, 4, 5], 3))
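
Some of these building blocks mirror comprehensions seen earlier. As a sketch, `itertools.product` yields the same Cartesian product as a comprehension with two `for` clauses; the shortened lists below are illustrative.

```python
import itertools

colours = ["red", "green"]
things = ["house", "car", "tree"]

with_product = list(itertools.product(colours, things))
with_comprehension = [(x, y) for x in colours for y in things]
print(with_product == with_comprehension)  # True
```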

### Itertools in Python 3, By Example

https://realpython.com/python-itertools/#what-is-itertools-and-why-should-you-use-it

### List Comprehension

Python Tutorial || Learn Python Programming -- Socratica

In [None]:
from IPython.display import YouTubeVideo
YouTubeVideo('AhSvKGTh28Q')

-- Nuno Macedo, João Correia Lopes & Pedro Vasconcelos