# Introduction to Python and Natural Language Technologies

__Lecture 04, List comprehension, iteration, context managers, functional Python__

__October 7, 2020__

__Judit Ács__

# List comprehension

- Transform any iterable into a list in one line.
- Syntactic sugar, could be replaced with a for loop.
- Example: create a list of the first N odd numbers starting from 1

In [1]:
l = []
for i in range(10):
    l.append(2*i+1)
l

One-liner equivalent

In [2]:
l = [2*i+1 for i in range(10)]
l

The general form of list comprehension is

~~~
[<expression> for <element> in <sequence>]
~~~

Conditional expressions can be added to filter the sequence:

~~~
[<expression> for <element> in <sequence> if <condition>]
~~~

In [3]:
even = [n*n for n in range(20) if n % 2 == 0]
even

Which is equivalent to

In [4]:
even = []
for n in range(20):
    if n % 2 == 0:
        even.append(n*n)
even

- Since this expression implements a filtering mechanism, there is no `else` clause.

- An if-else clause can be used as the first expression though:

In [5]:
l = [1, 0, -2, 3, -1, -5, 0]

signum_l = [int(n / abs(n)) if n != 0 else 0 for n in l]
signum_l

In [6]:
n = -3.2
int(n / abs(n)) if n != 0 else 0
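
The two forms can be combined: a trailing `if` filters the sequence, while a conditional expression in front chooses the value. A small sketch with made-up data (`values` and `labels` are illustrative names, not part of the lecture):

```python
values = [3, -1, 0, 7, -4]

# The trailing `if` drops zeros; the conditional expression labels the rest.
labels = ["positive" if v > 0 else "negative" for v in values if v != 0]
print(labels)
```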

More than one sequence may be traversed. Is this depth-first or breadth-first traversal?

In [7]:
l1 = [1, 2, 3]
l2 = [4, 5, 6]

[(i, j) for i in l1 for j in l2]

In [8]:
for i in l1:
    for j in l2:
        print((i, j))
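
The `for` clauses run in the order they are written, outermost first, just like the nested loops above. One common use is flattening a nested list. The sketch below assumes a hypothetical `nested` list:

```python
nested = [[1, 2], [3], [4, 5, 6]]

# The outer loop (`for row in nested`) is written first, the inner loop second.
flat = [item for row in nested for item in row]
print(flat)
```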

List comprehensions may be nested by replacing the first expression with another list comprehension:

In [9]:
matrix = [
    [1, 2, 3],
    [5, 6, 7]
]

[[e*e for e in row] for row in matrix]

But don't go overboard:

In [10]:
import random

[[[random.randint(1, 5) for k in range(3)] for j in range(2)] for i in range(5)]

What is the type of a comprehension written with parentheses instead of square brackets?

In [11]:
gen = (i for i in range(10))
type(gen), gen

# Generator expressions

Generator expressions are a generalization of list comprehension. They were introduced in PEP 289 in 2002.

Check out the memory consumption of these cells.

In [12]:
N = 8
s = sum([i*2 for i in range(int(10**N))])
print(s)

9999999900000000


In [13]:
s = sum(i*2 for i in range(int(10**N)))
print(s)

9999999900000000
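
One rough way to see the difference without a memory profiler is `sys.getsizeof`, which reports the size of the container object itself (not of the elements it refers to). The sketch below uses a smaller, arbitrarily chosen `n` so that it runs quickly:

```python
import sys

n = 10**5  # arbitrary size, smaller than in the cells above

as_list = [i * 2 for i in range(n)]   # all elements are stored at once
as_gen = (i * 2 for i in range(n))    # elements are produced on demand

print(sys.getsizeof(as_list))  # grows with n
print(sys.getsizeof(as_gen))   # small and independent of n
```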


Generators do not generate a list in memory:

In [14]:
even_numbers = (2*n for n in range(10))
even_numbers

<generator object <genexpr> at 0x7fb9c81f1970>

Therefore they can only be traversed once:

In [15]:
for num in even_numbers:
    print(num)

0
2
4
6
8
10
12
14
16
18


The generator is empty after the first run:

In [16]:
for num in even_numbers:
    print(num)

Calling `next()` on an exhausted generator raises a `StopIteration` exception:

In [17]:
even_numbers = (2*n for n in range(10))

while True:
    try:
        print(next(even_numbers))
    except StopIteration:
        break

0
2
4
6
8
10
12
14
16
18


In [18]:
# next(even_numbers)  # raises StopIteration

These are actually the defining properties of the **iteration protocol**:

# Iteration protocol

A class satisfies the iteration protocol if:

1. it has an `__iter__` method that returns an iterator,
2. the iterator has a `__next__` method,
3. which raises `StopIteration` after a certain number of iterations.

For loops use the iteration protocol.
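
Roughly speaking, a `for` loop calls `iter()` once and then `next()` repeatedly until `StopIteration` is raised. The following is a simplified sketch of that behaviour, not the interpreter's actual implementation; `manual_for` is a hypothetical helper name:

```python
def manual_for(iterable):
    """Collect the items of `iterable` the way a for loop would visit them."""
    collected = []
    iterator = iter(iterable)          # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)      # calls iterator.__next__()
        except StopIteration:
            break                      # the loop ends silently
        collected.append(item)
    return collected


print(manual_for("abc"))
print(manual_for(range(3)))
```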

A minimal iterator looks like this:

In [19]:
class MyIterator:
    def __init__(self, iter_no):
        self.iter_no = iter_no
        self._i = iter_no
        
    def __iter__(self):
        return self
    
    def __next__(self):
        if self._i <= 0:
            self._i = self.iter_no
            raise StopIteration()
        self._i -= 1
        print("Returning {}".format(self._i))
        return self._i
    
myiter = MyIterator(3)
print("Iterate once")
for i in myiter:
    print(i)
print("Iterate the second time")
for i in myiter:
    print(i)

Iterate once
Returning 2
2
Returning 1
1
Returning 0
0
Iterate the second time
Returning 2
2
Returning 1
1
Returning 0
0


The built-in functions `min`, `max` and `sum` use the iteration protocol:

In [20]:
class AbsoluteNumberContainer:
    def __init__(self, numbers):
        self.numbers = []
        for n in numbers:
            self.numbers.append(abs(n))
        self._i = 0
            
    def __iter__(self):
        # Could be implemented with this line without __next__:
        # return iter(self.numbers)
        return self
    
    def __next__(self):
        if self._i >= len(self.numbers):
            # Reset the loop variable for the next iteration.
            self._i = 0
            raise StopIteration()
        self._i += 1
        return self.numbers[self._i - 1]
    
    
a = AbsoluteNumberContainer([-2, 1, -100])
for n in a:
    print(n)
    
print(f"{max(a) = }\n{min(a) = }\n{sum(a) = }")

2
1
100
max(a) = 100
min(a) = 1
sum(a) = 103
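
Because the container above returns itself from `__iter__`, all loops over it share one position counter, so nested loops over the same object interfere with each other. A possible alternative, sketched here under the assumption that independent traversals are wanted, is to write `__iter__` as a generator function so that each loop gets a fresh iterator (`AbsoluteNumbers` is a hypothetical name):

```python
class AbsoluteNumbers:
    def __init__(self, numbers):
        self.numbers = [abs(n) for n in numbers]

    def __iter__(self):
        # Each call creates a new generator object with its own state.
        for n in self.numbers:
            yield n


a = AbsoluteNumbers([-2, 1, -100])
pairs = [(x, y) for x in a for y in a]   # nested traversal of the same object
print(len(pairs))
print(max(a), min(a), sum(a))
```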


# Set and dict comprehension

Sets and dictionaries can be instantiated via generator expressions too.

A generator expression between curly brackets instantiates a set:

In [21]:
fruit_list = ["apple", "plum", "apple", "pear"]

fruits = {fruit.title() for fruit in fruit_list}

type(fruits), len(fruits), fruits

(set, 3, {'Apple', 'Pear', 'Plum'})

If the expression in the generator is a key-value pair separated by a colon, it instantiates a dictionary:

In [22]:
word_list = ["apple", "plum", "pear", "apple", "apple"]
word_length = {word: len(word) for word in word_list}
type(word_length), len(word_length), word_length

(dict, 3, {'apple': 5, 'plum': 4, 'pear': 4})

In [23]:
word_list = ["apple", "plum", "pear", "avocado"]
first_letters = {word[0]: word for word in word_list}
first_letters

{'a': 'avocado', 'p': 'pear'}

In [24]:
for letter, fruit in first_letters.items():
    print(letter, fruit)

a avocado
p pear
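
In the example above, later words overwrite earlier ones that share a first letter. If every word should be kept, one option is to nest comprehensions: a set comprehension collects the distinct letters, and a list comprehension gathers the words for each. This is only a sketch and rescans the list once per letter, which is fine for short lists (`by_letter` is an illustrative name):

```python
word_list = ["apple", "plum", "pear", "avocado"]

by_letter = {
    letter: [w for w in word_list if w[0] == letter]
    for letter in {w[0] for w in word_list}
}
print(by_letter)
```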


# `yield` keyword

- If a function uses `yield` instead of return, it becomes a **generator function**.
- `yield` temporarily gives back the execution to the caller.
- The generator function continues where it left off the next time `next` is called:

In [25]:
def hungarian_vowels():
    alphabet = ("a", "á", "e", "é", "i", "í", "o", "ó",
                "ö", "ő", "u", "ú", "ü", "ű")
    for vowel in alphabet:
        yield vowel

Calling this function returns a generator object:

In [26]:
type(hungarian_vowels())

generator

In [27]:
for vowel in hungarian_vowels():
    print(vowel)

a
á
e
é
i
í
o
ó
ö
ő
u
ú
ü
ű


In [28]:
def dummy_generator():
    yield "one"
    yield "two"
    yield "three"
    
for e in dummy_generator():
    print(e)

one
two
three
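
To make the pausing visible, the sketch below adds `print` calls between the `yield` statements. `chatty_generator` is a hypothetical name; note that nothing runs until the first `next` call:

```python
def chatty_generator():
    print("started")
    yield 1
    print("resumed after first yield")
    yield 2
    print("finishing")


gen = chatty_generator()      # nothing is printed yet
first = next(gen)             # prints "started"
second = next(gen)            # prints "resumed after first yield"
rest = list(gen)              # prints "finishing"; no more values are produced
print(first, second, rest)
```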


They can only be iterated once:

In [29]:
gen = hungarian_vowels()

print("first iteration: {}".format(", ".join(gen)))
print("second iteration: {}".format(", ".join(gen)))

first iteration: a, á, e, é, i, í, o, ó, ö, ő, u, ú, ü, ű
second iteration: 


The `next` function returns the next element of the generator.
A `StopIteration` is raised when no more elements are left:

In [30]:
gen = hungarian_vowels()

while True:
    try:
        print("The next element is {}".format(next(gen)))
    except StopIteration:
        print("No more elements left :(")
        break

The next element is a
The next element is á
The next element is e
The next element is é
The next element is i
The next element is í
The next element is o
The next element is ó
The next element is ö
The next element is ő
The next element is u
The next element is ú
The next element is ü
The next element is ű
No more elements left :(


The generator function returns a new generator object every time it's called:

In [31]:
gen1 = hungarian_vowels()
gen2 = hungarian_vowels()

print(gen1 is gen2)
print("gen1 first time:", list(gen1))
print("gen1 second time:", list(gen1))
print("gen2 first time:", list(gen2))

False
gen1 first time: ['a', 'á', 'e', 'é', 'i', 'í', 'o', 'ó', 'ö', 'ő', 'u', 'ú', 'ü', 'ű']
gen1 second time: []
gen2 first time: ['a', 'á', 'e', 'é', 'i', 'í', 'o', 'ó', 'ö', 'ő', 'u', 'ú', 'ü', 'ű']


Iterators can only be traversed forward, but we can easily wrap an iterator to have memory:

In [32]:
def iter_with_memory(orig_iter):
    prev = None
    for current in orig_iter:
        yield current, prev
        prev = current

In [33]:
for i in iter_with_memory(hungarian_vowels()):
    print(i)

('a', None)
('á', 'a')
('e', 'á')
('é', 'e')
('i', 'é')
('í', 'i')
('o', 'í')
('ó', 'o')
('ö', 'ó')
('ő', 'ö')
('u', 'ő')
('ú', 'u')
('ü', 'ú')
('ű', 'ü')
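
As one possible use of such a wrapper, the sketch below keeps only the elements that differ from their predecessor. The wrapper is repeated to keep the example self-contained, and `readings` is made-up data:

```python
def iter_with_memory(orig_iter):
    # Same wrapper as in the cell above.
    prev = None
    for current in orig_iter:
        yield current, prev
        prev = current


readings = [3, 3, 5, 8, 8, 8, 2]   # made-up data
changes = [cur for cur, prev in iter_with_memory(readings) if cur != prev]
print(changes)
```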


## Applications

Generator expressions can be particularly useful for formatted output. We will demonstrate this through a few examples.

In [34]:
numbers = [1, -2, 3, 1]

# print(", ".join(numbers))  # raises TypeError
print(", ".join(str(number) for number in numbers))

1, -2, 3, 1


In [35]:
shopping_list = ["apple", "plum", "pear"]

~~~
The shopping list is:
item 1: apple
item 2: plum
item 3: pear
~~~

In [36]:
shopping_list = ["apple", "plum", "pear"]

print("The shopping list is:\n{}".format(
    "\n".join(
        f"item {idx+1}: {element}"
        for idx, element in enumerate(shopping_list))
))

The shopping list is:
item 1: apple
item 2: plum
item 3: pear


__Print the following shopping list with quantities.__

For example:

~~~
item 1: apple, quantity: 2
item 2: pear, quantity: 1
~~~

In [37]:
shopping_list = {
    "apple": 2,
    "pear": 1,
    "plum": 5,
}
# TODO
print(
    "\n".join(
        f"item {idx+1}: {element}, quantity: {quantity}"
        for idx, (element, quantity) in enumerate(shopping_list.items()))
)

item 1: apple, quantity: 2
item 2: pear, quantity: 1
item 3: plum, quantity: 5


__Print the same format in alphabetical order.__

Decreasing order by quantity:

In [38]:
shopping_list = {
    "apple": 2,
    "pear": 1,
    "plum": 5,
}
# TODO
print(
    "\n".join(
        f"item {idx+1}: {element}, quantity: {quantity}"
        for idx, (element, quantity) in enumerate(
            sorted(shopping_list.items(), key=lambda x: -x[1])))
            # sorted(shopping_list.items(), key=itemgetter...
)

item 1: plum, quantity: 5
item 2: apple, quantity: 2
item 3: pear, quantity: 1
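
The alphabetical variant asked for above can be sketched by sorting the `(name, quantity)` pairs directly, since tuples compare by their first element first. The second call shows `operator.itemgetter` with `reverse=True` as one possible alternative to the `lambda` used for the quantity ordering. The helper `format_items` is a hypothetical name, and the dictionary is deliberately defined in non-alphabetical order:

```python
from operator import itemgetter

shopping_list = {"plum": 5, "apple": 2, "pear": 1}


def format_items(pairs):
    # enumerate(..., start=1) avoids the manual idx+1.
    return "\n".join(
        f"item {idx}: {name}, quantity: {quantity}"
        for idx, (name, quantity) in enumerate(pairs, start=1)
    )


alphabetical = format_items(sorted(shopping_list.items()))
by_quantity = format_items(
    sorted(shopping_list.items(), key=itemgetter(1), reverse=True)
)
print(alphabetical)
print(by_quantity)
```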


# Context managers

There are two types of resources: managed and unmanaged.

__Managed resources__

- Resource acquisition and release are automatically done.
- No need for manual resource management.
- Example: memory

__Unmanaged resources__

- Unmanaged resources need explicit release.
- Otherwise the operating system may run out of the resource.
- Examples include files, network sockets.

- C++ has both managed and unmanaged memory. The stack is managed, but the heap is not: memory allocated with `new` must be released manually with `delete`.

In [39]:
fh = []
while True:
    try:
        fh.append(open("abc.txt", "w"))
    except OSError:
        break
len(fh)

4051

We can't open more files:

In [40]:
fh2 = []
while True:
    try:
        fh2.append(open("abc.txt", "w"))
    except OSError:
        break
len(fh), len(fh2)

(4051, 0)

The history saving thread hit an unexpected error (OperationalError('unable to open database file')).History will not be written to the database.

In [41]:
for f in fh:
    f.close()

- We need to manually close the file.
- What happens when an exception occurs?

In [42]:
s1 = "important text"
fh = open("file.txt", "w")
# fh.write(s2)  # raises NameError
fh.close()

- If the write raises an exception, `fh.close()` is never reached: the file is never closed and the file descriptor **is leaked**.
- A solution would be to use try-except blocks with `finally` clauses.

In [43]:
from sys import stderr

fh = open("file.txt", "w")
try:
    fh.write(important_variable)  # intentionally undefined: raises NameError
except Exception as e:
    stderr.write("{0} happened".format(type(e).__name__))
finally:
    print("Closing file")
    fh.close()

Closing file


NameError happened

__Context managers handle this automatically__

- The `with` keyword opens a resource,
- keeps it open until the execution leaves the `with` block,
- releases the resource regardless of whether an exception is raised or not.

In [44]:
with open("file.txt", "w") as fh:
    fh.write("abc\n")
    # fh.write(important_variable)  # raises NameError

`file.txt` is no longer open:

In [45]:
# fh.write("ab") # raises ValueError: I/O operation on closed file.
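
The sketch below checks this behaviour when the block raises. It writes to a temporary directory (an assumption made only so that the example does not touch the working directory) and inspects `fh.closed` afterwards:

```python
import os
import tempfile

# A temporary file path, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

try:
    with open(path, "w") as fh:
        fh.write("abc\n")
        raise ValueError("something went wrong inside the block")
except ValueError as e:
    print("caught:", e)

print(fh.closed)  # the file was closed although the block raised
```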

## Defining context managers

Any class can be a context manager if it implements:
  1. `__enter__`: runs at the beginning of the `with` block. Returns the resource.
  2. `__exit__`: runs after the `with` block. Releases the resource.

In [46]:
class DummyContextManager:
    def __init__(self, value):
        self.value = value
        
    def __enter__(self):
        print("Dummy resource acquired")
        return self.value
    
    def __exit__(self, *args):
        print("Dummy resource released")
        
with DummyContextManager(42) as d:
    print("Resource: {}".format(d))

Dummy resource acquired
Resource: 42
Dummy resource released


`__exit__` takes 3 extra arguments that describe the exception: `exc_type`, `exc_value`, `traceback`. All three are `None` if the block finished without an exception.

In [47]:
class DummyContextManager:
    def __init__(self, value):
        self.value = value
        
    def __enter__(self):
        print("Dummy resource acquired")
        return self.value
    
    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is not None:
            print("{0} with value {1} caught\nTraceback: {2}".format(exc_type, exc_value, traceback))
        print("Dummy resource released")
        
with DummyContextManager(42) as d:
    print(d)
    # raise ValueError("just because I can")  # __exit__ will be called anyway

Dummy resource acquired
42
Dummy resource released
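
The standard library also offers `contextlib.contextmanager`, which builds a context manager from a generator function: the code before `yield` plays the role of `__enter__`, and the code after it plays the role of `__exit__`. A minimal sketch, with the hypothetical name `dummy_resource` and an `events` list added only to record the order of calls:

```python
from contextlib import contextmanager

events = []


@contextmanager
def dummy_resource(value):
    events.append("acquired")
    try:
        yield value                  # the value bound by `as`
    finally:
        events.append("released")    # runs even if the block raises


try:
    with dummy_resource(42) as d:
        events.append(f"using {d}")
        raise ValueError("just because I can")
except ValueError:
    events.append("exception propagated")

print(events)
```

The `try`/`finally` inside the generator is what guarantees the release step, since the exception raised in the `with` block is re-raised at the `yield`.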


# Functional Python: map, filter and reduce

Python has a few built-in functions that originate from functional programming.

## Map

`map` applies a _callable_ on each element of a sequence.

This can be a function:

In [48]:
def double(e):
    return e * 2

l = [2, 3, "abc"]

list(map(double, l))

[4, 6, 'abcabc']

In [49]:
map(double, l)

<map at 0x7fb9ba529040>

A `lambda` expression:

In [50]:
list(map(lambda x: x * 2, [2, 3, "abc"]))

[4, 6, 'abcabc']

Or an instance of a class that defines `__call__`:

In [51]:
class Doubler:
    def __call__(self, v):
        return v * 2

doubler_instance = Doubler()

list(map(doubler_instance, l))

[4, 6, 'abcabc']

In [52]:
class Multiplier:
    def __init__(self, k):
        self.k = k
        
    def __call__(self, v):
        return v * self.k
    
doubler = Multiplier(2)
tripler = Multiplier(3)
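
Callable instances are handy when the function needs a parameter fixed in advance. The sketch below repeats a `Multiplier` class that reads the factor from the instance and passes two instances to `map`; the input list is made up:

```python
class Multiplier:
    def __init__(self, k):
        self.k = k

    def __call__(self, v):
        # The factor is read from the instance, not from a global variable.
        return v * self.k


doubler = Multiplier(2)
tripler = Multiplier(3)

doubled = list(map(doubler, [2, 3, "abc"]))
tripled = list(map(tripler, [2, 3, "abc"]))
print(doubled)
print(tripled)
```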

`map` is evaluated lazily. It returns an iterator, not a list:

In [53]:
map(double, l)

<map at 0x7fb9ba4fe1f0>

In [54]:
class Doubler:
    def __call__(self, v):
        print(f"Doubling {v}")
        return v * 2

doubler_instance = Doubler()

mapped_l = map(doubler_instance, l)
mapped_l

<map at 0x7fb9c81b2a60>

The actual doubling is only done when its result is needed:

In [55]:
list(mapped_l)

Doubling 2
Doubling 3
Doubling abc


[4, 6, 'abcabc']

The iterator is _empty_ now:

In [56]:
list(mapped_l)

[]

## Filter

`filter` keeps the elements for which a function returns true. Like `map`, it returns a lazy iterator, hence the `list()` call:

In [57]:
def is_even(n):
    return n % 2 == 0

l = [2, 3, -1, 0, 2]

list(filter(is_even, l))

[2, 0, 2]

In [58]:
list(filter(lambda x: x % 2 == 0, range(8)))

[0, 2, 4, 6]

In [59]:
r = range(12)
r, list(r)

(range(0, 12), [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])

In [60]:
list(r)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

Most comprehensions can be rewritten using `map` and `filter`:

In [61]:
l = [2, 3, -1, 0, 2]

[x for x in l if x % 2 == 0]

[2, 0, 2]

Signum example:

In [62]:
l = [2, 3, 0, -1, 2, 0, 1]

signum = [x / abs(x) if x != 0 else x for x in l]
print(signum)

[1.0, 1.0, 0, -1.0, 1.0, 0, 1.0]


In [63]:
list(map(lambda x: x / abs(x) if x != 0 else 0, l))

[1.0, 1.0, 0, -1.0, 1.0, 0, 1.0]
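
A comprehension that both filters and transforms corresponds to `map` applied to the result of `filter`. A short sketch comparing the two spellings (the variable names are illustrative):

```python
l = [2, 3, -1, 0, 2]

# Comprehension: the filter is written last, the transformation first.
squares_of_even = [x * x for x in l if x % 2 == 0]

# Functional style: filter runs first, then map transforms what is left.
with_map_filter = list(map(lambda x: x * x, filter(lambda x: x % 2 == 0, l)))

print(squares_of_even, with_map_filter)
```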

## Zip

`zip` pairs elements of two iterables:

In [64]:
l1 = ["apple", "plum", "pear"]
l2 = [10, 2, 3]

for elements in zip(l1, l2):
    print(elements)

('apple', 10)
('plum', 2)
('pear', 3)


The iterables can have different lengths; `zip` stops at the shortest one:

In [65]:
l1 = ["apple", "plum", "pear"]
l2 = [10, 2, 3, -1, -2]

for fruit, quantity in zip(l1, l2):
    print(fruit, quantity)

apple 10
plum 2
pear 3
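
If the extra elements should not be dropped, `itertools.zip_longest` pads the shorter iterables instead. The fill value `"?"` below is an arbitrary choice for illustration:

```python
from itertools import zip_longest

l1 = ["apple", "plum", "pear"]
l2 = [10, 2, 3, -1, -2]

padded = list(zip_longest(l1, l2, fillvalue="?"))
print(padded)
```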


More generally `zip` transposes a list of iterables:

In [66]:
row1 = [1, 2, 3, 4]
row2 = [1, 2, 3, 4]
row3 = [-1, -2, -3, -4]

for column in zip(row1, row2, row3):
    print(column)

(1, 1, -1)
(2, 2, -2)
(3, 3, -3)
(4, 4, -4)


We can implement matrix transpose with `zip`:

In [67]:
def transpose(mtx):
    return list(map(list, zip(*mtx)))
    # OR
    # return [list(col) for col in zip(*mtx)]


mtx = [[1, 2, 3], [4, 5, 6]]

transpose(mtx)

[[1, 4], [2, 5], [3, 6]]
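
The same `zip(*...)` idiom can split a list of pairs back into separate sequences, which is sometimes called unzipping. A small sketch with made-up data:

```python
pairs = [("apple", 10), ("plum", 2), ("pear", 3)]

# Unpacking passes each pair as a separate argument to zip.
names, quantities = zip(*pairs)
print(names)
print(quantities)
```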

In [68]:
row1 = [1, 2, 3, 4]
row2 = [1, 2, 3, 4]
row3 = [-1, -2, -3, -4]
z = zip(row1, row2, row2)
z, type(z)

(<zip at 0x7fb9ba534e00>, zip)

In [69]:
list(z)

[(1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 4)]

In [70]:
# next(z)  # raises StopIteration

## Reduce

- Reduce applies a rolling computation on a sequence.
- The first argument of `reduce` is a two-argument function.
- The second argument is the sequence.
- The result is accumulated in an accumulator.

In [71]:
from functools import reduce

l = [1, 2, -1, 4]
reduce(lambda x, y: x*y, l)

-8

An initial value for the accumulator may be supplied:

In [72]:
reduce(lambda x, y: x*y, l, 10)

-80
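
The accumulation can be sketched as a plain loop. `my_reduce` is a hypothetical, simplified stand-in for `functools.reduce`: unlike the real function, it always requires an initial value.

```python
from functools import reduce


def my_reduce(function, sequence, initial):
    """Simplified stand-in for functools.reduce; `initial` is required here."""
    accumulator = initial
    for element in sequence:
        # The accumulator is always passed as the first argument.
        accumulator = function(accumulator, element)
    return accumulator


l = [1, 2, -1, 4]
print(my_reduce(lambda x, y: x * y, l, 10))
print(reduce(lambda x, y: x * y, l, 10))
```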

Finding the maximum with reduce:

In [73]:
reduce(lambda x, y: max(x, y), l)

4

Same with the built-in function:

In [74]:
reduce(max, l)

4

Summing even numbers only:

In [75]:
l = [1, 2, -1, 4]
reduce(lambda x, y: x + int(y % 2 == 0) * y, l, 0)

6

Booleans can be summed:

In [76]:
sum(e % 2 == 0 for e in l)

2

For historical reasons, `bool` is a subclass of `int`, so booleans are actually integers:

In [77]:
int(True), int(False), isinstance(False, int), isinstance(True, int)

(1, 0, True, True)

# Recommended reading

- [Itertools](https://docs.python.org/3.8/library/itertools.html) is a collection of iteration related building blocks.
- [Functools](https://docs.python.org/3.8/library/functools.html) is a module for higher order functions.