# Introduction to Python and Natural Language Technologies

# Lecture 04, Week 04

### February 28, 2018

# List comprehension

- transform any iterable into a list in one line
- syntactic sugar
- example: create a list of the first N odd numbers starting from 1

In [1]:
l = []
for i in range(10):
    l.append(2*i+1)
l

[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

one-liner equivalent

In [2]:
l = [2*i+1 for i in range(10)]
l

[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

## The general form of list comprehension is

~~~
[<expression> for <element> in <sequence>]
~~~

conditional expressions can be added to filter the sequence:

~~~
[<expression> for <element> in <sequence> if <condition>]
~~~

In [3]:
even = [n*n for n in range(20) if n % 2 == 0]
even

[0, 4, 16, 36, 64, 100, 144, 196, 256, 324]

which is equivalent to

In [4]:
even = []
for n in range(20):
    if n % 2 == 0:
        even.append(n*n)
even

[0, 4, 16, 36, 64, 100, 144, 196, 256, 324]

- since this expression implements a filtering mechanism, there is no `else` clause

- an if-else clause can be used as the first expression though:

In [5]:
l = [1, 0, -2, 3, -1, -5, 0]

signum_l = [int(n / abs(n)) if n != 0 else 0 for n in l]
signum_l

[1, 0, -1, 1, -1, -1, 0]

In [6]:
n = -3.2
int(n / abs(n)) if n != 0 else 0

-1

More than one sequence may be traversed. Is this depth-first or breadth-first traversal?

In [7]:
l1 = [1, 2, 3]
l2 = [4, 5, 6]

[(i, j) for i in l1 for j in l2]

[(1, 4), (1, 5), (1, 6), (2, 4), (2, 5), (2, 6), (3, 4), (3, 5), (3, 6)]

In [8]:
[(i, j) for j in l2 for i in l1]

[(1, 4), (2, 4), (3, 4), (1, 5), (2, 5), (3, 5), (1, 6), (2, 6), (3, 6)]
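
The order of the `for` clauses mirrors nested `for` loops: the leftmost clause is the outermost loop. A minimal sketch of the equivalence, reusing `l1` and `l2` from above (the names `pairs_comprehension` and `pairs_loop` are only for illustration):

```python
l1 = [1, 2, 3]
l2 = [4, 5, 6]

# Comprehension with two for clauses
pairs_comprehension = [(i, j) for i in l1 for j in l2]

# Equivalent nested loops: the first clause becomes the outer loop
pairs_loop = []
for i in l1:
    for j in l2:
        pairs_loop.append((i, j))

print(pairs_comprehension == pairs_loop)  # True
```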

List comprehensions may be nested by replacing the first expression with another list comprehension:

In [9]:
matrix = [
    [1, 2, 3],
    [5, 6, 7]
]

[[e*e for e in row] for row in matrix]

[[1, 4, 9], [25, 36, 49]]

## What is the type of a (list) comprehension?

In [10]:
i = (i for i in range(10))
type(i)

generator

# Generator expressions

Generator expressions are a generalization of list comprehensions. They were proposed in PEP 289 in 2002 and have been available since Python 2.4.

Check out the memory consumption of these cells.

In [12]:
N = 8
s = sum([i*2 for i in range(int(10**N))])
print(s)

9999999900000000


In [13]:
s = sum(i*2 for i in range(int(10**N)))
print(s)

9999999900000000
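
One rough way to see the difference without a memory profiler is `sys.getsizeof`. This is only a sketch: `getsizeof` reports the size of the container object itself, not of the integers it refers to, and the exact numbers depend on the Python build. The names `as_list` and `as_gen` are illustrative, and `n` is kept smaller than in the cells above so the example runs quickly.

```python
import sys

n = 10**5

as_list = [i * 2 for i in range(n)]   # all n results are stored
as_gen = (i * 2 for i in range(n))    # results are produced on demand

# The list holds n references; the generator only holds its current state
print(sys.getsizeof(as_list))
print(sys.getsizeof(as_gen))

# Both produce the same sum
print(sum(as_list) == sum(as_gen))
```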


Generators do not generate a list in memory

In [14]:
even_numbers = (2*n for n in range(10))
even_numbers

<generator object <genexpr> at 0x7fd288702468>

therefore they can only be traversed once

In [15]:
for num in even_numbers:
    print(num)

0
2
4
6
8
10
12
14
16
18


the generator is empty after the first run

In [16]:
for num in even_numbers:
    print(num)

calling `next()` raises a `StopIteration` exception

In [17]:
even_numbers = (2*n for n in range(10))

while True:
    try:
        print(next(even_numbers))
    except StopIteration:
        break

0
2
4
6
8
10
12
14
16
18


In [18]:
# next(even_numbers)  # raises StopIteration

these are actually the defining properties of the **iteration protocol**

# Iteration protocol

A class satisfies the iteration protocol if:

1. it has an `__iter__` method that returns an iterator, which
2. has a `__next__` method (called `next` in Python 2) that
3. raises `StopIteration` once there are no more elements

For loops use the iteration protocol.

In [19]:
class MyIterator:
    def __init__(self):
        self.iter_no = 5
        
    def __iter__(self):
        return self
    
    def __next__(self):
        if self.iter_no <= 0:
            raise StopIteration()
        self.iter_no -= 1
        print("Returning {}".format(self.iter_no))
        return self.iter_no
    
myiter = MyIterator()

for i in myiter:
    print(i)

Returning 4
4
Returning 3
3
Returning 2
2
Returning 1
1
Returning 0
0
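
Roughly speaking, a `for` loop calls `iter()` on the object once, then calls `next()` repeatedly until `StopIteration` is raised. The sketch below spells this out by hand for a plain list; `numbers` and `collected` are illustrative names.

```python
numbers = [10, 20, 30]
collected = []

iterator = iter(numbers)        # calls numbers.__iter__()
while True:
    try:
        item = next(iterator)   # calls iterator.__next__()
    except StopIteration:
        break                   # this is how a for loop knows when to stop
    collected.append(item)

print(collected)  # [10, 20, 30]
```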


# Set and dict comprehension

Sets and dictionaries can be instantiated via generator expressions too.

A generator expression between curly brackets instantiates a set:

In [20]:
fruit_list = ["apple", "plum", "apple", "pear"]

fruits = {fruit.title() for fruit in fruit_list}

type(fruits), len(fruits), fruits

(set, 3, {'Apple', 'Pear', 'Plum'})

if the expression in the generator is a key-value pair separated by a colon, it instantiates a dictionary:

In [21]:
word_list = ["apple", "plum", "pear", "apple", "apple"]
word_length = {word: len(word) for word in word_list}
type(word_length), len(word_length), word_length

(dict, 3, {'apple': 5, 'pear': 4, 'plum': 4})

In [22]:
word_list = ["apple", "plum", "pear", "avocado"]
first_letters = {word[0]: word for word in word_list}
first_letters

{'a': 'avocado', 'p': 'pear'}

# `yield` keyword

- if a function uses `yield` instead of `return`, it becomes a **generator function**
- `yield` temporarily gives back the execution to the caller, along with the yielded value
- the generator function continues from where it left off when the next element is requested
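
The pause-and-resume behaviour can be made visible by recording what happens in which order. This is a minimal sketch; `countdown` and the `events` list are hypothetical names used only for illustration.

```python
events = []

def countdown(start):
    # Hypothetical generator function that logs its own progress
    events.append("generator started")
    while start > 0:
        yield start   # pause here and hand `start` to the caller
        events.append("resumed after {}".format(start))
        start -= 1

gen = countdown(2)
events.append("generator object created")  # the body has not run yet

first = next(gen)    # runs the body up to the first yield
second = next(gen)   # resumes right after the first yield

print(first, second)
print(events)
```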

In [23]:
def hungarian_vowels():
    alphabet = ("a", "á", "e", "é", "i", "í", "o", "ó",
                "ö", "ő", "u", "ú", "ü", "ű")
    for vowel in alphabet:
        yield vowel

this function returns a generator object

In [24]:
type(hungarian_vowels())

generator

In [25]:
for vowel in hungarian_vowels():
    print(vowel)

a
á
e
é
i
í
o
ó
ö
ő
u
ú
ü
ű


In [26]:
gen = hungarian_vowels()

print("first iteration: {}".format(", ".join(gen)))
print("second iteration: {}".format(", ".join(gen)))

first iteration: a, á, e, é, i, í, o, ó, ö, ő, u, ú, ü, ű
second iteration: 


The `next` function returns the next element of the generator.
A `StopIteration` is raised when no more elements are left:

In [27]:
gen = hungarian_vowels()

while True:
    try:
        print("The next element is {}".format(next(gen)))
    except StopIteration:
        print("No more elements left :(")
        break

The next element is a
The next element is á
The next element is e
The next element is é
The next element is i
The next element is í
The next element is o
The next element is ó
The next element is ö
The next element is ő
The next element is u
The next element is ú
The next element is ü
The next element is ű
No more elements left :(


the generator function returns a new generator object every time it's called

In [28]:
gen1 = hungarian_vowels()
gen2 = hungarian_vowels()

print(gen1 is gen2)
print("gen1 first time:", list(gen1))
print("gen1 second time:", list(gen1))
print("gen2 first time:", list(gen2))

False
gen1 first time: ['a', 'á', 'e', 'é', 'i', 'í', 'o', 'ó', 'ö', 'ő', 'u', 'ú', 'ü', 'ű']
gen1 second time: []
gen2 first time: ['a', 'á', 'e', 'é', 'i', 'í', 'o', 'ó', 'ö', 'ő', 'u', 'ú', 'ü', 'ű']


iterators can only be traversed forward, but we can easily wrap an iterator to have memory:

In [29]:
def iter_with_memory(orig_iter):
    prev = None
    for current in orig_iter:
        yield current, prev
        prev = current

In [30]:
for i in iter_with_memory(hungarian_vowels()):
    print(i)

('a', None)
('á', 'a')
('e', 'á')
('é', 'e')
('i', 'é')
('í', 'i')
('o', 'í')
('ó', 'o')
('ö', 'ó')
('ő', 'ö')
('u', 'ő')
('ú', 'u')
('ü', 'ú')
('ű', 'ü')


## Q. Add a `memory_size` parameter to the previous function which specifies how many of the previous elements are stored.

You can yield them in a list or better, wrap it in a class.
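
One possible solution sketch uses `collections.deque` with `maxlen`, which discards the oldest element automatically. The name `iter_with_memory_n` and the choice to yield the history as a list (oldest first) are assumptions, not a reference solution.

```python
from collections import deque

def iter_with_memory_n(orig_iter, memory_size=1):
    # Keep at most `memory_size` previous elements, oldest first
    memory = deque(maxlen=memory_size)
    for current in orig_iter:
        yield current, list(memory)
        memory.append(current)

for item in iter_with_memory_n("abcd", memory_size=2):
    print(item)
```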

# Exercises

Generator expressions can be particularly useful for formatted output. We will demonstrate this through a few examples.

In [31]:
numbers = [1, -2, 3, 1]

# print(", ".join(numbers))  # raises TypeError
print(", ".join(str(number) for number in numbers))

1, -2, 3, 1


In [32]:
shopping_list = ["apple", "plum", "pear"]

~~~
The shopping list is:
item 1: apple
item 2: plum
item 3: pear
~~~

In [33]:
shopping_list = ["apple", "plum", "pear"]

print("The shopping list is:\n{}".format(
    "\n".join("item {0}: {1}".format(idx+1, element) for idx, element in enumerate(shopping_list))
))

The shopping list is:
item 1: apple
item 2: plum
item 3: pear


## Q. Print the following shopping list with quantities.

For example:

~~~
item 1: apple, quantity: 2
item 2: pear, quantity: 1
~~~

In [34]:
shopping_list = {
    "apple": 2,
    "pear": 1,
    "plum": 5,
}
print("\n".join(
    "item {0}: {1}, quantity: {2}".format(idx+1, item, quantity)
    for idx, (item, quantity) in enumerate(shopping_list.items())
))

item 1: apple, quantity: 2
item 2: pear, quantity: 1
item 3: plum, quantity: 5


## Q. Print the same format in alphabetical order.

- Decreasing order by quantity

In [35]:
shopping_list = {
    "apple": 2,
    "pear": 1,
    "plum": 5,
}
print("\n".join("item {0}: {1}, quantity: {2}".format(idx+1, item, quantity)
                for idx, (item, quantity) in enumerate(sorted(shopping_list.items()))
))

item 1: apple, quantity: 2
item 2: pear, quantity: 1
item 3: plum, quantity: 5


In [36]:
print("\n".join(
    "item {0}: {1}, quantity: {2}".format(idx+1, item, quantity) for idx, (item, quantity) in
    enumerate(sorted(shopping_list.items(), key=lambda x: -x[1]))))

item 1: plum, quantity: 5
item 2: apple, quantity: 2
item 3: pear, quantity: 1


## Q. Print the list of students. 

In [37]:
students = [
    ["Joe", "John", "Mary"],
    ["Tina", "Tony", "Jeff", "Béla"],
    ["Pete", "Dave"],
]

## Q. Print one class-per-line and print the size of the class too

Example:
~~~
class 1, size: 3, students: Joe, John, Mary
class 2, size: 2, students: Pete, Dave
~~~

## Q. Sort the classes by size in increasing order

Example:
~~~
class 1, size: 2, students: Pete, Dave
class 2, size: 3, students: Joe, John, Mary
~~~
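
A possible solution sketch for both questions, in the same style as the shopping list examples. The helper name `format_classes` is hypothetical; classes are numbered by their position in the list that is passed in, so sorting first renumbers them.

```python
students = [
    ["Joe", "John", "Mary"],
    ["Tina", "Tony", "Jeff", "Béla"],
    ["Pete", "Dave"],
]

def format_classes(classes):
    # One line per class, numbered from 1
    return "\n".join(
        "class {0}, size: {1}, students: {2}".format(idx, len(group), ", ".join(group))
        for idx, group in enumerate(classes, start=1)
    )

print(format_classes(students))
print()
# sorted(..., key=len) orders the classes by size, smallest first
print(format_classes(sorted(students, key=len)))
```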

# Exception handling

- fully typed exception handling

In [38]:
try:
    int("abc")
except ValueError as e:
    print(type(e), e)
    print(e)

<class 'ValueError'> invalid literal for int() with base 10: 'abc'
invalid literal for int() with base 10: 'abc'


- more than one `except` clause may be defined
- they should be ordered from most specific to least specific

In [39]:
try:
    age = int(input())
    if age < 0:
        raise Exception("Age cannot be negative")
except ValueError as e:
    print("ValueError caught")
except Exception as e:
    print("Other exception caught: {}".format(type(e)))

12


### More than one type of exception can be handled in the same except clause

In [40]:
def age_printer(age):
    next_age = age + 1
    print("Next year your age will be " + next_age)
    
try:
    your_age = input()
    your_age = int(your_age)
    age_printer(your_age)
except ValueError:
    print("ValueError caught")
except TypeError:
    print("TypeError caught")

alma
ValueError caught


In [41]:
def age_printer(age):
    next_age = age + 1
    print("Next year your age will be " + next_age)
    
try:
    your_age = input()
    your_age = int(your_age)
    age_printer(your_age)
except (ValueError, TypeError) as e:
    print("{} caught".format(type(e).__name__))

12
TypeError caught


### except without an Exception type

- without specifying a type, `except` catches everything (even `KeyboardInterrupt` and `SystemExit`), and the exception object is not bound to a name

In [42]:
try:
    age = int(input())
    if age < 0:
        raise Exception("Age cannot be negative")
except ValueError:
    print("ValueError caught")
except:
#except Exception as e:
    print("Something else caught")

abc
ValueError caught


- the empty `except` must be the last except block since it blocks all others
- `SyntaxError` otherwise

In [43]:
try:
    age = int(input())
    if age < 0:
        raise Exception("Age cannot be negative")
#except:
    #print("Something else caught")
except ValueError:
    print("ValueError caught")

23


### Base class' except clauses catch derived classes too

In [44]:
try:
    age = int(input())
    if age < 0:
        raise Exception("Age cannot be negative")
except Exception as e:
    print("Exception caught: {}".format(type(e)))
except ValueError:
    print("ValueError caught")

abc
Exception caught: <class 'ValueError'>
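
Handlers are tried from top to bottom and the first matching one wins, which is why a base-class clause placed first shadows the more specific clauses below it. The sketch below puts the specific clause first instead; `first_matching_handler` is a hypothetical helper used only to show which clause runs.

```python
def first_matching_handler(exc):
    try:
        raise exc
    except ValueError:
        return "ValueError handler"
    except Exception:
        return "Exception handler"

# ValueError derives from Exception, so it would match either clause
print(issubclass(ValueError, Exception))            # True
print(first_matching_handler(ValueError("bad")))    # ValueError handler
print(first_matching_handler(KeyError("missing")))  # Exception handler
```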


### finally

- the `finally` block is guaranteed to run regardless of whether an exception was raised

In [45]:
try:
    age = int(input())
except Exception as e:
    print(type(e), e)
finally:
    print("this always runs")

abc
<class 'ValueError'> invalid literal for int() with base 10: 'abc'
this always runs


### else

- try-except blocks may have an else clause that **only** runs if no exception was raised

In [46]:
try:
    age = int(input())
except ValueError as e:
    print("Exception", e)
else:
    print("No exception was raised")
    # raise Exception("Raising an exception in else")
finally:
    print("this always runs")

12
No exception was raised
this always runs


### `raise` keyword

- `raise` throws/raises an exception
- a bare `raise` inside an `except` block re-raises the exception that is currently being handled

In [47]:
try:
    int("not a number")
except Exception:
    # important log message
    # raise
    pass
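
A typical use of a bare `raise` is to log an error and then let it continue to propagate. A minimal sketch, where `parse_number` and the `log` list are hypothetical stand-ins for real parsing and logging code:

```python
log = []

def parse_number(text):
    try:
        return int(text)
    except ValueError:
        log.append("could not parse {!r}".format(text))  # the important log message
        raise  # re-raise the same ValueError to the caller

try:
    parse_number("not a number")
except ValueError as e:
    print("caller saw:", e)

print(log)
```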

### Defining exceptions

- any type that subclasses `Exception` (`BaseException` to be exact) can be used as an exception object

In [48]:
class NegativeAgeError(Exception):
    pass

try:
    age = int(input())
    if age < 0:
        raise NegativeAgeError("Age cannot be negative. Invalid age: {}".format(age))
except NegativeAgeError as e:
    print(e)
except Exception as e:
    print("Something else happened. Caught {}, with message {}".format(type(e), e))

-11
Age cannot be negative. Invalid age: -11


Using exceptions for trial-and-error is considered Pythonic:

In [49]:
try:
    v = input()
    int(v)
except ValueError:
    print("not an int")
else:
    print("looks like an int")

12
looks like an int


# Context managers

- there are two types of resources: managed and unmanaged

## Managed resources

- resource acquisition and release are automatically done
- no need for manual resource management
- example: memory
  - C++ has both: stack memory is managed, but heap memory is not, so we need to call `new` and `delete` manually.

## Unmanaged resources

- unmanaged resources need explicit release
- otherwise the operating system may run out of the resource
- examples include files, network sockets

In [50]:
fh = []
while True:
    try:
        fh.append(open("abc.txt", "w"))
    except OSError:
        break
len(fh)

969

In [51]:
for f in fh:
    f.close()

- we need to close the file manually
- what happens when an exception occurs?

In [52]:
s1 = "important text"
fh = open("file.txt", "w")
# fh.write(s2)  # raises NameError
fh.close()

- if `write` raises an exception, `fh.close()` is never reached and the file descriptor **is leaked**
- a solution would be to use try-except blocks with `finally` clauses

In [53]:
from sys import stderr

fh = open("file.txt", "w")
try:
    fh.write(important_variable)
except Exception as e:
    stderr.write("{0} happened".format(type(e).__name__))
finally:
    print("Closing file")
    fh.close()

Closing file


NameError happened

## Context managers handle this automatically

- the `with` statement acquires a resource
- keeps it open until execution leaves the `with` block
- releases the resource regardless of whether an exception is raised

In [54]:
with open("file.txt", "w") as fh:
    fh.write("abc\n")
    # fh.write(important_variable)  # raises NameError
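
The file object's `closed` attribute makes it easy to check that the file really is closed after the block, even when an exception is raised inside it. This sketch writes to a temporary directory instead of the working directory; `path` is an illustrative name.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "file.txt")

try:
    with open(path, "w") as fh:
        fh.write("abc\n")
        raise ValueError("something went wrong inside the with block")
except ValueError:
    pass

# The name fh is still bound, but the file was closed when the block was left
print(fh.closed)  # True
```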

## Defining context managers

- any class can be a context manager if it implements:
  1. `__enter__`: runs at the beginning of the `with`. Returns the resource.
  1. `__exit__`: runs after the with block. Releases the resource.

In [55]:
class DummyContextManager:
    def __init__(self, value):
        self.value = value
        
    def __enter__(self):
        print("Dummy resource acquired")
        return self.value
    
    def __exit__(self, *args):
        print("Dummy resource released")
        
with DummyContextManager(42) as d:
    print("Resource: {}".format(d))

Dummy resource acquired
Resource: 42
Dummy resource released


`__exit__` takes 3 extra arguments that describe the exception: `exc_type`, `exc_value`, `traceback`

In [56]:
class DummyContextManager:
    def __init__(self, value):
        self.value = value
        
    def __enter__(self):
        print("Dummy resource acquired")
        return self.value
    
    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is not None:
            print("{0} with value {1} caught\n"
                  "Traceback: {2}".format(
                      exc_type, exc_value, traceback))
        print("Dummy resource released")
        
with DummyContextManager(42) as d:
    print(d)
    # raise ValueError("just because I can")  # __exit__ will be called anyway

Dummy resource acquired
42
Dummy resource released
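
When the `with` block does raise, `__exit__` receives the exception details, and the exception keeps propagating unless `__exit__` returns a true value. A minimal sketch, where `RecordingContextManager` is a hypothetical variant of the class above that stores what it receives:

```python
class RecordingContextManager:
    def __init__(self):
        self.seen = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.seen = exc_type
        # Returning None (falsy) lets the exception propagate to the caller

manager = RecordingContextManager()
try:
    with manager:
        raise ValueError("just because I can")
except ValueError:
    propagated = True
else:
    propagated = False

print(manager.seen, propagated)
```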
