# Advanced topics in Python programming

This notebook explores (slightly) more advanced elements of programming in Python. Among other things, we will look into
- functional programming 
- iterators
- documenting code 
- error handling

One aspect not addressed here is writing code that follows the PEP8 style guide, e.g. indentation and class/function naming. There are tools that check your code against it (often available as IDE extensions), for example [flake8](https://flake8.pycqa.org/en/latest/).

## Bits of functional programming

We have seen how to define a function, i.e. give it a name

```python
def identity(x): return x
```

However, sometimes we have use for a nameless function: enter **anonymous functions**. One example is sorting.

In [1]:
import random

# Let's generate random pairs with the idea of sorting them later
random_tuples = []
for i in range(10):
    random_tuples.append((random.random(), random.random()))
random_tuples

[(0.7518127197609292, 0.6174607083975878),
 (0.4115981594696888, 0.07303216359869236),
 (0.9390360730649631, 0.24915122361369646),
 (0.3319462980217811, 0.6122175533553326),
 (0.7548921538601876, 0.04918301480590781),
 (0.6803113087429741, 0.3338648303063745),
 (0.4320524001901337, 0.8606524038469604),
 (0.04109997598109416, 0.5309751379139265),
 (0.7370157321504252, 0.8598989017240944),
 (0.19781178243241992, 0.5006407091667007)]

In [2]:
# By default tuples are sorted lexicographically: first by their first elements, with ties broken by the rest, i.e.
sorted(random_tuples)

[(0.04109997598109416, 0.5309751379139265),
 (0.19781178243241992, 0.5006407091667007),
 (0.3319462980217811, 0.6122175533553326),
 (0.4115981594696888, 0.07303216359869236),
 (0.4320524001901337, 0.8606524038469604),
 (0.6803113087429741, 0.3338648303063745),
 (0.7370157321504252, 0.8598989017240944),
 (0.7518127197609292, 0.6174607083975878),
 (0.7548921538601876, 0.04918301480590781),
 (0.9390360730649631, 0.24915122361369646)]

In [3]:
# Treating them as points we might want to consider their l^2 norm
sorted(random_tuples, key=lambda t: (t[0]**2 + t[1]**2)**0.5)

[(0.4115981594696888, 0.07303216359869236),
 (0.04109997598109416, 0.5309751379139265),
 (0.19781178243241992, 0.5006407091667007),
 (0.3319462980217811, 0.6122175533553326),
 (0.7548921538601876, 0.04918301480590781),
 (0.6803113087429741, 0.3338648303063745),
 (0.4320524001901337, 0.8606524038469604),
 (0.9390360730649631, 0.24915122361369646),
 (0.7518127197609292, 0.6174607083975878),
 (0.7370157321504252, 0.8598989017240944)]

Here `lambda` is a keyword used for defining anonymous functions. It is followed by the arguments; above, the function accepts one argument (referred to as `t`). The function body, a single expression, follows after `:`. As a side note, the name comes from $\lambda$-calculus, invented by [Alonzo Church](https://en.wikipedia.org/wiki/Alonzo_Church).
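
To make the correspondence with named functions concrete, here is a minimal sketch that sorts a small hand-written list once with a named key function and once with the equivalent lambda. The data and the name `second` are made up for illustration.

```python
points = [(3, 1), (1, 2), (2, 0)]

def second(t):
    '''Return the second element of a tuple'''
    return t[1]

# Both calls sort by the second element of each tuple
by_named = sorted(points, key=second)
by_lambda = sorted(points, key=lambda t: t[1])
print(by_named == by_lambda)
```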

__When capturing variables, beware of late binding__

In [4]:
# The idea is that foos[1](x) returns x+1 
foos = [lambda x: x+n for n in range(5)]
# But ...
for f in foos:
    print(f(0))

4
4
4
4
4


The name `n` in the function body is looked up when the function is called (by then `n = 4`), not when the function is defined.

In [5]:
# Solution: bind the current value of n as a default argument (defaults are evaluated at definition time)
foos = [lambda x, n=n: x+n for n in range(5)]
for f in foos:
    print(f(0))

0
1
2
3
4
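
Besides the default-argument trick, the same effect can be obtained with a factory function, since each call to the factory creates its own scope for `n`. The sketch below uses a hypothetical helper `make_adder`.

```python
def make_adder(n):
    '''Return a function adding n to its argument'''
    # n is local to this particular call, so every adder keeps its own value
    return lambda x: x + n

adders = [make_adder(n) for n in range(5)]
results = [f(0) for f in adders]
print(results)
```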


Anonymous functions are often used to build **iterators**. The idea is to compute results on demand rather than all of them at once.

In [6]:
selected = filter(lambda p: p[0] < 0.5, random_tuples)
# Not the answers but ...
selected

<filter at 0x7f48dc718be0>

In [7]:
# The iterator needs to be advanced explicitly
next(selected)

(0.4115981594696888, 0.07303216359869236)

In [8]:
# or consumed
for item in selected:
    print(item)

(0.3319462980217811, 0.6122175533553326)
(0.4320524001901337, 0.8606524038469604)
(0.04109997598109416, 0.5309751379139265)
(0.19781178243241992, 0.5006407091667007)


In [9]:
# Note that we have now exhausted the iterator so that the following attempt to get the next item fails
next(selected)

StopIteration: 
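
If the exception is not wanted, `next` accepts a default value as a second argument, or the `StopIteration` can be caught. A small sketch on a made-up two-element iterator:

```python
numbers = iter([1, 2])

# The second argument of next is returned once the iterator is exhausted
drained = [next(numbers, None) for _ in range(3)]
print(drained)

# Alternatively, handle the exception explicitly
try:
    next(numbers)
except StopIteration:
    print('Nothing left')
```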

Iterators can be combined to build processing pipelines

In [10]:
# Keep only the elements in iterable for which the function is true
selected = filter(lambda p: p[0] < 0.5, random_tuples) 
# Apply sum function to all the elements in iterable
processed = map(sum, selected)  
processed

<map at 0x7f48dc219130>

What is the sum of such elements? One option is
```python
sum(list(processed))
```
`sum(processed)` would also work, but we want to showcase a nice module from the standard library, namely `functools`.

In [11]:
# Instead of consuming the iterator into a list, fold it with reduce
from functools import reduce
# reduce combines the first two items of the iterable; the result is the first argument
# in the next round while the other argument is the next item in the iterable
reduce(lambda x, y: x+y, processed)

3.9920265839767297
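
The folding order, and the optional initial value that `reduce` accepts as a third argument, can be seen on a small hand-written list. This is only a sketch; `operator.add` stands in for the lambda used above. Without the initial value, `reduce` raises `TypeError` on an empty iterable.

```python
from functools import reduce
from operator import add

# Evaluated as (1.5 + 2.5) + 3.0
total = reduce(add, [1.5, 2.5, 3.0])
# With an initial value, an empty iterable is fine and yields that value
empty_total = reduce(add, [], 0)
print(total, empty_total)
```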

**Food for thought:**
1. Could we use `reduce(sum, processed)` above?
2. What does `functools.partial` do?

Many useful iterators can be constructed using the standard library module `itertools`. Let's build a grid of Cartesian coordinates.

In [12]:
from itertools import product

x = range(1, 5)
y = range(4, 12)
grid = product(x, y)
# Get them all
print(list(grid))

[(1, 4), (1, 5), (1, 6), (1, 7), (1, 8), (1, 9), (1, 10), (1, 11), (2, 4), (2, 5), (2, 6), (2, 7), (2, 8), (2, 9), (2, 10), (2, 11), (3, 4), (3, 5), (3, 6), (3, 7), (3, 8), (3, 9), (3, 10), (3, 11), (4, 4), (4, 5), (4, 6), (4, 7), (4, 8), (4, 9), (4, 10), (4, 11)]
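
For comparison, `product` visits the pairs in the same order as a nested loop with the last iterable varying fastest. A quick check on smaller, made-up ranges:

```python
from itertools import product

x = range(1, 3)
y = range(4, 6)

# Nested comprehension: the inner loop over y runs fastest
nested = [(i, j) for i in x for j in y]
print(nested == list(product(x, y)))
```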


Another example of on-demand/lazy computation is **generators**.

In [13]:
def fibs():
    '''Generate Fibonacci numbers'''
    a, b = 0, 1
    while True:
        a, b = a+b, a
        yield a   # The yield keyword makes this function a generator

numbers = fibs()

In [17]:
# Let's get the next five
for i, num in zip(range(5), numbers):
    print(i, num)
# NOTE: zip pairs iterables into tuples, terminating when one of them is exhausted [range(5) determines this above]
# We can run this cell many times; each run continues where the previous one stopped.

0 987
1 1597
2 2584
3 4181
4 6765
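
An alternative to the `zip(range(...), ...)` idiom for taking a fixed number of items from an infinite generator is `itertools.islice`. The sketch below repeats the definition of `fibs` so that it is self-contained and starts from a fresh generator.

```python
from itertools import islice

def fibs():
    '''Generate Fibonacci numbers'''
    a, b = 0, 1
    while True:
        a, b = a+b, a
        yield a

# Take the first ten items of a fresh generator without consuming more
first_ten = list(islice(fibs(), 10))
print(first_ten)
```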


As a final generator example consider the following definition. Can you guess what the result is?

In [None]:
def count(n):
    yield n
    yield from count(n+1)  
# numbers = count(-10)
# for i in range(10):
#     print(next(numbers))

When we care about all the results of a pipeline it might be better/more explicit/readable to use **comprehensions**. Here we consider list and dictionary comprehensions.

In [20]:
with open('./data/file.txt', 'r') as stream:
    # NOTE: with invokes a context manager. We want to manage resources;
    # here open a file and then make sure that it is correctly closed no matter what
    # will happen during manipulation, e.g. some error 
    lines = [float(line.strip()) for i, line in enumerate(stream) if i % 2]
    
    # A Dictionary comprehension, create dict mapping row number to value
    d = {i: float(line.strip()) for i, line in enumerate(stream) if i % 2}

    
    
# To be compared with 
with open('./data/file.txt', 'r') as stream:
    iterator = map(lambda p: float(p[1].strip()), filter(lambda p: p[0] % 2, enumerate(stream)))
    lines_ = list(iterator)
# Check that they are the same. We will come back to the `assert` statement shortly
assert lines == lines_
(d, bool(lines))

({}, True)

**Food for thought:** 
1. Why is the dictionary empty while we clearly have lines as a non-empty list?
2. How does building a list with a for-loop compare in performance to a list comprehension? [Consider the `%%timeit` magic]

In [21]:
%%timeit

def f(string):
    return sum(map(ord, string))

f('IN3110')

318 ns ± 3.91 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
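
Outside of a notebook the `%%timeit` magic is not available, but the standard library module `timeit` offers similar measurements. The following sketch sets up the loop-versus-comprehension comparison; the function names and sizes are arbitrary choices, and the timings will depend on the machine.

```python
import timeit

def with_loop(n):
    '''Build a list of squares with an explicit loop'''
    result = []
    for i in range(n):
        result.append(i * i)
    return result

def with_comprehension(n):
    '''Build the same list with a list comprehension'''
    return [i * i for i in range(n)]

for func in (with_loop, with_comprehension):
    # Total time in seconds for 1000 calls
    seconds = timeit.timeit(lambda: func(1000), number=1000)
    print(f'{func.__name__}: {seconds:.4f} s for 1000 calls')
```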


## Writing cleaner functions

(Personal opinion) A good function 1) does what it is supposed to do, 2) does it quickly, 3) is user/developer friendly. Here we will focus on friendliness.

Python uses so-called duck typing, but we can express our intentions for the input arguments and function output with type annotations. These can be checked by `mypy` (but are not enforced at runtime).

```python
# Following is code included in factorial.py.
def factorial(n: int) -> int:
    if n == 0:
        return 1
    return n*factorial(n-1)

factorial('works?')
```

We run type analysis by
```bash
(in3110) mirok@evalApply:data|$ mypy factorial.py 
```
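
That annotations are only metadata can be checked directly at runtime. In this small sketch (the function `double` is made up for illustration) the annotations are stored on the function, yet a call with a string goes through without complaint.

```python
def double(x: int) -> int:
    return 2 * x

# Annotations are stored on the function object ...
annotations = double.__annotations__
print(annotations)
# ... but not checked when the function is called
unchecked = double('ab')
print(unchecked)
```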

The role of the arguments should be clarified in the docstring of a function (or class). Types can be part of the docstring. We can also include tests via [doctest](https://docs.python.org/3/library/doctest.html). Documentation, for example in the form of HTML pages, can be generated by [Sphinx](https://www.sphinx-doc.org/en/master/usage/quickstart.html). The following illustrates a docstring with some nonexhaustive tests. For examples of Google-style docstrings see [here](http://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html).

```python
# Part of factorial_doctest.py
def factorial(n: int) -> int:
    '''Return the factorial of n, an exact integer >= 0.

    Args:
       n (int):  n!

    Returns:
       int.  The factorial value::

    >>> factorial(5)
    120
    >>> factorial(0)
    1

    '''
    if n == 0:
        return 1
    return n*factorial(n-1)

```
To run the doctest we execute
```bash
(in3110) mirok@evalApply:data|$ python factorial_doctest.py -v
```
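
For this command to report anything, the script has to invoke `doctest` itself when run. The rest of `factorial_doctest.py` is not shown here; a minimal sketch of what such a file might contain (an assumption on our part) is the following.

```python
def factorial(n: int) -> int:
    '''Return the factorial of n, an exact integer >= 0.

    >>> factorial(5)
    120
    >>> factorial(0)
    1
    '''
    if n == 0:
        return 1
    return n*factorial(n-1)

if __name__ == '__main__':
    import doctest
    # Runs all examples found in docstrings of this module;
    # the -v command line flag switches on verbose reporting
    doctest.testmod()
```

Alternatively, `python -m doctest -v factorial_doctest.py` runs the docstring examples without any such block in the file.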

We will come back to testing when discussing Python package development in the next part.

Behaviour can be enforced via exceptions (and their handling). There are several predefined exception types, e.g. `ValueError`, `AssertionError`, `TypeError`. We can also define our own type.

In [None]:
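class MyError(Exception):
    # User-defined exceptions should derive from Exception rather than BaseException
    def __init__(self, msg):
        super().__init__(msg)
        self.msg = msg

    def __str__(self):
        return "MyError occurred with error message \"{}\"".format(self.msg)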

In [22]:
# This is contrived to illustrate the custom exceptions in action.

def factorial(n: int) -> int:
    '''Return the factorial of n, an exact integer >= 0.

    Args:
       n (int):  n!

    Returns:
       int.  The factorial value::

    >>> factorial(5)
    120
    >>> factorial(0)
    1
    >>> factorial(-1)
    Traceback (most recent call last):
        ...
    ValueError: Only non-negative inputs are expected
    '''
    # Raise AssertionError if the type is wrong
    assert isinstance(n, int)
    # Raise a different exception for negative integers
    if n < 0:
        raise ValueError('Only non-negative inputs are expected')
        
    if n == 42:
        raise MyError('This is not meant to be')
        
    if n == 0:
        return 1
    return n*factorial(n-1)

Handling the raised exceptions: each `except` clause below catches one exception type and recovers from it, while the `finally` clause runs in any case.

In [24]:
val = 3.4 #  32

try:
    f = factorial(val)
# For a non-integer we retry with the value rounded up
except AssertionError:
    from math import ceil
    n = ceil(val)
    print(f'Calling instead with {n}')
    f = factorial(n)
    
# Let's say that for negative we flip the sign
except ValueError:
    n = -val
    print(f'Calling instead with {n}')
    f = factorial(n)
    
except MyError as e:
    print('42!')
    
finally:
    # This block runs whether or not an exception was raised
    pass

Calling instead with 4
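
The order in which the clauses run can be traced with a small sketch. The function `safe_divide` is made up for illustration; it also uses the optional `else` clause, which runs only if no exception was raised.

```python
def safe_divide(a, b):
    '''Return a/b (or None on division by zero) and the clauses that ran'''
    visited = []
    try:
        result = a / b
    except ZeroDivisionError:
        visited.append('except')
        result = None
    else:
        # Only reached when the try block raised nothing
        visited.append('else')
    finally:
        # Always reached
        visited.append('finally')
    return result, visited

print(safe_divide(1, 2))
print(safe_divide(1, 0))
```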


## Modifying function behavior
By now we have written functions and we have seen functions that take in functions. What we want to do now is to write functions that return __modified__ functions. In our first example we write a function which augments the input function with timing information.

In [25]:
import time
from functools import wraps

def timeit(foo):
    '''Report execution time of foo'''
    @wraps(foo)
    def wrapper(*args, **kwargs):
        then = time.time()
        result = foo(*args, **kwargs)
        now = time.time()
        print(f'{foo.__name__} executed in {now-then} s')
        
        return result
    return wrapper

Here we use `@wraps` to preserve the metadata of `foo` (see below). Let's write the function to be timed.

In [26]:
def one_second(): 
    time.sleep(1)
    
print(one_second())
    
timed = timeit(one_second)
# As a **Food for thought** omit the @wraps decorator above and consider what happens with timed.__name__
timed.__name__

None


'one_second'

In [27]:
print(timed())

one_second executed in 1.0011653900146484 s
None


Syntactic sugar for applying `timeit` is `@`

In [28]:
@timeit
def one_second(): 
    time.sleep(1)
    
print(one_second())

one_second executed in 1.0013813972473145 s
None
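
To spell out the equivalence: placing `@decorator` above a definition rebinds the name to `decorator(function)`. The sketch below applies a hypothetical `counted` decorator by hand, which keeps the original function accessible, and shows that `@wraps` also records the original under `__wrapped__`.

```python
from functools import wraps

def counted(foo):
    '''Count how many times foo is called'''
    @wraps(foo)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return foo(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

def add(a, b):
    return a + b

# Same effect as writing @counted above the definition of add,
# except that the undecorated function keeps its own name here
counted_add = counted(add)
counted_add(1, 2)
counted_add(3, 4)
print(counted_add.calls, counted_add.__name__, counted_add.__wrapped__ is add)
```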


Memoization is a technique for caching the function's return value for given input such that it does not need to be computed again. We can test the idea with the `functools.lru_cache` decorator.

In [29]:
from functools import lru_cache


def slow_factorial(n):
    '''Factorial by recursion'''
    if n == 0:
        return 1
    return n*slow_factorial(n-1)

@lru_cache
def faster_factorial(n):
    return slow_factorial(n)

Let's see about the speed

In [30]:
%timeit slow_factorial(10)

825 ns ± 16.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


In [31]:
%timeit faster_factorial(10)

48.9 ns ± 0.553 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
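
Note that in the cell above only the outermost call is cached, because the recursion goes through `slow_factorial`. If the recursive function itself is decorated, intermediate results are cached as well; this can be inspected with `cache_info()`. A sketch with a made-up name `cached_factorial`:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def cached_factorial(n):
    '''Factorial by recursion through the cached function'''
    if n == 0:
        return 1
    return n * cached_factorial(n - 1)

cached_factorial(10)   # fills the cache for 0, 1, ..., 10
cached_factorial(12)   # computes 12 and 11, then reuses the entry for 10
info = cached_factorial.cache_info()
print(info)
```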


As an exercise let's write the cache decorator ourselves. However, unlike `@timeit`, we want a decorator which takes an argument: the cache size. Note that `@lru_cache` supports this too (via `maxsize`). We base our cache on a dictionary.

In [32]:
class Cache(dict):
    def __init__(self, size):
        super().__init__()
        self.size = size

    def __setitem__(self, key, value):
        # Make room
        if len(self) >= self.size:
            # Grab the oldest key (dicts preserve insertion order);
            # use a separate name so that the new key is not overwritten
            oldest = next(iter(self))
            # and remove the entry
            self.pop(oldest)
        # Set it via parent (dict class)
        super().__setitem__(key, value)

Recall that a decorator with arguments is applied as `decorator(arguments)(function)`. That is, `decorator(arguments)` must return a function.
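
The three levels of nesting are perhaps easier to see on a smaller example first. The decorator `repeat` below is made up for illustration; it calls the function a given number of times and collects the results.

```python
from functools import wraps

def repeat(times):
    '''Decorator factory: times is the decorator argument'''
    def decorate(foo):
        # decorate is the actual decorator; it receives the function
        @wraps(foo)
        def wrapper(*args, **kwargs):
            return [foo(*args, **kwargs) for _ in range(times)]
        return wrapper
    return decorate

@repeat(3)   # same as greet = repeat(3)(greet)
def greet(name):
    return f'Hello {name}'

print(greet('IN3110'))
```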

In [33]:
from functools import wraps

def cache(size):
    '''Memoize'''
    
    def decorate(foo):
        memory = Cache(size)
    
        @wraps(foo)
        def wrapper(*args):
            # Lookup arguments. NOTE: here we only assumed positional arguments
            if args in memory:
                return memory[args]
            # Compute and remember
            result = foo(*args)
            memory[args] = result
            return result
        return wrapper
    return decorate

@cache(10)
def faster_factorial2(n):
    return slow_factorial(n)

In [34]:
%timeit faster_factorial2(10)

165 ns ± 0.0774 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


Another useful decorator is `@property`, used in class definitions.

In [35]:
class UnixName:
    '''Max 8 characters'''
    def __init__(self, name):
        self.name = name  # NOTE: here we're calling the setter
 
    # Get
    @property
    def name(self):
        return self._name
 
    # Set
    @name.setter
    def name(self, name):
        if len(name) > 8:
            name = name[:8]
        self._name = name

(UnixName('Miro').name, UnixName('Jawaharlal').name)

('Miro', 'Jawaharl')
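
Two further uses of `@property` are validation that rejects bad input instead of silently fixing it, and read-only computed attributes (a property without a setter). A sketch with a made-up class `Circle`:

```python
import math

class Circle:
    def __init__(self, radius):
        self.radius = radius  # goes through the setter below

    @property
    def radius(self):
        return self._radius

    @radius.setter
    def radius(self, value):
        if value < 0:
            raise ValueError('Radius must be non-negative')
        self._radius = value

    @property
    def area(self):
        # Computed on access; no setter is defined, so it is read-only
        return math.pi * self._radius**2

circle = Circle(2)
print(circle.area)
```

Assigning to `circle.area` raises `AttributeError`, while `Circle(-1)` raises the `ValueError` from the setter.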