# Advanced Coding

## Introduction

This chapter covers some more advanced programming concepts. It's not strictly necessary to master the content of this chapter for subsequent chapters but it's here in case you want a deeper understanding or in case you find that you eventually need to draw on more sophisticated programming tools and concepts.

```{note}
You can safely skip this chapter and come back to it later.
```

This chapter has benefitted from the online book [*Research Software Engineering with Python*](https://merely-useful.github.io/py-rse/) and the [official Python documentation](https://www.python.org/).

## Truthy and falsy values

Python objects can be used in expressions that will return a boolean value, such as when a list, `listy`, is used with `if listy`. Built-in Python objects that are empty are usually evaluated as `False`, and are said to be 'falsy'. In contrast, when these built-in objects are not empty, they evaluate as `True` and are said to be 'truthy'.

(If you are building your own classes, you can define this behaviour for them through the `__bool__` dunder method.)
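
As a minimal sketch of that idea, the class below defines `__bool__` so that its instances are truthy only when they hold something. The `Basket` class and its `items` attribute are hypothetical names invented purely for illustration.

```python
class Basket:
    """Hypothetical container used only to illustrate __bool__."""

    def __init__(self, items=None):
        self.items = list(items) if items is not None else []

    def __bool__(self):
        # A basket counts as truthy only when it holds at least one item
        return len(self.items) > 0


empty_basket = Basket()
full_basket = Basket(["apple", "pear"])

print(bool(empty_basket))  # False
print(bool(full_basket))  # True
```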

Let's see some examples:

In [None]:
def bool_check_var(input_variable):
    if not (input_variable):
        print('Falsy')
    else:
        print('Truthy')

listy = []
other_listy = [1, 2, 3]


bool_check_var(listy)

In [None]:
bool_check_var(other_listy)

The function we defined doesn't just operate on lists; it'll work for many other truthy and falsy objects:

In [None]:
bool_check_var(0)

In [None]:
bool_check_var([0, 0, 0])

Note that zero is falsy (it's the 'nothing' of the numeric types), but a list of three zeros is not an empty list, so it evaluates as truthy.

In [None]:
bool_check_var({})

In [None]:
bool_check_var(None)

Knowing what is truthy or falsy is useful in practice; imagine you'd like to default to a specific behaviour if a list called `list_vals` doesn't have any values in it. You now know you can do this simply with `if not list_vals`.
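
Here is one way that default-on-empty pattern might look. The function name `describe_values` and the messages it returns are assumptions made up for this sketch; note that `None` is also falsy, so it would take the default branch too.

```python
def describe_values(list_vals):
    """Hypothetical helper: fall back to a default when the list is empty."""
    if not list_vals:
        # An empty list is falsy, so this branch handles the default case
        return "no values supplied"
    return f"{len(list_vals)} value(s) supplied"


print(describe_values([]))
print(describe_values([3, 1, 4]))
```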

## Iterators

An iterator is an object that contains a countable number of values that a single built-in function, `next`, steps through one at a time. Before that's possible though, we need to take a countable group of some kind and call the built-in `iter` function on it to turn it into an iterator. Let's see an example with some text:

In [None]:
text_lst = ["Mumbai", "Delhi", "Bangalore"]

myiterator = iter(text_lst)

Okay, nothing has happened yet, but that's because we didn't call it yet. To get the next iteration, whatever it is, use `next`:

In [None]:
next(myiterator)

In [None]:
next(myiterator)

In [None]:
next(myiterator)

Alright, we've been through all of the values so... what's going to happen `next`!?

In [None]:
next(myiterator)

Iterating beyond the end raises a `StopIteration` exception. You can build your own iterators (here we used a built-in object type, the `list`, to give an iterator of type `list_iterator`).
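
To give a flavour of building your own iterator, here is a minimal sketch of a class that implements the two methods Python's iterator protocol expects, `__iter__` and `__next__`. The `Countdown` class is a hypothetical example, not something from the standard library.

```python
class Countdown:
    """Hypothetical iterator that counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self):
        if self.current <= 0:
            # Signal that there are no more values
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
print(next(countdown))  # 3
print(list(countdown))  # [2, 1]
```

Constructs such as `for` loops and `list` handle the `StopIteration` for you, which is why the final line finishes cleanly rather than raising.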

## Generators

Generator functions return 'lazy' iterators. They are lazy because they produce values one at a time, on demand, rather than storing all of their contents in memory. This has *big* advantages for some operations in specific situations: datasets larger than can fit into your computer's memory, or a complex function that needs to maintain an internal state every time it's called.

To give an idea of how and when they work, imagine that (exogenously) integers are really costly, taking as much as 10 MB of space to store (the real figure is more like 28 bytes for a small integer in CPython). We will write a function that returns the first $n$ non-negative integers, where $n$ is large. The most naive possible way of doing this would be to build the full list in memory like so:

In [None]:
def first_n_naive(n):
    """Build and return a list"""
    num, nums = 0, []
    while num < n:
        nums.append(num)
        num += 1
    return nums


sum_of_first_n = sum(first_n_naive(1000000))
sum_of_first_n

Note that `nums` stores *every* number before returning all of them. In our imagined case, this is completely infeasible because we don't have enough computer space to keep all $n$ 10 MB integers in memory.

Now we'll rewrite the list-based function as a generator-based function:

In [None]:
def first_n_generator(n):
    """A generator that yields items instead of returning a list"""
    num = 0
    while num < n:
        yield num
        num += 1

sum_of_first_n = sum(first_n_generator(1000000))
sum_of_first_n

Now, instead of creating an enormous list that has to be stored in memory, we `yield` up each number as it is 'generated'. The cleverness here is that the 'state' of the function is remembered from one call to the next. This means that when `next` is called on a generator object (either explicitly or implicitly, as in this example, where `sum` does it for us), the function resumes just after the `yield`: `num` is incremented, and then yielded again.
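
To make the remembered state visible, the sketch below calls `next` explicitly on a small generator. The function `count_up_to` is a hypothetical stand-in with the same body as `first_n_generator`, redefined here so the snippet runs on its own.

```python
def count_up_to(n):
    """Hypothetical generator with the same logic as first_n_generator."""
    num = 0
    while num < n:
        yield num
        num += 1


gen = count_up_to(3)
print(next(gen))  # 0
print(next(gen))  # 1
print(next(gen))  # 2
# A further next(gen) would raise StopIteration
```

Each call picks up where the previous one paused, so `num` keeps its value between calls.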

That was a fairly contrived example but there are plenty of practical ones. Working with pipelines that process very large datasets is a classic use case. For example, imagine you have a CSV file that's far too big to fit in memory, i.e. to open all at once, but you'd like to check the contents of each row and perhaps process them. The code below would `yield` each row in turn.

```python
def csv_reader(file_name):
    # The `with` block closes the file once the generator has finished
    with open(file_name, "r") as file:
        for row in file:
            yield row
```

An even more concise way of defining this is via a *generator expression*, which syntactically looks a lot like a *list comprehension* but is a generator rather than a list. The example we just saw would be written as:

```python
csv_gen = (row for row in open(file_name))
```

It's easier to see the difference in the example below, which shows the analogy between *list comprehensions* and *generator expressions*.

In [None]:
sq_nums_lc = [num**2 for num in range(2, 6)]
sq_nums_lc

In [None]:
sq_nums_gc = (num**2 for num in range(2, 6))
sq_nums_gc

The latter is a generator object, and we can access its individual values by calling `next` on it (or by looping over it).

In [None]:
next(sq_nums_gc)

Note that for small numbers of entries, lists may actually be faster and more efficient than generators, but for large numbers of entries, generators will almost always win out on memory use.
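
A rough way to see the memory difference is to compare the size of the two container objects with `sys.getsizeof`. This is only a sketch: `getsizeof` measures the container itself rather than the integers it refers to, the exact figures vary between Python versions and platforms, and the value of `n` is an arbitrary choice.

```python
import sys

n = 100_000
squares_list = [num**2 for num in range(n)]
squares_gen = (num**2 for num in range(n))

# getsizeof reports the size of the container object itself,
# not of the integers it refers to
list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(f"list: {list_size} bytes, generator: {gen_size} bytes")

# Both give the same total; summing exhausts the generator
print(sum(squares_list) == sum(squares_gen))  # True
```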

## Inner functions

Python supports functions within functions within ... (and so on). Here's a simple (if unnecessary!) example:

In [None]:
from datetime import datetime


def print_time_now():
    def get_curr_time():
        return datetime.now().strftime("%H:%M")
    
    now = get_curr_time()
    print(now)


print_time_now()

## Decorators

Decorators 'decorate' functions: they adorn them, modifying their behaviour when they execute. Let's say we want to run some numerical functions but we'd like to add ten on to whatever results we get. We could do it like this:

In [None]:
def multiply(num_one, num_two):
    return num_one*num_two


def add_ten(in_num):
    return in_num + 10


answer = add_ten(multiply(3, 4))
answer

This is fine for a one-off but a bit tedious if we're going to be using `add_ten` a lot, and on many functions. Decorators allow for a more general solution that can be applied, in this case, to any function that takes two arguments and returns a numeric value.

In [None]:
def add_ten(func):
    def inner(a, b):
        return func(a, b) + 10
    return inner


@add_ten
def multiply(num_one, num_two):
    return num_one*num_two


multiply(3, 4)

We can use the same decorator for a different function (albeit one of the same form) now.

In [None]:
@add_ten
def divide(num_one, num_two):
    return num_one/num_two


divide(10, 5)

But the magic of decorators is such that we can define them for much more general cases, regardless of the number of arguments or even keyword arguments:

In [None]:
def add_ten(func):
    def inner(*args, **kwargs):
        print('Function has been decorated!')
        print('Adding ten...')
        return func(*args, **kwargs) + 10
    return inner


@add_ten
def combine_three_nums(a, b, c):
    return a*b-c

@add_ten
def combine_four_nums(a, b, c, d=0):
    return a*b-c-d


combine_three_nums(1, 2, 2)

Let's now see it applied to a function with a different number of (keyword) arguments:

In [None]:
combine_four_nums(3, 4, 2, d=2)
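
One side effect of wrapping a function this way is that the decorated function reports the wrapper's name (`inner`) rather than its own. The standard library's `functools.wraps` is commonly used to carry the original name and docstring across. The sketch below shows this; `add_ten_wrapped` and `combine_three` are hypothetical names chosen so that the definitions above are left untouched.

```python
from functools import wraps


def add_ten_wrapped(func):
    @wraps(func)  # copy the name and docstring of func onto inner
    def inner(*args, **kwargs):
        return func(*args, **kwargs) + 10
    return inner


@add_ten_wrapped
def combine_three(a, b, c):
    """Return a*b - c."""
    return a * b - c


print(combine_three(1, 2, 2))  # 10
print(combine_three.__name__)  # combine_three
```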

Decorators can be chained too (and order matters):

In [None]:
def dividing_line(func):
    def inner(*args, **kwargs):
        print('-' * 30)
        out = func(*args, **kwargs)
        return out
    return inner


@dividing_line
@add_ten
def multiply(num_one, num_two):
    return num_one*num_two


multiply(3, 5)
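
To see why order matters, here is a small sketch using two decorators that both change the result. The decorator closest to the function is applied first. The names `add_ten_quiet` (a copy of `add_ten` without the print statements) and `double` are hypothetical and introduced only for this illustration.

```python
def add_ten_quiet(func):
    def inner(*args, **kwargs):
        return func(*args, **kwargs) + 10
    return inner


def double(func):
    def inner(*args, **kwargs):
        return func(*args, **kwargs) * 2
    return inner


@double
@add_ten_quiet
def add_then_double(a, b):
    return a * b


@add_ten_quiet
@double
def double_then_add(a, b):
    return a * b


print(add_then_double(3, 5))  # (15 + 10) * 2 = 50
print(double_then_add(3, 5))  # (15 * 2) + 10 = 40
```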

## Errors and exceptions

When a programme goes wrong, it throws up an error and halts. You won't be coding for long before you hit one of these errors, which have special names depending on what triggered them.

Let's see a real-life error in action:

In [None]:
denom = 0

print(1/denom)

Oh no! We got a `ZeroDivisionError` and our programme crashed. Note that the error includes a 'Traceback' to show which line went wrong, which is helpful for debugging.

In practice, there are often times when we know that an error *could* arise, and we would like to specify what should happen when it does (rather than having the programme crash). 

We can use *exceptions* to do this. These come in a `try` ... `except` pattern, which looks like an `if` ... `else` pattern but applies to errors. If no errors occur inside the `try` block, the `except` block isn’t run but *if* something goes wrong inside the `try` then the `except` block is executed. Let's see an example:

In [None]:
for denom in [-5, 0, 5]:
    try:
        result = 1/denom
        print(f'1/{denom} == {result}')
    except ZeroDivisionError:
        print(f'Cannot divide by {denom}')

Now we can see two differences. First: the code executed just fine *without* halting. Second: when we hit the error, the `except` block was executed and told us what was going on.

In this case, we wrote an informative message about the error, but it's convenient to use Python's built-in messages where we can. In the example below, we not only print our own message but also include Python's own description of what caused the error:

In [None]:
for denom in [-5, 0, 5]:
    try:
        result = 1/denom
        print(f'1/{denom} == {result}')
    except Exception as error:
        print(f'{denom} has no reciprocal; error is: {error}')

Sadly, division by zero is just one of the many errors you might encounter. What if a function is likely to end up running into several different errors? We can have multiple `except` clauses to catch these:

In [None]:
numbers = [-5, 0, 5]
for i in [0, 1, 2, 3]:
    try:
        denom = numbers[i]
        result = 1/denom
        print(f'1/{denom} == {result}')
    except IndexError as error:
        print(f'index {i} out of range; error is {error}')
    except ZeroDivisionError as error:
        print(f'{denom} has no reciprocal; error is: {error}')

A full list of built-in errors may be [found here](https://docs.python.org/3/library/exceptions.html#exception-hierarchy). They are organised in a class hierarchy (e.g. `ZeroDivisionError` is a special case of an `ArithmeticError`).
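
The nesting can be checked directly with `issubclass`, and it has a practical consequence: an `except` clause for a parent class also catches its children. A minimal sketch (the variable name `caught_name` is just for illustration):

```python
print(issubclass(ZeroDivisionError, ArithmeticError))  # True
print(issubclass(ArithmeticError, Exception))  # True

caught_name = None
try:
    1 / 0
except ArithmeticError as error:
    # ZeroDivisionError is caught here because it subclasses ArithmeticError
    caught_name = type(error).__name__

print(caught_name)  # ZeroDivisionError
```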

Where do these errors come from anyway? What tells the programming language to throw a tantrum when it encounters certain combinations of values and operations?

The answer is that the person or people who wrote the code that's 'under the hood' can specify when such errors should be raised. Remember, the philosophy of Python is that things should fail loudly (so that they do not cause issues downstream). Here's an example of some code that raises its own errors using the `raise` keyword:

In [None]:
for number in [1, 0, -1]:
    try:
        if number < 0:
            raise ValueError(f'no negatives: {number}')
        print(number)
    except ValueError as error:
        print(f'exception: {error}')

A `ValueError` is a built-in type of error, and there are plenty of other built-in errors to choose from to suit your case. Some big or specialised libraries define their own types of error too.
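
Defining your own error type is usually just a matter of subclassing an existing exception. The sketch below is one possible approach; `NegativeNumberError` and `check_non_negative` are hypothetical names, and subclassing `ValueError` is a design choice that lets existing `except ValueError` clauses still catch it.

```python
class NegativeNumberError(ValueError):
    """Hypothetical custom error for values that must not be negative."""


def check_non_negative(number):
    if number < 0:
        raise NegativeNumberError(f'no negatives: {number}')
    return number


for number in [1, 0, -1]:
    try:
        print(check_non_negative(number))
    except NegativeNumberError as error:
        print(f'exception: {error}')
```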

One very clever feature of Python's exception handling is "throw low, catch high", which means that even if an error is raised deep inside a chain of function calls, it can be caught much further up, in the code that called them. Here's an example: the error arises *within* the `sum_reciprocals` function, but is caught elsewhere.

In [None]:
def sum_reciprocals(values):
    result = 0
    for v in values:
        result += 1/v
    return result

numbers = [-1, 0, 1]
try:
    one_over = sum_reciprocals(numbers)
except ArithmeticError as error:
    print(f'Error trying to sum reciprocals: {error}')