# D3 - 01 - Exceptions, distributions, models

## Content
- How does Python handle exceptions?
- Sampling distributions with `numpy.random`
- Visualising distributions with `seaborn` and `pandas`
- Fitting polynomial models with `numpy.polyfit`
- A simple approach to numerical quadrature

## Prerequisites

```bash
conda install seaborn pandas
```

## Reminder: Jupyter notebooks
- To run the currently highlighted cell, hold <kbd>&#x21E7; Shift</kbd> and press <kbd>&#x23ce; Enter</kbd>.
- To get help for a specific function, place the cursor within the function's brackets, hold <kbd>&#x21E7; Shift</kbd>, and press <kbd>&#x21E5; Tab</kbd>.

## A notebook "preamble"
The first code block prepares our notebook by specifying how to render plots and importing the required packages.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

## Exceptions
When Python encounters a **problem**, an `exception` is `raise`d:

In [None]:
assert False, 'you shall not pass'

print('goal reached?')

These `exception`s can be caught and dealt with:

In [None]:
try:
    assert False, 'you shall not pass'
except:
    pass

print('goal reached?')

In [None]:
try:
    assert False, 'you shall not pass'
except Exception as e:
    print('Exception message:', e)

print('goal reached?')

Python knows many different types of exceptions for specific situations:

In [None]:
try:
    assert False, 'you shall not pass'
except ValueError:
    print('This catches not')
except AssertionError:
    print('But this does')

In [None]:
try:
    assert False, 'you shall not pass'
except Exception as e:
    print('Exception message:', e)

print('goal reached!')

The `finally` clause always runs, whether or not an `exception` occurred, which makes it the right place to clean up:

In [None]:
try:
    assert False, 'you shall not pass'
except ValueError as e:
    print('Exception message:', e)
finally:
    print('This will still run')

print('goal reached?')

In [None]:
def func(parameter):
    try:
        assert parameter, 'parameter is False'
    except AssertionError as e:
        print(e)
        return False
    finally:
        print('This WILL run...')
    return True

func(True)

In [None]:
func(False)
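
A typical use of `finally` is to release a resource on every path out of a function. The following is a minimal sketch; the `Resource` class and the `use_resource` function are hypothetical names introduced only for illustration:

```python
class Resource:
    """Hypothetical stand-in for something that must be released (file, connection, ...)."""

    def __init__(self):
        self.is_open = True

    def close(self):
        self.is_open = False


def use_resource(resource, fail):
    try:
        if fail:
            raise RuntimeError('something went wrong')
        return 'ok'
    except RuntimeError as e:
        return 'failed: %s' % e
    finally:
        # runs on both paths, even though each of them returns earlier
        resource.close()


r1, r2 = Resource(), Resource()
print(use_resource(r1, fail=False), r1.is_open)
print(use_resource(r2, fail=True), r2.is_open)
```

In both calls the resource ends up closed, because the `finally` block runs before the function actually returns.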

**Example**: catching an **expected** `exception` to save an `if` clause:

In [None]:
def func(parameter):
    print(parameter[0])

func(None)

In [None]:
def func(parameter):
    try:
        print(parameter[0])
    except TypeError:
        print(parameter)

func(None)

This is how you `raise` an `exception` on your own:

In [None]:
def func(parameter):
    if parameter is None:
        raise ValueError('I am not dealing with None!')
    print(parameter)

func(None)
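
The message passed to the exception is available to whoever catches it. A small sketch, assuming a hypothetical helper `checked_sqrt` that is not part of this notebook:

```python
def checked_sqrt(value):
    # hypothetical helper: refuse negative input instead of returning a complex number
    if value < 0:
        raise ValueError('cannot take the square root of %r' % value)
    return value ** 0.5


try:
    checked_sqrt(-1)
except ValueError as e:
    message = str(e)  # the text we passed to ValueError(...)
    print('caught:', message)
```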

**Exercise**: implement `scalar_product(a, b)` such that a `ValueError` is raised if `len(a) != len(b)`.

**Exercise**: modify the `linear_regression(x_values, y_values)` function below such that a `ValueError` raised by `scalar_product(a, b)` is caught; in that case, the function should return `None, None`.

In [None]:
def linear_regression(x_values, y_values):
    x_mean, y_mean = np.mean(x_values), np.mean(y_values)
    x = np.asarray(x_values) - x_mean
    y = np.asarray(y_values) - y_mean
    slope = scalar_product(x, y) / np.sum(x**2)
    const = y_mean - slope * x_mean
    return slope, const

x = [10, 14, 16, 15, 16, 20]
y = [ 1,  3,  5,  6,  5, 11]
linear_regression(x, y[:-1])

Remember, each `exception` type is designed for a specific purpose, and only a matching `except` clause will catch it:

In [None]:
a = list(range(5))
print(a[100])

In [None]:
try:
    print(a[100])
except ValueError:
    print('ValueError')
except AssertionError:
    print('AssertionError')
except IndexError:
    print('IndexError')

In [None]:
try:
    print(a[100])
except Exception:
    print('Exception')
except IndexError:
    print('IndexError')
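
The `except` clauses are checked from top to bottom and the first one that matches wins; a clause matches the named class and all of its subclasses. The sketch below illustrates this with a hypothetical helper `classify`, which raises the given exception and reports which handler caught it:

```python
def classify(error):
    # hypothetical helper: raise the given exception and report the matching handler
    try:
        raise error
    except IndexError:
        return 'IndexError'
    except LookupError:   # parent class of IndexError and KeyError
        return 'LookupError'
    except Exception:     # parent class of (almost) all built-in exceptions
        return 'Exception'


print(classify(IndexError()))
print(classify(KeyError()))
print(classify(ValueError()))
print(issubclass(IndexError, Exception))
```

This is why the most specific handlers should come first: a leading `except Exception` would shadow every handler below it.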

Let's write our own exception as a subclass of `IndexError`:

In [None]:
class FancyIndexError(IndexError):
    def __init__(self, message=None):
        super().__init__(message)

try:
    raise FancyIndexError()
except IndexError:
    print('normal')
except FancyIndexError:
    print('fancy')

In [None]:
try:
    raise IndexError()
except FancyIndexError:
    print('fancy')
except IndexError:
    print('normal')

## Distributions
Let's draw samples from some simple distributions with `numpy.random` and visualise them. `seaborn` is a high-level interface to `matplotlib` that simplifies **standard situations**:

In [None]:
a = np.random.rand(10000)

sns.kdeplot(a, fill=True)

print(np.mean(a), np.std(a))

In [None]:
a = np.random.randn(10000)

sns.kdeplot(a, fill=True)

print(np.mean(a), np.std(a))

In [None]:
a = np.random.randn(10000) * 15 + 100

sns.kdeplot(a, fill=True)

print(np.mean(a), np.std(a))
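
Scaling and shifting standard normal samples is equivalent to sampling a normal distribution with the corresponding mean and standard deviation. As a sketch, the same kind of samples can also be drawn with NumPy's `Generator` interface, which additionally makes the result reproducible; the seed value `42` is an arbitrary assumption:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # the seed is an arbitrary choice

# two equivalent ways to sample a normal distribution with mean 100 and std 15
a = rng.normal(loc=100, scale=15, size=10000)
b = rng.standard_normal(10000) * 15 + 100

print(np.mean(a), np.std(a))
print(np.mean(b), np.std(b))
```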

In [None]:
for k in range(2, 6):
    sns.kdeplot(
        np.random.randn(10**k),
        label='10^%d' % k)
plt.legend()

We can also create/visualise two-dimensional distributions:

In [None]:
x, y = np.random.rand(2, 500)
sns.kdeplot(x=x, y=y)

In [None]:
x, y = np.random.randn(2, 500)
sns.kdeplot(x=x, y=y)

In [None]:
state = np.random.choice(2, 500)
mean = np.asarray([[0, 0], [1, 2]])
x, y = np.random.randn(2, state.size) * 0.1 + mean[state].T
sns.kdeplot(x=x, y=y)
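
The expression `mean[state].T` picks one centre per sample and transposes the result so that it lines up with the two rows of noise. A minimal sketch with only four samples (the hand-picked `state` array is an assumption for illustration) makes the shapes visible:

```python
import numpy as np

state = np.array([0, 1, 1, 0])        # component index for each of 4 samples
mean = np.asarray([[0, 0], [1, 2]])   # one (x, y) centre per component

centres = mean[state]                 # shape (4, 2): one centre per sample
print(centres.shape)
print(centres.T)                      # shape (2, 4): row 0 holds x centres, row 1 holds y centres

noise = np.random.randn(2, state.size) * 0.1
x, y = noise + centres.T              # unpack the two rows into x and y
print(x.shape, y.shape)
```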

The `seaborn.jointplot()` function shows you the joint distribution as well as the marginals:

In [None]:
sns.jointplot(x=x, y=y, kind='kde')

In [None]:
state = np.random.choice(2, 500)
mean = np.asarray([[0, 0], [0, 3]])
x, y = np.random.randn(2, state.size) * 0.1 + mean[state].T
x *= 50
y += np.sqrt(np.abs(x))
sns.kdeplot(x=x, y=y)

In [None]:
sns.jointplot(x=x, y=y, kind='kde')

## Plotting with `seaborn` and `pandas`
Now we have a look at `pandas`: this tool helps to organise and clean data, and `seaborn` depends on this package for its internal data organisation.

Using `seaborn` + `pandas` we can make plots like this:

In [None]:
data = pd.DataFrame(dict(x=[10, 14, 16, 15, 16, 20], y=[ 1,  3,  5,  6,  5, 11]))
sns.lmplot(x='x', y='y', data=data)

We will most likely only use the `pandas.DataFrame` class to store data:

In [None]:
x = np.linspace(-2, 2, 100)
y = x + np.random.randn(x.size) * 0.5

data = pd.DataFrame(dict(x=x, y=y))
print(data)
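
Printing a whole `DataFrame` quickly becomes unwieldy. As a sketch, a few standard `pandas` methods give a more compact overview of the same kind of data:

```python
import numpy as np
import pandas as pd

x = np.linspace(-2, 2, 100)
y = x + np.random.randn(x.size) * 0.5
data = pd.DataFrame(dict(x=x, y=y))

print(data.shape)        # (number of rows, number of columns)
print(data.head())       # the first five rows
print(data.describe())   # summary statistics for each column
print(data['x'].mean())  # a single column can be selected by name
```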

If we supply a `pandas.DataFrame` to `seaborn`, plotting becomes simple, e.g., this is a linear regression:

In [None]:
sns.lmplot(x='x', y='y', data=data)

And this is a visualisation of joint and marginal distributions for a three-dimensional dataset:

In [None]:
x = np.random.rand(10000) * 5
y = np.random.randn(10000)
z = np.random.exponential(size=10000)
sns.pairplot(pd.DataFrame(dict(x=x, y=y, z=z)))

## Polynomial regression with `numpy`

In [None]:
x = np.random.rand(1000) * 6 - 3
y = x**2 + np.random.randn(x.size)
plt.scatter(x, y, s=0.1)

We can do a polynomial regression and then create a polynomial function for further analysis:

In [None]:
z = np.polyfit(x, y, 2)
p = np.poly1d(z)
plt.scatter(x, p(x), s=1)
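
`numpy.polyfit` returns the coefficients ordered from the highest power down to the constant term, and `numpy.poly1d` expects them in the same order. The sketch below checks this on noise-free data; the polynomial $2x^2 - x + 5$ is an arbitrary choice for illustration:

```python
import numpy as np

x = np.linspace(-3, 3, 50)
y = 2 * x**2 - x + 5       # noise-free data from a known polynomial

z = np.polyfit(x, y, 2)    # coefficients, highest power first
p = np.poly1d(z)           # callable polynomial built from these coefficients

print(np.round(z, 6))      # close to the coefficients 2, -1, 5
print(p(0.0), p(1.0))      # close to 5 and 6
```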

For example, to evaluate the model on a different grid:

In [None]:
x2 = np.linspace(x.min(), x.max(), 100)
plt.plot(x2, p(x2), linewidth=2)
plt.scatter(x, y, s=1)

**Exercise**: perform a polynomial regression for the following dataset and visualise your model. What is a good choice for the degree?

In [None]:
x = np.random.rand(1000) * 3 - 1
c = [2, -3, 0, 1]
y = np.poly1d(c)(x) + np.random.randn(x.size) * 0.5

plt.scatter(x, y, s=1)

## Numerical quadrature
We already have the means to differentiate a function. With a quadrature method, e.g., the trapezoidal rule
$$\int\limits_a^b f(x) \text{d}x \approx \sum\limits_{n=0}^{N-1} \frac{f(x_n) + f(x_{n+1})}{2} (x_{n+1}-x_n),$$
with
$$a = x_0 < x_1 < \cdots < x_N = b,$$
we can approximate the integral of a function:

In [None]:
x = np.linspace(-np.pi, np.pi, 100)
y = np.sin(x)

plt.plot(x, y)
plt.fill_between(x, np.minimum(y, 0), np.maximum(y, 0), alpha=0.3)
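
To see the trapezoidal rule at work before implementing it, here is a small calculation by hand. The choice of $f(x) = x^2$ on $[0, 1]$ with $N = 2$ intervals (nodes $x_0 = 0$, $x_1 = \tfrac{1}{2}$, $x_2 = 1$) is an assumption made only for illustration:

$$\int\limits_0^1 x^2 \text{d}x \approx \frac{f(0) + f(\tfrac{1}{2})}{2} \cdot \frac{1}{2} + \frac{f(\tfrac{1}{2}) + f(1)}{2} \cdot \frac{1}{2} = \frac{0 + \tfrac{1}{4}}{2} \cdot \frac{1}{2} + \frac{\tfrac{1}{4} + 1}{2} \cdot \frac{1}{2} = \frac{1}{16} + \frac{5}{16} = \frac{3}{8}$$

The exact value is $\tfrac{1}{3}$, so the error with only two intervals is $\tfrac{3}{8} - \tfrac{1}{3} = \tfrac{1}{24}$; using more nodes reduces it.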

**Exercise**: apply the trapezoidal rule to the above dataset:

**Exercise**: perform the calculation for the dataset in the next cell:

In [None]:
x = np.linspace(0, 1, 100)
y = x

plt.plot(x, y)
plt.fill_between(x, np.minimum(y, 0), np.maximum(y, 0), alpha=0.3)



**Exercise**: implement a function
```Python
def integrate(func, a, b):
    pass
```
to integrate a given function `func` using the trapezoidal rule and test this function with at least two different mathematical functions which you have to implement yourself.

Let's put everything together!

**Exercise**: perform a polynomial regression on the data given below and integrate the resulting model over the entire range for which you have samples:

In [None]:
x = np.random.rand(1000) * 6 - 3
y = x**2 - 2 + np.random.randn(x.size)
plt.scatter(x, y, s=0.1)


