In [None]:
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pymc as pm
import scipy.stats as stats
import pandas as pd
import rethinking as rt

RANDOM_SEED = 42
rng = np.random.default_rng(RANDOM_SEED)

__Probabilistic Programming. Wasowski. Pardo. IT University of Copenhagen__

This file contains the list of exercises for the week, as well as any related code.

## Exercises

The exercises for this week are: __all exercises__ from Chapter 4, McElreath, with exceptions and remarks noted below. 


* __Exercise 4M2__ asks you to use `quap`. We do not use `quap` in this course. We use MCMC instead (`pymc.sample` does that). There is really no benefit to quadratic approximation, which is only locally correct and only gives us key summary statistics. If we can obtain a full posterior sample, we have more information at hand. If you have a good sample trace from 4M1, you can just call `arviz.summary` on it to get the mean and standard deviation of the parameters (which is what `quap` would give us). Cf. https://arviz-devs.github.io/arviz/api/generated/arviz.summary.html
* __Exercise 4M7__ uses model m4.3. The model uses data, so first the data loading code:

In [None]:
data = pd.read_csv('Howell1.csv', sep=';').to_xarray()
adults = data.where(data.age >= 18, drop=True)  # keep adults only
adults

This is the model in PyMC syntax:

In [None]:
xbar = adults.weight.data.mean()
with pm.Model() as m4_3:

    # Prior
    α = pm.Normal('α', mu = 178, sigma = 20) 
    β = pm.LogNormal('β', mu = 0, sigma = 1)
    σ  = pm.Uniform('σ', 0, 50)
    
    # Ignore for now and think of it as x. We could have used the array directly,
    # but wrapping it in pm.Data lets us swap in new values for prediction later.
    x_mutable = pm.Data("x", adults.weight.values)
    
    # The linear model
    μ = pm.Deterministic('μ', α + β*(x_mutable-xbar))
    
    # The likelihood:
    height = pm.Normal('height', mu = μ, sigma = σ, observed = adults.height.data) 
    idata_m4_3 = pm.sample(draws=3000, random_seed = rng)
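
For 4M2 and 4M7 it can help to see that the numbers `quap` reports (means, standard deviations, parameter correlations) are just simple statistics of the posterior draws, and that prediction for a new weight is a direct computation on those draws. The following is a minimal sketch using only NumPy. The draws are synthetic placeholders with arbitrary values, not results from `m4_3`; the names `summarize` and `predict_height`, the ASCII parameter names, and the values `xbar=45.0` and `weight=50.0` are assumptions made for illustration. With a real trace you would normally call `arviz.summary` instead of the hand-written `summarize`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for posterior draws (arbitrary values, NOT real results).
# With a real trace, take the draws from the posterior group of the trace instead.
n_draws = 4000
draws = {
    "alpha": rng.normal(155.0, 0.3, n_draws),
    "beta": rng.normal(0.9, 0.04, n_draws),
    "sigma": np.abs(rng.normal(5.0, 0.2, n_draws)),
}


def summarize(draws, prob=0.89):
    """Mean, sd and central percentile interval for each parameter."""
    lo, hi = (1 - prob) / 2, 1 - (1 - prob) / 2
    return {
        name: {
            "mean": float(np.mean(x)),
            "sd": float(np.std(x, ddof=1)),
            "lower": float(np.quantile(x, lo)),
            "upper": float(np.quantile(x, hi)),
        }
        for name, x in draws.items()
    }


def predict_height(draws, weight, xbar, rng):
    """Simulate one height per posterior draw for a single new weight."""
    mu = draws["alpha"] + draws["beta"] * (weight - xbar)
    return rng.normal(mu, draws["sigma"])


summary = summarize(draws)
for name, row in summary.items():
    print(name, {k: round(v, 2) for k, v in row.items()})

# Correlation between parameters across draws (rows/columns follow dict order).
corr = np.corrcoef(np.vstack([draws[k] for k in draws]))
print(np.round(corr, 2))

# Predictive draws for a hypothetical individual; xbar is a placeholder value.
sim = predict_height(draws, weight=50.0, xbar=45.0, rng=rng)
print(np.quantile(sim, [0.055, 0.945]))
```

For 4M7, comparing the correlation matrix of the draws with and without the `xbar` centering is one way to see what the centering does to the posterior.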

* Exercise __4M8__ is about splines. We have not talked about them in the lecture, and you can safely skip it if your only guidance is the exam curriculum. Otherwise it is a fascinating exercise, so feel free to look into it.

* Exercise __4H2__ uses the Howell dataset again. It is loaded above into `data` and `adults`. You probably want to reuse the model formulation from above, too. As usual, use the PyMC sampler, not quadratic approximation.

* Exercise __4H3__ in part (b) refers to an R plot that might look mysterious to you.  Just use the same kind of plot as in __4H2__. This is what they mean :)

* The parabolic model that Exercise __4H4__ talks about is the one found on p. 111.

* Exercise __4H5__ uses the cherry blossom data from p. 114. Below is the code to load it. The exercise is also unclear about which regression to use. Since we are trying to de-emphasize splines, just try ordinary linear regression.

In [None]:
d = pd.read_csv("cherry_blossoms.csv").to_xarray()
# if you want to drop rows with NAs, use this instead (it overwrites d):
d = pd.read_csv("cherry_blossoms.csv").dropna().to_xarray()

# summary stats the rethinking way
print(rt.precis(d))

# summary stats the pandas way
pd.read_csv("cherry_blossoms.csv").dropna().describe()

In [None]:
# summary stats the arviz way
az.summary(pd.read_csv("cherry_blossoms.csv").dropna().to_dict(orient="list"), kind="stats")

* Exercise __4H6__ asks for cherry blossom spline fitting, while we are trying to de-emphasize splines in the course a bit (the course is sufficiently large as it is). Instead, try to generate prior predictive samples from the linear regression in 4H5 (just to practice the concept). Use the prior predictive distribution to assess the choice of priors.

* Skip exercise __4H8__, as it is purely about splines.
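
To make the prior predictive idea in 4H6 concrete, here is a minimal NumPy sketch for a linear regression with a centered predictor. Everything specific in it is an assumption for illustration: the function name `prior_predictive`, the priors (Normal intercept, Normal slope, Exponential noise scale) and their default values, and the temperature grid. In PyMC you would typically define the model and call `pm.sample_prior_predictive` rather than sampling by hand.

```python
import numpy as np

rng = np.random.default_rng(1)


def prior_predictive(x, n_sims, rng, a_mu=100.0, a_sd=10.0, b_sd=10.0, s_rate=1.0):
    """Draw regression lines and outcomes from the priors alone (no data used).

    Hypothetical priors: alpha ~ Normal(a_mu, a_sd), beta ~ Normal(0, b_sd),
    sigma ~ Exponential(rate=s_rate).
    """
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()  # center the predictor, as in m4_3
    alpha = rng.normal(a_mu, a_sd, size=(n_sims, 1))
    beta = rng.normal(0.0, b_sd, size=(n_sims, 1))
    sigma = rng.exponential(1.0 / s_rate, size=(n_sims, 1))
    mu = alpha + beta * xc      # one prior regression line per row
    y = rng.normal(mu, sigma)   # simulated outcomes around each line
    return mu, y


# Hypothetical grid of predictor values (placeholder, not taken from the data).
temps = np.linspace(4.5, 8.5, 20)
mu, y = prior_predictive(temps, n_sims=500, rng=rng)

# One crude check of the priors: how often is a simulated day of year impossible?
frac_impossible = float(np.mean((y < 1) | (y > 365)))
print(f"fraction of simulated outcomes outside 1..365: {frac_impossible:.3f}")
```

If a large fraction of the simulated outcomes is implausible, or the prior lines in `mu` are wildly steep, that is a hint to tighten the priors before fitting.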