# Hierarchical Regression Exercises

In [1]:
from typing import List

import arviz as az
import matplotlib as mpl
import matplotlib.lines as mlines
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm
import pytensor.tensor as pt
import seaborn as sns
from scipy.special import expit as invlogit

In [2]:
plt.style.use("intuitivebayes.mplstyle")

mpl.rcParams["figure.dpi"] = 120
mpl.rcParams["figure.facecolor"] = "white"
mpl.rcParams["axes.spines.left"] = False

seed = sum(map(ord, "Hierarchical Regression Exercises"))
rng = np.random.default_rng(seed)

## Exercise 1 [Medium]

> Grasshopper models in PyMC

***1. Recreate the centered hierarchical grasshoppers model in PyMC. In Bambi terms: `1 + (1|mother) + (1|date) + (1|location) + (1|date:location)`. How well does it converge?***

In [3]:
# ADD CODE HERE

***2. Reparametrize the previous model in a non-centered fashion. Is convergence better?***
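
As a reminder of what the reparametrization changes, here is a minimal NumPy sketch (not the PyMC solution). The values of `mu` and `sigma` are illustrative assumptions, not estimates from the grasshopper data. It only shows that drawing a standard-normal offset and then shifting and scaling it describes the same distribution as drawing the effect directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative hyperparameter values (assumptions, not fitted estimates)
mu, sigma = 0.5, 1.5
n_draws = 100_000

# Centered: draw the group effect directly from Normal(mu, sigma)
centered = rng.normal(loc=mu, scale=sigma, size=n_draws)

# Non-centered: draw a standard-normal offset, then shift and scale it
z = rng.normal(loc=0.0, scale=1.0, size=n_draws)
non_centered = mu + sigma * z

# Both parametrizations describe the same distribution
print(centered.mean(), non_centered.mean())
print(centered.std(), non_centered.std())
```

The distribution is unchanged; what differs is the geometry the sampler has to explore, since the offset `z` no longer depends on `sigma` a priori.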

In [11]:
# ADD CODE HERE

***3. Remove the main effects for `date` and `location` from the previous model. In Bambi terms: `1 + (1|mother) + (1|date:location)`. Does convergence change?***

In [15]:
# ADD CODE HERE

***4. Recreate the 4-panel forest plot from the lesson, showing the probability of having the chromosome for each mom's descendants.***

In [None]:
# ADD CODE HERE

## Exercise 2 [Hard]

> Prediction on new groups

**Predict for the new groups from the lesson, but now with PyMC, using the last model you defined above. Predict both the probability of getting the chromosome (`p`) and the number of grasshoppers observed with the chromosome (`y`). Visualize your predictions with `az.plot_posterior`.**

### Observe new groups

We'll look at the three representative cases from the lesson:

1. A new observation with known location, N and mom, but **new date**.
2. A new observation with known N and mom, but **new date and location**.
3. A new observation with known date, location and N, but **new mom**.

In [21]:
df_grasshopper["mom"] = df_grasshopper.index
new_data = df_grasshopper.loc[:2, ["date", "location", "n", "mom"]]

# new date
new_data.loc[0] = [
    df_grasshopper.date.max() + 1,
    np.random.choice(df_grasshopper.location),
    np.random.choice(df_grasshopper.n),
    np.random.choice(df_grasshopper.mom),
]

# new (date, location)
new_data.loc[1] = [
    df_grasshopper.date.max() + 2,
    df_grasshopper.location.max() + 1,
    np.random.choice(df_grasshopper.n),
    np.random.choice(df_grasshopper.mom),
]

# new mom
new_data.loc[2] = [
    np.random.choice(df_grasshopper.date),
    np.random.choice(df_grasshopper.location),
    np.random.choice(df_grasshopper.n),
    df_grasshopper.mom.max() + 1,
]
new_data

| | date | location | n | mom |
|---|---|---|---|---|
| 0 | 4 | 5 | 7 | 68 |
| 1 | 5 | 6 | 22 | 23 |
| 2 | 2 | 3 | 18 | 88 |


We'll need indexing to do that with PyMC, so let's add the index of each observation to the dataframe, while making sure it's really obvious when an observation is coming from a previously unobserved group:

In [22]:
new_data["date_idx"] = pd.Categorical(new_data.date, categories=dates).codes
new_data["location_idx"] = pd.Categorical(new_data.location, categories=locations).codes
new_data["mom_idx"] = pd.Categorical(new_data.mom, categories=COORDS["moms"]).codes
new_data = new_data.replace({-1: -1000})
new_data

| | date | location | n | mom | date_idx | location_idx | mom_idx |
|---|---|---|---|---|---|---|---|
| 0 | 4 | 5 | 7 | 68 | -1000 | 4 | 68 |
| 1 | 5 | 6 | 22 | 23 | -1000 | -1000 | 23 |
| 2 | 2 | 3 | 18 | 88 | 1 | 2 | -1000 |


Here, we've used `-1000` as a marker of novelty: that way, we're sure indexing will fail if we mess something up. Now that we have the data, we're ready to work on the predictions.
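
To see why the sentinel matters, here is a small self-contained sketch. The names `known_dates`, `new_dates` and `effects` are hypothetical and unrelated to the grasshopper data. `pd.Categorical` assigns the code `-1` to any value that is not among the given categories, and `-1` is a perfectly valid index in NumPy (it selects the last element), so a mistake would go unnoticed.

```python
import numpy as np
import pandas as pd

# Hypothetical known groups and new observations
known_dates = [1, 2, 3]
new_dates = pd.Series([2, 4, 3])

# Values absent from `categories` get the code -1
codes = pd.Categorical(new_dates, categories=known_dates).codes
print(codes.tolist())  # [1, -1, 2]

# Cast to int64 first, because the codes are stored as int8 and -1000 would not fit
safe_codes = pd.Series(codes.astype("int64")).replace({-1: -1000})
print(safe_codes.tolist())  # [1, -1000, 2]

# Hypothetical per-date effects, one per known date
effects = np.array([0.1, 0.2, 0.3])
print(effects[-1])  # silently returns the LAST known date's effect

try:
    effects[-1000]
except IndexError as err:
    print("Caught:", err)  # the sentinel fails loudly instead
```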

As a reminder, here is how Bambi does it (which is what we used in the lesson): **each posterior sample for the new group is drawn from the posterior samples of a randomly selected _existing_ group**. In pseudo-code, if you want e.g. 1000 samples for a new observation:

```python
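# Pseudo-code: `posterior`, `nbr_samples` and `nbr_of_existing_groups` are placeholders
# Pick one existing group per new draw...
group_ids = np.random.choice(range(nbr_of_existing_groups), size=1000)
# ...then pick one random posterior sample from each selected group
new_obs_draws = [
    posterior.isel(group_dim=i).isel(sample=np.random.choice(range(nbr_samples), size=1))
    for i in group_ids
]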
```

Since different groups may be selected at each draw, the end result _represents the variation across existing groups_. That means that the uncertainty on our new groups will be higher than on our existing groups.

But that's exactly as it should be: we know less about this new group, which means... we're more uncertain about it!
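
To make the resampling idea concrete, here is a minimal NumPy sketch. The `posterior` array is a synthetic placeholder, not output from a fitted model, and all names are illustrative. It picks a random existing group and a random posterior sample for each new draw, then compares the spread of the result with the spread within any single group.

```python
import numpy as np

rng = np.random.default_rng(123)

# Placeholder "posterior" of group effects, shape (n_samples, n_groups).
# Synthetic numbers for illustration only, not output from a fitted model.
n_samples, n_groups = 4_000, 5
group_means = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
posterior = rng.normal(loc=group_means, scale=0.3, size=(n_samples, n_groups))

# For each new draw, pick an existing group and one of its posterior samples
n_new = 1_000
group_idx = rng.integers(n_groups, size=n_new)
sample_idx = rng.integers(n_samples, size=n_new)
new_group_draws = posterior[sample_idx, group_idx]

# The new group's spread mixes within-group and between-group variation
print(posterior.std(axis=0).max(), new_group_draws.std())
```

Because the selected group changes from draw to draw, the new group's draws are far more spread out than those of any single existing group.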

So, what does it look like in PyMC?

***Hint: it might help to think about this like bootstrapping, and look at something like `pytensor.tensor.random.choice`...***

In [23]:
# ADD CODE HERE

## Exercise 3 [Hard]

> Using the 8-school model to predict two new schools

This exercise will consolidate what you learned in the previous one. We will use the famous [eight schools model](https://www.pymc.io/projects/docs/en/stable/learn/core_notebooks/model_comparison.html) (from Section 5.5 of Gelman et al. (2003)), which attempts to infer the effects of coaching on the SAT scores of students from 8 schools.

Below are the data, which include the observed treatment effects (`y`) and associated standard deviations (`sigma`) in the 8 schools:

In [28]:
y = np.array([28, 8, -3, 7, -1, 1, 18, 12])
sigma = np.array([15, 10, 16, 11, 9, 11, 10, 18])

### Hierarchical Model

***1. Fit a hierarchical model with a random effect that partially pools the schools.***

In [29]:
# ADD CODE HERE

***2. Use the model to simulate what a 9th and a 10th school might look like. Do it Bambi-style, and visualize your predictions like in the previous exercise.***

Let's also assume we don't know exactly what the sigma is for the new schools, but that we can come up with a unique prior for each. The 9th school is believed to have a measurement error between 10 and 20. The 10th school, on the contrary, is said to have a relatively low measurement error, between 5 and 7.
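
One possible reading of these beliefs, offered as an assumption rather than the official solution, is a uniform prior over each stated range (in PyMC, something like `pm.Uniform`). The NumPy sketch below shows how uncertainty in sigma propagates into the simulated observations. The school effects in `theta_new` are placeholders, not posterior draws from the eight schools model.

```python
import numpy as np

rng = np.random.default_rng(42)
n_draws = 2_000

# Assumption: encode "between a and b" as a Uniform(a, b) prior on sigma
sigma_school_9 = rng.uniform(10, 20, size=n_draws)
sigma_school_10 = rng.uniform(5, 7, size=n_draws)

# Placeholder school effects (NOT posterior draws): stand-ins for whatever
# effects your model produces for the two new schools
theta_new = rng.normal(0.0, 1.0, size=(2, n_draws))

# Each predictive draw uses its own sigma, so uncertainty in sigma propagates
y_school_9 = rng.normal(theta_new[0], sigma_school_9)
y_school_10 = rng.normal(theta_new[1], sigma_school_10)

print(y_school_9.std(), y_school_10.std())
```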

In [32]:
# ADD CODE HERE

***3. Now make the same predictions, but by using explicit priors for the new schools. Use a forest plot to visualize your predictions.***

In addition to the priors from the previous question, let's also assume that the mean of school 10 is expected to be one point above the baseline. Write the model in such a way that you can also get posterior predictive draws from the initial 8 schools.

In [35]:
# ADD CODE HERE

## Exercise 4 [Easy]

> Regression on diversity parameters

***Run a regression on the diversity parameters of the socio-demographic components (`age`, `gender` and `gender:nat`) from the Estonian deluxe model. Analyze the convergence and interpret these diversity parameters. Post your plots, analyses and interpretation on Discourse (thread "[AR Course -- Lesson 9, Exercise 6 Discussion](https://community.intuitivebayes.com/)").***

***Bonus: Do it by writing a new function that improves `hierarchical_zsn` at the same time.***

In [38]:
# ADD CODE HERE

## Exercise 5 [Hard]

> Interaction-only versus main effects

***1. Write down a simplified version of the deluxe model, where the only predictor is the interaction of `gender` and `nationality`. In Bambi terms: `1 + gender:nat`. Don't change anything else. Sample from this model, and do a pair plot of the covariates for any given party.***

In [53]:
# ADD CODE HERE

***2. Now, write down another simple version of the deluxe model, where the only predictors are the main effects of `gender` and `nationality`, as well as the interaction of both. In Bambi terms: `1 + gender + nat + gender:nat`. Sample from this model, and do a pair plot of the covariates for any given party. What are the differences? How can you explain them?***

***Feel free to ask questions on Discourse about this -- it's a hard one ;)***

In [56]:
# ADD CODE HERE

***3. Write down a final simplified version of the deluxe model, where the only predictors are the main effects of `age` and `education`, as well as the interaction of both. In Bambi terms: `1 + age + education + age:education`. Sample from this model, and do a pair plot of the covariates for any given party. What are the differences? How can you explain them?***

In [62]:
# ADD CODE HERE