# Categorical Regression Exercises

Because life isn't always binary -- but you always need to exercise

In [1]:
from typing import Dict, Tuple

import arviz as az
import bambi as bmb
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm
import pytensor.tensor as pyt
import xarray as xr
from scipy import special

In [2]:
plt.style.use("intuitivebayes.mplstyle")

mpl.rcParams["figure.dpi"] = 120
mpl.rcParams["figure.facecolor"] = "white"
mpl.rcParams["axes.spines.left"] = False

## Exercise 1 [Easy] 

Let's get more comfortable with the ZeroSumNormal (ZSN) distribution. In the PyMC model for the toppings data, change the priors on the standard deviations of the ZSN. How does this change the prior predictive plot and the sampling time?

__Post your answers with screenshots [in the Discourse](https://community.intuitivebayes.com/) thread "AR Course -- Lesson 5, Exercise 1 Solution"__, and compare them to the other students'.
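
If the role of the ZSN standard deviation feels abstract, a NumPy-only sketch can help build intuition before you touch the PyMC model. It assumes that a zero-sum normal can be mimicked by drawing independent normals and subtracting their mean across categories. This is an illustration, not PyMC's implementation, and `zero_sum_draws` is a hypothetical helper with made-up sizes and sigmas.

```python
import numpy as np

rng = np.random.default_rng(42)


def zero_sum_draws(sigma: float, n_categories: int, n_draws: int) -> np.ndarray:
    """Hypothetical helper: normal draws centered so each row sums to zero."""
    raw = rng.normal(0.0, sigma, size=(n_draws, n_categories))
    return raw - raw.mean(axis=1, keepdims=True)


narrow = zero_sum_draws(sigma=0.5, n_categories=4, n_draws=2_000)
wide = zero_sum_draws(sigma=3.0, n_categories=4, n_draws=2_000)

# Every draw sums to (numerically) zero across categories, whatever sigma is
print(np.abs(narrow.sum(axis=1)).max(), np.abs(wide.sum(axis=1)).max())
# A larger sigma spreads the category effects further apart
print(narrow.std(), wide.std())
```

Keep this picture in mind when you look at how the prior predictive plot reacts to your new priors.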

In [1]:
# ADD CODE HERE

## Exercise 2 [Medium] 

In the same model that you modified in the previous exercise, now remove `zerosum_axes=2` on the categorical predictors (`β_nationality`). How does that affect sampling time? Do you have an idea why?

_Hint: a pair plot of the two parameter types, for each topping, will be useful..._

In [1]:
# ADD CODE HERE

## Exercise 3 [Hard]
Now let's modify the Estonian model. We're going to extend it in this exercise, adding extra predictors.

### Part 1

- In the PyMC and Bambi models for Estonian data:
    - *Add `age` and `gender` as predictors*
    - *Add an interaction on `age` and `education`*
    - *Do a trace plot of the posterior samples. Do you see any red flags? How does convergence look to you? Do you think we should run other diagnostics?*
    
- Only in the PyMC model, add an `assert` statement to make sure the probabilities sum to (approximately) 1 across categories
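
For the `assert`, the idea is that the category probabilities must sum to 1 for every observation. Here is a minimal NumPy sketch of that check, outside of any PyMC model. The array `eta` and the helper `softmax` are made up for illustration. In your own model, you would apply the same idea to whatever probability array the model produces, and how you evaluate that array depends on how the model is written.

```python
import numpy as np


def softmax(eta: np.ndarray) -> np.ndarray:
    """Hypothetical helper: numerically stable softmax over the last axis."""
    shifted = eta - eta.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
# Made-up linear predictor: 5 observations, 3 categories
eta = rng.normal(size=(5, 3))
p = softmax(eta)

# Probabilities must sum to 1 across categories, for every observation
assert np.allclose(p.sum(axis=-1), 1.0), "probabilities do not sum to 1"
```

Using `np.allclose` rather than `==` matters here, because floating-point sums are rarely exactly 1.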

In [1]:
# ADD CODE HERE

### Part 2

Look at the following prior and posterior retrodictive plots, and assign the bottom line and the top line each to its respective model (PyMC or Bambi). Explain your choice.

*More globally, what do you think of those checks? Do they look good to you?*

In [1]:
# ADD CODE HERE

## Exercise 4 [Hard]
Now, let's work on parts of the workflow other than the model itself.

### Part 1

Update the `compute_post_p` and `plot_post_p` functions from the lesson, to reflect the latest model above.

In [1]:
# ADD CODE HERE

### Part 2

Let's see how age correlates with voting preferences, all else being equal. **For the PyMC model**, plot each party's posterior intention to vote for highly educated non-Estonian females, first for the 25-34 age group, then for the 75+ group. Do you already see differences?

In [1]:
# ADD CODE HERE

### Part 3

Now, use a forest plot to compare those two groups' voting intentions. What are the main differences?
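
A forest plot essentially displays, for each party and group, a point estimate and a credible interval. The sketch below computes those numbers by hand, which can be a handy sanity check on what your plot shows. Everything in it is an assumption: the draws are simulated from Beta distributions rather than taken from the model, the group labels are placeholders, and an equal-tailed 94% interval is used for simplicity, whereas ArviZ's forest plot shows HDIs.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up posterior draws of one party's vote share for two age groups
draws = {
    "25-34": rng.beta(30, 70, size=4_000),
    "75+": rng.beta(45, 55, size=4_000),
}

summary = {}
for group, samples in draws.items():
    # Equal-tailed 94% interval: cut 3% of the draws from each tail
    low, high = np.quantile(samples, [0.03, 0.97])
    summary[group] = (samples.mean(), low, high)
    print(f"{group:>6}: mean={samples.mean():.2f}, 94% interval=[{low:.2f}, {high:.2f}]")
```

When two groups' intervals barely overlap, as in this simulated case, the forest plot will show them clearly separated.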

In [1]:
# ADD CODE HERE

### Part 4

Repeat Parts 2 and 3, but this time for the Bambi model.

In [1]:
# ADD CODE HERE

### Part 5

Use a forest plot again, but this time to compare the PyMC model's results to the Bambi model's. Your plot should therefore have four different estimates per party. Interpret this plot: do you see any difference in the models' posteriors?

In [1]:
# ADD CODE HERE

Well done on completing these exercises -- they certainly were not easy! Again, if not everything clicks the first time, that's totally expected. Let it sink in, and come back to the lesson and these exercises in a few weeks. You'll see that you understood more than you remembered, and it'll be easier for you to identify the areas that are still blurry.

Actually, to get you even more on your way, I encourage you to do the bonus exercise below.

## Bonus Exercise [Hard]

This one is next level. Not because the model is harder than the one you just did (if pressed, I'd say it's easier), but because you're gonna be as close to a real-life modeling situation as is possible in a course. You're gonna have to load the data, explore it, and follow the whole Bayesian workflow -- _all by yourself_. This is a crucial skill, because we wanna teach you how to model, not how to follow instructions. So what do you say? You ready?

Of course you are! Load up the 2016 US election dataset Ravin showed you in a previous lesson:

```python
bmb.load_data("ANES")
```

Now, **model each subject's voting intention, based on their age and party identification**. Follow the Bayesian workflow, and apply everything you've learned in this lesson and exercises. Make sure to **[check out the Discourse](https://community.intuitivebayes.com/)** if you have any questions, and also **post your answers there** (with code and plots) -- do it in the **thread "AR Course -- Lesson 5, Bonus Exercise Solution"**, and compare them to the other students'.

Now is your time to shine bright like a diamond! Yes, it's gonna be challenging, but we know you can do it -- we wouldn't ask you to otherwise.

PyMCheers & Keep On Sampling 🖖

In [1]:
# ADD CODE HERE