# Multinomial Regression Exercises

In [2]:
import arviz as az
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm
import seaborn as sns

In [3]:
plt.style.use("intuitivebayes.mplstyle")

mpl.rcParams["figure.dpi"] = 120
mpl.rcParams["figure.facecolor"] = "white"
mpl.rcParams["axes.spines.left"] = False

## Exercise 1 [Easy] 

**1. *Save to disk the trace of the PyMC model from the lesson. Load it in this notebook.***
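If you have never round-tripped a trace before, here is a minimal sketch on toy data. It assumes the pre-1.0 ArviZ API (`az.from_dict`, `InferenceData.to_netcdf`, `az.from_netcdf`). The toy `baseline` values, the party labels, and the file name are all placeholders, not the lesson's actual trace:

```python
import tempfile
from pathlib import Path

import arviz as az
import numpy as np

rng = np.random.default_rng(0)
parties = [f"party_{i}" for i in range(11)]  # placeholder labels

# Toy stand-in for the lesson's trace: 2 chains, 50 draws, 11 parties
toy_idata = az.from_dict(
    posterior={"baseline": rng.normal(size=(2, 50, len(parties)))},
    coords={"party": parties},
    dims={"baseline": ["party"]},
)

# Save to disk, then load it back (hypothetical file name)
path = Path(tempfile.mkdtemp()) / "toy_trace.nc"
toy_idata.to_netcdf(str(path))
loaded = az.from_netcdf(str(path))

# The coordinates survive the round trip, so dimensions can be checked by name
n_parties = loaded.posterior.sizes["party"]
```

The same pattern should carry over to the lesson's `InferenceData` object, with your own file path.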

In [None]:
# ADD CODE HERE

**2. *Check that there are 11 parties, and that the posterior mean of `baseline` across all dimensions is 0.***

In [None]:
# ADD CODE HERE

## Exercise 2 [Easy]

Let's examine the correlation of `education` with voting preferences.

**1. *Make a forest plot for the PyMC model's education effect.***

Small tip: for the coming plots, transpose the xarray objects so that ArviZ displays the party first. That way you can clearly compare the levels of the other variable within each party.
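In case transposing is new to you, here is a small sketch on a toy `DataArray`. The dimension names and labels below are assumptions made up for illustration, so check the actual dims of the lesson's trace before reusing it:

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(1)

# Toy posterior with hypothetical labels, dims ordered as a model might return them
effect = xr.DataArray(
    rng.normal(size=(2, 100, 3, 4)),
    dims=("chain", "draw", "education", "party"),
    coords={
        "education": ["basic", "secondary", "higher"],
        "party": ["A", "B", "C", "D"],
    },
)

# Keep the sampling dims first, then put party before education
effect_by_party = effect.transpose("chain", "draw", "party", "education")
```

Passing something like `effect_by_party` to `az.plot_forest` should then group the rows by party, with the education levels side by side within each party.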

In [None]:
# ADD CODE HERE

**2. *Interpret this plot. In particular: How is education correlated with party preference? Can you see differences in the voting patterns of the different education levels?***

ADD ANSWER HERE

Now let's look at the other estimates.

## Exercise 3 [Medium]

Now let's look at the association of `gender` with the vote.

**1. *Compute the posterior medians of `α_gender` and draw them as a heatmap, with gender horizontally and parties vertically.***

Hint: use `seaborn` to draw the heatmap.
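As a rough sketch of the mechanics only (toy numbers, made-up labels, and an assumed `(chain, draw, gender, party)` layout), one way to go from posterior draws to a heatmap is:

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import xarray as xr

rng = np.random.default_rng(2)

# Toy stand-in for a categorical effect; not the lesson's posterior
toy_alpha = xr.DataArray(
    rng.normal(size=(2, 100, 2, 4)),
    dims=("chain", "draw", "gender", "party"),
    coords={"gender": ["Female", "Male"], "party": ["A", "B", "C", "D"]},
)

# Median over all posterior draws
medians = toy_alpha.median(dim=("chain", "draw"))

# 2-D table with parties as rows (vertical) and gender as columns (horizontal)
medians_df = medians.transpose("party", "gender").to_pandas()

fig, ax = plt.subplots()
sns.heatmap(medians_df, annot=True, center=0, cmap="vlag", ax=ax)
```

Centering the color map at 0 is a design choice here: it makes positive and negative medians visually distinct.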

In [None]:
# ADD CODE HERE

**2. *What are the median differences between men and women?***

ADD ANSWER HERE

**3. *Do a ridge plot of `α_gender`.***

In [None]:
# ADD CODE HERE

**4. *Do you spot any difference or new information compared to the medians?***

ADD ANSWER HERE

**5. *Do we need something else to compute the total gender effect? Or is `α_gender` all we need?***

ADD ANSWER HERE

Now let's look at the nationality effect.

**6. *Compute the posterior medians of `α_nat` and draw them as a heatmap, with nationality horizontally and parties vertically.***

In [None]:
# ADD CODE HERE

**7. *What are the median differences between Estonians and Other?***

ADD ANSWER HERE

**8. *Do a ridge plot of `α_nat`.***

In [None]:
# ADD CODE HERE

**9. *Do you spot any difference or new information compared to the medians?***

ADD ANSWER HERE

This is still not the total nationality effect! Let's finally compute it.

**10. *Compute the posterior medians of the total nationality-gender effect and draw them as a heatmap, with gender-nationality horizontally and parties vertically.***

In [None]:
# ADD CODE HERE

**11. *What are the median differences between the combinations?***

ADD ANSWER HERE

Let's look at Estonians.

**12. *Do a forest plot of the total gender effect for Estonians.***

In [None]:
# ADD CODE HERE

**13. *How do Estonian men and women differ? Does this confirm or contradict the posterior analyses we did in the lesson?***

ADD ANSWER HERE

Conversely, if we want to compare nationalities, we need to fix the gender. Let's say we're interested in the differences between Russian and Estonian men.

**14. *Do a forest plot of the total nationality effect for men.***

In [None]:
# ADD CODE HERE

**15. *How do Russian and Estonian men differ?***

ADD ANSWER HERE

## Exercise 4 [Easy]

*Look at the demographics we didn't look at in the previous questions -- i.e., the total gender effect for Russians, and the total nationality effect for women. Use the same workflow to analyze them.*

*How surprised are you by all those parameters? Which questions would you ask a domain expert, to poke at the model?*

*At this point, it should be clear to you how to do this, so we're not gonna give you the solution here. Rather, go* **post your questions and analysis [in the Discourse](https://community.intuitivebayes.com/) thread "AR Course -- Lesson 6, Exercise 4 Discussion"**, *and talk about it with the other students.*

## Exercise 5 [Hard] 

Remember the lesson? Of course you do, these were fun times! We promised you'd get to explore a given stratum of the population instead of the Estonian population as a whole. Well, that time has officially come!

But first, something you haven't seen yet and cannot guess. 

During the lesson, you may have wondered why we passed the index of the observations as the coordinates for the dimension `obs_idx`. Maybe you noticed that we could have set `demographics_order` as the index, and _then_ passed that index as coordinates. And you would be right, that would be an even better way!

The issue is that ArviZ doesn't handle `MultiIndex` for now, so we have to add it _after_ backward and forward sampling. This requires some advanced use of xarray, so we're going to show you how to do it. Then, familiarize yourself with the code, and do it yourself at some point. Here we go!

First, we load our Estonian data, already aggregated:

In [40]:
observed_aggregated_answers = pd.read_csv("data/observed_aggregated_answers.csv")

In [42]:
demographics_order = [
    "age_group",
    "education",
    "gender",
    "nationality",
    "electoral_district",
    "unit",
]

In [43]:
# make ArviZ aware of the strata we're gonna select by:

obs_idx_detailed = observed_aggregated_answers.set_index(demographics_order).index
more_coords = {
    dem: ("obs_idx", observed_aggregated_answers[dem]) for dem in demographics_order
}
more_coords["obs_idx"] = obs_idx_detailed

And now let the magic happen. Here is the `posterior_predictive` group before:

In [44]:
idata_multinomial.posterior_predictive

Now we add the new coords (to `constant_data` as well, because we need it for plotting):

In [45]:
idata_multinomial.posterior_predictive = (
    idata_multinomial.posterior_predictive.assign_coords(more_coords)
)
idata_multinomial.constant_data = idata_multinomial.constant_data.assign_coords(
    more_coords
)

And here is the `posterior_predictive` group after:

In [46]:
idata_multinomial.posterior_predictive

Boom!
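To see why these extra coordinates are handy, here is a toy sketch (made-up labels and counts, not the Estonian data). It shows one way to pick a stratum from a label attached to `obs_idx`, using a plain boolean mask rather than relying on the `MultiIndex` itself:

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(3)

# Toy posterior predictive: 50 draws for 4 observation cells
toy_pp = xr.DataArray(
    rng.poisson(5, size=(50, 4)),
    dims=("draw", "obs_idx"),
    coords={"obs_idx": np.arange(4)},
)

# Attach a per-observation label as an extra coordinate on obs_idx
toy_pp = toy_pp.assign_coords(
    district=("obs_idx", ["North", "South", "North", "South"])
)

# Build a mask from the label, then keep only the matching observations
mask = (toy_pp["district"] == "North").values
north = toy_pp.isel(obs_idx=np.flatnonzero(mask))
```

The hypothetical `district` coordinate plays the role of the demographic columns in `more_coords`.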

Now we can go ahead with our exercise.

***1. Compute the posterior retrodictive voting intentions of Estonians in the electoral district "Haabersti, Põhja-Tallinn ja Kristiine".***

In [None]:
# ADD CODE HERE

***2. Compute the observed proportions in this same stratum (bonus points if you manage to only use the `InferenceData` object for this...).***
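A note on the arithmetic, sketched with pandas on made-up counts (the column names and numbers are purely illustrative): a stratum usually spans several rows of the aggregated data, so pool the counts first and normalize afterwards, rather than averaging per-row proportions.

```python
import pandas as pd

# Toy aggregated counts; column names and values are made up for illustration
toy_counts = pd.DataFrame(
    {
        "electoral_district": ["North", "North", "South"],
        "party_A": [10, 5, 8],
        "party_B": [20, 5, 2],
        "party_C": [10, 10, 10],
    }
)
party_cols = ["party_A", "party_B", "party_C"]

# Sum the counts over every cell of the stratum, then normalize to proportions
stratum = toy_counts[toy_counts["electoral_district"] == "North"]
party_totals = stratum[party_cols].sum()
observed_props = party_totals / party_totals.sum()
```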

In [None]:
# ADD CODE HERE

***3. Plot the posterior retrodictives with `az.plot_posterior`, and overlay the observations as reference values.***

In [None]:
# ADD CODE HERE

Feel free to post your beautiful original plots in the [Discourse](https://community.intuitivebayes.com/)!

By the way, do you know why we get these weird wiggles for some parties? Of course I'm not gonna tell you 😉 Rather, **head over to the [Discourse](https://community.intuitivebayes.com/)** and tell us what you think in the thread **"AR Course -- Lesson 6, Exercise 5 Discussion"**. Maybe we'll even give you a bonus exercise if you take part in the discussion... See you there!