# Eight Schools problem (Alpha)

## Cultural background
American high school students (typically ages 13-18) often take a test called the Scholastic Aptitude Test (SAT), which measures their math and verbal skills for college admissions. A historical version of the test scored students in the range 200 to 800. The test is designed to be a "true" measure of mental aptitude, but a recurring question is whether students can be coached to perform better on it. If students who get coached perform better, then the test is a poor measure of innate aptitude and more a measure of who had access to coaching. This question matters because students with access to greater wealth, who can purchase and attend training courses, could outperform their poorer peers, reinforcing an economic gap.

## Data Background
In experimental terms, the question *Does coaching have an effect on SAT scores?* was the focus of a study by [Alderman et al.](https://journals.sagepub.com/doi/abs/10.3102/00028312017002239). The study was published in 1980, and of particular interest to us is the dataset below.


In [1]:
eight_schools_data = {
    "J": 8,
    "y": [28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0],
    "sigma": [15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0],
}

In the study, some students were given coaching for the SAT Verbal test. The resulting effect on scores in each school was estimated as a normal distribution with the parameters **y** (mean) and **sigma** (standard deviation). For example, in the first school the estimated effect of coaching is a normal distribution with a *28 point mean* and a *15 point standard deviation*. From a glance at the data it is unclear whether coaching has an effect: some schools saw an average increase in scores, while others saw a decrease. In addition, some schools saw an average increase, but the estimated standard deviation is so large that the increase could be due to chance.
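
To make the "could be due to chance" point concrete, here is a minimal sketch that computes an approximate central 95% interval for each school, treating `y` and `sigma` as the mean and standard deviation of a normal distribution as described above. The helper `interval_95` and the constant `Z_95` are names introduced here for illustration; they are not part of the study or of ArviZ.

```python
# Data copied from the dataset cell above.
y = [28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0]
sigma = [15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0]

Z_95 = 1.96  # two-sided 95% quantile of the standard normal (illustrative constant)


def interval_95(mean, sd):
    """Return the approximate central 95% interval of Normal(mean, sd)."""
    return (mean - Z_95 * sd, mean + Z_95 * sd)


intervals = [interval_95(m, s) for m, s in zip(y, sigma)]

# 0-based indices of schools whose interval lies entirely above or below zero.
excludes_zero = [i for i, (lo, hi) in enumerate(intervals) if lo > 0 or hi < 0]

for i, (lo, hi) in enumerate(intervals):
    print(f"school {i + 1}: [{lo:6.1f}, {hi:6.1f}]")
print("schools whose interval excludes zero:", excludes_zero)
```

Under this normal approximation, every one of the eight intervals contains zero, so no single school on its own gives clear evidence of a coaching effect.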

#### ArviZ Note
In the original study the schools were unnamed. In ArviZ, however, the schools were assigned names from the [Eight Schools Association](https://en.wikipedia.org/wiki/Eight_Schools_Association). While the phrase *Eight Schools* is the same, the study and the school association have no known relation to each other.

## Bayesian Interest
While the dataset is relatively simple, with only 16 data points, estimating the effect of treatment remains subtly challenging. Even after more than 30 years, the problem continues to be referenced in discussions.

### Original Paper
The original paper, titled *Estimation in Parallel Randomized Experiments*, was written by Donald Rubin and published in 1981.

### Examples of Model Formulation
Gelman et al. use the Eight Schools data as a motivating example of model formulation (pooled vs. unpooled vs. hierarchical) in their textbook *Bayesian Data Analysis*. The example can be found in any edition of the book.
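
Two of the three formulations have simple closed forms, which the sketch below computes directly from the data. It assumes the standard definitions: no pooling keeps each school's own estimate, and complete pooling assumes one common effect estimated by a precision-weighted (inverse-variance) mean. This is an illustration of those definitions, not code from the textbook, and the variable names are chosen here. The hierarchical model sits between these two extremes and requires inference over the between-school variation, so it is not shown.

```python
import math

# Data copied from the dataset cell above.
y = [28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0]
sigma = [15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0]

# No pooling: each school's estimate is just its own observed effect.
unpooled_means = list(y)

# Complete pooling: one common effect, estimated by weighting each school
# by its precision (1 / sigma^2).
weights = [1.0 / s**2 for s in sigma]
pooled_mean = sum(w * v for w, v in zip(weights, y)) / sum(weights)
pooled_se = math.sqrt(1.0 / sum(weights))

print("unpooled estimates:", unpooled_means)
print(f"pooled estimate: {pooled_mean:.1f} (standard error {pooled_se:.1f})")
```

With this data the pooled estimate comes out at roughly 7.7 points with a standard error of about 4.1, while the unpooled estimates range from -3 to 28.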

### Centered vs Non-Centered approach
Michael Betancourt uses the Eight Schools data to showcase how seemingly subtle changes in model definition can adversely affect the efficiency of gradient-based samplers. The [original discussion](http://mc-stan.org/users/documentation/case-studies/divergences_and_bias.html) is written using the Stan language. A [PyMC3 port of the discussion](https://docs.pymc.io/notebooks/Diagnosing_biased_Inference_with_Divergences.html) is available as well.
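
As a rough sketch of what the two parameterizations mean, the snippet below draws school effects both ways using only the Python standard library. In the centered form the effect is drawn directly from a normal distribution with mean `MU` and scale `TAU`; in the non-centered form a standard normal variable is drawn first and then shifted and scaled. The values of `MU` and `TAU` are hypothetical and chosen only for illustration. The snippet shows that both forms describe the same distribution; it does not reproduce the sampler-efficiency differences, which only appear when `MU` and `TAU` are themselves being inferred by a gradient-based sampler.

```python
import random
import statistics

random.seed(0)

MU, TAU = 4.0, 5.0  # hypothetical population mean and scale, illustration only
N = 100_000

# Centered: draw the school effect directly from Normal(MU, TAU).
centered = [random.gauss(MU, TAU) for _ in range(N)]

# Non-centered: draw a standard normal variable, then shift and scale it.
non_centered = [MU + TAU * random.gauss(0.0, 1.0) for _ in range(N)]

for name, draws in [("centered", centered), ("non-centered", non_centered)]:
    print(f"{name}: mean={statistics.mean(draws):.2f}, sd={statistics.stdev(draws):.2f}")
```

Both sets of draws should have a sample mean close to `MU` and a sample standard deviation close to `TAU`.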

### Everything I need to know about Bayesian statistics, I learned in Eight Schools
One of Andrew Gelman's friends [wrote a non-"mathy" blog post](https://andrewgelman.com/2014/01/21/everything-need-know-bayesian-statistics-learned-eight-schools/) about how the Eight Schools problem showcases the usefulness of Bayesian methods. The author of this notebook found the discussion interesting because it provides an approachable overview of the motivation for Bayesian methods, even for nontechnical readers.

## Eight Schools in ArviZ
Completed inference runs come preloaded in ArviZ. The `az.InferenceData` objects can be loaded using the `az.load_arviz_data` function.

In [5]:
import arviz as az
eight_schools_noncentered = az.load_arviz_data("non_centered_eight")
eight_schools_centered = az.load_arviz_data("centered_eight")

In [13]:
eight_schools_centered

Inference data with groups:
	> posterior
	> sample_stats
	> posterior_predictive
	> prior
	> observed_data