<h1 align='center'> Bayesian Statistics Course<br> Stan Example 1: Eight Schools</h1> 

<h3>Authors</h3>

1. Alvaro Mauricio Montenegro Díaz, ammontenegrod@unal.edu.co
2. Daniel Mauricio Montenegro Reyes, dextronomo@gmail.com 

<h3>References</h3>


1. Gelman, A., Carlin, J., Stern, H., and Rubin, D. *Bayesian Data Analysis*, Chapman & Hall/CRC, 2000
2. https://statmodeling.stat.columbia.edu/2014/01/21/everything-need-know-bayesian-statistics-learned-eight-schools/

<h2> 1. Introduction</h2>

This notebook introduces a classic example used in teaching Bayesian statistics. In particular, the example has been used by Gelman and appears in the Stan user manuals.

<h2> 2. A first general model</h2>



<h3> Statistical model</h3>


Suppose we have observations $[y_n \mid x_n], n=1,\ldots,N$, and assume the following Bayesian model, in which the second argument of $\mathcal{N}$ is a variance:

$$
\begin{align}
y_n &\sim \mathcal{N}(\alpha + \beta x_n, \sigma^2),\quad n=1,\ldots,N\\
\alpha &\sim \mathcal{N}(0, 100)\\
\beta &\sim \mathcal{N}(0, 100)\\
\sigma &\sim \text{Cauchy}(0, 5)\, 1_{\sigma>0}
\end{align}
$$


Discuss the prior distributions with your classmates.

<h3> First implementation in Stan</h3>

The model is then written in Stan as follows. Stan's `normal` takes a standard deviation, not a variance, so a prior variance of 100 becomes a standard deviation of 10:

    data {
      int<lower=0> N;
      vector[N] y;
      vector[N] x;
    }

    parameters {
      real alpha;
      real beta;
      real<lower=0> sigma;
    }

    model {
      alpha ~ normal(0, 10);
      beta ~ normal(0, 10);
      sigma ~ cauchy(0, 5);
      for (n in 1:N)
        y[n] ~ normal(alpha + beta * x[n], sigma);
    }
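
To run a model like this from Python, the data are passed as a dictionary whose keys match the names in the `data` block (`N`, `x`, `y`). The sketch below simulates such a dictionary using only the standard library. The helper name `simulate_regression_data` and the values `alpha_true`, `beta_true`, and `sigma_true` are illustrative assumptions, not part of the course material.

```python
import random

def simulate_regression_data(n, alpha_true=1.0, beta_true=2.0,
                             sigma_true=0.5, seed=0):
    """Simulate y_n ~ Normal(alpha + beta * x_n, sigma) as a Stan-style data dict.

    All default parameter values are hypothetical and chosen for illustration.
    """
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    # random.gauss takes a standard deviation, like Stan's normal()
    y = [rng.gauss(alpha_true + beta_true * xi, sigma_true) for xi in x]
    return {'N': n, 'x': x, 'y': y}

regression_dat = simulate_regression_data(50)
print(regression_dat['N'], len(regression_dat['x']), len(regression_dat['y']))
```

A dictionary such as `regression_dat` could then be given as the `data` argument of `sampling`, in the same way the eight-schools data are passed later in this notebook.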

<h2> 3. Example 1. Eight Schools</h2>

The "eight schools" example appears in Section 5.5 of Gelman et al. (2003), which studies the effects of coaching programs in eight schools.

1. The data come from a study by the Educational Testing Service that analyzed the effect of coaching.

2. The data are from the SAT-V exam in eight high schools.

3. There is no prior reason to believe that any coaching program was:

- more effective than the others

- more similar in effect to some programs than to others




<h3>  The data</h3>

For each of the eight observed schools, the data give the estimated treatment effect on the score and the standard error of that estimate.



| School | Estimated treatment effect | Standard error of the estimate |
|---|---|---|
| A | 28 | 15 |
| B | 8 | 10 |
| C | -3 | 16 |
| D | 7 | 11 |
| E | -1 | 9 |
| F | 1 | 11 |
| G | 18 | 10 |
| H | 12 | 18 |

In [1]:
# import the required objects
import pystan

In [2]:
# the data
schools_dat = {'J': 8,
               'y': [28,  8, -3,  7, -1,  1, 18, 12],
               'sigma': [15, 10, 16, 11,  9, 11, 10, 18]}
schools_dat 

{'J': 8,
 'y': [28, 8, -3, 7, -1, 1, 18, 12],
 'sigma': [15, 10, 16, 11, 9, 11, 10, 18]}

<h2>5. First model </h2>

Eight Schools: No Pooling

- Each school is treated individually, with an improper (flat) prior on its effect. Each school mean comes from a different distribution.

$$
\begin{equation}
\begin{split}
y_i &\sim \mathcal{N}(\theta_i,\sigma_i^2), \text{ known } \sigma_i^2\\
\theta_i &\propto 1
\end{split}
\end{equation}
$$
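
Because the prior is flat and each $\sigma_i$ is known, the no-pooling posterior is available in closed form: $\theta_i \mid y_i \sim \mathcal{N}(y_i, \sigma_i^2)$. The following sketch computes those exact posteriors with the standard library, as a reference against which the Stan summary further down can be checked. The variable name `no_pooling` is an illustrative choice.

```python
from statistics import NormalDist

y = [28, 8, -3, 7, -1, 1, 18, 12]
sigma = [15, 10, 16, 11, 9, 11, 10, 18]

# With a flat prior, the posterior of theta_i is Normal(mean=y_i, sd=sigma_i).
no_pooling = [NormalDist(mu=yi, sigma=si) for yi, si in zip(y, sigma)]

for i, post in enumerate(no_pooling, start=1):
    lo, hi = post.inv_cdf(0.025), post.inv_cdf(0.975)
    print(f"theta[{i}]: mean={post.mean:6.2f}  sd={post.stdev:5.2f}  "
          f"95% interval=({lo:7.2f}, {hi:7.2f})")
```

The sampled means and standard deviations should agree with these values up to Monte Carlo error.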

In [3]:
print(type(schools_dat))
print(schools_dat['y'])

<class 'dict'>
[28, 8, -3, 7, -1, 1, 18, 12]


In [4]:
schools_code_01 ="""

data {
int<lower=0> J; // # schools
real y[J]; // estimated treatment
real<lower=0> sigma[J]; // std err of effect
}

parameters {
real theta[J]; // school effect
}

model {
y ~ normal(theta, sigma);
}
"""

In [5]:
# compile the  model
sm = pystan.StanModel(model_code=schools_code_01)

INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_2097d3b0693259b17df3f8156052bac2 NOW.


In [6]:
# draw posterior samples (4 chains, 1000 iterations each, half of them warmup)
fit_01 = sm.sampling(data=schools_dat, iter=1000, chains=4)

In [7]:
fit_01

Inference for Stan model: anon_model_2097d3b0693259b17df3f8156052bac2.
4 chains, each with iter=1000; warmup=500; thin=1; 
post-warmup draws per chain=500, total post-warmup draws=2000.

           mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
theta[1]  27.86    0.27  14.46  -1.98  18.78  27.95  37.48  55.61   2857    1.0
theta[2]   8.22    0.18   10.3 -12.34   1.31   8.33  15.22  28.24   3345    1.0
theta[3]  -3.31    0.28  15.67 -33.33 -14.06  -3.26   7.34  27.41   3031    1.0
theta[4]   7.17    0.22   11.0  -14.0  -0.48    6.9  15.06  29.03   2498    1.0
theta[5]  -0.72    0.16    8.8 -18.39  -6.45  -0.87   4.86  16.89   3084    1.0
theta[6]   1.22    0.21   10.9 -20.36  -6.59   1.43    8.9  22.26   2795    1.0
theta[7]  17.83    0.18   9.72  -1.44  11.53  17.93  24.16  37.04   2906    1.0
theta[8]  12.73     0.3  17.78 -23.11   0.75   12.7  24.85   47.0   3448    1.0
lp__       -3.9    0.07   1.96  -8.47  -4.98  -3.57  -2.47   -1.0    854    1.0

Samples were

<h2>6. Second model </h2>

Eight Schools: Complete Pooling. All school means come from a single distribution.

- All schools are taken together and share one common effect $\theta$.

$$
y_i \sim \mathcal{N}(\theta,\sigma_i^2), \text{ known } \sigma_i^2
$$
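
Assuming an implicit flat prior on $\theta$, as in the Stan program below where no prior is declared, the complete-pooling posterior is also normal: its mean is the precision-weighted average of the $y_i$ and its variance is the inverse of the total precision. A minimal sketch of that calculation, with illustrative variable names:

```python
import math

y = [28, 8, -3, 7, -1, 1, 18, 12]
sigma = [15, 10, 16, 11, 9, 11, 10, 18]

# Each school contributes precision 1 / sigma_i^2 to the common effect theta.
precisions = [1.0 / s**2 for s in sigma]
post_precision = sum(precisions)

# Posterior mean: precision-weighted average; posterior sd: 1 / sqrt(total precision).
theta_mean = sum(w * yi for w, yi in zip(precisions, y)) / post_precision
theta_sd = math.sqrt(1.0 / post_precision)

print(f"theta: mean={theta_mean:.2f}  sd={theta_sd:.2f}")
```

Schools with small standard errors dominate the average, which is why the pooled estimate sits well below the raw effect of school A.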

In [26]:
schools_code_02 ="""

data {
int<lower=0> J; // # schools
real y[J]; // estimated treatment
real<lower=0> sigma[J]; // std err of effect
}

parameters {
real theta; // pooled school effect
}

model {
y ~ normal(theta, sigma);
}
"""

In [27]:
# compile the  model
sm_02 = pystan.StanModel(model_code=schools_code_02)

INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_e0b1291c3bc60b4cb98f20fa1a861260 NOW.


In [28]:
# draw posterior samples (4 chains, 1000 iterations each, half of them warmup)
fit_02 = sm_02.sampling(data=schools_dat, iter=1000, chains=4)

In [29]:
fit_02

Inference for Stan model: anon_model_e0b1291c3bc60b4cb98f20fa1a861260.
4 chains, each with iter=1000; warmup=500; thin=1; 
post-warmup draws per chain=500, total post-warmup draws=2000.

        mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
theta   7.87    0.14   3.93   0.18   5.22    7.8  10.47  15.84    810    1.0
lp__   -2.82    0.02   0.66  -4.72  -2.95  -2.57   -2.4  -2.35    842    1.0

Samples were drawn using NUTS at Tue Sep 24 13:07:55 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
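
The `Rhat` column compares the variance between chains with the variance within chains. The sketch below implements the textbook split-chain version of that statistic in plain Python to show the idea. It is an approximation for teaching purposes, and the function name `split_rhat` and the synthetic chains are assumptions; PyStan's own computation may differ in details.

```python
import random
from statistics import mean, variance

def split_rhat(chains):
    """Textbook split-Rhat for a list of equal-length chains of draws."""
    # Split every chain in half so that drift within a chain is also detected.
    half = len(chains[0]) // 2
    halves = [c[:half] for c in chains] + [c[half:2 * half] for c in chains]
    n = half
    chain_means = [mean(h) for h in halves]
    between = n * variance(chain_means)            # B: between-chain variance
    within = mean([variance(h) for h in halves])   # W: mean within-chain variance
    var_plus = (n - 1) / n * within + between / n  # pooled variance estimate
    return (var_plus / within) ** 0.5

rng = random.Random(1)
# Four chains that all sample the same distribution: Rhat should be near 1.
mixed = [[rng.gauss(0, 1) for _ in range(500)] for _ in range(4)]
# Two of four chains stuck around a different value: Rhat should be well above 1.
stuck = [[rng.gauss(shift, 1) for _ in range(500)] for shift in (0, 0, 3, 3)]

print(round(split_rhat(mixed), 2), round(split_rhat(stuck), 2))
```

Values close to 1, as in the summaries in this notebook, indicate that the chains agree with each other.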

<h2> 7. Third model </h2>

Eight Schools: Partial Pooling. We estimate a global mean, and each school mean is now a draw from a common population distribution.

- Estimates a global mean $\mu$, which is a hyperparameter of the model, and fixes the other hyperparameter of that distribution at $\tau = 25$.


$$
\begin{align}
y_i &\sim \mathcal{N}(\theta_i,\sigma_i^2), \text{ known } \sigma_i^2\\
\theta_i &\sim \mathcal{N}(\mu, \tau^2) , \text{ known } \tau^2
\end{align}
$$

We have thus assumed that there is a global mean $\mu$ such that the school means $\theta_i$ are generated from a normal distribution with mean $\mu$ and standard deviation $\tau$.
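
With $\tau$ fixed and a flat prior on $\mu$ (the Stan program below declares none), the posterior means can be worked out by hand. Marginally $y_i \sim \mathcal{N}(\mu, \sigma_i^2 + \tau^2)$, so $\mu$ has a precision-weighted posterior mean, and each $\theta_i$ is pulled from $y_i$ toward $\mu$ by the shrinkage factor $B_i = \sigma_i^2 / (\sigma_i^2 + \tau^2)$. A sketch of that arithmetic, with illustrative variable names:

```python
import math

y = [28, 8, -3, 7, -1, 1, 18, 12]
sigma = [15, 10, 16, 11, 9, 11, 10, 18]
tau = 25.0  # fixed between-school standard deviation, as in the text

# Posterior of mu under a flat prior: weights 1 / (sigma_i^2 + tau^2).
weights = [1.0 / (s**2 + tau**2) for s in sigma]
mu_mean = sum(w * yi for w, yi in zip(weights, y)) / sum(weights)
mu_sd = math.sqrt(1.0 / sum(weights))

# Shrinkage factor B_i: the weight given to mu instead of the raw estimate y_i.
shrink = [s**2 / (s**2 + tau**2) for s in sigma]
theta_mean = [(1 - b) * yi + b * mu_mean for b, yi in zip(shrink, y)]

print(f"mu: mean={mu_mean:.2f}  sd={mu_sd:.2f}")
for i, (b, t) in enumerate(zip(shrink, theta_mean), start=1):
    print(f"theta[{i}]: shrinkage={b:.2f}  posterior mean={t:.2f}")
```

Because $\tau = 25$ is large relative to most $\sigma_i$, the shrinkage factors are small and the estimates stay close to the no-pooling values.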

In [32]:
schools_code_03 ="""

data {
int<lower=0> J; // # schools
real y[J]; // estimated treatment
real<lower=0> sigma[J]; // std err of effect
real<lower=0> tau; // std dev between schools (fixed)
}

parameters {
real theta[J]; // school effect
real mu; // mean for schools
}

model {
theta ~ normal(mu, tau);
y ~ normal(theta, sigma);
}
"""


In [34]:
# compile the  model
sm_03 = pystan.StanModel(model_code=schools_code_03)

INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_da44632f3f0b4c27764c2d4b292aeca0 NOW.


In [36]:
# the data
schools_dat_03 = {'J': 8,
               'y': [28,  8, -3,  7, -1,  1, 18, 12],
               'sigma': [15, 10, 16, 11,  9, 11, 10, 18],
               'tau': 25}

In [37]:
# draw posterior samples (4 chains, 1000 iterations each, half of them warmup)
fit_03 = sm_03.sampling(data=schools_dat_03, iter=1000, chains=4)

In [38]:
fit_03

Inference for Stan model: anon_model_da44632f3f0b4c27764c2d4b292aeca0.
4 chains, each with iter=1000; warmup=500; thin=1; 
post-warmup draws per chain=500, total post-warmup draws=2000.

           mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
theta[1]  22.64    0.22  13.05  -3.16  13.41  22.86  31.46  47.45   3410    1.0
theta[2]   8.12    0.16   9.57 -10.51   1.56   8.22  14.69  26.11   3610    1.0
theta[3]   0.33    0.23  14.04 -27.41  -8.92   0.04  10.17  28.17   3781    1.0
theta[4]   7.09    0.16  10.06 -12.77   0.31   7.02  13.87  27.05   4011    1.0
theta[5]   0.23    0.13   8.29 -16.21  -5.15   0.31   5.51  16.74   4255    1.0
theta[6]   2.37    0.18  10.42 -18.19  -4.42   2.34   9.32  23.37   3417    1.0
theta[7]  16.52    0.15   8.77  -0.91   11.0  16.31  22.42  33.87   3413    1.0
theta[8]  10.74    0.22  14.63  -17.9   0.82  10.32  20.93   39.4   4488    1.0
mu         8.77    0.18   9.83  -9.94   2.08   8.58   15.1  28.02   2918    1.0
lp__      -4.

<h2>8. Fourth model</h2>

Eight Schools: full hierarchical model

- Estimates the hyperparameters $\mu$ and $\tau$




$$
\begin{align}
y_i &\sim \mathcal{N}(\theta_i,\sigma_i^2), \text{ known } \sigma_i^2\\
\theta_i &\sim \mathcal{N}(\mu, \tau^2)
\end{align}
$$

In [39]:
schools_code_04 ="""

data {
int<lower=0> J; // # schools
real y[J]; // estimated treatment
real<lower=0> sigma[J]; // std err of effect
}

parameters {
real theta[J]; // school effect
real mu; // mean for schools
real<lower=0> tau; // std dev between schools
}

model {
theta ~ normal(mu, tau);
y ~ normal(theta, sigma);
}
"""


In [40]:
# compile the  model
sm_04 = pystan.StanModel(model_code=schools_code_04)

INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_da7e6426ef8193e0aa34cb2b581c6be9 NOW.


In [41]:
# draw posterior samples (4 chains, 1000 iterations each, half of them warmup)
fit_04 = sm_04.sampling(data=schools_dat, iter=1000, chains=4)



In [42]:
fit_04

Inference for Stan model: anon_model_da7e6426ef8193e0aa34cb2b581c6be9.
4 chains, each with iter=1000; warmup=500; thin=1; 
post-warmup draws per chain=500, total post-warmup draws=2000.

           mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
theta[1]  12.21    0.37   8.98  -1.97   6.35  11.05  16.44  33.94    582   1.01
theta[2]   7.86    0.21   6.55  -5.73   4.05   7.72  11.96  20.75   1008    1.0
theta[3]   5.91    0.26    8.4 -12.71   1.45   6.45  11.07  22.01   1051    1.0
theta[4]    7.9    0.18   6.45  -5.84    4.0   7.63  11.83  21.13   1222    1.0
theta[5]   4.82    0.24   6.41  -9.16   1.05   5.31   9.24  16.32    697   1.01
theta[6]   5.81    0.23   6.88  -8.67   1.68   5.86  10.44  19.01    908    1.0
theta[7]  11.23    0.33   6.71  -0.43    6.5  10.58  15.27  25.82    415   1.01
theta[8]   8.37    0.29    8.1  -7.68   3.46   7.86  12.81  25.76    804    1.0
mu         8.03     0.2   4.99  -1.55   4.81   7.93  11.23  18.06    643   1.01
tau        7.

<h2>9. Fifth model </h2>

Random-effects model

- Estimates the hyperparameters $\mu$ and $\tau$

- Predicts the random effects $\eta_i$


$$
\begin{align}
y_i &\sim \mathcal{N}(\theta_i,\sigma_i^2), \text{ known } \sigma_i^2\\
\theta_i & = \mu + \tau \times \eta_i \\
\eta_i &\sim \mathcal{N}(0, 1)
\end{align}
$$
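
Writing $\theta_i = \mu + \tau \eta_i$ with $\eta_i \sim \mathcal{N}(0, 1)$ describes the same distribution as $\theta_i \sim \mathcal{N}(\mu, \tau^2)$; this form is usually called the non-centered parameterization. The simulation below checks that equivalence numerically. The values of `mu` and `tau` are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)
mu, tau = 8.0, 7.0  # hypothetical hyperparameter values, for illustration only
n_draws = 100_000

# Centered form: draw theta directly from Normal(mu, tau).
centered = [rng.gauss(mu, tau) for _ in range(n_draws)]
# Non-centered form: draw eta ~ Normal(0, 1), then shift and scale.
non_centered = [mu + tau * rng.gauss(0.0, 1.0) for _ in range(n_draws)]

print(f"centered:     mean={mean(centered):.2f}  sd={stdev(centered):.2f}")
print(f"non-centered: mean={mean(non_centered):.2f}  sd={stdev(non_centered):.2f}")
```

Both sets of draws should show a mean near `mu` and a standard deviation near `tau`. The two forms differ only in which quantities the sampler explores: here it works with `eta`, and `theta` is computed from it.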

In [45]:
schools_code_05 = """

data {
    int<lower=0> J; // number of schools
    vector[J] y; // estimated treatment effects
    vector<lower=0>[J] sigma; // s.e. of effect estimates
}

parameters {
    real mu;
    real<lower=0> tau;
    vector[J] eta;
}

transformed parameters {
    vector[J] theta;
    theta = mu + tau * eta;
}
model {
    eta ~ normal(0, 1);
    y ~ normal(theta, sigma);
}
"""


In [46]:
# compile the  model
sm_05 = pystan.StanModel(model_code=schools_code_05)

INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_2b5a7aab8238855033ba396e2bf4cff4 NOW.


In [47]:
# draw posterior samples (4 chains, 1000 iterations each, half of them warmup)
fit_05 = sm_05.sampling(data=schools_dat, iter=1000, chains=4)



In [48]:
fit_05

Inference for Stan model: anon_model_2b5a7aab8238855033ba396e2bf4cff4.
4 chains, each with iter=1000; warmup=500; thin=1; 
post-warmup draws per chain=500, total post-warmup draws=2000.

           mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
mu         7.57    0.25   5.26  -3.14   4.41    7.7  10.96  17.03    449   1.02
tau         6.8    0.47   6.64    0.2   2.35   5.02    9.0  25.03    196   1.02
eta[1]     0.37    0.02   0.94  -1.58  -0.22   0.43   1.01   2.13   2050    1.0
eta[2]   8.9e-3    0.02    0.9   -1.9  -0.54 7.1e-3    0.6   1.74   2083    1.0
eta[3]    -0.17    0.02   0.91  -1.97  -0.76  -0.18   0.43   1.61   2326    1.0
eta[4]    -0.03    0.02   0.92  -1.77  -0.67  -0.03   0.57   1.84   2062    1.0
eta[5]    -0.34    0.02    0.9  -2.13  -0.94  -0.33   0.23   1.45   1670    1.0
eta[6]    -0.19    0.02    0.9  -1.94  -0.81  -0.19   0.38   1.63   2046    1.0
eta[7]     0.36    0.02   0.84  -1.36  -0.17   0.38   0.89   2.03   2386    1.0
eta[8]     0.

<h2> 10. Homework </h2>

Rewrite the homework from the [notebook](./EAP_Interpretacion.ipynb) using Stan and what you have learned in this notebook.