# Example of `InferenceData` schema in PyStan
The `InferenceData` structure is described in the [schema specification on GitHub](https://github.com/arviz-devs/arviz/blob/master/schema.md).

In [1]:
import arviz as az
import pystan
import pandas as pd
import numpy as np
import xarray
xarray.set_options(display_style="html")

<xarray.core.options.set_options at 0x7fa7bc115198>

In [4]:
# Read the data; the first CSV column (developer names) becomes the index
data = pd.read_csv("linear_regression_data.csv", index_col=0)
time_since_joined = data.time.values
slack_comments = data.comments.values
names = data.index.values
N = len(names)
data

| | comments | time |
|---|---|---|
| Alice | 7500 | 4.5 |
| Bob | 10100 | 6.0 |
| Cole | 18600 | 7.0 |
| Danielle | 29200 | 12.0 |
| Erika | 27500 | 18.0 |


In [5]:
linreg_code = """
data {
  int<lower=0> N;
  real time_since_joined[N];
  real slack_comments[N];
}

parameters {
  real b0;
  real b1;
  real log_sigma;
}

transformed parameters {
  real<lower=0> sigma;
  sigma = exp(log_sigma);
}

model {
  b0 ~ normal(0,10);
  b1 ~ normal(0,10);
  for (i in 1:N) {
    slack_comments[i] ~ normal(b0 + b1 * time_since_joined[i], sigma);
  }
}

generated quantities {
  vector[N] log_lik;
  vector[N] slack_comments_hat;
  for (i in 1:N) {
    log_lik[i] = normal_lpdf(slack_comments[i] | b0 + b1 * time_since_joined[i], sigma);
    slack_comments_hat[i] = normal_rng(b0 + b1 * time_since_joined[i], sigma);
  }
}
"""
linreg_data_dict = {"N": N, "slack_comments": slack_comments, "time_since_joined": time_since_joined}
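
For reference, the Stan program above can be read as the following model. This is a sketch in notation that is not in the original notebook: $i$ indexes the $N$ developers, $y_i$ stands for `slack_comments[i]`, $t_i$ for `time_since_joined[i]`, and the second argument of $\mathcal{N}$ is a standard deviation, as in Stan. `log_sigma` has no sampling statement in the `model` block, so it carries Stan's implicit flat prior.

$$
\begin{aligned}
b_0 &\sim \mathcal{N}(0, 10), \qquad b_1 \sim \mathcal{N}(0, 10), \\
\sigma &= \exp(\texttt{log\_sigma}), \\
y_i &\sim \mathcal{N}(b_0 + b_1 t_i,\ \sigma), \qquad i = 1, \dots, N.
\end{aligned}
$$

The `generated quantities` block evaluates the pointwise log density of each $y_i$ under this likelihood (`log_lik`) and draws one replicate per developer from it (`slack_comments_hat`).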

In [6]:
sm = pystan.StanModel(model_code=linreg_code)

INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_436a47b6d0471232cc823b9d65bd2bd6 NOW.


In [7]:
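# Fixed_param performs no parameter updates: parameters stay at their initial
# values and only the generated quantities are redrawn at each iteration.
prior = sm.sampling(data=linreg_data_dict, iter=200, chains=1, algorithm='Fixed_param', warmup=0)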

In [8]:
posterior = sm.sampling(data=linreg_data_dict, iter=200, chains=4)

In [9]:
idata_stan = az.from_pystan(
    posterior=posterior,
    prior=prior,
    posterior_predictive="slack_comments_hat",
    prior_predictive="slack_comments_hat",
    observed_data=["slack_comments"],
    constant_data=["time_since_joined"],
    log_likelihood="log_lik",
    coords={"developer": names},
    dims={
        "slack_comments": ["developer"],
        "log_lik": ["developer"],
        "slack_comments_hat": ["developer"],
        "time_since_joined": ["developer"],
    }
)

In [10]:
idata_stan

Inference data with groups:
	> posterior
	> sample_stats
	> posterior_predictive
	> prior
	> sample_stats_prior
	> prior_predictive
	> observed_data
	> constant_data

In this example, the dimensions of each variable are some combination of `chain`, `draw` and `developer`. Each dimension has its own coordinate values: for `chain` and `draw` they are integer identifiers starting at `0`, and for `developer` they are the strings `["Alice", "Bob", "Cole", "Danielle", "Erika"]`.
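
Each group in `InferenceData` is an `xarray.Dataset`, so these named dimensions and coordinates can be used for label-based selection. The sketch below illustrates the idea on a stand-in dataset filled with random placeholder values (not Stan output), so it runs without compiling the model; the name `stand_in` and the shape of 4 chains by 100 draws are assumptions chosen to mirror the posterior run above.

```python
import numpy as np
import xarray as xr

names = ["Alice", "Bob", "Cole", "Danielle", "Erika"]
n_chains, n_draws = 4, 100  # assumed shape, mirroring the posterior run

# Stand-in for a group such as idata_stan.posterior_predictive.
# The values are random placeholders, not samples from the Stan model.
rng = np.random.default_rng(0)
stand_in = xr.Dataset(
    {
        "slack_comments_hat": (
            ("chain", "draw", "developer"),
            rng.normal(size=(n_chains, n_draws, len(names))),
        )
    },
    coords={
        "chain": np.arange(n_chains),
        "draw": np.arange(n_draws),
        "developer": names,
    },
)

# Select by coordinate label instead of positional index.
bob = stand_in["slack_comments_hat"].sel(developer="Bob")
print(bob.dims)  # the developer dimension is dropped by the scalar selection

# Reduce over the sampling dimensions, keeping one value per developer.
per_developer_mean = stand_in["slack_comments_hat"].mean(dim=("chain", "draw"))
print(per_developer_mean.dims)
```

The same `.sel(...)` and `.mean(dim=...)` calls should apply to the real groups, for example `idata_stan.posterior_predictive`, since they share these dimension names.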

In [11]:
idata_stan.posterior

In [12]:
idata_stan.sample_stats

In [13]:
idata_stan.posterior_predictive

In [14]:
idata_stan.observed_data

In [15]:
idata_stan.constant_data

In [16]:
idata_stan.prior

In [17]:
idata_stan.sample_stats_prior

In [18]:
idata_stan.prior_predictive