# Example of `InferenceData` schema in PyMC3
The `InferenceData` structure is described in the [schema document on GitHub](https://github.com/arviz-devs/arviz/blob/master/schema.md).

In [1]:
import arviz as az
import pymc3 as pm
import pandas as pd
import numpy as np
import xarray
# xarray.set_options(display_style="html");

In [2]:
# Read the data; the first CSV column (developer names) becomes the index
data = pd.read_csv("linear_regression_data.csv", index_col=0)
time = data.time.values
slack_comments = data.comments.values
names = data.index.values
N = len(names)
data

|          | comments | time |
|----------|---------:|-----:|
| Alice    |     7500 |  4.5 |
| Bob      |    10100 |  6.0 |
| Cole     |    18600 |  7.0 |
| Danielle |    25200 | 12.0 |
| Erika    |    27500 | 18.0 |
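
Written out, the model defined in the next cell is a linear regression of comment count on time since joining. The notation below is only a sketch: $y_i$ and $t_i$ are shorthand introduced here for the `slack_comments` and `time_since_joined` values of developer $i$, and the numeric arguments are the location and scale values passed to PyMC3.

$$
\begin{aligned}
\sigma &\sim \mathrm{HalfNormal}(30) \\
b_0 &\sim \mathrm{Normal}(0, 10) \\
b_1 &\sim \mathrm{Normal}(0, 10) \\
y_i &\sim \mathrm{Normal}(b_0 + b_1 t_i, \sigma), \quad i = 1, \dots, N
\end{aligned}
$$

Here $N = 5$, one observation per developer.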


In [3]:
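with pm.Model() as model:
    # Register the predictor as named data so it is stored in `constant_data`
    time_since_joined = pm.Data("time_since_joined", time)

    # Priors for the noise scale, intercept and slope
    sigma = pm.HalfNormal("sigma", sigma=30)
    b0 = pm.Normal("b0", mu=0, sigma=10)
    b1 = pm.Normal("b1", mu=0, sigma=10)

    # Likelihood of the observed comment counts
    pm.Normal("slack_comments", mu=b0 + b1 * time_since_joined, sigma=sigma, observed=slack_comments)

    # Posterior draws, then posterior predictive and prior predictive samples
    trace = pm.sample(200, chains=4)
    posterior_predictive = pm.sample_posterior_predictive(trace)
    prior = pm.sample_prior_predictive(150)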

Only 200 samples in chain.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [b1, b0, sigma]
Sampling 4 chains, 0 divergences: 100%|██████████| 2800/2800 [00:03<00:00, 834.18draws/s] 
The acceptance probability does not match the target. It is 0.9387873794290098, but should be close to 0.8. Try to increase the number of tuning steps.
The acceptance probability does not match the target. It is 0.8838650901566877, but should be close to 0.8. Try to increase the number of tuning steps.
The acceptance probability does not match the target. It is 0.8858350677206992, but should be close to 0.8. Try to increase the number of tuning steps.
The acceptance probability does not match the target. It is 0.9086067598818758, but should be close to 0.8. Try to increase the number of tuning steps.
100%|██████████| 800/800 [00:00<00:00, 1336.19it/s]


In [4]:
idata_pymc3 = az.from_pymc3(
    trace,
    prior=prior,
    posterior_predictive=posterior_predictive,
    coords={"developer": names},
    dims={
        "slack_comments": ["developer"],
        "log_likelihood": ["developer"],
        "time_since_joined": ["developer"],
    }
)

In [5]:
idata_pymc3

Inference data with groups:
	> posterior
	> sample_stats
	> posterior_predictive
	> prior
	> observed_data
	> constant_data

In this example, the dimensions of every variable are some combination of three: `chain`, `draw` and `developer`. Each dimension has its own coordinate values. For `chain` and `draw` these are integer identifiers starting at `0`; for `developer` they are the strings `["Alice", "Bob", "Cole", "Danielle", "Erika"]`.
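
To illustrate what named dimensions and coordinates allow, here is a minimal sketch that builds a stand-alone `xarray.DataArray` with the same three dimensions. The array `toy` and its random values are hypothetical placeholders, not output of the model above; only the dimension names and coordinate values are taken from this example.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for one variable with (chain, draw, developer) dims.
# The values are random placeholders, not samples from the model above.
names = ["Alice", "Bob", "Cole", "Danielle", "Erika"]
n_chains, n_draws = 4, 200
rng = np.random.default_rng(0)

toy = xr.DataArray(
    rng.normal(size=(n_chains, n_draws, len(names))),
    dims=("chain", "draw", "developer"),
    coords={
        "chain": np.arange(n_chains),   # integer identifiers starting at 0
        "draw": np.arange(n_draws),     # integer identifiers starting at 0
        "developer": names,             # string labels
    },
    name="slack_comments",
)

# Select by coordinate label instead of by positional index.
alice = toy.sel(developer="Alice")
print(alice.dims)  # ('chain', 'draw')

# Average over the sampling dimensions, keeping one value per developer.
per_developer_mean = toy.mean(dim=("chain", "draw"))
print(per_developer_mean.dims)  # ('developer',)
```

Assuming the groups shown in the following cells behave like ordinary `xarray` datasets, the same label-based calls should apply to them, for example `idata_pymc3.posterior_predictive["slack_comments"].sel(developer="Alice")`.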

In [6]:
idata_pymc3.posterior

In [7]:
idata_pymc3.sample_stats

In [8]:
idata_pymc3.posterior_predictive

In [9]:
idata_pymc3.observed_data

In [10]:
idata_pymc3.constant_data

In [11]:
idata_pymc3.prior