(pymc3_schema)=
# Example of `InferenceData` schema in PyMC3

See the {ref}`schema specification <schema>` for a description of the `InferenceData` structure.

In [1]:
import arviz as az
import pymc3 as pm
import pandas as pd
import numpy as np
import xarray
xarray.set_options(display_style="html");

In [2]:
# read the observed data; the first CSV column (developer names) becomes the index
data = pd.read_csv("linear_regression_data.csv", index_col=0)
time = data.time.values
slack_comments = data.comments.values
github_commits = data.commits.values
names = data.index.values
N = len(names)
data

|          | comments | commits | time |
|----------|----------|---------|------|
| Alice    | 7500     | 25      | 4.5  |
| Bob      | 10100    | 32      | 6.0  |
| Cole     | 18600    | 49      | 7.0  |
| Danielle | 25200    | 66      | 12.0 |
| Erika    | 27500    | 96      | 18.0 |


In [3]:
# data for out of sample predictions
candidate_devs = ["Francis", "Gerard"]
candidate_devs_time = np.array([3.6, 5.1])

In [4]:
dims = {
    "slack_comments": ["developer"],
    "github_commits": ["developer"],
    "time_since_joined": ["developer"],
}
with pm.Model() as model:
    time_since_joined = pm.Data("time_since_joined", time)
    
    # priors: noise scales, then intercept and slope for each outcome
    b_sigma = pm.HalfNormal("b_sigma", sigma=300)
    c_sigma = pm.HalfNormal("c_sigma", sigma=6)
    b0 = pm.Normal("b0", mu=0, sigma=200)
    b1 = pm.Normal("b1", mu=0, sigma=200)
    c0 = pm.Normal("c0", mu=0, sigma=10)
    c1 = pm.Normal("c1", mu=0, sigma=10)
    
    pm.Normal("slack_comments", mu=b0 + b1 * time_since_joined, sigma=b_sigma, observed=slack_comments)
    pm.Normal("github_commits", mu=c0 + c1 * time_since_joined, sigma=c_sigma, observed=github_commits)
    
    trace = pm.sample(400, chains=4)
    posterior_predictive = pm.sample_posterior_predictive(trace)
    prior = pm.sample_prior_predictive(150)
    idata_pymc3 = az.from_pymc3(
        trace,
        prior=prior,
        posterior_predictive=posterior_predictive,
        coords={"developer": names},
        dims=dims
    )

Only 400 samples in chain.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [c1, c0, b1, b0, c_sigma, b_sigma]


There was 1 divergence after tuning. Increase `target_accept` or reparameterize.
There was 1 divergence after tuning. Increase `target_accept` or reparameterize.


In [5]:
dims_pred = {
    "slack_comments": ["candidate developer"],
    "github_commits": ["candidate developer"],
    "time_since_joined": ["candidate developer"],
}
with model:
    # swap in the candidates' time values so the model predicts for unseen developers
    pm.set_data({"time_since_joined": candidate_devs_time})
    predictions = pm.sample_posterior_predictive(trace)
    az.from_pymc3_predictions(
        predictions, 
        idata_orig=idata_pymc3, 
        inplace=True,
        coords={"candidate developer": candidate_devs},
        dims=dims_pred,
    )

In [6]:
idata_pymc3

Inference data with groups:
	> posterior
	> posterior_predictive
	> sample_stats
	> prior
	> prior_predictive
	> observed_data
	> log_likelihood
	> constant_data
	> predictions
	> predictions_constant_data

In this example, the dimensions of most variables are a combination of the following three: `chain`, `draw` and `developer`. Each dimension has specific coordinate values. For `chain` and `draw`, these are integer identifiers starting at `0`; for `developer`, they are the strings `["Alice", "Bob", "Cole", "Danielle", "Erika"]`. The exceptions are the `predictions` and `predictions_constant_data` groups, which use `candidate developer` in place of `developer`, with coordinate values `["Francis", "Gerard"]`.
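To make the role of dimensions and coordinates concrete, here is a minimal sketch that does not rely on the fitted model. It builds a stand-in `xarray.DataArray` with the same three dimensions and the same `developer` labels. The random values, the 4 x 400 x 5 shape and the variable names are assumptions for illustration only, not output from the sampler above.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for one variable: 4 chains, 400 draws, 5 developers.
developers = ["Alice", "Bob", "Cole", "Danielle", "Erika"]
rng = np.random.default_rng(0)
values = rng.normal(size=(4, 400, len(developers)))

slack_comments = xr.DataArray(
    values,
    dims=("chain", "draw", "developer"),
    coords={
        "chain": np.arange(4),      # integer identifiers starting at 0
        "draw": np.arange(400),     # integer identifiers starting at 0
        "developer": developers,    # string labels
    },
    name="slack_comments",
)

# Select by coordinate label instead of positional index.
alice = slack_comments.sel(developer="Alice")

# Average over the sampling dimensions, keeping one value per developer.
per_developer_mean = slack_comments.mean(dim=("chain", "draw"))

print(alice.dims)
print(per_developer_mean.dims)
```

Each group of `idata_pymc3` is an `xarray.Dataset`, so the same label-based selection should carry over, for example `idata_pymc3.posterior_predictive["slack_comments"].sel(developer="Alice")`.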

In [7]:
idata_pymc3.posterior

In [8]:
idata_pymc3.sample_stats

In [9]:
idata_pymc3.log_likelihood

In [10]:
idata_pymc3.posterior_predictive

In [11]:
idata_pymc3.observed_data

In [12]:
idata_pymc3.constant_data

In [13]:
idata_pymc3.prior

In [14]:
idata_pymc3.predictions

In [15]:
idata_pymc3.predictions_constant_data