# Inference Data Cookbook
`InferenceData` is the central data format for ArviZ. It is a container that maintains references to one or more `xarray.Dataset` objects. The sections below show several ways to generate an `InferenceData` object. See [XarrayforArviZ](XarrayforArviZ.ipynb) for more on xarray.

In [1]:
import arviz as az
import numpy as np

## From 1d numpy array

In [2]:
size = 100
dataset = az.convert_to_inference_data(np.random.randn(size))
print(dataset)
dataset.posterior

Inference data with groups:
	> posterior


<xarray.Dataset>
Dimensions:  (chain: 1, draw: 100)
Coordinates:
  * chain    (chain) int64 0
  * draw     (draw) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ...
Data variables:
    x        (chain, draw) float64 -0.1745 -0.4706 -0.2511 -0.9259 1.374 ...
Attributes:
    created_at:  2018-12-07T15:39:51.596588

## From nd numpy array


In [3]:
shape = (1, 2, 3, 4, 5)
dataset = az.convert_to_inference_data(np.random.randn(*shape))
print(dataset)
dataset.posterior

Inference data with groups:
	> posterior


<xarray.Dataset>
Dimensions:  (chain: 1, draw: 2, x_dim_0: 3, x_dim_1: 4, x_dim_2: 5)
Coordinates:
  * chain    (chain) int64 0
  * draw     (draw) int64 0 1
  * x_dim_0  (x_dim_0) int64 0 1 2
  * x_dim_1  (x_dim_1) int64 0 1 2 3
  * x_dim_2  (x_dim_2) int64 0 1 2 3 4
Data variables:
    x        (chain, draw, x_dim_0, x_dim_1, x_dim_2) float64 -1.244 0.418 ...
Attributes:
    created_at:  2018-12-07T15:39:51.621828
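
The two printed datasets above suggest a naming convention for bare arrays: a 1d array is read as draws from a single chain, while for an nd array the first two axes become `chain` and `draw` and the remaining axes are named `<var>_dim_<i>`. The following is a minimal pure-Python sketch of that convention as observed in these outputs. `default_dims` is a hypothetical helper, not an ArviZ function, and it only covers the cases shown here.

```python
def default_dims(var_name, shape):
    """Return (dims, sizes) mimicking the default names seen in the outputs above.

    Hypothetical helper for illustration only; it is not part of ArviZ.
    """
    shape = tuple(shape)
    if not shape:
        raise ValueError("scalar inputs are not covered by this sketch")
    if len(shape) == 1:
        # A 1d array is read as shape[0] draws from a single chain.
        shape = (1,) + shape
    # The first two axes are chain and draw; extra axes get generated names.
    extra = [f"{var_name}_dim_{i}" for i in range(len(shape) - 2)]
    dims = ["chain", "draw"] + extra
    return dims, dict(zip(dims, shape))


dims_1d, sizes_1d = default_dims("x", (100,))
dims_nd, sizes_nd = default_dims("x", (1, 2, 3, 4, 5))
print(sizes_1d)
print(sizes_nd)
```

The sizes returned for the two shapes match the `Dimensions:` lines printed by the 1d and nd examples.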

## From a dictionary

In [4]:
datadict = {
    'a': np.random.randn(100),
    'b': np.random.randn(1, 100, 10),
    'c': np.random.randn(1, 100, 3, 4),
}
dataset = az.convert_to_inference_data(datadict)
print(dataset)
dataset.posterior

Inference data with groups:
	> posterior


<xarray.Dataset>
Dimensions:  (b_dim_0: 10, c_dim_0: 3, c_dim_1: 4, chain: 1, draw: 100)
Coordinates:
  * chain    (chain) int64 0
  * draw     (draw) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ...
  * b_dim_0  (b_dim_0) int64 0 1 2 3 4 5 6 7 8 9
  * c_dim_0  (c_dim_0) int64 0 1 2
  * c_dim_1  (c_dim_1) int64 0 1 2 3
Data variables:
    a        (chain, draw) float64 0.7113 2.057 0.03026 -1.872 -1.86 -0.823 ...
    b        (chain, draw, b_dim_0) float64 0.3641 -0.4706 0.9407 -0.05332 ...
    c        (chain, draw, c_dim_0, c_dim_1) float64 -0.307 -1.649 -0.3328 ...
Attributes:
    created_at:  2018-12-07T15:39:51.648277

## From dictionary with coords and dims

In [5]:
datadict = {
    'a': np.random.randn(100),
    'b': np.random.randn(1, 100, 10),
    'c': np.random.randn(1, 100, 3, 4),
}
coords = {'c1' : np.arange(3), 'c2' : np.arange(4), 'b1' : np.arange(10)}
dims = {'b' : ['b1'], 'c' : ['c1', 'c2']}

dataset = az.convert_to_inference_data(datadict, coords=coords, dims=dims)
print(dataset)
dataset.posterior

Inference data with groups:
	> posterior


<xarray.Dataset>
Dimensions:  (b1: 10, c1: 3, c2: 4, chain: 1, draw: 100)
Coordinates:
  * chain    (chain) int64 0
  * draw     (draw) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ...
  * b1       (b1) int64 0 1 2 3 4 5 6 7 8 9
  * c1       (c1) int64 0 1 2
  * c2       (c2) int64 0 1 2 3
Data variables:
    a        (chain, draw) float64 -1.138 -0.2952 1.293 -0.1876 -0.3133 ...
    b        (chain, draw, b1) float64 -0.01716 -1.441 -0.5538 -0.831 ...
    c        (chain, draw, c1, c2) float64 -0.3915 1.511 -1.506 -1.935 ...
Attributes:
    created_at:  2018-12-07T15:39:51.672175
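
Named dimensions are only meaningful if each coordinate has as many labels as the axis it describes. The sketch below checks this before conversion, assuming the layout shown above, where the first two axes of each array are `chain` and `draw` and `dims` names the remaining axes in order. `check_dims` is a hypothetical helper, not an ArviZ function, and it works on plain shape tuples so it needs no third-party imports.

```python
def check_dims(shapes, coords, dims):
    """Return a list of mismatches between coordinate lengths and axis lengths.

    Hypothetical helper for illustration only; it is not part of ArviZ.
    """
    problems = []
    for var, shape in shapes.items():
        shape = tuple(shape)
        if len(shape) == 1:
            # 1d input: a single chain of draws, so there are no extra axes.
            shape = (1,) + shape
        extra_axes = shape[2:]  # everything after (chain, draw)
        for axis_len, dim in zip(extra_axes, dims.get(var, [])):
            if dim in coords and len(coords[dim]) != axis_len:
                problems.append(
                    f"{var}: dim '{dim}' has {len(coords[dim])} labels "
                    f"but axis length is {axis_len}"
                )
    return problems


# Shapes of the arrays in `datadict` above.
shapes = {"a": (100,), "b": (1, 100, 10), "c": (1, 100, 3, 4)}
coords = {"c1": range(3), "c2": range(4), "b1": range(10)}
dims = {"b": ["b1"], "c": ["c1", "c2"]}

ok = check_dims(shapes, coords, dims)
bad = check_dims(shapes, {**coords, "c2": range(5)}, dims)
print(ok)
print(bad)
```

With the coordinates used in this section the check reports no problems; lengthening `c2` to five labels produces one mismatch for variable `c`.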

## From pymc3

In [6]:
import pymc3 as pm
draws = 500
chains = 2

eight_school_data = {'J': 8,
                     'y': np.array([28., 8., -3., 7., -1., 1., 18., 12.]),
                     'sigma': np.array([15., 10., 16., 11., 9., 11., 10., 18.])
                    }

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=5)
    tau = pm.HalfCauchy('tau', beta=5)
    theta_tilde = pm.Normal('theta_tilde', mu=0, sd=1, shape=eight_school_data['J'])
    theta = pm.Deterministic('theta', mu + tau * theta_tilde)
    pm.Normal('obs', mu=theta, sd=eight_school_data['sigma'], observed=eight_school_data['y'])
    
    trace = pm.sample(draws, chains=chains)
    prior = pm.sample_prior_predictive()
    posterior_predictive = pm.sample_posterior_predictive(trace, 500, model)
    
    data = az.from_pymc3(
            trace=trace,
            prior=prior,
            posterior_predictive=posterior_predictive,
            coords={'school': np.arange(eight_school_data['J'])},
            dims={'theta': ['school'], 'theta_tilde': ['school']},
        )
data

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [theta_tilde, tau, mu]
Sampling 2 chains: 100%|██████████| 2000/2000 [00:01<00:00, 1234.73draws/s]
100%|██████████| 500/500 [00:00<00:00, 2600.93it/s]


Inference data with groups:
	> posterior
	> sample_stats
	> posterior_predictive
	> prior
	> observed_data
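
Written out, the model block above corresponds to the following non-centered eight schools model, read directly from the PyMC3 code. Here $\mathcal{N}(m, s)$ denotes a normal distribution with mean $m$ and standard deviation $s$, $J = 8$, and $\sigma_j$ are the known values in `eight_school_data['sigma']`.

```latex
\begin{aligned}
\mu &\sim \mathcal{N}(0, 5) \\
\tau &\sim \text{HalfCauchy}(5) \\
\tilde{\theta}_j &\sim \mathcal{N}(0, 1), \quad j = 1, \dots, J \\
\theta_j &= \mu + \tau \, \tilde{\theta}_j \\
y_j &\sim \mathcal{N}(\theta_j, \sigma_j)
\end{aligned}
```

The `coords` and `dims` arguments label the index $j$ of `theta` and `theta_tilde` as `school`.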

## From pystan

In [7]:
import pystan
schools_code = '''
        data {
            int<lower=0> J;
            real y[J];
            real<lower=0> sigma[J];
        }

        parameters {
            real mu;
            real<lower=0> tau;
            real theta_tilde[J];
        }

        transformed parameters {
            real theta[J];
            for (j in 1:J)
                theta[j] = mu + tau * theta_tilde[j];
        }

        model {
            mu ~ normal(0, 5);
            tau ~ cauchy(0, 5);
            theta_tilde ~ normal(0, 1);
            y ~ normal(theta, sigma);
        }

        generated quantities {
            vector[J] log_lik;
            vector[J] y_hat;
            for (j in 1:J) {
                log_lik[j] = normal_lpdf(y[j] | theta[j], sigma[j]);
                y_hat[j] = normal_rng(theta[j], sigma[j]);
            }
        }
    '''
stan_model = pystan.StanModel(model_code=schools_code)
fit = stan_model.sampling(data=eight_school_data,
                          iter=draws,
                          warmup=0,
                          chains=chains)

# The fit object is passed as `posterior`; with this ArviZ version,
# `fit=fit` raises "TypeError: from_pystan() got an unexpected keyword argument 'fit'".
data = az.from_pystan(posterior=fit,
                      posterior_predictive='y_hat',
                      observed_data=['y'],
                      log_likelihood='log_lik',
                      coords={'school': np.arange(eight_school_data['J'])},
                      dims={'theta': ['school'],
                            'y': ['school'],
                            'log_lik': ['school'],
                            'y_hat': ['school'],
                            'theta_tilde': ['school']
                           }
                     )
data

INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_e7aeb8b836685a923269e6171e7377cd NOW.

## From pyro

In [None]:
import torch 

import pyro
import pyro.distributions as dist
import pyro.poutine as poutine
from pyro.infer.mcmc import MCMC, NUTS

pyro.enable_validation(True)
pyro.set_rng_seed(0)

draws = 1000
warmup_steps = 0
eight_school_data = {'J' : 8,
                     'y' : torch.tensor([28,  8, -3,  7, -1,  1, 18, 12]).type(torch.Tensor),
                     'sigma' : torch.tensor([15, 10, 16, 11, 9, 11, 10, 18]).type(torch.Tensor)
                    }


def model(sigma):
    eta = pyro.sample('eta', dist.Normal(torch.zeros(eight_school_data['J']), torch.ones(eight_school_data['J'])))
    mu = pyro.sample('mu', dist.Normal(torch.zeros(1), 10 * torch.ones(1)))
    tau = pyro.sample('tau', dist.HalfCauchy(scale=25 * torch.ones(1)))

    theta = mu + tau * eta

    return pyro.sample("obs", dist.Normal(theta, sigma))


def conditioned_model(model, sigma, y):
    return poutine.condition(model, data={"obs": y})(sigma)



nuts_kernel = NUTS(conditioned_model, adapt_step_size=True)
posterior = MCMC(nuts_kernel,
                 num_samples=draws,
                 warmup_steps=warmup_steps).run(model, eight_school_data['sigma'], eight_school_data['y'])

pyro_data = az.from_pyro(posterior)
pyro_data

## From emcee

In [None]:
import emcee

eight_school_data = {'J': 8,
                     'y': np.array([28., 8., -3., 7., -1., 1., 18., 12.]),
                     'sigma': np.array([15., 10., 16., 11., 9., 11., 10., 18.])
                    }

def log_prior_8school(theta, J):
    mu = theta[0]
    tau = theta[1]
    eta = theta[2:]
    # Half-Cauchy prior on tau (half width at half maximum = 25)
    if tau < 0:
        return -np.inf
    hwhm = 25
    prior_tau = -np.log(tau**2 + hwhm**2)
    # Log densities are written up to an additive constant
    prior_mu = -0.5 * (mu / 10)**2  # normal prior, loc=0, scale=10
    prior_eta = -0.5 * np.sum(eta**2)  # normal prior, loc=0, scale=1
    return prior_mu + prior_tau + prior_eta

def log_likelihood_8school(theta, y, sigma):
    mu = theta[0]
    tau = theta[1]
    eta = theta[2:]
    # Normal likelihood with known sigma, up to an additive constant
    return -0.5 * np.sum(((mu + tau * eta - y) / sigma)**2)

def lnprob_8school(theta, J, y, sigma):
    prior = log_prior_8school(theta, J)
    # Skip the likelihood when the prior rules the point out
    if not np.isfinite(prior):
        return -np.inf
    like = log_likelihood_8school(theta, y, sigma)
    return like + prior

nwalkers = 40
ndim = eight_school_data['J']+2
draws = 1500
pos = np.random.normal(size=(nwalkers,ndim))
pos[:,1] = np.absolute(pos[:,1])
sampler = emcee.EnsembleSampler(nwalkers, 
                                ndim, 
                                lnprob_8school, 
                                args=(eight_school_data['J'], 
                                      eight_school_data['y'], 
                                      eight_school_data['sigma']
                                     )
                               )
sampler.run_mcmc(pos, draws)

# define variable names; they cannot be inferred from emcee
var_names = ['mu', 'tau'] + ['eta{}'.format(i) for i in range(eight_school_data['J'])]
emcee_data = az.from_emcee(sampler, var_names=var_names)
emcee_data
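
Because the sampler only stores an array of positions, the names passed as `var_names` have to line up one to one with the entries of `theta`. A minimal sketch of a guard for this, in plain Python: `eight_school_var_names` and `check_var_names` are hypothetical helpers introduced here for illustration, not part of emcee or ArviZ.

```python
def eight_school_var_names(n_schools):
    """Names in the same order as theta = [mu, tau, eta0, ..., eta{J-1}]."""
    return ["mu", "tau"] + [f"eta{i}" for i in range(n_schools)]


def check_var_names(var_names, ndim):
    """Raise ValueError unless the names match the sampled dimensions one to one."""
    if len(var_names) != ndim:
        raise ValueError(f"expected {ndim} names, got {len(var_names)}")
    if len(set(var_names)) != len(var_names):
        raise ValueError("variable names must be unique")
    return list(var_names)


J = 8
names = check_var_names(eight_school_var_names(J), ndim=J + 2)
print(names)
```

Running a check like this before calling `az.from_emcee` catches a mismatch between `ndim` and the name list early, rather than during conversion.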

## From cmdstan
See [from_cmdstan](https://arviz-devs.github.io/arviz/generated/arviz.from_cmdstan.html#arviz.from_cmdstan) for details. Cookbook documentation coming soon.