# Section 2.2 Data Structure Activities

In [1]:
import os
import pymc3 as pm
import arviz as az
import pandas as pd
import numpy as np

if os.path.split(os.getcwd())[-1] != "notebooks":
    os.chdir("..")
    
np.random.seed(0)



## Reproducing the Planet Experiment
More good news! Your astronomical discovery from Section 1.2 has been published, and now people want you to share your data and results. They are also asking for help viewing portions of your analysis runs so they can inspect them in greater detail.

### Exercise 1 
Your favorite PPL is PyMC3, but it turns out your peer reviewer likes Stan. In an alternate universe your favorite PPL is Stan, but now your peer reviewer is a PyMC3 gal. Here we introduce the *Law of researcher PPL choice*:  

$$P(\text{Your friend uses another PPL} \mid \text{Your choice of PPL}) = 1$$


**How can we use ArviZ, Xarray, and NetCDF to share results in a common way?**  
Note: We encourage you to use whatever PPL you prefer. These docs may be helpful:  
https://arviz-devs.github.io/arviz/api.html#data

#### Step 1: Define your model and generate results

In [2]:
observations = [0, 0, 1, 0, 1]
water_observations = sum(observations)
total_observations = len(observations)

In [3]:
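with pm.Model() as planetmodel:
    # Uniform prior on the probability of observing water
    p_water = pm.Uniform("p", 0, 1)
    # Binomial likelihood: count of water observations out of all observations
    w = pm.Binomial("w", p=p_water, n=total_observations, observed=water_observations)
    # Draw 5000 posterior samples per chain, using 2 chains
    trace = pm.sample(5000, chains=2)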

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (2 chains in 1 job)
NUTS: [p]
100%|██████████| 5500/5500 [00:08<00:00, 617.62it/s]
100%|██████████| 5500/5500 [00:05<00:00, 933.86it/s]
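
Because the prior is uniform and the likelihood is binomial, the posterior has a closed form, which gives a quick way to sanity-check the sampler output. The sketch below is not part of the original exercise. It uses only the standard library and relies on the conjugacy fact that a Uniform(0, 1) prior is a Beta(1, 1) prior; the variable names are illustrative.

```python
from fractions import Fraction

observations = [0, 0, 1, 0, 1]
water = sum(observations)   # number of water observations
total = len(observations)   # total number of observations

# Uniform(0, 1) is Beta(1, 1). With a Binomial likelihood the posterior is
# Beta(1 + water, 1 + total - water).
alpha = 1 + water
beta = 1 + total - water

# The mean of a Beta(alpha, beta) distribution is alpha / (alpha + beta).
posterior_mean = Fraction(alpha, alpha + beta)
print(f"Posterior: Beta({alpha}, {beta}), mean = {posterior_mean} (about {float(posterior_mean):.3f})")
```

The mean of the sampled values of `p` should land close to this analytic value if sampling went well.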


#### Step 2: Convert model results from PPL to az.InferenceData

In [4]:
water_data = az.from_pymc3(trace=trace)

ValueError: zero-dimensional arrays cannot be concatenated

#### Step 3: Inspect InferenceData to see what groups exist

In [None]:
water_data

#### Step 4: Inspect the posterior group to verify the variable count, chain count, and draw count

In [None]:
water_data.posterior
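
The posterior group is an xarray `Dataset`, so the counts can also be read programmatically instead of from the printed summary. The following is a minimal sketch on a synthetic dataset; the `posterior` object here is a hypothetical stand-in built by hand with the same shape as the run above (2 chains, 5000 draws, one variable `p`), not the actual `water_data.posterior`.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for a posterior group: 2 chains, 5000 draws, variable "p"
rng = np.random.default_rng(0)
posterior = xr.Dataset(
    {"p": (("chain", "draw"), rng.uniform(size=(2, 5000)))},
    coords={"chain": [0, 1], "draw": np.arange(5000)},
)

variable_names = list(posterior.data_vars)  # names of the model variables
n_chains = posterior.sizes["chain"]         # length of the chain dimension
n_draws = posterior.sizes["draw"]           # length of the draw dimension
print(variable_names, n_chains, n_draws)
```

The same three expressions should work on any xarray `Dataset` that has `chain` and `draw` dimensions.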

#### Step 5: Save your results to disk

In [None]:
water_data.to_netcdf("WaterResults.nc")

### Exercise 2
You've been asked to peer review a study on radon levels in Minnesota basements. The dataset is available as one of ArviZ's remote datasets. Your review involves the following steps.

#### Step 1: Load the NetCDF file into Python memory
*Note*: ArviZ provides some preloaded datasets, and radon is one of them. It can be downloaded with the following command:

In [None]:
radon_data = az.load_arviz_data(dataset="radon")

#### Step 2: List all the groups
See what analysis your colleague has already run by checking the groups present in the InferenceData object.

In [None]:
radon_data

#### Step 3: Count the number of counties included in the radon study
How many counties were included in the `observed_data` group?  
Hint: xarray has a `.to_dataframe()` method

In [None]:
radon_data.observed_data
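
One way to use the `.to_dataframe()` hint is to convert the dataset to a pandas `DataFrame` and count distinct values in the relevant column. The sketch below uses a small hand-built dataset; the names `obs_id`, `county`, and `y`, and all the values, are assumptions for illustration, and the layout of the real radon dataset may differ, so inspect it first.

```python
import numpy as np
import xarray as xr

# Hypothetical observed_data-like dataset: five measurements from three counties
observed = xr.Dataset(
    {"y": ("obs_id", np.array([0.8, 1.1, 0.4, 1.6, 0.9]))},
    coords={
        "obs_id": np.arange(5),
        "county": ("obs_id", ["AITKIN", "AITKIN", "ANOKA", "BECKER", "ANOKA"]),
    },
)

# Non-index coordinates such as "county" become DataFrame columns
df = observed.to_dataframe()
n_counties = df["county"].nunique()  # number of distinct county labels
print(n_counties)
```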

#### Step 4: Count the variables in the Bayesian model
Inspect the posterior xarray dataset and get a list of all variables in the model.

In [None]:
radon_data.posterior

#### Step 5: Select the first 10 values of chain 2 for sigma_y in the posterior
Use the `.sel` method to get the first ten values. Label-based slices in `.sel` include both endpoints, so with draws labeled from 0, `slice(0, 9)` returns ten draws.

In [None]:
radon_data.posterior["sigma_y"].sel(chain=[2], draw=slice(0, 9))
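
When counting selected values, it helps to remember that `.sel` slices by coordinate label and includes both endpoints, while `.isel` slices by integer position and excludes the end, as ordinary Python slicing does. The sketch below shows the difference on a hand-built array; `sigma` is a hypothetical stand-in with 4 chains and 20 draws, not the radon posterior.

```python
import numpy as np
import xarray as xr

# Hypothetical posterior-like array: 4 chains x 20 draws
sigma = xr.DataArray(
    np.arange(80).reshape(4, 20),
    dims=("chain", "draw"),
    coords={"chain": np.arange(4), "draw": np.arange(20)},
    name="sigma_y",
)

# Label-based: draw labels 0 through 10, both endpoints included
by_label = sigma.sel(chain=2, draw=slice(0, 10))
# Position-based: positions 0 through 9, end excluded
by_position = sigma.isel(chain=2, draw=slice(0, 10))

print(by_label.sizes["draw"], by_position.sizes["draw"])
```

With draws labeled from 0, `.sel(draw=slice(0, 9))` and `.isel(draw=slice(0, 10))` select the same ten draws.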