In [20]:
import arviz as az
import numpy as np
import pymc as pm
from pymc.math import dot, stack, concatenate

%load_ext watermark
%watermark --iversions

pymc : 4.0.0b4
arviz: 0.11.4
numpy: 1.22.3



# Rats 

For now this page covers only ratsignorable2.odc, since it is relevant for HW6. I'll add the other examples eventually.

Adapted from [Codes for Unit 8: ratsignorable2.odc](https://www2.isye.gatech.edu/isye6420/supporting.html).

Associated lecture video: [Unit 8 Lesson 2](https://www.youtube.com/watch?v=T5vkLsIs3f8&list=PLv0FeK5oXK4l-RdT6DWJj0_upJOG2WKNO&index=83).

Data can be found [here](https://raw.githubusercontent.com/areding/6420-pymc/main/data/rats.txt).

A previous example, [Dugongs](https://areding.github.io/6420-pymc/Unit6-dugongs.html), dealt with missing values in the response (the y values). This example shows how to handle missing values in the predictor (x), which is nearly as easy. One way to look at it is as a second, very simple likelihood in the model: x is the observed data, and a single distribution fills in the missing values (see `x_imputed` in the model below).
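As a minimal sketch of the masking idea, the toy vector below reuses the x values from the cells further down. It uses `np.ma.masked_invalid`, which is an alternative to the sentinel-value approach this notebook takes, not the notebook's own method. The names `x_raw` and `x_masked` are hypothetical.

```python
import numpy as np

# Toy predictor with one missing measurement time, stored as NaN.
x_raw = np.array([8.0, 15.0, 22.0, np.nan, 36.0])

# masked_invalid masks NaN and inf entries directly, with no sentinel value.
x_masked = np.ma.masked_invalid(x_raw)

# True marks the entry that a model would have to impute.
print(x_masked.mask)

# Number of observed (unmasked) entries.
print(x_masked.count())
```

The mask is what tells a model which entries are observed data and which need to be treated as unknowns.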

```{warning}
This version of the model is not running!

I'm having trouble figuring out how to translate the professor's model into PyMC. If anyone gets it working, let me know. That said, the imputation parts are correct. The important steps are creating the masked data arrays and then choosing a good prior for x_imputed.
```

I hope it gives you an idea of how to handle the missing-data question on HW6, where I have confirmed that this approach works well in PyMC.
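To get a feel for what a prior on the missing x values implies, the sketch below draws from a normal distribution truncated at zero, using the same `mu=20`, `sigma=10`, `lower=0` values that appear in the model cell. It is plain NumPy with a simple rejection step, not PyMC's `TruncatedNormal` implementation, and the helper name `sample_truncated_normal` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_truncated_normal(mu, sigma, lower, size, rng):
    """Rejection sampler: keep only the normal draws that are >= lower."""
    out = np.empty(0)
    while out.size < size:
        draws = rng.normal(mu, sigma, size=size)
        out = np.concatenate([out, draws[draws >= lower]])
    return out[:size]


# Draws from the prior only; no data enters here.
prior_draws = sample_truncated_normal(mu=20, sigma=10, lower=0, size=10_000, rng=rng)

print(prior_draws.min(), prior_draws.mean())
```

These are prior draws only. In the full model the imputed values are also informed by the observed y values through the regression.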

The original paper is [here](https://www.jstor.org/stable/pdf/2289594.pdf).

In [81]:
# Trying to work out how to get the shapes right when recreating ratsignorable1.odc.
# It looks like there are 30 alphas and 30 taus in his model!
x = np.array([8.0, 15.0, 22.0, np.nan, 36.0]).reshape(-1, 1)
x = np.repeat(x, 30, axis=1)
y = np.loadtxt("./data/rats.txt")

In [83]:
# create masked data
y = y.copy()
y = np.nan_to_num(y, nan=-1)
y = np.ma.masked_values(y, value=-1)

x = x.copy()
x = np.nan_to_num(x, nan=-1)
x = np.ma.masked_values(x, value=-1)

y.shape, x.T.shape

((30, 5), (30, 5))
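The shapes above matter because the model below gives `alpha` and `beta` shape `(30, 1)` and relies on broadcasting against the `(30, 5)` predictor. The sketch below checks that arithmetic in plain NumPy. The zeros and ones are stand-in values, not estimates, and `n_rats` and `n_times` are hypothetical names.

```python
import numpy as np

n_rats, n_times = 30, 5

# Same construction as the notebook: a (5, 1) column repeated to (5, 30), then transposed.
col = np.array([8.0, 15.0, 22.0, np.nan, 36.0]).reshape(-1, 1)
x_t = np.repeat(col, n_rats, axis=1).T

# Mask the missing entries so they stay flagged through the arithmetic.
x_t = np.ma.masked_invalid(x_t)

# Stand-ins for the per-rat intercepts and slopes.
alpha = np.zeros((n_rats, 1))
beta = np.ones((n_rats, 1))

# (30, 1) broadcasts against (30, 5): one intercept and one slope per row.
mu = alpha + beta * x_t

print(mu.shape)
print(mu.mask[:, 3].all())  # the missing time point stays masked for every rat
```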

In [92]:
with pm.Model() as m:
    alpha_c = pm.Normal("alpha_c", 0, tau=1e-6)
    alpha_tau = pm.Gamma("alpha_tau", .001, .001)
    beta_c = pm.Normal("beta_c", 0, tau=1e-6)
    beta_tau = pm.Gamma("beta_tau", .001, .001)
    
    alpha = pm.Normal("alpha", alpha_c, tau=alpha_tau, shape=(30,1))
    beta = pm.Normal("beta", beta_c, tau=beta_tau, shape=(30,1))
    likelihood_tau = pm.Gamma("likelihood_tau", .001, .001)

    # This line is important for the homework!
    x_imputed = pm.TruncatedNormal("x_imputed", mu=20, sigma=10, lower=0, observed=x.T)
    
    mu = alpha + beta * x_imputed
    likelihood = pm.Normal("likelihood", mu, tau=likelihood_tau, observed=y, shape=y.shape)

    trace = pm.sample(
        5000,
        tune=1000,
        cores=4,
    )

    ppc = pm.sample_posterior_predictive(trace)



ParameterValueError: tau > 0
Apply node that caused the error: Check{tau > 0}(alpha_tau, Elemwise{gt,no_inplace}.0)
Toposort index: 15
Inputs types: [TensorType(float64, ()), TensorType(bool, ())]
Inputs shapes: [(), ()]
Inputs strides: [(), ()]
Inputs values: [array(0.), array(False)]
Outputs clients: [[Elemwise{pow,no_inplace}(Check{tau > 0}.0, TensorConstant{-0.5})]]

Backtrace when the node is created (use Aesara flag traceback__limit=N to make it longer):
  File "/var/folders/pm/9z29qnf508bc1v6q8fksblm40000gn/T/ipykernel_13783/937182508.py", line 7, in <cell line: 1>
    alpha = pm.Normal("alpha", alpha_c, tau=alpha_tau, shape=(30,1))
  File "/Users/aaron/mambaforge/envs/pymc-dev-py39/lib/python3.9/site-packages/pymc/distributions/distribution.py", line 267, in __new__
    rv_out, dims, observed, resize_shape = _make_rv_and_resize_shape(
  File "/Users/aaron/mambaforge/envs/pymc-dev-py39/lib/python3.9/site-packages/pymc/distributions/distribution.py", line 166, in _make_rv_and_resize_shape
    rv_out = cls.dist(*args, **kwargs)
  File "/Users/aaron/mambaforge/envs/pymc-dev-py39/lib/python3.9/site-packages/pymc/distributions/continuous.py", line 552, in dist
    tau, sigma = get_tau_sigma(tau=tau, sigma=sigma)
  File "/Users/aaron/mambaforge/envs/pymc-dev-py39/lib/python3.9/site-packages/pymc/distributions/continuous.py", line 245, in get_tau_sigma
    tau_ = check_parameters(tau, tau > 0, msg="tau > 0")
  File "/Users/aaron/mambaforge/envs/pymc-dev-py39/lib/python3.9/site-packages/pymc/distributions/dist_math.py", line 67, in check_parameters
    return CheckParameterValue(msg)(logp, all_true_scalar)
  File "/Users/aaron/mambaforge/envs/pymc-dev-py39/lib/python3.9/site-packages/aesara/graph/op.py", line 294, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "/Users/aaron/mambaforge/envs/pymc-dev-py39/lib/python3.9/site-packages/aesara/raise_op.py", line 87, in make_node
    [value.type()],

HINT: Use the Aesara flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.
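The traceback reports that `alpha_tau` evaluated to exactly 0 when the `tau > 0` check ran, and that the checked value feeds a `pow(..., -0.5)` node, which is the precision-to-standard-deviation conversion. One possible contributing factor, which is a guess and not a confirmed diagnosis, is that draws from a Gamma(0.001, 0.001) distribution are very often too small to represent in double precision. The sketch below illustrates that numerical behaviour in plain NumPy. It does not reproduce what PyMC does internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# NumPy parameterizes the gamma by shape and scale, where scale = 1 / rate.
draws = rng.gamma(shape=0.001, scale=1 / 0.001, size=10_000)

# Fraction of draws that are numerically indistinguishable from zero.
frac_tiny = np.mean(draws < 1e-300)
print(frac_tiny)

# Precision-to-sd conversion, sigma = tau ** -0.5, evaluated at tau = 0.
with np.errstate(divide="ignore"):
    sigma_at_zero = np.float64(0.0) ** -0.5
print(sigma_at_zero)
```

If this is part of the problem, a less extreme prior on the precisions or a prior placed directly on the standard deviations might avoid it, but that has not been tested here.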

In [None]:
az.summary(trace, hdi_prob=0.95)

Notes:

- Pretty sure it's mostly a shape problem. Need to take some time and do this by hand to confirm.

- Can't impute data with pm.Data(mutable=True)?

    - Reading: https://github.com/pymc-devs/pymc/issues/4441 and https://github.com/pymc-devs/pymc/pull/5295
