In [20]:
import arviz as az
import numpy as np
import pymc as pm
from pymc.math import dot, stack, concatenate

%load_ext watermark
%watermark --iversions

The watermark extension is already loaded. To reload it, use:
  %reload_ext watermark
pymc : 4.0.0b4
arviz: 0.11.4
numpy: 1.22.3



# Rats 

I'm just going to do ratsignorable2.odc for now since it is relevant for HW6. Eventually I'll add the other examples.

Adapted from [Codes for Unit 8: ratsignorable2.odc](https://www2.isye.gatech.edu/isye6420/supporting.html).

Associated lecture video: [Unit 8 Lesson 2](https://www.youtube.com/watch?v=T5vkLsIs3f8&list=PLv0FeK5oXK4l-RdT6DWJj0_upJOG2WKNO&index=83).

Data can be found [here](https://raw.githubusercontent.com/areding/6420-pymc/main/data/rats.txt).

We had a previous example about [Dugongs](https://areding.github.io/6420-pymc/Unit6-dugongs.html) that dealt with missing data in the observed data (y values). This example shows how to deal with missing data in the input data (x). It's still pretty easy. You could look at it like creating another likelihood in the model, a very simple one where the observed data is x, and you use a single distribution to fill in the missing values (see ```x_imputed``` in the model below).

```{warning}
This version of the model is not running!

I'm having trouble figuring out how to translate the professor's model into PyMC. If anyone gets it working, let me know. That said, the imputation parts are correct, the important thing is creating the masked data arrays and then choosing a good prior for x_imputed.
```

I hope it gives you an idea for how to handle the missing data question on HW6, which I have confirmed works well in PyMC.

Original paper [here.](https://www.jstor.org/stable/pdf/2289594.pdf)

In [68]:
x = np.array([8.0, 15.0, 22.0, np.nan, 36.0]).reshape(-1, 1)
x = np.repeat(x, 30, axis=1)
y = np.loadtxt("./data/rats.txt")

In [69]:
# create masked data
y = y.copy()
y = np.nan_to_num(y, nan=-1)
y = np.ma.masked_values(y, value=-1)

x = x.copy()
x = np.nan_to_num(x, nan=-1)
x = np.ma.masked_values(x, value=-1)

y.shape, x.shape

((30, 5), (5, 30))

In [80]:
with pm.Model() as m:
    alpha_c = pm.Normal("alpha_c", 0, tau=1e-6)
    alpha_tau = pm.Gamma("alpha_tau", .001, .001)
    beta_c = pm.Normal("beta.c", 0, tau=1e-6)
    beta_tau = pm.Gamma("beta_tau", .001, .001)
    
    alpha = pm.Normal("alpha", alpha_c, tau=alpha_tau, shape=30)
    beta = pm.Normal("beta", beta_c, tau=beta_tau, shape=30)
    likelihood_tau = pm.Gamma("likelihood_tau", .001, .001)

    # This line is important for the homework!
    x_imputed = pm.TruncatedNormal("x_imputed", mu=20, sigma=10, lower=0, observed=x)
    
    mu = alpha + mu * x_imputed
    likelihood = pm.Normal("likelihood", mu, tau=likelihood_tau, observed=y, shape=y.shape)

    trace = pm.sample(
        5000,
        tune=1000,
        cores=4,
    )

    ppc = pm.sample_posterior_predictive(trace)

ValueError: ('shapes (1,30) and (5,1) not aligned: 30 (dim 1) != 5 (dim 0)', (1, 30), (5, 1))
Apply node that caused the error: Dot22(InplaceDimShuffle{x,0}.0, InplaceDimShuffle{0,x}.0)
Toposort index: 86
Inputs types: [TensorType(float64, (1, None)), TensorType(float64, (None, 1))]
Inputs shapes: [(1, 30), (5, 1)]
Inputs strides: [(240, 8), (8, 8)]
Inputs values: ['not shown', array([[26.39187751],
       [36.07791154],
       [10.95083349],
       [29.01311593],
       [17.26067923]])]
Outputs clients: [[InplaceDimShuffle{}(Dot22.0)]]

HINT: Re-running with most Aesara optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the Aesara flag 'optimizer=fast_compile'. If that does not work, Aesara optimizations can be disabled with 'optimizer=None'.
HINT: Use the Aesara flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.

In [None]:
az.summary(trace, hdi_prob=0.95)

Notes:

can't impute data with pm.Data(mutable=True)? 

reading:
https://github.com/pymc-devs/pymc/issues/4441

https://github.com/pymc-devs/pymc/pull/5295
