# 1. Missing Data

## Taxonomy of Missing Data

1. Missing completely at random (MCAR)

        Missingness does not depend on observed or unobserved data.

2. Missing at random (MAR)

        Missingness depends only on observed data.

3. Missing not at random (MNAR)

        Neither MCAR nor MAR holds. Missingness may depend on the missing data themselves, for example on their magnitude.

MCAR and MAR are considered *ignorable* missingness, while MNAR is considered *non-ignorable*.
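The three mechanisms can be illustrated with a small simulation. This is only a sketch: the function name `make_missing`, the missingness probabilities (0.3, 0.6, 0.1), and the simulated data are all hypothetical choices made for illustration, not part of the models discussed later.

```python
import random

def make_missing(x, y, mechanism, rng):
    """Return a copy of y in which some entries are replaced by None."""
    out = []
    for xi, yi in zip(x, y):
        if mechanism == "MCAR":
            p = 0.3                      # constant: ignores both x and y
        elif mechanism == "MAR":
            p = 0.6 if xi > 0 else 0.1   # depends only on the observed x
        elif mechanism == "MNAR":
            p = 0.6 if yi > 0 else 0.1   # depends on the value being hidden
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        out.append(None if rng.random() < p else yi)
    return out

rng = random.Random(0)                   # fixed seed for reproducibility
x = [rng.gauss(0, 1) for _ in range(5000)]
y = [xi + rng.gauss(0, 1) for xi in x]   # y is correlated with x
full_mean = sum(y) / len(y)

# Mean of the observed (non-missing) y values under each mechanism
means = {}
for mech in ("MCAR", "MAR", "MNAR"):
    y_obs = [v for v in make_missing(x, y, mech, rng) if v is not None]
    means[mech] = sum(y_obs) / len(y_obs)
    print(mech, round(means[mech], 3), "vs full-data mean", round(full_mean, 3))
```

In this setup the observed-data mean under MCAR stays close to the full-data mean, while under MAR and MNAR it is pulled downward because larger values are dropped more often. The difference is that under MAR the shift is explained by the fully observed `x`, whereas under MNAR it is driven by the hidden values themselves.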

## Multiple imputations

Single imputation of missing values is not preferred because it does not estimate the error correctly or account for the uncertainty in the imputed data. Multiple imputation, a method developed by [Rubin](https://pubmed.ncbi.nlm.nih.gov/2057657/), addresses this by repeating the imputation several times and combining the results.

The actual posterior distribution of $\theta$ is:

$$\text{P}(\theta | Y_{obs}) = \int \text{P}(\theta | Y_{obs},Y_{mis})\text{P}(Y_{mis} | Y_{obs})d Y_{mis}$$

i.e. (actual posterior distribution of $\theta$) = AVE(complete posterior distribution of $\theta$)

Posterior mean of $\theta$:

$$\text{E}(\theta | Y_{obs}) = \text{E}(\text{E}(\theta | Y_{obs}, Y_{mis})|Y_{obs})$$

i.e. (posterior mean of $\theta$) = AVE(repeated complete data posterior means of $\theta$)

Posterior variance is:

$$\text{Var}(\theta | Y_{obs}) = \text{E}(\text{Var}(\theta | Y_{obs}, Y_{mis})|Y_{obs}) + \text{Var}(\text{E}(\theta | Y_{obs}, Y_{mis})|Y_{obs})$$

i.e. (posterior variance of $\theta$) = AVE(repeated complete data variances of $\theta$) + Var(repeated complete data posterior means of $\theta$)
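The mean and variance identities above translate directly into a pooling step over $m$ imputed data sets. The sketch below assumes we already have a point estimate and a variance of $\theta$ from each completed data set; the function name `pool` and the five sets of numbers are hypothetical and only serve to show the arithmetic.

```python
import statistics

def pool(estimates, variances):
    """Combine complete-data results from m imputed data sets.

    estimates: complete-data posterior means of theta, one per imputation
    variances: complete-data posterior variances of theta, one per imputation
    """
    pooled_mean = statistics.fmean(estimates)   # AVE of complete-data means
    within = statistics.fmean(variances)        # AVE of complete-data variances
    between = statistics.variance(estimates)    # sample variance of the means
    total = within + between                    # total variance of theta
    return pooled_mean, within, between, total

# Hypothetical results from m = 5 imputed data sets
est = [2.1, 1.9, 2.4, 2.0, 2.2]
var = [0.30, 0.28, 0.33, 0.31, 0.29]

pooled_mean, within, between, total = pool(est, var)
print(pooled_mean, within, between, total)
```

With a finite number of imputations, Rubin's combining rules additionally inflate the between-imputation term by a factor of $(1 + 1/m)$. The sketch omits that factor so that it mirrors the identity as written above.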

## PyMC Implementation

Luckily, `PyMC` automatically imputes missing observed (response, or y) values when they are encoded as NaN. It provides posterior samples for each missing value, from which we can calculate summaries such as the mean, variance, and HDI.

The next page has examples of ignorable missingness (Models 1 and 2), followed by Model 3 with non-ignorable missingness.

* Model 1: Imputing observed (or y/response) values
* Model 2: Imputing predictors (or x) values
* Model 3: y is missing depending on size

To lay the groundwork, this is a linear regression model with intercept and slope coefficients ($\alpha_j$ and $\beta_j$) for each subject $j$ in the experiment.

$$\begin{align*}
y_{i,j} & = \alpha_j + \beta_j x_{i} + \epsilon_i, \space \space i = 1 , ... , n, \space \space j = 1 , ... , m \\
\epsilon_i & \overset{iid}{\sim} N (0,\sigma^2) \\
\alpha_j & \sim N (\alpha_c,1/\alpha_\tau) \\
\alpha_c & \sim N (0,1/10^{-4}) \\
\alpha_\tau & \sim Gamma (0.001,0.001) \\
\beta_j & \sim N (\beta_c,1/\beta_\tau) \\
\beta_c & \sim N (0,1/10^{-4}) \\
\beta_\tau & \sim Gamma (0.001,0.001) \\
\end{align*} $$

In Model 2, we need to provide a prior distribution for the missing x's, which is $N (20,5^2)$. In Model 3, the probability that $y[i]$ is missing depends on its size, specified by:

$$\begin{align*}
\text{miss}[i] & \sim Bern(p[i])  \\
p[i] & = invlogit \left ( a + b \times y[i] \right ) \\
a & \sim Logistic(0,s=10) \\
b & = \log(1.01) \\
\end{align*} $$
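To get a feel for this missingness mechanism, the following sketch evaluates $p[i]$ for a few response values and draws the corresponding missingness indicators. The value `a = -2.0` and the example `y` values are hypothetical: in the model, $a$ has a Logistic prior and is estimated, while $b = \log(1.01)$ is fixed as above.

```python
import math
import random

def invlogit(z):
    """Inverse logit (logistic sigmoid)."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

b = math.log(1.01)   # fixed slope from the model specification
a = -2.0             # hypothetical intercept, chosen only for illustration

def p_missing(y):
    """Probability that a response of size y is missing."""
    return invlogit(a + b * y)

# Each one-unit increase in y multiplies the odds of missingness by exp(b) = 1.01
odds_ratio = odds(p_missing(51.0)) / odds(p_missing(50.0))
print("odds ratio per unit of y:", odds_ratio)

# Draw miss[i] ~ Bern(p[i]) for some hypothetical response values
rng = random.Random(1)
ys = [10.0, 50.0, 100.0, 200.0]
probs = [p_missing(yv) for yv in ys]
miss = [1 if rng.random() < p else 0 for p in probs]
for yv, p, m_i in zip(ys, probs, miss):
    print(f"y = {yv:6.1f}  p = {p:.3f}  miss = {m_i}")
```

Because $b > 0$, larger responses are more likely to be missing, which is what makes the mechanism non-ignorable: the probability of missingness depends on the very value that is unobserved.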