In [1]:
import pymc as pm
import numpy as np
import arviz as az
from pymc.math import matrix_inverse, extract_diag, sqrt
import pytensor.tensor as pt

%load_ext lab_black
%load_ext watermark

# Dental development


Adapted from [Unit 10: growth.odc](https://raw.githubusercontent.com/areding/6420-pymc/main/original_examples/Codes4Unit10/growth.odc).

Data for the y array can be found [here](https://raw.githubusercontent.com/areding/6420-pymc/main/data/growthy.txt).

Associated lecture video: Unit 10 lesson 2

## Problem statement


Data set on dental development was first provided by Potthof and Roy in their 1964 paper. It consists of longitudinal observations on 11 girls (gender=1) and 16 boys (gender=2). 

There are 4 observations on each subject centered at times -3,-1, 1, 3, where the units were years.

The measurement on each subject is the distance (in mm) from the center of the pituitary to the pteryomaxillary fisure.

Potthoff and Roy (1964). "A Generalized Multivariate Analysis of Variance Model Useful Especially for Growth Curve Problems," Biometrika, 51, 313-326.

MVN with Gender Specific Means but Common Precision Matrix

## Notes
- Wishart. Pymc docs say it's unusable.
- https://github.com/pymc-devs/pymc/issues/538 interesting discussion here
- https://austinrochford.com/posts/2015-09-16-mvn-pymc3-lkj.html

Currently have a working version but not sure if it's correct. I split the likelihoods for male and female with a shared covariance matrix. Results are sort of in-line with BUGS. Not sure if the difference is from the different prior on the covariance matrix or something else. A better way would be to use the coordinate system but I couldn't get it working with the multivariate normal likelihood.

In [2]:
time = np.array([-3, -1, 1, 3])
y = np.loadtxt("../data/growthy.txt")

In [3]:
with pm.Model() as m_double:
    beta1 = pm.Normal("beta1", 20, tau=0.001, shape=2)
    beta2 = pm.Normal("beta2", 1, tau=0.001, shape=2)

    sd_dist = pm.Normal.dist(0, 2, shape=4)
    T, corr, _ = pm.LKJCholeskyCov("T", n=4, eta=2, sd_dist=sd_dist, compute_corr=True)

    mu_male = pm.Deterministic("mu_male", beta1[0] + beta2[0] * time)
    mu_female = pm.Deterministic("mu_female", beta1[1] + beta2[1] * time)

    pm.MvNormal("likelihood_male", mu_male, chol=T, shape=(11, 4), observed=y[:12, :])
    pm.MvNormal(
        "likelihood_female", mu_female, chol=T, shape=(15, 4), observed=y[11:, :]
    )

    pm.Deterministic("corr", corr)

    trace = pm.sample(1000)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Ambiguities exist in dispatched function _unify

The following signatures may result in ambiguous behavior:
	[object, ConstrainedVar, Mapping], [ConstrainedVar, object, Mapping]
	[ConstrainedVar, object, Mapping], [object, ConstrainedVar, Mapping]
	[object, ConstrainedVar, Mapping], [ConstrainedVar, Var, Mapping]
	[object, ConstrainedVar, Mapping], [ConstrainedVar, Var, Mapping]


Consider making the following additions:

@dispatch(ConstrainedVar, ConstrainedVar, Mapping)
def _unify(...)

@dispatch(ConstrainedVar, ConstrainedVar, Mapping)
def _unify(...)

@dispatch(ConstrainedVar, ConstrainedVar, Mapping)
def _unify(...)

@dispatch(ConstrainedVar, ConstrainedVar, Mapping)
def _unify(...)
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [beta1, beta2, T]


Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 107 seconds.


In [4]:
az.summary(trace, var_names="beta", filter_vars="like")

Unnamed: 0,mean,sd,hdi_3%,hdi_97%,mcse_mean,mcse_sd,ess_bulk,ess_tail,r_hat
beta1[0],23.059,0.491,22.109,23.939,0.008,0.006,3915.0,2805.0,1.0
beta1[1],25.179,0.469,24.356,26.113,0.008,0.006,3392.0,2822.0,1.0
beta2[0],0.527,0.11,0.328,0.739,0.002,0.001,4271.0,2583.0,1.0
beta2[1],0.786,0.114,0.554,0.978,0.002,0.001,2999.0,3028.0,1.0


In [5]:
az.summary(trace, var_names="corr")

  (between_chain_variance / within_chain_variance + num_samples - 1) / (num_samples)


Unnamed: 0,mean,sd,hdi_3%,hdi_97%,mcse_mean,mcse_sd,ess_bulk,ess_tail,r_hat
"corr[0, 0]",1.0,0.0,1.0,1.0,0.0,0.0,4000.0,4000.0,
"corr[0, 1]",0.475,0.138,0.221,0.724,0.003,0.002,2625.0,2392.0,1.0
"corr[0, 2]",0.618,0.112,0.402,0.801,0.002,0.002,2096.0,2360.0,1.0
"corr[0, 3]",0.404,0.146,0.129,0.656,0.003,0.002,2412.0,2195.0,1.0
"corr[1, 0]",0.475,0.138,0.221,0.724,0.003,0.002,2625.0,2392.0,1.0
"corr[1, 1]",1.0,0.0,1.0,1.0,0.0,0.0,4065.0,3783.0,1.0
"corr[1, 2]",0.725,0.085,0.564,0.864,0.002,0.001,2861.0,3041.0,1.0
"corr[1, 3]",0.555,0.121,0.33,0.773,0.003,0.002,2334.0,2835.0,1.0
"corr[2, 0]",0.618,0.112,0.402,0.801,0.002,0.002,2096.0,2360.0,1.0
"corr[2, 1]",0.725,0.085,0.564,0.864,0.002,0.001,2861.0,3041.0,1.0


In [6]:
az.summary(trace, var_names=["mu_male", "mu_female"])

Unnamed: 0,mean,sd,hdi_3%,hdi_97%,mcse_mean,mcse_sd,ess_bulk,ess_tail,r_hat
mu_male[0],21.478,0.522,20.462,22.427,0.008,0.005,4550.0,2833.0,1.0
mu_male[1],22.532,0.477,21.602,23.378,0.007,0.005,4126.0,2953.0,1.0
mu_male[2],23.586,0.528,22.628,24.596,0.009,0.006,3848.0,2886.0,1.0
mu_male[3],24.641,0.654,23.408,25.859,0.01,0.007,3992.0,2770.0,1.0
mu_female[0],22.822,0.521,21.899,23.851,0.008,0.006,4152.0,2791.0,1.0
mu_female[1],24.393,0.46,23.577,25.291,0.008,0.005,3704.0,3172.0,1.0
mu_female[2],25.965,0.505,25.065,26.963,0.009,0.006,3154.0,2687.0,1.0
mu_female[3],27.536,0.634,26.378,28.774,0.012,0.008,2951.0,2721.0,1.0


In [7]:
%watermark -n -u -v -iv -p aesara,aeppl

Last updated: Fri Feb 03 2023

Python implementation: CPython
Python version       : 3.11.0
IPython version      : 8.9.0

aesara: 2.8.10
aeppl : 0.1.1

numpy   : 1.24.1
pytensor: 2.8.11
arviz   : 0.14.0
pymc    : 5.0.1

