In [5]:
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pymc as pm
from pymc.math import log, sqr

%load_ext watermark
%load_ext lab_black

# Jeremy with Zero Tricks

This example introduces the "zero trick." I'm still not sure we'll ever need it, but I'm including it for completeness.

Adapted from [Codes for Unit 6: zerotrickjeremy.odc](https://raw.githubusercontent.com/areding/6420-pymc/main/original_examples/Codes4Unit6/zerotrickjeremy.odc).

## Associated lecture video: Unit 6 Lesson 10

In [1]:
%%html
<iframe width="560" height="315" src="https://www.youtube.com/embed?v=t4pHpZxtC0U&list=PLv0FeK5oXK4l-RdT6DWJj0_upJOG2WKNO&index=61" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>

## Problem statement

There's a running example in the lectures about Jeremy testing his IQ. At some point I will track all those down and add links here, but for now I'm just going to port the code.

I'm not sure what BUGS does when the professor defines ```z1``` as both a deterministic and a random variable. I'll need to test some things in BUGS once the Citrix virtual machines are back online. For now, here's a first pass at recreating the model, in which I interpret ```z <- 0``` as feeding an observation of zero to the variable.

That said, I don't think we will ever need the zeros or ones tricks in the homework. If we do, I'll expand this page.
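
One way to read the trick, sketched here in plain Python rather than BUGS or PyMC: a Poisson variable with rate λ that is observed to be zero contributes exp(-λ) to the likelihood. If λ is set to a negative log-density plus a constant, then -λ matches that log-density up to an additive constant, which is all a sampler needs. The helper names below (`normal_logpdf`, `lam`, `poisson_zero_loglik`) are my own, chosen for illustration, and the numbers are the ones used in the model cell on this page.

```python
import math

y, sigma, constant = 98, 8.944272, 1000


def normal_logpdf(x, mean, sd):
    """Log-density of N(mean, sd^2) at x."""
    return -math.log(sd) - 0.5 * math.log(2 * math.pi) - 0.5 * ((x - mean) / sd) ** 2


def lam(theta):
    """Poisson rate used by the zero trick: negative log-density plus a constant."""
    return math.log(sigma) + 0.5 * ((y - theta) / sigma) ** 2 + constant


def poisson_zero_loglik(rate):
    """log P(Z = 0) for Z ~ Poisson(rate), which is simply -rate."""
    return -rate


# The gap between the two log-likelihoods should not depend on theta.
gaps = [normal_logpdf(y, t, sigma) - poisson_zero_loglik(lam(t)) for t in (90, 100, 110)]
print(gaps)
```

The printed gaps should all agree, which suggests the Poisson "observation" of zero carries the same information about θ as the normal likelihood it replaces.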

In [2]:
y = 98
μ = 110
σ = 8.944272
τ = 10.954451
constant = 1000  # added to each λ so the Poisson rates stay positive

inits = {"θ": 100}

In [9]:
with pm.Model() as m:
    θ = pm.Flat("θ")

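    # negative log-densities (up to a constant) of y ~ N(θ, σ²) and θ ~ N(μ, τ²)
    λ1 = pm.Deterministic("λ1", log(σ) + 0.5 * sqr(((y - θ) / σ)) + constant)
    λ2 = pm.Deterministic("λ2", log(τ) + 0.5 * sqr(((θ - μ) / τ)) + constant)

    # zero trick: a Poisson(λ) variable observed at 0 contributes exp(-λ)
    z1 = pm.Poisson("z1", λ1, observed=0)
    z2 = pm.Poisson("z2", λ2, observed=0)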

    trace = pm.sample(
        10000,
        chains=4,
        cores=4,
        tune=1000,
        random_seed=1,
        return_inferencedata=True,
        initvals=inits,
        target_accept=0.88,
    )

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [θ]


Sampling 4 chains for 1_000 tune and 10_000 draw iterations (4_000 + 40_000 draws total) took 9 seconds.


In [10]:
az.summary(trace, hdi_prob=0.95)

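|    | mean | sd | hdi_2.5% | hdi_97.5% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|----|------|----|----------|-----------|-----------|---------|----------|----------|-------|
| θ  | 102.74 | 6.992 | 89.332 | 116.583 | 0.058 | 0.041 | 14393.0 | 22941.0 | 1.0 |
| λ1 | 1002.637 | 0.597 | 1002.191 | 1003.847 | 0.005 | 0.003 | 17220.0 | 23150.0 | 1.0 |
| λ2 | 1002.817 | 0.512 | 1002.394 | 1003.862 | 0.004 | 0.003 | 15742.0 | 23355.0 | 1.0 |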
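
As a rough sanity check on the summary, and assuming the model really is the normal likelihood with a normal prior that the two λ terms appear to encode, the posterior for θ is available in closed form. This sketch uses the standard normal-normal conjugate formulas; the variable names are mine, and σ and τ are treated as standard deviations, as they are in the model cell.

```python
import math

y, mu = 98, 110
sigma, tau = 8.944272, 10.954451  # standard deviations of the likelihood and prior

# Precisions (inverse variances) add; the posterior mean is precision-weighted.
prec_like = 1 / sigma**2
prec_prior = 1 / tau**2
post_var = 1 / (prec_like + prec_prior)
post_mean = post_var * (y * prec_like + mu * prec_prior)
post_sd = math.sqrt(post_var)

print(round(post_mean, 2), round(post_sd, 3))
```

The closed-form mean and standard deviation come out near 102.8 and 6.93, close to the sampled values for θ in the summary.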


Again, it's not clear to me what BUGS is doing, but these PyMC results are almost exactly the same as the professor's, so this must be close. I also tried passing vectors of zeros to each ```z```, which puts more weight on the zero "observations." That reduced both the standard deviation and the credible interval of θ.
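
The shrinkage from stacking zeros has a simple explanation if each zero is treated as an independent Poisson observation: n zeros multiply the log-likelihood term by n, which for this model multiplies the posterior precision by n and so divides the standard deviation by √n. Below is a grid-approximation sketch in plain Python (not the PyMC model itself); `neg_log_post`, `posterior_sd`, and the grid bounds are my own choices for illustration.

```python
import math

y, mu = 98, 110
sigma, tau = 8.944272, 10.954451


def neg_log_post(theta):
    """λ1 + λ2 without the additive constants, which cancel on normalization."""
    return 0.5 * ((y - theta) / sigma) ** 2 + 0.5 * ((theta - mu) / tau) ** 2


def posterior_sd(n_zeros, lo=40.0, hi=170.0, steps=13001):
    """Posterior sd of θ on a grid when each z is given n_zeros observed zeros."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    offset = min(neg_log_post(t) for t in grid)  # guards against underflow
    weights = [math.exp(-n_zeros * (neg_log_post(t) - offset)) for t in grid]
    total = sum(weights)
    mean = sum(w * t for w, t in zip(weights, grid)) / total
    var = sum(w * (t - mean) ** 2 for w, t in zip(weights, grid)) / total
    return math.sqrt(var)


sd_one = posterior_sd(1)
sd_four = posterior_sd(4)
print(sd_one, sd_four, sd_one / sd_four)
```

With four zeros per ```z``` the standard deviation should come out at about half the single-zero value, so a vector of zeros changes the model rather than just restating it.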

I found [this page](http://www.medicine.mcgill.ca/epidemiology/Joseph/courses/common/Tricks.html), which briefly mentions the same trick. It notes that "... this method can be very inefficient and give a very high MC error."



In [11]:
%watermark --iversions -v

Python implementation: CPython
Python version       : 3.10.4
IPython version      : 8.3.0

pymc      : 4.0.0b5
arviz     : 0.12.1
matplotlib: 3.5.2
numpy     : 1.22.3

