In [15]:
import pymc as pm
import numpy as np
import arviz as az
from pymc.math import switch, ge, exp

%load_ext lab_black
%load_ext watermark



# Revisiting UK coal mining disasters

This example demonstrates ...

Adapted from [unit 10: disasters.odc](https://raw.githubusercontent.com/areding/6420-pymc/main/original_examples/Codes4Unit10/disasters.odc).

Data can be found [here](https://raw.githubusercontent.com/areding/6420-pymc/main/data/r.txt).

## Associated lecture video: Unit 10 Lesson 3

In [1]:
%%html
<iframe width="560" height="315" src="https://www.youtube.com/embed?v=xomK4tcePmc&list=PLv0FeK5oXK4l-RdT6DWJj0_upJOG2WKNO&index=100" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>

## Problem statement

This is the change-point analysis discussed in Unit 5 (Gibbs Sampler).

British coal mine disaster data by year (1851-1962)  

The 112 data points are the yearly counts of coal-mining disasters in which 10 or more men were killed, for each year from 1851 to 1962.
 
Because the number of disasters appears to drop markedly around 1900, a change-point model is a natural choice. It divides the dataset into two periods, each with its own distribution for the number of disasters.
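
One way to write the two-period idea in symbols is sketched below. The notation is assumed here for illustration: $X_i$ is the count in year $t_i$, $k$ is the unknown change year, and $\lambda$ and $\mu$ are the disaster rates before and after the change, matching the names used in the model code.

$$
X_i \mid \lambda, \mu, k \sim
\begin{cases}
\text{Poisson}(\lambda), & t_i < k \\
\text{Poisson}(\mu), & t_i \ge k
\end{cases}
\qquad i = 1, \dots, 112.
$$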
 
The data set was compiled by Maguire, Pearson and Wynn in 1952 and updated by Jarrett (1979). These data have been used by a number of authors to illustrate various techniques that can be applied to point processes.


 Maguire, B. A., Pearson, E. S. and Wynn, A. H. A. (1952). The time intervals between industrial accidents. Biometrika, 39, 168-180.

 Jarrett, R.G. (1979). A note on the intervals between coal-mining disasters. Biometrika, 66, 191-193. 

 Carlin, Gelfand, and Smith (1992). Hierarchical Bayesian Analysis of Changepoint Problems. Applied Statistics, 41, 389-405.


In [10]:
# X is the number of coal mine disasters per year
# fmt: off
X = np.array([4, 5, 4, 1, 0, 4, 3, 4, 0, 6, 3, 3, 4, 0, 2, 6, 3, 3, 5, 4, 5, 3, 1,
     4, 4, 1, 5, 5, 3, 4, 2, 5, 2, 2, 3, 4, 2, 1, 3, 2, 2, 1, 1, 1, 1, 3,
     0, 0, 1, 0, 1, 1, 0, 0, 3, 1, 0, 3, 2, 2, 0, 1, 1, 1, 0, 1, 0, 1, 0,
     0, 0, 2, 1, 0, 0, 0, 1, 1, 0, 2, 3, 3, 1, 1, 2, 1, 1, 1, 1, 2, 4, 2,
     0, 0, 0, 1, 4, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1])
# fmt: on

# y holds the calendar year for each count in X (1851 through 1962)
y = np.arange(1851, 1963)
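
Before fitting a Bayesian model, a rough non-Bayesian check can show where a single change point would fit best. The sketch below is not part of the original example. It scans every possible split of the series, plugs in each segment's sample mean as its Poisson rate, and keeps the split with the highest log-likelihood. The names `counts`, `years`, and `profile_loglik` are hypothetical helpers introduced for this illustration.

```python
import numpy as np

# Same yearly counts as X above (1851 through 1962).
# fmt: off
counts = np.array([4, 5, 4, 1, 0, 4, 3, 4, 0, 6, 3, 3, 4, 0, 2, 6, 3, 3, 5, 4, 5, 3, 1,
     4, 4, 1, 5, 5, 3, 4, 2, 5, 2, 2, 3, 4, 2, 1, 3, 2, 2, 1, 1, 1, 1, 3,
     0, 0, 1, 0, 1, 1, 0, 0, 3, 1, 0, 3, 2, 2, 0, 1, 1, 1, 0, 1, 0, 1, 0,
     0, 0, 2, 1, 0, 0, 0, 1, 1, 0, 2, 3, 3, 1, 1, 2, 1, 1, 1, 1, 2, 4, 2,
     0, 0, 0, 1, 4, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1])
# fmt: on
years = np.arange(1851, 1963)


def profile_loglik(counts, split):
    """Poisson log-likelihood (up to a constant) with each segment's rate
    set to that segment's sample mean."""
    total = 0.0
    for seg in (counts[:split], counts[split:]):
        s, n = seg.sum(), len(seg)
        if s > 0:  # a segment with no events adds nothing once constants are dropped
            total += s * np.log(s / n) - s
    return total


# Candidate splits keep at least one year on each side.
splits = np.arange(1, len(counts))
scores = np.array([profile_loglik(counts, k) for k in splits])

best_split = splits[np.argmax(scores)]
best_year = years[best_split]  # first year of the later period
early_mean = counts[:best_split].mean()
late_mean = counts[best_split:].mean()

print(best_year, round(early_mean, 3), round(late_mean, 3))
```

This plug-in scan ignores uncertainty in the rates and in the change year, so it is only a sanity check for the posterior summaries that follow.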

## Model 1

In [13]:
α = 4
β = 1
γ = 0.5
δ = 1

with pm.Model() as m:
    year = pm.Uniform("year", 1851, 1963)
    λ = pm.Gamma("λ", α, β)
    μ = pm.Gamma("μ", γ, δ)

    diff = pm.Deterministic("diff", μ - λ)

    rate = λ + switch(ge(y - year, 0), 1, 0) * diff
    pm.Poisson("lik", mu=rate, observed=X)

    trace = pm.sample(2000)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [year, λ, μ]


Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 18 seconds.


In [14]:
az.summary(trace)

| | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| year | 1890.444 | 2.377 | 1886.0 | 1894.591 | 0.045 | 0.032 | 2853.0 | 2869.0 | 1.0 |
| λ | 3.15 | 0.287 | 2.608 | 3.682 | 0.006 | 0.004 | 2565.0 | 3161.0 | 1.0 |
| μ | 0.916 | 0.117 | 0.7 | 1.139 | 0.002 | 0.002 | 2336.0 | 2712.0 | 1.0 |
| diff | -2.234 | 0.302 | -2.816 | -1.67 | 0.006 | 0.004 | 2631.0 | 3267.0 | 1.0 |
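
To see what the `rate` expression in Model 1 does, it can be mirrored in plain NumPy. This is a sketch for illustration only: it plugs in the posterior means from the summary above as fixed numbers, whereas the model treats `year`, `λ`, and `μ` as random variables. The names `change_year`, `lam`, `mu`, and `after` are assumptions made for this example.

```python
import numpy as np

years = np.arange(1851, 1963)

# Plug-in values: posterior means from the Model 1 summary (illustration only).
change_year, lam, mu = 1890.444, 3.15, 0.916

# Indicator is 1 for years at or after the change point and 0 before it,
# which is what switch(ge(y - year, 0), 1, 0) computes inside the model.
after = (years - change_year >= 0).astype(float)

# Before the change the rate is lam; afterwards it is lam + (mu - lam) = mu.
rate = lam + after * (mu - lam)

print(rate[:3], rate[-3:])
print(int(after.sum()), "of", len(years), "years use the later rate")
```

With these plug-in values, the years 1851 through 1890 use the earlier rate and 1891 onward use the later one.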


## Model 2

In [18]:
with pm.Model() as m:
    year = pm.Uniform("year", 1851, 1963)
    z0 = pm.Normal("z0", 0, tau=0.00001)
    z1 = pm.Normal("z1", 0, tau=0.00001)

    λ = pm.Deterministic("λ", exp(z0))
    μ = pm.Deterministic("μ", exp(z0 + z1))

    diff = pm.Deterministic("diff", μ - λ)

    rate = pm.math.exp(z0 + switch(ge(y - year, 0), 1, 0) * z1)
    pm.Poisson("lik", mu=rate, observed=X)

    trace = pm.sample(2000)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [year, z0, z1]


Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 17 seconds.


In [19]:
az.summary(trace)

| | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| z0 | 1.13 | 0.095 | 0.944 | 1.301 | 0.002 | 0.002 | 1886.0 | 2422.0 | 1.0 |
| z1 | -1.215 | 0.155 | -1.497 | -0.919 | 0.004 | 0.003 | 1918.0 | 2608.0 | 1.0 |
| year | 1890.379 | 2.469 | 1885.889 | 1894.96 | 0.045 | 0.032 | 3125.0 | 2871.0 | 1.0 |
| λ | 3.11 | 0.293 | 2.548 | 3.646 | 0.007 | 0.005 | 1886.0 | 2422.0 | 1.0 |
| μ | 0.926 | 0.117 | 0.71 | 1.148 | 0.002 | 0.001 | 3422.0 | 3338.0 | 1.0 |
| diff | -2.184 | 0.311 | -2.756 | -1.586 | 0.008 | 0.005 | 1674.0 | 1989.0 | 1.0 |
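
Model 2 works on the log scale: the rate is `exp(z0)` before the change and `exp(z0 + z1)` after it, so `exp(z1)` is the ratio of the later rate to the earlier one. The sketch below checks two things with plain NumPy. First, it converts the `tau=0.00001` precision used for the priors on `z0` and `z1` into a standard deviation. Second, it shows how the posterior means of `z0` and `z1` from the summary map back to rates. The plug-in variable names are assumptions for illustration.

```python
import numpy as np

# PyMC's Normal takes tau as a precision, so the standard deviation is 1 / sqrt(tau).
tau = 0.00001
prior_sd = 1 / np.sqrt(tau)

# Plug-in values: posterior means from the Model 2 summary (illustration only).
z0_mean, z1_mean = 1.13, -1.215

lam_plugin = np.exp(z0_mean)           # earlier rate implied by the mean of z0
mu_plugin = np.exp(z0_mean + z1_mean)  # later rate implied by the means of z0 and z1
rate_ratio = np.exp(z1_mean)           # later rate divided by earlier rate

print(round(prior_sd, 1))
print(round(lam_plugin, 3), round(mu_plugin, 3), round(rate_ratio, 3))
```

The plug-in rates land close to, but not exactly on, the posterior means of λ and μ in the table. That is expected, because the mean of `exp(z)` is generally not `exp` of the mean of `z`.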
