In [1]:
import pymc as pm
import numpy as np
import arviz as az
import pandas as pd

%load_ext lab_black
%load_ext watermark

# Predicting using censored data

This example demonstrates ...

Adapted from [unit 10: katla.odc](https://raw.githubusercontent.com/areding/6420-pymc/main/original_examples/Codes4Unit10/katla.odc).

Data can be found [here](https://raw.githubusercontent.com/areding/6420-pymc/main/data/r.txt).

Associated lecture video: Unit 10 Lesson 6

## Problem statement

In 2010, the Icelandic volcano Eyjafjallajökull erupted. The nearby volcano Katla erupts more frequently.

The goal is to predict the next Katla eruption (BUGS Book, p. 254).



Notes:

`pm.Censored` with a Weibull distribution fails at the initial model evaluation here. The imputed-censoring approach works fine.

In [10]:
# fmt: off
D = np.array(
    (1177, 1262, 1311, 1357, 1416, 1440, 1450, 1500, 
     1550, 1580, 1612, 1625, 1660, 1721, 1755, 1823, 
     1860, 1918, np.inf)
)
# fmt: on

# probabilities
ps = [1, 5, 10, 50]

# time between eruptions
t = np.diff(D)
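
Because `D` ends with `np.inf`, the last entry of `t` is infinite: it stands for the interval that began in 1918 and has not ended yet. The sketch below rebuilds `t` and shows one common way to encode that interval for a censored likelihood, by recording it at the censoring bound. The names `upper`, `is_censored`, and `t_cens` are illustrative and not part of the original notebook, and the bound of 100 years is taken from the model code that follows.

```python
import numpy as np

# Eruption years; np.inf marks the still-open interval after 1918.
D = np.array(
    (1177, 1262, 1311, 1357, 1416, 1440, 1450, 1500,
     1550, 1580, 1612, 1625, 1660, 1721, 1755, 1823,
     1860, 1918, np.inf)
)

t = np.diff(D)  # years between eruptions; the last entry is inf

upper = 100  # censoring bound used by the models in this notebook
is_censored = np.isinf(t)  # True only for the open interval
t_cens = np.where(is_censored, upper, t)  # censored entry recorded at the bound

print(t)
print(t_cens)
```

Whether passing `t_cens` as `observed=` resolves the `pm.Censored` problem in this notebook is an assumption; the source only confirms that the imputed-censoring version runs.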

In [11]:
with pm.Model() as m:
    α = pm.TruncatedNormal("α", mu=0, sigma=5, lower=0)  # v in BUGS model

    σ = pm.Gamma("σ", 0.001, 0.001)
    λ = (1 / σ) ** α
    β = λ ** (-1 / α)  # algebraically equal to σ, the Weibull scale

    # right-censored likelihood; note that t still ends with np.inf here
    _t = pm.Weibull.dist(α, β)
    pm.Censored("likelihood", _t, lower=None, upper=100, observed=t)

    # Weibull median: σ * log(2)^(1 / α)
    median = pm.Deterministic("median tte", σ * np.log(2) ** (1 / α))

    # P(eruption within p more years | no eruption in the first 100 years)
    for p in ps:
        pm.Deterministic(
            f"p_erupt_{p}", 1 - pm.math.exp((100 / σ) ** α - ((100 + p) / σ) ** α)
        )

    trace = pm.sample(3000, init="jitter+adapt_diag_grad")

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag_grad...


SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'α_interval__': array(0.6751748), 'σ_log__': array(-0.60518812)}

Initial evaluation results:
{'α': -1.24, 'σ': -6.92, 'likelihood': -inf}

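`pm.Censored` fails on this model: the initial log-probability of `likelihood` is `-inf`. A likely cause is that the last element of `t` is `np.inf`, which lies above `upper=100`. Observations above the upper bound have zero probability under a censored distribution, so the censored interval would need to be recorded at the bound itself. The imputed-censoring method (below) works fine.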
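
To make the `-inf` in the error output concrete, here is a minimal pure-Python sketch of a right-censored Weibull log-likelihood term. It assumes the convention that a value recorded exactly at the bound contributes the log survival function, while a value above the bound is impossible. The function names and the parameter values `alpha = 2.0`, `sigma = 55.0` are hypothetical and chosen only for illustration.

```python
import math


def weibull_logpdf(t, alpha, sigma):
    """Log density of a Weibull with shape alpha and scale sigma."""
    z = t / sigma
    return math.log(alpha / sigma) + (alpha - 1) * math.log(z) - z**alpha


def weibull_logsf(t, alpha, sigma):
    """Log survival function: log P(T > t) = -(t / sigma)^alpha."""
    return -((t / sigma) ** alpha)


def censored_logp(t, alpha, sigma, upper):
    """Log-likelihood of one observation under right-censoring at `upper`."""
    if t == upper:  # recorded at the bound: censored observation
        return weibull_logsf(upper, alpha, sigma)
    if t > upper:  # above the bound: cannot occur in censored data
        return -math.inf
    return weibull_logpdf(t, alpha, sigma)  # ordinary uncensored observation


alpha, sigma, upper = 2.0, 55.0, 100.0  # illustrative values only

logp_inf = censored_logp(math.inf, alpha, sigma, upper)
logp_bound = censored_logp(upper, alpha, sigma, upper)
logp_obs = censored_logp(85.0, alpha, sigma, upper)

print(logp_inf, logp_bound, logp_obs)
```

Under this convention, a single observation of `np.inf` is enough to drive the whole likelihood to `-inf`, whereas the same interval recorded as `100` gives a finite contribution.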

In [14]:
# drop the last (censored) interval; it is imputed inside the model instead
t_uncens = t[:-1]

In [12]:
with pm.Model() as m:
    # α = pm.Uniform("α", 0, 10) # getting divide by 0 errors
    α = pm.TruncatedNormal("α", mu=0, sigma=5, lower=0)  # v in BUGS model

    σ = pm.Gamma("σ", 0.001, 0.001)
    λ = (1 / σ) ** α
    β = λ ** (-1 / α)

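    # latent length of the censored interval, constrained to exceed 100 years
    impute_censored = pm.Bound("impute_censored", pm.Weibull.dist(α, β), lower=100)

    # fully observed intervals between eruptions
    pm.Weibull("uncensored", α, β, observed=t_uncens)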

    median = pm.Deterministic("median tte", σ * np.log(2) ** (1 / α))

    for p in ps:
        pm.Deterministic(
            f"p_erupt_{p}", 1 - pm.math.exp((100 / σ) ** α - ((100 + p) / σ) ** α)
        )

    trace = pm.sample(3000)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [α, σ, impute_censored]


Sampling 4 chains for 1_000 tune and 3_000 draw iterations (4_000 + 12_000 draws total) took 16 seconds.
There were 12 divergences after tuning. Increase `target_accept` or reparameterize.
There were 30 divergences after tuning. Increase `target_accept` or reparameterize.
There were 5 divergences after tuning. Increase `target_accept` or reparameterize.
There were 3 divergences after tuning. Increase `target_accept` or reparameterize.


In [13]:
az.summary(trace)

| | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| α | 2.014 | 0.388 | 1.321 | 2.769 | 0.005 | 0.004 | 5637.0 | 6430.0 | 1.0 |
| σ | 54.655 | 7.331 | 41.291 | 68.558 | 0.103 | 0.073 | 5149.0 | 5655.0 | 1.0 |
| impute_censored | 114.956 | 16.849 | 100.002 | 142.897 | 0.22 | 0.157 | 5339.0 | 4791.0 | 1.0 |
| median tte | 45.311 | 6.533 | 32.965 | 57.373 | 0.091 | 0.065 | 5091.0 | 5038.0 | 1.0 |
| p_erupt_1 | 0.071 | 0.031 | 0.021 | 0.127 | 0.0 | 0.0 | 6073.0 | 5503.0 | 1.0 |
| p_erupt_5 | 0.307 | 0.111 | 0.118 | 0.514 | 0.001 | 0.001 | 6064.0 | 5477.0 | 1.0 |
| p_erupt_10 | 0.515 | 0.15 | 0.243 | 0.794 | 0.002 | 0.001 | 6051.0 | 5519.0 | 1.0 |
| p_erupt_50 | 0.959 | 0.063 | 0.842 | 1.0 | 0.001 | 0.001 | 5982.0 | 5563.0 | 1.0 |
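
The derived quantities follow from the Weibull survival function $S(t) = \exp(-(t/\sigma)^\alpha)$: the median solves $S(t) = 1/2$, and the probability of an eruption within $p$ more years, given none in the first 100, is $1 - S(100 + p)/S(100)$. As a rough sanity check, the sketch below plugs the posterior means of α and σ from the summary into those formulas. The helper names are illustrative, and a plug-in value is not the same as the posterior mean of the quantity itself, so only approximate agreement with the `median tte` and `p_erupt_*` rows should be expected.

```python
import math

# Posterior means copied from the α and σ rows of the summary.
alpha, sigma = 2.014, 54.655


def weibull_median(alpha, sigma):
    """Median of a Weibull with shape alpha and scale sigma."""
    return sigma * math.log(2) ** (1 / alpha)


def p_erupt_within(p, alpha, sigma, elapsed=100):
    """P(T <= elapsed + p | T > elapsed) = 1 - S(elapsed + p) / S(elapsed)."""
    return 1 - math.exp(
        (elapsed / sigma) ** alpha - ((elapsed + p) / sigma) ** alpha
    )


plug_in_median = weibull_median(alpha, sigma)
plug_in_probs = {p: p_erupt_within(p, alpha, sigma) for p in (1, 5, 10, 50)}

print(f"median: {plug_in_median:.1f}")
for p, prob in plug_in_probs.items():
    print(f"p_erupt_{p}: {prob:.3f}")
```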


In [None]:
%watermark -n -u -v -iv -p aesara,aeppl