## Experiments with nutpie and the PyMC built-in sampler

In another project I was seeing substantial speedups (about 30x) when sampling with nutpie instead of the default sampler, for a model that fit a negative binomial to about 200k samples. The purpose of this notebook was to produce a minimal example that demonstrates the speedup.

However, after upgrading to a recent PyMC in this notebook, the sampler reports in real time what each chain is doing, and I can see that some chains were getting stuck. This didn't seem to happen with the nutpie sampler, but I have not spent time checking that this is not just an artifact of a small sample size. After changing the priors, sampling improved for PyMC, but even so nutpie is a bit faster (only about 50%, not 30x, though!).

In [12]:
import numpy as np
import pymc as pm
import arviz as az
import time

def simulate_data(mu, alpha, num):
    neg_bin = pm.NegativeBinomial.dist(mu=mu, alpha=alpha)
    return pm.draw(neg_bin, num)

test1 = simulate_data(2.7, 0.6, 10000)
np.mean(test1), np.std(test1)

(2.703, 3.9174470002796467)
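
As a quick sanity check on the simulated data, the sample moments can be compared with the theoretical ones. The sketch below assumes the (mu, alpha) parametrization of the negative binomial, where the variance is mu + mu²/alpha. The helper name `neg_binomial_moments` is made up for illustration and is not part of PyMC.

```python
import math

def neg_binomial_moments(mu: float, alpha: float) -> tuple[float, float]:
    """Mean and standard deviation of a negative binomial in the (mu, alpha) parametrization."""
    variance = mu + mu**2 / alpha  # overdispersion grows as alpha shrinks
    return mu, math.sqrt(variance)

mean, sd = neg_binomial_moments(2.7, 0.6)
print(f"mean={mean:.3f}, sd={sd:.3f}")  # mean=2.700, sd=3.854
```

The simulated values above (2.703 and about 3.917) are within roughly 2% of these, which seems reasonable for 10,000 draws from an overdispersed count distribution.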

In [20]:
pm.__version__

'5.21.0'

In [15]:
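with pm.Model() as nb_model:
    # Data container so the same model can be refit on a larger dataset later.
    data = pm.Data('data', test1)
    mu = pm.TruncatedNormal('mu', mu=2, sigma=5, lower=0.0)
    alpha = pm.Gamma('alpha', alpha=5, beta=0.5)
    # shape follows data.shape so the likelihood resizes when the data is swapped.
    counts = pm.NegativeBinomial('counts', mu=mu, alpha=alpha, shape=data.shape, observed=data)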

In [16]:
start = time.perf_counter()


with nb_model:
    trace = pm.sample(1000, tune=1000)  

end = time.perf_counter()

print(f"Elapsed time: {end - start:.6f} seconds")

Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [mu, alpha]



Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 4 seconds.


Elapsed time: 11.500265 seconds


In [17]:
test2 = simulate_data(2.7, 0.6, 200000) 
with nb_model:
    pm.set_data({'data': test2})

With the default sampler, the chains sometimes get stuck and take a long time to finish.

In [18]:
with nb_model:
    start = time.perf_counter()
    trace = pm.sample(1000, tune=1000) 
    end = time.perf_counter()
    print(f"Elapsed time: {end - start:.6f} seconds")

az.summary(trace)

Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [mu, alpha]



Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 87 seconds.


Elapsed time: 87.473503 seconds


| | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| mu | 2.69 | 0.008 | 2.674 | 2.706 | 0.0 | 0.0 | 3330.0 | 2872.0 | 1.0 |
| alpha | 0.6 | 0.003 | 0.595 | 0.605 | 0.0 | 0.0 | 4152.0 | 2917.0 | 1.0 |
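
One way to check whether a chain really was "stuck", rather than relying on the progress display, is to compare the work each chain did per draw. The sketch below is only an illustration: `flag_slow_chains` is a hypothetical helper, the factor of 2 is an arbitrary threshold, and it assumes the trace exposes a per-draw leapfrog step count under `sample_stats["n_steps"]`, as recent PyMC versions appear to do for NUTS.

```python
import numpy as np

def flag_slow_chains(n_steps: np.ndarray, factor: float = 2.0) -> list[int]:
    """Return indices of chains whose mean leapfrog steps per draw exceed
    `factor` times the median across chains.

    n_steps has shape (chain, draw).
    """
    per_chain = np.asarray(n_steps, dtype=float).mean(axis=1)
    threshold = factor * np.median(per_chain)
    return [int(i) for i in np.flatnonzero(per_chain > threshold)]

# Synthetic illustration: chain 2 takes far more steps per draw than the others.
demo = np.full((4, 1000), 3)
demo[2] = 63
print(flag_slow_chains(demo))  # [2]

# With a real trace (stat name is an assumption, check trace.sample_stats first):
# flag_slow_chains(trace.sample_stats["n_steps"].values)
```

If the slowdown happens during tuning, the default trace will not show it, because tuning draws are discarded. Passing `discard_tuned_samples=False` to `pm.sample` should keep them in a separate warmup group that can be inspected the same way.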


nutpie has not exhibited this behavior in my experiments so far.

In [19]:
with nb_model:
    start = time.perf_counter()
    trace = pm.sample(1000, tune=1000, nuts_sampler="nutpie")
    end = time.perf_counter()
    print(f"Elapsed time: {end - start:.6f} seconds")




| Draws | Divergences | Step Size | Gradients/Draw |
|---|---|---|---|
| 2000 | 0 | 1.2 | 3 |
| 2000 | 0 | 1.2 | 3 |
| 2000 | 0 | 1.2 | 3 |
| 2000 | 0 | 1.22 | 3 |


Elapsed time: 65.142873 seconds


Only a slight speedup here: about 65 seconds with nutpie versus about 87 seconds with the default sampler.
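
Because each sampler was timed only once, a single stuck chain can dominate the comparison. A rough way to make it more robust is to repeat each run over several seeds and compare medians. The helpers `time_runs` and `speedup` below are hypothetical names for this sketch, and the commented-out usage assumes `pm.sample` accepts `random_seed` as in the cells above.

```python
import statistics
import time
from typing import Callable

def time_runs(run: Callable[[int], object], seeds: list[int]) -> list[float]:
    """Call run(seed) once per seed and return wall-clock seconds for each call."""
    elapsed = []
    for seed in seeds:
        start = time.perf_counter()
        run(seed)
        elapsed.append(time.perf_counter() - start)
    return elapsed

def speedup(baseline: list[float], candidate: list[float]) -> float:
    """Ratio of median baseline time to median candidate time (>1 means candidate is faster)."""
    return statistics.median(baseline) / statistics.median(candidate)

# Single-run timings reported in this notebook (seconds).
print(f"{speedup([87.473503], [65.142873]):.2f}x")  # 1.34x

# Sketch of repeated runs (not executed here):
# with nb_model:
#     default_times = time_runs(
#         lambda s: pm.sample(1000, tune=1000, random_seed=s), seeds=[1, 2, 3])
#     nutpie_times = time_runs(
#         lambda s: pm.sample(1000, tune=1000, nuts_sampler="nutpie", random_seed=s),
#         seeds=[1, 2, 3])
#     print(speedup(default_times, nutpie_times))
```

The first call for each sampler likely includes compilation time, so it may be worth discarding it or reporting it separately.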