# PyMC

To compare and contrast PyMC with numpyro, I'm going to test out PyMC here. Hopefully I'll write up something longer later that shows the syntax differences and similarities and recommends one for a beginner.

In [6]:
# Preamble
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr

In [5]:
# next, pymc specific setup

import pymc as pm
# Distributions
from pymc import HalfCauchy, Model, Normal

We'll use the same simulated data as the [GLM: Linear Regression PyMC docs](https://www.pymc.io/projects/docs/en/stable/learn/core_notebooks/GLM_linear.html).

In [11]:
RANDOM_SEED = 8927
rng = np.random.default_rng(RANDOM_SEED)

size = 200
true_intercept = 1
true_slope = 2

x = np.linspace(0, 1, size)
# y = a + b*x
true_regression_line = true_intercept + true_slope * x
# add noise
y = true_regression_line + rng.normal(scale=0.5, size=size)

data = pd.DataFrame(dict(x=x, y=y))

In [12]:
data

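|     | x        | y        |
|-----|----------|----------|
| 0   | 0.000000 | 0.888679 |
| 1   | 0.005025 | 1.672880 |
| 2   | 0.010050 | 0.734615 |
| 3   | 0.015075 | 1.364275 |
| 4   | 0.020101 | 1.612892 |
| ... | ...      | ...      |
| 195 | 0.979899 | 2.774799 |
| 196 | 0.984925 | 4.721245 |
| 197 | 0.989950 | 3.650920 |
| 198 | 0.994975 | 2.462356 |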
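
Before fitting anything Bayesian, it can help to check that the simulated data behave as expected. The sketch below regenerates the same data (same seed and parameters as above) and fits an ordinary least-squares line with `np.polyfit`. This is a sanity check I'm adding, not part of the PyMC docs example. The names `slope_hat` and `intercept_hat` are just illustrative.

```python
import numpy as np

# Regenerate the simulated data with the same seed and parameters as above
rng = np.random.default_rng(8927)
size = 200
x = np.linspace(0, 1, size)
y = 1 + 2 * x + rng.normal(scale=0.5, size=size)

# np.polyfit returns coefficients from highest degree to lowest: [slope, intercept]
slope_hat, intercept_hat = np.polyfit(x, y, deg=1)
print(f"OLS intercept: {intercept_hat:.2f}, slope: {slope_hat:.2f}")
```

The estimates should land reasonably close to the true intercept of 1 and slope of 2, which gives a rough baseline to compare the posterior means against.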


Here is the model estimation syntax in PyMC. The whole thing takes about 5-9 seconds.

In [18]:
with Model() as model:  # model specifications in PyMC are wrapped in a with-statement
    # Define priors
    sigma = HalfCauchy("sigma", beta=10)
    intercept = Normal("Intercept", 0, sigma=20)
    slope = Normal("slope", 0, sigma=20)

    # Define likelihood
    likelihood = Normal("y", mu=intercept + slope * x, sigma=sigma, observed=y)
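
Written out as math, the model in this cell is roughly the following. I'm using $a$ for the intercept and $b$ for the slope, to match the `y = a + b*x` comment in the data cell. The second argument of each normal is a standard deviation, not a variance.

$$
\begin{aligned}
\sigma &\sim \text{HalfCauchy}(\beta = 10) \\
a &\sim \mathcal{N}(0,\ 20) \\
b &\sim \mathcal{N}(0,\ 20) \\
y_i &\sim \mathcal{N}(a + b\,x_i,\ \sigma)
\end{aligned}
$$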

The base `pymc` sampler takes about 6 seconds overall (the log below reports 2 seconds for the sampling step itself):

In [19]:
with model:
    # Inference!
    # draw 3000 posterior samples per chain using NUTS sampling
    idata = pm.sample(3000, nuts_sampler='pymc')

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, Intercept, slope]


Sampling 4 chains for 1_000 tune and 3_000 draw iterations (4_000 + 12_000 draws total) took 2 seconds.
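
The totals in that last log line are just the per-chain settings multiplied by the number of chains. A tiny sketch of the arithmetic, using the values from the log (the variable names are mine):

```python
# Values taken from the sampler log above
chains = 4
tune = 1_000   # tuning (warm-up) iterations per chain, discarded by default
draws = 3_000  # posterior draws kept per chain

total_tune = chains * tune
total_draws = chains * draws
per_chain_iterations = tune + draws

print(f"{total_tune:_} + {total_draws:_} draws total")  # 4_000 + 12_000 draws total
```

So `pm.sample(3000)` here ends up with 12,000 posterior draws, not 3,000.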


We can get some speedups right off the bat by using a different sampler. For instance, we can use the `numpyro` sampler (as long as the model is continuous):

In [21]:
with model:
    # Note: the model must have only continuous parameters
    idata = pm.sample(3000, nuts_sampler='numpyro')

Compiling...
Compilation time = 0:00:00.401768
Sampling...


Sampling time = 0:00:01.401228
Transforming variables...
Transformation time = 0:00:00.032153


In [22]:
idata