# Exercise 01: Hamiltonian Monte Carlo with CUQIpy-PyTorch

In this exercise, we will use the [CUQIpy-PyTorch](https://github.com/CUQI-DTU/CUQIpy-PyTorch) plugin to sample from probability distributions using the No-U-Turn Sampler (NUTS), a variant of Hamiltonian Monte Carlo (HMC). Make sure you have installed the plugin before starting this exercise.

## Learning objectives of this notebook:
- ...

## Table of contents: 
* [1. xxx](#xxx)
* [2. yyy ★](#yyy)

First, we import the necessary packages. Notice that we use the PyTorch package `torch` (imported as `xp`) instead of NumPy for arrays, and that we import both `cuqi` and `cuqipy_pytorch` from CUQIpy and CUQIpy-PyTorch, respectively.

The main idea of the CUQIpy-PyTorch package is to use the automatic differentiation capabilities of PyTorch to compute the gradients of the log-probability function, which gradient-based samplers such as NUTS require. To achieve this, everything we use must work with PyTorch tensors.
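As a minimal illustration of automatic differentiation of a log-density (plain PyTorch only, not CUQIpy-PyTorch), the sketch below writes the log-density of a Gaussian by hand, lets `torch.autograd` compute its gradient, and compares the result with the closed form $-(x-\mu)/\sigma^2$. The helper `gaussian_logpdf` and the numbers used are hypothetical choices for illustration.

```python
import math

import torch


def gaussian_logpdf(x, mean, var):
    """Hypothetical helper: log-density of N(mean, var), summed over the entries of x."""
    return (-0.5 * math.log(2 * math.pi * var) - 0.5 * (x - mean) ** 2 / var).sum()


# requires_grad=True tells PyTorch to record the operations applied to x
x = torch.tensor([0.5, -1.0, 2.0], requires_grad=True)
mean, var = 1.0, 4.0

logp = gaussian_logpdf(x, mean, var)
logp.backward()  # fills x.grad with d(logp)/dx

# Closed-form gradient of the Gaussian log-density for comparison
analytic = -(x.detach() - mean) / var
print(x.grad)
print(analytic)
```

The sampler never needs the closed-form gradient: any log-density built from differentiable tensor operations can be differentiated this way.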

In [None]:
import torch as xp
import cuqi
import cuqipy_pytorch
import matplotlib.pyplot as plt

Because the `cuqi` distributions are written using NumPy and SciPy, we instead have to use the distributions defined in `cuqipy_pytorch`. These are thin wrappers around PyTorch distributions but act as drop-in replacements for the `cuqi` distributions.

In [None]:
from cuqipy_pytorch.distribution import Gaussian, HalfGaussian, LogGaussian, Uniform, Gamma, StackedJointDistribution

We also load the highly optimized NUTS sampler from the `pyro` package, which is conveniently wrapped in the `cuqipy_pytorch` package.
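NUTS builds on plain Hamiltonian Monte Carlo and chooses the trajectory length adaptively. To show the underlying mechanism, here is a minimal, non-adaptive HMC sampler in plain Python for a 1-D standard Gaussian target. It is not the Pyro implementation. The step size, the number of leapfrog steps, and the function names (`leapfrog`, `hmc`) are illustrative assumptions, and the gradient is written by hand where CUQIpy-PyTorch would obtain it from autodiff.

```python
import math
import random


def log_prob(q):
    """Unnormalized log-density of a 1-D standard Gaussian target."""
    return -0.5 * q * q


def grad_log_prob(q):
    """Gradient of log_prob (hand-written here; CUQIpy-PyTorch gets it from autodiff)."""
    return -q


def leapfrog(q, p, step_size, n_steps):
    """Approximately integrate Hamilton's equations with the leapfrog scheme."""
    p = p + 0.5 * step_size * grad_log_prob(q)        # initial half step in momentum
    for i in range(n_steps):
        q = q + step_size * p                         # full step in position
        if i < n_steps - 1:
            p = p + step_size * grad_log_prob(q)      # full step in momentum
    p = p + 0.5 * step_size * grad_log_prob(q)        # final half step in momentum
    return q, p


def hmc(n_samples, n_burn, q0=0.0, step_size=0.2, n_steps=10, seed=0):
    """Plain HMC with fixed step size and trajectory length; returns post-burn-in draws."""
    rng = random.Random(seed)
    q, draws, n_accepted = q0, [], 0
    for it in range(n_burn + n_samples):
        p0 = rng.gauss(0.0, 1.0)                      # fresh momentum ~ N(0, 1)
        q_new, p_new = leapfrog(q, p0, step_size, n_steps)
        # Metropolis correction: accept with prob. min(1, exp(H_old - H_new)),
        # where H(q, p) = -log_prob(q) + p**2 / 2
        log_ratio = (log_prob(q_new) - 0.5 * p_new**2) - (log_prob(q) - 0.5 * p0**2)
        accept = rng.random() < math.exp(min(0.0, log_ratio))
        if accept:
            q = q_new
        if it >= n_burn:                              # discard burn-in iterations
            draws.append(q)
            n_accepted += accept
    return draws, n_accepted / n_samples


draws, acc_rate = hmc(n_samples=5000, n_burn=500)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(f"mean={mean:.3f}, var={var:.3f}, acceptance rate={acc_rate:.2f}")
```

The sample mean and variance should come out close to 0 and 1. Here `n_burn` plays a role analogous to `Nb` in the `sample` helper. NUTS additionally removes the need to hand-pick `n_steps`, which strongly affects how efficiently plain HMC explores the target.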

In [None]:
from cuqipy_pytorch.sampler import NUTS

We then define a simple function to sample the posterior given densities and data.

In [None]:
# A convenience function to sample a Bayesian model
def sample(*densities, Ns=500, Nb=500, **data):
    """ Sample given by a list of densities. The observations are given as keyword arguments. """
    P = StackedJointDistribution(*densities)
    return NUTS(P(**data)).sample(Ns, Nb)

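In [None]:
import pandas as pd

# Pitch spin-rate data from the probabilistic_python repository
DATA_URL = "https://raw.githubusercontent.com/fonnesbeck/probabilistic_python/master/data/"
spin_rate_data = pd.read_csv(DATA_URL + "savant_data.csv", parse_dates=["game_date"]).dropna(subset=["spin_rate", "game_date"])

# Map each game date to an integer day index (0, 1, 2, ...)
day_ind, date = pd.factorize(spin_rate_data.game_date, sort=True)
day_ind = xp.tensor(day_ind)
# The observations must be PyTorch tensors as well
spin_rate = xp.tensor(spin_rate_data.spin_rate.values, dtype=xp.float32)

# Optional: inspect the data
#spin_rate_data.head()
#spin_rate_data.plot.scatter(x="game_date", y="spin_rate", figsize=(14,5), alpha=0.2)

# Change-point model: the mean spin rate switches from mu[0] to mu[1] at day tau
mu = Gaussian(xp.ones(2)*2500, 100)
tau = Uniform(0, 181)
sigma = HalfGaussian(100)
r = lambda tau, mu: xp.where(day_ind < tau, mu[0], mu[1])
# Gaussian likelihood on the raw spin-rate scale (a LogGaussian with mean ~2500 would
# place log(spin rate) near 2500); sigma is the noise standard deviation
sr = Gaussian(r, lambda sigma: sigma**2)

samples = sample(mu, tau, sigma, sr, Ns=200, Nb=200, sr=spin_rate)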




Sample: 100%|██████████| 400/400 [00:57,  7.02it/s, step size=3.52e-02, acc. prob=0.170]
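The rate `r` above encodes a single change point: observations from days before `tau` get the mean `mu[0]`, and the remaining days get `mu[1]`. The plain-Python sketch below applies the same rule as `xp.where(day_ind < tau, mu[0], mu[1])`; the function name `changepoint_rate` and the input values are hypothetical.

```python
def changepoint_rate(day_ind, tau, mu):
    """Hypothetical helper: per-observation mean, mu[0] before tau and mu[1] from tau on.

    Elementwise equivalent of xp.where(day_ind < tau, mu[0], mu[1]).
    """
    return [mu[0] if d < tau else mu[1] for d in day_ind]


# Made-up example: five days with the change point between day 2 and day 3
rates = changepoint_rate([0, 1, 2, 3, 4], tau=2.5, mu=(2400.0, 2550.0))
print(rates)
```

Because `tau` enters the model only through the comparison `day_ind < tau`, the log-density is piecewise constant in `tau`, so automatic differentiation provides no gradient information about it.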


## 1. A simple example


In [None]:
x = xp.linspace(-10, 10, 100)
y = x**3 - 27*x - 8

plt.plot(x, y, 'o')



## Eight schools
The eight schools model is a classic example made famous by the *Bayesian Data Analysis* (BDA) book by Gelman et al.

It is often used to illustrate the notation and code-style of probabilistic programming languages. 

The actual model is explained in the BDA book or in the Edward 1.0 PPL notebook ([link](https://github.com/blei-lab/edward/blob/master/notebooks/eight_schools.ipynb)).

The Bayesian model can be written as

\begin{align*}
    \mu &\sim \mathcal{N}(0, 10^2)\\
    \tau &\sim \log\mathcal{N}(5, 1)\\
    \boldsymbol \theta' &\sim \mathcal{N}(\mathbf{0}, \mathbf{I}_m)\\
    \boldsymbol \theta &= \mu + \tau \boldsymbol \theta'\\
    \mathbf{y} &\sim \mathcal{N}(\boldsymbol \theta, \mathrm{diag}(\boldsymbol \sigma^2))
\end{align*}

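where $\mathbf{y}\in\mathbb{R}^m$ (the estimated treatment effects) and $\boldsymbol \sigma\in\mathbb{R}^m$ (their standard errors) are observed data, with $m=8$ schools. The parameter $\mu$ is the average treatment effect across schools, and $\tau$ is the standard deviation of the school-level effects $\boldsymbol \theta$.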
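For intuition about the density that the `sample` helper hands to NUTS, here is a plain-Python sketch (not CUQIpy's implementation) of the unnormalized log posterior of this model: the prior log-densities of $\mu$, $\tau$ and $\boldsymbol\theta'$ plus the log-likelihood with $\mathbf{y}$ fixed at the observations. The helper names are hypothetical, and we assume $\tau\sim\log\mathcal{N}(5,1)$ means $\log\tau\sim\mathcal{N}(5,1)$, whose density includes a $-\log\tau$ Jacobian term.

```python
import math

# Observed data of the eight schools example
y_obs = [28, 8, -3, 7, -1, 1, 18, 12]
sigma_obs = [15, 10, 16, 11, 9, 11, 10, 18]


def normal_logpdf(x, mean, var):
    """Log-density of N(mean, var) at a scalar x."""
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * (x - mean) ** 2 / var


def lognormal_logpdf(x, mean, var):
    """Log-density of a log-normal: log x ~ N(mean, var); -log(x) is the Jacobian term."""
    return normal_logpdf(math.log(x), mean, var) - math.log(x)


def log_joint(mu, tau, theta_p, y=y_obs, sigma=sigma_obs):
    """log p(mu) + log p(tau) + log p(theta') + log p(y | theta), theta = mu + tau * theta'."""
    lp = normal_logpdf(mu, 0.0, 10.0**2)
    lp += lognormal_logpdf(tau, 5.0, 1.0)
    lp += sum(normal_logpdf(t, 0.0, 1.0) for t in theta_p)
    theta = [mu + tau * t for t in theta_p]            # non-centered transformation
    lp += sum(normal_logpdf(yj, tj, sj**2) for yj, tj, sj in zip(y, theta, sigma))
    return lp


print(log_joint(mu=4.0, tau=3.0, theta_p=[0.0] * 8))
```

Sampling $\boldsymbol\theta'$ and forming $\boldsymbol\theta = \mu + \tau\boldsymbol\theta'$, rather than sampling $\boldsymbol\theta \sim \mathcal{N}(\mu, \tau^2\mathbf{I}_m)$ directly, is the non-centered parameterization; it is commonly used to avoid the funnel-shaped posterior geometry that makes hierarchical models hard for HMC.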

Using the CUQIpy-PyTorch distributions, we can define the model as follows:

In [None]:
y_obs = xp.tensor([28, 8, -3,  7, -1, 1,  18, 12], dtype=xp.float32)
σ_obs = xp.tensor([15, 10, 16, 11, 9, 11, 10, 18], dtype=xp.float32)

μ     = Gaussian(0, 10**2)
τ     = LogGaussian(5, 1)
θp    = Gaussian(xp.zeros(8), 1)
θ     = lambda μ, τ, θp: μ+τ*θp
y     = Gaussian(θ, cov=σ_obs**2)

samples = sample(μ, τ, θp, y, y=y_obs)

In [None]:
# Plot posterior samples of θ'
samples["θp"].plot_violin();
print(samples["μ"].mean()) # Posterior mean of the average treatment effect μ
print(samples["τ"].mean()) # Posterior mean of the standard deviation τ of the school effects

In [None]:
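# Plot treatment effect distribution θ = μ + τθ'
# Apply the transformation to all posterior samples at once: each .samples array has
# shape (dimension, number of samples), so μ and τ (dimension 1) broadcast against θ'
θs = θ(samples["μ"].samples, samples["τ"].samples, samples["θp"].samples)
θs = cuqi.samples.Samples(θs)
θs.geometry._name = "θ"  # label the variable in the plot (sets a private attribute)
θs.plot_violin();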

## Using SciPy forward model

#### Try yourself
...

# Your code here