# Probabilistic Programming
## Introduction

So far you've been doing all your calculations by hand. As you have probably learned, this is time-consuming and error-prone. In this mini-course, we are going to automate some of this labour. The framework of defining a probabilistic model and automatically inferring variables of interest is called _Probabilistic Programming_.

The main problem with probabilistic models, and Bayesian inference in general, is that the integrals required to compute the posterior distribution often have no closed-form solution. To get around this, we can employ approximate Bayesian inference.
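
To make the difficulty concrete, Bayes' rule for parameters $\theta$ and data $x$ reads

$$ p(\theta \mid x) = \frac{p(x \mid \theta) \, p(\theta)}{\int p(x \mid \theta) \, p(\theta) \, d\theta} $$

The integral in the denominator (the model evidence) has a closed form only for special combinations of prior and likelihood, such as conjugate pairs. In all other cases it has to be approximated.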

Within probabilistic programming, there are two schools of thought: Monte Carlo Sampling and Variational Inference. We will briefly discuss both.

### Monte Carlo Sampling

We are often interested in expected values under the posterior distribution. This need not be the posterior mean; it can also be the expected value of some function $g$ of the parameters: $\int g(\theta) p(\theta | x) d\theta$. However, in most cases this integral cannot be computed analytically. Monte Carlo sampling is a numerical approximation method: it consists of drawing samples from the posterior distribution and approximating the expected value with the sample average. This is motivated by the [Law of Large Numbers](https://www.statlect.com/asymptotic-theory/law-of-large-numbers), which states that sample averages converge to expected values as the sample size goes to infinity.

Let's start with an example. Say that I would like to know the expected value of the following distribution: 

$$\mathcal{N}(x \mid 1, 1).$$

This is, of course, the Gaussian distribution and we know its expected value analytically: $\int x \mathcal{N}(x \mid \mu, 1) dx = \mu$. However, we are going to approximate it anyway, just to show how approximation via MC sampling works. So, I draw 10 samples from the Gaussian distribution and approximate the expected value via the sample average:

$$ \int x p(x | \theta) dx \approx \frac{1}{n} \sum_{i=1}^n x_i \quad \text{for} \ x_i \sim p(x | \theta)$$
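
The approximation above can be sketched in a few lines of Julia. This is a minimal illustration rather than the course's own notebook code: the fixed seed, the variable names and the larger sample size used for comparison are assumptions made here. Note that `Normal(1, 1)` in Distributions.jl takes a mean and a standard deviation, which coincides with $\mathcal{N}(x \mid 1, 1)$ because the variance is 1.

```julia
using Distributions
using Random
using Statistics

Random.seed!(1)  # fixed seed so that repeated runs give the same draws

# The distribution whose expected value we want to approximate
p = Normal(1, 1)

# Draw 10 samples and approximate the expected value by the sample average
x = rand(p, 10)
estimate_small = mean(x)

# With many more samples, the average moves closer to the true mean of 1
estimate_large = mean(rand(p, 100_000))

println("n = 10:      ", estimate_small)
println("n = 100000:  ", estimate_large)
```

With only 10 samples the estimate can be noticeably off from 1; the error shrinks roughly with $1/\sqrt{n}$ as more samples are drawn.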

I can draw samples from

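In [22]:
using Distributions

# Observed data points
X = [1.0, 0.9, 1.12, 0.89]

# Likelihood of the data as a function of the mean μ (unit standard deviation)
px(μ) = prod(pdf.(Normal(μ, 1), X))

# Draw samples from the prior on μ
μ_ = rand(Normal(1, 1), 1000)

# Evaluate the likelihood at every prior sample; these act as unnormalized weights
lμ = px.(μ_)

# Weighted average of the prior samples, with the weights normalized to sum to one.
# This approximates the posterior mean of μ.
# For this conjugate model the exact posterior mean is
# (1 + sum(X)) / (1 + length(X)) = 4.91 / 5 = 0.982.
sum(lμ .* μ_) / sum(lμ)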
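
The sample-average recipe is not limited to the mean. The following is a minimal sketch of estimating $\int g(\theta) p(\theta \mid x) d\theta$ for a general function $g$. Everything specific in it is an assumption chosen for illustration: a `Beta(3, 5)` distribution stands in for a posterior we can sample from, and $g(\theta) = \theta^2$ is picked because its expectation under that distribution is known exactly ($1/6$), so the estimates can be compared against it.

```julia
using Distributions
using Random
using Statistics

Random.seed!(42)  # fixed seed so that repeated runs give the same draws

# Hypothetical stand-in for a posterior p(θ | x) that we can sample from
posterior = Beta(3, 5)

# Function whose posterior expectation we want
g(θ) = θ^2

# Monte Carlo estimate: average g over n draws from the posterior
mc_estimate(n) = mean(g.(rand(posterior, n)))

# Exact value for Beta(a, b): a(a+1) / ((a+b)(a+b+1)) = 12/72 = 1/6
exact = 3 * 4 / (8 * 9)

# The estimates approach the exact value as n grows (Law of Large Numbers)
for n in (10, 1_000, 100_000)
    println("n = ", n, ":  estimate = ", mc_estimate(n), "  (exact = ", exact, ")")
end
```

In a real problem the posterior is not available in this convenient form, which is why the samplers covered in the sampling notebooks are needed to produce the draws.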

### Variational Inference

TODO

## Outline mini-course PP

We will have four blocks of 2 hours, in which we work through the following types of models:

1. Regression & Classification
2. Mixture models
3. Hidden Markov models
4. Kalman filters

In each block, we fit the particular probabilistic model using both approaches. This results in two notebooks per block: one using MC sampling (e.g. PP-1-sampling) and one using VI (e.g. PP-1-variational). The headers and data generation are identical in both notebooks.

## Model Critiquing & Improvement
It is important to continuously critique and improve your model design. This is daily practice for most data scientists and machine learning engineers. How to critique models is a skill often expected to be obtained through experience; practice makes perfect. However, there are quite a few heuristics that can serve as useful tools in your toolbelt. Today we will be going over a few of them.

## Materials

#### Reading
- [Wikipedia](https://en.wikipedia.org/wiki/Probabilistic_programming)
- [Variational inference: a review for statisticians (Blei et al., 2018)](https://arxiv.org/pdf/1601.00670.pdf)

#### Videos
- [Intro to programming in Julia](https://youtu.be/8h8rQyEpiZA?t=233).

#### Software
- Multi-language
    - [Stan](https://mc-stan.org/)
- Julia
    - [Turing.jl](https://turing.ml/dev/tutorials/0-introduction/)
    - [ForneyLab.jl](https://biaslab.github.io/forneylab/docs/getting-started/)
- Python
    - [Pyro](http://pyro.ai/)
    - [TensorFlow Probability](https://www.tensorflow.org/probability/)
- MATLAB
    - [Stat & ML Toolbox](https://nl.mathworks.com/products/statistics.html)
    - [dimple](https://github.com/analog-garage/dimple)
- .NET
    - [Infer.NET](https://www.microsoft.com/en-us/research/project/infernet/)