In [1]:
%autosave 0
%matplotlib inline

import os, sys
sys.path.insert(0, os.path.expanduser('~/git/github/pymc-devs/pymc3'))

import matplotlib
import matplotlib.pyplot as plt
from matplotlib import rc
import daft
import pymc3 as pm
import numpy as np
import seaborn as sns

# Import the display helpers once; importing IPython.core.display as `display` would shadow the function
from IPython.display import display, HTML

Autosave disabled


<br />

<div style="text-align: center;">
<font size="7"><b>Stein variational gradient descent</b></font> <br /> <br /> <br />
<font size="7"><b>(Liu and Wang, 2016)</b></font>
<br />
<br />
<br />
<br />
<div style="text-align: right;">
<font size="6">Taku Yoshioka</font>
</div>

<br />

* Today's slides will be uploaded to: https://github.com/PyDataOsaka/pydata-osaka-2017

# Agenda
* Variational Bayesian inference
* Construction of variational posterior
* Algorithm of SVGD
* Theory of SVGD
* Examples

# Example of Bayesian inference
<div align="center">
<img class="stretch" src="http://www.cns.atr.jp/cbi/wp-content/uploads/2010/07/hierarchical_bayes_estimation.png" width="50%" height="50%"/>
</div>

# Variational Bayesian inference
* Approximate $p(\mathbf{z}|\mathbf{x})$ by *variational posterior* $q(\mathbf{z})$
* Maximizing the *evidence lower bound (ELBO)* ${\cal L}(q)$ w.r.t. $q$ is equivalent to minimizing $KL[q(\mathbf{z})||p(\mathbf{z}|\mathbf{x})]$, because $\log p(\mathbf{x})$ does not depend on $q$

\begin{eqnarray}
{\cal L}(q) & = & \mathbb{E}_{q(\mathbf{z})}\left[\log p(\mathbf{x},\mathbf{z}) - \log q(\mathbf{z})\right] \\
           & = & \mathbb{E}_{q(\mathbf{z})}\left[\log p(\mathbf{x}|\mathbf{z})\right] - KL\left[q(\mathbf{z})||p(\mathbf{z})\right] \\
           & = & \log p(\mathbf{x}) - KL[q(\mathbf{z})||p(\mathbf{z}|\mathbf{x})]
\end{eqnarray}
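
The decomposition above can be checked numerically. The following is a minimal sketch under assumptions that are not part of the slides: a toy conjugate model $z \sim N(0,1)$, $x|z \sim N(z,1)$, whose exact posterior is $N(x/2, 1/2)$ and whose evidence is $N(x; 0, 2)$. The helper names (`log_normal`, `elbo`, `kl_normal`) are hypothetical. The expectation under $q$ is computed with Gauss-Hermite quadrature, which is exact here because the integrand is quadratic in $z$.

```python
import numpy as np

def log_normal(v, mean, var):
    """Log density of a univariate normal N(mean, var) evaluated at v."""
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (v - mean) ** 2 / var

def elbo(x, m, s, deg=20):
    """ELBO E_q[log p(x, z) - log q(z)] for q = N(m, s^2) in the toy model."""
    t, w = np.polynomial.hermite_e.hermegauss(deg)
    w = w / np.sqrt(2 * np.pi)      # turn the weights into a standard normal expectation
    z = m + s * t                   # quadrature nodes mapped to samples of q
    log_joint = log_normal(x, z, 1.0) + log_normal(z, 0.0, 1.0)
    return np.sum(w * (log_joint - log_normal(z, m, s ** 2)))

def kl_normal(m1, s1, m2, s2):
    """KL[N(m1, s1^2) || N(m2, s2^2)] in closed form."""
    return np.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5

x = 1.5                                   # a single observation (arbitrary value)
log_evidence = log_normal(x, 0.0, 2.0)    # log p(x) for the toy model
post_mean, post_std = x / 2, np.sqrt(0.5) # exact posterior p(z | x)

m, s = 0.3, 0.9                           # an arbitrary variational posterior q
gap = log_evidence - elbo(x, m, s)        # should equal KL[q || p(z | x)]
print(gap, kl_normal(m, s, post_mean, post_std))
```

The gap between the log evidence and the ELBO matches the KL divergence to the exact posterior, and it vanishes when $q$ is set to the exact posterior.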

# Construction of variational posterior

* Accurate models need more computation
* Tractable models underfit the target distribution
* The simplest form: Gaussian mean-field $q(\mathbf{z})=N(\mathbf{\mu},{\rm diag}(\mathbf{\sigma}^{2}))$
<div align="center">
<img class="stretch" src="mf.png" width="50%" height="50%"/>
</div>
<div align="right">
[Kucukelbir et al., 2016]
</div>
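
A small numerical illustration of the underfitting, as a sketch under assumptions not taken from the slides: the target is a zero-mean 2D Gaussian with correlation `rho`, and the mean-field Gaussian has a free diagonal covariance. Minimizing $KL[q||p]$ over the diagonal gives variances $1/\Lambda_{ii}$, where $\Lambda$ is the target precision matrix, which is smaller than the true marginal variance whenever the variables are correlated. The function name `kl_diag_to_full` is hypothetical.

```python
import numpy as np

def kl_diag_to_full(diag_var, cov):
    """KL[N(0, diag(diag_var)) || N(0, cov)] for zero-mean Gaussians."""
    prec = np.linalg.inv(cov)
    d = len(diag_var)
    trace_term = np.sum(np.diag(prec) * diag_var)
    logdet_term = np.linalg.slogdet(cov)[1] - np.sum(np.log(diag_var))
    return 0.5 * (trace_term - d + logdet_term)

rho = 0.9                                   # assumed correlation of the target
cov = np.array([[1.0, rho], [rho, 1.0]])

# KL-optimal mean-field variances: inverse of the diagonal of the precision matrix
mf_var = 1.0 / np.diag(np.linalg.inv(cov))
marginal_var = np.diag(cov)                 # true marginal variances of the target

print("mean-field variances:", mf_var)
print("true marginal variances:", marginal_var)
print("KL with mean-field variances:", kl_diag_to_full(mf_var, cov))
print("KL with true marginal variances:", kl_diag_to_full(marginal_var, cov))
```

The KL-optimal mean-field solution is narrower than the true marginals, which is the kind of mismatch the figure above illustrates.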

# Two approaches for improving accuracy

* Parametric transforms of posterior samples, with a density correction *(normalizing flows)*
* Non-parametric transforms of the posterior within a *reproducing kernel Hilbert space (RKHS)* -- Stein variational gradient descent (SVGD)

    * Simple and fast
    * No assumption on the form of $q(\mathbf{z})$

# Algorithm of SVGD
<div align="center">
<img class="stretch" src="svgd-algo.png" width="100%" height="100%"/>
</div>
* Extremely simple!

# Theory: smooth transforms
* Consider smooth transforms

\begin{eqnarray}
\mathbf{T}(\mathbf{z}) & = & \mathbf{z}+\epsilon\phi(\mathbf{z}) \\
q_{[T]}(\mathbf{z}) & = & q(\mathbf{T}^{-1}(\mathbf{z}))
\left|{\rm det}(\nabla_{z}\mathbf{T}^{-1}(\mathbf{z}))\right|
\end{eqnarray}

* ($\mathbf{T}$ and $\phi$ are vector-valued functions with the same dimension as $\mathbf{z}$)


# Theory: gradient of KL-divergence
* The gradient of the KL-divergence w.r.t. $\epsilon$, evaluated at $\epsilon=0$, is expressed with the *Stein operator* ${\cal A}_{p}$:

\begin{eqnarray}
\nabla_{\epsilon}KL(q_{[T]}||p)\big|_{\epsilon=0} & = & -\mathbb{E}_{q}[{\rm trace}({\cal A}_{p}\phi(\mathbf{z}))] \\
{\cal A}_{p}\phi(\mathbf{z}) & \equiv & \phi(\mathbf{z})\nabla_{z}\log p(\mathbf{z})^{T}+\nabla_{z}\phi(\mathbf{z})
\end{eqnarray}

* This identity is the key result of the paper
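
The identity can be sanity-checked in one dimension. This is a sketch under assumptions not in the slides: $q = N(0,1)$, target $p = N(m, s^2)$, and a linear perturbation $\phi(z) = az + b$, so that $q_{[T]} = N(\epsilon b, (1+\epsilon a)^2)$ and the KL-divergence has a closed form. The left-hand side is a finite difference at $\epsilon = 0$; the right-hand side is the expectation of the Stein term, computed with Gauss-Hermite quadrature. All names and constants are chosen for illustration only.

```python
import numpy as np

m, s = 1.0, 2.0      # target p = N(m, s^2)
a, b = 0.7, -0.4     # perturbation phi(z) = a * z + b

def kl_after_transform(eps):
    """KL[q_T || p] where q = N(0, 1) and T(z) = z + eps * phi(z)."""
    mean, std = eps * b, abs(1 + eps * a)   # q_T is N(eps * b, (1 + eps * a)^2)
    return np.log(s / std) + (std ** 2 + (mean - m) ** 2) / (2 * s ** 2) - 0.5

# Left-hand side: central finite difference of the KL-divergence at eps = 0
h = 1e-5
numeric_grad = (kl_after_transform(h) - kl_after_transform(-h)) / (2 * h)

# Right-hand side: -E_q[trace(A_p phi)], with the expectation under q = N(0, 1)
t, w = np.polynomial.hermite_e.hermegauss(20)
w = w / np.sqrt(2 * np.pi)          # weights of a standard normal expectation
score = -(t - m) / s ** 2           # d/dz log p(z)
stein_term = (a * t + b) * score + a  # phi(z) * score + phi'(z) in one dimension
stein_grad = -np.sum(w * stein_term)

print(numeric_grad, stein_grad)
```

Both values agree up to finite-difference error, which is what the identity predicts for this toy case.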

# Theory: steepest direction in RKHS
* Restrict $\phi(\mathbf{z})$ to the unit ball of an RKHS with kernel $k$
* The steepest descent direction of the gradient above then has an analytic solution (up to normalization)

\begin{eqnarray}
\phi_{q,p}^{*}(\cdot) & = & {\rm arg}\min_{\phi}\nabla_{\epsilon}KL(q_{[T]}||p) \\
                  & = & \mathbb{E}_{q}[k(\mathbf{z},\cdot)\nabla_{z}\log p(\mathbf{z}|\mathbf{x})+\nabla_{z}k(\mathbf{z},\cdot)]
\end{eqnarray}

* Approximating the expectation with a Monte Carlo average over particles yields the algorithm:
<div align="center">
<img class="stretch" src="svgd-algo.png" width="100%" height="100%"/>
</div>
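
The particle update can be sketched in plain NumPy. This is a minimal illustration and not the PyMC3 implementation. It assumes an RBF kernel $k(\mathbf{z},\mathbf{z}')=\exp(-\|\mathbf{z}-\mathbf{z}'\|^{2}/h)$ with a median-style bandwidth heuristic, a fixed step size instead of an adaptive one, and a 1D Gaussian target $N(2, 1)$ chosen only for the demo. The function names `rbf_kernel` and `svgd` are hypothetical.

```python
import numpy as np

def rbf_kernel(z, h=None):
    """Return the RBF kernel matrix k and, for each particle i,
    the repulsive term sum_j grad_{z_j} k(z_j, z_i)."""
    diff = z[:, None, :] - z[None, :, :]            # diff[i, j] = z_i - z_j
    sq_dist = np.sum(diff ** 2, axis=-1)
    if h is None:                                   # median-style bandwidth (assumption)
        h = np.median(sq_dist) / np.log(len(z) + 1) + 1e-8
    k = np.exp(-sq_dist / h)
    grad_k = (2.0 / h) * np.sum(k[:, :, None] * diff, axis=1)
    return k, grad_k

def svgd(z, grad_log_p, step=0.1, n_iter=1000):
    """Move particles z (shape [n, d]) along the Monte Carlo estimate of phi*."""
    z = z.copy()
    for _ in range(n_iter):
        k, grad_k = rbf_kernel(z)
        # phi(z_i) = mean_j [ k(z_j, z_i) * grad log p(z_j) + grad_{z_j} k(z_j, z_i) ]
        phi = (k @ grad_log_p(z) + grad_k) / len(z)
        z += step * phi
    return z

rng = np.random.default_rng(0)
target_mean, target_std = 2.0, 1.0                   # demo target N(2, 1)
init = rng.normal(-3.0, 1.0, size=(50, 1))           # particles start far from the target
particles = svgd(init, lambda z: -(z - target_mean) / target_std ** 2)

print("particle mean:", particles.mean())
print("particle std:", particles.std())
```

The first term in `phi` pulls particles toward high-density regions of the target, while the kernel-gradient term pushes them apart so that they do not collapse onto the mode. Only the gradient of $\log p$ is needed, so an unnormalized density is sufficient.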

# Examples
* SVGD project website: http://www.cs.dartmouth.edu/%7Edartml/project.html?p=vgd

# Implementation
* PyMC3 [#1671](https://github.com/pymc-devs/pymc3/pull/1671/commits/e59f01bc761b700e8b6badc711548215cc8bf358)

# (Possible) Application
* [Autoencoding LDA](https://taku-y.github.io/notebook/20160928/lda-advi-ae.html): the posterior may be complex