## [SVI Part I: An Introduction to Stochastic Variational Inference in Pyro](http://pyro.ai/examples/svi_part_i.html#SVI-Part-I:-An-Introduction-to-Stochastic-Variational-Inference-in-Pyro)

The model has observations ${\bf x}$, latent random variables ${\bf z}$, and parameters $\theta$. Its joint probability density has the form:

$$p_{\theta}({\bf x}, {\bf z}) = p_{\theta}({\bf x}|{\bf z}) p_{\theta}({\bf z})$$

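The log evidence is obtained by marginalizing out the latent variables:

$$\log p_{\theta}({\bf x}) = \log \int\! d{\bf z}\; p_{\theta}({\bf x}, {\bf z})$$

Maximum likelihood learning picks the parameters that maximize it:

$$\theta_{\rm{max}} = \underset{\theta}{\operatorname{argmax}} \log p_{\theta}({\bf x})$$

The posterior over the latent variables at those parameters is:

$$p_{\theta_{\rm{max}}}({\bf z} | {\bf x}) = \frac{p_{\theta_{\rm{max}}}({\bf x} , {\bf z})}{
\int \! d{\bf z}\; p_{\theta_{\rm{max}}}({\bf x} , {\bf z}) }$$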

The basic idea is that we introduce a parameterized distribution $q_{\phi}({\bf z})$, where $\phi$ are known as the variational parameters. This distribution is called the variational distribution in much of the literature, and in the context of Pyro it’s called the **guide** (one syllable instead of nine!).

**Pyro enforces that model() and guide() have the same call signature, i.e. both callables should take the same arguments**.

Learning will be set up as an optimization problem where each iteration of training takes a step in $\theta$-$\phi$ space that moves the guide closer to the exact posterior. To do this we need to define an appropriate objective function.

The **ELBO**, which is a function of both $\theta$ and $\phi$, is defined as an expectation with respect to samples from the guide:

$${\rm ELBO} \equiv \mathbb{E}_{q_{\phi}({\bf z})} \left [
\log p_{\theta}({\bf x}, {\bf z}) - \log q_{\phi}({\bf z})
\right]$$
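Because the ELBO is an expectation over samples from the guide, it can be estimated by plain Monte Carlo. The following is a minimal sketch in pure Python, not Pyro's implementation. It assumes a toy model chosen only for illustration, ${\bf z} \sim \mathcal{N}(0, 1)$ and ${\bf x} | {\bf z} \sim \mathcal{N}({\bf z}, 1)$, for which the log evidence is known in closed form. The helper names `log_normal` and `elbo_estimate` are hypothetical.

```python
import math
import random


def log_normal(value, mean, std):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2 * math.pi * std**2) - 0.5 * ((value - mean) / std) ** 2


def elbo_estimate(x, q_mean, q_std, num_samples=50_000, seed=0):
    """Monte Carlo ELBO for the assumed toy model z ~ N(0, 1), x | z ~ N(z, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        z = rng.gauss(q_mean, q_std)  # z ~ q_phi(z), the guide
        log_joint = log_normal(x, z, 1.0) + log_normal(z, 0.0, 1.0)
        total += log_joint - log_normal(z, q_mean, q_std)
    return total / num_samples


x = 1.0
# For this toy model the marginal of x is N(0, 2), so the evidence is exact.
log_evidence = log_normal(x, 0.0, math.sqrt(2.0))
elbo = elbo_estimate(x, q_mean=0.3, q_std=0.8)
print(f"ELBO estimate = {elbo:.3f}, log evidence = {log_evidence:.3f}")
```

With a guide that does not match the posterior, the estimate should land below the log evidence, which is what "evidence lower bound" refers to.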

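The gap between the log evidence and the ELBO is the KL divergence between the guide and the posterior:

$$\log p_{\theta}({\bf x}) - {\rm ELBO} =
\rm{KL}\!\left( q_{\phi}({\bf z}) \lVert p_{\theta}({\bf z} | {\bf x}) \right)$$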
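Since the log evidence does not depend on $\phi$, raising the ELBO with respect to $\phi$ shrinks this KL divergence. The sketch below illustrates that with gradient ascent on a closed-form ELBO. It assumes a toy model picked for illustration (${\bf z} \sim \mathcal{N}(0, 1)$, ${\bf x} | {\bf z} \sim \mathcal{N}({\bf z}, 1)$, guide $\mathcal{N}(m, s^2)$), which has no learnable $\theta$, so only $\phi = (m, s)$ is updated. The step size and iteration count are arbitrary choices.

```python
import math

x = 1.0          # a single observation
m, s = 0.0, 1.0  # guide q(z) = N(m, s^2), so phi = (m, s)

# For this toy model x ~ N(0, 2) marginally, so the log evidence is exact.
log_evidence = -0.5 * math.log(4 * math.pi) - x**2 / 4


def elbo(m, s):
    """Closed-form ELBO for z ~ N(0, 1), x | z ~ N(z, 1), q(z) = N(m, s^2)."""
    expected_log_lik = -0.5 * math.log(2 * math.pi) - 0.5 * ((x - m) ** 2 + s**2)
    expected_log_prior = -0.5 * math.log(2 * math.pi) - 0.5 * (m**2 + s**2)
    entropy = 0.5 * math.log(2 * math.pi * math.e * s**2)
    return expected_log_lik + expected_log_prior + entropy


initial_gap = log_evidence - elbo(m, s)
for _ in range(200):
    grad_m = x - 2 * m      # d ELBO / d m
    grad_s = 1 / s - 2 * s  # d ELBO / d s
    m += 0.1 * grad_m
    s += 0.1 * grad_s
final_gap = log_evidence - elbo(m, s)

print(f"m = {m:.4f}, s = {s:.4f}")
print(f"KL gap: {initial_gap:.4f} -> {final_gap:.2e}")
```

The exact posterior here is $\mathcal{N}(x/2, 1/2)$, so the guide should settle at $m = 0.5$ and $s = 1/\sqrt{2}$, with the gap going to zero because the guide family contains the posterior.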

```python
import pyro
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

# Return optimizer arguments per parameter, keyed on the parameter name.
def per_param_callable(param_name):
    if param_name == 'my_special_parameter':
        return {"lr": 0.010}
    else:
        return {"lr": 0.001}

optimizer = Adam(per_param_callable)
```