## [Conditional Independence, Subsampling, and Amortization](http://pyro.ai/examples/svi_part_ii.html#SVI-Part-II:-Conditional-Independence,-Subsampling,-and-Amortization)

The model has observations ${\bf x}$ and latent random variables ${\bf z}$, as well as parameters $\theta$. Its joint probability density has the form: $$p_{\theta}({\bf x}, {\bf z}) = p_{\theta}({\bf x}|{\bf z}) p_{\theta}({\bf z})$$



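The log evidence is obtained by marginalizing out the latent variables:

$$\log p_{\theta}({\bf x}) = \log \int\! d{\bf z}\; p_{\theta}({\bf x}, {\bf z})$$

Ideally we would learn the parameters by maximizing the log evidence:

$$\theta_{\rm{max}} = \underset{\theta}{\operatorname{argmax}} \log p_{\theta}({\bf x})$$

and then compute the posterior over the latent variables:

$$p_{\theta_{\rm{max}}}({\bf z} | {\bf x}) = \frac{p_{\theta_{\rm{max}}}({\bf x} , {\bf z})}{
\int \! d{\bf z}\; p_{\theta_{\rm{max}}}({\bf x} , {\bf z}) }$$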

The basic idea is that we introduce a parameterized distribution $q_{\phi}({\bf z})$, where $\phi$ are known as the variational parameters. This distribution is called the variational distribution in much of the literature, and in the context of Pyro it’s called the **guide** (one syllable instead of nine!).

**Pyro enforces that model() and guide() have the same call signature, i.e. both callables should take the same arguments**.

Learning will be set up as an optimization problem where each iteration of training takes a step in $\theta-\phi$ space that moves the guide closer to the exact posterior. To do this we need to define an appropriate objective function.

The **ELBO**, which is a function of both $\theta$ and $\phi$, is defined as an expectation with respect to samples from the guide:

$${\rm ELBO} \equiv \mathbb{E}_{q_{\phi}({\bf z})} \left [
\log p_{\theta}({\bf x}, {\bf z}) - \log q_{\phi}({\bf z})
\right]$$

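The gap between the log evidence and the ELBO is the KL divergence between the guide and the posterior. Since the KL divergence is non-negative, the ELBO is a lower bound on the log evidence, and for fixed $\theta$ raising the ELBO moves the guide closer to the posterior:

$$\log p_{\theta}({\bf x}) - {\rm ELBO} =
{\rm KL}\!\left( q_{\phi}({\bf z}) \lVert p_{\theta}({\bf z} | {\bf x}) \right)$$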
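The identity above can be checked numerically on a toy problem. The sketch below assumes a hypothetical discrete latent variable with three states and a single fixed observation, so every integral becomes a finite sum. The probability tables are made up for illustration and do not come from any Pyro model.

```python
import math

# Hypothetical toy model: z takes values in {0, 1, 2}; x is one fixed observation.
prior = [0.5, 0.3, 0.2]          # p(z)
likelihood = [0.10, 0.40, 0.70]  # p(x | z) evaluated at the observed x
q = [0.2, 0.3, 0.5]              # an arbitrary guide q(z)

joint = [pz * px for pz, px in zip(prior, likelihood)]  # p(x, z)
evidence = sum(joint)                                    # p(x)
posterior = [j / evidence for j in joint]                # p(z | x)

# ELBO = E_q[log p(x, z) - log q(z)]
elbo = sum(qz * (math.log(j) - math.log(qz)) for qz, j in zip(q, joint))

# KL(q(z) || p(z | x))
kl = sum(qz * (math.log(qz) - math.log(pz)) for qz, pz in zip(q, posterior))

# The gap between the log evidence and the ELBO should match the KL divergence.
gap = math.log(evidence) - elbo
print(gap, kl)
```

Because the KL term is non-negative, the ELBO in this example can never exceed the log evidence, whichever guide `q` is chosen.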

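When the model has $N$ observations that are conditionally independent given the latents, the log likelihood term in the ELBO can be approximated using a mini-batch $\mathcal{I}_M$ of $M$ indices:

$$\sum_{i=1}^N \log p({\bf x}_i | {\bf z}) \approx  \frac{N}{M}
\sum_{i\in{\mathcal{I}_M}} \log p({\bf x}_i | {\bf z})$$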
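To see why the $N/M$ factor is the right scaling, the sketch below enumerates every mini-batch of size $M$ for a small hypothetical dataset and averages the scaled estimates. It uses a Bernoulli likelihood with a fixed, made-up fairness value `f` and plain Python only; the names are illustrative, not part of Pyro.

```python
import math
from itertools import combinations

def bernoulli_log_prob(x, f):
    """Log probability of a binary outcome x under Bernoulli(f)."""
    return math.log(f) if x == 1 else math.log(1.0 - f)

data = [1, 1, 0, 1, 0, 1]  # hypothetical observations
f = 0.6                    # hypothetical fixed value of the latent
N, M = len(data), 2

# Full-data log likelihood: sum over all N observations.
full_sum = sum(bernoulli_log_prob(x, f) for x in data)

# Scaled mini-batch estimate for every possible index set of size M.
estimates = [
    (N / M) * sum(bernoulli_log_prob(data[i], f) for i in idx)
    for idx in combinations(range(N), M)
]

# Averaging over uniformly chosen mini-batches recovers the full sum.
mean_estimate = sum(estimates) / len(estimates)
print(full_sum, mean_estimate)
```

Each individual estimate differs from the full sum, but their average over all equally likely mini-batches matches it, which is what makes the subsampled estimate unbiased.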

To use this kind of subsampling in Pyro, the user first needs to make sure that the model and guide are written in such a way that Pyro can leverage the relevant conditional independencies. Let’s see how this is done. Pyro provides two language primitives for marking conditional independencies: plate and markov. Let’s start with the simpler of the two.

In [1]:
import pyro
import torch

In [2]:
from pyro.infer import SVI, Trace_ELBO

In [3]:
from pyro.optim import Adam

def per_param_callable(param_name):
    if param_name == 'my_special_parameter':
        return {"lr": 0.010}
    else:
        return {"lr": 0.001}

optimizer = Adam(per_param_callable)

This simply tells Pyro to use a learning rate of 0.010 for the Pyro parameter my_special_parameter and a learning rate of 0.001 for all other parameters.

In [4]:
import pyro.distributions as dist

For this model the observations are conditionally independent given the latent random variable latent_fairness. To explicitly mark this in Pyro we just need to replace the Python built-in range with the Pyro construct plate. The model below keeps that sequential loop as a comment and uses the vectorized form of plate, written as a context manager:

In [10]:
def model(data):
    alpha0 = torch.tensor(10.0)
    beta0 = torch.tensor(10.0)

    f = pyro.sample("latent_fairness", dist.Beta(alpha0, beta0))

    # Sequential alternative: each observation is assigned a unique name in Pyro.
    # for i in pyro.plate("data_loop", len(data)):
    #     pyro.sample("obs_{}".format(i), dist.Bernoulli(f), obs=data[i])

    # Vectorized plate: all observations are scored by a single sample statement.
    with pyro.plate("observed_data"):
        pyro.sample("obs", dist.Bernoulli(f), obs=data)

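Note that pyro.plate is only valid when the iterations it wraps are conditionally independent of one another. For example, pyro.plate is **not appropriate for temporal models** where each iteration of a loop depends on the previous iteration; in this case a range or **pyro.markov** should be used instead.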

In [6]:
def guide(data):
    alpha_q = pyro.param("alpha_q", torch.tensor(15.0), constraint=dist.constraints.positive)
    beta_q = pyro.param("beta_q", torch.tensor(15.0), constraint=dist.constraints.positive)

    pyro.sample("latent_fairness", dist.Beta(alpha_q, beta_q))

The variational parameters are torch.tensors. The **requires_grad flag is automatically set to True by pyro.param**.

In [7]:
adam_params = {"lr": 0.005, "betas": (0.9, 0.999)}
optimizer = Adam(adam_params)

In [9]:
svi = SVI(model, guide, optimizer, loss=Trace_ELBO())