## Independence of Random Variables

As mentioned in [https://pyro.ai/examples/svi_part_iii.html#Aside:-Dependency-tracking-in-Pyro](https://pyro.ai/examples/svi_part_iii.html#Aside:-Dependency-tracking-in-Pyro), tracking dependencies between random variables in general Python code is difficult. Pyro therefore makes a conservative assumption: unless the programmer tells Pyro that variables are independent, it assumes they are dependent. So **later `pyro.sample` statements are assumed to depend on earlier ones in a model unless Pyro is informed otherwise**.

In [2]:
import torch
import pyro
import pyro.distributions as dist

Let's take an example:

In [None]:
def model(data):
    # sample f from the beta prior
    alpha0, beta0 = 10., 10.
    f = pyro.sample("latent_fairness", dist.Beta(alpha0, beta0))
    # loop over the observed data using pyro.sample with the obs keyword argument
    for i in range(len(data)):
        # observe datapoint i using the bernoulli likelihood
        pyro.sample("obs_{}".format(i), dist.Bernoulli(f), obs=data[i])

Although we want the observations to be independent of each other given $f$, the sequential loop here makes Pyro assume that each `"obs_{}".format(i)` depends on the observations sampled before it. To remove such dependency assumptions, we can use `pyro.plate`:

In [None]:
def model(data):
    # sample f from the beta prior
    alpha0, beta0 = 10., 10.
    f = pyro.sample("latent_fairness", dist.Beta(alpha0, beta0))
    # loop over the observed data [WE ONLY CHANGE THE NEXT LINE]
    for i in pyro.plate("data_loop", size=len(data)):
        # observe datapoint i using the bernoulli likelihood
        pyro.sample("obs_{}".format(i), dist.Bernoulli(f), obs=data[i])

In this case, the `"obs_{}".format(i)` sites are independent of each other given `f`, while each of them still depends on `f` (note that `f` is sampled outside `pyro.plate`). The word `plate` comes from the [plate notation](https://en.wikipedia.org/wiki/Plate_notation) of probabilistic graphical models, where each plate denotes a **repetition** of sampling (the number of repetitions is written in the bottom-right corner of the plate, here `size=len(data)`), and the repetitions are independent of each other given the variables outside the plate.
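To make the conditional-independence statement concrete, here is a minimal sketch in plain Python (not Pyro) of the log joint density that the plated coin model describes, $\log p(\mathbf{x}, f) = \log p(f) + \sum_i \log p(x_i | f)$. The helper names `beta_log_pdf`, `bernoulli_log_pmf`, and `log_joint` are hypothetical, and both the ten coin flips and the value `f = 0.6` are assumptions chosen only for illustration.

```python
import math

def beta_log_pdf(f, alpha, beta):
    """Log density of Beta(alpha, beta) at f, for 0 < f < 1."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return log_norm + (alpha - 1) * math.log(f) + (beta - 1) * math.log(1 - f)

def bernoulli_log_pmf(x, f):
    """Log probability of outcome x (0 or 1) under Bernoulli(f)."""
    return math.log(f) if x == 1 else math.log(1 - f)

def log_joint(data, f, alpha0=10.0, beta0=10.0):
    """log p(data, f) = log p(f) + sum_i log p(x_i | f)."""
    return beta_log_pdf(f, alpha0, beta0) + sum(bernoulli_log_pmf(x, f) for x in data)

data = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # assumed data: 6 heads and 4 tails
f = 0.6                                # arbitrary illustrative value of the latent

total = log_joint(data, f)
# Each observation contributes its own term, so reordering the data changes nothing.
shuffled_total = log_joint(list(reversed(data)), f)
# Equivalently, only the counts of heads and tails matter.
by_counts = beta_log_pdf(f, 10.0, 10.0) + 6 * math.log(f) + 4 * math.log(1 - f)
```

The three quantities `total`, `shuffled_total`, and `by_counts` agree up to floating-point error, which is what independence of the observations given `f` implies for this model.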

Pyro also supports vectorized operations (which would be faster if GPUs are available). The above for loop can be rewritten as follows:

In [8]:
data = torch.zeros(10)
data[0:6] = torch.ones(6)  # 6 heads and 4 tails

def model(data):
    # sample f from the beta prior
    alpha0, beta0 = 10., 10.
    f = pyro.sample("latent_fairness", dist.Beta(alpha0, beta0))
    with pyro.plate('observe_data', size=data.shape[0]):
        pyro.sample('obs', dist.Bernoulli(f), obs=data)

So what does `pyro.plate` actually do? Inside a `with pyro.plate(...)` block, every random tensor generated by a `pyro.sample` statement has its `batch_shape` expanded by a new dimension on the left (of size `size`), and Pyro knows that the random tensors along that dimension are independent. (A new dimension is allocated this way only if you don't specify the optional argument `dim`.) For example:

In [40]:
with pyro.plate('plate-1', size=10): # repeat 10 times
    v = pyro.sample('v', dist.Normal(0, 1))
print(v.shape) # v[0], v[1], ..., v[9] are independent

x = pyro.sample('x', dist.Normal(0, 1))
print(x.shape)

torch.Size([10])
torch.Size([])


In [41]:
with pyro.plate('plate-2', size=10):
    v = pyro.sample('out', dist.Normal(0, 1)) # v[0], v[1], ..., v[9] are independent
    with pyro.plate('plate-3', size=5):
        x = pyro.sample('inside', dist.Normal(0, 1)) # given v[i], x[0, i], x[1, i], ..., x[4, i] are independent

print(v.shape)
print(x.shape)

torch.Size([10])
torch.Size([5, 10])


When we specify the optional argument `dim` of `pyro.plate` and omit `size`, as in the next cell, no new `batch_shape` dimension is created; the plate only declares that the random tensors along the specified dimension are independent:

In [7]:
with pyro.plate('plate-4', dim=-1):
    v = pyro.sample('v', dist.Normal(0, 1)).expand(2, 3)

# here v[i, 0], v[i, 1], v[i, 2] are independent
print(v.shape)

torch.Size([2, 3])


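Also note that a batch dimension created by the distribution passed to `pyro.sample` and the dimension declared by `pyro.plate` do not stack. If the distribution's `batch_shape` already matches the plate's `size`, no extra dimension is added: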

In [10]:
with pyro.plate('plate-5', size=2):
    v = pyro.sample('v', dist.Bernoulli(torch.as_tensor([0.1, 0.8])))

assert v.shape == (2, ) # still only one dimension is created

For more depth, see
- The documentation for [plate](https://docs.pyro.ai/en/stable/primitives.html?highlight=plate#pyro.primitives.plate)
- A highly recommended forum discussion of random variable dependency: [https://forum.pyro.ai/t/dependency-tracking-in-pyro/500/3](https://forum.pyro.ai/t/dependency-tracking-in-pyro/500/3)

## Usage in SVI

Remember that in SVI we need to compute an estimator of the ELBO, using samples $\mathbf{z}_i \sim q_{\phi}$: $$\mathrm{ELBO} \approx \frac{1}{S}\sum_{i=1}^S \left[\log p(\mathbf{x}, \mathbf{z}_i) - \log q_{\phi}(\mathbf{z}_i)\right]$$
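As a rough illustration of this Monte Carlo estimator, the sketch below evaluates its terms for a Beta-Bernoulli coin model in plain Python rather than Pyro. Everything here is an assumption for illustration: the `Beta(10, 10)` prior and the 6 heads and 4 tails data, the helper names, and in particular the choice of guide. The guide is set to the exact posterior `Beta(16, 14)`, which is a special case in which every term $\log p(\mathbf{x}, \mathbf{z}_i) - \log q_{\phi}(\mathbf{z}_i)$ equals the log evidence $\log p(\mathbf{x})$.

```python
import math
import random

def log_beta_fn(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_log_pdf(z, a, b):
    """Log density of Beta(a, b) at z, for 0 < z < 1."""
    return (a - 1) * math.log(z) + (b - 1) * math.log(1 - z) - log_beta_fn(a, b)

def log_joint(heads, tails, z, a0=10.0, b0=10.0):
    """log p(x, z): Beta(a0, b0) prior plus one Bernoulli term per coin flip."""
    return beta_log_pdf(z, a0, b0) + heads * math.log(z) + tails * math.log(1 - z)

def elbo_terms(heads, tails, qa, qb, num_samples, rng):
    """Per-sample terms log p(x, z_i) - log q(z_i), with z_i drawn from Beta(qa, qb)."""
    terms = []
    for _ in range(num_samples):
        z = rng.betavariate(qa, qb)
        terms.append(log_joint(heads, tails, z) - beta_log_pdf(z, qa, qb))
    return terms

rng = random.Random(0)
heads, tails = 6, 4  # assumed data

# Guide chosen as the exact posterior Beta(10 + 6, 10 + 4) (an assumption for this sketch).
terms = elbo_terms(heads, tails, 16.0, 14.0, num_samples=100, rng=rng)
elbo = sum(terms) / len(terms)

# Closed-form log evidence of this conjugate model, for comparison.
log_evidence = log_beta_fn(10.0 + heads, 10.0 + tails) - log_beta_fn(10.0, 10.0)
```

With any other guide the individual terms vary from sample to sample, and their expectation is at most `log_evidence`.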

If $\mathbf{x} = (x_1, x_2, \dots, x_N)$ are independent given $\mathbf{z}$, then we can rewrite $$\log p(\mathbf{x}|\mathbf{z}) = \sum_{j=1}^N \log p(x_j | \mathbf{z})$$

When $N$ is large, we can subsample to estimate this quantity: $$\sum_{j=1}^N \log p(x_j | \mathbf{z}) \approx \frac{N}{M}\sum_{j \in I_M} \log p(x_j | \mathbf{z})$$ where $I_M$ is a random subset of $M$ indices.
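The scaling factor $N/M$ is what makes the subsampled sum an unbiased estimate of the full sum. The following sketch checks this in plain Python (not Pyro) by averaging the estimate over every possible index subset of size $M$. The Bernoulli likelihood, the data, the value `z = 0.6`, and the helper names are all assumptions made for illustration.

```python
import math
from itertools import combinations

def bernoulli_log_pmf(x, z):
    """Log probability of outcome x (0 or 1) under Bernoulli(z)."""
    return math.log(z) if x == 1 else math.log(1 - z)

def full_log_likelihood(data, z):
    """Sum of log p(x_j | z) over all N data points."""
    return sum(bernoulli_log_pmf(x, z) for x in data)

def subsample_estimate(data, z, indices):
    """(N / M) times the sum of log p(x_j | z) over the M chosen indices."""
    n, m = len(data), len(indices)
    return (n / m) * sum(bernoulli_log_pmf(data[j], z) for j in indices)

data = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # assumed data, N = 10
z = 0.6                                # arbitrary illustrative latent value
m = 3                                  # subsample size M

full = full_log_likelihood(data, z)

# Enumerate all C(10, 3) = 120 equally likely subsets and average the estimates.
estimates = [subsample_estimate(data, z, idx)
             for idx in combinations(range(len(data)), m)]
mean_estimate = sum(estimates) / len(estimates)
```

Individual entries of `estimates` differ from `full`, but `mean_estimate` matches it up to floating-point error, since each index appears in a fraction $M/N$ of the subsets.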