There are three types of dimension:
* plate dimensions
* sample dimensions (usually indexed K)
* user dimensions: the underlying dimensions that the user gets to interact with

In [1]:
import torch as t
import torch.nn as nn
from torch.distributions import Normal
import numpy as np

import sys; sys.path.append("..")
from tpp_trace import *
import utils as u
import tensor_ops as tpp

## Run

In [2]:
kappa = 3
n = 2
tr = sample_and_eval(chain_dist, draws=kappa, nProtected=n, data={"a": 2})

#tr = sample_and_eval(plate_chain_dist, draws=kappa, nProtected=2)#, data={"a": 2})
#tr.trace.out_dicts

tr.trace.out_dicts['sample']['__a']

Plates: []
('_K', 'pos_A', 'pos_B')
('_k__d', '_k__c', 'pos_A', 'pos_B')



tensor([[[-0.2551,  0.9730,  0.8046]],

        [[-3.6178,  1.9766, -7.3755]],

        [[ 1.4900, -6.8641,  0.2563]]], names=('_k__a', 'pos_A', 'pos_B'))

## index-aware summing

We have one factor for each variable (latent or observed), e.g. the trace output for the 4 Gaussians in our chain example.

Every time we sample, we add a new dimension. These need to be removed again after evaluation.

### No plate case

1. Collect the tensors $T_a$ that depend on `__a` (those with `_k__a` in their names) and take the set of all their indices, `set(I)`
2. Use PyTorch names to put the dims in the same order in each tensor
3. Multiply the tensors in $T_a$ together (elementwise, as in `*`)
4. Sum out `__a`


Do the reduction as a for-loop: at each step, pick the first remaining K dimension and combine all the tensors that have that dimension.
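The steps above can be sketched in plain PyTorch. This is a minimal sketch, not the `tensor_ops` implementation: it assumes the factors are stored as log-probabilities (as in `out_dicts['log_prob']`), so "multiply" becomes addition and "sum out" becomes `logsumexp`. The `(names, tensor)` pair representation and the helpers `align`, `sum_out` and `combine` are hypothetical names introduced for illustration, and any normalisation by K is left out.

```python
import torch

def align(names, tensor, target):
    """Permute and unsqueeze `tensor` (dims labelled `names`) into the dim order `target`."""
    present = [d for d in target if d in names]
    tensor = tensor.permute([names.index(d) for d in present])
    for i, d in enumerate(target):
        if d not in names:
            tensor = tensor.unsqueeze(i)  # singleton dim, broadcasts when factors are added
    return tensor

def sum_out(factors, k_dim):
    """Combine every log-space factor that has `k_dim`, then reduce over it."""
    touching = [(n, f) for n, f in factors if k_dim in n]
    rest = [(n, f) for n, f in factors if k_dim not in n]
    # Step 1: union of all indices, with k_dim placed first.
    target = [k_dim] + sorted({d for n, _ in touching for d in n} - {k_dim})
    # Steps 2-3: align the dims, then "multiply" (addition in log space).
    joint = sum(align(n, f, target) for n, f in touching)
    # Step 4: sum out k_dim (logsumexp in log space).
    return rest + [(tuple(target[1:]), torch.logsumexp(joint, dim=0))]

def combine(factors):
    """For-loop reduction: pick the first remaining dimension and sum it out."""
    factors = list(factors)
    while any(n for n, _ in factors):
        k_dim = next(n for n, _ in factors if n)[0]
        factors = sum_out(factors, k_dim)
    return sum(f for _, f in factors)

# Toy factors (made-up values): f_a depends on `__a`, f_b on `__a` and `__b`.
f_a = torch.log(torch.tensor([0.2, 0.8]))
f_b = torch.log(torch.tensor([[0.1, 0.9], [0.5, 0.5], [0.3, 0.7]]))
factors = [(("_k__a",), f_a), (("_k__b", "_k__a"), f_b)]
after_a = sum_out(factors, "_k__a")   # one factor left, over `_k__b`
total = combine(factors)              # scalar
```

Tracking names in ordinary tuples sidesteps PyTorch's experimental named-tensor API; the same logic could be written with named tensors instead.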


In [3]:
kappa = 2
n = 2
data = {} # {"a": [4] * 100}
tr = sample_and_eval(chain_dist, draws=kappa, nProtected=n, data=data)
tensors = tr.trace.out_dicts['log_prob']

tpp.combine_tensors(tensors)


Plates: []
('_K', 'pos_A', 'pos_B')
('_k__d', '_k__c', 'pos_A', 'pos_B')
tensor([[-18.0145, -32.0846],
        [-12.1288, -38.3957]], names=('_k__c', '_k__b'))
tensor([-53.0691, -53.9199], names=('_k__c',))
tensor([-8.6897, -1.8068], names=('_k__c',))
tensor(-18.5551)


{'_k__c': tensor(-18.5551)}

### Testing VI

Set up a Gaussian graphical model:
$$z \sim N(0,1)$$
$$x \sim N(z, 1)$$
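Because $x$ and $z$ are jointly Gaussian, the conditional $P(x \mid z) = N(\mu_{x|z}, \Sigma_{x|z})$ is available analytically:

$$\mu_{x|z} = \mu_x + \Sigma_{xz}\Sigma_{zz}^{-1}(z-\mu_z)$$
$$\Sigma_{x|z} = \Sigma_{xx} - \Sigma_{xz}\Sigma_{zz}^{-1}\Sigma_{zx}$$

Swapping the roles of $x$ and $z$ in these formulas gives the posterior $P(z \mid x)$.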
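As a scalar sanity check, the conditioning formulas can be applied to this model in both directions. The helper `gaussian_conditional` is a hypothetical name (it is not one of the functions in `utils`), and the joint moments are derived from $z \sim N(0,1)$, $x \sim N(z,1)$: $\mathrm{Var}(x) = 2$ and $\mathrm{Cov}(x, z) = 1$.

```python
def gaussian_conditional(mu_a, mu_b, var_a, var_b, cov_ab, b):
    """Mean and variance of a | b for a scalar, jointly Gaussian pair (a, b)."""
    mean = mu_a + cov_ab / var_b * (b - mu_b)
    var = var_a - cov_ab ** 2 / var_b
    return mean, var

# Joint moments implied by z ~ N(0, 1), x ~ N(z, 1).
mu_z, mu_x = 0.0, 0.0
var_z = 1.0
var_x = 2.0    # Var(x) = Var(z) + 1
cov_xz = 1.0   # Cov(x, z) = Var(z)

# Likelihood direction: recovers x | z ~ N(z, 1).
x_given_z = gaussian_conditional(mu_x, mu_z, var_x, var_z, cov_xz, b=0.7)
# Posterior direction: z | x ~ N(x / 2, 1 / 2).
z_given_x = gaussian_conditional(mu_z, mu_x, var_z, var_x, cov_xz, b=2.0)

print(x_given_z)  # (0.7, 1.0)
print(z_given_x)  # (1.0, 0.5)
```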

---

We use the approximate posterior $Q(z) = N(\mu, \sigma^2)$, where $\mu$ and $\sigma$ are learned parameters.

When we optimise our ELBO-like objective, do those parameters converge to the true posterior?


In [4]:
# e.g. bivariate example
z_mean, z_var = 0, 1
x_var = 1
rho = 0.5

z = 0 #z.sample([n]).mean()
GROUND_TRUTH_POST_MU = u.biv_norm_conditional_mean(z, z_mean, np.sqrt(x_var),
                                                   np.sqrt(z_var), rho, z)
GROUND_TRUTH_POST_VAR = u.biv_norm_conditional_var(x_var, rho)

# Display the values under the same names they were assigned to above
GROUND_TRUTH_POST_MU, GROUND_TRUTH_POST_VAR


# u.analytical_posterior_var(var, X)
# u.analytical_posterior_mean(prior_mean, var, X, Z) 

(0.0, 0.25)

## VI without plates

That is, there are no repeated structures to abstract over.

We optimise the parameters of an approximate posterior over the extended $Z$ space, but not over $K$ space:

$$Q (Z|X) = \prod_k  Q(Z^k|X) = \prod_k \prod_i Q(Z^k_i \mid Z^k_{qa(i)})$$

and

$$\prod_j f_j^{\kappa_j} = \frac{P(x, Z)}{\prod_i Q(z_i^{k_i})}$$

Writing out the target (our estimate of the log marginal likelihood) in full makes the computation clear:

$$\mathcal{L} = E_{Q(Z|X)} \left[ \log \frac{\sum_K P(Z,K,X)}{Q(Z|X)} \right]$$
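For intuition, here is a minimal sketch of a single Monte Carlo draw of such an objective in the one-latent Gaussian model from the "Testing VI" section. It uses the standard K-sample importance-weighted bound with a $1/K$ normalisation as a stand-in; that normalisation and the function name `k_sample_bound` are assumptions, and this is not the notebook's `sample_and_eval` machinery. When $Q$ equals the true posterior $N(x/2, 1/2)$, every importance weight equals $P(x)$, so the estimate matches the exact log marginal likelihood.

```python
import math
import torch
from torch.distributions import Normal

def k_sample_bound(x, q_mu, q_sigma, K):
    """log (1/K) sum_k P(x, z^k) / Q(z^k) with z^k ~ Q, for z ~ N(0,1), x ~ N(z,1)."""
    q = Normal(q_mu, q_sigma)
    z = q.rsample((K,))  # K reparameterised samples, differentiable in q_mu and q_sigma
    log_p = Normal(0.0, 1.0).log_prob(z) + Normal(z, 1.0).log_prob(x)  # log P(x, z^k)
    log_q = q.log_prob(z)                                              # log Q(z^k)
    return torch.logsumexp(log_p - log_q, dim=0) - math.log(K)

x = torch.tensor(2.0)
# Q set to the analytic posterior N(x/2, 1/2) for this model.
bound = k_sample_bound(x, q_mu=x / 2, q_sigma=torch.tensor(0.5).sqrt(), K=5)
# Exact marginal: x ~ N(0, 2).
exact = Normal(0.0, math.sqrt(2.0)).log_prob(x)
```

With any other choice of `q_mu` and `q_sigma` the estimate is noisy and, in expectation, lies below `exact`, which is what makes it usable as a training objective for the parameters of $Q$.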

In [5]:
# call sampler on Q. 
# gives you the samples and a log Q tensor `log_prob`

# (implemented as call `sampler` then run the chain_dist)

# pass these to evaluator, which does a lookup for all the latents
# gives you log P for each latent


## VI with plates

$T$: underlying tensor list 

1. Get plate order
    - by looking for the first tensor with that plate
    - and using the index of that tensor?
2. Sum out plates in reverse order of definition

For p in reverse(plates):
* $T_{\mathrm{new}} = []$
* $T_p \leftarrow$ all tensors in p
* Remove $T_p$ from $T$
* $T_{\mathrm{new}} \leftarrow T_p$
* Sum out all sample indexes within the plate
* $T_{\backslash p} \leftarrow$ Sum out the plate
* $T += T_{\backslash p}$
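The loop above might look roughly like the following sketch. Several things here are assumptions rather than the notebook's design: factors are `(names, tensor)` pairs in log space, plate dimensions carry hypothetical names such as `plate_1`, sample dimensions are recognised by a `_k_` prefix, "sample indexes within the plate" is read as sample dims that appear in no tensor outside the plate, and "sum out the plate" is read as a plain sum of log-probabilities over the plate dimension (a product over independent plate members).

```python
import torch

def align(names, tensor, target):
    """Permute and unsqueeze `tensor` (dims labelled `names`) into the dim order `target`."""
    present = [d for d in target if d in names]
    tensor = tensor.permute([names.index(d) for d in present])
    for i, d in enumerate(target):
        if d not in names:
            tensor = tensor.unsqueeze(i)
    return tensor

def reduce_dim(factors, dim, op):
    """Add together every log-space factor that has `dim`, then reduce `dim` with `op`."""
    touching = [(n, f) for n, f in factors if dim in n]
    rest = [(n, f) for n, f in factors if dim not in n]
    target = [dim] + sorted({d for n, _ in touching for d in n} - {dim})
    joint = sum(align(n, f, target) for n, f in touching)
    return rest + [(tuple(target[1:]), op(joint, 0))]

def sum_plates(factors, plates, is_sample_dim=lambda d: d.startswith("_k_")):
    T = list(factors)
    for p in reversed(plates):                       # reverse order of definition
        T_p = [(n, f) for n, f in T if p in n]       # all tensors in plate p
        if not T_p:
            continue
        T = [(n, f) for n, f in T if p not in n]     # remove T_p from T
        outside = {d for n, _ in T for d in n}
        # Sample dims that occur only inside the plate.
        local = sorted({d for n, _ in T_p for d in n if is_sample_dim(d)} - outside)
        for k in local:
            T_p = reduce_dim(T_p, k, torch.logsumexp)
        T += reduce_dim(T_p, p, torch.sum)           # sum out the plate itself
    return T

# Toy example (random values): z is global, w_i and x_i live in a plate of size 3.
torch.manual_seed(0)
log_f_z = torch.randn(4)        # dims ('_k__z',)
log_f_w = torch.randn(3, 2, 4)  # dims ('plate_1', '_k__w', '_k__z')
log_f_x = torch.randn(3, 2)     # dims ('plate_1', '_k__w')
out = sum_plates(
    [(("_k__z",), log_f_z),
     (("plate_1", "_k__w", "_k__z"), log_f_w),
     (("plate_1", "_k__w"), log_f_x)],
    plates=["plate_1"],
)
```

After the loop, only factors over the global sample dimensions remain (here two factors over `_k__z`), which can then be reduced as in the no-plate case.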


In [6]:
def rearrange_by_plate(tensor_dict):
    """
    :param tensor_dict: dict of log_prob tensors
    :return: dict of dicts of log_prob tensors, grouped by plate
    """
    # Stub: raise (not return) so that calling it fails loudly
    raise NotImplementedError()


# Sum: sum over each plate
def plate_sum():
    raise NotImplementedError()

In [None]:
# Consider Figure 1 from TMC

"""
z2 | z1
z3 | z1
z4 | z2
x | z3, z4
"""
def simple_dist(trace):
    k1 = trace["i1"](WrappedDist(Normal, t.ones(3), 3))
    k2 = trace["i2"](WrappedDist(Normal, k1, 3))
    k3 = trace["i3"](WrappedDist(Normal, k2, 3))
    (k3,) = trace.delete_names(("i1", "i2"), (k3,))
    k4 = trace["i4"](WrappedDist(Normal, k3, 3))
    
    return k4

tr = sample_and_eval(simple_dist, draws=kappa, nProtected=2)#, data={"a": 2})

## index-aware sampling