Initial Setup:

In [1]:
from __future__ import print_function
# import collections
# tfe = tf.contrib.eager
# try:
#   tfe.enable_eager_execution()
# except ValueError:
#   pass

import matplotlib.pyplot as plt
import numpy as np

import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

$\theta_0$: Make a batch of Bernoulli distributions and draw a sample $G_0$ from it.

In [2]:
theta_0 = tfd.Bernoulli(probs=[[0.9, 0.4],[0.3, 0.9]], name='theta_0')
G_0 = theta_0.sample()

print(G_0)

Tensor("theta_0/sample/Reshape:0", shape=(2, 2), dtype=int32)


In [3]:
sess = tf.Session()
G_0 = sess.run(G_0)
print(G_0)

[[1 1]
 [0 1]]


The following function computes the Kronecker product of two matrices. In the usual convention, `mat2` is scaled by each element of `mat1` and block-replicated. Here the reshapes are swapped so that `mat1` is scaled by each element of `mat2` and block-replicated instead. This is in line with the paper's convention.

In [4]:
def kronecker_product(mat1, mat2):
    m1 = tf.shape(mat1)[0]
    n1 = tf.size(mat1) // m1
    m2 = tf.shape(mat2)[0]
    n2 = tf.size(mat2) // m2
    # Usual convention would be: mat1 -> [m1, 1, n1, 1], mat2 -> [1, m2, 1, n2].
    # Swapped here: mat1 takes the inner (block) axes and mat2 the outer axes,
    # so each element of mat2 scales a full copy of mat1.
    inner = tf.reshape(mat1, [1, m1, 1, n1])
    outer = tf.reshape(mat2, [m2, 1, n2, 1])
    return tf.reshape(outer * inner, [m1 * m2, n1 * n2])

EXAMPLE for the Kronecker product:

In [5]:
a = tf.placeholder(tf.float32)
b = tf.placeholder(tf.float32)
c = kronecker_product(a,b)

p = [[1.,0],[1.,0]]
q = [[3,4],[5.,6.]]
# c = sess.run(c, feed_dict={a:[[1.,0],[1.,0]], b:[[3.,4.],[5.,6.]]})
out = sess.run(c, feed_dict={a:p, b:q})
out

array([[ 3.,  0.,  4.,  0.],
       [ 3.,  0.,  4.,  0.],
       [ 5.,  0.,  6.,  0.],
       [ 5.,  0.,  6.,  0.]], dtype=float32)
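
As a sanity check on the argument order, the output above can be compared with NumPy's `np.kron`, which follows the usual convention. The sketch below assumes only NumPy and reuses the values of `p` and `q` from the example. Under that assumption, `kronecker_product(p, q)` should agree with `np.kron(q, p)` rather than `np.kron(p, q)`.

```python
import numpy as np

p = np.array([[1., 0.], [1., 0.]])
q = np.array([[3., 4.], [5., 6.]])

# Standard convention: np.kron(A, B) scales B by each element of A.
swapped = np.kron(q, p)   # same values as kronecker_product(p, q) above
standard = np.kron(p, q)  # usual argument order, shown for comparison

print(swapped)
print(standard)
```

The two results contain the same products but in a different block layout, so the argument order matters whenever the matrices are not interchangeable.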

For each $\theta_c$, draw a matching batch of Gaussian noise variables $\eta_c$ with the same shape as the $\theta_c$ matrix. This is the additive noise applied to each $\theta_c$ at each stage of Kronecker multiplication.

In [6]:
def addNoise2Theta(theta_c, mean=0, std=0.01, NRMLZ=True):
    # Pass an op-level seed so the results are reproducible; np.random.seed
    # does not affect TensorFlow's random ops.
    # Make distribution, sample and add
    p_eta = tfd.Normal(loc=mean, scale=std, name='AdditiveEdgeNoise')
    eta_c = p_eta.sample(tf.shape(theta_c), seed=123)
    theta_c_noise = theta_c + eta_c
    # Make sure values do not run off:
    # clip values to [0, 1]
    if NRMLZ:
        return tf.clip_by_value(theta_c_noise, 0.0, 1.0)
    return theta_c_noise

EXAMPLE for noisy $\theta_c$:

In [7]:
theta   = tf.placeholder(tf.float32)
theta_noisy = addNoise2Theta(theta)
p = np.array([[1.,0],[1.,0]])
out = sess.run(theta_noisy, feed_dict={theta:p})
print(out)

[[  1.00000000e+00   7.23697085e-05]
 [  9.90425408e-01   1.50220341e-03]]
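
To illustrate what "noise at each stage of Kronecker multiplication" could look like end to end, here is a minimal NumPy sketch. It is not the notebook's TensorFlow implementation: the helper name `noisy_kronecker_power`, the choice to redraw noise before every product, and the left-multiplication order are all assumptions made for illustration.

```python
import numpy as np

def noisy_kronecker_power(theta, num_products, std=0.01, seed=0):
    """Hypothetical helper: repeated Kronecker products of a perturbed theta."""
    rng = np.random.default_rng(seed)

    def perturb(t):
        # Additive Gaussian noise, then clip back into [0, 1].
        return np.clip(t + rng.normal(0.0, std, size=t.shape), 0.0, 1.0)

    p = perturb(theta)
    for _ in range(num_products):
        # Assumed order: a freshly perturbed theta on the left at each stage.
        p = np.kron(perturb(theta), p)
    return p

theta = np.array([[0.9, 0.4], [0.3, 0.9]])
P = noisy_kronecker_power(theta, num_products=2)
print(P.shape)  # (8, 8)
```

Because every factor is clipped to $[0, 1]$, every entry of the product also stays in $[0, 1]$ and can be read as an edge probability.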


In [8]:
# Junk trials - can be ignored
Pk = tf.placeholder(tf.float32)

mean = 0
std = 0.1
p_eta = tfd.Normal(loc=mean, scale=std, name='AdditiveEdgeNoise')
eta = p_eta.sample(tf.shape(Pk))
probs = Pk + eta
probs_n = tf.maximum(tf.minimum(probs,1),0)
# a = tf.constant(Pk)

# Eta = tfd.Normal(loc=np.repmat([0], Theta), scale=, name='EdgeNoise_small')

[probs_n, probs, eta_] = sess.run([probs_n, probs, eta], feed_dict={Pk:[[1.,0],[1.,0]]})
print(eta_)
print(probs)
print(probs_n)

[[-0.179107    0.11355007]
 [ 0.07164492 -0.11725985]]
[[ 0.82089299  0.11355007]
 [ 1.0716449  -0.11725985]]
[[ 0.82089299  0.11355007]
 [ 1.          0.        ]]


Now let us generate a sample set of $\mu$s. These $\mu$s all come from the same Beta distribution. The
samples for $\mu$ are the threshold values for the Bernoulli distributions in a single $\theta_c$.

In [9]:
theta_shape   = tf.placeholder(tf.int32)
alpha  = tf.placeholder_with_default([1.0], [1])
beta   = tf.placeholder_with_default([1.0], [1])
p_mu   = tfd.Beta(alpha, beta, name='Mu_distb')
mu     = tf.squeeze(p_mu.sample(theta_shape),axis=2)

mu_out  = sess.run(mu, feed_dict={theta_shape:[2,2]})
mu_out

array([[ 0.31504449,  0.67142296],
       [ 0.51679528,  0.70658988]], dtype=float32)
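
The link between $\mu$ and the Bernoulli draws can be sketched in plain NumPy. This assumes that "threshold value" means the Bernoulli success probability, which is an interpretation rather than something the notebook states. The names `rng` and `edges` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw one mu per entry of a 2x2 theta_c from a shared Beta(alpha, beta).
alpha, beta = 1.0, 1.0
mu = rng.beta(alpha, beta, size=(2, 2))

# Assumption: each mu is the success probability of one Bernoulli edge variable.
edges = rng.binomial(1, mu)

print(mu)
print(edges)
```

With `alpha = beta = 1.0` the Beta distribution is uniform on $[0, 1]$, matching the defaults used in the cell above.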

### TODO
* Need a Markov chain specification for the $\alpha$s that parameterise the Beta distribution for $\mu$.
* Need to connect all the distributions
* Parameterise $N_c$ --> size of square $\theta_c$ matrix
* Parameterise $K_c$ --> the number of Kronecker products $\theta_c$ undergoes in the Kronecker sequence
* How to connect (smoothness) between consecutive $\theta$s or $P_k$s? 
    * Markov relationship between the consecutive $\alpha$s
    * Connect $P_c$ to $P_{c+1}$ through some conditional distribution
    * Connect $P_c$ to $\theta_{c+1}$ through some conditional distribution
    * Do some statistics on $P_c$ to change distribution for $\mu_{c+1}$ by altering $\alpha_{c+1}$ or $\beta_{c+1}$

> Is it not important, then, to preserve the order of the $\theta_c$s in the Kronecker sequence? The smoothness should propagate to the relevant next layers of abstraction.

* Implement permutation and additive noise for edges from SKG paper.

### Questions to answer:
* Sample from the distributions to generate graphs G. Do they look good?
* Compute log-likelihoods. Can the likelihoods be verified?
* Does inference work?
* What is the improvement over the permutation space?
* What kind of data should we test this on?
* Check out the datasets used in SKG paper.



### Ideas
* Stick breaking distribution for $K$
* Prime factor decomposition for $N$
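
One possible reading of the prime-factor idea: if the final graph has $N$ nodes, the prime factors of $N$ give candidate sizes for the square $\theta_c$ matrices whose Kronecker product has $N$ rows. The sketch below is an assumption about the intent, and `prime_factors` is a hypothetical helper using simple trial division.

```python
def prime_factors(n):
    """Return the prime factors of n (n >= 2) in non-decreasing order."""
    factors = []
    d = 2
    while d * d <= n:
        # Divide out each factor d as many times as it occurs.
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        # Whatever remains is itself prime.
        factors.append(n)
    return factors

print(prime_factors(12))  # [2, 2, 3]
```

For example, a 12-node graph could under this reading be built from two $2 \times 2$ matrices and one $3 \times 3$ matrix.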

### Practical stuff
* Bridge the gap between TFP implementation and log likelihood computation in SKG
* Carl tutorial for inference
* TFP tutorial for covariance matrix estimation
