# Trying to use the new TF HMM code with Edward:

The version of Edward that we have been using so far depends on an old version of TF, in which Probability is still the submodule tf.contrib.distributions, and this submodule doesn't contain an HMM.

Yet the new version of TF Probability (**release 0.5**, which requires TF **1.12**), now a standalone module, does contain an HMM. The goal of this notebook is to take that HMM and wrap it as an Edward random variable to do inference.

**Challenge:** The new TFP also contains the newest version of Edward, as Edward2. But not everything has been ported from Edward to Edward2 yet, in particular not the inference algorithms. So we probably want to take only the new HMM from the new TFP and use it in our (old) version of Edward.

In [1]:
import tensorflow as tf
import edward as ed
import numpy as np
import tensorflow.contrib.distributions as tfd


## Porting HMM from new TFP to our version of edward and tf:

In [3]:
from os.path import join, abspath
import sys
sys.path.append(join(abspath('.'), '../utils'))
from tf.tf_hidden_markov_model import HiddenMarkovModel

### Example, from tfp:

In [4]:
initial_distribution = tfd.Categorical(probs=[0.8, 0.2])

Suppose a cold day has a 30% chance of being followed by a hot day and a hot day has a 20% chance of being followed by a cold day.

We can model this as:

In [5]:
transition_distribution = tfd.Categorical(probs=[[0.7, 0.3],
                                                 [0.2, 0.8]])
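
As a quick sanity check on this matrix, here is a small NumPy sketch (not part of the TFP example). Row i is the distribution of tomorrow's state given today's state i, so every row has to sum to 1, and the long-run share of cold and hot days is the stationary distribution. The helper name `stationary_distribution` is made up for this note.

```python
import numpy as np

# Same numbers as the transition_distribution above:
# row i = P(next state | current state i), state 0 = cold, state 1 = hot.
transition_probs = np.array([[0.7, 0.3],
                             [0.2, 0.8]])

def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalised to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(P.T)
    idx = np.argmin(np.abs(eigvals - 1.0))   # pick the eigenvalue closest to 1
    v = np.real(eigvecs[:, idx])
    return v / v.sum()

row_sums = transition_probs.sum(axis=1)
stationary = stationary_distribution(transition_probs)
print(row_sums)
print(stationary)
```

In the long run this chain spends 40% of the days in the cold state and 60% in the hot state.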

Suppose additionally that on each day the temperature is normally distributed with mean and standard deviation 0 and 5 on a cold day and mean and standard deviation 15 and 10 on a hot day.

We can model this with:

In [6]:
observation_distribution = tfd.Normal(loc=[0., 15.], scale=[5., 10.])

We can combine these distributions into a single week long hidden Markov model with:

In [7]:
model = HiddenMarkovModel(
    initial_distribution=initial_distribution,
    transition_distribution=transition_distribution,
    observation_distribution=observation_distribution,
    num_steps=7)

In [8]:
with tf.Session() as sess:
    # The expected temperatures for each day are given by:
    print(sess.run(model.mean()))  # shape [7], elements approach 9.0
    # The log pdf of a week of temperature 0 is:
    print(sess.run(model.log_prob(tf.zeros(shape=[7]))))

Tensor("HiddenMarkovModel/cond/Merge:0", shape=(?,), dtype=float32)
Tensor("HiddenMarkovModel/mean/Normal/mean/mul:0", shape=(2,), dtype=float32)
[2.9999998 5.9999995 7.4999995 8.25      8.625001  8.812501  8.90625  ]
Tensor("HiddenMarkovModel/cond/Merge:0", shape=(?,), dtype=float32)
Tensor("zeros:0", shape=(7,), dtype=float32)
-20.120832
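
The means printed above can be cross-checked by hand. The sketch below (plain NumPy, helper name `marginal_means` is ours) pushes the initial state distribution through the transition matrix one day at a time and takes the expected temperature under the resulting state distribution. It only reproduces `model.mean()`; it does not try to reproduce the `log_prob` value.

```python
import numpy as np

initial_probs = np.array([0.8, 0.2])           # P(cold), P(hot) on the first day
transition_probs = np.array([[0.7, 0.3],
                             [0.2, 0.8]])
state_means = np.array([0.0, 15.0])            # mean temperature in each state

def marginal_means(p0, P, means, num_steps):
    """Expected observation per day: E[y_t] = (p0 @ P^t) . means."""
    p, out = p0.copy(), []
    for _ in range(num_steps):
        out.append(p @ means)
        p = p @ P                              # advance the state distribution by one day
    return np.array(out)

expected = marginal_means(initial_probs, transition_probs, state_means, 7)
print(expected)
```

The values agree with the TF output up to float32 rounding. The limit is 9.0 because the chain's stationary distribution is (0.4, 0.6) and 0.6 * 15 = 9.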


## Trying to build our model:

In [8]:
from edward.models import Categorical, Dirichlet

### Without priors:

In [86]:
# from issue 
chain_len = 30
n_hidden = 3
n_obs = 3

x_0 = Categorical(probs=tf.fill([n_hidden], 1.0 / n_hidden))

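# transition matrix: row i is P(next state | current state i), so normalise over the last axis
T = tf.nn.softmax(tf.Variable(tf.random_uniform([n_hidden, n_hidden])), dim=-1)
transition_distribution = Categorical(probs=T)

# emission matrix: row i is P(observation | state i), so normalise over the last axis
E = tf.nn.softmax(tf.Variable(tf.random_uniform([n_hidden, n_obs])), dim=-1)
emission_distribution = Categorical(probs=E)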

model = HiddenMarkovModel(
    initial_distribution=x_0,
    transition_distribution=transition_distribution,
    observation_distribution=emission_distribution,
    num_steps=chain_len)
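
One thing worth checking in cells like this one is the axis the softmax normalises over. For a batched `Categorical`, the last axis of `probs` indexes the outcomes, so each row of the matrix should sum to 1. The NumPy sketch below (our own `softmax` helper, not the TF op) shows the difference between normalising over the last axis and over axis 0.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.RandomState(0)
logits = rng.uniform(size=(3, 3))

over_rows = softmax(logits, axis=-1)   # every row sums to 1
over_cols = softmax(logits, axis=0)    # every column sums to 1, rows generally do not

print(over_rows.sum(axis=1))
print(over_cols.sum(axis=1))
```

With normalisation over axis 0 the rows are not valid distributions on their own.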

In [87]:
# INFERENCE
q_0 = Categorical(probs=tf.nn.softmax(tf.Variable(tf.ones(n_hidden), expected_shape=n_hidden)))
qt = Categorical(probs=tf.nn.softmax(tf.Variable(tf.random_uniform([n_hidden, n_hidden])), dim=-1))  # rows sum to 1
qe = Categorical(probs=tf.nn.softmax(tf.Variable(tf.random_uniform([n_hidden, n_obs])), dim=-1))  # rows sum to 1
qm = HiddenMarkovModel(q_0, qt, qe, chain_len)

y_data = ([0] * 10) + ([1] * 10) + ([2] * 10)
y_data = np.array(y_data)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    inference = ed.KLqp({x_0: q_0,
                         transition_distribution: qt,
                         emission_distribution: qe},
                        {model: y_data})
    inference.run(n_iter=5000)

AttributeError: 'Tensor' object has no attribute '_graph_parents'

### With priors:

In [3]:
# from issue 
chain_len = 30
n_hidden = 3
n_obs = 3

pi_0 = Dirichlet(tf.ones(n_hidden))
x_0 = Categorical(probs=pi_0)

# transition matrix
# pi_T = [Dirichlet(tf.ones(n_hidden)) for i in range(n_hidden)]
pi_T = Dirichlet(tf.ones([n_hidden, n_hidden]))
# T = tf.nn.softmax(tf.Variable(tf.random_uniform([n_hidden, n_hidden])), dim=0)
transition_distribution = Categorical(probs=pi_T)

# emission matrix
# E = tf.nn.softmax(tf.Variable(tf.random_uniform([n_hidden, n_obs])), dim=0)
pi_E = Dirichlet(tf.ones([n_hidden, n_obs]))
emission_distribution = Categorical(probs=pi_E)

model = HiddenMarkovModel(
    initial_distribution=x_0,
    transition_distribution=transition_distribution,
    observation_distribution=emission_distribution,
    num_steps=chain_len)

NameError: name 'Dirichlet' is not defined

Note: adding sample_shape to x_0, transition_distribution, emission_distribution, and model doesn't fail, but doesn't converge either...

In [None]:
# INFERENCE
qpi_0 = Dirichlet(tf.nn.softplus(tf.Variable(tf.ones(n_hidden), name="q_0/concentration")))
qpi_T = Dirichlet(tf.nn.softplus(tf.Variable(tf.ones([n_hidden, n_hidden]), name="qt/concentration")))
qpi_E = Dirichlet(tf.nn.softplus(tf.Variable(tf.ones([n_hidden, n_obs]), name="qe/concentration")))

# qm = HiddenMarkovModel(q_0, qt, qe, chain_len) # not necessary

y_data = ([0] * 10) + ([1] * 10) + ([2] * 10)
y_data = np.array(y_data)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    inference = ed.KLqp({pi_0: qpi_0,
                         pi_T: qpi_T,
                         pi_E: qpi_E},
                        {model: y_data},)
    inference.run(n_iter=5000, logdir='./testlog')
    inferred_pi0, inferred_pi_T, inferred_pi_E = sess.run([qpi_0.mean(), qpi_T.mean(), qpi_E.mean()])
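
The `mean()` calls at the end read off the mean of each variational Dirichlet, which is just the concentration normalised over its last axis. A NumPy sketch of that computation, with helper names of our own, evaluated at the initial value of the variational parameters:

```python
import numpy as np

def softplus(x):
    """log(1 + exp(x)), the positivity transform applied to the concentrations."""
    return np.log1p(np.exp(x))

def dirichlet_mean(concentration):
    """Mean of a batch of Dirichlets: concentration normalised over the last axis."""
    return concentration / concentration.sum(axis=-1, keepdims=True)

n_hidden = 3
# Variables start at 1, so the initial concentration is softplus(1) everywhere.
init_concentration = softplus(np.ones((n_hidden, n_hidden)))
init_mean = dirichlet_mean(init_concentration)
print(init_concentration[0, 0])
print(init_mean)
```

Before any update every row of the mean is uniform, so how far the inferred matrices have moved away from 1/3 is a rough indication of how much the data changed the posterior.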

In [158]:
inferred_pi_T

array([[0.67318153, 0.1437307 , 0.18308769],
       [0.19555463, 0.57955325, 0.2248921 ],
       [0.2137069 , 0.27166   , 0.5146331 ]], dtype=float32)

In [159]:
inferred_pi_E

array([[0.12495308, 0.6211583 , 0.25388864],
       [0.56491506, 0.1938043 , 0.2412807 ],
       [0.28481784, 0.32066643, 0.39451572]], dtype=float32)
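
These matrices are hard to judge by eye, partly because hidden states are only identifiable up to a permutation. One way to score a set of parameters is the marginal log-likelihood of `y_data`, computed with the forward algorithm. The sketch below is a plain NumPy version under the standard HMM definition (initial distribution applied to the first state, no transition before the first observation); the function name and the "good" parameters are illustrative, not taken from the notebook.

```python
import numpy as np

def hmm_log_likelihood(pi0, A, B, y):
    """Scaled forward algorithm for a discrete HMM.

    pi0: (K,) initial state probabilities
    A:   (K, K) transitions, row i = P(next state | state i)
    B:   (K, M) emissions, row i = P(symbol | state i)
    y:   sequence of observed symbols
    """
    alpha = pi0 * B[:, y[0]]
    norm = alpha.sum()
    log_lik = np.log(norm)
    alpha = alpha / norm
    for obs in y[1:]:
        alpha = (alpha @ A) * B[:, obs]   # predict, then weight by the emission
        norm = alpha.sum()
        log_lik += np.log(norm)           # accumulate log p(y_t | y_<t)
        alpha = alpha / norm
    return log_lik

y_data = np.array([0] * 10 + [1] * 10 + [2] * 10)

# Baseline: completely uninformative parameters.
uniform = np.full((3, 3), 1.0 / 3)
ll_uniform = hmm_log_likelihood(np.full(3, 1.0 / 3), uniform, uniform, y_data)

# Hypothetical "good" parameters: sticky left-to-right chain, one symbol per state.
A_good = np.array([[0.9, 0.1, 0.0],
                   [0.0, 0.9, 0.1],
                   [0.0, 0.0, 1.0]])
ll_good = hmm_log_likelihood(np.array([1.0, 0.0, 0.0]), A_good, np.eye(3), y_data)
print(ll_uniform, ll_good)
```

Passing `inferred_pi_T` and `inferred_pi_E` (with some choice of initial distribution) to the same function places the fit on a scale between these two reference points.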

In [137]:
model.transition_distribution

<ed.RandomVariable 'Categorical_65/' shape=(3,) dtype=int32>

In [139]:
transition_distribution._graph_parents

[<tf.Tensor 'Categorical_65/logits/Log:0' shape=(3, 3) dtype=float32>,
 <tf.Tensor 'Dirichlet_111/sample/Reshape:0' shape=(3, 3) dtype=float32>]

**EM inference:**

**TODO**:
- run E more than M
- pass results from E as data in M??

In [195]:
# INFERENCE
qpi_0 = Dirichlet(tf.nn.softplus(tf.Variable(tf.ones(n_hidden), name="q_0/concentration")))
qpi_T = Dirichlet(tf.nn.softplus(tf.Variable(tf.ones([n_hidden, n_hidden]), name="qt/concentration")))
qpi_E = Dirichlet(tf.nn.softplus(tf.Variable(tf.ones([n_hidden, n_obs]), name="qe/concentration")))

y_data = ([0] * 10) + ([1] * 10) + ([2] * 10)
y_data = np.array(y_data)

with tf.Session() as sess:
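    # inference = ed.KLqp({model: qm}, {model: y_data})
    # "E" step: update q(pi_0) and q(pi_T), with pi_E bound to qpi_E
    inference_exp = ed.KLqp({pi_0: qpi_0,
                             pi_T: qpi_T},
                            {model: y_data, pi_E: qpi_E})
    # "M" step: update q(pi_E), with pi_0 and pi_T bound to their approximations
    inference_max = ed.KLqp({pi_E: qpi_E},
                            {model: y_data, pi_0: qpi_0, pi_T: qpi_T})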
    inference_exp.initialize()
    inference_max.initialize()
    sess.run(tf.global_variables_initializer())
    for i in range(1000):
        for _ in range(10):
            dict_exp = inference_exp.update()
        dict_max = inference_max.update()
        if i % 100 == 0:
            print(i, " / 1000 done")
            # print(inference_exp.print_progress(dict_exp))
            # print(inference_max.print_progress(dict_max))
    inferred_pi0, inferred_pi_T, inferred_pi_E = sess.run([qpi_0.mean(), qpi_T.mean(), qpi_E.mean()])

Tensor("inference_14/sample_32/HiddenMarkovModel_4/cond/Merge:0", shape=(?,), dtype=float32)
Tensor("data_14/Variable/read:0", shape=(30,), dtype=int32)
Tensor("inference_15/sample_33/HiddenMarkovModel_4/cond/Merge:0", shape=(?,), dtype=float32)
Tensor("data_15/Variable/read:0", shape=(30,), dtype=int32)
0  / 1000 done
100  / 1000 done
200  / 1000 done
300  / 1000 done
400  / 1000 done
500  / 1000 done
600  / 1000 done
700  / 1000 done
800  / 1000 done
900  / 1000 done


In [196]:
inferred_pi_T

array([[0.45251185, 0.2563953 , 0.2910928 ],
       [0.25287455, 0.6159876 , 0.13113792],
       [0.21694666, 0.24478506, 0.53826827]], dtype=float32)

In [197]:
inferred_pi_E

array([[0.24084574, 0.3120118 , 0.44714245],
       [0.32083046, 0.3923618 , 0.28680775],
       [0.67579246, 0.18118344, 0.14302412]], dtype=float32)
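
For comparison, here is classical EM for a discrete HMM (Baum-Welch) in plain NumPy. This is a different technique from the alternating `KLqp` updates above: it returns point estimates rather than Dirichlet posteriors, and it assumes the standard HMM likelihood. All function names are ours. It is only a sketch for a single sequence, without priors or smoothing.

```python
import numpy as np

def forward_backward(pi0, A, B, y):
    """Scaled forward-backward pass; returns alpha, beta and the normalisers c."""
    n, k = len(y), len(pi0)
    alpha, beta, c = np.zeros((n, k)), np.ones((n, k)), np.zeros(n)
    alpha[0] = pi0 * B[:, y[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, n):
        alpha[t] = (alpha[t - 1] @ A) * B[:, y[t]]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    for t in range(n - 2, -1, -1):
        beta[t] = A @ (B[:, y[t + 1]] * beta[t + 1]) / c[t + 1]
    return alpha, beta, c

def baum_welch_step(pi0, A, B, y):
    """One EM iteration; also returns the log-likelihood of the *input* parameters."""
    alpha, beta, c = forward_backward(pi0, A, B, y)
    # E step: posterior state marginals (gamma) and pairwise marginals (xi).
    gamma = alpha * beta                                    # (n, k)
    emit_beta = B[:, y[1:]].T * beta[1:]                    # (n-1, k)
    xi = alpha[:-1, :, None] * A[None] * emit_beta[:, None, :] / c[1:, None, None]
    # M step: normalised expected counts.
    new_A = xi.sum(axis=0) / xi.sum(axis=(0, 2))[:, None]
    new_B = np.stack([gamma[y == s].sum(axis=0) for s in range(B.shape[1])], axis=1)
    new_B /= gamma.sum(axis=0)[:, None]
    return gamma[0], new_A, new_B, np.log(c).sum()

y_data = np.array([0] * 10 + [1] * 10 + [2] * 10)
rng = np.random.RandomState(0)
pi0 = rng.dirichlet(np.ones(3))
A = rng.dirichlet(np.ones(3), size=3)
B = rng.dirichlet(np.ones(3), size=3)

log_liks = []
for _ in range(10):
    pi0, A, B, ll = baum_welch_step(pi0, A, B, y_data)
    log_liks.append(ll)
print(log_liks[0], log_liks[-1])
```

Here the E step does not produce "data" for the M step. It produces expected counts (`gamma`, `xi`), and the M step simply normalises them. The log-likelihood should never decrease from one iteration to the next, which makes a convenient correctness check.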

**MCMC:**

In [44]:
from edward.models import Empirical

In [199]:
Empirical(params=tf.Variable(initial_value=tf.constant(1.0/n_hidden, shape=[T, n_hidden]))).shape

TensorShape([Dimension(3)])

This fails when HMC tries to convert some variables to unconstrained variables; I don't know why:

In [227]:
from importlib import reload
reload(ed)

<module 'edward' from '/Users/jeromekafrouni/.pyenv/versions/3.6.1/envs/prob-prog/lib/python3.6/site-packages/edward/__init__.py'>

In [230]:
pi_0.support, qpi_0.support

('simplex', 'points')

In [266]:
# INFERENCE
T = 5000 # number of MCMC samples

qpi_0 = Empirical(params=tf.Variable(initial_value=tf.constant(1.0/n_hidden, shape=[T, n_hidden])))
qpi_T = Empirical(params=tf.Variable(initial_value=tf.constant(1.0/n_hidden, shape=[T, n_hidden, n_hidden])))
qpi_E = Empirical(params=tf.Variable(initial_value=tf.constant(1.0/n_obs, shape=[T, n_hidden, n_obs])))

y_data = ([0] * 10) + ([1] * 10) + ([2] * 10)
y_data = np.array(y_data)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    inference = ed.inferences.HMC({pi_0: qpi_0,
                                   pi_T: qpi_T,
                                   pi_E: qpi_E},
                                  data={model: y_data})
    # inference = ed.inferences.HMC({model: qm}, data={model: y_data})
    inference.run()

TypeError: Key-value pair in latent_vars does not have same shape: (3, 3), (3,)

Debug: Problem when unconstraining pi_T (and therefore pi_E)

In [236]:
from tensorflow.contrib.distributions import bijectors

In [246]:
z_unconstrained = ed.util.transform(pi_E)

In [247]:
qz_unconstrained = qpi_E # because support = 'points', no need to be transformed

In [251]:
bijectors.Invert(z_unconstrained.bijector)

<tensorflow.contrib.distributions.python.ops.bijectors.invert.Invert at 0x19e38ceb8>

In [254]:
qz_unconstrained

<ed.RandomVariable 'Empirical_58/' shape=(3, 3) dtype=float32>

In [257]:
z_unconstrained.event_shape

TensorShape([Dimension(2)])

In [258]:
qz_unconstrained.event_shape

TensorShape([Dimension(3), Dimension(3)])
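
Part of this mismatch has a simple reading: a point on the 3-simplex has only 2 free coordinates, so the unconstrained version of a 3-dimensional `Dirichlet` naturally has event shape `(2,)`. The sketch below assumes the transform for `'simplex'` support behaves like a softmax-centered bijector (an assumption, not checked against the Edward source here) and implements that map in NumPy with our own function names.

```python
import numpy as np

def softmax_centered_forward(x):
    """Map x in R^(K-1) to the K-simplex: append a 0 logit, then softmax."""
    padded = np.concatenate([x, np.zeros(x.shape[:-1] + (1,))], axis=-1)
    e = np.exp(padded - padded.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def softmax_centered_inverse(p):
    """Map a point on the K-simplex to R^(K-1): log-ratios against the last entry."""
    return np.log(p[..., :-1]) - np.log(p[..., -1:])

# A batch of 3 points on the 3-simplex, e.g. the rows of an emission matrix.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.2, 0.6]])
unconstrained = softmax_centered_inverse(probs)
roundtrip = softmax_centered_forward(unconstrained)
print(probs.shape, unconstrained.shape)
```

Under that assumption, an approximation defined in the unconstrained space would need a last dimension of 2 per row rather than 3. The other visible difference is that the `Empirical` treats the whole `(3, 3)` matrix as a single event, whereas the transformed prior has event shape `(2,)`.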

In [None]:
# if z_unconstrained != z: # it's the case since we transformed pi_0
qz_constrained = ed.util.transform(qz_unconstrained, bijectors.Invert(z_unconstrained.bijector))

In [265]:
qpi_T.shape

TensorShape([Dimension(3), Dimension(3)])

In [264]:
qz_unconstrained.event_shape.ndims

2

In [None]:
bijectors.Invert(z_unconstrained.bijector)._inverse_event_shape(qz_unconstrained.event_shape)

In [None]:
ed.models.TransformedDistribution(qz_unconstrained, bijectors.Invert(z_unconstrained.bijector))

In [270]:
with tf.Session() as sess:
    print(sess.run(Categorical(probs=Dirichlet(tf.ones([n_hidden, n_hidden]))).sample()))

[0 0 1]
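
The three integers come from batching: a `[3, 3]` matrix of probabilities defines three independent categorical distributions, one per row, and a single `sample()` draws once from each. A NumPy analogue, only meant to illustrate the shapes (the random draws will not match TF's):

```python
import numpy as np

rng = np.random.RandomState(0)
n_hidden = 3

# One Dirichlet(1, 1, 1) draw per row: a random row-stochastic matrix.
probs = rng.dirichlet(np.ones(n_hidden), size=n_hidden)        # shape (3, 3)

# One categorical draw per row, i.e. a batch of n_hidden independent draws.
sample = np.array([rng.choice(n_hidden, p=row) for row in probs])
print(probs.shape, sample.shape)
```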


In [272]:
model._num_states

<tf.Tensor 'HiddenMarkovModel_4/strided_slice:0' shape=() dtype=int32>

In [273]:
with tf.Session() as sess:
    print(sess.run(model.sample()))

[1 1 0 0 0 0 2 0 1 1 1 0 0 2 1 0 1 0 0 0 0 1 1 2 2 1 0 0 0 2]


In [134]:
from edward.models import Categorical, Dirichlet

In [129]:
# from issue 
chain_len = 30
n_hidden = 3
n_obs = 3

x_0 = Categorical(probs=[1., 0., 0.])
# x_0 = Categorical(probs=tf.fill([n_hidden], 1.0 / n_hidden))

# transition matrix
# T = tf.nn.softmax(tf.Variable(tf.random_uniform([n_hidden, n_hidden])), dim=0)
T = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])  # each row must sum to 1
transition_distribution = Categorical(probs=T)

# emission matrix
# E = tf.nn.softmax(tf.Variable(tf.random_uniform([n_hidden, n_obs])), dim=0)
E = np.eye(n_obs)  # identity: each hidden state always emits its own symbol
emission_distribution = Categorical(probs=E)

model = HiddenMarkovModel(
    initial_distribution=x_0,
    transition_distribution=transition_distribution,
    observation_distribution=emission_distribution,
    num_steps=chain_len)

In [132]:
with tf.Session() as sess:
    print(sess.run(model.sample()))

[0 0 0 0 0 0 2 1 1 1 1 0 0 1 1 2 2 2 2 2 2 2 2 2 2 1 1 2 2 2]
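
What `model.sample()` does here can be mimicked with ancestral sampling in NumPy: draw the hidden path from the Markov chain, then draw one observation per hidden state. The sketch uses a row-stochastic transition matrix and identity emissions. The function name is ours, and the random draws will not match the TF output.

```python
import numpy as np

def sample_hmm(pi0, A, B, num_steps, rng):
    """Ancestral sampling: draw the state path, then one observation per state."""
    states = np.zeros(num_steps, dtype=int)
    obs = np.zeros(num_steps, dtype=int)
    states[0] = rng.choice(len(pi0), p=pi0)
    for t in range(num_steps):
        if t > 0:
            states[t] = rng.choice(A.shape[1], p=A[states[t - 1]])
        obs[t] = rng.choice(B.shape[1], p=B[states[t]])
    return states, obs

pi0 = np.array([1.0, 0.0, 0.0])          # always start in state 0
A = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])          # sticky chain, rows sum to 1
B = np.eye(3)                            # identity emissions

states, obs = sample_hmm(pi0, A, B, 30, np.random.RandomState(0))
print(obs)
```

With identity emissions the observations are exactly the hidden states, so the sampled sequence is a draw from the Markov chain itself.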


## Trick to use TFP's HMM as a (fully observed) Markov model (MM):

In [148]:
# from issue 
chain_len = 30
n_states = 3

pi_0 = Dirichlet(tf.ones(n_states))
x_0 = Categorical(probs=pi_0)

# transition matrix
# pi_T = [Dirichlet(tf.ones(n_hidden)) for i in range(n_hidden)]
pi_T = Dirichlet(tf.ones([n_states, n_states]))
# T = tf.nn.softmax(tf.Variable(tf.random_uniform([n_hidden, n_hidden])), dim=0)
transition_distribution = Categorical(probs=pi_T)

# emission matrix
# E = tf.nn.softmax(tf.Variable(tf.random_uniform([n_hidden, n_obs])), dim=0)
pi_E = np.eye(n_states, dtype=np.float32) # identity matrix
emission_distribution = Categorical(probs=pi_E)

model = HiddenMarkovModel(
    initial_distribution=x_0,
    transition_distribution=transition_distribution,
    observation_distribution=emission_distribution,
    num_steps=chain_len)

Note: adding sample_shape to x_0, transition_distribution, emission_distribution, and model doesn't fail, but doesn't converge either...

In [149]:
# INFERENCE
qpi_0 = Dirichlet(tf.nn.softplus(tf.Variable(tf.ones(n_states), name="q_0/concentration")))
qpi_T = Dirichlet(tf.nn.softplus(tf.Variable(tf.ones([n_states, n_states]), name="qt/concentration")))

# qm = HiddenMarkovModel(q_0, qt, qe, chain_len) # not necessary

y_data = ([0] * 10) + ([1] * 10) + ([2] * 10)
y_data = np.array(y_data)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    inference = ed.KLqp({pi_0: qpi_0,
                         pi_T: qpi_T},
                        {model: y_data},)
    inference.run(n_iter=5000, logdir='./testlog')
    inferred_pi0, inferred_pi_T = sess.run([qpi_0.mean(), qpi_T.mean()])

Tensor("inference_4/sample_4/HiddenMarkovModel_4/cond/Merge:0", shape=(?,), dtype=float32)
Tensor("data_4/Variable/read:0", shape=(30,), dtype=int32)
5000/5000 [100%] ██████████████████████████████ Elapsed: 18s | Loss: 19.607


In [150]:
inferred_pi_T

array([[0.71035326, 0.19993451, 0.08971225],
       [0.08799313, 0.7047433 , 0.20726356],
       [0.10449692, 0.11652052, 0.7789825 ]], dtype=float32)
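
Because the emission matrix is the identity, the hidden states are observed directly, and each row of the transition matrix has a closed-form posterior: a Dirichlet(1, 1, 1) prior plus the observed transition counts out of that state. The sketch below computes the exact posterior mean in NumPy, assuming the standard Markov chain likelihood, as a reference for the `KLqp` estimate. The helper name is ours.

```python
import numpy as np

def transition_counts(seq, n_states):
    """counts[i, j] = number of times state i is immediately followed by state j."""
    counts = np.zeros((n_states, n_states))
    for prev, nxt in zip(seq[:-1], seq[1:]):
        counts[prev, nxt] += 1
    return counts

y_data = np.array([0] * 10 + [1] * 10 + [2] * 10)
counts = transition_counts(y_data, 3)

prior = np.ones((3, 3))                       # Dirichlet(1, 1, 1) prior on every row
posterior = prior + counts                    # conjugate update, row by row
posterior_mean = posterior / posterior.sum(axis=1, keepdims=True)
print(np.round(posterior_mean, 3))
```

The exact posterior mean has rows (10, 2, 1)/13, (1, 10, 2)/13 and (1, 1, 10)/12, i.e. roughly 0.77 to 0.83 on the diagonal. The variational estimate above shows the same pattern, with slightly less mass on the diagonal.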

In [156]:
# INFERENCE
qpi_0 = Dirichlet(tf.nn.softplus(tf.Variable(tf.ones(n_states), name="q_0/concentration")))
qpi_T = Dirichlet(tf.nn.softplus(tf.Variable(tf.ones([n_states, n_states]), name="qt/concentration")))

# qm = HiddenMarkovModel(q_0, qt, qe, chain_len) # not necessary

y_data = ([0] * 10) + ([1] * 10) + ([2] * 10)
y_data = np.array(y_data)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    inference = ed.KLqp({pi_0: qpi_0,
                         pi_T: qpi_T},
                        {model: y_data},)
    inference.run(n_iter=500, logdir='./testlog')
    inferred_pi0, inferred_pi_T = sess.run([qpi_0.mean(), qpi_T.mean()])

Tensor("inference_7/sample_7/HiddenMarkovModel_4/cond/Merge:0", shape=(?,), dtype=float32)
Tensor("data_7/Variable/read:0", shape=(30,), dtype=int32)
500/500 [100%] ██████████████████████████████ Elapsed: 7s | Loss: 25.484


In [157]:
inferred_pi_T

array([[0.6771458 , 0.22621284, 0.09664141],
       [0.159471  , 0.5947459 , 0.24578314],
       [0.17056869, 0.06759135, 0.7618399 ]], dtype=float32)