# Markov Model in Edward - mock data

Since good-examples was becoming messy, I'm collecting here the different approaches to making inference work on a small Markov model example.

**TODO check out tips here: https://discourse.edwardlib.org/t/variational-em-for-independent-factor-analysis/61/2**

**Note:** Always pass the parameters of a Dirichlet through `tf.nn.softmax` or `tf.nn.softplus` when initializing them, so that they stay > 0 during inference.
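To illustrate the note above, here is a minimal pure-Python sketch of the two transforms. The helpers `softplus` and `softmax` are local re-implementations written for illustration only (they are not the TensorFlow functions), and the values in `raw` are made up.

```python
import math

def softplus(values):
    """Elementwise log(1 + exp(v)): maps any real number to a value > 0."""
    return [math.log1p(math.exp(v)) for v in values]

def softmax(values):
    """Maps any real vector to positive values that sum to 1."""
    shift = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical unconstrained values, as an optimizer might produce them.
raw = [-3.0, 0.0, 2.5]

alpha_softplus = softplus(raw)  # positive, otherwise unconstrained
alpha_softmax = softmax(raw)    # positive and constrained to sum to 1

print(alpha_softplus)
print(alpha_softmax)
```

Both outputs are valid Dirichlet concentration parameters. The softmax variant additionally forces the parameters to sum to 1, which fixes the total concentration; that may be one reason to prefer softplus for the variational Dirichlets.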

In [1]:
import numpy as np
import tensorflow as tf
import edward as ed
from pprint import pprint

from edward.models import Categorical, Dirichlet, Uniform, Mixture
from edward.models import Bernoulli, Normal
%matplotlib inline
import matplotlib.pyplot as plt


**Mock data used:** One trajectory

In [2]:
y_data = ([0] * 10) + ([1] * 10) + ([2] * 10)
# one observed value per categorical variable y_t, as in the GitHub issue:
np.array(y_data)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2])

## 1. Without Dirichlet Priors

Even when this works, it is not ideal because we do want priors on the parameters.

### Version 1: [doesn't work] HMM, with Transition matrix + TF loop

= initial code from the GitHub issue: https://github.com/blei-lab/edward/issues/450

Running it gives an error similar to the one in the GitHub issue (**though not exactly the same error as back then**). The cause is that many of these objects are not instances of `RandomVariable`. Digging further into how Edward works might explain exactly why, and whether there is any way to avoid the problem.

### Version 2: [works and converges] HMM, with Transition matrix + Python loop

**MODEL**

In [3]:
# from issue 
chain_len = 30
n_hidden = 3
n_obs = 3

x_0 = Categorical(probs=tf.fill([n_hidden], 1.0 / n_hidden))

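# transition matrix
# NOTE: dim=0 normalizes each column, but the loop below indexes rows
# (T[x_tm1, :]), so the rows passed as probs are not guaranteed to sum to 1.
T = tf.nn.softmax(tf.Variable(tf.random_uniform([n_hidden, n_hidden])), dim=0)

# emission matrix (same caveat: columns are normalized, rows are indexed)
E = tf.nn.softmax(tf.Variable(tf.random_uniform([n_hidden, n_obs])), dim=0)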

# MODEL
x = []
y = []
for _ in range(chain_len):
    x_tm1 = x[-1] if x else x_0
    x_t = Categorical(probs=T[x_tm1, :])
    y_t = Categorical(probs=E[x_t, :])
    x.append(x_t)
    y.append(y_t)

Instructions for updating:
dim is deprecated, use axis instead


**INFERENCE**

In [4]:
# INFERENCE
qx = [Categorical(probs=tf.nn.softmax(tf.Variable(tf.ones(n_hidden))))
      for _ in range(chain_len)]

y_data = ([0] * 10) + ([1] * 10) + ([2] * 10)
y_data = map(np.array, y_data)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(T))

    inference = ed.KLqp(dict(zip(x, qx)), dict(zip(y, y_data)))
    inference.run(n_iter=5000)

    inferred_T, inferred_E = sess.run([T, E])
    inferred_qx = sess.run([foo.probs for foo in qx])
    inferred_y_probs = sess.run([foo.probs for foo in y])
    print(inferred_T)
    print(inferred_E)

[[0.4184508  0.34765527 0.342746  ]
 [0.34079152 0.23904131 0.27410343]
 [0.2407577  0.41330343 0.3831506 ]]



5000/5000 [100%] ██████████████████████████████ Elapsed: 33s | Loss: 20.932
[[0.7177787  0.11977834 0.71373457]
 [0.02416126 0.83216935 0.02445475]
 [0.2580601  0.04805234 0.26181066]]
[[0.00221619 0.00382671 0.4800893 ]
 [0.9941004  0.9893936  0.0040356 ]
 [0.0036834  0.00677973 0.5158751 ]]


**Note:** this example seems to converge to something better than what the author of the GitHub issue reported. I also switched the column indexing to row indexing. **Sometimes it does not actually converge. When it does, it reaches a loss between 7 and 10.** 5K iterations seem to be enough. Fewer iterations did not seem to converge, but I'm not sure whether re-running would do better.
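A quick sanity check on the matrix printed by this run: the model applies the softmax along `dim=0` but indexes `T[x_tm1, :]`, so it is worth checking which axis of the printed `inferred_T` actually sums to 1. The sketch below only copies the printed numbers and uses plain Python.

```python
# Transition matrix printed by the Version 2 run (copied from the output above).
inferred_T = [
    [0.7177787, 0.11977834, 0.71373457],
    [0.02416126, 0.83216935, 0.02445475],
    [0.2580601, 0.04805234, 0.26181066],
]

# Sum along each row and along each column.
row_sums = [sum(row) for row in inferred_T]
col_sums = [sum(col) for col in zip(*inferred_T)]

print("row sums:   ", [round(s, 3) for s in row_sums])
print("column sums:", [round(s, 3) for s in col_sums])
```

The columns sum to 1 while the rows do not, so each row `T[x_tm1, :]` is an unnormalized vector rather than a proper conditional distribution. This mismatch might contribute to the unstable convergence; normalizing along the same axis that is indexed would be the first thing to try.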

Quote from the issue: *Given my 3 hidden states, 3 observation types, and long chains of identical observations, I expect the transition matrix to be close to diagonal and the emission matrix to look like a permutation matrix. I see non-converging loss info, non-uniform state probability distributions, and very uniform observation probabilities. Any idea what's going wrong in my setup or the solving of the problem?* (The issue author had a loss around 40 after 10K iterations.)

Transitions: you almost always stay in the same hidden state. Emissions: each hidden state almost always emits the same observation, but the observation label does not have to equal the state index.
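As a reference point for "close to diagonal", the transition frequencies can be counted directly on the mock trajectory. This is a minimal sketch that treats the observations themselves as the states (so it matches the plain Markov model rather than the HMM); the name `T_hat` is introduced here for illustration.

```python
from collections import Counter

x_data = [0] * 10 + [1] * 10 + [2] * 10  # same mock trajectory as above
n_states = 3

# Count observed transitions (previous state -> next state).
counts = Counter(zip(x_data[:-1], x_data[1:]))

# Row-normalize: row i holds the empirical P(next = j | previous = i).
T_hat = []
for i in range(n_states):
    row_total = sum(counts[(i, j)] for j in range(n_states))
    T_hat.append([counts[(i, j)] / row_total for j in range(n_states)])

for row in T_hat:
    print(row)
```

The rows come out as `[0.9, 0.1, 0.0]`, `[0.0, 0.9, 0.1]` and `[0.0, 0.0, 1.0]`, which is the kind of near-diagonal matrix a successful inference should approach (up to a relabeling of the hidden states in the HMM case).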

**Using an external Python loop like this seems to work. Fixing the length of the chains is not a big problem (at some point LC stops the loan anyway, so chains cannot run indefinitely). We could go with this while thinking about how to make it more efficient inside TF.**

### Version 3: [works but bad results] Regular MM, with Transition matrix + Python loop

= Version 2 but without hidden states

**MODEL**

In [5]:
chain_len = 30
n_obs = 3

x_0 = Categorical(probs=tf.fill([n_obs], 1.0 / n_obs))

# transition matrix
T = tf.nn.softmax(tf.Variable(tf.random_uniform([n_obs, n_obs])), dim=0)

# no more emissions, we observe directly the hidden states x

# MODEL
x = []
for _ in range(chain_len):
    x_tm1 = x[-1] if x else x_0
    x_t = Categorical(probs=T[:, x_tm1])
    x.append(x_t)

**INFERENCE**

In [6]:
# INFERENCE
qx = [Categorical(probs=tf.nn.softmax(tf.Variable(tf.ones(n_obs))))
      for _ in range(chain_len)]

x_data = ([0] * 10) + ([1] * 10) + ([2] * 10)
x_data = map(np.array, x_data)

with tf.Session() as sess:
    # sess.run(tf.global_variables_initializer())
    # print(sess.run(T))

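    # Caution: x is used here both as the latent variables (bound to qx) and as
    # the observed data, which may explain the poor results below.
    inference = ed.KLqp(dict(zip(x, qx)), dict(zip(x, x_data)))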
    inference.run(n_iter=5000)

    inferred_T = sess.run(T)
    inferred_qx = sess.run([foo.probs for foo in qx])
    print(inferred_T)
    print(inferred_qx)

5000/5000 [100%] ██████████████████████████████ Elapsed: 29s | Loss: 0.573
[[0.13051206 0.1282776  0.12705421]
 [0.46439302 0.47969452 0.47204718]
 [0.40509495 0.39202783 0.40089864]]
[array([0.13212071, 0.44575238, 0.4221269 ], dtype=float32),
 array([0.12733053, 0.5030093 , 0.3696602 ], dtype=float32),
 array([0.14275178, 0.46461993, 0.3926283 ], dtype=float32),
 array([0.14542097, 0.48163468, 0.3729444 ], dtype=float32),
 array([0.12822181, 0.46980274, 0.40197548], dtype=float32),
 array([0.12814322, 0.47672275, 0.39513403], dtype=float32),
 array([0.11452176, 0.5111241 , 0.37435412], dtype=float32),
 array([0.13028549, 0.43564564, 0.4340689 ], dtype=float32),
 array([0.14543663, 0.48284206, 0.37172136], dtype=float32),
 array([0.10846157, 0.50244874, 0.3890896 ], dtype=float32),
 array([0.10303085, 0.48524395, 0.41172528], dtype=float32),
 array([0.12696762, 0.4686965 , 0.4043359 ], dtype=float32),
 array([0.14359528, 0.4551472 , 0.4012575 ], dtype=float32),
 array([0.15214296, 0.45598698, 0.39

## 2. With Dirichlet Priors

### Version 4: [works!] Regular MM + Python loop + Mixture

**MODEL**

In [13]:
tf.reset_default_graph()
chain_len = 30
n_hidden = 3
n_obs = 3

# probs= is explicit because the first positional argument of Categorical is logits
x_0 = Categorical(probs=Dirichlet(tf.ones(n_hidden)))

# transition matrix
pi_T = [Dirichlet(tf.ones(n_hidden)) for i in range(n_hidden)]
T = [Categorical(probs=pi) for pi in pi_T]

# MODEL
x = []
for _ in range(chain_len):
    x_tm1 = x[-1] if x else x_0
    x_t = ed.models.Mixture(cat=Categorical(probs=tf.one_hot(x_tm1, n_hidden)), components=T)
    x.append(x_t)
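The `Mixture` with a one-hot `cat` is what selects the row of the transition matrix that belongs to the previous state. A small pure-Python sketch of why this works; the helpers `one_hot` and `mixture_prob` and the numbers in `T_rows` are hypothetical and only mimic the idea, not Edward's implementation.

```python
def one_hot(index, depth):
    """Return a list with 1.0 at position `index` and 0.0 elsewhere."""
    return [1.0 if k == index else 0.0 for k in range(depth)]

def mixture_prob(weights, components, value):
    """p(value) under a mixture: sum_k weights[k] * components[k][value]."""
    return sum(w * comp[value] for w, comp in zip(weights, components))

# Hypothetical transition rows: T_rows[k] is the distribution of the next
# state given that the previous state was k.
T_rows = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]

prev_state = 1
weights = one_hot(prev_state, len(T_rows))

# With one-hot weights the mixture collapses to the selected component.
next_state_probs = [mixture_prob(weights, T_rows, v) for v in range(3)]
print(next_state_probs)
```

Because all mixture weight sits on one component, the distribution of `x_t` equals the transition row of the previous state, while every element of `T` stays a proper random variable that inference can attach to.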

**INFERENCE (VI)**: WORKS

In [8]:
# INFERENCE
# qpi_T = [Dirichlet(tf.nn.softmax(tf.Variable(tf.ones(n_hidden)))) for i in range(n_hidden)]
qpi_T = [Dirichlet(tf.nn.softplus(tf.Variable(tf.ones(n_hidden)))) for i in range(n_hidden)]

latent_vars_map = dict(zip(pi_T, qpi_T))

x_data = ([0] * 10) + ([1] * 10) + ([2] * 10)
x_data = map(np.array, x_data)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    inference = ed.KLqp(latent_vars_map, dict(zip(x, x_data)))
    inference.run(n_iter=5000)
    inferred_qpi_T = sess.run([qpi.mean() for qpi in qpi_T])

5000/5000 [100%] ██████████████████████████████ Elapsed: 22s | Loss: 17.979


In [9]:
inferred_qpi_T

[array([0.7036784, 0.1762671, 0.1200545], dtype=float32),
 array([0.10730021, 0.73918796, 0.15351184], dtype=float32),
 array([0.14591683, 0.08303422, 0.77104896], dtype=float32)]
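For comparison, the Dirichlet prior is conjugate to the categorical transitions, so when all states are observed the exact posterior mean of each row is available in closed form: (prior + counts) normalized. The sketch below is an approximation of this model under one stated assumption: it ignores the transition from the unobserved `x_0` into the first observation.

```python
from collections import Counter

x_data = [0] * 10 + [1] * 10 + [2] * 10  # same mock trajectory as above
n_states = 3
prior_alpha = 1.0  # Dirichlet(1, 1, 1) prior, as in pi_T

# Count observed transitions (previous state -> next state).
counts = Counter(zip(x_data[:-1], x_data[1:]))

# Conjugate update: the posterior for row i is Dirichlet(prior + counts of row i),
# and its mean is the normalized concentration vector.
posterior_mean = []
for i in range(n_states):
    alpha = [prior_alpha + counts[(i, j)] for j in range(n_states)]
    total = sum(alpha)
    posterior_mean.append([a / total for a in alpha])

for row in posterior_mean:
    print([round(p, 3) for p in row])
```

Row 0 is 10/13, 2/13, 1/13, row 1 is 1/13, 10/13, 2/13 and row 2 is 1/12, 1/12, 10/12. The variational means above have the same diagonal structure, which supports the claim that the VI run works.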

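**INFERENCE (MCMC)**: DOESN'T WORK. Gibbs raises a `NotImplementedError` because `Mixture` does not implement `conjugate_log_prob`. The means printed afterwards are just the uniform 1/3 initialization of the `Empirical` variables.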

In [14]:
# INFERENCE
n_samples = 5000  # number of MCMC samples (T already holds the transition Categoricals)

# Maybe this is not the right way to initialize:
qpi_T = [ed.models.Empirical(
    tf.Variable(tf.constant(1.0 / n_hidden, shape=[n_samples, n_hidden])))
    for i in range(n_hidden)]

latent_vars_map = dict(zip(pi_T, qpi_T))

x_data = ([0] * 10) + ([1] * 10) + ([2] * 10)
x_data = map(np.array, x_data)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # inference = ed.inferences.MetropolisHastings(latent_vars_map, dict(zip(y, y_data)))
    inference = ed.inferences.Gibbs(latent_vars_map, data=dict(zip(x, x_data)))
    inference.run()
    inferred_qpi_T = sess.run([qpi.mean() for qpi in qpi_T])

NotImplementedError: conjugate_log_prob not implemented for <class 'abc.Mixture'>

In [11]:
inferred_qpi_T

[array([0.333344, 0.333344, 0.333344], dtype=float32),
 array([0.333344, 0.333344, 0.333344], dtype=float32),
 array([0.333344, 0.333344, 0.333344], dtype=float32)]

### Version 5: [don't know if works] HMM, without Transition matrix + Dirichlet + Python loop

**TODO**: Edward seems to be designed to apply EM when necessary. Here we need EM because we do not observe the hidden states. We need to look into this further and see whether Edward figures it out on its own or whether we have to do it ourselves (instantiate two inference objects and run them alternately).

In [15]:
tf.reset_default_graph()
chain_len = 30
n_hidden = 3
n_obs = 3

# probs= is explicit because the first positional argument of Categorical is logits
x_0 = Categorical(probs=Dirichlet(tf.ones(n_hidden)))

# transition matrix
pi_T = [Dirichlet(tf.ones(n_hidden)) for i in range(n_hidden)]
T = [Categorical(probs=pi) for pi in pi_T]

# emission matrix
# one emission distribution over the n_obs symbols per hidden state
pi_E = [Dirichlet(tf.ones(n_obs)) for i in range(n_hidden)]
E = [Categorical(probs=pi) for pi in pi_E]

x = []
y = []
for _ in range(chain_len):
    x_tm1 = x[-1] if x else x_0
    x_t = ed.models.Mixture(cat=Categorical(probs=tf.one_hot(x_tm1, n_hidden)), components=T)
    y_t = ed.models.Mixture(cat=Categorical(probs=tf.one_hot(x_t, n_hidden)), components=E)
    x.append(x_t)
    y.append(y_t)

**INFERENCE (VI) ON BOTH `qpi_T` AND `qpi_E`:** RESULTS NOT THAT GOOD

In [16]:
# INFERENCE
qpi_T = [Dirichlet(tf.nn.softplus(tf.Variable(tf.ones(n_hidden)))) for i in range(n_hidden)]
qpi_E = [Dirichlet(tf.nn.softplus(tf.Variable(tf.ones(n_obs)))) for i in range(n_hidden)]

latent_vars_map = dict(zip(pi_T, qpi_T))
latent_vars_map.update(dict(zip(pi_E, qpi_E)))

y_data = ([0] * 10) + ([1] * 10) + ([2] * 10)
y_data = map(np.array, y_data)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    inference = ed.KLqp(latent_vars_map, data=dict(zip(y, y_data)))
    inference.run(n_iter=5000)
    inferred_qpi_T = sess.run([qpi.mean() for qpi in qpi_T])
    inferred_qpi_E = sess.run([qpi.mean() for qpi in qpi_E])

5000/5000 [100%] ██████████████████████████████ Elapsed: 43s | Loss: 37.294


In [17]:
inferred_qpi_T

[array([0.5163122 , 0.19137841, 0.29230946], dtype=float32),
 array([0.4316325 , 0.20450503, 0.3638625 ], dtype=float32),
 array([0.2909255 , 0.18665254, 0.522422  ], dtype=float32)]

In [18]:
inferred_qpi_E

[array([0.35055286, 0.38690096, 0.26254615], dtype=float32),
 array([0.31107655, 0.37114182, 0.31778166], dtype=float32),
 array([0.45351365, 0.251756  , 0.29473028], dtype=float32)]

**INFERENCE (VI) ON ONLY `qpi_T`:** RESULTS NOT THAT GOOD

In [19]:
# INFERENCE
qpi_T = [Dirichlet(tf.nn.softplus(tf.Variable(tf.ones(n_hidden)))) for i in range(n_hidden)]
# qpi_E = [Dirichlet(tf.nn.softplus(tf.Variable(tf.ones(n_obs)))) for i in range(n_hidden)]

latent_vars_map = dict(zip(pi_T, qpi_T))
# latent_vars_map.update(dict(zip(pi_E, qpi_E)))

y_data = ([0] * 10) + ([1] * 10) + ([2] * 10)
y_data = map(np.array, y_data)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    inference = ed.KLqp(latent_vars_map, data=dict(zip(y, y_data)))
    inference.run(n_iter=5000)
    inferred_qpi_T = sess.run([qpi.mean() for qpi in qpi_T])
    # inferred_qpi_E = sess.run([qpi.mean() for qpi in qpi_E])

5000/5000 [100%] ██████████████████████████████ Elapsed: 43s | Loss: 44.499


In [20]:
inferred_qpi_T

[array([0.2658316 , 0.440889  , 0.29327947], dtype=float32),
 array([0.5744874 , 0.08134261, 0.34417003], dtype=float32),
 array([0.2542029 , 0.46512648, 0.2806706 ], dtype=float32)]

## 3. Other tutorials/ideas we might consider

- Implementation of HMM but not suited for inference: https://gist.github.com/fredcallaway/c7252b6326dfb502e70cad4146731aef
- Implementation of HMM by creating a custom RV class: might work but needs a lot of work, and the code doesn't run as is: https://discourse.edwardlib.org/t/hmm-implementation-with-marginalized-latent-states/755
- Using a list of categoricals and then using tf.gather to select from this list: doesn't work, because tf.gather makes us lose the type, so we no longer have Categorical variables => can't run inference (we can still sample though)
- Using the OneHotCategorical RV instead of a Categorical RV and then taking tf.one_hot of it didn't seem to change much.
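The "marginalized latent states" idea from the second bullet rests on the forward algorithm, which sums the HMM likelihood over all hidden-state paths so that no discrete latent variable has to be inferred. Below is a minimal pure-Python sketch of it. It is not the code from the linked thread, the function name `forward_likelihood` and all parameter values are hypothetical, and `init` is taken as the distribution of the first hidden state (a slightly different convention from the `x_0` used in the cells above).

```python
def forward_likelihood(obs, init, trans, emit):
    """Return p(obs) for an HMM by summing over all hidden-state paths.

    init[k]     = p(first hidden state = k)
    trans[j][k] = p(next hidden = k | current hidden = j)
    emit[k][o]  = p(observation = o | hidden = k)
    """
    n = len(init)
    # alpha[k] = p(observations so far, current hidden state = k)
    alpha = [init[k] * emit[k][obs[0]] for k in range(n)]
    for o in obs[1:]:
        alpha = [
            emit[k][o] * sum(alpha[j] * trans[j][k] for j in range(n))
            for k in range(n)
        ]
    return sum(alpha)

# Hypothetical parameters: near-diagonal transitions, identity emissions.
init = [1.0 / 3, 1.0 / 3, 1.0 / 3]
trans = [
    [0.9, 0.1, 0.0],
    [0.0, 0.9, 0.1],
    [0.0, 0.0, 1.0],
]
emit = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

y_data = [0] * 10 + [1] * 10 + [2] * 10  # same mock trajectory as above
likelihood = forward_likelihood(y_data, init, trans, emit)
print(likelihood)
```

With the hidden states summed out, the likelihood depends only on the continuous parameters, so priors and variational factors would be needed only for the transition and emission rows. For much longer chains the recursion should be done in log space or with per-step rescaling to avoid underflow.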