# Markov Model in Edward - mock data

Since good-examples was becoming messy, this notebook collects the different approaches I tried to make inference work on a small Markov Model example.

**TODO check out tips here: https://discourse.edwardlib.org/t/variational-em-for-independent-factor-analysis/61/2**

**Note:** Always pass the parameters of a Dirichlet through tf.nn.softmax or tf.nn.softplus, to make sure they stay > 0 during inference.
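
To illustrate the note above, here is a minimal sketch in plain Python (no TensorFlow, so it runs anywhere) of why these two transforms keep parameters valid. The helper names `softplus` and `softmax` and the values in `raw` are illustrative assumptions, not part of the notebook.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x)); the result is always > 0."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def softmax(xs):
    """Map unconstrained values to positive values that sum to 1."""
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Unconstrained values, as an optimizer might leave them after a few steps.
raw = [-3.0, 0.0, 2.5]

concentration = [softplus(v) for v in raw]  # positive, usable as Dirichlet parameters
probs = softmax(raw)                        # positive and normalized

print(concentration)
print(probs, sum(probs))
```

Softplus only enforces positivity, while softmax additionally forces the values to sum to 1, which a Dirichlet concentration vector does not require.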

In [1]:
import numpy as np
import tensorflow as tf
import edward as ed
from pprint import pprint

from edward.models import Categorical, Dirichlet, Uniform, Mixture
from edward.models import Bernoulli, Normal
%matplotlib inline
import matplotlib.pyplot as plt



**Mock data used:** One trajectory

In [2]:
y_data = ([0] * 10) + ([1] * 10) + ([2] * 10)
# one observed value per categorical variable y_t, as in the GitHub issue's setup
np.array(y_data)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2])

## 1. Without Dirichlet Priors

Even if it works, not ideal because we do want to have priors...

### Version 1: [doesn't work] HMM, with Transition matrix + TF loop

= initial code from GitHub issue: https://github.com/blei-lab/edward/issues/450

If we run it we get an error similar to the one in the GitHub issue (**but not exactly the same error as back then**). It happens because many of these objects are not instances of RandomVariable. Digging further into how Edward works might tell us exactly why, and whether there is any way to avoid the problem.

### Version 2: [works and converges] HMM, with Transition matrix + Python loop

**MODEL**

In [3]:
# from issue 
chain_len = 30
n_hidden = 3
n_obs = 3

x_0 = Categorical(probs=tf.fill([n_hidden], 1.0 / n_hidden))

# transition matrix
T = tf.nn.softmax(tf.Variable(tf.random_uniform([n_hidden, n_hidden])), dim=0)

# emission matrix
E = tf.nn.softmax(tf.Variable(tf.random_uniform([n_hidden, n_obs])), dim=0)

# MODEL
x = []
y = []
for _ in range(chain_len):
    x_tm1 = x[-1] if x else x_0
    x_t = Categorical(probs=T[x_tm1, :])
    y_t = Categorical(probs=E[x_t, :])
    x.append(x_t)
    y.append(y_t)



**INFERENCE**

In [4]:
# INFERENCE
qx = [Categorical(probs=tf.nn.softmax(tf.Variable(tf.ones(n_hidden))))
      for _ in range(chain_len)]

y_data = ([0] * 10) + ([1] * 10) + ([2] * 10)
y_data = map(np.array, y_data)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(T))

    inference = ed.KLqp(dict(zip(x, qx)), dict(zip(y, y_data)))
    inference.run(n_iter=5000)

    inferred_T, inferred_E = sess.run([T, E])
    inferred_qx = sess.run([foo.probs for foo in qx])
    inferred_y_probs = sess.run([foo.probs for foo in y])
    print(inferred_T)
    print(inferred_E)

[[0.4184508  0.34765527 0.342746  ]
 [0.34079152 0.23904131 0.27410343]
 [0.2407577  0.41330343 0.3831506 ]]




5000/5000 [100%] ██████████████████████████████ Elapsed: 33s | Loss: 20.932
[[0.7177787  0.11977834 0.71373457]
 [0.02416126 0.83216935 0.02445475]
 [0.2580601  0.04805234 0.26181066]]
[[0.00221619 0.00382671 0.4800893 ]
 [0.9941004  0.9893936  0.0040356 ]
 [0.0036834  0.00677973 0.5158751 ]]


**Note:** this example seems to converge to something better than what the author of the GitHub issue reported. I also switched the indexing from columns to rows. **Sometimes it does not actually seem to converge. When it converges, it reaches a loss between 7 and 10.** 5K iterations seem to be enough. Fewer did not seem to converge, but I am not sure whether re-running would do better.
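
One thing worth checking after the switch to row indexing: the model cell still applies softmax with dim=0, which normalizes each column, while the loop reads rows (T[x_tm1, :]). The sketch below only re-uses the transition matrix printed above and sums it both ways. It suggests the rows are not probability vectors. Normalizing over the last axis instead would presumably fix that, but that is an untested assumption.

```python
# Transition matrix printed after inference in the cell above.
inferred_T = [
    [0.7177787, 0.11977834, 0.71373457],
    [0.02416126, 0.83216935, 0.02445475],
    [0.2580601, 0.04805234, 0.26181066],
]

n = len(inferred_T)
col_sums = [sum(row[j] for row in inferred_T) for j in range(n)]
row_sums = [sum(row) for row in inferred_T]

print("column sums:", col_sums)  # each approximately 1
print("row sums:   ", row_sums)  # not normalized
```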

*Given my 3 hidden states, 3 observation types, and long chains of identical observations, I expect the transition matrix to be close to diagonal and the emission matrix to look like a permutation matrix. I see non-converging loss info. (the issue's author had a loss around 40, for 10K iterations)*, *non-uniform state probability distributions, and very uniform observation probabilities. Any idea what's going wrong in my setup or the solving of the problem?*

Transitions: the chain almost always stays in the same state. Emissions: each hidden state almost always emits the same observation, but the observation's label does not have to equal the hidden state's index.
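
The remark about labels can be checked numerically: renaming the hidden states leaves the likelihood of the observations unchanged, which is why the emission matrix is only expected to look like some permutation matrix. The sketch below uses a plain-Python forward algorithm. The function names and the values of `T` and `E` are hypothetical, chosen only to mimic a near-diagonal transition matrix.

```python
import math

def hmm_likelihood(init, T, E, obs):
    """Forward algorithm: P(obs) with the hidden states summed out.

    T[i][j] = P(next state j | state i), E[i][k] = P(observation k | state i).
    """
    n = len(init)
    alpha = [init[i] * E[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * T[i][j] for i in range(n)) * E[j][o]
                 for j in range(n)]
    return sum(alpha)

def relabel(init, T, E, perm):
    """Rename hidden state i to perm[i]; observations are left untouched."""
    n = len(init)
    inv = [perm.index(a) for a in range(n)]  # new label -> old label
    new_init = [init[inv[a]] for a in range(n)]
    new_T = [[T[inv[a]][inv[b]] for b in range(n)] for a in range(n)]
    new_E = [E[inv[a]] for a in range(n)]
    return new_init, new_T, new_E

obs = ([0] * 10) + ([1] * 10) + ([2] * 10)
init = [1 / 3, 1 / 3, 1 / 3]
T = [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]]            # hypothetical
E = [[0.98, 0.01, 0.01], [0.01, 0.98, 0.01], [0.01, 0.01, 0.98]]  # hypothetical

p_original = hmm_likelihood(init, T, E, obs)
p_relabeled = hmm_likelihood(*relabel(init, T, E, perm=[2, 0, 1]), obs)
print(p_original, p_relabeled)
```

Because both parameter sets explain the data equally well, different runs may land on different permutations of the hidden states.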

**Using an external Python loop like this seems to work. Fixing the length of the chains is not a big problem (at some point LC stops the loan anyway, so the chains cannot run indefinitely). We could go with this while thinking about how to make it more efficient inside TF.**

### Version 3: [works but bad results] Regular MM, with Transition matrix + Python loop

= Version 2 but without hidden states

**MODEL**

In [3]:
chain_len = 30
n_obs = 3

x_0 = Categorical(probs=tf.fill([n_obs], 1.0 / n_obs))

# transition matrix
T = tf.nn.softmax(tf.Variable(tf.random_uniform([n_obs, n_obs])), dim=0)

# no more emissions, we observe directly the hidden states x

# MODEL
x = []
for _ in range(chain_len):
    x_tm1 = x[-1] if x else x_0
    x_t = Categorical(probs=T[:, x_tm1])
    x.append(x_t)



**INFERENCE**

In [4]:
# INFERENCE
qx = [Categorical(probs=tf.nn.softmax(tf.Variable(tf.ones(n_obs))))
      for _ in range(chain_len)]

x_data = ([0] * 10) + ([1] * 10) + ([2] * 10)
x_data = map(np.array, x_data)

with tf.Session() as sess:
    # sess.run(tf.global_variables_initializer())
    # print(sess.run(T))

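    # NOTE: x appears both as latent variables (mapped to qx) and as observed data;
    # this conflict may explain the near-uniform results below.
    inference = ed.KLqp(dict(zip(x, qx)), dict(zip(x, x_data)))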
    inference.run(n_iter=5000)

    inferred_T = sess.run(T)
    inferred_qx = sess.run([foo.probs for foo in qx])
    print(inferred_T)
    print(inferred_qx)



5000/5000 [100%] ██████████████████████████████ Elapsed: 29s | Loss: 0.327
[[0.25940156 0.25781873 0.25737822]
 [0.39562166 0.3987041  0.40426505]
 [0.34497678 0.34347713 0.33835673]]
[array([0.28284925, 0.37504154, 0.34210923], dtype=float32), array([0.24529925, 0.397335  , 0.35736576], dtype=float32), array([0.25999007, 0.391565  , 0.34844494], dtype=float32), array([0.26203665, 0.39059556, 0.34736785], dtype=float32), array([0.26528502, 0.39197695, 0.34273797], dtype=float32), array([0.2525565, 0.3708744, 0.376569 ], dtype=float32), array([0.28238606, 0.3569442 , 0.3606698 ], dtype=float32), array([0.26020128, 0.4162674 , 0.32353133], dtype=float32), array([0.28147265, 0.37403712, 0.34449026], dtype=float32), array([0.24512741, 0.41684347, 0.33802906], dtype=float32), array([0.24144918, 0.39873162, 0.35981923], dtype=float32), array([0.23814194, 0.43193477, 0.32992324], dtype=float32), array([0.26426604, 0.38210905, 0.35362488], dtype=float32), array([0.24521352, 0.39080426, 0.36398

### Version 3.2: [?] Regular MM, with Transition matrix + Python loop

**MODEL**

In [27]:
chain_len = 30
n_obs = 3

x_0 = Categorical(probs=tf.fill([n_obs], 1.0 / n_obs))

# transition matrix
T = tf.nn.softmax(tf.Variable(tf.random_uniform([n_obs, n_obs]),
                              trainable=False), dim=0)

# MODEL
x = []
for _ in range(chain_len):
    x_tm1 = x[-1] if x else x_0
    x_t = Categorical(probs=T[:, x_tm1])
    x.append(x_t)

In [25]:
chain_len = 30
n_obs = 3

x = [Categorical(probs=tf.fill([n_obs], 1.0 / n_obs)) for _ in range(chain_len)]

**INFERENCE**

In [28]:
# INFERENCE
# qx = [Categorical(probs=tf.nn.softmax(tf.Variable(tf.ones(n_obs))))
#       for _ in range(chain_len)]

sess = tf.Session()

# MODEL
qT = tf.nn.softmax(tf.Variable(tf.random_uniform([n_obs, n_obs])), dim=0)
qx = []
qx_0 = Categorical(probs=tf.nn.softmax(tf.Variable(tf.random_uniform([n_obs]))))
for _ in range(chain_len):
    qx_tm1 = qx[-1] if qx else qx_0
    qx_t = Categorical(probs=qT[:, qx_tm1])
    qx.append(qx_t)

x_data = ([0] * 10) + ([1] * 10) + ([2] * 10)
x_data = map(np.array, x_data)

with sess.as_default():
    sess.run(tf.global_variables_initializer())
    # print(sess.run(T))
    
    latent_vars_map = dict(zip(x, qx))
    # latent_vars_map.update({T: qT})
    inference = ed.KLqp(latent_vars_map, data=dict(zip(x, x_data)))
    inference.run(n_iter=5000, n_samples=10)

    inferred_qT = sess.run(qT)
    print(inferred_qT)

5000/5000 [100%] ██████████████████████████████ Elapsed: 61s | Loss: -0.007
[[0.26643828 0.34438336 0.22928569]
 [0.26327008 0.3921638  0.35276103]
 [0.4702917  0.26345286 0.41795328]]


## 2. With Dirichlet Priors

### Version 4: [works!] Regular MM + Python loop + Mixture

**MODEL**

In [56]:
tf.reset_default_graph()
chain_len = 30
n_hidden = 3
n_obs = 3

x_0 = Categorical(probs=Dirichlet(tf.ones(n_hidden)))  # keyword needed: the first positional argument is logits

# transition matrix
pi_T = [Dirichlet(tf.ones(n_hidden)) for i in range(n_hidden)]
T = [Categorical(probs=pi) for pi in pi_T]

# MODEL
x = []
for _ in range(chain_len):
    x_tm1 = x[-1] if x else x_0
    x_t = ed.models.Mixture(cat=Categorical(probs=tf.one_hot(x_tm1, n_hidden)), components=T)
    x.append(x_t)

**INFERENCE (VI)**: WORKS

In [58]:
# INFERENCE
# qpi_T = [Dirichlet(tf.nn.softmax(tf.Variable(tf.ones(n_hidden)))) for i in range(n_hidden)]
qpi_T = [Dirichlet(tf.nn.softplus(tf.Variable(tf.ones(n_hidden)))) for i in range(n_hidden)]

latent_vars_map = dict(zip(pi_T, qpi_T))

x_data = ([0] * 10) + ([1] * 10) + ([2] * 10)
x_data = map(np.array, x_data)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    inference = ed.KLqp(latent_vars_map, dict(zip(x, x_data)))
    inference.run(n_iter=500)
    inferred_qpi_T = sess.run([qpi.mean() for qpi in qpi_T])

500/500 [100%] ██████████████████████████████ Elapsed: 12s | Loss: 23.476


In [59]:
inferred_qpi_T

[array([0.6658835 , 0.14985795, 0.18425858], dtype=float32),
 array([0.14769787, 0.6268977 , 0.22540449], dtype=float32),
 array([0.19821896, 0.09202534, 0.7097557 ], dtype=float32)]
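
As a sanity check on these numbers, the Dirichlet prior is conjugate to the categorical transitions when the chain is fully observed, so the posterior over each row is available in closed form: prior concentration plus transition counts. The sketch below computes its mean in plain Python. It ignores the unobserved initial state x_0, so it is only an approximate reference for the model above, not the exact target of the variational fit.

```python
from collections import Counter

x_data = ([0] * 10) + ([1] * 10) + ([2] * 10)
n_states = 3
prior = 1.0  # Dirichlet(1, 1, 1) prior on every row, as for pi_T

# Count observed transitions (current state, next state).
counts = Counter(zip(x_data[:-1], x_data[1:]))

posterior_mean = []
for i in range(n_states):
    # Posterior concentration of row i = prior concentration + counts out of state i.
    alpha = [prior + counts[(i, j)] for j in range(n_states)]
    total = sum(alpha)
    posterior_mean.append([a / total for a in alpha])

for row in posterior_mean:
    print([round(p, 3) for p in row])
# [0.769, 0.154, 0.077]
# [0.077, 0.769, 0.154]
# [0.083, 0.083, 0.833]
```

The variational estimate above has the same diagonal-dominant structure, with somewhat less mass on the diagonal.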

**INFERENCE (MCMC)**: DOESN'T WORK

In [14]:
# INFERENCE
n_mcmc_samples = 5000  # number of MCMC samples (named so it does not shadow the model's T)

# Maybe this is not the right way to initialize:
qpi_T = [ed.models.Empirical(
    tf.Variable(expected_shape=[n_hidden],
                initial_value=tf.constant(1.0/n_hidden, shape=[n_mcmc_samples, n_hidden]))) for i in range(n_hidden)]

latent_vars_map = dict(zip(pi_T, qpi_T))

x_data = ([0] * 10) + ([1] * 10) + ([2] * 10)
x_data = map(np.array, x_data)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # inference = ed.inferences.MetropolisHastings(latent_vars_map, dict(zip(y, y_data)))
    inference = ed.inferences.Gibbs(latent_vars_map, data=dict(zip(x, x_data)))
    inference.run()
    inferred_qpi_T = sess.run([qpi.mean() for qpi in qpi_T])

NotImplementedError: conjugate_log_prob not implemented for <class 'abc.Mixture'>

In [11]:
inferred_qpi_T

[array([0.333344, 0.333344, 0.333344], dtype=float32),
 array([0.333344, 0.333344, 0.333344], dtype=float32),
 array([0.333344, 0.333344, 0.333344], dtype=float32)]

### Version 5: [don't know if works] HMM, without Transition matrix + Dirichlet + Python loop

**TODO**: Edward seems to be designed to apply EM where necessary. Here we need EM because we do not observe the hidden states. We need to look into whether Edward figures this out by itself or whether we have to do it ourselves (instantiate two inference objects and run them alternately).

In [15]:
tf.reset_default_graph()
chain_len = 30
n_hidden = 3
n_obs = 3

x_0 = Categorical(probs=Dirichlet(tf.ones(n_hidden)))  # keyword needed: the first positional argument is logits

# transition matrix
pi_T = [Dirichlet(tf.ones(n_hidden)) for i in range(n_hidden)]
T = [Categorical(probs=pi) for pi in pi_T]

# emission matrix
pi_E = [Dirichlet(tf.ones(n_obs)) for i in range(n_hidden)]  # one emission distribution per hidden state
E = [Categorical(probs=pi) for pi in pi_E]

x = []
y = []
for _ in range(chain_len):
    x_tm1 = x[-1] if x else x_0
    x_t = ed.models.Mixture(cat=Categorical(probs=tf.one_hot(x_tm1, n_hidden)), components=T)
    y_t = ed.models.Mixture(cat=Categorical(probs=tf.one_hot(x_t, n_hidden)), components=E)
    x.append(x_t)
    y.append(y_t)

**INFERENCE (VI) on both qpi_T and qpi_E:** RESULTS NOT THAT GOOD

In [16]:
# INFERENCE
qpi_T = [Dirichlet(tf.nn.softplus(tf.Variable(tf.ones(n_hidden)))) for i in range(n_hidden)]
qpi_E = [Dirichlet(tf.nn.softplus(tf.Variable(tf.ones(n_obs)))) for i in range(n_hidden)]

latent_vars_map = dict(zip(pi_T, qpi_T))
latent_vars_map.update(dict(zip(pi_E, qpi_E)))

y_data = ([0] * 10) + ([1] * 10) + ([2] * 10)
y_data = map(np.array, y_data)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    inference = ed.KLqp(latent_vars_map, data=dict(zip(y, y_data)))
    inference.run(n_iter=5000)
    inferred_qpi_T = sess.run([qpi.mean() for qpi in qpi_T])
    inferred_qpi_E = sess.run([qpi.mean() for qpi in qpi_E])

5000/5000 [100%] ██████████████████████████████ Elapsed: 43s | Loss: 37.294


In [17]:
inferred_qpi_T

[array([0.5163122 , 0.19137841, 0.29230946], dtype=float32),
 array([0.4316325 , 0.20450503, 0.3638625 ], dtype=float32),
 array([0.2909255 , 0.18665254, 0.522422  ], dtype=float32)]

In [18]:
inferred_qpi_E

[array([0.35055286, 0.38690096, 0.26254615], dtype=float32),
 array([0.31107655, 0.37114182, 0.31778166], dtype=float32),
 array([0.45351365, 0.251756  , 0.29473028], dtype=float32)]

**INFERENCE (VI) on qpi_T only:** RESULTS NOT THAT GOOD

In [19]:
# INFERENCE
qpi_T = [Dirichlet(tf.nn.softplus(tf.Variable(tf.ones(n_hidden)))) for i in range(n_hidden)]
# qpi_E = [Dirichlet(tf.nn.softplus(tf.Variable(tf.ones(n_obs)))) for i in range(n_hidden)]

latent_vars_map = dict(zip(pi_T, qpi_T))
# latent_vars_map.update(dict(zip(pi_E, qpi_E)))

y_data = ([0] * 10) + ([1] * 10) + ([2] * 10)
y_data = map(np.array, y_data)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    inference = ed.KLqp(latent_vars_map, data=dict(zip(y, y_data)))
    inference.run(n_iter=5000)
    inferred_qpi_T = sess.run([qpi.mean() for qpi in qpi_T])
    # inferred_qpi_E = sess.run([qpi.mean() for qpi in qpi_E])

5000/5000 [100%] ██████████████████████████████ Elapsed: 43s | Loss: 44.499


In [20]:
inferred_qpi_T

[array([0.2658316 , 0.440889  , 0.29327947], dtype=float32),
 array([0.5744874 , 0.08134261, 0.34417003], dtype=float32),
 array([0.2542029 , 0.46512648, 0.2806706 ], dtype=float32)]

### Version 6: [don't know if works] MM, Dirichlet generating the whole matrix

**MODEL**

In [31]:
n_states = 3

pi_0 = Dirichlet(tf.ones(n_states))
x_0 = Categorical(probs=pi_0)  # keyword needed: the first positional argument is logits

# transition matrix
pi_T = Dirichlet(tf.ones([n_states, n_states]))

x = []
for _ in range(chain_len):
    x_tm1 = x[-1] if x else x_0
    x_t = Categorical(probs=tf.gather(pi_T, x_tm1))
    x.append(x_t)

**INFERENCE**

In [32]:
qpi_0 = Dirichlet(tf.nn.softplus(tf.Variable(tf.ones(n_states))))
qpi_T = Dirichlet(tf.nn.softplus(tf.Variable(tf.ones([n_states, n_states]))))

KLqp is going to copy each x, see that it depends on the output of pi_T, and replace that output by the output of qpi_T. Open question: do the copied x still depend on each other? Not sure...

In [35]:
sess = tf.Session()
x_data = ([0] * 10) + ([1] * 10) + ([2] * 10)
x_data = map(np.array, x_data)

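# NOTE: this inference object is built before sess becomes the default session, so Edward
# likely initializes its internal data variables in a different session. That is probably
# the cause of the FailedPreconditionError below; building it inside the with-block may help.
inference = ed.KLqp({pi_0: qpi_0, pi_T: qpi_T}, data=dict(zip(x, x_data)))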

with sess.as_default():
    sess.run(tf.global_variables_initializer())
    inference.run(n_iter=5000)

FailedPreconditionError: Attempting to use uninitialized value data_387/Variable
	 [[Node: data_387/Variable/read = Identity[T=DT_INT32, _class=["loc:@data_387/Variable"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](data_387/Variable)]]

Caused by op 'data_387/Variable/read', defined at:
  [... runpy, ipykernel, tornado, asyncio and IPython frames omitted ...]
  File "<ipython-input-35-49042b065138>", line 5, in <module>
    inference = ed.KLqp({pi_0: qpi_0, pi_T: qpi_T}, data=dict(zip(x, x_data)))
  File "/Users/jeromekafrouni/.pyenv/versions/3.6.1/envs/prob-prog/lib/python3.6/site-packages/edward/inferences/klqp.py", line 84, in __init__
    super(KLqp, self).__init__(latent_vars, data)
  File "/Users/jeromekafrouni/.pyenv/versions/3.6.1/envs/prob-prog/lib/python3.6/site-packages/edward/inferences/variational_inference.py", line 27, in __init__
    super(VariationalInference, self).__init__(*args, **kwargs)
  File "/Users/jeromekafrouni/.pyenv/versions/3.6.1/envs/prob-prog/lib/python3.6/site-packages/edward/inferences/inference.py", line 93, in __init__
    var = tf.Variable(ph, trainable=False, collections=[])
  [... TensorFlow-internal frames omitted ...]


## 3. Other tutorials/ideas we might consider

- Implementation of HMM but not suited for inference: https://gist.github.com/fredcallaway/c7252b6326dfb502e70cad4146731aef
- Implementation of HMM by creating a custom RV class: might work but needs a lot of work, the code doesn't run as is: https://discourse.edwardlib.org/t/hmm-implementation-with-marginalized-latent-states/755
- Using a list of categoricals and then using tf.gather to select from this list: doesn't work, because tf.gather makes us lose the type, so we no longer have Categorical variables => can't run inference (we can still sample though)
- Using the OneHotCategorical RV instead of a Categorical RV and then taking tf.one_hot of it didn't seem to change much.

## 4. Custom RV non hidden Markov Model

We need an RV that takes as input a categorical RV which depends on a Dirichlet
=> when running KLqp, the algorithm will copy this RV and replace the Dirichlet with the proposal Dirichlet, since it will be one of its ancestors. And then get the likelihood of the whole trajectory??

In [None]:
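from tensorflow.contrib.distributions import Distribution, FULLY_REPARAMETERIZED
from edward.models import RandomVariable

class distribution_HMM(Distribution):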
  def __init__(self, transmat, init_p, states, name='', *args, **kwargs):
      self.transmat = transmat
      self.init_p = init_p
      self.states = states
      parameters = locals()
      parameters.pop("self")

      super(distribution_HMM, self).__init__(dtype=tf.float32,
                                reparameterization_type=FULLY_REPARAMETERIZED,
                                validate_args=False,
                                allow_nan_stats=False,
                                parameters=parameters,
                                name=name, *args, **kwargs)

  def normal_log_prob(self, X, mu, scale=1.):
      return -.5 * tf.log(2. * np.pi) - tf.log(scale) - (1. / (2. * scale ** 2)) * (X - mu) ** 2

  def _log_prob(self, value):
      # first we need to compute the (len(obs), len(states)) matrix that holds the
      # likelihood of each observation coming from each state
      B_list = []
      for j in range(0, self.transmat.shape[1]):
          state_probs = self.normal_log_prob(mu=self.states[j], X=value)
          B_list.append(state_probs)
      B = tf.squeeze(tf.stack(B_list, axis=1))

      alpha_0 = self.init_p + B[0]
      logprob = tf.scan(
          lambda alpha, b: tf.reduce_logsumexp(tf.transpose(alpha) + self.transmat, axis=0, keepdims=True) + b,
          initializer=alpha_0,
          elems=B[1:, :],
          back_prop=True,
          parallel_iterations=12,
          infer_shape=False)
      return tf.reduce_logsumexp(logprob[-1])

  def _sample_n(self, n, seed=None):
      raise NotImplementedError("_sample_n is not implemented")

class HMM(RandomVariable, distribution_HMM):
  def __init__(self, *args, **kwargs):
      RandomVariable.__init__(self, *args, **kwargs)

In [None]:
class distribution_MM(Distribution):
    def __init__(self, transmat, init_p, states, name='', *args, **kwargs):
        self.transmat = transmat
        self.init_p = init_p
        self.states = states
        parameters = locals()
        parameters.pop("self")

        super(distribution_MM, self).__init__(dtype=tf.float32,
                                reparameterization_type=FULLY_REPARAMETERIZED,
                                validate_args=False,
                                allow_nan_stats=False,
                                parameters=parameters,
                                name=name, *args, **kwargs)

    # def normal_log_prob(self, X, mu, scale=1.):
    #   return -.5 * tf.log(2. * np.pi) - tf.log(scale) - (1. / (2. * scale ** 2)) * (X - mu) ** 2

    def _log_prob(self, value):
        # TODO: this body is still the HMM forward pass copied from distribution_HMM.
        # It calls self.normal_log_prob, which is commented out above, so it cannot run
        # as is; it should become the log-probability of an observed Markov chain.
        B_list = []
        for j in range(0, self.transmat.shape[1]):
            state_probs = self.normal_log_prob(mu=self.states[j], X=value)
            B_list.append(state_probs)
        B = tf.squeeze(tf.stack(B_list, axis=1))

        alpha_0 = self.init_p + B[0]
        logprob = tf.scan(
            lambda alpha, b: tf.reduce_logsumexp(tf.transpose(alpha) + self.transmat, axis=0, keepdims=True) + b,
            initializer=alpha_0,
            elems=B[1:, :],
            back_prop=True,
            parallel_iterations=12,
            infer_shape=False)
        return tf.reduce_logsumexp(logprob[-1])

    def _sample_n(self, n, seed=None):
        raise NotImplementedError("_sample_n is not implemented")

class MM(RandomVariable, distribution_MM):
    def __init__(self, *args, **kwargs):
        RandomVariable.__init__(self, *args, **kwargs)

In [None]:
tf.scan(
    fn,
    elems,
    initializer=None,
    parallel_iterations=10,
    back_prop=True,
    swap_memory=False,
    infer_shape=True,
    reverse=False,
    name=None
)

In [36]:
T = tf.constant([[1, 2, 3], [4, 5, 6]])

In [37]:
def transition_function(xt_current, xt_next):
    return tf.log(T[xt_current, xt_next])

In [44]:
pi_0 = tf.constant([0.1, 0.1, 0.3, 0.5])

In [47]:
pi_0.shape

TensorShape([Dimension(4)])

In [38]:
x = [0, 2, 1, 3]

In [39]:
x[1:]

[2, 1, 3]

In [40]:
x[:-1]

[0, 2, 1]

In [43]:
# get sequence of transitions:
list(zip(x[:-1], x[1:]))

[(0, 2), (2, 1), (1, 3)]

In [48]:
tf.log(pi_0[x[0]])

<tf.Tensor 'Log_1:0' shape=() dtype=float32>

The simplest version of scan repeatedly applies the callable fn to a sequence of elements from first to last. The elements are made of the tensors unpacked from elems on dimension 0. The callable fn takes two tensors as arguments. The first argument is the accumulated value computed from the preceding invocation of fn. If initializer is None, elems must contain at least one element, and its first element is used as the initializer.



- fn: The callable to be performed. It accepts two arguments. The first will have the same structure as initializer if one is provided, otherwise it will have the same structure as elems. The second will have the same (possibly nested) structure as elems. Its output must have the same structure as initializer if one is provided, otherwise it must have the same structure as elems.

- elems: A tensor or (possibly nested) sequence of tensors, each of which will be unpacked along their first dimension. The nested sequence of the resulting slices will be the first argument to fn.

In [45]:
log_prob = tf.scan(
    fn=transition_function,
    elems=list(zip(x[:-1], x[1:])),
    initializer=tf.log(pi_0[x[0]])
)

ValueError: slice index 0 of dimension 0 out of bounds. for 'scan/strided_slice' (op: 'StridedSlice') with input shapes: [0], [1], [1], [1] and with computed input tensors: input[1] = <0>, input[2] = <1>, input[3] = <1>.
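
The call above passes a function of (xt_current, xt_next), whereas scan calls fn with the accumulated value first and the current element second. The sketch below shows the intended accumulation in plain Python. The `scan` helper is a hypothetical stand-in that only mimics the accumulate-and-collect pattern, and the 4x4 matrix `T` is made up so that it covers the four states of pi_0. In TensorFlow the elements would presumably also have to be tensors (for example the pair of integer tensors x[:-1] and x[1:]) rather than a Python list of tuples; that part is an assumption and is not tested here.

```python
import math

def scan(fn, elems, initializer):
    """Plain-Python analogue of scan: apply fn(acc, elem) and collect every acc."""
    acc, out = initializer, []
    for e in elems:
        acc = fn(acc, e)
        out.append(acc)
    return out

# Hypothetical 4-state chain; each row of T sums to 1.
pi_0 = [0.1, 0.1, 0.3, 0.5]
T = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
x = [0, 2, 1, 3]

def add_transition(log_p, pair):
    # First argument: running log-probability. Second: one (current, next) pair.
    current, nxt = pair
    return log_p + math.log(T[current][nxt])

running = scan(add_transition, list(zip(x[:-1], x[1:])), math.log(pi_0[x[0]]))
log_prob = running[-1]  # log pi_0[x_0] + sum over t of log T[x_t, x_{t+1}]
print(log_prob)
```

The last accumulated value is the log-likelihood of the whole trajectory, which is what a custom Markov chain RV would need to return from its log-probability method.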

In [None]:
fn = log(T())

In [54]:
elems = tf.constant([1, 2, 3, 4, 5, 6])
sum = tf.foldl(lambda a, x: a + x, [elems, elems])

In [55]:
with tf.Session() as sess:
    print(sess.run(sum))

[ 2  4  6  8 10 12]


In [None]:
transition_log_probs = self._log_trans

def forward_step(log_probs, _):
    return _log_vector_matrix(log_probs, transition_log_probs)

dummy_index = tf.zeros(_num_steps - 1, dtype=tf.float32)

forward_log_probs = tf.scan(forward_step, dummy_index,
                                  initializer=initial_log_probs,
                                  name="forward_log_probs")

forward_log_probs = tf.concat([[initial_log_probs], forward_log_probs],
                                    axis=0)

In [None]:
def _log_vector_matrix(vs, ms):
    """Multiply tensor of vectors by matrices assuming values stored are logs."""

    return tf.reduce_logsumexp(vs[..., tf.newaxis] + ms, axis=-2)
    # for a rank-3 vs, vs[..., tf.newaxis] is equivalent to vs[:,:,:,tf.newaxis]
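
To make the log-space product above concrete, here is a small plain-Python sketch of the same idea for a single vector and matrix, checked against the ordinary product. The names `logsumexp`, `log_vector_matrix` and `safe_log`, and the numbers, are illustrative assumptions.

```python
import math

def logsumexp(values):
    """log(sum(exp(v))) computed stably by factoring out the maximum."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_vector_matrix(log_v, log_M):
    """Return log(v @ M) given log(v) and log(M) as nested lists."""
    n_in, n_out = len(log_v), len(log_M[0])
    return [logsumexp([log_v[i] + log_M[i][j] for i in range(n_in)])
            for j in range(n_out)]

def safe_log(p):
    """log(p), with log(0) mapped to -inf instead of raising an error."""
    return math.log(p) if p > 0 else -math.inf

v = [0.2, 0.3, 0.5]
M = [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]]

log_result = log_vector_matrix([safe_log(p) for p in v],
                               [[safe_log(p) for p in row] for row in M])
direct = [sum(v[i] * M[i][j] for i in range(3)) for j in range(3)]

print([math.exp(r) for r in log_result])
print(direct)
```

Summing over the index shared by the vector and the matrix rows corresponds to the reduction over axis=-2 in the TensorFlow version.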