Marginal Util Function #778

Open · cshenton wants to merge 18 commits into master
Conversation

@cshenton (Contributor) commented Oct 2, 2017

Re: #759, this adds limited support for full graph sampling. Given an RV, ed.marginal will traverse its parent graph, replacing any root-ancestor instances of RandomVariable with a sampled equivalent, so that each non-root RV in the graph is evaluated with a batched tensor of parameters.

For example:

import edward as ed
import numpy as np

from edward.models import Normal


ed.get_session()
loc = Normal(0.0, 100.0)
y = Normal(loc, 0.0001)
conditional_sample = y.sample(50)     # 50 draws of y for a single draw of loc
marginal_sample = ed.marginal(y, 50)  # loc is redrawn for each of the 50 samples

np.std(conditional_sample.eval())
# 0.000100221

np.std(marginal_sample.eval())
# 106.55982
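
For intuition, the same marginal draw can be written by hand, independently of the implementation: sample the root first, then condition the child on the batch of draws.

import edward as ed
import numpy as np

from edward.models import Normal


ed.get_session()
loc_samples = Normal(0.0, 100.0).sample(50)  # 50 independent draws of the root
y_given_locs = Normal(loc_samples, 0.0001)   # a batch of 50 conditional RVs
marginal_by_hand = y_given_locs.sample()     # one draw per batch member, shape [50]

np.std(marginal_by_hand.eval())
# ~100, matching ed.marginal above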

The current implementation does not work for graphs of RVs that use the sample_shape arg; supporting that will require some refactoring of how RandomVariable internally stores the sample_shape. I'm making this PR now mostly because I'm confident that the API will be backwards compatible.

Beyond not allowing sample_shape, ed.marginal can fail in the following ways:

  • Failure during graph construction, which can happen when a broadcast that succeeds in the source graph is broken by the prepended sample dimension.
  • Failure after graph construction but before execution, when ed.marginal itself detects incorrect broadcasting (this prevents situations where sampling 10000 from a scalar RV produces an enthusiastically broadcast (10000, 10000, 1)-shaped tensor).

@cshenton changed the title from Marginal Utility Function to Marginal Util Function on Oct 2, 2017
@dustinvtran (Member) left a comment:
Great work so far.

I do wonder if we should implement this at a slightly higher level, since what you're returning is not the original random variable but its marginal distribution. If x_full represents the marginal distribution of x, it's no longer x's original distribution. So maybe this warrants defining a Marginal RV class that takes x as input, where the currently implemented function would become _sample_n. It would be called with a default setting of n (caching its output to a class member) when the user calls methods such as _log_prob.
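
To make that concrete, a rough sketch of such a class (illustrative only; it glosses over how RandomVariable composes with an underlying tf distribution, and the names are hypothetical):

import edward as ed
from edward.models import RandomVariable


class Marginal(RandomVariable):
  """Sketch: the marginal distribution of an Edward random variable x."""

  def __init__(self, x, *args, **kwargs):
    self._x = x
    super(Marginal, self).__init__(*args, **kwargs)

  def _sample_n(self, n, seed=None):
    # The graph rewrite currently in ed.marginal would live here:
    # redraw x's root ancestors n times and push the batch through.
    return ed.marginal(self._x, n)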

@@ -778,3 +778,66 @@ def transform(x, *args, **kwargs):
new_x = TransformedDistribution(x, bij, *args, **kwargs)
new_x.support = new_support
return new_x


def marginal(x, n):
@dustinvtran (Member):
Functions should be placed according to alphabetical ordering of function names.


Returns:
tf.Tensor.
The fully sampled values from x, of shape [n] + x.shape
@dustinvtran (Member):
x.shape = x.sample_shape + x.batch_shape + x.event_shape. You replace the sample_shape, so the output should have shape [n] + x.batch_shape + x.event_shape. But I guess it currently fails if x.sample_shape is non-scalar anyways.
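
For reference, the decomposition in question, with illustrative values:

import tensorflow as tf

from edward.models import Normal

x = Normal(tf.zeros(3), tf.ones(3), sample_shape=5)
x.sample_shape  # TensorShape([5])
x.batch_shape   # TensorShape([3])
x.event_shape   # TensorShape([])
x.shape         # TensorShape([5, 3]), i.e. sample_shape + batch_shape + event_shape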

new_roots = []
for rv in old_roots:
new_rv = copy(rv)
new_rv._sample_shape = tf.TensorShape(n).concatenate(new_rv._sample_shape)
@dustinvtran (Member):
tf.TensorShape() fails if n is a tf.Tensor
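
Concretely (the exact error message varies by TF version):

import tensorflow as tf

tf.TensorShape(5)               # fine: TensorShape([5])
tf.TensorShape(tf.constant(5))  # raises TypeError: a tf.Tensor cannot be
                                # converted to a static Dimension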

@cshenton (Author):

This also came up when I looked into #774, and I think it would need to be solved at the same time. sample_shape needs a TensorShape, and I don't think there's a nice way to turn a tensor n into one.

So I think either sample_shape needs to be interpreted as 'whatever gets passed to sample', and therefore stored as a tensor rather than a TensorShape; this would solve #774, and I don't think it would break much. Another alternative is having sample_shape and sample_shape_tensor attributes built from the actual tensor representation of the RV.

Let me know if you'd prefer this to be implemented in the same PR, and I'll push the other changes.
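
A rough sketch of that second alternative, mirroring the batch_shape / batch_shape_tensor convention in tf.distributions (names and details illustrative, not a proposed diff):

import tensorflow as tf


class SampleShapeSketch(object):
  """Stores the sample shape as a tensor so a tf.Tensor n is supported."""

  def __init__(self, sample_shape=()):
    if isinstance(sample_shape, tf.TensorShape):
      sample_shape = sample_shape.as_list()
    # Normalize int / tuple / Tensor inputs to a 1-D int32 tensor.
    self._sample_shape_tensor = tf.reshape(
        tf.convert_to_tensor(sample_shape, dtype=tf.int32), [-1])

  @property
  def sample_shape_tensor(self):
    # Dynamic view: always defined, even for placeholder-sized samples.
    return self._sample_shape_tensor

  @property
  def sample_shape(self):
    # Static view: unknown when the shape isn't a graph constant.
    static = tf.contrib.util.constant_value(self._sample_shape_tensor)
    return tf.TensorShape(None if static is None else static)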

import tensorflow as tf

from edward.models import Normal, InverseGamma
from tensorflow.contrib.distributions import bijectors
@dustinvtran (Member):
I found this test very intuitive. Great work. One note: you don't use the bijectors module.


def test_sample_passthrough(self):
with self.test_session():
loc = Normal(0.0, 100.0)
@dustinvtran (Member):

There's only a very low probability that this produces a false negative/positive, but in general you should always set the seed in tests that check randomness.
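
For example (a sketch; the class and method names just mirror the file under review):

import tensorflow as tf

from edward.models import Normal


class test_marginal_class(tf.test.TestCase):

  def test_sample_passthrough(self):
    with self.test_session():
      tf.set_random_seed(42)  # graph-level seed: samples are reproducible
      loc = Normal(0.0, 100.0)
      self.assertEqual(loc.sample(50).eval().shape, (50,))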

@cshenton (Author) commented Oct 2, 2017

Creating a Marginal class crossed my mind, particularly when I thought about how to implement the API as you described it: first creating the marginal RV, then being able to repeatedly sample from it.

My thoughts are that the best way to add that would be a marginal method on RandomVariable, which would perform the graph manipulation done here but attach a placeholder, as we discussed in #759. So it could be something like:

rv.sample(10, deep=True)
# raises ValueError: 'must take marginal of rv before deep sampling'
marginal_rv = rv.marginal()
marginal_rv.sample(10)
# works as before, sets placeholder to one then samples 10 from the distribution
marginal_rv.sample(10, deep=True)
# desired functionality, sets placeholder to 10 then samples once from batch result

It means more functionality is added to RV, but it makes sense that a function on an RV which returns an RV would be a class method. I'd favour the explicit marginal method since we're doing a copy; a rough sketch of the idea follows.
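
A very rough sketch of the shape this could take, assuming ed.marginal learns to accept a tensor-valued n per the #774 discussion (names hypothetical):

import tensorflow as tf
import edward as ed


def marginal_rv(rv):
  """Hypothetical free-function version of rv.marginal().

  Returns the marginal sampling tensor together with the placeholder
  controlling the sample size; the class method would wrap these in a
  copied RV instead.
  """
  # Defaults to 1 so ordinary sample() calls behave as before;
  # sample(k, deep=True) would feed k into this placeholder.
  n_ph = tf.placeholder_with_default(tf.constant(1), shape=[],
                                     name="marginal_n")
  return ed.marginal(rv, n_ph), n_ph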

It also makes me a little less squeamish about all this private attribute access: even though it's not on self, having it on members of the class the method lives in will make things easier to fix if breaking changes are made elsewhere on the class.
