
Add relation for the Beta-Binomial observation model #29

Merged
merged 2 commits into from
Jun 7, 2022

Conversation

rlouf (Member) commented Apr 21, 2022

In this PR we add a relation that represents the application of Bayes' theorem to a binomial observation model with a beta prior. The relation is between the beta-binomial model, the observed value, and the closed-form posterior for the parameter. This will allow aemcmc to sample from closed-form posteriors whenever applying Bayes' theorem returns a known distribution. I have often seen people use PyMC3's sampler for such simple models, so there is no doubt this could be useful.
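For reference, the conjugate update in question: a Beta(alpha, beta) prior on the success probability p, combined with an observation y ~ Binomial(n, p), yields the posterior Beta(alpha + y, beta + n - y). A minimal sketch of the arithmetic (the helper name is hypothetical, not part of this PR):

```python
# Conjugate update for the beta-binomial model: a Beta(alpha, beta) prior
# on p and an observation y ~ Binomial(n, p) give the posterior
# Beta(alpha + y, beta + n - y).  The helper name is illustrative only.
def beta_binomial_posterior(alpha, beta, n, y):
    new_alpha = alpha + y
    new_beta = beta + n - y
    return new_alpha, new_beta

# With a flat Beta(1, 1) prior and 3 successes out of 10 trials:
print(beta_binomial_posterior(1.0, 1.0, 10, 3))  # (4.0, 8.0)
```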

As in Symbolic PyMC, we also introduce a "conjugate" kanren Relation object that will store this goal and similar ones.

Related to #4

rlouf (Member, Author) commented Apr 21, 2022

kanren returns the expected expression from a model graph that represents the beta-binomial observation model. However, evaluating the expression that kanren returns raises the following exception:

AttributeError: 'RandomGeneratorType' object has no attribute 'ndim'

This seems to be a more general issue, or a misunderstanding on my part. The following minimal example reproduces the error:

import aesara.tensor as at
from unification import var, unify, reify
from etuples import etuplize, etuple

srng = at.random.RandomStream(0)
Y = srng.beta(1., 1.)

y_srng_lv, y_size_lv, y_type_idx_lv = var(), var(), var()
alpha_lv, beta_lv = var(), var()
y_lv = etuple(etuplize(at.random.beta), y_srng_lv, y_size_lv, y_type_idx_lv, alpha_lv, beta_lv)

s = unify(Y, y_lv)
reify(y_lv, s).eval_obj
# AttributeError: 'RandomGeneratorType' object has no attribute 'ndim'. Did you mean: 'evaled_obj'?

zoj613 (Member) commented Apr 21, 2022

@rlouf could it be that the variables are in the wrong order? It looks to me like a numpy array is expected but an RV object is supplied instead.

rlouf (Member, Author) commented Apr 21, 2022

The issue is subtler than that. The signature of RandomVariable's __call__ method accepts rng, size and type as keyword arguments:

def __call__(self, *args, rng=None, size=None, type=None)

So when etuples evaluates the Op with all of the arguments contained in op_args.args, RandomVariable's make_node is passed None for rng, size and type, and op_args (which contains the rng, size and type) as dist_args.

make_node then tries to infer the shape of each element in dist_args, including the rng, which has no shape.

The solution is to pass rng, size and dtype via op_args.kwargs in etuples, mirroring the way you would define a beta random variable in Aesara:

p_posterior_lv = etuple(
    etuplize(at.random.beta),
    new_alpha_lv,
    new_beta_lv,
    rng=p_srng_lv,
    size=p_size_lv,
    dtype=p_type_idx_lv,
)
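The argument mix-up can be reproduced in plain Python, independently of Aesara. Here a hypothetical rv_call stands in for RandomVariable.__call__ and shows how keyword-only parameters sweep positional arguments into *args:

```python
# Stand-in for RandomVariable.__call__: rng/size/type are keyword-only,
# so every positional argument is collected into *dist_args.
def rv_call(*dist_args, rng=None, size=None, type=None):
    return {"dist_args": dist_args, "rng": rng, "size": size, "type": type}

# Positional call (what etuples does here): the rng lands in dist_args,
# and rng/size/type remain None.
positional = rv_call("rng", "size", "type", 1.0, 1.0)
print(positional["dist_args"])  # ('rng', 'size', 'type', 1.0, 1.0)

# Keyword call (the fix): arguments are routed to the right parameters.
keyword = rv_call(1.0, 1.0, rng="rng", size="size", type="type")
print(keyword["dist_args"])  # (1.0, 1.0)
```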

codecov bot commented Apr 21, 2022

Codecov Report

Merging #29 (41b3142) into main (9510be2) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main       #29   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            2         3    +1     
  Lines          185       206   +21     
  Branches        11        11           
=========================================
+ Hits           185       206   +21     
Impacted Files Coverage Δ
aemcmc/conjugates.py 100.00% <100.00%> (ø)


@rlouf rlouf self-assigned this Apr 21, 2022
@rlouf rlouf added the enhancement New feature or request label Apr 21, 2022
@rlouf rlouf force-pushed the add-conjugates branch 2 times, most recently from bb82ca7 to 200a569 Compare April 21, 2022 12:39
brandonwillard (Member) left a comment:

Just to be safe, we should add kanren, logical-unification, and etuples as direct requirements. We're currently getting those from Aesara, but that's too indirect.

Otherwise, this looks great!

brandonwillard (Member) left a comment:

Don't forget that this can be made into an immediately applicable Aesara rewrite using KanrenRelationSub (see here for more information).

@rlouf rlouf force-pushed the add-conjugates branch 2 times, most recently from ea0825b to 4d30768 Compare April 24, 2022 12:21
rlouf (Member, Author) commented Apr 24, 2022

Added the missing imports. I am not sure how I am going to orchestrate the sampler building yet, but I'll keep KanrenRelationSub in mind should we choose to rewrite the graph.

@rlouf rlouf linked an issue Apr 25, 2022 that may be closed by this pull request
Comment on lines 13 to 16
def test_beta_binomial_conjugate():
    """Test that we can produce the closed-form posterior for
    the binomial observation model with a beta prior.
    """
Member:

Does it work the other way: i.e. can it expand a beta into a binomial and a beta?

rlouf (Member, Author) commented Apr 25, 2022

Great point, I'm still used to thinking one way. I'll expand the test.

Update: No, it doesn't work the other way. I am trying to understand why.

Member:

Don't worry, it doesn't need to right now, but it's good to understand why.

brandonwillard (Member) commented:

I forgot to mention this old Symbolic PyMC example for the beta-binomial: Symbolic-PyMC Beta-Binomial Conjugate Example.

brandonwillard (Member) commented:

> Added the missing imports. I am not sure how I am going to orchestrate the sampler building yet, but I'll keep KanrenRelationSub in mind should we choose to rewrite the graph.

We want the material in this repository to be more immediately useful, and we could easily accomplish that by creating an OptimizationDatabase that contains conjugate rewrites—like this one—that are implemented using KanrenRelationSub.

The changes/additions necessary to accomplish this are fairly minimal, so they could be reasonably included in this PR.

rlouf (Member, Author) commented Apr 28, 2022

Ok I'll give it a try!

rlouf (Member, Author) commented Apr 28, 2022

I updated the code. The relation does not work the other way, and this is related to the issue I mentioned above. It all boils down to RandomVariable's __call__ method taking rng, size and dtype as keyword arguments, and hence the need to use keyword arguments in the etuple that corresponds to the target expression (otherwise an exception is raised when reifying). With the following relation we are able to contract the beta-binomial observation model:

def beta_binomial_conjugateo(model_expr, observation_expr, posterior_expr):
    # Beta-binomial observation model
    alpha_lv, beta_lv = var(), var()
    p_rng_lv = var()
    p_size_lv = var()
    p_type_idx_lv = var()
    p_et = etuple(
        etuplize(at.random.beta), p_rng_lv, p_size_lv, p_type_idx_lv, alpha_lv, beta_lv
    )
    n_lv = var()
    Y_et = etuple(etuplize(at.random.binomial), var(), var(), var(), n_lv, p_et)

    y_lv = var()  # observation

    # Posterior distribution for p
    new_alpha_et = etuple(etuplize(at.add), alpha_lv, y_lv)
    new_beta_et = etuple(
        etuplize(at.sub), etuple(etuplize(at.add), beta_lv, n_lv), y_lv
    )
    p_posterior_et = etuple(
        etuplize(at.random.beta),
        new_alpha_et,
        new_beta_et,
        rng=p_rng_lv,
        size=p_size_lv,
        dtype=p_type_idx_lv,
    )

    return lall(
        eq(model_expr, Y_et),
        eq(observation_expr, y_lv),
        eq(posterior_expr, p_posterior_et),
    )

And to be able to expand the contracted posterior we would need to use the following relation:

def beta_binomial_conjugateo(model_expr, observation_expr, posterior_expr):
    # Beta-binomial observation model
    alpha_lv, beta_lv = var(), var()
    p_rng_lv = var()
    p_size_lv = var()
    p_type_idx_lv = var()
    p_et = etuple(
        etuplize(at.random.beta),
        alpha_lv,
        beta_lv,
        rng=p_rng_lv,
        size=p_size_lv, 
        dtype=p_type_idx_lv,
    )
    n_lv = var()
    Y_et = etuple(etuplize(at.random.binomial), var(), var(), var(), n_lv, p_et)

    y_lv = var()  # observation

    # Posterior distribution for p
    new_alpha_et = etuple(etuplize(at.add), alpha_lv, y_lv)
    new_beta_et = etuple(
        etuplize(at.sub), etuple(etuplize(at.add), beta_lv, n_lv), y_lv
    )
    p_posterior_et = etuple(
        etuplize(at.random.beta),
        p_rng_lv,
        p_size_lv,
        p_type_idx_lv,
        new_alpha_et,
        new_beta_et,
    )

    return lall(
        eq(model_expr, Y_et),
        eq(observation_expr, y_lv),
        eq(posterior_expr, p_posterior_et),
    )

Can we somehow change the behavior of reify for RandomVariables, or does the issue run deeper?

brandonwillard (Member) commented:

> And thus the need to use keyword arguments in the etuple that correspond to the target expression (otherwise it will raise an exception when reifying).

etuples should be able to handle keyword arguments.

rlouf (Member, Author) commented Apr 28, 2022

I was not clear enough. To see the issue, consider the following example:

import aesara.tensor as at
from aesara.graph.unify import eval_if_etuple
from unification import var
from etuples import etuple, etuplize
from kanren import run, lall, eq

def normaleo(in_expr, out_expr):
    mu_lv, std_lv = var(), var()
    rng_lv, size_lv, dtype_lv = var(), var(), var()
    norm_in_lv = etuple(etuplize(at.random.normal), rng_lv, size_lv, dtype_lv, mu_lv, std_lv)    
    norm_out_lv = etuple(etuplize(at.random.normal), rng_lv, size_lv, dtype_lv, mu_lv, std_lv)

    return lall(eq(in_expr, norm_in_lv), eq(out_expr, norm_out_lv))

srng = at.random.RandomStream(0)
Y_rv = srng.normal(1, 1)

a_lv = var()
(a_expr,) = run(1, a_lv, normaleo(Y_rv, a_lv))
print(eval_if_etuple(a_expr))
# TypeError: too many positional arguments

eval_if_etuple fails because it passes rng, dtype, size and all other arguments as positional arguments, while __call__ expects rng, dtype and size to be passed as keyword arguments. Indeed, the following works:

import aesara.tensor as at
from aesara.graph.unify import eval_if_etuple
from unification import var
from etuples import etuple, etuplize
from kanren import run, lall, eq

def normaleo(in_expr, out_expr):
    mu_lv, std_lv = var(), var()
    rng_lv, size_lv, dtype_lv = var(), var(), var()
    norm_in_lv = etuple(etuplize(at.random.normal), rng_lv, size_lv, dtype_lv, mu_lv, std_lv)
    norm_out_kwd_lv = etuple(etuplize(at.random.normal), mu_lv, std_lv, rng=rng_lv, size=size_lv, dtype=dtype_lv)

    return lall(eq(in_expr, norm_in_lv), eq(out_expr, norm_out_kwd_lv))

srng = at.random.RandomStream(0)
Y_rv = srng.normal(1, 1)

a_lv = var()
(a_expr,) = run(1, a_lv, normaleo(Y_rv, a_lv))
print(eval_if_etuple(a_expr))
# normal_rv{0, (0, 0), floatX, False}.out

But now this does not work the other way:

b_lv = var()
(b_expr,) = run(1, b_lv, normaleo(b_lv, Y_rv))
# ValueError: not enough values to unpack (expected 1, got 0)

We would not need the kwargs in the etuple to begin with if the signature of RandomVariable's __call__ method were:

def __call__(self, rng, size, dtype, *args)

Is there an easy workaround for this?

brandonwillard (Member) commented Apr 28, 2022

> Is there an easy workaround for this?

The problem appears to be at the etuple level.

In your first example, the goal succeeds and returns the following:

a_expr
# e(
#   e(aesara.tensor.random.basic.NormalRV, 'normal', 0, (0, 0), 'floatX', False),
#   RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7FB710030BA0>),
#   TensorConstant{[]},
#   TensorConstant{11},
#   TensorConstant{1},
#   TensorConstant{1})

eval_if_etuple simply calls a_expr.evaled_obj and that recursively evaluates the etuple by first constructing an op = a_expr[0].evaled_obj (i.e. a NormalRV Op instance) and then performing the op.__call__(*a_expr[1:]) that fails.

If the last step were op.make_node(*a_expr[1:]), then it would work:

a_expr[0].evaled_obj.make_node(*a_expr[1:])
# normal_rv{0, (0, 0), floatX, False}(RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7FB710030BA0>), TensorConstant{[]}, TensorConstant{11}, TensorConstant{1}, TensorConstant{1})

Normally, Op.__call__ dispatches to Op.make_node, but not in the case of NormalRV and some other RandomVariables. The reason for this is that Op.__call__ is being used to provide the NumPy-like interface for those RandomVariables.

We could change those RandomVariables so that Op.__call__ is still just another interface to Op.make_node, but we would need all Ops to stick to this passthrough relationship. That's not a bad thing, but it is somewhat restrictive.

Another option is to not use Ops directly during etuplization, and instead use a wrapper class with a __call__ that defers to Op.make_node. Here's an example:

from dataclasses import dataclass
from aesara.graph import Op, Variable
from cons.core import ConsError, _car, _cdr


@dataclass
class MakeNodeOp:
    op: Op

    def __call__(self, *args):
        return self.op.make_node(*args)


def car_Variable(x):
    if x.owner:
        return MakeNodeOp(x.owner.op)
    else:
        raise ConsError("Not a cons pair.")


_car.add((Variable,), car_Variable)


def car_MakeNodeOp(x):
    return type(x)


_car.add((MakeNodeOp,), car_MakeNodeOp)


def cdr_MakeNodeOp(x):
    x_e = etuple(_car(x), x.op, evaled_obj=x)
    return x_e[1:]


_cdr.add((MakeNodeOp,), cdr_MakeNodeOp)

y_et = etuplize(Y_rv)
y_et
# e(
#   e(
#     __main__.MakeNodeOp,
#     e(
#       aesara.tensor.random.basic.NormalRV,
#       'normal',
#       0,
#       (0, 0),
#       'floatX',
#       False)),
#   RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F8FA11129E0>),
#   TensorConstant{[]},
#   TensorConstant{11},
#   TensorConstant{1},
#   TensorConstant{1})

y_et.evaled_obj
# normal_rv{0, (0, 0), floatX, False}.out

There are a few other ways this could be done, but those should illustrate the basics of any viable approach.

rlouf (Member, Author) commented Apr 29, 2022

I see, so I can add this logic in Aesara and add a pytest.mark.xfail decorator to test_beta_binomial_expand?

from unification import var


def beta_binomial_conjugateo(model_expr, observation_expr, posterior_expr):
Member:

The underlying relation isn't clear from these arguments. For example, why is there an observation_expr? It seems like this should be implementing an equality with only two terms.

brandonwillard (Member) commented:

> I see, so I can add this logic in Aesara and add a pytest.mark.xfail decorator to test_beta_binomial_expand?

Yes, but add a reason="..." to the xfail.
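A sketch of what the marked test might look like; the reason text here is purely illustrative, not the wording used in the PR:

```python
import pytest

# Hypothetical sketch: xfail with an explicit reason explaining why
# expansion does not work yet.
@pytest.mark.xfail(
    reason="etuple evaluation requires rng/size/dtype as keyword arguments"
)
def test_beta_binomial_expand():
    ...
```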

@rlouf rlouf force-pushed the add-conjugates branch 2 times, most recently from 7961626 to b4c71d7 Compare May 10, 2022 08:02
conjugatesdb = OptimizationDatabase()


def beta_binomial_conjugateo(observation_model_expr, value_expr, posterior_expr):
Member:

Just like my previous comment, it's not clear why this relation should have three inputs instead of two. We need our relations to directly represent equalities, so that they can be applied generally.

Successfully merging this pull request may close these issues.

Add support for closed-form posteriors