Conditional parents #34
Note: This message is a Jupyter Notebook available for download or for an interactive session. It is a response to BayesPy issue #34.

Thanks for the positive feedback! Glad to hear you like the package. :) So, you asked about conditioning theta3 on both theta1 and theta2. That can be done with nested Mixture nodes:

from bayespy.nodes import Categorical, Beta, Mixture
lambda1 = Beta([20,5])
lambda2 = Beta([[5,20],[20,5]])
lambda3 = Beta([ [[5,20],[20,5]],
                 [[3,40],[40,3]] ])
theta1 = Categorical(lambda1)
theta2 = Mixture(theta1, Categorical, lambda2)
theta3 = Mixture(theta1, Mixture, theta2, Categorical, lambda3)

You could use Gate nodes similarly. I also used plates for the Beta priors of the observed variables:

pi1 = Beta([[[5,20], [20,5]]], plates=(10,2))
pi2 = Beta([[[5,20], [20,5]]], plates=(10,2))
pi3 = Beta([[[5,20], [20,5]]], plates=(10,2))
from bayespy.nodes import Bernoulli
X1 = Mixture(theta1, Bernoulli, pi1)
X2 = Mixture(theta2, Bernoulli, pi2)
X3 = Mixture(theta3, Bernoulli, pi3)
X1.observe([1,1,1,1,1,1,1,1,1,1])
X2.observe([0,0,0,0,0,0,0,0,0,0])
X3.observe([0,1,0,1,0,1,0,1,0,1])

Then just run the inference:

from bayespy.inference import VB
Q = VB(X1, X2, X3, pi1, pi2, pi3, theta3, theta2, theta1, lambda1, lambda2, lambda3)
Q.update(repeat=100)

A minor detail: I list the nodes in such an order that child nodes come before their parents, thus the children get updated before their parents on each iteration. Finally, you can, for instance, look at the posterior probabilities of the theta variables:

print(theta1.get_moments()[0])
print(theta2.get_moments()[0])
print(theta3.get_moments()[0])

You may have noticed that you need to use separate nodes for each observed variable. I hope this helps. If something was unclear or if I misunderstood something, please comment.
Hi @jluttine, thanks for the help and suggestions! So I think I actually had two distinct questions (not sure if I wrote them very distinctly), and your answer definitely answered one, namely conditioning theta3 on theta1 and theta2, so thank you very much! For the other question, I need a deterministic node (delta in the model) to be True if and only if both theta1 and theta2 are True (assuming 1 and True are equivalent here). That means that delta is entirely deterministic, dependent on its parents. I was thinking that I needed some sort of mixture or nested gate. I see that the deltas are just convenience nodes and that we could make the prior for X depend on the values of both theta1 and theta2 directly, but I'm not sure we've achieved that yet...please correct me if I'm wrong! So to make this clearer, I need:
In the model, the status of theta1 and theta2 is coded by the indicator variable delta, and then delta is used as a Gate for which Beta() to use. Additionally, to make sure I understand what's going on I've tried instantiating variables and sampling from them. If I do this:
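For instance (a sketch of the snippet; the Beta([20, 5]) prior is an assumption, chosen to match the discussion below):

import numpy as np
from bayespy.nodes import Bernoulli, Beta

# sketch: sample a Bernoulli whose probability has a Beta([20, 5])
# prior (mean 0.8) and look at the sample mean
pi = Beta([20, 5])
X = Bernoulli(pi, plates=(100000,))
print(np.mean(X.random()))  # prints roughly 0.813, not 0.8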
I regularly get a value of about 0.813 (rather than 0.8). Is this because the inference hasn't been run yet? And of course, I can't run the inference without data... so what's happening here? Just to make sure I'm not nuts I ran the same in PyMC and get 0.8 as expected:
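The PyMC check was along these lines (a sketch assuming PyMC2's Beta and Bernoulli stochastics):

import numpy as np
import pymc

# sketch: draw joint prior samples of (pi, X) and average X
pi = pymc.Beta('pi', alpha=20, beta=5)
X = pymc.Bernoulli('X', p=pi)

samples = []
for _ in range(100000):
    pi.random()                  # resample pi from its Beta prior
    samples.append(X.random())   # sample X given the current pi
print(np.mean(samples))          # roughly 0.8, as expected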
Thanks again for your prompt and thorough responses to my question and others'. I wasn't sure whether to ask this here or on Stack Overflow, because I know it's not your job to answer my questions :-) But you are so responsive that I decided to ask here in any case.
OK, regarding the deterministic node I MIGHT have found a solution.
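Roughly, the idea was the following (a sketch reconstructing the approach; the hard 0/1 probabilities in the CPT are an assumption):

from bayespy.nodes import Categorical, Mixture

# sketch: delta = theta1 AND theta2, encoded as a nested categorical
# mixture whose CPT puts all probability mass on the AND outcome
delta = Mixture(theta1, Mixture, theta2, Categorical,
                [[[1.0, 0.0], [1.0, 0.0]],   # theta1=0: delta=0 regardless of theta2
                 [[1.0, 0.0], [0.0, 1.0]]])  # theta1=1: delta follows theta2

The exact 0/1 entries make the node effectively deterministic, which may be what triggers the warning below.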
I get the predicted result:
with the warning:
I'm going to work on the more complex part now, but I just wanted to report that I think this part at least seems to be working...
Just a very quick comment. I looked into this yesterday a bit, and I think I need to implement a few simple nodes. Your idea is correct in principle, but in practice stochastic intermediate nodes with almost deterministic properties cause problems for the VB approximation. Thus, you might get a very bad posterior approximation which captures only some mode. I was planning to implement some of these nodes anyway. I'll get back to you in a few days with more detailed comments.
That sounds great. Sorry to create more work. I originally thought that an _and() version of your _or() function in #23 might be of some help, but I wasn't quite sure how to apply it to this case.
Note: This message is a Jupyter Notebook available for download or for an interactive session. It is a response to BayesPy issue #34.

It's always fun to implement new features that might be useful for someone, so no worries! I implemented a new Take node.
If you notice any bugs, please report. You can read the docstring here: https://github.com/bayespy/bayespy/blob/develop/bayespy/inference/vmp/nodes/take.py#L14

Anyway, so I removed theta3 from the model, because X3 can now depend on theta1 and theta2 directly:

from bayespy.nodes import Categorical, Beta, Mixture
lambda1 = Beta([20,5])
lambda2 = Beta([[5,20],[20,5]])
theta1 = Categorical(lambda1)
theta2 = Mixture(theta1, Categorical, lambda2)
pi1 = Beta([[[5,20], [20,5]]], plates=(10,2))
pi2 = Beta([[[5,20], [20,5]]], plates=(10,2))
pi3 = Beta([[[5,20], [20,5]]], plates=(10,2))

In order to create X3 so that its prior depends on both theta1 and theta2, use the new Take node:

from bayespy.nodes import Bernoulli, Take
X1 = Mixture(theta1, Bernoulli, pi1)
X2 = Mixture(theta2, Bernoulli, pi2)
X3 = Mixture(theta1, Mixture, theta2, Bernoulli, Take(pi3, [[0, 0], [0, 1]]))
X1.observe([0,1,0,1,0,1,0,1,0,1])
X2.observe([0,1,0,1,0,1,0,1,0,1])
X3.observe([1,1,1,1,1,1,1,1,1,1])

Basically, you can think of that 2x2 index table as a table which tells how to map values of (theta1, theta2) to the elements of pi3: with [[0, 0], [0, 1]], only theta1=1 and theta2=1 picks the second element, so it acts like an AND. Then just run the inference:

from bayespy.inference import VB
Q = VB(X1, X2, X3, pi1, pi2, pi3, theta2, theta1, lambda1, lambda2)
Q.update(repeat=100)
print(theta1.get_moments()[0])
print(theta2.get_moments()[0])
Oh, it looks like there is a bug in the lower bound computation.. I need to take a look.. Anyway, the result looks reasonable for that data.

I'm planning to implement a node which would allow users to construct complex discrete graphs as a part of a model and perform "exact"/non-factorized inference within that set of variables. See issue #37 for details. But currently, when constructing these discrete variable graphs with BayesPy, it factorizes with respect to the nodes. Thus, it is important to understand what you lose. So, let me demonstrate one problem of the mean-field approximation by using a classic XOR example. There are two booleans, v1 and v2, and a noisy XOR of them, v3, which is observed:

v1 = Categorical([0.5, 0.5])
v2 = Categorical([0.5, 0.5])
v3 = Mixture(v1, Mixture, v2, Categorical,
             [[[0.9,0.1], [0.1,0.9]],
              [[0.1,0.9], [0.9,0.1]]])
v3.observe(1)

Intuitively, because of the symmetry, observing v3 tells nothing about the marginals of v1 and v2: they should remain at [0.5, 0.5]. Let's see what the VB approximation gives, using random initialization to break the symmetry:

v1.initialize_from_random()
v2.initialize_from_random()
Q = VB(v1, v2, v3)
Q.update(repeat=100, verbose=False)
print(v1.get_moments()[0])
print(v2.get_moments()[0])
The true marginals would have been [0.5, 0.5], but the mean-field approximation collapses to one of the modes and becomes overconfident.

And finally, you asked about getting an incorrect mean (0.813) when sampling a node. Yes, the random() method does not draw exact samples through the parent nodes; it uses the (moments of the) parent's distribution, so the sample mean can be biased. Note how the bias vanishes as the Beta distribution becomes more concentrated:

import numpy as np

print(np.mean(Bernoulli(Beta([20, 5]), plates=(100000,)).random()))
print(np.mean(Bernoulli(Beta([200, 50]), plates=(100000,)).random()))
print(np.mean(Bernoulli(Beta([20000, 5000]), plates=(100000,)).random()))
print(np.mean(Bernoulli(Beta([2000000, 500000]), plates=(100000,)).random()))
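To see where the 0.813 comes from, here is a small check (a sketch, under the assumption that the sample probability is formed from the Beta parent's expected log-probabilities, i.e. digamma values):

import numpy as np
from scipy.special import psi  # digamma

# assumption: P(X=1) is built from <log pi> = psi(a) - psi(a+b) and
# <log(1-pi)> = psi(b) - psi(a+b), then normalized
a, b = 20, 5
p = np.exp(psi(a)) / (np.exp(psi(a)) + np.exp(psi(b)))
print(p)  # about 0.812, close to the observed ~0.813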
I hope I understood your model correctly this time. Please don't hesitate to ask if you have further questions or comments!
This is fantastic, thank you so much for your help! I will go through your suggestions in detail and try to implement the full model over the next couple of days and will let you know if I run into any snags. Either way I will report back. Thank you again!
I have a quick follow-up about the interpretation of results. If I have the code:
with data
and results
The 6 in the lambda1, and in the second index of b in lambda2, suggests to me that the inference result means that the generator of the data (the person giving responses X, in the model) does NOT have proficiency theta1. But the direct theta1 moments suggest that they in fact do have those proficiencies. I bet I'm mixing the order of something. I thought the order of parameters in the plate was index0=false, index1=true, so that
But maybe I'm wrong about this and this is where my confusion lies? Edit:
But clearly I'm not understanding something about the indexing! Edit 2: When I switch the indexing in the 2-D parameters to reflect [ [a-if-true,b-if-true],[a-if-false,b-if-false]] and have the model:
and I get the results I'd expect:
I assumed that the true/false values for variables mapped to 1 and 0 respectively, making the indexes of the 2-D parameter arrays (like for lambda2 and the pi values) [0,1] or [FALSE,TRUE]. I can handle using [TRUE,FALSE] instead, but then shouldn't the Take node be:
rather than
?
Ok, I think the confusion comes from mixing Beta with Categorical: the parameters of Beta map to outcomes in the opposite order compared to Dirichlet, so when a Beta prior is used for a Categorical variable, the indexing appears flipped. My conclusion from this would be that it's not good to use Beta together with Categorical: use Beta with Bernoulli, and Dirichlet with Categorical. So sorry about suggesting incorrect/confusing solutions, and thanks for pointing it out. :) I hope this helps you solve the issue.
I'm not sure how to fix this in BayesPy so that it'd be good. Would it be better if Beta mapped to Dirichlet with the parameters reversed?
Yes! That's exactly it. The model in the example I'm trying to replicate uses Beta -> Bernoulli, but I switched to Dirichlet -> Categorical because of the NoConverterError, thinking that the results should be equivalent. Then you suggested I switch back to Beta and I didn't think to switch back to Bernoulli, though I should have. So as I suspected, the issue was rooted in my confusion, though it wasn't in the place I thought it was. I think this gives me everything I need. Thank you so much for your help.

I've thought about building some convenience nodes to help with the multiple nesting of gates or mixture nodes with lots of parameters. If I come up with anything particularly useful I'll let you know for your consideration. I don't think making Beta map to a reversed Dirichlet is necessary; documenting the Beta/Bernoulli and Dirichlet/Categorical pairings would be enough for me.

Most of the knowledge I have about Bayesian modeling is from the books "Bayesian Data Analysis" by Gelman and "Doing Bayesian Data Analysis" by Kruschke. Neither of those talks about Mixture nodes or Gates as far as I'm aware, so I've had a little bit of trouble figuring out what's really going on in your implementation. The other implementation of this model I've done was in PyMC, which allows you to construct your own deterministic nodes, which obviates the need to use these nodes (I think...). I did find a recent paper by Microsoft describing Gate notation, though I haven't been able to find anything similar about a Mixture node. It might be worth adding a few references to the documentation to help users get a better idea of how you're using these nodes, and the plate notation that goes along with it. Thank you again!
Glad to hear it became clearer. Even I had to think for a while about how the Beta / Bernoulli / Categorical mappings go, so it probably isn't very obvious to new users. I have to think about it. Probably I'll add a warning at least, and then maybe change the mapping itself.

I'm planning to add support for black box variational inference, in which case one could write arbitrary deterministic or stochastic nodes easily (similar to PyMC), but black box variational inference is not as efficient because it's extremely general. But in any case it would be a great and essential feature. It's on the TODO list... And again, please do not hesitate to ask further questions or to report problems. It's extremely valuable to hear what problems users are facing.
OK, I'm trying to build the dependencies between my latent variables and I keep running into different errors.
gives
while
gives
I'm not super confident using Mixtures/Gates/Take together yet, so I'm not sure if this is a problem with my code or something just not implemented yet (like you mentioned above).
In general, I would suggest avoiding Gate and using Mixture instead.
You also commented about some convenience node. Here's a sketch that you may find useful. Probably not perfect as is, but I hope it gives a starting point. Note that you don't necessarily need to write node classes; you can use simple functions in some cases.
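For instance, something along these lines (just a sketch, assuming the nested Mixture and Take pattern from above):

from bayespy.nodes import Categorical, Mixture, Take

def MappedCategoricalMixture(thetas, indices, p, **kwargs):
    # Nest a Mixture over each parent in thetas and use Take to map each
    # joint parent state to the corresponding row of the parameter array p.
    args = [thetas[0]]
    for theta in thetas[1:]:
        args += [Mixture, theta]
    args += [Categorical, Take(p, indices)]
    return Mixture(*args, **kwargs)

For example, MappedCategoricalMixture([theta1, theta2], [[0, 1], [1, 2]], lambda5) maps the four (theta1, theta2) combinations to three rows of lambda5.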
FYI, there is a bug when using nested mixtures. I'm currently looking at it, but at the moment the results are most probably incorrect and you may get warnings about the lower bound decreasing. Sorry for the inconvenience, I'll inform you as soon as I get it fixed. #39
FYI, I have already tracked down the bug. Yep, nested mixtures were totally incorrect. I already know how to fix it, I'll probably try to do it tomorrow.
Awesome, thank you for following up about that. The MappedCategoricalMixture is fantastic and has already saved me lots of trial and error getting nested mixtures to work, so thank you!!

It all seems to work if I don't use plates, but now I need to put some of the variables on a plate. The question I'm working on now is getting plates to work with nested mixtures. I can get everything to work properly with the rest of the model. I modified the helper function to accept a plates parameter:
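Roughly like this (a sketch, based on the helper above):

def MappedCategoricalMixture(thetas, indices, p, plates=None, **kwargs):
    # same construction as before, but forward an explicit plates argument
    args = [thetas[0]]
    for theta in thetas[1:]:
        args += [Mixture, theta]
    args += [Categorical, Take(p, indices)]
    return Mixture(*args, plates=plates, **kwargs)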
The minimal example is:
Seemingly no matter what I put for the plates parameter in theta5 (including omitting the parameter), I get the exact same error:
11 is clearly the numStudents value, but I don't know where it gets [2] from - shouldn't plate dimensions be tuples in any case? Section 2.3.3 of the documentation (Irregular Plates) seems to suggest that the Mixture inherits plates from the first parameter (the categorical-like node or array), and that seems to be the case for theta2, but I can't get it to work for theta5.

The higher-level view of what I'm trying to do is get all of the theta variables on the same "i" plate, as in the plate-notation image of the model. I'm ignoring the "j" and "s" plates and am going to just make multiple nodes, but I do need the "i" plate, and I need all the thetas and Xs to be on the plate, but NOT the lambda or pi nodes. And like I said, it all works when I make a simpler model that doesn't have a nested mixture. I'd appreciate any suggestions!

Edit: If I don't plate the theta values and JUST plate the X values, I do seem to get a model that works:
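In sketch form (the lambda, pi, and helper definitions are as above; details assumed):

# latent skills without a student plate
theta1 = Categorical(lambda1)
theta2 = Mixture(theta1, Categorical, lambda2)
theta5 = MappedCategoricalMixture([theta1, theta2], [[0, 1], [1, 2]], lambda5)

# plate only the observations, via the plates of the pi nodes
numStudents = 11
pi5 = Beta([[[5, 20], [20, 5]]], plates=(numStudents, 2))
X5 = Mixture(theta5, Bernoulli, pi5)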
This works great. So I'm not sure what I'm doing wrong above. Maybe the issue has to do with the nested Mixture inheriting plate values?
The plate broadcasting issue is most likely related to the nested mixture bug I discovered (#39). So, with the current code, you most likely experience incorrect results and problems such as plates not matching. Actually, I would say that you get incorrect results with certainty if using nested mixtures. They are the right way to go; the issue is in BayesPy. Thus, your code might be ok (I didn't take a careful look yet), but at least the Mixture node is doing the plate axis mapping incorrectly, and I hope I'll get that issue fixed ASAP.
Note: This message is a Jupyter Notebook available for download or for an interactive session. It is a response to BayesPy issue #34.

I fixed the bug in nested mixtures, so they should work now. I emphasize that even if you had been able to use them before, all the results have been incorrect almost certainly. I added a convenience function MultiMixture, with which the helper can be written compactly:

from bayespy.nodes import *
from bayespy.utils import misc
def MappedCategoricalMixture(thetas, indices, p, **kwargs):
    return MultiMixture(thetas, Categorical, Take(p, indices), **kwargs)

Note that you will need to use Dirichlet priors (with Categorical variables) here:

lambda1 = Dirichlet([5,20])
lambda2 = Dirichlet([[20,5],      # if theta1=False
                     [5,20]])     # if theta1=True
lambda5 = Dirichlet([[20,5],      # neither of (theta1, theta2) is True
                     [12.5,12.5], # exactly one of (theta1, theta2) is True
                     [5,20]])     # both of (theta1, theta2) are True
numStudents = 11
theta1 = Categorical(lambda1, plates=(numStudents,))
theta2 = Mixture(theta1, Categorical, lambda2)
theta5 = MappedCategoricalMixture([theta1, theta2],
                                  [[0, 1], [1, 2]],
                                  lambda5)

As I wrote all this into BayesPy quite quickly, there might be errors. If you notice any weird issues, please report. And again, please do ask further questions if you have any; this is really helping me to improve BayesPy. Thank you for your patience with all these issues.
I'm so pumped - everything seems to work now. I really appreciate your help, both in catching my mistakes and in your responsiveness to fixes in BayesPy. I'm going to do some fairly thorough model validation. If anything seems awry I will let you know. Otherwise, I'll send you a fairly in-depth IPython Notebook with the model write-up that you can use as another example if you wish.
Great! Glad to hear it finally works. 😋 And yes, it'd be great to have an IPython Notebook with explanations about this in the examples. You can make a pull request for the notebook. Currently, the other examples aren't notebooks, but I'm in the process of converting all examples into notebooks, so notebooks are preferred over ReST files (if you happened to wonder).
Might not be relevant to you anymore, but I just found a critical bug in
Great, thank you for letting me know!
First of all, I love this package, the documentation, and especially your (@jluttine) enthusiasm and willingness to help people learn how to use it. You are a huge credit to the community, so thanks!
Second, I have a question. I'm trying to implement the "Mixed Number Subtraction" Bayesian network example discussed in this paper and in Chapter 11 of the book "Bayesian Networks for Educational Assessment." The basic structure is that there's a 'proficiency model' of latent variables THETA, and an 'evidence model' of measurable/observable variables X. The full problem looks like this:
The latent variables (Skills) are modeled as Bernoulli with Beta priors. Each observable variable (Items) has one or more latent variables that map to it, such that for a given observable variable X, if all of the THETA values that map to it have a value of 1, then X has a prior of PI=Beta(20,5). Otherwise, X has a prior PI=Beta(5,20).
The full plate notation looks like this:
I'm omitting the Evidence Model (s) from my implementation until I can get just the i and j plates to work. The delta is a deterministic indicator variable that says, for an item X, whether subject i has all of the prerequisite skills - the prior on X changes if the answer to this (encoded by delta) is true or false. I'm also just setting i=1 to have a single subject.
For a single variable THETA and 10 items, I can get the following to work:
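In sketch form, it was something like this (illustrative priors, using Categorical with a Dirichlet prior for the skill):

import numpy as np
from bayespy.nodes import Beta, Bernoulli, Categorical, Dirichlet, Gate

# sketch: one skill, ten items
lambda1 = Dirichlet([5, 20])       # prior on having the skill (index 1 = has it)
theta1 = Categorical(lambda1)      # latent skill indicator
pi1 = Beta([[5, 20], [20, 5]])     # item prior given theta1 = 0 / 1
X1 = Bernoulli(Gate(theta1, pi1), plates=(10,))  # Gate picks the Beta row by theta1
X1.observe(np.ones(10))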
This seems to work great - i.e. the condition when X depends on the value of only a single THETA. However, I also have the situation in which the prior for theta2 depends on the value of theta1, and the prior for X depends on the value of BOTH variables. What I have so far is:
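Continuing the sketch (again with assumed details):

# theta2's prior depends on theta1, again via Gate
lambda2 = Dirichlet([[20, 5],      # if theta1=0
                     [5, 20]])     # if theta1=1
theta2 = Categorical(Gate(theta1, lambda2))

pi2 = Beta([[5, 20], [20, 5]])
X2 = Bernoulli(Gate(theta2, pi2), plates=(10,))
X2.observe(np.zeros(10))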
I suspect there's a more compact way to write all the pi1, pi2, etc., but this seems to work. Now what I need to add is a pi3/delta3/X3 where the prior depends on the value of BOTH theta1 and theta2. I suspect the solution has to do with a clever use of gates and mixtures like in issue #23, but I don't really understand how you use mixtures with non-Gaussian nodes.
I would really appreciate any guidance! I have this full model implemented in PyMC2 but it's super slow and uses SO much memory. I would really love to switch to variational Bayes using this package.
Edit: Once I get this working I'd be very happy to contribute it as an example in the repo if you're interested.
Edit 2: Fixed an error in the code