*Note: This message is a Jupyter Notebook available for [download](https://github.com/bayespy/bayespy-notebooks/blob/master/notebooks/issue34.ipynb) or as an [interactive session](http://mybinder.org/repo/bayespy/bayespy-notebooks/notebooks/issue34.ipynb). It is a response to BayesPy issue [#34](https://github.com/bayespy/bayespy/issues/34).*

Thanks for the positive feedback! Glad to hear you like the package. :)

So, you asked about conditioning `theta3` on both `theta1` and `theta2`. It is easy to do by "nesting" `Mixture` nodes as follows:

In [None]:
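from bayespy.nodes import Categorical, Beta, Mixture

# Beta priors: one for theta1, one per value of theta1 for theta2,
# and one per combination of theta1 and theta2 for theta3
lambda1 = Beta([20,5])
lambda2 = Beta([[5,20],[20,5]])
lambda3 = Beta([ [[5,20],[20,5]],
                 [[3,40],[40,3]] ])

theta1 = Categorical(lambda1)
# theta2 depends on theta1
theta2 = Mixture(theta1, Categorical, lambda2)
# theta3 depends on both theta1 and theta2 (nested Mixture)
theta3 = Mixture(theta1, Mixture, theta2, Categorical, lambda3)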

You could also use `Gate` nodes, as you did. There are cases where only one of the two works correctly, but in your case it is just a matter of preference. With nested `Gate` nodes, `theta3 = Categorical(Gate(theta1, Gate(theta2, lambda3)))` should work. I used `Mixture` so you could see an alternative.
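
To make the dependency structure concrete, here is a minimal plain-Python sketch of the generative process that the nested construction describes. It does not use BayesPy at all. It assumes that each `[a, b]` pair can be read as pseudo-counts whose normalized values give the prior mean probabilities of categories 0 and 1, and that the first index of `lambda3` corresponds to `theta1` and the second to `theta2`. The names `normalize`, `p_theta1`, `p_theta2`, `p_theta3` and `sample_thetas` are made up for this illustration.

```python
import random

def normalize(counts):
    """Turn pseudo-counts [a, b] into probabilities (the prior mean)."""
    total = sum(counts)
    return [c / total for c in counts]

# Prior mean probability tables (assumption: index 0 <-> first pseudo-count)
p_theta1 = normalize([20, 5])
p_theta2 = [normalize(c) for c in [[5, 20], [20, 5]]]
p_theta3 = [[normalize(c) for c in row]
            for row in [[[5, 20], [20, 5]],
                        [[3, 40], [40, 3]]]]

def sample_thetas(rng):
    """Draw theta1, then theta2 given theta1, then theta3 given both."""
    t1 = rng.choices([0, 1], weights=p_theta1)[0]
    t2 = rng.choices([0, 1], weights=p_theta2[t1])[0]
    # Assumption: theta1 selects the outer row, theta2 the inner pair
    t3 = rng.choices([0, 1], weights=p_theta3[t1][t2])[0]
    return t1, t2, t3

rng = random.Random(0)
samples = [sample_thetas(rng) for _ in range(5)]
print(samples)
```

The point is only that `theta3` is drawn from a table selected by the pair `(theta1, theta2)`, which is the structure the nested `Mixture` (or nested `Gate`) encodes.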

I also used `Beta` instead of `Dirichlet`. It doesn't matter here, it was just a thought. And thanks for reporting that `Bernoulli` isn't accepted by the `Gate`/`Mixture` node; I filed a bug report about it. For the `X` nodes I also used `Mixture` instead of `Gate`, and `Bernoulli` instead of `Binomial` with one trial. I also removed the `delta` nodes. None of these changes really affects anything; it's just another way of writing the model.

In [None]:
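from bayespy.nodes import Bernoulli

# Beta priors for the Bernoulli parameters:
# 10 plates, each with one parameter per value of the corresponding theta
pi1 = Beta([[[5,20], [20,5]]], plates=(10,2))
pi2 = Beta([[[5,20], [20,5]]], plates=(10,2))
pi3 = Beta([[[5,20], [20,5]]], plates=(10,2))

# Each X selects its Bernoulli parameter according to its theta
X1 = Mixture(theta1, Bernoulli, pi1)
X2 = Mixture(theta2, Bernoulli, pi2)
X3 = Mixture(theta3, Bernoulli, pi3)

# Observed data, one value per plate
X1.observe([1,1,1,1,1,1,1,1,1,1])
X2.observe([0,0,0,0,0,0,0,0,0,0])
X3.observe([0,1,0,1,0,1,0,1,0,1])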

Then just run the inference:

In [None]:
from bayespy.inference import VB
Q = VB(X1, X2, X3, pi1, pi2, pi3, theta3, theta2, theta1, lambda1, lambda2, lambda3)
Q.update(repeat=100)

A minor detail: I list the nodes in such an order that child nodes come before their parents, so `theta3` is before `theta2`. I think this makes sense when you observe the leaf nodes, because the information then flows from children to parents.
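
If the model grows, such a children-first order could be derived automatically instead of by hand. The following is only a sketch in plain Python, not something BayesPy provides: the `parents` dictionary is a hand-written description of the graph in this message, and `children_first` is a hypothetical helper that reverses a depth-first post-order.

```python
# Hand-written description of the model graph: node name -> parent names
parents = {
    "lambda1": [], "lambda2": [], "lambda3": [],
    "pi1": [], "pi2": [], "pi3": [],
    "theta1": ["lambda1"],
    "theta2": ["theta1", "lambda2"],
    "theta3": ["theta1", "theta2", "lambda3"],
    "X1": ["theta1", "pi1"],
    "X2": ["theta2", "pi2"],
    "X3": ["theta3", "pi3"],
}

def children_first(parents):
    """Return the node names so that every child precedes its parents."""
    order = []
    visited = set()

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        for parent in parents[node]:
            visit(parent)
        # All parents of `node` are already in `order` at this point
        order.append(node)

    for node in parents:
        visit(node)
    # Post-order lists parents first, so reverse it
    return order[::-1]

order = children_first(parents)
print(order)
```

The resulting list contains names only; mapping them back to the actual node objects before passing them to `VB` is left out here.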

Finally, you can, for instance, look at the posterior probabilities for `theta`:

In [None]:
print(theta1.get_moments()[0])
print(theta2.get_moments()[0])
print(theta3.get_moments()[0])

You may have noticed that you need to use separate nodes for each of `X1`, `X2`, `X3` etc. This is because the `theta` nodes have dependencies that can't be represented by any single built-in node, so they need to be separate nodes. Thus, all the child nodes need to be separate too. If you have a large number of `X1, ..., Xn` nodes, you probably want to generate them programmatically, maybe using list comprehensions or for loops. There could be a node class to concatenate several nodes into a single "array", but it's not implemented yet, and I doubt it would make any significant difference here.
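
As a rough sketch of the loop-based approach, the helper below creates one child per `theta` and attaches its data. To keep the snippet runnable without BayesPy, node construction is delegated to a `make_child` callable and demonstrated with a `StubNode` class. `build_children`, `make_child` and `StubNode` are placeholders for illustration, not part of BayesPy. With the real nodes, `make_child` could be something like `lambda theta: Mixture(theta, Bernoulli, Beta([[[5,20], [20,5]]], plates=(10,2)))`.

```python
def build_children(thetas, observations, make_child):
    """Create one observed child node per theta."""
    children = []
    for theta, values in zip(thetas, observations):
        child = make_child(theta)
        child.observe(values)
        children.append(child)
    return children

class StubNode:
    """Stand-in for a BayesPy node, used only to make the sketch runnable."""

    def __init__(self, parent):
        self.parent = parent
        self.observed = None

    def observe(self, values):
        self.observed = list(values)

# Strings stand in for the theta nodes here
thetas = ["theta1", "theta2", "theta3"]
data = [[1] * 10, [0] * 10, [0, 1] * 5]

X = build_children(thetas, data, StubNode)
print([(x.parent, x.observed) for x in X])
```

In a real model you would probably also keep references to the `pi` nodes (for example in a second list), since they need to be passed to `VB` along with the `X` and `theta` nodes.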

I hope this helps. If something was unclear or if I misunderstood something, please comment.