NOTE: This message is available as a Jupyter Notebook at https://github.com/bayespy/bayespy-notebooks/blob/master/notebooks/issue33.ipynb and can be run interactively in Binder: [![Binder](http://mybinder.org/badge.svg)](http://mybinder.org/repo/bayespy/bayespy-notebooks/notebooks/issue33.ipynb)

Thanks a lot for the simplification! The problem is that the distribution type of `Mixture` must be a distribution, not a deterministic operation. Here you seem to want `Y` to be a mixture of Gaussians, with mean `X` and precision `1/sigma**2`. You should define `X` with `SumMultiply`, but `Y` as a Gaussian mixture.

First, define the cluster assignments:

In [None]:
from bayespy.nodes import Dirichlet, Categorical
N = 40 # number of data samples
M = 3 # number of clusters (i.e., mixture components)
alpha = Dirichlet([1e-3] * M) # cluster probabilities or "sizes"
theta = Categorical(alpha, plates=(N,)) # assignment for each data sample

Since you observe `r`, is it necessary to give it a Gaussian prior? Is it sufficient to just fix it, or do you specifically want to estimate its distribution? I'll assume here that `r` is just a set of inputs you can use as fixed values. Also, which of your variables `w` and `r` have cluster-specific values? I assume you want to learn a different weight vector `w` for each cluster, shared by all data samples, while each input `r` is sample-specific and doesn't depend on the cluster. These assumptions determine how I use the plates for `w` and `r`. I might be guessing wrong here, so you may need to adjust these snippets accordingly, but that is easy to change.

In [None]:
import numpy as np
from bayespy.nodes import Gaussian, SumMultiply
p = 5 # dimensionality of w and r
w = Gaussian(np.zeros(p), 1e-3*np.identity(p), plates=(1,M)) # weight vectors, no sample plate
r = np.random.randn(N, 1, p)  # fixed inputs, no cluster plate
X = SumMultiply('i,i->', w, r)
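
To see how the plates combine, here is a small NumPy-only sketch of the same contraction. It is not BayesPy code: `w_values` is just a stand-in array for one draw of `w` (a name I made up for illustration), and the point is only that plates `(1, M)` and `(N, 1)` broadcast to `(N, M)`, so `X` has one value per sample and cluster.

```python
import numpy as np

N, M, p = 40, 3, 5  # same sizes as in the snippets above

rng = np.random.default_rng(0)
w_values = rng.standard_normal((1, M, p))  # stand-in for one draw of w, plates (1, M)
r = rng.standard_normal((N, 1, p))         # fixed inputs, plates (N, 1)

# Same inner product as 'i,i->', applied over the broadcast plates
X_values = np.einsum('...i,...i->...', w_values, r)
print(X_values.shape)  # (40, 3)
```

Each entry `X_values[n, k]` is the inner product of the weight vector of cluster `k` with the input of sample `n`.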

Construct the mixture distribution. Again, do you want a shared precision parameter or a different one for each cluster? Here I assume different ones, but that can be changed through the plates.

In [None]:
from bayespy.nodes import Mixture, Gamma, GaussianARD
tau = Gamma(1e-3, 1e-3, plates=(M,)) # separate precision for each cluster
Y = Mixture(theta, GaussianARD, X, tau)
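
Written out, the model built so far should be roughly the following. This is my reading of the snippets under the assumptions above (cluster-specific `w` and `tau`, fixed `r`), and it assumes that `Gaussian` takes a mean and a precision matrix and that `Gamma` takes shape and rate, which is how I understand BayesPy's parameterization:

```latex
\begin{aligned}
\alpha &\sim \mathrm{Dirichlet}(10^{-3}, \ldots, 10^{-3}) \\
\theta_n \mid \alpha &\sim \mathrm{Categorical}(\alpha), & n &= 1, \ldots, N \\
w_k &\sim \mathcal{N}\!\left(0, \, (10^{-3} I)^{-1}\right), & k &= 1, \ldots, M \\
\tau_k &\sim \mathrm{Gamma}(10^{-3}, 10^{-3}), & k &= 1, \ldots, M \\
y_n \mid \theta_n = k &\sim \mathcal{N}\!\left(w_k^\top r_n, \, \tau_k^{-1}\right), & n &= 1, \ldots, N
\end{aligned}
```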

Generate some random data using two different weight vectors:

In [None]:
data1 = np.einsum('...i,...i->...', np.random.randn(p), r)
data2 = np.einsum('...i,...i->...', np.random.randn(p), r)
data = np.where(np.random.rand(N, 1)>0.5, data1, data2)[:,0]
Y.observe(data)
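
If you want to check the inferred assignments afterwards, it may help to keep the true component of each sample. Here is a NumPy-only variant of the data generation that does this. The names `w_true` and `labels` are made up for this sketch, and like the snippet above it adds no observation noise.

```python
import numpy as np

N, p = 40, 5  # same sizes as in the snippets above

rng = np.random.default_rng(1)
r = rng.standard_normal((N, 1, p))    # fixed inputs, same layout as above
w_true = rng.standard_normal((2, p))  # two ground-truth weight vectors
labels = rng.integers(0, 2, size=N)   # true component (0 or 1) of each sample

# Inner product of each input with the weight vector of its own component
data = np.einsum('ni,ni->n', w_true[labels], r[:, 0, :])
print(data.shape)  # (40,)
```

The resulting `data` has the same shape as before, so it could be passed to `Y.observe` in the same way, with `labels` kept aside for comparison.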

Break the symmetry in the model so that inference does not converge to a bad trivial solution:

In [None]:
theta.initialize_from_random()

Create the inference engine:

In [None]:
from bayespy.inference import VB
Q = VB(Y, w, tau, theta, alpha)

Run the VB algorithm:

In [None]:
Q.update(repeat=200)

Now you can use the results. For instance, plot the cluster assignments for each sample:

In [None]:
%matplotlib inline
import bayespy.plot as bpplt
bpplt.hinton(theta)
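
If you prefer hard assignments over a plot, you can take the most probable cluster for each sample. The sketch below uses a small hand-written `resp` array as a stand-in for the `(N, M)` posterior assignment probabilities. I'd expect `theta.get_moments()[0]` to give you those in BayesPy, but treat that as an assumption and check it against your version.

```python
import numpy as np

# Hypothetical posterior assignment probabilities, one row per sample
resp = np.array([[0.90, 0.05, 0.05],
                 [0.10, 0.10, 0.80],
                 [0.20, 0.70, 0.10],
                 [0.85, 0.10, 0.05]])

hard = np.argmax(resp, axis=1)                       # most probable cluster per sample
counts = np.bincount(hard, minlength=resp.shape[1])  # number of samples per cluster
print(hard)    # [0 2 1 0]
print(counts)  # [2 1 1]
```

Note that the cluster indices themselves are arbitrary, so a comparison with known labels has to allow for permuted clusters.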

Or plot the weight vectors for each cluster:

In [None]:
bpplt.hinton(w)

Does this answer your question? :) I can give more details on some steps if you like. Also, if I misunderstood something (or everything) please correct me. :)