# Recommending ingredients using restricted Boltzmann machines

The best-known application of artificial neural networks is supervised learning. Using a trainable non-linear expansion of the features, they can be used to classify images, identify odors, identify genes, and many other things.

In contrast to supervised learning, which learns to map input to output, unsupervised learning deals with general patterns in data. It could be argued (and has been!) that the latter is the more challenging task: it is fairly clear how to fit a function to labeled training data, whereas there is no obvious objective to optimize when looking for patterns. Although unsupervised learning is generally less well understood, it is likely the key to better AI applications, since we typically have far more unlabeled than labeled data to learn from.

In this post, we will discuss the restricted Boltzmann machine (RBM) as a way of fitting the distribution of a complex dataset. Like many unsupervised methods, RBMs are probabilistic models that make use of latent variables. This means that our visible data $\mathbf{v}$ is explained in terms of some unobserved hidden variables $\mathbf{h}$. The way the hidden variables are linked to the visible variables should give us some insight into how the data might be generated.

By way of an application, we will train an RBM on the recipe data of Ahn et al. to learn the distribution of recipes. This will allow us to find new ingredient combinations and create novel recipes.

## The restricted Boltzmann machine

Suppose we have binary data vectors $\mathbf{v} = [v_i]$, which could represent a (binary) image, a preference for certain movies, the presence or absence of ingredients in a recipe... Furthermore, we have binary hidden variables $\mathbf{h} = [h_i]$ which should explain these observed data vectors. These we have to learn from the data. In this post we will only consider binary data; generalizing to integer or real values is relatively straightforward.
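To make the binary representation concrete, here is a minimal sketch of how a recipe could be turned into a vector $\mathbf{v}$. The ingredient vocabulary and the helper names `encode_recipe` and `decode_recipe` are made up for illustration; they are not taken from the recipe dataset.

```python
# Hypothetical ingredient vocabulary (an assumption, not the real dataset).
INGREDIENTS = ["butter", "egg", "flour", "garlic", "tomato"]


def encode_recipe(recipe, vocabulary=INGREDIENTS):
    """Return a binary vector v with v[i] = 1 if vocabulary[i] is in the recipe."""
    present = set(recipe)
    return [1 if ingredient in present else 0 for ingredient in vocabulary]


def decode_recipe(v, vocabulary=INGREDIENTS):
    """Return the ingredients whose entry in the binary vector v is 1."""
    return [ingredient for ingredient, bit in zip(vocabulary, v) if bit]


pancake = encode_recipe(["flour", "egg", "butter"])
print(pancake)                 # [1, 1, 1, 0, 0]
print(decode_recipe(pancake))  # ['butter', 'egg', 'flour']
```

Every recipe becomes a vector of the same length, one entry per ingredient in the vocabulary, which is the form of input we assume in the rest of this post.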

The degree to which a pair $(\mathbf{v}, \mathbf{h})$ matches is given by its energy $E(\mathbf{v}, \mathbf{h})$: pairs with a low energy are more favorable and hence more likely. This can be quantified probabilistically using the Boltzmann distribution:

$$
\mathcal{P}(\mathbf{v}, \mathbf{h}) = \frac{1}{Z}e^{-E(\mathbf{v}, \mathbf{h})}\,,
$$
with $Z$ the partition function that normalizes this distribution:
$$
Z = \sum_\mathbf{v} \sum_\mathbf{h}e^{-E(\mathbf{v}, \mathbf{h})}\,.
$$
Unfortunately, to calculate this partition function (and hence to calculate probabilities), we have to sum over a combinatorially large space!
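To get a feeling for this sum, the sketch below computes $Z$ by brute force for a tiny model. The text above has not fixed a form for the energy, so we assume the standard RBM energy $E(\mathbf{v}, \mathbf{h}) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i W_{ij} h_j$; the weights `W` and the biases `a` and `b` are arbitrary illustrative values, not learned parameters.

```python
import itertools
import math


def energy(v, h, W, a, b):
    """Assumed standard RBM energy: E = -a.v - b.h - v^T W h."""
    visible_term = sum(a_i * v_i for a_i, v_i in zip(a, v))
    hidden_term = sum(b_j * h_j for b_j, h_j in zip(b, h))
    interaction = sum(v[i] * W[i][j] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    return -(visible_term + hidden_term + interaction)


def partition_function(W, a, b):
    """Sum exp(-E) over every binary configuration of v and h."""
    total = 0.0
    for v in itertools.product([0, 1], repeat=len(a)):
        for h in itertools.product([0, 1], repeat=len(b)):
            total += math.exp(-energy(v, h, W, a, b))
    return total


def joint_probability(v, h, W, a, b):
    """Boltzmann probability of one (v, h) pair."""
    return math.exp(-energy(v, h, W, a, b)) / partition_function(W, a, b)


# Arbitrary illustrative parameters: 3 visible and 2 hidden units.
W = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]
a = [0.1, -0.2, 0.0]
b = [0.0, 0.3]

n_terms = 2 ** (len(a) + len(b))
print(n_terms)  # 32
print(partition_function(W, a, b))
print(joint_probability((1, 1, 0), (1, 0), W, a, b))
```

With 3 visible and 2 hidden units the sum has only $2^5 = 32$ terms, but the number of terms doubles with every unit we add, so this brute-force approach stops being feasible long before the model is large enough for real data.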

* description of the data
* explanation of the RBM
* learning rules

## Example: bars and stripes
* explain and show a small example
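As a starting point for this example, here is a minimal sketch that generates the bars-and-stripes toy dataset, assuming the usual definition: an $n \times n$ binary image in which either every row or every column is uniformly on or off. The function names are our own, and which orientation is called "bars" and which "stripes" is only a naming choice.

```python
import itertools


def bars_and_stripes(n):
    """Return all distinct n x n bars-and-stripes images as flat, row-major tuples."""
    patterns = set()
    for bits in itertools.product([0, 1], repeat=n):
        # Horizontal pattern: row r is filled entirely with bits[r].
        patterns.add(tuple(bits[r] for r in range(n) for _ in range(n)))
        # Vertical pattern: column c is filled entirely with bits[c].
        patterns.add(tuple(bits[c] for _ in range(n) for c in range(n)))
    return sorted(patterns)


def show(image, n):
    """Print a flat image as n rows of '#' (on) and '.' (off)."""
    for r in range(n):
        print("".join("#" if image[r * n + c] else "." for c in range(n)))


data = bars_and_stripes(4)
print(len(data))  # 30
show(data[5], 4)
```

For $n = 4$ there are $2 \cdot 2^4 - 2 = 30$ distinct images, because the all-off and all-on images are counted in both orientations. That is a tiny fraction of the $2^{16}$ possible binary images, which makes it easy to check whether a trained model has picked up the pattern.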

## Thoughts for Food: a recipe restricted Boltzmann machine
