- title: Dreaming Boltzmann Machines
- summary: Biologically plausible machine learning
- date: 2019-03-08
- status: draft

I recently read [The Miracle of the Boltzmann Machine](https://theneural.wordpress.com/2011/07/08/the-miracle-of-the-boltzmann-machine/), and it was so compelling that I've been thinking about it ever since. I'd like to share that excitement with you. This is not a beginner post, but I hope to do better than, "Unfortunately the Boltzmann Machine can only be understood by using Math. So you’ll like it if you know math too."

## Biological plausibility

If a machine learning algorithm doesn't violate any of the constraints neuroscience tells us the brain probably has to obey, then that algorithm is "biologically plausible". It means it's a candidate model for how the brain may actually work.

It may surprise you to hear that although artificial neural networks (ANNs) take inspiration from animal brains, most algorithms that work well for training ANNs are _not_ biologically plausible, and are therefore unlikely to be how the brain actually works.

## Boltzmann Machines

![Figure 1: Boltzmann Machine](https://upload.wikimedia.org/wikipedia/commons/7/7a/Boltzmannexamplev1.png#right)

A Boltzmann machine (BM) is a kind of neural network that comes with a biologically plausible training algorithm. There are some caveats to this, of course, which is why it's only a _model_ of how the brain might work, but I think you'll agree that BMs share some remarkable similarities with animal brains: they sleep, they dream, and then they forget their dreams.

## Mathematical Definition

BMs are a kind of recurrent neural network, meaning that their connections can loop back on themselves. They have some "visible" neurons which can be directly connected to the outside world (eyes, ears, etc.), and some "hidden" neurons which are not directly connected to the outside world. The example BM pictured here has four visible neurons ($V=[v_1, v_2, v_3, v_4]$) and three hidden neurons ($H=[h_1, h_2, h_3]$).

Each neuron can be either _off_ (0) or _on_ (1), with some probability. We can express the probability of an entire configuration $X$ of all neurons at once as

$$P(X) := {e^{X^TWX/2}\over \sum\limits_{X'} e^{X'^TWX'/2}}$$

where $X$ is the binary column vector of the on/off, 1/0 states of all neurons (so $X=(V,H)$), which in our example is

$$X=
  \begin{bmatrix}
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7
  \end{bmatrix}
  =
  \begin{bmatrix}
    v_1 \\
    v_2 \\
    v_3 \\
    v_4 \\
    h_1 \\
    h_2 \\
    h_3
  \end{bmatrix}
$$

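and $W$ is the matrix of weights $w_{ij}$ on each connection between neurons $x_i$ and $x_j$. The matrix is symmetric ($w_{ij}=w_{ji}$) and has zeros on its diagonal, since no neuron is connected to itself. So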

$$X^TWX/2 =
  \begin{bmatrix}
    x_1 & x_2 & x_3 & x_4 & x_5 & x_6 & x_7 \\
  \end{bmatrix}
  \begin{bmatrix}
    0 & w_{12} & w_{13} & w_{14} & w_{15} & w_{16} & w_{17} \\
    w_{21} & 0 & w_{23} & w_{24} & w_{25} & w_{26} & w_{27} \\
    w_{31} & w_{32} & 0 & w_{34} & w_{35} & w_{36} & w_{37} \\
    w_{41} & w_{42} & w_{43} & 0 & w_{45} & w_{46} & w_{47} \\
    w_{51} & w_{52} & w_{53} & w_{54} & 0 & w_{56} & w_{57} \\
    w_{61} & w_{62} & w_{63} & w_{64} & w_{65} & 0 & w_{67} \\
    w_{71} & w_{72} & w_{73} & w_{74} & w_{75} & w_{76} & 0
  \end{bmatrix}
  \begin{bmatrix}
    x_1/2 \\
    x_2/2 \\
    x_3/2 \\
    x_4/2 \\
    x_5/2 \\
    x_6/2 \\
    x_7/2
  \end{bmatrix}$$

which I won't expand all the way, but note that each connection between two neurons $x_i$ and $x_j$ contributes exactly one pair of terms to the expansion:

$$\dots+{x_ix_jw_{ij}\over 2}+\dots+{x_ix_jw_{ji}\over 2}+\dots$$

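and since $w_{ij}=w_{ji}$, these add together to make a single ${x_ix_jw_{ij}}$. The diagonal of $W$ is zero, so there are no $i=j$ terms. In other words, the expansion of $X^TWX/2$ is just the sum of ${x_ix_jw_{ij}}$ over every pair of neurons, with each pair counted once: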

$$X^TWX/2=\sum\limits_{i<j} x_ix_jw_{ij}$$
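To make the definitions concrete, here is a minimal Python sketch that checks the identity numerically and evaluates $P(X)$ by brute force for the seven-neuron example. The function names, the random weights, and the particular configuration `X` are my own illustrative assumptions, not part of any library. Enumerating every configuration only works for tiny networks, because the sum in the denominator has $2^n$ terms.

```python
import itertools
import math
import random


def random_symmetric_weights(n, seed=0):
    """Illustrative weights: symmetric, with a zero diagonal (no self-connections)."""
    rng = random.Random(seed)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            W[i][j] = W[j][i] = rng.gauss(0.0, 1.0)
    return W


def quadratic_form(X, W):
    """X^T W X / 2, computed as the full double sum over i and j."""
    n = len(X)
    return sum(X[i] * W[i][j] * X[j] for i in range(n) for j in range(n)) / 2


def pair_sum(X, W):
    """Sum of x_i * x_j * w_ij over pairs i < j, each pair counted once."""
    n = len(X)
    return sum(X[i] * X[j] * W[i][j] for i in range(n) for j in range(i + 1, n))


def probability(X, W):
    """P(X), normalising by a sum over all 2^n binary configurations."""
    n = len(X)
    Z = sum(math.exp(pair_sum(Xp, W))
            for Xp in itertools.product((0, 1), repeat=n))
    return math.exp(pair_sum(X, W)) / Z


W = random_symmetric_weights(7)   # 4 visible + 3 hidden neurons
X = (1, 0, 1, 1, 0, 1, 0)         # an arbitrary configuration (v1..v4, h1..h3)

# The two ways of writing the exponent agree.
assert math.isclose(quadratic_form(X, W), pair_sum(X, W), abs_tol=1e-12)

# The probabilities of all 128 configurations sum to 1.
total = sum(probability(Xp, W) for Xp in itertools.product((0, 1), repeat=7))
print(probability(X, W), total)
```

With all weights set to zero, every configuration gets the same exponent, so `probability` returns $1/2^n$ for each one: the weights are the only thing that makes some configurations more likely than others.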

1. Hebbian
2. Impractical unless connectivity constrained