# Restricted Boltzmann Machines

### Energy-based model
Energy-based models associate a scalar energy with each configuration of the variables of interest; configurations with lower energy are more probable. The probability distribution induced by an energy function $E$ is defined as
$$ \Pr(x) = \frac{\exp (-E(x))}{Z}\,,$$
where $Z = \sum_{x} \exp (-E(x))$ denotes the normalization factor or **partition function**. 
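For a small, finite set of configurations the partition function can be computed by brute force. The sketch below uses NumPy and made-up energies purely for illustration. It turns energies into probabilities and computes $\log Z$ with the log-sum-exp trick so that `exp` does not overflow.

```python
import numpy as np

def boltzmann_probs(energies):
    """Return Pr(x) = exp(-E(x)) / Z for every configuration in a finite list of energies."""
    neg_e = -np.asarray(energies, dtype=float)
    # log Z = log sum_x exp(-E(x)); subtracting the max avoids overflow in exp
    m = neg_e.max()
    log_z = m + np.log(np.exp(neg_e - m).sum())
    return np.exp(neg_e - log_z)

# four configurations with made-up energies; the lowest energy (0.0) gets the highest probability
p = boltzmann_probs([1.0, 0.5, 2.0, 0.0])
print(p.round(3))
```

Enumerating all configurations is only feasible for tiny spaces. For 784 binary pixels there are $2^{784}$ terms in $Z$, which is why RBMs are trained with sampling-based approximations instead.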

### Restricted Boltzmann Machine

A Restricted Boltzmann Machine (RBM) is an energy-based model that admits an efficient training algorithm. To increase the expressive power of the model, we do not assume that the example $x$ is fully observed; instead, we introduce some unobserved (hidden) variables. Consider an observed part $x$ and a hidden part $h$. We can then write:
$$\Pr(x) = \sum_h \frac{\exp (-E(x, h))}{Z} \,,$$
where the partition function $Z = \sum_{x, h} \exp(-E(x, h))$ now sums over both the visible and the hidden configurations.

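In an RBM, the energy function is defined as
$$
E(x, h) = -a^\top x - b^\top h - x^\top W h \,,
$$
where $a$ and $b$ are the visible and hidden biases and $W_{ij}$ is the weight connecting visible unit $i$ to hidden unit $j$. In the standard RBM the units are binary, $x_i, h_j \in \{0, 1\}$.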
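To make the definitions concrete, here is a minimal NumPy sketch that evaluates the energy above and computes $\Pr(x)$ exactly for a toy RBM by enumerating every binary $(x, h)$. It assumes binary units and $W$ of shape (visible, hidden); the sizes and random parameters are illustrative only.

```python
import itertools
import numpy as np

def energy(x, h, a, b, W):
    """E(x, h) = -a^T x - b^T h - x^T W h, with W of shape (n_visible, n_hidden)."""
    return -a @ x - b @ h - x @ W @ h

def all_binary(n):
    """All 2**n binary vectors of length n, as float arrays."""
    return [np.array(bits, dtype=float) for bits in itertools.product([0, 1], repeat=n)]

# a toy RBM with random parameters (sizes chosen so brute force is cheap)
rng = np.random.default_rng(0)
nv, nh = 3, 2
a, b = rng.normal(size=nv), rng.normal(size=nh)
W = rng.normal(size=(nv, nh))

xs, hs = all_binary(nv), all_binary(nh)
# Z sums over every joint configuration (x, h)
Z = sum(np.exp(-energy(x, h, a, b, W)) for x in xs for h in hs)
# Pr(x) sums the joint distribution over the hidden configurations
p_x = np.array([sum(np.exp(-energy(x, h, a, b, W)) for h in hs) / Z for x in xs])
print(p_x.sum())
```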

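To write the marginal $\Pr(x)$ in the energy-based form $\Pr(x) = \exp(-F(x))/Z$, we define the **free energy** $F(x)$. Because $E$ is linear in each $h_j$, the sum over $h$ factorizes over the hidden units:
$$
\begin{align}
F(x) &= -\log \sum_h \exp (-E(x, h)) \\
     &= -a^\top x - \log \prod_j \sum_{h_j \in \{0, 1\}} \exp\big(h_j (W_j^\top x + b_j)\big) \\
     &= -a^\top x - \sum_j \log (1 + \exp(W_j^\top x + b_j))\,,
\end{align}
$$
where $W_j$ denotes the $j$-th column of $W$.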
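The closed form can be checked numerically against the definition $F(x) = -\log \sum_h \exp(-E(x, h))$. The sketch below assumes binary hidden units and $W$ of shape (visible, hidden), and uses `np.logaddexp` so that both sides are computed stably; the parameters are random and illustrative.

```python
import itertools
import numpy as np

def free_energy(x, a, b, W):
    """Closed form: F(x) = -a^T x - sum_j log(1 + exp(W[:, j]^T x + b_j))."""
    # np.logaddexp(0, z) computes log(1 + exp(z)) without overflow
    return -a @ x - np.logaddexp(0.0, x @ W + b).sum()

def free_energy_brute(x, a, b, W):
    """Definition: F(x) = -log sum_h exp(-E(x, h)), enumerating all binary h."""
    nh = W.shape[1]
    neg_energies = [a @ x + b @ h + x @ W @ h  # this is -E(x, h)
                    for h in (np.array(bits, dtype=float)
                              for bits in itertools.product([0, 1], repeat=nh))]
    return -np.logaddexp.reduce(neg_energies)

rng = np.random.default_rng(1)
a, b, W = rng.normal(size=4), rng.normal(size=3), rng.normal(size=(4, 3))
x = np.array([1.0, 0.0, 1.0, 1.0])
f_closed, f_brute = free_energy(x, a, b, W), free_energy_brute(x, a, b, W)
print(f_closed, f_brute)
```

The closed form costs one matrix-vector product, whereas the brute-force sum grows as $2^{n_{hid}}$.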

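Because the energy contains no visible-to-visible or hidden-to-hidden interaction terms (this is the "restriction"), the conditional distributions are tractable: given $x$ the hidden units are independent, and given $h$ the visible units are independent,
$$
\Pr (h|x) = \prod_j \Pr (h_j | x)\,, \qquad \Pr(h_j = 1 | x) = \sigma(W_j^\top x + b_j)\,,
$$
$$
\Pr (x|h) = \prod_i \Pr (x_i | h)\,, \qquad \Pr(x_i = 1 | h) = \sigma\Big(\sum_j W_{ij} h_j + a_i\Big)\,,
$$
where $\sigma(z) = 1/(1 + e^{-z})$ is the logistic sigmoid.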
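These factorized conditionals are what make block Gibbs sampling cheap: a whole layer is sampled in parallel from one matrix product. The minimal NumPy sketch below assumes binary units and $W$ of shape (visible, hidden); it checks $\Pr(h \mid x) = \prod_j \Pr(h_j \mid x)$ against brute-force enumeration on a toy model and then performs one Gibbs step $x \to h \to x'$.

```python
import itertools
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def p_h_given_x(x, b, W):
    """Pr(h_j = 1 | x) = sigmoid(b_j + W[:, j]^T x), one entry per hidden unit."""
    return sigmoid(b + x @ W)

def p_x_given_h(h, a, W):
    """Pr(x_i = 1 | h) = sigmoid(a_i + sum_j W[i, j] h_j), one entry per visible unit."""
    return sigmoid(a + W @ h)

def gibbs_step(x, a, b, W, rng):
    """One block Gibbs step x -> h -> x', sampling each layer in parallel."""
    h = (rng.random(b.shape) < p_h_given_x(x, b, W)).astype(float)
    x_new = (rng.random(a.shape) < p_x_given_h(h, a, W)).astype(float)
    return x_new, h

# check the factorization against brute force on a tiny random model
rng = np.random.default_rng(2)
a, b, W = rng.normal(size=3), rng.normal(size=2), rng.normal(size=(3, 2))
x = np.array([1.0, 0.0, 1.0])
hs = [np.array(bits, dtype=float) for bits in itertools.product([0, 1], repeat=2)]
unnorm = np.array([np.exp(b @ h + x @ W @ h) for h in hs])  # a^T x cancels in Pr(h | x)
brute = unnorm / unnorm.sum()
q = p_h_given_x(x, b, W)
factorized = np.array([np.prod(q**h * (1 - q)**(1 - h)) for h in hs])
print(np.allclose(brute, factorized))

x1, h1 = gibbs_step(x, a, b, W, rng)
```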

In [1]:
import os
print(os.environ['SHELL'])


/bin/bash


In [2]:
import matplotlib.pyplot as plt
from importlib import reload
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms
from torchvision.utils import make_grid

from rbm import RBM
import libs
from libs import train, show_and_save

Set the hyperparameters.

In [8]:
batch_size = 64 # batch size
n_epochs = 12 # number of epochs
lr = 0.001 # learning rate
n_hid = 128 # number of neurons in the hidden layer
n_vis = 784 # input size
k = 1 # number of contrastive divergence steps during training

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print('device:', device)

device: cuda


Create an RBM model.

In [5]:
# create a Restricted Boltzmann Machine
model = RBM(n_vis=n_vis, n_hid=n_hid, k=k).to(device)


Prepare the data set

In [6]:
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./output', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor()
                   ])),
    batch_size=batch_size,
    shuffle=True,  # reshuffle the training set every epoch
    num_workers=4,
    pin_memory=(device.type == "cuda"),  # device is a torch.device, not a string
)

Then train the model.
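The `train` helper from `libs` is not shown in this notebook, so the following is only a generic sketch of one contrastive divergence (CD-k) update in NumPy, not necessarily what `libs.train` does. CD-k approximates the log-likelihood gradient for $W$, $\langle x h^\top \rangle_{\text{data}} - \langle x h^\top \rangle_{\text{model}}$, by replacing the model expectation with statistics from a Gibbs chain run for only $k$ steps starting at the data. It assumes binary units and $W$ of shape (visible, hidden), and uses hidden probabilities rather than samples for the gradient statistics, a common variance-reduction choice.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd_k_update(x0, a, b, W, k=1, lr=1e-3, rng=None):
    """One CD-k update on a batch x0 of shape (batch, n_visible). Sketch, not libs.train.

    Assumes W has shape (n_visible, n_hidden), matching E(x, h) = -a^T x - b^T h - x^T W h.
    a, b, W are updated in place; returns the mean squared reconstruction error.
    """
    rng = np.random.default_rng() if rng is None else rng
    ph0 = sigmoid(b + x0 @ W)  # positive phase: Pr(h = 1 | data)
    xk = x0
    for _ in range(k):  # k steps of block Gibbs sampling started at the data
        h = (rng.random(ph0.shape) < sigmoid(b + xk @ W)).astype(float)
        xk = (rng.random(x0.shape) < sigmoid(a + h @ W.T)).astype(float)
    phk = sigmoid(b + xk @ W)  # negative phase statistics from the chain
    n = x0.shape[0]
    # approximate gradient ascent on the log-likelihood
    W += lr * (x0.T @ ph0 - xk.T @ phk) / n
    a += lr * (x0 - xk).mean(axis=0)
    b += lr * (ph0 - phk).mean(axis=0)
    return float(((x0 - xk) ** 2).mean())

# usage on toy binary data
rng = np.random.default_rng(0)
x_batch = (rng.random((16, 6)) < 0.5).astype(float)
a, b, W = np.zeros(6), np.zeros(4), 0.01 * rng.normal(size=(6, 4))
W_before = W.copy()
err = cd_k_update(x_batch, a, b, W, k=1, lr=0.1, rng=rng)
print(err)
```

The reconstruction error returned here is only a rough progress signal; it is not the quantity CD-k optimizes.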

In [9]:
epoch_loss = train(model, train_loader, device, n_epochs=n_epochs, lr=lr)

plt.plot(epoch_loss)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title(f'Training Loss (n_hid={n_hid}, n_epochs={n_epochs}, lr={lr}, k={k})')
custom_path = f'rbm_{n_hid}_{n_epochs}_{lr}_{k}'
os.makedirs('plots', exist_ok=True)  # savefig fails if the directory is missing
plt.savefig(os.path.join('plots', custom_path + '.png'))
plt.close()

# save the trained weights
os.makedirs('weights', exist_ok=True)
filepath = os.path.join('weights', custom_path + '.pt')
torch.save(model.state_dict(), filepath)


Now we can perform inpainting using the saved weights. We treat this task as conditional generation: at every step of Gibbs sampling (the same alternating $h \mid x$ and $x \mid h$ updates used in CD-k), we reset the observed pixels to their true values, so that only the missing pixels are resampled.
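A minimal NumPy sketch of this clamped Gibbs sampler is below. It is an illustration under stated assumptions, not the implementation inside `libs.test_for_inpaint`: it assumes binary units, $W$ of shape (visible, hidden), a boolean `observed` mask (here the top 14 rows of each 28x28 image, as in the test setup), and it starts the missing pixels from uniform noise. The function name `inpaint` is hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def inpaint(x, observed, a, b, W, k=1000, rng=None):
    """Hypothetical helper: fill unobserved entries of x by Gibbs sampling with observed entries clamped.

    x: (batch, n_visible) pixel values in [0, 1]; observed: boolean array of the same shape.
    Assumes W has shape (n_visible, n_hidden). Missing entries are returned as
    Pr(x_i = 1 | h) from the last Gibbs step.
    """
    rng = np.random.default_rng() if rng is None else rng
    v = x.copy()
    v[~observed] = rng.random(np.count_nonzero(~observed))  # random start for missing pixels
    p_v = v
    for _ in range(k):
        h = (rng.random((v.shape[0], b.size)) < sigmoid(b + v @ W)).astype(float)
        p_v = sigmoid(a + h @ W.T)
        v = (rng.random(v.shape) < p_v).astype(float)
        v[observed] = x[observed]  # clamp: reset observed pixels to their true values
    return np.where(observed, x, p_v)

# usage with a random toy model and random stand-in "images"
rng = np.random.default_rng(0)
nv, nh = 28 * 28, 16
a, b, W = np.zeros(nv), np.zeros(nh), 0.01 * rng.normal(size=(nv, nh))
x = (rng.random((2, nv)) < 0.2).astype(float)
top_half = np.zeros((28, 28), dtype=bool)
top_half[:14] = True  # the top 14 rows are observed
observed = np.tile(top_half.ravel(), (x.shape[0], 1))
x_hat = inpaint(x, observed, a, b, W, k=20, rng=rng)
```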

In [None]:
batch_size = 64 # batch size
n_hid = 128 # number of neurons in the hidden layer
n_vis = 784 # input size
k = 1000 # number of Gibbs sampling steps during inference

# this corresponds to the test set, where only the top half of each image
# is observed and the task is to predict the bottom half of the image;
# since this is grayscale, we will be using accuracy as the metric

test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./output', train=False, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor()
                   ])),
    batch_size=batch_size
)

# path of the weights saved by the training cell (trained with k=1 CD step),
# rebuilt here so this cell also runs if training was interrupted
filepath = os.path.join('weights', f'rbm_{n_hid}_{n_epochs}_{lr}_1.pt')

model = RBM(n_vis=n_vis, n_hid=n_hid, k=k)
model.load_state_dict(torch.load(filepath, map_location=device))
model.to(device)
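How `libs.test_for_inpaint` computes its accuracy is not shown here. One plausible definition, assumed purely for illustration, binarizes the pixels at 0.5 and scores only the unobserved (bottom-half) pixels; the function name below is hypothetical.

```python
import numpy as np

def masked_pixel_accuracy(x_true, x_pred, observed, threshold=0.5):
    """Hypothetical metric: fraction of unobserved pixels whose binarized prediction matches the binarized truth."""
    missing = ~observed
    t = x_true[missing] >= threshold
    p = x_pred[missing] >= threshold
    return float((t == p).mean())

# toy example: the first two pixels are observed, the last two must be predicted
x_true = np.array([[0.9, 0.1, 0.8, 0.2]])
x_pred = np.array([[0.0, 0.0, 0.7, 0.6]])
observed = np.array([[True, True, False, False]])
acc = masked_pixel_accuracy(x_true, x_pred, observed)
print(acc)  # 0.5: pixel 3 matches (both on), pixel 4 does not
```

Scoring only the missing pixels matters: including the clamped pixels would inflate the accuracy, since they are copied from the input.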

In [None]:
# the number of Gibbs steps k can be varied at test time
reload(libs)
acc = libs.test_for_inpaint(model, test_loader, device, random_values=True, k=k)

In [None]:
reload(libs)
acc = libs.test_for_inpaint(model, test_loader, device, random_values=False, k=k)

In [None]:
reload(libs)
# print('1', torch.cuda.memory_summary())
torch.cuda.empty_cache()
# print('2', torch.cuda.memory_summary())
acc = libs.test_for_inpaint(model, test_loader, device, random_values=True, k=50000, plot=True)
acc = libs.test_for_inpaint(model, test_loader, device, random_values=False, k=1000, plot=True)

The remaining cells are exploratory; feel free to ignore them for now :)

In [None]:
images = next(iter(train_loader))[0]
# the model lives on `device`, so move the batch there too
v, v_gibbs = model(images.view(-1, 784).to(device))

In [None]:
# show the real images
show_and_save(make_grid(v.view(-1, 1, 28, 28).detach().cpu()), 'output/real')

In [None]:
# show the generated images
show_and_save(make_grid(v_gibbs.view(-1, 1, 28, 28).detach().cpu()), 'output/fake')

How a single image is factorized through the hidden units.
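Since $\Pr(x_i = 1 \mid h) = \sigma(a_i + \sum_j W_{ij} h_j)$, the pre-activation of a reconstruction is a sum of the filters $W_{:,j}$ weighted by the hidden units. The NumPy sketch below, with random stand-in parameters, ranks hidden units by $\Pr(h_j = 1 \mid x)$ and forms a deterministic reconstruction by replacing each $h_j$ with its probability (a mean-field shortcut chosen here for illustration). It assumes $W$ of shape (visible, hidden) as in the energy; the notebook's `model.W` appears to be stored transposed (it is passed to `F.linear` and indexed per hidden unit), so `model.W[j]` would correspond to `W[:, j]` here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def top_hidden_units(x, b, W, n=4):
    """Indices and Pr(h_j = 1 | x) of the n most active hidden units, most active first."""
    p_h = sigmoid(b + x @ W)
    order = np.argsort(p_h)[::-1][:n]
    return order, p_h[order]

def mean_field_reconstruction(x, a, b, W):
    """sigmoid(a + sum_j W[:, j] * Pr(h_j = 1 | x)): filters weighted by their activation probabilities."""
    return sigmoid(a + W @ sigmoid(b + x @ W))

# usage with random stand-in parameters (assumes W has shape (n_visible, n_hidden))
rng = np.random.default_rng(0)
a, b, W = np.zeros(784), np.zeros(8), 0.1 * rng.normal(size=(784, 8))
x = (rng.random(784) < 0.2).astype(float)   # stand-in for a binarized digit
idx, probs = top_hidden_units(x, b, W, n=4)
filters = W[:, idx].T.reshape(-1, 28, 28)   # each selected filter viewed as a 28x28 image
recon = mean_field_reconstruction(x, a, b, W)
```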

In [None]:
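n_sample = 4  # number of hidden units to display
kth = 18      # index of the image within the batch
d = images[kth:kth+1]

with torch.no_grad():
    # activation probabilities of the hidden units for this image
    # (assuming model.h holds the hidden biases)
    V = torch.sigmoid(F.linear(d.view(1, -1).to(device), model.W, model.h))
    _, order = torch.sort(V.view(-1))

fig, ax = plt.subplots(1, n_sample + 1, figsize=(3*(1 + n_sample), 3))
ax[0].imshow(d.view(28, 28).numpy(), cmap='gray')
ax[0].set_title('Original image')

# show the filters of the n_sample most active hidden units
# (loop variable renamed from k so it does not overwrite the global k)
for j, i in enumerate(order[-n_sample:].tolist()):
    f = model.W[i].view(28, 28).detach().cpu().numpy()
    ax[j + 1].imshow(f, cmap='gray')
    ax[j + 1].set_title('p=%.2f' % V[0][i].item())

plt.savefig('output/factor.png', dpi=200)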