In [None]:
import numpy as np
import matplotlib.pyplot as plt

In this tutorial we will explain how to integrate deep learning models into your Brancher pipeline.

## Building neural network models

All `pytorch` functions can be used in Brancher. They are imported from the Brancher `functions` module (`brancher.functions`), which exposes the `pytorch` functions acting on `torch.Tensor` as Brancher functions acting on `brancher.Variable`.
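The idea of a function that acts on random variables rather than on tensors can be illustrated with a conceptual toy in plain NumPy. This is only an analogy, not how Brancher is implemented: the `ToyVariable` class and the `lift` helper are hypothetical. `lift` turns an ordinary array function into one that takes variables and returns a new variable, and the wrapped function only runs when a sample is drawn.

```python
import numpy as np

class ToyVariable:
    """Hypothetical stand-in for a random variable: it only knows how to draw samples."""
    def __init__(self, sampler):
        self.sampler = sampler  # callable: rng -> np.ndarray

    def sample(self, rng):
        return self.sampler(rng)

def lift(fn):
    """Hypothetical helper: turn a function on arrays into a function on ToyVariables."""
    def lifted(*variables):
        # The result is a new variable; fn is applied to fresh parent samples on each draw.
        return ToyVariable(lambda rng: fn(*(v.sample(rng) for v in variables)))
    return lifted

relu = lift(lambda a: np.maximum(a, 0.0))

x = ToyVariable(lambda rng: rng.normal(size=(2, 3)))  # a Gaussian "input" variable
z = relu(x)                                           # z is a variable, not an array
sample = z.sample(np.random.default_rng(0))           # an array is produced only here
```

Note that this toy re-draws a parent every time it is used, so it ignores shared ancestors, which a real joint sampler has to handle.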

We will define a stochastic convolutional network on the MNIST dataset. The first step is to import the dataset:

In [None]:
import torchvision

# Create the data
image_size  = 28
num_classes = 10

train = torchvision.datasets.MNIST(root='./data', train=True,  download=True, transform=None)
test  = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=None)

dataset_size   = len(train)
# `.data` and `.targets` replace the deprecated `.train_data` / `.train_labels` attributes
input_variable = np.reshape(train.data.numpy(), (dataset_size, 1, image_size, image_size))
output_labels  = train.targets.numpy()
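The reshape above only inserts a channel axis, turning the `(N, 28, 28)` image array into the `(N, 1, 28, 28)` layout that 2D convolutions expect. A minimal sketch on synthetic data, assuming only NumPy (the random array stands in for the MNIST images):

```python
import numpy as np

# Synthetic stand-in for train.data.numpy(): 8 grayscale images of 28x28 pixels (uint8).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(8, 28, 28)).astype(np.uint8)

# Add an explicit channel axis: (N, H, W) -> (N, C=1, H, W).
images_nchw = np.reshape(images, (images.shape[0], 1, 28, 28))

# Indexing with np.newaxis gives the same array without spelling out the sizes.
same = images[:, np.newaxis, :, :]
```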

In a Brancher model, datasets are stored in empirical variables. These are random variables that sample minibatches from a dataset.

In a supervised problem we need two empirical variables, one for the input images and one for the labels. However, the two must be sampled jointly, since each image has to stay paired with its correct label. In Brancher, we can implement this by creating a `RandomIndices` variable and passing it as the `indices` argument of both empirical variables:

In [None]:
from brancher.standard_variables import EmpiricalVariable as Empirical
from brancher.standard_variables import RandomIndices

# Data sampling model
minibatch_size = 7

minibatch_indices = RandomIndices( dataset_size=dataset_size, batch_size=minibatch_size, 
                                   name="indices", is_observed=True )

x = Empirical( input_variable, indices=minibatch_indices, 
               name="x", is_observed=True )

labels = Empirical( output_labels, indices=minibatch_indices, 
                    name="labels", is_observed=True )
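The role of the shared `RandomIndices` variable can be illustrated in plain NumPy. This is a sketch of the idea, not Brancher's implementation: the hypothetical `sample_minibatch` helper draws a single index array and uses it to slice both the images and the labels, so every image stays paired with its label. Indexing the two arrays with independently drawn indices would break the pairing. Sampling without replacement here is a choice of the sketch, not a claim about `RandomIndices`.

```python
import numpy as np

# Toy dataset: each label is recoverable from its "image" content,
# so we can check that sampled pairs stay aligned.
dataset_size = 100
images = np.arange(dataset_size, dtype=np.float32).reshape(dataset_size, 1, 1, 1)
labels = np.arange(dataset_size) % 10

def sample_minibatch(rng, batch_size):
    """Hypothetical helper: draw ONE index array and use it for both arrays."""
    idx = rng.choice(dataset_size, size=batch_size, replace=False)
    return images[idx], labels[idx]

rng = np.random.default_rng(42)
x_batch, y_batch = sample_minibatch(rng, batch_size=7)
```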

As a next step we import the `pytorch` function `conv2d` from `brancher.functions` and use it to build a stochastic 2D-convolutional layer with Gaussian weights. 

In order to do this we define the weights as a `NormalVariable`. We then use the Brancher `conv2d` and `relu` functions on the weights `Wk` and images `x`, and wrap the result of the layer `z` inside a `DeterministicVariable`.

In [None]:
from brancher import functions as BF

from brancher.standard_variables import DeterministicVariable as Deterministic
from brancher.standard_variables import NormalVariable as Normal
from brancher.standard_variables import CategoricalVariable as Categorical

in_channels  = 1
out_channels = 5
image_size   = 28

# Define Gaussian convolutional kernels:
Wk = Normal( loc=np.zeros((out_channels, in_channels, 3, 3)),
             scale=np.ones((out_channels, in_channels, 3, 3)),
             name="Wk")

# Define output: 
z = Deterministic( BF.relu(BF.conv2d(x, Wk, padding=1)), name="z" )
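What one forward sample of this layer computes can be sketched in NumPy, assuming the semantics of PyTorch's `conv2d` (a cross-correlation) with stride 1 and `padding=1`. The helper `conv2d_same` is hypothetical and deliberately naive; it is meant for reading, not speed. Each call draws one set of kernel weights from the same Normal(0, 1) distribution used for `Wk`, so a new draw gives a different layer.

```python
import numpy as np

def conv2d_same(x, w):
    """Naive 2D cross-correlation with stride 1 and padding 1 (3x3 kernels).

    x: (batch, in_channels, H, W); w: (out_channels, in_channels, 3, 3)
    """
    batch, _, height, width = x.shape
    out_channels = w.shape[0]
    x_pad = np.pad(x, ((0, 0), (0, 0), (1, 1), (1, 1)))  # zero padding of 1 pixel
    out = np.zeros((batch, out_channels, height, width))
    for i in range(height):
        for j in range(width):
            patch = x_pad[:, :, i:i + 3, j:j + 3]          # (batch, in_channels, 3, 3)
            out[:, :, i, j] = np.einsum("bcij,ocij->bo", patch, w)
    return out

rng = np.random.default_rng(0)
x = rng.random((2, 1, 28, 28))                         # stand-in minibatch of 2 images
Wk = rng.normal(loc=0.0, scale=1.0, size=(5, 1, 3, 3))  # one draw of the Gaussian kernels
z = np.maximum(conv2d_same(x, Wk), 0.0)                # relu(conv2d(x, Wk, padding=1))
```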

The randomly sampled input minibatch `x` is convolved with the random convolutional filter weights `Wk`. We can now run the forward pass by sampling from the model:

In [None]:
num_samples = 6
z.get_sample(num_samples)["z"]

Note that in each of these samples both the input and the weights are sampled independently.

We can now add a linear layer to this result to get a shallow convolutional classifier. We can do this by again defining random parameter variables as follows: 

In [None]:
num_classes = 10
Wl = Normal( loc   = np.zeros((num_classes, image_size*image_size*out_channels)),
             scale = np.ones((num_classes, image_size*image_size*out_channels)),
             name  = "Wl")

b  = Normal( loc   = np.zeros((num_classes, 1)),
             scale = np.ones((num_classes, 1)),
             name  = "b")

reshaped_z = BF.reshape(z, shape=(image_size*image_size*out_channels, 1))

k  = Categorical( logits = BF.linear(reshaped_z, Wl, b), 
                  name="k" )

We had to reshape (flatten) the variable `z` to use it as input to the linear layer. Note that in Brancher you never need to explicitly handle the batch dimension: batch properties are part of the data, not of the model, and Brancher handles them automatically!
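The shape bookkeeping for a single (unbatched) sample can be checked in NumPy. This sketch assumes the linear layer computes `Wl @ z + b`, the only orientation consistent with the parameter shapes declared above; it is an illustration of the arithmetic, not Brancher's `linear` code. Flattening `5 x 28 x 28` gives `3920` features, and the logits come out as a `(10, 1)` column, one entry per class.

```python
import numpy as np

image_size, out_channels, num_classes = 28, 5, 10
rng = np.random.default_rng(1)

z = rng.random((out_channels, image_size, image_size))             # one sample of z
reshaped_z = z.reshape(image_size * image_size * out_channels, 1)  # column vector (3920, 1)

# One draw of the Gaussian parameters, shaped like Wl and b in the model.
Wl = rng.normal(size=(num_classes, image_size * image_size * out_channels))
b = rng.normal(size=(num_classes, 1))

logits = Wl @ reshaped_z + b   # (10, 3920) @ (3920, 1) + (10, 1) -> (10, 1)
```

The batch axis never appears in these shapes, mirroring the point that batching is a property of the data rather than of the model.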

### Observing the model and training the weights

Now that the model is defined, we need to specify which variables are observed. The input image variable `x` was already declared as observed when it was defined. The other variable to observe is the output `k`, which must be tied to the real labels. To do this, we simply call the `.observe` method of `k` with the label `EmpiricalVariable` as input.

In [None]:
k.observe(labels)

Done! We are now ready to learn the weights. If we are not concerned with quantifying uncertainty, we can train using maximum a posteriori (MAP) estimation.

In [None]:
from brancher.inference import MAP
from brancher.inference import perform_inference
from brancher.variables import ProbabilisticModel

convolutional_model = ProbabilisticModel([k])

perform_inference( convolutional_model,
                   inference_method=MAP(),
                   number_iterations=600,
                   optimizer="Adam",
                   lr=0.005 )
loss_list = convolutional_model.diagnostics["loss curve"]
plt.plot(loss_list)
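Conceptually, MAP minimizes the negative log posterior: the categorical negative log-likelihood of the labels plus the negative log prior of the weights. For independent Normal(0, 1) priors like the ones above, the negative log prior is `0.5 * sum(w**2)` plus a constant, i.e. an L2 penalty. The sketch below spells out that objective in NumPy; `map_loss` is a hypothetical helper, and how Brancher scales or averages these terms internally is not assumed here.

```python
import numpy as np

def log_softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def map_loss(logits, labels, weights):
    """Negative log posterior (up to a constant) for a categorical likelihood
    and an independent Normal(0, 1) prior on every weight."""
    nll = -log_softmax(logits)[np.arange(len(labels)), labels].sum()
    neg_log_prior = 0.5 * np.sum(weights ** 2)  # drops the constant 0.5*log(2*pi) per weight
    return nll + neg_log_prior

logits = np.zeros((4, 10))         # uniform predictions over 10 classes
labels = np.array([0, 1, 2, 3])
loss_at_zero = map_loss(logits, labels, np.zeros(5))  # likelihood term only: 4 * log(10)
loss_at_one = map_loss(logits, labels, np.ones(5))    # adds a prior penalty of 0.5 * 5
```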

We can now run the model on the test set by sampling from the trained posterior distribution. To do this, we call the `.get_posterior_sample` method, since `.get_sample` would return a sample from the (untrained) prior. We also need to provide the test images as input to this method:

In [None]:
test_size = len(test)
test_images = np.reshape(test.data.numpy(), (test_size, 1, image_size, image_size))

posterior_samples = convolutional_model.get_posterior_sample(num_samples = 1, 
                                                             input_values = {x: test_images[0:4,:]})

The output is one-hot encoded. Let's get the predicted labels:

In [None]:
np.argmax(posterior_samples["k"][0], axis=1)
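Taking `argmax` along the class axis recovers the index of the single 1 in each one-hot row. With several posterior draws, averaging the one-hot samples gives per-image class frequencies, and their argmax is a majority vote. The sketch below uses made-up one-hot arrays; stacking draws along a leading axis is an assumption of the sketch, so check the shape that Brancher actually returns before reusing it.

```python
import numpy as np

# A made-up batch of 4 one-hot samples over 10 classes, shape (4, 10).
one_hot = np.zeros((4, 10))
one_hot[np.arange(4), [7, 2, 1, 0]] = 1.0

predicted = np.argmax(one_hot, axis=1)   # index of the 1 in each row

# Several draws stacked along a leading axis: (num_samples, batch, classes).
draws = np.stack([one_hot, one_hot, np.roll(one_hot, 1, axis=1)])
frequencies = draws.mean(axis=0)         # per-image class frequencies across draws
majority = np.argmax(frequencies, axis=1)
```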

## Using existing PyTorch models in a Brancher model

We have seen how we can construct Brancher models with neural network building blocks. However, there are situations where we want to specify a `pytorch` model and use it within Brancher. In this case we can wrap the `pytorch` model using the `brancher.functions.BrancherFunction` class. 

Let's start by defining a `pytorch` neural network:

In [None]:
import torch

class PytorchNetwork(torch.nn.Module):
    def __init__(self):
        super().__init__()
        out_channels = 5
        image_size = 28
        self.l1 = torch.nn.Conv2d(in_channels=1, out_channels=out_channels, kernel_size=3, padding=1)
        self.f1 = torch.nn.ReLU()
        self.l2 = torch.nn.Linear(in_features=image_size ** 2 * out_channels, out_features=10)

    # Override `forward` (not `__call__`) so torch.nn.Module's call machinery, e.g. hooks, still runs
    def forward(self, x):
        h = self.f1(self.l1(x))
        h = h.view(h.shape[0], -1)  # flatten everything except the batch dimension
        logits = self.l2(h)
        return logits
    
network = PytorchNetwork()
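The `in_features` of the linear layer follows from the convolution output-size formula given in the PyTorch `Conv2d` documentation: with a 3x3 kernel, `padding=1` and stride 1, the 28x28 spatial size is preserved, so the flattened feature map has `28 * 28 * 5 = 3920` entries. The helper `conv_output_size` below is a hypothetical name for that formula.

```python
def conv_output_size(size, kernel_size, padding=0, stride=1, dilation=1):
    """Spatial output size of a convolution along one axis."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

out_channels = 5
side = conv_output_size(28, kernel_size=3, padding=1)  # padding=1 keeps the 28x28 size
in_features = out_channels * side * side               # equals image_size ** 2 * out_channels
```

Without padding the same kernel would shrink the image to 26x26, and the linear layer would need a different `in_features`.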

We can convert it into a Brancher function via `BrancherFunction`:

In [None]:
## Equivalent Brancher model ##
brancher_network = BF.BrancherFunction(network)

Note, however, that here we **do not** create latent variables. Instead, we treat the network as a black-box **function** with learnable parameters, which are trained with standard `pytorch` optimization. As a consequence, we cannot apply Bayesian inference methods to the weights; instead, the parameters of this black-box function are learned by maximum likelihood.
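The difference between the two approaches can be made concrete with a toy contrast in NumPy (nothing here is Brancher code; both functions are hypothetical). With fixed parameters, the network is a deterministic function and the only thing left to fit is a point estimate, which is what maximum likelihood provides. With random weights, every sample re-draws the weights, so there is a whole distribution over weights to infer.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((4, 3920))                 # a flattened stand-in minibatch

# Black-box function: the parameters are fixed arrays (a point estimate).
W_fixed = rng.normal(size=(10, 3920))
def deterministic_logits(x):
    return x @ W_fixed.T

# Stochastic layer: the weights are a random variable, re-drawn on every sample.
def stochastic_logits(x, rng):
    W = rng.normal(size=(10, 3920))       # one draw from a Normal(0, 1) prior
    return x @ W.T

same = np.allclose(deterministic_logits(x), deterministic_logits(x))
differ = not np.allclose(stochastic_logits(x, rng), stochastic_logits(x, rng))
```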

We can construct a full Brancher model as follows: 

In [None]:
# Data sampling model #
minibatch_size = 4
minibatch_indices = RandomIndices( dataset_size=dataset_size, batch_size=minibatch_size,
                                   name="indices", is_observed=True )
x = Empirical( input_variable, indices=minibatch_indices,
               name="x", is_observed=True )
labels = Empirical( output_labels, indices=minibatch_indices,
                    name="labels", is_observed=True )

# Forward model #
k = Categorical( logits=brancher_network(x),
                 name="k" )
k.observe(labels)

We can now train using `MaximumLikelihood`:

In [None]:
from brancher.inference import MaximumLikelihood
from brancher.inference import perform_inference
from brancher.variables import ProbabilisticModel

convolutional_model = ProbabilisticModel([k])

perform_inference( convolutional_model,
                   inference_method=MaximumLikelihood(),
                   number_iterations=500,
                   optimizer="Adam",
                   lr=0.001 )
loss_list = convolutional_model.diagnostics["loss curve"]
plt.plot(loss_list)

Note that `MaximumLikelihood` trains the probabilistic model itself (the prior, in Bayesian terms) and not the posterior model. Therefore, we test the model by calling the `.get_sample` method instead of `.get_posterior_sample` (the posterior is not even defined in this model, since it has no latent variables):

In [None]:
test_size = len(test)
test_images = np.reshape(test.data.numpy(), (test_size, 1, image_size, image_size))

np.argmax(convolutional_model.get_sample(1, input_values= {x: test_images[0:4,:]})["k"][0], axis=1)