# LeNet5 NN for image classfication on MNIST

### Setup:

- Vi har trænet et LeNet5 netværk til at classificere MNIST cifre, lad os kalde det for NN.

$$ NN:~\mathbb{R}^{784} \rightarrow \mathbb{R}^{10} $$

- Til at lave predictions bruger vi softmax, $\pi(x)$, på outputtet af netværket og får dermed en sandsynlighedsvektor ud.

Dermed bliver 

$$ P(y_n|\theta) = Cat(y_n|\pi(NN(x_n))) = \prod_{i=1}^{10} \pi_{i}(f(x_n))^{y_{n,i}} \in [0,1] $$

Og

$$ P(y|NN) = \prod_{n=1}^N Cat(y_n|\pi (NN(x_n))) $$

Vi ønsker at lave BBVI i dette setup, men på et underrum da vi antager at antal parametre/vægte er for mange til at dette kan beregnes på rimelig tid.

Vi ønsker at bruge følgende variational family (mean field gaussians):

$$ Q =\{ N(m,V)|m\in \mathbb{R}^K, V\in \mathbb{M}^{Diag~KxK},~v_i>0\} $$

ELBO ser da således ud:

$$ L(q) = \mathbb{E}_{q(z)}[log~P(y,z)] - \mathbb{E}_{q(z)}[log~q(z)], z\in\mathbb{R}^K $$

Sidste led (entropien) kan regnes analytisk (Vist i aflevering 3):

$$ H(q) = - \mathbb{E}_{q(z)}[log~q(z)] = \frac{1}{2} \sum_{i=1}^K log(2e\pi v_i) $$

Første led regnes med MC estimater af $z^s$

$$ \mathbb{E}_{q(z)[log~P(y,z)]} \approx \frac{1}{S} \sum_{s=1}^S \sum_{n=1}^N log~P(y_n|z^s) + \frac{1}{S}\sum_{s=1}^S log~P(z^s), $$

hvor 

$$ P(y_n|z^s) \sim Cat(y_n| \pi(NN_{\theta^s}(x_n))), ~~\theta^s=\theta_{MAP} + Pz $$

og $$ NN_{\theta^s} $$ angiver det neurale netværk med $\theta^s$ som vægte.

In [1]:
from utils import AdamOptimizer, VariationalInference, BlackBoxVariationalInference
import autograd.numpy as np
import torch 
from torchvision.datasets import MNIST
from models.LeNet5 import LeNet

# load the MNIST dataset
mnist_train = MNIST('./datasets', train=True, download=True)
mnist_test = MNIST('./datasets', train=False, download=True)

# load the data
xtrain = mnist_train.train_data
ytrain = mnist_train.train_labels
xtest = mnist_test.test_data
ytest = mnist_test.test_labels

# normalize the data
xtrain = xtrain.float()/255
xtest = xtest.float()/255

#insert a channel dimension
xtrain = xtrain.unsqueeze(1)
xtest = xtest.unsqueeze(1)

#print shapes
print(xtrain.shape)

torch.Size([60000, 1, 28, 28])




In [4]:
log_npdf = lambda x, m, v: -(x-m)**2/(2*v) - 0.5*np.log(2*np.pi*v)
log_mvnpdf = lambda x, m, v: -0.5*np.sum((x-m)**2/v + np.log(2*np.pi*v), axis=1)
softmax = lambda x: np.exp(x) / np.sum(np.exp(x), axis=1)[:, None]

def new_weights_in_NN(model, new_weight_vector):
    current_index = 0
    # Iterate over each parameter in the model
    for param in model.parameters():
        num_params = param.numel() # number of elements in the tensor
        new_weights = new_weight_vector[current_index:current_index + num_params].view_as(param.data) # reshape the new weights to the shape of the parameter tensor
        param.data.copy_(new_weights) # copy the new weights to the parameter tensor
        current_index += num_params # update the current index

    return model

def log_prior_pdf(z, prior_mean, prior_var):
    """ Evaluates the log prior Gaussian for each sample of z. 
        D denote the dimensionality of the model and S denotes the number of MC samples.

        Inputs:
            z             -- np.array of shape (S, 2*K)
            prior_mean    -- np.array of shape (S, K)
            prior_var     -- np.array of shape (S, K)

        Returns:
            log_prior     -- np.array of shape (1,)???
       """
    log_prior = np.sum(log_npdf(z, prior_mean, prior_var), axis=1)
    return log_prior

def log_like_NN_classification(X, y, theta_s):
    """
    Implements the log likelihood function for the classification NN with categorical likelihood.
    S is number of MC samples, N is number of datapoints in likelihood and D is the dimensionality of the model (number of weights).

    Inputs:
    X              -- Data (np.array of size N x D)
    y              -- vector of target (np.array of size N)
    theta_s        -- vector of weights (np.array of size (S, D))

    outputs: 
    log_likelihood -- Array of log likelihood for each sample in z (np.array of size S)
     """
    S = theta_s.shape[0]
    #net = LeNet()
    log_likelihood = 0
    for i in range(S):
        print("Shape of theta_s[i]:", theta_s[i].shape)
        net_s = new_weights_in_NN(net, torch.tensor(theta_s[i]).float())
        outputs = softmax(net_s(X).detach().numpy())
        #categorical log likelihood
        log_likelihood += np.sum(np.log(outputs[np.arange(len(y)), y]))
    
    return log_likelihood / S

#class BlackBoxVariationalInference(VariationalInference):
#    def __init__(self, theta_map, P, log_prior, log_lik, num_params, step_size=1e-2, max_itt=2000, num_samples=20, batch_size=None, seed=0, verbose=False):



In [7]:
net = LeNet()
#load weights
weights = torch.load('checkpoints\LeNet5_acc_95.12%.pth')
theta_map = torch.cat([w.flatten() for w in weights.values()]) # flatten the weights
theta_map = theta_map.detach().numpy()

# settings
num_params = sum(p.numel() for p in net.parameters())
print('Number of parameters:', num_params)
K = 10
P = torch.randn(num_params, K).numpy() # random matrix from normal distribution
max_itt = 1
step_size = 5e-2
num_samples = 20
batch_size = 100
seed = 0
verbose = True

bbvi = BlackBoxVariationalInference(theta_map, P, log_prior_pdf, log_like_NN_classification, K, step_size, max_itt, num_samples, batch_size, seed, verbose)
bbvi.fit(xtrain, ytrain)


Number of parameters: 61706
Fitting approximation using Black-box VI
shapes: (20, 10) (10, 61706) (61706,)
Shape of theta_s[i]: (61706,)
Shape of theta_s[i]: (61706,)
Shape of theta_s[i]: (61706,)
Shape of theta_s[i]: (61706,)
Shape of theta_s[i]: (61706,)
Shape of theta_s[i]: (61706,)
Shape of theta_s[i]: (61706,)
Shape of theta_s[i]: (61706,)
Shape of theta_s[i]: (61706,)
Shape of theta_s[i]: (61706,)
Shape of theta_s[i]: (61706,)
Shape of theta_s[i]: (61706,)
Shape of theta_s[i]: (61706,)
Shape of theta_s[i]: (61706,)
Shape of theta_s[i]: (61706,)
Shape of theta_s[i]: (61706,)
Shape of theta_s[i]: (61706,)
Shape of theta_s[i]: (61706,)
Shape of theta_s[i]: (61706,)
Shape of theta_s[i]: (61706,)
shapes: (20, 10) (10, 61706) (61706,)
Shape of theta_s[i]: (61706,)


  softmax = lambda x: np.exp(x) / np.sum(np.exp(x), axis=1)[:, None]


TypeError: object of type 'numpy.float64' has no len()