<h1><center><span style='color:blue'>MNIST: Full network with patch-centering</span></center></h1>

Here we try to replicate the experiments described in Mairal et al.'s 2014 paper on Convolutional Kernel Networks.

The convolutional network is a __CKN__ object comprising $K$ layers (or cells, since each cell is trained independently of the others). Each cell is a __Cell__ object, parameterized by a number of filters (the more filters, the better the linear approximation of the kernel).
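
To make the linear approximation concrete: in the paper's formulation, the Gaussian kernel between two patches is approximated by a finite sum over filters, $e^{-\|x-x'\|^2/(2\sigma^2)} \approx \sum_l \eta_l \, e^{-\|x-w_l\|^2/\sigma^2} \, e^{-\|x'-w_l\|^2/\sigma^2}$. The NumPy sketch below evaluates both sides. It is an illustration only, not the `Cell` implementation: the names `sq_dists`, `gaussian_kernel`, `ckn_features` and `approximation_error` are hypothetical, and the filters and weights are random rather than trained.

```python
import numpy as np

def sq_dists(A, B):
    """Pairwise squared Euclidean distances between rows of A and rows of B."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)

def gaussian_kernel(X, Y, sigma):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    return np.exp(-sq_dists(X, Y) / (2.0 * sigma ** 2))

def ckn_features(X, W, eta, sigma):
    """Finite-dimensional map with entries sqrt(eta_l) * exp(-||x - w_l||^2 / sigma^2)."""
    return np.sqrt(eta)[None, :] * np.exp(-sq_dists(X, W) / sigma ** 2)

def approximation_error(X, Y, W, eta, sigma):
    """Mean squared gap between the exact kernel and its linear approximation."""
    approx = ckn_features(X, W, eta, sigma).dot(ckn_features(Y, W, eta, sigma).T)
    return np.mean((gaussian_kernel(X, Y, sigma) - approx) ** 2)

rng = np.random.RandomState(0)
X = rng.normal(size=(20, 25))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm 5x5 patches
W = rng.normal(size=(50, 25))                   # 50 untrained filters (assumption)
W /= np.linalg.norm(W, axis=1, keepdims=True)
eta = np.full(50, 1.0 / 50)                     # untrained, non-negative weights
print(approximation_error(X, X, W, eta, sigma=1.0))
```

Training a cell amounts to choosing `W` and `eta` so that this error becomes small on sampled pairs of patches; adding filters gives the sum more terms to work with.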

When training the network, each cell is trained as follows:

__INPUT__: input_map: the input data (for MNIST, for instance, contrast-normalized(?) image patches)
+ (1) extract patches from the input map (by concatenating neighboring pixel inputs). Normalize the patches so that they all have unit norm. (Keep the original norms in memory, in Cell.norms.)
+ (2) apply dimensionality reduction to these large patches (RobustScaler+PCA), so that the whole thing remains computable
+ (3) initialize the new filters with KMeans
+ (4) train W and eta (the parameters of the cell) on a subset of the data. The function convergence_check() lets us check how well the kernel is being approximated.
+ (5) call get_activation_map() to compute the activation of each patch

__OUTPUT__: activation map of patches centered at each pixel with respect to the filter W
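
Step (1) can be sketched in NumPy as follows. This is a minimal illustration, not the code in `image_processing_utils`: the function names `extract_patches` and `normalize_patches` are hypothetical, zero padding at the image border is an assumption, and the `(n_patches, n_images, patch_dim)` layout is chosen to match the `torch.Size([784, 3000, 25])` reported in the training log.

```python
import numpy as np

def extract_patches(images, size_patch):
    """Return one flattened patch per pixel, shape (h*w, n_images, size_patch**2).

    Assumption: images are zero-padded so that every pixel gets a patch.
    """
    n, h, w = images.shape
    pad = size_patch // 2
    padded = np.pad(images, ((0, 0), (pad, pad), (pad, pad)), mode="constant")
    patches = np.empty((h * w, n, size_patch ** 2), dtype=images.dtype)
    for i in range(h):
        for j in range(w):
            window = padded[:, i:i + size_patch, j:j + size_patch]
            patches[i * w + j] = window.reshape(n, -1)
    return patches

def normalize_patches(patches, eps=1e-5):
    """Scale every patch to unit L2 norm and return the original norms as well."""
    norms = np.linalg.norm(patches, axis=2)
    unit = patches / np.maximum(norms, eps)[:, :, None]  # eps guards all-zero patches
    return unit, norms

rng = np.random.RandomState(0)
images = rng.rand(3, 28, 28)            # stand-in for three MNIST digits
patches = extract_patches(images, 5)
unit, norms = normalize_patches(patches)
print(patches.shape, norms.shape)
```

Returning `norms` alongside the normalized patches mirrors the note in step (1) that the original norms are kept for later use.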

From there, to compute the activation map of any new image, we simply have to:
+ (1) extract the patches from the image
+ (2) call propagate_through_network(X). At each layer, this function applies the kernel to compute similarities between patches.

Here, we first center the MNIST data before applying the filters.

In [1]:
%matplotlib inline
from __future__ import print_function
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable
from CKN import *
from Nystrom import *
from image_processing_utils import *
import scipy.optimize

In [2]:
batch_size=64
test_batch_size=1000

train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False, transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=test_batch_size, shuffle=True)

In [3]:
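import random

# Reproducible random permutation of the 60000 training indices.
inds=list(range(60000))  # a list, so that random.shuffle can permute it in place
random.seed(2017)
random.shuffle(inds)

# Despite its name, `test` holds the first 3000 shuffled *training* images.
test=train_loader.dataset.train_data[inds[:3000],:,:]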

In [4]:
X_labels=train_loader.dataset.train_labels[torch.LongTensor(inds[:3000])]

In [5]:
X_labels


 0
 9
 6
⋮ 
 1
 1
 5
[torch.LongTensor of size 3000]

In [6]:
import seaborn as sb
import math
import numpy as np
import matplotlib.pyplot as plt  # `plt` is used by the plotting cells below


In [7]:
data=test
data=1.0/255*data.type(torch.FloatTensor)
#data2=data-torch.mean(data.view(-1,data.size()[1]*data.size()[2]),0).view(28,28)
#S=torch.std(data2.view(-1,data.size()[1]*data.size()[2]),0)
#S[S==0]=1
#data2=1.0/255*data2.view(-1,data.size()[1]*data.size()[2])
#data2=data2.view(-1,28,28)


In [8]:
### contrast normalize
n_d,p_dim,_=data.size()
#norm=torch.max(torch.sum(data.view(n_d,-1)**2,1),torch.Tensor([0.00001]))
#data=torch.div(data.view(n_d,-1),norm.view(n_d,1).expand(n_d,p_dim**2))
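
The normalization in the cell above is commented out, so `data` is passed on unnormalized. For reference, here is a minimal NumPy sketch of a per-image normalization. It divides each image by its L2 norm (the square root of the sum of squares), whereas the commented-out lines divide by the sum of squares itself; which of the two is intended is an assumption on our part, and the name `contrast_normalize` is hypothetical.

```python
import numpy as np

def contrast_normalize(images, eps=1e-5):
    """Divide each image by its L2 norm, floored at eps to avoid dividing by zero."""
    n = images.shape[0]
    flat = images.reshape(n, -1)
    norms = np.sqrt((flat ** 2).sum(axis=1))
    flat = flat / np.maximum(norms, eps)[:, None]
    return flat.reshape(images.shape)

rng = np.random.RandomState(0)
imgs = rng.rand(4, 28, 28)
imgs[3] = 0.0                      # an all-black image must not produce NaNs
normalized = contrast_normalize(imgs)
print(np.sqrt((normalized.reshape(4, -1) ** 2).sum(axis=1)))
```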

In [9]:
data=data.view(n_d,p_dim,p_dim)

In [10]:
size_patch=5
# Two-layer network; the first layer works on size_patch x size_patch patches.
net=CKN(n_components=[50,200],n_layers=2,iter_max=50,
        n_patches=[size_patch,2],subsampling_factors=[2,2],
        batch_size=[60,60])
net.train_network(data)

torch.Size([3000, 28, 28])
patches extracted with size  torch.Size([784, 3000, 25])
patches normalized extracted with size  torch.Size([784, 3000, 25])
Training patches have been standardized and have size torch.Size([784, 3000, 25])
[[0, 366, 642], [0, 12, 326], [0, 471, 660], [0, 524, 720], [0, 721, 223], [0, 47, 559], [0, 173, 695], [0, 335, 280], [0, 92, 49], [0, 506, 196]]
('sel', torch.Size([90000, 2, 25]))
('The data has size', torch.Size([784, 3000, 25]))
('all okay', True)
('The variance is: ', 5.4945039749145508)
('size input: ', torch.Size([90000, 2, 25]))
Epoch: 0  Mean loss: 
1.00000e-02 *
  1.9249
[torch.FloatTensor of size 1]
  time per epoch:  16.1194081306
Epoch: 1  Mean loss: 
1.00000e-03 *
  6.1131
[torch.FloatTensor of size 1]
  time per epoch:  16.2693510056
Epoch: 2  Mean loss: 
1.00000e-03 *
  4.6325
[torch.FloatTensor of size 1]
  time per epoch:  19.1869580746
Epoch: 3  Mean loss: 
1.00000e-03 *
  3.7634
[torch.FloatTensor of size 1]
  time per epoch:  12.40934

KeyboardInterrupt: 

In [None]:
print(net.Kernel[0].eta.data)
self=net
plt.figure()  # open the figure first, so that the heatmap is drawn into it
sb.heatmap(net.Kernel[0].W.data.numpy())

In [None]:
net.Kernel[0].convergence_check()

In [None]:
net.Kernel[1].convergence_check()

In [None]:
plt.plot([ u.numpy() for u in net.Kernel[0].training_loss])
#plt.plot([ u.numpy() for u in net.Kernel[1].training_loss],c='yellow')

In [None]:
print(net.Kernel[0].output.size())
sb.heatmap(net.Kernel[0].output[100,:100,:].numpy())
print(X_labels[:10])

In [None]:
# Import the classifier and the PCA applied to the flattened activation maps.
# (Importing sklearn's `datasets` here would shadow torchvision's `datasets`.)
from sklearn import svm
from sklearn.decomposition import PCA



#X=net.Kernel[0].output
X=net.Kernel[0].get_activation_map()
n_p,n_d,p_dim=X.size()
print(X.size())
X=X.permute(1,0,2)
print(X.size())

pca_mnis=PCA(n_components=15,whiten=True)
pca_mnis.fit(X.contiguous().view(n_d, n_p*p_dim).numpy())
print(X.contiguous().view(n_d, n_p*p_dim).size())

print(X.contiguous().size())
X2=pca_mnis.transform(X.contiguous().view(n_d, n_p*p_dim).numpy())

kernel_svm = svm.SVC(gamma=0.2)

kernel_svm.fit(X2, X_labels.numpy())
# Accuracy on the same 3000 images the SVM was fit on (training accuracy).
kernel_svm_score = kernel_svm.score(X2, X_labels.numpy())
print(kernel_svm_score)
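
The `permute(1,0,2)` followed by `contiguous().view(n_d, n_p*p_dim)` in the cell above turns the activation map, stored as (patches, images, filters), into one feature row per image. The NumPy sketch below shows the same reshaping on a tiny array and why the axis swap has to come first; the function name `flatten_activation_map` is hypothetical.

```python
import numpy as np

def flatten_activation_map(act):
    """(n_patches, n_images, n_filters) -> (n_images, n_patches * n_filters)."""
    n_p, n_d, p_dim = act.shape
    # Move the image axis to the front, then concatenate all patches of an image.
    return np.transpose(act, (1, 0, 2)).reshape(n_d, n_p * p_dim)

act = np.arange(2 * 3 * 4).reshape(2, 3, 4)   # 2 patches, 3 images, 4 filters
flat = flatten_activation_map(act)
naive = act.reshape(3, 2 * 4)                 # reshaping without the axis swap

# The row for image 1 is act[0, 1] followed by act[1, 1].
print(flat[1])
# The naive reshape mixes different images into a single row.
print(naive[1])
```

In PyTorch, `permute` returns a non-contiguous view, which is why the notebook calls `contiguous()` before `view`.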



In [None]:
# Held-out split: the next 3000 shuffled training images (not the official MNIST test set).
X_test=train_loader.dataset.train_data[torch.LongTensor(inds[3000:6000])]
X_test_labels=train_loader.dataset.train_labels[torch.LongTensor(inds[3000:6000])]

X_test=1.0/255*X_test.type(torch.FloatTensor)
print(X_test.size())
n_d,p,_=X_test.size()  # read the sizes before they are used below
# Squared L2 norm of each image, floored to avoid division by zero.
norm_test=torch.max(torch.sum(X_test.view(n_d,-1)**2,1),torch.Tensor([0.00001]))
print(norm_test.size())
normalize=False
if normalize:
    X_test=torch.div(X_test.view(n_d,-1),norm_test.view(n_d,1).expand(n_d,p**2))
print(X_test.size())

In [None]:
#X_test=torch.div(X_test.view(n_d,-1),norm_test.view(n_d,1,1).expand(n_d,X_test.size()[1]*X_test.size()[2]))
X_test=X_test.view(n_d,p,p)
print(X_test.size())
#X_test=extract_patches_from_image(X_test,size_patch)
#print(X_test.size())
#n_p2,n_d,_,_=X_test.size()
#X_test=X_test.view(n_p2,n_d,-1)
print(X_test.size())
output_test=net.propagate_through_network(X=X_test,patches_given=False)

#n_p,n_d,p_dim=X_test.size()
XX=output_test
print( 'XX size',XX.size())
XX=XX.permute(1,0,2)
print(XX.size())
X_test1=pca_mnis.transform(X.contiguous().view(n_d, n_p*p_dim).numpy())
X_test=pca_mnis.transform(XX.contiguous().view(n_d, n_p*p_dim).numpy())

kernel_svm_score = kernel_svm.score(X_test, X_test_labels.numpy())
print(kernel_svm_score)

In [None]:
input_map=net.Kernel[0].output

In [None]:
norms=input_map.norm(p=2,dim=2)
input_map=normalize_output(input_map)

In [None]:
input_map2=net.Kernel[1].get_activation_map(X=input_map,norms=norms,verbose=True)

In [None]:
input_map2=net.Kernel[1].standardize.transform(input_map.view(n_p*n_d,-1).numpy())

In [None]:
input_map.view(n_p*n_d,-1).size()