# KRR on ICTUS dataset after projection onto NN centers

Simple neural networks achieve good performance on this dataset. We take a set of 7 centers from a sigmoid NN (the rows of its first-layer weight matrix) and project the whole dataset onto these centers; effectively, the centers are a dictionary learned using label supervision.

Applying KRR to the projected data then achieves results similar to those of the neural net.

We also run the same experiment using as centers the data points closest to the NN centers. This nearest-neighbor analysis is done in the `DataDistance` notebook.
The result is that nearest-neighbor centers **do not behave the same as the neural-net centers**.

In [1]:
%load_ext autoreload
%autoreload 2

%matplotlib notebook

In [5]:
import sys
sys.path.append("../../PyFalkon/src")
import time
import math

import numpy as np
import pandas as pd
import scipy
from scipy.spatial import distance
import matplotlib.pyplot as plt
from matplotlib import cm

from sklearn import model_selection, preprocessing
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split, KFold


from falkon import Falkon 
from nystrom import select_uniform_centers
from kernels import *
from utils import load_mat_data

## Data Load

In [10]:
fname = "../run_all.mat"
X, Y = load_mat_data(fname)
Y[Y == 0.0] = -1.0

## Train/Test Split

In [11]:
kf = KFold(n_splits=5)
scaler = preprocessing.StandardScaler(copy=False, with_mean=True, with_std=True)
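
The per-fold standardization used in the loops further down can be sketched with plain NumPy. This is a minimal illustration on synthetic data, not the notebook's actual pipeline: `standardize_fold` and `X_demo` are hypothetical names, and the contiguous folds are an assumption meant to mimic `KFold` without shuffling. The point it shows is that the mean and standard deviation come from the training fold only and are then reused on the test fold.

```python
import numpy as np

def standardize_fold(X_train, X_test, eps=1e-12):
    """Standardize both splits using statistics of the training split only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std = np.where(std < eps, 1.0, std)  # leave (near-)constant features unscaled
    return (X_train - mean) / std, (X_test - mean) / std

rng = np.random.default_rng(0)
X_demo = rng.normal(loc=3.0, scale=2.0, size=(100, 4))  # synthetic stand-in data

# Contiguous folds (assumption: same layout as KFold(n_splits=5) without shuffling)
folds = np.array_split(np.arange(len(X_demo)), 5)

for k, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
    Xtr, Xte = standardize_fold(X_demo[train_idx], X_demo[test_idx])
    # Xtr has zero mean and unit variance per feature; Xte only approximately so
```

Computing the statistics inside the loop keeps information about the held-out fold out of the preprocessing step.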

## Using Trained NN Weights

We train a simple neural net, specified below:
```python
nn.Sequential(
    nn.Linear(992, 7),
    nn.Sigmoid(),
    nn.Linear(7, 1))
```
which achieves 3.08% test error.

Other parameters:
 - batch size: 128
 - loss: L2
 - lr: 2e-3
 - opt: Adam (no weight decay)

See the file `neural_test.py` for more details.

We extract the first layer weights of the trained NN (called `W0` below, of shape $7\times992$) and apply them to the original data as a dimensionality-reduction / feature extraction step.

We then train Falkon on the resulting data.
The results depend heavily on the kernel width $\sigma$; with $\sigma = 1$ we obtain results close to those of the neural net (3.42% test error versus 3.08%).
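
To make the projection step and the dependence on $\sigma$ concrete, here is a small NumPy sketch. It is not Falkon: it solves exact kernel ridge regression with a dense linear solve, which only works for small sample sizes. Everything in it is an assumption made for illustration: the synthetic data, the random stand-in `W0_demo` for the trained weights, the kernel parameterization $\exp(-\|a-b\|^2 / (2\sigma^2))$, and the $n\lambda$ scaling of the regularizer.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix, assuming k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def krr_fit_predict(X_train, y_train, X_test, sigma, lam):
    """Exact KRR: solve (K + n*lam*I) alpha = y, then predict with K_test @ alpha."""
    n = len(X_train)
    K = gaussian_kernel(X_train, X_train, sigma)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y_train)
    return gaussian_kernel(X_test, X_train, sigma) @ alpha

rng = np.random.default_rng(0)
n, d, h = 300, 50, 7
X_demo = rng.normal(size=(n, d))                # synthetic stand-in for the data
W0_demo = rng.normal(size=(h, d)) / np.sqrt(d)  # random stand-in for trained weights

Z = X_demo @ W0_demo.T                          # projected features, shape (n, 7)
y = np.sign(Z[:, 0] * Z[:, 1])                  # synthetic +/-1 labels
y[y == 0] = 1.0

Z_train, Z_test = Z[:200], Z[200:]
y_train, y_test = y[:200], y[200:]

# Sweep the kernel width and record the test misclassification rate
errors = {}
for sigma in (0.1, 1.0, 10.0):
    pred = krr_fit_predict(Z_train, y_train, Z_test, sigma, lam=1e-6)
    errors[sigma] = float(np.mean(np.sign(pred) != y_test))
```

Inspecting `errors` for a few widths is a cheap way to see how sensitive the classifier is to $\sigma$ before running the full experiment.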

In [9]:
weights = scipy.io.loadmat("nn_weights7sigmoid.mat")
W0 = weights["W0"]
W1 = weights["W1"]
print(W0.shape)
print(W1.shape)

(7, 992)
(1, 7)


In [13]:
M = 5000
sigma = 1
l = 1e-15
kernel = GaussianKernel(sigma)
np.random.seed(34)

F = Falkon(kernel, l, M=M, maxiter=20, max_ram=1*2**30)

In [15]:
train_err, test_err = [], []
for train, test in kf.split(X):
    X_train, X_test, Y_train, Y_test = X[train], X[test], Y[train], Y[test]
    # Fit the scaler on the training fold only (the second argument of `fit` is `y`, which is ignored)
    scaler.fit(X_train)
    X_train = scaler.transform(X_train)
    X_test = scaler.transform(X_test)
    
    X_train_nn = np.matmul(X_train, W0.T)
    X_test_nn = np.matmul(X_test, W0.T)
    
    F.fit(X_train_nn, Y_train)
    train_pred = F.predict(X_train_nn)
    test_pred = F.predict(X_test_nn)
    train_err.append(np.mean(np.sign(train_pred) != Y_train))
    test_err.append(np.mean(np.sign(test_pred) != Y_test))

.....................................................................................................



In [16]:
print("Train error: %.2f%% - Test error: %.2f%%" % 
      (np.mean(train_err)*100, np.mean(test_err)*100))

Train error: 0.025647 - Test error: 0.033170


## Test: Replace NN centers by their nearest neighbors

The replacement centers are the data points identified in the `DataDistance` notebook, with indices:
`9398, 12645, 13891, 9888, 29274, 6965, 20633`
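
For reference, a nearest-neighbor lookup of this kind can be sketched as follows. The metric used in the `DataDistance` notebook is not stated here, so Euclidean distance is an assumption, and `nearest_data_points`, `X_demo` and `centers_demo` are hypothetical names on synthetic data.

```python
import numpy as np

def nearest_data_points(X, centers):
    """For each row of `centers`, return the index of the closest row of `X` (Euclidean)."""
    # Squared distances via ||x||^2 + ||c||^2 - 2 x.c, shape (n_points, n_centers)
    sq = (X**2).sum(1)[:, None] + (centers**2).sum(1)[None, :] - 2.0 * X @ centers.T
    return np.argmin(sq, axis=0)

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 20))            # synthetic stand-in for the data
true_idx = np.array([3, 141, 592])
# Slightly perturbed copies of three data points play the role of the NN centers
centers_demo = X_demo[true_idx] + 1e-3 * rng.normal(size=(3, 20))

idx = nearest_data_points(X_demo, centers_demo)  # recovers true_idx
```

With real NN weights the nearest data point may still be far from the center, which is one possible reason the two sets of centers behave differently.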

In [25]:
from sklearn.preprocessing import scale
# Take the centers from the globally standardized data
Xscaled = scale(X, axis=0, with_mean=True, with_std=True)
center_idx = [9398, 12645, 13891, 9888, 29274, 6965, 20633]
Xcenters = Xscaled[center_idx]
# Remove the center points from the data; the rest is standardized per fold below
Xrest = np.delete(X, np.array(center_idx), axis=0)
Yrest = np.delete(Y, np.array(center_idx), axis=0)

In [45]:
M = 5000
sigma = 25
l = 1e-15
kernel = GaussianKernel(sigma)
np.random.seed(34)

F = Falkon(kernel, l, M=M, maxiter=30, max_ram=1*2**30)

In [46]:
train_err, test_err = [], []
for train, test in kf.split(Xrest):
    X_train, X_test, Y_train, Y_test = Xrest[train], Xrest[test], Yrest[train], Yrest[test]
    # Fit the scaler on the training fold only (the second argument of `fit` is `y`, which is ignored)
    scaler.fit(X_train)
    X_train = scaler.transform(X_train)
    X_test = scaler.transform(X_test)
    
    X_train_proj = np.matmul(X_train, Xcenters.T)
    X_test_proj = np.matmul(X_test, Xcenters.T)
    
    F.fit(X_train_proj, Y_train)
    train_pred = F.predict(X_train_proj)
    test_pred = F.predict(X_test_proj)
    train_err.append(np.mean(np.sign(train_pred) != Y_train))
    test_err.append(np.mean(np.sign(test_pred) != Y_test))

.....................

KeyboardInterrupt: 

In [None]:
print("Train error: %.2f%% - Test error: %.2f%%" % 
      (np.mean(train_err)*100, np.mean(test_err)*100))