**Testing L-BFGS Optimizer**

The original thesis by Eric Wulff primarily investigated the use of the Adam optimizer with weight decay for training the autoencoder. In this notebook I investigate whether the L-BFGS optimizer can achieve the same. Note that I have not studied L-BFGS extensively, so the method warrants further investigation; the results in this notebook should indicate how worthwhile that would be.

**Importing and preparing the dataset**

In [1]:
#Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import numpy as np

In [2]:
#Attach the CSV files prepared in the previous notebook and import them as DataFrames
gauss_x = pd.read_csv('/Dataset/Rank Gauss.csv')
standard_x = pd.read_csv('/Dataset/Standard.csv')

In [3]:
#Split the datasets into the train, validation and test sets
gauss_x_train, gauss_x_test, standard_x_train, standard_x_test = train_test_split(gauss_x, standard_x, train_size=0.8,test_size=0.2, random_state=42)
gauss_x_train, gauss_x_valid, standard_x_train, standard_x_valid = train_test_split(gauss_x_train, standard_x_train, train_size=0.8,test_size=0.2, random_state=101)
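
The two calls above first hold out 20% of the rows as a test set and then split the remaining 80% again, so the final proportions are roughly 64% train, 16% validation and 20% test. The sketch below illustrates this on two made-up frames (`a` and `b`, 1000 rows each, are assumptions for illustration and not the notebook's data). It also checks that passing both frames to the same call keeps their rows aligned.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Two hypothetical frames with the same row order, standing in for the two CSVs
a = pd.DataFrame({"x": np.arange(1000)})
b = pd.DataFrame({"x": np.arange(1000) * 10})

# First split: 80% train+validation, 20% test
a_train, a_test, b_train, b_test = train_test_split(
    a, b, train_size=0.8, test_size=0.2, random_state=42)
# Second split: 80% of the remainder for training, 20% for validation
a_train, a_valid, b_train, b_valid = train_test_split(
    a_train, b_train, train_size=0.8, test_size=0.2, random_state=101)

print(len(a_train), len(a_valid), len(a_test))
# Both frames receive the same row indices in every subset
print((a_train.index == b_train.index).all())
```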

In [4]:
train_standard = standard_x_train
test_standard = standard_x_valid
train_gauss = gauss_x_train
test_gauss = gauss_x_valid

In [5]:
train_standard.pop('Unnamed: 0')
test_standard.pop('Unnamed: 0')
train_gauss.pop('Unnamed: 0')
test_gauss.pop('Unnamed: 0')

5118    5118
3626    3626
869      869
2877    2877
4460    4460
        ... 
946      946
5583    5583
4811    4811
649      649
5561    5561
Name: Unnamed: 0, Length: 993, dtype: int64

In [6]:
#Cast every column from float64 to float32, the default dtype of PyTorch layers
train_standard = train_standard.astype('float32')
test_standard = test_standard.astype('float32')
train_gauss = train_gauss.astype('float32')
test_gauss = test_gauss.astype('float32')

#Save the mean and standard deviation of the datasets for plotting
train_mean_standard = train_standard.mean()
train_std_standard = train_standard.std()
train_mean_gauss = train_gauss.mean()
train_std_gauss = train_gauss.std()
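
Modifying frames that came out of `train_test_split` in place, as the `pop` calls above do, is a pattern that can trigger pandas' SettingWithCopyWarning. One way to sidestep it, sketched here on a made-up frame (the column names `pt` and `eta` are placeholders, not necessarily the notebook's features), is to take an explicit copy of each subset and then drop and cast on that copy.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mimicking the CSV layout: an index column plus features
df = pd.DataFrame({"Unnamed: 0": np.arange(6),
                   "pt": np.linspace(0.0, 1.0, 6),
                   "eta": np.linspace(-1.0, 1.0, 6)})

subset = df.iloc[:4].copy()                 # explicit copy, independent of df
subset = subset.drop(columns="Unnamed: 0")  # returns a new frame without the index column
subset = subset.astype("float32")           # cast every column in one call

print(subset.dtypes)
```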

In [7]:
standard_x_test.pop('Unnamed: 0')
gauss_x_test.pop('Unnamed: 0')

for i in standard_x_test.columns:
  standard_x_test[i] = standard_x_test[i].astype('float32')
for i in gauss_x_test.columns:
  gauss_x_test[i] = gauss_x_test[i].astype('float32')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


**Building the Autoencoder**

In [8]:
%matplotlib inline

import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import torch.nn as nn
import torch.optim as optim
import torch.utils.data
from torch.autograd import Variable
from torch.utils.data import Dataset

from torch.optim import Adam, LBFGS

from fastai import data_block, basic_train, basic_data
from fastai.callbacks import ActivationStats
from fastai import train as tr
import fastai

import matplotlib as mpl

In [9]:
import torch

from torch.utils.data import TensorDataset
from torch.utils.data import DataLoader

In [10]:
class AutoEncoder(nn.Module):
    def __init__(self):
        super(AutoEncoder, self).__init__()
        self.en1 = nn.Linear(4, 200)
        self.en2 = nn.Linear(200, 100)
        self.en3 = nn.Linear(100, 50)
        self.en4 = nn.Linear(50, 3)
        self.de1 = nn.Linear(3, 50)
        self.de2 = nn.Linear(50, 100)
        self.de3 = nn.Linear(100, 200)
        self.de4 = nn.Linear(200, 4)
        self.tanh = nn.Tanh()

    def encode(self, x):
        x = self.en1(x)
        x = self.tanh(x)
        x = self.en2(x)
        x = self.tanh(x)
        x = self.en3(x)
        x = self.tanh(x)
        x = self.en4(x)
        return x

    def decode(self, x):
        x = self.tanh(x)
        x = self.de1(x)
        x = self.tanh(x)
        x = self.de2(x)
        x = self.tanh(x)
        x = self.de3(x)
        x = self.tanh(x)
        x = self.de4(x)
        return x

    def forward(self, x):
        z = self.encode(x)
        return self.decode(z)
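
As a quick sanity check of the 4 → 200 → 100 → 50 → 3 → 50 → 100 → 200 → 4 layout, the sketch below restates the same layer stack with `nn.Sequential` and pushes a random batch through it. The `encoder` and `decoder` names and the Sequential layout are illustrative only and are not part of the class above.

```python
import torch
import torch.nn as nn

# Compact restatement of the layer stack; names here are illustrative
encoder = nn.Sequential(
    nn.Linear(4, 200), nn.Tanh(),
    nn.Linear(200, 100), nn.Tanh(),
    nn.Linear(100, 50), nn.Tanh(),
    nn.Linear(50, 3),
)
decoder = nn.Sequential(
    nn.Tanh(),                       # decode() applies tanh to the latent vector first
    nn.Linear(3, 50), nn.Tanh(),
    nn.Linear(50, 100), nn.Tanh(),
    nn.Linear(100, 200), nn.Tanh(),
    nn.Linear(200, 4),
)

batch = torch.randn(8, 4)            # 8 random rows with 4 features each
latent = encoder(batch)              # compressed 3-dimensional representation
reconstruction = decoder(latent)     # back to the 4 input features
print(latent.shape, reconstruction.shape)
```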

In [19]:
class AE_LBFGS(AutoEncoder):
  def __init__(self, loss):
    AutoEncoder.__init__(self)
    self.lossFct = loss 
    self.optim = None

  def train_lbfgs(self, data_loader, epochs, validation_data=None):
    for epoch in range(epochs):
      running_loss = self._train_iteration(data_loader)
      val_loss = None
      if validation_data is not None:
        y_hat = self(validation_data['X'])
        val_loss = self.lossFct(input=y_hat, target=validation_data['y']).detach().numpy()
        print('{}/{} Epoch | loss: {:E} | validation loss: {:E}'.format(str(epoch + 1), str(epochs), running_loss, val_loss))
      else:
        print('[%d] loss: %.3f' % (epoch + 1, running_loss))


  def _train_iteration(self, data_loader):
        running_loss = 0.0
        for X, y in data_loader:
            X = X.float()
            # NOTE: unsqueeze(1) turns the (N, 4) target into (N, 1, 4), which
            # MSELoss broadcasts against the (N, 4) output (PyTorch warns about
            # this size mismatch).
            y = y.unsqueeze(1).float()

            # Unlike Adam, L-BFGS re-evaluates the objective several times per
            # step, so the usual zero_grad / forward / backward sequence is
            # wrapped in a closure and handed to step().
            def closure():
                if torch.is_grad_enabled():
                    self.optim.zero_grad()
                output = self(X)
                loss = self.lossFct(output, y)
                if loss.requires_grad:
                    loss.backward()
                return loss
            self.optim.step(closure)

            # Re-evaluate the loss after the step, for monitoring only
            with torch.no_grad():
                running_loss += closure().item()

        return running_loss

  def predict(self, X):
    X = torch.Tensor(X)
    return self(X).detach().numpy().squeeze()
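
The closure pattern used in `_train_iteration` is easier to see in isolation. Below is a minimal sketch on a toy quadratic problem that is not part of this notebook (the `target` vector and `max_iter=20` are arbitrary choices for illustration): `LBFGS.step` receives a function that clears the gradients, recomputes the loss, backpropagates and returns the loss.

```python
import torch
from torch.optim import LBFGS

# Toy problem (not from the notebook): find w that minimizes ||w - target||^2
target = torch.tensor([1.0, -2.0, 3.0, 0.5])
w = torch.zeros(4, requires_grad=True)
optimizer = LBFGS([w], history_size=10, max_iter=20)

def closure():
    # L-BFGS may call this several times within a single step()
    optimizer.zero_grad()
    loss = ((w - target) ** 2).sum()
    loss.backward()
    return loss

initial_loss = closure().item()
optimizer.step(closure)              # runs up to max_iter inner iterations
final_loss = ((w - target) ** 2).sum().item()
print(initial_loss, final_loss)
```

Note that `max_iter` counts inner iterations per call to `step`, so with `max_iter=4` and a single full-size batch, each epoch in this notebook performs at most four L-BFGS iterations.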

In [20]:
class ExperimentData(Dataset):
    def __init__(self, X, y):
        self.X = X
        self.y = y
        
    def __len__(self):
        return self.X.shape[0]
    
    def __getitem__(self, idx):
        return self.X[idx,:], self.y[idx]

**Training on the Standard Dataset**

In [21]:
X_standard = np.asarray(train_standard)
y_standard = np.asarray(test_standard)

train_data_standard = ExperimentData(X_standard, X_standard)
valid_data_standard = ExperimentData(y_standard, y_standard)

INPUT_SIZE = 4
EPOCHS=30

pred_val_standard = {}
data_loader_train_standard = DataLoader(train_data_standard, batch_size=X_standard.shape[0])

In [22]:
net_standard = AE_LBFGS(loss = nn.MSELoss())
net_standard.optim = LBFGS(net_standard.parameters(), history_size=10, max_iter=4)

In [23]:
net_standard.train_lbfgs(data_loader_train_standard, EPOCHS, validation_data = {"X":torch.Tensor(y_standard), "y":torch.Tensor(y_standard).unsqueeze(1) })

  return F.mse_loss(input, target, reduction=self.reduction)
  return F.mse_loss(input, target, reduction=self.reduction)


1/30 Epoch | loss: 9.954859E-01 | validation loss: 9.945354E-01
2/30 Epoch | loss: 9.954653E-01 | validation loss: 9.945039E-01
3/30 Epoch | loss: 9.954565E-01 | validation loss: 9.945714E-01
4/30 Epoch | loss: 9.954531E-01 | validation loss: 9.945463E-01
5/30 Epoch | loss: 9.954531E-01 | validation loss: 9.945396E-01
6/30 Epoch | loss: 9.954517E-01 | validation loss: 9.945116E-01
7/30 Epoch | loss: 9.954503E-01 | validation loss: 9.945359E-01
8/30 Epoch | loss: 9.954503E-01 | validation loss: 9.945449E-01
9/30 Epoch | loss: 9.954498E-01 | validation loss: 9.945412E-01
10/30 Epoch | loss: 9.954500E-01 | validation loss: 9.945371E-01
11/30 Epoch | loss: 9.954500E-01 | validation loss: 9.945363E-01
12/30 Epoch | loss: 9.954500E-01 | validation loss: 9.945353E-01
13/30 Epoch | loss: 9.954500E-01 | validation loss: 9.945337E-01
14/30 Epoch | loss: 9.954498E-01 | validation loss: 9.945317E-01
15/30 Epoch | loss: 9.954498E-01 | validation loss: 9.945366E-01
16/30 Epoch | loss: 9.954498E-01 |
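
The loss stays close to 1 throughout, and the `F.mse_loss` lines at the top of the output appear to be the tail of PyTorch's warning about input and target sizes that differ. A possible explanation, not verified on this dataset, is the `unsqueeze(1)` applied to the targets: a (N, 1, 4) target broadcasts against a (N, 4) output, so every prediction is compared with every target row. The sketch below reproduces the effect on synthetic data (the random `target` tensor is an assumption standing in for a standardized batch).

```python
import warnings
import torch
import torch.nn as nn

torch.manual_seed(0)
target = torch.randn(256, 4)                       # synthetic standardized batch
perfect = target.clone()                           # reconstruction with zero error
mean_only = target.mean(dim=0).expand_as(target)   # always predicts the column means

loss_fn = nn.MSELoss()
with warnings.catch_warnings():
    warnings.simplefilter("ignore")                # hide the size-mismatch UserWarning
    # (256, 4) against (256, 1, 4) broadcasts to (256, 256, 4)
    broadcast_perfect = loss_fn(perfect, target.unsqueeze(1)).item()
    broadcast_mean = loss_fn(mean_only, target.unsqueeze(1)).item()
matched_perfect = loss_fn(perfect, target).item()  # shapes agree: (256, 4) vs (256, 4)

print(matched_perfect, broadcast_perfect, broadcast_mean)
```

With mismatched shapes, the mean-only prediction scores better than a perfect reconstruction, and its loss is about the variance of the data. That would be consistent with a loss that stalls near 1 on standardized inputs. Keeping the targets at shape (N, 4) in both the training loop and the validation dictionary would be one way to test this.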

**Training on the Gaussian transformed dataset**

In [24]:
X_gauss = np.asarray(train_gauss)
y_gauss = np.asarray(test_gauss)

train_data_gauss = ExperimentData(X_gauss, X_gauss)
valid_data_gauss = ExperimentData(y_gauss, y_gauss)

INPUT_SIZE = 4
EPOCHS=30

pred_val_gauss = {}
data_loader_train_gauss = DataLoader(train_data_gauss, batch_size=X_gauss.shape[0])

In [25]:
net_gauss = AE_LBFGS(loss = nn.MSELoss())
net_gauss.optim = LBFGS(net_gauss.parameters(), history_size=10, max_iter=4)

In [26]:
net_gauss.train_lbfgs(data_loader_train_gauss, EPOCHS, validation_data = {"X":torch.Tensor(y_gauss), "y":torch.Tensor(y_gauss).unsqueeze(1) })

  return F.mse_loss(input, target, reduction=self.reduction)
  return F.mse_loss(input, target, reduction=self.reduction)


1/30 Epoch | loss: 9.969780E-01 | validation loss: 9.971191E-01
2/30 Epoch | loss: 9.969388E-01 | validation loss: 9.971381E-01
3/30 Epoch | loss: 9.969406E-01 | validation loss: 9.971686E-01
4/30 Epoch | loss: 9.969358E-01 | validation loss: 9.971371E-01
5/30 Epoch | loss: 9.969347E-01 | validation loss: 9.971334E-01
6/30 Epoch | loss: 9.969345E-01 | validation loss: 9.971328E-01
7/30 Epoch | loss: 9.969338E-01 | validation loss: 9.971284E-01
8/30 Epoch | loss: 9.969338E-01 | validation loss: 9.971322E-01
9/30 Epoch | loss: 9.969338E-01 | validation loss: 9.971321E-01
10/30 Epoch | loss: 9.969336E-01 | validation loss: 9.971314E-01
11/30 Epoch | loss: 9.969335E-01 | validation loss: 9.971320E-01
12/30 Epoch | loss: 9.969335E-01 | validation loss: 9.971322E-01
13/30 Epoch | loss: 9.969335E-01 | validation loss: 9.971323E-01
14/30 Epoch | loss: 9.969335E-01 | validation loss: 9.971323E-01
15/30 Epoch | loss: 9.969335E-01 | validation loss: 9.971323E-01
16/30 Epoch | loss: 9.969335E-01 |

**Testing**

We now evaluate both models on their respective test sets to obtain their final benchmark losses.

In [27]:
import torch.nn.functional as F

In [28]:
def test(model, device, test_dataloader, plot_type):
  model.eval()
  loss = 0
  with torch.no_grad():
    for (X, y) in test_dataloader:
      X, y = X.to(device), y.to(device)
      pred = model(X)
      # MSELoss returns the mean over the batch
      loss += nn.MSELoss()(pred, y).item()
  # NOTE: a sum of per-batch means is divided by the number of samples, so the
  # printed value is roughly the per-element MSE divided by the batch size
  loss /= len(test_dataloader.dataset)
  print('\nTest dataset of {}: Overall Loss: {}'.format(plot_type, loss))
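
Because `test` adds up per-batch means and then divides by the number of samples, its output is not directly comparable to the training losses above. The sketch below shows one alternative: a helper, hypothetically named `mean_mse`, that sums the squared errors and divides by the number of elements, so a short final batch is weighted correctly. The toy model and data are made up so that the expected result is easy to verify by hand.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def mean_mse(model, dataloader, device="cpu"):
    """Per-element MSE over the whole dataset (hypothetical helper)."""
    model.eval()
    total, count = 0.0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            # Sum squared errors so that batches of different size are weighted correctly
            total += nn.functional.mse_loss(model(X), y, reduction="sum").item()
            count += y.numel()
    return total / count

# Toy check: an identity layer with bias 1 on all-zero data, so every squared error is 1
data = torch.zeros(100, 4)
loader = DataLoader(TensorDataset(data, data), batch_size=64)
shift = nn.Linear(4, 4)
with torch.no_grad():
    shift.weight.copy_(torch.eye(4))
    shift.bias.fill_(1.0)

print(mean_mse(shift, loader))
```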

In [29]:
test_ds = TensorDataset(torch.tensor(standard_x_test.values), torch.tensor(standard_x_test.values))
test_dataloader = DataLoader(test_ds, batch_size=64)
test(net_standard, 'cpu', test_dataloader, 'Standard model')


Test dataset of Standard model: Overall Loss: 0.01638257032820143


In [30]:
test_ds = TensorDataset(torch.tensor(gauss_x_test.values), torch.tensor(gauss_x_test.values))
test_dataloader = DataLoader(test_ds, batch_size=64)
test(net_gauss, 'cpu', test_dataloader, 'Gaussian Model')


Test dataset of Gaussian Model: Overall Loss: 0.016301531305628184
