<a href="https://colab.research.google.com/github/balintnegyesi/Neural_Networks_in_Finance_2022_UU/blob/main/handout_week_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Toy example

This homework exercise contains two toy classification examples. The first implements the classification of the Iris dataset explained in the lectures. In the second, students are expected to implement a neural network classifier for the MNIST dataset, which labels handwritten digits.

# Iris dataset

See lecture 1 for details.
We use PyTorch. Additionally, we rely on the following libraries.

In [300]:
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.set_default_dtype(torch.float64)

import numpy as np

import matplotlib.pyplot as plt

# for loading the dataset only
from sklearn.datasets import load_iris

from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

We first load the dataset.

In [301]:
dataset = load_iris()  # dictionary
X = dataset['data']
y = dataset['target']
names = dataset['target_names']
feature_names = dataset['feature_names']

In [302]:
X.shape

(150, 4)

In [303]:
y.shape

(150,)

In [304]:
names

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

In [305]:
feature_names

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']

# Logistic Regression

Split the dataset into disjoint training and test sets. The samples are assigned at random, with a 4:1 ratio between the training and test sample sizes.

In [306]:
M = X.shape[0]
random_idx_ord = np.random.permutation(M)
train_indices = random_idx_ord[0: int(0.8 * M)]
test_indices = random_idx_ord[int(0.8 * M): ]

X_train = X[train_indices, :]
y_train = y[train_indices]

X_test = X[test_indices, :]
y_test = y[test_indices]

print('Number of training samples: %d'%X_train.shape[0])
print('Number of test samples: %d'%X_test.shape[0])

Number of training samples: 120
Number of test samples: 30
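
The same kind of split can also be obtained with `train_test_split`, which is imported above. The sketch below is an alternative, not the split used in this handout: the fixed `random_state=0` and the `stratify` argument are assumptions added for reproducibility and balanced classes, and the `_alt` names are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X_all, y_all = iris['data'], iris['target']

# 4:1 split; stratify keeps the class proportions equal in both parts
X_train_alt, X_test_alt, y_train_alt, y_test_alt = train_test_split(
    X_all, y_all, test_size=0.2, random_state=0, stratify=y_all)

print(X_train_alt.shape, X_test_alt.shape)
print(np.bincount(y_train_alt), np.bincount(y_test_alt))
```

Fixing the seed makes the reported accuracies repeatable across runs, which the permutation-based split does not guarantee unless NumPy's seed is set.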


Build and fit the logistic regression model.

In [307]:
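# unpenalized fit; scikit-learn >= 1.2 expects penalty=None
# (older releases used the string 'none' instead)
model = LogisticRegression(penalty=None, fit_intercept=True,
                           solver='newton-cg', max_iter=10000, verbose=0)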
clf = model.fit(X_train, y_train)

The fitted coefficients are:

In [308]:
beta = clf.coef_
beta

array([[ 170.25753155,  388.20101541, -425.01100611, -225.49130057],
       [ -37.14254913,   21.65627414,  -65.28954899, -646.30275743],
       [-133.11498212, -409.85728926,  490.30055509,  871.79405798]])
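
To see how such coefficients turn into class probabilities, recall that a multinomial logistic regression applies a softmax to the affine scores $X\beta^\top + \beta_0$. The sketch below checks this numerically. It is not the fit above: it uses scikit-learn's default (L2-penalized) settings on the full dataset, and the names `clf_demo`, `scores` and `probs` are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X_all, y_all = iris['data'], iris['target']

clf_demo = LogisticRegression(max_iter=1000).fit(X_all, y_all)

# affine scores, one column per class
scores = X_all @ clf_demo.coef_.T + clf_demo.intercept_

# numerically stable softmax over the class axis
scores = scores - scores.max(axis=1, keepdims=True)
probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

# compare with the probabilities reported by scikit-learn
print(np.allclose(probs, clf_demo.predict_proba(X_all)))
```

The predicted label is then simply the index of the largest probability in each row.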

The resulting predictions are as follows.

In [309]:
y_pred = clf.predict(X_test)

print("Estimated test labels: ", y_pred)
print("True test labels:      ", y_test)

Estimated test labels:  [2 2 2 1 2 1 2 1 1 2 0 1 2 2 1 1 0 2 1 2 0 0 2 1 2 2 1 2 1 1]
True test labels:       [2 2 2 1 2 2 2 1 1 2 0 0 2 2 1 1 0 2 1 2 0 1 2 1 2 2 1 2 1 1]


The accuracy on the test set is:

In [310]:
clf.score(X_test, y_test)

0.9

# Neural network regressions

We now define a fully connected feedforward neural network with $L$ hidden layers, $p_n$ neurons in the $n$-th hidden layer, and a given activation $\varphi: \mathbb{R}\to\mathbb{R}$.
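
In one common notation, such a network is a composition of affine maps and componentwise activations. The symbols $W_n$, $b_n$, $p_0$ and $p_{L+1}$ below are assumptions made for this sketch; the lectures may use different ones.

$$\Phi(x) = A_{L+1}\circ\varphi\circ A_L\circ\dots\circ\varphi\circ A_1(x),\qquad A_n(z) = W_n z + b_n,\quad W_n\in\mathbb{R}^{p_n\times p_{n-1}},\ b_n\in\mathbb{R}^{p_n},$$

where $p_0$ is the input dimension, $p_{L+1}$ is the output dimension, and $\varphi$ acts componentwise. For the Iris data, $p_0 = 4$ (features) and $p_{L+1} = 3$ (classes).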

In [311]:
torch.set_default_dtype(torch.float64)  # mismatch between numpy and pytorch default

## Shallow neural nets

In [312]:
class ShallowNet(nn.Module):
    def __init__(self, input_dimension, output_dimension, num_neurons,
                 activation, output_activation):
        super(ShallowNet, self).__init__()

        self.hidden_layer = nn.Linear(input_dimension, num_neurons)
        # the corresponding affine transformation of two full-connected dense
        # layers
        self.output_layer = nn.Linear(num_neurons, output_dimension)
        
        self.activation = activation
        self.output_activation = output_activation


        self.optimizer = torch.optim.SGD(self.parameters(), lr=0.1)
        self.loss = torch.nn.CrossEntropyLoss()

    def forward(self, x):
        x = self.hidden_layer(x)
        x = self.activation(x)
        x = self.output_layer(x)
        output = self.output_activation(x)
        return output

    def train_minibatch(self, x_train, y_train, epochs=200, log_freq=10,
                        batch_size=4):
        losses = []
        num_samples = x_train.size()[0]

        for epoch in range(epochs):
            # reshuffle the samples at the start of every epoch
            permutation = torch.randperm(num_samples)

            for i in range(0, num_samples, batch_size):
                indices = permutation[i: i+batch_size]
                batch_x, batch_y = x_train[indices], y_train[indices]

                y_pred = self.forward(batch_x)
                current_loss = self.loss(y_pred, batch_y)

                self.optimizer.zero_grad()
                current_loss.backward()
                self.optimizer.step()

            # record the loss of the last mini-batch of the epoch
            losses.append(current_loss.item())
            if epoch % log_freq == 0:
                print(f'epoch: {epoch:2}  training loss: {current_loss.item():10.8f}')

        return losses

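    # NOTE: this method shadows nn.Module.train(mode), so model.eval(), which
    # calls self.train(False) internally, cannot be used on this class.
    def train(self, x_train, y_train, epochs=10**5, log_freq=1000):
        losses = []

        # full-batch gradient descent: every step uses the whole training set
        for i in range(epochs):
            y_pred = self.forward(x_train)
            current_loss = self.loss(y_pred, y_train)
            losses.append(current_loss.item())
            if i % log_freq == 0:
                print(f'epoch: {i:2}  training loss: {current_loss.item():10.8f}')

            self.optimizer.zero_grad()
            current_loss.backward()
            self.optimizer.step()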
        
        return None
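
The mini-batch loop in the class above slices a permutation by hand. The `DataLoader` imported at the top of the notebook offers an alternative. The following is a minimal sketch under stated assumptions: `x_demo` and `y_demo` are hypothetical random stand-ins with the shapes of the Iris training set, and no model is trained here.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# hypothetical stand-in data with the shapes of the Iris training set
x_demo = torch.randn(120, 4)
y_demo = torch.randint(0, 3, (120,))

# shuffle=True draws a new permutation every time the loader is iterated
loader = DataLoader(TensorDataset(x_demo, y_demo), batch_size=4, shuffle=True)

num_batches = 0
num_samples_seen = 0
for batch_x, batch_y in loader:   # one pass over the loader is one epoch
    num_batches += 1
    num_samples_seen += batch_x.shape[0]

print(num_batches, num_samples_seen)
```

Inside the inner loop one would place the usual `zero_grad`, `backward` and `step` calls.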



For reasons that will become clear in Lecture 3, PyTorch models do not accept NumPy arrays directly. We therefore convert the data sets to so-called tensors and pass those on to the model.
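
A small sketch of what the conversion does, using one illustrative sample (the values and the names `x_np`, `x_pt`, `x_pt32` are assumptions for this example): `torch.from_numpy` keeps the NumPy dtype and shares memory with the array, which is why the default dtype was set to `float64` earlier.

```python
import numpy as np
import torch

x_np = np.array([[5.1, 3.5, 1.4, 0.2]])   # NumPy defaults to float64
x_pt = torch.from_numpy(x_np)              # keeps the dtype, shares memory

print(x_pt.dtype)

x_np[0, 0] = 0.0                  # a change in the array is visible in the tensor
print(x_pt[0, 0].item())

x_pt32 = x_pt.to(torch.float32)   # an explicit cast creates an independent copy
print(x_pt32.dtype)
```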

In [313]:
model = ShallowNet(len(feature_names), len(names), 10, torch.relu, torch.nn.Softmax(dim=-1))

y_train_pt = np.zeros(shape=[len(y_train), len(names)])
for m in range(len(y_train)):
  idx = y_train[m]
  y_train_pt[m, idx] = 1

X_train_pt = torch.from_numpy(X_train).type(torch.DoubleTensor)
y_train_pt = torch.from_numpy(y_train_pt).type(torch.DoubleTensor)

X_test_pt = torch.from_numpy(X_test).type(torch.DoubleTensor)


model(X_train_pt).shape

torch.Size([120, 3])
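
The one-hot loop above can also be expressed with `torch.nn.functional.one_hot`. A minimal sketch on a few hypothetical labels (`labels_np` is made up for illustration):

```python
import numpy as np
import torch
import torch.nn.functional as F

labels_np = np.array([0, 2, 1, 2])               # hypothetical class indices
labels_pt = torch.from_numpy(labels_np).long()   # one_hot expects int64 indices

one_hot = F.one_hot(labels_pt, num_classes=3).to(torch.float64)
print(one_hot)
```

Note that `torch.nn.CrossEntropyLoss` also accepts the integer class indices directly as targets, in which case no one-hot encoding is needed.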

In [None]:
model.train(X_train_pt, y_train_pt)

epoch:  0  training loss: 1.12720807
epoch: 1000  training loss: 0.58485265
epoch: 2000  training loss: 0.57604015
epoch: 3000  training loss: 0.57288305
epoch: 4000  training loss: 0.57106022
epoch: 5000  training loss: 0.56981584
epoch: 6000  training loss: 0.56887727
epoch: 7000  training loss: 0.56811944
epoch: 8000  training loss: 0.56747939
epoch: 9000  training loss: 0.56692290
epoch: 10000  training loss: 0.56642996
epoch: 11000  training loss: 0.56598788
epoch: 12000  training loss: 0.56558808
epoch: 13000  training loss: 0.56522444
epoch: 14000  training loss: 0.56489234
epoch: 15000  training loss: 0.56458813
epoch: 16000  training loss: 0.56430881
epoch: 17000  training loss: 0.56405185
epoch: 18000  training loss: 0.56381507
epoch: 19000  training loss: 0.56359656
epoch: 20000  training loss: 0.56339461
epoch: 21000  training loss: 0.56320772
epoch: 22000  training loss: 0.56303454
epoch: 23000  training loss: 0.56287387
epoch: 24000  training loss: 0.56272462
epoch: 25000
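
The loss in the log above levels off near 0.56. One possible explanation, offered as an observation rather than as the intended design: `torch.nn.CrossEntropyLoss` applies a log-softmax to its input itself, so feeding it the output of a `Softmax` layer applies softmax twice. With three classes and inputs confined to $[0, 1]$, the loss then cannot fall below $\log(1 + 2/e) \approx 0.551$. The sketch below checks this bound on a single hypothetical sample.

```python
import math
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
target = torch.tensor([0])                    # true class index

logits = torch.tensor([[50.0, 0.0, 0.0]])     # a very confident, correct score
probs = torch.softmax(logits, dim=-1)         # approximately [1, 0, 0]

loss_from_logits = loss_fn(logits, target).item()   # softmax applied once
loss_from_probs = loss_fn(probs, target).item()     # softmax applied twice
lower_bound = math.log(1 + 2 / math.e)

print(loss_from_logits, loss_from_probs, lower_bound)
```

The `argmax` of the output is the same in both cases, so predicted labels are unaffected; only the loss values and the gradients differ. If the bound is undesired, a common remedy is to let the network return raw scores during training and to apply softmax only when probabilities are reported.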

In [None]:
y_pred = model(X_test_pt).detach().numpy().argmax(axis=-1)

num_misclassified = np.sum(y_pred != y_test)

print("Estimated test labels: ", y_pred)
print("True test labels:      ", y_test)

print("Test classification accuracy: %.2f"%(1 - num_misclassified / len(y_test)))

## Deep neural networks

In [None]:
class DeepNet(nn.Module):
    def __init__(self, input_dimension, output_dimension, num_neurons,
                 activation, output_activation):
        super(DeepNet, self).__init__()

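        # nn.ModuleList registers the layers with the module, so that their
        # weights appear in self.parameters() and are updated by the optimizer
        # (layers kept in a plain Python list would not be trained)
        layer_widths = [input_dimension] + list(num_neurons)
        self.hidden_layers = nn.ModuleList(
            [nn.Linear(layer_widths[idx], layer_widths[idx + 1])
             for idx in range(len(num_neurons))])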

        self.output_layer = nn.Linear(num_neurons[-1], output_dimension)

        self.activation = activation
        self.output_activation = output_activation

        self.optimizer = torch.optim.SGD(self.parameters(), lr=0.01)
        self.loss = torch.nn.CrossEntropyLoss()

    def forward(self, x):
        for idx, layer in enumerate(self.hidden_layers):
          x = layer(x)
          x = self.activation(x)

        x = self.output_layer(x)
        output = self.output_activation(x)
        return output

    def train_minibatch(self, x_train, y_train, epochs=200, log_freq=10,
                        batch_size=4):
        losses = []
        num_samples = x_train.size()[0]

        for epoch in range(epochs):
            # reshuffle the samples at the start of every epoch
            permutation = torch.randperm(num_samples)

            for i in range(0, num_samples, batch_size):
                indices = permutation[i: i+batch_size]
                batch_x, batch_y = x_train[indices], y_train[indices]

                y_pred = self.forward(batch_x)
                current_loss = self.loss(y_pred, batch_y)

                self.optimizer.zero_grad()
                current_loss.backward()
                self.optimizer.step()

            # record the loss of the last mini-batch of the epoch
            losses.append(current_loss.item())
            if epoch % log_freq == 0:
                print(f'epoch: {epoch:2}  training loss: {current_loss.item():10.8f}')

        return losses

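    # NOTE: this method shadows nn.Module.train(mode), so model.eval(), which
    # calls self.train(False) internally, cannot be used on this class.
    def train(self, x_train, y_train, epochs=10**5, log_freq=1000):
        losses = []

        # full-batch gradient descent: every step uses the whole training set
        for i in range(epochs):
            y_pred = self.forward(x_train)
            current_loss = self.loss(y_pred, y_train)
            losses.append(current_loss.item())
            if i % log_freq == 0:
                print(f'epoch: {i:2}  training loss: {current_loss.item():10.8f}')

            self.optimizer.zero_grad()
            current_loss.backward()
            self.optimizer.step()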
        
        return None
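
A point that is easy to miss when a network stores several layers in a container: PyTorch only tracks layers that are registered with the module, and a plain Python list is not. The sketch below compares the two options; the class names `WithList` and `WithModuleList` are hypothetical and exist only for this illustration.

```python
import torch.nn as nn

class WithList(nn.Module):
    def __init__(self):
        super().__init__()
        # plain Python list: the layers are NOT registered as submodules
        self.layers = [nn.Linear(4, 10), nn.Linear(10, 10)]

class WithModuleList(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList: the layers are registered and appear in parameters()
        self.layers = nn.ModuleList([nn.Linear(4, 10), nn.Linear(10, 10)])

n_list = sum(p.numel() for p in WithList().parameters())
n_modlist = sum(p.numel() for p in WithModuleList().parameters())

print(n_list, n_modlist)
```

An optimizer built from `self.parameters()` can only update parameters that show up in this count, so unregistered layers would keep their random initial weights.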

Choose the network architecture (layer widths, activation functions).

In [None]:
deep_model = DeepNet(len(feature_names), len(names), [10, 10], torch.relu, torch.nn.Softmax(dim=-1))

deep_model.train(X_train_pt, y_train_pt)

In [None]:
y_pred = deep_model(X_test_pt).detach().numpy().argmax(axis=-1)

num_misclassified = np.sum(y_pred != y_test)

print("Estimated test labels: ", y_pred)
print("True test labels:      ", y_test)

print("Test classification accuracy: %.2f"%(1 - num_misclassified / len(y_test)))

# Comparison

In [None]:
y_pred_logit = clf.predict(X_test)
y_pred_shallow = model(X_test_pt).detach().numpy().argmax(axis=-1)
y_pred_deep = deep_model(X_test_pt).detach().numpy().argmax(axis=-1)

In [None]:
confusion_logit = np.zeros(shape=[len(names), len(names)])
confusion_shallow = np.zeros(shape=[len(names), len(names)])
confusion_deep = np.zeros(shape=[len(names), len(names)])

In [None]:
# rows: true class, columns: predicted class
for true_class in range(len(names)):
    for predicted_class in range(len(names)):
        confusion_logit[true_class, predicted_class] = np.sum(y_pred_logit[y_test == true_class] == predicted_class)
        confusion_shallow[true_class, predicted_class] = np.sum(y_pred_shallow[y_test == true_class] == predicted_class)
        confusion_deep[true_class, predicted_class] = np.sum(y_pred_deep[y_test == true_class] == predicted_class)
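
The double loop above can also be written without explicit iteration. A minimal sketch using `np.add.at`, applied to the logistic regression labels printed earlier in this notebook (the names `y_pred_demo`, `y_true_demo`, `confusion` and `accuracy` are illustrative):

```python
import numpy as np

# labels copied from the logistic regression output shown earlier
y_pred_demo = np.array([2, 2, 2, 1, 2, 1, 2, 1, 1, 2, 0, 1, 2, 2, 1,
                        1, 0, 2, 1, 2, 0, 0, 2, 1, 2, 2, 1, 2, 1, 1])
y_true_demo = np.array([2, 2, 2, 1, 2, 2, 2, 1, 1, 2, 0, 0, 2, 2, 1,
                        1, 0, 2, 1, 2, 0, 1, 2, 1, 2, 2, 1, 2, 1, 1])

num_classes = 3
confusion = np.zeros((num_classes, num_classes), dtype=int)
# rows: true class, columns: predicted class; add 1 per (true, predicted) pair
np.add.at(confusion, (y_true_demo, y_pred_demo), 1)

# the diagonal holds the correctly classified samples
accuracy = np.trace(confusion) / len(y_true_demo)
print(confusion)
print(accuracy)
```

`np.add.at` is used instead of `confusion[y_true_demo, y_pred_demo] += 1` because the latter counts repeated index pairs only once.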



In [None]:
confusion_logit

In [None]:
confusion_shallow

In [None]:
confusion_deep

The corresponding accuracies are:

In [None]:
acc_logit = np.sum(np.diag(confusion_logit)) / len(y_test)
acc_shallow = np.sum(np.diag(confusion_shallow)) / len(y_test)
acc_deep = np.sum(np.diag(confusion_deep)) / len(y_test)


print('Accuracy logistic regression: %.2f'%acc_logit)
print('Accuracy shallow neural network: %.2f'%acc_shallow)
print('Accuracy deep neural network: %.2f'%acc_deep)