# Business Analytics and Artificial Intelligence
Summer semester 2024

Prof. Dr. Jürgen Bock

## Exercises regarding foundations of artificial neural networks in PyTorch

This notebook offers exercises on the basics of artificial neural networks in *PyTorch*. The individual tasks are described in Markdown cells. Enter your solution in the code cell following the task description, and add further code cells if needed.

### Learning goals
* You are able to implement simple artificial neurons and to analyse the influence of the individual components.
* You are able to prepare structured data sets for their use in *PyTorch* and to define simple multi-layered neural networks.
* You are able to embed your own neural networks in a generic learning algorithm and to train them with different data sets.
* You are able to analyse and discuss the effect of different hyperparameters on the learning process and the quality of the learned model.

### Simple Neurons

Consider the Boolean function *NAND*.

The truth table for NAND is as follows:

| NAND | 0 | 1 |
|-----|---|---| 
| **0**   | 1 | 1 |
| **1**   | 1 | 0 |

Why is this function computable with a single neuron?

*Answer:* It is linearly separable, i.e., the samples of the two classes can be separated by a linear function (here: a straight line).

How would this single neuron need to be configured (inputs, weights)? 

*Answer:* The two inputs are weighted negatively, the bias positively. The bias must be chosen such that the weighted sum is greater than 0 if and only if at most one of the two inputs is 1, i.e., a single negative input weight must not outweigh the bias, but both input weights together must.

Implement the computation in Python and test your solution.

In [None]:
def threshold(x):
    if x < 0:
        return 0
    else:
        return 1

In [None]:
x1 = 1
x2 = 1

w03 = 1.5
w13 = -1
w23 = -1

y = threshold(w03 + x1*w13 + x2*w23)
print('{} NAND {} -> {}'.format(x1, x2, y))
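
The cell above checks only one input combination. A minimal sketch of a complete test, assuming the same weights as above, evaluates all four rows of the truth table; the names `nand_neuron` and `truth_table` are introduced here for illustration only.

```python
def threshold(x):
    """Step activation: 0 for negative input, 1 otherwise."""
    return 0 if x < 0 else 1


def nand_neuron(x1, x2, w0=1.5, w1=-1.0, w2=-1.0):
    """Single neuron: bias w0 plus the weighted inputs, followed by the threshold."""
    return threshold(w0 + x1 * w1 + x2 * w2)


# Evaluate the neuron on all four input combinations.
truth_table = {(x1, x2): nand_neuron(x1, x2) for x1 in (0, 1) for x2 in (0, 1)}
for (x1, x2), y in truth_table.items():
    print('{} NAND {} -> {}'.format(x1, x2, y))
```

The weighted sums are 1.5, 0.5, 0.5 and -0.5, so only the input (1, 1) falls below the threshold.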

### Multi-layered neural networks

#### Synthetic Data

Consider the following (synthetic) data set with two features and two classes:

In [None]:
from sklearn import datasets

data = datasets.make_circles(
    n_samples = 10000,
    noise = 0.1,
    factor = 0.5 )

Familiarize yourself with this data set by creating a scatter plot.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
X, t = data
plt.scatter(X[:,0], X[:,1], c=t, s=1)
plt.show()

Define a multi-layered neural network using *PyTorch* and implement a training routine.

In [None]:
import torch
from torch.utils.data import TensorDataset, DataLoader

In [None]:
dataset = TensorDataset(torch.from_numpy(X), torch.from_numpy(t))
data_loader = DataLoader(dataset=dataset, batch_size=5, shuffle=True)

In [None]:
import torch.nn as nn
from torch.nn import functional as F
import torch.optim as optim

In [None]:
class MLP( nn.Module ):
    def __init__( self ):
        super( MLP, self ).__init__()
        self.fc1 = nn.Linear( 2, 10 )
        self.fc3 = nn.Linear( 10, 1 )
        
    def forward( self, x ):
        x = torch.sigmoid( self.fc1( x ) )
        x = torch.sigmoid( self.fc3( x ) )
        return x

In [None]:
model = MLP()

In [None]:
num_epochs = 20

In [None]:
loss_fn = nn.BCELoss()

In [None]:
optimizer = optim.Adam(model.parameters(), lr=0.01)

In [None]:
from IPython import display
from statistics import mean
loss_history = []
loss_ep = []
plt.figure(figsize = (12,8));

In [None]:
for epoch in range(num_epochs):
    for batch in data_loader :
        optimizer.zero_grad()
        input, target = batch
        output = model(input.float())
        loss = loss_fn(output, torch.unsqueeze(target.float(), 1))
        loss.backward()
        optimizer.step()
        loss_ep.append(loss.item())
    
    ## For visualization purposes:
    loss_history.append(mean(loss_ep))
    loss_ep = []
    display.clear_output(wait=True)
    plt.plot(loss_history)
    #dataview.plot_decision_boundary2d(model, X, t, showData=False)
    display.display(plt.gcf())
    print("Epoch {:2}, loss: {}".format(epoch, loss_history[-1]))
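
The training loop converts each batch with `.float()` and `torch.unsqueeze`. The following sketch, which uses made-up stand-in data (`X_demo`, `t_demo`) and a single linear layer instead of the network above, shows why: `torch.from_numpy` keeps the NumPy dtype (here `float64`), whereas the weights of `nn.Linear` are `float32` by default, and `nn.BCELoss` expects the output and the target to have the same shape.

```python
import numpy as np
import torch
from torch import nn
from torch.utils.data import TensorDataset, DataLoader

# Stand-in data with the same layout as X and t: float64 features, integer labels.
X_demo = np.random.rand(20, 2)
t_demo = np.random.randint(0, 2, size=20).astype(np.int64)

loader = DataLoader(
    TensorDataset(torch.from_numpy(X_demo), torch.from_numpy(t_demo)),
    batch_size=5, shuffle=True)
inputs, targets = next(iter(loader))
print(inputs.shape, inputs.dtype)    # one row of features per sample
print(targets.shape, targets.dtype)  # one label per sample

# Stand-in for the network: one output neuron with sigmoid activation.
layer = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())
outputs = layer(inputs.float())                     # cast to float32 to match the weights
targets_2d = torch.unsqueeze(targets.float(), 1)    # shape (5,) becomes (5, 1)
loss = nn.BCELoss()(outputs, targets_2d)
print(outputs.shape, targets_2d.shape, loss.item())
```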

Ensure that the module ``dataview`` is in the current directory (or in ``sys.path``). Use the function ``dataview.plot_decision_boundary2d(model, X, y)`` in order to visualize the *decision boundary*.

In [None]:
import dataview

In [None]:
dataview.plot_decision_boundary2d(model, X, t, showData=False)

Experiment with the so-called *hyperparameters*: Change the number of and width of the layers, change the number of epochs and the *batch size*. Have a look at the *PyTorch* API documentation and experiment with different activation functions and optimizers.

**Note:** After changing the model, the object instance of the model must be re-initialized (e.g. ``model = MLP()``). Also, the optimizer and other auxiliary variables, e.g., ``loss_history``, must be re-initialized.
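
One possible way to keep re-initialization in one place is a small helper that returns a new model, a new optimizer and an empty loss history. This is only a sketch: the helper `fresh_setup` and the `hidden` parameter are assumptions made for illustration and are not part of the exercise. The optimizer has to be re-created because it holds references to the parameters of the model it was constructed with.

```python
import torch
from torch import nn, optim


class MLP(nn.Module):
    """One hidden layer as above, with the hidden width as a parameter (assumption)."""
    def __init__(self, hidden=10):
        super().__init__()
        self.fc1 = nn.Linear(2, hidden)
        self.fc3 = nn.Linear(hidden, 1)

    def forward(self, x):
        x = torch.sigmoid(self.fc1(x))
        return torch.sigmoid(self.fc3(x))


def fresh_setup(hidden=10, lr=0.01):
    """Return a newly initialized model, its optimizer and an empty loss history."""
    model = MLP(hidden)
    optimizer = optim.Adam(model.parameters(), lr=lr)
    return model, optimizer, []


# Call this before every new experiment, e.g. with a narrower hidden layer.
model, optimizer, loss_history = fresh_setup(hidden=3, lr=0.01)
```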

What were your findings?

*Answer:*

- A network with a single *hidden layer* is sufficient for solving this classification problem. Three neurons in the *hidden layer* already give a working solution. The more neurons there are in the *hidden layer*, the more accurately the circle is approximated.
- The *sigmoid* function works as an activation function.
- Robustness and speed of convergence depend on the optimizer. *Adam* works significantly better than *SGD*.
- The effect of the *learning rate* differs between the two optimization algorithms. This classification problem is relatively robust against larger learning rates. Apparently, local minima in the weight space do not cause major problems here.
- A small *batch size* (even of size 1) makes each epoch significantly slower. However, even after one epoch, a reasonably accurate approximation is found. A large *batch size* makes each epoch faster, but convergence takes more epochs. The model would be more robust against overfitting ("learning exactly the training data set"). This effect cannot be observed here.
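
Comparisons like the ones above are easier to reproduce when the whole experiment is wrapped in a function. The following is a sketch under several assumptions: the function `train`, the smaller data set (1000 samples), the fixed seeds and the chosen default values are illustrative and not prescribed by the exercise. It returns the mean loss per epoch so that different optimizers, learning rates or batch sizes can be compared side by side.

```python
import torch
from torch import nn, optim
from torch.utils.data import TensorDataset, DataLoader
from sklearn import datasets

# Smaller version of the circles data set, converted once to float32 tensors.
X, t = datasets.make_circles(n_samples=1000, noise=0.1, factor=0.5, random_state=0)
dataset = TensorDataset(torch.from_numpy(X).float(),
                        torch.from_numpy(t).float().unsqueeze(1))


def train(optimizer_cls, lr, hidden=10, batch_size=50, num_epochs=5):
    """Train a fresh one-hidden-layer network and return the mean loss per epoch."""
    torch.manual_seed(0)  # same seed for every run, so runs are comparable
    model = nn.Sequential(nn.Linear(2, hidden), nn.Sigmoid(),
                          nn.Linear(hidden, 1), nn.Sigmoid())
    optimizer = optimizer_cls(model.parameters(), lr=lr)
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    loss_fn = nn.BCELoss()
    history = []
    for _ in range(num_epochs):
        losses = []
        for inputs, targets in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            optimizer.step()
            losses.append(loss.item())
        history.append(sum(losses) / len(losses))
    return history


# Compare two optimizers with otherwise identical settings.
results = {name: train(cls, lr=0.01)
           for name, cls in [('Adam', optim.Adam), ('SGD', optim.SGD)]}
for name, history in results.items():
    print(name, ['{:.3f}'.format(loss) for loss in history])
```

Calling `train` again with a different `batch_size`, `hidden` or `lr` gives the corresponding curves for the other hyperparameters.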

#### Real Data

Use ``scikit-learn`` to obtain the *Breast Cancer Wisconsin* data set. See: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html

In [None]:
from sklearn.datasets import load_breast_cancer

In [None]:
data = load_breast_cancer()

In [None]:
print('Number of features:', len(data.feature_names))
print(data.feature_names)
print('Number of classes:', len(data.target_names))
print(data.target_names)
print('Number of samples:', len(data.data))

Familiarize yourself with the function ``train_test_split`` from the module ``sklearn.model_selection``. Use this function to split the data set into a training set and a test set.

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, t_train, t_test = train_test_split(data.data, data.target)
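
The call above relies on the defaults of ``train_test_split`` (25% test data, a different random split on every run). As a sketch of a more controlled variant, the split can be made repeatable and class-balanced; the values chosen for `test_size` and `random_state` are arbitrary assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, t_train, t_test = train_test_split(
    data.data, data.target,
    test_size=0.2,         # assumed ratio; the default is 0.25
    random_state=42,       # arbitrary seed, makes the split repeatable
    stratify=data.target)  # keep the class proportions in both parts

print(X_train.shape, X_test.shape)
print('Share of class 1 (train/test):', t_train.mean(), t_test.mean())
```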

In [None]:
X_train.shape

In [None]:
X_test.shape

Define a neural network and use the training data set to train it. Pay attention to the lengths of the input and output vectors when defining the network structure.

In [None]:
from torch.utils.data import TensorDataset
import torch
from torch.utils.data import DataLoader

In [None]:
dataset_train = TensorDataset(torch.from_numpy(X_train), torch.from_numpy(t_train))

In [None]:
data_loader = DataLoader(dataset=dataset_train, batch_size=10, shuffle=True)

In [None]:
from torch import nn

In [None]:
loss_fn = nn.BCELoss()

In [None]:
from torch.nn import functional as F

class MLP(nn.Module):
    def __init__( self ):
        super( MLP, self ).__init__()
        self.fc1 = nn.Linear( 30, 50 )
        self.fc2 = nn.Linear( 50, 20 )
        self.fc3 = nn.Linear( 20, 5 )
        self.fc4 = nn.Linear( 5, 1 )
        
    def forward( self, x ):
        x = F.relu( self.fc1( x ) )
        x = F.relu( self.fc2( x ) )
        x = F.relu( self.fc3( x ) )
        x = torch.sigmoid( self.fc4( x ) )
        return x

In [None]:
model = MLP()

In [None]:
import torch.optim as optim
optimizer = optim.Adam(model.parameters(), lr=0.001)

In [None]:
import matplotlib.pyplot as plt
from IPython import display
from statistics import mean
%matplotlib inline

loss_history = []
loss_ep = []
plt.figure(figsize = (12,8));

In [None]:
num_epochs = 30

In [None]:
for epoch in range(num_epochs):
    for batch in data_loader :
        optimizer.zero_grad()
        input, target = batch
        output = model(input.float())
        loss = loss_fn(output, torch.unsqueeze(target.float(), 1))
        loss.backward()
        optimizer.step()
        loss_ep.append(loss.item())
    
    ## For visualization purposes:
    loss_history.append(mean(loss_ep))
    plt.plot(loss_history)
    display.clear_output(wait=True)
    display.display(plt.gcf())
    print("Epoch {:2}, loss: {}".format(epoch, loss_history[-1]))
    loss_ep = []

Use ``scikit-learn`` to create a *classification report*, based on which you can evaluate your model.

To do this, first compute the output vector using your neural network for the input vectors of the test data set.

**Note:** ``torch.from_numpy`` creates a tensor from a *NumPy* array, which is the data structure of the test data sets. Also, you need to convert the input vector into a ``FloatTensor``.

For the *classification report*, you need to convert the floating point numbers from the output vector (resulting from the ``sigmoid`` activation function) to integer values.
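
The conversion can be sketched on a few made-up output values (`probs` is illustrative data, not model output). Rounding and an explicit threshold agree except for a value of exactly 0.5, because `torch.round` rounds ties to the nearest even number; the threshold 0.5 itself is an assumption and can be changed.

```python
import torch

# Made-up sigmoid outputs of shape (N, 1), as a network with one output neuron returns them.
probs = torch.tensor([[0.10], [0.70], [0.50], [0.93]])

# Variant 1: explicit decision threshold, then drop the second dimension.
preds = (probs >= 0.5).int().squeeze(1)
print(preds.tolist())

# Variant 2: rounding; exactly 0.5 is rounded to 0 (round half to even).
rounded = torch.round(probs).int().squeeze(1)
print(rounded.tolist())
```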

In [None]:
y_test = model(torch.from_numpy(X_test).float())

In [None]:
import sklearn.metrics as metrics

In [None]:
print(metrics.classification_report(t_test, torch.round(y_test).int()))

In [None]:
print(metrics.confusion_matrix(t_test, torch.round(y_test).int()))