# Classification Task (XOR Classification)

This notebook presents a classification task based on the XOR function.


### The setup is the following:
* We generate a data set containing two classes which are not linearly separable.
* The variable ``center_gap`` controls the distance between the clusters and therefore how much the two classes overlap.
* We visualize the sampled data sets.
* We train a linear classifier and evaluate the results.

### Exercises
* Exercise 1: Train a non-linear classifier and evaluate the results.
* Exercise 2: Reduce the gap (``center_gap``) between the classes and evaluate the model on specifically selected samples.

# Import needed packages

In [None]:
import random
import torch
from torch import nn, optim
from IPython import display
import numpy as np
import time
from matplotlib import pyplot as plt
from res.plot_lib import set_default

# Initial Setup

In [None]:
set_default()
seed = 44
random.seed(seed)
torch.manual_seed(seed);

### Define PyTorch Device: GPU if available, else CPU

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
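As a minimal sketch of how this device is meant to be used: a model's parameters and the tensors it receives must live on the same device, otherwise PyTorch raises a device-mismatch error. The tiny ``nn.Linear`` model and the random inputs below are stand-ins chosen for illustration, not part of the notebook.

```python
import torch
from torch import nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(2, 1).to(device)   # stand-in model: parameters moved to the device
x = torch.randn(8, 2).to(device)     # stand-in inputs: moved to the same device

with torch.no_grad():
    out = model(x)                   # works because model and inputs share a device
print(out.device, tuple(out.shape))
```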

### Set Notebook Parameters

In [None]:
center_gap = 9         # Distance between cluster centers of the data set
num_samples = 2000     # Number of samples included in the data set
num_of_classes = 2     # Number of classes in the data set
validation_left_out = 0.2     # Fraction of data set used for validation

H = 10                 # Number of hidden units in the neural network
num_out = 1            # Number of outputs in the neural network (Binary classifier needs only one output)

# Create Data Set

### Generate XOR Data Set with Four Clusters

In [None]:
# Sample from a 2-dimensional standard normal distribution, shifted so that
# the four clusters end up centered around the origin
X = torch.normal(mean=0, std=1, size=(num_samples, 2)) - 0.5 * center_gap

# Sample cluster ids randomly from {0,1,2,3}
cluster_id = torch.randint(low=0, high=4, size=(num_samples,))

# Transform data points to obtain four clusters
#      odd cluster_id   -->  shift x value by center_gap
#      cluster_id >= 2  -->  shift y value by center_gap
X = torch.stack([X[:, 0] + center_gap * (cluster_id % 2),
                 X[:, 1] + center_gap * (cluster_id // 2)], dim=-1)

# Map cluster_ids to class labels:
#      cluster_ids 0 and 3 --> 0
#      cluster_ids 1 and 2 --> 1
Y = torch.where((cluster_id == 0) | (cluster_id == 3),
                torch.zeros_like(cluster_id),
                torch.ones_like(cluster_id)).float()
Y = torch.unsqueeze(Y, dim=-1)


# Split data into training and validation set
split_id = int(num_samples*validation_left_out)
cluster_id_train = cluster_id[split_id:]
X_train = X[split_id:]
Y_train = Y[split_id:]
cluster_id_val = cluster_id[:split_id]
X_val = X[:split_id]
Y_val = Y[:split_id]
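The label mapping above is an XOR of the two shift decisions: a cluster gets label 1 exactly when it is shifted along x or along y, but not both. The sketch below checks this for the four cluster ids using ``torch.bitwise_xor``, an equivalent formulation chosen for illustration (the notebook itself uses ``torch.where``).

```python
import torch

cluster_id = torch.arange(4)            # the four cluster ids 0, 1, 2, 3
x_shift = cluster_id % 2                # 1 if the cluster is shifted along x
y_shift = cluster_id // 2               # 1 if the cluster is shifted along y
xor_label = torch.bitwise_xor(x_shift, y_shift)

# Same mapping as in the notebook: cluster ids 0 and 3 -> 0, cluster ids 1 and 2 -> 1
where_label = torch.where((cluster_id == 0) | (cluster_id == 3),
                          torch.zeros_like(cluster_id),
                          torch.ones_like(cluster_id))

print(xor_label)                             # tensor([0, 1, 1, 0])
print(torch.equal(xor_label, where_label))   # True
```

With the default parameters, ``split_id = int(2000 * 0.2) = 400``, so the first 400 samples form the validation set and the remaining 1600 samples form the training set.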

### Print Data Set Properties

In [None]:
print("Shapes:")
print("X_train:", tuple(X_train.size()))
print("Y_train:", tuple(Y_train.size()))
print("Cluster Ids Training:", torch.unique(cluster_id_train,return_counts=True))
print("X_val:  ", tuple(X_val.size()))
print("Y_val:  ", tuple(Y_val.size()))
print("Cluster Ids Validation:", torch.unique(cluster_id_val,return_counts=True))

### Plot Data Sets

In [None]:
fig = plt.figure(figsize=(15, 7))

# Plot Training Data
fig.add_subplot(1, 2, 1)
plt.scatter(X_train.cpu()[Y_train[:,0]==0,0].numpy(), X_train.cpu()[Y_train[:,0]==0,1].numpy(), color="green", label="Class 1")
plt.scatter(X_train.cpu()[Y_train[:,0]==1,0].numpy(), X_train.cpu()[Y_train[:,0]==1,1].numpy(), color="yellow", label="Class 2")
plt.legend(loc='upper right')
plt.axis('equal');
plt.title('Training Data');

# Plot Validation Data
fig.add_subplot(1, 2, 2)
plt.scatter(X_val.cpu()[Y_val[:,0]==0,0].numpy(), X_val.cpu()[Y_val[:,0]==0,1].numpy(), color="green", label="Class 1")
plt.scatter(X_val.cpu()[Y_val[:,0]==1,0].numpy(), X_val.cpu()[Y_val[:,0]==1,1].numpy(), color="yellow", label="Class 2")
plt.legend(loc='upper right')
plt.axis('equal');
plt.title('Validation Data');

## Linear Classifier

For the linear classifier, we train 5 networks and visualize their predictions on the validation data.

### Setup
* Train ``num_networks`` different networks.
* Train for ``max_epochs`` epochs.
* Set the learning rate of the optimizer to ``learning_rate``.
* Use the binary-cross-entropy as loss function ``torch.nn.BCELoss()``.

* Save the trained models into the list ``models``.
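By default, ``torch.nn.BCELoss()`` averages the binary cross-entropy ``-(y*log(p) + (1-y)*log(1-p))`` over all samples, where ``p`` is the predicted probability of label 1 and ``y`` the true label. The sketch below recomputes it by hand to show what the criterion measures; the three probabilities and labels are arbitrary example values.

```python
import torch

p = torch.tensor([[0.9], [0.2], [0.6]])   # example predicted probabilities of label 1
y = torch.tensor([[1.0], [0.0], [1.0]])   # example true labels

# Binary cross-entropy written out explicitly and averaged over the samples
manual = -(y * torch.log(p) + (1 - y) * torch.log(1 - p)).mean()
builtin = torch.nn.BCELoss()(p, y)

print(manual.item(), builtin.item())      # both values agree
```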

In [None]:
learning_rate = 1e-3            # Learning rate for the optimizer
max_epochs  = 250                # Maximum number of epochs to train
num_networks = 5                 # Number of networks to be trained
criterion = torch.nn.BCELoss()   # Use binary-cross-entropy as loss function

models = []                      # Empty list to be filled with trained models

### Build Neural Network without Non-Linearities
* We use the ``nn`` package to create our linear model.
* The network consists of ``H`` hidden units.
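* Each linear module has a weight and a bias.
* We apply ``nn.Sigmoid()`` to the output in order to obtain a probability value.
* Since there is no non-linearity between the two ``nn.Linear`` layers, they compose to a single affine map, so the decision boundary is a straight line regardless of ``H``.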
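To make the collapse of stacked linear layers concrete, the sketch below builds the same two-layer stack (assuming ``H = 10``, the notebook's default) and checks that it equals one affine map with weight ``W2 @ W1`` and bias ``W2 @ b1 + b2``. The random input batch is only for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
H = 10
linear_stack = nn.Sequential(nn.Linear(2, H), nn.Linear(H, 1))   # no non-linearity in between

W1, b1 = linear_stack[0].weight, linear_stack[0].bias   # shapes (H, 2) and (H,)
W2, b2 = linear_stack[1].weight, linear_stack[1].bias   # shapes (1, H) and (1,)

with torch.no_grad():
    # Collapse the two layers into one equivalent affine map
    W = W2 @ W1            # shape (1, 2)
    b = W2 @ b1 + b2       # shape (1,)

    x = torch.randn(5, 2)
    stacked = linear_stack(x)
    collapsed = x @ W.T + b

print(torch.allclose(stacked, collapsed, atol=1e-6))   # True
```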

In [None]:
def build_model_linear():
    model = nn.Sequential(
        nn.Linear(2, H),
        nn.Linear(H, 1),
        nn.Sigmoid()      # Sigmoid function to receive probabilities
    )
    
    return model

In [None]:
print(build_model_linear())

### Train the Neural Network without Non-Linearities

In [None]:
def train_networks(max_epochs, num_networks, X, y, model_generator):
    # Note: uses the global variables learning_rate, criterion, seed and device

    models = []

    # Move the data to the same device as the models
    X = X.to(device)
    y = y.to(device)

    # Iterate through number of networks
    for n in range(num_networks):

        torch.manual_seed(seed + n + 3)   # different (but reproducible) initialization per network
        model = model_generator()
        model.to(device)       # move model to device
        models.append(model)

        # We use the optim package to apply gradient descent for our parameter updates.
        # Since the full training set is used in every step, this is full-batch
        # (not stochastic) gradient descent.
        optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

        for t in range(max_epochs):

            # Feed forward to get prediction
            y_pred_prob = model(X)

            # Compute the loss (Binary-Cross-Entropy)
            loss = criterion(y_pred_prob, y)

            # Print current progress
            print("[MODEL]: %i, [EPOCH]: %i, [LOSS]: %.6f" % (n + 1, t, loss.item()))

            display.clear_output(wait=True)

            # Zero the gradients before running the backward pass.
            optimizer.zero_grad()

            # Backward pass to compute the gradient of the loss w.r.t. our learnable params.
            loss.backward()

            # Update model parameters
            optimizer.step()

    return models
            
models = train_networks(max_epochs, num_networks, X_train, Y_train, model_generator=build_model_linear)
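For reference, each ``optimizer.step()`` of plain ``torch.optim.SGD`` (no momentum or weight decay, as configured above) moves every parameter against its gradient, ``p <- p - lr * p.grad``. The toy quadratic loss and starting values below are only an illustration of that update rule.

```python
import torch

w = torch.nn.Parameter(torch.tensor([1.0, -2.0]))   # toy parameter vector
optimizer = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()      # toy loss, its gradient is 2 * w = [2, -4]
optimizer.zero_grad()
loss.backward()
optimizer.step()           # w <- w - 0.1 * [2, -4]

print(w.data)              # approximately [0.8, -1.6]
```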

### Plot Predictions of Validation Set for each Model

In [None]:
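def predict_and_plot(models, X_val, Y_val):
    fig = plt.figure(figsize=(15, 7))
    rows = 2
    columns = 5

    # Move the data to the same device as the models
    X_val = X_val.to(device)
    Y_val = Y_val.to(device)

    for i, m in enumerate(models):
        fig.add_subplot(rows, columns, i + 1)

        with torch.no_grad():
            y_prob = m(X_val)                 # predicted probability of label 1
            y_pred = torch.round(y_prob)      # hard 0/1 decision
            acc = (y_pred == Y_val).float().mean().item()
            loss = criterion(y_prob, Y_val).item()   # BCE on probabilities, not on rounded labels

        X_plot = X_val.cpu()
        pred = y_pred[:, 0].cpu()

        # Same colors as in the data plots: green = label 0 (Class 1), yellow = label 1 (Class 2)
        plt.scatter(X_plot[pred == 0, 0].numpy(), X_plot[pred == 0, 1].numpy(), color="green")
        plt.scatter(X_plot[pred == 1, 0].numpy(), X_plot[pred == 1, 1].numpy(), color="yellow")
        plt.axis('equal')
        plt.title('Model %i \n Acc.: %.2f\n Loss: %.2f' % (i + 1, acc, loss))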

predict_and_plot(models, X_val, Y_val)
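The loss reported for each model is only meaningful when computed from the predicted probabilities rather than from the rounded 0/1 decisions. ``nn.BCELoss`` clamps its log outputs at -100 (as stated in the PyTorch documentation), so a single rounded mistake contributes a loss of 100. The two predictions below are made-up values that show the difference.

```python
import torch

criterion = torch.nn.BCELoss()
y = torch.tensor([[1.0], [0.0]])        # true labels
p = torch.tensor([[0.4], [0.3]])        # example predicted probabilities of label 1

loss_prob = criterion(p, y).item()                  # -(log(0.4) + log(0.7)) / 2, about 0.64
loss_rounded = criterion(torch.round(p), y).item()  # (100 + 0) / 2 = 50 because of the clamp
accuracy = (torch.round(p) == y).float().mean().item()   # 1 of 2 correct -> 0.5

print(loss_prob, loss_rounded, accuracy)
```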

Exercises
===========
* Exercise 1: Train a non-linear classifier and evaluate the results.
* Exercise 2: Reduce the gap (``center_gap``) between the classes and evaluate the model on specifically selected samples.

## Exercise 1: Two-Layered Non-Linear Network

In this exercise, you extend the example above to a non-linear classifier. This can be done by adding non-linearities to the model description (e.g. ``nn.Tanh()`` or ``nn.ReLU()``).

Go through the code below and fill in the missing parts (marked as ``???``) such that...
* ... a non-linearity is applied by adding ``nn.ReLU()`` to the network architecture.
* ... 5 networks are trained.
* ... each network is trained for 200 epochs.
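To see why one hidden layer with a non-linearity is enough in principle, the sketch below sets the weights of a small ReLU network by hand (no training) so that it reproduces XOR on the four cluster centers. It relies on ``|z| = ReLU(z) + ReLU(-z)`` and on the fact that ``|x+y| - |x-y|`` has the same sign as ``x*y``. The weights and ``gap = 9`` (the notebook's default ``center_gap``) are choices made for this illustration only; the networks in the exercise learn their own weights.

```python
import torch
from torch import nn

# Hypothetical hand-built network, not a trained model from the notebook
xor_net = nn.Sequential(
    nn.Linear(2, 4),
    nn.ReLU(),
    nn.Linear(4, 1),
    nn.Sigmoid(),
)

with torch.no_grad():
    xor_net[0].weight.copy_(torch.tensor([[ 1.0,  1.0],    # ReLU(x + y)
                                          [-1.0, -1.0],    # ReLU(-x - y)
                                          [ 1.0, -1.0],    # ReLU(x - y)
                                          [-1.0,  1.0]]))  # ReLU(-x + y)
    xor_net[0].bias.zero_()
    # logit = |x - y| - |x + y|: positive (label 1) exactly when x and y have opposite signs
    xor_net[2].weight.copy_(torch.tensor([[-1.0, -1.0, 1.0, 1.0]]))
    xor_net[2].bias.zero_()

gap = 9.0
centers = torch.tensor([[-gap / 2, -gap / 2],   # cluster 0 -> label 0
                        [ gap / 2, -gap / 2],   # cluster 1 -> label 1
                        [-gap / 2,  gap / 2],   # cluster 2 -> label 1
                        [ gap / 2,  gap / 2]])  # cluster 3 -> label 0

with torch.no_grad():
    labels = torch.round(xor_net(centers))[:, 0]
print(labels)   # tensor([0., 1., 1., 0.])
```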

### Setup

In [None]:
learning_rate = 1e-1
max_epochs  = ???                # Maximum number of epochs to train
num_networks = ???               # Number of networks to be trained
models = []                      # Empty list to be filled with trained models
criterion = torch.nn.BCELoss()   # Use binary-cross-entropy as loss function

### Build Neural Network with Non-Linearities
* Use the ``nn`` package to create the non-linear model.
* The network consists of two hidden layers with ``H`` hidden units each.
* Each linear module has a weight and a bias.
* Apply ``nn.Sigmoid()`` to the output to obtain a probability value.
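As a quick sanity check on the size of this architecture, the sketch below counts the trainable parameters of a ``2 -> H -> H -> 1`` network with ``H = 10``. It uses ``nn.Tanh()`` purely as a stand-in non-linearity; activation modules have no parameters, so the count is the same with ``nn.ReLU()``: ``(2*10 + 10) + (10*10 + 10) + (10*1 + 1) = 151``.

```python
import torch
from torch import nn

H = 10
model = nn.Sequential(
    nn.Linear(2, H), nn.Tanh(),     # 2*H weights + H biases
    nn.Linear(H, H), nn.Tanh(),     # H*H weights + H biases
    nn.Linear(H, 1), nn.Sigmoid(),  # H weights + 1 bias
)

num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(num_params)   # 151
```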

In [None]:
def build_model():
    model = nn.Sequential(
        nn.Linear(2, H),
        ???
        nn.Linear(H, H),
        ???
        nn.Linear(H, 1),
        nn.Sigmoid()      # Sigmoid function to receive probabilities
    )
    
    return model

In [None]:
print(build_model())

### Train the Neural Network with Non-Linearities
* Train ``num_networks`` different networks.
* Use the binary-cross-entropy as loss function ``torch.nn.BCELoss()``.

In [None]:
models = train_networks(max_epochs, num_networks, X_train, Y_train, model_generator=build_model)

### Plot Predictions of Validation Set for each Model

In [None]:
predict_and_plot(models, X_val, Y_val)

## Exercise 2: Reduced Gap between Cluster Centers

In this exercise, we reduce the gap between the individual clusters by lowering ``center_gap``. We then generate new data and train new models. Finally, we evaluate the model behaviour in uncertain regions of the input plane.

Go through the code below and fill in the missing parts (marked as ``???``) such that
* the ``center_gap`` is set to a smaller value (for example 3).
* the model is evaluated on interesting inputs after training (``eval_point_1``, ``eval_point_2``, ``eval_point_3``, ``eval_point_4``). You might use ``[0, 0]``, ``[-2, -2]``, ``[-2, 2]`` and ``[0, 10]``.
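To get a feeling for how much harder smaller gaps make the task, the sketch below considers a single cluster as generated above (standard deviation 1, center at distance ``center_gap / 2`` from both axes) and computes the probability that a sample crosses exactly one axis, i.e. lands in a quadrant that belongs to the other class. Treating the axes as the decision boundary is an assumption made for illustration, and ``normal_cdf`` and ``wrong_class_quadrant_prob`` are hypothetical helper names.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def wrong_class_quadrant_prob(center_gap):
    """Probability that a sample from one cluster crosses exactly one of the two axes."""
    q = 1.0 - normal_cdf(center_gap / 2)   # probability of crossing one given axis
    return 2.0 * q * (1.0 - q)             # the two coordinates are independent

for gap in (9, 3):
    print(gap, wrong_class_quadrant_prob(gap))
```

For ``center_gap = 3`` this is roughly 12.5 % of the samples, whereas for the default ``center_gap = 9`` the fraction is negligible.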

In [None]:
center_gap = ???              # Distance between cluster centers of the data set
num_samples = 2000            # Number of samples included in the data set
num_of_classes = 2            # Number of classes in the data set
validation_left_out = 0.2     # Fraction of data set used for validation

H = 10                        # Number of hidden units in the neural network
num_out = 1                   # Number of outputs in the neural network (Binary classifier needs only one output)

### Generate XOR Data Set with Four Clusters

In [None]:
# Sample from a 2-dimensional standard normal distribution, shifted so that
# the four clusters end up centered around the origin
X = torch.normal(mean=0, std=1, size=(num_samples, 2)) - 0.5 * center_gap

# Sample cluster ids randomly from {0,1,2,3}
cluster_id = torch.randint(low=0, high=4, size=(num_samples,))

# Transform data points to obtain four clusters
#      odd cluster_id   -->  shift x value by center_gap
#      cluster_id >= 2  -->  shift y value by center_gap
X = torch.stack([X[:, 0] + center_gap * (cluster_id % 2),
                 X[:, 1] + center_gap * (cluster_id // 2)], dim=-1)

# Map cluster_ids to class labels:
#      cluster_ids 0 and 3 --> 0
#      cluster_ids 1 and 2 --> 1
Y = torch.where((cluster_id == 0) | (cluster_id == 3),
                torch.zeros_like(cluster_id),
                torch.ones_like(cluster_id)).float()
Y = torch.unsqueeze(Y, dim=-1)


# Split data into training and validation set
split_id = int(num_samples*validation_left_out)
cluster_id_train = cluster_id[split_id:]
X_train = X[split_id:]
Y_train = Y[split_id:]
cluster_id_val = cluster_id[:split_id]
X_val = X[:split_id]
Y_val = Y[:split_id]

### Print Data Set Properties

In [None]:
print("Shapes:")
print("X_train:", tuple(X_train.size()))
print("Y_train:", tuple(Y_train.size()))
print("Cluster Ids Training:", torch.unique(cluster_id_train,return_counts=True))
print("X_val:  ", tuple(X_val.size()))
print("Y_val:  ", tuple(Y_val.size()))
print("Cluster Ids Validation:", torch.unique(cluster_id_val,return_counts=True))

### Plot Data Sets

In [None]:
fig = plt.figure(figsize=(15, 7))

# Plot Training Data
fig.add_subplot(1, 2, 1)
plt.scatter(X_train.cpu()[Y_train[:,0]==0,0].numpy(), X_train.cpu()[Y_train[:,0]==0,1].numpy(), color="green", label="Class 1")
plt.scatter(X_train.cpu()[Y_train[:,0]==1,0].numpy(), X_train.cpu()[Y_train[:,0]==1,1].numpy(), color="yellow", label="Class 2")
plt.legend(loc='upper right')
plt.axis('equal');
plt.title('Training Data');

# Plot Validation Data
fig.add_subplot(1, 2, 2)
plt.scatter(X_val.cpu()[Y_val[:,0]==0,0].numpy(), X_val.cpu()[Y_val[:,0]==0,1].numpy(), color="green", label="Class 1")
plt.scatter(X_val.cpu()[Y_val[:,0]==1,0].numpy(), X_val.cpu()[Y_val[:,0]==1,1].numpy(), color="yellow", label="Class 2")
plt.legend(loc='upper right')
plt.axis('equal');
plt.title('Validation Data');

### Build Neural Network

In [None]:
def build_model():
    model = nn.Sequential(
        nn.Linear(2, H),
        nn.ReLU(),
        nn.Linear(H, H),
        nn.ReLU(),
        nn.Linear(H, 1),
        nn.Sigmoid()      # Sigmoid function to receive probabilities
    )
    
    return model

In [None]:
print(build_model())

### Network Setup

In [None]:
learning_rate = 1e-1
max_epochs  = 300                # Maximum number of epochs to train
num_networks = 5                 # Number of networks to be trained
models = []                      # Empty list to be filled with trained models
criterion = torch.nn.BCELoss()   # Use binary-cross-entropy as loss function

### Train the Neural Network
* Train ``num_networks`` different networks.
* Use the binary-cross-entropy as loss function ``torch.nn.BCELoss()``.

In [None]:
models = train_networks(max_epochs, num_networks, X_train, Y_train, model_generator=build_model)

In [None]:
print(models[0])

### Plot Predictions of Validation Set for each Model

In [None]:
predict_and_plot(models, X_val, Y_val)

### Evaluate Model on Specific Inputs

Based on the plots shown above, define four points to predict. Which points might be interesting, and what do you expect the model to predict for them? You might also use the four points given in the introduction of this exercise.
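Besides looking at ``models[0]`` alone, one option is to compare the predicted probabilities of all trained networks at the same points; where the data is ambiguous, the networks may disagree more. The sketch below uses a hypothetical helper ``ensemble_probs`` and randomly initialized stand-in models only so that it runs on its own; in the notebook you would pass ``models`` and ``eval_points.to(device)`` instead.

```python
import torch
from torch import nn

def ensemble_probs(models, points):
    """Hypothetical helper: mean and spread of P(label 1) across several models."""
    with torch.no_grad():
        # One row per model, one column per evaluation point
        probs = torch.stack([m(points)[:, 0] for m in models])
    return probs.mean(dim=0), probs.std(dim=0)

# Randomly initialized stand-in models, only so the sketch runs on its own
torch.manual_seed(0)
stand_in_models = [nn.Sequential(nn.Linear(2, 1), nn.Sigmoid()) for _ in range(3)]

points = torch.tensor([[0.0, 0.0], [-2.0, -2.0], [-2.0, 2.0], [0.0, 10.0]])
mean_prob, spread = ensemble_probs(stand_in_models, points)
for point, mu, sd in zip(points, mean_prob, spread):
    print("%s  mean P(Class 2): %.2f  std: %.2f" % (point.numpy(), mu.item(), sd.item()))
```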

In [None]:
eval_point_1 = ???
eval_point_2 = ???
eval_point_3 = ???
eval_point_4 = ???

eval_points = torch.Tensor([eval_point_1, eval_point_2, eval_point_3, eval_point_4])

In [None]:
with torch.no_grad():   # no gradients needed for inference
    predictions = models[0](eval_points.to(device)).cpu()

### Print Predicted Class Probabilities

In [None]:
for point, p in zip(eval_points, predictions[:, 0]):
    p_class_2 = p.item()   # sigmoid output = probability of label 1 ("Class 2")
    print("Evaluation Point: %s\n    Class 1: %.2f\n    Class 2: %.2f\n"
          % (point.numpy(), 1 - p_class_2, p_class_2))