# Workshop 6: An Exercise in Learning XOR

In this workshop, we will use PyTorch to build an XOR classifier. What exactly is XOR? An example will help.

![XOR](../assets/XOR.png)

Essentially, XOR is a function of two features, $f(a, b)$. Each of the features $a$ and $b$ belongs to one of two categories, here 0 or 1. XOR outputs 1 if exactly one of the features is in category 1, and 0 otherwise.
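
To make the definition concrete, the full truth table can be enumerated in a few lines. This is a small illustrative sketch; the helper name `xor` is our own and is not part of the workshop code.

```python
from itertools import product


def xor(a: int, b: int) -> int:
    """Return 1 if exactly one of a, b is 1, otherwise 0 (hypothetical helper)."""
    return int(a != b)


# Enumerate all four combinations of the two binary features.
truth_table = {(a, b): xor(a, b) for a, b in product([0, 1], repeat=2)}

for (a, b), out in truth_table.items():
    print(f"f({a}, {b}) = {out}")
```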

In [None]:
import os

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import optim
from torch.utils.data import Dataset, DataLoader

import matplotlib.pyplot as plt

## Creating the Dataset
First, we will need to create a dataset to solve a general XOR-like problem. We need to answer two questions. 

1) What kind of features / categories should our inputs have? 
2) What should our output space look like?

Because our inputs are plain numbers, any consistent encoding of the categories would work in theory. In practice, some choices make the network easier to train, such as keeping the inputs on a similar scale and matching the label values to the range of the output activation.
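
One possible set of choices is sketched below, under our own assumptions: treat the sign of each feature as its category, encode the two output classes as -1 and +1 so that they match the range of `tanh`, and rescale every input to unit length so that all samples have a comparable magnitude. The sample points are made up for illustration.

```python
import torch
import torch.nn.functional as F

# Made-up raw points with very different magnitudes (illustrative only).
raw = torch.tensor([[5.0, 1.0], [-1.0, -2.0], [0.01, -1.0], [-2.0, 1.0]])

# Rescale each row to unit L2 norm; the sign of every coordinate is preserved.
samples = F.normalize(raw, dim=1)

# XOR on the signs: +1 when exactly one feature is positive, otherwise -1.
mask = torch.logical_xor(samples[:, 0] > 0, samples[:, 1] > 0)
labels = (mask.float() * 2 - 1).unsqueeze(1)  # shape (N, 1)

print(samples.norm(dim=1))  # every row now has length 1 (up to rounding)
print(labels.squeeze(1))
```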

In [None]:
class BinaryDataset(Dataset):
    def __init__(self):
        self.samples = 

        self.labels = 

    def __len__(self):
        return 

    def __getitem__(self, idx):
        return 

Click for code snippet of potential solution
<details>

    class BinaryDataset(Dataset):
        def __init__(self):
            # Normalize every sample to unit length; the sign of each feature is its category.
            self.samples = F.normalize(
                torch.stack([
                    # same-sign samples (12)
                    torch.Tensor([1, 1]),
                    torch.Tensor([2, 1]),
                    torch.Tensor([1, 2]),
                    torch.Tensor([1, 5]),
                    torch.Tensor([0.01, 1]),
                    torch.Tensor([5, 1]),
                    torch.Tensor([1, 0.01]),
                    torch.Tensor([-1, -1]),
                    torch.Tensor([-2, -1]),
                    torch.Tensor([-1, -2]),
                    torch.Tensor([-1, -5]),
                    torch.Tensor([-5, -1]),
                    # mixed-sign samples (12)
                    torch.Tensor([1, -1]),
                    torch.Tensor([2, -1]),
                    torch.Tensor([1, -2]),
                    torch.Tensor([1, -5]),
                    torch.Tensor([0.01, -1]),
                    torch.Tensor([5, -1]),
                    torch.Tensor([-1, 1]),
                    torch.Tensor([-2, 1]),
                    torch.Tensor([-1, 2]),
                    torch.Tensor([-1, 5]),
                    torch.Tensor([-5, 1]),
                    torch.Tensor([-1, 0.01])
                ]),
                dim=1
            )

            # XOR on the signs: +1 when exactly one feature is positive, -1 otherwise.
            # Each label has shape (1,), matching the single output of the network.
            self.labels = torch.cat((
                torch.tile(torch.Tensor([-1]), (12, 1)),
                torch.tile(torch.Tensor([1]), (12, 1)),
            ))

        def __len__(self):
            return len(self.samples)

        def __getitem__(self, idx):
            return (self.samples[idx], self.labels[idx])

</details>

## Defining Our Model

What kind of model can solve this problem? Does the number of layers matter? What activation should we choose? 

The number of layers ends up being very important for this problem. Let's create two networks to experiment with this. 
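
To see why depth matters, consider the four corner points with XOR labels on their signs. A single linear layer followed by `tanh` predicts the sign of $w_1 a + w_2 b + c$, so it can only split the plane with one straight line. The sketch below is a brute-force check over a coarse grid of weights (the grid is an arbitrary choice of ours, so this is an illustration rather than a proof). No setting on the grid classifies more than three of the four points correctly.

```python
import itertools

import torch

# Four corner points and their XOR labels (+1 when the signs differ).
points = torch.tensor([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
labels = torch.tensor([-1.0, -1.0, 1.0, 1.0])

grid = torch.linspace(-3, 3, 13).tolist()  # candidate values for w1, w2 and the bias c
best = 0
for w1, w2, c in itertools.product(grid, repeat=3):
    # The sign of this score is what a single linear layer + tanh would predict.
    score = points[:, 0] * w1 + points[:, 1] * w2 + c
    pred = torch.where(score > 0, torch.tensor(1.0), torch.tensor(-1.0))
    best = max(best, int((pred == labels).sum()))

print(best)  # highest number of correctly classified corners found on the grid
```

The same conclusion follows by hand: the two same-sign corners require $c \le 0$, while the two mixed-sign corners require $c > 0$, and both cannot hold at once.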

In [None]:
class BasicLinearModel_Singlelayer(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear1 = nn.Linear(in_features, out_features)

    def forward(self, input):
        return torch.tanh(self.linear1(input))

class BasicLinearModel_Multilayer(nn.Module):
    def __init__(self, in_features, hidden_nodes, out_features):
        super().__init__()
        self.linear1 = nn.Linear(in_features, hidden_nodes)
        self.linear2 = nn.Linear(hidden_nodes, out_features)

    def forward(self, input):
        x = torch.tanh(self.linear1(input))
        return torch.tanh(self.linear2(x))

<details>
    
    class BasicLinearModel_Singlelayer(nn.Module):
        def __init__(self, in_features, out_features):
            super().__init__()
            self.linear1 = nn.Linear(in_features, out_features)

        def forward(self, input):
            return torch.tanh(self.linear1(input))

    class BasicLinearModel_Multilayer(nn.Module):
        def __init__(self, in_features, hidden_nodes, out_features):
            super().__init__()
            self.linear1 = nn.Linear(in_features, hidden_nodes)
            self.linear2 = nn.Linear(hidden_nodes, out_features)

        def forward(self, input):
            x = torch.tanh(self.linear1(input))
            return torch.tanh(self.linear2(x))
        
</details>
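
As a sanity check that two hidden units are enough in principle, the sketch below sets the weights of a network with the same 2-2-1 shape as `BasicLinearModel_Multilayer(2, 2, 1)` by hand. These weights are our own construction, not what training will necessarily find, and the steepness constant `K` is arbitrary. The construction assumes unit-length inputs: on the unit circle $(a + b)^2 = 1 + 2ab$, so the signs of $a$ and $b$ differ exactly when $|a + b| < 1$.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

K = 10.0  # arbitrary steepness; larger values make tanh closer to a step

# Same shape as the multilayer model: 2 inputs -> 2 hidden (tanh) -> 1 output (tanh).
net = nn.Sequential(nn.Linear(2, 2), nn.Tanh(), nn.Linear(2, 1), nn.Tanh())

with torch.no_grad():
    # Hidden unit 1 switches on when a + b > -1, hidden unit 2 when a + b > 1.
    net[0].weight.copy_(torch.tensor([[K, K], [K, K]]))
    net[0].bias.copy_(torch.tensor([K, -K]))
    # The output is positive only inside the band -1 < a + b < 1.
    net[2].weight.copy_(torch.tensor([[1.0, -1.0]]))
    net[2].bias.copy_(torch.tensor([-1.0]))

# Random unit-length points and their XOR-on-signs targets.
torch.manual_seed(0)
points = F.normalize(torch.rand(200, 2) * 2 - 1, dim=1)
target = torch.logical_xor(points[:, 0] > 0, points[:, 1] > 0)

with torch.no_grad():
    predicted = net(points).squeeze(1) > 0

agreement = (predicted == target).float().mean().item()
print(agreement)  # fraction of points where the hand-built network matches XOR
```

Points that lie almost exactly on an axis can still be misclassified because of floating-point rounding, so the agreement should be close to 1 rather than guaranteed to equal it.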

## Extra Utilities

We will need some additional tools before we start training to evaluate our model. First, our network will likely have a continuous output range, decided by our activation function. Let's **binarize** this output to match the constraints we want to set on our learned XOR function. 

We'll also want to write a function to test the accuracy of our network. 

In [None]:
def binary_pred(pred):
    return torch.where(pred > 0, torch.Tensor([1]), torch.Tensor([-1]))

def accuracy(pred, label):
    pred = binary_pred(pred)

    correct_results_sum = (pred == label).sum().float()
    acc = correct_results_sum / pred.shape[0]
    acc = torch.round(acc * 100).item()

    return acc
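
The toy example below walks through the same two steps on made-up numbers: thresholding at 0, then counting matches. It is a standalone sketch with invented predictions and labels, both shaped `(N, 1)` like the network output and the dataset labels.

```python
import torch

# Made-up network outputs in the tanh range (-1, 1) and their true labels.
pred = torch.tensor([[0.8], [-0.3], [0.1], [-0.9]])
label = torch.tensor([[1.0], [-1.0], [-1.0], [-1.0]])

# Threshold at 0: positive outputs become +1, everything else becomes -1.
binary = torch.where(pred > 0, torch.tensor(1.0), torch.tensor(-1.0))

# Fraction of matching entries, expressed as a percentage.
acc = (binary == label).float().mean().item() * 100
print(binary.squeeze(1).tolist(), acc)
```

Three of the four binarized predictions match their labels, so the accuracy here is 75%.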

## Training

Let's take a typical training loop and apply it to our problem. We will also add some extra code to save **checkpoints** of our model every epoch, so we can load the weights at that point and see what our network has learned so far. 

We'll also add in a function to load our checkpoints easily. 

In [None]:
def train(trainloader, model, criterion, optimizer, savename):
    loss_history = []  # average loss of every epoch
    for epoch in range(100):

        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            inputs, labels = data
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()

        # print statistics every 10 epochs
        average_loss = running_loss / len(trainloader)
        if epoch % 10 == 0:
            print('[Epoch %d] loss: %.5f' % (epoch + 1, average_loss))
        loss_history.append(average_loss)

        # save a checkpoint of the weights after this epoch
        filename = "-".join([savename, str(epoch) + ".pt"])
        os.makedirs("tmpdir", exist_ok=True)
        PATH = os.path.join("tmpdir", filename)
        torch.save(model.state_dict(), PATH)

    print('Finished Training')
    return loss_history

def load_checkpoint(model, savename, epoch):
    checkpoint_name = "-".join([savename, str(epoch) + ".pt"])
    checkpoint = torch.load(os.path.join("tmpdir", checkpoint_name))
    model.load_state_dict(checkpoint)
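
The save-and-load pattern can be checked in isolation. The sketch below uses a stand-in `nn.Linear` model and a temporary directory instead of `tmpdir`; the file name is hypothetical. It writes a `state_dict`, loads it into a freshly initialized model, and confirms that both models then produce the same output.

```python
import os
import tempfile

import torch
import torch.nn as nn

torch.manual_seed(0)
source = nn.Linear(2, 1)    # stand-in for the trained model
restored = nn.Linear(2, 1)  # fresh model with different random weights

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo-0.pt")  # hypothetical checkpoint name
    torch.save(source.state_dict(), path)
    restored.load_state_dict(torch.load(path))

# After loading, both models hold identical weights and agree on any input.
x = torch.tensor([[0.6, -0.8]])
same_output = torch.equal(source(x), restored(x))
print(same_output)
```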

In [None]:
dataset = BinaryDataset()
model = BasicLinearModel_Multilayer(2, 2, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

trainloader = DataLoader(dataset, batch_size=4, shuffle=True)

loss = train(trainloader, model, criterion, optimizer, 'multinet')
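
One detail worth checking in a setup like this is that the labels have the same shape as the model output, `(batch, 1)`. The sketch below uses made-up tensors to show what `nn.MSELoss` does when the label tensor is flattened to `(batch,)` instead: PyTorch emits a warning and broadcasts the two tensors against each other, which silently changes the loss.

```python
import warnings

import torch
import torch.nn as nn

criterion = nn.MSELoss()

outputs = torch.tensor([[1.0], [-1.0], [1.0], [-1.0]])    # shape (4, 1), like the model output
labels_2d = torch.tensor([[1.0], [-1.0], [1.0], [-1.0]])  # shape (4, 1), matches the output
labels_1d = labels_2d.squeeze(1)                          # shape (4,), does not match

loss_ok = criterion(outputs, labels_2d).item()

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the size-mismatch warning for this demo
    # (4, 1) against (4,) broadcasts to a 4x4 grid of pairwise differences.
    loss_broadcast = criterion(outputs, labels_1d).item()

print(loss_ok, loss_broadcast)
```

The predictions are perfect, so the correctly shaped loss is 0, while the broadcast version averages over all 16 output/label pairs and reports 2.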

## Testing our Model

Let's write a quick utility that will create a bunch of random data points, and an explicit function to calculate the right answer, in order to test our network. 

In [None]:
testset = F.normalize(torch.rand(10, 2) * 2 - 1, dim=1)
mask = torch.logical_xor(testset[:, 0] > 0, testset[:, 1] > 0)
testlabels = torch.where(mask, torch.Tensor([1]), torch.Tensor([-1])).unsqueeze(1)

## Visualizing Our Model

Let's define a utility to look at the **decision boundary** for the model we just trained. 

In [None]:
import numpy as np

def plot_decision_boundary(model, h=0.01):

    plt.figure(figsize=(20, 20))

    plt.axis('scaled')

    plt.xlim(-1.1, 1.1)
    plt.ylim(-1.1, 1.1)

    colors = {
        -1: "ro", 
        1: "go"
    }

    # plot the training samples (uses the global `dataset` defined above)
    trainloader = DataLoader(dataset, batch_size=1)
    for data in trainloader:
        inputs, label = data
        plt.plot(
            [inputs[0, 0].item()],
            [inputs[0, 1].item()],
            colors[int(label.item())],
            markersize=20
        )

    x_range = np.arange(-1.1, 1.1, h)
    y_range = np.arange(-1.1, 1.1, h)

    # evaluate the model on every grid point in a single batch
    xx, yy = np.meshgrid(x_range, y_range, indexing='ij')
    grid = torch.from_numpy(np.stack([xx.ravel(), yy.ravel()], axis=1)).float()
    with torch.no_grad():
        Z = binary_pred(model(grid)).reshape(xx.shape).numpy()

    # red where the model predicts -1, green where it predicts 1
    plt.contourf(xx, yy, Z, levels=[-2, 0, 2], colors=['red', 'green'], alpha=0.4)

In [None]:
mynet = BasicLinearModel_Multilayer(2, 2, 1)
load_checkpoint(mynet, "multinet", 20)

In [None]:
plot_decision_boundary(mynet)