# The $XOR$ Problem

Return to the [castle](https://github.com/Nkluge-correa/teeny-tiny_castle).

**The XOR, or “exclusive or”, problem is a classic problem in the history of artificial neural networks (ANNs). It is the problem of using a neural network to predict the output of an XOR logic gate given two binary inputs. An XOR function returns true if the two inputs are different and false if they are equal. All possible inputs and expected outputs are shown in the figure below.**

![image](https://res.cloudinary.com/practicaldev/image/fetch/s--6OpbLFPq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/0%2ALYlt6CZJHOJkNRHJ.)

**On the surface, XOR appears to be a very simple problem. However, Minsky and Papert ([1969](https://leon.bottou.org/publications/pdf/perceptrons-2017.pdf)) showed that it cannot be solved by single-layer perceptrons, the neural network architecture of the 1960s.**


In [1]:
import numpy as np
import pandas as pd
import plotly.offline as py
from sklearn.svm import SVC
import plotly.graph_objects as go
rs = np.random.RandomState(666)


### The `Perceptron`

**In short, a perceptron separates a space with a _hyperplane_. In two dimensions this is just a straight line; in general, it divides an n-dimensional space into two regions: True and False. Take a look at `dot_product.ipynb` to see how this is done.**

[![perceptron2](https://miro.medium.com/max/639/1*_Epn1FopggsgvwgyDA4o8w.png)](https://miro.medium.com/max/639/1*_Epn1FopggsgvwgyDA4o8w.png)
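The hyperplane idea can be made concrete with a small sketch of the classic perceptron learning rule, $w \leftarrow w + \eta\,(t - \hat{y})\,x$. This is an illustration under our own assumptions, not code from `dot_product.ipynb`: the helper names `perceptron_predict` and `train_perceptron` are hypothetical, the step activation outputs 1 only for strictly positive inputs, and the target is the AND gate, which is linearly separable.

```python
import numpy as np


def perceptron_predict(X, w, b):
    # Step activation: 1 on the positive side of the hyperplane w.x + b = 0, else 0
    return (X @ w + b > 0).astype(int)


def train_perceptron(X, t, lr=1.0, max_epochs=100):
    # Hypothetical helper: classic perceptron rule, starting from a zero hyperplane
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x_i, t_i in zip(X, t):
            update = lr * (t_i - perceptron_predict(x_i, w, b))
            w += update * x_i
            b += update
            mistakes += int(update != 0)
        if mistakes == 0:  # every point is already on the correct side
            break
    return w, b


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t_and = np.array([0, 0, 0, 1])  # AND is linearly separable
w, b = train_perceptron(X, t_and)
print(w, b, perceptron_predict(X, w, b))
```

Swapping `t_and` for the XOR targets `[0, 1, 1, 0]` leaves at least one point misclassified no matter how many epochs are allowed, which is exactly the XOR limitation this notebook is about.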


In [3]:
x = np.random.randn(100)*5
x_0 = np.linspace(x.min()-.1, x.max()+.1, 500)  # evenly spaced test points


fig = go.Figure(data=go.Scatter(x=x_0, y=x_0, name='Linear Function'))
fig.update_layout(template='plotly_dark',
                  title='Linear Function')
fig.show()


**A neural network can be viewed as a composition of hyperplanes (the N-dimensional generalization of a plane) that group and separate regions of the input space.**

_Let's generate some synthetic data to visualize this:_

- $A$ holds the input features (100 points in $\mathbb{R}^{3}$),
- $B$ holds the class label of each point in $A$.


In [4]:
n_samples = 100
half = n_samples // 2
A = np.zeros((n_samples, 3))  # input features
# Class 0: Gaussian cloud centered at (1, 1, 1); class 1: centered at (-1, -1, -1)
A[:half] = rs.multivariate_normal(np.ones(3), np.eye(3), size=half)
A[half:] = rs.multivariate_normal(-np.ones(3), np.eye(3), size=half)
B = np.zeros(n_samples)  # class labels
B[half:] = 1


**We can fit the data with a linear SVM, which learns a hyperplane that separates the two classes.**

_The separating hyperplane is the set of all $x \in \mathbb{R}^{3}$ such that:_

$$w \cdot x + b = 0$$

where $w$ is `svm.coef_[0]` and $b$ is `svm.intercept_[0]`.
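To draw this plane in 3D, the plotting code solves the hyperplane equation for the third coordinate. Writing $w = (w_1, w_2, w_3)$ and assuming $w_3 \neq 0$:

$$w_1 x_1 + w_2 x_2 + w_3 x_3 + b = 0 \quad\Longrightarrow\quad x_3 = -\frac{b + w_1 x_1 + w_2 x_2}{w_3}$$

This is what the helper `z(x, y)` evaluates over a mesh grid of $(x_1, x_2)$ values.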


In [5]:
svm = SVC(kernel='linear')
svm.fit(A, B)


def z(x, y):
    # Solve w[0]*x + w[1]*y + w[2]*z + b = 0 for z to draw the plane
    w, b = svm.coef_[0], svm.intercept_[0]
    return (-b - w[0] * x - w[1] * y) / w[2]


am, aM = A[:, 0].min(), A[:, 0].max()
bm, bM = A[:, 1].min(), A[:, 1].max()
a = np.linspace(am, aM, 10)
b = np.linspace(bm, bM, 10)
a, b = np.meshgrid(a, b)


fig = go.Figure()
fig.add_surface(x=a, y=b, z=z(a, b), showscale=False, opacity=0.9)
fig.add_scatter3d(x=A[B == 0, 0], y=A[B == 0, 1], z=A[B == 0, 2],
                  mode='markers', marker={'color': 'blue'}, name='Class_0')
fig.add_scatter3d(x=A[B == 1, 0], y=A[B == 1, 1], z=A[B == 1, 2],
                  mode='markers', marker={'color': 'red'}, name='Class_1')
fig.update_layout(template='plotly_dark',
                  title='Hyperplane Separation',
                  paper_bgcolor='rgba(0, 0, 0, 0)',
                  plot_bgcolor='rgba(0, 0, 0, 0)')
fig.show()


**XOR is one of the simplest non-linear functions: no single straight line can separate the True outputs from the False outputs.**
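A short argument shows why. Assume a single perceptron that outputs 1 when $w_1 x_1 + w_2 x_2 + b > 0$ and 0 otherwise (the same convention as the step function used later in this notebook). Matching the four XOR cases would require:

$$\begin{aligned}
(0,0) \mapsto 0 &: \quad b \le 0 \\
(0,1) \mapsto 1 &: \quad w_2 + b > 0 \\
(1,0) \mapsto 1 &: \quad w_1 + b > 0 \\
(1,1) \mapsto 0 &: \quad w_1 + w_2 + b \le 0
\end{aligned}$$

Adding the two middle inequalities gives $w_1 + w_2 + 2b > 0$, so $w_1 + w_2 + b > -b \ge 0$, which contradicts the last line. No choice of $w_1$, $w_2$, $b$ works.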


In [6]:
def xor(x1, x2):
    return bool(x1) != bool(x2)


x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([xor(*i) for i in x])


df = pd.DataFrame(x, columns=['x1', 'x2'])
df['xor'] = y
df


fig = go.Figure(data=go.Scatter(
    x=df['x1'],
    y=df['x2'],
    mode='markers',
    text=df['xor'],
    showlegend=False
))
fig.update_layout(template='plotly_dark',
                  title='XOR function',
                  paper_bgcolor='rgba(0, 0, 0, 0)',
                  plot_bgcolor='rgba(0, 0, 0, 0)')
fig.show()


### Activation functions

**Biological neurons behave roughly like a step function: a neuron fires (a 1) if enough voltage builds up, and otherwise it does not fire (a 0). _We aim, via the perceptron, to recreate this behavior._**

**The problem with a step function is that it is discontinuous at zero and its derivative is zero everywhere else. _This makes it impractical for the calculus that training relies on._ Thus we tend to use a smooth function such as the sigmoid, which is infinitely differentiable, or the ReLU (_which has a simple, well-behaved derivative_), allowing us to easily do calculus with our model.**


In [7]:
x = np.random.randn(100)*5
x_0 = np.linspace(x.min()-.1, x.max()+.1, 500)


def step(array):
    # 1 where the input is positive, 0 elsewhere
    return np.where(array > 0, 1, 0)


y_step = step(x_0)

fig = go.Figure(data=go.Scatter(x=x_0, y=y_step, name='Step Function'))
fig.update_layout(template='plotly_dark',
                  title='Step Function',
                  paper_bgcolor='rgba(0, 0, 0, 0)',
                  plot_bgcolor='rgba(0, 0, 0, 0)')
fig.show()


def ReLU(Z):
    return np.maximum(Z, 0)  # element-wise maximum of Z and 0


y_relu = ReLU(x_0)
fig = go.Figure(data=go.Scatter(x=x_0, y=y_relu, name='ReLU Function'))
fig.update_layout(template='plotly_dark',
                  title='ReLU Function',
                  paper_bgcolor='rgba(0, 0, 0, 0)',
                  plot_bgcolor='rgba(0, 0, 0, 0)')
fig.show()


def sigmoid(x):
    return 1/(1 + np.exp(-x))


y_sigmoid = sigmoid(x_0)

fig = go.Figure(data=go.Scatter(x=x_0, y=y_sigmoid, name='Sigmoid Function'))
fig.update_layout(template='plotly_dark',
                  title='Sigmoid Function',
                  paper_bgcolor='rgba(0, 0, 0, 0)',
                  plot_bgcolor='rgba(0, 0, 0, 0)')
fig.show()
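The difference between these activations shows up in their derivatives. Below is a minimal sketch, with hypothetical helper names, that compares the closed-form derivatives $\sigma'(x) = \sigma(x)(1 - \sigma(x))$ and $\text{ReLU}'(x) = 1$ for $x > 0$ (0 for $x < 0$) against central finite differences. The test points deliberately avoid $x = 0$, where neither the step function nor the ReLU is differentiable.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def step(x):
    return np.where(x > 0, 1.0, 0.0)


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)  # closed form: sigma(x) * (1 - sigma(x))


def relu_grad(x):
    return np.where(x > 0, 1.0, 0.0)  # 0 for x < 0, 1 for x > 0


def central_difference(f, x, h=1e-5):
    # Numerical derivative: (f(x + h) - f(x - h)) / 2h
    return (f(x + h) - f(x - h)) / (2 * h)


xs = np.array([-3.0, -0.5, 0.5, 3.0])  # avoid x = 0 on purpose
print(central_difference(step, xs))    # zero slope: no signal for learning
print(sigmoid_grad(xs) - central_difference(sigmoid, xs))
print(relu_grad(xs) - central_difference(relu, xs))
```

The step function's slope is zero at every test point, so gradient descent would receive no information from it, while the sigmoid and ReLU derivatives match their numerical estimates.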


### How do we solve the XOR problem?

**The trick is to realize that we can stack perceptrons in layers.** _Two perceptrons each draw a straight line, and a third perceptron combines these two separate signals into a single signal that only has to distinguish True from False across one boundary._


In [8]:
x_0 = np.linspace(0, 1, 500)

fig = go.Figure(data=go.Scatter(
    x=df['x1'],
    y=df['x2'],
    mode='markers',
    text=df['xor'],
    showlegend=False
))
fig.add_trace(go.Scatter(x=x_0 - 0.1, y=x_0[::-1] - 0.1, name='boundary_1'))
fig.add_trace(go.Scatter(x=x_0 + 0.1, y=x_0[::-1] + 0.1, name='boundary_2'))
fig.update_layout(template='plotly_dark',
                  title='Idealized decision boundary',
                  paper_bgcolor='rgba(0, 0, 0, 0)',
                  plot_bgcolor='rgba(0, 0, 0, 0)')
fig.show()
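The stacking idea can be checked by hand. The sketch below wires a two-layer network with fixed, hand-picked weights (our own choice, not learned by the notebook): one hidden unit acts as OR (fires when $x_1 + x_2 > 0.5$), the other as NAND (fires when $x_1 + x_2 < 1.5$), and the output unit is an AND of the two. The hidden units draw two parallel boundaries, much like the idealized plot; any threshold between 0 and 1 for the first and between 1 and 2 for the second would work equally well.

```python
import numpy as np


def step_unit(z):
    # Perceptron activation: 1 if z > 0, else 0
    return (z > 0).astype(int)


# Hidden layer (hand-picked weights): column 0 is OR, column 1 is NAND
W1 = np.array([[1.0, -1.0],
               [1.0, -1.0]])
b1 = np.array([-0.5, 1.5])

# Output layer: AND of the two hidden signals
w2 = np.array([1.0, 1.0])
b2 = -1.5


def xor_net(inputs):
    hidden = step_unit(inputs @ W1 + b1)  # two straight-line boundaries
    return step_unit(hidden @ w2 + b2)    # combine them into one signal


inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(xor_net(inputs))  # [0 1 1 0]
```

Training, discussed next, is about finding weights like these automatically instead of choosing them by hand.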


**The “knowledge” of a neural network is contained entirely in its learned parameters: the weights and biases. Each signal a neuron receives is multiplied by its weight, and the bias is added, as in $y(x) = w \cdot x + b$, where $w$ is the weight and $b$ is the bias.**
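For a single neuron with two inputs, the weighted sum is just a dot product plus the bias; in this notebook it is then passed through the sigmoid. A tiny worked example with arbitrary illustrative numbers, $w = (2, -1)$, $b = -1$, $x = (1, 1)$: $z = 2 \cdot 1 + (-1) \cdot 1 - 1 = 0$, and $\sigma(0) = 0.5$.

```python
import numpy as np


def neuron(x, w, b):
    # Weighted sum of the inputs plus the bias, then a sigmoid activation
    z = np.dot(w, x) + b
    return 1 / (1 + np.exp(-z))


w_demo = np.array([2.0, -1.0])  # illustrative weights
b_demo = -1.0                   # illustrative bias
x_demo = np.array([1.0, 1.0])
print(np.dot(w_demo, x_demo) + b_demo)  # 0.0
print(neuron(x_demo, w_demo, b_demo))   # 0.5
```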

_The backpropagation algorithm (backprop) is the key method by which we adjust the weights: the error at the final output neuron is propagated backward through the network, layer by layer._

**To compute the adjustment for each weight, we need an error function that decreases as the network's predictions approach the targets. Let $E$ be the error function given by:**

$$E = \frac{(y - y')^{2}}{2}$$

**where $y$ is the target and $y'$ is the network's prediction.**
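Applying the chain rule to $E$ gives the error signals used by the training loop. The notation here is ours: $x$ is the input, $h = \sigma(x W_h + b_h)$ the hidden activations, $y' = \sigma(h W_o + b_o)$ the output, $\sigma' = \sigma(1 - \sigma)$, and $\odot$ the element-wise product; gradients are summed over the training samples.

$$\begin{aligned}
\frac{\partial E}{\partial y'} &= y' - y \\
\delta_o &= (y' - y) \odot y' \odot (1 - y') \\
\delta_h &= \left(\delta_o W_o^{\top}\right) \odot h \odot (1 - h) \\
\frac{\partial E}{\partial W_o} &= h^{\top} \delta_o, \qquad \frac{\partial E}{\partial b_o} = \textstyle\sum_{\text{samples}} \delta_o \\
\frac{\partial E}{\partial W_h} &= x^{\top} \delta_h, \qquad \frac{\partial E}{\partial b_h} = \textstyle\sum_{\text{samples}} \delta_h
\end{aligned}$$

In the code, $\delta_o$ is `grad_output` and $\delta_h$ is `grad_hidden`; each parameter then moves by $-\alpha$ times its gradient.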

**The learning algorithm consists of the following steps:**

- Randomly initialize the weights and biases;
- Iterate over the training data;
- Forward propagate: compute the network's output;
- Compute the loss function;
- Backward propagate: compute the gradients with respect to the weights and biases;
- Adjust the weights and biases by gradient descent;
- Stop when the error falls below a chosen threshold.
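One way to confirm that the backward propagation step computes the right gradients is a numerical gradient check: nudge each parameter by a small $\epsilon$ and compare the change in the summed squared error with the analytic gradient. The sketch below is a self-contained version for a 2-2-1 sigmoid network like the one trained here; the function names and the random initialization are our own assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def forward(params, x):
    w_h, b_h, w_o, b_o = params
    h = sigmoid(x @ w_h + b_h)
    return h, sigmoid(h @ w_o + b_o)


def loss(params, x, y):
    _, out = forward(params, x)
    return 0.5 * np.sum((y - out) ** 2)  # E summed over samples


def backprop(params, x, y):
    _, _, w_o, _ = params
    h, out = forward(params, x)
    d_out = (out - y) * out * (1 - out)    # delta at the output layer
    d_hid = (d_out @ w_o.T) * h * (1 - h)  # delta at the hidden layer
    return [x.T @ d_hid, d_hid.sum(axis=0, keepdims=True),
            h.T @ d_out, d_out.sum(axis=0, keepdims=True)]


def numerical_grads(params, x, y, eps=1e-6):
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            original = p[idx]
            p[idx] = original + eps
            loss_plus = loss(params, x, y)
            p[idx] = original - eps
            loss_minus = loss(params, x, y)
            p[idx] = original  # restore the parameter
            g[idx] = (loss_plus - loss_minus) / (2 * eps)
        grads.append(g)
    return grads


rng = np.random.default_rng(0)
x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
params = [rng.random((2, 2)), rng.random((1, 2)),
          rng.random((2, 1)), rng.random((1, 1))]

analytic = backprop(params, x, y)
numeric = numerical_grads(params, x, y)
for a, n in zip(analytic, numeric):
    print(np.max(np.abs(a - n)))  # should be close to zero
```

Note that each bias gradient sums the deltas over samples only (`axis=0`), so every neuron keeps its own bias gradient.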



In [None]:
import itertools


np.random.seed(42)


def xor(x1, x2):
    return bool(x1) != bool(x2)


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def sigmoid_derivative(sigmoid_result):
    return sigmoid_result * (1 - sigmoid_result)


def error(target, prediction):
    return .5 * (target - prediction)**2


def error_derivative(target, prediction):
    return - target + prediction


x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[xor(*i)] for i in x], dtype=int)

alpha = 0.02  # learning rate
max_epochs = 200_000  # safety cap so the loop always terminates
n_neurons_input, n_neurons_hidden, n_neurons_output, bias_per_neuron = 2, 2, 1, 1

# Random initialization in [0, 1)
w_hidden = np.random.random(size=(n_neurons_input, n_neurons_hidden))
b_hidden = np.random.random(size=(bias_per_neuron, n_neurons_hidden))

w_output = np.random.random(size=(n_neurons_hidden, n_neurons_output))
b_output = np.random.random(size=(bias_per_neuron, n_neurons_output))

errors = []
params = []
grads = []
print('Training...')
for epoch in range(1, max_epochs + 1):
    # Forward pass over all four XOR inputs at once
    y_hidden = sigmoid(np.dot(x, w_hidden) + b_hidden)
    y_output = sigmoid(np.dot(y_hidden, w_output) + b_output)

    e = error(y, y_output).mean()

    if e < 1e-4:
        print(f'Epoch {epoch}.')
        print(f'Training terminated. Loss score: {e}.')
        break

    # Backward pass: error signals (deltas) of the output and hidden layers
    grad_output = error_derivative(y, y_output) * sigmoid_derivative(y_output)
    grad_hidden = grad_output.dot(w_output.T) * sigmoid_derivative(y_hidden)

    # Gradient-descent updates, summed over the four samples
    w_output -= alpha * y_hidden.T.dot(grad_output)
    w_hidden -= alpha * x.T.dot(grad_hidden)

    # Sum over samples only (axis=0) so each neuron gets its own bias update
    b_output -= alpha * np.sum(grad_output, axis=0, keepdims=True)
    b_hidden -= alpha * np.sum(grad_hidden, axis=0, keepdims=True)

    errors.append(e)
    grads.append(np.concatenate((grad_output.ravel(), grad_hidden.ravel())))
    params.append(np.concatenate((w_output.ravel(), b_output.ravel(),
                                  w_hidden.ravel(), b_hidden.ravel())))
else:
    print(f'Stopped after {max_epochs} epochs. Loss score: {e}.')


def predict(x, y):
    y_hidden = sigmoid(np.dot(x, w_hidden) + b_hidden)
    result = sigmoid(np.dot(y_hidden, w_output) + b_output)
    df = pd.DataFrame(x, columns=['x1', 'x2'])
    df['Prediction'] = result.ravel()
    df['Ground Truth'] = y.ravel()
    return df


predict(x, y)


In [None]:
epochs = list(range(1, len(errors) + 1))  # one entry per recorded epoch

fig = go.Figure(data=go.Scatter(
    x=epochs, y=errors, name='Multi-Perceptron Loss'))
fig.update_layout(template='plotly_dark',
                  title='Multi-Perceptron Loss')
fig.show()


grads_df = pd.DataFrame(grads)
params_df = pd.DataFrame(params)

fig = go.Figure()
for i in range(len(grads_df.columns)):
    fig.add_trace(go.Scatter(
        x=epochs, y=grads_df[i].abs(), name=f'gradient_{i}'))
fig.update_layout(template='plotly_dark',
                  title='Gradients (0-3: output layer) (4-11: the hidden layer)',)
fig.show()

fig = go.Figure()
for i in range(len(params_df.columns)):
    fig.add_trace(go.Scatter(x=epochs, y=params_df[i], name=f'parameter_{i}'))
fig.update_layout(template='plotly_dark',
                  title='Weights (0-1, 3-6) and Biases (2, 7-8)',)
fig.show()


