# XOR neural network

This notebook serves as practice for implementing a feed-forward neural network and training it with the backpropagation algorithm. It is well known that a network needs at least two layers to learn XOR (exclusive or): the 4 data points of XOR are not linearly separable, and one-layer networks are linear classifiers.
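
The non-separability claim can be checked by hand. As a short sketch, assume a single linear unit with weights $w_1, w_2$ and bias $b$ (symbols introduced here for illustration) that outputs 1 exactly when $w_1 x_1 + w_2 x_2 + b > 0$. The four XOR points would then require:

$$
\begin{aligned}
(0,0) \mapsto 0 &: \quad b \le 0 \\
(0,1) \mapsto 1 &: \quad w_2 + b > 0 \\
(1,0) \mapsto 1 &: \quad w_1 + b > 0 \\
(1,1) \mapsto 0 &: \quad w_1 + w_2 + b \le 0
\end{aligned}
$$

Adding the two middle inequalities gives $w_1 + w_2 + 2b > 0$, so $w_1 + w_2 + b > -b \ge 0$, which contradicts the last line. No choice of $w_1, w_2, b$ satisfies all four constraints.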

* Define the weights for each layer, with ReLU activation on the hidden layer and sigmoid on the final layer. No fancy implementation, just the basics.

* Implement a feed-forward pass with the XOR "data set"

* Implement the backpropagation algorithm

* Train the network and see that it gets the XOR correct


### Make a scratch network with 8 units in the first layer and one sigmoid-activated output

In [164]:
import numpy as np
import math

# Note that in this setting, the data points are organized in columns instead of rows
X = np.array([[0,0, 1, 1], [0, 1, 0, 1]])
Y = np.array([[0, 1, 1, 0]])
print(np.vstack((X,Y)))

[[0 0 1 1]
 [0 1 0 1]
 [0 1 1 0]]


In [165]:
# each column represents the 3 weights (2 inputs + 1 bias) of one of the 8 hidden units.
W1 = np.random.normal(loc=0.0, scale=1.0, size=(3,8))

# and here we have 8 weights, one per hidden unit, feeding the single output unit (no bias term)
W2 = np.random.normal(loc=0.0, scale=1.0, size=(8,1))

def ReLU(x):
    return max(0,x)

def ReLU_grad(x):
    if x <= 0:
        return 0
    else:
        return 1

def sigmoid(x):
    return (1/(1+np.exp(-x)))

def sigmoid_grad(x):
    return sigmoid(x)*(1 - sigmoid(x))

def loss(guess, y):
    return  (- y * np.log(guess) - (1 - y) * np.log(1 - guess)).sum()


def loss_grad(guess, y):
    return - y * (1/guess) + (1 - y) * (1 / (1 - guess)) 
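
Multiplying `loss_grad` by `sigmoid_grad` simplifies algebraically to `a - y`, where `a` is the sigmoid output. The sketch below checks that identity numerically against central finite differences. The names `bce`, `z`, and `eps`, and the sample pre-activations, are assumptions made for illustration and are not part of the notebook.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def bce(a, y):
    # Same cross-entropy as loss() above, summed over all data points
    return (-y * np.log(a) - (1 - y) * np.log(1 - a)).sum()

# Hypothetical pre-activations and the XOR labels, as column vectors
z = np.array([[-1.5], [0.3], [2.0], [-0.2]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])
a = sigmoid(z)

# Chain rule: dLoss/da times da/dz
chain = (-y / a + (1 - y) / (1 - a)) * (a * (1 - a))
# Closed form after cancelling terms
closed = a - y

# Central finite differences as an independent check
eps = 1e-6
numeric = np.zeros_like(z)
for i in range(z.shape[0]):
    dz = np.zeros_like(z)
    dz[i, 0] = eps
    numeric[i, 0] = (bce(sigmoid(z + dz), y) - bce(sigmoid(z - dz), y)) / (2 * eps)

print(np.allclose(chain, closed), np.allclose(numeric, closed, atol=1e-6))
```

The closed form avoids dividing by `guess` or `1 - guess`, which can matter once the sigmoid output gets very close to 0 or 1.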


### Make feed forward for our neural network

In [166]:
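def feed_forward(X, W1, W2):
    # Append a row of ones so the last row of W1 acts as the hidden-layer bias
    new_X = np.vstack((X, np.ones((1,4))))
    z1 = np.matmul(new_X.T, W1)   # hidden pre-activations, shape (4, 8)
    # otypes=[float] keeps the result floating point even if the first ReLU output is the int 0
    a1 = np.vectorize(ReLU, otypes=[float])(z1)
    z2 = np.matmul(a1, W2)        # output pre-activations, shape (4, 1)
    a2 = sigmoid(z2)              # sigmoid is built on np.exp, so it already works elementwise
    return (z1, a1, z2, a2)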
    
initial_guess = feed_forward(X, W1, W2)[-1]
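
One thing to watch when wrapping a scalar ReLU in `np.vectorize`: without `otypes`, the output dtype is inferred from the first call. As far as I know, this means that a first input that clips to the Python int `0` produces an integer result array, and the remaining activations are truncated. The sketch below illustrates this with a hypothetical `relu_scalar` and a made-up two-element input.

```python
import numpy as np

def relu_scalar(x):
    # Hypothetical scalar ReLU in the style of max(0, x)
    return max(0, x)

z = np.array([-1.0, 2.5])

# Output dtype is inferred from the first call, which returns the int 0
inferred = np.vectorize(relu_scalar)(z)
# Declaring the output type avoids the integer cast
declared = np.vectorize(relu_scalar, otypes=[float])(z)
# np.maximum is the native elementwise alternative
native = np.maximum(0, z)

print(inferred.dtype, inferred)
print(declared.dtype, declared)
print(native.dtype, native)
```

`np.maximum(0, z)` is usually the simplest choice, since it needs no wrapper and always preserves the float dtype of `z`.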


### Implement back propagation with weights update

In [167]:
def backprop(X, W1, W2, Y, loss, loss_grad, step, max_iter=100):
    # Append a row of ones so the last row of W1 acts as the hidden-layer bias
    new_X = np.vstack((X, np.ones((1,4))))

    for _ in range(max_iter):
        z1, a1, z2, a2 = feed_forward(X, W1, W2)

        # Error at the output pre-activation: dLoss/da2 * sigmoid'(z2), shape (4, 1)
        delta2 = loss_grad(a2, Y.T) * sigmoid_grad(z2)
        grad_W2 = np.matmul(a1.T, delta2)

        # Propagate the error back through W2 and the ReLU, one data point at a time
        L = []
        for i in range(4): # how to vectorize this?
            a = np.diag(np.vectorize(ReLU_grad)(z1[i,:]))
            L.append(delta2[i,:] * np.matmul(a, W2))
        grad_W1 = np.matmul(new_X, np.array(L).reshape(4,8))

        # Update both layers only after both gradients were computed from the old weights
        W2 = W2 - step * grad_W2
        W1 = W1 - step * grad_W1
    return W1, W2
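
On the "how to vectorize this?" question in the loop: the per-point products can be written as one matrix product followed by an elementwise mask. What follows is a minimal sketch, not a drop-in replacement. `backprop_vectorized` is a hypothetical name, it inlines the forward pass, and it uses the simplified output error `a2 - Y.T`, which equals `loss_grad` times `sigmoid_grad` for this cross-entropy loss. The seed is arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def backprop_vectorized(X, W1, W2, Y, step, max_iter=100):
    # Hypothetical variant of backprop(); X is (2, n), Y is (1, n)
    n = X.shape[1]
    Xb = np.vstack((X, np.ones((1, n))))       # (3, n), bias row appended
    for _ in range(max_iter):
        z1 = Xb.T @ W1                          # (n, 8) hidden pre-activations
        a1 = np.maximum(0, z1)                  # ReLU
        a2 = sigmoid(a1 @ W2)                   # (n, 1) predictions
        delta2 = a2 - Y.T                       # output error, (n, 1)
        delta1 = (delta2 @ W2.T) * (z1 > 0)     # hidden error, (n, 8)
        grad_W2 = a1.T @ delta2                 # (8, 1)
        grad_W1 = Xb @ delta1                   # (3, 8)
        W2 = W2 - step * grad_W2
        W1 = W1 - step * grad_W1
    return W1, W2

# Usage on the XOR data with an arbitrary seed for repeatability
rng = np.random.default_rng(0)
X = np.array([[0, 0, 1, 1], [0, 1, 0, 1]])
Y = np.array([[0, 1, 1, 0]])
W1 = rng.normal(size=(3, 8))
W2 = rng.normal(size=(8, 1))
A, B = backprop_vectorized(X, W1, W2, Y, step=0.05, max_iter=1000)
print(A.shape, B.shape)
```

The mask `(z1 > 0)` plays the role of the diagonal `ReLU_grad` matrices, and `delta2 @ W2.T` broadcasts each point's output error across the 8 hidden units in one step.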



### Train the network and see if it gets XOR correct

In [172]:
# Fit weights:
iterations = 1000
A,B = backprop(X, W1, W2, Y,  loss, loss_grad, 0.05, max_iter = iterations)

#Make predictions with the fitted weights
pred = feed_forward(X, A, B)[-1]


print("The loss went from {} at random initialization to {} after \
running back propagation gradient descent for {} iterations".format(loss(initial_guess, Y.T), loss(pred, Y.T), iterations))


def check(pred,y):
    if (pred<0.5 and y==0) or (pred>=0.5 and y==1):
        return 0
    else:
        return 1


def check_correctness(prediction, Y):
    # Flatten both arrays so predictions and labels line up one-to-one
    prediction_ = prediction.reshape(-1)
    Y_ = Y.reshape(-1)
    assert prediction_.shape == Y_.shape
    # check() returns 1 for a misclassified point, so the sum counts mistakes
    return sum(check(p, y) for p, y in zip(prediction_, Y_))
    

original_prediction_mistakes = check_correctness(initial_guess, Y)
fitted_prediction_mistakes = check_correctness(pred, Y)

print()
print('Random initialization gave {} mistakes. After fitting we have {} wrongly classified points'\
      .format(original_prediction_mistakes, fitted_prediction_mistakes))

print()
print(np.hstack((pred, Y.T)))

The loss went from 5.1556868134395195 at random initialization to 0.01640712130851201 after running back propagation gradient descent for 1000 iterations

Random initialization gave 2 mistakes. After fitting we have 0 wrongly classified points

[[0.00313858 0.        ]
 [0.9972922  1.        ]
 [0.99383346 1.        ]
 [0.00435699 0.        ]]
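
The per-point `check()` rule can also be applied to the whole prediction array at once. The sketch below uses the predictions printed above, copied in by hand, and thresholds them at 0.5. The names `labels` and `mistakes` are illustrative only.

```python
import numpy as np

# Predictions and labels copied from the printed output above
pred = np.array([[0.00313858], [0.9972922], [0.99383346], [0.00435699]])
Y = np.array([[0, 1, 1, 0]])

# Threshold at 0.5, the same rule check() applies point by point
labels = (pred >= 0.5).astype(int).reshape(-1)
mistakes = int((labels != Y.reshape(-1)).sum())
print(labels, mistakes)
```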
