In this notebook I demonstrate how to build a machine learning model (a neural net) that learns binary logic operations such as AND, OR, and XOR.

This example only trains a model for the XOR operation, because a model for AND or OR can be trained simply by 1) reusing the neural net architecture used for the XOR model and changing the labels of the training dataset, or 2) building a neural net without a hidden layer.
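As a rough illustration of option 2, the sketch below trains a single sigmoid unit with no hidden layer on the AND and OR truth tables, using the same squared-error update style as the rest of this notebook. The function name `train_single_layer`, the zero initialization, the learning rate, and the iteration count are assumptions chosen for this sketch, not values taken from the notebook.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train_single_layer(y, lr=0.5, iters=10000):
    """Train one sigmoid unit (no hidden layer) on the 2-input truth table.

    y has shape (1, 4): one label per column of X.
    lr and iters are illustrative defaults, not tuned values.
    """
    # Columns of X are the four input pairs (0,0), (0,1), (1,0), (1,1).
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float).T
    W = np.zeros((1, 2))
    b = np.zeros((1, 1))
    for _ in range(iters):
        A = sigmoid(W @ X + b)            # forward pass
        d_Z = (A - y) * A * (1 - A)       # squared-error gradient through the sigmoid
        W -= lr * (d_Z @ X.T)             # gradient step on the weights
        b -= lr * d_Z.sum(axis=1, keepdims=True)
    return sigmoid(W @ X + b)

and_out = train_single_layer(np.array([[0, 0, 0, 1]], dtype=float))
or_out = train_single_layer(np.array([[0, 1, 1, 1]], dtype=float))
print(np.round(and_out, 2))
print(np.round(or_out, 2))
```

Thresholding the outputs at 0.5 should reproduce the AND and OR columns of their truth tables; swapping in the XOR labels would not work with this single unit.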

The training dataset is simply the XOR truth table that looks like this:

In [1]:
%%latex 
\begin{array}{|c|c|c|}
\hline 
A & B & Output \\\hline
0 & 0 & 0 \\\hline
0 & 1 & 1 \\\hline
1 & 0 & 1 \\\hline
1 & 1 & 0 \\\hline
\end{array}

<IPython.core.display.Latex object>

One can see that the outputs of the XOR operation, unlike those of AND and OR, cannot be separated by a single straight line. The architecture used in this example is the simplest kind of neural net with a hidden layer between the input layer and the output layer. The hidden layer lets the model learn a nonlinear boundary between the two classes (True, False). Adding more layers, or more nodes in the hidden layer, would achieve the same goal, but to avoid wasting computational resources this example uses a single hidden layer with 2 nodes.
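The separability claim can be checked informally with a brute-force search. The sketch below tries every line `w1*A + w2*B + b = 0` on a coarse grid and reports whether any of them puts all four rows of a truth table on the correct side. The grid values and the helper name `find_line` are assumptions made for this illustration, and a grid search is a demonstration rather than a proof.

```python
import itertools
import numpy as np

# The four input pairs, one per row.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = {
    "AND": np.array([0, 0, 0, 1]),
    "OR": np.array([0, 1, 1, 1]),
    "XOR": np.array([0, 1, 1, 0]),
}

# Candidate values for w1, w2 and b (a coarse, hypothetical grid).
grid = np.linspace(-2, 2, 9)

def find_line(y):
    """Return the first (w1, w2, b) on the grid that classifies every row
    correctly with the rule `w1*A + w2*B + b > 0`, or None if there is none."""
    for w1, w2, b in itertools.product(grid, repeat=3):
        pred = (X @ np.array([w1, w2]) + b > 0).astype(int)
        if np.array_equal(pred, y):
            return (float(w1), float(w2), float(b))
    return None

results = {name: find_line(y) for name, y in targets.items()}
for name, line in results.items():
    print(name, line)
```

The search finds a separating line for AND and for OR but returns `None` for XOR. This matches the algebra: XOR would need `b <= 0`, `w1 + b > 0`, `w2 + b > 0`, and `w1 + w2 + b <= 0` at once, and the first three conditions together contradict the fourth.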

In [2]:
import numpy as np
from sklearn.metrics import mean_squared_error
np.random.seed(99)
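def sigmoid(z):
    # Logistic activation, applied element-wise
    return 1/(1+np.exp(-z))
def sigmoid_d(x):
    # Derivative of the sigmoid with respect to its input x
    return sigmoid(x)*(1-sigmoid(x))
def MSE(pred,target):
    # mean_squared_error expects (y_true, y_pred); the value is the same in either order
    return mean_squared_error(target,pred)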

# Define train dataset: X with labels y
X = np.array([[0,0],[0,1],[1,0],[1,1]])
y = np.array([[0],[1],[1],[0]])
# Preprocessing: transpose so that each column of X is one training example (shape 2x4)
X = X.T

#Define model architecture: 
unit_l1= 2 #nodes in layer1(hidden layer)
unit_l2= 1 #nodes in layer2(output)

#weights and intercepts
W1=np.random.rand(unit_l1,X.shape[0])
W2=np.random.rand(unit_l2,unit_l1)
b1 = np.random.rand(unit_l1,1)
b2 = np.random.rand(unit_l2,1)

#learning rate
lr = 0.1

In [3]:
m = X.shape[1]  # number of training examples (columns of X); not used below
costs=[]
for iters in range(20000):
    #Forward pass
    Z1 = np.matmul(W1,X) + b1
    A1 = sigmoid(Z1)
    Z2 = np.matmul(W2,A1) + b2
    A2 = sigmoid(Z2)
    
    #Calculating cost
    cost = MSE(A2,y.T)
    
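    #Backward pass
    # Gradient of the squared error w.r.t. A2; the constant factor from the mean is dropped (it only rescales lr)
    MSE_d = A2 - y.T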
    d_Z2 = MSE_d * sigmoid_d(Z2)
    d_W2 = np.matmul(d_Z2,A1.T)
    d_b2 = np.sum(d_Z2,axis=1,keepdims=True)
    
    d_A1 = np.matmul(W2.T,d_Z2)
    d_Z1 = d_A1 * sigmoid_d(Z1)
    d_W1 = np.matmul(d_Z1,X.T)
    d_b1 = np.sum(d_Z1,axis=1,keepdims=True)
    
    #Update
    W2 -= d_W2 * lr
    b2 -= d_b2 * lr
    W1 -= d_W1 * lr
    b1 -= d_b1 * lr
    if iters % 5000 == 0:
        costs.append(cost)
        print ("Cost per iters %i: %f" %(iters, cost))


Cost per iters 0: 0.385695
Cost per iters 5000: 0.054040
Cost per iters 10000: 0.003333
Cost per iters 15000: 0.001549
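One way to gain confidence in the backward pass is a finite-difference gradient check. The sketch below re-implements the same 2-2-1 sigmoid network in a self-contained form and compares the analytic gradients with numerical ones. It assumes the loss `0.5 * sum((A2 - Y)**2)`, whose derivative with respect to `A2` is exactly `A2 - Y`, the quantity used as `MSE_d` above. The helper names (`forward`, `loss`, `backprop`, `numerical_grad`) and the random initialization are hypothetical and chosen only for this check.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(params, X):
    W1, b1, W2, b2 = params
    A1 = sigmoid(W1 @ X + b1)
    A2 = sigmoid(W2 @ A1 + b2)
    return A1, A2

def loss(params, X, Y):
    # Half the summed squared error, so that d(loss)/d(A2) = A2 - Y.
    _, A2 = forward(params, X)
    return 0.5 * np.sum((A2 - Y) ** 2)

def backprop(params, X, Y):
    # Same chain of derivatives as the training loop; sigmoid'(Z) = A * (1 - A).
    W1, b1, W2, b2 = params
    A1, A2 = forward(params, X)
    d_Z2 = (A2 - Y) * A2 * (1 - A2)
    d_W2 = d_Z2 @ A1.T
    d_b2 = d_Z2.sum(axis=1, keepdims=True)
    d_Z1 = (W2.T @ d_Z2) * A1 * (1 - A1)
    d_W1 = d_Z1 @ X.T
    d_b1 = d_Z1.sum(axis=1, keepdims=True)
    return [d_W1, d_b1, d_W2, d_b2]

def numerical_grad(params, X, Y, eps=1e-6):
    # Central differences: perturb one parameter entry at a time.
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            old = p[idx]
            p[idx] = old + eps
            up = loss(params, X, Y)
            p[idx] = old - eps
            down = loss(params, X, Y)
            p[idx] = old
            g[idx] = (up - down) / (2 * eps)
        grads.append(g)
    return grads

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float).T
Y = np.array([[0, 1, 1, 0]], dtype=float)
params = [rng.random((2, 2)), rng.random((2, 1)),
          rng.random((1, 2)), rng.random((1, 1))]

analytic = backprop(params, X, Y)
numeric = numerical_grad(params, X, Y)
max_diff = max(np.max(np.abs(a - n)) for a, n in zip(analytic, numeric))
print(max_diff)
```

A very small `max_diff` indicates that the analytic gradients agree with the numerical estimate.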


Evaluation

In [4]:
test_set = np.array([[1,0]]).T
output_ = sigmoid(np.matmul(W2,sigmoid(np.matmul(W1,test_set) + b1))+ b2)
print(output_)

[[0.97002708]]
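The cell above checks a single input. The same forward pass can evaluate the whole truth table at once, as in the sketch below. In the notebook one would pass the trained `W1`, `b1`, `W2`, `b2`; the weights used here are hand-picked stand-ins (not the learned values) so that the snippet runs on its own, and `predict` is a hypothetical helper name.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(X, W1, b1, W2, b2):
    """Forward pass for a batch; X has one example per column."""
    A1 = sigmoid(W1 @ X + b1)
    return sigmoid(W2 @ A1 + b2)

# Hand-picked weights (an assumption for illustration only):
# hidden unit 1 acts roughly like OR, hidden unit 2 roughly like NAND,
# and the output unit roughly like AND of the two, which gives XOR.
W1 = np.array([[20.0, 20.0], [-20.0, -20.0]])
b1 = np.array([[-10.0], [30.0]])
W2 = np.array([[20.0, 20.0]])
b2 = np.array([[-30.0]])

X_all = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float).T
probs = predict(X_all, W1, b1, W2, b2)
preds = (probs > 0.5).astype(int).ravel()
for (a, b), p in zip(X_all.T.astype(int), preds):
    print(a, b, "->", p)
```

These stand-in weights also illustrate why two hidden nodes are enough for XOR: each hidden node draws one straight line, and the output node combines the two.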
