[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/your-repo/Merged_XOR_Notebook.ipynb)

> Click the badge above to open this notebook directly in Google Colab.

# XOR Learning with Neural Networks

In this notebook, we explore how a neural network can learn the XOR function, a classic problem in the history of artificial intelligence. XOR (exclusive OR) is not linearly separable: no single straight line splits the inputs that map to 1 from those that map to 0, so a single-layer perceptron cannot solve it. This limitation motivated the development of multi-layer neural networks.
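To make the separability claim concrete, here is a minimal sketch that searches a small grid of weights for a single threshold unit. The grid range, the step size, and the helper names are assumptions made for illustration. A grid search is not a proof, but it agrees with the algebra: XOR would need `b <= 0`, `w1 + b > 0`, `w2 + b > 0` and `w1 + w2 + b <= 0`, which cannot all hold at once.

```python
from itertools import product

# Truth tables as ((x1, x2), target) pairs
XOR_TABLE = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
OR_TABLE = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]


def perceptron(w1, w2, b, x1, x2):
    """Single threshold unit: outputs 1 when w1*x1 + w2*x2 + b > 0."""
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0


def count_solutions(table, grid):
    """Count (w1, w2, b) triples on the grid that reproduce every row of the table."""
    return sum(
        all(perceptron(w1, w2, b, *x) == y for x, y in table)
        for w1, w2, b in product(grid, repeat=3)
    )


# Hypothetical search grid: -2.0 to 2.0 in steps of 0.25
grid = [i / 4 for i in range(-8, 9)]

xor_hits = count_solutions(XOR_TABLE, grid)
or_hits = count_solutions(OR_TABLE, grid)
print("weight settings that solve XOR:", xor_hits)
print("weight settings that solve OR: ", or_hits)
```

OR is included only as a contrast: it is linearly separable, so the same search finds many working weight settings for it.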

We will study two approaches:
- **Part 1: Using built-in backpropagation** (leveraging modern libraries for automatic differentiation)
- **Part 2: Implementing manual backpropagation** (to understand the math and mechanics behind the learning process)

By the end of this notebook, you should gain intuition about how neural networks learn non-linear decision boundaries.

# Part 1: XOR with Built-in Backpropagation

In [None]:
# This notebook implements a neural net to approximate the XOR function using PyTorch
from __future__ import print_function
import numpy as np
import torch
import torch.nn as nn
from torch.autograd import Variable

In [None]:
X = torch.tensor([[1.0,0.0,0.0,1.0],[0.0,0.0,1.0,1.0]],dtype=torch.float32) # 2x4 matrix
X = torch.transpose(X,0,1)
Y = torch.tensor([[1.0,0.0,1.0,0.0]],dtype=torch.float32)                   # 1x4 vector
Y = torch.transpose(Y,0,1)
print("input: ", X)
print("output: ", Y)


input:  tensor([[1., 0.],
        [0., 0.],
        [0., 1.],
        [1., 1.]])
output:  tensor([[1.],
        [0.],
        [1.],
        [0.]])


In [None]:
# parameters of neural net
# (torch.autograd.Variable is deprecated; plain tensors can track gradients directly)
W1 = torch.empty(2, 8).uniform_(-1, 1).requires_grad_() # 2x8 matrix
b1 = torch.zeros((1,8), requires_grad=True)             # 1x8 matrix
W2 = torch.empty(8, 1).uniform_(-1, 1).requires_grad_() # 8x1 matrix
b2 = torch.zeros([1], requires_grad=True)               # 1-element vector

learning_rate = 0.05
optimizer = torch.optim.SGD([W1, b1, W2, b2], lr=learning_rate, momentum=0.9)    # Torch optimizer

loss_fn = torch.nn.MSELoss() # mean squared error (squared Euclidean distance, averaged over elements)

for step in range(10000):

  # forward pass
  Z1 = torch.mm(X,W1)    # 4x8 matrix
  Z2 = Z1 + b1           # 4x8 matrix
  Z3 = torch.sigmoid(Z2) # 4x8 matrix
  Z4 = torch.mm(Z3,W2)   # 4x1 vector
  Z5 = Z4 + b2           # 4x1 vector
  Yp = torch.sigmoid(Z5) # 4x1 vector

  # backward pass
  optimizer.zero_grad()  # zero out previous gradients
  loss = loss_fn(Yp,Y)   # compute loss
  loss.backward()        # calculate gradients
  #Yp.backward(Yp-Y)     # or, seed backprop at Yp with the gradient of 0.5*sum((Yp-Y)**2)
  #Z5.backward(Yp*(1.0-Yp)*(Yp-Y)) # or, seed backprop at Z5: the same gradient times sigmoid'(Z5)
  optimizer.step()       # update parameters using the gradients

  if step%1000 == 0:
    print("loss:",loss.item())

print(Yp)
print(Y)

loss: 0.26149123907089233
loss: 0.12390542030334473
loss: 0.009003173559904099
loss: 0.003408163320273161
loss: 0.0019757780246436596
loss: 0.0013575187185779214
loss: 0.0010208551539108157
loss: 0.0008116937824524939
loss: 0.000670215580612421
loss: 0.0005686600343324244
tensor([[0.9772],
        [0.0169],
        [0.9786],
        [0.0266]], grad_fn=<SigmoidBackward0>)
tensor([[1.],
        [0.],
        [1.],
        [0.]])
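The two commented-out `backward` calls in the loop above start backpropagation from an intermediate tensor instead of from a scalar loss. The sketch below checks that idea on a deliberately tiny model (one sigmoid unit, not the network above): seeding `Yp.backward` with `Yp - Y` should give the same parameter gradients as calling `backward()` on `0.5 * sum((Yp - Y)**2)`. The seed value, the model size, and the `forward` helper are assumptions made for illustration. Note that this loss differs from `MSELoss` by a constant factor, so the gradients are scaled versions of the ones the loop actually uses.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for repeatability

X = torch.tensor([[1., 0.], [0., 0.], [0., 1.], [1., 1.]])
Y = torch.tensor([[1.], [0.], [1.], [0.]])
W = torch.empty(2, 1).uniform_(-1, 1).requires_grad_()
b = torch.zeros(1, requires_grad=True)


def forward():
    """Hypothetical one-layer model: sigmoid(X W + b)."""
    return torch.sigmoid(X @ W + b)


# Route 1: let autograd differentiate a scalar loss
loss = 0.5 * torch.sum((forward() - Y) ** 2)
loss.backward()
grad_W_loss, grad_b_loss = W.grad.clone(), b.grad.clone()

# Route 2: start backprop at Yp, passing dLoss/dYp = Yp - Y explicitly
W.grad = None
b.grad = None
Yp = forward()
Yp.backward((Yp - Y).detach())
grad_W_seed, grad_b_seed = W.grad.clone(), b.grad.clone()

same_W = torch.allclose(grad_W_loss, grad_W_seed)
same_b = torch.allclose(grad_b_loss, grad_b_seed)
print("W gradients match:", same_W)
print("b gradients match:", same_b)
```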


## Let's do the same using PyTorch's nn.Module

In [None]:
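# Define the network as an nn.Module. Note that this version is deeper than the
# hand-built one above: it adds a second hidden layer (15 units, ReLU).
class XORNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(2, 8)   # input -> first hidden layer
        self.fc = nn.Linear(8, 15)   # first -> second hidden layer
        self.fc2 = nn.Linear(15, 1)  # second hidden layer -> output

    # forward pass of the neural net
    def forward(self, X):
        # single-hidden-layer variant (would need fc2 = nn.Linear(8, 1)):
        #return torch.sigmoid(self.fc2(torch.sigmoid(self.fc1(X))))
        return torch.sigmoid(self.fc2(torch.relu(self.fc(torch.sigmoid(self.fc1(X))))))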
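For comparison, the same layer stack can be written with `nn.Sequential`. This is a sketch of an equivalent architecture, not the notebook's own model. The variable names are assumptions, and the weights are freshly initialised, so its outputs will not match a trained `XORNet`. Counting parameters is a quick sanity check on the layer sizes.

```python
import torch
import torch.nn as nn

# Sequential version of a 2 -> 8 -> 15 -> 1 network with the same activations
model = nn.Sequential(
    nn.Linear(2, 8), nn.Sigmoid(),
    nn.Linear(8, 15), nn.ReLU(),
    nn.Linear(15, 1), nn.Sigmoid(),
)

# weights + biases per layer: (2*8 + 8) + (8*15 + 15) + (15*1 + 1)
n_params = sum(p.numel() for p in model.parameters())

X = torch.tensor([[1., 0.], [0., 0.], [0., 1.], [1., 1.]])
out = model(X)  # untrained outputs, one value in (0, 1) per input row

print("trainable parameters:", n_params)
print("output shape:", tuple(out.shape))
```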

In [None]:
learning_rate = 0.05

xornet = XORNet()

optimizer2 = torch.optim.SGD(xornet.parameters(), lr=learning_rate, momentum=0.9) # Torch optimizer

loss_fn = torch.nn.MSELoss() # mean squared error (squared Euclidean distance, averaged over elements)

for step in range(10000):

  # forward pass
  Yp = xornet(X)

  # backward pass
  optimizer2.zero_grad()  # zero out previous gradients
  loss = loss_fn(Yp,Y)   # compute loss
  loss.backward()        # calculate gradients
  optimizer2.step()       # adjust parameters

  # diagnostics
  if step%1000 == 0:
    print("loss:",loss.item())

print(Yp)
print(Y)

loss: 0.2528023421764374
loss: 0.0015943795442581177
loss: 0.0003419781569391489
loss: 0.0001764642511261627
loss: 0.00011544035805854946
loss: 8.45494942041114e-05
loss: 6.612023571506143e-05
loss: 5.397929999162443e-05
loss: 4.542222450254485e-05
loss: 3.908346116077155e-05
tensor([[0.9941],
        [0.0048],
        [0.9940],
        [0.0066]], grad_fn=<SigmoidBackward0>)
tensor([[1.],
        [0.],
        [1.],
        [0.]])
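Both training loops in Part 1 pass `momentum=0.9` to `torch.optim.SGD`. The sketch below shows the momentum update in plain Python, assuming the form used when dampening is zero and Nesterov momentum is off: the velocity accumulates gradients, and the parameter moves along the velocity. The function name and the toy objective `f(x) = x**2` are assumptions made for illustration.

```python
def sgd_momentum_step(param, grad, velocity, lr=0.05, momentum=0.9):
    """One update: v <- momentum * v + grad, then param <- param - lr * v."""
    velocity = momentum * velocity + grad
    param = param - lr * velocity
    return param, velocity


# Toy objective f(x) = x**2, whose gradient is 2*x and whose minimum is at x = 0
x, v = 1.0, 0.0
for _ in range(500):
    x, v = sgd_momentum_step(x, 2.0 * x, v)

print("x after 500 steps:", x)
```

With these settings the iterate overshoots and oscillates around the minimum while shrinking, which is the typical behaviour of momentum on a simple quadratic.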


In [None]:
# first-layer weights learned by the hand-built network in the first training loop
print(W1)

tensor([[ 0.3506, -0.9267,  0.0768,  0.9164, -1.2467, -6.2091, -2.5248,  4.7537],
        [ 0.9317, -2.4585,  1.1709,  0.8998,  3.2911, -6.0572, -2.8577, -3.4503]],
       requires_grad=True)


# Part 2: XOR with Manual Backpropagation
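The manual backward pass in this part relies on the identity sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)). As a minimal check, the sketch below compares that closed form with a central finite difference at a few points. The helper names, the sample points, and the step size `h` are assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    """Closed-form derivative: s * (1 - s) with s = sigmoid(z)."""
    s = sigmoid(z)
    return s * (1.0 - s)


def central_difference(f, z, h=1e-5):
    """Numerical derivative estimate (f(z + h) - f(z - h)) / (2h)."""
    return (f(z + h) - f(z - h)) / (2.0 * h)


points = [-3.0, -1.0, 0.0, 0.5, 2.0]
max_error = max(
    abs(sigmoid_grad(z) - central_difference(sigmoid, z)) for z in points
)
print("largest gap between closed form and finite difference:", max_error)
```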

In [None]:
# This notebook implements a neural net to approximate the XOR function using PyTorch
from __future__ import print_function
import numpy as np
import torch
from torch.autograd import Variable

In [None]:
X = torch.tensor([[1.0,0.0,0.0,1.0],[0.0,0.0,1.0,1.0]],dtype=torch.float32)
X = torch.transpose(X,0,1) # 4x2 matrix
Y = torch.tensor([[1.0,0.0,1.0,0.0]],dtype=torch.float32)
Y = torch.transpose(Y,0,1) # 4x1 vector
print("input: ", X)
print("output: ", Y)


input:  tensor([[1., 0.],
        [0., 0.],
        [0., 1.],
        [1., 1.]])
output:  tensor([[1.],
        [0.],
        [1.],
        [0.]])


In [None]:
# parameters of neural net (plain tensors: the gradients are computed by hand below,
# so autograd tracking is not needed and would only build an ever-growing graph)
W1 = torch.empty(2, 8).uniform_(-1, 1) # 2x8 matrix
b1 = torch.zeros((1,8))                # 1x8 matrix
W2 = torch.empty(8, 1).uniform_(-1, 1) # 8x1 matrix
b2 = torch.zeros([1])                  # 1-element vector

learning_rate = 0.5

for step in range(10000):

  # forward pass
  Z1 = torch.mm(X,W1)    # 4x8 matrix
  Z2 = Z1 + b1           # 4x8 matrix
  Z3 = torch.sigmoid(Z2) # 4x8 matrix
  Z4 = torch.mm(Z3,W2)   # 4x1 vector
  Z5 = Z4 + b2           # 4x1 vector
  Yp = torch.sigmoid(Z5) # 4x1 vector

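  # backward pass (chain rule by hand, for the loss 0.5*sum((Yp-Y)**2))
  dYp = Yp-Y                # 4x1 vector: dLoss/dYp
  dZ5 = Yp*(1.0-Yp)*dYp     # 4x1 vector: sigmoid'(Z5) = Yp*(1-Yp)
  dZ4 = dZ5                 # 4x1 vector: adding b2 passes the gradient through
  dZ3 = torch.mm(dZ4,torch.transpose(W2,0,1)) # 4x8 matrix
  dZ2 = Z3*(1.0-Z3)*dZ3     # 4x8 matrix: sigmoid'(Z2) = Z3*(1-Z3)
  dZ1 = dZ2                 # 4x8 matrix: adding b1 passes the gradient through

  dW1 = torch.mm(torch.transpose(X,0,1),dZ1)  # 2x8 matrix
  db1 = torch.sum(dZ2,0,True)                 # 1x8 matrix: sum over the 4 samples
  dW2 = torch.mm(torch.transpose(Z3,0,1),dZ4) # 8x1 matrix
  db2 = torch.sum(dZ5)                        # scalar: sum over the 4 samples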

  # adjust parameters by gradient descent
  W1 = W1 - learning_rate*dW1
  b1 = b1 - learning_rate*db1
  W2 = W2 - learning_rate*dW2
  b2 = b2 - learning_rate*db2

  if step%1000 == 0:
    # sum (not mean) of squared errors, so values are not directly comparable to MSELoss in Part 1
    loss = torch.sum((Yp-Y)**2)
    print("loss:",loss.item())

print(Yp.data)
print(Y)

loss: 1.0379252433776855
loss: 0.11125476658344269
loss: 0.013625452294945717
loss: 0.005919770337641239
loss: 0.0035672527737915516
loss: 0.0024878974072635174
loss: 0.0018834633519873023
loss: 0.0015023509040474892
loss: 0.0012423645239323378
loss: 0.0010547919664531946
tensor([[0.9855],
        [0.0144],
        [0.9847],
        [0.0161]])
tensor([[1.],
        [0.],
        [1.],
        [0.]])
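One way to gain confidence in hand-derived gradients is to compare them with autograd on the same forward pass. The sketch below repeats the manual formulas for a 2-8-1 sigmoid network and checks them against `loss.backward()` for the loss `0.5 * sum((Yp - Y)**2)`, whose gradient with respect to `Yp` is `Yp - Y`. The seed, the tolerance, and the dictionary layout are assumptions made for illustration.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for repeatability

X = torch.tensor([[1., 0.], [0., 0.], [0., 1.], [1., 1.]])
Y = torch.tensor([[1.], [0.], [1.], [0.]])

W1 = torch.empty(2, 8).uniform_(-1, 1).requires_grad_()
b1 = torch.zeros(1, 8, requires_grad=True)
W2 = torch.empty(8, 1).uniform_(-1, 1).requires_grad_()
b2 = torch.zeros(1, requires_grad=True)

# forward pass
Z3 = torch.sigmoid(X @ W1 + b1)   # 4x8 hidden activations
Yp = torch.sigmoid(Z3 @ W2 + b2)  # 4x1 predictions

# reference gradients from autograd
loss = 0.5 * torch.sum((Yp - Y) ** 2)
loss.backward()
auto = {"W1": W1.grad, "b1": b1.grad, "W2": W2.grad, "b2": b2.grad}

# hand-derived gradients, same formulas as the manual backward pass
with torch.no_grad():
    dZ5 = Yp * (1 - Yp) * (Yp - Y)  # 4x1
    dZ3 = dZ5 @ W2.t()              # 4x8
    dZ2 = Z3 * (1 - Z3) * dZ3       # 4x8
    manual = {
        "W1": X.t() @ dZ2,
        "b1": dZ2.sum(0, keepdim=True),
        "W2": Z3.t() @ dZ5,
        "b2": dZ5.sum().reshape(1),
    }

matches = {k: torch.allclose(manual[k], auto[k], atol=1e-6) for k in manual}
print(matches)
```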


# Conclusion

In this notebook, we explored the XOR problem using two different approaches:

- **Built-in backpropagation**: We used PyTorch's autograd (`loss.backward()`) and `torch.optim.SGD` to train the network, first with explicit weight tensors and then with an `nn.Module`. Automatic differentiation removes the need to derive gradients by hand.

- **Manual backpropagation**: We derived each gradient with the chain rule and applied the gradient-descent updates ourselves, which shows exactly what `loss.backward()` computes. This approach reinforces the mathematical foundations of neural networks.

The XOR problem illustrates why non-linear activation functions and multiple layers are essential for solving tasks that cannot be handled by a single-layer perceptron. By comparing these two approaches, we see both the **practical advantages** of using libraries and the **educational value** of working through the details manually.
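The need for a non-linear activation can be seen directly: without one, two stacked linear layers collapse into a single linear map, because `(X W1) W2 = X (W1 W2)`. The sketch below checks this with small integer matrices. The weight values and the `matmul` helper are assumptions made for illustration, and biases are left out for brevity (with biases, the stack collapses to a single affine map in the same way).

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


X = [[1, 0], [0, 0], [0, 1], [1, 1]]  # the four XOR inputs
W1 = [[2, -1, 3], [0, 4, 1]]          # hypothetical 2x3 first-layer weights
W2 = [[1], [-2], [5]]                 # hypothetical 3x1 second-layer weights

two_layers = matmul(matmul(X, W1), W2)  # "deep" network without activations
one_layer = matmul(X, matmul(W1, W2))   # single layer with merged weights

print(two_layers)
print(one_layer)
```

Since a single linear layer cannot represent XOR, neither can any stack of purely linear layers.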

This exercise provides a foundation for tackling more complex problems in machine learning and deep learning.