<a href="https://colab.research.google.com/github/anuraglamsal/Random-Algorithms/blob/main/XORnn_torch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [72]:
import torch

Defining the model class.

In [73]:
class XOR_model(torch.nn.Module): # Inherits from "Module"

  def __init__(self): # We define the components of our architecture here.
    super(XOR_model, self).__init__()

    self.linear1 = torch.nn.Linear(2, 2) # Hidden layer: takes an input vector with 2 values and has 2 neurons.
    self.activation = torch.nn.Sigmoid() # Activation function. Here we use sigmoid, a classic differentiable activation, which backprop needs.
    self.linear2 = torch.nn.Linear(2, 1) # Output layer: takes the 2 values from the hidden layer and produces a single output value.

  def forward(self, x): # Defining the computations to be performed in a forward pass.
    x = self.linear1(x) # Weighted sums of the inputs in the neurons of the hidden layer.
    x = self.activation(x) # Apply activations to the outputs of the hidden neurons.
    x = self.linear2(x) # Weighted sum in the output neuron.
    x = self.activation(x) # Apply activation to the output of the output neuron to get the final output.
    return x
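To see why two hidden sigmoid neurons are enough for XOR, here is a minimal pure-Python sketch of the same 2-2-1 layout. The weights are hand-picked for illustration only (they are an assumption, not the values training will find): one hidden neuron behaves roughly like OR, the other roughly like AND, and the output neuron fires when OR is on but AND is off.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def xor_net(x1, x2):
    # Hand-picked (not learned) weights, chosen only for illustration.
    h1 = sigmoid(20 * x1 + 20 * x2 - 10)    # hidden neuron 1: roughly OR
    h2 = sigmoid(20 * x1 + 20 * x2 - 30)    # hidden neuron 2: roughly AND
    return sigmoid(20 * h1 - 20 * h2 - 10)  # output: roughly "OR and not AND"

predictions = [round(xor_net(a, b)) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(predictions)  # [0, 1, 1, 0]
```

A trained network may well land on a different set of weights; this only shows that a solution exists for this architecture.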

* Our data should be tensors.
* The params are floats and the output of the network is also going to contain float values, so make the inputs and target floats too.
* The optimizer uses momentum. Background: https://paperswithcode.com/method/sgd-with-momentum
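As a rough illustration of the momentum idea, here is a pure-Python sketch of one update step on a single scalar parameter. It follows the form `torch.optim.SGD` uses when `dampening` and `nesterov` are left at their defaults: a running "velocity" accumulates past gradients, and the parameter moves along that velocity. The function name `sgd_momentum_step` and the toy loss `p ** 2` are made up for this example.

```python
def sgd_momentum_step(param, grad, velocity, lr=0.01, momentum=0.9):
    # Blend the previous velocity with the new gradient, then step along it.
    velocity = momentum * velocity + grad
    param = param - lr * velocity
    return param, velocity

# Toy problem: minimize p ** 2, whose gradient is 2 * p.
p, v = 1.0, 0.0
for _ in range(200):
    grad = 2.0 * p
    p, v = sgd_momentum_step(p, grad, v)

print(p)  # ends up close to the minimum at 0
```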

In [74]:
inputs = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
# The model output has shape (4, 1): one row per input and one column for the single output neuron.
# BCELoss expects the target to have the same shape as the output, so unsqueeze(1) turns the (4,) target into (4, 1).
target = torch.tensor([0., 1., 1., 0.]).unsqueeze(1)
model = XOR_model() # You work with an object of the model class in order to get predictions, pass model params to the optimizer, etc.
loss_fn = torch.nn.BCELoss() # We'll be using the binary cross-entropy loss function.
optimizer = torch.optim.SGD(model.parameters(), lr = 0.01, momentum = 0.9) # Defining our optimizer (stochastic gradient descent) by passing the model params, defining learning rate, etc.
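A quick way to check the shapes involved is sketched below. It uses a standalone `Linear(2, 1)` layer and an all-zeros batch as stand-ins (both are assumptions for the example, not part of the model above), just to show where the second dimension comes from.

```python
import torch

target = torch.tensor([0., 1., 1., 0.])
print(target.shape)               # torch.Size([4])
print(target.unsqueeze(1).shape)  # torch.Size([4, 1])

# A layer with one output feature maps a (4, 2) batch to a (4, 1) result.
layer = torch.nn.Linear(2, 1)
batch = torch.zeros(4, 2)
print(layer(batch).shape)         # torch.Size([4, 1])
```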

Training the model.

In [75]:
for epoch in range(10000):
  optimizer.zero_grad() # Reset the gradients, since PyTorch accumulates them across backward() calls. See https://stackoverflow.com/questions/48001598/why-do-we-need-to-call-zero-grad-in-pytorch
  outputs = model(inputs) # getting the model to predict.
  loss = loss_fn(outputs, target) # calculating loss.
  loss.backward() # Calculating the gradient. The gradient is simply the collection of partial derivatives of the loss fn w.r.t. each param. We'll use these partial derivatives to change the param values so that we move towards a minimum. Look up gradient descent to understand this stuff.
  optimizer.step() # Performing the param updates.

  if epoch % 1000 == 0: # Just printing out some losses.
    print(loss.item())

0.6934019327163696
0.6880759596824646
0.6436037421226501
0.3797834515571594
0.11400820314884186
0.057247456163167953
0.037153907120227814
0.027235660701990128
0.02140078693628311
0.017581649124622345
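The first printed loss (about 0.6934) is close to ln 2 ≈ 0.6931, which is what binary cross-entropy gives when every prediction is 0.5, i.e. roughly what you would expect from a network that has not learned anything yet. Here is a small pure-Python sketch of the mean binary cross-entropy formula to check that; the helper name `bce` is made up for this example.

```python
import math

def bce(outputs, targets):
    # Mean of -(y * log(p) + (1 - y) * log(1 - p)) over all examples.
    total = 0.0
    for p, y in zip(outputs, targets):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(outputs)

xor_targets = [0, 1, 1, 0]
uninformed = bce([0.5, 0.5, 0.5, 0.5], xor_targets)
confident = bce([0.01, 0.99, 0.99, 0.01], xor_targets)
print(uninformed)  # ln 2, about 0.6931
print(confident)   # much smaller, since every prediction is near its target
```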


In this run, around 10000 epochs seems to be enough to get the desired result. You can round the model's output to get the actual binary outputs.

In [76]:
print(torch.round(model(inputs)))

tensor([[0.],
        [1.],
        [1.],
        [0.]], grad_fn=<RoundBackward0>)
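The `grad_fn=<RoundBackward0>` in the output is there because autograd tracked the forward pass. For plain inference, one option is to run the model inside `torch.no_grad()` and threshold at 0.5. The sketch below uses an untrained `torch.nn.Sequential` stand-in with the same layer layout (an assumption, so that the snippet runs on its own), so its predictions are not meaningful; with the trained `model` from above you would drop the stand-in.

```python
import torch

# Stand-in with the same layer layout as XOR_model; it is untrained here.
model = torch.nn.Sequential(
    torch.nn.Linear(2, 2), torch.nn.Sigmoid(),
    torch.nn.Linear(2, 1), torch.nn.Sigmoid(),
)
inputs = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])

with torch.no_grad():                # no autograd tracking during inference
    probs = model(inputs)            # values strictly between 0 and 1
    preds = (probs > 0.5).float()    # threshold into binary outputs

print(preds.requires_grad)  # False
print(preds.squeeze(1).tolist())
```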
