
In [1]:
import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
torch.manual_seed(2)

<torch._C.Generator at 0x7f2687b818b0>

In [2]:
X = torch.Tensor([[0,0], [0,1], [1,0], [1,1]])
Y = torch.Tensor([0, 1, 1, 0]).view(-1,1)

**Removing (commenting out) the last activation function, so the output layer is linear.**

In [3]:
class XOR(nn.Module):
    def __init__(self, input_dim= 2, output_dim= 1):
        super(XOR, self).__init__()
        self.lin1 = nn.Linear(input_dim, 5)
        self.lin2 = nn.Linear(5, output_dim)
    
    def forward(self, x):
        x = self.lin1(x)
        x = torch.tanh(x)  # F.tanh is deprecated in favour of torch.tanh
        x = self.lin2(x)
        # x = F.tanh(x) # removing the last activation
        return x

In [4]:
model = XOR()
print(model)
from torchsummary import summary
summary(model, (2, 2))

XOR(
  (lin1): Linear(in_features=2, out_features=5, bias=True)
  (lin2): Linear(in_features=5, out_features=1, bias=True)
)
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                 [-1, 2, 5]              15
            Linear-2                 [-1, 2, 1]               6
Total params: 21
Trainable params: 21
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
----------------------------------------------------------------

**Finding layer sizes that give a model with exactly 44 parameters, including weights and biases**

In [5]:
display(' 1 hidden layer With Bias')
display([f'in: {i}, hidden_1: {j}, out: {k}' for i in range(1, 10) for j in range(1, 10) for k in range(1, 10) if j*(1+i) + k*(1+j) == 44 and k < i and k < j])

display(' 1 hidden layer Without Bias')
display([f'in: {i}, hidden_1: {j}, out: {k}' for i in range(1, 10) for j in range(1, 10) for k in range(1, 10) if j*i + k*j == 44 and k < i and k < j])

' 1 hidden layer With Bias'

['in: 3, hidden_1: 7, out: 2', 'in: 4, hidden_1: 6, out: 2']

' 1 hidden layer Without Bias'

['in: 8, hidden_1: 4, out: 3', 'in: 9, hidden_1: 4, out: 2']

With 2 input neurons and 1 output neuron, no single hidden layer gives exactly 44 parameters: for h hidden units the count is 4h + 1 with biases and 3h without, and neither equals 44 for an integer h.
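
The single-hidden-layer case can also be checked directly. This is a minimal sketch, assuming a plain fully connected network; `mlp_param_count` is a hypothetical helper introduced here for illustration and is not part of the notebook.

```python
def mlp_param_count(sizes, bias=True):
    """Count weights (and optionally biases) of a fully connected net.

    `sizes` lists the layer widths from input to output, e.g. [2, 5, 1].
    """
    total = 0
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        total += fan_in * fan_out      # weight matrix
        if bias:
            total += fan_out           # one bias per output unit
    return total

# 2 inputs, h hidden units, 1 output: 4*h + 1 with biases, 3*h without.
with_bias = [h for h in range(1, 100) if mlp_param_count([2, h, 1]) == 44]
without_bias = [h for h in range(1, 100)
                if mlp_param_count([2, h, 1], bias=False) == 44]
print(with_bias, without_bias)  # [] []
```

Both lists come back empty because 44 is neither one more than a multiple of 4 nor a multiple of 3.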

Now trying with 2 hidden layers

In [6]:
display(' 2 hidden layer')
display([f'in: {2}, hidden_1: {j}, hidden_2: {k}, Out: {1}' for j in range(1, 10) for k in range(1, 10) if j*(1+2) + k*(1+j) + 1*(1+k) == 44])

' 2 hidden layer'

['in: 2, hidden_1: 5, hidden_2: 4, Out: 1']

There is only one possible combination that gives 44 parameters with 2 input neurons, 1 output neuron, and 2 hidden layers: 5 units in the first hidden layer and 4 in the second. We implement it below.
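
The search above only covers widths 1 to 9. As a hedged cross-check, the sketch below widens the range; `two_hidden_params` is a hypothetical helper name. The count can be rewritten as (j + 2)(k + 3) - 5, so a total of 44 requires (j + 2)(k + 3) = 49, and the only factorisation with positive j and k is 7 × 7.

```python
def two_hidden_params(j, k):
    """Parameters of a 2 -> j -> k -> 1 network with a bias in every layer."""
    return j * (2 + 1) + k * (j + 1) + 1 * (k + 1)

# Brute-force search over a much wider range than the notebook cell uses.
solutions = [(j, k) for j in range(1, 200) for k in range(1, 200)
             if two_hidden_params(j, k) == 44]
print(solutions)  # [(5, 4)]
```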

In [7]:
class XOR(nn.Module):
    def __init__(self, input_dim= 2, output_dim= 1):
        super(XOR, self).__init__()
        self.lin1 = nn.Linear(input_dim, 5)
        self.lin2 = nn.Linear(5, 4)
        self.lin3 = nn.Linear(4, output_dim)
    
    def forward(self, x):
        x = self.lin1(x)
        x = torch.tanh(x)
        x = self.lin2(x)
        x = torch.tanh(x)
        x = self.lin3(x)
        # x = torch.tanh(x) # removing the last activation
        return x

In [8]:
model = XOR()
print(model)
from torchsummary import summary
summary(model, (2, 2))

XOR(
  (lin1): Linear(in_features=2, out_features=5, bias=True)
  (lin2): Linear(in_features=5, out_features=4, bias=True)
  (lin3): Linear(in_features=4, out_features=1, bias=True)
)
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                 [-1, 2, 5]              15
            Linear-2                 [-1, 2, 4]              24
            Linear-3                 [-1, 2, 1]               5
Total params: 44
Trainable params: 44
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
----------------------------------------------------------------


We can see that the total number of parameters (including weights and biases) is 44.
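
The same total can be confirmed without `torchsummary`. This is a minimal sketch, assuming the 2 -> 5 -> 4 -> 1 layout of the `XOR` class above; it is rebuilt here with `nn.Sequential` only so the snippet is self-contained, and `count_parameters` is a hypothetical helper.

```python
import torch.nn as nn

# Same layer sizes as the XOR class above (assumption: tanh between layers,
# linear output).
net = nn.Sequential(
    nn.Linear(2, 5), nn.Tanh(),
    nn.Linear(5, 4), nn.Tanh(),
    nn.Linear(4, 1),
)

def count_parameters(module):
    """Return the number of trainable scalars in `module`."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

print(count_parameters(net))  # 44 = 15 + 24 + 5
```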

Another way to reach 44 parameters is to keep a single hidden layer and remove the bias of the output layer. With 11 hidden units the count is 11 × (2 + 1) + 11 × 1 = 44.

In [9]:
class XOR(nn.Module):
    def __init__(self, input_dim= 2, output_dim= 1):
        super(XOR, self).__init__()
        self.lin1 = nn.Linear(input_dim, 11)
        self.lin2 = nn.Linear(11, output_dim,bias = False)
    
    def forward(self, x):
        x = self.lin1(x)
        x = torch.tanh(x)  # F.tanh is deprecated in favour of torch.tanh
        x = self.lin2(x)
        # x = F.tanh(x) # removing the last activation
        return x

In [10]:
model = XOR()
print(model)
from torchsummary import summary
summary(model, (2, 2))

XOR(
  (lin1): Linear(in_features=2, out_features=11, bias=True)
  (lin2): Linear(in_features=11, out_features=1, bias=False)
)
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                [-1, 2, 11]              33
            Linear-2                 [-1, 2, 1]              11
Total params: 44
Trainable params: 44
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
----------------------------------------------------------------


As we can see, there are 44 parameters.
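
A per-tensor breakdown shows where the 44 come from. This is a sketch that rebuilds the two layers on their own; the variable names `hidden` and `output` are illustrative and do not appear in the notebook.

```python
import torch.nn as nn

hidden = nn.Linear(2, 11)               # 2*11 weights + 11 biases = 33
output = nn.Linear(11, 1, bias=False)   # 11 weights, no bias

breakdown = {
    "hidden.weight": hidden.weight.numel(),
    "hidden.bias": hidden.bias.numel(),
    "output.weight": output.weight.numel(),
}
total = sum(breakdown.values())
print(breakdown, total)
print(output.bias is None)  # with bias=False the attribute is None
```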

In [11]:
def weights_init(model):
    for m in model.modules():
        if isinstance(m, nn.Linear):
            # initialize the weight tensor, here we use a normal distribution
            m.weight.data.normal_(0, 1)

weights_init(model)
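
An equivalent initialiser can be written with `torch.nn.init`, which avoids touching `.data` directly. This is a sketch rather than the notebook's own code; `weights_init_normal` and the `demo` model are names made up for this example. As in the cell above, only the weights are re-drawn and the biases keep their default initialisation.

```python
import torch
import torch.nn as nn

def weights_init_normal(model, mean=0.0, std=1.0):
    """Re-draw every nn.Linear weight from a normal distribution."""
    for m in model.modules():
        if isinstance(m, nn.Linear):
            nn.init.normal_(m.weight, mean=mean, std=std)

# Stand-in with the same layer sizes as the 11-unit model above.
demo = nn.Sequential(nn.Linear(2, 11), nn.Tanh(), nn.Linear(11, 1, bias=False))
bias_before = demo[0].bias.detach().clone()
weights_init_normal(demo)
bias_unchanged = torch.equal(bias_before, demo[0].bias.detach())
print(bias_unchanged)  # True: biases are not re-initialised
```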

In [12]:
loss_func = nn.L1Loss()
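
For reference, `nn.L1Loss` with its default `reduction="mean"` returns the mean absolute difference between prediction and target. The values below are chosen for illustration only, so the result can be checked by hand.

```python
import torch
import torch.nn as nn

l1 = nn.L1Loss()                     # default reduction="mean"
y_hat = torch.tensor([0.25, 1.5])    # illustrative predictions
y = torch.tensor([0.0, 1.0])         # illustrative targets

# mean(|0.25 - 0|, |1.5 - 1|) = mean(0.25, 0.5) = 0.375
value = l1(y_hat, y).item()
print(value)  # 0.375
```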

In [13]:
optimizer = optim.SGD(model.parameters(), lr=0.02, momentum=0.9)
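
To make the role of `momentum=0.9` concrete, the sketch below compares two `optim.SGD` steps with a hand-written version of the update rule PyTorch documents for the default settings (no dampening, no Nesterov): `buf = momentum * buf + grad`, then `p = p - lr * buf`. The toy objective f(p) = p² is an assumption made for this example and has nothing to do with the XOR model.

```python
import torch
import torch.optim as optim

lr, momentum = 0.02, 0.9

# Reference: let PyTorch take two steps on f(p) = p^2.
p = torch.tensor([1.0], requires_grad=True)
opt = optim.SGD([p], lr=lr, momentum=momentum)
for _ in range(2):
    opt.zero_grad()
    (p ** 2).sum().backward()
    opt.step()

# Manual version of the same rule; on the first step the buffer is the gradient.
q, buf = 1.0, None
for _ in range(2):
    grad = 2 * q
    buf = grad if buf is None else momentum * buf + grad
    q = q - lr * buf

print(p.item(), q)  # the two values agree up to float32 rounding
```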

In [14]:
epochs = 2001
steps = X.size(0)
for i in range(epochs):
    for j in range(steps):
        # pick one of the four XOR samples at random (stochastic updates)
        data_point = np.random.randint(X.size(0))
        x_var = X[data_point]  # plain tensors; the Variable wrapper is no longer needed
        y_var = Y[data_point]

        optimizer.zero_grad()
        y_hat = model(x_var)
        loss = loss_func(y_hat, y_var)  # call the module itself rather than .forward
        loss.backward()
        optimizer.step()

    if i % 50 == 0:
        # loss of the last sampled point in this epoch, not an epoch average
        print( "Epoch: {0}, Loss: {1}, ".format(i + 1, loss.item()))

Epoch: 1, Loss: 1.198080062866211, 
Epoch: 51, Loss: 0.11336255073547363, 
Epoch: 101, Loss: 0.04942166805267334, 
Epoch: 151, Loss: 0.08608889579772949, 
Epoch: 201, Loss: 0.29695039987564087, 
Epoch: 251, Loss: 0.20475217700004578, 
Epoch: 301, Loss: 0.6133637428283691, 
Epoch: 351, Loss: 0.4661521911621094, 
Epoch: 401, Loss: 0.04556477069854736, 
Epoch: 451, Loss: 0.33332371711730957, 
Epoch: 501, Loss: 0.2566060423851013, 
Epoch: 551, Loss: 0.11660987138748169, 
Epoch: 601, Loss: 0.28267720341682434, 
Epoch: 651, Loss: 0.010149598121643066, 
Epoch: 701, Loss: 0.2854008674621582, 
Epoch: 751, Loss: 0.22045516967773438, 
Epoch: 801, Loss: 0.03141164779663086, 
Epoch: 851, Loss: 0.08384227752685547, 
Epoch: 901, Loss: 0.19410449266433716, 
Epoch: 951, Loss: 0.3587958812713623, 
Epoch: 1001, Loss: 0.19532275199890137, 
Epoch: 1051, Loss: 0.08159404993057251, 
Epoch: 1101, Loss: 0.053788185119628906, 
Epoch: 1151, Loss: 0.2324419617652893, 
Epoch: 1201, Loss: 0.1696997880935669, 
Epoch: 1251, Loss: 0.023647665977478027, 
Epoch: 1301, Loss: 0.1612512469291687,