
In [None]:
import torch

**Q.** The diagram below shows a neural network used for a classification problem. The network contains two hidden layers and one output layer. The input to the network is a column vector $\textbf{x} \in \mathbb{R}^{3}$. The first hidden layer contains 3 neurons, the second hidden layer contains 3 neurons, and the output layer contains 3 neurons. Each neuron in the $l^{th}$ layer is connected to all the neurons in the $(l + 1)^{th}$ layer, and each neuron has a bias (not shown in the figure).

![](https://backend.seek.onlinedegree.iitm.ac.in/22t3_cs3004/assets/img/W3GA1.png)

All the neurons in the hidden layers use the sigmoid activation function, and the neurons in the output layer use the softmax function. Assume that the network uses the cross-entropy loss (with the natural log).
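Written out, the forward pass and loss described above are as follows. This assumes the weights act on column vectors as $\textbf{W}_l\textbf{h}$ (the convention the code cells use with `w1 @ x`), writes $\textbf{W}_l, \textbf{b}_l$ for the weights and biases of layer $l$ (`w1`, `b1`, ... in the code), and writes $\textbf{y}$ for the one-hot target:

$$
\begin{aligned}
\textbf{a}_1 &= \textbf{W}_1\textbf{x} + \textbf{b}_1, & \textbf{h}_1 &= \sigma(\textbf{a}_1),\\
\textbf{a}_2 &= \textbf{W}_2\textbf{h}_1 + \textbf{b}_2, & \textbf{h}_2 &= \sigma(\textbf{a}_2),\\
\textbf{a}_3 &= \textbf{W}_3\textbf{h}_2 + \textbf{b}_3, & \hat{\textbf{y}} &= \operatorname{softmax}(\textbf{a}_3),\\
\mathscr{L}(\theta) &= -\sum_{i=1}^{3} y_i \ln \hat{y}_i, & \sigma(z) &= \frac{1}{1 + e^{-z}}.
\end{aligned}
$$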


**Q. 1:** How many learnable parameters does the network have?

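**A:** Each of the three layers (two hidden, one output) has a $3 \times 3$ weight matrix and 3 biases, so the total is $3 \times 3 + 3 \times 3 + 3 \times 3 + 3 + 3 + 3 = 27 + 9 = 36$.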
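The same count generalises to any stack of fully connected layers: a layer with $n_{in}$ inputs and $n_{out}$ neurons contributes $n_{in} n_{out}$ weights plus $n_{out}$ biases. A minimal sketch, assuming the layer widths are listed input first (`layer_sizes` is an illustrative name, not from the original):

```python
# Layer widths: input (3) -> hidden 1 (3) -> hidden 2 (3) -> output (3)
layer_sizes = [3, 3, 3, 3]

# Each fully connected layer has n_in * n_out weights plus n_out biases
per_layer = [n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
n_params = sum(per_layer)

print(per_layer, n_params)  # [12, 12, 12] 36
```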

In [None]:
w1 = torch.tensor([
  [0.5488135, 0.71518937, 0.60276338],
  [0.54488318, 0.4236548, 0.64589411],
  [0.43758721, 0.891773, 0.96366276]
], dtype = torch.float64, requires_grad = True)

w2 = torch.tensor([
  [0.56804456, 0.92559664, 0.07103606],
  [0.0871293, 0.0202184, 0.83261985],
  [0.77815675, 0.87001215, 0.97861834]
], dtype = torch.float64, requires_grad = True)

w3 = torch.tensor([
  [0.11827443, 0.63992102, 0.14335329],
  [0.94466892, 0.52184832, 0.41466194],
  [0.26455561, 0.77423369, 0.45615033]
], dtype = torch.float64, requires_grad = True)

In [None]:
b1 = torch.tensor([0.38344152, 0.79172504, 0.52889492], dtype = torch.float64, requires_grad = True)
b2 = torch.tensor([0.79915856, 0.46147936, 0.78052918], dtype = torch.float64, requires_grad = True)
b3 = torch.tensor([0.56843395, 0.0187898, 0.6176355], dtype = torch.float64, requires_grad = True)

In [None]:
x = torch.tensor([1.0, 0.0, 1.0], dtype = torch.float64)
y = torch.tensor([0, 0, 1], dtype = torch.float64)

**Q. 2:** What is the sum of the elements of the output $\textbf{a}_{1}$?

In [None]:
a1 = w1 @ x + b1

In [None]:
a1.sum()

tensor(5.4477, dtype=torch.float64, grad_fn=<SumBackward0>)

In [None]:
a1 = w1 @ x + b1
print(f'Sum of elements of a1: {torch.sum(a1):.2f}')

Sum of elements of a1: 5.45
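Because $\textbf{x} = [1, 0, 1]^\top$, the middle column of $\textbf{W}_1$ drops out and each entry is simply $a_{1,i} = W_{1,i0} + W_{1,i2} + b_{1,i}$. The sketch below recomputes $\textbf{a}_1$ in plain Python (no autograd) from the values in the cells above; `affine` is an illustrative helper name:

```python
# Weights and biases copied from the cells above (row i = incoming weights of neuron i)
W1 = [[0.5488135, 0.71518937, 0.60276338],
      [0.54488318, 0.4236548, 0.64589411],
      [0.43758721, 0.891773, 0.96366276]]
b1 = [0.38344152, 0.79172504, 0.52889492]
x = [1.0, 0.0, 1.0]

def affine(W, v, b):
    """Return W @ v + b for a row-major matrix W and vectors v, b."""
    return [sum(w * vj for w, vj in zip(row, v)) + bi for row, bi in zip(W, b)]

a1 = affine(W1, x, b1)
print([round(a, 4) for a in a1], round(sum(a1), 4))  # [1.535, 1.9825, 1.9301] 5.4477
```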


**Q. 3:** What is the sum of the elements of the output $\textbf{h}_{1}$?

In [None]:
h1 = torch.sigmoid(a1)

In [None]:
h1.sum()

tensor(2.5750, dtype=torch.float64, grad_fn=<SumBackward0>)

In [None]:
# Hidden layers use sigmoid as activation
h1 = torch.sigmoid(a1)
print(f'Sum of elements of h1: {torch.sum(h1):.2f}')

Sum of elements of h1: 2.57
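The two printed values are consistent: the sum lies just below 2.575, so the tensor repr rounds it to 2.5750 while the `:.2f` format gives 2.57. As a plain-Python check of $\textbf{h}_1 = \sigma(\textbf{a}_1)$ with $\sigma(z) = 1/(1 + e^{-z})$ (a sketch; the parameter values are copied from the cells above):

```python
import math

W1 = [[0.5488135, 0.71518937, 0.60276338],
      [0.54488318, 0.4236548, 0.64589411],
      [0.43758721, 0.891773, 0.96366276]]
b1 = [0.38344152, 0.79172504, 0.52889492]
x = [1.0, 0.0, 1.0]

def sigmoid(z):
    """Logistic sigmoid 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# a1 = W1 @ x + b1, then the element-wise sigmoid
a1 = [sum(w * xj for w, xj in zip(row, x)) + b for row, b in zip(W1, b1)]
h1 = [sigmoid(a) for a in a1]

print(sum(h1))
```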


**Q:** The sums of the elements of $[\textbf{a}_{2}, \textbf{h}_{2}, \textbf{a}_{3}]$ are, respectively, $[6.46, 2.63, 4.87]$. What is the loss value?

In [None]:
a2 = w2 @ h1 + b2

In [None]:
a2

tensor([2.1421, 1.2780, 3.0400], dtype=torch.float64, grad_fn=<AddBackward0>)

In [None]:
h2 = torch.sigmoid(a2)

In [None]:
a3 = w3 @ h2 + b3

In [None]:
loss = torch.nn.functional.cross_entropy(a3.view(1, -1), torch.argmax(y).view(1))

In [None]:
loss

tensor(0.8564, dtype=torch.float64, grad_fn=<NllLossBackward0>)

In [None]:
array_given_in_question = []

a2 = w2 @ h1 + b2
array_given_in_question.append(torch.sum(a2).item())

h2 = torch.sigmoid(a2)
array_given_in_question.append(torch.sum(h2).item())

a3 = w3 @ h2 + b3
array_given_in_question.append(torch.sum(a3).item())

# cross_entropy expects raw logits (a3) and a class index; it applies log-softmax
# internally, so the softmax of the output layer is not computed explicitly here
loss = torch.nn.functional.cross_entropy(a3.view(1, -1), torch.argmax(y).view(1))
print(f'Array given in the question: {array_given_in_question}')
print(f'Loss: {loss.item():.2f}', end = '')

Array given in the question: [6.460166777022406, 2.6313930964242145, 4.874920995857018]
Loss: 0.86
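Passing the logits $\textbf{a}_3$ to `cross_entropy` gives the same value as applying softmax and then $-\sum_i y_i \ln \hat{y}_i$ explicitly; with a one-hot $\textbf{y}$ only the true-class term survives. A minimal plain-Python sketch of the full forward pass, assuming the parameter values from the cells above (the helper names are illustrative):

```python
import math

W1 = [[0.5488135, 0.71518937, 0.60276338],
      [0.54488318, 0.4236548, 0.64589411],
      [0.43758721, 0.891773, 0.96366276]]
W2 = [[0.56804456, 0.92559664, 0.07103606],
      [0.0871293, 0.0202184, 0.83261985],
      [0.77815675, 0.87001215, 0.97861834]]
W3 = [[0.11827443, 0.63992102, 0.14335329],
      [0.94466892, 0.52184832, 0.41466194],
      [0.26455561, 0.77423369, 0.45615033]]
b1 = [0.38344152, 0.79172504, 0.52889492]
b2 = [0.79915856, 0.46147936, 0.78052918]
b3 = [0.56843395, 0.0187898, 0.6176355]
x = [1.0, 0.0, 1.0]
y = [0.0, 0.0, 1.0]  # one-hot target: class 2

def affine(W, v, b):
    """Return W @ v + b (row i of W holds the incoming weights of neuron i)."""
    return [sum(w * vj for w, vj in zip(row, v)) + bi for row, bi in zip(W, b)]

def sigmoid_vec(v):
    return [1.0 / (1.0 + math.exp(-z)) for z in v]

def softmax(v):
    m = max(v)  # shift by the max for numerical stability
    exps = [math.exp(z - m) for z in v]
    total = sum(exps)
    return [e / total for e in exps]

a1 = affine(W1, x, b1)
h1 = sigmoid_vec(a1)
a2 = affine(W2, h1, b2)
h2 = sigmoid_vec(a2)
a3 = affine(W3, h2, b3)
y_hat = softmax(a3)

# Cross-entropy with the natural log
loss = -sum(t * math.log(p) for t, p in zip(y, y_hat))

print([round(sum(v), 4) for v in (a2, h2, a3)], round(loss, 4))  # [6.4602, 2.6314, 4.8749] 0.8564
```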

**Q:** What is the vector that corresponds to $\nabla_{a_{3}}\mathscr{L}(\theta)$?

In [None]:
torch.autograd.grad(loss, a3, retain_graph = True)[0]

tensor([ 0.2369,  0.3384, -0.5753], dtype=torch.float64)

In [None]:
d_loss_d_a3 = torch.autograd.grad(loss, a3, retain_graph = True)[0]
display(d_loss_d_a3)

tensor([ 0.2369,  0.3384, -0.5753], dtype=torch.float64)
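For softmax followed by cross-entropy, the gradient with respect to the logits has the closed form $\nabla_{\textbf{a}_3}\mathscr{L} = \operatorname{softmax}(\textbf{a}_3) - \textbf{y}$, which is why the entries sum to zero and only the true class is negative. The sketch below (plain Python, parameter values copied from the cells above, helper names illustrative) computes this closed form and compares it with central finite differences:

```python
import math

W1 = [[0.5488135, 0.71518937, 0.60276338],
      [0.54488318, 0.4236548, 0.64589411],
      [0.43758721, 0.891773, 0.96366276]]
W2 = [[0.56804456, 0.92559664, 0.07103606],
      [0.0871293, 0.0202184, 0.83261985],
      [0.77815675, 0.87001215, 0.97861834]]
W3 = [[0.11827443, 0.63992102, 0.14335329],
      [0.94466892, 0.52184832, 0.41466194],
      [0.26455561, 0.77423369, 0.45615033]]
b1 = [0.38344152, 0.79172504, 0.52889492]
b2 = [0.79915856, 0.46147936, 0.78052918]
b3 = [0.56843395, 0.0187898, 0.6176355]
x = [1.0, 0.0, 1.0]
y = [0.0, 0.0, 1.0]

def affine(W, v, b):
    """Return W @ v + b."""
    return [sum(w * vj for w, vj in zip(row, v)) + bi for row, bi in zip(W, b)]

def sigmoid_vec(v):
    return [1.0 / (1.0 + math.exp(-z)) for z in v]

def softmax(v):
    m = max(v)
    exps = [math.exp(z - m) for z in v]
    total = sum(exps)
    return [e / total for e in exps]

def ce_loss(logits, target):
    """Cross-entropy (natural log) of softmax(logits) against a one-hot target."""
    return -sum(t * math.log(p) for t, p in zip(target, softmax(logits)))

# Forward pass up to the logits a3
h1 = sigmoid_vec(affine(W1, x, b1))
h2 = sigmoid_vec(affine(W2, h1, b2))
a3 = affine(W3, h2, b3)

# Closed form: dL/da3 = softmax(a3) - y
grad_analytic = [p - t for p, t in zip(softmax(a3), y)]

# Central finite differences as an independent check
eps = 1e-6
grad_numeric = []
for i in range(len(a3)):
    up, down = list(a3), list(a3)
    up[i] += eps
    down[i] -= eps
    grad_numeric.append((ce_loss(up, y) - ce_loss(down, y)) / (2 * eps))

print([round(g, 4) for g in grad_analytic])  # [0.2369, 0.3384, -0.5753]
```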

**Q:** After computing the gradients, we update $\textbf{b}_{2}$ by subtracting its gradient scaled by the learning rate $\eta$:
$$
\textbf{b}_{2} \leftarrow \textbf{b}_{2} - \eta\nabla_{b_{2}}\mathscr{L}(\theta)
$$
Which of the following is the gradient vector of $\textbf{b}_{2}$?

In [None]:
torch.autograd.grad(loss, b2, retain_graph = True)[0]

tensor([ 0.0184, -0.0200, -0.0038], dtype=torch.float64)
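By backpropagation, the bias gradient equals the error signal at $\textbf{a}_2$: with $\boldsymbol{\delta}_3 = \operatorname{softmax}(\textbf{a}_3) - \textbf{y}$, we get $\nabla_{\textbf{b}_2}\mathscr{L} = \boldsymbol{\delta}_2 = (\textbf{W}_3^\top \boldsymbol{\delta}_3) \odot \textbf{h}_2 \odot (1 - \textbf{h}_2)$, using $\sigma'(z) = \sigma(z)(1 - \sigma(z))$. A plain-Python sketch of that chain, assuming the parameter values from the cells above:

```python
import math

W1 = [[0.5488135, 0.71518937, 0.60276338],
      [0.54488318, 0.4236548, 0.64589411],
      [0.43758721, 0.891773, 0.96366276]]
W2 = [[0.56804456, 0.92559664, 0.07103606],
      [0.0871293, 0.0202184, 0.83261985],
      [0.77815675, 0.87001215, 0.97861834]]
W3 = [[0.11827443, 0.63992102, 0.14335329],
      [0.94466892, 0.52184832, 0.41466194],
      [0.26455561, 0.77423369, 0.45615033]]
b1 = [0.38344152, 0.79172504, 0.52889492]
b2 = [0.79915856, 0.46147936, 0.78052918]
b3 = [0.56843395, 0.0187898, 0.6176355]
x = [1.0, 0.0, 1.0]
y = [0.0, 0.0, 1.0]

def affine(W, v, b):
    """Return W @ v + b (row i of W holds the incoming weights of neuron i)."""
    return [sum(w * vj for w, vj in zip(row, v)) + bi for row, bi in zip(W, b)]

def sigmoid_vec(v):
    return [1.0 / (1.0 + math.exp(-z)) for z in v]

def softmax(v):
    m = max(v)
    exps = [math.exp(z - m) for z in v]
    total = sum(exps)
    return [e / total for e in exps]

# Forward pass
h1 = sigmoid_vec(affine(W1, x, b1))
h2 = sigmoid_vec(affine(W2, h1, b2))
y_hat = softmax(affine(W3, h2, b3))

# Backward pass
delta3 = [p - t for p, t in zip(y_hat, y)]  # dL/da3
# W3^T @ delta3: column j of W3 holds the weights leaving h2[j]
back3 = [sum(W3[i][j] * delta3[i] for i in range(3)) for j in range(3)]
# Multiply by sigmoid'(a2) = h2 * (1 - h2); dL/db2 = dL/da2
grad_b2 = [g * h * (1.0 - h) for g, h in zip(back3, h2)]

print([round(g, 4) for g in grad_b2])  # [0.0184, -0.02, -0.0038]
```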

**Q:** Update all the parameters with the calculated gradients, then forward-propagate through the network again. What is the new loss value? (Take $\eta = 1$.)

In [None]:
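# Gradients of the loss w.r.t. every parameter, all evaluated at the current (old) values;
# retain_graph=True keeps the graph alive for the next grad call
dw1 = torch.autograd.grad(loss, w1, retain_graph = True)[0]
dw2 = torch.autograd.grad(loss, w2, retain_graph = True)[0]
dw3 = torch.autograd.grad(loss, w3, retain_graph = True)[0]
db1 = torch.autograd.grad(loss, b1, retain_graph = True)[0]
db2 = torch.autograd.grad(loss, b2, retain_graph = True)[0]
db3 = torch.autograd.grad(loss, b3, retain_graph = True)[0]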

$\theta_{\text{new}} = \theta_{\text{old}} - \eta \nabla\mathscr{L}(\theta)$
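As a small worked instance of this rule, the sketch below applies one step with $\eta = 1$ to $\textbf{b}_2$, using the gradient printed above rounded to four decimals, so the result is approximate. `sgd_step` is a hypothetical helper, not a PyTorch function:

```python
def sgd_step(params, grads, lr=1.0):
    """Hypothetical helper: element-wise params - lr * grads."""
    return [p - lr * g for p, g in zip(params, grads)]

b2 = [0.79915856, 0.46147936, 0.78052918]
grad_b2 = [0.0184, -0.0200, -0.0038]  # rounded gradient of b2 printed above

b2_new = sgd_step(b2, grad_b2, lr=1.0)
print([round(v, 4) for v in b2_new])  # [0.7808, 0.4815, 0.7843]
```

Entries with a positive gradient decrease and entries with a negative gradient increase, which is what moves the loss downhill.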

In [None]:
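lr = 1.0  # eta = 1, as stated in the question
w1 = w1 - lr * dw1
w2 = w2 - lr * dw2
w3 = w3 - lr * dw3
b1 = b1 - lr * db1
b2 = b2 - lr * db2
b3 = b3 - lr * db3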

In [None]:
# Forward pass with the updated parameters
a1 = w1 @ x + b1
h1 = torch.sigmoid(a1)
a2 = w2 @ h1 + b2
h2 = torch.sigmoid(a2)
a3 = w3 @ h2 + b3


In [None]:
loss = torch.nn.functional.cross_entropy(a3.view(1, -1), torch.argmax(y).view(1))

In [None]:
loss

tensor(0.0725, dtype=torch.float64, grad_fn=<NllLossBackward0>)
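To check this number without autograd, the sketch below performs the same step by hand in plain Python: a forward pass, the standard backpropagation formulas for sigmoid layers with a softmax/cross-entropy output, one SGD update of all six parameters with $\eta = 1$, and a second forward pass. The parameter values are copied from the original initialisation cells; all helper names are illustrative:

```python
import math

W1 = [[0.5488135, 0.71518937, 0.60276338],
      [0.54488318, 0.4236548, 0.64589411],
      [0.43758721, 0.891773, 0.96366276]]
W2 = [[0.56804456, 0.92559664, 0.07103606],
      [0.0871293, 0.0202184, 0.83261985],
      [0.77815675, 0.87001215, 0.97861834]]
W3 = [[0.11827443, 0.63992102, 0.14335329],
      [0.94466892, 0.52184832, 0.41466194],
      [0.26455561, 0.77423369, 0.45615033]]
b1 = [0.38344152, 0.79172504, 0.52889492]
b2 = [0.79915856, 0.46147936, 0.78052918]
b3 = [0.56843395, 0.0187898, 0.6176355]
x = [1.0, 0.0, 1.0]
y = [0.0, 0.0, 1.0]  # one-hot target (class 2)
lr = 1.0             # eta = 1

def affine(W, v, b):
    """Return W @ v + b (row i of W holds the incoming weights of neuron i)."""
    return [sum(w * vj for w, vj in zip(row, v)) + bi for row, bi in zip(W, b)]

def sigmoid_vec(v):
    return [1.0 / (1.0 + math.exp(-z)) for z in v]

def softmax(v):
    m = max(v)  # shift by the max for numerical stability
    exps = [math.exp(z - m) for z in v]
    total = sum(exps)
    return [e / total for e in exps]

def forward(W1, b1, W2, b2, W3, b3):
    """Return (h1, h2, y_hat, cross-entropy loss with the natural log)."""
    h1 = sigmoid_vec(affine(W1, x, b1))
    h2 = sigmoid_vec(affine(W2, h1, b2))
    y_hat = softmax(affine(W3, h2, b3))
    loss = -sum(t * math.log(p) for t, p in zip(y, y_hat))
    return h1, h2, y_hat, loss

def backprop_into(W, delta, h):
    """dL/da of the previous layer: (W^T @ delta) * h * (1 - h), the sigmoid derivative."""
    return [sum(W[i][j] * delta[i] for i in range(len(W))) * h[j] * (1.0 - h[j])
            for j in range(len(h))]

def outer(delta, v):
    """Weight gradient of an affine layer: dW[i][j] = delta[i] * v[j]."""
    return [[d * vj for vj in v] for d in delta]

def sgd(P, G):
    """Element-wise P - lr * G for a vector or a matrix (list of rows)."""
    if isinstance(P[0], list):
        return [sgd(p_row, g_row) for p_row, g_row in zip(P, G)]
    return [p - lr * g for p, g in zip(P, G)]

h1, h2, y_hat, loss = forward(W1, b1, W2, b2, W3, b3)

# Backward pass: every gradient is evaluated at the *old* parameters
d3 = [p - t for p, t in zip(y_hat, y)]  # dL/da3
d2 = backprop_into(W3, d3, h2)          # dL/da2
d1 = backprop_into(W2, d2, h1)          # dL/da1

# One SGD step on all six parameters, then a fresh forward pass
new_params = (sgd(W1, outer(d1, x)), sgd(b1, d1),
              sgd(W2, outer(d2, h1)), sgd(b2, d2),
              sgd(W3, outer(d3, h2)), sgd(b3, d3))
new_loss = forward(*new_params)[3]

print(round(loss, 4), round(new_loss, 4))  # 0.8564 0.0725
```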

In [None]:
# lr = 1.0

# dw1 = torch.autograd.grad(loss, w1, retain_graph = True)[0]
# dw2 = torch.autograd.grad(loss, w2, retain_graph = True)[0]
# dw3 = torch.autograd.grad(loss, w3, retain_graph = True)[0]
# db1 = torch.autograd.grad(loss, b1, retain_graph = True)[0]
# db2 = torch.autograd.grad(loss, b2, retain_graph = True)[0]
# db3 = torch.autograd.grad(loss, b3, retain_graph = True)[0]

# w1 = w1 - lr * dw1
# w2 = w2 - lr * dw2
# w3 = w3 - lr * dw3
# b1 = b1 - lr * db1
# b2 = b2 - lr * db2
# b3 = b3 - lr * db3

# a1 = w1 @ x + b1
# h1 = torch.sigmoid(a1)

# a2 = w2 @ h1 + b2
# h2 = torch.sigmoid(a2)

# a3 = w3 @ h2 + b3
# # h3 = torch.nn.functional.softmax(a3, dim = 0)

# new_loss = torch.nn.functional.cross_entropy(a3.view(1, -1), torch.argmax(y).view(1))

# display(new_loss.item())

In [None]:
# w1 = torch.tensor([
#   [0.5488135, 0.71518937, 0.60276338],
#   [0.54488318, 0.4236548, 0.64589411],
#   [0.43758721, 0.891773, 0.96366276]
# ], dtype = torch.float64, requires_grad = True)
# w2 = torch.tensor([
#   [0.56804456, 0.92559664, 0.07103606],
#   [0.0871293, 0.0202184, 0.83261985],
#   [0.77815675, 0.87001215, 0.97861834]
# ], dtype = torch.float64, requires_grad = True)

# w3 = torch.tensor([
#   [0.11827443, 0.63992102, 0.14335329],
#   [0.94466892, 0.52184832, 0.41466194],
#   [0.26455561, 0.77423369, 0.45615033]
# ], dtype = torch.float64, requires_grad = True)

In [None]:
# b1 = torch.tensor([0.38344152, 0.79172504, 0.52889492], dtype = torch.float64, requires_grad = True)
# b2 = torch.tensor([0.79915856, 0.46147936, 0.78052918], dtype = torch.float64, requires_grad = True)
# b3 = torch.tensor([0.56843395, 0.0187898, 0.6176355], dtype = torch.float64, requires_grad = True)

In [None]:
# x = torch.tensor([1.0, 0.0, 1.0], dtype = torch.float64)
# y = torch.tensor([0, 0, 1], dtype = torch.float64)

In [None]:
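class NeuralNetwork(torch.nn.Module):
  def __init__(self):
    super().__init__()
    # torch.nn.Linear computes x @ W.T + b with W of shape (out_features, in_features),
    # which matches the w @ x + b convention used above
    self.fc1 = torch.nn.Linear(3, 3)
    self.fc2 = torch.nn.Linear(3, 3)
    self.fc3 = torch.nn.Linear(3, 3)

  def forward(self, x):
    x = torch.sigmoid(self.fc1(x))
    x = torch.sigmoid(self.fc2(x))
    x = self.fc3(x)  # raw logits: CrossEntropyLoss applies log-softmax itself

    return x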

In [None]:
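nn = NeuralNetwork()

# Copy the parameters into the layers. With the re-initialization cells above
# commented out, w1, ..., b3 still hold the values from the manual update,
# not the original ones.
nn.fc1.weight.data = w1
nn.fc1.bias.data = b1

nn.fc2.weight.data = w2
nn.fc2.bias.data = b2

nn.fc3.weight.data = w3
nn.fc3.bias.data = b3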

In [None]:
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(nn.parameters(), lr = 1.0)

In [None]:
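for epoch in range(1):
  optimizer.zero_grad()  # clear any gradients left in .grad

  preds = nn(x)
  loss = loss_fn(preds.view(1, -1), torch.argmax(y).view(1))

  loss.backward()   # fills p.grad for every parameter (replaces the torch.autograd.grad calls)
  optimizer.step()  # p <- p - lr * p.grad (replaces the manual update cell)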

In [None]:
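# `loss` was computed inside the loop before optimizer.step(), i.e. with the parameters
# copied in above; since those were already the manually updated values, it matches the new loss.
display(loss.item())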

0.0725333007370964