
[![PracticeProbs](https://d33wubrfki0l68.cloudfront.net/b6800cc830e3fd5a3a4c3d9cfb1137e6a4c15c77/ec467/assets/images/transparent-1.png)](https://www.practiceprobs.com/)

# [Vanilla Neural Network](https://www.practiceprobs.com/problemsets/pytorch/tensors/vanilla-neural-network/)

Here's a "vanilla" 🍦 feed-forward neural network with [logistic activation functions](https://en.wikipedia.org/wiki/Logistic_function) and [softmax](https://en.wikipedia.org/wiki/Softmax_function) applied to the output layer.

<img src="https://www.practiceprobs.com/problemsets/pytorch/images/vanilla.png" width="800px">

Given the following `X` and `y` tensors representing training data,

In [1]:
import torch

# Make data
torch.manual_seed(4321)
X = torch.rand(size=(8,2))
y = torch.randint(low=0, high=3, size=(8,))

print(X)
# tensor([[0.1255, 0.5377],
#         [0.6564, 0.0365],
#         [0.5837, 0.7018],
#         [0.3068, 0.9500],
#         [0.4321, 0.2946],
#         [0.6015, 0.1762],
#         [0.9945, 0.3177],
#         [0.9886, 0.3911]])

print(y) 
# tensor([0, 2, 2, 0, 2, 2, 0, 1])

tensor([[0.1255, 0.5377],
        [0.6564, 0.0365],
        [0.5837, 0.7018],
        [0.3068, 0.9500],
        [0.4321, 0.2946],
        [0.6015, 0.1762],
        [0.9945, 0.3177],
        [0.9886, 0.3911]])
tensor([0, 2, 2, 0, 2, 2, 0, 1])


calculate:

1. the predictions (forward pass)
2. the loss using categorical cross entropy
3. the gradient of the loss with respect to the weights and biases
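
Written out, the first two steps might look like the sketch below. The notation is an assumption made here, not part of the problem statement: $W_1, b_1, W_2, b_2$ are the layer parameters, $\sigma$ is the logistic function, $N$ is the number of rows in `X`, and the loss is averaged over samples (PyTorch's default reduction).

$$
Z_1 = X W_1 + b_1, \qquad A_1 = \sigma(Z_1) = \frac{1}{1 + e^{-Z_1}}
$$

$$
Z_2 = A_1 W_2 + b_2, \qquad \hat{Y}_{ik} = \frac{e^{(Z_2)_{ik}}}{\sum_{j} e^{(Z_2)_{ij}}}
$$

$$
\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \log \hat{Y}_{i,\,y_i}
$$

Step 3 then asks for $\partial \mathcal{L} / \partial W_1$, $\partial \mathcal{L} / \partial b_1$, $\partial \mathcal{L} / \partial W_2$ and $\partial \mathcal{L} / \partial b_2$.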

---

In [2]:
# Your brilliant solution here!

In [3]:
# Hidden-layer weights: 2 input features -> 2 hidden units
w1 = torch.tensor([[0.48, -0.43], [-0.51, -0.48]], requires_grad=True)
w1.size()

torch.Size([2, 2])

In [4]:
# Hidden-layer bias; shape (1, 2) so it broadcasts across the 8 rows of X
b1 = torch.tensor([[0.23, 0.05]], requires_grad=True)
b1.size()

torch.Size([1, 2])

In [5]:
# Output-layer weights: 2 hidden units -> 3 classes
w2 = torch.tensor([[-0.99, 0.36, -0.75], [-0.66, 0.34, 0.66]], requires_grad=True)
w2.size()

torch.Size([2, 3])

In [6]:
# Output-layer bias, one entry per class
b2 = torch.tensor([[0.32, -0.44, 0.7]], requires_grad=True)
b2.size()

torch.Size([1, 3])

In [7]:
def sigmoid(x):
    # Logistic activation, applied elementwise
    return 1 / (1 + torch.exp(-x))
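
As a quick sanity check, the hand-written logistic function can be compared against PyTorch's built-in `torch.sigmoid`. This is only a sketch; the test input `x` is a hypothetical range chosen for illustration.

```python
import torch


def sigmoid(x):
    """Logistic function written out by hand."""
    return 1 / (1 + torch.exp(-x))


# Hypothetical test points spanning negative and positive inputs.
x = torch.linspace(-5.0, 5.0, steps=11)

# The hand-written version should agree with the built-in one.
matches_builtin = torch.allclose(sigmoid(x), torch.sigmoid(x))

# The logistic function passes through 0.5 at x = 0.
value_at_zero = sigmoid(torch.tensor(0.0)).item()

print(matches_builtin, value_at_zero)
```

In practice `torch.sigmoid` is usually the safer choice, but writing the function out makes the forward pass easier to follow.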

In [8]:
# Hidden-layer pre-activations, one row per sample
z1 = X @ w1 + b1
z1, z1.shape

(tensor([[ 0.0160, -0.2621],
         [ 0.5264, -0.2498],
         [ 0.1522, -0.5378],
         [-0.1073, -0.5379],
         [ 0.2872, -0.2772],
         [ 0.4289, -0.2932],
         [ 0.5454, -0.5301],
         [ 0.5051, -0.5628]], grad_fn=<AddBackward0>), torch.Size([8, 2]))

In [9]:
sigmoid(z1),sigmoid(z1).shape

(tensor([[0.5040, 0.4349],
         [0.6287, 0.4379],
         [0.5380, 0.3687],
         [0.4732, 0.3687],
         [0.5713, 0.4311],
         [0.6056, 0.4272],
         [0.6331, 0.3705],
         [0.6236, 0.3629]], grad_fn=<MulBackward0>), torch.Size([8, 2]))

In [10]:
w2,w2.shape

(tensor([[-0.9900,  0.3600, -0.7500],
         [-0.6600,  0.3400,  0.6600]], requires_grad=True), torch.Size([2, 3]))

In [11]:
# Output-layer logits (pre-softmax scores), one row per sample
z2 = sigmoid(z1) @ w2 + b2
z2, z2.shape

(tensor([[-0.4660, -0.1107,  0.6090],
         [-0.5914, -0.0648,  0.5175],
         [-0.4559, -0.1210,  0.5398],
         [-0.3918, -0.1443,  0.5884],
         [-0.5301, -0.0877,  0.5561],
         [-0.5615, -0.0767,  0.5278],
         [-0.5513, -0.0861,  0.4697],
         [-0.5369, -0.0921,  0.4718]], grad_fn=<AddBackward0>),
 torch.Size([8, 3]))

In [12]:
from torch import nn

In [13]:
loss = nn.CrossEntropyLoss()  # expects raw logits; applies log-softmax internally
softmax = nn.Softmax(dim=1)   # normalize across the class dimension

In [14]:
# Softmax by hand on a sample logit vector (hand-typed values, not a row of z2)
torch.exp(torch.tensor([-0.5440, -0.0693,  0.7010]))

tensor([0.5804, 0.9330, 2.0158])

In [15]:
torch.exp(torch.tensor([-0.5440, -0.0693,  0.7010])).sum()

tensor(3.5292)

In [16]:
torch.exp(torch.tensor([-0.5440, -0.0693,  0.7010]))/torch.exp(torch.tensor([-0.5440, -0.0693,  0.7010])).sum()

tensor([0.1645, 0.2644, 0.5712])
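
The hand calculation above exponentiates the logits and divides by their sum. A row-wise version of the same idea might look like the sketch below; the helper name `softmax_rows` is hypothetical, and subtracting each row's maximum is a common stability trick that is assumed here, not taken from the notebook.

```python
import torch


def softmax_rows(z):
    """Row-wise softmax; subtracting each row's max avoids overflow in exp."""
    shifted = z - z.max(dim=1, keepdim=True).values
    exp = torch.exp(shifted)
    return exp / exp.sum(dim=1, keepdim=True)


# Same sample logits as the hand calculation, as a batch with one row.
logits = torch.tensor([[-0.5440, -0.0693, 0.7010]])
probs = softmax_rows(logits)

print(probs)
print(torch.allclose(probs, torch.softmax(logits, dim=1)))
```

Shifting by the row maximum leaves the result unchanged because the shift cancels between numerator and denominator.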

In [17]:
softmax(z2)

tensor([[0.1867, 0.2663, 0.5470],
        [0.1747, 0.2958, 0.5295],
        [0.1959, 0.2738, 0.5303],
        [0.2022, 0.2590, 0.5388],
        [0.1812, 0.2820, 0.5368],
        [0.1787, 0.2902, 0.5311],
        [0.1863, 0.2966, 0.5171],
        [0.1886, 0.2943, 0.5171]], grad_fn=<SoftmaxBackward0>)

In [18]:
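# Caution: nn.CrossEntropyLoss applies log-softmax itself, so softmax is applied twice here.
# loss(z2, y) on the raw logits would give the categorical cross entropy of the predictions.
l = loss(softmax(z2),y)
l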

tensor(1.0681, grad_fn=<NllLossBackward0>)
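
`nn.CrossEntropyLoss` combines log-softmax with negative log-likelihood, so it expects raw logits. The sketch below compares three quantities on the first three rows of `z2` as printed above (with their labels from `y`): the cross entropy computed by hand, the loss from raw logits, and the loss from already-softmaxed values. The variable names are illustrative assumptions.

```python
import torch
from torch import nn

# First three rows of z2 as printed above, with their labels from y.
logits = torch.tensor([[-0.4660, -0.1107, 0.6090],
                       [-0.5914, -0.0648, 0.5175],
                       [-0.4559, -0.1210, 0.5398]])
targets = torch.tensor([0, 2, 2])

loss_fn = nn.CrossEntropyLoss()

# Categorical cross entropy by hand: mean of -log p[i, target_i].
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(len(targets)), targets].mean()

# Raw logits reproduce the hand calculation.
from_logits = loss_fn(logits, targets)

# Probabilities as input means softmax is effectively applied twice.
from_probs = loss_fn(torch.softmax(logits, dim=1), targets)

print(manual.item(), from_logits.item(), from_probs.item())
```

The first two values should agree, while the third differs because the probabilities are squashed a second time before the log is taken.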

In [19]:
l.backward()

In [20]:
print(f'{w1.grad}\n{w2.grad}\n{b1.grad}\n{b2.grad}')

tensor([[ 0.0057, -0.0017],
        [ 0.0067,  0.0058]])
tensor([[-0.0059,  0.0323, -0.0264],
        [-0.0053,  0.0252, -0.0199]])
tensor([[0.0167, 0.0001]])
tensor([[-0.0157,  0.0579, -0.0422]])
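
One way to check autograd is to derive the gradients by hand. The sketch below assumes the loss is computed from the raw logits, in which case $\partial \mathcal{L} / \partial Z_2 = (\text{softmax}(Z_2) - \text{onehot}(y)) / N$ and the remaining gradients follow from the chain rule. Its numbers will therefore differ from the gradients printed above, which come from passing `softmax(z2)` to `nn.CrossEntropyLoss`. The names `a1`, `dz1`, `dz2`, `dw1`, `db1`, `dw2` and `db2` are hypothetical.

```python
import torch
import torch.nn.functional as F

# Same data and parameters as in the notebook.
torch.manual_seed(4321)
X = torch.rand(size=(8, 2))
y = torch.randint(low=0, high=3, size=(8,))

w1 = torch.tensor([[0.48, -0.43], [-0.51, -0.48]], requires_grad=True)
b1 = torch.tensor([[0.23, 0.05]], requires_grad=True)
w2 = torch.tensor([[-0.99, 0.36, -0.75], [-0.66, 0.34, 0.66]], requires_grad=True)
b2 = torch.tensor([[0.32, -0.44, 0.7]], requires_grad=True)

# Forward pass; the loss is computed from the raw logits z2 (assumption).
a1 = torch.sigmoid(X @ w1 + b1)
z2 = a1 @ w2 + b2
loss = F.cross_entropy(z2, y)
loss.backward()

# Analytic gradients via the chain rule, computed without autograd.
with torch.no_grad():
    n = X.shape[0]
    dz2 = (torch.softmax(z2, dim=1) - F.one_hot(y, num_classes=3)) / n
    dw2 = a1.T @ dz2
    db2 = dz2.sum(dim=0, keepdim=True)
    dz1 = (dz2 @ w2.T) * a1 * (1 - a1)  # sigmoid'(z) = a * (1 - a)
    dw1 = X.T @ dz1
    db1 = dz1.sum(dim=0, keepdim=True)

# Compare each hand-derived gradient with the one autograd produced.
checks = {
    "w1": torch.allclose(w1.grad, dw1, atol=1e-6),
    "b1": torch.allclose(b1.grad, db1, atol=1e-6),
    "w2": torch.allclose(w2.grad, dw2, atol=1e-6),
    "b2": torch.allclose(b2.grad, db2, atol=1e-6),
}
print(checks)
```

Summing `dz2` and `dz1` over the batch dimension mirrors how the `(1, k)` biases are broadcast across rows in the forward pass.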


## [See our solution!](https://www.practiceprobs.com/problemsets/pytorch/tensors/vanilla-neural-network/solution/)