## Install `torch`
link: https://pytorch.org

**As a side note:** <br>
You may want to check out the following cloud platforms:
* AWS
* MS Azure
* GCP
* Lightning AI

In [5]:
import numpy as np
import torch
torch.__version__

'2.6.0'

In [2]:
torch.cuda.is_available()

False

It returns `False` on my Mac because the machine has no CUDA-capable GPU.

In [4]:
# check if Apple Silicon Chip can accelerate PyTorch's computation
print(torch.backends.mps.is_available())

True
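The two availability checks above can be combined into one device choice. This is a minimal sketch; the preference order (CUDA, then MPS, then CPU) and the variable names are my own assumptions, not something the notes prescribe.

```python
import torch

# Prefer CUDA, then Apple's MPS backend, then fall back to the CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

# Create a tensor directly on the selected device.
x = torch.ones(2, 2, device=device)
print(x.device)
```

On the Mac used in these notes this would presumably pick `mps`, since CUDA is unavailable and MPS is available.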


## Three main components of the PyTorch library

(1) Tensors <br>
(2) Automatic differentiation <br>
(3) Deep learning building blocks (layers, pre-built models, etc.) <br>

## Tensors vs NumPy ndarray

A tensor of rank n is like an n-dimensional NumPy array, but with extra features: tensors can be placed on a GPU and can track gradients for automatic differentiation. A tensor of rank 0 is just a scalar.
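To make the notion of rank concrete, here is a small sketch that prints the number of dimensions (`ndim`) and the shape of a rank-0, rank-1, and rank-2 tensor. The variable names are illustrative only.

```python
import torch

scalar = torch.tensor(1)                  # rank 0
vector = torch.tensor([2, 3, 4])          # rank 1
matrix = torch.tensor([[3, 5], [0, -1]])  # rank 2

for t in (scalar, vector, matrix):
    # ndim is the rank; shape lists the size along each dimension
    print(t.ndim, tuple(t.shape))

# A rank-0 tensor can be converted to a plain Python number.
print(scalar.item())
```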

In [6]:
# let's create some tensors!
lst_of_data = [
    1,
    [2, 3, 4],
    np.array([[3, 5],[0, -1]]),
]

for data in lst_of_data:
    print('=' * 50)
    print('data: \n', data)
    out = torch.tensor(data)
    print('type of out: \n', type(out))
    print('out: \n', out)

data: 
 1
type of out: 
 <class 'torch.Tensor'>
out: 
 tensor(1)
data: 
 [2, 3, 4]
type of out: 
 <class 'torch.Tensor'>
out: 
 tensor([2, 3, 4])
data: 
 [[ 3  5]
 [ 0 -1]]
type of out: 
 <class 'torch.Tensor'>
out: 
 tensor([[ 3,  5],
        [ 0, -1]])


In [7]:
a = np.array([[3, 5],[0, -1]])
a = torch.tensor(a)
a

tensor([[ 3,  5],
        [ 0, -1]])

In [8]:
a.shape

torch.Size([2, 2])

In [10]:
a.view(4, -1)

tensor([[ 3],
        [ 5],
        [ 0],
        [-1]])

In [14]:
a.shape

torch.Size([2, 2])

In [15]:
a

tensor([[ 3,  5],
        [ 0, -1]])

In [16]:
a.T

tensor([[ 3,  0],
        [ 5, -1]])
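The cells above show that `a.view(4, -1)` returns a new tensor and leaves `a.shape` unchanged. As a hedged sketch of what happens underneath: the view shares memory with `a`, so writing through it changes `a` as well, and a transposed tensor is no longer contiguous, in which case `.reshape` is the safer call. The value `100` is just a marker for the demonstration.

```python
import torch

a = torch.tensor([[3, 5], [0, -1]])

flat = a.view(4, -1)   # new tensor object, same underlying storage
flat[0, 0] = 100       # writing through the view also changes `a`
print(a)
print(a.shape)         # the shape of `a` itself is untouched

t = a.T                # the transpose is a non-contiguous view
print(t.is_contiguous())

# .view would raise a RuntimeError on `t`; .reshape copies the data when it has to.
r = t.reshape(4)
print(r)
```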

## Backward propagation in computation graph

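A computation graph is a directed graph that represents a computation: each node is an input or an operation, and the edges show how values flow. For instance, y = a * x + 3 can be drawn as follows, with v = a * x as the intermediate result: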

```mermaid
flowchart LR

A[a] --> C[*]
B[x] --> C
C --> C2[v]
C2 --> E[+]
D[3] --> E
E --> F[y]
```

To measure how a change in the input `x` affects the output `y`, we take the derivative $\frac{\partial{y}}{\partial{x}}$. According to the graph, `y` depends on `v`, and `v` depends on `x`, so we can apply the chain rule:

$\frac{\partial{y}}{\partial{x}} = \frac{\partial{y}}{\partial{v}} \times \frac{\partial{v}}{\partial{x}}$

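So, if an algorithm keeps track of these local derivatives for every node, it can multiply them along the path from `y` back to `x` to obtain $\frac{\partial{y}}{\partial{x}}$. Here $\frac{\partial{y}}{\partial{v}} = 1$ and $\frac{\partial{v}}{\partial{x}} = a$, so $\frac{\partial{y}}{\partial{x}} = a$.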
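The chain rule for y = a * x + 3 can be checked with autograd. This is a minimal sketch; the values `a = 2.0` and `x = 5.0` are arbitrary choices for illustration, and `retain_grad()` is called only so that the gradient of the intermediate node `v` can be inspected.

```python
import torch

a = torch.tensor(2.0)                      # constant, no gradient needed
x = torch.tensor(5.0, requires_grad=True)  # input we differentiate with respect to

v = a * x         # intermediate node of the graph
v.retain_grad()   # keep the gradient of this non-leaf tensor for inspection
y = v + 3

y.backward()      # walk the graph backwards from y

print(v.grad)     # dy/dv
print(x.grad)     # dy/dx = dy/dv * dv/dx
```

With these values, `v.grad` is 1 and `x.grad` equals `a`, since differentiating `v + 3` with respect to `v` gives 1 and differentiating `a * x` with respect to `x` gives `a`.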

**NOTE:** For more details, you may check out Andrew Ng's Deep Learning course on Coursera.

`PyTorch` has a high-level method, `.backward()`, which does all of this for us. The author showed it for a simple logistic regression:

In [19]:
# logistic regression
import torch.nn.functional as F

y = torch.tensor([1.0])
x1 = torch.tensor([1.1])
w1 = torch.tensor([2.2], requires_grad=True)
b = torch.tensor([0.0], requires_grad=True)
z = x1 * w1 + b
a = torch.sigmoid(z)
loss = F.binary_cross_entropy(a, y)

In [20]:
loss.backward()  # compute d(loss)/d(tensor) for every leaf tensor with requires_grad=True; results are stored in .grad

print('gradient of w1: ', w1.grad)
print('gradient of b: ', b.grad)

gradient of w1:  tensor([-0.0898])
gradient of b:  tensor([-0.0817])
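The printed gradients can be cross-checked by hand. For a sigmoid followed by binary cross-entropy, the derivative of the loss with respect to `z` simplifies to `a - y`, so the gradient for `w1` is `(a - y) * x1` and the gradient for `b` is `a - y`. The sketch below repeats the cell above and compares both results; the names `manual_w1` and `manual_b` are mine.

```python
import torch
import torch.nn.functional as F

y = torch.tensor([1.0])
x1 = torch.tensor([1.1])
w1 = torch.tensor([2.2], requires_grad=True)
b = torch.tensor([0.0], requires_grad=True)

a = torch.sigmoid(x1 * w1 + b)
loss = F.binary_cross_entropy(a, y)
loss.backward()

# Closed-form gradients: d(loss)/dz = a - y
with torch.no_grad():
    manual_w1 = (a - y) * x1   # d(loss)/dw1 = d(loss)/dz * dz/dw1
    manual_b = a - y           # d(loss)/db  = d(loss)/dz * dz/db

print(w1.grad, manual_w1)
print(b.grad, manual_b)
print(torch.allclose(w1.grad, manual_w1, atol=1e-6))
print(torch.allclose(b.grad, manual_b, atol=1e-6))
```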


## Implementing multi-layer NN

In [21]:
class NeuralNetwork(torch.nn.Module):
    def __init__(self, n_inputs, n_outputs):
        """
        Parameters
        ----------
        n_inputs : int
            The number of nodes in the input layer

        n_outputs : int
            The number of nodes in the output layer
        """
        super().__init__()
        self.layers = torch.nn.Sequential(
            # 1st hidden layer
            torch.nn.Linear(n_inputs, 30),
            torch.nn.ReLU(),
            
            # 2nd hidden layer
            torch.nn.Linear(30, 20),
            torch.nn.ReLU(),
            
            # output layer
            torch.nn.Linear(20, n_outputs),
        )


    def forward(self, x):
        # Yes: this overrides torch.nn.Module.forward.
        # Calling model(x) runs this method (via Module.__call__).
        logits = self.layers(x)
        return logits

In [29]:
n_inputs = 50
n_outputs = 3
model = NeuralNetwork(n_inputs, n_outputs)
model

NeuralNetwork(
  (layers): Sequential(
    (0): Linear(in_features=50, out_features=30, bias=True)
    (1): ReLU()
    (2): Linear(in_features=30, out_features=20, bias=True)
    (3): ReLU()
    (4): Linear(in_features=20, out_features=3, bias=True)
  )
)
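To see the forward pass in action, here is a hedged sketch that pushes a random batch through a network with the same layer sizes. To stay self-contained it uses a plain `torch.nn.Sequential` as a stand-in for the `NeuralNetwork` class above; the batch size of 4, the seed, and the softmax step are assumptions for illustration.

```python
import torch

torch.manual_seed(123)  # arbitrary seed, only for reproducibility

# Stand-in with the same layer sizes as NeuralNetwork(50, 3)
model = torch.nn.Sequential(
    torch.nn.Linear(50, 30), torch.nn.ReLU(),
    torch.nn.Linear(30, 20), torch.nn.ReLU(),
    torch.nn.Linear(20, 3),
)

X = torch.rand(4, 50)   # hypothetical batch: 4 examples, 50 features each

with torch.no_grad():   # inference only, so no computation graph is built
    logits = model(X)   # calling the module dispatches to its forward()
    probs = torch.softmax(logits, dim=1)  # turn logits into class probabilities

print(logits.shape)
print(probs.sum(dim=1))  # each row of probabilities sums to 1
```

The model returns raw logits rather than probabilities, which is why the softmax is applied separately here.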

In [38]:
for p in model.parameters():
    print('=' * 50)
    print(p)
    print('shape: ', p.shape)
    if p.requires_grad:
        print(p.numel())

Parameter containing:
tensor([[-0.0568,  0.0116, -0.0765,  ...,  0.0267, -0.0036,  0.1051],
        [-0.0800,  0.0317,  0.0497,  ..., -0.0003,  0.0717,  0.1366],
        [-0.1140, -0.1359,  0.1081,  ..., -0.1306, -0.0522, -0.0165],
        ...,
        [-0.0078, -0.1096,  0.0351,  ..., -0.0469, -0.1103, -0.0339],
        [ 0.1384, -0.0292,  0.0510,  ..., -0.0750, -0.1346, -0.1321],
        [-0.0610, -0.0907, -0.0793,  ..., -0.0262,  0.1162, -0.0735]],
       requires_grad=True)
shape:  torch.Size([30, 50])
1500
Parameter containing:
tensor([-0.0049, -0.0913, -0.0130, -0.0938, -0.0229,  0.0346, -0.0861,  0.0526,
         0.1310, -0.0585,  0.1069, -0.0391,  0.0198, -0.0255, -0.1297, -0.0888,
        -0.1074,  0.0288,  0.0842,  0.0734, -0.0732,  0.1014,  0.0902,  0.0656,
        -0.1037,  0.1003,  0.0894,  0.0174,  0.1259, -0.0045],
       requires_grad=True)
shape:  torch.Size([30])
30
Parameter containing:
tensor([[-0.0687,  0.1770, -0.0836,  0.0367,  0.1320,  0.0718,  0.0091, -0.0477,
