## Install `torch`
link: https://pytorch.org

**As a side note:** <br>
You may want to check out the following cloud platforms:
* AWS
* MS Azure
* GCP
* Lightning AI

In [2]:
import numpy as np
import torch
torch.__version__

'2.6.0'

In [3]:
torch.cuda.is_available()

False

It returns `False` on my Mac because CUDA requires an NVIDIA GPU, which my machine does not have.

In [4]:
# check whether the MPS backend (Apple Silicon GPU) can accelerate PyTorch's computation
print(torch.backends.mps.is_available())

True
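
The two availability checks above can be combined into a small device picker. This is only a minimal sketch: the helper name `pick_device` is hypothetical (it is not part of PyTorch), and the CUDA, then MPS, then CPU preference order is an assumption.

```python
import torch


def pick_device() -> torch.device:
    """Return the first available backend: CUDA, then MPS, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()
x = torch.ones(2, 2, device=device)  # allocate the tensor directly on that device
print(device, x.device.type)
```

On the Mac described above, this should pick `mps`.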


## Three main components of the PyTorch library

(1) Tensors <br>
(2) Automatic differentiation <br>
(3) Deep learning layers, pre-built models, etc. <br>

## Tensors vs NumPy ndarray

A tensor of rank n is like an n-dimensional NumPy array. Tensors offer more features, such as automatic differentiation, and can run computations on a GPU. A tensor of rank 0 is just a scalar.
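
As a quick illustration of rank and of the relationship to NumPy, the sketch below checks `ndim` and contrasts `torch.tensor` (which copies the data) with `torch.from_numpy` (which shares memory with the array). The variable names are made up for this example.

```python
import numpy as np
import torch

scalar = torch.tensor(7)          # rank 0
vector = torch.tensor([2, 3, 4])  # rank 1
print(scalar.ndim, vector.ndim)   # 0 1

arr = np.array([[3, 5], [0, -1]])
copied = torch.tensor(arr)      # copies the data
shared = torch.from_numpy(arr)  # shares memory with arr

arr[0, 0] = 100                 # modify the NumPy array in place
print(copied[0, 0].item())      # 3   (the copy is unaffected)
print(shared[0, 0].item())      # 100 (the shared tensor sees the change)
```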

In [5]:
# let's create some tensors!
lst_of_data = [
    1,
    [2, 3, 4],
    np.array([[3, 5],[0, -1]]),
]

for data in lst_of_data:
    print('=' * 50)
    print('data: \n', data)
    out = torch.tensor(data)
    print('type of out: \n', type(out))
    print('out: \n', out)

data: 
 1
type of out: 
 <class 'torch.Tensor'>
out: 
 tensor(1)
data: 
 [2, 3, 4]
type of out: 
 <class 'torch.Tensor'>
out: 
 tensor([2, 3, 4])
data: 
 [[ 3  5]
 [ 0 -1]]
type of out: 
 <class 'torch.Tensor'>
out: 
 tensor([[ 3,  5],
        [ 0, -1]])


In [6]:
a = np.array([[3, 5],[0, -1]])
a = torch.tensor(a)
a

tensor([[ 3,  5],
        [ 0, -1]])

In [7]:
a.shape

torch.Size([2, 2])

In [8]:
a.view(4, -1)

tensor([[ 3],
        [ 5],
        [ 0],
        [-1]])

In [9]:
a.shape

torch.Size([2, 2])

In [10]:
a

tensor([[ 3,  5],
        [ 0, -1]])

In [11]:
a.T

tensor([[ 3,  0],
        [ 5, -1]])
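
The cells above show that `.view` and `.T` return new tensor objects and leave the shape of `a` untouched. What they do not show is that a view still shares storage with the original. A small sketch, using a fresh tensor `t` so that `a` is not disturbed:

```python
import torch

t = torch.tensor([[3, 5], [0, -1]])

flat = t.view(4)       # new shape, same underlying storage
flat[0] = 99           # writing through the view changes t as well
print(t[0, 0].item())  # 99

print(t.T.is_contiguous())  # False: the transpose only swaps strides
col_major = t.T.reshape(4)  # reshape copies when a view is impossible
print(col_major.tolist())   # [99, 0, 5, -1]
```

If an independent copy is needed, `t.clone()` can be called before reshaping.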

## Backward propagation in computation graph

A computation graph is a directed graph that represents a computation. For instance, `y = a * x + 3` can be drawn as:

```mermaid
flowchart LR

A[a] --> C[*]
B[x] --> C
C --> C2[v]
C2 --> E[+]
D[3] --> E
E --> F[y]
```

To measure how a change in the input affects the output, we can take the derivative $\frac{\partial{y}}{\partial{x}}$. According to the graph, `y` depends on `v`, and `v` depends on `x`, so we can use the chain rule:

$\frac{\partial{y}}{\partial{x}} = \frac{\partial{y}}{\partial{v}} \times \frac{\partial{v}}{\partial{x}}$

So, if an algorithm keeps track of those local derivatives as the graph is evaluated, it can quickly plug their current values into the equation above to compute $\frac{\partial{y}}{\partial{x}}$.
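
For the graph `y = a * x + 3`, the two local derivatives are $\frac{\partial{y}}{\partial{v}} = 1$ and $\frac{\partial{v}}{\partial{x}} = a$, so the chain rule gives $\frac{\partial{y}}{\partial{x}} = a$. The sketch below checks this with autograd; the values `a = 2.0` and `x = 3.0` are arbitrary choices for illustration.

```python
import torch

a = torch.tensor(2.0)                      # constant, no gradient needed
x = torch.tensor(3.0, requires_grad=True)  # input we differentiate with respect to

v = a * x  # dv/dx = a
y = v + 3  # dy/dv = 1

y.backward()   # walk the graph backwards and apply the chain rule
print(x.grad)  # tensor(2.), i.e. dy/dv * dv/dx = 1 * a
```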

**NOTE:** For more details, you may check out Andrew Ng's Deep Learning course on Coursera.

`PyTorch` has a high-level method, `.backward()`, which does all of this for us! The author showed it for a simple logistic regression:

In [12]:
# logistic regression
import torch.nn.functional as F

y = torch.tensor([1.0])
x1 = torch.tensor([1.1])
w1 = torch.tensor([2.2], requires_grad=True)
b = torch.tensor([0.0], requires_grad=True)
z = x1 * w1 + b
a = torch.sigmoid(z)
loss = F.binary_cross_entropy(a, y)

In [13]:
loss.backward()  # compute the gradient of loss w.r.t. every leaf tensor with requires_grad=True

print('gradient of w1: ', w1.grad)
print('gradient of b: ', b.grad)

gradient of w1:  tensor([-0.0898])
gradient of b:  tensor([-0.0817])
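
These gradients can be cross-checked by hand. For a sigmoid followed by binary cross-entropy, the derivative of the loss with respect to `z` simplifies to `a - y`, so the gradient for `w1` is `(a - y) * x1` and the gradient for `b` is `a - y`. A minimal sketch that repeats the cell above and compares both results (the `manual_*` names are made up for this example):

```python
import torch
import torch.nn.functional as F

y = torch.tensor([1.0])
x1 = torch.tensor([1.1])
w1 = torch.tensor([2.2], requires_grad=True)
b = torch.tensor([0.0], requires_grad=True)

a = torch.sigmoid(x1 * w1 + b)
loss = F.binary_cross_entropy(a, y)
loss.backward()

# Closed form for sigmoid + binary cross-entropy: dL/dz = a - y
with torch.no_grad():
    dz = a - y
    manual_w1_grad = dz * x1  # dz/dw1 = x1
    manual_b_grad = dz        # dz/db = 1

print(torch.allclose(w1.grad, manual_w1_grad, atol=1e-5))
print(torch.allclose(b.grad, manual_b_grad, atol=1e-5))
```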


## Implementing multi-layer NN

In [14]:
class NeuralNetwork(torch.nn.Module):
    def __init__(self, n_inputs, n_outputs):
        """
        Parameters
        ----------
        n_inputs : int
            The number of nodes in the input layer

        n_outputs : int
            The number of nodes in the output layer
        """
        super().__init__()
        self.layers = torch.nn.Sequential(
            # 1st hidden layer
            torch.nn.Linear(n_inputs, 30),
            torch.nn.ReLU(),
            
            # 2nd hidden layer
            torch.nn.Linear(30, 20),
            torch.nn.ReLU(),
            
            # output layer
            torch.nn.Linear(20, n_outputs),
        )


    def forward(self, x):
        # Overrides torch.nn.Module.forward to define the forward pass.
        # It runs whenever the model is called, e.g. model(x).
        logits = self.layers(x)
        return logits

In [15]:
torch.manual_seed(123)
N_INPUTS = 50
N_OUTPUTS = 3
model = NeuralNetwork(N_INPUTS, N_OUTPUTS)

In [16]:
model

NeuralNetwork(
  (layers): Sequential(
    (0): Linear(in_features=50, out_features=30, bias=True)
    (1): ReLU()
    (2): Linear(in_features=30, out_features=20, bias=True)
    (3): ReLU()
    (4): Linear(in_features=20, out_features=3, bias=True)
  )
)

In [17]:
for p in model.parameters():
    print('=' * 50)
    print(p)
    print('shape: ', p.shape)
    if p.requires_grad:
        print(p.numel())

Parameter containing:
tensor([[-0.0577,  0.0047, -0.0702,  ...,  0.0222,  0.1260,  0.0865],
        [ 0.0502,  0.0307,  0.0333,  ...,  0.0951,  0.1134, -0.0297],
        [ 0.1077, -0.1108,  0.0122,  ...,  0.0108, -0.1049, -0.1063],
        ...,
        [-0.0787,  0.1259,  0.0803,  ...,  0.1218,  0.1303, -0.1351],
        [ 0.1359,  0.0175, -0.0673,  ...,  0.0674,  0.0676,  0.1058],
        [ 0.0790,  0.1343, -0.0293,  ...,  0.0344, -0.0971, -0.0509]],
       requires_grad=True)
shape:  torch.Size([30, 50])
1500
Parameter containing:
tensor([-0.1250,  0.0513,  0.0366,  0.0075,  0.0509,  0.0545, -0.0393,  0.0924,
        -0.1412, -0.1232, -0.1063,  0.0081, -0.1249,  0.0101, -0.0019, -0.1298,
         0.1388, -0.0330,  0.1017,  0.1247, -0.0554, -0.0417,  0.1388,  0.0159,
         0.1215,  0.0385,  0.0769, -0.1224, -0.0279,  0.0991],
       requires_grad=True)
shape:  torch.Size([30])
30
Parameter containing:
tensor([[-1.0154e-02, -1.5861e-01,  1.9066e-02,  1.6987e-02, -1.7074e-02,
       

In [18]:
model.layers[0].weight

Parameter containing:
tensor([[-0.0577,  0.0047, -0.0702,  ...,  0.0222,  0.1260,  0.0865],
        [ 0.0502,  0.0307,  0.0333,  ...,  0.0951,  0.1134, -0.0297],
        [ 0.1077, -0.1108,  0.0122,  ...,  0.0108, -0.1049, -0.1063],
        ...,
        [-0.0787,  0.1259,  0.0803,  ...,  0.1218,  0.1303, -0.1351],
        [ 0.1359,  0.0175, -0.0673,  ...,  0.0674,  0.0676,  0.1058],
        [ 0.0790,  0.1343, -0.0293,  ...,  0.0344, -0.0971, -0.0509]],
       requires_grad=True)
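
The per-parameter `numel()` values printed by the loop over `model.parameters()` can be summed to get the total number of trainable parameters. The sketch below rebuilds the same 50-30-20-3 stack as a plain `Sequential` (named `net` here, only to keep the example self-contained):

```python
import torch

net = torch.nn.Sequential(
    torch.nn.Linear(50, 30), torch.nn.ReLU(),
    torch.nn.Linear(30, 20), torch.nn.ReLU(),
    torch.nn.Linear(20, 3),
)

# (50*30 + 30) + (30*20 + 20) + (20*3 + 3)
n_trainable = sum(p.numel() for p in net.parameters() if p.requires_grad)
print(n_trainable)  # 2213
```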

**Note:** <br>
$Y = W^{T}X + B$, where: <br>
* X (Input): a data point in $R^{m}$ space; `x1, x2, ..., xm`
* Y (Output): a data point in $R^{n}$ space; `y1, y2, ..., yn`
* W (Weight): the $m$-by-$n$ weight matrix, so column 1 of $W$ holds the weights for `y1`, and so on.
* B (Bias): the bias vector in $R^{n}$ space.

In [19]:
# let's check the layer0's weight again

model.layers[0].weight.shape

torch.Size([30, 50])

It is important to note that the weight tensor above is stored already transposed. In other words, `(30, 50)` is the shape of $W^{T}$, not of $W$.

**NOTE:** <br>
`layers[0]` does not represent the nodes! It represents the edges that connect two sets of nodes. Recall that the input has 50 elements, from which 30 elements are computed. So layer 0's weight, with shape `(30, 50)`, refers to the edges that connect the 50-node layer to the next 30-node layer.
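
To make the transposed storage concrete: for a batch-first input `x` of shape `(batch, in_features)`, `torch.nn.Linear` computes `x @ weight.T + bias`, with `weight` of shape `(out_features, in_features)`. A small sketch on a standalone layer (the names `layer` and `manual` are made up for this example):

```python
import torch

torch.manual_seed(123)
layer = torch.nn.Linear(50, 30)  # 50 input nodes -> 30 output nodes
x = torch.rand(1, 50)            # one sample with 50 features

print(layer.weight.shape)        # torch.Size([30, 50])

# Reproduce the layer by hand using the stored (already transposed) weight
manual = x @ layer.weight.T + layer.bias
print(torch.allclose(layer(x), manual, atol=1e-6))
```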


In [21]:
torch.manual_seed(123)

sample = torch.rand((1, 50))
out = model(sample)  # calls model.forward(sample) under the hood (checked in the next cell)
print(out) 

tensor([[-0.1262,  0.1080, -0.1792]], grad_fn=<AddmmBackward0>)


In [22]:
model.forward(sample)

tensor([[-0.1262,  0.1080, -0.1792]], grad_fn=<AddmmBackward0>)
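
Both calls return the same values, but they are not fully interchangeable: calling the module goes through `__call__`, which also runs any registered hooks, while calling `.forward` directly skips them. That is one reason `model(sample)` is usually preferred. A minimal sketch on a single `Linear` layer, with a hypothetical hook that only records that it fired:

```python
import torch

layer = torch.nn.Linear(4, 2)
x = torch.rand(1, 4)

calls = []
# The hook returns None, so the layer's output is left unchanged
handle = layer.register_forward_hook(lambda module, inputs, output: calls.append("hook"))

out_call = layer(x)             # goes through __call__, so the hook fires
out_forward = layer.forward(x)  # bypasses __call__, so the hook does not fire

print(torch.equal(out_call, out_forward))  # True
print(len(calls))                          # 1

handle.remove()  # detach the hook again
```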

Once training is done and we only need inference (prediction), there is no need for backpropagation. We just let the input flow through all the layers so that the output can be computed. So, if the goal is only inference, we can avoid keeping track of gradients, which saves memory and computation.

In [24]:
with torch.no_grad():
    out = model(sample)
print(out)

tensor([[-0.1262,  0.1080, -0.1792]])
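
Note that the output above no longer carries a `grad_fn`. The sketch below makes that difference explicit on a standalone layer (`layer`, `tracked`, and `untracked` are names made up for this example):

```python
import torch

layer = torch.nn.Linear(50, 3)
sample = torch.rand(1, 50)

tracked = layer(sample)        # autograd records the operation
with torch.no_grad():
    untracked = layer(sample)  # nothing is recorded inside this block

print(tracked.requires_grad, tracked.grad_fn is not None)  # True True
print(untracked.requires_grad, untracked.grad_fn is None)  # False True
```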


**NOTE:**
To get the probability of each class, the output of the model (the logits) needs to be passed through a softmax function.

In [25]:
with torch.no_grad():
    out = model(sample)
    softmax_output = torch.softmax(out, dim=1)

print('softmax_output:\n', softmax_output)

softmax_output:
 tensor([[0.3113, 0.3934, 0.2952]])
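
As a sanity check, the softmax probabilities sum to 1 along `dim=1`, and `argmax` picks the most likely class. The sketch below starts from the logits printed earlier, hard-coded so it runs on its own:

```python
import torch

logits = torch.tensor([[-0.1262, 0.1080, -0.1792]])  # values copied from the output above
probs = torch.softmax(logits, dim=1)                 # normalize across the class dimension

print(probs.sum(dim=1))           # each row sums to 1
predicted = probs.argmax(dim=1)   # index of the most likely class
print(predicted)                  # tensor([1])
```

Since softmax preserves ordering, `logits.argmax(dim=1)` gives the same predicted class without computing probabilities.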


## Setting up Data loader

**Simple Example**

In [77]:
# number of features: two

# train, with five records
X_train = torch.tensor([
    [-1.2, 3.1],
    [-0.9, 2.9],
    [-0.5, 2.6],
    [2.3, -1.1],
    [2.7, -1.5]
])
y_train = torch.tensor([0, 0, 0, 1, 1])

# test, with two records
X_test = torch.tensor([
    [-0.8, 2.8],
    [2.6, -1.6],
])
y_test = torch.tensor([0, 1])

What are we trying to do here? We want to create a class that helps us get samples from our train and test sets.

In [50]:
from torch.utils.data import Dataset, DataLoader

In [78]:
class ToyDataset(Dataset):
    def __init__(self, X, y):
        self.features = X
        self.labels = y

    def __getitem__(self, index):
        """
        index is the index of an existing sample in X
        """
        return self.features[index], self.labels[index]
    
    def __len__(self):
        return self.labels.shape[0]

**Why do we need this class?** <br>

Because we will use `DataLoader` to create an iterator that walks through our dataset and returns one batch of data at a time. The dataset object passed to `DataLoader` needs to support two methods: `__getitem__` and `__len__`.

In [79]:
train_ds = ToyDataset(X_train, y_train)
test_ds = ToyDataset(X_test, y_test)

In [80]:
# example
train_ds

<__main__.ToyDataset at 0x1431de120>

In [84]:
train_ds[2]  # indexing calls train_ds.__getitem__(2)

(tensor([-0.5000,  2.6000]), tensor(0))
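
For the simple case where features and labels are already tensors, PyTorch also ships `torch.utils.data.TensorDataset`, which implements the same `__getitem__` / `__len__` protocol. This is an alternative to the custom class, not what the notebook uses; a minimal sketch with a three-record subset of the training data:

```python
import torch
from torch.utils.data import TensorDataset

X = torch.tensor([[-1.2, 3.1], [-0.9, 2.9], [-0.5, 2.6]])
y = torch.tensor([0, 0, 0])

ds = TensorDataset(X, y)  # pairs X[i] with y[i]
print(len(ds))            # 3

features, label = ds[2]   # same (features, label) tuple shape as ToyDataset
print(features, label)
```

A custom `Dataset` is still the better fit when samples must be loaded or transformed lazily.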

## Using the `DataLoader` class to create iterators for train and test

In [85]:
torch.manual_seed(123)

train_loader = DataLoader(
    dataset=train_ds,  
    batch_size=2,
    shuffle=True,
    num_workers=0
)

test_loader = DataLoader(
    dataset=test_ds,
    batch_size=2,
    shuffle=False,
    num_workers=0
)

In [87]:
for i, item in enumerate(train_loader):
    print(f'i={i}, and train data: \n{item}')
    print('-' * 50)

i=0, and train data: 
[tensor([[-1.2000,  3.1000],
        [-0.5000,  2.6000]]), tensor([0, 0])]
--------------------------------------------------
i=1, and train data: 
[tensor([[ 2.3000, -1.1000],
        [-0.9000,  2.9000]]), tensor([1, 0])]
--------------------------------------------------
i=2, and train data: 
[tensor([[ 2.7000, -1.5000]]), tensor([1])]
--------------------------------------------------
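
Because five records do not divide evenly into batches of two, the last batch above holds a single sample. If that is undesirable, `DataLoader` accepts `drop_last=True` to skip the incomplete batch. A minimal sketch on stand-in data (the tensors below are placeholders, not the notebook's training set):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

X = torch.arange(10, dtype=torch.float32).reshape(5, 2)  # five records, two features
y = torch.tensor([0, 0, 0, 1, 1])
ds = TensorDataset(X, y)

keep_last = DataLoader(ds, batch_size=2, shuffle=False)
drop_last = DataLoader(ds, batch_size=2, shuffle=False, drop_last=True)

keep_sizes = [len(labels) for _, labels in keep_last]
drop_sizes = [len(labels) for _, labels in drop_last]
print(keep_sizes)  # [2, 2, 1]
print(drop_sizes)  # [2, 2]
```

With `shuffle=True`, a different record is left out in each epoch, so no sample is dropped permanently.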
