# Learning PyTorch
PyTorch is an open-source machine learning framework developed by Facebook's AI Research lab (FAIR). It's primarily used for deep learning applications, such as computer vision and natural language processing.

Key characteristics of PyTorch include:

**Pythonic**: It's designed to be deeply integrated with the Python language, making it feel more natural and easier to debug using standard Python tools.

**Tensors**: Its fundamental data structure is a Tensor, which is a multi-dimensional array similar to NumPy arrays, but with the added capability to utilize GPUs for accelerated computation.

**Imperative and Dynamic**: PyTorch is imperative: it executes code immediately, step by step, like a standard Python program. While the code runs, PyTorch builds the computational graph (the map of math operations) on the fly, an approach called "define-by-run". Because the graph is rebuilt on every forward pass, it can change from one pass to the next, for example when Python control flow takes a different branch.
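Here is a minimal sketch of define-by-run. It assumes a made-up function `f`, which is hypothetical and only for illustration. An ordinary Python `if` decides which operations run, so the graph Autograd records, and therefore the gradient, depends on the input value:

```python
import torch

def f(x):
    # Plain Python control flow chooses the operations at run time,
    # so the recorded graph can differ from one call to the next.
    if x.item() > 0:
        return x ** 2      # this branch records a PowBackward0 node
    return -3 * x          # this branch records a MulBackward0 node

grads = {}
for value in (2.0, -1.0):
    x = torch.tensor(value, requires_grad=True)
    y = f(x)
    print(f"x={value}: grad_fn={type(y.grad_fn).__name__}")
    y.backward()
    grads[value] = x.grad.item()

print(grads)  # {2.0: 4.0, -1.0: -3.0}
```

Each call builds a fresh graph, so the derivative is `2x` for positive inputs and `-3` otherwise.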

The other popular framework is **TensorFlow**, originally developed by Google. It is strongly focused on production and scalability. It traditionally had a steeper learning curve, but its high-level **Keras API** now makes model building much simpler.

## 5 Steps to Deep Learning with PyTorch
- Prediction
- Loss Calculation
- Gradient Calculation
- Param Update
- Gradient Reset
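The five steps above map onto a plain gradient-descent loop. This is a minimal sketch under stated assumptions. The data is synthetic, generated from y = 2x + 3 without noise. The learning rate of 0.1 and the 500 iterations are hand-picked for illustration:

```python
import torch

torch.manual_seed(0)
X = torch.randn(100, 1)
y = 2 * X + 3                                # synthetic targets (assumed relationship)

m = torch.zeros(1, 1, requires_grad=True)    # slope
c = torch.zeros(1, requires_grad=True)       # intercept
lr = 0.1                                     # learning rate (assumed)

for step in range(500):
    y_pred = X @ m + c                       # 1. Prediction
    loss = torch.mean((y_pred - y) ** 2)     # 2. Loss calculation
    loss.backward()                          # 3. Gradient calculation
    with torch.no_grad():                    # 4. Parameter update, not recorded by autograd
        m -= lr * m.grad
        c -= lr * c.grad
    m.grad.zero_()                           # 5. Gradient reset
    c.grad.zero_()

print(m.item(), c.item())  # close to 2.0 and 3.0
```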


# Creating Tensors
Tensors are the building blocks of PyTorch.

```python
import torch
import numpy as np

# Create a tensor directly from a Python list
data = [[1, 2], [3, 4]]
x_data = torch.tensor(data)
print(x_data)
print("\n")

# Create a tensor from a NumPy array
np_array = np.array(data)
x_np = torch.from_numpy(np_array)
print(x_np)
print("\n")

# Create a tensor from another tensor
x_ones = torch.ones_like(x_data)  # keeps the shape and dtype of x_data
print(x_ones)
print("\n")

torch.manual_seed(123)  # set the seed so random tensors are reproducible

# Random tensor with the shape of x_data; the dtype is overridden to float
x_rand = torch.rand_like(x_data, dtype=torch.float)
print(x_rand)
print("\n")

# Create tensors from an explicit shape
shape = (2, 3)
rand_tensor = torch.rand(shape)
print(rand_tensor)
print("\n")
ones_tensor = torch.ones(shape)
print(ones_tensor)
print("\n")
zeros_tensor = torch.zeros(shape)
print(zeros_tensor)
```

```
tensor([[1, 2],
        [3, 4]])


tensor([[1, 2],
        [3, 4]])


tensor([[1, 1],
        [1, 1]])


tensor([[0.2961, 0.5166],
        [0.2517, 0.6886]])


tensor([[0.0740, 0.8665, 0.1366],
        [0.1025, 0.1841, 0.7264]])


tensor([[1., 1., 1.],
        [1., 1., 1.]])


tensor([[0., 0., 0.],
        [0., 0., 0.]])
```
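The first two constructors differ in one useful way. `torch.from_numpy` shares memory with the NumPy array, while `torch.tensor` copies the data. A small sketch follows; the names `shared` and `copied` are only illustrative:

```python
import numpy as np
import torch

np_array = np.array([[1, 2], [3, 4]])
shared = torch.from_numpy(np_array)   # shares memory with np_array
copied = torch.tensor(np_array)       # makes an independent copy

np_array[0, 0] = 100                  # modify the NumPy array in place
print(shared[0, 0].item())  # 100: the change is visible through the shared tensor
print(copied[0, 0].item())  # 1: the copy is unaffected
```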


# Critical Attributes for Debugging

```python
import torch

tensor = torch.rand(3, 4)  # torch.rand returns float32 by default

print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")  # where the tensor lives: cpu, or cuda for an NVIDIA GPU
```

```
Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu
```
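The `device` attribute matters once a GPU is involved. A common pattern is to pick a device once and move tensors to it with `.to()`. This sketch assumes the code should fall back to the CPU when no CUDA GPU is available. The same `.to()` method can also convert the dtype:

```python
import torch

# Pick the GPU if PyTorch can see one, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

tensor = torch.rand(3, 4)
tensor = tensor.to(device)          # .to() returns the tensor on the target device
print(f"Device tensor is stored on: {tensor.device}")

half = tensor.to(torch.float16)     # .to() can also change the dtype
print(f"Datatype after conversion: {half.dtype}")
```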


# Autograd
Autograd (short for Automatic Differentiation) is the engine that modern deep learning frameworks like PyTorch and TensorFlow use to automatically calculate the gradients needed to train a neural network.

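Setting `requires_grad=True` tells Autograd to record every operation on the tensor. This builds the computational graph it needs to compute gradients with respect to that tensor.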

```python
import torch

# A standard data tensor
x_data = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

# A parameter tensor that requires gradients
x_param = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)  # Autograd records operations on this tensor

print("Data Tensor:\n", x_data)
print("Requires Grad:", x_data.requires_grad)
print("\n")
print("Parameter Tensor:\n", x_param)
print("Requires Grad:", x_param.requires_grad)
```

```
Data Tensor:
 tensor([[1., 2.],
        [3., 4.]])
Requires Grad: False


Parameter Tensor:
 tensor([[1., 2.],
        [3., 4.]], requires_grad=True)
Requires Grad: True
```
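Tracking spreads to results. Any tensor computed from `x_param` also requires gradients, unless it is computed inside `torch.no_grad()` or cut from the graph with `.detach()`. A short sketch:

```python
import torch

x_param = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)

y = x_param * 2                 # result of an operation on a tracked tensor
print(y.requires_grad)          # True: autograd records this operation

with torch.no_grad():           # temporarily stop building the graph
    z = x_param * 2
print(z.requires_grad)          # False

w = x_param.detach()            # shares the data but is cut off from the graph
print(w.requires_grad)          # False
```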


```python
import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)

c = a + b

d = a * b + b**2 + c * 3

print(f"grad_fn for d: {d.grad_fn}")  # the backward function of the last operation that produced d (an addition)
print(f"grad_fn for c: {c.grad_fn}")  # c was produced by an addition
print(f"grad_fn for b: {b.grad_fn}")  # None: b is a leaf tensor created by the user
print(f"grad_fn for a: {a.grad_fn}")  # None: a is a leaf tensor created by the user
```

```
grad_fn for d: <AddBackward0 object at 0x77076bf61d20>
grad_fn for c: <AddBackward0 object at 0x77076b8bb2e0>
grad_fn for b: None
grad_fn for a: None
```
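Calling `backward()` on `d` follows these `grad_fn` links back to the leaf tensors `a` and `b`. Work it by hand first. With $c = a + b$ and $d = ab + b^2 + 3c$, at $a = 2, b = 3$:

- $\frac{\partial d}{\partial a} = b + 3 = 6$
- $\frac{\partial d}{\partial b} = a + 2b + 3 = 11$

A short check is below. By default only leaf tensors receive `.grad`, so the intermediate `c` is left out:

```python
import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)
c = a + b
d = a * b + b**2 + c * 3

d.backward()      # walks the grad_fn chain from d back to the leaves a and b
print(a.grad)     # tensor(6.): dd/da = b + 3
print(b.grad)     # tensor(11.): dd/db = a + 2b + 3
```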


# PyTorch Operations

## Matrix Multiplication

- `*` performs element-wise multiplication.
- `@` performs matrix multiplication.

```python
import torch

a = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]])

b = torch.tensor([[5.0, 6.0],
                  [7.0, 8.0]])

elementwise_multiplication = a * b
matrix_multiplication = a @ b

# Note: element-wise multiplication needs both tensors to have the same shape
# (or shapes that broadcast to a common shape)
print("Element-wise Multiplication:\n", elementwise_multiplication)
print("\n")
# Note: matrix multiplication needs the inner dimensions to match,
# i.e. the number of columns of the first matrix must equal the number of rows of the second.
print("Matrix Multiplication:\n", matrix_multiplication)

# Note: in linear algebra (y = m·x + c), matrix multiplication is often written with a dot (m·x) or by juxtaposition (mx).
```

```
Element-wise Multiplication:
 tensor([[ 5., 12.],
        [21., 32.]])


Matrix Multiplication:
 tensor([[19., 22.],
        [43., 50.]])
```
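The inner-dimension rule is easiest to see with non-square shapes. A `(2, 3)` matrix times a `(3, 4)` matrix gives a `(2, 4)` result. Swapping in mismatched shapes raises an error. A sketch using all-ones matrices as illustrative inputs:

```python
import torch

A = torch.ones(2, 3)    # 2 rows, 3 columns
B = torch.ones(3, 4)    # 3 rows, 4 columns

C = A @ B               # inner dimensions (3 and 3) match
print(C.shape)          # torch.Size([2, 4]): the outer dimensions remain
print(C[0, 0])          # tensor(3.): each entry sums three products of 1 * 1

mismatch_raised = False
try:
    A @ A               # (2, 3) @ (2, 3): inner dimensions 3 and 2 do not match
except RuntimeError as err:
    mismatch_raised = True
    print("Shape error:", err)
```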


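## Reduction Operations and the dim Argument
Reduction operations such as `mean()`, `min()`, `max()`, and `sum()` collapse a tensor into fewer values. Without arguments they reduce over all elements. With `dim`, they reduce only along that dimension, and that dimension disappears from the result:

- `dim=0` collapses the rows, giving one result per column.
- `dim=1` collapses the columns, giving one result per row.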


```python
import torch

scores = torch.tensor([[10.0, 20.0, 30.0],
                       [5.0, 10.0, 5.0],
                       [40.0, 30.0, 20.0]])

average_score = scores.mean()  # mean of all elements
print("Average Score of all elements:", average_score)

average_score = scores.mean(dim=0)  # reduce over the rows: one mean per column
print("Average Score per Column:", average_score)

average_score = scores.mean(dim=1)  # reduce over the columns: one mean per row
print("Average Score per Row:", average_score)
```

```
Average Score of all elements: tensor(18.8889)
Average Score per Column: tensor([18.3333, 20.0000, 18.3333])
Average Score per Row: tensor([20.0000,  6.6667, 30.0000])
```
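The `keepdim=True` option keeps the reduced dimension with size 1 instead of dropping it. This makes it easy to broadcast the result back against the original tensor. A sketch follows; `row_share` is a hypothetical name for each score's fraction of its row total:

```python
import torch

scores = torch.tensor([[10.0, 20.0, 30.0],
                       [5.0, 10.0, 5.0],
                       [40.0, 30.0, 20.0]])

row_totals = scores.sum(dim=1)                      # dim 1 is removed: shape (3,)
print(row_totals)                                   # tensor([60., 20., 90.])

row_totals_kept = scores.sum(dim=1, keepdim=True)   # dim 1 kept with size 1: shape (3, 1)
print(row_totals_kept.shape)                        # torch.Size([3, 1])

# The (3, 1) totals broadcast across the 3 columns of each row
row_share = scores / row_totals_kept
print(row_share.sum(dim=1))                         # each row now sums to (about) 1
```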


## Indexing

```python
import torch

x = torch.arange(12).reshape(3, 4)
print(x)
```

```
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
```


```python
# Get the 3rd column (zero-based index 2) from every row
third_column = x[:, 2]
print("3rd Column:", third_column)

# Get the 2nd row (zero-based index 1) with every column
second_row = x[1, :]
print("2nd Row:", second_row)
```

```
3rd Column: tensor([ 2,  6, 10])
2nd Row: tensor([4, 5, 6, 7])
```
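Two other common indexing forms work on the same `x`. Slicing with `start:end` keeps a sub-block, and the end index is exclusive. A boolean mask selects the elements where a condition holds and returns them as a flattened 1-D tensor. A short sketch:

```python
import torch

x = torch.arange(12).reshape(3, 4)

block = x[0:2, 1:3]        # rows 0-1 and columns 1-2 (slice ends are exclusive)
print(block)               # values [[1, 2], [5, 6]]

mask = x > 5               # boolean tensor with the same shape as x
print(x[mask])             # 1-D tensor holding 6, 7, 8, 9, 10, 11
```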


```python
import torch

scores = torch.tensor([[10.0, 20.0, 30.0],
                       [5.0, 10.0, 5.0],
                       [40.0, 30.0, 20.0]])

best_indices = torch.argmax(scores, dim=1)  # index of the largest element in each row
print("Indices of Best Elements in Each Row:", best_indices)
```

```
Indices of Best Elements in Each Row: tensor([2, 1, 0])
```
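When you need the maximum values as well as their positions, `torch.max` with a `dim` argument returns both as `.values` and `.indices`. Changing `dim` switches the search from rows to columns. A short sketch on the same `scores`:

```python
import torch

scores = torch.tensor([[10.0, 20.0, 30.0],
                       [5.0, 10.0, 5.0],
                       [40.0, 30.0, 20.0]])

best = torch.max(scores, dim=1)      # max along each row
print(best.values)                   # tensor([30., 10., 40.])
print(best.indices)                  # tensor([2, 1, 0]): same as torch.argmax(scores, dim=1)

# dim=0 searches down each column instead of along each row
print(torch.argmax(scores, dim=0))   # tensor([2, 2, 0])
```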


```python
import torch

data = torch.tensor([[10.0, 20.0, 30.0, 13.0],
                     [5.0, 10.0, 5.0, 23.0],
                     [40.0, 30.0, 20.0, 33.0]])

# "Shopping list" of indices with shape (3, 1): 0, 2 and 1
indices_to_select = torch.tensor([[0], [2], [1]])

# Indexing with a tensor picks whole rows (1st, 3rd, 2nd); the result has shape (3, 1, 4)
selected_rows = data[indices_to_select]
print("Selected Rows:\n", selected_rows)

# gather with dim=1 picks one element per row: out[i][0] = data[i][indices_to_select[i][0]]
selected_elements = torch.gather(data, dim=1, index=indices_to_select)
print("Selected Elements using gather:\n", selected_elements)
```

```
Selected Rows:
 tensor([[[10., 20., 30., 13.]],

        [[40., 30., 20., 33.]],

        [[ 5., 10.,  5., 23.]]])
Selected Elements using gather:
 tensor([[10.],
        [ 5.],
        [30.]])
```
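For a 2-D tensor, `torch.gather(data, dim=1, index=idx)` computes `out[i][j] = data[i][idx[i][j]]`. The output has the shape of `idx`. This sketch spells the rule out with an explicit loop and checks it against `gather`. The loop is for understanding only, not a recommended implementation:

```python
import torch

data = torch.tensor([[10.0, 20.0, 30.0, 13.0],
                     [5.0, 10.0, 5.0, 23.0],
                     [40.0, 30.0, 20.0, 33.0]])
index = torch.tensor([[0], [2], [1]])

gathered = torch.gather(data, dim=1, index=index)

# Same result with an explicit loop: out[i][j] = data[i][index[i][j]]
manual = torch.empty(index.shape)
for i in range(index.shape[0]):
    for j in range(index.shape[1]):
        manual[i, j] = data[i, index[i, j]]

print(gathered.squeeze(1))            # tensor([10.,  5., 30.])
print(torch.equal(gathered, manual))  # True
```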


# Building a Model from Scratch

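$$\hat{y} = Xm + c$$

Here $X$ holds the inputs (one row per data point), $m$ is the slope (weight), and $c$ is the intercept (bias). In code this is `X @ m + c`.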

## Forward Pass
The forward pass runs the input through the model to produce a prediction. When you first create the model, its parameters are initialized randomly, so it knows nothing yet and its first guesses are essentially random.

```python
import torch

# Our batch of data has 10 data points
N = 10

# Each data point has one input feature (x) and one output value (y)
D_in = 1
D_out = 1

# Create the input data X
torch.manual_seed(24)  # for reproducibility
X = torch.randn(N, D_in)
print("Input Data X:\n", X)

# Assume the true relationship is y = 2x + 3
true_m = torch.tensor([[2.0]])  # slope in the linear equation y = mx + c
true_c = torch.tensor(3.0)      # intercept in the linear equation y = mx + c

# Create the output data, then add some Gaussian noise (0.1 * torch.randn(N, D_out))
y = X @ true_m + true_c
y_true = y + 0.1 * torch.randn(N, D_out)
print("Output Data y_true:\n", y_true)
```

```
Input Data X:
 tensor([[ 1.0139],
        [ 0.8988],
        [-0.2111],
        [-1.5326],
        [-0.6163],
        [ 0.2288],
        [-0.1120],
        [-2.0506],
        [-0.6189],
        [-0.7804]])
Output Data y_true:
 tensor([[ 4.9357],
        [ 4.8457],
        [ 2.5002],
        [ 0.0214],
        [ 1.8423],
        [ 3.3763],
        [ 2.8427],
        [-1.1818],
        [ 1.8532],
        [ 1.5877]])
```


```python
# To build a model from scratch, define the parameters m and c and initialize them randomly.
# requires_grad=True lets Autograd compute gradients for them later.
m = torch.randn(D_in, D_out, requires_grad=True)  # slope (weight)
c = torch.randn(1, requires_grad=True)            # intercept (bias)
print("Initial weight:", m)
print("Initial bias:", c)
```

```
Initial weight: tensor([[1.5769]], requires_grad=True)
Initial bias: tensor([0.1243], requires_grad=True)
```


```python
# Split the data into training (first 80%) and test (last 20%) sets
train_size = int(0.8 * N)
X_train = X[:train_size]
y_train = y_true[:train_size]
X_test = X[train_size:]
y_test = y_true[train_size:]
```


```python
# Build the model: a linear function of the input using the parameters m and c
def model(X):
    return X @ m + c
```

```python
# Predict the output for the training data
y_pred = model(X_train)
print("Predicted Output y_pred:\n", y_pred)
```

```
Predicted Output y_pred:
 tensor([[ 1.7231],
        [ 1.5416],
        [-0.2086],
        [-2.2925],
        [-0.8476],
        [ 0.4851],
        [-0.0523],
        [-3.1093]], grad_fn=<AddBackward0>)
```


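## Loss Calculation
The mean squared error (MSE) averages the squared differences between the predictions and the targets over the $N$ training points: $L = \frac{1}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)^2$.

```python
# Mean squared error between the predictions and the training targets
MSE_loss = torch.mean((y_pred - y_train) ** 2)
print("Mean Squared Error Loss:", MSE_loss)
```

```
Mean Squared Error Loss: tensor(7.7024, grad_fn=<MeanBackward0>)
```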
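PyTorch also ships this loss as `torch.nn.functional.mse_loss`, which averages by default (`reduction="mean"`). The sketch below uses random stand-in tensors instead of the notebook's variables, which is an assumption for illustration, to show that the two computations agree:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
y_pred = torch.randn(8, 1, requires_grad=True)   # stand-in predictions
y_train = torch.randn(8, 1)                      # stand-in targets

manual_loss = torch.mean((y_pred - y_train) ** 2)
builtin_loss = F.mse_loss(y_pred, y_train)       # mean reduction by default

print(torch.allclose(manual_loss, builtin_loss))  # True
```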


# 📘 Calculating the Gradients

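The **gradient** tells us how much the loss $L$ changes when we slightly change a parameter ($w$ or $b$; these are the slope $m$ and intercept $c$ from the model above).  
We find it using **partial differentiation**. To keep the algebra short, the derivation below uses the loss of a single data point, $L = \frac{1}{2}(y - \hat{y})^2$. The factor $\frac{1}{2}$ cancels the 2 produced by differentiating the square.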

---

## 🔹 Gradient with Respect to the Weight ($ w $)

We want to find $ \frac{\partial L}{\partial w} $.  
Using the **chain rule**:

$$ 
\frac{\partial L}{\partial w} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w}
$$ 

### Step 1: Find $ \frac{\partial L}{\partial \hat{y}} $

$$ 
\frac{\partial L}{\partial \hat{y}} = 
\frac{\partial}{\partial \hat{y}} 
\left[ \frac{1}{2} (y - \hat{y})^2 \right]
= 2 \cdot \frac{1}{2} (y - \hat{y}) \cdot (-1)
= - (y - \hat{y})
= \hat{y} - y
$$ 

This represents the **prediction error**.

### Step 2: Find $ \frac{\partial \hat{y}}{\partial w} $

$$ 
\frac{\partial \hat{y}}{\partial w} = 
\frac{\partial}{\partial w} [w \cdot x + b] = x
$$ 

### Step 3: Combine (Chain Rule)

$$ 
\frac{\partial L}{\partial w} = (\hat{y} - y) \cdot x
$$ 

---

## 🔹 Gradient with Respect to the Bias ($ b $)

We want to find $ \frac{\partial L}{\partial b} $.  
Again, we use the **chain rule**:

$$ 
\frac{\partial L}{\partial b} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial b}
$$ 

### Step 1: Find $ \frac{\partial L}{\partial \hat{y}} $

(Same as before)

$$ 
\frac{\partial L}{\partial \hat{y}} = \hat{y} - y
$$ 

### Step 2: Find $ \frac{\partial \hat{y}}{\partial b} $

$$ 
\frac{\partial \hat{y}}{\partial b} = 
\frac{\partial}{\partial b} [w \cdot x + b] = 1
$$ 

### Step 3: Combine (Chain Rule)

$$ 
\frac{\partial L}{\partial b} = (\hat{y} - y) \cdot 1 = \hat{y} - y
$$ 

---

## 📝 Summary of Gradients

The calculated gradients are:

- **Gradient of Loss w.r.t. Weight:**

  $$ 
  \frac{\partial L}{\partial w} = (\hat{y} - y) \cdot x
  $$ 

- **Gradient of Loss w.r.t. Bias:**

  $$ 
  \frac{\partial L}{\partial b} = \hat{y} - y
  $$ 
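The single-point formulas above carry over to the batch loss used in the code, $L = \frac{1}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)^2$. That loss has no $\frac{1}{2}$ factor and averages over $N$ points, so each gradient picks up a factor $\frac{2}{N}$ and a sum:

$$
\frac{\partial L}{\partial m} = \frac{2}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)\,x_i,
\qquad
\frac{\partial L}{\partial c} = \frac{2}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)
$$

Here is a small sketch that checks these formulas against Autograd. It assumes a hand-made four-point batch with illustrative values, not the notebook's data:

```python
import torch

# Illustrative batch following y = 2x + 3 (not the notebook's data)
X = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
y = torch.tensor([[5.0], [7.0], [9.0], [11.0]])
m = torch.tensor([[0.5]], requires_grad=True)
c = torch.tensor([0.0], requires_grad=True)
N = X.shape[0]

y_hat = X @ m + c
loss = torch.mean((y_hat - y) ** 2)
loss.backward()                      # Autograd fills m.grad and c.grad

# Same gradients from the batch formulas, computed without Autograd
with torch.no_grad():
    error = y_hat - y
    grad_m = 2 / N * (error * X).sum()
    grad_c = 2 / N * error.sum()

print(m.grad.item(), grad_m.item())  # both -37.5
print(c.grad.item(), grad_c.item())  # both -13.5
```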


```python
# Backward pass: compute the gradients.
# This single call computes the gradient of MSE_loss with respect to every leaf tensor
# that has requires_grad=True and was used to compute the loss (here m and c).
# It stores the results in the .grad attribute of those tensors.
MSE_loss.backward()

# m.grad and c.grad now hold the gradients; moving each parameter
# in the opposite direction of its gradient reduces the loss
print("Gradient of m:", m.grad)
print("Gradient of c:", c.grad)
```

```
Gradient of m: tensor([[0.7908]])
Gradient of c: tensor([-5.4857])
```
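Gradients live in `.grad` between iterations, which is why the last of the five steps, gradient reset, is needed. Calling `backward()` again adds the new gradients to whatever `.grad` already holds. A minimal sketch with a hypothetical one-parameter loss $w^2$ at $w = 3$:

```python
import torch

w = torch.tensor(3.0, requires_grad=True)
history = []

for call in range(2):
    loss = w ** 2           # a fresh graph is built on every pass
    loss.backward()         # d(loss)/dw = 2w = 6
    history.append(w.grad.item())
    print(f"after backward #{call + 1}: w.grad = {w.grad.item()}")
# The second call prints 12.0, not 6.0: backward() adds to .grad instead of overwriting it

w.grad.zero_()              # gradient reset: clear .grad before the next iteration
print("after zero_():", w.grad.item())
```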
