# Introduction to PyTorch

## What is PyTorch?

PyTorch is an open-source machine learning library originally developed by Facebook's AI Research lab (FAIR). It is widely used for deep learning and is known for its flexibility and its dynamic computation graph, which is built on the fly as operations execute rather than compiled ahead of time. PyTorch provides a concise interface for building and training neural networks, making it a popular choice for both researchers and practitioners.

## Why PyTorch?

1. **Intuitive and Pythonic API:**
   The PyTorch API is designed to be intuitive and Pythonic, making it easier for users to understand and work with. This reduces the learning curve for those new to deep learning.

2. **Research-Focused:**
   PyTorch is widely embraced in the research community due to its flexibility and ease of experimentation. Many cutting-edge research papers and models are released with PyTorch implementations.

3. **Growing Ecosystem:**
   The PyTorch ecosystem is continually expanding, with tools and extensions for various applications, including computer vision, natural language processing, and reinforcement learning.

4. **Strong Community Support:**
   PyTorch has a large and active community, contributing to its rich ecosystem of libraries, tutorials, and resources. This community support is valuable for both beginners and experienced practitioners.

In [None]:
# imports
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

import numpy as np
import matplotlib.pyplot as plt
from utils import get_accuracy
from mnist import Train, Val
from tqdm import tqdm, trange

# PyTorch Tensors

## What are Tensors?

In PyTorch, a tensor is a multi-dimensional array, similar to a NumPy array, that can also be placed on a GPU and tracked for automatic differentiation. Tensors are the fundamental building blocks for constructing neural networks and performing operations in deep learning.

### Defining Tensors

#### From Python List

You can create a PyTorch tensor from a Python list using the `torch.tensor()` function:

In [None]:
python_list = [1, 2, 3, 4, 5]
tensor_from_list = torch.tensor(python_list)
print(tensor_from_list)
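
As a small illustration of what `torch.tensor()` infers from a list, the sketch below checks the resulting dtype and shape. It assumes the default floating-point dtype has not been changed from `float32`; the variable names are only for illustration.

```python
import torch

# Integer elements produce an int64 tensor.
int_tensor = torch.tensor([1, 2, 3, 4, 5])

# Float elements produce a float32 tensor (the default float dtype).
float_tensor = torch.tensor([1.0, 2.0, 3.0])

# The dtype can also be requested explicitly.
explicit_tensor = torch.tensor([1, 2, 3], dtype=torch.float32)

print(int_tensor.dtype, tuple(int_tensor.shape))
print(float_tensor.dtype)
print(explicit_tensor.dtype)
```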

#### From NumPy Arrays

A NumPy array can be passed directly to `torch.tensor()`, which copies the data and keeps the array's dtype (`float64` for the array below):

In [None]:
numpy_array = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
tensor_from_numpy = torch.tensor(numpy_array)
print(tensor_from_numpy)
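
One detail worth knowing when converting from NumPy: `torch.tensor()` makes an independent copy, while `torch.from_numpy()` returns a tensor that shares memory with the array. The following sketch shows the difference by modifying the array after conversion; the variable names are illustrative.

```python
import numpy as np
import torch

numpy_array = np.array([1.0, 2.0, 3.0])

copied = torch.tensor(numpy_array)      # independent copy of the data
shared = torch.from_numpy(numpy_array)  # shares memory with numpy_array

# Change the NumPy array in place after both conversions.
numpy_array[0] = 100.0

print(copied)        # still starts with 1.0
print(shared)        # now starts with 100.0
print(copied.dtype)  # NumPy's float64 is preserved
```

Sharing memory avoids a copy for large arrays, but it also means in-place edits on either side are visible on the other.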

#### Using `torch.ones` and `torch.zeros`

Create tensors filled with ones or zeros using `torch.ones` and `torch.zeros`:

In [None]:
ones_tensor = torch.ones((3, 3))
zeros_tensor = torch.zeros((2, 4))
print(ones_tensor)
print(zeros_tensor)
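
Both functions also accept a `dtype` argument, and the `*_like` variants copy the shape and dtype of an existing tensor. A brief sketch, with illustrative variable names:

```python
import torch

ones_tensor = torch.ones((3, 3))                    # float32 by default
int_zeros = torch.zeros((2, 4), dtype=torch.int64)  # explicit integer dtype
zeros_like_ones = torch.zeros_like(ones_tensor)     # same shape and dtype as ones_tensor

print(ones_tensor.dtype)
print(int_zeros.dtype)
print(tuple(zeros_like_ones.shape), zeros_like_ones.dtype)
```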

### Various Types of Random Initializations

PyTorch provides functions for random initialization, and two commonly used ones are `torch.rand` and `torch.randn`.

In [None]:
random_uniform_tensor = torch.rand((3, 3))
random_normal_tensor = torch.randn((3, 3))
print(random_uniform_tensor)
print(random_normal_tensor)

#### Uniform Distribution - `torch.rand`

`torch.rand` creates a tensor with values uniformly sampled from the interval [0, 1). Let's visualize this distribution:

In [None]:
# Generate random values from a uniform distribution
random_uniform_tensor = torch.rand((10_000,))

# Plotting the histogram
plt.hist(random_uniform_tensor.numpy(), bins=50, color='tab:blue', edgecolor='black')
plt.title('Uniform Distribution (torch.rand)')
plt.xlabel('Value')
plt.ylabel('Frequency')
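
The histogram can be complemented with a quick numerical check. The sketch below seeds the generator with `torch.manual_seed` so the same values are drawn twice, then confirms that the samples fall in [0, 1) with a mean near 0.5. The seed value is arbitrary.

```python
import torch

# Re-seeding before each call reproduces the same random values.
torch.manual_seed(0)
first = torch.rand((10_000,))
torch.manual_seed(0)
second = torch.rand((10_000,))

print(torch.equal(first, second))                  # identical draws
print(first.min().item(), first.max().item())      # within [0, 1)
print(first.mean().item())                         # close to 0.5
```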

#### Standard Normal Distribution - `torch.randn`

`torch.randn` creates a tensor with values sampled from a standard normal distribution (mean=0, std=1). Let's visualize this distribution:

In [None]:
# Generate random values from a standard normal distribution
random_normal_tensor = torch.randn((10_000,))

# Plotting the histogram
plt.hist(random_normal_tensor.numpy(), bins=50, color='tab:blue', edgecolor='black')
plt.title('Standard Normal Distribution (torch.randn)')
plt.xlabel('Value')
plt.ylabel('Frequency')
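
Samples from a normal distribution with a different mean and standard deviation can be obtained by scaling and shifting the output of `torch.randn`. The sketch below uses illustrative values (`mean = 5.0`, `std = 2.0`) and checks the empirical statistics.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for repeatable output

mean, std = 5.0, 2.0  # illustrative target parameters
standard = torch.randn((100_000,))
shifted = mean + std * standard  # now approximately N(mean, std^2)

print(standard.mean().item(), standard.std().item())  # near 0 and 1
print(shifted.mean().item(), shifted.std().item())    # near 5 and 2
```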

### Exercise:

1. Experiment with different sizes for the random tensors and observe how the number of samples affects the histograms.
2. Modify the code to create tensors using other random initialization functions, such as `torch.randint` or `torch.randperm`, and observe their distributions.

Feel free to explore and visualize other random initialization functions available in PyTorch.

# PyTorch nn.Module and LeNet-300-100 Model

## PyTorch `nn.Module`

In PyTorch, `nn.Module` is the base class for all neural network modules. It provides a convenient way to define, organize, and manage model parameters. A custom network is typically built by subclassing `nn.Module`, creating its layers in `__init__`, and defining the computation in `forward`.

### Defining a Model with `nn.Module`

Let's use `nn.Module` to define LeNet-300-100, a fully connected network with two hidden layers of 300 and 100 units.

#### Example: LeNet-300-100 Model

In [None]:
class LeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 300)
        self.fc2 = nn.Linear(300, 100)
        self.fc3 = nn.Linear(100, 10)

    def forward(self, x):
        # Flatten the input image
        x = x.view(-1, 28 * 28)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Instantiate the LeNet-300-100 model
model = LeNet()
model

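In this example, `LeNet` is a subclass of `nn.Module` with three linear layers. The `__init__` method creates the layers, and the `forward` method defines the forward pass: it flattens each 28×28 image into a 784-element vector, applies two ReLU-activated hidden layers, and returns 10 raw class scores (logits).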
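
To see how `nn.Module` keeps track of parameters, the sketch below repeats the model definition so that it runs on its own, lists the registered parameters, counts them, and pushes a dummy batch through the network. The dummy batch shape `(4, 1, 28, 28)` is an assumption about how MNIST images are laid out.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 300)
        self.fc2 = nn.Linear(300, 100)
        self.fc3 = nn.Linear(100, 10)

    def forward(self, x):
        x = x.view(-1, 28 * 28)  # flatten each image
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)


model = LeNet()

# Layers assigned as attributes are registered automatically.
for name, param in model.named_parameters():
    print(name, tuple(param.shape))

# (784*300 + 300) + (300*100 + 100) + (100*10 + 10) = 266,610
total_params = sum(p.numel() for p in model.parameters())
print(total_params)

# A dummy batch of 4 single-channel 28x28 images yields 4 rows of 10 logits.
dummy_batch = torch.randn(4, 1, 28, 28)
logits = model(dummy_batch)
print(tuple(logits.shape))
```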

## Training Loop for LeNet-300-100 on MNIST Dataset

In the following code snippet, we demonstrate a simple training loop for a LeNet-300-100 model on the MNIST dataset using PyTorch. The training loop includes loading the data, setting up the model, defining the loss function, and running the training process.

In [None]:
# Use the first CUDA GPU if one is available; otherwise fall back to the CPU
DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")  # "cuda" alone also works if you have a single GPU

# Set a seed for reproducibility
torch.manual_seed(21)

# Move the model to the specified device (GPU)
model.to(DEVICE)

# Define the optimizer (Stochastic Gradient Descent)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

# Define the loss function (CrossEntropyLoss)
loss_fn = nn.CrossEntropyLoss()

# Instantiate custom train and test datasets (not shown)
train_data = Train()
test_data = Val()

# Create data loaders for train and test sets
train_loader = torch.utils.data.DataLoader(train_data, batch_size=1024, shuffle=True,  num_workers=16, pin_memory=True, drop_last=True)
# drop_last=False keeps the final partial batch so every evaluation sample is counted
test_loader = torch.utils.data.DataLoader(test_data,  batch_size=1024, shuffle=False, num_workers=16, pin_memory=True, drop_last=False)

# Lists to store training statistics
losses = []
train_accuracies = []
test_accuracies = []
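
A note on the loss function: `nn.CrossEntropyLoss` expects raw logits and accepts targets either as integer class indices or, in PyTorch 1.10 and later, as per-class probabilities such as one-hot vectors. The sketch below uses random logits and labels (purely illustrative data) to show that both target formats give the same loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed

logits = torch.randn(8, 10)          # batch of 8 samples, 10 classes
labels = torch.randint(0, 10, (8,))  # integer class indices

loss_fn = nn.CrossEntropyLoss()

# Targets as class indices.
loss_from_indices = loss_fn(logits, labels)

# Targets as one-hot probability vectors (requires PyTorch >= 1.10).
one_hot = F.one_hot(labels, num_classes=10).float()
loss_from_one_hot = loss_fn(logits, one_hot)

print(loss_from_indices.item(), loss_from_one_hot.item())
print(torch.allclose(loss_from_indices, loss_from_one_hot))
```

Passing class indices directly avoids the extra one-hot conversion, although both forms are valid.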

In [None]:
for epoch in trange(10):
    for x_data, y_data in train_loader:
        # Move data to the specified device (GPU)
        x_data, y_data = x_data.to(DEVICE), y_data.to(DEVICE)
        y_data = torch.nn.functional.one_hot(y_data.long(), 10).float()
        
        # Zero the gradients, forward pass, backward pass, and optimization step
        optimizer.zero_grad()
        outputs = model(x_data)
        loss = loss_fn(outputs.float(), y_data)
        loss.backward()
        optimizer.step()

    # Calculate and store training and test accuracies for each epoch
    train_accuracies.append(get_accuracy(model, train_loader, DEVICE))
    test_accuracies.append(get_accuracy(model, test_loader, DEVICE))

    # Store the loss of the last batch in this epoch for visualization
    losses.append(loss.item())
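
The `get_accuracy` helper comes from a local `utils` module that is not shown here. The sketch below is one plausible way such a helper could work, not the actual implementation: the name `get_accuracy_sketch` and the toy model and data are hypothetical, and it assumes the loader yields `(inputs, integer labels)` pairs.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def get_accuracy_sketch(model, loader, device):
    """Hypothetical stand-in for utils.get_accuracy: fraction of correct predictions."""
    was_training = model.training
    model.eval()  # disable training-only behavior such as dropout
    correct, total = 0, 0
    with torch.no_grad():  # no gradients needed for evaluation
        for x_data, y_data in loader:
            x_data, y_data = x_data.to(device), y_data.to(device)
            predictions = model(x_data).argmax(dim=1)
            correct += (predictions == y_data.long()).sum().item()
            total += y_data.numel()
    model.train(was_training)  # restore the previous mode
    return correct / total


# Toy check: labels are the model's own predictions, so accuracy should be high.
torch.manual_seed(0)
toy_model = nn.Linear(4, 3)
toy_x = torch.randn(32, 4)
toy_y = toy_model(toy_x).argmax(dim=1)
toy_loader = DataLoader(TensorDataset(toy_x, toy_y), batch_size=8)

toy_accuracy = get_accuracy_sketch(toy_model, toy_loader, torch.device("cpu"))
print(toy_accuracy)
```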

In [None]:
plt.plot(losses)

In [None]:
plt.plot(train_accuracies, label="train")
plt.plot(test_accuracies, label="test")
plt.grid()
plt.legend()