# Introduction to Deep Learning with PyTorch


## Introduction to Deep Learning

Deep learning is a subset of machine learning that uses multi-layer neural networks to learn patterns from large amounts of data and make predictions from them. It is well suited to large, complex datasets, with applications ranging from image and speech recognition to natural language processing and autonomous vehicles.


## Setting Up the Environment

Ensure Python is installed on your system. Then, install PyTorch by running the following command in a code cell.


```bash
!pip install torch torchvision
```


Now, import the necessary libraries.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms
```

## Basic Concepts in Deep Learning

### Neural Networks

Neural Networks are computational models inspired by the human brain's structure. They consist of layers of nodes or "neurons," interconnected to form a network.
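
As a minimal sketch of what one layer of neurons computes, the example below builds a fully connected layer and checks it against a hand-written weighted sum. The sizes (4 inputs, 3 neurons) and variable names are illustrative assumptions, not part of the model built later.

```python
import torch
from torch import nn

torch.manual_seed(0)

# A layer of 3 neurons, each connected to 4 inputs (sizes chosen for illustration).
layer = nn.Linear(in_features=4, out_features=3)

x = torch.randn(2, 4)  # a batch of 2 input vectors

# Each neuron computes a weighted sum of its inputs plus a bias.
manual = x @ layer.weight.T + layer.bias
builtin = layer(x)

print(builtin.shape)  # torch.Size([2, 3])
print(torch.allclose(manual, builtin, atol=1e-6))
```

Stacking several such layers, with activation functions between them, gives the "network" part of a neural network.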


### Activation Functions

Activation functions apply a transformation, usually non-linear, to a node's weighted input to produce its output. Common examples include ReLU (Rectified Linear Unit), Sigmoid, and Softmax.
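
To make the three functions concrete, here is a small sketch applying each of them to the same made-up input values.

```python
import torch

z = torch.tensor([-2.0, 0.0, 3.0])  # example pre-activation values (made up)

relu_out = torch.relu(z)               # negatives are clipped to 0
sigmoid_out = torch.sigmoid(z)         # each value is squashed into (0, 1)
softmax_out = torch.softmax(z, dim=0)  # values become probabilities that sum to 1

print(relu_out)     # tensor([0., 0., 3.])
print(sigmoid_out)
print(softmax_out, softmax_out.sum())
```

ReLU and Sigmoid act on each value independently, while Softmax normalizes across the whole vector, which is why it is typically used for class probabilities.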

### Loss Functions

Loss functions measure how well the model's predictions match the target data during training. Common loss functions include Mean Squared Error for regression tasks and Cross-Entropy for classification tasks.
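
The following sketch evaluates both losses on tiny hand-picked inputs, so the results can be checked by hand. The numbers are illustrative only.

```python
import torch
from torch import nn

# Regression: mean squared error between predictions and targets.
pred = torch.tensor([2.0, 4.0])
target = torch.tensor([1.0, 4.0])
mse = nn.MSELoss()(pred, target)  # ((2-1)^2 + (4-4)^2) / 2 = 0.5

# Classification: cross-entropy between raw scores (logits) and a class index.
logits = torch.tensor([[0.0, 0.0, 0.0]])   # no preference among 3 classes
label = torch.tensor([1])
ce = nn.CrossEntropyLoss()(logits, label)  # -log(1/3) = log(3), about 1.0986

print(mse.item())  # 0.5
print(ce.item())
```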

### Optimizers

Optimizers are algorithms that adjust the weights of the network to minimize the loss function. Examples include SGD (Stochastic Gradient Descent) and Adam.
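
As a rough illustration of what an optimizer does, the sketch below uses SGD to minimize a toy loss, `(w - 3)^2`, over a single parameter. The loss, learning rate, and step count are assumptions chosen so the behavior is easy to follow.

```python
import torch

# One parameter w and the loss (w - 3)^2, which is minimized at w = 3.
w = torch.tensor([0.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

for step in range(50):
    optimizer.zero_grad()          # clear gradients from the previous step
    loss = ((w - 3.0) ** 2).sum()  # scalar loss
    loss.backward()                # compute d(loss)/dw
    optimizer.step()               # w <- w - lr * gradient

print(w.item())  # close to 3.0
```

The same `zero_grad` / `backward` / `step` pattern is what a full training loop repeats for every batch.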

<img src="./imgs/NN_learning.png" alt="drawing" width="900"/>

[This video](https://www.youtube.com/watch?v=aircAruvnKk&ab_channel=3Blue1Brown) does an exceptional job of introducing how neural networks learn.

## Creating a Simple Deep Learning Model

## I. Data Preparation

In this example, we will work with the MNIST Dataset. 

The MNIST dataset, short for Modified National Institute of Standards and Technology dataset, is one of the most iconic datasets in the field of machine learning and deep learning. Comprising a collection of 70,000 handwritten digits, it is split into a training set of 60,000 examples and a test set of 10,000 examples. Each image is grayscale, 28x28 pixels, and labeled with the digit it represents, ranging from 0 to 9. MNIST serves as a benchmark dataset for evaluating the performance of algorithms in the domain of image recognition. Since its release, it has become a standard dataset for beginners and researchers alike to test and benchmark their machine learning and deep learning models. The simplicity of MNIST allows for quick testing of concepts or algorithms, making it an excellent starting point for anyone new to the field.

<img src="./imgs/mnist.webp" alt="drawing" width="450"/>

Load and preprocess the MNIST dataset:

```python
# Load MNIST dataset; convert images to tensors and normalize with mean 0.5, std 0.5
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
mnist_train = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
mnist_test = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

# Split training data for validation (80% train, 20% validation)
train_size = int(0.8 * len(mnist_train))
val_size = len(mnist_train) - train_size
train_dataset, val_dataset = random_split(mnist_train, [train_size, val_size])

# Only the training loader is shuffled
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False)
test_loader = DataLoader(mnist_test, batch_size=64, shuffle=False)
```
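
To see what the preprocessing and splitting steps do without downloading anything, here is a sketch on synthetic stand-in data. The random tensors are an assumption for illustration; they only mimic the shape and value range that `ToTensor` produces for MNIST images.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Stand-in for MNIST: 100 random "images" with pixel values in [0, 1],
# the range ToTensor produces. (Synthetic data, for illustration only.)
images = torch.rand(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))

# Normalize((0.5,), (0.5,)) computes (x - mean) / std per channel.
normalized = (images - 0.5) / 0.5  # values now lie in [-1, 1]

dataset = TensorDataset(normalized, labels)
train_size = int(0.8 * len(dataset))
val_size = len(dataset) - train_size
train_ds, val_ds = random_split(dataset, [train_size, val_size])

loader = DataLoader(train_ds, batch_size=64, shuffle=True)
X, y = next(iter(loader))

print(len(train_ds), len(val_ds))  # 80 20
print(X.shape, y.shape)            # torch.Size([64, 1, 28, 28]) torch.Size([64])
```

Each batch from the loader is a pair of tensors: images with shape `(batch, channels, height, width)` and one integer label per image.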

## II. Modeling

Building a neural network model in PyTorch usually follows these key steps:

- Define the model architecture using the `nn.Module` base class.
- Set up loss function and optimizer.
- Train the model on your dataset.


### 1. Model Definition

**Class-based approach:** We create a class called `NeuralNet` that inherits from `nn.Module`. This gives us the building blocks to create our network.

Key layers:
- `nn.Flatten`: Reshapes input 2D images into 1D vectors for the linear layers.
- `nn.Linear`: Fully-connected layers that perform the core computations.
- `nn.ReLU`: Non-linear activation function often used in hidden layers.
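
The effect of these layers on tensor shapes can be sketched with a dummy batch. The batch of zeros is a placeholder assumption standing in for real MNIST images.

```python
import torch
from torch import nn

batch = torch.zeros(64, 1, 28, 28)  # dummy batch shaped like MNIST images

flat = nn.Flatten()(batch)              # (64, 1, 28, 28) -> (64, 784)
hidden = nn.Linear(28 * 28, 512)(flat)  # (64, 784) -> (64, 512)
activated = nn.ReLU()(hidden)           # same shape, negatives set to 0

print(flat.shape)       # torch.Size([64, 784])
print(activated.shape)  # torch.Size([64, 512])
```

`nn.Flatten` keeps the batch dimension and merges the rest, which is why each 1x28x28 image becomes a vector of length 784.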

```python
# Define the model
class NeuralNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),  # input layer: 784 pixels -> 512 hidden units
            nn.ReLU(),
            nn.Linear(512, 10)      # output layer: one score (logit) per digit
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNet()
```
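
As a quick sanity check on the model's size, the sketch below counts trainable parameters. It rebuilds the same layer sizes as a plain `nn.Sequential` (named `stack` here, a name chosen for this example) so that the snippet runs on its own.

```python
from torch import nn

# Same layer sizes as the model above, written as a plain nn.Sequential.
stack = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

# Each nn.Linear has in*out weights plus out biases.
n_params = sum(p.numel() for p in stack.parameters())
print(n_params)  # 407050 = (784*512 + 512) + (512*10 + 10)
```

Almost all of the parameters sit in the first linear layer, which connects every pixel to every hidden unit.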

### 2. Loss & Optimization

Before training, we need to set up two things (PyTorch has no separate "compile" step):

1. **Choosing an optimizer**: We typically default to `torch.optim.Adam` because it works well across a wide range of tasks. Adam stands for Adaptive Moment Estimation; it combines ideas from the AdaGrad and RMSProp algorithms and copes well with sparse gradients and noisy problems. It is computationally efficient and has modest memory requirements. Adam keeps running estimates of the mean and variance of each parameter's gradient and uses them to scale the step size per parameter, so it often converges quickly without much tuning of the learning rate.

2. **Specifying the loss function**: Gradient descent needs a differentiable loss. We use `nn.CrossEntropyLoss`, a standard choice for multi-class classification, since our task is to assign each image to one of 10 digit classes. The loss compares the predicted class distribution with the true label. Note that our model outputs raw logits and contains no softmax layer; `nn.CrossEntropyLoss` applies log-softmax internally, so the logits are passed to it directly.
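
One detail worth checking in code: `nn.CrossEntropyLoss` takes raw logits and applies log-softmax itself. The sketch below compares it with the explicit two-step computation on random logits and made-up labels.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # raw model outputs: 4 samples, 10 classes
targets = torch.tensor([3, 0, 9, 1])  # made-up class labels

loss_a = nn.CrossEntropyLoss()(logits, targets)

# Equivalent two-step computation: log-softmax, then negative log-likelihood.
loss_b = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(loss_a, loss_b))  # True
```

This is why a model trained with `nn.CrossEntropyLoss` should not end in a softmax layer: the normalization would be applied twice.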

```python
# Define loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
```

### 3. Model Training Steps

**Iterations per training epoch**: Take a hypothetical dataset of 100,000 rows. A 20% validation split leaves 80,000 rows for training, and with a batch size of 32 that gives 80,000 / 32 = 2,500 iterations per epoch. In our MNIST setup, 48,000 training images with a batch size of 64 give 750 iterations per epoch.

- **Epochs**: Training runs for multiple epochs; in each epoch the entire training set passes through the model once. Around 5-10 epochs is a common starting point. If you train for longer to improve accuracy, you may also need to adjust the learning rate.
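
The arithmetic can be wrapped in a small helper. `iterations_per_epoch` is a hypothetical function written for this example; it rounds up because a `DataLoader` keeps the final, smaller batch by default.

```python
import math

def iterations_per_epoch(n_rows, val_fraction, batch_size):
    """Number of batches per epoch after holding out a validation split."""
    n_val = round(n_rows * val_fraction)
    n_train = n_rows - n_val
    return math.ceil(n_train / batch_size)  # the last batch may be smaller

print(iterations_per_epoch(100_000, 0.2, 32))  # 2500
print(iterations_per_epoch(60_000, 0.2, 64))   # 750
```

The second call uses the MNIST training set size with the split and batch size from the data preparation step.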

- During each epoch:
    - The `train` function is called to train the model using the training data loader.
    - The `test` function is called to evaluate the model using the validation data loader, printing out the validation accuracy and loss.


**Interpreting Output**:

- **Training loss and accuracy**: The average training loss and accuracy after each epoch show how well the model is fitting the training data. Note that the `train` function below does not print them; only validation metrics are reported.
- **Validation Loss and Accuracy**: Calculated at the end of each epoch, these metrics help gauge the model's generalization ability and highlight potential overfitting if the training accuracy significantly surpasses validation accuracy.
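
One way to compare training and validation metrics side by side is sketched below. It is a self-contained toy: the synthetic two-class data, the single linear layer, and the `evaluate` helper are all assumptions made for illustration, not the notebook's MNIST pipeline. It also measures training metrics after each epoch rather than averaging them during it.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic 2-class data, for illustration only (not MNIST).
X = torch.randn(200, 5)
y = (X[:, 0] > 0).long()
train_loader = DataLoader(TensorDataset(X[:160], y[:160]), batch_size=32, shuffle=True)
val_loader = DataLoader(TensorDataset(X[160:], y[160:]), batch_size=32)

model = nn.Linear(5, 2)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

def evaluate(loader):
    """Return (average loss per batch, accuracy) without updating the model."""
    model.eval()
    total_loss, correct, count = 0.0, 0, 0
    with torch.no_grad():
        for xb, yb in loader:
            pred = model(xb)
            total_loss += loss_fn(pred, yb).item()
            correct += (pred.argmax(1) == yb).sum().item()
            count += len(yb)
    return total_loss / len(loader), correct / count

history = []
for epoch in range(3):
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss_fn(model(xb), yb).backward()
        optimizer.step()
    train_loss, train_acc = evaluate(train_loader)
    val_loss, val_acc = evaluate(val_loader)
    history.append((train_acc, val_acc))
    print(f"epoch {epoch + 1}: train acc {train_acc:.2f}, val acc {val_acc:.2f}")
```

A training accuracy that keeps rising while validation accuracy stalls or drops would suggest overfitting.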


```python
# Set device to GPU if available (defined first, since train/test use it)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

# Function to train the model for one epoch
def train(dataloader, model, loss_fn, optimizer):
    model.train()  # switch to training mode
    for X, y in dataloader:
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Function to evaluate the model; test=True labels the output as test metrics
def test(dataloader, model, loss_fn, test=False):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()  # switch to evaluation mode
    test_loss, correct = 0, 0
    with torch.no_grad():  # no gradients needed for evaluation
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches  # average loss per batch
    accuracy = correct / size
    label = "Test" if test else "Val"
    print(f"{label} Accuracy: {(100*accuracy):>0.1f}%, Avg loss: {test_loss:>8f} ")

# Train the model, reporting validation metrics after each epoch
for epoch in range(10):
    train(train_loader, model, loss_fn, optimizer)
    test(val_loader, model, loss_fn)
```

```text
Val Accuracy: 93.2%, Avg loss: 0.225838
Val Accuracy: 95.8%, Avg loss: 0.140560
Val Accuracy: 96.8%, Avg loss: 0.106705
Val Accuracy: 96.9%, Avg loss: 0.102025
Val Accuracy: 97.2%, Avg loss: 0.095793
Val Accuracy: 97.2%, Avg loss: 0.097478
Val Accuracy: 97.1%, Avg loss: 0.100423
Val Accuracy: 97.2%, Avg loss: 0.100610
Val Accuracy: 97.4%, Avg loss: 0.090294
Val Accuracy: 97.0%, Avg loss: 0.104901
```


### 4. Evaluating the Model

Test the model's performance:

```python
# Evaluate the model on the held-out test set
test(test_loader, model, loss_fn, test=True)
```

```text
Test Accuracy: 97.3%, Avg loss: 0.094405
```


## Conclusion

This notebook introduced the basics of training a deep learning model using PyTorch, from setting up the environment and understanding key concepts to building, training, and evaluating a simple model. For further exploration, consider diving into more complex models, experimenting with different datasets, and exploring the extensive features of PyTorch.