# PyTorch Basics and Multi-Layer Perceptrons (MLPs)

Welcome to this tutorial on PyTorch basics and Multi-Layer Perceptrons (MLPs)! In this notebook, we cover fundamental PyTorch concepts, including tensor operations, neural network creation, and training. We then build an MLP in PyTorch, covering its architecture, implementation, and training process.

## Table of Contents

1. [Introduction to PyTorch and Tensors](#introduction-to-pytorch-and-tensors)
2. [Tensor Operations and Matrix Math](#tensor-operations-and-matrix-math)
3. [Building Blocks of Neural Networks](#building-blocks-of-neural-networks)
4. [Creating a Multi-Layer Perceptron (MLP)](#creating-a-multi-layer-perceptron-mlp)
5. [Training an MLP](#training-an-mlp)
6. [Evaluating and Using the Trained MLP](#evaluating-and-using-the-trained-mlp)
7. [Advanced Topics and Best Practices](#advanced-topics-and-best-practices)

## Setup

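First, let's import the necessary libraries and set up our environment. Run whichever of the two cells below matches your hardware: the first uses an NVIDIA GPU through CUDA, the second uses Apple silicon through MPS, and both fall back to the CPU if that accelerator is unavailable.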

##### If CUDA device

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt

# Set random seed for reproducibility
torch.manual_seed(42)
np.random.seed(42)

# Check if CUDA is available and set the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

##### If Apple silicon device

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt

# Set random seed for reproducibility
torch.manual_seed(42)
np.random.seed(42)

# Check if MPS is available and set the device
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print(f"Using device: {device}")

Using device: mps
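The two setup cells above differ only in which accelerator they check. If you prefer a single cell that works on either kind of machine, a minimal sketch is a helper that tries CUDA first, then MPS, then the CPU. The name `pick_device` is our own illustration, not part of PyTorch.

```python
import torch

def pick_device() -> torch.device:
    """Hypothetical helper: prefer CUDA, then Apple silicon (MPS), else the CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
print(f"Using device: {device}")
```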


## 1. Introduction to PyTorch and Tensors

PyTorch is a deep learning framework for building and training neural networks. Its core data structure is the tensor: a multi-dimensional array similar to a NumPy array, with two additions that matter for deep learning. Tensors can live on a GPU (or another accelerator), and PyTorch can record the operations performed on them to compute gradients automatically (autograd).

In [None]:
# Creating tensors
scalar = torch.tensor(42)
vector = torch.tensor([1.0, 2.0, 3.0])
matrix = torch.tensor([[1, 2], [3, 4]])
tensor_3d = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

print("Scalar:", scalar)
print("Vector:", vector)
print("Matrix:", matrix)
print("3D Tensor:", tensor_3d)

# Tensor properties
print("\nTensor properties:")
print("Shape:", tensor_3d.shape)
print("Datatype:", tensor_3d.dtype)
print("Device:", tensor_3d.device)

## 2. Tensor Operations and Matrix Math

Understanding tensor operations is crucial for working with neural networks. These operations form the basis of the computations performed in MLPs and other neural network architectures.

In [None]:
# Basic arithmetic operations
a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])

print("Addition:", a + b)
print("Subtraction:", a - b)
print("Element-wise multiplication:", a * b)
print("Element-wise division:", a / b)

# Matrix multiplication
m1 = torch.tensor([[1, 2], [3, 4]])
m2 = torch.tensor([[5, 6], [7, 8]])
print("\nMatrix multiplication:")
print(torch.matmul(m1, m2))

# Reshaping tensors
original = torch.tensor([1, 2, 3, 4, 5, 6])
reshaped = original.view(2, 3)
print("\nOriginal tensor:", original)
print("Reshaped tensor:")
print(reshaped)

# Transposing tensors
print("\nTransposed tensor:")
print(reshaped.t())

# Aggregation operations (mean() requires a floating-point tensor, so convert first)
print("\nMean:", reshaped.float().mean())
print("Sum:", reshaped.sum())
print("Max:", reshaped.max())

## 3. Building Blocks of Neural Networks

Before we dive into creating an MLP, let's explore some of the fundamental building blocks of neural networks in PyTorch.

In [None]:
# Linear layer
linear = nn.Linear(in_features=10, out_features=5)
input_tensor = torch.randn(3, 10)  # Batch of 3 samples, each with 10 features
output = linear(input_tensor)
print("Linear layer output shape:", output.shape)

# Activation functions
relu = nn.ReLU()
sigmoid = nn.Sigmoid()
tanh = nn.Tanh()

print("\nActivation functions:")
print("ReLU:", relu(torch.tensor([-1, 0, 1])))
print("Sigmoid:", sigmoid(torch.tensor([-1, 0, 1])))
print("Tanh:", tanh(torch.tensor([-1, 0, 1])))

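# Dropout: in training mode (the default for a new module), each element is zeroed
# with probability p and the surviving elements are scaled by 1 / (1 - p), here 2.0
dropout = nn.Dropout(p=0.5)
input_tensor = torch.ones(10)
print("\nDropout (training mode):")
print(dropout(input_tensor))

# In evaluation mode, dropout passes its input through unchanged
dropout.eval()
print("Dropout (eval mode):")
print(dropout(input_tensor))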

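# BatchNorm: in training mode, each feature is normalized with the batch mean and variance
batch_norm = nn.BatchNorm1d(10)
input_tensor = torch.randn(20, 10)  # 20 samples, 10 features each
normalized = batch_norm(input_tensor)
print("\nBatchNorm output mean:", normalized.mean(dim=0))  # close to 0 for every feature
# Close to 1 (slightly above, because .std() uses the unbiased n - 1 estimator)
print("BatchNorm output std:", normalized.std(dim=0))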

## 4. Creating a Multi-Layer Perceptron (MLP)

Now that we understand the building blocks, let's create a Multi-Layer Perceptron (MLP) using PyTorch.

In [None]:
class MLP(nn.Module):
    def __init__(self, input_size, hidden_sizes, output_size, dropout_rate=0.5):
        super().__init__()
        self.layers = nn.ModuleList()
        
        # First hidden block: input features -> hidden_sizes[0]
        self.layers.append(nn.Linear(input_size, hidden_sizes[0]))
        self.layers.append(nn.ReLU())
        self.layers.append(nn.BatchNorm1d(hidden_sizes[0]))
        self.layers.append(nn.Dropout(dropout_rate))
        
        # Hidden layers
        for i in range(len(hidden_sizes) - 1):
            self.layers.append(nn.Linear(hidden_sizes[i], hidden_sizes[i+1]))
            self.layers.append(nn.ReLU())
            self.layers.append(nn.BatchNorm1d(hidden_sizes[i+1]))
            self.layers.append(nn.Dropout(dropout_rate))
        
        # Output layer
        self.layers.append(nn.Linear(hidden_sizes[-1], output_size))
    
    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

# Create an instance of the MLP
input_size = 10
hidden_sizes = [64, 32, 16]
output_size = 1
mlp = MLP(input_size, hidden_sizes, output_size)

print(mlp)
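Because `forward` simply applies each layer in order, the same network can be sketched with `nn.Sequential`, which also makes it easy to count parameters. `build_mlp` is a hypothetical helper written for this illustration. Note that its `state_dict` keys (`0.weight`, ...) differ from those of the `MLP` class (`layers.0.weight`, ...), so checkpoints are not interchangeable between the two.

```python
import torch
import torch.nn as nn

def build_mlp(input_size, hidden_sizes, output_size, dropout_rate=0.5):
    """Hypothetical helper: same layer stack as the MLP class, as an nn.Sequential."""
    layers = []
    in_features = input_size
    for h in hidden_sizes:
        layers += [nn.Linear(in_features, h), nn.ReLU(), nn.BatchNorm1d(h), nn.Dropout(dropout_rate)]
        in_features = h
    layers.append(nn.Linear(in_features, output_size))
    return nn.Sequential(*layers)

model = build_mlp(10, [64, 32, 16], 1)

# Trainable parameters only (BatchNorm running statistics are buffers, not parameters)
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(num_params)  # 3553

# A forward pass on a batch of 4 samples yields one logit per sample
model.eval()
with torch.no_grad():
    out = model(torch.randn(4, 10))
print(out.shape)   # torch.Size([4, 1])
```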

## 5. Training an MLP

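Now that we have defined our MLP, let's train it on a simple synthetic binary-classification dataset: each sample has `input_size` standard-normal features, and its label is 1 if the features sum to a positive number and 0 otherwise. Because the network outputs a single raw score (a logit) per sample, we use `nn.BCEWithLogitsLoss`, which applies the sigmoid internally in a numerically stable way.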

In [None]:
# Generate a simple dataset
def generate_data(num_samples=1000):
    X = torch.randn(num_samples, input_size)
    y = (X.sum(dim=1) > 0).float().view(-1, 1)
    return X, y

X_train, y_train = generate_data()
X_test, y_test = generate_data(200)

# Move data to the appropriate device
X_train, y_train = X_train.to(device), y_train.to(device)
X_test, y_test = X_test.to(device), y_test.to(device)

# Create DataLoader for batch processing
train_dataset = torch.utils.data.TensorDataset(X_train, y_train)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)

# Move the model to the appropriate device
mlp = mlp.to(device)

# Define loss function and optimizer
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(mlp.parameters(), lr=0.001)

# Training loop
num_epochs = 50
for epoch in range(num_epochs):
    mlp.train()
    for batch_X, batch_y in train_loader:
        optimizer.zero_grad()
        outputs = mlp(batch_X)
        loss = criterion(outputs, batch_y)
        loss.backward()
        optimizer.step()
    
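    # Every 10 epochs, report loss and accuracy on the held-out test set
    if (epoch + 1) % 10 == 0:
        mlp.eval()
        with torch.no_grad():
            test_outputs = mlp(X_test)
            test_loss = criterion(test_outputs, y_test)
            # A logit > 0 corresponds to a predicted probability > 0.5, i.e. class 1
            accuracy = ((test_outputs > 0).float() == y_test).float().mean()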
        print(f"Epoch {epoch+1}/{num_epochs}, Test Loss: {test_loss:.4f}, Accuracy: {accuracy:.4f}")

## 6. Evaluating and Using the Trained MLP

Now that we have trained our MLP, let's evaluate its performance and use it for predictions.

In [None]:
# Evaluate on the test set
mlp.eval()
with torch.no_grad():
    test_outputs = mlp(X_test)
    test_loss = criterion(test_outputs, y_test)
    accuracy = ((test_outputs > 0) == y_test).float().mean()
    
print(f"Final Test Loss: {test_loss:.4f}, Accuracy: {accuracy:.4f}")

# Make predictions on new data
new_data = torch.randn(5, input_size).to(device)
with torch.no_grad():
    predictions = mlp(new_data)
    binary_predictions = (predictions > 0).float()

print("\nPredictions on new data:")
for i, (pred, binary_pred) in enumerate(zip(predictions, binary_predictions)):
    print(f"Sample {i+1}: Raw prediction = {pred.item():.4f}, Binary prediction = {binary_pred.item():.0f}")

# Visualize the decision boundary. This only applies to 2D input, so with
# input_size = 10 as above, the block is skipped.
if input_size == 2:
    x_min, x_max = X_test[:, 0].min().item() - 1, X_test[:, 0].max().item() + 1
    y_min, y_max = X_test[:, 1].min().item() - 1, X_test[:, 1].max().item() + 1
    xx, yy = torch.meshgrid(torch.linspace(x_min, x_max, 100),
                            torch.linspace(y_min, y_max, 100), indexing="ij")
    grid = torch.stack([xx.reshape(-1), yy.reshape(-1)], dim=1).to(device)
    with torch.no_grad():
        Z = mlp(grid).reshape(xx.shape).cpu()  # raw logits on the grid
    
    plt.figure(figsize=(10, 8))
    plt.contourf(xx, yy, Z, cmap=plt.cm.RdYlBu, alpha=0.8)
    plt.scatter(X_test[:, 0].cpu(), X_test[:, 1].cpu(), c=y_test.cpu().squeeze(),
                cmap=plt.cm.RdYlBu, edgecolors='black')
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.title('MLP Decision Boundary')
    plt.show()
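The model outputs raw logits, and the code above thresholds them at 0. If you need probabilities instead, apply `torch.sigmoid`; because the sigmoid is increasing and `sigmoid(0) = 0.5`, thresholding logits at 0 gives the same labels as thresholding probabilities at 0.5. The logit values below are illustrative.

```python
import torch

logits = torch.tensor([-2.0, -0.1, 0.0, 0.3, 4.0])
probs = torch.sigmoid(logits)   # map raw logits to probabilities in (0, 1)

# sigmoid is monotonic with sigmoid(0) = 0.5, so the two rules agree
labels_from_logits = (logits > 0).float()
labels_from_probs = (probs > 0.5).float()
print(labels_from_probs)        # tensor([0., 0., 0., 1., 1.])
```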

## 7. Advanced Topics and Best Practices

Here are some advanced topics and best practices to consider when working with MLPs in PyTorch:

1. **Hyperparameter Tuning**: Use techniques like grid search, random search, or Bayesian optimization to find the best hyperparameters (e.g., learning rate, network architecture, dropout rate).

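2. **Regularization**: Penalize large weights to reduce overfitting. For `optim.Adam`, the `weight_decay` argument adds an L2 penalty to the gradients (`optim.AdamW` applies decoupled weight decay instead). An L1 penalty has no optimizer flag; add it to the loss yourself before calling `loss.backward()`:
   ```python
   optimizer = optim.Adam(mlp.parameters(), lr=0.001, weight_decay=1e-5)  # L2 regularization
   ```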

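3. **Learning Rate Scheduling**: Decay the learning rate during training to improve convergence. `StepLR` multiplies the learning rate by `gamma` every `step_size` epochs; call `scheduler.step()` once per epoch, after that epoch's optimizer updates:
   ```python
   scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)  # then scheduler.step() once per epoch
   ```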

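4. **Early Stopping**: Stop training once the validation loss has not improved for `patience` consecutive epochs, and keep a copy of the best weights:
   ```python
   import copy

   best_loss = float('inf')
   patience = 10
   counter = 0
   best_state = None
   
   for epoch in range(num_epochs):
       # ... training code for one epoch ...
       
       # X_val, y_val: a held-out validation split (hypothetical names, not defined above)
       mlp.eval()
       with torch.no_grad():
           val_loss = criterion(mlp(X_val), y_val).item()
       
       if val_loss < best_loss:
           best_loss = val_loss
           counter = 0
           best_state = copy.deepcopy(mlp.state_dict())
       else:
           counter += 1
           if counter >= patience:
               print(f"Early stopping at epoch {epoch + 1}")
               break
   
   mlp.load_state_dict(best_state)  # restore the best weights
   ```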

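5. **Model Saving and Loading**: Save and load your trained models. A `state_dict` stores only parameters and buffers, so you must rebuild a model with the same architecture before loading it:
   ```python
   # Saving
   torch.save(mlp.state_dict(), 'mlp_model.pth')
   
   # Loading: map_location lets a checkpoint saved on one device load onto another
   mlp = MLP(input_size, hidden_sizes, output_size).to(device)
   mlp.load_state_dict(torch.load('mlp_model.pth', map_location=device))
   mlp.eval()  # switch Dropout and BatchNorm to inference behavior
   ```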

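6. **Data Augmentation**: For image data, consider data augmentation with `torchvision.transforms` (torchvision is installed separately from PyTorch):
   ```python
   from torchvision import transforms

   transform = transforms.Compose([
       transforms.RandomHorizontalFlip(),
       transforms.RandomRotation(10),  # random angle in [-10, 10] degrees
       transforms.ToTensor(),
   ])
   ```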

7. **Transfer Learning**: For complex tasks, consider using pre-trained models and fine-tuning them for your specific task.

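8. **Gradient Clipping**: Clip the total gradient norm to prevent exploding gradients. The call must go between `loss.backward()` and `optimizer.step()`:
   ```python
   torch.nn.utils.clip_grad_norm_(mlp.parameters(), max_norm=1.0)  # after loss.backward(), before optimizer.step()
   ```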

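9. **Proper Initialization**: Choose a weight initialization suited to the activation function. Xavier (Glorot) initialization was designed with tanh/sigmoid-style activations in mind; for ReLU networks, Kaiming (He) initialization, e.g. `nn.init.kaiming_uniform_(m.weight, nonlinearity='relu')`, is a common alternative:
   ```python
   def init_weights(m):
       if isinstance(m, nn.Linear):
           torch.nn.init.xavier_uniform_(m.weight)
           torch.nn.init.constant_(m.bias, 0.01)
   
   mlp.apply(init_weights)  # calls init_weights on every submodule recursively
   ```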

10. **Monitoring and Visualization**: Use tools like TensorBoard or Weights & Biases to monitor and visualize your training process.
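Several of the items above (regularization, learning rate scheduling, early stopping, model saving, gradient clipping, and initialization) have to sit in a specific order inside the training loop. The following is a minimal sketch combining them under stated assumptions: a small `nn.Sequential` model stands in for the `MLP` class, the data is synthetic with the same labeling rule as section 5, a separate validation split drives early stopping, Kaiming initialization replaces the Xavier example because the hidden layer uses ReLU, and the hyperparameter values and the file name `mlp_best.pth` are illustrative, not recommendations.

```python
import copy

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic data, same rule as section 5: label 1 if the 10 features sum to > 0
def make_data(n, input_size=10):
    X = torch.randn(n, input_size)
    y = (X.sum(dim=1) > 0).float().view(-1, 1)
    return X, y

X_train, y_train = make_data(800)
X_val, y_val = make_data(200)  # validation split used for early stopping
loader = DataLoader(TensorDataset(X_train, y_train), batch_size=32, shuffle=True)

def make_model():
    return nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

def init_weights(m):
    if isinstance(m, nn.Linear):
        nn.init.kaiming_uniform_(m.weight, nonlinearity="relu")  # He init for ReLU layers
        nn.init.zeros_(m.bias)

model = make_model()
model.apply(init_weights)

criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-2, weight_decay=1e-5)  # L2 via weight_decay
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
l1_lambda = 1e-5  # strength of the manual L1 penalty
patience, counter, best_loss, best_state = 5, 0, float("inf"), None

for epoch in range(100):
    model.train()
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        # L1 penalty on all parameters (biases included, for simplicity)
        loss = loss + l1_lambda * sum(p.abs().sum() for p in model.parameters())
        loss.backward()
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # after backward, before step
        optimizer.step()
    scheduler.step()  # once per epoch, after that epoch's optimizer updates

    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(X_val), y_val).item()

    if val_loss < best_loss:
        best_loss, counter = val_loss, 0
        best_state = copy.deepcopy(model.state_dict())  # keep the best weights
    else:
        counter += 1
        if counter >= patience:
            print(f"Early stopping at epoch {epoch + 1}")
            break

# Save the best weights, then load them into a freshly built model of the same architecture
torch.save(best_state, "mlp_best.pth")
restored = make_model()
restored.load_state_dict(torch.load("mlp_best.pth", map_location="cpu"))
restored.eval()
with torch.no_grad():
    acc = ((restored(X_val) > 0).float() == y_val).float().mean().item()
print(f"Best validation loss: {best_loss:.4f}, validation accuracy: {acc:.3f}")
```

`scheduler.step()` sits outside the batch loop so that `step_size` counts epochs rather than batches, and the L1 term is added to the loss before `backward()` so its gradient is included in the clipped update.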

Applied where they fit your data and task, these techniques can improve the performance and robustness of your MLPs in PyTorch.