# PyTorch Tutorial for Beginners

Welcome to this comprehensive PyTorch tutorial! This notebook is designed for students who are new to PyTorch and deep learning.

## What is PyTorch?

PyTorch is an open-source machine learning framework developed by Meta (formerly Facebook). It's widely used for:
- Deep learning research and development
- Computer vision tasks
- Natural language processing
- Scientific computing with GPU acceleration

## Why PyTorch?

- **Dynamic computation graphs**: The graph is built as your code runs, so ordinary Python control flow (loops, `if` statements) works inside models
- **Pythonic**: Feels natural to Python developers
- **Strong community**: Excellent documentation and support
- **Research-friendly**: Easy to experiment and prototype

Let's get started!

## 1. Setup and Installation Check

First, let's make sure PyTorch is installed and check our environment.

In [None]:
# Install PyTorch if not already installed
# Uncomment the line below if you need to install PyTorch
# !pip install torch torchvision torchaudio

import platform

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

print(f"PyTorch version: {torch.__version__}")
print(f"Python version: {platform.python_version()}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU device: {torch.cuda.get_device_name(0)}")
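The cell above only reports whether CUDA is available. A common pattern, shown here as a sketch and not something the rest of this notebook depends on, is to choose a device once and move tensors and models to it with `.to(device)`. Everything in this tutorial also runs on the CPU as written.

```python
import torch
import torch.nn as nn

# Use the GPU if one is visible, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors are moved with .to(), which returns a new tensor on the target device
x = torch.randn(2, 3).to(device)

# Calling .to() on a module moves its parameters in place (and returns the module)
layer = nn.Linear(3, 1).to(device)

# Inputs and parameters must be on the same device for the forward pass to work
out = layer(x)
print(f"Running on: {out.device}")
```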

## 2. Understanding Tensors

Tensors are the fundamental data structure in PyTorch. They're similar to NumPy arrays but can run on GPUs and support automatic differentiation.

### 2.1 Creating Tensors

In [None]:
# Creating tensors from Python lists
data = [[1, 2], [3, 4]]
tensor_from_list = torch.tensor(data)
print("From list:")
print(tensor_from_list)
print(f"Shape: {tensor_from_list.shape}")
print(f"Data type: {tensor_from_list.dtype}")
print()
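`torch.tensor` infers the data type from the values you pass in, which is why the list of integers above becomes an integer tensor. A short sketch of how to control the dtype when the default is not what you want (for example, `nn` layers expect floating-point inputs):

```python
import torch

# Python ints become int64, Python floats become float32 (the default float dtype)
ints = torch.tensor([1, 2, 3])
floats = torch.tensor([1.0, 2.0, 3.0])
print(ints.dtype, floats.dtype)

# Request a dtype explicitly at creation time
explicit = torch.tensor([1, 2, 3], dtype=torch.float32)

# Or convert an existing tensor; .float() returns a new float32 tensor
converted = ints.float()
print(explicit.dtype, converted.dtype)
```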

In [None]:
# Creating tensors with specific shapes
zeros = torch.zeros(2, 3)  # 2x3 matrix of zeros
ones = torch.ones(2, 3)    # 2x3 matrix of ones
random = torch.randn(2, 3) # 2x3 matrix with random values from standard normal distribution

print("Zeros tensor:")
print(zeros)
print("\nOnes tensor:")
print(ones)
print("\nRandom tensor:")
print(random)

In [None]:
# Creating tensors from NumPy arrays
numpy_array = np.array([[1, 2, 3], [4, 5, 6]])
tensor_from_numpy = torch.from_numpy(numpy_array)

print("NumPy array:")
print(numpy_array)
print("\nTensor from NumPy:")
print(tensor_from_numpy)
print(f"Data type: {tensor_from_numpy.dtype}")
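One detail worth knowing about `torch.from_numpy`: the resulting tensor shares memory with the NumPy array, so changing one changes the other, whereas `torch.tensor` always copies. The dtype is also kept as-is, so a NumPy `float64` array becomes a `float64` tensor. A small sketch:

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0])   # float64 by default in NumPy

shared = torch.from_numpy(arr)    # shares memory with arr, no copy
copied = torch.tensor(arr)        # independent copy of the data

arr[0] = 100.0                    # modify the NumPy array in place

print(shared)   # reflects the change
print(copied)   # still starts with 1.0

# .numpy() goes the other way and also shares memory for CPU tensors
back = shared.numpy()
print(back[0])
```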

### 2.2 Tensor Properties

In [None]:
# Let's explore tensor properties
tensor = torch.randn(3, 4, 5)

print(f"Tensor shape: {tensor.shape}")
print(f"Tensor size: {tensor.size()}")
print(f"Number of dimensions: {tensor.ndim}")
print(f"Data type: {tensor.dtype}")
print(f"Device: {tensor.device}")
print(f"Total number of elements: {tensor.numel()}")

### 2.3 Indexing and Slicing

In [None]:
# Create a sample tensor
tensor = torch.randn(4, 3)
print("Original tensor:")
print(tensor)
print()

# Indexing (similar to NumPy)
print("First row:", tensor[0])
print("First column:", tensor[:, 0])
print("Element at (1,2):", tensor[1, 2])
print("Last row:", tensor[-1])
print("First two rows:", tensor[:2])
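Like NumPy, basic indexing and slicing return a *view* of the same data rather than a copy, so writing to a slice modifies the original tensor. Use `.clone()` when you need an independent copy. A minimal sketch:

```python
import torch

t = torch.zeros(3, 3)

row = t[0]          # a view into t's storage
row[:] = 1.0        # writing through the view also changes t
print(t)

col = t[:, 1].clone()   # an independent copy
col[:] = 5.0
print(t[:, 1])          # not affected by the write to col
```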

### 2.4 Reshaping Tensors

In [None]:
# Create a tensor and reshape it
tensor = torch.arange(12)  # Creates tensor [0, 1, 2, ..., 11]
print("Original tensor:")
print(tensor)
print(f"Shape: {tensor.shape}")
print()

# Reshape to different dimensions
reshaped = tensor.reshape(3, 4)
print("Reshaped to 3x4:")
print(reshaped)
print()

# Reshape to 3D
reshaped_3d = tensor.reshape(2, 3, 2)
print("Reshaped to 2x3x2:")
print(reshaped_3d)
print()

# Using -1 to infer dimension
auto_reshape = tensor.reshape(-1, 2)  # Let PyTorch figure out the first dimension
print("Auto-reshaped to ?x2:")
print(auto_reshape)
print(f"Shape: {auto_reshape.shape}")
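You will also see `.view()` used for reshaping. The difference: `.view()` never copies, so it fails when the tensor's memory layout does not allow the new shape (for example after a transpose), while `.reshape()` returns a view when it can and silently copies otherwise. A sketch illustrating this:

```python
import torch

t = torch.arange(12).reshape(3, 4)
tt = t.T                       # transpose changes the strides, not the stored data
print(tt.is_contiguous())      # False

view_failed = False
try:
    tt.view(-1)                # cannot flatten this layout without copying
except RuntimeError as err:
    view_failed = True
    print("view failed:", err)

flat = tt.reshape(-1)          # works: reshape falls back to a copy
print(flat)
```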

## 3. Tensor Operations

PyTorch provides many operations for manipulating tensors.

### 3.1 Basic Mathematical Operations

In [None]:
# Create sample tensors
a = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
b = torch.tensor([[5, 6], [7, 8]], dtype=torch.float32)

print("Tensor a:")
print(a)
print("\nTensor b:")
print(b)
print()

# Element-wise operations
print("Addition (a + b):")
print(a + b)
print()

print("Subtraction (a - b):")
print(a - b)
print()

print("Element-wise multiplication (a * b):")
print(a * b)
print()

print("Element-wise division (a / b):")
print(a / b)
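Element-wise operations also work between tensors of different shapes through *broadcasting* (the same rules as NumPy): dimensions of size 1, or missing leading dimensions, are stretched to match. A short sketch:

```python
import torch

a = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
row = torch.tensor([10.0, 20.0])        # shape (2,): treated as one row
col = torch.tensor([[100.0], [200.0]])  # shape (2, 1): one value per row

print((a + row).tolist())  # [[11.0, 22.0], [13.0, 24.0]]
print((a + col).tolist())  # [[101.0, 102.0], [203.0, 204.0]]
print((a * 2).tolist())    # [[2.0, 4.0], [6.0, 8.0]]
```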

### 3.2 Matrix Operations

In [None]:
# Matrix multiplication
print("Matrix multiplication (a @ b):")
print(a @ b)
print()

# Alternative syntax for matrix multiplication
print("Matrix multiplication (torch.mm(a, b)):")
print(torch.mm(a, b))
print()

# Transpose
print("Transpose of a:")
print(a.T)
print()

# Other useful operations
print("Sum of all elements in a:")
print(a.sum())
print()

print("Sum over dim=0 (collapses the rows, giving one sum per column):")
print(a.sum(dim=0))
print()

print("Sum over dim=1 (collapses the columns, giving one sum per row):")
print(a.sum(dim=1))
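To make the difference between `a @ b` and `a * b` concrete, here are the values worked out by hand for the tensors above. Each matrix-product entry is the dot product of a row of `a` with a column of `b`, e.g. entry (0, 0) is 1·5 + 2·7 = 19:

```python
import torch

a = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
b = torch.tensor([[5, 6], [7, 8]], dtype=torch.float32)

matmul = a @ b        # row-by-column dot products
elementwise = a * b   # matching entries multiplied

print(matmul.tolist())       # [[19.0, 22.0], [43.0, 50.0]]
print(elementwise.tolist())  # [[5.0, 12.0], [21.0, 32.0]]
print(a.sum(dim=0).tolist(), a.sum(dim=1).tolist())  # [4.0, 6.0] [3.0, 7.0]
```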

### 3.3 In-place vs Out-of-place Operations

In [None]:
# Out-of-place operations (create new tensor)
x = torch.tensor([1.0, 2.0, 3.0])
print("Original x:", x)
print("x.add(5):", x.add(5))  # Creates new tensor
print("x after add:", x)  # x is unchanged
print()

# In-place operations (modify existing tensor)
y = torch.tensor([1.0, 2.0, 3.0])
print("Original y:", y)
y.add_(5)  # In-place operation (note the underscore)
print("y after add_:", y)  # y is modified
print()

# Warning: in-place operations can overwrite values that autograd saved for the backward pass,
# and they are not allowed on leaf tensors that require gradients.
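To see what the warning means in practice, here is a sketch of the two errors autograd raises. `exp()` is used because it saves its own output for the backward pass; modifying that output in place invalidates it. (Parameter updates are normally done with gradient tracking disabled, which is why optimizers can modify weights in place.)

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

# Case 1: an in-place op on a leaf tensor that requires grad is rejected immediately
leaf_failed = False
try:
    x.add_(1.0)
except RuntimeError as err:
    leaf_failed = True
    print("leaf error:", err)

# Case 2: exp() saves its result for backward; changing it in place is detected later
backward_failed = False
y = x.exp()
y.add_(1.0)
try:
    y.sum().backward()
except RuntimeError as err:
    backward_failed = True
    print("backward error:", err)

# The out-of-place version works: d/dx sum(exp(x) + 1) = exp(x)
y_ok = x.exp() + 1.0
y_ok.sum().backward()
print(x.grad)
```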

## 4. Automatic Differentiation (Autograd)

PyTorch's autograd system automatically computes gradients, which is essential for training neural networks.

In [None]:
# Create a tensor that requires gradients
x = torch.tensor([2.0], requires_grad=True)
print(f"x = {x}")
print(f"x.requires_grad = {x.requires_grad}")
print()

# Define a function y = x^2 + 3x + 1
y = x**2 + 3*x + 1
print(f"y = x^2 + 3x + 1 = {y}")
print()

# Compute gradients
y.backward()  # This computes dy/dx
print(f"dy/dx = {x.grad}")

# Mathematical verification: dy/dx = 2x + 3 = 2(2) + 3 = 7
print(f"Expected gradient: 2*{x.item()} + 3 = {2*x.item() + 3}")
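Besides checking by hand, you can compare an autograd gradient against a numerical estimate. The sketch below uses a central finite difference, (f(x+h) − f(x−h)) / 2h, in double precision; PyTorch also provides `torch.autograd.gradcheck`, which automates this kind of comparison.

```python
import torch

def f(x):
    return x**2 + 3*x + 1

# float64 keeps rounding error small in the finite-difference estimate
x = torch.tensor([2.0], dtype=torch.float64, requires_grad=True)
f(x).backward()

h = 1e-6
with torch.no_grad():
    numeric = (f(x + h) - f(x - h)) / (2 * h)

print(x.grad.item(), numeric.item())  # both approximately 7
```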

In [None]:
# More complex example with multiple variables
x = torch.tensor([1.0, 2.0], requires_grad=True)
y = torch.tensor([3.0, 4.0], requires_grad=True)

# Define z = x * y + x^2
z = x * y + x**2
print(f"x = {x}")
print(f"y = {y}")
print(f"z = x * y + x^2 = {z}")
print()

# We need to sum z to get a scalar for backward()
loss = z.sum()
print(f"loss = sum(z) = {loss}")
print()

# Compute gradients
loss.backward()
print(f"dL/dx = {x.grad}")
print(f"dL/dy = {y.grad}")

# Mathematical verification:
# dz/dx = y + 2x, so dL/dx = [3+2*1, 4+2*2] = [5, 8]
# dz/dy = x, so dL/dy = [1, 2]
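Summing `z` is one way to get a scalar. Equivalently, `backward()` on a non-scalar tensor accepts a `gradient` argument of the same shape; passing a tensor of ones gives the same result as `z.sum().backward()`:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = torch.tensor([3.0, 4.0], requires_grad=True)
z = x * y + x**2

# Same gradients as calling z.sum().backward()
z.backward(gradient=torch.ones_like(z))

print(x.grad)  # y + 2x -> [5., 8.]
print(y.grad)  # x      -> [1., 2.]
```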

### 4.1 Gradient Accumulation and Zeroing

In [None]:
# Gradients accumulate by default
x = torch.tensor([2.0], requires_grad=True)

# First computation
y1 = x**2
y1.backward()
print(f"After first backward: x.grad = {x.grad}")

# Second computation (gradients accumulate!)
y2 = x**3
y2.backward()
print(f"After second backward: x.grad = {x.grad}")
print("Notice how gradients accumulated!")
print()

# Zero gradients before next computation
x.grad.zero_()
print(f"After zeroing: x.grad = {x.grad}")

# Third computation
y3 = x**2
y3.backward()
print(f"After third backward: x.grad = {x.grad}")
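Accumulation is usually something to guard against, but it is also used on purpose: *gradient accumulation* splits a large batch into smaller micro-batches and calls `backward()` on each before a single optimizer step. A sketch, assuming a mean-reduced loss, so each micro-batch loss is divided by the number of micro-batches:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)
X = torch.randn(8, 4)
y = torch.randn(8, 1)
loss_fn = nn.MSELoss()  # mean over the batch

# Gradient from the full batch of 8 samples
model.zero_grad()
loss_fn(model(X), y).backward()
full_grad = model.weight.grad.clone()

# Same data as 4 micro-batches of 2; gradients add up across backward() calls
model.zero_grad()
num_chunks = 4
for X_chunk, y_chunk in zip(X.chunk(num_chunks), y.chunk(num_chunks)):
    (loss_fn(model(X_chunk), y_chunk) / num_chunks).backward()
accumulated_grad = model.weight.grad.clone()

print(torch.allclose(full_grad, accumulated_grad, atol=1e-6))  # True
```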

## 5. Building Neural Networks with torch.nn

The `torch.nn` module provides building blocks for creating neural networks.

### 5.1 Basic Building Blocks

In [None]:
# Linear layer (fully connected layer)
linear = nn.Linear(in_features=3, out_features=2)
print("Linear layer:")
print(linear)
print(f"Weight shape: {linear.weight.shape}")
print(f"Bias shape: {linear.bias.shape}")
print()

# Test the linear layer
x = torch.randn(1, 3)  # Batch size 1, 3 features
output = linear(x)
print(f"Input shape: {x.shape}")
print(f"Output shape: {output.shape}")
print(f"Output: {output}")
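Under the hood, `nn.Linear` computes `x @ W.T + b`, where the weight `W` has shape `(out_features, in_features)`, which matches the shapes printed above. A quick check:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
linear = nn.Linear(in_features=3, out_features=2)
x = torch.randn(1, 3)

# Same computation written out by hand
manual = x @ linear.weight.T + linear.bias

print(torch.allclose(linear(x), manual))  # True
```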

In [None]:
# Activation functions
x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])

# ReLU activation
relu = nn.ReLU()
print(f"Input: {x}")
print(f"ReLU: {relu(x)}")
print()

# Sigmoid activation
sigmoid = nn.Sigmoid()
print(f"Sigmoid: {sigmoid(x)}")
print()

# Tanh activation
tanh = nn.Tanh()
print(f"Tanh: {tanh(x)}")
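Each activation module also has a function form (`F.relu`, `torch.sigmoid`, `torch.tanh`), which is convenient inside a `forward` method; `F` is imported in the setup cell for this purpose. The sketch below checks them against their textbook definitions:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])

relu_out = F.relu(x)
sigmoid_out = torch.sigmoid(x)
tanh_out = torch.tanh(x)

# Definitions: ReLU = max(0, x), sigmoid = 1 / (1 + e^-x),
# tanh = (e^x - e^-x) / (e^x + e^-x)
relu_manual = torch.clamp(x, min=0.0)
sigmoid_manual = 1 / (1 + torch.exp(-x))
tanh_manual = (torch.exp(x) - torch.exp(-x)) / (torch.exp(x) + torch.exp(-x))

print(torch.equal(relu_out, relu_manual),
      torch.allclose(sigmoid_out, sigmoid_manual),
      torch.allclose(tanh_out, tanh_manual))
```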

### 5.2 Creating a Simple Neural Network

In [None]:
class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNet, self).__init__()
        # Define layers
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(hidden_size, output_size)
    
    def forward(self, x):
        # Define forward pass
        out = self.linear1(x)
        out = self.relu(out)
        out = self.linear2(out)
        return out

# Create the network
model = SimpleNet(input_size=4, hidden_size=10, output_size=3)
print("Model architecture:")
print(model)
print()

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total parameters: {total_params}")
print(f"Trainable parameters: {trainable_params}")
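The parameter count can be checked by hand: a linear layer with `n_in` inputs and `n_out` outputs has an `n_out × n_in` weight matrix plus `n_out` biases, and ReLU has no parameters. For the 4 → 10 → 3 network that is (4·10 + 10) + (10·3 + 3) = 50 + 33 = 83. The sketch uses an `nn.Sequential` with the same layer sizes so it runs on its own; `linear_params` is a hypothetical helper for illustration:

```python
import torch.nn as nn

def linear_params(n_in, n_out):
    # Hypothetical helper: weight matrix (n_out x n_in) plus one bias per output
    return n_in * n_out + n_out

expected = linear_params(4, 10) + linear_params(10, 3)
model = nn.Sequential(nn.Linear(4, 10), nn.ReLU(), nn.Linear(10, 3))
actual = sum(p.numel() for p in model.parameters())
print(expected, actual)  # 83 83
```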

In [None]:
# Test the network
x = torch.randn(5, 4)  # Batch of 5 samples, each with 4 features
output = model(x)

print(f"Input shape: {x.shape}")
print(f"Output shape: {output.shape}")
print("\nFirst few outputs:")
print(output[:3])

### 5.3 Alternative Way to Build Networks

In [None]:
# Using nn.Sequential for simpler models
model_sequential = nn.Sequential(
    nn.Linear(4, 10),
    nn.ReLU(),
    nn.Linear(10, 3)
)

print("Sequential model:")
print(model_sequential)
print()

# Test the sequential model
output_seq = model_sequential(x)
print(f"Sequential output shape: {output_seq.shape}")

## 6. Training a Neural Network

Now let's learn how to train a neural network with a complete example.

### 6.1 Loss Functions and Optimizers

In [None]:
# Common loss functions
# For regression
mse_loss = nn.MSELoss()
mae_loss = nn.L1Loss()

# For classification
cross_entropy_loss = nn.CrossEntropyLoss()  # takes raw logits and integer (long) class labels
binary_cross_entropy = nn.BCELoss()         # takes probabilities, so apply a sigmoid first

print("Loss functions created successfully!")
print()

# Optimizers (both are created here only to show the syntax; in practice you pick one)
model = SimpleNet(4, 10, 3)
optimizer_sgd = optim.SGD(model.parameters(), lr=0.01)
optimizer_adam = optim.Adam(model.parameters(), lr=0.001)

print("Optimizers:")
print(f"SGD: {optimizer_sgd}")
print(f"Adam: {optimizer_adam}")

### 6.2 Training Loop Structure

In [None]:
# Let's create some dummy data for demonstration
def create_dummy_data(num_samples=1000):
    X = torch.randn(num_samples, 4)
    # Create labels based on a simple rule
    y = (X[:, 0] + X[:, 1] > 0).long()  # Binary classification
    return X, y

# Create data
X_train, y_train = create_dummy_data(800)
X_val, y_val = create_dummy_data(200)

print(f"Training data shape: {X_train.shape}")
print(f"Training labels shape: {y_train.shape}")
print(f"Validation data shape: {X_val.shape}")
print(f"Validation labels shape: {y_val.shape}")
print(f"Unique labels: {torch.unique(y_train)}")

In [None]:
# Define model for binary classification
model = nn.Sequential(
    nn.Linear(4, 10),
    nn.ReLU(),
    nn.Linear(10, 2)  # 2 classes for binary classification
)

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop (full-batch: the whole training set is one batch, so there is one parameter update per epoch)
num_epochs = 100
train_losses = []
val_losses = []

for epoch in range(num_epochs):
    # Training phase
    model.train()  # Set model to training mode
    
    # Forward pass
    outputs = model(X_train)
    loss = criterion(outputs, y_train)
    
    # Backward pass and optimization
    optimizer.zero_grad()  # Clear gradients
    loss.backward()        # Compute gradients
    optimizer.step()       # Update parameters
    
    # Validation phase
    model.eval()  # Set model to evaluation mode
    with torch.no_grad():  # Disable gradient computation for efficiency
        val_outputs = model(X_val)
        val_loss = criterion(val_outputs, y_val)
    
    # Store losses
    train_losses.append(loss.item())
    val_losses.append(val_loss.item())
    
    # Print progress
    if (epoch + 1) % 20 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Train Loss: {loss.item():.4f}, Val Loss: {val_loss.item():.4f}')

print("Training completed!")
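The loop above feeds the whole training set through the model at once. For larger datasets you normally train on mini-batches using `torch.utils.data.DataLoader` (listed under Next Steps below). A sketch of the same task with mini-batches; the batch size, learning rate and number of epochs are illustrative choices, not tuned values:

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)
X = torch.randn(800, 4)
y = (X[:, 0] + X[:, 1] > 0).long()

# TensorDataset pairs inputs with labels; DataLoader batches and shuffles them
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = nn.Sequential(nn.Linear(4, 10), nn.ReLU(), nn.Linear(10, 2))
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

for epoch in range(5):
    model.train()
    running_loss = 0.0
    for X_batch, y_batch in loader:          # one optimizer step per mini-batch
        optimizer.zero_grad()
        loss = criterion(model(X_batch), y_batch)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * X_batch.size(0)
    print(f"Epoch {epoch + 1}: mean train loss {running_loss / len(X):.4f}")

model.eval()
with torch.no_grad():
    accuracy = (model(X).argmax(dim=1) == y).float().mean().item()
print(f"Train accuracy: {accuracy:.3f}")
```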

In [None]:
# Plot training progress
plt.figure(figsize=(10, 6))
plt.plot(train_losses, label='Training Loss')
plt.plot(val_losses, label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training Progress')
plt.legend()
plt.grid(True)
plt.show()

### 6.3 Model Evaluation

In [None]:
# Evaluate the model
model.eval()
with torch.no_grad():
    # Training accuracy
    train_outputs = model(X_train)
    train_predictions = torch.argmax(train_outputs, dim=1)
    train_accuracy = (train_predictions == y_train).float().mean()
    
    # Validation accuracy
    val_outputs = model(X_val)
    val_predictions = torch.argmax(val_outputs, dim=1)
    val_accuracy = (val_predictions == y_val).float().mean()

print(f"Training Accuracy: {train_accuracy:.4f} ({train_accuracy*100:.2f}%)")
print(f"Validation Accuracy: {val_accuracy:.4f} ({val_accuracy*100:.2f}%)")

## 7. Complete Example: Iris Dataset Classification

Let's put everything together with a real dataset - the famous Iris dataset.

In [None]:
# Load and prepare the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

print(f"Dataset shape: {X.shape}")
print(f"Number of classes: {len(np.unique(y))}")
print(f"Feature names: {iris.feature_names}")
print(f"Class names: {iris.target_names}")
print()

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Convert to PyTorch tensors
X_train_tensor = torch.tensor(X_train_scaled, dtype=torch.float32)
X_test_tensor = torch.tensor(X_test_scaled, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.long)
y_test_tensor = torch.tensor(y_test, dtype=torch.long)

print(f"Training set size: {X_train_tensor.shape[0]}")
print(f"Test set size: {X_test_tensor.shape[0]}")

In [None]:
# Define the neural network for Iris classification
class IrisNet(nn.Module):
    def __init__(self):
        super(IrisNet, self).__init__()
        self.fc1 = nn.Linear(4, 16)  # 4 input features
        self.fc2 = nn.Linear(16, 8)
        self.fc3 = nn.Linear(8, 3)   # 3 output classes
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.2)
    
    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.relu(self.fc2(x))
        x = self.dropout(x)
        x = self.fc3(x)
        return x

# Create the model
iris_model = IrisNet()
print("Iris Classification Model:")
print(iris_model)
print()

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(iris_model.parameters(), lr=0.01)

print(f"Total parameters: {sum(p.numel() for p in iris_model.parameters())}")
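`IrisNet` uses dropout, which behaves differently in training and evaluation mode. That is why `model.train()` and `model.eval()` matter: in training mode dropout zeroes each element with probability `p` and scales the survivors by 1/(1−p); in eval mode it does nothing. A small sketch:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.2)
x = torch.ones(1, 10)

dropout.train()
train_out = dropout(x)
print(train_out)   # some entries 0, the rest scaled to 1 / (1 - 0.2) = 1.25

dropout.eval()
eval_out = dropout(x)
print(eval_out)    # unchanged: all ones
```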

In [None]:
# Training the Iris model
num_epochs = 200
train_losses = []
train_accuracies = []

iris_model.train()
for epoch in range(num_epochs):
    # Forward pass
    outputs = iris_model(X_train_tensor)
    loss = criterion(outputs, y_train_tensor)
    
    # Backward pass
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
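    # Calculate accuracy from this epoch's outputs (dropout is active here,
    # so this can slightly understate the accuracy measured in eval mode)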
    with torch.no_grad():
        predicted = torch.argmax(outputs, dim=1)
        accuracy = (predicted == y_train_tensor).float().mean()
    
    train_losses.append(loss.item())
    train_accuracies.append(accuracy.item())
    
    if (epoch + 1) % 50 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}, Accuracy: {accuracy.item():.4f}')

print("\nTraining completed!")

In [None]:
# Plot training progress
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))

# Plot loss
ax1.plot(train_losses)
ax1.set_title('Training Loss')
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Loss')
ax1.grid(True)

# Plot accuracy
ax2.plot(train_accuracies)
ax2.set_title('Training Accuracy')
ax2.set_xlabel('Epoch')
ax2.set_ylabel('Accuracy')
ax2.grid(True)

plt.tight_layout()
plt.show()

In [None]:
# Evaluate on test set
iris_model.eval()
with torch.no_grad():
    test_outputs = iris_model(X_test_tensor)
    test_predictions = torch.argmax(test_outputs, dim=1)
    test_accuracy = (test_predictions == y_test_tensor).float().mean()
    
    # Get probabilities using softmax
    test_probabilities = torch.softmax(test_outputs, dim=1)

print(f"Test Accuracy: {test_accuracy:.4f} ({test_accuracy*100:.2f}%)")
print()

# Show some predictions
print("Sample predictions:")
print("Actual -> Predicted (Probabilities)")
for i in range(min(10, len(y_test))):
    actual = iris.target_names[y_test[i]]
    predicted = iris.target_names[test_predictions[i]]
    probs = test_probabilities[i].numpy()
    print(f"{actual:10} -> {predicted:10} {probs}")

In [None]:
# Make predictions on new data
def predict_iris_class(model, scaler, features):
    """
    Predict iris class for given features
    features: [sepal_length, sepal_width, petal_length, petal_width]
    """
    model.eval()
    with torch.no_grad():
        # Normalize features
        features_scaled = scaler.transform([features])
        features_tensor = torch.tensor(features_scaled, dtype=torch.float32)
        
        # Make prediction
        output = model(features_tensor)
        probabilities = torch.softmax(output, dim=1)
        predicted_class = torch.argmax(output, dim=1)
        
        return predicted_class.item(), probabilities.numpy()[0]

# Example predictions
examples = [
    [5.1, 3.5, 1.4, 0.2],  # Typical setosa
    [7.0, 3.2, 4.7, 1.4],  # Typical versicolor
    [7.2, 3.0, 5.8, 1.6],  # Typical virginica
]

print("Predictions for new samples:")
print("Features [SL, SW, PL, PW] -> Prediction (Probabilities)")
for features in examples:
    pred_class, probs = predict_iris_class(iris_model, scaler, features)
    class_name = iris.target_names[pred_class]
    print(f"{features} -> {class_name} {probs}")

## 8. Best Practices and Next Steps

### Key Takeaways:

1. **Tensors are fundamental** - They're like NumPy arrays but with GPU support and automatic differentiation

2. **Autograd is powerful** - PyTorch automatically computes gradients for you

3. **nn.Module is the base** - Always inherit from this when building models

4. **Training loop structure**:
   - Forward pass
   - Compute loss
   - Zero gradients
   - Backward pass
   - Update parameters

5. **Don't forget to**:
   - Set model to train/eval mode
   - Use `torch.no_grad()` for inference
   - Normalize your data
   - Monitor training progress

### Common Debugging Tips:

- Check tensor shapes frequently
- Ensure data types match (float32 for inputs, long (int64) class indices as targets for `nn.CrossEntropyLoss`)
- Watch out for in-place operations when computing gradients
- Use `model.train()` and `model.eval()` appropriately
- Don't forget to zero gradients before backward pass

### Next Steps to Learn:

1. **Data Loading**: `torch.utils.data.DataLoader` for efficient batch processing
2. **Convolutional Networks**: For computer vision tasks
3. **Recurrent Networks**: For sequence data
4. **Transfer Learning**: Using pre-trained models
5. **GPU Computing**: Moving tensors and models to GPU
6. **Saving/Loading Models**: Model persistence
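As a first taste of model persistence, the usual approach is to save the model's `state_dict` (its learned parameters) rather than the whole Python object, then rebuild the same architecture and load the parameters back. A sketch; the file name is a hypothetical example:

```python
import os
import tempfile
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))

# Hypothetical file location for this demo
path = os.path.join(tempfile.gettempdir(), "iris_model_demo.pt")
torch.save(model.state_dict(), path)

# Rebuild the same architecture, then copy the saved parameters into it.
# weights_only=True restricts loading to tensors and basic containers.
restored = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))
restored.load_state_dict(torch.load(path, weights_only=True))
restored.eval()

x = torch.randn(2, 4)
print(torch.equal(model(x), restored(x)))  # True
```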

### Useful Resources:

- [PyTorch Documentation](https://pytorch.org/docs/stable/index.html)
- [PyTorch Tutorials](https://pytorch.org/tutorials/)
- [PyTorch Examples](https://github.com/pytorch/examples)
- [Deep Learning with PyTorch](https://pytorch.org/deep-learning-with-pytorch)

## 🎯 Exercise for You!

Try modifying the Iris classification example:

1. Change the network architecture (add more layers, change sizes)
2. Try different activation functions (Tanh, LeakyReLU)
3. Experiment with different optimizers and learning rates
4. Add regularization techniques (different dropout rates)
5. Create visualizations of the decision boundaries

Happy learning with PyTorch! 🚀