### PyTorch - Deep Learning Framework
- What is PyTorch?

`PyTorch` is a deep learning framework that enables building, training, and deploying neural networks. It's the backbone for training the U-Net segmentation model in VisionExtract.

- Why PyTorch for VisionExtract?

  - Dynamic computation graphs: the graph is built as the code runs, so ordinary Python control flow (loops, `if` statements) can appear in a model's forward pass

  - Pythonic: Easy to debug and experiment with

  - Production-ready: Used by major companies (Tesla, Meta, etc.)

  - Ecosystem: Rich library of pre-trained models and utilities
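To illustrate what a dynamic graph means in practice, here is a minimal sketch. The `DynamicNet` class and its `max_repeats` parameter are hypothetical toy names, not part of VisionExtract. Its forward pass uses a plain Python loop whose length depends on the input data, and autograd still records whichever operations actually ran, so `backward()` works without any special graph-building API.

```python
import torch
import torch.nn as nn


class DynamicNet(nn.Module):
    """Hypothetical toy model: how often the layer is applied depends on the data."""

    def __init__(self, features: int = 4, max_repeats: int = 3):
        super().__init__()
        self.layer = nn.Linear(features, features)
        self.max_repeats = max_repeats

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Ordinary Python control flow: the loop count is decided at run time
        repeats = 1 if x.sum() < 0 else self.max_repeats
        for _ in range(repeats):
            x = torch.relu(self.layer(x))
        return x


torch.manual_seed(0)
model = DynamicNet()
x = torch.randn(2, 4)
out = model(x)
out.sum().backward()  # autograd follows whichever path actually ran
print(out.shape)
print(model.layer.weight.grad.shape)
```

A static-graph framework would need a dedicated conditional or loop operator for the same behavior; here the model is just Python.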
- #### Installation

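Install PyTorch with one of the following commands (the second pulls wheels built for CUDA 11.8, for NVIDIA GPUs):

```python
# Option 1: default build from PyPI
!pip install torch torchvision torchaudio

# Option 2: build with GPU support (CUDA 11.8)
# !pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```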
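After installing, a quick sanity check confirms that PyTorch imports and whether it can see a GPU. This is a sketch; the exact version strings depend on your environment.

```python
import torch

# Installed PyTorch version string
print(f"PyTorch version: {torch.__version__}")
# CUDA version the build was compiled against (None for CPU-only builds)
print(f"Built with CUDA: {torch.version.cuda}")
# Whether a usable GPU is visible at run time
print(f"CUDA available: {torch.cuda.is_available()}")
```

If `Built with CUDA` shows a version but `CUDA available` is `False`, the GPU build is installed but no compatible GPU or driver was found.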

#### 1. Tensors - PyTorch's Core Data Structure

- `Tensors` are multi-dimensional arrays, similar to NumPy arrays but able to run on GPUs and to record operations for automatic differentiation.

```python
import torch
import numpy as np

# Create tensors from scratch
tensor_zeros = torch.zeros(2, 3)  # 2x3 zeros
print(f"Zeros:\n{tensor_zeros}")

tensor_ones = torch.ones(2, 3)    # 2x3 ones
print(f"\nOnes:\n{tensor_ones}")

# Create a tensor from a Python list
tensor_from_list = torch.tensor([[1, 2, 3], [4, 5, 6]])
print(f"\nFrom list:\n{tensor_from_list}")

# NumPy -> PyTorch conversion (torch.from_numpy shares memory with the array)
numpy_array = np.array([[1, 10], [15, 4]])
tensor_from_numpy = torch.from_numpy(numpy_array)
print(f"\nFrom NumPy:\n{tensor_from_numpy}")

# PyTorch -> NumPy (works for CPU tensors and also shares memory)
back_to_numpy = tensor_from_numpy.numpy()
print(f"\nBack to NumPy:\n{back_to_numpy}")
```

```
Zeros:
tensor([[0., 0., 0.],
        [0., 0., 0.]])

Ones:
tensor([[1., 1., 1.],
        [1., 1., 1.]])

From list:
tensor([[1, 2, 3],
        [4, 5, 6]])

From NumPy:
tensor([[ 1, 10],
        [15,  4]])

Back to NumPy:
[[ 1 10]
 [15  4]]
```
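One detail about the conversion above: `torch.from_numpy()` and `.numpy()` share memory with the source object, whereas `torch.tensor()` always copies. The short sketch below shows the difference, which matters when an image array is modified in place after conversion.

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0])

shared = torch.from_numpy(arr)  # shares memory with arr
copied = torch.tensor(arr)      # independent copy of the data

arr[0] = 100.0  # modify the NumPy array in place

print(shared)  # reflects the change: first element is now 100
print(copied)  # unchanged: first element is still 1
```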


- #### Key Properties:

```python
tensor = torch.randn(2, 3, 224, 224)  # Batch of 2 images, 3 channels, 224x224

print(f"Shape: {tensor.shape}")        # torch.Size([2, 3, 224, 224])
print(f"Data type: {tensor.dtype}")    # torch.float32
print(f"Device: {tensor.device}")      # cpu or cuda
print(f"Requires grad: {tensor.requires_grad}")  # False by default
```

```
Shape: torch.Size([2, 3, 224, 224])
Data type: torch.float32
Device: cpu
Requires grad: False
```
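The shape `(2, 3, 224, 224)` follows PyTorch's `(batch, channels, height, width)` convention. Images loaded with NumPy-based libraries usually arrive as `(height, width, channels)` `uint8` arrays, so a typical preprocessing step reorders and rescales them. A minimal sketch, using a random array as a stand-in for a loaded image (the variable names are illustrative):

```python
import numpy as np
import torch

# Stand-in for an image loaded as an (H, W, C) uint8 array
image_hwc = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)

tensor = torch.from_numpy(image_hwc)  # shape (224, 224, 3), dtype torch.uint8
tensor = tensor.permute(2, 0, 1)      # reorder to (C, H, W)
tensor = tensor.float() / 255.0       # convert to float32 in [0, 1]
batch = tensor.unsqueeze(0)           # add a batch dimension -> (1, C, H, W)

print(batch.shape)  # torch.Size([1, 3, 224, 224])
print(batch.dtype)  # torch.float32
```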


#### 2. Moving Tensors to GPU

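  - GPUs can speed up neural network training dramatically (often by an order of magnitude or more) because they execute the large matrix multiplications at the core of training in parallel. The actual gain depends on the hardware, the model size, and the batch size.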

```python
import time

import torch

# Check GPU availability
print(f"CUDA Availability: {torch.cuda.is_available()}")
print(f"GPU count: {torch.cuda.device_count()}")
if torch.cuda.is_available():
    print(f"GPU name: {torch.cuda.get_device_name(0)}")

# Select the GPU if present, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Move a tensor to the selected device (copies the data if the device differs)
tensor_cpu = torch.randn(1000, 1000)
tensor_gpu = tensor_cpu.to(device)
print(f"CPU tensor device: {tensor_cpu.device}")
print(f"GPU tensor device: {tensor_gpu.device}")

# CPU operation
start = time.perf_counter()
result_cpu = torch.matmul(tensor_cpu, tensor_cpu)
cpu_time = time.perf_counter() - start

# GPU operation: CUDA kernels run asynchronously, so warm up once and
# synchronize before reading the clock; otherwise only the launch is timed
if device.type == 'cuda':
    torch.matmul(tensor_gpu, tensor_gpu)
    torch.cuda.synchronize()
start = time.perf_counter()
result_gpu = torch.matmul(tensor_gpu, tensor_gpu)
if device.type == 'cuda':
    torch.cuda.synchronize()
gpu_time = time.perf_counter() - start

print(f"\nCPU time: {cpu_time:.4f}s")
print(f"GPU time: {gpu_time:.4f}s")  # data already on the device; transfer not timed
print(f"Speedup: {cpu_time/gpu_time:.1f}x")
```

```
CUDA Availability: False
GPU count: 0
Using device: cpu
CPU tensor device: cpu
GPU tensor device: cpu

CPU time: 0.0211s
GPU time: 0.0208s
Speedup: 1.0x
```

This output was recorded on a machine without CUDA, so `device` fell back to the CPU and both multiplications ran there; that is why the speedup is 1.0x.
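A single timed call is noisy, and CUDA work is asynchronous, so a fairer comparison warms up the device, repeats the operation, and synchronizes before reading the clock. A reusable helper could look like the sketch below; `time_matmul` and its parameters are hypothetical names, and on a machine without CUDA only the CPU is timed.

```python
import time

import torch


def time_matmul(device: torch.device, size: int = 1000, repeats: int = 10) -> float:
    """Hypothetical helper: average seconds per (size x size) matmul on `device`."""
    x = torch.randn(size, size, device=device)

    def sync():
        # Wait for queued CUDA work to finish; nothing to wait for on the CPU
        if device.type == "cuda":
            torch.cuda.synchronize(device)

    torch.matmul(x, x)  # warm-up run (CUDA context setup, kernel selection)
    sync()

    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(x, x)
    sync()
    return (time.perf_counter() - start) / repeats


cpu_time = time_matmul(torch.device("cpu"))
print(f"CPU: {cpu_time * 1000:.2f} ms per matmul")
if torch.cuda.is_available():
    gpu_time = time_matmul(torch.device("cuda"))
    print(f"GPU: {gpu_time * 1000:.2f} ms per matmul ({cpu_time / gpu_time:.1f}x faster)")
```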


#### 3. Building Neural Networks with nn.Module

 - `PyTorch` models are classes that inherit from `nn.Module`: layers are created in `__init__` and the computation is defined in `forward`. This is the standard way to define neural networks.

```python
import torch
import torch.nn as nn

# Simple two-layer fully connected network
class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()

        # Define layers
        self.fc1 = nn.Linear(input_size, hidden_size)   # Input -> Hidden
        self.relu = nn.ReLU()                           # Non-linear activation
        self.fc2 = nn.Linear(hidden_size, num_classes)  # Hidden -> Output

    def forward(self, x):
        """Forward pass: defines how data flows through the network."""
        x = self.fc1(x)   # Linear layer
        x = self.relu(x)  # ReLU activation
        x = self.fc2(x)   # Output layer (raw logits, no activation)
        return x

# Create model instance
model = SimpleNet(input_size=784, hidden_size=128, num_classes=10)

# Test forward pass
dummy_input = torch.randn(32, 784)  # Batch of 32 flattened 28x28 images (784 features each)
output = model(dummy_input)
print(f"Input shape: {dummy_input.shape}")
print(f"Output shape: {output.shape}")
```

```
Input shape: torch.Size([32, 784])
Output shape: torch.Size([32, 10])
```
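The forward pass is only half of the picture: training repeatedly computes a loss, back-propagates, and updates the weights. A minimal sketch of such a loop, using the same layer sizes as `SimpleNet` but random stand-in data, the Adam optimizer, and an arbitrary learning rate and step count (all assumptions for illustration, not VisionExtract's training settings):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same layer sizes as SimpleNet: 784 -> 128 -> ReLU -> 10
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

# Random stand-in data: 32 samples with integer class labels in [0, 10)
inputs = torch.randn(32, 784)
targets = torch.randint(0, 10, (32,))

criterion = nn.CrossEntropyLoss()  # expects raw logits and class indices
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

losses = []
for step in range(50):
    optimizer.zero_grad()              # clear gradients from the previous step
    logits = model(inputs)             # forward pass
    loss = criterion(logits, targets)  # scalar loss
    loss.backward()                    # back-propagate
    optimizer.step()                   # update the weights
    losses.append(loss.item())

print(f"Loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Because the output layer returns raw logits, `nn.CrossEntropyLoss` applies the softmax internally; adding a softmax or sigmoid in `forward` would be redundant here.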


The next cell places different activation functions between two linear layers and feeds the same normalized input through each network. The weights are freshly initialized and untrained, so the outputs only show that the choice of activation changes the result; they say nothing about which activation trains best.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def properly_test_activations():
    # Set random seed for reproducibility
    torch.manual_seed(42)

    # Create normalized input
    x = torch.randn(1, 10)     # Batch size 1, 10 features
    x = F.normalize(x, dim=1)  # Scale the row to unit L2 norm

    print("Input range:", x.min().item(), "to", x.max().item())
    print()

    # Test different activations
    activations = {
        'ReLU': nn.ReLU(),
        'LeakyReLU': nn.LeakyReLU(0.1),
        'Tanh': nn.Tanh(),
        'Sigmoid': nn.Sigmoid(),
        'ELU': nn.ELU(),
        'GELU': nn.GELU()
    }

    for name, activation in activations.items():
        # Create a fresh model each time
        model = nn.Sequential(
            nn.Linear(10, 5),
            activation,
            nn.Linear(5, 1)
        )

        # Reinitialize weights (Xavier) and zero the biases
        for layer in model:
            if hasattr(layer, 'weight'):
                nn.init.xavier_uniform_(layer.weight)
                nn.init.zeros_(layer.bias)

        output = model(x)
        print(f"{name:12}: {output.item():.6f}")

# Run the test
properly_test_activations()
```

```
Input range: -0.42136189341545105 to 0.8286473155021667

ReLU        : 0.023503
LeakyReLU   : 0.051662
Tanh        : -0.392495
Sigmoid     : -0.062478
ELU         : -0.393911
GELU        : 0.703459
```
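Since the numbers above come from untrained networks, it can be more instructive to apply each activation directly to a few fixed inputs. A short sketch using the functional forms from `torch.nn.functional` (the input values are chosen arbitrarily):

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

activations = {
    "ReLU": F.relu(x),                  # zeroes out negative inputs
    "LeakyReLU": F.leaky_relu(x, 0.1),  # keeps a small slope (0.1) for negatives
    "Tanh": torch.tanh(x),              # squashes into (-1, 1)
    "Sigmoid": torch.sigmoid(x),        # squashes into (0, 1)
    "ELU": F.elu(x),                    # smooth; approaches -1 for large negatives
    "GELU": F.gelu(x),                  # smooth; scales x by the Gaussian CDF
}

for name, values in activations.items():
    print(f"{name:10}: {[round(v, 4) for v in values.tolist()]}")
```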


- Teaching point: the `SimpleNet` architecture:

```
Input (32, 784)
      ↓
Linear(784 → 128)
      ↓
ReLU
      ↓
Linear(128 → 10)
      ↓
Output (32, 10)
```
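The diagram can be checked in code by passing a batch through the layers one at a time and counting the trainable parameters. The sketch rebuilds the same stack with `nn.Sequential` (equivalent to `SimpleNet`'s layers, not the class itself). The first linear layer has 784 × 128 weights plus 128 biases and the second has 128 × 10 weights plus 10 biases, for 101,770 parameters in total.

```python
import torch
import torch.nn as nn

layers = nn.Sequential(
    nn.Linear(784, 128),  # 784*128 weights + 128 biases
    nn.ReLU(),            # no parameters
    nn.Linear(128, 10),   # 128*10 weights + 10 biases
)

x = torch.randn(32, 784)
print(f"Input: {tuple(x.shape)}")
for layer in layers:
    x = layer(x)
    print(f"{layer.__class__.__name__:>6} -> {tuple(x.shape)}")

total_params = sum(p.numel() for p in layers.parameters() if p.requires_grad)
print(f"Trainable parameters: {total_params}")  # 101770
```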
