<div style="text-align:left;">
  <a href="https://code213.tech/" target="_blank">
    <img src="../code213.PNG" alt="code213">
  </a>
  <p><em>prepared by Latreche Sara</em></p>
</div>

# 10.6 - GPU Acceleration in PyTorch

PyTorch allows you to **accelerate computations** by using GPUs instead of CPUs.  
GPUs are specialized for **parallel computations**, which makes them ideal for deep learning.  

### Key Points
1. **CPU vs GPU**
   - CPU: Handles sequential tasks efficiently  
   - GPU: Handles many tasks in parallel → faster matrix and tensor operations  

2. **Tensor Device**
   - Tensors can be on CPU (`torch.device('cpu')`) or GPU (`torch.device('cuda')`)  
   - Move tensors between devices using `.to(device)`  

3. **Why GPU speeds up training**
   - Neural network operations (matrix multiplications, convolutions) are **highly parallelizable**  
   - A GPU can execute thousands of these operations simultaneously  

4. **Optional: TPU**
   - Tensor Processing Units (TPUs) are accelerators designed by Google for tensor operations, available mainly through Google Cloud

---

In this notebook, we will cover:  
- Checking if GPU is available  
- Moving tensors and models to GPU  
- Performing operations on GPU  
- Comparing CPU vs GPU speed


## Table of Contents

- [1 - Checking Device](#1)
- [2 - Moving Tensors to GPU](#2)
- [3 - Moving Models to GPU](#3)
- [4 - Operations on GPU](#4)
- [5 - Comparing CPU vs GPU](#5)
- [6 - Practice Exercises](#6)


<a name='1'></a>
## 1 - Checking Device

Before using a GPU, we need to check whether a CUDA-enabled GPU is available.  

### Key Points
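- `torch.cuda.is_available()` returns `True` if PyTorch can use a CUDA GPU (this typically requires a supported GPU, a working driver, and a CUDA-enabled PyTorch build)  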
- `torch.device('cuda')` represents a GPU device  
- `torch.device('cpu')` represents the CPU  

This allows you to **write device-agnostic code**, which works on both CPU and GPU.


In [1]:
import torch

# Check if GPU is available
gpu_available = torch.cuda.is_available()
print("GPU available:", gpu_available)

# Set device
device = torch.device('cuda' if gpu_available else 'cpu')
print("Using device:", device)

# Create tensor on the chosen device
x = torch.tensor([1.0, 2.0, 3.0], device=device)
print("Tensor:", x)
print("Tensor device:", x.device)


GPU available: False
Using device: cpu
Tensor: tensor([1., 2., 3.])
Tensor device: cpu
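
The device check above can be wrapped in a small helper so the rest of a script never hard-codes a device. The following is a minimal sketch, not a standard PyTorch utility: `pick_device` and `describe_device` are hypothetical names introduced here for illustration.

```python
import torch


def pick_device() -> torch.device:
    """Return the CUDA device if PyTorch can use one, otherwise the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


def describe_device(device: torch.device) -> str:
    """Build a short human-readable description of the chosen device."""
    if device.type == "cuda":
        # torch.device('cuda') has no index; fall back to the current device.
        index = device.index if device.index is not None else torch.cuda.current_device()
        return f"cuda:{index} ({torch.cuda.get_device_name(index)})"
    return "cpu"


device = pick_device()
print("Visible CUDA GPUs:", torch.cuda.device_count())
print("Using:", describe_device(device))

# Creating the tensor with device=... avoids a separate copy step.
x = torch.ones(2, 2, device=device)
print("Tensor device:", x.device)
```

On a machine without a GPU this reports zero visible CUDA devices and falls back to `cpu`, so the same script runs unchanged in both environments.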


<a name='2'></a>
## 2 - Moving Tensors to GPU

You can move tensors between CPU and GPU using the `.to()` method or `.cuda()` / `.cpu()`.

### Key Points
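- `tensor.to(device)` returns a copy of the tensor on the specified device (the original tensor is not modified; if it is already on that device, the same tensor is returned)  
- `tensor.cuda()` returns a copy on the GPU and raises an error if no CUDA device is available  
- `tensor.cpu()` returns a copy on the CPU  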

This is useful for ensuring that **all tensors and models are on the same device** to avoid errors during computation.


In [2]:
import torch

# Create tensor on CPU
x_cpu = torch.randn(3, 3)
print("Original tensor device:", x_cpu.device)

# Move tensor to the selected device (GPU if available, otherwise it stays on CPU)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x_gpu = x_cpu.to(device)
print("Tensor device after .to(device):", x_gpu.device)

# Move tensor back to CPU
x_back_cpu = x_gpu.cpu()
print("Tensor back on CPU:", x_back_cpu.device)


Original tensor device: cpu
Tensor device after .to(device): cpu
Tensor back on CPU: cpu
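
In practice a batch is often a dictionary or tuple of tensors rather than a single tensor, and every tensor in it has to end up on the same device. The sketch below shows one way to handle that; `move_to_device` is a hypothetical helper written for this example, not a PyTorch function, and it assumes the batch contains only tensors, dicts, lists, plain tuples, and other values that can be left untouched.

```python
import torch


def move_to_device(obj, device):
    """Recursively move every tensor inside obj to device.

    Handles tensors, dicts, lists, and plain tuples.
    Anything else is returned unchanged.
    """
    if isinstance(obj, torch.Tensor):
        return obj.to(device)
    if isinstance(obj, dict):
        return {key: move_to_device(value, device) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(move_to_device(item, device) for item in obj)
    return obj


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

batch = {
    "inputs": torch.randn(4, 3),
    "targets": torch.randint(0, 2, (4,)),
    "meta": ["a", "b", "c", "d"],  # non-tensor data is left as-is
}
batch = move_to_device(batch, device)
print({k: v.device for k, v in batch.items() if isinstance(v, torch.Tensor)})
```

Because `.to(device)` returns a new tensor instead of modifying the original, the helper returns a new container and the result has to be assigned back, as done with `batch` here.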


<a name='3'></a>
## 3 - Moving Models to GPU

To leverage GPU acceleration, both **tensors** and **models** must be on the same device.  

### Key Points
- Move a model to the GPU using `model.to(device)`; for an `nn.Module` this moves its parameters and buffers in place  
- Ensure all inputs are on the same device as the model  
- Forward and backward passes will then automatically run on that device  

This lets you train models faster, with the largest gains for big models and big batches.


In [3]:
import torch
import torch.nn as nn

# Define a simple model
model = nn.Linear(3, 1)
print("Original model device:", next(model.parameters()).device)

# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Move model to GPU
model = model.to(device)
print("Model moved to device:", next(model.parameters()).device)

# Create input tensor directly on the same device (avoids an extra copy)
x = torch.randn(2, 3, device=device)

# Forward pass on the selected device
output = model(x)
print("Output on device:", output.device)


Original model device: cpu
Model moved to device: cpu
Output on device: cpu
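
To illustrate that the backward pass also runs on the model's device, here is a minimal training sketch. The toy regression data, learning rate, and number of steps are assumptions chosen only to keep the example small.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(3, 1).to(device)          # parameters now live on `device`
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

# Toy regression data, created directly on the same device as the model.
inputs = torch.randn(64, 3, device=device)
targets = inputs.sum(dim=1, keepdim=True)

losses = []
for step in range(50):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)  # forward pass on `device`
    loss.backward()                         # gradients are computed on `device`
    optimizer.step()
    losses.append(loss.item())              # .item() copies the scalar to the host

print("First loss:", losses[0])
print("Last loss:", losses[-1])
print("Gradient device:", model.weight.grad.device)
```

The only device-specific lines are the ones that create the model and the data; the training loop itself is identical on CPU and GPU.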


<a name='4'></a>
## 4 - Operations on GPU

Once tensors and models are on GPU, all operations are **performed on the GPU**.  

### Key Points
- PyTorch runs an operation on the GPU whenever its input tensors are on a `cuda` device, and the result stays on that device  
- You can perform standard operations: addition, multiplication, matrix multiplication, etc.  
- Always ensure **all involved tensors are on the same device** to avoid runtime errors  

Example operations:
- Element-wise addition: `x + y`  
- Matrix multiplication: `torch.matmul(a, b)`  
- Activation functions: `torch.relu(tensor)`


In [4]:
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Create tensors on the selected device (GPU if available)
a = torch.randn(3, 3, device=device)
b = torch.randn(3, 3, device=device)

# Element-wise addition
c = a + b
print("Element-wise addition result device:", c.device)

# Matrix multiplication
d = torch.matmul(a, b)
print("Matrix multiplication result device:", d.device)

# Applying activation function
e = torch.relu(d)
print("ReLU result device:", e.device)


Element-wise addition result device: cpu
Matrix multiplication result device: cpu
ReLU result device: cpu
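
The runtime error mentioned in the key points can be reproduced deliberately. The sketch below only triggers it when a CUDA device is present, since on a CPU-only machine every tensor is on the same device anyway; `try_mixed_device_add` is a hypothetical helper name used for this illustration.

```python
import torch


def try_mixed_device_add():
    """Add a CPU tensor to a CUDA tensor and report what happens.

    Returns the error message, or None when no CUDA device is available.
    """
    if not torch.cuda.is_available():
        return None  # nothing to demonstrate on a CPU-only machine
    cpu_tensor = torch.ones(3)
    gpu_tensor = torch.ones(3, device="cuda")
    try:
        cpu_tensor + gpu_tensor
    except RuntimeError as err:
        return str(err)
    return "no error raised"


message = try_mixed_device_add()
if message is None:
    print("No CUDA device: skipping the mixed-device demonstration.")
else:
    print("Mixed-device addition:", message)

# The fix is to bring both operands to one device before combining them.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
a = torch.ones(3)                  # created on the CPU
b = torch.ones(3, device=device)   # created on the target device
total = a.to(device) + b
print("Sum device:", total.device)
```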


<a name='5'></a>
## 5 - Comparing CPU vs GPU

It is often useful to **measure the speedup** gained by using a GPU.  

### Key Points
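- A GPU gives the largest speedup on **large tensor operations**, where there is enough parallel work to keep it busy  
- Small operations may show little or no speedup because of kernel-launch and data-transfer overhead  
- CUDA operations run asynchronously, so call `torch.cuda.synchronize()` before reading the clock to measure GPU time accurately  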

Example operations to compare:
- Matrix multiplication of large tensors  
- Forward pass of a simple model


In [5]:
import torch
import time

# Set device
device_cpu = torch.device('cpu')
device_gpu = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Large tensors
size = 10000
x_cpu = torch.randn(size, size, device=device_cpu)
y_cpu = torch.randn(size, size, device=device_cpu)

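# CPU timing (perf_counter is a monotonic, high-resolution clock)
start_cpu = time.perf_counter()
z_cpu = torch.matmul(x_cpu, y_cpu)
end_cpu = time.perf_counter()
print("CPU time:", end_cpu - start_cpu, "seconds")

if device_gpu.type == 'cuda':
    # Move tensors to GPU
    x_gpu = x_cpu.to(device_gpu)
    y_gpu = y_cpu.to(device_gpu)

    # Warm-up run so one-time CUDA initialization is not timed
    torch.matmul(x_gpu, y_gpu)

    # GPU timing
    torch.cuda.synchronize()  # Wait for pending GPU work before starting the clock
    start_gpu = time.perf_counter()
    z_gpu = torch.matmul(x_gpu, y_gpu)
    torch.cuda.synchronize()  # Wait for the matmul to finish before stopping the clock
    end_gpu = time.perf_counter()
    print("GPU time:", end_gpu - start_gpu, "seconds")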


CPU time: 41.122729539871216 seconds
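
A single timed run can be noisy, so a common refinement is to add a warm-up call and average over several repetitions. The sketch below is one possible way to do this; `time_matmul`, the default size of 512, and the repeat count are assumptions picked so the example finishes quickly, not values from the measurement above.

```python
import time

import torch


def time_matmul(device, size=512, repeats=5):
    """Return the average seconds per (size x size) matmul on device."""
    device = torch.device(device)
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)

    def sync():
        # CUDA kernels are launched asynchronously; wait for them to finish.
        if device.type == "cuda":
            torch.cuda.synchronize(device)

    torch.matmul(a, b)  # warm-up run, excluded from the measurement
    sync()

    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(a, b)
    sync()
    return (time.perf_counter() - start) / repeats


cpu_seconds = time_matmul("cpu")
print(f"CPU: {cpu_seconds:.6f} s per matmul")

if torch.cuda.is_available():
    gpu_seconds = time_matmul("cuda")
    print(f"GPU: {gpu_seconds:.6f} s per matmul")
    print(f"Speedup: {cpu_seconds / gpu_seconds:.1f}x")
```

Calling `time_matmul` with increasing `size` values is a simple way to see the point at which the GPU starts to pull ahead of the CPU.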


<a name='6'></a>
## 6 - Practice Exercises

Try the following exercises to reinforce your understanding of **GPU usage in PyTorch**:



### **Exercise 1: Check GPU**
- Write a script to check if a CUDA-enabled GPU is available.  
- Set the device accordingly.



### **Exercise 2: Move Tensors**
- Create a tensor on CPU and move it to GPU.  
- Perform a matrix multiplication on the GPU.  
- Move the result back to CPU.



### **Exercise 3: Move Model**
- Create a simple linear model (input size 4 → output size 2).  
- Move the model to GPU.  
- Create a sample input tensor and perform a forward pass.



### **Exercise 4: Timing CPU vs GPU**
- Create two large random tensors (size 5000 x 5000).  
- Compute the matrix multiplication on CPU and GPU.  
- Measure and compare the time.


In [None]:
import torch
import torch.nn as nn
import time

# ----------------------------
# Exercise 1: Check GPU
# ----------------------------
gpu_available = torch.cuda.is_available()
device = torch.device('cuda' if gpu_available else 'cpu')
print("GPU available:", gpu_available)
print("Using device:", device)

# ----------------------------
# Exercise 2: Move Tensors
# ----------------------------
x_cpu = torch.randn(3, 3)
y_cpu = torch.randn(3, 3)
x_gpu = x_cpu.to(device)
y_gpu = y_cpu.to(device)
z_gpu = torch.matmul(x_gpu, y_gpu)
z_cpu = z_gpu.cpu()
print("Result on CPU after GPU matmul:", z_cpu)

# ----------------------------
# Exercise 3: Move Model
# ----------------------------
model = nn.Linear(4, 2)
model = model.to(device)
input_tensor = torch.randn(1, 4, device=device)
output = model(input_tensor)
print("Model output:", output)
print("Model output device:", output.device)

# ----------------------------
# Exercise 4: Timing CPU vs GPU
# ----------------------------
size = 5000
x = torch.randn(size, size)
y = torch.randn(size, size)

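# CPU timing
start_cpu = time.perf_counter()
z_cpu = torch.matmul(x, y)
end_cpu = time.perf_counter()
print("CPU time:", end_cpu - start_cpu, "seconds")

if gpu_available:
    x_gpu = x.to(device)
    y_gpu = y.to(device)
    torch.matmul(x_gpu, y_gpu)  # Warm-up run, not timed
    torch.cuda.synchronize()    # Wait for pending GPU work before starting the clock
    start_gpu = time.perf_counter()
    z_gpu = torch.matmul(x_gpu, y_gpu)
    torch.cuda.synchronize()    # Wait for the matmul to finish before stopping the clock
    end_gpu = time.perf_counter()
    print("GPU time:", end_gpu - start_gpu, "seconds")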
