---
# GPU Memory Stress Test
---

In [2]:
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

In [3]:
import torch
import time

In [5]:
# Check if CUDA is available
if torch.cuda.is_available():
    print("✅ CUDA is available!")
    device = torch.device("cuda")
    print(f"Using GPU: {torch.cuda.get_device_name(device)}")
else:
    print("❌ CUDA is NOT available. Using CPU.")
    device = torch.device("cpu")

✅ CUDA is available!
Using GPU: NVIDIA GeForce GTX 1650


---

### 🔍 Step-by-Step Explanation

- `torch.randn(10000, 10000, device=device)` Creates one random matrix of size `10,000 × 10,000` directly on the specified device (CPU or CUDA). Each cell below calls it twice, once for `a` and once for `b`, with the size changed per run.

- `torch.matmul(a, b)` Performs matrix multiplication between `a` and `b`. This is a computationally heavy operation, ideal for benchmarking.

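- `start = time.time()` and `end = time.time()` Record wall-clock time before and after the cell's work. Because `start` is taken before the `torch.randn` calls, the reported time covers creating `a` and `b` as well as multiplying them. The first GPU run probably also includes one-time start-up cost, so its timing is not directly comparable with later runs.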

- `torch.cuda.synchronize()` Ensures all GPU operations finish before recording the end time. GPU operations are asynchronous by default, so without this, the timing might be inaccurate. It's skipped for CPU.

- `print(...)` Displays the device used and the time taken for the operation.

---
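
The timing steps described above can be sketched as a small helper. This is a minimal sketch under stated assumptions, not the notebook's own method: `time_matmul`, `warmup`, and `repeats` are hypothetical names, `time.perf_counter()` is swapped in for `time.time()` because it is intended for measuring short intervals, and the matrices are created before the timer starts so that only the multiplication is measured.

```python
import time

import torch


def time_matmul(n, device, warmup=1, repeats=3):
    """Return mean wall-clock seconds for one n x n matmul on `device`.

    Hypothetical helper: allocation and warm-up are kept outside the timed region.
    """
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)

    # Warm-up runs absorb one-time start-up cost before timing begins.
    for _ in range(warmup):
        torch.matmul(a, b)
    if device.type == "cuda":
        torch.cuda.synchronize()  # Wait for the warm-up work to finish

    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(a, b)
    if device.type == "cuda":
        torch.cuda.synchronize()  # Wait for queued GPU work before stopping the timer
    return (time.perf_counter() - start) / repeats


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
elapsed = time_matmul(256, device)
print(f"Mean time per matmul on {device}: {elapsed:.6f} seconds")
```

The small size of 256 keeps the sketch quick on a CPU; passing 1000 or 10000 would mirror the cells below, with the difference that tensor creation is excluded from the reported time.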

`a` and `b`: 1,000 × 1,000 matrices

In [6]:
# Perform a small matrix operation to test GPU
print("\nRunning a test tensor operation...")

start = time.time()
a = torch.randn(1000, 1000, device=device)
b = torch.randn(1000, 1000, device=device)
c = torch.matmul(a, b)
if device.type == 'cuda':
    torch.cuda.synchronize()  # Ensure all ops finish
end = time.time()

print(f"✅ Operation completed on: {device}")
print(f"⏱ Time taken: {end - start:.3f} seconds")


Running a test tensor operation...
✅ Operation completed on: cuda
⏱ Time taken: 1.394 seconds


---

`a` and `b`: 10,000 × 10,000 matrices

In [7]:
# Perform a larger matrix operation (10,000 x 10,000) to test GPU
print("\nRunning a test tensor operation...")

start = time.time()
a = torch.randn(10000, 10000, device=device)
b = torch.randn(10000, 10000, device=device)
c = torch.matmul(a, b)
if device.type == 'cuda':
    torch.cuda.synchronize()  # Ensure all ops finish
end = time.time()

print(f"✅ Operation completed on: {device}")
print(f"⏱ Time taken: {end - start:.3f} seconds")


Running a test tensor operation...
✅ Operation completed on: cuda
⏱ Time taken: 2.659 seconds


---

`a` and `b`: 100,000 × 100,000 matrices

In [8]:
# Attempt a very large matrix operation (100,000 x 100,000) to stress GPU memory
print("\nRunning a test tensor operation...")

start = time.time()
a = torch.randn(100000, 100000, device=device)
b = torch.randn(100000, 100000, device=device)
c = torch.matmul(a, b)
if device.type == 'cuda':
    torch.cuda.synchronize()  # Ensure all ops finish
end = time.time()

print(f"✅ Operation completed on: {device}")
print(f"⏱ Time taken: {end - start:.3f} seconds")


Running a test tensor operation...


OutOfMemoryError: CUDA out of memory. Tried to allocate 37.25 GiB. GPU 0 has a total capacity of 4.00 GiB of which 2.07 GiB is free. Of the allocated memory 1.13 GiB is allocated by PyTorch, and 11.44 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
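
The 37.25 GiB figure in the error can be reproduced with simple arithmetic. The sketch below assumes the default `float32` dtype (4 bytes per element), which is what `torch.randn` uses unless the default dtype has been changed; `matrix_gib` is a hypothetical helper name, not part of PyTorch.

```python
def matrix_gib(rows, cols, bytes_per_element=4):
    """Size in GiB of a dense rows x cols tensor (4 bytes per element assumes float32)."""
    return rows * cols * bytes_per_element / 2**30


for n in (1_000, 10_000, 100_000):
    one = matrix_gib(n, n)
    # a, b, and the result c all exist at once, so a cell needs about three matrices.
    print(f"{n:>7} x {n:<7}  one matrix: {one:8.4f} GiB   a + b + c: {3 * one:8.4f} GiB")
```

For `n = 100_000` a single matrix comes to about 37.25 GiB, which matches the allocation that failed on the 4 GiB card. For `n = 10_000` the three matrices total about 1.12 GiB, roughly consistent with the 1.13 GiB the error reports as already allocated from the previous cell. On a CUDA device, `torch.cuda.mem_get_info()` returns the free and total memory in bytes, which could be compared against this estimate before allocating.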