---
# GPU Memory Stress Test
---

In [2]:
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

In [3]:
import torch
import time

In [5]:
# Check if CUDA is available
if torch.cuda.is_available():
    print("✅ CUDA is available!")
    device = torch.device("cuda")
    print(f"Using GPU: {torch.cuda.get_device_name(device)}")
else:
    print("❌ CUDA is NOT available. Using CPU.")
    device = torch.device("cpu")

✅ CUDA is available!
Using GPU: NVIDIA GeForce GTX 1650


---

### 🔍 Step-by-Step Explanation

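- `torch.randn(n, n, device=device)`: creates an `n × n` matrix of standard-normal random values (`float32` by default) directly on the selected device (`cpu` or `cuda`). It is called twice to build `a` and `b`. The cells below use `n = 1,000` and `n = 10,000`.

- `torch.matmul(a, b)`: multiplies `a` by `b`. For `n × n` inputs this costs roughly `2n³` floating-point operations, which makes it a convenient, compute-heavy workload for benchmarking.

- `start = time.time()` and `end = time.time()`: record wall-clock time before and after the work. Because `start` is taken before the `torch.randn` calls, the measured time includes creating the matrices, not just the multiplication. On the first GPU matmul it typically also includes one-time CUDA library initialization.

- `torch.cuda.synchronize()`: waits until all queued GPU work has finished before `end` is recorded. CUDA kernels are launched asynchronously, so without this call the timer could stop before the multiplication completes and report a misleadingly short time. It is skipped when running on the CPU.

- `print(...)`: displays the device used and the elapsed time.

---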
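
The timing pattern described above (warm up, synchronize, read the clock) can be packaged into a small reusable helper. This is a sketch that is not part of the original notebook. The name `benchmark` and its `sync`, `warmup` and `repeats` parameters are hypothetical, and it swaps `time.time()` for `time.perf_counter()`, a high-resolution monotonic clock. Unlike the cells below, it times only the operation itself, so matrix creation and first-call setup are kept out of the measurement.

```python
import statistics
import time
from typing import Callable, Optional


def benchmark(
    fn: Callable[[], object],
    sync: Optional[Callable[[], None]] = None,
    warmup: int = 1,
    repeats: int = 5,
) -> float:
    """Return the median wall-clock time of fn() in seconds (hypothetical helper).

    sync is called around each timed run so that asynchronous work
    (for example, queued CUDA kernels) is finished before the clock is read.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")

    # Warm-up runs absorb one-time costs such as lazy library initialization.
    for _ in range(warmup):
        fn()
        if sync is not None:
            sync()

    timings = []
    for _ in range(repeats):
        if sync is not None:
            sync()  # make sure earlier work does not leak into this measurement
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for the work to complete before stopping the clock
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# CPU-only demo so the sketch runs without a GPU.
t = benchmark(lambda: sum(i * i for i in range(100_000)))
print(f"median: {t:.6f} s")
```

With PyTorch on a GPU, a call might look like `benchmark(lambda: torch.matmul(a, b), sync=torch.cuda.synchronize)`. Taking the median of several runs makes the result less sensitive to a single slow outlier.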

`a` and `b` are `1,000 × 1,000` matrices.

In [6]:
# Perform a small matrix operation to test GPU
print("\nRunning a test tensor operation...")

start = time.time()
a = torch.randn(1000, 1000, device=device)
b = torch.randn(1000, 1000, device=device)
c = torch.matmul(a, b)
if device.type == 'cuda':
    torch.cuda.synchronize()  # Ensure all queued GPU ops finish before stopping the timer
end = time.time()

print(f"✅ Operation completed on: {device}")
print(f"⏱ Time taken: {end - start:.3f} seconds")


Running a test tensor operation...
✅ Operation completed on: cuda
⏱ Time taken: 1.394 seconds


---

`a` and `b` are `10,000 × 10,000` matrices.
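
Counting bytes shows why `10,000 × 10,000` is a meaningful load. `torch.randn` produces `float32` tensors by default (4 bytes per element), so one `n × n` matrix needs `4n²` bytes, and `a`, `b` and the result `c` together need three times that. For `n = 10,000` that is 400,000,000 bytes (about 0.37 GiB) per matrix and about 1.12 GiB for all three. The sketch below only does this arithmetic. `matrix_bytes` and `to_gib` are hypothetical helpers, and actual GPU usage is somewhat higher because of the caching allocator's block rounding and library workspaces.

```python
def matrix_bytes(n: int, itemsize: int = 4) -> int:
    """Bytes needed for one dense n x n matrix (float32 by default)."""
    return n * n * itemsize


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB (1024**3), the divisor used by the memory logs later on."""
    return num_bytes / 1024**3


for n in (1_000, 10_000):
    one = matrix_bytes(n)
    total = 3 * one  # a, b and the result c
    print(f"n={n:>6}: one matrix {to_gib(one):.3f} GiB, a+b+c {to_gib(total):.3f} GiB")
```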

In [7]:
# Perform a large matrix operation to load the GPU
print("\nRunning a test tensor operation...")

start = time.time()
a = torch.randn(10000, 10000, device=device)
b = torch.randn(10000, 10000, device=device)
c = torch.matmul(a, b)
if device.type == 'cuda':
    torch.cuda.synchronize()  # Ensure all queued GPU ops finish before stopping the timer
end = time.time()

print(f"✅ Operation completed on: {device}")
print(f"⏱ Time taken: {end - start:.3f} seconds")


Running a test tensor operation...
✅ Operation completed on: cuda
⏱ Time taken: 2.659 seconds
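
The two timings can be compared on a common scale by converting them to effective throughput. An `n × n` matrix multiply needs about `2n³` floating-point operations, so the 1,000 case performs 2 × 10⁹ operations and the 10,000 case 2 × 10¹²: 1,000 times more work for less than twice the time (1.394 s vs. 2.659 s). That suggests the first measurement is dominated by fixed costs such as tensor creation and one-time setup rather than by the multiplication itself. Because both timings also include the `torch.randn` calls, the figures computed here are lower bounds on matmul throughput. `matmul_flops` and `effective_gflops` are hypothetical helpers for this arithmetic.

```python
def matmul_flops(n: int) -> int:
    """Approximate floating-point operations for an (n x n) @ (n x n) matmul: 2n^3."""
    return 2 * n**3


def effective_gflops(n: int, seconds: float) -> float:
    """Operations per second in GFLOP/s for a measured wall-clock time."""
    return matmul_flops(n) / seconds / 1e9


# Timings reported by the two cells above (they include tensor creation).
for n, seconds in ((1_000, 1.394), (10_000, 2.659)):
    print(f"n={n:>6}: {effective_gflops(n, seconds):8.2f} GFLOP/s (lower bound)")
```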


---

In [20]:
torch.cuda.empty_cache()   # Return cached blocks not used by any tensor to the driver; memory held by a, b, c stays allocated

---
## Second Test
---

In [13]:
# Check CUDA availability
if torch.cuda.is_available():
    print("✅ CUDA is available!")
    device = torch.device("cuda")
    print(f"Using GPU: {torch.cuda.get_device_name(device)}")
else:
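    print("❌ CUDA is NOT available. Stopping.")
    raise SystemExit("This stress test requires a CUDA-capable GPU.")  # stops the cell without shutting down the kernel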

✅ CUDA is available!
Using GPU: NVIDIA GeForce GTX 1650


In [22]:
# Configuration
target_duration = 180  # seconds to keep the test running
matrix_size = 10000    # starting size; the loop below adds 400 on every iteration
log_interval = 10      # seconds between progress reports
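
Because the loop below adds 400 to `matrix_size` on every iteration, this is a ramping test rather than a constant load: both the work and the memory grow as it runs. The sketch below gives a rough projection from the configuration values, not a reproduction of the logged run. It assumes `float32` tensors and counts only `a`, `b` and `c` after an iteration finishes. The transient peak is higher, since the previous `c` is still alive while the new product is being written. `ramp_size`, `live_bytes` and `first_iteration_over` are hypothetical helpers, `start`/`step` mirror the configuration and loop, and the memory budget (for example, your card's VRAM) is a value you supply.

```python
def ramp_size(iteration: int, start: int = 10_000, step: int = 400) -> int:
    """Matrix size on a given 1-based iteration, mirroring `matrix_size += step` before allocation."""
    return start + step * iteration


def live_bytes(n: int, itemsize: int = 4) -> int:
    """Bytes held by a, b and c (three n x n float32 matrices) after an iteration."""
    return 3 * n * n * itemsize


def first_iteration_over(budget_bytes: int, start: int = 10_000, step: int = 400) -> int:
    """First iteration whose a, b and c alone would exceed budget_bytes."""
    if step <= 0:
        raise ValueError("step must be positive for the ramp to grow")
    iteration = 1
    while live_bytes(ramp_size(iteration, start, step)) <= budget_bytes:
        iteration += 1
    return iteration


for k in (1, 5, 10, 20):
    n = ramp_size(k)
    print(f"iteration {k:>2}: n={n}, a+b+c ~ {live_bytes(n) / 1024**3:.2f} GiB")

print("first iteration above a 4 GiB budget:", first_iteration_over(4 * 1024**3))
```

Since memory grows quadratically with `matrix_size`, a linear ramp eventually exceeds any fixed budget, so a run long enough will hit an out-of-memory error unless `target_duration` ends it first.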

In [23]:
# Warm-up
print("\n⏳ Warming up GPU...")
a = torch.randn(matrix_size, matrix_size, device=device)
b = torch.randn(matrix_size, matrix_size, device=device)
_ = torch.matmul(a, b)
torch.cuda.synchronize()

print("\n🚀 Starting sustained load test...")
start_time = time.time()
next_log = start_time + log_interval
iterations = 0

while True:
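    matrix_size += 400         # grow the matrices each iteration, so work and memory ramp up
    a = torch.randn(matrix_size, matrix_size, device=device)
    b = torch.randn(matrix_size, matrix_size, device=device)
    c = torch.matmul(a, b)
    torch.cuda.synchronize()   # wait for the matmul to finish before checking the clock
    iterations += 1
    torch.cuda.empty_cache()   # return unused cached blocks; a, b, c still hold this iteration's memory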
    current_time = time.time()
    if current_time >= next_log:
        elapsed = current_time - start_time
        mem_allocated = torch.cuda.memory_allocated(device) / 1024**3
        mem_reserved = torch.cuda.memory_reserved(device) / 1024**3
        print(f"⏱ {elapsed:.1f}s elapsed | Iterations: {iterations} | "
              f"Memory Allocated: {mem_allocated:.2f} GB | Reserved: {mem_reserved:.2f} GB")
        next_log += log_interval

    if current_time - start_time >= target_duration:
        break

print(f"\n✅ Test completed in {time.time() - start_time:.2f} seconds")
print(f"🔁 Total iterations completed: {iterations}")


⏳ Warming up GPU...

🚀 Starting sustained load test...
⏱ 11.2s elapsed | Iterations: 5 | Memory Allocated: 1.61 GB | Reserved: 1.63 GB
⏱ 20.0s elapsed | Iterations: 9 | Memory Allocated: 1.71 GB | Reserved: 1.72 GB
⏱ 31.4s elapsed | Iterations: 14 | Memory Allocated: 1.83 GB | Reserved: 1.85 GB
⏱ 40.2s elapsed | Iterations: 17 | Memory Allocated: 1.91 GB | Reserved: 1.93 GB
⏱ 52.7s elapsed | Iterations: 21 | Memory Allocated: 2.02 GB | Reserved: 2.03 GB
⏱ 60.3s elapsed | Iterations: 24 | Memory Allocated: 2.10 GB | Reserved: 2.12 GB
⏱ 71.3s elapsed | Iterations: 28 | Memory Allocated: 2.22 GB | Reserved: 2.23 GB
⏱ 83.7s elapsed | Iterations: 31 | Memory Allocated: 2.30 GB | Reserved: 2.31 GB
⏱ 92.0s elapsed | Iterations: 33 | Memory Allocated: 2.36 GB | Reserved: 2.37 GB
⏱ 103.5s elapsed | Iterations: 36 | Memory Allocated: 2.45 GB | Reserved: 2.46 GB
⏱ 111.8s elapsed | Iterations: 38 | Memory Allocated: 2.51 GB | Reserved: 2.53 GB
⏱ 123.0s elapsed | Iterations: 41 | Memory Allocated: