# Day 2: CUDA Event Timing

**Focus:** Using `torch.cuda.Event` for accurate GPU timing

## Objectives
- Use CUDA events for ground-truth GPU timing
- Compare CPU wall-clock time vs GPU event time
- Understand why CUDA events are needed for accurate measurements

In [None]:
import torch
import time

assert torch.cuda.is_available(), "CUDA not available"

print(f"GPU: {torch.cuda.get_device_name(0)}")

## Experiment: CPU Time vs GPU Event Time

Time the same GEMM two ways, with the host clock (`time.time()` plus `torch.cuda.synchronize()`) and with `torch.cuda.Event`, then compare the results.

In [None]:
N = 4096
A = torch.randn((N, N), device='cuda', dtype=torch.float16)
B = torch.randn((N, N), device='cuda', dtype=torch.float16)

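# Warm up once so one-time setup cost on the first matmul is not timed
_ = A @ B
torch.cuda.synchronize()  # also ensures A and B are fully initialized

# CPU wall-clock time (with sync)
cpu_start = time.time()
C = A @ B
torch.cuda.synchronize()
cpu_time = (time.time() - cpu_start) * 1000  # ms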

# GPU event time (ground truth)
start_event = torch.cuda.Event(enable_timing=True)
end_event = torch.cuda.Event(enable_timing=True)

start_event.record()
C = A @ B
end_event.record()
torch.cuda.synchronize()  # elapsed_time() requires both events to have completed
gpu_time = start_event.elapsed_time(end_event)  # ms

print(f"CPU wall-clock time (with sync): {cpu_time:.3f} ms")
print(f"GPU event time:                  {gpu_time:.3f} ms")
print(f"Difference:                      {abs(cpu_time - gpu_time):.3f} ms")
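
The host-side number above is only meaningful because of the `torch.cuda.synchronize()` call: PyTorch launches CUDA kernels asynchronously, so without it the timer stops as soon as the kernel is queued. A minimal sketch of a helper that makes the difference easy to try is shown here; `wall_clock_ms` and its `sync` parameter are hypothetical names, not part of PyTorch.

```python
import time

def wall_clock_ms(fn, sync=None):
    """Wall-clock time of fn() in ms.

    If sync is given (for example torch.cuda.synchronize), it is called
    before the clock is stopped so that queued GPU work is included.
    """
    start = time.perf_counter()
    fn()
    if sync is not None:
        sync()  # block the host until the device has finished
    return (time.perf_counter() - start) * 1000.0
```

In this notebook, `wall_clock_ms(lambda: A @ B)` would likely report mostly launch overhead, while `wall_clock_ms(lambda: A @ B, sync=torch.cuda.synchronize)` waits for the GEMM to finish.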

## Multiple Iterations

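Run a few untimed warmup iterations, then time several iterations between a single pair of events and divide by the iteration count to get a more stable per-GEMM average.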

In [None]:
WARMUP = 5
ITERS = 10

# Warmup
for _ in range(WARMUP):
    _ = A @ B
torch.cuda.synchronize()

# Timed iterations with events
start_event = torch.cuda.Event(enable_timing=True)
end_event = torch.cuda.Event(enable_timing=True)

start_event.record()
for _ in range(ITERS):
    _ = A @ B
end_event.record()
torch.cuda.synchronize()

avg_time = start_event.elapsed_time(end_event) / ITERS

print(f"Average GEMM time ({ITERS} iterations): {avg_time:.3f} ms")
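
A single event pair around the whole loop yields only an average, which hides iteration-to-iteration variation. One possible variant, sketched below, records a separate event pair per iteration and summarizes the samples. `time_each_iteration` and `summarize_ms` are hypothetical helper names, and the GPU part assumes a CUDA device is available.

```python
import statistics

def summarize_ms(samples_ms):
    """Min, median, and max of a list of per-iteration times in ms."""
    return {
        "min": min(samples_ms),
        "median": statistics.median(samples_ms),
        "max": max(samples_ms),
    }

def time_each_iteration(fn, iters=10):
    """Time each call of fn() with its own CUDA event pair; returns a list of ms."""
    import torch  # local import: only this function needs PyTorch and a CUDA device

    starts = [torch.cuda.Event(enable_timing=True) for _ in range(iters)]
    ends = [torch.cuda.Event(enable_timing=True) for _ in range(iters)]
    for start, end in zip(starts, ends):
        start.record()
        fn()
        end.record()
    torch.cuda.synchronize()  # all events must complete before elapsed_time()
    return [start.elapsed_time(end) for start, end in zip(starts, ends)]
```

After the warmup loop above, `summarize_ms(time_each_iteration(lambda: A @ B, iters=ITERS))` would report the spread for the same GEMM.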

## Key Observations

**Questions to answer:**
1. Why do CUDA events give more accurate GPU timing?
2. When would CPU time be misleading?
3. What overhead does `synchronize()` add?

_Record your observations here after running the experiments._
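
When recording observations, it can help to convert a measured GEMM time into achieved throughput, which is easier to sanity-check against the GPU's advertised peak. The sketch below assumes the usual count of 2·N³ floating-point operations for an N×N×N matrix multiply; `gemm_tflops` is a hypothetical helper name, and the 1.0 ms input is illustrative, not a measurement.

```python
def gemm_tflops(n, time_ms):
    """Achieved TFLOPS for an n x n x n GEMM that took time_ms milliseconds."""
    flops = 2 * n ** 3           # one multiply and one add per inner-product term
    seconds = time_ms / 1000.0
    return flops / seconds / 1e12

# Illustrative input only: substitute the avg_time measured above.
print(f"{gemm_tflops(4096, 1.0):.1f} TFLOPS")
```

With the notebook's variables, the call would be `gemm_tflops(N, avg_time)`.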