# Day 1: Asynchronous vs Synchronous Execution

**Focus:** Understanding CPU orchestration vs GPU execution

## Objectives
- Understand asynchronous execution on GPUs
- See the difference between sync and async operations
- Learn where CPU and GPU time is spent

In [None]:
import torch
import time

assert torch.cuda.is_available(), "CUDA not available"

print(f"GPU: {torch.cuda.get_device_name(0)}")
print(f"CUDA available: {torch.cuda.is_available()}")

## Experiment 1: Synchronous Execution

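`torch.cuda.synchronize()` blocks the CPU until every kernel already queued on the current GPU has finished, so the measured wall-clock time includes the matmul itself.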

In [None]:
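# Create two half-precision matrices on the GPU
N = 4096
A = torch.randn((N, N), device='cuda', dtype=torch.float16)
B = torch.randn((N, N), device='cuda', dtype=torch.float16)

# Warm-up: the first matmul pays one-time setup costs (e.g. cuBLAS handle creation)
C = A @ B
torch.cuda.synchronize()  # Ensure randn and warm-up kernels finish before timing

# Synchronous: CPU waits for GPU
start = time.perf_counter()  # perf_counter is meant for short intervals
C = A @ B
torch.cuda.synchronize()  # Block until GPU completes
sync_time = time.perf_counter() - start

print(f"Synchronous execution time: {sync_time*1000:.3f} ms")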
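The pattern in the cell above (warm up, synchronize before starting the clock, synchronize before stopping it) can be wrapped in a small reusable helper. The sketch below assumes a hypothetical `benchmark` function, not a PyTorch API, that takes the operation plus an optional `sync` callable. It uses only the standard library so the timing logic stands on its own.

```python
import statistics
import time
from typing import Callable, Optional


def benchmark(
    fn: Callable[[], object],
    sync: Optional[Callable[[], None]] = None,
    warmup: int = 3,
    iters: int = 10,
) -> float:
    """Return the median wall-clock time of fn() in milliseconds.

    Hypothetical helper for this notebook (not part of PyTorch).
    `sync` should block until all queued device work has finished,
    e.g. torch.cuda.synchronize; pass None for purely CPU-bound work.
    """
    def barrier() -> None:
        if sync is not None:
            sync()

    # Warm-up runs absorb one-time costs (lazy initialization, caches)
    for _ in range(warmup):
        fn()
    barrier()  # Start from an idle device so earlier work is not counted

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        barrier()  # Without this we would only measure launch overhead
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)


if __name__ == "__main__":
    # CPU-only demo: time.sleep stands in for a kernel that takes ~5 ms
    ms = benchmark(lambda: time.sleep(0.005), iters=5)
    print(f"median: {ms:.2f} ms")
```

In this notebook the call would look like `benchmark(lambda: A @ B, sync=torch.cuda.synchronize)`. Reporting the median rather than the mean is a design choice: one slow outlier, for example a run delayed by another process sharing the GPU, then has less influence on the result.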

## Experiment 2: Asynchronous Execution

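Without synchronization, `A @ B` only enqueues the kernel and returns control to the CPU immediately, so the timer measures launch overhead rather than GPU execution time.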

In [None]:
# Asynchronous: CPU continues immediately after enqueuing the kernel
start = time.perf_counter()
C = A @ B
# No synchronization - GPU work is queued but not waited for
async_time = time.perf_counter() - start

print(f"Asynchronous (queued) time: {async_time*1000:.3f} ms")
print(f"Difference: {(sync_time - async_time)*1000:.3f} ms")
print("\nNote: CPU time is much shorter with async, but GPU work still happens!")

torch.cuda.synchronize()  # Drain the queued matmul before running later cells
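To make the launch/execute split concrete without a GPU, here is a toy model assuming a single in-order stream: a hypothetical `ToyStream` class whose `launch` only appends work to a queue, while a background worker thread plays the role of the GPU. This is an analogy, not how the CUDA driver is implemented, but it reproduces the timing pattern of the two experiments.

```python
import queue
import threading
import time
from typing import Callable


class ToyStream:
    """Hypothetical in-order work queue standing in for a CUDA stream."""

    def __init__(self) -> None:
        self._q: queue.Queue = queue.Queue()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def _run(self) -> None:
        # The "GPU": executes queued work one item at a time, in launch order
        while True:
            work = self._q.get()
            try:
                work()
            finally:
                self._q.task_done()

    def launch(self, work: Callable[[], None]) -> None:
        # Like a kernel launch: enqueue and return without waiting
        self._q.put(work)

    def synchronize(self) -> None:
        # Like torch.cuda.synchronize(): block until all queued work is done
        self._q.join()


def fake_kernel() -> None:
    time.sleep(0.05)  # Pretend the kernel runs for 50 ms


if __name__ == "__main__":
    stream = ToyStream()

    start = time.perf_counter()
    stream.launch(fake_kernel)
    launch_ms = (time.perf_counter() - start) * 1000

    stream.synchronize()
    total_ms = (time.perf_counter() - start) * 1000
    print(f"launch returned after {launch_ms:.3f} ms")
    print(f"synchronize returned after {total_ms:.3f} ms")
```

`launch` returns almost instantly while `synchronize` takes at least as long as the queued work, mirroring `async_time` versus `sync_time`. One difference from real hardware: the CUDA launch queue has finite depth, so enqueuing very many kernels can eventually block the CPU, whereas this toy queue is unbounded.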

## Key Observations

**Questions to answer:**
1. Which time measurement is more accurate for GPU execution time?
2. Why is the async time so much shorter?
3. When would you want to use sync vs async?

_Record your observations here after running the experiments._