## 1. Install Jupyter and Dependencies

Jupyter should already be installed. If it is not, run the following in the VS Code integrated terminal:

```bash
.venv\Scripts\python.exe -m pip install jupyter notebook
```
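To confirm the install from inside a notebook cell, a minimal sketch like the following can help. The helper name `missing_packages` is hypothetical, and the import names checked (`notebook`, `jupyter_core`) are an assumption about what the pip command above pulls in.

```python
import importlib.util


def missing_packages(names):
    """Return the names that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names assumed to be provided by `pip install jupyter notebook`
missing = missing_packages(["notebook", "jupyter_core"])
print("Missing packages:", missing or "none")
```

An empty result suggests the packages are visible to the interpreter that runs the cell. It says nothing about other environments on the machine.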

## 2. Create a Jupyter Notebook File

This file is already a Jupyter Notebook (`.ipynb`). You can create new ones using:
- Command Palette: `Ctrl+Shift+P` → "Jupyter: Create New Blank Notebook"
- Or manually save a `.ipynb` file in VS Code

## 3. Configure Python Kernel

Make sure to select the correct kernel:
1. Click "Select Kernel" in the top-right of the notebook
2. Choose `.venv\Scripts\python.exe` (your local PyTorch environment)
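Once a kernel is selected, a quick check along these lines can show which interpreter is actually running the notebook. This is a sketch: `runs_from_venv` is a hypothetical helper, and it assumes the environment directory is named `.venv` as in the steps above.

```python
import sys
from pathlib import Path


def runs_from_venv(executable: str, venv_dir: str = ".venv") -> bool:
    """True when `executable` sits somewhere under a directory named `venv_dir`."""
    return venv_dir in Path(executable).parts


print("Kernel interpreter:", sys.executable)
print("Inside .venv:", runs_from_venv(sys.executable))
# Standard-library check: in a virtual environment the two prefixes differ
print("Virtual environment active:", sys.prefix != sys.base_prefix)
```

If the printed path points at a global Python install, reselect the kernel before continuing.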

## 4. Install PyTorch

PyTorch should already be installed in your `.venv` with CUDA support. If it is not, run the command below, which installs the CUDA 12.4 (`cu124`) builds:

```bash
.venv\Scripts\python.exe -m pip install --index-url https://download.pytorch.org/whl/cu124 torch torchvision torchaudio
```

## 5. Write and Execute Basic Cells

Below are example code cells. Execute each cell using **Shift+Enter** or the Run button.

## 6. Import PyTorch and Verify Installation

```python
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)
print("GPU device:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU")
print("Device count:", torch.cuda.device_count())
```

### Create Tensors and Move to GPU

```python
# Create a tensor on CPU
x = torch.randn(1000, 1000)
print(f"Tensor on CPU - shape: {x.shape}, device: {x.device}")

# Move tensor to GPU
if torch.cuda.is_available():
    x_gpu = x.cuda()
    print(f"Tensor on GPU - shape: {x_gpu.shape}, device: {x_gpu.device}")
else:
    print("CUDA not available, skipping GPU transfer")
```
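The `if`/`else` check above can also be folded into a single device name that is chosen once and reused. The sketch below is one way to do that. `pick_device_name` is a hypothetical helper, and the guard for a missing PyTorch install is an added assumption so the cell still runs in a kernel without `torch`.

```python
import importlib.util


def pick_device_name() -> str:
    """Return "cuda" when PyTorch is importable and sees a GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed in this kernel
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu"


device_name = pick_device_name()
print("Selected device:", device_name)

if importlib.util.find_spec("torch") is not None:
    import torch
    # Allocate directly on the chosen device instead of copying from CPU
    x = torch.randn(4, 4, device=device_name)
    print("Tensor device:", x.device)
```

Passing `device=` at creation time avoids the extra host-to-GPU copy that `.cuda()` performs on an existing CPU tensor.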

### Matrix Multiplication on GPU

```python
if torch.cuda.is_available():
    # Create matrices directly on the GPU (no intermediate CPU copy)
    A = torch.randn(1000, 1000, device="cuda")
    B = torch.randn(1000, 1000, device="cuda")

    # Perform matrix multiplication
    C = torch.matmul(A, B)

    print("Matrix multiplication result:")
    print(f"A shape: {A.shape}")
    print(f"B shape: {B.shape}")
    print(f"C shape: {C.shape}")
    print(f"C device: {C.device}")
    print(f"C mean value: {C.mean().item():.4f}")
else:
    print("CUDA not available")
```

## 7. Run Notebook Cells - Performance Comparison: GPU vs CPU

```python
import time

# Test parameters
size = 5000
iterations = 5

# CPU timing
x_cpu = torch.randn(size, size)
y_cpu = torch.randn(size, size)

# Warm-up: one untimed matmul so one-time setup cost is excluded
_ = torch.matmul(x_cpu, y_cpu)

cpu_times = []
for _ in range(iterations):
    start = time.perf_counter()  # monotonic, high-resolution timer
    result_cpu = torch.matmul(x_cpu, y_cpu)
    cpu_times.append(time.perf_counter() - start)

avg_cpu_time = sum(cpu_times) / len(cpu_times)
print(f"CPU Matrix Multiplication ({size}x{size}):")
print(f"  Average time: {avg_cpu_time:.4f}s")
print(f"  Times: {[f'{t:.4f}s' for t in cpu_times]}")

# GPU timing (if available)
if torch.cuda.is_available():
    x_gpu = torch.randn(size, size, device="cuda")
    y_gpu = torch.randn(size, size, device="cuda")

    # Warm-up: one untimed matmul so CUDA initialization is excluded
    _ = torch.matmul(x_gpu, y_gpu)
    torch.cuda.synchronize()

    gpu_times = []
    for _ in range(iterations):
        # CUDA kernels launch asynchronously, so synchronize around the timer
        torch.cuda.synchronize()
        start = time.perf_counter()
        result_gpu = torch.matmul(x_gpu, y_gpu)
        torch.cuda.synchronize()
        gpu_times.append(time.perf_counter() - start)

    avg_gpu_time = sum(gpu_times) / len(gpu_times)
    print(f"\nGPU Matrix Multiplication ({size}x{size}):")
    print(f"  Average time: {avg_gpu_time:.4f}s")
    print(f"  Times: {[f'{t:.4f}s' for t in gpu_times]}")

    speedup = avg_cpu_time / avg_gpu_time
    print(f"\nSpeedup (CPU / GPU): {speedup:.2f}x")
else:
    print("\nGPU not available")
```
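The CPU and GPU loops above share the same shape: warm up, optionally synchronize, time, repeat. As a sketch, that pattern could be pulled into one reusable helper. The name `benchmark` and its parameters are hypothetical, and the workload here is a pure-Python stand-in so the cell runs without a GPU.

```python
import statistics
import time


def benchmark(fn, iterations=5, warmup=1, sync=None):
    """Time `fn` and return the per-iteration durations in seconds.

    `sync` is an optional zero-argument callable invoked right before the
    clock starts and right before it stops (for example
    torch.cuda.synchronize when `fn` launches GPU work).
    """
    for _ in range(warmup):
        fn()  # untimed runs absorb one-time setup costs
    times = []
    for _ in range(iterations):
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        times.append(time.perf_counter() - start)
    return times


# Stand-in workload (assumption: replace with the matmul from the cell above)
times = benchmark(lambda: sum(i * i for i in range(100_000)), iterations=3)
print(f"median: {statistics.median(times):.6f}s, min: {min(times):.6f}s")
```

For the GPU case, the call would look like `benchmark(lambda: torch.matmul(x_gpu, y_gpu), sync=torch.cuda.synchronize)`. Reporting the median alongside the mean makes a single slow iteration less likely to skew the comparison.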

## Quick Reference

| Action | Shortcut |
|--------|----------|
| Run current cell and advance | Shift+Enter |
| Run all cells | Ctrl+Alt+Enter |
| Insert cell below | Ctrl+Shift+A |
| Delete cell | Ctrl+Shift+D |
| Clear outputs | Right-click → Clear Outputs |