# hetero-memory-lab 🏗️ Interactive Roofline Model

**Hands-on GPU performance modeling:** bandwidth vs compute vs tiling vs cache.

![roofline](roofline_chart.png)

## 🚀 Quickstart

`pip install -r requirements.txt`

**Run all cells below → see roofline magic!**

In [None]:
# Install if needed (Colab)
# !pip install -r requirements.txt

import sys
import os
sys.path.append('..')

from memory_lab.system_model import SystemModel
from memory_lab.compute_core import ComputeCore
from memory_lab.memory_model import MemoryModel
print('✅ Imports ready!')

## 📊 Part 1: Bandwidth vs Compute

In [None]:
# Low vs High bandwidth
mem_low = MemoryModel(bandwidth_gbps=1.0, access_pattern="sequential")
mem_high = MemoryModel(bandwidth_gbps=1000.0, access_pattern="sequential")
compute = ComputeCore(peak_flops=1e12)

sys_low = SystemModel(problem_size_bytes=10*1024*1024, compute_core=compute, memory_model=mem_low)
sys_high = SystemModel(problem_size_bytes=10*1024*1024, compute_core=compute, memory_model=mem_high)

r_low = sys_low.run()
r_high = sys_high.run()

print('Low BW (1 Gbit/s):', r_low['regime'])
print('High BW (1 Tbit/s):', r_high['regime'])
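
The regime label comes down to one comparison: does moving the data take longer than computing on it? The sketch below illustrates that comparison on its own. It is not `memory_lab`'s implementation; the function `classify_regime`, the label strings, and the intensity of 16 ops/byte are all assumptions, and the bandwidth is read as gigabits per second because the print labels above suggest that unit.

```python
def classify_regime(problem_size_bytes, ops_per_byte, peak_flops, bandwidth_bytes_per_s):
    """Compare the time to move the data with the time to compute on it."""
    memory_time_s = problem_size_bytes / bandwidth_bytes_per_s
    compute_time_s = problem_size_bytes * ops_per_byte / peak_flops
    # Whichever phase takes longer limits the kernel.
    regime = "memory-bound" if memory_time_s > compute_time_s else "compute-bound"
    return {
        "memory_time_s": memory_time_s,
        "compute_time_s": compute_time_s,
        "regime": regime,
    }


SIZE = 10 * 1024 * 1024   # same problem size as the cell above
GBIT = 1e9 / 8            # bytes per second in 1 Gbit/s (assumed unit)
OPS_PER_BYTE = 16.0       # hypothetical intensity; the library default is not shown

low = classify_regime(SIZE, OPS_PER_BYTE, 1e12, 1.0 * GBIT)
high = classify_regime(SIZE, OPS_PER_BYTE, 1e12, 1000.0 * GBIT)

print("Low BW :", low["regime"])
print("High BW:", high["regime"])
```

Under these assumptions the low-bandwidth system spends far longer on memory traffic than on arithmetic, while the high-bandwidth system finishes its transfers before the compute core is done.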

## 🔄 Part 2: Sequential vs Random Access

In [None]:
# Same bandwidth, different access patterns
mem_seq = MemoryModel(bandwidth_gbps=1000.0, access_pattern="sequential")
mem_rand = MemoryModel(bandwidth_gbps=1000.0, access_pattern="random")

sys_seq = SystemModel(problem_size_bytes=10*1024*1024, compute_core=compute, memory_model=mem_seq)
sys_rand = SystemModel(problem_size_bytes=10*1024*1024, compute_core=compute, memory_model=mem_rand)

r_seq = sys_seq.run()
r_rand = sys_rand.run()

# memory_time_s is in seconds; convert to microseconds for display
print(f'Sequential mem time: {r_seq["memory_time_s"] * 1e6:.1f} μs')
print(f'Random mem time:     {r_rand["memory_time_s"] * 1e6:.1f} μs')
print(f'Random penalty:      {r_rand["memory_time_s"] / r_seq["memory_time_s"]:.1f}x slower')
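
One common reason random access is slower is cache-line waste: the memory system fetches a whole line but the kernel uses only one element of it. The sketch below models just that effect. It is an assumption for illustration, not the penalty `MemoryModel` actually applies; `effective_bandwidth`, the 4-byte element, and the 64-byte line are hypothetical choices.

```python
def effective_bandwidth(peak_bytes_per_s, random_access=False,
                        element_bytes=4, line_bytes=64):
    """Useful bandwidth after accounting for cache-line waste.

    Sequential access is assumed to use every byte of each fetched line.
    Random access is assumed to use one element per fetched line.
    """
    if not random_access:
        return peak_bytes_per_s
    return peak_bytes_per_s * element_bytes / line_bytes


SIZE = 10 * 1024 * 1024
PEAK = 1000.0 * 1e9 / 8   # 1000 Gbit/s in bytes per second (assumed unit)

t_seq = SIZE / effective_bandwidth(PEAK)
t_rand = SIZE / effective_bandwidth(PEAK, random_access=True)

print(f"Sequential: {t_seq * 1e6:.1f} us")
print(f"Random:     {t_rand * 1e6:.1f} us")
print(f"Penalty:    {t_rand / t_seq:.1f}x slower")
```

With these numbers only 4 of every 64 fetched bytes are useful, so the modeled penalty is 16x. Real hardware adds further effects (DRAM row misses, TLB misses, prefetcher behavior) that this sketch ignores.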

## 🧱 Part 3: Tiling = Arithmetic Intensity

In [None]:
# Tiling: 2 → 16 ops/byte
naive = SystemModel(problem_size_bytes=10*1024*1024, 
                   compute_core=ComputeCore(peak_flops=1e12, ops_per_byte=2.0),
                   memory_model=mem_seq)
tiled = SystemModel(problem_size_bytes=10*1024*1024, 
                   compute_core=ComputeCore(peak_flops=1e12, ops_per_byte=16.0),
                   memory_model=mem_seq)

print('Naive (2 ops/byte):', naive.run()['regime'])
print('Tiled (16 ops/byte):', tiled.run()['regime'])
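
Why tiling raises ops/byte can be seen with a worked example for matrix multiplication. The sketch assumes a simplified cost model: each tile step loads one T×T tile of A and one of B and performs T³ multiply-adds, output writes are ignored, and elements are 4-byte floats. The function name `tiled_matmul_intensity` is hypothetical, and the tile sizes are chosen only so the results match the 2 and 16 ops/byte used above.

```python
def tiled_matmul_intensity(tile, element_bytes=4):
    """Arithmetic intensity (FLOPs per byte loaded) of one tile step."""
    flops = 2 * tile ** 3                        # T^3 multiply-adds, 2 FLOPs each
    bytes_loaded = 2 * tile ** 2 * element_bytes  # one tile of A plus one tile of B
    return flops / bytes_loaded                   # simplifies to tile / element_bytes


for tile in (1, 8, 64):
    print(f"tile={tile:>2}: {tiled_matmul_intensity(tile):.2f} ops/byte")
```

Intensity grows linearly with the tile edge because each loaded element is reused T times, which is why a larger tile (as long as it fits in fast memory) moves a kernel toward the compute roof.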

## 📈 Generate Roofline Chart

**Run this → perfect roofline plot!**

In [None]:
%run plot_roofline.py
from IPython.display import Image
Image('roofline_chart.png')

## 🎓 What You Learned


1. **Sequential access reaches the compute roof** once intensity is high enough (blue dots flatten)
2. **Random access stays memory-bound** (red dots stay on the diagonal)
3. **Tiling = rightward shift** on the roofline (more ops per byte)
4. **Real tools build on the same model:** Intel Advisor, Kerncraft
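
The first three points all follow from the standard roofline formula: attainable performance is the smaller of the compute peak and intensity times bandwidth. Here is a minimal sketch of that formula, assuming the 1e12 FLOP/s peak from the cells above and a bandwidth of 1000 Gbit/s (125 GB/s). The function names are hypothetical and this is not the code in `plot_roofline.py`.

```python
def attainable_flops(ops_per_byte, peak_flops, bandwidth_bytes_per_s):
    """Roofline: performance is capped by compute or by memory traffic."""
    return min(peak_flops, ops_per_byte * bandwidth_bytes_per_s)


def ridge_point(peak_flops, bandwidth_bytes_per_s):
    """Intensity at which the memory diagonal meets the compute roof."""
    return peak_flops / bandwidth_bytes_per_s


PEAK = 1e12                  # FLOP/s, as in the cells above
BANDWIDTH = 1000.0 * 1e9 / 8  # bytes per second (assumed unit: Gbit/s)

print(f"Ridge point: {ridge_point(PEAK, BANDWIDTH):.1f} ops/byte")
for intensity in (2.0, 16.0):
    perf = attainable_flops(intensity, PEAK, BANDWIDTH)
    side = "compute roof" if perf >= PEAK else "memory diagonal"
    print(f"{intensity:>4.1f} ops/byte -> {perf:.2e} FLOP/s ({side})")
```

Under these assumptions the ridge sits at 8 ops/byte, so a kernel at 2 ops/byte lies on the memory diagonal and one at 16 ops/byte lies on the compute roof.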

**Colab link:** [Share this notebook!]