# Text Encoder Roofline Analysis

This notebook breaks down the compute and memory requirements of the **Text Encoder** (Language Model) of the SmolVLA model on the Alveo U280 FPGA.

**Objective**: Analyze individual kernels to identify bottlenecks and guide acceleration strategies.

## 1. Hardware Specifications (Alveo U280)
*   **Frequency**: 300 MHz
*   **Peak Bandwidth**: 460 GB/s (Theoretical), 300 GB/s (Realistic)
*   **Peak Compute**:
    *   FP32: 5.41 TFLOPS
    *   INT8: 18.6 TOPS
    *   INT4: 37.2 TOPS
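The ridge point of each roof is peak throughput divided by bandwidth. It is the operational intensity below which a kernel is memory-bound. Here is a minimal sketch using the realistic 300 GB/s figure. The constants mirror the notebook's but are redefined so the snippet runs on its own.

```python
# Ridge point = peak throughput / memory bandwidth (Ops per Byte).
# Uses the realistic 300 GB/s bandwidth listed above.
BW_REAL = 300e9  # bytes/s

PEAKS = {
    "FP32": 5.41e12,  # FLOP/s
    "INT8": 18.6e12,  # ops/s
    "INT4": 37.2e12,  # ops/s
}

# Operational intensity at which the bandwidth roof meets the compute roof
ridge = {name: peak / BW_REAL for name, peak in PEAKS.items()}

for name, oi in ridge.items():
    print(f"{name}: ridge point = {oi:.1f} Ops/Byte")
```

Under these figures, a kernel needs roughly 62 Ops/Byte to saturate the INT8 peak and roughly 124 Ops/Byte for INT4.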

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from IPython.display import display

plt.style.use('seaborn-v0_8-paper')
plt.rcParams.update({'font.size': 12, 'figure.dpi': 150})
%matplotlib inline

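# Hardware Specs
FREQ = 300e6      # kernel clock (Hz); not used in the calculations below
BW_REAL = 300e9   # realistic HBM bandwidth (bytes/s)
P_FP32 = 5.41e12  # peak FP32 throughput (FLOP/s)
P_INT8 = 18.6e12  # peak INT8 throughput (ops/s)
P_INT4 = 37.2e12  # peak INT4 throughput (ops/s)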


## 2. Model Dimensions (Text Encoder)
Derived from `model_shape.txt`.

*   **Hidden Dim ($D$)**: 960
*   **FFN Dim**: 2560
*   **Layers**: 16
*   **Attention (GQA)**:
    *   $Q_{dim} = 960$
    *   $K_{dim} = 320$
    *   $V_{dim} = 320$
    *   $Out_{dim} = 960$
*   **Sequence Length ($S$)**: 50 (Typical text query)
*   **Batch Size ($B$)**: 1
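These dimensions fix the weight footprint of the text encoder. The sketch below is illustrative only. It counts just the seven projection matrices analyzed in Section 3, leaving out norms, embeddings, and the LM head. It also assumes the output projection maps $Q_{dim}$ back to $D$.

```python
# Weight footprint of the text encoder's linear projections.
# Counts only Q/K/V/O and MLP gate/up/down matrices (no norms, embeddings, LM head).
D, FFN, LAYERS = 960, 2560, 16
Q_D, K_D, V_D, OUT_D = 960, 320, 320, 960

per_layer_params = (
    D * Q_D        # q_proj
    + D * K_D      # k_proj
    + D * V_D      # v_proj
    + Q_D * OUT_D  # o_proj
    + 2 * D * FFN  # gate and up projections
    + FFN * D      # down projection
)
total_params = per_layer_params * LAYERS

precisions = {"FP32": 4, "BF16": 2, "INT8": 1, "INT4": 0.5}
for name, nbytes in precisions.items():
    mb = total_params * nbytes / 1e6
    print(f"{name}: {mb:.1f} MB of projection weights for {LAYERS} layers")
```

At INT8 this comes to about 157 MB of projection weights across the 16 layers. That is the weight traffic a single forward pass implies if no weights are cached on chip.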

In [None]:
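B = 1        # batch size; the kernels below assume B = 1, so M = S
S = 50       # sequence length (tokens)
D = 960      # hidden dim
FFN = 2560   # MLP intermediate dim
Q_D = 960    # query projection width
K_D = 320    # key projection width (GQA)
V_D = 320    # value projection width (GQA)
OUT_D = 960  # output projection width

# Precision (Bytes per element)
precisions = {'FP32': 4, 'BF16': 2, 'INT8': 1, 'INT4': 0.5}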


## 3. Kernel Analysis

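We calculate FLOPs, memory traffic (Bytes), and Operational Intensity (OI) for each kernel. Every linear projection is a GEMM that multiplies an $M \times K$ activation matrix by a $K \times N$ weight matrix. Here $M = B \times S$ is the number of tokens, $K$ is the input width, and $N$ is the output width.

**Formula**:
*   $FLOPs = 2 \times M \times K \times N$ (one multiply and one add per multiply-accumulate)
*   $Bytes = (M \times K + K \times N + M \times N) \times \text{dtype\_size}$ (input, weights, and output each transferred once, i.e. perfect on-chip reuse)
*   $OI = FLOPs / Bytes$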
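Write $b$ for $\text{dtype\_size}$. Dividing the numerator and denominator of $OI$ by $K \times N$ shows why the token count dominates the intensity of these projections:

$$
OI = \frac{2MKN}{(MK + KN + MN)\, b} = \frac{2M}{b\left(1 + \frac{M}{N} + \frac{M}{K}\right)} \approx \frac{2M}{b} \quad \text{for } M \ll K, N
$$

For example, `Attn_Q` at INT8 ($b = 1$, $M = 50$, $K = N = 960$) gives $OI = 100 / 1.104 \approx 90.6$ Ops/Byte. The LM head ($M = 1$) stays near $2$ Ops/Byte regardless of its size.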

In [None]:
def analyze_kernel(name, M, K, N, p_bytes):
    """Roofline metrics for an (M x K) @ (K x N) GEMM with p_bytes per element."""
    flops = 2 * M * K * N
    # Weights (K*N) + Input (M*K) + Output (M*N)
    mem_weights = K * N * p_bytes
    mem_io = (M * K + M * N) * p_bytes
    total_bytes = mem_weights + mem_io
    oi = flops / total_bytes
    return {
        'Kernel': name,
        'M': M, 'K': K, 'N': N,
        'FLOPs': flops,
        'Bytes': total_bytes,
        'OI': oi,
        'Weight_MB': mem_weights / 1e6
    }

data = []
p_name = 'INT8' # Baseline for detailed table
p_bytes = precisions[p_name]

# 1. Attention Q Projection
# M=S(50), K=D(960), N=Q_D(960)
data.append(analyze_kernel('Attn_Q', S, D, Q_D, p_bytes))

# 2. Attention K Projection (Smaller)
# M=S(50), K=D(960), N=K_D(320)
data.append(analyze_kernel('Attn_K', S, D, K_D, p_bytes))

# 3. Attention V Projection (Smaller)
# M=S(50), K=D(960), N=V_D(320)
data.append(analyze_kernel('Attn_V', S, D, V_D, p_bytes))

# 4. Attention Output Projection
# M=S(50), K=Q_D(960), N=OUT_D(960)
data.append(analyze_kernel('Attn_Out', S, Q_D, OUT_D, p_bytes))

# 5. MLP Gate/Up Projections
# M=S(50), K=D(960), N=FFN(2560)
data.append(analyze_kernel('MLP_Gate', S, D, FFN, p_bytes))
data.append(analyze_kernel('MLP_Up', S, D, FFN, p_bytes))

# 6. MLP Down Projection
# M=S(50), K=FFN(2560), N=D(960)
data.append(analyze_kernel('MLP_Down', S, FFN, D, p_bytes))

# 7. LM Head
# M=1 (Last Token), K=D(960), N=Vocab(49280)
data.append(analyze_kernel('LM_Head', 1, D, 49280, p_bytes))

df = pd.DataFrame(data)
print(f"--- Kernel Metrics ({p_name}) ---")
display(df[['Kernel', 'FLOPs', 'Bytes', 'OI', 'Weight_MB']].round(2))
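The table can be turned into a bound classification and a latency floor. This hedged sketch assumes the realistic 300 GB/s bandwidth and the 18.6 TOPS INT8 peak from Section 1. It also assumes each kernel takes at least $\max(FLOPs / P, Bytes / BW)$. The shapes are redefined so the snippet is self-contained, and `VOCAB` is an illustrative name for the 49280-entry vocabulary.

```python
# Classify each INT8 kernel as memory- or compute-bound and estimate a
# roofline lower bound on its latency: t >= max(ops / peak, bytes / bandwidth).
BW_REAL = 300e9   # bytes/s (realistic HBM bandwidth)
P_INT8 = 18.6e12  # ops/s
S, D, FFN, VOCAB = 50, 960, 2560, 49280
Q_D, K_D, V_D, OUT_D = 960, 320, 320, 960
P_BYTES = 1       # INT8

# (name, M, K, N) for each GEMM, matching the table above
shapes = [
    ("Attn_Q", S, D, Q_D),
    ("Attn_K", S, D, K_D),
    ("Attn_V", S, D, V_D),
    ("Attn_Out", S, Q_D, OUT_D),
    ("MLP_Gate", S, D, FFN),
    ("MLP_Up", S, D, FFN),
    ("MLP_Down", S, FFN, D),
    ("LM_Head", 1, D, VOCAB),
]

ridge = P_INT8 / BW_REAL  # ~62 ops/byte
results = {}
for name, M, K, N in shapes:
    ops = 2 * M * K * N
    nbytes = (M * K + K * N + M * N) * P_BYTES
    oi = ops / nbytes
    t_min = max(ops / P_INT8, nbytes / BW_REAL)  # seconds
    bound = "compute" if oi >= ridge else "memory"
    results[name] = (oi, bound, t_min)
    print(f"{name:9s} OI={oi:7.2f}  {bound:7s}-bound  t>={t_min * 1e6:.2f} us")

# Lower bound for the 16 decoder layers (LM head excluded) if the
# kernels execute one after another.
LAYERS = 16
layer_time = sum(t for name, (_, _, t) in results.items() if name != "LM_Head")
print(f"{LAYERS}-layer lower bound: {LAYERS * layer_time * 1e3:.3f} ms")
```

With $S = 50$, every projection lands at an OI of roughly 83 to 93 Ops/Byte, above the INT8 ridge point of 62. Under these assumptions, only the single-token LM head is memory-bound.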


## 4. Roofline Plot

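We plot these kernels on the Roofline model to see which roof limits each one. Each kernel is drawn at its OI and at its attainable INT8 performance, $\min(P_{INT8}, BW \times OI)$. Points on the slanted memory roof are bandwidth-bound, and points on the flat roof are compute-bound. The OI values use INT8 operand sizes, so the FP32 and INT4 ceilings are shown for reference only. At another precision, the kernels would shift along the x-axis.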

In [None]:
def plot_roofline(df_metrics):
    fig, ax = plt.subplots(figsize=(12, 8))
    x = np.logspace(-2, 3, 100)
    
    # Ceilings
    ax.loglog(x, np.minimum(P_FP32, BW_REAL * x), 'k-', label='FP32 Peak')
    ax.loglog(x, np.minimum(P_INT8, BW_REAL * x), 'b-', label='INT8 Peak')
    ax.loglog(x, np.minimum(P_INT4, BW_REAL * x), 'g-', label='INT4 Peak')
    ax.loglog(x, BW_REAL * x, 'r--', label='Memory Wall')
    
    # Plot Kernels (INT8 Baseline)
    for i, (_, row) in enumerate(df_metrics.iterrows()):
        oi = row['OI']
        # Attainable INT8 performance: the lower of the compute and bandwidth roofs
        perf = min(P_INT8, BW_REAL * oi)
        ax.plot(oi, perf, 'b^', markersize=12)
        
        # Alternate labels above/below the marker to reduce overlap
        offset = 1.3 if i % 2 == 0 else 0.7
        ax.text(oi, perf * offset, row['Kernel'], fontsize=9, ha='center', va='bottom' if offset > 1 else 'top')
        
    ax.set_xlabel('Operational Intensity (Ops/Byte)')
    ax.set_ylabel('Performance (Ops/s)')
    ax.set_title('Text Encoder Roofline (INT8)')
    ax.grid(True, which="both", ls="-", alpha=0.5)
    ax.legend()
    plt.show()

plot_roofline(df)
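For these projections $OI \approx 2M/b$, so the token count decides which side of the ridge a kernel falls on. This hedged sketch reuses the notebook's peak and bandwidth figures and assumes $B = 1$. It searches for the smallest sequence length at which a few projection shapes become compute-bound at each precision. The helper names `oi` and `crossover_seq_len` are hypothetical.

```python
# Smallest sequence length S (= M, with B = 1) at which a GEMM of shape
# (S x K) @ (K x N) reaches the ridge point for a given precision.
BW_REAL = 300e9  # bytes/s
PEAKS = {"FP32": (5.41e12, 4), "INT8": (18.6e12, 1), "INT4": (37.2e12, 0.5)}  # (peak ops/s, bytes/elem)
SHAPES = {"Attn_Q": (960, 960), "Attn_K": (960, 320), "MLP_Gate": (960, 2560)}  # (K, N)

def oi(M, K, N, b):
    """Operational intensity with each operand transferred once."""
    return 2 * M * K * N / ((M * K + K * N + M * N) * b)

def crossover_seq_len(K, N, peak, b, max_s=4096):
    """First S whose OI reaches peak / BW_REAL, or None if none up to max_s."""
    ridge = peak / BW_REAL
    for s in range(1, max_s + 1):
        if oi(s, K, N, b) >= ridge:
            return s
    return None  # still memory-bound at max_s

crossover = {}
for prec, (peak, b) in PEAKS.items():
    for kernel, (K, N) in SHAPES.items():
        crossover[(prec, kernel)] = crossover_seq_len(K, N, peak, b)
        print(f"{prec} {kernel}: compute-bound for S >= {crossover[(prec, kernel)]}")
```

INT8 and INT4 share the same crossover. The listed INT4 peak is exactly twice the INT8 peak, and halving the operand size exactly doubles the OI. For `Attn_Q`, the crossover is 34 tokens at INT8/INT4 and 39 at FP32, so the $S = 50$ query analyzed here sits past the ridge at every precision.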
