# Model Architecture in Hyena-GLT

This notebook explores the Hyena-GLT architecture, which combines Byte Latent Transformer (BLT) tokenization with Striped Hyena blocks for efficient genomic sequence modeling.

## Table of Contents
1. [Architecture Overview](#architecture-overview)
2. [BLT Tokenization Component](#blt-tokenization)
3. [Hyena Blocks](#hyena-blocks)
4. [Model Configuration](#model-configuration)
5. [Forward Pass Analysis](#forward-pass)
6. [Parameter Efficiency](#parameter-efficiency)
7. [Scalability Analysis](#scalability)

In [None]:
import sys
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import torch

# Add parent directory to path
sys.path.append(str(Path().absolute().parent.parent))

from hyena_glt.models.hyena_glt import HyenaGLT, HyenaGLTConfig
from hyena_glt.utils.model_utils import count_parameters

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

print("Hyena-GLT Model Architecture Tutorial")
print("=" * 40)

## 1. Architecture Overview

Hyena-GLT combines two key innovations:
- **BLT (Byte Latent Transformer)**: Efficient tokenization that maps variable-length sequences to fixed-size latent representations
- **Striped Hyena**: Long-range attention alternative using convolutions and gating mechanisms

### Key Components:
1. **Tokenization Layer**: BLT-based encoder/decoder
2. **Embedding Layer**: Learned embeddings for latent tokens
3. **Hyena Blocks**: Stack of Striped Hyena layers
4. **Output Layer**: Task-specific heads (classification, generation, etc.)

In [None]:
# Create a basic configuration for exploration
config = HyenaGLTConfig(
    vocab_size=4,  # DNA: A, T, G, C
    latent_vocab_size=256,  # BLT latent tokens
    d_model=256,
    n_layers=4,
    sequence_length=1024,
    latent_length=64,  # Compressed representation
    num_heads=8
)

print("Model Configuration:")
print(f"- Vocabulary size: {config.vocab_size}")
print(f"- Latent vocabulary size: {config.latent_vocab_size}")
print(f"- Model dimension: {config.d_model}")
print(f"- Number of layers: {config.n_layers}")
print(f"- Sequence length: {config.sequence_length}")
print(f"- Compressed latent length: {config.latent_length}")
print(f"- Compression ratio: {config.sequence_length / config.latent_length:.1f}x")

In [None]:
# Instantiate the model
model = HyenaGLT(config)

# Analyze model structure
print("\nModel Structure:")
print(model)

# Count parameters
total_params = count_parameters(model)
print(f"\nTotal parameters: {total_params:,}")

# Analyze parameter distribution
for name, module in model.named_children():
    # Every nn.Module exposes parameters(), so no hasattr check is needed
    module_params = sum(p.numel() for p in module.parameters())
    print(f"- {name}: {module_params:,} parameters ({module_params/total_params*100:.1f}%)")

## 2. BLT Tokenization Component

The BLT (Byte Latent Transformer) component compresses variable-length genomic sequences into fixed-size latent representations.

### Benefits:
1. **Compression**: Long sequences (e.g., 1024 tokens) → Short latents (e.g., 64 tokens)
2. **Fixed Length**: Variable input lengths become fixed latent lengths
3. **Learned Representation**: The compression is learned end-to-end
4. **Efficiency**: Reduces computational complexity in subsequent layers

### Architecture Components:
- **Encoder**: Maps input sequences to latent space
- **Decoder**: Reconstructs sequences from latent representations
- **Latent Embeddings**: Learned vocabulary for compressed tokens
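
The fixed-length property can be illustrated without any learned components. The sketch below is not the BLT encoder: it assumes two hypothetical helpers, `chunk_bounds` and `mean_pool`, that split a sequence into `latent_length` contiguous chunks and average each one. It shows only how inputs of different lengths map to the same number of latents.

```python
def chunk_bounds(seq_len: int, latent_length: int) -> list[tuple[int, int]]:
    """Split range(seq_len) into latent_length contiguous, near-equal chunks."""
    if not 0 < latent_length <= seq_len:
        raise ValueError("need 0 < latent_length <= seq_len")
    return [
        (i * seq_len // latent_length, (i + 1) * seq_len // latent_length)
        for i in range(latent_length)
    ]


def mean_pool(values: list[float], latent_length: int) -> list[float]:
    """Average each chunk (a stand-in for learned compression)."""
    return [
        sum(values[start:end]) / (end - start)
        for start, end in chunk_bounds(len(values), latent_length)
    ]


# Inputs of different lengths all map to 64 latents
for seq_len in (1000, 1024, 4096):
    latents = mean_pool([float(i % 4) for i in range(seq_len)], latent_length=64)
    print(seq_len, "->", len(latents))
```

A learned encoder would replace the plain average with trainable weights, but the shape contract stays the same: any input length in, `latent_length` vectors out.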

In [None]:
# Analyze BLT tokenization behavior
def analyze_blt_compression(model, sequence_length=1024):
    """Illustrate the shape change produced by BLT-style compression"""
    # Read settings from the model itself rather than from a global config
    cfg = model.config
    batch_size = 2
    sample_input = torch.randint(0, cfg.vocab_size, (batch_size, sequence_length))

    print(f"Input shape: {sample_input.shape}")

    # Get latent representation
    with torch.no_grad():
        # Embed the raw tokens
        embedded = model.embedding(sample_input)  # (batch, seq_len, d_model)
        print(f"Embedded shape: {embedded.shape}")

        # A real BLT encoder learns the compression. For demonstration,
        # we simulate it with strided subsampling so the shapes match.
        compressed_length = cfg.latent_length
        step_size = max(1, sequence_length // compressed_length)

        # Truncate in case sequence_length is not a multiple of compressed_length
        compressed = embedded[:, ::step_size, :][:, :compressed_length, :]
        print(f"Compressed shape: {compressed.shape}")

        compression_ratio = sequence_length / compressed_length
        print(f"Compression ratio: {compression_ratio:.1f}x")

        return compressed

compressed_repr = analyze_blt_compression(model)

# Visualize compression
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

# Original sequence visualization
original_seq = torch.randint(0, 4, (100,))
ax1.plot(original_seq.numpy(), 'o-', alpha=0.7)
ax1.set_title('Original Sequence (100 tokens)')
ax1.set_xlabel('Position')
ax1.set_ylabel('Token ID')
ax1.grid(True, alpha=0.3)

# Compressed representation visualization
compressed_demo = torch.randn(16)  # Simulated compressed representation
ax2.plot(compressed_demo.numpy(), 'o-', color='red', alpha=0.7)
ax2.set_title('Compressed Representation (16 latents)')
ax2.set_xlabel('Latent Position')
ax2.set_ylabel('Latent Value')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\nSize reduction: {(1 - 16/100) * 100:.1f}% ({100/16:.2f}x compression)")

## 3. Hyena Blocks

Hyena blocks replace traditional attention mechanisms with a combination of:
- **Convolutions**: For local pattern recognition
- **Gating Mechanisms**: For selective information flow
- **Global Mixing**: For long-range dependencies

### Key Advantages:
1. **Near-Linear Complexity**: O(n log n) with FFT-based long convolutions vs O(n²) for attention
2. **Long Sequences**: Efficient on very long genomic sequences
3. **Parallel Training**: Better parallelization than RNNs
4. **Memory Efficient**: Lower memory requirements

### Hyena Block Components:
- **Input Projection**: Linear transformation of inputs
- **Convolution Layers**: Local pattern extraction
- **Gating**: Element-wise gating for information flow
- **Output Projection**: Final linear transformation
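
The convolution and gating steps can be sketched in plain Python. This is a single-channel illustration under simplifying assumptions: an explicit kernel instead of the implicitly parameterized long filter described in the Hyena paper, direct summation instead of an FFT, and hypothetical function names `causal_conv` and `gated_conv`.

```python
def causal_conv(signal: list[float], kernel: list[float]) -> list[float]:
    """y[t] = sum_k kernel[k] * signal[t - k], using only current and past inputs."""
    return [
        sum(kernel[k] * signal[t - k] for k in range(min(len(kernel), t + 1)))
        for t in range(len(signal))
    ]


def gated_conv(value: list[float], gate: list[float], kernel: list[float]) -> list[float]:
    """Element-wise gate applied to the convolved value branch."""
    return [g * c for g, c in zip(gate, causal_conv(value, kernel))]


# An impulse at position 0 reproduces the kernel, showing the filter's reach
print(causal_conv([1.0, 0.0, 0.0, 0.0], [0.5, 0.25]))

# A gate of 0 blocks a position, a gate of 1 passes it through
print(gated_conv([1.0, 2.0, 3.0], [1.0, 0.0, 1.0], [1.0]))
```

In a full block, the value and gate branches would come from the input projection, and the gated result would pass through the output projection.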

In [None]:
def analyze_hyena_vs_attention_complexity():
    """Compare computational complexity of Hyena vs Attention"""
    sequence_lengths = [256, 512, 1024, 2048, 4096, 8192]
    d_model = 256

    # Attention complexity: O(n²d)
    attention_ops = [n**2 * d_model for n in sequence_lengths]

    # Hyena complexity: O(n log n · d), assuming FFT-based long convolutions
    hyena_ops = [3 * n * np.log2(n) * d_model for n in sequence_lengths]  # 3x for conv layers

    plt.figure(figsize=(10, 6))
    plt.loglog(sequence_lengths, attention_ops, 'o-', label='Attention O(n²d)', linewidth=2)
    plt.loglog(sequence_lengths, hyena_ops, 's-', label='Hyena O(n log n · d)', linewidth=2)

    plt.xlabel('Sequence Length')
    plt.ylabel('Operations (log scale)')
    plt.title('Computational Complexity: Hyena vs Attention')
    plt.legend()
    plt.grid(True, alpha=0.3)

    # Add speedup annotations
    for i, n in enumerate(sequence_lengths):
        if i % 2 == 0:  # Show every other point
            speedup = attention_ops[i] / hyena_ops[i]
            plt.annotate(f'{speedup:.1f}x fewer ops',
                        xy=(n, hyena_ops[i]),
                        xytext=(10, 10),
                        textcoords='offset points',
                        fontsize=8, alpha=0.7)

    plt.tight_layout()
    plt.show()

    print("Complexity Analysis (estimated operation counts):")
    for i, n in enumerate(sequence_lengths):
        ratio = attention_ops[i] / hyena_ops[i]
        print(f"Sequence length {n:4d}: {ratio:5.1f}x fewer operations with Hyena")

analyze_hyena_vs_attention_complexity()

In [None]:
def compare_memory_usage():
    """Compare parameter counts and estimated memory across model sizes"""
    configs = {
        'Small': HyenaGLTConfig(d_model=128, n_layers=4, sequence_length=512),
        'Medium': HyenaGLTConfig(d_model=256, n_layers=6, sequence_length=1024),
        'Large': HyenaGLTConfig(d_model=512, n_layers=8, sequence_length=2048)
    }

    memory_usage = {}
    parameter_counts = {}

    for name, cfg in configs.items():
        model = HyenaGLT(cfg)
        params = count_parameters(model)

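        # Rough training-memory estimate for a single sequence (batch size 1):
        # parameters + activations + gradients, all float32, no optimizer state
        param_memory = params * 4  # 4 bytes per float32
        # One d_model-wide tensor per layer at the uncompressed sequence length
        activation_memory = cfg.sequence_length * cfg.d_model * cfg.n_layers * 4
        gradient_memory = param_memory  # Same as parameters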

        total_memory = param_memory + activation_memory + gradient_memory

        memory_usage[name] = total_memory / (1024**2)  # Convert to MB
        parameter_counts[name] = params

    # Visualization
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

    # Parameter counts
    names = list(parameter_counts.keys())
    params = [parameter_counts[name]/1e6 for name in names]  # Convert to millions

    bars1 = ax1.bar(names, params, color=['skyblue', 'lightcoral', 'lightgreen'])
    ax1.set_ylabel('Parameters (Millions)')
    ax1.set_title('Model Size Comparison')
    ax1.grid(True, alpha=0.3)

    # Add value labels on bars
    for bar, param in zip(bars1, params, strict=False):
        height = bar.get_height()
        ax1.text(bar.get_x() + bar.get_width()/2., height,
                f'{param:.1f}M', ha='center', va='bottom')

    # Memory usage
    memory = [memory_usage[name] for name in names]
    bars2 = ax2.bar(names, memory, color=['skyblue', 'lightcoral', 'lightgreen'])
    ax2.set_ylabel('Memory Usage (MB)')
    ax2.set_title('Estimated Memory Usage')
    ax2.grid(True, alpha=0.3)

    # Add value labels on bars
    for bar, mem in zip(bars2, memory, strict=False):
        height = bar.get_height()
        ax2.text(bar.get_x() + bar.get_width()/2., height,
                f'{mem:.0f}MB', ha='center', va='bottom')

    plt.tight_layout()
    plt.show()

    return memory_usage, parameter_counts

memory_stats, param_stats = compare_memory_usage()
print("\nDetailed Statistics:")
for name in memory_stats:
    print(f"{name:6s}: {param_stats[name]/1e6:5.1f}M params, {memory_stats[name]:6.0f}MB memory")

## 4. Model Configuration

The `HyenaGLTConfig` class provides flexible configuration for different genomic modeling tasks. Key parameters include:

### Core Architecture Parameters:
- `d_model`: Hidden dimension size
- `n_layers`: Number of Hyena blocks
- `num_heads`: Number of attention heads (for hybrid models)
- `sequence_length`: Maximum input sequence length
- `latent_length`: Compressed latent representation length

### Tokenization Parameters:
- `vocab_size`: Input vocabulary size (4 for DNA, 20 for proteins)
- `latent_vocab_size`: Size of BLT latent vocabulary
- `compression_ratio`: How much to compress sequences

### Task-Specific Parameters:
- `num_classes`: Number of output classes for classification
- `dropout`: Dropout rate for regularization
- `layer_norm_eps`: Layer normalization epsilon
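
How these parameters constrain one another can be sketched with a small dataclass. `SketchConfig` below is a hypothetical stand-in written for this illustration; it is not the real `HyenaGLTConfig`, and its validation rules are assumptions about what a sensible configuration looks like.

```python
from dataclasses import dataclass


@dataclass
class SketchConfig:
    """Hypothetical stand-in for illustration; not the real HyenaGLTConfig."""
    vocab_size: int = 4
    d_model: int = 256
    n_layers: int = 4
    sequence_length: int = 1024
    latent_length: int = 64
    dropout: float = 0.1

    def __post_init__(self) -> None:
        # The latent sequence cannot be longer than the input it compresses
        if not 0 < self.latent_length <= self.sequence_length:
            raise ValueError("latent_length must be in (0, sequence_length]")
        if not 0.0 <= self.dropout < 1.0:
            raise ValueError("dropout must be in [0, 1)")

    @property
    def compression_ratio(self) -> float:
        """Derived from the two lengths rather than stored separately."""
        return self.sequence_length / self.latent_length


cfg = SketchConfig(sequence_length=1000, latent_length=50)
print(f"Compression: {cfg.compression_ratio:.1f}x")
```

Deriving the ratio from `sequence_length` and `latent_length` keeps the three values from drifting out of sync.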

In [None]:
# Configuration examples for different genomic tasks

# DNA sequence classification (e.g., promoter prediction)
dna_config = HyenaGLTConfig(
    vocab_size=4,  # A, T, G, C
    d_model=256,
    n_layers=6,
    sequence_length=1000,
    latent_length=50,
    num_classes=2,  # promoter/non-promoter
    dropout=0.1
)

# Protein function prediction
protein_config = HyenaGLTConfig(
    vocab_size=20,  # 20 amino acids
    d_model=512,
    n_layers=8,
    sequence_length=512,
    latent_length=32,
    num_classes=1000,  # GO term classes
    dropout=0.15
)

# Long genomic sequence modeling (e.g., chromosome regions)
long_sequence_config = HyenaGLTConfig(
    vocab_size=4,
    d_model=384,
    n_layers=12,
    sequence_length=10000,  # 10kb sequences
    latent_length=100,
    num_classes=50,  # chromatin state classes
    dropout=0.1
)

configs = {
    'DNA Classification': dna_config,
    'Protein Function': protein_config,
    'Long Sequence': long_sequence_config
}

print("Configuration Comparison:")
print("=" * 60)
for name, cfg in configs.items():
    # Loop-local names keep the earlier `config` and `model` from being overwritten
    cfg_model = HyenaGLT(cfg)
    params = count_parameters(cfg_model)
    compression = cfg.sequence_length / cfg.latent_length

    print(f"\n{name}:")
    print(f"  Input length: {cfg.sequence_length:,}")
    print(f"  Latent length: {cfg.latent_length}")
    print(f"  Compression: {compression:.1f}x")
    print(f"  Parameters: {params/1e6:.1f}M")
    print(f"  Output classes: {cfg.num_classes}")

## 5. Forward Pass Analysis

Let's trace through a complete forward pass to understand data flow and transformations:

### Forward Pass Steps:
1. **Input Tokenization**: Raw sequence → Token IDs
2. **Embedding**: Token IDs → Dense vectors
3. **BLT Encoding**: Sequence → Compressed latents
4. **Hyena Processing**: Latents → Contextual representations
5. **Output Projection**: Representations → Task outputs

### Shape Transformations:
- Input: `(batch_size, sequence_length)`
- Embedded: `(batch_size, sequence_length, d_model)`
- Compressed: `(batch_size, latent_length, d_model)`
- Processed: `(batch_size, latent_length, d_model)`
- Output: `(batch_size, num_classes)` or `(batch_size, latent_length, vocab_size)`
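
The shape transformations above can be checked with plain arithmetic before running a model. The helper `trace_shapes` is a hypothetical name introduced for this sketch; it only mirrors the list above and does not call into the library.

```python
def trace_shapes(batch_size, sequence_length, latent_length, d_model,
                 num_classes=None, vocab_size=4):
    """Return the expected tensor shape after each forward-pass stage."""
    shapes = {
        "input": (batch_size, sequence_length),
        "embedded": (batch_size, sequence_length, d_model),
        "compressed": (batch_size, latent_length, d_model),
        "processed": (batch_size, latent_length, d_model),
    }
    if num_classes is not None:
        # Classification: pool over the latent axis, then project to classes
        shapes["output"] = (batch_size, num_classes)
    else:
        # Generation: one vocabulary distribution per latent position
        shapes["output"] = (batch_size, latent_length, vocab_size)
    return shapes


# Values matching a DNA classification setup: 1000 tokens -> 50 latents, 2 classes
for stage, shape in trace_shapes(2, 1000, 50, 256, num_classes=2).items():
    print(f"{stage:10s} {shape}")
```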

In [None]:
def trace_forward_pass(model, sequence_length=512, batch_size=2):
    """Trace the forward pass through the model"""
    print("Forward Pass Tracing")
    print("=" * 30)

    # Create sample input
    sample_input = torch.randint(0, model.config.vocab_size, (batch_size, sequence_length))
    print(f"1. Input shape: {sample_input.shape}")

    model.eval()
    with torch.no_grad():
        # Step by step forward pass
        print("\n2. Embedding Layer:")
        embedded = model.embedding(sample_input)
        print(f"   Embedded shape: {embedded.shape}")
        print(f"   Memory: {embedded.numel() * embedded.element_size() / 1024:.1f} KB")

        # Simulate BLT compression (in real implementation, this would be more complex)
        print("\n3. BLT Compression:")
        compressed_length = model.config.latent_length
        step_size = max(1, sequence_length // compressed_length)  # avoid a zero step
        compressed = embedded[:, ::step_size, :][:, :compressed_length, :]
        print(f"   Compressed shape: {compressed.shape}")
        print(f"   Compression ratio: {sequence_length / compressed_length:.1f}x")
        print(f"   Memory reduction: {embedded.numel() / compressed.numel():.1f}x")

        # Process through Hyena layers
        print("\n4. Hyena Layers:")
        x = compressed
        for i, layer in enumerate(model.hyena_layers):
            x = layer(x)
            print(f"   Layer {i+1} output shape: {x.shape}")

        # Output projection
        print("\n5. Output Projection:")
        if hasattr(model, 'classifier'):
            # Classification task
            pooled = x.mean(dim=1)  # Global average pooling
            output = model.classifier(pooled)
            print(f"   Pooled shape: {pooled.shape}")
            print(f"   Final output shape: {output.shape}")
            print(f"   Task: Classification ({output.shape[-1]} classes)")
        else:
            # Generation task
            output = model.output_projection(x)
            print(f"   Final output shape: {output.shape}")
            print(f"   Task: Generation (vocab size {output.shape[-1]})")

    return output

# Trace with DNA classification model
print("\nTracing DNA Classification Model:")
dna_model = HyenaGLT(dna_config)
output = trace_forward_pass(dna_model, sequence_length=1000)

# Analyze computational graph
print("\nModel summary:")
print(f"- Total parameters: {count_parameters(dna_model):,}")
print(f"- Model size: {count_parameters(dna_model) * 4 / 1024**2:.1f} MB")
print(f"- Compression efficiency: {dna_config.sequence_length / dna_config.latent_length:.1f}x")

## 6. Parameter Efficiency

Hyena-GLT aims for parameter and compute efficiency through several design choices:

### Efficiency Mechanisms:
1. **BLT Compression**: Reduces sequence length before processing
2. **Near-Linear Hyena Blocks**: O(n log n) complexity vs O(n²) attention
3. **Shared Parameters**: Reuse weights across sequence positions
4. **Efficient Convolutions**: Depthwise separable convolutions
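
The saving from depthwise separable convolutions is simple arithmetic. The sketch below assumes channel-preserving 1D convolutions without bias terms; the function names are hypothetical, and the counts describe the general technique, not measured values from this model.

```python
def standard_conv1d_params(channels: int, kernel_size: int) -> int:
    """Weights in a dense 1D convolution mapping channels -> channels (no bias)."""
    return channels * channels * kernel_size


def depthwise_separable_params(channels: int, kernel_size: int) -> int:
    """One filter per channel plus a 1x1 pointwise mix across channels (no bias)."""
    depthwise = channels * kernel_size
    pointwise = channels * channels
    return depthwise + pointwise


d_model, kernel_size = 256, 3
dense = standard_conv1d_params(d_model, kernel_size)
separable = depthwise_separable_params(d_model, kernel_size)
print(f"Dense conv:     {dense:,} weights")
print(f"Separable conv: {separable:,} weights ({dense / separable:.2f}x fewer)")
```

The gap widens with kernel size, because the dense count grows with `channels² × kernel_size` while the separable count adds only `channels × kernel_size` to a fixed pointwise cost.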

### Parameter Distribution Analysis:
Let's analyze where parameters are allocated in the model.

In [None]:
def analyze_parameter_distribution(model):
    """Analyze parameter distribution across model components"""
    param_dict = {}

    for name, module in model.named_modules():
        # Count only parameters owned directly by each module, so parameters
        # registered on non-leaf modules are included and none are counted twice
        module_params = sum(p.numel() for p in module.parameters(recurse=False))
        if module_params > 0:
            param_dict[name] = module_params

    # Group by component type
    component_groups = {
        'Embedding': {},
        'Hyena Layers': {},
        'Output': {},
        'Other': {}
    }

    for name, params in param_dict.items():
        if 'embedding' in name.lower():
            component_groups['Embedding'][name] = params
        elif 'hyena' in name.lower() or 'layer' in name.lower():
            component_groups['Hyena Layers'][name] = params
        elif 'classifier' in name.lower() or 'output' in name.lower():
            component_groups['Output'][name] = params
        else:
            component_groups['Other'][name] = params

    # Calculate totals
    group_totals = {}
    for group, components in component_groups.items():
        group_totals[group] = sum(components.values())

    total_params = sum(group_totals.values())

    # Visualization
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

    # Pie chart of component distribution
    sizes = [group_totals[group] for group in group_totals if group_totals[group] > 0]
    labels = [group for group in group_totals if group_totals[group] > 0]
    colors = ['skyblue', 'lightcoral', 'lightgreen', 'gold']

    wedges, texts, autotexts = ax1.pie(sizes, labels=labels, autopct='%1.1f%%',
                                       colors=colors[:len(sizes)], startangle=90)
    ax1.set_title('Parameter Distribution by Component')

    # Bar chart of detailed breakdown
    all_components = []
    all_params = []
    all_colors = []
    color_map = {'Embedding': 'skyblue', 'Hyena Layers': 'lightcoral',
                 'Output': 'lightgreen', 'Other': 'gold'}

    for group, components in component_groups.items():
        if components:
            for comp_name, params in components.items():
                # Simplify component names
                simple_name = comp_name.split('.')[-1]
                all_components.append(f"{group}\n{simple_name}")
                all_params.append(params / 1000)  # Convert to thousands
                all_colors.append(color_map[group])

    if all_components:
        bars = ax2.bar(range(len(all_components)), all_params, color=all_colors)
        ax2.set_xticks(range(len(all_components)))
        ax2.set_xticklabels(all_components, rotation=45, ha='right', fontsize=8)
        ax2.set_ylabel('Parameters (Thousands)')
        ax2.set_title('Detailed Parameter Breakdown')
        ax2.grid(True, alpha=0.3)

        # Add value labels on bars
        for bar, param in zip(bars, all_params, strict=False):
            height = bar.get_height()
            ax2.text(bar.get_x() + bar.get_width()/2., height,
                    f'{param:.0f}K', ha='center', va='bottom', fontsize=7)

    plt.tight_layout()
    plt.show()

    # Print summary
    print("\nParameter Distribution Summary:")
    print("=" * 40)
    for group, total in group_totals.items():
        if total > 0:
            percentage = (total / total_params) * 100
            print(f"{group:15s}: {total/1e6:6.2f}M ({percentage:5.1f}%)")
    print(f"{'Total':15s}: {total_params/1e6:6.2f}M (100.0%)")

    return param_dict, group_totals

# Analyze DNA classification model
print("Parameter Analysis for DNA Classification Model:")
param_breakdown, group_totals = analyze_parameter_distribution(dna_model)

## 7. Scalability Analysis

Hyena-GLT is designed to scale efficiently with sequence length and model size. Let's analyze scaling behavior:

### Scaling Dimensions:
1. **Sequence Length**: How performance scales with longer inputs
2. **Model Size**: Impact of increasing model parameters
3. **Batch Size**: Training efficiency with larger batches
4. **Number of Layers**: Depth vs performance trade-offs
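
Sequence-length scaling can be estimated by hand before building any models. The sketch below assumes the latent length follows `max(16, seq_len // 32)` and that activations are stored as float32; both helper names are hypothetical.

```python
def adaptive_latent_length(seq_len: int, factor: int = 32, minimum: int = 16) -> int:
    """Latent length grows with the input but never drops below a floor."""
    return max(minimum, seq_len // factor)


def activation_bytes(batch_size: int, length: int, d_model: int,
                     bytes_per_value: int = 4) -> int:
    """Size of one (batch, length, d_model) activation tensor."""
    return batch_size * length * d_model * bytes_per_value


for seq_len in (256, 512, 1024, 2048, 4096):
    latent = adaptive_latent_length(seq_len)
    full_kb = activation_bytes(1, seq_len, 256) // 1024
    latent_kb = activation_bytes(1, latent, 256) // 1024
    print(f"{seq_len:5d} tokens -> {latent:4d} latents "
          f"({seq_len / latent:4.1f}x), {full_kb:5d} KB -> {latent_kb:4d} KB")
```

Under this rule the compression ratio is 16x at 256 tokens, where the floor of 16 latents applies, and 32x at every longer length, so per-layer activation memory after compression grows linearly with the input.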

In [None]:
def analyze_scaling_behavior():
    """Analyze how model scales with different parameters"""

    # Sequence length scaling
    sequence_lengths = [256, 512, 1024, 2048, 4096]
    base_config = HyenaGLTConfig(d_model=256, n_layers=4)

    seq_scaling_data = {
        'length': [],
        'parameters': [],
        'memory_estimate': [],
        'compression_ratio': []
    }

    print("Sequence Length Scaling:")
    print("-" * 50)

    for seq_len in sequence_lengths:
        config = HyenaGLTConfig(
            d_model=base_config.d_model,
            n_layers=base_config.n_layers,
            sequence_length=seq_len,
            latent_length=max(16, seq_len // 32)  # Adaptive compression
        )

        model = HyenaGLT(config)
        params = count_parameters(model)

        # Estimate memory (simplified)
        memory_mb = (params * 4 + seq_len * config.d_model * 4 * 2) / (1024**2)
        compression = config.sequence_length / config.latent_length

        seq_scaling_data['length'].append(seq_len)
        seq_scaling_data['parameters'].append(params)
        seq_scaling_data['memory_estimate'].append(memory_mb)
        seq_scaling_data['compression_ratio'].append(compression)

        print(f"Length {seq_len:4d}: {params/1e6:5.1f}M params, {memory_mb:6.1f}MB, {compression:4.1f}x compression")

    # Model size scaling
    model_sizes = [128, 256, 512, 768, 1024]

    size_scaling_data = {
        'd_model': [],
        'parameters': [],
        'memory_estimate': []
    }

    print("\nModel Size Scaling:")
    print("-" * 50)

    for d_model in model_sizes:
        config = HyenaGLTConfig(
            d_model=d_model,
            n_layers=6,
            sequence_length=1024,
            latent_length=64
        )

        model = HyenaGLT(config)
        params = count_parameters(model)
        memory_mb = (params * 4 + 1024 * d_model * 4 * 2) / (1024**2)

        size_scaling_data['d_model'].append(d_model)
        size_scaling_data['parameters'].append(params)
        size_scaling_data['memory_estimate'].append(memory_mb)

        print(f"d_model {d_model:4d}: {params/1e6:6.1f}M params, {memory_mb:6.1f}MB")

    # Visualization
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 10))

    # Sequence length scaling
    ax1.plot(seq_scaling_data['length'], [p/1e6 for p in seq_scaling_data['parameters']],
             'o-', color='blue', linewidth=2)
    ax1.set_xlabel('Sequence Length')
    ax1.set_ylabel('Parameters (Millions)')
    ax1.set_title('Parameters vs Sequence Length')
    ax1.grid(True, alpha=0.3)
    ax1.set_xscale('log', base=2)

    # Memory scaling
    ax2.plot(seq_scaling_data['length'], seq_scaling_data['memory_estimate'],
             's-', color='red', linewidth=2)
    ax2.set_xlabel('Sequence Length')
    ax2.set_ylabel('Memory (MB)')
    ax2.set_title('Memory Usage vs Sequence Length')
    ax2.grid(True, alpha=0.3)
    ax2.set_xscale('log', base=2)

    # Model size scaling
    ax3.plot(size_scaling_data['d_model'], [p/1e6 for p in size_scaling_data['parameters']],
             '^-', color='green', linewidth=2)
    ax3.set_xlabel('Model Dimension (d_model)')
    ax3.set_ylabel('Parameters (Millions)')
    ax3.set_title('Parameters vs Model Size')
    ax3.grid(True, alpha=0.3)

    # Compression efficiency
    ax4.plot(seq_scaling_data['length'], seq_scaling_data['compression_ratio'],
             'D-', color='purple', linewidth=2)
    ax4.set_xlabel('Sequence Length')
    ax4.set_ylabel('Compression Ratio')
    ax4.set_title('BLT Compression Efficiency')
    ax4.grid(True, alpha=0.3)
    ax4.set_xscale('log', base=2)

    plt.tight_layout()
    plt.show()

    return seq_scaling_data, size_scaling_data

seq_data, size_data = analyze_scaling_behavior()

# Calculate scaling efficiency
print("\nScaling Efficiency Analysis:")
print("=" * 40)

# Parameter scaling with sequence length
seq_param_growth = seq_data['parameters'][-1] / seq_data['parameters'][0]
seq_length_growth = seq_data['length'][-1] / seq_data['length'][0]
print(f"Sequence length increased {seq_length_growth:.1f}x")
print(f"Parameters increased {seq_param_growth:.1f}x")
print(f"Parameter efficiency: {seq_length_growth/seq_param_growth:.2f}")

# Parameter scaling with model size
size_param_growth = size_data['parameters'][-1] / size_data['parameters'][0]
size_dim_growth = size_data['d_model'][-1] / size_data['d_model'][0]
print(f"\nModel dimension increased {size_dim_growth:.1f}x")
print(f"Parameters increased {size_param_growth:.1f}x")
print(f"Expected quadratic growth: {size_dim_growth**2:.1f}x")
print(f"Actual vs expected: {size_param_growth/(size_dim_growth**2):.2f}")

## Conclusion

This notebook explored the Hyena-GLT architecture in detail, covering:

### Key Takeaways:
1. **Efficient Architecture**: Combines BLT compression with Hyena blocks for near-linear complexity
2. **Scalability**: Scales efficiently with sequence length and model size
3. **Flexibility**: Configurable for various genomic modeling tasks
4. **Parameter Efficiency**: Aims to match attention-based models with fewer parameters; this notebook measured parameter counts and estimated memory, not task performance

### Architecture Benefits:
- **Near-Linear Complexity**: O(n log n) vs O(n²) for attention mechanisms
- **Memory Efficiency**: BLT compression reduces memory requirements
- **Long Sequences**: Can handle very long genomic sequences efficiently
- **Task Adaptability**: Easily configured for different genomic tasks

### Next Steps:
1. **Training Tutorial**: Learn how to train models with this architecture
2. **Advanced Techniques**: Explore fine-tuning and transfer learning
3. **Real Applications**: Apply to actual genomic datasets
4. **Performance Optimization**: Techniques for improving speed and memory usage

### Resources:
- [Training Basics Tutorial](04_training_basics.ipynb)
- [Fine-tuning Guide](05_fine_tuning.ipynb)
- [Example Scripts](../examples/)
- [API Documentation](../../docs/)