541 changes: 541 additions & 0 deletions examples/attention_optimization/README.md

Large diffs are not rendered by default.

121 changes: 121 additions & 0 deletions examples/attention_optimization/config.yaml
@@ -0,0 +1,121 @@
# Configuration for the MLIR attention optimization example
max_iterations: 100
checkpoint_interval: 10
log_level: "INFO"

# LLM configuration
llm:
  # primary_model: "gemini-2.0-flash-lite"
  # primary_model: "gpt-4.1-nano"
  primary_model: "o3"
  primary_model_weight: 0.8
  # secondary_model: "gemini-2.0-flash"
  secondary_model: "gpt-4.1-mini"
  secondary_model_weight: 0.2
  # api_base: "https://generativelanguage.googleapis.com/v1beta/openai/"
  # api_base: "https://api.cerebras.ai/v1"
  temperature: 0.7
  top_p: 0.95
  max_tokens: 4096

# Prompt configuration
prompt:
  # system_message: "You are an expert programmer specializing in optimization algorithms. Your task is to improve a function minimization algorithm to find the global minimum of a complex function with many local minima. The function is f(x, y) = sin(x) * cos(y) + sin(x*y) + (x^2 + y^2)/20. Focus on improving the search_algorithm function to reliably find the global minimum, escaping local minima that might trap simple algorithms."
  system_message: |
    You are an expert MLIR compiler optimization specialist focused on optimizing attention mechanisms for maximum performance. Your goal is to evolve MLIR transformation parameters that achieve 15-32% speedups, similar to DeepMind's AlphaEvolve results.

    Your Expertise:
    - **MLIR Dialects**: Deep knowledge of Linalg, Vector, SCF, Arith, and Transform dialects
    - **Attention Mechanisms**: Understanding of Q@K^T, softmax, and attention@V computations
    - **Memory Optimization**: Cache hierarchy, memory bandwidth, data locality patterns
    - **Hardware Targets**: CPU vectorization, GPU memory coalescing, tensor core utilization
    - **Compiler Transformations**: Tiling, fusion, vectorization, loop optimization

    Optimization Space:

    Tiling Strategies (Memory Access Optimization):
    - **Tile sizes**: Balance between cache utilization and parallelism
      - Small tiles (16x16): Better cache locality, less parallelism
      - Medium tiles (32x32, 64x64): Balanced approach
      - Large tiles (128x128+): More parallelism, potential cache misses
    - **Tile dimensions**: Consider sequence-length vs. head-dimension tiling
    - **Multi-level tiling**: L1/L2/L3 cache-aware nested tiling

    Memory Layout Patterns:
    - **row_major**: Standard layout, good for sequential access
    - **col_major**: Better for certain matrix operations
    - **blocked**: Cache-friendly blocked layouts
    - **interleaved**: For reducing bank conflicts

    Vectorization Strategies:
    - **none**: No vectorization (baseline)
    - **outer**: Vectorize outer loops (batch/head dimensions)
    - **inner**: Vectorize inner loops (sequence/feature dimensions)
    - **full**: Comprehensive vectorization across all suitable dimensions

    Fusion Patterns (Reduce Memory Traffic):
    - **producer**: Fuse operations with their producers
    - **consumer**: Fuse operations with their consumers
    - **both**: Aggressive fusion in both directions
    - **vertical**: Fuse across computation stages (QK -> softmax -> attention)
    - **horizontal**: Fuse across parallel operations

    Loop Optimizations:
    - **unroll_factor**: 1, 2, 4, or 8 (balance code size vs. ILP)
    - **loop_interchange**: Reorder loops for better cache access
    - **loop_distribution**: Split loops for better optimization opportunities
    - **loop_skewing**: Transform loop bounds for parallelization

    Advanced Optimizations:
    - **prefetch_distance**: How far ahead to prefetch data (0-8)
    - **cache_strategy**: Temporal, spatial, or mixed cache utilization
    - **shared_memory**: Use shared memory for GPU optimization
    - **pipeline_stages**: Number of pipeline stages for latency hiding

    Performance Targets:
    - **Baseline**: Standard attention implementation
    - **Target**: 32% speedup (1.32x performance improvement)
    - **Metrics**: Runtime reduction, memory bandwidth efficiency, cache hit rates

    Key Constraints:
    - **Correctness**: All optimizations must preserve numerical accuracy
    - **Memory bounds**: Stay within available cache/memory limits
    - **Hardware limits**: Respect vectorization and parallelization constraints

    Optimization Principles:
    1. **Memory-bound workloads**: Focus on data layout and cache optimization
    2. **Compute-bound workloads**: Emphasize vectorization and instruction-level parallelism
    3. **Mixed workloads**: Balance memory and compute optimizations
    4. **Attention patterns**: Leverage the specific computational structure of attention

    When evolving parameters, consider:
    - **Sequence length scaling**: How optimizations perform across different input sizes
    - **Hardware characteristics**: Cache sizes, vector widths, memory bandwidth
    - **Attention variants**: Standard attention, sparse attention, local attention
    - **Numerical precision**: fp32, fp16, bf16 trade-offs

    Evolution Strategy:
    1. Start with fundamental optimizations (tiling, basic vectorization)
    2. Add memory layout optimizations
    3. Explore fusion opportunities
    4. Fine-tune advanced parameters
    5. Consider hardware-specific optimizations

    Success Indicators:
    - Speedup > 1.0 (any improvement is progress)
    - Speedup > 1.15 (good optimization)
    - Speedup > 1.25 (excellent optimization)
    - Speedup > 1.32 (target achieved - AlphaEvolve level)

    Generate innovative parameter combinations that push the boundaries of what's possible with MLIR transformations while maintaining correctness and staying within hardware constraints.
  num_top_programs: 3
  use_template_stochasticity: true

# Database configuration
database:
  population_size: 50
  archive_size: 20
  num_islands: 3
  elite_selection_ratio: 0.2
  exploitation_ratio: 0.7

# Evaluator configuration
evaluator:
  timeout: 60
  cascade_evaluation: true
  cascade_thresholds: [0.5, 0.75]
  parallel_evaluations: 4
  use_llm_feedback: false

# Evolution settings
diff_based_evolution: true
allow_full_rewrites: false

# Maximum program length (increased from the default of 10000)
max_program_length: 55000
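To make the optimization space described in the system prompt above concrete, here is a minimal Python sketch of one hypothetical parameter set an evolved program might propose, together with the success tiers listed in the prompt. The `evolved_params` keys and the `classify_speedup` helper are illustrative only, not part of OpenEvolve's API.

```python
# Hypothetical point in the optimization space from the system prompt above.
# All names are illustrative, not OpenEvolve API.
evolved_params = {
    "tile_size_seq": 64,           # sequence-length tile (medium: balanced)
    "tile_size_head": 32,          # head-dimension tile
    "memory_layout": "blocked",    # row_major | col_major | blocked | interleaved
    "vectorization": "inner",      # none | outer | inner | full
    "fusion_pattern": "vertical",  # fuse QK -> softmax -> attention stages
    "unroll_factor": 4,            # 1 | 2 | 4 | 8
    "prefetch_distance": 2,        # 0-8
}

def classify_speedup(speedup: float) -> str:
    """Map a measured speedup onto the success tiers from the prompt."""
    if speedup > 1.32:
        return "target achieved (AlphaEvolve level)"
    if speedup > 1.25:
        return "excellent optimization"
    if speedup > 1.15:
        return "good optimization"
    if speedup > 1.0:
        return "progress"
    return "no improvement"

print(classify_speedup(1.18))  # -> "good optimization"
```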
50 changes: 50 additions & 0 deletions examples/attention_optimization/configs/failing_config.yaml
@@ -0,0 +1,50 @@
# OpenEvolve configuration for MLIR attention optimization

# LLM configuration
llm:
primary_model: "gpt-4.1-nano"
# secondary_models: ["gpt-4.1-mini"]
temperature: 0.7
max_tokens: 2048

# Evolution parameters
evolution:
  max_iterations: 500
  population_size: 50
  mutation_rate: 0.15
  crossover_rate: 0.8
  selection_strategy: "tournament"
  tournament_size: 5

# Database configuration
database:
  population_size: 100
  num_islands: 3
  migration_rate: 0.1

# Evaluation settings
evaluation:
  timeout_seconds: 120
  max_retries: 3
  parallel_evaluations: 4

# Checkpoint settings
checkpoints:
  enabled: true
  interval: 10
  keep_best: true
  save_all_programs: false

# Optimization targets
optimization:
  target_metric: "speedup"
  target_value: 1.32  # 32% speedup, as in the AlphaEvolve paper
  minimize: false
  convergence_threshold: 0.001
  early_stopping_patience: 50

# Logging
logging:
level: "INFO"
save_logs: true
verbose: true
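Since both files are plain YAML, a quick pre-flight check can catch configuration mistakes such as model weights that do not sum to 1. A minimal sketch, assuming PyYAML is installed and the repository-relative path below is correct:

```python
import yaml

# Load the main config and sanity-check a few fields before launching a run.
# The path is an assumption; adjust it to where the config actually lives.
with open("examples/attention_optimization/config.yaml") as f:
    cfg = yaml.safe_load(f)

llm = cfg["llm"]
weight_sum = llm["primary_model_weight"] + llm["secondary_model_weight"]
assert abs(weight_sum - 1.0) < 1e-9, f"model weights sum to {weight_sum}, expected 1.0"

# The config raises max_program_length above the default of 10000.
assert cfg["max_program_length"] > 10000, "expected the raised program-length cap"

print(f"primary={llm['primary_model']}, secondary={llm['secondary_model']}, weights ok")
```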