This repository is the official implementation of SPARK.
- [2026/04/23] π Weβre thrilled to release the SPARK! The paper, code is now open to the community.
- [2026/05/01] π SPARK has been accepted to ICML 2026!
The current codebase is organized as follows:
.
βββ llm/ # LLM clients and OpenAI-compatible API wrappers
βββ prompt/ # Prompt templates for ASR, RC, SAR, and baseline editing
βββ utils/ # Utilities for parsing, diff checking, logging, and evaluation helpers
βββ __init__.py
βββ _version.py
βββ cli.py # Command-line interface helpers
βββ config.py # Python-side configuration definitions
βββ config.yaml # Default experiment configuration
βββ controller.py # Main evolutionary search controller
βββ database.py # Candidate archive, population, and result storage
βββ evaluation_result.py # Evaluation result data structures
βββ evaluator.py # CLRS/NAS evaluator entry point
βββ initial_for.py # Optional initial/search helper program
βββ initial_program.py # Seed architecture program with factor-scoped regions
βββ iteration.py # Per-iteration state, logging, and evolution records
βββ openevolve-run.py # Main executable script
βββ process_parallel.py # Parallel evaluation / process management utilities
git clone https://github.com/AIM-NAS/SPARK.git
cd SPARKconda create -n spark_nas python=3.10 -y
conda activate spark_naspip install -e .If your evaluator depends on CLRS or a local CLRS-PyTorch/JAX environment, install the corresponding benchmark dependencies according to your local setup.
SPARK uses an OpenAI-compatible API interface. Set your API key before running experiments:
export OPENAI_API_KEY="your_api_key_here"Then edit config.yaml to specify your model and endpoint, for example:
llm:
model: "your-model-name"
api_base: "https://your-openai-compatible-endpoint/v1"
temperature: 0.7Do not commit API keys to GitHub.
Run a SPARK search from the initial CLRS architecture:
python openevolve-run.py \
initial_program.py \
evaluator.py \
--config config.yaml \
--output runs/spark_dfs_seed0 \
--iterations 100A typical run will:
- load the seed architecture from
initial_program.py; - sample a parent from the archive;
- use ASR to select
OPERATORorACTION; - use RC to generate a factor-local refinement directive;
- use SAR to generate a complete updated program;
- run syntax/interface/shape feasibility checks;
- train and evaluate valid candidates with
evaluator.py; - update the archive if the candidate improves fitness.
for seed in 0 1 2 3 4; do
python openevolve-run.py \
initial_program.py \
evaluator.py \
--config config.yaml \
--output runs/spark_dfs_seed${seed} \
--iterations 100 \
--seed ${seed}
doneMost experiment settings can be changed in config.yaml, including:
max_iterations: 100
population_size: 100
num_islands: 5
archive_size: 100
llm:
model: "your-model-name"
temperature: 0.7
timeout: 120
retries: 3
spark:
enable: true
factors: ["OPERATOR", "ACTION"]
router_retries: 3
freeze_non_selected_region: true
feasibility_check: trueThe exact fields may differ depending on your local branch. Please check config.py and config.yaml for the final supported options.
initial_program.py should contain explicit editable regions. A typical structure is:
# ===== BEGIN OPERATOR REGION =====
# Define architecture modules, projections, gates, or operator parameters here.
# ===== END OPERATOR REGION =====
# ===== BEGIN ACTION REGION =====
# Define how operators are invoked, routed, masked, and aggregated here.
# ===== END ACTION REGION =====During SPARK evolution, if ASR selects OPERATOR, SAR should only modify the operator region. If ASR selects ACTION, SAR should only modify the action region. Changes outside the selected region are rejected before full evaluation.
| Method | DFS OOD Acc. (%) | Notes |
|---|---|---|
| CLRS reference | 46.78 | Original reference architecture |
| OpenEvolve baseline | 32.54 | Free-form LLM evolution under the same search setting |
| EvoPrompting | 68.14 | Prior LLM-driven NAS baseline |
| FunSearch-style baseline | 74.50 | Program-search baseline |
| EoH-style baseline | 77.27 | Evolution-of-heuristics baseline |
| SPARK (Ours) | 83.74 | Factor-scoped where-then-how evolution |
SPARK reaches its best DFS OOD accuracy with 57 evaluated candidates, corresponding to a 28.1Γ evaluation-efficiency improvement over the 1600-evaluation EvoPrompting setting.
After searching on DFS, the best architecture is transferred to 9 additional CLRS tasks and trained/evaluated from scratch.
| Method | Avg. OOD Acc. (%) | Avg. MACs (K) |
|---|---|---|
| CLRS reference | 71.22 | 450 |
| OpenEvolve baseline | 68.30 | 481 |
| EvoPrompting | 74.42 | 448 |
| FunSearch-style baseline | 77.61 | 469 |
| EoH-style baseline | 79.43 | 463 |
| SPARK (Ours) | 83.92 | 453 |
The results suggest that the improvement mainly comes from more reliable search dynamics and reduced functional entanglement, rather than simply increasing model size or compute.
| Variant | LLM | DFS OOD Acc. (%) | Gain over CLRS |
|---|---|---|---|
| RC + SAR only | DeepSeek-R1 | 56.79 | +10.01 |
| ASR only | DeepSeek-R1 | 65.28 | +18.50 |
| SPARK: ASR + RC + SAR | DeepSeek-R1 | 83.74 | +36.96 |
| RC + SAR only | Qwen-Plus | 56.00 | +9.22 |
| ASR only | Qwen-Plus | 64.50 | +17.72 |
| SPARK: ASR + RC + SAR | Qwen-Plus | 80.50 | +33.72 |
These results show that both scope selection and scope-local refinement are useful, while their combination gives the strongest and most stable improvement.
Each experiment directory may contain:
runs/spark_dfs_seed0/
βββ logs/ # Runtime logs and per-iteration metrics
βββ programs/ # Generated candidate programs
βββ checkpoints/ # Optional evaluator/model checkpoints
βββ results/ # Candidate scores and summary metrics
βββ config.yaml # Copied experiment configuration
Useful quantities to track include:
ood_acc: CLRS out-of-distribution accuracy;macs: multiply-accumulate operations;param_count: model parameter count;valid_rate: fraction of proposals passing feasibility checks;entanglement_rate: fraction of non-factor-local edits;best_so_far: best archived architecture score over iterations.
A typical reproduction workflow is:
# 1. Run DFS search
python openevolve-run.py initial_program.py evaluator.py \
--config config.yaml \
--output runs/spark_dfs_seed0 \
--iterations 100 \
--seed 0
# 2. Parse the best architecture from the run directory
# 3. Train/evaluate the selected architecture on other CLRS tasks
# 4. Aggregate results across tasks/seedsThe exact aggregation script depends on your local evaluator implementation. Please refer to the scripts under utils/ or your experiment-specific parsing scripts.
We would like to express our sincere gratitude to OpenEvolve for providing open-source resources that contributed to the development of this project.
If you find this repository useful, please cite:
@misc{liu2026structuredprogressiveknowledgeactivation,
title={Structured Progressive Knowledge Activation for LLM-Driven Neural Architecture Search},
author={Zhen Liu and Yuhan Liu and Jinjun Wang and Wei Song and Jianyi Liu and Jingwen Fu},
year={2026},
eprint={2605.04057},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2605.04057},
}