Evolutionary generation of efficient GPU kernels using GigaEvo.
Define a task, run evolution with an LLM backend, extract and compare optimized programs.
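At its core, the evolve loop repeatedly mutates the current best kernel via an LLM, evaluates the candidates, and keeps the fittest. The following is a toy sketch of that loop, not GigaEvo's actual implementation: the LLM mutation and GPU evaluation are replaced with numeric stand-ins, and none of these function names come from GigaEvo's API.

```python
# Conceptual sketch of the evolve -> evaluate -> select loop.
import random

def evaluate(program: float) -> float:
    # Stand-in fitness: distance to an optimum. Real runs measure
    # kernel correctness and speedup on a GPU.
    return -abs(program - 42.0)

def mutate(program: float) -> float:
    # Stand-in mutation: real mutations come from an LLM rewriting
    # the kernel's source code.
    return program + random.uniform(-5.0, 5.0)

def evolve(seed: float, generations: int = 400, mutations_per_gen: int = 4) -> float:
    random.seed(0)
    best = seed
    for _ in range(generations):
        candidates = [mutate(best) for _ in range(mutations_per_gen)]
        # Keep the fittest of the current best plus its mutants,
        # so fitness never regresses between generations.
        best = max([best, *candidates], key=evaluate)
    return best
```

The `--max-generations` and `--max-mutations-per-generation` flags used later in this README correspond to the two loop bounds above.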
- Custom tasks — Define your own kernel tasks in KernelBench format and evolve them.
- KernelBench integration — Use existing KernelBench problems.
- Triton and inline CUDA backends — the two most popular ways to write kernels, suited to different scenarios.
- Remote or local execution — Run validation locally or via a remote eval server.
- Cost efficient — works with fast models such as Gemini Flash 3 and gpt-oss-120b; current experiments cost $0.5–1. Frontier models with high reasoning effort would likely improve results, but at an order of magnitude higher cost.
- Python >= 3.12
- LLM API — OpenAI-compatible (e.g. OpenRouter, or a local server like SGLang).
- Redis — Used by GigaEvo for experiment state.
- GPU — Used by the evaluation stage to measure kernel correctness and efficiency.
```shell
git clone https://github.com/AXXX-Institute/kernel-evo.git
cd kernel-evo
pip install -e . --ignore-requires-python
```

Note: `--ignore-requires-python` relaxes the Python version check (KernelBench may declare 3.10 but works on 3.12).

For custom branches of `gigaevo` or `kernelbench`, edit the Git URLs in `pyproject.toml`.
Pull and run (when a pre-built image is published):

```shell
docker pull sivtsovdt/kernel-evo:latest
docker run --rm sivtsovdt/kernel-evo:latest kernel-evo --help
```

To build the image yourself (e.g. for private dependencies or development), see `build/README.md`.
To evolve your own kernel, create a task in KernelBench format. Example layout:
```
tasks/
└── armt_associate/
    └── task.py
```

See `tasks/armt_associate` in this repo for a reference. You can also use any existing task from KernelBench.
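A task file follows the KernelBench convention: a reference `Model` the evolved kernel must match, plus `get_inputs()`/`get_init_inputs()` that supply the tensors used during correctness and performance trials. The sketch below is a hypothetical minimal task (the softmax workload and shapes are illustrative, not taken from this repo):

```python
# Hypothetical tasks/my_softmax/task.py in KernelBench format.
import torch
import torch.nn as nn

class Model(nn.Module):
    """Reference implementation the evolved kernel is checked against."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.softmax(x, dim=-1)

def get_inputs():
    # Tensors passed to Model.forward during trials.
    return [torch.randn(128, 4096)]

def get_init_inputs():
    # Arguments for Model.__init__ (none for this stateless example).
    return []
```

Evolution then searches for a faster kernel whose outputs stay numerically close to `Model.forward` on these inputs.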
Evolution can use a local or remote LLM (e.g. SGLang, OpenRouter). Examples below use OpenRouter and a remote eval server.
In a separate terminal:
```shell
kernel-evo eval-server --port 15000
```

Then run evolution on the custom task:

```shell
OPENAI_API_KEY="sk-or-v1-..." kernel-evo evolve \
  --problem-path tasks/armt_associate/task.py \
  --experiment-name custom_associate \
  --backend triton \
  --precision fp16 \
  --model-name <MODEL> \
  --llm-base-url https://openrouter.ai/api/v1 \
  --redis-db 0 \
  --max-generations 400 \
  --max-mutations-per-generation 4 \
  --validator-debug \
  --log-dir <dir_for_logs> \
  --execution-mode remote_execution
```

To evolve an existing KernelBench problem instead:

```shell
OPENAI_API_KEY="<KEY>" kernel-evo evolve \
  --level 1 \
  --problem-id 36 \
  --experiment-name kb_level1_36 \
  --dataset-src huggingface \
  --dataset-name ScalingIntelligence/KernelBench \
  --backend triton \
  --precision fp16 \
  --model-name <MODEL> \
  --llm-base-url https://openrouter.ai/api/v1 \
  --redis-db 0 \
  --max-generations 400 \
  --max-mutations-per-generation 4 \
  --validator-debug \
  --log-dir <dir_for_logs> \
  --execution-mode remote_execution
```

To monitor progress:

```shell
cd gigaevo/outputs/<DATE>/<EXPERIMENT_START>
tensorboard --logdir .
```

Use TensorBoard to find iterations with good performance before extracting programs.
Export the program from a specific iteration (e.g. after inspecting TensorBoard):
```shell
kernel-evo extract \
  --redis-db 0 \
  --iteration 8 \
  --redis-prefix "kernel_evo" \
  --output-file best_program.py
```

To compare two programs on a custom task:

```shell
kernel-evo compare \
  --program-a prog_a.py \
  --program-b prog_b.py \
  --problem-path tasks/armt_associate/task.py \
  --backend triton \
  --precision fp16 \
  --num-perf-trials 200 \
  --num-correct-trials 20
```

Or on a KernelBench problem:

```shell
kernel-evo compare \
  --program-a prog_a.py \
  --program-b prog_b.py \
  --dataset-src huggingface \
  --dataset-name ScalingIntelligence/KernelBench \
  --level 1 \
  --problem-id 36 \
  --backend triton \
  --precision fp16 \
  --num-perf-trials 200 \
  --num-correct-trials 20
```

| Command | Description |
|---|---|
| `evolve` | Run evolution (custom or KernelBench) |
| `eval-server` | Start remote validation server |
| `extract` | Export program by iteration from Redis |
| `compare` | Compare two programs (correctness + perf) |
Evolution results depend heavily on the underlying model. For the best results, use frontier models such as GPT, Claude, or Gemini.

Best-value vendor model:
- Gemini Flash 3 — capable yet inexpensive. It sometimes produces faulty kernels, but is able to recover buggy code.

Recommendations for open-source models:
- gpt-oss-120b — the best baseline for kernel evolution, with reasoning good enough to recover faulty kernels.
- GLM-5 — of all very large open LLMs, the only one that seems to know Triton and generates decent kernels. Downsides: slower generation and very large for local inference.
Result quality depends on the starting seeds and can vary between runs, so it makes sense to restart and try again if the solution is still poor after the first 200k tokens.

We also noticed that Triton does better on small efficient kernels, such as softmax and matmuls, because it demands less knowledge from the model. For complex tasks like KernelBench level 2, the difference is smaller.
It is better to run validation via the validator server in a separate terminal; this way you can watch results as they arrive.

Use the `--disable-insights-lineage` flag with `kernel-evo evolve` to disable additional calls. This is beneficial for short debug runs or with expensive models.


