AIDE Parallel runs AIDE experiments locally or on a Ray cluster. The simplest first run is the attention task on CPU. KernelBench is available, but it requires a GPU environment and is not the recommended first-run path.
Local and cluster execution now share the same interface:
- use `--local` for one machine
- use `--cluster-config <yaml>` for a submit-only machine targeting a remote Ray cluster
Use Python 3.10 or newer; `aideml` will not install on older interpreters.
The commands below use `python3.12` as an example; replace it with any installed Python 3.10+ binary.
Create a virtual environment and install the repo:

```shell
python3.12 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip setuptools wheel
python -m pip install -r requirements.txt
python -m pip install -e ./aideml
```

Copy the environment template:

```shell
cp .env.example .env
```

Set one provider and one model pair in `.env`.
Groq example:

```shell
GROQ_API_KEY=...
OPENAI_BASE_URL=https://api.groq.com/openai/v1
AIDE_MODEL=llama-3.3-70b-versatile
AIDE_FEEDBACK_MODEL=llama-3.3-70b-versatile
```

Anthropic example:

```shell
ANTHROPIC_API_KEY=...
AIDE_MODEL=claude-sonnet-4-20250514
AIDE_FEEDBACK_MODEL=claude-sonnet-4-20250514
```

Optional MLflow tracking:

```shell
AIDE_ENABLE_MLFLOW=1
MLFLOW_EXPERIMENT_NAME=aide-public
```

If `MLFLOW_TRACKING_URI` is unset, runs are stored locally under `mlruns/`.
For longer-lived tracking, prefer setting `MLFLOW_TRACKING_URI` to a real MLflow server or database-backed deployment.
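Before the first run, you can sanity-check that `.env` carries one provider key and both model variables. The sketch below is a minimal stdlib loader, assuming the variable names from the examples above; the repo itself may parse `.env` differently:

```python
from pathlib import Path

def load_env(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and comments."""
    env = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env

def check(env: dict) -> list:
    """Return a list of problems; empty means the minimal config is present."""
    problems = [f"{var} is not set"
                for var in ("AIDE_MODEL", "AIDE_FEEDBACK_MODEL")
                if not env.get(var)]
    if not (env.get("GROQ_API_KEY") or env.get("ANTHROPIC_API_KEY")):
        problems.append("no provider API key set")
    return problems
```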
Run the deterministic setup check:
```shell
./cli/aide-check
```

Run one local optimization step:

```shell
AIDE_ATTENTION_FAST_EVAL=1 ./cli/aide-run --local --task attention --num-experiments 1 --num-iterations 1 --steps 1
```

- Use `./cli/aide-check` first. It verifies imports and runs a baseline attention evaluation on CPU.
- Use the attention task for the first run. It now auto-prepares a tiny Shakespeare dataset if the wiki dataset is missing.
- Set `AIDE_MODEL` and `AIDE_FEEDBACK_MODEL` explicitly; do not rely on provider-specific default model availability.
- For the current repo state, Groq and Anthropic are the most reliable provider paths.
- Enable `AIDE_ENABLE_MLFLOW=1` if you want experiment tracking. Local MLflow works without a server.
KernelBench now runs only in a strict benchmark mode. It needs a CUDA-capable GPU setup plus an explicit baseline contract.
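A quick pre-flight check can tell you whether a CUDA-capable PyTorch install is even visible before you invoke the strict validators. This is an illustrative sketch, not part of the repo's CLI, and it degrades gracefully on machines without torch:

```python
def cuda_status() -> str:
    """Report whether a CUDA-capable PyTorch install is visible."""
    try:
        # torch is an optional dependency here; it may be absent on
        # submit-only machines that never run GPU work themselves.
        import torch
    except ImportError:
        return "torch not installed"
    if torch.cuda.is_available():
        return f"cuda ok: {torch.cuda.get_device_name(0)}"
    return "torch installed, but no CUDA device visible"

print(cuda_status())
```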
Validate the strict environment first:
```shell
./cli/aide-kernelbench-validate-env --kb-reference-baseline H100_PCIe_LambdaLabs
```

Or generate a local eager baseline on your hardware:

```shell
./cli/aide-kernelbench-generate-baseline --hardware-name MY_GPU
```

Run a local KernelBench job:

```shell
./cli/aide-run --local --task kernelbench --kb-task 1_19 --kb-reference-baseline H100_PCIe_LambdaLabs --num-experiments 1 --num-iterations 1 --steps 1
```

Bring up a submit-only 2x8 GPU Ray cluster from a topology file:

```shell
./cli/aide-cluster-up --cluster-config configs/cluster.2x8gpu.example.yaml
./cli/aide-cluster-status --cluster-config configs/cluster.2x8gpu.example.yaml
```

Run the same strict KernelBench job on that cluster:

```shell
./cli/aide-run --task kernelbench --kb-task 1_19 --kb-reference-baseline H100_PCIe_LambdaLabs --cluster-config configs/cluster.2x8gpu.example.yaml --num-experiments 4 --num-iterations 1 --steps 4
```

Run a resumable strict KernelBench campaign with the same interface shape as AlgoTune:

```shell
./cli/run-kb-sequence all --local --kb-reference-baseline H100_PCIe_LambdaLabs --num-experiments 4 --max-concurrent-tasks 4
```

Or against the remote Ray cluster:

```shell
./cli/run-kb-sequence all --cluster-config configs/cluster.2x8gpu.example.yaml --kb-reference-baseline H100_PCIe_LambdaLabs --num-experiments 4 --max-concurrent-tasks 4
```

For Linux GPU nodes, install CUDA-specific PyTorch wheels separately, for example:

```shell
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```

AlgoTune is available as an optional advanced benchmark and is not part of the default first-run path.
Use a separate Python 3.10 environment for it:
```shell
python3.10 -m venv .venv-algotune
source .venv-algotune/bin/activate
python -m pip install --upgrade pip setuptools wheel
python -m pip install -r requirements.txt
python -m pip install -r requirements-algotune.txt
python -m pip install -e ./aideml
```

Then run a small local smoke test:
```shell
./cli/aide-run --local --task algotune --at-task kmeans --num-experiments 1 --num-iterations 1 --steps 1 --cpus-per-experiment 2
```

Validate the strict benchmark environment before any publishable AlgoTune run:

```shell
./cli/aide-algotune-validate-env
```

Fetch the local Hugging Face snapshot explicitly before strict benchmark runs:

```shell
export ALGOTUNE_HF_REVISION='fc3744ffd7eebaa9e9b55427e2cda440955fdd2d'
./cli/aide-algotune-fetch-dataset --task kmeans
```

Run a strict held-out benchmark task:

```shell
./cli/aide-run --local --task algotune --at-task kmeans --num-experiments 1 --num-iterations 1 --steps 1 --cpus-per-experiment 2
```

The repo now exposes only the publishable AlgoTune path. It validates the Python 3.10 environment up front, searches on the train split, runs one held-out test evaluation at the end, and rejects repo-side timeout/compatibility shortcuts.
If you source datasets from Hugging Face for strict runs, pin `ALGOTUNE_HF_REVISION` to a non-main revision and prefetch the local snapshot so the benchmark run itself does not perform network downloads.
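One way to enforce that pinning rule in your own wrapper scripts is to reject anything that is not a full commit SHA. The sketch below is an assumption about what "pinned" should mean (a 40-character hex SHA, never a moving ref like `main`) and is not part of the repo:

```python
import os
import re

def revision_is_pinned(rev) -> bool:
    """True only for a full 40-hex-char commit SHA, never a branch name."""
    return bool(rev) and re.fullmatch(r"[0-9a-f]{40}", rev) is not None

rev = os.environ.get("ALGOTUNE_HF_REVISION", "")
print("pinned" if revision_is_pinned(rev)
      else "not pinned: export ALGOTUNE_HF_REVISION as a full commit SHA")
```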
The shared AlgoTune env is only valid if `./cli/aide-algotune-validate-env` still passes after `aideml` is installed.
Run a resumable sweep across the full AlgoTune inventory:
```shell
./cli/run-at-sequence all --local --profile coverage
```

Control per-task attempts and concurrent tasks independently:

```shell
./cli/run-at-sequence all --local --profile coverage --attempts-per-task 4 --max-concurrent-tasks 4
```

Or submit the same sweep to a remote Ray cluster:

```shell
./cli/run-at-sequence all --cluster-config configs/cluster.2x8gpu.example.yaml --profile coverage --attempts-per-task 4 --max-concurrent-tasks 4
```

More details: tasks/algotune/README.md
If you want notebooks, tracing, or extra benchmarking tools:
```shell
python -m pip install -r requirements-optional.txt
```

For the optional AlgoTune benchmark:

```shell
python -m pip install -r requirements-algotune.txt
python -m pip install -e ./aideml
```

- `./cli/aide-check`: validate the local install with a deterministic CPU run
- `./cli/aide-algotune-validate-env`: validate the strict AlgoTune benchmark environment
- `./cli/aide-kernelbench-validate-env`: validate the strict KernelBench benchmark environment
- `./cli/aide-kernelbench-generate-baseline --hardware-name NAME`: generate a strict eager KernelBench baseline artifact
- `./cli/aide-run`: run AIDE experiments
- `./cli/aide-run --task algotune --at-task <task>`: run an AlgoTune task locally or on Ray
- `./cli/aide-run --task kernelbench --kb-task <task> --kb-reference-baseline <name>`: run a strict KernelBench task locally or on Ray
- `./cli/run-at-sequence all --profile coverage`: run a resumable AlgoTune sweep
- `./cli/run-at-sequence all --attempts-per-task N --max-concurrent-tasks M`: control AlgoTune search depth and sweep concurrency
- `./cli/run-kb-sequence all --num-experiments N --max-concurrent-tasks M`: run a resumable strict KernelBench sweep
- `./cli/aide-cluster-up --cluster-config FILE`: start a Ray cluster from a topology YAML
- `./cli/aide-cluster-status --cluster-config FILE`: inspect Ray cluster CPUs/GPUs
- `./cli/aide-cluster-down --cluster-config FILE`: stop a Ray cluster from a topology YAML