Autooptimizer

Automated LLM serving parameter optimization using LLM agents.

Autooptimizer is inspired by Andrej Karpathy's autoresearch, which lets AI agents autonomously iterate on LLM training code to minimize validation loss. It applies the same autonomous experiment loop to a different domain: instead of optimizing training code, it optimizes serving parameters to maximize inference throughput and minimize latency.

Supported Frameworks

Framework       Status
vLLM            Supported
SGLang          Supported
TensorRT-LLM    Planned

Overview

An LLM agent iteratively:

1. Proposes a hypothesis for improving serving performance
2. Modifies the framework's serve configuration
3. Starts the server, runs the benchmark, and records the result
4. Keeps improvements and reverts failures
5. Repeats until manually stopped

Goal: maximize score, a composite metric that combines throughput and latency (computed by project/no_edit/benchmark.py).
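
The keep/revert logic of the loop can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `Experiment`, `run_loop`, and the `evaluate` callback are hypothetical names, and `evaluate` stands in for "start the server, run the benchmark, return the score" (returning `None` on a crash). The real agent loops until stopped; the sketch iterates over a finite list to stay testable.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Experiment:
    description: str  # the hypothesis being tested
    config: str       # candidate serve configuration (hypothetical representation)


def run_loop(
    baseline_config: str,
    baseline_score: float,
    experiments: Iterable[Experiment],
    evaluate: Callable[[str], Optional[float]],
) -> tuple[str, float, list[tuple[str, str]]]:
    """Keep a candidate only if it beats the best score seen so far."""
    best_config, best_score = baseline_config, baseline_score
    log: list[tuple[str, str]] = []
    for exp in experiments:
        score = evaluate(exp.config)  # None means the server or benchmark crashed
        if score is None:
            status = "crash"
        elif score > best_score:
            best_config, best_score = exp.config, score
            status = "keep"
        else:
            status = "discard"  # revert: best_config stays unchanged
        log.append((exp.description, status))
    return best_config, best_score, log
```

The status strings mirror the values recorded in artifacts/results.tsv, so each iteration maps to one logged row.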

Project Structure

autooptimizer/
├── memory/                              # Agent instructions and context
│   ├── overview.md                      # Target model, framework, goal, best known config
│   ├── experiment_loop.md               # Experiment workflow
│   ├── setup.md                         # How to start a new run
│   ├── rules.md                         # Allowed/prohibited changes
│   ├── search_strategy_vllm.md          # Hypothesis strategy for vLLM params
│   └── search_strategy_sglang.md        # Hypothesis strategy for SGLang params
├── project/
│   ├── edit/
│   │   ├── vllm_serve_config.sh         # Editable vLLM serve command
│   │   └── sglang_serve_config.sh       # Editable SGLang serve command
│   └── no_edit/
│       ├── benchmark.py                 # Fixed benchmark runner & scoring
│       └── adapters/                    # Framework adapters (routing logic)
│           ├── __init__.py
│           ├── vllm.py
│           └── sglang.py
├── artifacts/
│   ├── hypothesis_backlog.tsv           # Experiment queue
│   └── results.tsv                      # Experiment results log
├── pyproject.toml
└── README.md

Quick Start

1. Install dependencies

# For vLLM optimization
uv sync --extra vllm

# For SGLang optimization
uv sync --extra sglang

# Both
uv sync --extra vllm --extra sglang

2. Set the framework

In memory/overview.md, set the FRAMEWORK line:

FRAMEWORK=vllm

or

FRAMEWORK=sglang

3. Start the agent

cursor agent --yolo

Then send the agent an opening prompt, for example:

"Hi, refresh your memory with .md files in @memory folder, and let's kick off a new experiment! Let's do the setup first."

The agent will:

Create a new experiment branch (e.g., autooptimizer/apr11)
Run a baseline benchmark
Begin the experiment loop

4. Manual experiment

# Start server (example for vLLM)
bash project/edit/vllm_serve_config.sh > server.log 2>&1 &

# Wait for the server to be ready, run the benchmark, then kill the server
uv run project/no_edit/benchmark.py run
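
The "wait" step can be pictured as polling until the server answers. The sketch below is an assumption about how such a wait might be written, not a description of what benchmark.py actually does: `wait_until_ready`, the `probe` callback, and the default timeout values are all hypothetical.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `probe` until it returns True or `timeout_s` elapses."""
    deadline = clock() + timeout_s
    while True:
        try:
            if probe():
                return True
        except OSError:
            pass  # e.g. connection refused while the model is still loading
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

In practice `probe` could issue an HTTP request to whatever readiness endpoint the chosen framework exposes; the URL and port depend on the serve config, so they are left to the caller. Injecting `sleep` and `clock` keeps the function testable without a running server.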

Configuration

Set the target model and framework in memory/overview.md:

MODEL=Qwen/Qwen2.5-1.5B-Instruct
FRAMEWORK=vllm

If you have a known-good configuration from previous experiments, update the "Best Known Configuration" section in memory/overview.md.
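
As an illustration of how these settings tie to the project layout, the sketch below reads `KEY=VALUE` lines from the overview text and maps the framework to its editable serve script. It assumes the settings sit on their own lines exactly as shown above; `parse_overview` and `serve_config_path` are hypothetical helpers, not part of the repository.

```python
import re
from pathlib import PurePosixPath

# Frameworks currently marked "Supported" in this README.
SUPPORTED_FRAMEWORKS = {"vllm", "sglang"}


def parse_overview(text: str) -> dict[str, str]:
    """Collect KEY=VALUE lines such as MODEL=... and FRAMEWORK=...."""
    settings: dict[str, str] = {}
    for line in text.splitlines():
        match = re.fullmatch(r"\s*([A-Z_]+)=(\S+)\s*", line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def serve_config_path(framework: str) -> PurePosixPath:
    """Map a framework name to its editable serve script under project/edit/."""
    if framework not in SUPPORTED_FRAMEWORKS:
        raise ValueError(f"unsupported framework: {framework!r}")
    return PurePosixPath("project/edit") / f"{framework}_serve_config.sh"
```

For `FRAMEWORK=vllm`, `serve_config_path` yields `project/edit/vllm_serve_config.sh`, the only file the agent is allowed to edit in that run.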

Constraints

Editable: Only the active framework's serve config in project/edit/
Metric: Optimize score only (higher is better)
Model: Fixed per experiment run (set in memory/overview.md)
Framework: Fixed per experiment run (set in memory/overview.md)

Results Format

Results are logged to a tab-separated file (artifacts/results.tsv) with the following columns:

experiment    score    memory_gb    status    description

Status values: keep, discard, crash
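
A short sketch of reading this file to find the best kept experiment. It assumes the file starts with the header row shown above and that `score` is numeric for every `keep` row; `best_kept` is a hypothetical helper, not part of the repository.

```python
import csv
import io
from typing import Optional

FIELDS = ["experiment", "score", "memory_gb", "status", "description"]


def best_kept(tsv_text: str) -> Optional[dict]:
    """Return the highest-scoring row whose status is 'keep', or None."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Filter first, so rows with an empty score (e.g. crashes) are never parsed.
    kept = [row for row in reader if row["status"] == "keep"]
    if not kept:
        return None
    return max(kept, key=lambda row: float(row["score"]))
```

To use it on the real log, pass the file contents, for example `best_kept(open("artifacts/results.tsv").read())`.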

Roadmap

Current

Autonomous experiment loop with LLM agent
vLLM serve parameter tuning
SGLang serve parameter tuning
Multi-framework support with adapter pattern
Deterministic benchmarking with fixed seeds
Hypothesis backlog with priority-based ordering
Automatic keep/revert based on score improvement

Planned

TensorRT-LLM support
Unified cross-framework benchmark comparison (--unified flag)
Smarter search — Bayesian optimization, parameter interaction detection, Pareto frontier visualization
Hardware-aware profiles — GPU auto-detection, per-family defaults (A100, H100, L40S, ...)
Production tooling — web dashboard, exportable configs (Docker/K8s), CI/CD integration

License

MIT
