Evaluator

An AI coding task evaluator using E2B sandboxes and LLM judges. Describe a task, have the code generated and executed in a secure sandbox, and get the result evaluated automatically.

Quick Start

# Install
uv pip install -e .

# Run a task
evaluator "Write a Python script to calculate Fibonacci numbers"

Features

  • Ad-hoc Task Execution: Give a prompt, get code + evaluation.
  • E2B Sandbox Integration: Code runs in a secure, isolated cloud environment.
  • LLM-as-a-Judge: Uses Claude to evaluate the quality and correctness of the solution.
  • Backtesting: Run suites of regression tests defined in YAML (legacy retrocode functionality).

Usage

Ad-hoc Task

evaluator "Write a React component for a login form"

This will:

  1. Spin up an E2B sandbox.
  2. Use an AI agent to write the code.
  3. Run the code (if applicable).
  4. Evaluate the result using an LLM judge.

Running Test Suites

evaluator test --tests tests/backtests
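Each YAML file under tests/backtests defines one backtest. The sketch below is purely illustrative; field names such as name, prompt, and assertions are hypothetical placeholders showing the general shape of a suite entry, not the project's confirmed schema:

# Hypothetical backtest definition; the real schema lives in tests/backtests
cat > tests/backtests/fibonacci.yaml <<'EOF'
# name, prompt, and assertions are assumed field names, not confirmed
name: fibonacci-basic
prompt: "Write a Python script to calculate Fibonacci numbers"
assertions:
  - output_contains: "55"
EOF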

Architecture

The system uses:

  • Evaluator CLI: Entry point.
  • AgentInvoker: Interacts with Claude to generate code.
  • E2BExecutor: Runs code in a sandbox.
  • TestRunner: Orchestrates execution and assertions.

License

MIT

About

WIP: building a system to backtest AGENTS.md changes, LLM swaps, and MCP additions/removals
