Jujutsu Benchmark

This repository contains benchmarks for Jujutsu (jj), a next-generation version control system.

You can view the evaluation reports at tabbyml.github.io/jj-benchmark.

Project Structure

tasks/: Contains the benchmark tasks, each with its own instructions, bootstrap scripts, and tests.
jobs/: Stores the results of benchmark runs.
site/: A Next.js application to visualize benchmark results.

Contribution

This benchmark is evaluated with the Harbor framework and the Pochi agent.

Here is an example of running the evaluation with a built-in agent (e.g., Codex) and Daytona:

harbor run \
    --agent codex \
    --model "gpt-5.2-codex" \
    --env daytona \
    --path ./tasks \
    --n-attempts 1 \
    --max-retries 5 \
    --n-concurrent 5 \
    --retry-include RuntimeError \
    --retry-include DaytonaError \
    --retry-include AgentTimeoutError

Evaluation Details

Before starting an evaluation, set the environment variables your agent requires. For example, export OPENAI_API_KEY before running Harbor with Codex, or POCHI_API_KEY when using Pochi.
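A small pre-flight check can catch a missing key before any sandbox is started. The sketch below is not part of Harbor: require_env is a hypothetical helper name, and only the variable names OPENAI_API_KEY and POCHI_API_KEY come from this README.

```shell
# Hypothetical helper (not provided by Harbor): fail early if any of the
# named environment variables is unset or empty.
require_env() {
    for name in "$@"; do
        # Indirect lookup via eval keeps this compatible with POSIX sh.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing required environment variable: $name" >&2
            return 1
        fi
    done
    return 0
}

# Example usage (commented out so the sketch has no side effects):
#   export OPENAI_API_KEY="<your key>"
#   require_env OPENAI_API_KEY && harbor run --agent codex --path ./tasks
```

Calling require_env with the key your agent needs right before harbor run makes a misconfigured shell fail immediately with a readable message.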

Evaluations can run locally with Docker, or remotely on Daytona.io or other cloud services. Select the environment with the -e or --env argument, for example docker or daytona (docker is the default).
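As an illustration of switching environments, the sketch below prints the command for a chosen environment instead of executing it. eval_command is a hypothetical wrapper name; the flags and values are taken from the example command in this README, and the shortened flag set is an assumption that may not be sufficient for every agent.

```shell
# Hypothetical wrapper: print (not run) a harbor command for one environment.
# With no argument it falls back to docker, the default named in this README.
eval_command() {
    env_name="${1:-docker}"
    echo "harbor run --agent codex --model gpt-5.2-codex --env $env_name --path ./tasks"
}

eval_command            # command for a local Docker run
eval_command daytona    # command for a Daytona run
```

Printing the command first is a cheap way to review the flags before spending API credits on a real run.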

When running with Daytona, note that Daytona blocks some network access for tier 1 and tier 2 users. If you run into network issues, refer to the Daytona network limits documentation.

Contributions are welcome, whether they use built-in agents (e.g., claude-code, codex, gemini-cli), selected with the --agent or -a argument, or custom agents such as the Pochi agent.

To run the Pochi agent specifically, pass --agent-import-path with the agent's import path, such as agents.pochi:Pochi, where agents.pochi is the module path and Pochi is the name of the agent class.
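The module:class format described above can be made concrete by splitting the example value. This is a minimal sketch: the final command is printed rather than executed, and its combination of flags is an assumption based on the options named in this README, so a real run may need more (for example a model).

```shell
# The import path has the form <module>:<class>.
agent_spec="agents.pochi:Pochi"

module="${agent_spec%%:*}"       # text before the first colon
class_name="${agent_spec##*:}"   # text after the last colon

echo "module: $module"
echo "class:  $class_name"

# Assumed invocation, printed only; adjust the flags to your setup.
echo "harbor run --agent-import-path $agent_spec --env docker --path ./tasks"
```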
