TabbyML/jj-benchmark

Jujutsu Benchmark

This repository contains benchmarks for Jujutsu (jj), a next-generation version control system.

You can view the evaluation reports at tabbyml.github.io/jj-benchmark.

Project Structure

  • tasks/: Contains the benchmark tasks, each with its own instructions, bootstrap scripts, and tests.
  • jobs/: Stores the results of benchmark runs.
  • site/: A Next.js application to visualize benchmark results.

Contribution

This benchmark is evaluated with the Harbor framework and the Pochi agent.

Here is an example of running the evaluation with a built-in agent (e.g., Codex) and Daytona:

harbor run \
    --agent codex \
    --model "gpt-5.2-codex" \
    --env daytona \
    --path ./tasks \
    --n-attempts 1 \
    --max-retries 5 \
    --n-concurrent 5 \
    --retry-include RuntimeError \
    --retry-include DaytonaError \
    --retry-include AgentTimeoutError

Evaluation Details

Before starting an evaluation, set the environment variables that your agent requires. For example, export OPENAI_API_KEY before running Harbor with Codex, or POCHI_API_KEY when using Pochi. Other agents need their own credentials.
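A small preflight check can catch a missing key before Harbor starts any trials. The sketch below is a hypothetical Bash helper, require_env, which is not part of Harbor or this repository. The variable names come from the paragraph above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Harbor): return non-zero and print an
# error if the named environment variable is unset or empty.
require_env() {
  local name="$1"
  # ${!name} is Bash indirect expansion: the value of the variable whose
  # name is stored in $name. The :- default keeps this safe under `set -u`.
  if [ -z "${!name:-}" ]; then
    echo "error: $name is not set" >&2
    return 1
  fi
}

# Usage (checks the key Codex needs before launching Harbor):
#   require_env OPENAI_API_KEY && harbor run --agent codex --path ./tasks
```

Chaining with && means Harbor is only launched once the check passes.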

Evaluation can run locally with Docker or remotely on Daytona.io and other cloud services. Select the environment with the -e or --env option, using values such as docker or daytona. docker is the default.
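To switch between local and cloud runs without editing the full command, the environment can be passed as an argument. The sketch below is a hypothetical wrapper, bench, that is not part of Harbor. It reuses the --agent, --model, and --path flags from the example command above and the -e flag described here. Setting the hypothetical DRY_RUN variable prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of Harbor): run the Codex benchmark in the
# given environment, falling back to docker, Harbor's default.
# Set DRY_RUN=1 to print the command instead of executing it.
bench() {
  local env="${1:-docker}"
  local cmd=(harbor run --agent codex --model gpt-5.2-codex -e "$env" --path ./tasks)
  if [ -n "${DRY_RUN:-}" ]; then
    echo "${cmd[*]}"   # array words joined with single spaces
  else
    "${cmd[@]}"        # run with each argument passed through intact
  fi
}

# Usage:
#   DRY_RUN=1 bench    # print the local Docker command
#   bench daytona      # run on Daytona instead
```

Storing the command in an array instead of a string keeps each argument intact if a value ever contains spaces.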

When running on Daytona, note that Daytona restricts some network access for Tier 1 and Tier 2 users. If you encounter network issues, refer to the Daytona network limits documentation.

Contributions are welcome with Harbor's built-in agents (e.g., claude-code, codex, gemini-cli), which you select with the --agent or -a option. Contributions with custom agents such as Pochi are also welcome.
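Comparing several built-in agents means repeating the same command with a different -a value. The sketch below is a hypothetical dry-run helper, print_agent_cmds, that prints one command per agent for review. The agent names come from the paragraph above. The helper omits --model, which some agents may require. Check this assumption against Harbor's documentation.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print one harbor command per agent name, so
# each line can be reviewed (or piped to bash) before any run is launched.
print_agent_cmds() {
  local agent
  for agent in "$@"; do
    echo "harbor run -a $agent --path ./tasks"
  done
}

# Usage:
#   print_agent_cmds claude-code codex gemini-cli
```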

To run the Pochi agent, pass --agent-import-path with a module:Class reference such as agents.pochi:Pochi. Here agents.pochi is the module import path and Pochi is the agent's class name.
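The module:Class reference splits at the colon into a Python module path and a class name. The sketch below shows that split with Bash parameter expansion. split_import_path is a hypothetical helper and not part of Harbor. The commented usage assumes two things: that the module is importable from the repository root, and that --agent-import-path can be combined with the --path flag from the earlier example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a "module:Class" reference into its two parts,
# printing them on separate lines. Fails if there is no colon.
split_import_path() {
  local ref="$1"
  if [[ "$ref" != *:* ]]; then
    echo "error: expected module:Class, got '$ref'" >&2
    return 1
  fi
  echo "module=${ref%%:*}"   # everything before the first colon
  echo "class=${ref#*:}"     # everything after the first colon
}

# Usage (from the repository root):
#   split_import_path agents.pochi:Pochi
#   python3 -c "from agents.pochi import Pochi"   # sanity-check the import
#   harbor run --agent-import-path agents.pochi:Pochi --path ./tasks
```

Checking the import with python3 first surfaces a wrong module path or class name before any sandbox is started.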
