PRBench (Professional Reasoning Benchmark) is an environment for evaluating high-stakes professional reasoning in finance and legal domains. It contains 1,650 expert-authored conversations with detailed evaluation rubrics (10-30 weighted criteria per task) covering market microstructure, risk management, regulatory compliance, contract analysis, and case law application.
- Professional reasoning in finance and legal domains
- Multi-turn conversational context understanding
- Expert-level analysis with reference material synthesis
- Rubric-based quality evaluation
Agents are given a standard environment with no sandbox or file system access.
There is one split in this environment:
- test: 1,650 tasks (combined from finance, legal, finance_hard, legal_hard)
Tasks include full conversation history, reference texts, and hidden rubric criteria.
This is a single-turn environment. The agent submits a response via the submit_response tool. An LLM grader (gpt-5-mini) evaluates against 10-30 weighted rubric criteria with importance levels:
- Critically Important (8-10 points): Core requirements
- Important (4-7 points): Significant contributions
- Slightly Important (2-3 points): Nice-to-have additions
- Detrimental (-6 to -8 points): Harmful if present (penalty)
Scoring uses a clipped formula: reward = raw_points / max_positive_points, where negative totals (possible when detrimental criteria fire) are clipped to 0.0, so the reward ranges from 0.0 to 1.0.
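The scoring rule above can be sketched in a few lines. This is an illustrative reconstruction from the point ranges listed, not the official PRBench grader; the tuple representation of criteria is an assumption.

```python
# Sketch of clipped rubric scoring: sum awarded points, divide by the
# maximum achievable positive points, and clip negative totals to 0.0.

def rubric_score(criteria):
    """criteria: list of (points_awarded, max_points) per rubric item.

    Detrimental criteria carry negative points when triggered and do not
    count toward the positive maximum.
    """
    raw_points = sum(awarded for awarded, _ in criteria)
    max_positive = sum(mx for _, mx in criteria if mx > 0)
    # Clip below at zero so penalties cannot drive the reward negative.
    return max(0.0, raw_points) / max_positive

# Example: two criteria satisfied, one missed, one detrimental triggered.
score = rubric_score([(9, 9), (5, 5), (0, 3), (-6, -6)])
print(round(score, 3))  # (9 + 5 + 0 - 6) / (9 + 5 + 3) = 8/17 -> 0.471
```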
Data consists of Parquet files (finance.parquet, legal.parquet, finance_hard.parquet, legal_hard.parquet) sourced from HuggingFace ScaleAI/PRBench. Each row contains conversation history, reference texts, and rubric criteria. Data is stored on the OpenReward platform.
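To make the row structure concrete, here is an illustrative shape for a single task record. The field names and values are assumptions for illustration only; the actual column names in the Parquet files may differ.

```python
# Hypothetical shape of one PRBench task row (field names are
# illustrative, not confirmed against the actual dataset schema).
task = {
    "conversation": [
        {"role": "user", "content": "Review this indemnification clause..."},
    ],
    "reference_texts": [
        "Excerpt from the governing contract...",
    ],
    "rubric": [
        {
            "criterion": "Identifies the liability cap",
            "importance": "Critically Important",
            "points": 9,
        },
    ],
}
print(sorted(task))  # ['conversation', 'reference_texts', 'rubric']
```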
| Tool | Description |
|---|---|
| submit_response | Submit your professional response for rubric-based evaluation. Ends the episode. |
Single-turn. The agent reads the conversation context and reference materials, then submits one professional response.
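The single-turn flow can be sketched as follows. All function and object names here are hypothetical stand-ins, not the actual OpenReward API; only the submit_response tool name comes from the description above.

```python
# Hypothetical sketch of one PRBench episode: read the task context,
# produce a single response, and emit a submit_response tool call.

def run_episode(task, generate_response):
    """Build the prompt from the task, get one agent turn, submit it."""
    prompt = {
        "conversation": task["conversation"],
        "references": task["reference_texts"],
    }
    response = generate_response(prompt)  # the agent's only turn
    # Calling submit_response ends the episode; grading happens after.
    return {"tool": "submit_response", "arguments": {"response": response}}

call = run_episode(
    {
        "conversation": [{"role": "user", "content": "Assess this hedge..."}],
        "reference_texts": ["ISDA master agreement excerpt..."],
    },
    lambda prompt: "Professional analysis of the hedging position...",
)
print(call["tool"])  # submit_response
```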
PRBench evaluates expert-level professional reasoning with weighted multi-criteria rubrics in finance and legal domains.
An OpenAI API key is required for LLM-based grading; pass it via secrets={"openai_api_key": "..."}.
Agents in PRBench produce professional analysis in a standard environment. Responses are for evaluation purposes only and should not be used for actual financial or legal advice.
@article{akyurek2025prbench,
title={PRBench: Large-Scale Expert Rubrics for Evaluating High-Stakes Professional Reasoning},
author={Afra Feyza Aky{\"u}rek and Advait Gosai and Chen Bo Calvin Zhang and Vipul Gupta and Jaehwan Jeong and Anisha Gunjal and Tahseen Rabbani and Maria Mazzone and David Randolph and Mohammad Mahmoudi Meymand and Gurshaan Chattha and Paula Rodriguez and Diego Mares and Pavit Singh and Michael Liu and Subodh Chawla and Pete Cline and Lucy Ogaz and Ernesto Hernandez and Zihao Wang and Pavi Bhatter and Marcos Ayestaran and Bing Liu and Yunzhong He},
journal={arXiv preprint arXiv:2511.11562},
year={2025}
}