CaseLawQA is an environment for evaluating legal text classification capabilities. It contains approximately 29,000 legal opinion analysis tasks from Supreme Court and Court of Appeals cases. Tasks cover 212-243 distinct legal classification categories including case jurisdiction, procedural posture, parties, legal issues, judicial decisions, and ideological direction.
- Legal text comprehension and classification
- Case analysis and jurisdiction identification
- Procedural and substantive law understanding
- Multiple-choice legal reasoning
Agents are given a standard environment with no sandbox or file system access.
MIT.
There are three splits in this environment:
- train: 9,693 tasks
- val: 9,693 tasks
- test: 9,674 tasks
Questions are in multiple-choice format, with 2-311 options per question depending on the classification task type.
This is a single-turn environment. The agent submits a choice index via the submit_answer tool. Answer verification is deterministic exact match against the correct choice index. Reward is binary: 1.0 if correct, 0.0 if incorrect.
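The scoring rule described above can be sketched in a few lines. This is a minimal illustration, not the environment's actual implementation; the function name `compute_reward` and its parameters are hypothetical.

```python
def compute_reward(submitted_index: int, correct_index: int) -> float:
    """Binary exact-match reward (hypothetical helper, for illustration only).

    Returns 1.0 when the submitted choice index equals the correct one,
    and 0.0 otherwise. There is no partial credit.
    """
    return 1.0 if submitted_index == correct_index else 0.0
```

Because the comparison is on integer indices, there is no text normalization or fuzzy matching involved, which is what makes verification deterministic.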
Data consists of processed Parquet files sourced from HuggingFace ricdomolm/caselawqa-8k. Each row contains a legal opinion text, question, multiple-choice options, and correct answer index. Data is stored on the OpenReward platform.
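As a rough sketch of the row layout described above, one task could be modeled as follows. The field names (`opinion`, `question`, `choices`, `answer`) and the `render_prompt` helper are assumptions made for illustration; the actual Parquet column names may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CaseLawRow:
    # All field names are hypothetical; check the Parquet schema for the real ones.
    opinion: str        # legal opinion text
    question: str       # classification question about the opinion
    choices: List[str]  # multiple-choice options
    answer: int         # zero-based index of the correct choice


def render_prompt(row: CaseLawRow) -> str:
    """Join the opinion, question, and index-labeled options into one prompt."""
    lines = [row.opinion, "", row.question, ""]
    lines += [f"{i}: {choice}" for i, choice in enumerate(row.choices)]
    return "\n".join(lines)
```

Labeling options by zero-based index rather than by letter keeps the prompt consistent with the index that the agent ultimately submits, and it scales to questions with several hundred options.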
| Tool | Description |
|---|---|
| submit_answer | Submit the choice index (0 for A, 1 for B, etc.). Ends the episode. |
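If an agent reasons in terms of letter labels, it needs to convert the letter to a zero-based index before calling the tool. The helper below is a hypothetical sketch covering only single letters A to Z; how options beyond the 26th are labeled is not specified here, so for those the index should be used directly.

```python
import string


def letter_to_index(letter: str) -> int:
    """Map a single letter label to a zero-based choice index (A -> 0, B -> 1).

    Hypothetical helper: only A-Z are handled. Anything else raises ValueError
    rather than guessing a labeling scheme for larger option sets.
    """
    cleaned = letter.strip().upper()
    if len(cleaned) != 1 or cleaned not in string.ascii_uppercase:
        raise ValueError(f"expected a single letter A-Z, got {letter!r}")
    return ord(cleaned) - ord("A")
```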
Single-turn. The agent reads the legal opinion and question, then submits one choice.
CaseLawQA evaluates legal text classification across 200+ legal task types with varying numbers of answer choices.
No secrets are required other than an OpenReward API key.
Agents in CaseLawQA analyze legal texts in a standard environment. The environment does not present direct safety risks.
@dataset{caselawqa2024,
title={CaseLawQA: Legal Opinion Classification Dataset},
author={HuggingFace},
url={https://huggingface.co/datasets/ricdomolm/caselawqa-8k},
year={2024}
}