FEAT Add OR-Bench dataset loader by romanlutz · Pull Request #1423 · Azure/PyRIT

romanlutz · 2026-03-01T14:25:35Z

Add remote dataset loader for OR-Bench (bench-llm/OR-Bench), an over-refusal benchmark that tests whether language models wrongly refuse safe prompts. Supports both or-bench-hard-1k and or-bench-toxic configurations.

Copilot

Pull request overview

Adds a new remote dataset loader for the HuggingFace OR-Bench benchmark (bench-llm/OR-Bench) so it can be discovered via SeedDatasetProvider and loaded as SeedDataset seeds, with support for both the or-bench-hard-1k and or-bench-toxic configurations.

Changes:

- Introduces _ORBenchDataset remote loader that fetches OR-Bench from HuggingFace and converts rows into SeedPrompts.
- Registers the new loader for automatic discovery and documents the new dataset name in the datasets loading notebook output.
- Adds unit tests covering default loading and the toxic config path.
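
The row-to-seed conversion described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `SeedPromptSketch` is a hypothetical stand-in for PyRIT's `SeedPrompt`, and the `prompt` and `category` column names are assumptions about the OR-Bench rows. It also shows one way to guard against empty categories.

```python
from dataclasses import dataclass, field


@dataclass
class SeedPromptSketch:
    """Hypothetical stand-in for PyRIT's SeedPrompt; not the real class."""

    value: str
    dataset_name: str
    harm_categories: list = field(default_factory=list)


def rows_to_seeds(rows, dataset_name="or_bench"):
    """Map raw dataset rows (dicts) to seed objects.

    Assumes each row has a "prompt" column and an optional "category" column.
    """
    seeds = []
    for row in rows:
        # Treat a missing, None, or whitespace-only category as "no category".
        category = (row.get("category") or "").strip()
        seeds.append(
            SeedPromptSketch(
                value=row["prompt"],
                dataset_name=dataset_name,
                harm_categories=[category] if category else [],
            )
        )
    return seeds
```

Rows without a usable category end up with an empty category list rather than a list containing an empty string.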

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `pyrit/datasets/seed_datasets/remote/or_bench_dataset.py` | Implements the OR-Bench HuggingFace-backed dataset loader and maps records into `SeedPrompt`s. |
| `pyrit/datasets/seed_datasets/remote/__init__.py` | Imports/exports `_ORBenchDataset` to trigger provider registration and expose it from the remote loaders package. |
| `tests/unit/datasets/test_or_bench_dataset.py` | Adds unit tests validating prompt mapping and config propagation to the HuggingFace fetch helper. |
| `doc/code/datasets/1_loading_datasets.ipynb` | Updates the displayed list of available datasets to include `or_bench`. |
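
A test for config propagation to the fetch helper might look something like the sketch below. `ToyLoader`, its `fetch_prompts` method, and the keyword arguments of `_fetch_from_huggingface` are hypothetical here; only the helper's name and the use of `AsyncMock` come from this PR. The point is that an async helper can be replaced by an `AsyncMock` so the test never touches the network.

```python
import asyncio
from unittest.mock import AsyncMock


class ToyLoader:
    """Hypothetical loader; only the shape of the async fetch helper matters."""

    def __init__(self, config="or-bench-hard-1k"):
        self.config = config

    async def _fetch_from_huggingface(self, *, dataset_name, config):
        # The real helper would download data; unit tests should never get here.
        raise RuntimeError("network access is not expected in unit tests")

    async def fetch_prompts(self):
        rows = await self._fetch_from_huggingface(
            dataset_name="bench-llm/OR-Bench", config=self.config
        )
        return [row["prompt"] for row in rows]


def run_toxic_config_check():
    loader = ToyLoader(config="or-bench-toxic")
    # Replace the async helper on the instance with an awaitable mock.
    loader._fetch_from_huggingface = AsyncMock(
        return_value=[{"prompt": "p1", "category": "c1"}]
    )
    prompts = asyncio.run(loader.fetch_prompts())
    # Verify the config reached the fetch helper unchanged.
    loader._fetch_from_huggingface.assert_awaited_once_with(
        dataset_name="bench-llm/OR-Bench", config="or-bench-toxic"
    )
    return prompts
```

A plain `MagicMock` would return a non-awaitable object here, which is presumably why the tests use `AsyncMock` for this helper.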

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

pyrit/datasets/seed_datasets/remote/or_bench_dataset.py

doc/code/datasets/1_loading_datasets.ipynb

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

doc/code/datasets/1_loading_datasets.ipynb

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

doc/code/datasets/1_loading_datasets.ipynb

pyrit/datasets/seed_datasets/remote/__init__.py

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

pyproject.toml

doc/code/datasets/1_loading_datasets.ipynb

- Fix pyproject.toml per-file-ignore comment (import-location not ordering)
- Regenerate 1_loading_datasets.ipynb with latest dataset list
- Fix E712, E722, E731 violations from merged main
- Fix mypy cross-platform issue in attack_manager.py

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
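
For readers unfamiliar with the lint codes mentioned above, the sketch below shows the usual fix for each one. The functions are illustrative only and are not taken from the PyRIT code base.

```python
def is_enabled(flag: bool) -> bool:
    # E712: test truthiness directly instead of writing `if flag == True:`.
    if flag:
        return True
    return False


def parse_int(text: str, default: int = 0) -> int:
    # E722: catch a specific exception instead of using a bare `except:`.
    try:
        return int(text)
    except ValueError:
        return default


# E731: use `def` instead of assigning a lambda to a name,
# i.e. avoid `double = lambda x: x * 2`.
def double(x: int) -> int:
    return x * 2
```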

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings March 1, 2026 14:25

Copilot started reviewing on behalf of romanlutz March 1, 2026 14:26 View session

romanlutz force-pushed the romanlutz/add-or-bench-dataset branch from fea917e to 5b341d2 Compare March 1, 2026 14:26

Copilot AI reviewed Mar 1, 2026

romanlutz force-pushed the romanlutz/add-or-bench-dataset branch from 5b341d2 to 264aec8 Compare March 2, 2026 13:05

Copilot AI review requested due to automatic review settings March 2, 2026 13:43

Copilot started reviewing on behalf of romanlutz March 2, 2026 13:44 View session

Copilot AI reviewed Mar 2, 2026

Copilot AI review requested due to automatic review settings March 2, 2026 14:06

Copilot started reviewing on behalf of romanlutz March 2, 2026 14:07 View session

Copilot AI reviewed Mar 2, 2026

doc/code/datasets/1_loading_datasets.ipynb (resolved)

doc/code/datasets/1_loading_datasets.ipynb (outdated, resolved)

doc/code/datasets/1_loading_datasets.ipynb (outdated, resolved)

Copilot AI review requested due to automatic review settings March 2, 2026 15:08

Copilot started reviewing on behalf of romanlutz March 2, 2026 15:09 View session

Copilot AI reviewed Mar 2, 2026

doc/code/datasets/1_loading_datasets.ipynb (outdated, resolved)

pyrit/datasets/seed_datasets/remote/__init__.py (outdated, resolved)

romanlutz and others added 7 commits March 2, 2026 11:22

Remove dataset_name from constructor, make authors multi-line, guard empty categories

e75bb2d

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Use AsyncMock for _fetch_from_huggingface in tests

5e95331

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Wrap prompt values in raw/endraw, precompute source_url and groups

10d14d7

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Split into three loaders: 80K, Hard-1K, and Toxic

47ac952

Each OR-Bench config gets its own loader class with a custom description, sharing common fetch logic via _ORBenchBaseDataset. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add license notice and content warning to docstring

170eff0

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Update notebook output for renamed OR-Bench datasets and new simple_safety_tests

014b274

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
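
The "Split into three loaders" commit above describes one loader class per OR-Bench config, with shared fetch logic in `_ORBenchBaseDataset`. A rough sketch of that pattern is shown below. The class names, the `fetch_rows` method, the injected `fetcher` callable, and the description strings are all hypothetical, and the `or-bench-80k` config name is an assumption (only `or-bench-hard-1k` and `or-bench-toxic` are named in this PR).

```python
class ORBenchBaseSketch:
    """Hypothetical base class holding the shared fetch logic."""

    hf_dataset = "bench-llm/OR-Bench"
    config = ""
    description = ""

    def fetch_rows(self, fetcher):
        # Subclasses only set `config` and `description`; fetching is shared.
        if not self.config:
            raise NotImplementedError("subclasses must set `config`")
        return fetcher(self.hf_dataset, self.config)


class ORBench80KSketch(ORBenchBaseSketch):
    config = "or-bench-80k"  # assumed config name
    description = "Full set of seemingly toxic but safe prompts."


class ORBenchHard1KSketch(ORBenchBaseSketch):
    config = "or-bench-hard-1k"
    description = "Harder subset of safe prompts that models often refuse."


class ORBenchToxicSketch(ORBenchBaseSketch):
    config = "or-bench-toxic"
    description = "Toxic prompts that models are expected to refuse."
```

Keeping the per-config differences as class attributes means each loader can carry its own description while the fetch code exists in one place.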

romanlutz force-pushed the romanlutz/add-or-bench-dataset branch from 50762a2 to 014b274 Compare March 2, 2026 19:25

Merge branch 'main' into romanlutz/add-or-bench-dataset

071b502

Copilot AI review requested due to automatic review settings March 2, 2026 22:49

Copilot started reviewing on behalf of ValbuenaVC March 2, 2026 22:49 View session

ValbuenaVC approved these changes Mar 2, 2026

Copilot AI reviewed Mar 2, 2026

romanlutz and others added 3 commits March 2, 2026 15:19

fix: remove stale _ORBenchDataset from __all__

54c1610

fix: remove stale _ORBenchDataset from __all__, merge main, regenerate notebook

59a8894

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix: pre-commit auto-fixes

f42d23c

romanlutz and others added 3 commits March 2, 2026 15:22

fix: nbstripout cleanup

b8a2890

Merge branch 'romanlutz/add-or-bench-dataset' of https://github.com/romanlutz/PyRIT into romanlutz/add-or-bench-dataset

a969383

fix: alphabetize __all__ in remote __init__.py and add missing entries

7f36e51

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
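
The `__all__` fixes above (removing a stale name, alphabetizing, adding missing entries) lend themselves to a small consistency check. The sketch below is only an illustration: the helper names are hypothetical, and plain `sorted` uses code-point order, which may differ from the ordering a linter enforces.

```python
def is_alphabetized(names):
    """Return True if the export list is already in sorted order."""
    names = list(names)
    return names == sorted(names)


def stale_exports(exported, defined):
    """Return names listed in `__all__` that the module no longer defines."""
    defined = set(defined)
    return [name for name in exported if name not in defined]
```

For example, `stale_exports(["_ORBenchDataset"], defined=[])` would flag the leftover `_ORBenchDataset` entry after the loader was split and renamed.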

Copilot AI review requested due to automatic review settings March 2, 2026 23:25

Copilot started reviewing on behalf of romanlutz March 2, 2026 23:26 View session

Copilot AI reviewed Mar 2, 2026

romanlutz and others added 2 commits March 2, 2026 16:41

Merge remote-tracking branch 'origin/main' into romanlutz/add-or-bench-dataset

ad25358

fix: add E402/E501 to doc per-file-ignores

06a760e

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 3, 2026 03:57

Copilot started reviewing on behalf of romanlutz March 3, 2026 03:58 View session

Copilot AI reviewed Mar 3, 2026

pyproject.toml (outdated, resolved)

doc/code/datasets/1_loading_datasets.ipynb (resolved)

doc/code/datasets/1_loading_datasets.ipynb (outdated, resolved)

romanlutz and others added 2 commits March 2, 2026 20:34

Merge remote-tracking branch 'origin/main' into romanlutz/add-or-bench-dataset

44707d5

Conflicts: pyproject.toml

Copilot AI review requested due to automatic review settings March 3, 2026 04:43

Copilot started reviewing on behalf of romanlutz March 3, 2026 04:44 View session

Copilot AI reviewed Mar 3, 2026

romanlutz merged commit b265532 into Azure:main Mar 3, 2026
38 checks passed

romanlutz deleted the romanlutz/add-or-bench-dataset branch March 3, 2026 04:58

3 participants