FEAT adding transphobia awareness dataset #989

varshini2305 · 2025-06-25T20:26:59Z

Add Transphobia Awareness Dataset Integration and Unit Test to PyRIT

Summary

This PR integrates the Transphobia Awareness Dataset into PyRIT so that users can fetch, analyze, and use it for LLM safety and inclusivity evaluation. It also adds a unit test that checks the dataset loader works as expected.

Key Changes

1. New Dataset Loader

File: pyrit/datasets/transphobia_awareness_dataset.py
- Implements the function fetch_transphobia_awareness_dataset.
- Fetches and parses the transphobia awareness dataset from public URLs (Ratings.xlsx and optionally others).
- Extracts relevant fields (question, keyword, responses, ratings, etc.) and wraps each entry as a SeedPrompt with appropriate metadata and harm categories.
- Returns a SeedPromptDataset with all prompts and dataset-level metadata.
- Harm categories are populated from unique keywords in the dataset, always including "transphobia".
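A minimal sketch of the harm-category rule described above, using only the standard library. `SketchPrompt` and `build_prompts` are hypothetical stand-ins rather than PyRIT's `SeedPrompt` API, and applying the full keyword set to every prompt is an assumption about how the loader behaves.

```python
from dataclasses import dataclass, field


@dataclass
class SketchPrompt:
    """Stand-in for a seed prompt; this is not PyRIT's SeedPrompt class."""

    value: str
    harm_categories: list[str] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


def build_prompts(rows: list[dict]) -> list[SketchPrompt]:
    # Collect unique, non-empty keywords across all rows.
    keywords = {str(r["keyword"]).strip() for r in rows if r.get("keyword")}
    # Always include "transphobia", even if no row carries it as a keyword.
    harm_categories = sorted(keywords | {"transphobia"})
    return [
        SketchPrompt(
            value=r["question"],
            harm_categories=harm_categories,
            metadata={"keyword": r.get("keyword")},
        )
        for r in rows
    ]


# Made-up rows for illustration only; they are not taken from the dataset.
rows = [
    {"question": "Example question A", "keyword": "Trans"},
    {"question": "Example question B", "keyword": "Transgender"},
    {"question": "Example question C", "keyword": None},
]
prompts = build_prompts(rows)
```

Rows without a keyword still receive the shared category list, so every prompt carries at least "transphobia".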

2. Unit Test

File: tests/unit/datasets/test_transphobia_awareness_dataset.py
- Verifies the fetch function returns a SeedPromptDataset with valid prompts and correct harm categories.
- Checks that the generated SeedPrompt matches the mock data, ensuring the loader logic is valid and not dependent on external data sources.
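The mocking approach can be sketched as follows, assuming the loader isolates its download step in one function that a test can replace. `SketchLoader`, `read_rows`, and `MOCK_ROWS` are hypothetical names for illustration; the actual test in this PR may patch a different target.

```python
from unittest.mock import patch


class SketchLoader:
    """Hypothetical stand-in for a dataset loader; this is not PyRIT code."""

    @staticmethod
    def read_rows() -> list[dict]:
        # In a real loader this step would download and parse the spreadsheet.
        raise RuntimeError("network access is not expected in a unit test")

    @classmethod
    def fetch(cls) -> list[dict]:
        # Turn each raw row into a prompt-like dict with harm categories.
        return [
            {
                "value": row["question"],
                "harm_categories": sorted({"transphobia", row["keyword"]}),
            }
            for row in cls.read_rows()
        ]


MOCK_ROWS = [{"question": "Mock question?", "keyword": "Trans"}]


def test_fetch_uses_mock_rows() -> None:
    # Replace the network-bound step so the test never leaves the process.
    with patch.object(SketchLoader, "read_rows", return_value=MOCK_ROWS) as mocked:
        prompts = SketchLoader.fetch()
    mocked.assert_called_once()
    assert len(prompts) == 1
    assert prompts[0]["value"] == "Mock question?"
    assert "transphobia" in prompts[0]["harm_categories"]


test_fetch_uses_mock_rows()
```

Because only the download step is patched, the row-to-prompt logic still runs for real, which is the part the test is meant to cover.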

How to Test

Run the unit test:
```
pytest tests/unit/datasets/test_transphobia_awareness_dataset.py -v
```
- This will check both real and mocked data loading, prompt structure, and harm category correctness.

Try the example script (if present):
```
python examples/transphobia_awareness_example.py
```
- This will print dataset stats and sample prompts.

Why is this Important?

- Expands PyRIT’s dataset coverage with a real-world, socially relevant dataset.
- Enables LLM safety and inclusivity evaluation for transphobia-related queries.

Checklist

- Dataset loader implemented
- Unit test with mocked data
- All tests passed

References

@romanlutz Requesting review.

romanlutz

Nice work! Just a few small things for cleanup. Glad to have this dataset added!

doc/code/datasets/transphobia_awareness_dataset.md

pyrit/datasets/seed_prompts/__init__.py

pyrit/datasets/__init__.py

pyrit/datasets/transphobia_awareness_dataset.py

tests/unit/datasets/test_transphobia_awareness_dataset.py

varshini2305 · 2025-06-27T18:21:29Z

@romanlutz Sorry, I thought I was resolving conflicts and merging into my forked repo. I didn't realize I was attempting to merge into Azure PyRIT's main branch.

varshini2305 · 2025-06-28T17:30:45Z

@microsoft-github-policy-service agree

varshini2305 · 2025-07-02T14:48:37Z

@microsoft-github-policy-service agree

romanlutz

Looks great! Just a few more small things

pyrit/datasets/transphobia_awareness_dataset.py

integrating transphobia awareness dataset

d12ca71

romanlutz self-assigned this Jun 26, 2025

romanlutz changed the title ~~Integrating transphobia awareness dataset~~ FEAT adding transphobia awareness dataset Jun 26, 2025

romanlutz reviewed Jun 26, 2025

vb-creator and others added 2 commits June 27, 2025 11:14

Updates: based on PR review

72ecedf

Merge branch 'main' into main

310a072

vb-creator and others added 5 commits July 2, 2025 08:43

rm unit test with fetch - for transphobia dataset

b279b76

Merge branch 'main' into main

dfbe6e7

comma separated author names

6b300a6

Merge branch 'main' into main

c1d6244

removing long comments, and word wrapping to keep <120 chars

d4b8d09

romanlutz reviewed Jul 2, 2025

vb-creator and others added 2 commits July 4, 2025 19:03

minor refactors to clean up and format

4fd4fcb

Merge branch 'main' into main

2d83daa

romanlutz approved these changes Jul 5, 2025

pyrit/datasets/transphobia_awareness_dataset.py

vb-creator added 3 commits July 5, 2025 18:27

rm fn args - fetch_transphobia_awareness_dataset

6ae71d4

fix keyword err

2d44eec

rm Optional

a8b4c2e

romanlutz merged commit 90c86c7 into Azure:main Jul 7, 2025
20 checks passed
