@varshini2305
Contributor

Add Transphobia Awareness Dataset Integration and Unit Test to PyRIT


Summary

This PR integrates the Transphobia Awareness Dataset into PyRIT so that users can fetch, analyze, and use it for LLM safety and inclusivity evaluation. It also adds a unit test that checks that the dataset loader works as expected.


Key Changes

1. New Dataset Loader

  • File: pyrit/datasets/transphobia_awareness_dataset.py
    • Implements the function fetch_transphobia_awareness_dataset.
    • Fetches and parses the transphobia awareness dataset from public URLs (Ratings.xlsx and optionally others).
    • Extracts relevant fields (question, keyword, responses, ratings, etc.) and wraps each entry as a SeedPrompt with appropriate metadata and harm categories.
    • Returns a SeedPromptDataset with all prompts and dataset-level metadata.
    • Harm categories are populated from unique keywords in the dataset, always including "transphobia".
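
The loader steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `PromptSketch` and `build_prompts` are hypothetical stand-ins (the real loader returns PyRIT's SeedPrompt and SeedPromptDataset objects), the rows are made-up placeholders rather than dataset content, and attaching each row's keyword to its own prompt is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSketch:
    """Hypothetical stand-in for a PyRIT SeedPrompt; not the real class."""

    value: str
    harm_categories: list
    metadata: dict = field(default_factory=dict)


def build_prompts(rows):
    """Wrap each row as a prompt and collect dataset-level harm categories."""
    prompts = []
    categories = {"transphobia"}  # always included, per the PR description
    for row in rows:
        # Normalize the keyword so "Trans" and "trans" count as one category.
        keyword = row.get("keyword", "").strip().lower()
        if keyword:
            categories.add(keyword)
        prompts.append(
            PromptSketch(
                value=row["question"],
                # Assumption: each prompt carries "transphobia" plus its own keyword.
                harm_categories=sorted({"transphobia", keyword} - {""}),
                metadata={"keyword": keyword},
            )
        )
    return prompts, sorted(categories)


# Placeholder rows standing in for parsed spreadsheet records.
rows = [
    {"question": "Example question A?", "keyword": "Trans"},
    {"question": "Example question B?", "keyword": "nonbinary"},
    {"question": "Example question C?", "keyword": ""},
]
prompts, categories = build_prompts(rows)
print(categories)  # ['nonbinary', 'trans', 'transphobia']
```

Keeping the category set separate from the per-prompt lists makes it easy to attach the full set as dataset-level metadata while each prompt only carries what applies to it.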

2. Unit Test

  • File: tests/unit/datasets/test_transphobia_awareness_dataset.py
    • Verifies the fetch function returns a SeedPromptDataset with valid prompts and correct harm categories.
    • Checks that the generated SeedPrompt matches the mock data, ensuring the loader logic is valid and not dependent on external data sources.
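
One common way to keep such a test independent of external data is to swap the download step for a fake. The sketch below shows that pattern with `unittest.mock.Mock`; `_download_rows` and `load_questions` are hypothetical names invented for this illustration, and the actual test in this PR may patch a different function or use a different mechanism.

```python
from unittest.mock import Mock


def _download_rows(url):
    """Hypothetical network helper; fails loudly if a unit test reaches it."""
    raise RuntimeError("network access is not expected in a unit test")


def load_questions(url, fetch=_download_rows):
    """Hypothetical loader: fetch rows and keep only non-empty questions."""
    return [row["question"] for row in fetch(url) if row.get("question")]


def test_load_questions_uses_mocked_rows():
    fake_rows = [{"question": "Example question?"}, {"question": ""}]
    # Inject a fake fetcher so no HTTP request is made.
    fake_fetch = Mock(return_value=fake_rows)

    result = load_questions("https://example.invalid/Ratings.xlsx", fetch=fake_fetch)

    # The loader should call the fetcher exactly once with the given URL.
    fake_fetch.assert_called_once_with("https://example.invalid/Ratings.xlsx")
    # The empty question is dropped; the remaining one matches the mock data.
    assert result == ["Example question?"]


test_load_questions_uses_mocked_rows()
```

Because the default fetcher raises, any test that forgets to supply a fake fails immediately instead of silently depending on the network.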

How to Test

  1. Run the unit test:

    pytest tests/unit/datasets/test_transphobia_awareness_dataset.py -v
    • This will check both real and mocked data loading, prompt structure, and harm category correctness.
  2. Try the example script (if present):

    python examples/transphobia_awareness_example.py
    • This will print dataset stats and sample prompts.

Why is this Important?

  • Expands PyRIT’s dataset coverage with a real-world, socially relevant dataset.
  • Enables LLM safety and inclusivity evaluation for transphobia-related queries.

Checklist

  • Dataset loader implemented
  • Unit test with mocked data
  • All tests passed

References

@romanlutz Requesting review.

@romanlutz self-assigned this on Jun 26, 2025
@romanlutz changed the title from "Integrating transphobia awareness dataset" to "FEAT adding transphobia awareness dataset" on Jun 26, 2025
Contributor

@romanlutz left a comment

Nice work! Just a few small things for cleanup. Glad to have this dataset added!

@varshini2305
Contributor Author

@romanlutz Sorry, I thought I was resolving conflicts and merging into my forked repo. I didn't realize I was attempting to merge into Azure PyRIT's main branch.

@varshini2305
Contributor Author

@microsoft-github-policy-service agree

Contributor

@romanlutz left a comment

Looks great! Just a few more small things.

@romanlutz merged commit 90c86c7 into Azure:main on Jul 7, 2025
20 checks passed
3 participants