Normalize remote dataset file types from URLs by biefan · Pull Request #1486 · Azure/PyRIT

biefan · 2026-03-17T03:07:49Z

Summary

normalize remote dataset file types before handler lookup
ignore URL query strings and fragments when inferring the file extension
add regression coverage for signed-style URLs and uppercase extensions

Problem

_RemoteDatasetLoader._fetch_from_url() currently infers the file type with source.split(".")[-1]. That breaks two real input shapes:

public URLs with query strings, e.g. https://example.com/data.json?download=1
URLs or paths with uppercase extensions, e.g. https://example.com/data.JSON

Both are currently rejected with Invalid file_type before the loader even reaches the HTTP fetch path, even though the underlying data format is valid and supported.

Testing

.venv/bin/pytest tests/unit/datasets/test_remote_dataset_loader.py -q

romanlutz · 2026-03-18T14:46:40Z

This needs fixes to pre-commit steps

Normalize remote dataset file types from URLs

3db0da1

romanlutz approved these changes Mar 17, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Normalize remote dataset file types from URLs#1486

Normalize remote dataset file types from URLs#1486
biefan wants to merge 1 commit intoAzure:mainfrom
biefan:fix-remote-dataset-url-file-types

biefan commented Mar 17, 2026

Uh oh!

romanlutz commented Mar 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

biefan commented Mar 17, 2026

Summary

Problem

Testing

Uh oh!

romanlutz commented Mar 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants