fix: harden io path validation and error handling by lipikaramaswamy · Pull Request #18 · NVIDIA-NeMo/Anonymizer

lipikaramaswamy · 2026-02-27T18:46:16Z

Summary

Add input path validation in read_input flow: fail fast when path does not exist or is not a file.
Normalize read/write filesystem and pandas failures into clear InvalidInputError messages with path context.
Ensure output parent directories are created before writing, and add regression tests for these edge cases.

Type of Change

Testing

Tests pass locally
Added/updated tests for changes
- Verified new failure modes return InvalidInputError instead of raw OS/pandas errors

Related Issues

Closes #12

andreatgretel

src/anonymizer/engine/io/reader.py:49

InvalidInputError is used for both validation problems and IO failures here. If a caller catches it to handle bad config, it'll also catch unrelated IO problems like permission errors or corrupt files.

Could be better to add an AnonymizerIOError(AnonymizerError) for the try/except blocks and keep InvalidInputError for the validation cases. Not blocking, just a thought for a follow-up.

src/anonymizer/engine/io/reader.py:38

nit: these exists() + is_file() checks could live as a Pydantic @field_validator on AnonymizerInput.source, so bad paths get caught at config time instead of read time. Keeps _load_dataframe focused on just parsing.

src/anonymizer/engine/io/reader.py:44 / writer.py:21

suggestion: adding logger.debug before reads/writes and logger.exception in the except blocks would help with debugging without adding noise.

src/anonymizer/engine/io/writer.py:17

nit: {".csv", ".parquet"} shows up in both reader and writer. Could pull it into a shared constant like SUPPORTED_FORMATS so adding a format later is a one-line change.

lipikaramaswamy · 2026-03-02T23:29:18Z

src/anonymizer/engine/io/reader.py:49

InvalidInputError is used for both validation problems and IO failures here. If a caller catches it to handle bad config, it'll also catch unrelated IO problems like permission errors or corrupt files.

Could be better to add an AnonymizerIOError(AnonymizerError) for the try/except blocks and keep InvalidInputError for the validation cases. Not blocking, just a thought for a follow-up.

src/anonymizer/engine/io/reader.py:38

nit: these exists() + is_file() checks could live as a Pydantic @field_validator on AnonymizerInput.source, so bad paths get caught at config time instead of read time. Keeps _load_dataframe focused on just parsing.

src/anonymizer/engine/io/reader.py:44 / writer.py:21

suggestion: adding logger.debug before reads/writes and logger.exception in the except blocks would help with debugging without adding noise.

src/anonymizer/engine/io/writer.py:17

nit: {".csv", ".parquet"} shows up in both reader and writer. Could pull it into a shared constant like SUPPORTED_FORMATS so adding a format later is a one-line change.

Thanks @andreatgretel for the thoughtful review. I've addressed 3/4 items -

Added shared IO format constants so reader/writer don’t duplicate suffix definitions.
Moved source path checks (exists / is_file) into AnonymizerInput via Pydantic @field_validator, so invalid paths fail at config/input creation time.
Introduced AnonymizerIOError and now use it for read/write runtime failures, while keeping InvalidInputError for validation-style issues.

Let's defer logging changes to a dedicated follow-up PR so we can apply a consistent logging policy across the codebase in one diff. Can capture in #14 or a follow up to it.

lipikaramaswamy requested review from a team as code owners February 27, 2026 18:46

fix: validate io paths and normalize pandas filesystem errors

6395e62

lipikaramaswamy force-pushed the lipikaramaswamy/fix/io-path-hardening branch from 11c96ed to 6395e62 Compare February 27, 2026 19:08

andreatgretel reviewed Feb 27, 2026

View reviewed changes

asteier2026 approved these changes Feb 27, 2026

View reviewed changes

fix: refine per pr feedback

1c19fe5

lipikaramaswamy requested a review from andreatgretel March 2, 2026 23:29

lipikaramaswamy merged commit 046812e into main Mar 4, 2026
5 checks passed

lipikaramaswamy deleted the lipikaramaswamy/fix/io-path-hardening branch March 4, 2026 17:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: harden io path validation and error handling#18

fix: harden io path validation and error handling#18
lipikaramaswamy merged 2 commits into
mainfrom
lipikaramaswamy/fix/io-path-hardening

lipikaramaswamy commented Feb 27, 2026

Uh oh!

andreatgretel left a comment

Uh oh!

lipikaramaswamy commented Mar 2, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

lipikaramaswamy commented Feb 27, 2026

Summary

Type of Change

Testing

Related Issues

Uh oh!

andreatgretel left a comment

Choose a reason for hiding this comment

Uh oh!

lipikaramaswamy commented Mar 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

lipikaramaswamy commented Mar 2, 2026 •

edited

Loading