Skip to content

fix: harden io path validation and error handling#18

Merged
lipikaramaswamy merged 2 commits into
mainfrom
lipikaramaswamy/fix/io-path-hardening
Mar 4, 2026
Merged

fix: harden io path validation and error handling#18
lipikaramaswamy merged 2 commits into
mainfrom
lipikaramaswamy/fix/io-path-hardening

Conversation

@lipikaramaswamy
Copy link
Copy Markdown
Collaborator

Summary

  • Add input path validation in read_input flow: fail fast when path does not exist or is not a file.
  • Normalize read/write filesystem and pandas failures into clear InvalidInputError messages with path context.
  • Ensure output parent directories are created before writing, and add regression tests for these edge cases.

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update
  • Refactoring

Testing

  • Tests pass locally
  • Added/updated tests for changes
    • Verified new failure modes return InvalidInputError instead of raw OS/pandas errors

Related Issues

Closes #12

@lipikaramaswamy lipikaramaswamy requested review from a team as code owners February 27, 2026 18:46
@lipikaramaswamy lipikaramaswamy force-pushed the lipikaramaswamy/fix/io-path-hardening branch from 11c96ed to 6395e62 Compare February 27, 2026 19:08
Copy link
Copy Markdown
Collaborator

@andreatgretel andreatgretel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/anonymizer/engine/io/reader.py:49

InvalidInputError is used for both validation problems and IO failures here. If a caller catches it to handle bad config, it'll also catch unrelated IO problems like permission errors or corrupt files.

Could be better to add an AnonymizerIOError(AnonymizerError) for the try/except blocks and keep InvalidInputError for the validation cases. Not blocking, just a thought for a follow-up.


src/anonymizer/engine/io/reader.py:38

nit: these exists() + is_file() checks could live as a Pydantic @field_validator on AnonymizerInput.source, so bad paths get caught at config time instead of read time. Keeps _load_dataframe focused on just parsing.


src/anonymizer/engine/io/reader.py:44 / writer.py:21

suggestion: adding logger.debug before reads/writes and logger.exception in the except blocks would help with debugging without adding noise.


src/anonymizer/engine/io/writer.py:17

nit: {".csv", ".parquet"} shows up in both reader and writer. Could pull it into a shared constant like SUPPORTED_FORMATS so adding a format later is a one-line change.

@lipikaramaswamy
Copy link
Copy Markdown
Collaborator Author

lipikaramaswamy commented Mar 2, 2026

src/anonymizer/engine/io/reader.py:49

InvalidInputError is used for both validation problems and IO failures here. If a caller catches it to handle bad config, it'll also catch unrelated IO problems like permission errors or corrupt files.

Could be better to add an AnonymizerIOError(AnonymizerError) for the try/except blocks and keep InvalidInputError for the validation cases. Not blocking, just a thought for a follow-up.

src/anonymizer/engine/io/reader.py:38

nit: these exists() + is_file() checks could live as a Pydantic @field_validator on AnonymizerInput.source, so bad paths get caught at config time instead of read time. Keeps _load_dataframe focused on just parsing.

src/anonymizer/engine/io/reader.py:44 / writer.py:21

suggestion: adding logger.debug before reads/writes and logger.exception in the except blocks would help with debugging without adding noise.

src/anonymizer/engine/io/writer.py:17

nit: {".csv", ".parquet"} shows up in both reader and writer. Could pull it into a shared constant like SUPPORTED_FORMATS so adding a format later is a one-line change.

Thanks @andreatgretel for the thoughtful review. I've addressed 3/4 items -

  • Added shared IO format constants so reader/writer don’t duplicate suffix definitions.
  • Moved source path checks (exists / is_file) into AnonymizerInput via Pydantic @field_validator, so invalid paths fail at config/input creation time.
  • Introduced AnonymizerIOError and now use it for read/write runtime failures, while keeping InvalidInputError for validation-style issues.

Let's defer logging changes to a dedicated follow-up PR so we can apply a consistent logging policy across the codebase in one diff. Can capture in #14 or a follow up to it.

@lipikaramaswamy lipikaramaswamy merged commit 046812e into main Mar 4, 2026
5 checks passed
@lipikaramaswamy lipikaramaswamy deleted the lipikaramaswamy/fix/io-path-hardening branch March 4, 2026 17:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: add input path validation in reader and writer

3 participants