fix: harden io path validation and error handling #18
andreatgretel
left a comment
src/anonymizer/engine/io/reader.py:49
InvalidInputError is used for both validation problems and IO failures here. If a caller catches it to handle bad config, it'll also catch unrelated IO problems like permission errors or corrupt files.
Could be better to add an AnonymizerIOError(AnonymizerError) for the try/except blocks and keep InvalidInputError for the validation cases. Not blocking, just a thought for a follow-up.
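A rough sketch of that split, assuming `AnonymizerError` is the existing base class. `AnonymizerIOError` and the helper `read_text_checked` are hypothetical names used only for illustration, and plain text reading stands in for the real parsing code:

```python
from pathlib import Path


class AnonymizerError(Exception):
    """Base class for anonymizer errors (assumed to exist already)."""


class InvalidInputError(AnonymizerError):
    """Bad configuration or input that the caller can fix."""


class AnonymizerIOError(AnonymizerError):
    """Hypothetical: a read or write failed for reasons outside validation."""


def read_text_checked(path: Path) -> str:
    # Validation problems stay InvalidInputError.
    if not path.exists():
        raise InvalidInputError(f"Input path does not exist: {path}")
    if not path.is_file():
        raise InvalidInputError(f"Input path is not a file: {path}")
    # IO failures get their own type, chained to the original cause.
    try:
        return path.read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError) as exc:
        raise AnonymizerIOError(f"Failed to read {path}: {exc}") from exc
```

With this layout, `except InvalidInputError` only sees config and validation problems, while `except AnonymizerError` still catches both kinds.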
src/anonymizer/engine/io/reader.py:38
nit: these exists() + is_file() checks could live as a Pydantic @field_validator on AnonymizerInput.source, so bad paths get caught at config time instead of read time. Keeps _load_dataframe focused on just parsing.
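Roughly what that could look like, assuming Pydantic v2 and that `source` is a `Path` field. The model here is reduced to that one field and the validator name is made up:

```python
from pathlib import Path

from pydantic import BaseModel, ValidationError, field_validator


class AnonymizerInput(BaseModel):
    # Only `source` is named in this review; the real model likely has more fields.
    source: Path

    @field_validator("source")
    @classmethod
    def source_must_be_existing_file(cls, value: Path) -> Path:
        # Runs after Pydantic has coerced the input to a Path.
        if not value.exists():
            raise ValueError(f"source path does not exist: {value}")
        if not value.is_file():
            raise ValueError(f"source path is not a file: {value}")
        return value
```

Pydantic wraps the `ValueError` in a `ValidationError` at construction time. The file can still disappear between config time and read time, so the read itself still needs its own error handling.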
src/anonymizer/engine/io/reader.py:44 / writer.py:21
suggestion: adding logger.debug before reads/writes and logger.exception in the except blocks would help with debugging without adding noise.
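Something along these lines, using only the standard `logging` module. The function name `read_bytes_logged` is illustrative and the raw byte read stands in for the actual pandas call:

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def read_bytes_logged(path: Path) -> bytes:
    # DEBUG is off by default, so this adds no noise in normal runs.
    logger.debug("Reading input from %s", path)
    try:
        return path.read_bytes()
    except OSError:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("Failed to read input from %s", path)
        raise
```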
src/anonymizer/engine/io/writer.py:17
nit: {".csv", ".parquet"} shows up in both reader and writer. Could pull it into a shared constant like SUPPORTED_FORMATS so adding a format later is a one-line change.
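For example, as a sketch: where the constant lives and the helper `check_supported_format` are assumptions, the case-insensitive suffix match is my own choice, and `ValueError` stands in for whatever error type the project raises:

```python
from pathlib import Path

# Hypothetical shared constant; both reader and writer would import it.
SUPPORTED_FORMATS: frozenset[str] = frozenset({".csv", ".parquet"})


def check_supported_format(path: Path) -> str:
    """Return the normalized suffix, or raise if the format is unsupported."""
    suffix = path.suffix.lower()  # assumption: treat ".CSV" the same as ".csv"
    if suffix not in SUPPORTED_FORMATS:
        supported = ", ".join(sorted(SUPPORTED_FORMATS))
        raise ValueError(
            f"Unsupported format {suffix!r} for {path}; expected one of: {supported}"
        )
    return suffix
```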
Thanks @andreatgretel for the thoughtful review. I've addressed 3 of the 4 items.
Let's defer the logging changes to a dedicated follow-up PR so we can apply a consistent logging policy across the codebase in one diff. That can be captured in #14 or a follow-up to it.
Summary
read_input flow: fail fast when path does not exist or is not a file.
InvalidInputError messages with path context.
Type of Change
Testing
InvalidInputError instead of raw OS/pandas errors
Related Issues
Closes #12