-
Notifications
You must be signed in to change notification settings - Fork 538
refactor preprocessors #503
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the preprocessing module by moving all preprocessing-related classes from a large monolithic file to individual files in a new top-level preprocessors folder. The refactoring improves code organization and makes it easier to add new preprocessing steps.
Key Changes:
- Split preprocessing functionality into individual files under
src/tabpfn/preprocessors/ - Updated import statements throughout the codebase to use the new module structure
- Reorganized class hierarchy to improve separation of concerns
Reviewed Changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
tests/test_preprocessing.py |
Updated imports and code formatting |
src/tabpfn/regressor.py |
Updated imports to use new preprocessors module |
src/tabpfn/preprocessors/*.py |
New individual preprocessor files with extracted classes |
src/tabpfn/preprocessing.py |
Updated imports to use new preprocessors module |
src/tabpfn/model/preprocessing.py |
Updated deprecation notice and imports |
src/tabpfn/inference.py |
Updated imports to use new preprocessors module |
src/tabpfn/architectures/base/preprocessing.py |
File emptied as part of refactoring |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors the preprocessing logic from a single large file into a dedicated preprocessors package with multiple modules. This is a great improvement for code organization and maintainability. The changes are well-executed, with updated imports across the codebase. I've found one area for improvement regarding type hint consistency in the new base class for preprocessor steps, which would enhance type safety and remove the need for type: ignore suppressions in subclasses.
noahho
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! I'm suprised the Quantile change didn’t affect predictions at all - is that right (I don't see you had to update consistency files)?
Could you add an entry to the changelog file please: once refactoring, then the speed improvement but (if appropriate) flag it as potential change to behavior at e.g. large X size.
|
Thanks for the quick review, added the inconsistency file (only miniscule changes) and an entry to the changelog. |
noahho
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
* Record copied public PR 503 * refactor preprocessors (#503) (cherry picked from commit 2d5860f) * update description after failed cherry-picks from public * revert changes for copy_pr_from_public --------- Co-authored-by: mirror-bot <mirror-bot@users.noreply.github.com> Co-authored-by: Benjamin Jaeger <benjamin@priorlabs.ai> Co-authored-by: Benjamin Jaeger <jaeger.benjamin7@gmail.com>
Motivation and Context
All things related to preprocessing lived in one large file which made it hard to navigate and polluted the namespace. With this change, things are much cleaner. Each preprocessing step gets a separate file and everything is moved to a top level folder called
preprocessors. This makes it easier to add different processors.Public API Changes
How Has This Been Tested?
tests, full forward loop.
Checklist
CHANGELOG.md(if relevant for users).