Skip to content

Conversation

@bejaeger
Copy link
Contributor

Motivation and Context

All things related to preprocessing lived in one large file which made it hard to navigate and polluted the namespace. With this change, things are much cleaner. Each preprocessing step gets a separate file and everything is moved to a top level folder called preprocessors. This makes it easier to add different processors.


Public API Changes

  • No Public API changes
  • Yes, Public API changes (Details below)

How Has This Been Tested?

tests, full forward loop.


Checklist

  • The changes have been tested locally.
  • Documentation has been updated (if the public API or usage changes).
  • A entry has been added to CHANGELOG.md (if relevant for users).
  • The code follows the project's style guidelines.
  • I have considered the impact of these changes on the public API.

Copilot AI review requested due to automatic review settings September 11, 2025 11:26
@CLAassistant
Copy link

CLAassistant commented Sep 11, 2025

CLA assistant check
All committers have signed the CLA.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the preprocessing module by moving all preprocessing-related classes from a large monolithic file to individual files in a new top-level preprocessors folder. The refactoring improves code organization and makes it easier to add new preprocessing steps.

Key Changes:

  • Split preprocessing functionality into individual files under src/tabpfn/preprocessors/
  • Updated import statements throughout the codebase to use the new module structure
  • Reorganized class hierarchy to improve separation of concerns

Reviewed Changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
tests/test_preprocessing.py Updated imports and code formatting
src/tabpfn/regressor.py Updated imports to use new preprocessors module
src/tabpfn/preprocessors/*.py New individual preprocessor files with extracted classes
src/tabpfn/preprocessing.py Updated imports to use new preprocessors module
src/tabpfn/model/preprocessing.py Updated deprecation notice and imports
src/tabpfn/inference.py Updated imports to use new preprocessors module
src/tabpfn/architectures/base/preprocessing.py File emptied as part of refactoring

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the preprocessing logic from a single large file into a dedicated preprocessors package with multiple modules. This is a great improvement for code organization and maintainability. The changes are well-executed, with updated imports across the codebase. I've found one area for improvement regarding type hint consistency in the new base class for preprocessor steps, which would enhance type safety and remove the need for type: ignore suppressions in subclasses.

@bejaeger bejaeger requested a review from noahho September 11, 2025 11:47
Copy link
Collaborator

@noahho noahho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I'm suprised the Quantile change didn’t affect predictions at all - is that right (I don't see you had to update consistency files)?
Could you add an entry to the changelog file please: once refactoring, then the speed improvement but (if appropriate) flag it as potential change to behavior at e.g. large X size.

@bejaeger
Copy link
Contributor Author

Thanks for the quick review, added the inconsistency file (only miniscule changes) and an entry to the changelog.

@bejaeger bejaeger requested a review from noahho September 12, 2025 07:53
Copy link
Collaborator

@noahho noahho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@bejaeger bejaeger merged commit 2d5860f into main Sep 12, 2025
10 checks passed
oscarkey pushed a commit that referenced this pull request Nov 12, 2025
* Record copied public PR 503

* refactor preprocessors (#503)

(cherry picked from commit 2d5860f)

* update description after failed cherry-picks from public

* revert changes for copy_pr_from_public

---------

Co-authored-by: mirror-bot <mirror-bot@users.noreply.github.com>
Co-authored-by: Benjamin Jaeger <benjamin@priorlabs.ai>
Co-authored-by: Benjamin Jaeger <jaeger.benjamin7@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants