Add an `n_preprocessing_jobs` option to the classifier and regressor. #555

oscarkey · 2025-10-17T11:42:00Z

And deprecate the disconnected `n_jobs` option. Set the default value to `1`.

Previously, this option was silently ignored and `1` was used in all cases. However, we've found that higher values can be useful in some cases.
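The deprecation pattern described above can be sketched as follows, assuming a scikit-learn-style estimator. `SketchEstimator` and `preprocessing_jobs_` are hypothetical names, and the choice of `FutureWarning` is an assumption. This is not TabPFN's actual implementation.

```python
import warnings
from typing import Optional


class SketchEstimator:
    """Hypothetical stand-in for the classifier/regressor (not TabPFN's code)."""

    def __init__(self, n_preprocessing_jobs: int = 1, n_jobs: Optional[int] = None):
        # scikit-learn convention: store constructor arguments unchanged.
        self.n_preprocessing_jobs = n_preprocessing_jobs
        self.n_jobs = n_jobs  # deprecated, kept only for backwards compatibility

    def fit(self, X, y):
        if self.n_jobs is not None:
            # Accept the old argument, but make it clear that it has no effect.
            warnings.warn(
                "`n_jobs` is deprecated and ignored; set `n_preprocessing_jobs` instead.",
                FutureWarning,
                stacklevel=2,
            )
        # Only the new option decides how many preprocessing workers are used.
        self.preprocessing_jobs_ = self.n_preprocessing_jobs
        return self
```

With this shape, `SketchEstimator(n_jobs=-1).fit(X, y)` warns but still preprocesses with one worker. `SketchEstimator(n_preprocessing_jobs=2)` opts in explicitly. The warning is emitted in `fit` rather than `__init__` because scikit-learn expects constructors to only store their arguments.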

I added a new option rather than reconnecting the old one because users might have set `n_jobs` to a large value or to `-1`, which can hurt performance. For example, from some quick benchmarking on my MacBook:

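| train | test | cols | `wrapper.n_jobs` | mean duration per call (s) | std (s) |
|---:|---:|---:|---:|---:|---:|
| 500 | 10 | 10 | -1 | 0.54 | 0.66 |
| 500 | 10 | 10 | 1 | 0.23 | 0.05 |
| 500 | 10 | 10 | 2 | 0.53 | 0.63 |
| 500 | 10 | 10 | 3 | 0.52 | 0.62 |
| 1000 | 10 | 10 | -1 | 1.69 | 0.01 |
| 1000 | 10 | 10 | 1 | 0.59 | 0.32 |
| 1000 | 10 | 10 | 2 | 0.75 | 0.63 |
| 1000 | 10 | 10 | 3 | 0.76 | 0.63 |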
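The benchmark script itself isn't shown here. A minimal sketch of how per-call timings like those in the table could be collected and aggregated uses the hypothetical helpers `time_per_call` and `summarize` and a hypothetical `(train, test, cols, n_jobs, seconds)` record layout. The sample standard deviation is assumed because it is the pandas default.

```python
import statistics
import time
from collections import defaultdict


def time_per_call(fn, repeats=5):
    """Return wall-clock seconds for each of `repeats` calls to `fn`."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(records):
    """Map (train, test, cols, n_jobs) -> (mean, std) of per-call seconds.

    `records` is an iterable of (train, test, cols, n_jobs, seconds) tuples.
    std is the sample standard deviation (ddof=1), as in pandas' default;
    a group with a single measurement gets NaN, again matching pandas.
    """
    groups = defaultdict(list)
    for train, test, cols, n_jobs, seconds in records:
        groups[(train, test, cols, n_jobs)].append(seconds)
    summary = {}
    for key, values in groups.items():
        std = statistics.stdev(values) if len(values) > 1 else float("nan")
        summary[key] = (statistics.mean(values), std)
    return summary
```

In practice, `fn` would wrap one fit/predict call on a dataset of the given size with a given job count. The resulting durations would then be tagged with that configuration before being passed to `summarize`.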

This way, users have to explicitly switch to the new option, and (hopefully) select a sensible value.
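One reason `-1` is hard to reason about is that its meaning depends on the machine. This sketch assumes joblib's convention, where negative values count back from the number of CPUs (whether TabPFN resolves `n_preprocessing_jobs` exactly this way is not stated in this PR). `resolve_n_jobs` is a hypothetical helper that illustrates the mapping:

```python
import os
from typing import Optional


def resolve_n_jobs(n_jobs: int, cpu_count: Optional[int] = None) -> int:
    """Translate a joblib-style job count into a concrete number of workers.

    Positive values are used as-is; -1 means all CPUs, -2 all but one, and
    so on (n_cpus + 1 + n_jobs), never dropping below one worker. 0 is rejected.
    """
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        return max(1, cpu_count + 1 + n_jobs)
    return n_jobs
```

For example, on an 8-core machine `resolve_n_jobs(-1, cpu_count=8)` gives 8 workers whatever the dataset size, whereas an explicit `n_preprocessing_jobs` value stays fixed.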

Also, run Ruff format on `examples/kv_cache_fast_prediction.py`.

ref RES-736


gemini-code-assist

Code Review

This pull request introduces a new `n_preprocessing_jobs` parameter to replace the deprecated and non-functional `n_jobs` parameter in both the classifier and regressor. This is a well-reasoned change that improves clarity and prevents potential performance issues for users. The implementation is thorough, updating function signatures, docstrings, and call sites across the codebase. I've identified a few minor issues, mainly related to copy-paste errors in documentation and warning messages, and a naming inconsistency that could be improved for maintainability. Overall, this is a solid contribution.

src/tabpfn/classifier.py

CHANGELOG.md

src/tabpfn/classifier.py

src/tabpfn/preprocessing.py

src/tabpfn/regressor.py

oscarkey · 2025-10-17T13:20:47Z

Sorry Benjamin, this was a bit of a mess, but it should be ready now, haha.

bejaeger

Nice! LGTM.

src/tabpfn/preprocessing.py

…fier and regressor. (#192)

* Record copied public PR 555
* Add an `n_preprocessing_jobs` option to the classifier and regressor. (#555)

(cherry picked from commit 033f179)

Co-authored-by: mirror-bot <mirror-bot@users.noreply.github.com>
Co-authored-by: Oscar Key <oscar@priorlabs.ai>

oscarkey requested a review from bejaeger October 17, 2025 11:42

oscarkey requested a review from a team as a code owner October 17, 2025 11:42

gemini-code-assist bot reviewed Oct 17, 2025

oscarkey added 2 commits October 17, 2025 13:49

Fix lots of bugs.

f0a687c

And more bugs.

752056c

oscarkey marked this pull request as draft October 17, 2025 12:41

And more.

1eb4a0d

oscarkey marked this pull request as ready for review October 17, 2025 13:20

bejaeger approved these changes Oct 17, 2025

src/tabpfn/preprocessing.py (outdated, resolved)

oscarkey added 2 commits October 20, 2025 11:56

Rename more n_workers.

720a041

Add PR id to changelog.

231d89c

oscarkey enabled auto-merge (squash) October 20, 2025 09:59

oscarkey merged commit 033f179 into main Oct 20, 2025
10 checks passed

oscarkey deleted the ok-workers branch October 20, 2025 13:51


3 participants
