Skip to content

Conversation

@oscarkey
Copy link
Contributor

@oscarkey oscarkey commented Oct 17, 2025

And deprecate the disconnected n_jobs option. Set the default value
to 1.

Previously this option was just ignored, and 1 used in all cases. However, we've found that using higher values can be useful in some cases.

I add a new option rather than reconnecting the old option, because users might have set the old option to a large value, or -1, which can hurt performance. E.g. from some quick benchmarking on my macbook:

                                      duration_per_call_seconds
                                                     mean   std
train test cols  wrapper.n_jobs                                
500   10   10    -1                                  0.54  0.66
                  1                                  0.23  0.05
                  2                                  0.53  0.63
                  3                                  0.52  0.62
1000  10   10    -1                                  1.69  0.01
                  1                                  0.59  0.32
                  2                                  0.75  0.63
                  3                                  0.76  0.63

This way, users have to explicitly switch to the new option, and (hopefully) select a sensible value.

Also, Ruff format examples/kv_cache_fast_prediction.py.

ref RES-736

And deprecate the disconnected `n_jobs` option. Set the default value
to `1`.

Previously this option was just ignored, and `1` used in all cases.
However, we've found that using higher values can be useful in some
cases.
@oscarkey oscarkey requested a review from bejaeger October 17, 2025 11:42
@oscarkey oscarkey requested a review from a team as a code owner October 17, 2025 11:42
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new n_preprocessing_jobs parameter to replace the deprecated and non-functional n_jobs parameter in both the classifier and regressor. This is a well-reasoned change that improves clarity and prevents potential performance issues for users. The implementation is thorough, updating function signatures, docstrings, and call sites across the codebase. I've identified a few minor issues, mainly related to copy-paste errors in documentation and warning messages, and a naming inconsistency that could be improved for maintainability. Overall, this is a solid contribution.

@oscarkey oscarkey marked this pull request as draft October 17, 2025 12:41
@oscarkey oscarkey marked this pull request as ready for review October 17, 2025 13:20
@oscarkey
Copy link
Contributor Author

sorry Benjamin, this was a bit of a mess, but it should now be ready haha

Copy link
Contributor

@bejaeger bejaeger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! LGTM.

@oscarkey oscarkey enabled auto-merge (squash) October 20, 2025 09:59
@oscarkey oscarkey merged commit 033f179 into main Oct 20, 2025
10 checks passed
@oscarkey oscarkey deleted the ok-workers branch October 20, 2025 13:51
oscarkey added a commit that referenced this pull request Nov 12, 2025
…fier and regressor. (#192)

* Record copied public PR 555

* Add an `n_preprocessing_jobs` option to the classifier and regressor. (#555)

And deprecate the disconnected `n_jobs` option. Set the default value
to `1`.

Previously this option was just ignored, and `1` used in all cases. However, we've found that using higher values can be useful in some cases.

I add a new option rather than reconnecting the old option, because users might have set the old option to a large value, or -1, which can hurt performance. E.g. from some quick benchmarking on my macbook:
```
                                      duration_per_call_seconds
                                                     mean   std
train test cols  wrapper.n_jobs
500   10   10    -1                                  0.54  0.66
                  1                                  0.23  0.05
                  2                                  0.53  0.63
                  3                                  0.52  0.62
1000  10   10    -1                                  1.69  0.01
                  1                                  0.59  0.32
                  2                                  0.75  0.63
                  3                                  0.76  0.63
```
This way, users have to explicitly switch to the new option, and (hopefully) select a sensible value.

Also, Ruff format `examples/kv_cache_fast_prediction.py`.

(cherry picked from commit 033f179)

---------

Co-authored-by: mirror-bot <mirror-bot@users.noreply.github.com>
Co-authored-by: Oscar Key <oscar@priorlabs.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants