-
Notifications
You must be signed in to change notification settings - Fork 538
Add an n_preprocessing_jobs option to the classifier and regressor.
#555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
And deprecate the disconnected `n_jobs` option. Set the default value to `1`. Previously this option was just ignored, and `1` used in all cases. However, we've found that using higher values can be useful in some cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a new n_preprocessing_jobs parameter to replace the deprecated and non-functional n_jobs parameter in both the classifier and regressor. This is a well-reasoned change that improves clarity and prevents potential performance issues for users. The implementation is thorough, updating function signatures, docstrings, and call sites across the codebase. I've identified a few minor issues, mainly related to copy-paste errors in documentation and warning messages, and a naming inconsistency that could be improved for maintainability. Overall, this is a solid contribution.
|
sorry Benjamin, this was a bit of a mess, but it should now be ready haha |
bejaeger
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! LGTM.
…fier and regressor. (#192) * Record copied public PR 555 * Add an `n_preprocessing_jobs` option to the classifier and regressor. (#555) And deprecate the disconnected `n_jobs` option. Set the default value to `1`. Previously this option was just ignored, and `1` used in all cases. However, we've found that using higher values can be useful in some cases. I add a new option rather than reconnecting the old option, because users might have set the old option to a large value, or -1, which can hurt performance. E.g. from some quick benchmarking on my macbook: ``` duration_per_call_seconds mean std train test cols wrapper.n_jobs 500 10 10 -1 0.54 0.66 1 0.23 0.05 2 0.53 0.63 3 0.52 0.62 1000 10 10 -1 1.69 0.01 1 0.59 0.32 2 0.75 0.63 3 0.76 0.63 ``` This way, users have to explicitly switch to the new option, and (hopefully) select a sensible value. Also, Ruff format `examples/kv_cache_fast_prediction.py`. (cherry picked from commit 033f179) --------- Co-authored-by: mirror-bot <mirror-bot@users.noreply.github.com> Co-authored-by: Oscar Key <oscar@priorlabs.ai>
And deprecate the disconnected
n_jobsoption. Set the default valueto
1.Previously this option was just ignored, and
1used in all cases. However, we've found that using higher values can be useful in some cases.I add a new option rather than reconnecting the old option, because users might have set the old option to a large value, or -1, which can hurt performance. E.g. from some quick benchmarking on my macbook:
This way, users have to explicitly switch to the new option, and (hopefully) select a sensible value.
Also, Ruff format
examples/kv_cache_fast_prediction.py.ref RES-736