[bugfix, enhancement] Address affinity bug by using threadpoolctl/joblib for n_jobs dispatching #2364
Conversation
daal4py/sklearn/_n_jobs_support.py
try:
    if not self.n_jobs:
        n_jobs = cpu_count()
Would this later on get limited to the number of physical cores on the oneDAL side?
I'll be honest, I'm not 100% sure yet. The default in threading.h in daal sets it to the number of CPUs, but with the affinity involved I didn't spend the time to fully track down what the default ends up being there.
Maybe @Alexsandruss could comment here on whether it'd end up limited to the number of physical cores somewhere else?
Looks like setting the number of threads like this would not result in that number later getting limited to the number of physical cores. How about passing the argument only_physical_cores=True here?
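For reference, joblib's cpu_count helper exposes both counts. A quick check could look like the sketch below (assuming a reasonably recent joblib; exact interaction with affinity masks and cgroup quotas may vary by version):

import joblib

# Logical CPU count (hyperthreads included).
print(joblib.cpu_count(only_physical_cores=False))

# Physical core count; falls back to the logical count when the
# physical topology cannot be determined.
print(joblib.cpu_count(only_physical_cores=True))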
Tried adding a line to print this value here: https://github.com/uxlfoundation/oneDAL/blob/31cafec9950f1db352b639dafad5875971ca00fe/cpp/daal/src/threading/threading.cpp#L267, and from what I see, it is indeed set to the result of cpu_count(only_physical_cores=False).
Although from some further testing, this behavior also appears to be the same in the current main branch.
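For anyone who wants to reproduce this check without rebuilding oneDAL, the daal4py bindings already expose the relevant knobs. A small sketch, assuming num_threads() and daalinit() are available at the daal4py top level as the wrapper code above suggests:

import daal4py as d4p

# Thread count oneDAL is currently configured to use.
print(d4p.num_threads())

# Explicitly cap oneDAL at 4 threads and confirm the setting took effect.
d4p.daalinit(nthreads=4)
print(d4p.num_threads())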
/intelci: run
/azp run CI
/intelci: run
/intelci: run
def get_num_threads(self):
    return num_threads()

def set_num_threads(self, nthreads):
I understand this setting would apply globally, which could lead to race conditions if users call it in parallel, for example through some framework that parallelizes estimator calls.
Could it somehow get a mutex (or use atomic ops), either here or on the oneDAL side?
Also, it would be better to add a warning that the setting is changed at a global level, so that a user would not try to call these inside multi-threaded code.
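To make the concern concrete, here is a minimal sketch with hypothetical names (not the API proposed in this PR) of how a caller could serialize changes to a process-global thread setting; note that even with a lock the setting remains global:

import threading
from contextlib import contextmanager

# Hypothetical guard; get_num_threads/set_num_threads stand in for whatever
# getter/setter the wrapper ends up exposing.
_num_threads_lock = threading.Lock()

@contextmanager
def limited_num_threads(nthreads, get_num_threads, set_num_threads):
    """Temporarily change a process-global thread count, serializing callers.

    The lock only prevents torn read-modify-write sequences; the setting is
    still global, so concurrent users will observe each other's limits.
    """
    with _num_threads_lock:
        previous = get_num_threads()
        set_num_threads(nthreads)
    try:
        yield
    finally:
        with _num_threads_lock:
            set_num_threads(previous)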
Actually, on a further look, it does already have a mutex on the daal side. Still, it would be better to document that this behavior is global.
Sounds good, will do!
@@ -55,9 +55,6 @@
    sklearn_clone_dict,
)

-# to reproduce errors even in CI
-d4p.daalinit(nthreads=100)
Removing this causes all sorts of memory leak check test failures, not just on Windows and not just with pandas.
This last run is very useful/interesting. It shows that:
I'm very worried about our malloc routines in daal. Thoughts, @david-cortes-intel @Vika-F?
Perhaps there could be memory leaks, but I'm not sure that we can conclude anything from these tests alone. I see that some are for cases where the input is from DPCTL - does it set flags like
But what would be even better is to run them through valgrind and ASan to see if there are any reports of leaks coming specifically from oneDAL or sklearnex. We unfortunately get a lot of false positives from numpy, scipy, pybind11, and others that make it hard to browse the logs, but true leaks should definitely show up there.
Description
This addresses an issue related to Linux thread control via CPU affinity. New tests are added which validate n_jobs behavior when a reduced number of threads is available, as set through the affinity mask. These tests require that at least 4 threads/CPU cores be available to the main Python pytest process in order to function properly, which means they do not run on Azure Pipelines runners, only on intelci and GitHub Actions.
This change was needed in order to have scikit-learn-intelex properly work on thread-limited Kubernetes pods.
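For illustration, here is a minimal sketch of the kind of affinity-limited scenario the new tests target (not the actual test code from this PR; Linux-only, since os.sched_setaffinity is not available on all platforms, and it assumes the machine has at least two cores):

import os
import joblib

pid = 0  # 0 refers to the current process
original_mask = os.sched_getaffinity(pid)

try:
    # Restrict the process to two cores, similar to a thread-limited pod.
    os.sched_setaffinity(pid, set(list(original_mask)[:2]))
    # joblib's CPU count honours the affinity mask, so n_jobs=-1 dispatching
    # should resolve to 2 workers here.
    print("CPUs visible to joblib:", joblib.cpu_count())
finally:
    os.sched_setaffinity(pid, original_mask)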
The PR should start as a draft, then move to the ready-for-review state after CI has passed and all applicable checkboxes are checked.
This approach ensures that reviewers don't spend extra time asking for regular requirements.
You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, a PR with a docs update doesn't require the performance checkboxes, while a PR with any change to actual code should keep them and justify how the code change is expected to affect performance (or the justification should be self-evident).
Checklist to comply with before moving PR from draft:
PR completeness and readability
Testing
Performance