-
Notifications
You must be signed in to change notification settings - Fork 182
[enhancement] enable array_api return values from from_table
#2441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
I ended up doing a final catch for types with a lazy load of |
/intelci: run |
/intelci: run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR generalizes the from_table
API to use a single like
parameter (or callable) for array return types instead of separate sua_iface
, sycl_queue
, and xp
arguments. It introduces a return_type_constructor
helper, updates many call sites (Python and C++) to use like
, and refactors SYCL-USM interface code to consistently throw on missing queues.
- Introduce
return_type_constructor
and switchfrom_table
to accept alike
parameter or callable. - Replace all explicit
sua_iface
/sycl_queue
/xp
usage withlike=
in Python modules and tests. - Refactor C++ SYCL USM binding to throw
AttributeError
when no queue is available.
Reviewed Changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
sklearnex/tests/utils/base.py | Updated DummyEstimator to use from_table(..., like=...) . |
sklearnex/spmd/linear_model/tests/*.py | Compare SPMD results via _as_numpy before asserting. |
onedal/datatypes/_data_conversion.py | Added return_type_constructor and revamped from_table . |
onedal/cluster/dbscan.py | Refactored to use from_table(..., like=X) and xp.squeeze . |
onedal/datatypes/sycl_usm/data_conversion.cpp | Moved queue check up front and throw on missing USM queue. |
Comments suppressed due to low confidence (4)
onedal/linear_model/incremental_linear_model.py:65
- [nitpick] The attribute
_outtype
holds the return-type constructor function; consider renaming it to something like_return_type_fn
or_return_type_constructor
to make its purpose clearer.
self._outtype = None
onedal/cluster/dbscan.py:81
- The variable
xp
is used here without being initialized in this scope. You should call_get_sycl_namespace(X)
at the start offit
to retrievexp
(andsua_iface
) before using it.
self.labels_ = xp.squeeze(from_table(result.responses, like=X))
onedal/datatypes/_data_conversion.py:73
- [nitpick] Currently,
return_type_constructor
does not handle pandas DataFrame types or similar objects; consider adding a branch to detect and convert DataFrames (e.g., via a_convert_to_dataframe
) or improving error handling whenarray_api_compat.get_namespace
fails for unsupported types.
def return_type_constructor(array):
sklearnex/tests/utils/base.py:407
- [nitpick] When
y
isNone
, usinglike=X
may unintentionally shape the missing target array based onX
. Consider explicitly passinglike=None
or handlingNone
to fall back to a default (e.g. NumPy) conversion.
self.y_attr_ = from_table(y_table, like=X if y is None else y)
/intelci: run |
/intelci: run |
Description
** Dependent on #2430 **
leaning into the functional style development for
from_table
, its is generalized to work with expected major frameworks (dpctl, dpnp, and various flavors of array_api arrays). This is done by changing thefrom_table
API to depend not on kwargssua_iface
andsycl_queue
but on a single keywordlike
, which can either be an array type or a callable which will return the expected array type. The returned array will preserve the aspects of the array (shape, type, row or column major contiguous, etc.) and in most circumstances the device. A major exception is sycl_usm cpu USM allocations, which require a manual memcpy to a CPU allocation.Follow-on work must consistently apply the use of
like
as the roll-out of raw_inputs in #2153, as some estimators will force values to the return type and others not. As this is inconsistently applied, it must be fixed for any array_api-zero copy enabled estimator.Changes:
__sycl_usm_array_interface__
to throw an AttributeError if a queue is unavailable, rather than set to None. The attribute should only be available when a USM allocation is made, which is not the case for CPU data out of oneDAL. This check is placed early in the attribute code so that it can be used for checking for sycl usm conversion withhas_attr
return_type_constructor
function to generalize oneDAL table to array construction_data_conversion.py
sua_iface
andsycl_queue
keywords to uselike
return_type_constructor
from_dlpack
function from the related namespace if from anarray_api_compat
supported framework, this is the last option as it is an optional dependency. This is done instead of passing around the array namespace from sklearnex viaget_namespace
, as getting the namespace via sklearnex requires settingarray_api_dispatch
to function. This will make return values match for direct use of onedal estimators, independent of sklearnex.xp
in some cases to reduce output dimensions in some estimators (which shouldn't be done, as its sklearn conformance)_as_numpy
into spmd testing because attributes aren't guaranteed to be numpy in those cases any longer (in prep for full array_api support)NOTES: The initial support version for
from_table
is not general enough and highly coded for sycl usm inputs. This changes the interface to mirror the type (and possibly queue) of the data. Getting this integrated will begin the fight against the technical debt introduced withuse_raw_inputs
. The core of this PR is simple, the changes needed associated with the technical debt will be brutal.Incremental algorithms will first collect and store the return type generator function via the
return_type_constructor
function. This is similar to how the queue must be stored, which will be applied at the end at the finalization step. The serialization deserialization of this isn't possible, so it should be handled similarly as the queue.If we want to type hint these changes, I need help. I'm not sure what to do for the specifics of stating an array type.
I don't think performance benchmarks are necessary, but can do upon request.
PR should start as a draft, then move to ready for review state after CI is passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for regular requirements.
You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, PR with docs update doesn't require checkboxes for performance while PR with any change in actual code should have checkboxes and justify how this code change is expected to affect performance (or justification should be self-evident).
Checklist to comply with before moving PR from draft:
PR completeness and readability
Testing