
[enhancement] enable array_api return values from from_table #2441


Open
wants to merge 59 commits into base: main

Conversation

icfaust
Contributor

@icfaust icfaust commented Apr 17, 2025

Description

**Dependent on #2430**

Leaning into functional-style development for from_table, it is generalized to work with the expected major frameworks (dpctl, dpnp, and various flavors of array_api arrays). This is done by changing the from_table API to depend not on the kwargs sua_iface and sycl_queue but on a single keyword, like, which can be either an array (whose type is mirrored) or a callable that returns the expected array type. The returned array preserves the characteristics of the input array (shape, dtype, row- or column-major contiguity, etc.) and, in most circumstances, the device. A major exception is CPU sycl_usm allocations, which require a manual memcpy into a CPU allocation.
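For illustration, a minimal usage sketch of the new keyword, assuming the to_table/from_table entry points in onedal.datatypes (exact signatures may differ from the merged code):

```python
import numpy as np
from onedal.datatypes import from_table, to_table

x = np.ones((10, 3), dtype=np.float32)
t = to_table(x)

# `like` as an array: the result mirrors x's type (numpy here)
res = from_table(t, like=x)

# `like` as a callable that returns the expected array type
res = from_table(t, like=np.asarray)
```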

Follow-on work must apply the use of like consistently along with the roll-out of raw_inputs in #2153, as some estimators force values to the return type and others do not. Since this is applied inconsistently, it must be fixed for any estimator with array_api zero-copy enabled.

Changes:

  • Switched the oneDAL table __sycl_usm_array_interface__ to throw an AttributeError if a queue is unavailable, rather than setting it to None. The attribute should only be available when a USM allocation is made, which is not the case for CPU data coming out of oneDAL. The check is placed early in the attribute code so that hasattr can be used to check for sycl_usm conversion.
  • Created the return_type_constructor function to generalize oneDAL table-to-array construction (see the sketch after this list).
  • Added and fixed documentation in _data_conversion.py.
  • Modified functions to replace the sua_iface and sycl_queue keywords with like.
  • Removed the need for imports of dpnp and dpctl/dpctl.tensor in this file through the generalization in return_type_constructor.
  • Created a lazily imported function that collects the from_dlpack function from the relevant namespace when the input comes from an array_api_compat-supported framework; this is the last option tried, since array_api_compat is an optional dependency. This is done instead of passing the array namespace around from sklearnex via get_namespace, because getting the namespace via sklearnex requires array_api_dispatch to be enabled. This makes return values match when onedal estimators are used directly, independent of sklearnex.
  • Temporarily use xp in some estimators to reduce output dimensions (which shouldn't be done here, as it exists for sklearn conformance).
  • Introduced the use of _as_numpy in SPMD testing because attributes are no longer guaranteed to be numpy there (in preparation for full array_api support).
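A simplified sketch of the dispatch idea behind return_type_constructor (not the merged implementation; the branch conditions and the omitted USM path are assumptions based on the description above):

```python
import numpy as np


def return_type_constructor(array):
    """Return a callable converting oneDAL table output to the type of `array`."""
    if array is None or isinstance(array, np.ndarray):
        return np.asarray
    if hasattr(array, "__sycl_usm_array_interface__"):
        # dpnp/dpctl inputs: the real code rebuilds a USM allocation on the
        # original queue (with a manual memcpy for CPU USM data)
        raise NotImplementedError("USM handling omitted in this sketch")
    if hasattr(array, "__array_namespace__"):
        # array_api inputs: zero-copy via the namespace-level from_dlpack
        return array.__array_namespace__().from_dlpack
    # last resort: lazily import the optional array_api_compat dependency
    import array_api_compat

    return array_api_compat.get_namespace(array).from_dlpack
```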

NOTES: The initial version of from_table support is not general enough and is heavily coded toward sycl_usm inputs. This PR changes the interface to mirror the type (and, where possible, the queue) of the input data. Getting this integrated will begin the fight against the technical debt introduced with use_raw_inputs. The core of this PR is simple; the changes associated with that technical debt will be brutal.

Incremental algorithms first collect and store the return-type generator function via return_type_constructor. This is similar to how the queue must be stored, and it is applied at the end, in the finalization step. Serialization/deserialization of this callable isn't possible, so it should be handled similarly to the queue.
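A rough sketch of that incremental pattern (class, method, and attribute names are illustrative, not the actual estimator code; return_type_constructor is assumed importable from onedal.datatypes._data_conversion as noted above):

```python
from onedal.datatypes import from_table, to_table
from onedal.datatypes._data_conversion import return_type_constructor


class _IncrementalSketch:
    def __init__(self):
        self._outtype = None  # return-type constructor, stored like the queue

    def partial_fit(self, X):
        if self._outtype is None:
            # capture the return-type constructor from the first batch
            self._outtype = return_type_constructor(X)
        self._partial_table = to_table(X)  # stand-in for the real partial step
        return self

    def finalize_fit(self):
        # apply the stored callable only at finalization so the result
        # matches the caller's original array type
        self.result_ = from_table(self._partial_table, like=self._outtype)
        self._outtype = None  # not serializable; drop it, as with the queue
        return self
```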

If we want to add type hints for these changes, I need help; I'm not sure how to specify an array type generically.

I don't think performance benchmarks are necessary, but I can run them upon request.


PR should start as a draft, then move to ready for review state after CI is passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for regular requirements.

You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, PR with docs update doesn't require checkboxes for performance while PR with any change in actual code should have checkboxes and justify how this code change is expected to affect performance (or justification should be self-evident).

Checklist to comply with before moving PR from draft:

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added a respective label(s) to PR if I have a permission for that.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended testing suite if new functionality was introduced in this PR.

@icfaust icfaust requested a review from Alexsandruss as a code owner April 17, 2025 18:55
@icfaust icfaust requested a review from ahuber21 as a code owner April 18, 2025 00:25
@icfaust
Contributor Author

icfaust commented Jun 27, 2025

The situation is as follows. I didn't look thoroughly enough into the reasons behind array-api-compat and its inclusion in sklearn. Even if we fully support array_api inputs, that does not mean we will support pytorch out of the box. Pytorch support via the array API comes from getting the namespace generated by array-api-compat, meaning that manually searching for array_namespace won't work (which is why sklearn has get_namespace to compensate for that). Thus we need to either re-invent get_namespace in onedal, acquire array namespaces from sklearn's get_namespace and pass them around like we do for queues (not advised), or think of another way of doing it. The current logic is insufficient for pytorch support.

I ended up adding a final catch for other types with a lazy load of array_api_compat, which may or may not be installed alongside sklearn. If someone tries to use sklearnex with other array types (like pytorch), they will have installed array_api_compat anyway in order to interface with sklearn. If someone uses the oneDAL estimators directly, they will need the warning for pytorch in case the sklearn install doesn't exist.
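A hedged sketch of that final catch (the helper name _lazy_from_dlpack is hypothetical; only the lazy-import-and-fallback shape is taken from the description):

```python
def _lazy_from_dlpack(array):
    """Last-resort path: look up from_dlpack for frameworks such as pytorch."""
    try:
        # optional dependency, imported only when no other branch matched
        import array_api_compat
    except ImportError as exc:
        raise ImportError(
            "array_api_compat is required to return this array type from "
            "oneDAL estimators when used outside of sklearnex/sklearn"
        ) from exc
    return array_api_compat.get_namespace(array).from_dlpack
```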

@icfaust icfaust requested a review from Copilot June 27, 2025 13:07
Copilot

This comment was marked as outdated.

@icfaust icfaust requested a review from Copilot June 27, 2025 15:06
Copilot

This comment was marked as outdated.

@icfaust
Contributor Author

icfaust commented Jun 27, 2025

/intelci: run

@icfaust icfaust marked this pull request as ready for review June 27, 2025 18:44
@icfaust
Contributor Author

icfaust commented Jun 27, 2025

/intelci: run

@icfaust icfaust requested a review from Copilot June 27, 2025 21:43

@Copilot Copilot AI left a comment


Pull Request Overview

This PR generalizes the from_table API to use a single like parameter (or callable) for array return types instead of separate sua_iface, sycl_queue, and xp arguments. It introduces a return_type_constructor helper, updates many call sites (Python and C++) to use like, and refactors SYCL-USM interface code to consistently throw on missing queues.

  • Introduce return_type_constructor and switch from_table to accept a like parameter or callable.
  • Replace all explicit sua_iface/sycl_queue/xp usage with like= in Python modules and tests.
  • Refactor C++ SYCL USM binding to throw AttributeError when no queue is available.

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated no comments.

Summary per file:

  • sklearnex/tests/utils/base.py: Updated DummyEstimator to use from_table(..., like=...).
  • sklearnex/spmd/linear_model/tests/*.py: Compare SPMD results via _as_numpy before asserting.
  • onedal/datatypes/_data_conversion.py: Added return_type_constructor and revamped from_table.
  • onedal/cluster/dbscan.py: Refactored to use from_table(..., like=X) and xp.squeeze.
  • onedal/datatypes/sycl_usm/data_conversion.cpp: Moved the queue check up front and throw on a missing USM queue.
Comments suppressed due to low confidence (4)

onedal/linear_model/incremental_linear_model.py:65

  • [nitpick] The attribute _outtype holds the return-type constructor function; consider renaming it to something like _return_type_fn or _return_type_constructor to make its purpose clearer.
        self._outtype = None

onedal/cluster/dbscan.py:81

  • The variable xp is used here without being initialized in this scope. You should call _get_sycl_namespace(X) at the start of fit to retrieve xp (and sua_iface) before using it.
        self.labels_ = xp.squeeze(from_table(result.responses, like=X))

onedal/datatypes/_data_conversion.py:73

  • [nitpick] Currently, return_type_constructor does not handle pandas DataFrame types or similar objects; consider adding a branch to detect and convert DataFrames (e.g., via a _convert_to_dataframe) or improving error handling when array_api_compat.get_namespace fails for unsupported types.
def return_type_constructor(array):

sklearnex/tests/utils/base.py:407

  • [nitpick] When y is None, using like=X may unintentionally shape the missing target array based on X. Consider explicitly passing like=None or handling None to fall back to a default (e.g. NumPy) conversion.
        self.y_attr_ = from_table(y_table, like=X if y is None else y)

@icfaust
Contributor Author

icfaust commented Jun 28, 2025

/intelci: run

@icfaust
Contributor Author

icfaust commented Jun 28, 2025

/intelci: run
