Migrate `dpctl.tensor` into `dpnp.tensor` by vlad-perevezentsev · Pull Request #2856 · IntelPython/dpnp

vlad-perevezentsev · 2026-04-15T14:08:29Z

This PR migrates the tensor implementation from `dpctl.tensor` into `dpnp.tensor`, making dpnp the primary owner of the Array API-compliant tensor layer.

Major changes:

Move compiled C++/SYCL extensions (_tensor_impl, _tensor_elementwise_impl, _tensor_reductions_impl, _tensor_sorting_impl, _tensor_accumulation_impl, tensor linalg) into dpnp.tensor
Move usm_ndarray, compute-follows-data utilities and tensor tests from dpctl
Replace all dpctl.tensor references with dpnp.tensor in docstrings, error messages and comments
Remove redundant dpctl.tensor C-API interface
Add tensor.rst documentation page describing the module, its relationship to dpnp.ndarray and dpctl and linking to the dpctl 0.21.1 API reference

This simplifies maintenance, reduces cross-project dependencies and enables independent development and release cycles

Have you provided a meaningful PR description?
Have you added a test, reproducer or referred to an issue with a reproducer?
Have you tested your changes locally for CPU and GPU devices?
Have you made sure that new changes do not introduce compiler warnings?
Have you checked performance impact of proposed changes?
Have you added documentation for your changes, if necessary?
Have you added your changes to the changelog?

The PR adds a header file `dpnp4pybind11.hpp`, which contains the minimum content necessary to write pybind11 extensions and includes a caster for `usm_ndarray` and type enumerators. This PR also moves the part of the dpctl.tensor headers that was previously used in dpnp code. This is needed to avoid include conflicts, since the new `dpnp4pybind11.hpp` header now has to be included everywhere.

Merge master into include-dpctl-tensor

This PR proposes introducing `dpctl_ext` as a new internal extension module (temporarily renamed from `dpctl` to avoid conflicts), adding CMake/packaging support for building `_tensor_impl` via pybind11, and switching dpnp to use `dpctl_ext.tensor._tensor_impl`.

The migrated `_tensor_impl` currently supports the following functions: `_array_overlap`, `_as_c_contig`, `_as_f_contig`, `_contract_iter`, `_contract_iter2`, `_contract_iter3`, `_contract_iter4`, `_copy_usm_ndarray_into_usm_ndarray`, `_ravel_multi_index`, `_same_logical_tensors`, `_unravel_index`, `default_device_bool_type`, `default_device_complex_type`, `default_device_fp_type`, `default_device_index_type`, `default_device_int_type`, `default_device_uint_type`.

Files in `dpnp` that explicitly `import dpctl.tensor._tensor_impl`

This PR extends `_tensor_impl` in `dpctl_ext.tensor` with the remaining functions that are explicitly used in `dpnp` implementations (`_take`, `_full_usm_ndarray`, `_zeros_usm_ndarray`, `_triu`), enabling a complete switch to `dpctl_ext.tensor._tensor_impl` instead of `dpctl.tensor._tensor_impl`. It also adds `take()`, `put()`, `full()`, `tril()` and `triu()` to `dpctl_ext.tensor` and updates the corresponding dpnp functions to use these implementations internally.

This PR extends `_tensor_impl` in `dpctl_ext.tensor` with the copy functions (`_copy_usm_ndarray_for_reshape`, `_copy_numpy_ndarray_into_usm_ndarray`, `_copy_usm_ndarray_for_roll_1d`, `_copy_usm_ndarray_for_roll_nd`). It also adds `asnumpy(), astype(), copy(), from_numpy(), to_numpy(), roll(), and reshape()` to `dpctl_ext.tensor` and updates the corresponding dpnp functions to use these implementations internally.

This PR extends `_tensor_impl` in `dpctl_ext.tensor` with the advanced indexing (`_extract, _place, _nonzero, mask_positions`), repeat (`_cumsum_1d`) and `_eye` functions. It also adds `eye(), extract(), nonzero(), place(), put_along_axis(), take_along_axis()` to `dpctl_ext.tensor` and updates the corresponding dpnp functions to use these implementations internally.

This PR adds a small cleanup to the already ported dpctl.tensor code:
* remove unused includes
* add missing includes
* remove redundant namespace qualifications when calling a function from the same namespace

…2778) This PR extends `_tensor_impl` in `dpctl_ext.tensor` with the `_where, _clip` and repeat functions (`_repeat_by_sequence, _repeat_by_scalar`). It also adds `repeat(), where(), clip()` and `can_cast, finfo, iinfo, isdtype, result_type` from `_type_utils.py` to `dpctl_ext.tensor` and updates the corresponding dpnp functions to use these implementations internally.

This PR is the final one in the series extending the `_tensor_impl` extension. It extends `_tensor_impl` in `dpctl_ext.tensor` with linear sequence functions (`_linspace_step` and `_linspace_affine`). This PR also significantly expands the Python API of `dpctl_ext.tensor` by adding all missing functions from `dpctl_ext.tensor._ctors` and `dpctl_ext.tensor._manipulation_functions`.

`_tensor_impl`: 45 / 45 functions
Python API `dpctl_ext.tensor`: 70 / 233 functions

This PR completely moves the `_tensor_accumulation_impl` pybind11 extension into `dpctl_ext.tensor` and extends the `dpctl_ext.tensor` Python API with the functions `cumulative_logsumexp, cumulative_prod and cumulative_sum`, reusing them in dpnp.

This PR completely moves the `_tensor_sorting_impl` pybind11 extension into `dpctl_ext.tensor` and extends the `dpctl_ext.tensor` Python API with the functions `searchsorted, isin, unique_all, unique_counts, unique_inverse, unique_values, argsort, sort and top_k`, reusing them in dpnp.

This PR completely moves the `_tensor_reductions_impl` pybind11 extension into `dpctl_ext.tensor` and extends the `dpctl_ext.tensor` Python API with the functions `all, any, diff, argmax, argmin, count_nonzero, logsumexp, max, min, prod, reduce_hypot and sum`, reusing them in dpnp.

The PR adds missing includes to tensor source and header files.

…#2795) This PR initializes the `_tensor_elementwise_impl` pybind11 extension in `dpctl_ext.tensor` and extends the `dpctl_ext.tensor` Python API with part of the unary functions: `abs, acos, acosh, angle, atan, atanh, bitwise_invert, ceil, conj`. This is the first part of the work on migrating `_tensor_elementwise_impl` (unary).

This PR extends `_tensor_elementwise_impl` with part of the unary functions: `cos, cosh, exp, expm1, floor, imag, isfinite, isinf, isnan, log, log1p, log2, log10, logical_not, negative, positive`

This PR extends `_tensor_elementwise_impl` with the remaining unary functions: `real, reciprocal, round, rsqrt, sign, signbit, sin, sinh, sqrt, square, tan, tanh, trunc`

This PR migrates the `_tensor_linalg_impl` extension to `dpctl_ext.tensor` and extends `dpctl_ext.tensor` Python API with `dpctl.tensor` functions `matmul`, `matrix_transpose`, `tensordot`, and `vecdot`

…r dpnp (#2803) This PR extends `_tensor_elementwise_impl` with part of binary functions : `add, atan2, bitwise_and, bitwise_left_shift, bitwise_or, bitwise_right_shift, bitwise_xor`

This PR extends `_tensor_elementwise_impl` with part of binary functions : `divide, equal, floor_divide, greater, greater_equal, hypot, less, less_equal, logaddexp`

This PR extends `_tensor_elementwise_impl` with the remaining binary functions: `copysign, logical_and, logical_or, logical_xor, maximum, minimum, multiply, nextafter, not_equal, pow, remainder, subtract`. This is the last PR in the `_tensor_elementwise_impl` migration series; with it, all elementwise functions are fully migrated to `dpctl_ext.tensor`.

This PR extends `dpctl_ext.tensor` API with the remaining statistical and testing functions adding `std(), var(), mean(), allclose()`

This PR proposes to migrate the tensor interface (`usm_ndarray, dlpack, flags`) into `dpctl_ext/tensor`, making `dpnp` independent of `dpctl`'s tensor module.

Updates:
- Introduce `dpctl_ext_capi.h`
- Implement a clean CMake interface library `DpctlExtCAPI` to properly propagate generated headers to consumers
- Update remaining imports from `dpctl.tensor` to `dpctl_ext.tensor`
- Link all backend extensions against `DpctlExtCAPI` to ensure consistent access to the C-API

This PR removes the unused external C-API from `dpctl_ext.tensor` and replaces function pointer calls with direct struct member access.

Changes:
1. Remove all `cdef api` functions from `_usmarray.pyx`
2. Delete `dpctl_ext_capi.h` and the `DpctlExtCAPI` CMake interface library
3. Update `dpnp4pybind11.hpp` to access `PyUSMArrayObject` members directly
4. Update build configuration

This PR proposes a refactoring that migrates the `dpctl_ext.tensor` module into the `dpnp` package as `dpnp.tensor`.

Changes:
1. Moved `dpctl_ext/tensor/` directory to `dpnp/tensor/`
2. Updated all imports from `dpctl_ext.tensor` to `dpnp.tensor` across the codebase
3. Consolidated build: removed `dpctl_ext/CMakeLists.txt`, added `build_dpnp_tensor_ext()` to `dpnp/CMakeLists.txt`
4. Added `DPNP_BUILD_COMPONENTS` CMake option (`ALL/TENSOR_ONLY/SKIP_TENSOR`) for staged builds
5. Split coverage workflow into two steps to avoid memory issues
6. Updated include paths in all backend extension CMake files
7. Removed `dpctl_ext/` directory and cleaned up `.gitignore`
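The import update described above (from `dpctl_ext.tensor` to `dpnp.tensor`) is mechanical, so it could be scripted. The following is a minimal sketch and not the script used in this PR: the helper name `rewrite_tensor_imports` is hypothetical, and it only rewrites the dotted module path in a string of source text, matching on word boundaries.

```python
import re

# Matches the old dotted module path only as a whole name, so that
# e.g. "dpctl_ext.tensors" or "my_dpctl_ext.tensor" are left untouched.
_OLD_MODULE = re.compile(r"\bdpctl_ext\.tensor\b")


def rewrite_tensor_imports(source: str) -> str:
    """Hypothetical helper: replace `dpctl_ext.tensor` with `dpnp.tensor`."""
    return _OLD_MODULE.sub("dpnp.tensor", source)


before = (
    "import dpctl_ext.tensor as dpt\n"
    "from dpctl_ext.tensor import usm_ndarray\n"
)
after = rewrite_tensor_imports(before)
print(after)
```

A rewrite like this does not cover forms such as `from dpctl_ext import tensor`, so the result would still need a manual review.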

This PR moves all tensor-related tests to `dpnp/tests/tensor` as part of the ongoing migration of tensor functionality from `dpctl` to `dpnp.tensor`.

Key changes:
- Relocated 89 tensor tests (elementwise functions, `usm_ndarray`, and tensor utilities)
- Updated imports to use `dpnp.tensor`
- Included tests in packaging configuration
- Integrated tensor tests into CI
- Fixed several issues discovered during migration (dtype expectations, boolean reductions, etc.)
- Fixed a circular import in `_usmarray.py`
- Added `SKIP_TENSOR_TESTS` env variable to manage the launch of the test scope

In a follow-up PR:
- Conditional logic will be added to run dpctl_ext/tests only when changes affect the tensor code.
- Array API tests for tensor will be introduced and executed as a separate CI job.
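To illustrate how an environment variable such as `SKIP_TENSOR_TESTS` can gate a test scope, here is a minimal sketch. It is not taken from the PR: the helper `tensor_tests_skipped` is hypothetical, the set of values treated as "skip" is an assumption (the PR does not say which values are accepted), and the standard-library `unittest` is used only to keep the example self-contained.

```python
import os
import unittest

# Assumption: these values (case-insensitive) mean "skip the tensor tests".
_TRUTHY = {"1", "true", "yes", "on"}


def tensor_tests_skipped(environ=None) -> bool:
    """Hypothetical helper: True when SKIP_TENSOR_TESTS holds a truthy value."""
    env = os.environ if environ is None else environ
    return env.get("SKIP_TENSOR_TESTS", "").strip().lower() in _TRUTHY


@unittest.skipIf(tensor_tests_skipped(), "SKIP_TENSOR_TESTS is set")
class TestTensorPlaceholder(unittest.TestCase):
    def test_placeholder(self):
        # Stand-in for a real tensor test.
        self.assertEqual(1 + 1, 2)
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching the real environment.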

This PR proposes to move the file `_compute_follows_data.pyx` from `dpctl.utils` to `dpnp.tensor` as part of the migration of `dpctl.tensor` to `dpnp.tensor`.

### Changes

- **Moved file**: `dpctl/utils/_compute_follows_data.pyx` → `dpnp/tensor/_compute_follows_data.pyx`
- **Exports** (now available from `dpnp.tensor`):
  - `ExecutionPlacementError` - exception for execution placement errors
  - `get_execution_queue()` - determine execution queue from input arrays
  - `get_coerced_usm_type()` - determine output USM type for compute-follows-data
  - `validate_usm_type()` - validate USM type specifications
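The compute-follows-data rule behind `get_execution_queue()` and `get_coerced_usm_type()` can be modelled in a few lines of pure Python. This is a simplified sketch, not the moved Cython code: the function names `model_execution_queue` and `model_coerced_usm_type` are hypothetical, plain hashable values stand in for SYCL queues, and the precedence `device` > `shared` > `host` reflects my understanding of dpctl's documented behavior and should be checked against the implementation.

```python
from typing import Hashable, Optional, Sequence

# Assumed precedence of USM allocation types, highest first.
_USM_PRECEDENCE = ("device", "shared", "host")


def model_execution_queue(queues: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the common queue if every input uses the same one, else None."""
    if not queues:
        return None
    first = queues[0]
    return first if all(q == first for q in queues[1:]) else None


def model_coerced_usm_type(usm_types: Sequence[str]) -> Optional[str]:
    """Return the highest-precedence USM type present, or None.

    Assumption: unknown type names make the result undeducible (None).
    """
    if not usm_types or any(t not in _USM_PRECEDENCE for t in usm_types):
        return None
    for candidate in _USM_PRECEDENCE:
        if candidate in usm_types:
            return candidate
    return None


print(model_execution_queue(["q0", "q0"]), model_coerced_usm_type(["host", "shared"]))
```

In this model a `None` queue signals that the inputs live on different queues; that is the situation in which callers would typically raise `ExecutionPlacementError`.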

Add `__main__.py` for CLI options to get `libtensor` include dirs from module

A workaround was implemented as part of [dpctl#2275](IntelPython/dpctl#2275), so this PR enables the previously muted tests for `dpnp.cumlogsumexp`.

Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 7.0.0 to 7.0.1.

Bumps [peter-evans/create-pull-request](https://github.com/peter-evans/create-pull-request) from 8.1.0 to 8.1.1.

…eLists

Merge master into include-dpctl-tensor

) This PR proposes to replace `dpctl.tensor` with `dpnp.tensor` across the error messages. It also adds a Sphinx handler that redirects `dpnp.tensor.*` cross-references to the dpctl 0.21.1 docs, and a `tensor.rst` page linking to the dpctl API reference.

…2851) This PR proposes device-aware output dtype resolution for `dpnp.tensor.round()` with `boolean` input to handle devices that do not support `float16`.

Boolean support for `round()` was originally added in #2817 [6f5a792](6f5a792) to match NumPy behavior, where `numpy.round(bool)` returns float16 rather than an integral type like int8. However, on devices without fp16 support, returning float16 is not viable.

The bool type mapping was removed from the round kernel, and an acceptance function `_acceptance_fn_round` was added to ensure the fallback in `_find_buf_dtype` prefers floating-point output over integral types for boolean input.

Result:
- fp16 devices: `round(bool)` -> float16
- non-fp16 devices: `round(bool)` -> float32

This PR proposes to fix test warnings in `dpnp.tensor` tests by replacing deprecated strides assignment with `np.lib.stride_tricks.as_strided` in `test_usm_ndarray_dlpack.py` and suppressing overflow warnings from np.allclose in `test_exp.py:test_exp_complex_contig`

github-actions · 2026-04-15T14:56:31Z

View rendered docs @ https://intelpython.github.io/dpnp/pull/2856/index.html

github-actions · 2026-04-15T15:32:39Z

Array API standard conformance tests for dpnp=0.20.0dev6=py313h509198e_56 ran successfully.
Passed: 1357
Failed: 3
Skipped: 16

antonwolfy · 2026-04-15T15:21:49Z

+        Returns a dictionary of default data types for ``device``.
+
+        Args:
+            device (Optional[:class:`dpctl.SyclDevice`, :class:`dpctl.SyclQueue`, :class:`dpctl.tensor.Device`, str]):


We don't have `dpctl.tensor` anymore.

Or do you plan to update the tensor docstrings separately in a follow-up PR?

antonwolfy · 2026-04-15T15:37:43Z

+#endif
+
+// Include dpctl C-API headers (both declarations and import functions)
+#include "dpctl/_sycl_context.h"


At what step are we going to use dpctl4pybind11.hpp?

antonwolfy and others added 30 commits January 26, 2026 20:36

Merge master into include-dpctl-tensor

b69ffb3

Merge pull request #2776 from IntelPython/update_tensor_branch

464ddc1

Merge master into include-dpctl-tensor

Clean up dpctl.tensor code (#2797)

ecd4991

Add missing includes (#2810)

0f6d63e

Extend _tensor_elementwise_impl (unary) part 2 (#2796)

ce5f54e

Extend _tensor_elementwise_impl (unary) part 3 (#2801)

3a0c2ff

add tensor linalg extension (#2799)

b0647db

Extend _tensor_elementwise_impl with binary functions and use it fo…

5dcdd27

…r dpnp (#2803) This PR extends `_tensor_elementwise_impl` with part of binary functions : `add, atan2, bitwise_and, bitwise_left_shift, bitwise_or, bitwise_right_shift, bitwise_xor`

Extend _tensor_elementwise_impl with binary functions part 2 (#2804)

1bfa011

Extend dpctl_ext.tensor with the remaining functions (#2806)

8d1c75b

Merge master into include-dpctl-tensor

d30c357

Use ExecutionPlacementError from dpnp.exceptions in dpnp

d835f96

Apply clang-format

56a0af4

ndgrigorian and others added 18 commits April 10, 2026 09:55

add __main__.py

fbc4f43

add test for new CLI options

1679951

Merge pull request #2836 from IntelPython/add-main-py

761573c

add __main__.py

8fb303d

add test for new CLI options

cf313c8

Remove the unnecessary includes after the merge

c83f72b

Enable muted tests for dpnp.cumlogsumexp (#2842)

e89e3d6

Bump actions/upload-artifact from 7.0.0 to 7.0.1 (#2847)

cd1f5ca

Bump peter-evans/create-pull-request from 8.1.0 to 8.1.1 (#2846)

eb94817

Remove redundant Dpctl_TENSOR_INCLUDE_DIR from backend extension CMak…

3a1db6f

…eLists

Update copyright year in __main__.py

6857a9f

Remove redundant include directories from lapack extension

cb1044d

Merge master into update-include-dpctl-tensor

cf94242

Merge branch 'include-dpctl-tensor' into update-include-dpctl-tensor

23da515

Merge pull request #2843 from IntelPython/update-include-dpctl-tensor

728ae42

Merge master into include-dpctl-tensor

vlad-perevezentsev self-assigned this Apr 15, 2026

vlad-perevezentsev requested review from antonwolfy and ndgrigorian as code owners April 15, 2026 14:08

antonwolfy mentioned this pull request Apr 15, 2026

Restructure dpctl, delegating all array functionality to dpnp #2743

Open

antonwolfy added this to the 0.20.0 release milestone Apr 15, 2026

antonwolfy reviewed Apr 15, 2026

vlad-perevezentsev wants to merge 48 commits into master from include-dpctl-tensor

3 participants
