sycloid
Contributor

@sycloid sycloid commented Sep 26, 2025

To resolve gh-2156, move the definitions of the mask_positions and _cumsum_1d functions to _tensor_accumulations_impl.

Changed the Python scripts accordingly, and updated the CMake scripts to add the implementation cpp file to the list of source files for the _tensor_accumulations_impl MODULE library.
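The CMake side of such a change might look roughly like the sketch below. This is not the PR's actual diff: the variable name `_accumulation_sources` and both file names are hypothetical, and using `pybind11_add_module` is an assumption, since the project's build scripts may use a different command to create the MODULE library.

```cmake
# Hypothetical source list for the accumulation extension module.
# File names are placeholders, not taken from the repository.
set(_accumulation_sources
    ${CMAKE_CURRENT_SOURCE_DIR}/tensor_accumulation.cpp
    # Implementation file that now also defines mask_positions and _cumsum_1d
    ${CMAKE_CURRENT_SOURCE_DIR}/accumulators.cpp
)

# Build the extension as a MODULE library from the extended source list.
# (Assumes pybind11 has already been found.)
pybind11_add_module(_tensor_accumulations_impl MODULE ${_accumulation_sources})
```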

Also moved find_package(Python) so that the Development.Module component is found before pybind11 is activated, which resolves a CMake warning.
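The ordering described here can be sketched as follows. This is a minimal illustration under assumptions, not the exact lines from the repository: the component list (`Interpreter` alongside `Development.Module`) and the use of `CONFIG REQUIRED` for pybind11 are guesses at a typical setup.

```cmake
# Locate Python first, explicitly requesting the Development.Module component
# (headers needed to build extension modules).
find_package(Python REQUIRED COMPONENTS Interpreter Development.Module)

# Only afterwards activate pybind11, so that it sees Python as already found.
find_package(pybind11 CONFIG REQUIRED)
```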

Incidentally, this change also reduces binary size and improves compilation times, since accumulation kernels are no longer generated twice (once for the _tensor_ctor module, and once for the _tensor_accumulation_impl module).

This change resolves the segmentation fault for me locally using both OpenCL CPU and Level-Zero GPU devices.

Closes #2156

  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • Have you added documentation for your changes, if necessary?
  • Have you added your changes to the changelog?
  • If this PR is a work in progress, are you opening the PR as a draft?

@intel-python-devops

Can one of the admins verify this patch?

@coveralls
Collaborator

coveralls commented Sep 26, 2025

Coverage Status

coverage: 85.26% (-0.002%) from 85.262%
when pulling a2640ab on sycloid:move-all-accumulation-functions-to-one-module
into 878cc19 on IntelPython:master.

@sycloid sycloid force-pushed the move-all-accumulation-functions-to-one-module branch from 07d0243 to a2640ab Compare September 26, 2025 16:46
@ndgrigorian
Collaborator

This did resolve the OS crash, as can be seen here:
https://github.com/IntelPython/dpctl/actions/runs/18040405632

@sycloid
Contributor Author

sycloid commented Sep 29, 2025

The latest nightly build no longer exhibits the crash (see https://github.com/IntelPython/dpctl/actions/runs/18052500456), hence this change is no longer necessary.

I confirmed that the test suite run on CPU passes on master when built using the following SYCL compiler build:

clang version 22.0.0git (https://github.com/intel/llvm 36363e606092e25deab5ba9d493f890c912b0462)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/sycloid/sycl-builds/test_llvm/build_sycl/bin
Build config: +assertions

@sycloid sycloid closed this Sep 29, 2025
Successfully merging this pull request may close these issues.

Crash in dpctl test suite run on opencl::cpu using SYCL nightly compiler and runtime