Move accumulation functions mask_positions and cumsum_1d to tensor accumulation module #2157
+34
−24
To resolve gh-2156, this PR moves the definitions of the `mask_positions` and `_cumsum_1d` functions to `_tensor_accumulations_impl`.

The Python scripts are changed accordingly, and the CMake scripts now add the implementation cpp file to the list of source files for the `_tensor_accumulations_impl` MODULE library.

Also moved `find_package(Python)` so that it finds the `Development.Module` component before `pybind11` is activated, which resolves a CMake warning.

Incidentally, this change also reduces binary size and improves compilation times, since the accumulation kernels are no longer generated in duplicate (once for the `_tensor_ctor` module and once for the `_tensor_accumulation_impl` module).

This change resolves the segmentation fault for me locally using both OpenCL CPU and Level-Zero GPU devices.
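
For reviewers unfamiliar with the ordering issue, the CMake part of the change can be sketched roughly as follows. This is an illustrative fragment, not the actual diff: the source file name `accumulators.cpp`, the extra `Interpreter` component, and the exact `find_package(pybind11 ...)` arguments are assumptions made for the example.

```cmake
# Locate Python first, requesting the Development.Module component,
# so that pybind11 reuses this result instead of running its own search.
# (Interpreter is included here as an assumption; the PR text only names
# the Development.Module component.)
find_package(Python REQUIRED COMPONENTS Interpreter Development.Module)

# Only now activate pybind11.
find_package(pybind11 CONFIG REQUIRED)

# Build the accumulation extension as a MODULE library; the kernels are
# compiled once here instead of once per extension module.
# The source file name is hypothetical.
pybind11_add_module(_tensor_accumulations_impl MODULE
    accumulators.cpp
)
```

Keeping the `find_package(Python)` call ahead of pybind11 is what matters in this sketch; the remaining details depend on the project's existing build scripts.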
Closes #2156