Releases: uxlfoundation/oneDPL

oneDPL 2022.9.0 release

22 Jun 11:02
9d72f92

New Features

  • Added parallel range algorithms in namespace oneapi::dpl::ranges: fill, move, replace, replace_if,
    remove, remove_if, mismatch, minmax_element, min, max, find_first_of, find_end,
    is_sorted_until. These algorithms operate with C++20 random access ranges.
  • Improved performance of set operation algorithms when using device policies: set_union, set_difference,
    set_intersection, set_symmetric_difference.
  • Improved performance of copy, fill, for_each, replace, reverse, rotate, transform and 30+
    other algorithms with device policies on GPUs when using std::reverse_iterator.
  • Added an ADL-based customization point is_onedpl_indirectly_device_accessible, which can be used to mark iterator
    types as indirectly device accessible. Added public trait oneapi::dpl::is_indirectly_device_accessible[_v] to
    query if types are indirectly device accessible.
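
The ADL-based customization described above can be illustrated without oneDPL itself. The following is a minimal standalone sketch of how a trait could detect a marker function through argument-dependent lookup. The `sketch` and `user` namespaces, both iterator types, and the trait implementation are assumptions made for illustration; only the function name is_onedpl_indirectly_device_accessible comes from the release note. The oneDPL Guide remains the reference for the actual requirements.

```cpp
#include <type_traits>
#include <utility>

namespace sketch {
// Hypothetical stand-in for a library-side trait: false unless an
// ADL-visible marker function exists for T.
template <typename T, typename = void>
struct is_indirectly_device_accessible : std::false_type {};

// Selected when an unqualified call to the marker function is well-formed;
// the trait then takes its value from the marker's return type.
template <typename T>
struct is_indirectly_device_accessible<
    T, std::void_t<decltype(is_onedpl_indirectly_device_accessible(std::declval<T>()))>>
    : decltype(is_onedpl_indirectly_device_accessible(std::declval<T>())) {};

template <typename T>
inline constexpr bool is_indirectly_device_accessible_v =
    is_indirectly_device_accessible<T>::value;
} // namespace sketch

namespace user {
// Hypothetical iterator wrapping memory that a device can dereference.
struct device_iterator { int* ptr; };

// Marker found by ADL because it lives in the iterator's namespace.
// A declaration is enough: it is only used in unevaluated contexts.
std::true_type is_onedpl_indirectly_device_accessible(device_iterator);

// Hypothetical iterator with no marker, so it is not opted in.
struct host_iterator { int* ptr; };
} // namespace user

static_assert(sketch::is_indirectly_device_accessible_v<user::device_iterator>);
static_assert(!sketch::is_indirectly_device_accessible_v<user::host_iterator>);
```

The design point is that the iterator author opts in next to the type definition, without specializing anything inside the library namespace.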

Fixed Issues

  • Eliminated runtime exceptions thrown by code that called inclusive_scan, copy_if,
    partition, unique, reduce_by_segment, and related algorithms with device policies when it was compiled with
    the open source oneAPI DPC++ Compiler without specifying an optimization flag.
  • Fixed a compilation error in reduce_by_segment regarding return type deduction when called with a device policy.
  • Eliminated multiple compile time warnings throughout the library.

Known Issues and Limitations

New in This Release

  • The set_intersection, set_difference, set_symmetric_difference, and set_union algorithms with a device policy
    require GPUs with double-precision support on Windows, regardless of the value type of the input sequences.

Existing Issues
See the oneDPL Guide for other restrictions and known limitations.

  • Incorrect results may be observed when calling sort with a device policy on Intel® Arc™ graphics 140V with data
    sizes of 4-8 million elements.
  • histogram algorithm requires the output value type to be an integral type no larger than four bytes
    when used with a device policy on hardware that does not support 64-bit atomic operations.
  • histogram may provide incorrect results with device policies in a program built with the -O0 option when the driver
    version is 2448.13 or older.
  • For transform_exclusive_scan and exclusive_scan to run in-place (that is, with the same data
    used for both input and destination) and with an execution policy of unseq or par_unseq,
    it is required that the provided input and destination iterators are equality comparable.
    Furthermore, the equality comparison of the input and destination iterator must evaluate to true.
    If these conditions are not met, the result of these algorithm calls is undefined.
  • Incorrect results may be produced by exclusive_scan, inclusive_scan, transform_exclusive_scan,
    transform_inclusive_scan, exclusive_scan_by_segment, inclusive_scan_by_segment, reduce_by_segment
    with unseq or par_unseq policy when compiled by Intel® oneAPI DPC++/C++ Compiler 2024.1 or earlier
    with -fiopenmp, -fiopenmp-simd, -qopenmp, -qopenmp-simd options on Linux.
    To avoid the issue, pass -fopenmp or -fopenmp-simd option instead.
  • With libstdc++ version 10, the compilation error SYCL kernel cannot use exceptions occurs
    when calling the range-based adjacent_find, is_sorted or is_sorted_until algorithms with device policies.
  • The range-based count_if may produce incorrect results on Intel® Data Center GPU Max Series when the driver version
    is "Rolling 2507.12" and newer.
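
The in-place condition for the scan algorithms above amounts to passing a destination iterator that compares equal to the input iterator. A minimal sketch follows. It uses the serial std::exclusive_scan from the C++ standard library as a stand-in for a oneDPL call with an unseq or par_unseq policy, and the helper name `scan_in_place` is an assumption made for illustration.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Runs an exclusive scan that overwrites its own input.
// The destination iterator has the same type and value as the input
// iterator, so the two are equality comparable and compare equal, which is
// the condition the limitation above asks for.
inline void scan_in_place(std::vector<int>& data) {
    auto first = data.begin();
    auto d_first = data.begin();
    assert(first == d_first); // in-place: input and destination coincide
    std::exclusive_scan(first, data.end(), d_first, 0);
}
```

By contrast, mixing iterator types that alias the same memory, such as a raw pointer for the input and a vector iterator for the destination, would not give the equality comparison that the requirement describes.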

oneDPL 2022.8.0 release

31 Mar 17:11
89d8d8b

New Features

  • Added support of host policies for histogram algorithms.
  • Added support for an undersized output range in the range-based merge algorithm.
  • Improved performance of the merge and sorting algorithms
    (sort, stable_sort, sort_by_key, stable_sort_by_key) that rely on Merge sort*,
    with device policies for large data sizes.
  • Improved performance of copy, fill, for_each, replace, reverse, rotate, transform and 30+
    other algorithms with device policies on GPUs.
  • Improved oneDPL use with SYCL implementations other than Intel oneAPI DPC++/C++ Compiler.
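
To make the undersized output range in the merge item above more concrete, here is a serial sketch of a merge that stops once the output is full. It is an illustration of the idea only, not the oneDPL implementation; the names `bounded_merge` and `bounded_merge_result` are hypothetical, and the way the consumed input positions are reported is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Result of a bounded merge: how many elements were consumed from each input
// and how many were written to the output.
struct bounded_merge_result {
    std::size_t used_a;
    std::size_t used_b;
    std::size_t written;
};

// Serial sketch: merges sorted inputs a and b into out, stopping as soon as
// out is full, even if input elements remain.
inline bounded_merge_result bounded_merge(const std::vector<int>& a,
                                          const std::vector<int>& b,
                                          std::vector<int>& out) {
    std::size_t i = 0, j = 0, k = 0;
    while (k < out.size() && (i < a.size() || j < b.size())) {
        // Take from b only when its element is strictly smaller, so ties
        // prefer a and the merge stays stable.
        if (j < b.size() && (i == a.size() || b[j] < a[i]))
            out[k++] = b[j++];
        else
            out[k++] = a[i++];
    }
    return {i, j, k};
}
```

Returning the consumed counts lets a caller resume the merge from the remaining input elements with a fresh output buffer.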

Fixed Issues

  • Fixed an issue with drop_view in the experimental range-based API.

  • Fixed compilation errors in find_if and find_if_not with device policies where the user provided predicate is
    device copyable but not trivially copyable.

  • Fixed incorrect results or synchronous SYCL exceptions for several algorithms when compiled with -O0 and executed
    on a GPU device.

  • Fixed an issue preventing inclusion of the <numeric> header after <execution> and <algorithm> headers.

  • Fixed several issues in the sort, stable_sort, sort_by_key and stable_sort_by_key algorithms. The fixes:

    • Allow the use of non-trivially-copyable comparators.
    • Eliminate duplicate kernel names.
    • Resolve incorrect results on devices with sub-group sizes smaller than four.
    • Resolve synchronization errors that were seen on Intel® Arc™** B-series GPU devices.

Known Issues and Limitations

New in This Release

  • Incorrect results may be observed when calling sort with a device policy on Intel® Arc™ graphics 140V with data
    sizes of 4-8 million elements.
  • sort, stable_sort, sort_by_key and stable_sort_by_key algorithms fail to compile
    when using Clang 17 and earlier versions, as well as compilers based on these versions,
    such as Intel oneAPI DPC++/C++ Compiler 2023.2.0.
  • When compiling code that uses device policies with the open source oneAPI DPC++ Compiler (clang++ driver),
    synchronous SYCL runtime exceptions regarding unfound kernels may be encountered unless an optimization flag is
    specified (for example -O1) as opposed to relying on the compiler's default optimization level.

Existing Issues
See oneDPL Guide for other restrictions and known limitations.

  • histogram algorithm requires the output value type to be an integral type no larger than four bytes
    when used with an FPGA policy.
  • histogram may provide incorrect results with device policies in a program built with -O0 option.
  • Compilation issues may be encountered when passing zip iterators to exclusive_scan_by_segment on Windows.
  • For transform_exclusive_scan and exclusive_scan to run in-place (that is, with the same data
    used for both input and destination) and with an execution policy of unseq or par_unseq,
    it is required that the provided input and destination iterators are equality comparable.
    Furthermore, the equality comparison of the input and destination iterator must evaluate to true.
    If these conditions are not met, the result of these algorithm calls is undefined.
  • Incorrect results may be produced by exclusive_scan, inclusive_scan, transform_exclusive_scan,
    transform_inclusive_scan, exclusive_scan_by_segment, inclusive_scan_by_segment, reduce_by_segment
    with unseq or par_unseq policy when compiled by Intel® oneAPI DPC++/C++ Compiler
    with -fiopenmp, -fiopenmp-simd, -qopenmp, -qopenmp-simd options on Linux.
    To avoid the issue, pass -fopenmp or -fopenmp-simd option instead.

*The sorting algorithms in oneDPL use Radix sort for arithmetic data types and
sycl::half (since oneDPL 2022.6) compared with std::less or std::greater, otherwise Merge sort.
**Intel, the Intel logo, and Arc are the trademarks of Intel Corporation or its subsidiaries.
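
The selection rule in the sorting footnote above can be written down as a compile-time predicate. This is a sketch of the rule as stated, not oneDPL code: the `sketch` namespace and both variable templates are hypothetical, treating the transparent comparators std::less<> and std::greater<> the same way is an assumption, and sycl::half is left out so that the example does not depend on SYCL.

```cpp
#include <functional>
#include <type_traits>

namespace sketch {
// True for the comparators named in the footnote, applied to T.
// The transparent forms std::less<> and std::greater<> are included here
// as an assumption.
template <typename T, typename Compare>
inline constexpr bool is_less_or_greater_v =
    std::is_same_v<Compare, std::less<T>> || std::is_same_v<Compare, std::greater<T>> ||
    std::is_same_v<Compare, std::less<>> || std::is_same_v<Compare, std::greater<>>;

// Radix sort for arithmetic types compared with std::less or std::greater,
// Merge sort otherwise.
template <typename T, typename Compare>
inline constexpr bool would_use_radix_sort_v =
    std::is_arithmetic_v<T> && is_less_or_greater_v<T, Compare>;
} // namespace sketch

// A custom comparator, even one equivalent to std::less, falls outside the rule.
struct custom_less {
    bool operator()(int a, int b) const { return a < b; }
};

static_assert(sketch::would_use_radix_sort_v<int, std::less<int>>);
static_assert(sketch::would_use_radix_sort_v<float, std::greater<float>>);
static_assert(!sketch::would_use_radix_sort_v<int, custom_less>);
```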

oneDPL 2022.7.1 release

02 Dec 16:46
4d9921f

Fixed Issues

  • Fixed a build error for the oneapi::dpl::sort_by_key algorithm when multiple calls are made to the algorithm
    with identically typed parameter lists.

Known Issues and Limitations

Existing Issues

See oneDPL Guide for other restrictions and known limitations.

  • histogram may provide incorrect results with device policies in a program built with -O0 option.
  • Inclusion of <oneapi/dpl/dynamic_selection> prior to <oneapi/dpl/random> may result in compilation errors.
    Include <oneapi/dpl/random> first as a workaround.
  • Incorrect results may occur when using oneapi::dpl::experimental::philox_engine with no predefined template
    parameters and with word_size values other than 64 and 32.
  • Incorrect results or a synchronous SYCL exception may be observed with the following algorithms built
    with -O0 option and executed on a GPU device: exclusive_scan, inclusive_scan, transform_exclusive_scan,
    transform_inclusive_scan, copy_if, remove, remove_copy, remove_copy_if, remove_if,
    partition, partition_copy, stable_partition, unique, unique_copy, and sort.
  • The value type of the input sequence should be convertible to the type of the initial element for the following
    algorithms with device execution policies: transform_inclusive_scan, transform_exclusive_scan,
    inclusive_scan, and exclusive_scan.
  • The following algorithms with device execution policies may exceed the C++ standard requirements on the number
    of applications of user-provided predicates or equality operators: copy_if, remove, remove_copy,
    remove_copy_if, remove_if, partition_copy, unique, and unique_copy. In all cases,
    the predicate or equality operator is applied O(n) times.
  • The adjacent_find, all_of, any_of, equal, find, find_if, find_end, find_first_of,
    find_if_not, includes, is_heap, is_heap_until, is_sorted, is_sorted_until, mismatch,
    none_of, search, and search_n algorithms may cause a segmentation fault when used with a device execution
    policy on a CPU device, and built on Linux with Intel® oneAPI DPC++/C++ Compiler 2025.0.0 and -O0 -g compiler options.
  • histogram algorithm requires the output value type to be an integral type no larger than 4 bytes
    when used with an FPGA policy.
  • Compilation issues may be encountered when passing zip iterators to exclusive_scan_by_segment on Windows.
  • For transform_exclusive_scan and exclusive_scan to run in-place (that is, with the same data
    used for both input and destination) and with an execution policy of unseq or par_unseq,
    it is required that the provided input and destination iterators are equality comparable.
    Furthermore, the equality comparison of the input and destination iterator must evaluate to true.
    If these conditions are not met, the result of these algorithm calls is undefined.
  • sort, stable_sort, sort_by_key, stable_sort_by_key, partial_sort_copy algorithms
    may work incorrectly or cause a segmentation fault when used with a device execution policy on a CPU device,
    and built on Linux with Intel® oneAPI DPC++/C++ Compiler and -O0 -g compiler options.
    To avoid the issue, pass -fsycl-device-code-split=per_kernel option to the compiler.
  • Incorrect results may be produced by exclusive_scan, inclusive_scan, transform_exclusive_scan,
    transform_inclusive_scan, exclusive_scan_by_segment, inclusive_scan_by_segment, reduce_by_segment
    with unseq or par_unseq policy when compiled by Intel® oneAPI DPC++/C++ Compiler
    with -fiopenmp, -fiopenmp-simd, -qopenmp, -qopenmp-simd options on Linux.
    To avoid the issue, pass -fopenmp or -fopenmp-simd option instead.
  • Incorrect results may be produced by reduce, reduce_by_segment, and transform_reduce
    with 64-bit data types when compiled by Intel® oneAPI DPC++/C++ Compiler versions 2021.3 and newer
    and executed on a GPU device. For a workaround, define the ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION
    macro to 1 before including oneDPL header files.
  • std::tuple, std::pair cannot be used with SYCL buffers to transfer data between host and device.
  • std::array cannot be swapped in DPC++ kernels with std::swap function or swap member function
    in the Microsoft* Visual C++ standard library.
  • The oneapi::dpl::experimental::ranges::reverse algorithm is not available with -fno-sycl-unnamed-lambda option.
  • STL algorithm functions (such as std::for_each) used in DPC++ kernels do not compile with the debug version of
    the Microsoft* Visual C++ standard library.
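
For the 64-bit reduction workaround listed above, the order of the definition and the includes is what matters. A minimal sketch of that ordering follows. The `__has_include` guard is an addition of this example so that it compiles where oneDPL is not installed; a real project would include the headers unconditionally.

```cpp
// Define the workaround macro before any oneDPL header is included.
#define ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION 1

// The guard only keeps this sketch compilable without oneDPL installed.
#if __has_include(<oneapi/dpl/execution>)
#include <oneapi/dpl/execution>
#include <oneapi/dpl/numeric>
#endif

// Sanity check that the macro has the value the release note asks for.
static_assert(ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION == 1,
              "workaround macro must be set to 1");
```

Passing the definition on the compiler command line with a -D option has the same effect and avoids depending on include order within a source file.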

oneDPL 2022.7.0 release

15 Nov 21:13

New Features

  • Improved performance of the adjacent_find, all_of, any_of, copy_if, exclusive_scan, equal,
    find, find_if, find_end, find_first_of, find_if_not, inclusive_scan, includes,
    is_heap, is_heap_until, is_partitioned, is_sorted, is_sorted_until, lexicographical_compare,
    max_element, min_element, minmax_element, mismatch, none_of, partition, partition_copy,
    reduce, remove, remove_copy, remove_copy_if, remove_if, search, search_n,
    stable_partition, transform_exclusive_scan, transform_inclusive_scan, unique, and unique_copy
    algorithms with device policies.
  • Improved performance of sort, stable_sort and sort_by_key algorithms with device policies when using Merge
    sort [1].
  • Added stable_sort_by_key algorithm in namespace oneapi::dpl.
  • Added parallel range algorithms in namespace oneapi::dpl::ranges: all_of, any_of,
    none_of, for_each, find, find_if, find_if_not, adjacent_find, search, search_n,
    transform, sort, stable_sort, is_sorted, merge, count, count_if, equal, copy,
    copy_if, min_element, max_element. These algorithms operate with C++20 random access ranges
    and views while also taking an execution policy similarly to other oneDPL algorithms.
  • Added support for operators ==, !=, << and >> for RNG engines and distributions.
  • Added experimental support for the Philox RNG engine in namespace oneapi::dpl::experimental.
  • Added the <oneapi/dpl/version> header containing oneDPL version macros and new feature testing macros.

Fixed Issues

  • Fixed unused variable and unused type warnings.
  • Fixed memory leaks when using sort and stable_sort algorithms with the oneTBB backend.
  • Fixed a build error for oneapi::dpl::begin and oneapi::dpl::end functions used with
    the Microsoft* Visual C++ standard library and with C++20.
  • Reordered template parameters of the histogram algorithm to match its function parameter order.
    For affected histogram calls, we recommend removing the explicit specification of template parameters
    and instead adding explicit type conversions of the function arguments as necessary.
  • gpu::esimd::radix_sort and gpu::esimd::radix_sort_by_key kernel templates now throw std::bad_alloc
    if they fail to allocate global memory.
  • Fixed a potential hang occurring with gpu::esimd::radix_sort and
    gpu::esimd::radix_sort_by_key kernel templates.
  • Fixed documentation for sort_by_key algorithm, which used to be mistakenly described as stable, despite being
    possibly unstable for some execution policies. If stability is required, use stable_sort_by_key instead.
  • Fixed an error when calling sort with device execution policies on CUDA devices.
  • Enabled passing C++20 random access iterators to oneDPL algorithms.
  • Fixed issues caused by initialization of SYCL queues in the predefined device execution policies.
    These policies have been updated to be immutable (const) objects.

Known Issues and Limitations

New in This Release

  • histogram may provide incorrect results with device policies in a program built with -O0 option.
  • Inclusion of <oneapi/dpl/dynamic_selection> prior to <oneapi/dpl/random> may result in compilation errors.
    Include <oneapi/dpl/random> first as a workaround.
  • Incorrect results may occur when using oneapi::dpl::experimental::philox_engine with no predefined template
    parameters and with word_size values other than 64 and 32.
  • Incorrect results or a synchronous SYCL exception may be observed with the following algorithms built
    with -O0 option and executed on a GPU device: exclusive_scan, inclusive_scan, transform_exclusive_scan,
    transform_inclusive_scan, copy_if, remove, remove_copy, remove_copy_if, remove_if,
    partition, partition_copy, stable_partition, unique, unique_copy, and sort.
  • The value type of the input sequence should be convertible to the type of the initial element for the following
    algorithms with device execution policies: transform_inclusive_scan, transform_exclusive_scan,
    inclusive_scan, and exclusive_scan.
  • The following algorithms with device execution policies may exceed the C++ standard requirements on the number
    of applications of user-provided predicates or equality operators: copy_if, remove, remove_copy,
    remove_copy_if, remove_if, partition_copy, unique, and unique_copy. In all cases,
    the predicate or equality operator is applied O(n) times.
  • The adjacent_find, all_of, any_of, equal, find, find_if, find_end, find_first_of,
    find_if_not, includes, is_heap, is_heap_until, is_sorted, is_sorted_until, mismatch,
    none_of, search, and search_n algorithms may cause a segmentation fault when used with a device execution
    policy on a CPU device, and built on Linux with Intel® oneAPI DPC++/C++ Compiler 2025.0.0 and -O0 -g compiler options.

Existing Issues
See oneDPL Guide for other restrictions and known limitations.

  • histogram algorithm requires the output value type to be an integral type no larger than 4 bytes
    when used with an FPGA policy.
  • Compilation issues may be encountered when passing zip iterators to exclusive_scan_by_segment on Windows.
  • For transform_exclusive_scan and exclusive_scan to run in-place (that is, with the same data
    used for both input and destination) and with an execution policy of unseq or par_unseq,
    it is required that the provided input and destination iterators are equality comparable.
    Furthermore, the equality comparison of the input and destination iterator must evaluate to true.
    If these conditions are not met, the result of these algorithm calls is undefined.
  • sort, stable_sort, sort_by_key, stable_sort_by_key, partial_sort_copy algorithms
    may work incorrectly or cause a segmentation fault when used with a device execution policy on a CPU device,
    and built on Linux with Intel® oneAPI DPC++/C++ Compiler and -O0 -g compiler options.
    To avoid the issue, pass -fsycl-device-code-split=per_kernel option to the compiler.
  • Incorrect results may be produced by exclusive_scan, inclusive_scan, transform_exclusive_scan,
    transform_inclusive_scan, exclusive_scan_by_segment, inclusive_scan_by_segment, reduce_by_segment
    with unseq or par_unseq policy when compiled by Intel® oneAPI DPC++/C++ Compiler
    with -fiopenmp, -fiopenmp-simd, -qopenmp, -qopenmp-simd options on Linux.
    To avoid the issue, pass -fopenmp or -fopenmp-simd option instead.
  • Incorrect results may be produced by reduce, reduce_by_segment, and transform_reduce
    with 64-bit data types when compiled by Intel® oneAPI DPC++/C++ Compiler versions 2021.3 and newer
    and executed on a GPU device. For a workaround, define the ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION
    macro to 1 before including oneDPL header files.
  • std::tuple, std::pair cannot be used with SYCL buffers to transfer data between host and device.
  • std::array cannot be swapped in DPC++ kernels with std::swap function or swap member function
    in the Microsoft* Visual C++ standard library.
  • The oneapi::dpl::experimental::ranges::reverse algorithm is not available with -fno-sycl-unnamed-lambda option.
  • STL algorithm functions (such as std::for_each) used in DPC++ kernels do not compile with the debug version of
    the Microsoft* Visual C++ standard library.
  [1] The sorting algorithms in oneDPL use Radix sort for arithmetic data types and
    sycl::half (since oneDPL 2022.6) compared with std::less or std::greater, otherwise Merge sort.

oneDPL 2022.6.0 release

27 Jun 15:46
456bbe5

News

  • The oneAPI DPC++ Library Manual Migration Guide is now available to simplify the migration of Thrust* and CUB* APIs from CUDA*.
  • radix_sort and radix_sort_by_key kernel templates were moved into oneapi::dpl::experimental::kt::gpu::esimd namespace. The former oneapi::dpl::experimental::kt::esimd namespace is deprecated and will be removed in a future release.
  • The for_loop, for_loop_strided, for_loop_n, for_loop_n_strided algorithms in namespace oneapi::dpl::experimental are enforced to fail with device execution policies.

New Features

  • Added experimental inclusive_scan kernel template algorithm residing in the oneapi::dpl::experimental::kt::gpu namespace.
  • radix_sort and radix_sort_by_key kernel templates are extended with overloads for out-of-place sorting.
    These overloads preserve the input sequence and sort data into the user provided output sequence.
  • Improved performance of the reduce, min_element, max_element, minmax_element, is_partitioned,
    lexicographical_compare, binary_search, lower_bound, and upper_bound algorithms with device policies.
  • sort, stable_sort, sort_by_key algorithms now use Radix sort for sorting sycl::half elements compared with std::less or std::greater.

Fixed Issues

  • Fixed compilation errors when using reduce, min_element, max_element, minmax_element, is_partitioned, and lexicographical_compare with Intel oneAPI DPC++/C++ compiler 2023.0 and earlier.
  • Fixed possible data races in the following algorithms used with device execution policies:
    remove_if, unique, inplace_merge, stable_partition, partial_sort_copy, rotate.
  • Fixed excessive copying of data in std::vector allocated with a USM allocator for standard library implementations which have allocator information in the std::vector::iterator type.
  • Fixed an issue where checking std::is_default_constructible for transform_iterator with a functor that is not default-constructible could cause a build error or an incorrect result.
  • Fixed handling of SYCL device copyability for internal and public oneDPL types.
  • Fixed handling of std::reverse_iterator as input to oneDPL algorithms using a device policy.
  • Fixed set_intersection to always copy from the first input sequence to the output, where previously some calls would copy from the second input sequence.
  • Fixed compilation errors when using oneapi::dpl::zip_iterator with the oneTBB backend and C++20.
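
The set_intersection fix above matters whenever equivalent elements are distinguishable. The sketch below shows the behavior with the serial std::set_intersection from the C++ standard library, which copies matching elements from the first range; the `record` type, the `by_key` comparator, and the `intersection_tags` helper are hypothetical names used only for this illustration.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical record: ordering looks only at key, so two records with the
// same key but different tags are equivalent for set_intersection.
struct record {
    int key;
    char tag;
};

inline bool by_key(const record& x, const record& y) { return x.key < y.key; }

// Returns the tags of the intersection, which reveal the source sequence.
inline std::vector<char> intersection_tags(const std::vector<record>& first,
                                           const std::vector<record>& second) {
    std::vector<record> out;
    std::set_intersection(first.begin(), first.end(), second.begin(), second.end(),
                          std::back_inserter(out), by_key);
    std::vector<char> tags;
    for (const record& r : out) tags.push_back(r.tag);
    return tags;
}
```

With keys 1, 2, 4 tagged 'a' in the first sequence and keys 2, 3, 4 tagged 'b' in the second, every tag in the result is 'a'.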

New Known Issues and Limitations

  • histogram algorithm requires the output value type to be an integral type no larger than 4 bytes when used with an FPGA policy.

oneDPL 2022.5.0 release

09 Apr 16:24
4cdc990

New Features

  • Added new histogram algorithms for generating a histogram from an input sequence into an output sequence representing either equally spaced or user-defined bins. These algorithms are currently only available for device execution policies.
  • Added zip_iterator support for the transform algorithm.
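
As an illustration of the equally spaced bins mentioned in the histogram item above, here is a serial sketch. It is not the oneDPL interface: the function name `even_histogram`, its parameter list, the half-open interval convention, and the choice to skip values outside the interval are assumptions of this example.

```cpp
#include <cstddef>
#include <vector>

// Serial sketch of an evenly binned histogram over the half-open interval
// [lo, hi). Values outside the interval are ignored.
inline std::vector<std::size_t> even_histogram(const std::vector<double>& input,
                                               std::size_t num_bins, double lo, double hi) {
    std::vector<std::size_t> bins(num_bins, 0);
    if (num_bins == 0 || !(lo < hi)) return bins;
    const double width = (hi - lo) / static_cast<double>(num_bins);
    for (double x : input) {
        if (x < lo || x >= hi) continue;
        std::size_t idx = static_cast<std::size_t>((x - lo) / width);
        if (idx >= num_bins) idx = num_bins - 1; // guard against rounding at the upper edge
        ++bins[idx];
    }
    return bins;
}
```

User-defined bins would replace the division by a search over a sorted list of bin boundaries.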

Fixed Issues

  • Fixed issues with handling of permutation_iterator as input to oneDPL algorithms for a variety of source iterator and permutation types.
  • Fixed zip_iterator to be SYCL device copyable for trivially copyable source iterator types.
  • Added a workaround for reduction algorithm failures with 64-bit data types. Define the ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION macro to 1 before including oneDPL header files.

New Known Issues and Limitations

  • Crashes or incorrect results may occur when using oneapi::dpl::reverse_iterator or std::reverse_iterator as input to oneDPL algorithms with device execution policies.

oneDPL 2022.4.0 release

05 Mar 13:33
8a48f19

New Features

  • Added experimental radix_sort and radix_sort_by_key algorithms residing in
    the oneapi::dpl::experimental::kt::esimd namespace. These algorithms are first
    in the family of kernel templates that allow configuring a variety of parameters
    including the number of elements to process by a work item, and the size of a workgroup.
    The algorithms only work with Intel® Data Center GPU Max Series.
  • Added new transform_if algorithm for applying a transform function conditionally
    based on a predicate, with overloads provided for one and two input sequences
    that use correspondingly unary and binary operations and predicates.
  • Optimizations used with Intel® oneAPI DPC++/C++ Compiler are expanded to the open source oneAPI DPC++ compiler.
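
The one-input form of the transform_if algorithm described above can be sketched serially as follows. The name `transform_if_sketch` and its container-based signature are hypothetical, and leaving output positions untouched where the predicate is false is this example's reading of "applying a transform function conditionally".

```cpp
#include <cstddef>
#include <vector>

// Serial sketch for one input sequence: out[i] = op(in[i]) where pred(in[i])
// holds; other positions of out are left untouched.
template <typename T, typename UnaryOp, typename UnaryPred>
void transform_if_sketch(const std::vector<T>& in, std::vector<T>& out,
                         UnaryOp op, UnaryPred pred) {
    for (std::size_t i = 0; i < in.size() && i < out.size(); ++i)
        if (pred(in[i])) out[i] = op(in[i]);
}
```

The two-input form would follow the same pattern with a binary operation and a binary predicate applied to corresponding elements of both inputs.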

New Known Issues and Limitations

  • esimd::radix_sort and esimd::radix_sort_by_key kernel templates fail to compile when a program
    is built with -g, -O0, -O1 compiler options.
  • esimd::radix_sort_by_key kernel template produces wrong results with the following combinations
    of kernel_param and types of keys and values:
    • sizeof(key_type) + sizeof(val_type) == 12, kernel_param::workgroup_size == 64, and kernel_param::data_per_workitem == 96
    • sizeof(key_type) + sizeof(val_type) == 16, kernel_param::workgroup_size == 64, and kernel_param::data_per_workitem == 64

oneDPL 2022.3.0 release

22 Nov 12:26
180f18a

New Features

  • Added an experimental feature to dynamically select an execution context, e.g., a SYCL queue.
    The feature provides selection functions such as select, submit and submit_and_wait,
    and several selection policies: fixed_resource_policy, round_robin_policy,
    dynamic_load_policy, and auto_tune_policy.
  • unseq and par_unseq policies now also enable vectorization with the Intel® oneAPI DPC++/C++ Compiler.
  • Added support for passing zip iterators as segment value data in reduce_by_segment,
    exclusive_scan_by_segment, and inclusive_scan_by_segment.
  • Improved performance of the merge, sort, stable_sort, sort_by_key,
    reduce, min_element, max_element, minmax_element, is_partitioned, and
    lexicographical_compare algorithms with DPC++ execution policies.
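
To give a rough idea of what a round-robin selection policy does, the sketch below cycles through a fixed list of resources. It is a plain C++ illustration and does not use the oneDPL dynamic selection API: the class `round_robin_selector` and its `select` member are hypothetical, and the resource type is left generic where oneDPL would typically use a SYCL queue.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical round-robin selector over an arbitrary resource type.
template <typename Resource>
class round_robin_selector {
public:
    explicit round_robin_selector(std::vector<Resource> resources)
        : resources_(std::move(resources)) {}

    // Returns the next resource, wrapping around at the end of the list.
    // Precondition: the selector was constructed with at least one resource.
    const Resource& select() {
        const Resource& chosen = resources_[next_];
        next_ = (next_ + 1) % resources_.size();
        return chosen;
    }

private:
    std::vector<Resource> resources_;
    std::size_t next_ = 0;
};
```

This sketch is not thread-safe; concurrent callers would need an atomic counter or a lock around `select`.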

Fixed Issues

  • Fixed the reduce_async function to not ignore the provided binary operation.

New Known Issues and Limitations

  • When compiled with -fsycl-pstl-offload option of Intel® oneAPI DPC++/C++ compiler and with
    libstdc++ version 8 or libc++, oneapi::dpl::execution::par_unseq offloads
    standard parallel algorithms to the SYCL device similarly to std::execution::par_unseq
    in accordance with the -fsycl-pstl-offload option value.
  • When using the dpl modulefile to initialize the user's environment and compiling with -fsycl-pstl-offload
    option of Intel® oneAPI DPC++/C++ compiler, a linking issue or program crash may be encountered due to the directory
    containing libpstloffload.so not being included in the search path. Use the env/vars.sh to configure the working
    environment to avoid the issue.
  • Compilation issues may be encountered when passing zip iterators to exclusive_scan_by_segment on Windows.
  • Incorrect results may be produced by set_intersection with a DPC++ execution policy,
    where elements are copied from the second input range rather than the first input range.
  • For transform_exclusive_scan and exclusive_scan to run in-place (that is, with the same data
    used for both input and destination) and with an execution policy of unseq or par_unseq,
    it is required that the provided input and destination iterators are equality comparable.
    Furthermore, the equality comparison of the input and destination iterator must evaluate to true.
    If these conditions are not met, the result of these algorithm calls is undefined.
  • sort, stable_sort, sort_by_key, partial_sort_copy algorithms may work incorrectly or cause
    a segmentation fault when used with a DPC++ execution policy on a CPU device, and built
    on Linux with Intel® oneAPI DPC++/C++ Compiler and -O0 -g compiler options.
    To avoid the issue, pass -fsycl-device-code-split=per_kernel option to the compiler.
  • Incorrect results may be produced by exclusive_scan, inclusive_scan, transform_exclusive_scan,
    transform_inclusive_scan, exclusive_scan_by_segment, inclusive_scan_by_segment, reduce_by_segment
    with unseq or par_unseq policy when compiled by Intel® oneAPI DPC++/C++ Compiler
    with -fiopenmp, -fiopenmp-simd, -qopenmp, -qopenmp-simd options on Linux.
    To avoid the issue, pass -fopenmp or -fopenmp-simd option instead.
  • Incorrect results may be produced by reduce and transform_reduce with 64-bit types and std::multiplies,
    sycl::multiplies operations when compiled by Intel® C++ Compiler 2021.3 and newer and executed on GPU devices.

oneDPL 2022.2.0 release

25 Jul 17:57
c697fac

New Features

  • Added sort_by_key algorithm for key-value sorting.
  • Improved performance of the reduce, min_element, max_element, minmax_element,
    is_partitioned, and lexicographical_compare algorithms with DPC++ execution policies.
  • Improved performance of the reduce_by_segment, inclusive_scan_by_segment, and
    exclusive_scan_by_segment algorithms for binary operators with known identities
    when using DPC++ execution policies.
  • Added value_type to all views in oneapi::dpl::experimental::ranges.
  • Extended oneapi::dpl::experimental::ranges::sort to support projections applied to the range elements prior to comparison.
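
Key-value sorting, as added by the sort_by_key item above, reorders a value sequence according to the order of its keys. A serial sketch of that idea follows. The name `sort_by_key_sketch` and the container-based signature are hypothetical, and using std::stable_sort is a choice of this example, made so that the result is deterministic for equal keys.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Serial sketch of key-value sorting: reorders keys and values together so
// that keys end up in ascending order. Assumes keys.size() == values.size().
inline void sort_by_key_sketch(std::vector<int>& keys, std::vector<char>& values) {
    std::vector<std::size_t> order(keys.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    // Sort the index permutation by key. std::stable_sort keeps equal keys in
    // their original order; this is a choice of the sketch.
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
    std::vector<int> sorted_keys;
    std::vector<char> sorted_values;
    for (std::size_t i : order) {
        sorted_keys.push_back(keys[i]);
        sorted_values.push_back(values[i]);
    }
    keys = std::move(sorted_keys);
    values = std::move(sorted_values);
}
```

Sorting an index permutation first and applying it afterwards keeps the comparison on keys only, so values never need to be comparable.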

Fixed Issues

  • The minimally required CMake version is raised to 3.11 on Linux and 3.20 on Windows.
  • Added new CMake package oneDPLIntelLLVMConfig.cmake to resolve issues using CMake 3.20+ on Windows for icx and icx-cl.
  • Fixed an error in the sort and stable_sort algorithms when performing a descending sort
    on signed numeric types with negative values.
  • Fixed an error in reduce_by_segment algorithm when a non-commutative predicate is used.
  • Fixed an error in sort and stable_sort algorithms for integral types wider than 4 bytes.
  • Fixed an error for some compilers where OpenMP or SYCL backend was selected by CMake scripts without full compiler support.

New Known Issues and Limitations

  • Incorrect results may be produced with in-place scans using unseq and par_unseq policies on
    CPUs with the Intel® C++ Compiler 2021.8.

This release also includes the following changes from oneDPL 2022.1.1

New Features

  • Improved sort algorithm performance for arithmetic data types with the std::less or std::greater comparison operator and a DPC++ policy.

Fixed Issues

  • Fixed an error that caused segmentation faults in transform_reduce, minmax_element, and related algorithms when run on CPU devices.
  • Fixed a compilation error in transform_reduce, minmax_element, and related algorithms on FPGAs.
  • Fixed permutation_iterator to support C-style array as a permutation map.
  • Fixed a radix-sort issue with 64-bit signed integer types.

oneDPL 2022.1.0 release

25 Apr 15:08

New Features

  • Added generate, generate_n, transform algorithms to Tested Standard C++ API.
  • Improved performance of inclusive_scan, exclusive_scan, reduce and
    max_element algorithms with DPC++ execution policies.

Fixed Issues

  • Added a workaround for the TBB headers not found issue occurring with libstdc++ version 9 when
    oneTBB headers are not present in the environment. The workaround requires inclusion of the oneDPL headers before the libstdc++ headers.
  • When possible, oneDPL CMake scripts now enforce C++17 as the minimally required language version. Inspired by Daniel Simon (#739).
  • Fixed an error in the exclusive_scan algorithm when the output iterator is equal to the
    input iterator (in-place scan).