The Vec<D, T> class typically stores its data in a plain array T[D], which is aligned only to alignof(T). However, when UM2_ENABLE_SIMD_VEC is on, D is a power of 2, and T is an arithmetic type, GCC vector extensions are used as the underlying storage instead. This enables SIMD optimizations on Vec. It also increases the alignment of Vec from sizeof(T) to D * sizeof(T). See https://godbolt.org/z/or73xrxbh.
However, in Vector<T>, we allocate the memory that stores T using overload (1) of operator new (https://en.cppreference.com/w/cpp/memory/new/operator_new). It is unclear whether this memory will be appropriately aligned, since we do not explicitly request an alignment. Therefore, when using over-aligned types or GCC vector extensions, we want to verify that the memory, accesses to the memory, and related pointers are appropriately aligned.
Failure to align properly results in undefined behavior, which can show up as incorrect reads and, likely, segfaults.
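To make the alignment difference concrete, here is a minimal sketch that compares array storage with GCC/Clang vector-extension storage for four floats. The names ArrayVec4f, Float4, and SimdVec4f are hypothetical and not part of the codebase. The sketch assumes a GCC-compatible compiler and a typical target where sizeof(float) is 4.

```cpp
#include <cstddef>

// Plain array storage: the struct is only as aligned as its element type.
struct ArrayVec4f {
  float data[4];
};

// GCC/Clang vector extension: a 16-byte vector holding 4 floats.
// (Assumes sizeof(float) == 4, so 16 bytes == 4 elements.)
typedef float Float4 __attribute__((vector_size(16)));

struct SimdVec4f {
  Float4 data;
};

// Same size, different alignment requirements.
static_assert(sizeof(ArrayVec4f) == sizeof(SimdVec4f), "both hold 4 floats");
static_assert(alignof(ArrayVec4f) == alignof(float), "array storage keeps alignof(T)");
static_assert(alignof(SimdVec4f) == 4 * sizeof(float), "vector storage is aligned to D * sizeof(T)");
```

If the static_asserts compile, the two storage choices have the same size but different alignment, which is why memory sized correctly for one is not automatically valid for the other.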
Tasks related to this issue are:
When UM2_ENABLE_SIMD_VEC is off, ensure that T[D] is still aligned for types which map to SIMD vectors. Use something like:
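    // True only for positive powers of 2 (0 is excluded).
    static consteval auto isPowerOf2(Int x) noexcept -> bool
    {
      return x > 0 && (x & (x - 1)) == 0;
    }

    // Alignment of Vec<D, T>: D * sizeof(T) when the type maps to a SIMD
    // vector, otherwise the natural alignment of the array.
    template <Int D, class T>
    static consteval auto vecAlignment() noexcept -> Int
    {
      if constexpr (isPowerOf2(D) && std::is_arithmetic_v<T>) {
        return static_cast<Int>(D * sizeof(T));
      } else {
        return static_cast<Int>(alignof(T[D]));
      }
    }

    template <Int D, class T>
    class Vec
    {
      using Data = typename VecData<D, T>::Data;
      alignas(vecAlignment<D, T>()) Data _data;
      ...
    };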
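Investigate the usage of new and delete in Vector and ensure that all pointers refer to properly aligned memory for over-aligned types. It should be sufficient to check that the address stored in each pointer is a multiple of alignof(T), i.e. reinterpret_cast<std::uintptr_t>(pointer) % alignof(T) == 0.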
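The two tasks above could be sketched together as follows. This is an illustration under stated assumptions, not the project's implementation: Int is assumed to be an alias for int, AlignedVec is a hypothetical simplified stand-in for Vec with array storage only, and isAligned, allocateAligned, and deallocateAligned are hypothetical helpers. constexpr is used in place of consteval so the sketch builds as C++17. The allocation helpers use the aligned overloads of operator new and operator delete (the ones taking std::align_val_t), which are standard since C++17. Overload (1) only guarantees alignment up to __STDCPP_DEFAULT_NEW_ALIGNMENT__.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <type_traits>

using Int = int; // assumption: stand-in for the project's Int alias

constexpr bool isPowerOf2(Int x) noexcept
{
  return x > 0 && (x & (x - 1)) == 0;
}

template <Int D, class T>
constexpr std::size_t vecAlignment() noexcept
{
  if constexpr (isPowerOf2(D) && std::is_arithmetic_v<T>) {
    return D * sizeof(T);
  } else {
    return alignof(T[D]);
  }
}

// Hypothetical simplified Vec: array storage with SIMD-friendly alignment.
template <Int D, class T>
struct AlignedVec {
  alignas(vecAlignment<D, T>()) T data[D];
};

static_assert(alignof(AlignedVec<4, float>) == 4 * sizeof(float), "power-of-2 D is over-aligned");
static_assert(alignof(AlignedVec<3, float>) == alignof(float), "other D keeps the array alignment");

// True if the address stored in p is a multiple of alignof(T).
template <class T>
bool isAligned(T const * p) noexcept
{
  return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Allocate raw storage for n objects of type T, explicitly requesting alignof(T).
template <class T>
T * allocateAligned(std::size_t n)
{
  return static_cast<T *>(::operator new(n * sizeof(T), std::align_val_t{alignof(T)}));
}

// Release storage obtained from allocateAligned with the matching aligned delete.
template <class T>
void deallocateAligned(T * p) noexcept
{
  ::operator delete(p, std::align_val_t{alignof(T)});
}
```

A check such as isAligned(ptr) could be placed in debug assertions wherever Vector obtains or advances a pointer. Because sizeof(T) is always a multiple of alignof(T), an aligned base pointer implies every element of the allocation is aligned as well.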
A potential add-on task:
When T is not an arithmetic type, but the underlying representation still maps to a SIMD vector, investigate using that SIMD vector as the storage. Example: Vec<2, Vec<4, double>> holds 8 doubles and can be stored as a single 512-bit vector (__m512d). When UM2_ENABLE_SIMD_VEC is off and the storage is aligned, clang18 is able to perform optimizations like this, but gcc14 is not. Testing the addition of two Vec<2, Vec<4, double>> shows a single 512-bit add for aligned array storage in clang18, but two 256-bit adds when using GCC vector extensions.
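A test like the one described could start from a function along these lines. Vec2x4d is a hypothetical flat stand-in for Vec<2, Vec<4, double>> with aligned array storage, and is not a type from the codebase. The generated code can be inspected by compiling with, for example, -O3 -mavx512f. What the compiler emits depends on its version and flags.

```cpp
#include <cstddef>

// Hypothetical stand-in for Vec<2, Vec<4, double>>: 8 doubles in one
// 64-byte-aligned array, i.e. the footprint of a single 512-bit vector.
struct alignas(64) Vec2x4d {
  double data[8];
};

// Element-wise addition written as a plain loop, leaving vectorization
// to the compiler.
Vec2x4d add(Vec2x4d const & a, Vec2x4d const & b) noexcept
{
  Vec2x4d c{};
  for (std::size_t i = 0; i < 8; ++i) {
    c.data[i] = a.data[i] + b.data[i];
  }
  return c;
}
```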