Implement/check proper treatment of over-aligned Vec and Vector data #154

KyleVaughn · 2024-05-15T04:45:09Z

The Vec<D, T> class typically stores its data in a plain array T[D], whose alignment is only alignof(T). However, when UM2_ENABLE_SIMD_VEC is on, D is a power of 2, and T is an arithmetic type, GCC vector extensions are used as the underlying storage instead. This enables very good SIMD optimizations on Vec. It also increases the alignment from alignof(T) to D * sizeof(T). See https://godbolt.org/z/or73xrxbh.

However, in Vector<T>, we allocate memory for T using overload (1) of operator new (https://en.cppreference.com/w/cpp/memory/new/operator_new), which takes no alignment argument. That overload only guarantees alignment up to __STDCPP_DEFAULT_NEW_ALIGNMENT__ (commonly 16 bytes on x86-64), so it is unclear whether memory for a type with larger alignment, such as a 32-byte vector of four doubles, will be appropriately aligned. Therefore, when using over-aligned types or GCC vector extensions, we want to verify that the memory, accesses to it, and related pointers are appropriately aligned.
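Since C++17, a new-expression such as new T[n] selects the std::align_val_t overloads automatically when alignof(T) exceeds __STDCPP_DEFAULT_NEW_ALIGNMENT__, but a direct call to overload (1) does not. A minimal sketch of explicitly requesting the alignment when obtaining raw storage; allocateAligned and deallocateAligned are hypothetical names, not existing UM2 functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical helper: raw, uninitialized storage for n objects of type T,
// aligned to alignof(T) even when that exceeds __STDCPP_DEFAULT_NEW_ALIGNMENT__.
template <class T>
auto allocateAligned(std::size_t n) -> T *
{
  void * raw = ::operator new(n * sizeof(T), static_cast<std::align_val_t>(alignof(T)));
  assert(reinterpret_cast<std::uintptr_t>(raw) % alignof(T) == 0);
  return static_cast<T *>(raw);
}

// Storage from the aligned overload must be released with the matching aligned
// overload. Any objects constructed in it must be destroyed by the caller first.
template <class T>
void deallocateAligned(T * p) noexcept
{
  ::operator delete(static_cast<void *>(p), static_cast<std::align_val_t>(alignof(T)));
}
```

As with the unaligned overload, constructing elements in the returned storage (e.g. with placement new or std::construct_at) and destroying them before deallocation remain the caller's job.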

Accessing misaligned storage is undefined behavior; in practice it can cause incorrect reads and likely segfaults (for example, when the compiler emits aligned SIMD loads such as movaps or vmovapd).

Tasks related to this issue are:

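When UM2_ENABLE_SIMD_VEC is off, ensure that T[D] still has the SIMD alignment (D * sizeof(T)) for types which map to SIMD vectors. Use something like:

```cpp
#include <type_traits>

// Int is UM2's integer type; VecData<D, T>::Data selects the storage type.
static consteval auto
isPowerOf2(Int x) noexcept -> bool
{
  // x & (x - 1) clears the lowest set bit; 0 is excluded explicitly.
  return x > 0 && (x & (x - 1)) == 0;
}

template <Int D, class T>
static consteval auto
vecAlignment() noexcept -> Int
{
  if constexpr (isPowerOf2(D) && std::is_arithmetic_v<T>) {
    // Same alignment the GCC vector extension storage would have.
    return static_cast<Int>(D * sizeof(T));
  } else {
    return static_cast<Int>(alignof(T[D]));
  }
}

template <Int D, class T>
class Vec
{
  using Data = typename VecData<D, T>::Data;
  alignas(vecAlignment<D, T>()) Data _data;
  // ...
};
```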

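Investigate usage of new and delete in Vector and ensure that all pointers refer to properly aligned memory for over-aligned types. It should be sufficient to check reinterpret_cast<std::uintptr_t>(ptr) % alignof(T) == 0. Note that std::addressof(ptr) would give the address of the pointer variable itself, not the address it points to.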
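A minimal sketch of such a check, assuming it is used in debug builds around Vector's allocation paths and a mainstream platform where converting a pointer to std::uintptr_t yields its address. isAligned and assertAligned are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if the address p points to is a multiple of alignof(T).
// The pointer value itself is converted; std::addressof(p) would instead give the
// address of the local pointer variable.
template <class T>
auto isAligned(T const * p) noexcept -> bool
{
  return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Debug-build check; the assert is compiled out when NDEBUG is defined.
template <class T>
void assertAligned(T const * p) noexcept
{
  assert(isAligned(p));
  (void)p; // avoid an unused-parameter warning in NDEBUG builds
}
```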

A potential add-on task:

When T is not an arithmetic type but its underlying representation still maps to a SIMD vector, investigate using that SIMD vector as the storage. For example, Vec<2, Vec<4, double>> holds eight doubles and could be stored as a single __m512d. When UM2_ENABLE_SIMD_VEC is off and the storage is aligned, Clang 18 is able to perform optimizations like this, but GCC 14 is not. Testing addition of two Vec<2, Vec<4, double>> shows a single 512-bit add for aligned array storage in Clang 18, but two 256-bit adds when using GCC vector extensions.


KyleVaughn · 2024-06-19T20:18:16Z

Implemented in the "format" branch, which will be merged into main in the next few days.

KyleVaughn added the priority: medium label May 15, 2024

KyleVaughn closed this as completed Jun 19, 2024
