Libraries for explicit vectorization that might be usable for the Alpaka element layer #652

bussmann · 2018-09-24T12:42:18Z

Vectorization is still an open issue (Does it belong in Alpaka at all? How do we enforce it?).
I want to use this issue to create a list of libraries that might help.
Please extend to your own liking.

Update: I have started ordering the list to my own liking in terms of usability, sustainability, etc.
My current take is that VecCore from CERN uses Vc as a backend, while xsimd is a truly independent approach. Vc also comes out of the Helmholtz community (Volker Lindenstruth) and thus has, in principle, long-term support, and there seems to be activity toward including it in the C++ standard.

Vc: https://github.com/VcDevel/Vc
xsimd: https://github.com/QuantStack/xsimd
VecCore: https://github.com/root-project/veccore

These projects seem to be mostly one-person efforts or not very active at all:
Inastemp: https://gitlab.mpcdf.mpg.de/bbramas/inastemp
boost.simd: https://github.com/NumScale/boost.simd
VCL: https://www.agner.org/optimize/#vectorclass
VCL KNC: https://github.com/mancoast/vclknc
QuickVec (a student project): https://www.andrew.cmu.edu/user/mkellogg/15-418/final.html#

ax3l · 2018-09-24T13:13:03Z

xsimd: https://github.com/QuantStack/xsimd

sbastrakov · 2018-09-24T15:44:31Z

Never tried that, but could be good (but currently not in boost): https://github.com/NumScale/boost.simd

j-stephan · 2021-01-26T13:07:13Z

We would like to get this into alpaka 0.7.0. However, this requires #38 to be resolved.

bernhardmgruber · 2021-01-26T17:23:47Z

One of the main issues with SIMD libraries and alpaka is that you want to write your kernel code using such SIMD facilities and have it emit vector code for CPU targets, while still compiling for GPUs. With existing libraries, this is not trivial.

LLAMA contains such an approach using Vc in: alpaka-group/llama#128. The key idea here is that for GPU targets, the kernel code compiles down to a scalar version and does not use the SIMD library at all, because SIMD library functions are usually not annotated with __host__ or __device__, so they cannot be referenced when we compile for CUDA or HIP.

CERN's VecCore solves exactly that by offering a vector type that can also, at compile time, be switched between a Vc vector or a scalar, thus also keeping Vc out when compiling for CUDA. So VecCore could be a potential off-the-shelf solution.

We could also hand-roll our own small SIMD wrapper that either compiles to scalar, or a loop over a vector of elements, or use a SIMD library such as Vc. But I guess this is a significant effort.
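To make the hand-rolled option a bit more concrete, here is a minimal sketch of the "loop over a vector of elements" variant. Everything in it (`Pack`, `laneCount`, `FloatV`, `axpy`) is a hypothetical name invented for illustration and not part of alpaka, LLAMA, Vc, or VecCore. It assumes that checking `__CUDA_ARCH__` / `__HIP_DEVICE_COMPILE__` is an acceptable way to fall back to a single lane in device code, and it relies on the compiler to auto-vectorize the fixed-size loops on CPU.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-width pack: plain loops over N lanes that the compiler
// may auto-vectorize. With N == 1 it degenerates to scalar code.
template <typename T, std::size_t N>
struct Pack {
    std::array<T, N> lanes{};
    static constexpr std::size_t size = N;

    static Pack broadcast(T value) {
        Pack r;
        r.lanes.fill(value);
        return r;
    }
    static Pack load(const T* p) {
        Pack r;
        for (std::size_t i = 0; i < N; ++i) r.lanes[i] = p[i];
        return r;
    }
    void store(T* p) const {
        for (std::size_t i = 0; i < N; ++i) p[i] = lanes[i];
    }
    friend Pack operator+(Pack a, const Pack& b) {
        for (std::size_t i = 0; i < N; ++i) a.lanes[i] += b.lanes[i];
        return a;
    }
    friend Pack operator*(Pack a, const Pack& b) {
        for (std::size_t i = 0; i < N; ++i) a.lanes[i] *= b.lanes[i];
        return a;
    }
};

// Assumed compile-time switch: one lane in GPU device code, several on CPU.
#if defined(__CUDA_ARCH__) || defined(__HIP_DEVICE_COMPILE__)
constexpr std::size_t laneCount = 1;
#else
constexpr std::size_t laneCount = 4; // illustrative value, not tuned
#endif
using FloatV = Pack<float, laneCount>;

// Kernel body written once against FloatV: out[i] = a * x[i] + y[i].
inline void axpy(float a, const float* x, const float* y, float* out, std::size_t n) {
    assert(n == 0 || (x != nullptr && y != nullptr && out != nullptr));
    const FloatV av = FloatV::broadcast(a);
    std::size_t i = 0;
    for (; i + FloatV::size <= n; i += FloatV::size)
        (av * FloatV::load(x + i) + FloatV::load(y + i)).store(out + i);
    for (; i < n; ++i) // scalar remainder
        out[i] = a * x[i] + y[i];
}
```

A real wrapper would also need masks, math functions, and a backend that forwards to an actual SIMD library on CPU, which is where most of the effort would go.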

As for the API design, it seems like some implementations are converging on the std::simd design, which you can find here: https://en.cppreference.com/w/cpp/experimental/simd/simd. For a detailed rationale on the design, you can read Matthias Kretz's PhD thesis.
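For readers who have not seen that design, here is a small sketch of what code against it looks like, using the Parallelism TS v2 header `<experimental/simd>`. It assumes a standard library that ships this header (for example libstdc++ from GCC 11 on); the function `scale` is a made-up example and not taken from any of the libraries discussed here.

```cpp
#include <cassert>
#include <cstddef>
#include <experimental/simd>

namespace stdx = std::experimental;

// Multiply n floats in place by factor, native_simd<float>::size() elements
// per iteration, with a scalar loop for the remainder.
inline void scale(float* data, std::size_t n, float factor) {
    assert(n == 0 || data != nullptr);
    using floatv = stdx::native_simd<float>;
    const floatv f(factor); // broadcast constructor
    std::size_t i = 0;
    for (; i + floatv::size() <= n; i += floatv::size()) {
        floatv v(data + i, stdx::element_aligned); // load from memory
        v *= f;                                    // element-wise multiply
        v.copy_to(data + i, stdx::element_aligned); // store back
    }
    for (; i < n; ++i)
        data[i] *= factor;
}
```

Note that this only covers the CPU side; as written it does not address the GPU compilation problem described above.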

Also interesting: the Kokkos SIMD library uses exactly this approach as well: https://github.com/kokkos/simd-math. Also see the tutorial slides here: https://github.com/kokkos/kokkos-tutorials/blob/main/LectureSeries/KokkosTutorial_05_SIMDStreamsTasking.pdf.

Kokkos SIMD also shows the interaction with Kokkos Views, where it seems you declare your SIMD types already in your views: Kokkos::View<Kokkos::SIMD<float>>. There are more interesting options for the SIMD ABI parameter, which I have not studied in detail yet. So we also need to consider how the SIMD types interact with memory views.

sbastrakov · 2021-01-27T12:41:44Z

Just to add to the list: https://github.com/google/highway

bernhardmgruber · 2022-11-03T18:43:51Z

Btw, I solved this recently in LLAMA. Here is the documentation: https://llama-doc.readthedocs.io/en/latest/pages/simd.html
I also presented it on my poster at ACAT22 last week: https://indico.cern.ch/event/1106990/contributions/4991311/attachments/2533306/4361386/LLAMA%20poster.pdf

fwyzard · 2022-11-05T14:44:25Z

Btw, Intel is working on proposing a SIMD library based on xvec/simd for Boost: https://lists.boost.org/Archives/boost/2022/09/253579.php.

bernhardmgruber · 2022-11-06T01:16:49Z

IIUC, this is an implementation of std::simd by Intel. It's great to see more implementations appearing! And I am especially happy they are trying to get it into Boost. That is going to be tough :)

Thanks for sharing the link!

bussmann added Type:Enhancement State:Help Wanted Type:Refactoring labels Sep 24, 2018

bussmann added this to the Future milestone Sep 24, 2018

BenjaminW3 removed this from the Future milestone May 26, 2019

j-stephan mentioned this issue Dec 17, 2020

Feature wish list #1232

Open

j-stephan added this to To do in Release 0.7 via automation Jan 26, 2021

j-stephan added this to the Version 0.7.0 milestone Jan 26, 2021

j-stephan added this to To do in Release 0.8 via automation May 11, 2021

j-stephan removed this from To do in Release 0.7 May 11, 2021

j-stephan modified the milestones: Version 0.7.0, Version 0.8.0 May 11, 2021

j-stephan modified the milestones: Version 0.8.0, Version 0.9.0 (I/2022) Nov 10, 2021

j-stephan removed this from To do in Release 0.8 Nov 10, 2021

j-stephan added this to To do in Release 0.9 via automation Nov 10, 2021

j-stephan removed this from To do in Release 0.9 Mar 29, 2022

j-stephan added this to To do in Release 1.0 via automation Mar 29, 2022

j-stephan removed this from the Version 0.9.0 (I/2022) milestone Mar 29, 2022

bernhardmgruber removed this from To do in Release 1.0 Dec 9, 2022
