Calculate the Sum of Absolute Differences (SAD) with AVX-512.
Your compiler should support C++17, and your CPU should support AVX2 and AVX-512.
Before using this library, you should be aware of the following.
1.
The code is optimized for the following environment.
Motherboard: ASUS TUF X299 MARK 1
CPU: Intel(R) Core(TM) i7-7820X
Memory: 16 GiB DDR4 2400 MHz * 2
OS: Linux 4.17.8-1-ARCH #1 SMP PREEMPT
Compiler information:
clang version 6.0.1 (tags/RELEASE_601/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
It may not perform as well on your machine. Modify the code if necessary.
Use hyper-threading with care. When it is enabled, latency and throughput both increase. (Low latency and high throughput are desirable, so this is a trade-off.)
2.
Because of the first point, I have included many alternative approaches in intel_sad.cpp.
3.
Write a program to test correctness. Compilers may optimize incorrectly.
I provide checking code in test. Compile it with: clang++ -march=native -std=c++17 ...
4.
Latency and throughput are good metrics, although they sometimes cannot perfectly predict execution time.
5.
Although the parameters are __m128i, __m256i and __m512i, you can use reinterpret_cast to view integral-type data as the destination type.
However, before you use reinterpret_cast, check that the data is aligned for the destination type.
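
Points 3 and 5 above can be sketched in plain C++17 without any intrinsics. The names sad_reference, is_aligned and example_expected_sad below are hypothetical helpers written for illustration only; they are not part of this library or of the check code in test. The sketch assumes 8-bit unsigned samples and 64-byte alignment (the size of __m512i), and shows a scalar reference result that a vectorized SAD could be compared against.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: scalar reference SAD over n unsigned 8-bit samples.
// Slow but simple, so it can serve as ground truth in a correctness test.
std::uint64_t sad_reference(const std::uint8_t* a, const std::uint8_t* b,
                            std::size_t n) {
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // Operands are promoted to int, so the subtraction cannot wrap.
        const int diff = a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
        sum += static_cast<std::uint64_t>(diff);
    }
    return sum;
}

// Hypothetical helper: true if p is a multiple of `alignment` bytes.
// Use 16, 32 or 64 for __m128i, __m256i or __m512i respectively.
bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Example: build two 64-byte-aligned buffers, confirm the alignment that a
// reinterpret_cast to a 512-bit vector pointer would need, and return the
// scalar SAD that the vectorized result should equal.
std::uint64_t example_expected_sad() {
    alignas(64) std::uint8_t a[64];
    alignas(64) std::uint8_t b[64];
    for (int i = 0; i < 64; ++i) {
        a[i] = static_cast<std::uint8_t>(i);
        b[i] = static_cast<std::uint8_t>(63 - i);
    }
    assert(is_aligned(a, 64) && is_aligned(b, 64));
    return sad_reference(a, b, 64);
}
```

In a real test, the value returned by the scalar reference would be compared with the output of the AVX-512 routine on the same buffers, ideally over many random inputs.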