Use vector instructions for union2by2 #288

lemire · 2020-10-29T23:47:47Z

We have ARM assembly for union2by2 as per #287

It should be said that there is a vectorized algorithm for union routines (hint: ARM NEON). It appears in the C code (for SSE only, but the approach is portable).

https://github.com/RoaringBitmap/CRoaring/blob/master/src/array_util.c#L1569-L1647

It is described in section 4.3 of the following paper:

Roaring Bitmaps: Implementation of an Optimized Software Library, Software: Practice and Experience 48 (4), 2018

It is actually not difficult to implement. For someone who is either fluent in ARM NEON or wants to learn it, this is a great project.
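For orientation, here is a minimal scalar sketch of what a union over two sorted uint16 arrays computes, which is the result a vectorized version has to reproduce. The name `unionSorted` and its slice-returning signature are hypothetical and chosen for illustration; this is not the library's union2by2.

```go
package main

import "fmt"

// unionSorted is a hypothetical scalar reference, not the library's union2by2.
// It merges two sorted, duplicate-free uint16 slices into one sorted,
// duplicate-free slice.
func unionSorted(a, b []uint16) []uint16 {
	out := make([]uint16, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			out = append(out, a[i])
			i++
		case a[i] > b[j]:
			out = append(out, b[j])
			j++
		default: // equal values: emit once and advance both inputs
			out = append(out, a[i])
			i++
			j++
		}
	}
	// At most one of the two tails is non-empty.
	out = append(out, a[i:]...)
	out = append(out, b[j:]...)
	return out
}

func main() {
	fmt.Println(unionSorted([]uint16{1, 3, 5, 7}, []uint16{2, 3, 8})) // prints [1 2 3 5 7 8]
}
```

The data-dependent branch in the loop is what a vectorized version would try to avoid by processing several values per step.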

jacksonrnewhouse · 2020-10-30T19:42:56Z

https://github.com/DLTcollab/sse2neon seems like a valuable reference

lemire · 2020-10-30T20:21:28Z

If someone is interested, I am fluent in the various SIMD dialects. Finding the intrinsics and the instructions is not so difficult. There are only a few of them to find anyhow.

The hard part is to put it all together.

jacksonrnewhouse · 2020-11-01T22:12:34Z

Unfortunately the Go assembler doesn't support UMIN and UMAX yet. I opened an issue for it, but for now I will probably have to work out the bitwise encoding of the instructions by hand.
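To illustrate why lane-wise unsigned min and max matter here, the following is a scalar Go emulation of a min/max-and-rotate merge of two sorted 8-lane vectors, in the spirit of the sse_merge routine mentioned in this thread. It is a sketch under assumptions: the names `vec`, `minMax`, `rotate` and `mergeSorted` are hypothetical, and the code models the vector operations with plain loops rather than reproducing the C or assembly source.

```go
package main

import "fmt"

const lanes = 8

// vec models one 128-bit register holding 8 unsigned 16-bit lanes.
type vec [lanes]uint16

// minMax models lane-wise unsigned min and max
// (UMIN/UMAX on NEON, _mm_min_epu16/_mm_max_epu16 on SSE4.1).
func minMax(x, y vec) (lo, hi vec) {
	for i := range x {
		lo[i], hi[i] = x[i], y[i]
		if y[i] < x[i] {
			lo[i], hi[i] = y[i], x[i]
		}
	}
	return lo, hi
}

// rotate moves every lane down by one position; lane 0 wraps to the top.
// This models EXT on NEON or _mm_alignr_epi8(v, v, 2) on SSE.
func rotate(v vec) (r vec) {
	for i := range v {
		r[i] = v[(i+1)%lanes]
	}
	return r
}

// mergeSorted (hypothetical name) takes two sorted vectors and returns the
// 8 smallest values in lo and the 8 largest in hi, each in sorted order.
func mergeSorted(a, b vec) (lo, hi vec) {
	lo, hi = minMax(a, b)
	for step := 0; step < lanes-1; step++ {
		lo, hi = minMax(rotate(lo), hi)
	}
	// One final rotation completes the full cycle of 8 lane shifts.
	return rotate(lo), hi
}

func main() {
	a := vec{1, 3, 5, 7, 9, 11, 13, 15}
	b := vec{2, 4, 6, 8, 10, 12, 14, 16}
	lo, hi := mergeSorted(a, b)
	fmt.Println(lo, hi)
}
```

A union built on such a merge would still need a step that drops duplicates before storing the output; that part is not shown here.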

jacksonrnewhouse · 2020-11-09T01:08:56Z

@lemire, the intersection algorithms use _mm_cmpestrm to compare 2 registers of 8 shorts each, and I believe there is no equivalent instruction on arm64. My thought is to do something like sse_merge: run CMEQ between the registers, rotate one of them with EXT and repeat for all 8 alignments, then pop_count the result and shift right by 4 bits (each equal lane sets 16 bits). Do you have suggestions for other approaches?

lemire · 2020-11-09T01:13:35Z

Though cmpestrm looks really good, it is an expensive instruction with many cycles of latency, so the approach you suggest is more competitive than it appears.
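The compare-and-rotate idea discussed above can be sketched in scalar Go as follows. This only models the vector operations with loops; the names `vec`, `cmpEq`, `rotate`, `matchMask` and `countMatches` are hypothetical, and it assumes each block holds distinct values, as blocks of a sorted set would.

```go
package main

import (
	"fmt"
	"math/bits"
)

const lanes = 8

// vec models one 128-bit register holding 8 unsigned 16-bit lanes.
type vec [lanes]uint16

// cmpEq models a lane-wise equality compare (CMEQ on NEON):
// an equal lane becomes 0xFFFF, any other lane becomes 0.
func cmpEq(x, y vec) (m vec) {
	for i := range x {
		if x[i] == y[i] {
			m[i] = 0xFFFF
		}
	}
	return m
}

// rotate moves every lane down by one position; lane 0 wraps to the top
// (EXT on NEON with the same register as both sources).
func rotate(v vec) (r vec) {
	for i := range v {
		r[i] = v[(i+1)%lanes]
	}
	return r
}

// matchMask compares every lane of a against every lane of b by testing
// all 8 rotations of b and OR-ing the resulting masks together.
func matchMask(a, b vec) (mask vec) {
	for r := 0; r < lanes; r++ {
		eq := cmpEq(a, b)
		for i := range mask {
			mask[i] |= eq[i]
		}
		b = rotate(b)
	}
	return mask
}

// countMatches popcounts the whole mask and shifts right by 4,
// because each matching lane contributes 16 set bits.
func countMatches(mask vec) int {
	total := 0
	for _, m := range mask {
		total += bits.OnesCount16(m)
	}
	return total >> 4
}

func main() {
	a := vec{1, 3, 5, 7, 9, 11, 13, 15}
	b := vec{2, 3, 4, 7, 8, 15, 16, 17}
	mask := matchMask(a, b)
	fmt.Println(countMatches(mask)) // prints 3 (the common values are 3, 7 and 15)
}
```

The mask marks which lanes of the first block belong to the intersection, so a real implementation would also use it to compact those lanes into the output; only the counting step is shown here.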

lemire added the help wanted and performance labels Oct 29, 2020

This was referenced Oct 30, 2020

Union2by2 arm64 assembly #287

Merged

Vectorized union #289

Closed

jacksonrnewhouse mentioned this issue Nov 1, 2020

cmd/asm: ARM64 NEON unsigned min and max instructions (VUMAX, VUMIN) golang/go#42326

Closed
