NEON64: enc: convert full encoding loop to inline assembly #98

aklomp · 2022-07-20T14:53:02Z

Convert the full encoding loop to an inline assembly implementation for compilers that can use inline assembly.

The motivation for this change is issue #96: when optimization is turned off on recent versions of clang, the encoding table is sometimes not loaded into sequential registers. This happens despite taking pains to ensure that the compiler uses an explicit set of registers for the load (v8-v11).

This leaves us with few options besides rewriting the full encoding loop in inline assembly. Only that way can we be absolutely certain that the correct registers are used. Thankfully, aarch64 assembly is not very difficult to write by hand.
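To illustrate why the register choice matters: the four-register form of the aarch64 `TBL` instruction only accepts consecutively numbered table registers. The sketch below is not the code from this change. It uses a hypothetical helper, `lookup16`, that keeps the table load and the lookup inside a single inline assembly block, so v8-v11 are named directly and the compiler does not allocate them. The portable fallback branch is an assumption added so the example also builds on other targets.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper for illustration only, not part of the library.
// Translates 16 indices into table entries. Indices are expected to be
// in the range 0..63; out-of-range indices produce 0 on both paths.
static void
lookup16 (const uint8_t *table, const uint8_t *idx, uint8_t *out)
{
#if defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ volatile (
        // Load the 64-byte table into v8-v11. Because the registers are
        // spelled out here, they are sequential by construction.
        "ld1 {v8.16b, v9.16b, v10.16b, v11.16b}, [%[tbl]]         \n\t"

        // Load 16 indices into v12.
        "ld1 {v12.16b}, [%[idx]]                                  \n\t"

        // Four-register table lookup; requires consecutive registers.
        "tbl v12.16b, {v8.16b, v9.16b, v10.16b, v11.16b}, v12.16b \n\t"

        // Store the 16 translated bytes.
        "st1 {v12.16b}, [%[out]]                                  \n\t"

        : // no register outputs, results are written through memory
        : [tbl] "r" (table), [idx] "r" (idx), [out] "r" (out)
        // Tell the compiler which registers and memory are touched.
        : "v8", "v9", "v10", "v11", "v12", "memory"
    );
#else
    // Portable fallback (assumption, for non-aarch64 builds): same result
    // as TBL, which yields 0 for any index outside the table.
    for (size_t i = 0; i < 16; i++) {
        out[i] = (idx[i] < 64) ? table[idx[i]] : 0;
    }
#endif
}
```

The key design point in this sketch is that the load and its consumer share one assembly block. Splitting them across separate statements would again leave the register assignment between them to the compiler.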

Fixes #96.
Closes #97.

aklomp self-assigned this Jul 20, 2022

aklomp added the enhancement label Jul 20, 2022

aklomp mentioned this issue Jul 20, 2022

clang build fails with inline ASM on NEON64 (Apple M1) #96

Closed

aklomp closed this as completed in dd7a2b5 Jul 28, 2022
