Enhanced Load Masking for Prefixes and Suffixes #29

ashvardanian · 2023-10-17T05:06:36Z

SimSIMD relies mostly on unaligned loads. Where AVX-512 is available, masked loads handle the tail elements and avoid a sequential scalar loop. A faster, though more advanced, scheme is worth exploring. If we assume that whenever any byte of a 64-byte cache line belongs to a vector, the entire cache line is accessible, we can switch to aligned loads exclusively. Every load then fetches a complete cache line. Some work is wasted on the bytes before the first element and after the last one, which have to be masked out, but unaligned loads are avoided entirely.
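As a rough illustration of the prefix and suffix masking, the sketch below walks a byte range one aligned 64-byte line at a time and derives a 64-bit mask per line, with bit `i` set when byte `i` of that line belongs to the range. The names `line_mask` and `masked_sum_u8` are hypothetical and not part of SimSIMD. The inner scalar loop only stands in for a masked AVX-512 operation, where the same value could serve as a `__mmask64`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 64u /* assumed cache-line size, as in the text above */

/* Hypothetical helper: bit i is set when byte (line + i) lies in [begin, end).
 * Precondition: the line overlaps or touches the range from below. */
static uint64_t line_mask(uintptr_t line, uintptr_t begin, uintptr_t end) {
    assert(line < end && begin < line + LINE_BYTES);
    /* Bytes to skip at the front of the line (the prefix). */
    uintptr_t first = begin > line ? begin - line : 0;
    /* One past the last useful byte of the line (the suffix cut-off). */
    uintptr_t last = end - line < LINE_BYTES ? end - line : LINE_BYTES;
    return (~UINT64_C(0) << first) & (~UINT64_C(0) >> (LINE_BYTES - last));
}

/* Hypothetical scalar stand-in for a masked vector kernel: sums the bytes of
 * [data, data + length) while iterating over aligned lines only. It touches
 * only bytes selected by the mask, so it never leaves the range. */
static uint64_t masked_sum_u8(uint8_t const *data, size_t length) {
    uintptr_t begin = (uintptr_t)data;
    uintptr_t end = begin + length;
    uint64_t sum = 0;
    /* Start at the aligned line that contains the first byte. */
    uintptr_t line = begin & ~(uintptr_t)(LINE_BYTES - 1);
    for (; line < end; line += LINE_BYTES) {
        uint64_t mask = line_mask(line, begin, end);
        uint8_t const *bytes = (uint8_t const *)line;
        for (unsigned i = 0; i < LINE_BYTES; ++i)
            if ((mask >> i) & 1) sum += bytes[i];
    }
    return sum;
}
```

For a 195-byte range that starts 5 bytes past a line boundary, the first line gets a mask with the low 5 bits cleared, the interior lines get all ones, and the last line keeps only its low 8 bits.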

To avoid false reports from AddressSanitizer and ThreadSanitizer, it's advised to mark the affected functions with the following attributes: __attribute__((no_sanitize_address)) and __attribute__((no_sanitize_thread)).
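One possible way to package those attributes, assuming GCC or Clang, is a macro that expands to nothing on other compilers. `NO_SANITIZE` and `count_in_line` are hypothetical names chosen for illustration, not existing SimSIMD symbols. Reading past the end of an object remains undefined behavior in standard C, so this sketch leans on the cache-line assumption stated above.

```c
#include <stdint.h>

/* Hypothetical convenience macro: disables ASan and TSan instrumentation
 * for a single function on GCC-compatible compilers. */
#if defined(__GNUC__) || defined(__clang__)
#define NO_SANITIZE \
    __attribute__((no_sanitize_address)) __attribute__((no_sanitize_thread))
#else
#define NO_SANITIZE
#endif

/* Hypothetical example: counts occurrences of `needle` in the whole aligned
 * 64-byte line that contains `pointer`, including bytes that may lie outside
 * the caller's buffer. */
static NO_SANITIZE unsigned count_in_line(void const *pointer,
                                          unsigned char needle) {
    /* Round the address down to the start of its 64-byte line. */
    uintptr_t line_start = (uintptr_t)pointer & ~(uintptr_t)63;
    unsigned char const *line = (unsigned char const *)line_start;
    unsigned count = 0;
    for (unsigned i = 0; i < 64; ++i) count += line[i] == needle;
    return count;
}
```

The attributes apply per function, so only the kernels that deliberately read whole cache lines need the annotation, and the rest of the code stays instrumented.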

ashvardanian mentioned this issue Oct 17, 2023

Enhanced Load Masking ashvardanian/StringZilla#55

Closed

ashvardanian closed this as completed Jan 24, 2024
