
remove unaligned loads and stores on x86 #22

Closed
arvidn wants to merge 1 commit from the no-unaligned-stores branch

Conversation

arvidn (Contributor) commented Nov 4, 2017

This invokes undefined behavior regardless of the compiler back-end. More details in #21.
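The problematic pattern, and the well-defined alternative discussed below, can be sketched roughly like this (a minimal illustration with made-up function names, not the actual Boost.Endian source):

```cpp
#include <cstdint>
#include <cstring>

// The pattern at issue: casting a (possibly misaligned) byte pointer to a
// wider integer type and dereferencing it. This is undefined behavior even
// on x86, where the hardware happens to tolerate misaligned access.
std::uint32_t load_cast(const unsigned char* p)
{
    return *reinterpret_cast<const std::uint32_t*>(p); // UB if p is misaligned
}

// Well-defined alternative: copy the bytes with memcpy. Mainstream compilers
// lower a fixed-size 4- or 8-byte memcpy to a single load on x86.
std::uint32_t load_memcpy(const unsigned char* p)
{
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}
```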

pdimov (Member) commented Dec 20, 2017

Instead of removing the #ifdefs altogether, why not use memcpy there?

arvidn (Contributor, Author) commented Dec 20, 2017

You mean as an optimisation of unrolled_byte_loops when no endian conversion is necessary?
I think such optimisation is orthogonal to the current special-case, since it should:

  1. be applied based on the host endianness (not specific architecture)
  2. be applied for big-endian types on big endian hosts
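Those two conditions might be sketched like this (a hypothetical illustration using a runtime endianness probe for portability; the actual library would use compile-time endian macros, and none of these names are proposed patch code):

```cpp
#include <cstdint>
#include <cstring>

// Assumed helper: detect host byte order at runtime for this sketch.
inline bool is_host_big_endian()
{
    const std::uint16_t probe = 0x0102;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 0x01;
}

// Load a big-endian 16-bit value. The fast memcpy path fires whenever the
// stored order matches the host order (condition 2), and the choice is made
// from endianness, not from a specific architecture (condition 1).
inline std::uint16_t load_big_u16(const unsigned char* p)
{
    if (is_host_big_endian())
    {
        std::uint16_t v;
        std::memcpy(&v, p, sizeof v); // orders match: plain byte copy
        return v;
    }
    // Orders differ: assemble the value byte by byte.
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}
```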

pdimov (Member) commented Dec 20, 2017

I mean as a replacement for the current casts, as this would minimize the patch both in lines and in spirit.

pdimov (Member) commented Dec 21, 2017

  1. be applied based on the host endianness (not specific architecture)

It's not clear whether it will be an optimization. It's possible for the unrolled_byte_loops to be more efficient if memcpy is not an intrinsic but compiles to a function call. We'll have to check on godbolt.org.

For x86 I think that all major compilers recognize memcpy on 4/8 bytes.

pdimov (Member) commented Dec 21, 2017

Here's a testbed: https://godbolt.org/g/MfP3Hc

pdimov (Member) commented Dec 21, 2017

At least for MSVC, removing the special case notably degrades the code.

arvidn (Contributor, Author) commented Dec 21, 2017

I can put in memcpy there instead. It's tempting to depend on endian-macros instead of architecture macros though, and to use memcpy for the big endian case (on big endian machines).

It seems like a pretty safe assumption that, in general, memcpy is at least as efficient as a hand-written copy loop; it's likely an easier optimisation for the compiler to make.

pdimov (Member) commented Dec 21, 2017

As tested on godbolt, all x86 compilers (a) benefit significantly from the special case and (b) generate the same code with the current one and with memcpy.

ARM however generates a function call to memcpy so I'm not sure that the special case should be applied there.

Looks like the safest choice is to keep the current #ifdefs and just replace the cast inside with memcpy.
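The end state could then look something like this sketch (hypothetical names and #ifdef conditions; the real ones live in Boost.Endian's headers):

```cpp
#include <cstdint>
#include <cstring>

// Hedged sketch of the agreed shape: keep the architecture special case,
// but express the load through memcpy so no misaligned pointer is ever
// dereferenced.
inline std::uint32_t load_little_u32(const unsigned char* p)
{
#if defined(__i386__) || defined(__x86_64__) || defined(_M_IX86) || defined(_M_X64)
    // Fast path: x86 compilers turn this fixed-size memcpy into a single mov.
    // NOTE: assumes a little-endian host on these architectures.
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
#else
    // Portable path: assemble the little-endian value byte by byte.
    return static_cast<std::uint32_t>(p[0])
         | static_cast<std::uint32_t>(p[1]) << 8
         | static_cast<std::uint32_t>(p[2]) << 16
         | static_cast<std::uint32_t>(p[3]) << 24;
#endif
}
```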

@arvidn arvidn force-pushed the no-unaligned-stores branch 2 times, most recently from 42026ab to 791ffad Compare December 21, 2017 22:39
pdimov (Member) commented Dec 23, 2017

Test added in 62802fe. Fixed in e93f6a2.

@pdimov pdimov closed this Dec 23, 2017
@arvidn arvidn deleted the no-unaligned-stores branch December 23, 2017 20:24