Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix valgrind: Operate only in little endian #5

Merged
merged 1 commit into from Sep 13, 2017
Merged

Fix valgrind: Operate only in little endian #5

merged 1 commit into from Sep 13, 2017

Conversation

rsaxvc
Copy link
Contributor

@rsaxvc rsaxvc commented Apr 12, 2017

Previously, memcmp() switched the processor into
big-endian mode so that vectorized compares could
be done more efficiently. However, this makes
emulation quite difficult, and breaks valgrind.

Usually I would fix valgrind, but this was the
only place I could find setend being used.

This patch removes 2 SETEND instructions,
and adds 8 REV instructions if there is a
difference detected. Both SETEND and REV
should take 1 cycle on ARM11.

Previously, memcmp() switched the processor into
big-endian mode so that vectorized compares could
be done more efficiently. However, this makes
emulation quite difficult, and breaks valgrind.

Usually I would fix valgrind, but this was the
only place I could find setend being used.

This patch removes 2 SETEND instructions,
and adds 8 REV instructions if there is a
difference detected. Both SETEND and REV
should take 1 cycle on ARM11.
@rsaxvc rsaxvc changed the title Operate only in little endian Fix valgrind: Operate only in little endian Jun 3, 2017
@bavison
Copy link
Owner

bavison commented Sep 13, 2017

Thanks for this patch, though I had to think a bit to convince myself that it was valid to re-do all four CMPs again at the end in all cases (and I've updated the test harness to be doubly sure).

As you say, it should save 2 cycles on ARM11 in the no-difference-found case, and gain 10 when there is a difference, but that should be neither here nor there in the grand scheme of things. I've benchmarked it on ARM11, Cortex-A7 and Cortex-A53 and there's no statistically significant timing difference that I can detect (so no unexpected interactions with caches etc).

I'm aware that the SETEND trick I used also caused grief for people wanting to run the binaries under qemu, so they should appreciate this patch too.

@bavison bavison merged commit d0d77b9 into bavison:master Sep 13, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants