Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests: Add alignment test #14

Merged
merged 1 commit into from
Aug 14, 2018
Merged

tests: Add alignment test #14

merged 1 commit into from
Aug 14, 2018

Conversation

dajohi
Copy link
Contributor

@dajohi dajohi commented Aug 14, 2018

No description provided.

@dajohi
Copy link
Contributor Author

dajohi commented Aug 14, 2018

This test fails on ARM currently

@dchest dchest merged commit 749fcbe into dchest:master Aug 14, 2018
@dchest
Copy link
Owner

dchest commented Aug 14, 2018

Thanks! Do you know how to fix it or should I look into it?

@dajohi
Copy link
Contributor Author

dajohi commented Aug 14, 2018

Apparently it works with the arm asm removed, and built with generic.

@dchest
Copy link
Owner

dchest commented Aug 14, 2018

Definitely. I'll try to figure out the problem, and if not, will just disable ARM asm until there's a solution.

@dajohi
Copy link
Contributor Author

dajohi commented Aug 14, 2018

Perhaps @rakitzis or @dgryski has an idea?

@davecgh
Copy link

davecgh commented Aug 14, 2018

We didn't dig into precisely where, but the problem is the hashloopunaligned branch of the arm assembler optimizations.

@davecgh
Copy link

davecgh commented Aug 14, 2018

Also, for another data point, if you modify that test to use d instead of d[1:] in the call to Hash and update data to remove the leading 00 (aka make it aligned), the test will pass.

@dchest
Copy link
Owner

dchest commented Aug 14, 2018

OK, thanks. I've verified that it indeed failed the test (on Raspberry Pi 3) and removed ARM assembly until someone sends a PR with a fixed version. (I've created https://github.com/dchest/siphash/tree/arm-asm branch that points to the last commit before removal and issue #15).

@dchest dchest mentioned this pull request Aug 14, 2018
@rakitzis
Copy link
Contributor

I'll take a look. Having a repro will definitely help unearth the issue.

@dchest
Copy link
Owner

dchest commented Aug 14, 2018

@rakitzis thank you!

@Kunde21
Copy link

Kunde21 commented Aug 14, 2018

@rakitzis Only have time for a quick readthrough, but I don't see an alignment check that jumps to hashloop128unaligned

@davecgh
Copy link

davecgh commented Aug 14, 2018

It's here:

	AND.S	$3,R10,R8
	BNE hashloopunaligned

In Go code, that is effectively:

if val&0x03 != 0 {
	hashloopunaligned()
}

EDIT: Oh you were talking about hashloop128unaligned. I missed the 128 when I read your comment.

@rakitzis
Copy link
Contributor

I'm sure the problem is more subtle but I need to sit with the code when I can stop thinking about other things for a little while... it's tricky to get into the right mindset!

@davecgh
Copy link

davecgh commented Aug 14, 2018

To clarify, I didn't mean to imply the root of the issue is there. I was responding to @Kunde21 about the check that jumps to the branch (although for the non 128-bit version). That definitely is not the root cause of the issue for 32-bit and 64-bit.

@Kunde21
Copy link

Kunde21 commented Aug 14, 2018

@davecgh That's present in Hash, but missing from Hash128.

@rakitzis
Copy link
Contributor

OK so that might be it, but it's best to sit with the test cases and make sure there is 100% coverage. Unaligned is considerably trickier to get right.

@rakitzis
Copy link
Contributor

I found the bug: MOVB vs. MOVBU. Byte loads on ARM are sign-extending.

Here is a patch:

diff --git a/blocks_arm.s b/blocks_arm.s
index 42dcd23..997eee4 100644
--- a/blocks_arm.s
+++ b/blocks_arm.s
@@ -115,19 +115,19 @@ blocksloop:
        MOVW    sav-8(SP),R10
        RET
 blocksunaligned:
-       MOVB    (R10),R12
-       MOVB    1(R10),R11
+       MOVBU   (R10),R12
+       MOVBU   1(R10),R11
        ORR     R11<<8,R12,R12
-       MOVB    2(R10),R11
+       MOVBU   2(R10),R11
        ORR     R11<<16,R12,R12
-       MOVB    3(R10),R11
+       MOVBU   3(R10),R11
        ORR     R11<<24,R12,R12
-       MOVB    4(R10),R14
-       MOVB    5(R10),R11
+       MOVBU   4(R10),R14
+       MOVBU   5(R10),R11
        ORR     R11<<8,R14,R14
-       MOVB    6(R10),R11
+       MOVBU   6(R10),R11
        ORR     R11<<16,R14,R14
-       MOVB    7(R10),R11
+       MOVBU   7(R10),R11
        ORR     R11<<24,R14,R14
        ADD     $8,R10,R10
        EOR     R12,R6,R6
diff --git a/hash_arm.s b/hash_arm.s
index ddad8f8..44c670d 100644
--- a/hash_arm.s
+++ b/hash_arm.s
@@ -96,19 +96,19 @@ hashloopunaligned:
        SUB     R10,R11,R11
        SUB.S   $8,R11
        BLO     hashend
-       MOVB    (R10),R12
-       MOVB    1(R10),R11
+       MOVBU   (R10),R12
+       MOVBU   1(R10),R11
        ORR     R11<<8,R12,R12
-       MOVB    2(R10),R11
+       MOVBU   2(R10),R11
        ORR     R11<<16,R12,R12
-       MOVB    3(R10),R11
+       MOVBU   3(R10),R11
        ORR     R11<<24,R12,R12
-       MOVB    4(R10),R14
-       MOVB    5(R10),R11
+       MOVBU   4(R10),R14
+       MOVBU   5(R10),R11
        ORR     R11<<8,R14,R14
-       MOVB    6(R10),R11
+       MOVBU   6(R10),R11
        ORR     R11<<16,R14,R14
-       MOVB    7(R10),R11
+       MOVBU   7(R10),R11
        ORR     R11<<24,R14,R14
        ADD     $8,R10,R10
        EOR     R12,R6,R6

@rakitzis
Copy link
Contributor

In truth I would like to make sure Hash128 works at all -- the lack of the branch to the unaligned code seems suspicious to me. On the other hand I have found that unaligned loads work on the ARM v7 CPU I have access to. Can someone with deep understanding of the ARM family offer an opinion as to whether special treatment of unaligned loads is even necessary here? Thanks.

@rakitzis
Copy link
Contributor

Incidentally the broken code passed the initial tests because the source bytes are in the range 0-63, no sign extension was ever exercised.

@rakitzis
Copy link
Contributor

Answering my own question: it seems armv5 (supported by Go) requires unaligned accesses to be synthesized in software.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants