Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Errors in CRC-32C accleration on Z #16560

Closed
Spencer-Comin opened this issue Jan 16, 2023 · 1 comment · Fixed by #16579
Closed

Errors in CRC-32C accleration on Z #16560

Spencer-Comin opened this issue Jan 16, 2023 · 1 comment · Fixed by #16579

Comments

@Spencer-Comin
Copy link
Contributor

While benchmarking (see benchmark below) the changes at #16379, I found a ~2.5% error rate in my 15B testcase. Out of 200 tests, I got 3 NullPointerExceptions and 2 segfaults. Looking into the coredumps produced @r30shah and I were able to isolate the cause of the segfaults (both segfaults failed identically). I've annotated the disassembly to illustrate the problem:

l         %r10, 8(%r1)                  ; Incoming checksum value is loaded into %r10
cgite     %r8, 0
sllg      %r2, %r8, 3 
lgr       %r3, %r2
xr        %r4, %r4
l         %r0, 4(%r2) 
llgfr     %r4, %r4
llgfr     %r0, %r0
la        %r8, 0x10(%r4, %r3)
sgrk      %r9, %r0, %r4
clgijl    %r9, 0x10, 0x3ff64b91c00      ; (1) Jump to OOL call to original Java impl if less than 16B of data
larl      %r14, 0x3ff64b91d80
vlm       %v9, %v14, 0x80(%r14)
vzero     %v0
vlvgf     %v0, %r10, 3
stg       %r10, 0xb0(%r5)               ; %r10 is spilled here
lg        %r10, 0xa8(%r5)               ; If jump at (1) is taken, this spilled is missed, and an incorrect value is left in %r10 

...                                     ; rest of CRC-32C acceleration impl

clgijlh   %r9, 0, 0x3ff64b91c00         ; Jump to OOL call to original Java impl if remaining data  
st        %r8, 8(%r1)  <<< ^+3678       ; OOL returns to here
l         %r8, 0x20c(%r10)              ; Segfault occurs dereferencing %r10 here

It looks like it should be a matter of fixing some register dependencies and internal control flow to prevent the problematic spill from happening where it does. Hopefully resolving the segfaults will also resolve the NullPointerExceptions as well.

Here is the benchmark I used:

import org.openjdk.jmh.annotations.*;

import java.util.zip.CRC32C;
import java.util.Random;

@Fork(value = 1)
@Warmup(iterations = 3)
@Measurement(iterations = 10)
public class CRC32CBenchmark {
    @State(Scope.Thread)
    public static class TestValues {
        public CRC32C crc;
        public byte[] array;

        @Param({"15", "48", "4096", "16384"})
        public int arraySize;

        @Setup(Level.Trial)
        public void setup() {
            crc = new CRC32C();
            Random rng = new Random(0xdeadbeef);

            array = new byte[arraySize];
            rng.nextBytes(array);
        }
    }

    @Benchmark
    public void updateCRC32C(TestValues tv) {
        tv.crc.update(tv.array);
    }
}
@Spencer-Comin
Copy link
Contributor Author

Here is the full jitdump from the segfault corresponding to the disassembly above: jitdump.20230113.134230.57555.0004.txt

Spencer-Comin added a commit to Spencer-Comin/openj9 that referenced this issue Jan 19, 2023
The first branch to OOL of the acceleration implementation and the
merge point from OOL were not included in the ICF, and the correct
dependencies were not attached to the merge point. This opened up the
possibility of register spills being inserted that would be missed if
the first branch to OOL is taken. This patch resolves this issue by
wrapping the acceleration implementation from the first branch to OOL
to the end in an outer ICF with the appropriate dependencies attached
at the end.

Fixes: eclipse-openj9#16560
Signed-off-by: Spencer Comin <spencer.comin@ibm.com>
Spencer-Comin added a commit to Spencer-Comin/openj9 that referenced this issue Jan 30, 2023
The first branch to OOL of the acceleration implementation and the
merge point from OOL were not included in the ICF, and the correct
dependencies were not attached to the merge point. This opened up the
possibility of register spills being inserted that would be missed if
the first branch to OOL is taken. This patch resolves this issue by
extending the ICF from the first branch to OOL to the merge point and
moving all the dependencies to the merge point.

Fixes: eclipse-openj9#16560
Signed-off-by: Spencer Comin <spencer.comin@ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant