-
Notifications
You must be signed in to change notification settings - Fork 714
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Panama MemoryAccessAPI: Add jit intrinsic support for vectorizedMismatch #15204
Comments
@gita-omr Please take a look at this |
It is fairly similar to arrayCopy, except not moving memory content but comparing them. Did it require:
If only true/false return is required, we can short-circuit it by comparing their length(s), cannot we? From JIT perspective, i think it can come through like arrayCopy. |
hmm ... there is even existing arrayCmp implementations already. This API has almost identical semantic as arrayCmp, from looking at the source code @r30shah posted in the slack channel. This can be mapped to arrayCmp somehow before coming to codegen. Pretty good performance. |
On Z, I can see that we do generate a version of arrayCmp which returns the index of first unequal element [1]. For that it seems to checking for a node flag [2] which looking at the code base is set from IdiomTransformation [3]. Only difference for this API and such arrayCmp is that the API requires us to return number of remaining pairs of elements needed to compare in the tails of two arrays. [1]. https://github.com/eclipse/omr/blob/cf8ddbd1adcd87e388cad6be1fd0e0c085285a29/compiler/z/codegen/OMRTreeEvaluator.cpp#L11567-L11609
|
Perf comparisons for With hotspot the SegmentMismatchAccessor takes at most ~700 ms whereas on openj9 can take a few seconds when passing and >20 seconds when failing (timeout). Hotspot:
OpenJ9:
Here the closing thread had to wait much longer for the accessors, which indicates that the mismatch accessors are slower. |
@gita-omr any update on this? |
for p, i assigned the implementation to @bhavanisn technically, if your platform supports vector byte comparison and counting equal-byte in the right endian (p10 does and pre-p10 need a longer instr sequence), you will have fairly efficient implementations. the complicated return value for no-mismatching case in implementation can be simplified as follows:
|
For Z implementation, generated code for @tajila This one is tagged 0.34/Java19 which might not be the correct target, are we expecting this to be resolved in 0.36? |
@r30shah it is tantalizing to take advantages of the existing arrayCmp implementation for mis-match cases and adding new instructions to calculate the no-mismatch cases. there are two issues to be followed up though:
|
Is it about generating inlined code for |
@knn-k it is about better implementation/code for vectorizedMismatch() than JIT-compiled code. there is no dependency on existence of arrayCmp inlined ... that is just a coincidental finding of possible reuse/hint for this implementation. |
Memery Access APIs is a preview for JDK19 so we need to be functionally complete. That being said this is technically a perf issue not a functional problem, but we had to disable tests to not hit the timeouts due to poor performance. We can move it out, if its unfeasible to deliver this in the next month. |
while we are at this, it might be worthwhile to take a closer look why the usual JIT-compiled code is too slow. there might be a common issue slowing it down. for example, the Unsafe accesses to long are not/could not be optimized ... going through a long way to cover all possible cases. |
I've done some testing and determined that the following pseudocode is equivalent to int vectorizedMismatch(byte *a, long aOffset, byte *b, long bOffset, int length, int log2ArrayIndexScale) {
int lengthInBytes = length << log2ArrayIndexScale;
int mask = (log2ArrayIndexScale<<1) | 3;
int n = lengthInBytes & ~(mask);
int res = arrayCmpLen(a+aOffset, b+bOffset, n);
if (res == n) // no mismatch found
return ~((lengthInBytes & mask) >> log2ArrayIndexScale);
else // mismatch found
return res >> log2ArrayIndexScale;
} Swapping out the body of
Caveat: I've only done functionality testing locally on Z with My dev branch is working at this commit: Spencer-Comin@dfea139 with this test: TestVectorizedArrayMismatch.java.txt [1] https://github.com/ibmruntimes/openj9-openjdk-jdk19/blob/332135c6a1fedc19e2e607b273125eb242b5d213/src/java.base/share/classes/jdk/internal/util/ArraysSupport.java#L92-L102 |
Thanks a lot @Spencer-Comin for detailed summary of the experiments. I like the idea of doing this in IL. Would like @zl-wang's opinion on this approach. I am bit confused about second issue though about arraycmp working with compressed strings. Can you elaborate bit more on it? FYI @dchopra001 |
@r30shah A good try indeed. It is also understood that the existing IL is not in complete form yet. Exactly you need to take care of the different situations. the IL transformation needs to test if a/b is null and take the object header size (or off-heap addressing) into account. I saw the calling offsets are compensating for this purpose in the tests. compressed strings have two further implications: 1) need to take care of the difference between intended element-size and representational in-memory element-size; 2) little-endian and big-endian difference in result calculation (might not applicable to z/x supporting single mode only). @bhavanisn FYI ... |
it is also nice that, if a/b is not null, it can be assumed to be an array object. otherwise, it is much more complicated. |
Summarizing a couple of offline discussions regarding the following
We may have a scenario where two arrays are passed in that represent strings. One array is representing a compressed string and the other is not. In addition, the However the internals of how a string is represented is private to the |
A rather convoluted test case to see what happens when I pass the internal import java.lang.reflect.Field;
import jdk.internal.misc.Unsafe;
import jdk.internal.util.ArraysSupport;
public class TestCompressedString {
public static void main(String[] args) throws IllegalAccessException, NoSuchFieldException {
String a = "This is a string that I am using for testing that should be long enough to give a good answer";
String b = "This is a string that I am using for test\u0101ng that should be long enough to give a good answer";
// The char at index 41 is different
// \u0101 should force b to not use a compressed representation
Field value = String.class.getDeclaredField("value");
value.setAccessible(true);
int vv = ArraysSupport.vectorizedMismatch(
((byte[])value.get(a)), Unsafe.ARRAY_BYTE_BASE_OFFSET,
((byte[])value.get(b)), Unsafe.ARRAY_BYTE_BASE_OFFSET,
((byte[])value.get(a)).length,
ArraysSupport.LOG2_ARRAY_BYTE_INDEX_SCALE
);
System.out.println(vv);
}
} Running it on Z/Linux with OpenJ9: $ java -version
openjdk version "19.0.1-internal" 2022-10-18
OpenJDK Runtime Environment (build 19.0.1-internal-adhoc.spencer.test)
Eclipse OpenJ9 VM (build master-892071c6a, JRE 19 Linux s390x-64-Bit Compressed References 20230109_000000 (JIT enabled, AOT enabled)
OpenJ9 - 277af292f
OMR - 9c865e119
JCL - d78ba9640fe based on jdk-19.0.1+10)
$ # without compressed strings
$ java -XX:-CompactStrings --add-exports java.base/jdk.internal.misc=ALL-UNNAMED --add-exports java.base/jdk.internal.util=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED TestCompressedString
82
$ # with compressed strings
$ java -XX:+CompactStrings --add-exports java.base/jdk.internal.misc=ALL-UNNAMED --add-exports java.base/jdk.internal.util=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED TestCompressedString
0 On Z/Linux with OpenJDK: $ java -version
openjdk version "19.0.1" 2022-10-18
OpenJDK Runtime Environment Temurin-19.0.1+10 (build 19.0.1+10)
OpenJDK 64-Bit Server VM Temurin-19.0.1+10 (build 19.0.1+10, mixed mode, sharing)
$ # without compressed strings
$ java -XX:-CompactStrings --add-exports java.base/jdk.internal.misc=ALL-UNNAMED --add-exports java.base/jdk.internal.util=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED TestCompressedString
82
$ # with compressed strings
$ java -XX:+CompactStrings --add-exports java.base/jdk.internal.misc=ALL-UNNAMED --add-exports java.base/jdk.internal.util=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED TestCompressedString
0 On Intel MacOS with OpenJDK: $ java -version
openjdk version "19.0.1" 2022-10-18
OpenJDK Runtime Environment Homebrew (build 19.0.1)
OpenJDK 64-Bit Server VM Homebrew (build 19.0.1, mixed mode, sharing)
$ # without compressed strings
$ java -XX:-CompactStrings --add-exports java.base/jdk.internal.misc=ALL-UNNAMED --add-exports java.base/jdk.internal.util=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED TestCompressedString
82
$ # with compressed strings
$ java -XX:+CompactStrings --add-exports java.base/jdk.internal.misc=ALL-UNNAMED --add-exports java.base/jdk.internal.util=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED TestCompressedString
1 |
Additionally, a quick test to see what happens when I pass in a import jdk.internal.misc.Unsafe;
import jdk.internal.util.ArraysSupport;
public class TestCompressedString {
public static void main(String[] args) throws IllegalAccessException, NoSuchFieldException {
String a = "This is a string that I am using for testing that should be long enough to give a good answer";
String b = "This is a string that I am using for test\u0101ng that should be long enough to give a good answer";
Field value = String.class.getDeclaredField("value");
value.setAccessible(true);
int vv = ArraysSupport.vectorizedMismatch(
a, Unsafe.ARRAY_CHAR_BASE_OFFSET,
b, Unsafe.ARRAY_CHAR_BASE_OFFSET,
a.length(),
ArraysSupport.LOG2_ARRAY_CHAR_INDEX_SCALE
);
System.out.println(vv);
}
}
|
eclipse/omr#6904 implemented |
Most of the development work for this is done and there are two PRs [1][2] which should go in couple of weeks. Given the code-split for 0.40 is already happened, I do not think we would be get the PRs in. @tajila should we move this to next milestone? [1]. eclipse/omr#6983 |
There's actually also a third PR (#17382, changing OpenJ9 to use |
This patch adds ArraysSupport.vectorizedMismatch as a recognized method, and adds a SupportsInlineVectorizedMismatch flag to the code generator. This flag is set in Z, Power, aarch64 and x86 code generator initialization if the arraycmp opcode is supported and the TR_disableInlineVectorizedMismatch environment variable is not set. If the flag is set, vectorizedMismatch call nodes are transformed to a functionally equivalent tree that uses arraycmp. Fixes: eclipse-openj9#15204 Signed-off-by: Spencer Comin <spencer.comin@ibm.com>
As the pull requests are still in progress, I'll move this out to the 0.43 release |
This patch adds ArraysSupport.vectorizedMismatch as a recognized method, and adds a SupportsInlineVectorizedMismatch flag to the code generator. This flag is set in Z, Power, aarch64 and x86 code generator initialization if the arraycmp opcode is supported and the TR_disableInlineVectorizedMismatch environment variable is not set. If the flag is set, vectorizedMismatch call nodes are transformed to a functionally equivalent tree that uses arraycmp. Fixes: eclipse-openj9#15204 Signed-off-by: Spencer Comin <spencer.comin@ibm.com>
This patch adds ArraysSupport.vectorizedMismatch as a recognized method, and adds a SupportsInlineVectorizedMismatch flag to the code generator. This flag is set in Z, Power, aarch64 and x86 code generator initialization if the arraycmp opcode is supported and the TR_disableInlineVectorizedMismatch environment variable is not set. If the flag is set, vectorizedMismatch call nodes are transformed to a functionally equivalent tree that uses arraycmp. Fixes: eclipse-openj9#15204 Signed-off-by: Spencer Comin <spencer.comin@ibm.com>
The following is used for mismatch operations in MemoryAccessAPIs. The JDK supplies a java implementation which is very slow. We are seeing test failures due to timeouts as a result of this, #13211.
This API takes two arrays (of the same type) and compares each elements until there is a mismatch. This operation is can be applied to all primitive array types, the
log2ArrayIndexScale
denotes which type it is.The text was updated successfully, but these errors were encountered: