Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix implementation of String equalsIgnoreCase() and regionMatches() #11423

Merged
merged 1 commit into from
Dec 16, 2020

Conversation

sharon-wang
Copy link
Contributor

@sharon-wang sharon-wang commented Dec 9, 2020

Fixes: #11399

According to the javadoc, equalsIgnoreCase() and regionMatches() both involve calling Character.toLowerCase(Character.toUpperCase(character)) on each character in the string. This PR matches that expected behaviour.

Added tests with Turkish characters for equalsIgnoreCase(), regionMatches() and compareToIgnoreCase().

Uses new helper charValuesEqualIgnoreCase() to compare if two characters are equal (case insensitive).

@tajila
Copy link
Contributor

tajila commented Dec 9, 2020

@JasonFengJ9 can you please review this?

@tajila tajila added the comp:vm label Dec 9, 2020
@sharon-wang
Copy link
Contributor Author

sharon-wang commented Dec 9, 2020

Updated to use Character.toLowerCase(Character.toUpperCase()) in place of Character.toLowerCase(toUpperCase()) and added test cases to Test_String.java.

It seems we may also want to update regionMatches() to call Character.toLowerCase(Character.toUpperCase())? As per the API Spec:

http://cr.openjdk.java.net/~iris/se/15/build/latest/api/java.base/java/lang/String.html#regionMatches(boolean,int,java.lang.String,int,int)

ignoreCase is true and there is some nonnegative integer k less than len such that:

Character.toLowerCase(Character.toUpperCase(this.charAt(toffset+k))) !=
Character.toLowerCase(Character.toUpperCase(other.charAt(ooffset+k)))

Right now, we're doing

(toUpperCase(char1) !=  toUpperCase(char2)) && (toLowerCase(char1) !=  toLowerCase(char2))

@sharon-wang
Copy link
Contributor Author

sharon-wang commented Dec 9, 2020

Using a similar test case to #11399, I'm getting the following for regionMatches():

"İİİ".regionMatches("İİİ") = true
"İİİ".regionMatches("ııı") = false
"İİİ".regionMatches("Iıı") = false
"İİİ".regionMatches("iıİ") = false

"ııı".regionMatches("İİİ") = false
"ııı".regionMatches("ııı") = true
"ııı".regionMatches("Iıı") = true
"ııı".regionMatches("iıİ") = false

"Iıı".regionMatches("İİİ") = false
"Iıı".regionMatches("ııı") = true
"Iıı".regionMatches("Iıı") = true
"Iıı".regionMatches("iıİ") = false

"iıİ".regionMatches("İİİ") = false
"iıİ".regionMatches("ııı") = false
"iıİ".regionMatches("Iıı") = false
"iıİ".regionMatches("iıİ") = true

but the RI returns true for all. Will be applying the same fix to regionMatches().

@sharon-wang sharon-wang changed the title Fix implementation of String.equalsIgnoreCase() Fix implementation of String equalsIgnoreCase() and regionMatches() Dec 9, 2020
@sharon-wang
Copy link
Contributor Author

Updated with regionMatches() fix and corresponding tests.

@sharon-wang sharon-wang force-pushed the streqigcs branch 2 times, most recently from fcbd73d to 4899053 Compare December 10, 2020 00:07
@sharon-wang
Copy link
Contributor Author

Addressed review comments and ran test cases with and without -XX:+CompactStrings -- these changes now show the same behaviour as the RI.

@sharon-wang sharon-wang force-pushed the streqigcs branch 3 times, most recently from 8440704 to e3cbba5 Compare December 10, 2020 21:40
@sharon-wang
Copy link
Contributor Author

Updated PR with fixes. Ran personal build with the sanity.functional test suite, which succeeded.

@sharon-wang sharon-wang force-pushed the streqigcs branch 2 times, most recently from 91c9e0a to c51e721 Compare December 11, 2020 07:15
@fjeremic
Copy link
Contributor

FYI tagging @0xdaryl. I forget if regionMatches pops up on workloads we track, I believe it does on DT7 IIRC as one of the top five methods. I think this will negatively impact performance. Not sure by how much or that we should even care but just bringing it up for your awareness.

@pshipton
Copy link
Member

jenkins test sanity plinux jdk8,jdk11

@0xdaryl
Copy link
Contributor

0xdaryl commented Dec 14, 2020

I forget if regionMatches pops up on workloads we track, I believe it does on DT7 IIRC as one of the top five methods. I think this will negatively impact performance. Not sure by how much or that we should even care but just bringing it up for your awareness.

Agree. We won't know its impact until we measure. It seems prudent to perform some performance runs on workloads OpenJ9 cares about before this is merged.

@pshipton
Copy link
Member

It seems prudent to perform some performance runs on workloads OpenJ9 cares about before this is merged.

@sharon-wang pls follow up with Piyush on this.

According to the javadoc, equalsIgnoreCase() and regionMatches()
both involve calling
    `Character.toLowerCase(Character.toUpperCase(character))`
on each character in the string.

Add tests with Turkish characters for equalsIgnoreCase(),
regionMatches() and compareToIgnoreCase().

Use new helper charValuesEqualIgnoreCase() to compare if two
characters are equal (case insensitive).

[skip ci]

Signed-off-by: Sharon Wang <sharon-wang-cpsc@outlook.com>
@sharon-wang
Copy link
Contributor Author

LibertyStartupDT7 Results

These are the aggregate startup results from 6 iterations of the benchmark.

Mean Startup Time (ms) Median Startup Time (ms) Confidence Interval
with new changes 4653.0 4668.5 0.0047
no changes 4613.7 4599.5 0.0056
diff +39.3 +69
diff % +0.85% +1.50%

Is this regression in startup performance considered slight (acceptable), or will more optimization need to be done?

@0xdaryl
Copy link
Contributor

0xdaryl commented Dec 15, 2020

@mpirvu: can you provide some guidance on what is considered acceptable for an OpenJ9 startup regression? In Sharon's results above, it is a little under 1% on average for DT7.

@mpirvu
Copy link
Contributor

mpirvu commented Dec 15, 2020

Typically 0.85% is considered noise if you do just 6 iterations of the benchmark, but the quoted confidence interval is suspiciously low. If that holds, then this is a true regression.
To answer @0xdaryl question, the size of the regression is considered against the purpose of the change. If this is a change needed for correctness (or conformance to the spec) and there is no better way of implementing it, we'll have to accept it.
Also, to be on the safe side we need throughput runs as well.

I can do those runs if I am provided with two builds: before and after.

@mpirvu
Copy link
Contributor

mpirvu commented Dec 15, 2020

FYI @vijaysun-omr

@fjeremic
Copy link
Contributor

I dug up some old profiles of DayTrader7. Here are the top methods:

Ticks	Pct_Total	%   	Symbol/Functions                                                                                                                                                                                                                                                                                                                                                                                                               	
41298	1.29     	2.09	java/util/concurrent/ConcurrentHashMap.get(Ljava/lang/Object;)Ljava/lang/Object;                                                                                                                                                                                                                                                                                                                                               	
32972	1.03     	1.67	com/ibm/jit/JITHelpers.unsafeObjectShallowCopy64(Ljava/lang/Object;Ljava/lang/Object;J)V                                                                                                                                                                                                                                                                                                                                       	
29315	0.92     	1.48	java/lang/String.equals(Ljava/lang/Object;)Z                                                                                                                                                                                                                                                                                                                                                                                   	
26300	0.82     	1.33	org/eclipse/persistence/internal/descriptors/ObjectBuilder.buildWorkingCopyCloneFromRow(Lorg/eclipse/persistence/queries/ObjectBuildingQuery;Lorg/eclipse/persistence/internal/queries/JoinedAttributeManager;Lorg/eclipse/persistence/internal/sessions/AbstractRecord;Lorg/eclipse/persistence/internal/sessions/UnitOfWorkImpl;Ljava/lang/Object;Lorg/eclipse/persistence/internal/identitymaps/CacheKey;)Ljava/lang/Object;	
22732	0.71     	1.15	com/ibm/tx/jta/impl/TransactionImpl.equals(Ljava/lang/Object;)Z                                                                                                                                                                                                                                                                                                                                                                	
22224	0.70     	1.13	com/ibm/ejs/j2c/SharedPool.getSharedConnection(Ljava/lang/Object;Ljavax/security/auth/Subject;Ljavax/resource/spi/ConnectionRequestInfo;ZLjava/lang/String;II)Lcom/ibm/ws/j2c/MCWrapper;                                                                                                                                                                                                                                       	
18697	0.59     	0.95	org/apache/myfaces/view/facelets/tag/jsf/ComponentTagHandlerDelegate.apply(Ljavax/faces/view/facelets/FaceletContext;Ljavax/faces/component/UIComponent;)V                                                                                                                                                                                                                                                                     	
18324	0.57     	0.93	javax/faces/component/UIComponent.popComponentFromEL(Ljavax/faces/context/FacesContext;)V                                                                                                                                                                                                                                                                                                                                      	
16738	0.52     	0.85	java/util/HashMap.putVal(ILjava/lang/Object;Ljava/lang/Object;ZZ)Ljava/lang/Object;                                                                                                                                                                                                                                                                                                                                            	
16690	0.52     	0.84	org/eclipse/persistence/internal/helper/ConcurrentFixedCache.put(Ljava/lang/Object;Ljava/lang/Object;)V                                                                                                                                                                                                                                                                                                                        	
16057	0.50     	0.81	java/io/Writer.write(Ljava/lang/String;II)V     

What I was concerned about was the third one, i.e. String.equals. Looking at the code though it does call regionMatches but the internal version, i.e.
https://github.com/eclipse/openj9/blob/67bb44416a1caafdbcebdff6a2463b44d82f13cd/jcl/src/java.base/share/classes/java/lang/String.java#L2382-L2413

Going back to the changes being made here, they are in the case sensitive version of this API. This will not be on the critical path of at least DayTrader7. I think we should be ok here.

@mpirvu
Copy link
Contributor

mpirvu commented Dec 15, 2020

My own DT7 startup experiments show virtually no difference between the two builds provided by @sharon-wang

Results for JDK=/home/mpirvu/sdks/Sharon/default jvmOpts=-Xmx256m
StartupTime     avg=4021        min=3829        max=4270        stdDev=92.7     maxVar=11.5%    confInt=0.30%   samples=159
Footprint       avg=216777      min=209524      max=225664      stdDev=2423.5   maxVar=7.7%     confInt=0.16%   samples=127
CThreadTime     avg=2304        min=1894        max=3136        stdDev=247.9    maxVar=65.6%    confInt=1.41%   samples=159
ProcessTime     avg=7408        min=6820        max=8660        stdDev=429.7    maxVar=27.0%    confInt=0.79%   samples=146

Results for JDK=/home/mpirvu/sdks/Sharon/withchanges jvmOpts=-Xmx256m
StartupTime     avg=4019        min=3848        max=4293        stdDev=86.5     maxVar=11.6%    confInt=0.28%   samples=160
Footprint       avg=217072      min=210372      max=225436      stdDev=2444.4   maxVar=7.2%     confInt=0.16%   samples=131
CThreadTime     avg=2304        min=1888        max=2993        stdDev=220.8    maxVar=58.5%    confInt=1.25%   samples=159
ProcessTime     avg=7412        min=6890        max=8580        stdDev=375.7    maxVar=24.5%    confInt=0.69%   samples=146

@fjeremic
Copy link
Contributor

I think we should be ok to proceed here.

@sharon-wang
Copy link
Contributor Author

Thank you all for the help with investigating perf! @pshipton Looks like this should be ready to merge.

@pshipton
Copy link
Member

@JasonFengJ9 any further concerns?

@sharon-wang was your last push 2 days ago just a rebase, or did something change?

@sharon-wang
Copy link
Contributor Author

was your last push 2 days ago just a rebase, or did something change?

Yes, it was a rebase to bring sources up to date for the perf tests.

@pshipton
Copy link
Member

pshipton commented Dec 16, 2020

In that case I won't bother re-running the PR testing, which passed. It lgtm, once Jason signs off this can be merged. We should also add the change to the 0.24 release.

Copy link
Member

@JasonFengJ9 JasonFengJ9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks @sharon-wang

@sharon-wang
Copy link
Contributor Author

Opened #11497 to add this to 0.24.

@pshipton pshipton merged commit 8a09d26 into eclipse-openj9:master Dec 16, 2020
@sharon-wang sharon-wang deleted the streqigcs branch December 16, 2020 22:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

String.equalsIgnoreCase() behaves differently to OpenJDK for Turkish 'i' characters
7 participants