New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use StandardCharsets.UTF_8 #7599
use StandardCharsets.UTF_8 #7599
Conversation
Codecov Report
@@ Coverage Diff @@
## master #7599 +/- ##
============================================
+ Coverage 71.56% 71.63% +0.06%
- Complexity 3881 3882 +1
============================================
Files 1559 1559
Lines 79053 79049 -4
Branches 11706 11706
============================================
+ Hits 56575 56627 +52
+ Misses 18669 18607 -62
- Partials 3809 3815 +6
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
I had found this to result in more garbage before. |
I have not read the entire JDK PRs, but it talks only about speed. Our problem (at the time) was specifically GC overhead in realtime when segments were being built. The result was query performance suffered. Can you make any memory usage comparisons across these two? |
I have some data from last time I went through this particular exercise - let me clean it up to demonstrate this is beneficial for anyone using JDK9+ (later this week) |
This benchmark demonstrates the improvement on JDK11 and also that the implementation was sensible on JDK8. It uses 4 different character types, to generate all 4 possible lengths of UTF8 sequence: @State(Scope.Benchmark)
public class DecodingBenchmark {
@Param({"1", "10", "100", "1000"})
int length;
@Param({"a", "ß", "道", "\uD841\uDF0E"})
String unit;
private String data;
@Setup(Level.Trial)
public void setup() {
StringBuilder sb = new StringBuilder(length * unit.length());
for (int i = 0; i < length; i++) {
sb.append(unit);
}
this.data = sb.toString();
}
@Benchmark
public byte[] decodeUTF8Charset() {
return data.getBytes(StandardCharsets.UTF_8);
}
@Benchmark
public byte[] decodeUTF8CharsetName() {
try {
return data.getBytes("UTF-8");
} catch (UnsupportedEncodingException e) {
throw new RuntimeException(e);
}
}
} JDK11: pay attention to both the avgt (better even for quite long strings) and the normalised allocation rate (always the same)
Just compare any pair with the same parameters. E.g. for a 100 char ASCII string, there is no extra allocation but the time is halved:
On JDK8, the avgt is much higher but more is allocated when using
For the same 100 char ASCII string mentioned above, the user would be better off upgrading their JVM than relying on this optimisation: JDK11
JDK8
|
@Jackie-Jiang would you mind rerunning the tests (and taking a look)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Please rebase it to the latest master to trigger the CI rerun because we just merged the fix for the integration test
…t and doesn't need to handle excetions
a1aadd2
to
f2817d4
Compare
The tests really do need a rerun this time |
@richardstartin thanks for the diligence and detailed benchmark results. |
…t and doesn't need to handle excetions (apache#7599)
Description
Modernise
StringUtils
- useStandardCharsets
which is faster (I can demonstrate this if necessary).For context, this same change has been made within the JDK for performance reasons:
The problem this code was working around is still present in the latest JDK8 builds but this is a tradeoff - optimise for JDK8 or optimise for JDK11+.
Upgrade Notes
Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)
backward-incompat
, and complete the section below on Release Notes)Does this PR fix a zero-downtime upgrade introduced earlier?
backward-incompat
, and complete the section below on Release Notes)Does this PR otherwise need attention when creating release notes? Things to consider:
release-notes
and complete the section on Release Notes)Release Notes
Documentation