[Bug-66532] Improve performance of SheetDataWriter #443
- Simplify loop and avoid codepoint to string conversions.
Below are the results of a JMH benchmark I ran with three different versions. Unfortunately, I was unable to get JMH running with Gradle, so I ran it in an external Maven project.
The latest released version (5.2.3), 'Main.benchOriginal':
Benchmark Mode Cnt Score Error Units
-for (Iterator<String> iter = CodepointsUtil.iteratorFor(s); iter.hasNext(); ) {
-    String codepoint = iter.next();
+for (int i = 0; i < s.length(); i++) {
+    final int codepoint = s.codePointAt(i);
This code is wrong: s.length() is the number of chars in a string, not the number of code points.
See https://www.w3resource.com/java-tutorial/string/string_codepointcount.php
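To illustrate the reviewer's point, here is a minimal standalone sketch (the class and method names are hypothetical). A character outside the Basic Multilingual Plane, such as U+1F600, is stored in a Java String as two chars (a surrogate pair), so length() and codePointCount() disagree.

```java
public class CodePointCountDemo {
    /** Number of UTF-16 code units (chars) in s. */
    static int charCount(String s) {
        return s.length();
    }

    /** Number of Unicode code points in s. */
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // "a", U+1F600 (stored as the surrogate pair \uD83D\uDE00), "b"
        String s = "a\uD83D\uDE00b";
        System.out.println(charCount(s));      // 4
        System.out.println(codePointCount(s)); // 3
    }
}
```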
I might need to improve the readability of the loop; however, I think this case is covered by line 441, where the counter is incremented an extra time when a surrogate pair is encountered.
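A sketch of the loop shape described here, assuming the extra increment is triggered by Character.isSupplementaryCodePoint (the actual check at line 441 of the PR may be written differently). The class and method names are hypothetical. The loop walks char indices, and after reading a supplementary code point it skips the low surrogate so each code point is visited exactly once.

```java
import java.util.ArrayList;
import java.util.List;

public class ManualIncrementLoop {
    /** Collects the code points of s by walking char indices. */
    static List<Integer> codePoints(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            final int codepoint = s.codePointAt(i);
            result.add(codepoint);
            // A supplementary code point occupies two chars (a surrogate pair),
            // so advance past the low surrogate as well.
            if (Character.isSupplementaryCodePoint(codepoint)) {
                i++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(codePoints("a\uD83D\uDE00b")); // [97, 128512, 98]
    }
}
```

This is exactly the "increment in two places" pattern the reviewer objects to next: it is correct, but the loop counter is touched both in the for header and in the body.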
Ideally, loops should not increment the counter in multiple places. If you use codePointCount as the loop limit, the count will be correct.
Yes, but then codePointAt() would not receive the correct index. I switched to using the codePoints iterator, which does not seem to have a significant performance impact.
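A sketch contrasting the two options discussed (class and method names are hypothetical). Using codePointCount as the limit while still calling codePointAt(i) mixes a code point ordinal with a char index: for the string below the loop reads the low surrogate as a separate value and never reaches the final 'b'. Iterating over s.codePoints().iterator() avoids manual indexing altogether. That the merged code uses exactly this iterator form is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PrimitiveIterator;

public class CodePointIteration {
    /** codePointCount as the limit, but codePointAt(i) still expects a char index. */
    static List<Integer> countLimitedLoop(String s) {
        List<Integer> result = new ArrayList<>();
        int count = s.codePointCount(0, s.length());
        for (int i = 0; i < count; i++) {
            // i counts code points here, yet codePointAt interprets it as a char offset
            result.add(s.codePointAt(i));
        }
        return result;
    }

    /** Iterates code points via String.codePoints(); no manual index bookkeeping. */
    static List<Integer> iteratorLoop(String s) {
        List<Integer> result = new ArrayList<>();
        for (PrimitiveIterator.OfInt iter = s.codePoints().iterator(); iter.hasNext(); ) {
            final int codepoint = iter.nextInt();
            result.add(codepoint);
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b";
        System.out.println(countLimitedLoop(s)); // [97, 128512, 56832]  ('b' is never read)
        System.out.println(iteratorLoop(s));     // [97, 128512, 98]
    }
}
```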
 switch (codepoint) {
-    case "<":
+    case 60: // <
Could you test with '<'? This should have similar performance but would be more readable.
Good idea. Done.
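A minimal sketch of the reviewer's suggestion. Because a char constant such as '<' is assignable to int, it can be used directly as a case label in a switch over an int code point, behaving the same as case 60 while reading more clearly. The class name, method name, and the exact set of escaped characters are illustrative assumptions, not necessarily what SheetDataWriter escapes.

```java
public class XmlEscapeSketch {
    /** Escapes a few XML special characters by switching on int code points. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints().forEach(codepoint -> {
            switch (codepoint) {
                case '<': // char constant used as a label in a switch on int
                    out.append("&lt;");
                    break;
                case '>':
                    out.append("&gt;");
                    break;
                case '&':
                    out.append("&amp;");
                    break;
                default:
                    // appendCodePoint re-emits supplementary code points as surrogate pairs
                    out.appendCodePoint(codepoint);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a<b & c>")); // a&lt;b &amp; c&gt;
    }
}
```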
Could you share your benchmark code, e.g. as a GitHub project?
https://github.com/rascmatt/poi-benchmark
Thanks, looks good. Let me look over this, but it seems safe to merge over the coming days.
I added 0275daa, which is not exactly what is in your PR. Could you try it out in your benchmark?
Thank you, I added this version to the benchmark and it looks good. In addition, I added a benchmark for a more integrated setting (rascmatt/poi-benchmark@d7fa4a8). It shows a speedup of roughly 45% for the SheetDataWriter.