[Bug-66532] Improve performance of SheetDataWriter #443

Closed: rascmatt wants to merge 2 commits

rascmatt

Simplify loop and avoid code point to string conversions.

@rascmatt
Author

rascmatt commented Mar 17, 2023

Below are the results of a JMH benchmark I ran against three different versions. Unfortunately I was unable to get JMH running with Gradle, so I ran it in an external Maven project.

The latest released version (5.2.3): 'Main.benchOriginal'
An unreleased improvement (#405): 'Main.bench_405'
And the version proposed in this commit: 'Main.bench_66532'

| Benchmark | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|
| Main.benchOriginal | thrpt | 15 | 520433.760 | ± 81525.743 | ops/s |
| Main.bench_405 | thrpt | 15 | 579381.912 | ± 30395.196 | ops/s |
| Main.bench_66532 | thrpt | 15 | 3029062.360 | ± 141908.012 | ops/s |
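
To put the scores in relative terms, the throughput ratios can be computed directly from the reported numbers. This is only a small arithmetic sketch: the class and method names are made up for illustration, and the ratios ignore the reported error margins.

```java
// Illustrative helper (not part of the PR): relative throughput from the JMH scores.
final class BenchmarkSpeedup {
    // Throughput scores in ops/s, copied from the benchmark results in this comment.
    static final double ORIGINAL = 520433.760;     // Main.benchOriginal (5.2.3)
    static final double BENCH_405 = 579381.912;    // Main.bench_405
    static final double BENCH_66532 = 3029062.360; // Main.bench_66532

    /** Ratio of candidate throughput to baseline throughput. */
    static double speedup(double candidate, double baseline) {
        return candidate / baseline;
    }

    public static void main(String[] args) {
        System.out.printf("#405 vs 5.2.3:  %.2fx%n", speedup(BENCH_405, ORIGINAL));
        System.out.printf("66532 vs 5.2.3: %.2fx%n", speedup(BENCH_66532, ORIGINAL));
        System.out.printf("66532 vs #405:  %.2fx%n", speedup(BENCH_66532, BENCH_405));
    }
}
```

On these numbers the proposed version is a little under six times the throughput of 5.2.3 in this microbenchmark, while #405 alone is within roughly the error margin of the original.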

rascmatt marked this pull request as ready for review on March 17, 2023, 19:38
- for (Iterator<String> iter = CodepointsUtil.iteratorFor(s); iter.hasNext(); ) {
-     String codepoint = iter.next();
+ for (int i = 0; i < s.length(); i++) {
+     final int codepoint = s.codePointAt(i);
Contributor

This code is wrong: s.length() is the number of chars in a string, not the number of code points.

see https://www.w3resource.com/java-tutorial/string/string_codepointcount.php
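
The difference shows up as soon as a string contains a character outside the Basic Multilingual Plane. A minimal sketch (the class and method names are only for illustration):

```java
// Illustrative only: String.length() counts UTF-16 chars, not code points.
final class LengthVsCodePoints {

    /** Number of UTF-16 code units, which is what String.length() reports. */
    static int charCount(String s) {
        return s.length();
    }

    /** Number of Unicode code points in the whole string. */
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1F600 is a supplementary character, stored as a surrogate pair (2 chars).
        String s = "a" + new String(Character.toChars(0x1F600)) + "b";
        System.out.println(charCount(s));      // 4
        System.out.println(codePointCount(s)); // 3
    }
}
```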

Author

I might need to improve the readability of the loop; however, I think this is covered by line 441, where the counter is incremented an additional time when the code point is a character pair.

Contributor

pjfanning commented Mar 17, 2023

Ideally, loops should not increment their counter in multiple places. If you use codePointCount as the loop limit, then the count will be correct.

Author

Yes, but then the codePointAt() method will not receive the correct index. I switched to using the codePoints iterator, which does not seem to have a big performance impact.
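
The index problem described here, and two ways around it, can be sketched as follows. This is an illustration under my own naming, not the code from the PR or from the later commit: `byCodePointCount` shows why a code point count does not work as a limit for `codePointAt(i)`, `byCharIndex` advances a char index with a single increment site, and `byStream` uses `String.codePoints()`.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: three ways to walk the code points of a string.
final class CodePointIteration {

    /** Flawed: i counts code points, but codePointAt(i) expects a char index. */
    static List<Integer> byCodePointCount(String s) {
        List<Integer> out = new ArrayList<>();
        int n = s.codePointCount(0, s.length());
        for (int i = 0; i < n; i++) {
            out.add(s.codePointAt(i));
        }
        return out;
    }

    /** Char index advanced in one place by the width (1 or 2 chars) of each code point. */
    static List<Integer> byCharIndex(String s) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out.add(cp);
            i += Character.charCount(cp);
        }
        return out;
    }

    /** String.codePoints() (Java 8+) handles surrogate pairs internally. */
    static List<Integer> byStream(String s) {
        List<Integer> out = new ArrayList<>();
        s.codePoints().forEach(cp -> out.add(cp));
        return out;
    }

    public static void main(String[] args) {
        String s = "a" + new String(Character.toChars(0x1F600)) + "b";
        System.out.println(byCodePointCount(s)); // last entry is a lone low surrogate, 'b' is missed
        System.out.println(byCharIndex(s));
        System.out.println(byStream(s));
    }
}
```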

  switch (codepoint) {
-     case "<":
+     case 60: // <
Contributor

Could you test with '<'? It should have similar performance but would be more readable.

Author

Good idea. Done.
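
For reference, a char literal is a valid case label when switching on an int code point, because the constant is widened to int at compile time ('<' has the value 60). The sketch below is a simplified, hypothetical escaper to show the pattern; it is not the actual SheetDataWriter code, and the set of escaped characters is an assumption.

```java
// Illustrative only: switching on an int code point with readable char literals.
final class CodePointSwitch {

    /** Escapes a few XML-special characters; everything else is copied through. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            switch (cp) {
                case '<': // same value as 60
                    sb.append("&lt;");
                    break;
                case '>':
                    sb.append("&gt;");
                    break;
                case '&':
                    sb.append("&amp;");
                    break;
                default:
                    // appendCodePoint writes one or two chars as needed.
                    sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a<b & c>d")); // a&lt;b &amp; c&gt;d
    }
}
```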

@pjfanning
Contributor

Could you share your benchmark code, for example as a GitHub project?

@rascmatt
Author

https://github.com/rascmatt/poi-benchmark
The (updated) benchmark with code and results.

@pjfanning
Contributor

Thanks, looks good. Let me look it over, but this should be safe to merge in the coming days.

@pjfanning
Contributor

I added 0275daa, which is not exactly what is in your PR. Could you try it out in your benchmark?

@rascmatt
Author

Thank you, I added this version to the benchmark and it looks good. In addition, I added a benchmark for a more integrated setting (rascmatt/poi-benchmark@d7fa4a8). It shows a speedup of about 45% for the SheetDataWriter.
