[Bug-66532] Improve performance of SheetDataWriter #443

rascmatt · 2023-03-17T19:28:38Z

Simplify loop and avoid code point to string conversions.

rascmatt · 2023-03-17T19:34:00Z

Below are the results of a JMH benchmark I ran against three versions. Unfortunately, I could not get JMH running with Gradle, so I ran it in an external Maven project.

The latest released version (5.2.3): 'Main.benchOriginal'
An unreleased improvement (#405): 'Main.bench_405'
And the version proposed in this commit: 'Main.bench_66532'

Benchmark            Mode   Cnt        Score         Error   Units
Main.benchOriginal   thrpt   15   520433.760 ±  81525.743   ops/s
Main.bench_405       thrpt   15   579381.912 ±  30395.196   ops/s
Main.bench_66532     thrpt   15  3029062.360 ± 141908.012   ops/s

pjfanning · 2023-03-17T19:46:48Z

poi-ooxml/src/main/java/org/apache/poi/xssf/streaming/SheetDataWriter.java

-        for (Iterator<String> iter = CodepointsUtil.iteratorFor(s); iter.hasNext(); ) {
-            String codepoint = iter.next();
+        for (int i = 0; i < s.length(); i++) {
+            final int codepoint = s.codePointAt(i);


This code is wrong: s.length() is the number of chars in the string, not the number of code points.

See https://www.w3resource.com/java-tutorial/string/string_codepointcount.php

rascmatt: I might need to improve the readability of the loop, but I think this case is covered by line 441, where the counter is incremented a second time when the code point is a character pair.

pjfanning: Ideally, loops should not increment their counter in multiple places. If you use codePointCount as the loop limit, the count will be correct.

rascmatt: Yes, but then codePointAt() would not receive the correct index. I switched to using the codePoints iterator, which does not seem to have a big performance impact.
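For readers following the thread, here is a minimal sketch of the two loop shapes under discussion. It is not the code from the PR or from SheetDataWriter; the class and method names (`CodePointLoops`, `byIndex`, `byStream`) are made up for illustration. It only shows why `s.length()` and the code point count differ, and that advancing the index by `Character.charCount` gives the same result as `String.codePoints()`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class, not part of Apache POI.
public class CodePointLoops {

    // Index-based loop: the index is a char offset, so it is advanced by the
    // number of chars the current code point occupies (1 or 2).
    static List<Integer> byIndex(String s) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out.add(cp);
            i += Character.charCount(cp);
        }
        return out;
    }

    // Stream-based loop: String.codePoints() yields each code point as an int,
    // so no manual index bookkeeping is needed.
    static List<Integer> byStream(String s) {
        List<Integer> out = new ArrayList<>();
        s.codePoints().forEach(out::add);
        return out;
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (encoded as a surrogate pair), '<'
        String s = "a\uD83D\uDE00<";
        System.out.println("chars:       " + s.length());
        System.out.println("code points: " + s.codePointCount(0, s.length()));
        System.out.println("same result: " + byIndex(s).equals(byStream(s)));
    }
}
```

Using the code point count as the loop limit while still calling `codePointAt(i)` would mix two different index spaces, which is the problem raised in the last reply above.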

pjfanning · 2023-03-17T19:53:38Z

poi-ooxml/src/main/java/org/apache/poi/xssf/streaming/SheetDataWriter.java

            switch (codepoint) {
-                case "<":
+                case 60: // <


Could you test with '<'? It should perform similarly and would be more readable.

rascmatt: Good idea. Done.
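As a rough illustration of the suggestion, the sketch below switches on an `int` code point using `char` literals as case labels, which Java allows because a `char` constant is assignable to `int`. The class name `EscapeSketch`, the method `escape`, and the set of escaped characters are assumptions chosen for the example, not the actual SheetDataWriter logic.

```java
// Hypothetical illustration class, not part of Apache POI.
public class EscapeSketch {

    // Escapes a few XML-significant characters; everything else is copied
    // through unchanged, including code points outside the BMP.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            switch (cp) {
                case '<': // same value as 60, but self-describing
                    sb.append("&lt;");
                    break;
                case '>':
                    sb.append("&gt;");
                    break;
                case '&':
                    sb.append("&amp;");
                    break;
                case '"':
                    sb.append("&quot;");
                    break;
                default:
                    sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a < b && \"c\""));
    }
}
```

Because the labels are compile-time constants, `case '<'` and `case 60` compile to the same switch, so the literal form costs nothing at runtime and removes the need for the trailing comment.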

pjfanning · 2023-03-17T19:56:59Z

Could you share your benchmark code, for example as a GitHub project?

rascmatt · 2023-03-17T21:29:36Z

https://github.com/rascmatt/poi-benchmark
The (updated) benchmark with code and results.

pjfanning · 2023-03-17T21:32:47Z

Thanks, looks good. Let me look this over, but it should be safe to merge in the coming days.

pjfanning · 2023-03-17T23:38:35Z

I added 0275daa, which is not exactly what is in your PR. Could you try it out in your benchmark?

rascmatt · 2023-03-18T10:03:04Z

Thank you, I added this version to the benchmark and it looks good. In addition, I added a benchmark for a more integrated setting (rascmatt/poi-benchmark@d7fa4a8). It shows a speedup of about 45% for the SheetDataWriter.

[Bug-66532] Improve performance of SheetDataWriter

09a2553

- Simplify loop and avoid codepoint to string conversions.

rascmatt marked this pull request as ready for review March 17, 2023 19:38

pjfanning suggested changes Mar 17, 2023

pjfanning reviewed Mar 17, 2023

[Bug-66532] Improve performance of SheetDataWriter

8568e18

rascmatt closed this Mar 18, 2023

rascmatt deleted the bug-66532 branch March 18, 2023 12:11

dpsutton mentioned this pull request Apr 14, 2023

Regression in exports returns "502 Bad Gateway" in files metabase/metabase#29839

Open
