Add Worksheet.values() batch function to the API #135

akbertram · 2020-12-19T15:47:14Z

For use-cases where rows are written sequentially, this avoids the
overhead of resizing the cells array each time a new cell is added,
as well as repeating all of the bounds checks for each cell.
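To make the description concrete, here is a minimal sketch of the two write paths on a hypothetical `RowBuffer` class. This is not fastexcel's real internals. The per-cell `value()` checks bounds on every call and grows the backing array by one slot whenever a new column appears. The batch `values()` does a single capacity check and at most one resize per row. The class name and the grow-by-one policy are assumptions, chosen to illustrate the worst case the PR describes.

```java
import java.util.Arrays;

// Hypothetical row buffer illustrating the two write paths discussed in this PR.
// NOT fastexcel's actual Row/Worksheet implementation; the grow-by-one policy is an assumption.
class RowBuffer {
    private Object[] cells = new Object[0];
    private int resizes = 0; // counts array reallocations, for illustration only

    // Per-cell path: bounds check on every call, and grow by one slot
    // whenever a new column appears (sequential writes hit this every time).
    void value(int c, Object v) {
        if (c < 0) {
            throw new IllegalArgumentException("Negative column index: " + c);
        }
        if (c >= cells.length) {
            cells = Arrays.copyOf(cells, c + 1);
            resizes++;
        }
        cells[c] = v;
    }

    // Batch path: one capacity check and at most one resize for the whole row,
    // then a single bulk copy starting at column 0.
    void values(Object... vs) {
        if (vs.length > cells.length) {
            cells = Arrays.copyOf(cells, vs.length);
            resizes++;
        }
        System.arraycopy(vs, 0, cells, 0, vs.length);
    }

    Object get(int c) {
        return cells[c];
    }

    int resizes() {
        return resizes;
    }
}
```

For a 200-column row written sequentially, the per-cell path performs 200 reallocations and the batch path performs one. Whether that saving is visible end to end is a separate question, and the benchmark in the next comment addresses it.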

rzymek · 2020-12-20T17:03:25Z

I ran a microbenchmark comparing:

private static final int NB_ROWS = 10000;
private static Object[] row = IntStream.range(0, 200).boxed().toArray();

@Benchmark
public int oneByOne() throws IOException {
  CountingOutputStream count = new CountingOutputStream(new NullOutputStream());
  Workbook wb = new Workbook(count, "Perf", "1.0");
  Worksheet ws = wb.newWorksheet("Sheet 1");
  for (int r = 0; r < NB_ROWS; ++r) {
    for (int c = 0; c < row.length; c++) {
      ws.value(r, c, row[c]);
    }
    if (r % 1000 == 0) {
      ws.flush();
    }
  }
  wb.finish();
  return count.getCount();
}

against

@Benchmark
public int wholeRowReverse() throws IOException {
  CountingOutputStream count = new CountingOutputStream(new NullOutputStream());
  Workbook wb = new Workbook(count, "Perf", "1.0");
  Worksheet ws = wb.newWorksheet("Sheet 1");
  for (int r = 0; r < NB_ROWS; ++r) {
    ws.values(r, row);
    if (r % 1000 == 0) {
      ws.flush();
    }
  }
  wb.finish();
  return count.getCount();
}

and the results unfortunately do not show any difference in performance:

Benchmark                           Mode  Cnt  Score   Error  Units
WriterMultipleRows.oneByOne           ss    5  0.987 ± 0.039   s/op
WriterMultipleRows.wholeRowReverse    ss    5  0.988 ± 0.121   s/op
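Here is a rough back-of-the-envelope reading of these numbers. It assumes each s/op figure covers one full workbook: 10,000 rows × 200 columns, including workbook creation and `wb.finish()`, so the per-cell figure is an average over everything.

```latex
\begin{aligned}
N_{\text{cells}} &= 10\,000 \times 200 = 2 \times 10^{6} \\
t_{\text{cell}} &\approx \frac{0.987\ \text{s}}{2 \times 10^{6}} \approx 4.9 \times 10^{-7}\ \text{s} \approx 0.49\ \mu\text{s} \\
\Delta t &= 0.988\ \text{s} - 0.987\ \text{s} = 0.001\ \text{s} \ll 0.039\ \text{s},\ 0.121\ \text{s}
\end{aligned}
```

The 1 ms gap between the two methods is well inside both error bars, so this measurement cannot distinguish them.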

akbertram · 2020-12-20T20:24:22Z

Fascinating! Thanks for benchmarking. Another reminder that the JVM is really good at lifting things out of loops. I am working on optimizing our export routine, but I will go back to the drawing board and benchmark it myself before submitting another PR!

rzymek · 2021-01-02T17:09:07Z

Great! Keeping my fingers crossed that you'll find some place that can be optimized!
Microbenchmark infrastructure for fastexcel is set up in the https://github.com/dhatim/fastexcel/tree/master/e2e subproject. Just create a class that extends BenchmarkLauncher, mark its methods with JMH's @Benchmark annotation, and run the class as a JUnit test. Feel free to drop me a line at rzymek@gmail.com if you need any help.
As for this PR, I'm gonna close it, as the new API method does not bring a noticeable performance benefit.

Add Worksheet.values() batch function to the API

56fb205

rzymek closed this Jan 2, 2021
