Table.expand #105

RomanTsegelskyi · 2014-07-21T20:12:45Z

I have implemented table.expand in Rcpp. I did a lot of comparative testing with original table.expand, everything seems to be correct and work fine.

As for benchmarks, it is kind of controversial. General speed up of pandoc.table.return is between 1.5 - 5x. For example

Small data.frame

> b.old.simple
Unit: milliseconds
                                                         expr      min       lq   median       uq      max neval
 pander(data.frame(a = "foo\\nbar"), keep.line.breaks = TRUE) 5.501682 6.224382 7.290573 8.382554 12.23485   100
> b.new.simple
Unit: milliseconds
                                                         expr     min       lq   median       uq      max neval
 pander(data.frame(a = "foo\\nbar"), keep.line.breaks = TRUE) 2.30613 2.587211 2.924714 3.806796 5.605237   100

mtcars

> b.old.mtcars
Unit: milliseconds
                             expr      min       lq   median       uq      max neval
 pander(mtcars, style = "simple") 192.7679 201.4737 206.0359 210.8648 282.0304   100
> b.new.mtcars
Unit: milliseconds
                             expr     min       lq   median       uq      max neval
 pander(mtcars, style = "simple") 66.4057 73.86317 76.89688 83.96956 119.5512   100

volcano

> b.old.volcano
Unit: seconds
            expr      min       lq   median       uq      max neval
 pander(volcano) 4.408776 4.415709 4.479123 4.513334 4.536083    10
> b.new.volcano
Unit: seconds
            expr      min       lq   median       uq      max neval
 pander(volcano) 3.502687 3.545587 3.583752 3.634691 3.850749    10

So for really big datasets (volcano is one of the biggest in datasets package) speed it is getting smaller, but for other ones it is better.
One reason I am not fully satisfied with results of pandoc.table speedup is that table.expand speedup is much better (10-25x).

> b1
Unit: microseconds
                                                              expr     min      lq   median      uq      max neval
           table.expand(argv[, 1], argv[, 2], argv[, 3], sep.cols) 711.426 743.061 810.3655 863.430 2219.897   100
 tableExpand_cpp(argv[, 1], argv[, 2], argv[, 3], sep.cols, style)  57.685  64.676  84.1530 100.071  476.688   100
> b2
Unit: microseconds
                                                              expr      min        lq    median       uq      max neval
           table.expand(argv[, 1], argv[, 2], argv[, 3], sep.cols) 1929.499 1989.7145 2066.2760 2599.152 4317.919   100
 tableExpand_cpp(argv[, 1], argv[, 2], argv[, 3], sep.cols, style)   58.566   65.9905   85.2165   96.538  167.777   100
> b3
Unit: microseconds
                                                              expr      min       lq    median       uq      max neval
           table.expand(argv[, 1], argv[, 2], argv[, 3], sep.cols) 1366.966 1405.690 1455.6460 1502.291 3627.009   100
 tableExpand_cpp(argv[, 1], argv[, 2], argv[, 3], sep.cols, style)   55.687   67.638   87.1345   96.651  190.005   100
> b4
Unit: microseconds
                                                              expr      min        lq    median       uq      max neval
           table.expand(argv[, 1], argv[, 2], argv[, 3], sep.cols) 1326.334 1365.1385 1384.4160 1412.322 3128.808   100
 tableExpand_cpp(argv[, 1], argv[, 2], argv[, 3], sep.cols, style)   54.686   65.6485   78.1735   91.113  134.672   100
> b5
Unit: microseconds
                                                              expr      min        lq   median       uq      max neval
           table.expand(argv[, 1], argv[, 2], argv[, 3], sep.cols) 1326.983 1378.6020 1395.999 1414.224 3168.186   100
 tableExpand_cpp(argv[, 1], argv[, 2], argv[, 3], sep.cols, style)   55.294   68.7475   87.765   92.075  164.647   100

So it seems to me that there is still good room for improvement and that for big tables there are other bottlenecks.
Also I have realized that there is no S3 method for microbenchmark in pander, so I am planning to add that also.

daroczig · 2014-07-21T23:12:21Z

Cool! The "slowdown" on larger datasets are probably due to the recursive call -- probably running the tests after panderOptions('table.split.table', Inf) the results would be a lot better for even larger datasets, nope?

And supporting microbenchmark is a great idea indeed, thanks for that! Can you please also implement function if you'd have some time? Probably just printing the function body in Verbatim code blocks.

RomanTsegelskyi · 2014-07-22T14:58:15Z

Yeah, in that case it is a little better, 2-3x times speedup.
Also I realized that recursive calls to tableExpand_cpp in here can be efficiently parallelized. I was thinking on doing that before merging the pull request. What do you think?

daroczig · 2014-07-22T15:11:26Z

That would be awesome of course, but I'm afraid we are running out of time,
so I'd suggest to finish that 2 new methods (microbenchmark + function),
and proceed with tests and documentation. That latter requires serious
updates.

Sent from my phone with an annoying input interface. Please excuse my
brevity.
On Jul 22, 2014 4:58 PM, "Roman Tsegelskyi" notifications@github.com
wrote:

Yeah, in that case it is a little better, 2-3x times speedup.
Also I realized that recursive calls to tableExpand_cpp in here
https://github.com/Rapporter/pander/pull/105/files#diff-7df21640d7aa551aa54c241717a9419dR60
can be efficiently parallelized. I was thinking on doing that before
merging the pull request. What do you think?

—
Reply to this email directly or view it on GitHub
#105 (comment).

RomanTsegelskyi added 5 commits July 17, 2014 21:11

First version of tableExpand_cpp. Before testing and refactoring

916d68b

Working version of tableExpand_cpp. Before refactoring

7d274de

Working, stable and refactored implementation of tableExpand_cpp

aae6b3e

Removed comments

4d92353

Added RcppExports

d998894

daroczig merged commit d998894 into Rapporter:master Jul 22, 2014

RomanTsegelskyi deleted the table.expand branch July 22, 2014 19:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Table.expand #105

Table.expand #105

RomanTsegelskyi commented Jul 21, 2014

daroczig commented Jul 21, 2014

RomanTsegelskyi commented Jul 22, 2014

daroczig commented Jul 22, 2014

Table.expand #105

Table.expand #105

Conversation

RomanTsegelskyi commented Jul 21, 2014

daroczig commented Jul 21, 2014

RomanTsegelskyi commented Jul 22, 2014

daroczig commented Jul 22, 2014