Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Table.expand #105

Merged
merged 5 commits into from Jul 22, 2014
Merged

Table.expand #105

merged 5 commits into from Jul 22, 2014

Conversation

RomanTsegelskyi
Copy link
Contributor

I have implemented table.expand in Rcpp. I did a lot of comparative testing with original table.expand, everything seems to be correct and work fine.

As for benchmarks, it is kind of controversial. General speed up of pandoc.table.return is between 1.5 - 5x. For example

  1. Small data.frame
> b.old.simple
Unit: milliseconds
                                                         expr      min       lq   median       uq      max neval
 pander(data.frame(a = "foo\\nbar"), keep.line.breaks = TRUE) 5.501682 6.224382 7.290573 8.382554 12.23485   100
> b.new.simple
Unit: milliseconds
                                                         expr     min       lq   median       uq      max neval
 pander(data.frame(a = "foo\\nbar"), keep.line.breaks = TRUE) 2.30613 2.587211 2.924714 3.806796 5.605237   100
  1. mtcars
> b.old.mtcars
Unit: milliseconds
                             expr      min       lq   median       uq      max neval
 pander(mtcars, style = "simple") 192.7679 201.4737 206.0359 210.8648 282.0304   100
> b.new.mtcars
Unit: milliseconds
                             expr     min       lq   median       uq      max neval
 pander(mtcars, style = "simple") 66.4057 73.86317 76.89688 83.96956 119.5512   100
  1. volcano
> b.old.volcano
Unit: seconds
            expr      min       lq   median       uq      max neval
 pander(volcano) 4.408776 4.415709 4.479123 4.513334 4.536083    10
> b.new.volcano
Unit: seconds
            expr      min       lq   median       uq      max neval
 pander(volcano) 3.502687 3.545587 3.583752 3.634691 3.850749    10

So for really big datasets (volcano is one of the biggest in datasets package) speed it is getting smaller, but for other ones it is better.
One reason I am not fully satisfied with results of pandoc.table speedup is that table.expand speedup is much better (10-25x).

> b1
Unit: microseconds
                                                              expr     min      lq   median      uq      max neval
           table.expand(argv[, 1], argv[, 2], argv[, 3], sep.cols) 711.426 743.061 810.3655 863.430 2219.897   100
 tableExpand_cpp(argv[, 1], argv[, 2], argv[, 3], sep.cols, style)  57.685  64.676  84.1530 100.071  476.688   100
> b2
Unit: microseconds
                                                              expr      min        lq    median       uq      max neval
           table.expand(argv[, 1], argv[, 2], argv[, 3], sep.cols) 1929.499 1989.7145 2066.2760 2599.152 4317.919   100
 tableExpand_cpp(argv[, 1], argv[, 2], argv[, 3], sep.cols, style)   58.566   65.9905   85.2165   96.538  167.777   100
> b3
Unit: microseconds
                                                              expr      min       lq    median       uq      max neval
           table.expand(argv[, 1], argv[, 2], argv[, 3], sep.cols) 1366.966 1405.690 1455.6460 1502.291 3627.009   100
 tableExpand_cpp(argv[, 1], argv[, 2], argv[, 3], sep.cols, style)   55.687   67.638   87.1345   96.651  190.005   100
> b4
Unit: microseconds
                                                              expr      min        lq    median       uq      max neval
           table.expand(argv[, 1], argv[, 2], argv[, 3], sep.cols) 1326.334 1365.1385 1384.4160 1412.322 3128.808   100
 tableExpand_cpp(argv[, 1], argv[, 2], argv[, 3], sep.cols, style)   54.686   65.6485   78.1735   91.113  134.672   100
> b5
Unit: microseconds
                                                              expr      min        lq   median       uq      max neval
           table.expand(argv[, 1], argv[, 2], argv[, 3], sep.cols) 1326.983 1378.6020 1395.999 1414.224 3168.186   100
 tableExpand_cpp(argv[, 1], argv[, 2], argv[, 3], sep.cols, style)   55.294   68.7475   87.765   92.075  164.647   100

So it seems to me that there is still good room for improvement and that for big tables there are other bottlenecks.
Also I have realized that there is no S3 method for microbenchmark in pander, so I am planning to add that also.

@daroczig
Copy link
Member

Cool! The "slowdown" on larger datasets are probably due to the recursive call -- probably running the tests after panderOptions('table.split.table', Inf) the results would be a lot better for even larger datasets, nope?

And supporting microbenchmark is a great idea indeed, thanks for that! Can you please also implement function if you'd have some time? Probably just printing the function body in Verbatim code blocks.

@RomanTsegelskyi
Copy link
Contributor Author

Yeah, in that case it is a little better, 2-3x times speedup.
Also I realized that recursive calls to tableExpand_cpp in here can be efficiently parallelized. I was thinking on doing that before merging the pull request. What do you think?

@daroczig
Copy link
Member

That would be awesome of course, but I'm afraid we are running out of time,
so I'd suggest to finish that 2 new methods (microbenchmark + function),
and proceed with tests and documentation. That latter requires serious
updates.

Sent from my phone with an annoying input interface. Please excuse my
brevity.
On Jul 22, 2014 4:58 PM, "Roman Tsegelskyi" notifications@github.com
wrote:

Yeah, in that case it is a little better, 2-3x times speedup.
Also I realized that recursive calls to tableExpand_cpp in here
https://github.com/Rapporter/pander/pull/105/files#diff-7df21640d7aa551aa54c241717a9419dR60
can be efficiently parallelized. I was thinking on doing that before
merging the pull request. What do you think?


Reply to this email directly or view it on GitHub
#105 (comment).

@daroczig daroczig merged commit d998894 into Rapporter:master Jul 22, 2014
@RomanTsegelskyi RomanTsegelskyi deleted the table.expand branch July 22, 2014 19:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants