Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R/add_data_table] cleanup update_cell() #275

Merged
merged 14 commits into from
Jul 16, 2022
Merged

[R/add_data_table] cleanup update_cell() #275

merged 14 commits into from
Jul 16, 2022

Conversation

JanMarvin
Copy link
Owner

Frist attempt to speed up update_cell().

The benchmark below is now ~22s (prev ~26s) on my desktop. Mostly spent in update_cell(). Writing the file initial only took 37 ms.

library(microbenchmark)

# a larger data frame with strings, missings and numbers
xlsxFile <- system.file("extdata", "readTest.xlsx", package = "openxlsx2")
wb1 <- wb_load(xlsxFile)
data <- wb_to_df(wb1, sheet = 3, skipEmptyRows = TRUE)

res <- microbenchmark::microbenchmark(
  {
    options("loop" = "Rcpp")
    wb <- wb_workbook()$add_worksheet()$add_data(x = data)$add_data(x = data)
    # Rcpp <- wb$worksheets[[1]]$sheet_data$cc
  }, 
  {
    options("loop" = "R")
    wb <- wb_workbook()$add_worksheet()$add_data(x = data)$add_data(x = data)
    # R <- wb$worksheets[[1]]$sheet_data$cc
  },
  
  times = 1
); res

# all.equal(Rcpp, R)
> res
Unit: seconds
                                                                                                         expr      min       lq     mean   median       uq      max neval
 {     options(loop = "Rcpp")     wb <- wb_workbook()$add_worksheet()$add_data(x = data)$add_data(x = data) } 21.67788 21.67788 21.67788 21.67788 21.67788 21.67788     1
    {     options(loop = "R")     wb <- wb_workbook()$add_worksheet()$add_data(x = data)$add_data(x = data) } 25.29638 25.29638 25.29638 25.29638 25.29638 25.29638     1

Replacing the loop with Rcpp definitively has benefits, though the entire function is still pretty slow.

@JanMarvin
Copy link
Owner Author

Once I fixed a couple of my initial thinkos (or better, make use of some functions we have developed since), updating time is now acceptably fast. The Rcpp loop brings nice speed improvements too :)

> library(microbenchmark)
> library(openxlsx2)
> 
> # a larger data frame with strings, missings and numbers
> xlsxFile <- system.file("extdata", "readTest.xlsx", package = "openxlsx2")
> wb1 <- wb_load(xlsxFile)
> data <- wb_to_df(wb1, sheet = 3, skipEmptyRows = TRUE)
> 
> res <- microbenchmark::microbenchmark(
+   {
+     options("loop" = "Rcpp")
+     wb <- wb_workbook()$add_worksheet()$add_data(x = data)$add_data(x = data)
+     # Rcpp <- wb$worksheets[[1]]$sheet_data$cc
+   }, 
+   {
+     options("loop" = "R")
+     wb <- wb_workbook()$add_worksheet()$add_data(x = data)$add_data(x = data)
+     # R <- wb$worksheets[[1]]$sheet_data$cc
+   },
+   
+   times = 1
+ ); res
Unit: seconds
                                                                                                         expr      min       lq     mean
 {     options(loop = "Rcpp")     wb <- wb_workbook()$add_worksheet()$add_data(x = data)$add_data(x = data) } 1.837195 1.837195 1.837195
    {     options(loop = "R")     wb <- wb_workbook()$add_worksheet()$add_data(x = data)$add_data(x = data) } 7.250326 7.250326 7.250326
   median       uq      max neval
 1.837195 1.837195 1.837195     1
 7.250326 7.250326 7.250326     1

@JanMarvin JanMarvin changed the title replace slower R loop with faster Rcpp loop [R/add_data_table] cleanup update_cell() Jul 16, 2022
@JanMarvin JanMarvin added this to the v0.3.0 rc2 milestone Jul 16, 2022
@JanMarvin JanMarvin linked an issue Jul 16, 2022 that may be closed by this pull request
@JanMarvin
Copy link
Owner Author

I assume that the failing test is just because of whatever R core was doing with gsub

@JanMarvin JanMarvin merged commit a5e7a53 into main Jul 16, 2022
@JanMarvin JanMarvin deleted the rcpp_update_cell branch July 16, 2022 16:47
@JanMarvin
Copy link
Owner Author

Final benchmark runtime on desktop is ~1.96s. Still a lot longer than creating from scratch, but way better than the 25s from earlier today. Now the entire time is spent in the Rcpp loop (if anyone wants to spend another day improving this).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

update_cell() is slow
1 participant