Free speed gains?!? Too good to be true?!?! #21
Comments
RE: You can add arbitrary key-value pairs to DESCRIPTION; here's Hadley:

Doing the
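To illustrate the point about DESCRIPTION, here is a minimal sketch using only base R. `read.dcf()` parses the DCF format that DESCRIPTION files use and returns every field, including non-standard ones. The package name `examplepkg` is hypothetical, and `FasterWith` is simply the illustrative key floated later in this thread.

```r
# Write a throwaway DESCRIPTION file with one non-standard field.
# "examplepkg" and "FasterWith" are illustrative names only.
desc_path <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: examplepkg",
  "Version: 0.1.0",
  "Suggests: data.table",
  "FasterWith: data.table"
), desc_path)

# read.dcf() returns a one-row character matrix with one column per field,
# whether or not the field is a standard one.
desc <- read.dcf(desc_path)
print(colnames(desc))
print(desc[, "FasterWith"])
```

For an installed package, `utils::packageDescription(pkg, fields = "FasterWith")` should return the same kind of value, or `NA` if the field is absent.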
Thanks Aaron. I'm in favor of this solution. Let us know what you find out about whether packages listed in Suggests get auto-installed.
data.table should not get auto-installed when a user does
`install.packages("DeclareDesign")`, but *would* get auto-installed by CRAN
/ CI servers. We might want an option for testing that turns off this
optimization, so that we can make sure the non-optimized branch(es)
don't break.
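A minimal sketch of the rule under discussion, modelled on the documented `dependencies` argument of `install.packages()`: the default `NA` means Depends, Imports and LinkingTo, while `TRUE` adds Suggests for the requested package. The sketch does not call `install.packages()`; `dep_fields()`, `parse_deps()`, `direct_installs()` and the example DESCRIPTION contents are all hypothetical, and it only lists first-level packages, not recursive ones.

```r
# Fields install.packages() follows for the requested package, per
# ?install.packages: NA (default) = Depends, Imports, LinkingTo;
# TRUE also adds Suggests; FALSE = none; a character vector is used as-is.
dep_fields <- function(dependencies = NA) {
  strong <- c("Depends", "Imports", "LinkingTo")
  if (is.character(dependencies)) return(dependencies)
  if (isTRUE(dependencies)) return(c(strong, "Suggests"))
  if (is.na(dependencies)) return(strong)
  character(0)
}

# Split a field such as "data.table (>= 1.10), testthat" into bare
# package names, dropping version constraints and "R" itself.
parse_deps <- function(field) {
  pkgs <- trimws(strsplit(field, ",", fixed = TRUE)[[1]])
  pkgs <- sub("\\s*\\(.*\\)$", "", pkgs)
  setdiff(pkgs[nzchar(pkgs)], "R")
}

# First-level packages a given setting would pull in (hypothetical helper).
direct_installs <- function(desc, dependencies = NA) {
  fields <- intersect(dep_fields(dependencies), names(desc))
  unique(unlist(lapply(desc[fields], parse_deps), use.names = FALSE))
}

desc <- list(
  Package  = "examplepkg",
  Imports  = "rlang",
  Suggests = "data.table (>= 1.10), testthat"
)
print(direct_installs(desc))                       # default: Suggests skipped
print(direct_installs(desc, dependencies = TRUE))  # Suggests included
```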
(Replying to Graeme Blair's comment of Sat, Oct 28, 2017 at 10:29 AM.)
Ah, got it. Thanks, Neal. That seems like a good solution.

I'll be adding an option call (potentially private-only) for us to test the non-data.table code path and then merge the finished code.
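One way such an option could look, as a sketch only: the option name `fabricatr.use_data_table` and the helper `use_data_table()` are hypothetical, not part of fabricatr. The data.table branch is taken only when the option is left at its default (`TRUE`) and `requireNamespace()` succeeds, so a test can force the base-R path by setting the option to `FALSE`.

```r
# Hypothetical option name, for illustration only.
# && short-circuits, so requireNamespace() is skipped when the option is FALSE.
use_data_table <- function() {
  isTRUE(getOption("fabricatr.use_data_table", default = TRUE)) &&
    requireNamespace("data.table", quietly = TRUE)
}

# In a test: force the base-R path, then restore the previous setting.
old <- options(fabricatr.use_data_table = FALSE)
print(use_data_table())  # FALSE
options(old)
```

Because `options()` returns the previous values, `options(old)` restores whatever was set before (here, removing the option again).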
I've been profiling the bootstrap/resampling functionality in fabricatr this week. I have a branch where I've implemented a few incremental speedups, but these are child's play.
One big speed boost we can get is replacing `rbind` with `rbindlist` (a function from the data.table package). In benchmarks with moderately large data, `rbindlist` runs about 9x faster than `rbind`, and the overall resample process runs about 2x faster using `rbindlist` than `rbind`. This is a pretty huge gain and I am very much in favour of it.

One issue, of course, is the "Malawi problem": we don't want to increase the number of dependencies for people who are extremely bandwidth-constrained. But what if we could have it both ways, allowing users who have data.table installed to make use of it, while still letting users who don't have it use our package without being told to install it?
Consider the following snippet:
`requireNamespace()` returns `FALSE` if data.table is not installed, so people without the package get the `do.call()`/`rbind()` version. I've benchmarked various ways of zapping the row names, so don't worry about that part.
If, on the other hand, the user DOES have data.table, then we call `rbindlist()`. I add the `class` and `attr` lines so that our function returns a plain data.frame; in other words, the returned data will be exactly identical, and will pass an `identical()` check, whether or not you have the data.table package.

As for how we signal this to users, we can mention it in the docs/vignette. Neal assures me we can add arbitrary key/value pairs to the DESCRIPTION file, so we could also add a key/value pair that has no standard meaning to let people know (e.g. `FasterWith: data.table`).
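A minimal sketch of the approach described above, assuming data.table is only in Suggests and R >= 4.0 (so strings stay character). `bind_rows_fast()` is a hypothetical name, and its `use_data_table` argument (defaulting to the `requireNamespace()` check) is an addition so that both branches can be exercised in tests. The `class` and `attr` lines turn the `rbindlist()` result back into a plain data.frame, and resetting the row names is intended to make both branches return `identical()` objects; this has only been reasoned through for simple numeric and character columns like those in the usage example, not for factors.

```r
# Stack a list of data frames, preferring data.table::rbindlist() when
# available and falling back to base R otherwise (hypothetical helper).
bind_rows_fast <- function(pieces,
                           use_data_table = requireNamespace("data.table", quietly = TRUE)) {
  if (use_data_table) {
    out <- data.table::rbindlist(pieces)
    # Turn the data.table back into a plain data.frame.
    class(out) <- "data.frame"
    attr(out, ".internal.selfref") <- NULL
  } else {
    out <- do.call(rbind, pieces)
  }
  # Reset to automatic row names so both branches agree.
  rownames(out) <- NULL
  out
}

pieces <- list(
  data.frame(id = 1L, x = c(0.5, 1.5), g = c("a", "b")),
  data.frame(id = 2L, x = 2.5, g = "c")
)
base_out <- bind_rows_fast(pieces, use_data_table = FALSE)
print(base_out)
if (requireNamespace("data.table", quietly = TRUE)) {
  print(identical(bind_rows_fast(pieces, use_data_table = TRUE), base_out))
}
```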
I'll post a full standalone profiling script in Slack so you guys can play around with this.
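For a quick local comparison, a rough timing sketch: the workload of 1,000 small data frames is invented for illustration, and a single `system.time()` call is a noisy measurement, so repeated runs or a dedicated benchmarking package would be better for real numbers. No particular speedup is implied; results depend on data shape and machine.

```r
set.seed(1)
# Invented workload: 1,000 small data frames standing in for the pieces
# a resampling routine might stack.
pieces <- lapply(seq_len(1000), function(i) {
  data.frame(id = i, x = rnorm(20), g = sample(letters[1:3], 20, replace = TRUE))
})

# Single wall-clock timings; repeat before drawing conclusions.
time_base <- system.time(out_base <- do.call(rbind, pieces))[["elapsed"]]
cat(sprintf("do.call(rbind, ...): %.3f s\n", time_base))

if (requireNamespace("data.table", quietly = TRUE)) {
  time_dt <- system.time(out_dt <- data.table::rbindlist(pieces))[["elapsed"]]
  cat(sprintf("data.table::rbindlist(): %.3f s\n", time_dt))
}
```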
Summary:
@graemeblair suggested I post an issue to make my intent clear and see whether anyone has a strong objection, but I really think this is a great solution!