Vignettes update, closes #2952 #2954
vignettes/datatable-benchmarking.Rmd
Outdated
For convinience data.table automatically builds index on fields you are doing subset data. It will add some overhead to first subset on particular fields but greatly reduce time to query those columns in subsequent runs. When measuring speed best way is to measure index creation and query using index separately. Having such timings it is easy to decide what is the optimal strategy for your use case.
To control usage of index use following options (see `?datatable.optimize` for more details):
For convinience `data.table` automatically builds index on fields you are doing subset data. It will add some overhead to first subset on particular fields but greatly reduce time to query those columns in subsequent runs. When measuring speed best way is to measure index creation and query using index separately. Having such timings it is easy to decide what is the optimal strategy for your use case.
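For reference, the separate measurements the paragraph recommends could look roughly like the sketch below. It is only an illustration, not the vignette's code: the table `DT`, its columns `id` and `v`, and the row count are made up, and it assumes the `datatable.use.index` option described in `?datatable.optimize` is what disables index use for the comparison run.

```r
library(data.table)

# Hypothetical data: size and column names are illustrative only.
set.seed(1)
n  <- 1e6L
DT <- data.table(id = sample(n %/% 10L, n, replace = TRUE), v = rnorm(n))

# 1. Time index creation on its own (explicitly, instead of relying on
#    the automatic index built during the first subset).
t_index <- system.time(setindex(DT, id))

# 2. Time the query that can use the existing index.
t_query <- system.time(res_indexed <- DT[id == 42L])

# 3. For comparison, time the same query with index use switched off.
old <- options(datatable.use.index = FALSE)
t_scan <- system.time(res_scan <- DT[id == 42L])
options(old)  # restore the previous setting

indices(DT)   # lists the indices currently set on DT
rbind(index = t_index, query = t_query, scan = t_scan)[, "elapsed"]
```

Building the index with `setindex()` up front keeps its cost out of the query timing, so the two numbers can be weighed against each other for a given workload.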
grammar:
- convinience -> convenience
- on fields you are doing subset data -> on fields you use to subset data
Push to branch please :)
Some other possible small grammar improvements:
builds index => builds an index
greatly reduce time => greatly reduces time
speed best way => speed, the best way
using index separately => using an index separately
This is great detail w.r.t. data.table, but high-level general benchmarking principles need introducing first, like the pitfalls of
Codecov Report
@@           Coverage Diff           @@
##           master    #2954   +/-   ##
=======================================
  Coverage   90.85%   90.85%
=======================================
  Files          61       61
  Lines       11736    11736
=======================================
  Hits        10663    10663
  Misses       1073     1073
closes #2952