Vignettes update, closes #2952 (pull request #2954)

Merged: mattdowle merged 5 commits into master from vignettes-update on Sep 4, 2018

jangorecki (Member)

closes #2952

jangorecki changed the title from "Vignettes update" to "Vignettes update, closes #2952" on Jun 24, 2018

To control usage of index use following options (see `?datatable.optimize` for more details):
For convinience `data.table` automatically builds index on fields you are doing subset data. It will add some overhead to first subset on particular fields but greatly reduce time to query those columns in subsequent runs. When measuring speed best way is to measure index creation and query using index separately. Having such timings it is easy to decide what is the optimal strategy for your use case.
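A minimal R sketch of the measurement described above, assuming a synthetic 10-million-row table (the data, the column names `id`/`val` and the lookup value `42L` are made up for illustration). It times a plain vector scan with auto-indexing and index use switched off through the `datatable.auto.index` and `datatable.use.index` options, then index creation alone with `setindex()`, then the same subset using that index, so each cost is reported separately. Actual timings will depend on data size and hardware.

```r
library(data.table)

# Hypothetical test data: 10 million rows; `id` is the column we subset on.
set.seed(42)
n  <- 1e7L
DT <- data.table(id = sample(1e5L, n, replace = TRUE), val = runif(n))

# 1. Baseline: plain vector scan, no index built or used.
op <- options(datatable.auto.index = FALSE, datatable.use.index = FALSE)
t_scan <- system.time(r_scan <- DT[id == 42L])
options(op)  # restore the previous option values

# 2. Index creation, timed on its own.
t_index <- system.time(setindex(DT, id))

# 3. The same subset, now able to use the existing index on `id`.
t_query <- system.time(r_index <- DT[id == 42L])

indices(DT)                # "id"
fsetequal(r_scan, r_index) # TRUE: both routes return the same rows

# Elapsed seconds for each step.
c(scan  = t_scan[["elapsed"]],
  index = t_index[["elapsed"]],
  query = t_query[["elapsed"]])
```

If the same column is queried many times, the one-off `index` cost is paid once while the `query` cost is paid on every lookup, which is what makes the separate timings useful for choosing a strategy.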

Member

grammar:

  • convenience
  • on fields you are doing subset data -> on fields you use to subset data

Member Author

Push to branch please :)

Some other possible small grammar improvements:

  • builds index => builds an index
  • greatly reduce time => greatly reduces time
  • speed best way => speed, the best way
  • using index separately => using an index separately

jangorecki added this to the 1.11.6 milestone on Jul 4, 2018
mattdowle (Member)

This is great detail w.r.t. data.table, but high-level, general benchmarking principles need introducing first, like the pitfalls of microbenchmark() on small data, call overhead, etc. Follow-up issue: #3028.
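A rough illustration of the small-data pitfall, assuming hypothetical `small` and `big` tables: on a 10-row table the average time per `[.data.table` call is likely to be mostly fixed call overhead rather than work proportional to the data, so such a benchmark says little about behaviour at scale. Base `system.time()` around a loop stands in for `microbenchmark()` here so the sketch needs no extra package, and auto-indexing is switched off so the large-table timing is a plain scan rather than scan plus index build.

```r
library(data.table)

# Plain vector scans only: no index is built as a side effect of the timings below.
op <- options(datatable.auto.index = FALSE)

# Hypothetical tiny table: per-call time is dominated by the cost of calling `[`.
small <- data.table(id = 1:10, val = runif(10))
reps  <- 10000L
per_call_small <- system.time(
  for (i in seq_len(reps)) small[id == 5L]
)[["elapsed"]] / reps

# Hypothetical large table: a single call, where scanning the data dominates.
big <- data.table(id = sample(1e5L, 1e7L, replace = TRUE), val = runif(1e7L))
one_call_big <- system.time(big[id == 5L])[["elapsed"]]

options(op)  # restore the previous setting

# Average seconds per call on small data vs. one call on large data.
c(per_call_small = per_call_small, one_call_big = one_call_big)
```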

codecov bot commented on Sep 4, 2018

Codecov Report

Merging #2954 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #2954   +/-   ##
=======================================
  Coverage   90.85%   90.85%           
=======================================
  Files          61       61           
  Lines       11736    11736           
=======================================
  Hits        10663    10663           
  Misses       1073     1073

Last update b654303...179d154.

mattdowle merged commit 5d01041 into master on Sep 4, 2018
mattdowle deleted the vignettes-update branch on September 4, 2018 01:03

Successfully merging this pull request may close these issues.

Update _keys_ and _indices_ vignette for subset with index on multiple fields
4 participants