New vignette -- Usages of .SD #3572
I haven't gone through yet but looks like a comprehensive guide on using
Haven't looked either yet, but just on the png size: removing the -g compiler flag saved 1MB recently (.so reduced from 1.5MB to 0.5MB), so the package size is now approx. 4MB of the 5MB limit. 12KB is not an issue (1.2% of remaining). The
Looks great!
…d warning about package depending on R 3.5+
Codecov Report
@@ Coverage Diff @@
## master #3572 +/- ##
=======================================
Coverage 97.58% 97.58%
=======================================
Files 66 66
Lines 12695 12695
=======================================
Hits 12389 12389
Misses 306 306
For future reference ...
Why not use csv.gz instead of RData? That would avoid the risk that the vignette can only be built on newer R because of format incompatibility.
This vignette will explain the most common ways to use the `.SD` variable in your `data.table` analyses. It is an adaptation of [this answer](https://stackoverflow.com/a/47406952/3576984) given on StackOverflow.

# What is `.SD`?
https://rdatatable.gitlab.io/data.table/library/data.table/doc/datatable-sd-usage.html
Rendered from R, this results in `What is <code>.SD</code>?` as the tab name in the browser; maybe better to remove the code formatting and leave .SD as plain text.
Pitching[ , coef(lm(ERA ~ ., data = .SD))['W'], .SDcols = c('W', rhs)]
})
barplot(lm_coef, names.arg = sapply(models, paste, collapse = '/'),
main = 'Wins Coefficient\nWiith Various Covariates',
Wiith
double i
## Conditional Joins
`data.table` syntax is beautiful for its simplicity and robustness. The syntax `x[i]` flexibly handles two common approaches to subsetting. When `i` is a `logical` vector, `x[i]` returns the rows of `x` where `i` is `TRUE`. When `i` is _another `data.table`_, a (right) `join` is performed: in the plain form, using the `key`s of `x` and `i`; otherwise, when `on = ` is specified, using matches of those columns.
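
As a rough illustration of these forms of `x[i]`, here is a minimal sketch on two toy tables. The table and column names (`x`, `i`, `id`, `v`, `w`) are made up for this example and do not come from the vignette's data; the character-`i` line is included only as a third, related case.

```r
library(data.table)

# Toy tables; all names here are illustrative, not from the vignette
x = data.table(id = c('a', 'b', 'c'), v = 1:3, key = 'id')
i = data.table(id = c('b', 'c', 'd'), w = c(10, 20, 30))

# Logical i: keep the rows of x where the condition is TRUE
x_big = x[v > 1L]

# data.table i: right join, one output row per row of i;
# rows of i with no match in x get NA in x's columns
x_join = x[i, on = 'id']

# Character i on a keyed table: looked up against the key of x
x_key = x['b']
```

Here `x_join` keeps the unmatched `id == 'd'` row from `i`, which is what makes the join a right join.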
There is also the case of `DT["someid"]`.
Note that this approach can of course be combined with `.SDcols` to return only portions of the `data.table` for each `.SD` (with the caveat that `.SDcols` should be fixed across the various subsets).
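
A minimal sketch of combining a per-group `.SD` subset with `.SDcols`, assuming the built-in `mtcars` data rather than the vignette's own tables; the choice of columns is purely illustrative.

```r
library(data.table)

# mtcars ships with R; the column choices below are illustrative
cars = as.data.table(mtcars, keep.rownames = 'model')

# For each cyl group, keep the row with the highest mpg,
# carrying only the columns named in .SDcols into .SD
best = cars[ , .SD[which.max(mpg)], by = cyl,
            .SDcols = c('model', 'mpg', 'hp')]
```

The result has one row per `cyl` group, with the grouping column followed by the three `.SDcols` columns.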
_NB_: `.SD[1L]` is currently optimized by [_`GForce`_](https://jangorecki.gitlab.io/data.table/library/data.table/html/datatable-optimize.html) ([see also](https://stackoverflow.com/questions/22137591/about-gforce-in-data-table-1-9-2)), `data.table` internals which massively speed up the most common grouped operations like `sum` or `mean`. See `?GForce` for more details, and keep an eye on (or voice support for) these feature improvement requests for updates on this front: [1](https://github.com/Rdatatable/data.table/issues/735), [2](https://github.com/Rdatatable/data.table/issues/2778), [3](https://github.com/Rdatatable/data.table/issues/523), [4](https://github.com/Rdatatable/data.table/issues/971), [5](https://github.com/Rdatatable/data.table/issues/1197), [6](https://github.com/Rdatatable/data.table/issues/1414)
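
For concreteness, the grouped calls discussed in this note look roughly like the sketch below. The toy table `DT` and its columns are hypothetical, and the example makes no claim about which internal optimization a given `data.table` version applies.

```r
library(data.table)

# Hypothetical toy table for illustration
DT = data.table(g = c('a', 'a', 'b', 'b', 'b'), v = c(5, 3, 8, 1, 4))

# First row of each group
first_rows = DT[ , .SD[1L], by = g]

# Grouped mean, one of the common grouped operations named above
means = DT[ , mean(v), by = g]

# Passing verbose = TRUE prints diagnostic messages about how j was
# evaluated; the exact wording depends on the installed version
# DT[ , .SD[1L], by = g, verbose = TRUE]
```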
use Rdatatable namespace instead of jangorecki: https://Rdatatable.gitlab.io/data.table/library/data.table/html/datatable-optimize.html
Closes #3412
Not sure if we should track the png on GH or not