Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Created CJ.dt - for cross joining two data.tables #814

Closed
wants to merge 1 commit into from
Closed

Created CJ.dt - for cross joining two data.tables #814

wants to merge 1 commit into from

Conversation

stephlocke
Copy link

Done as separate code on the assumption that it can later be deprecated when better code is integrated into CJ to handle two data.tables

Was initially going to be a helper function in optiRum but getting it into data.table is much more preferable. I've not included any unit tests as the difference is significant but please see these testthat unit tests for it I've written for the function

Done as separate code on the assumption that it can later be deprecated when better code is integrated into CJ to handle two data.tables
@eantonya
Copy link
Contributor

The sorting of your function does not match that of CJ itself.

On Tue, Sep 16, 2014 at 2:50 PM, stephlocke notifications@github.com
wrote:

Done as separate code on the assumption that it can later be deprecated
when better code is integrated into CJ to handle two data.tables

Was initially going to be a helper function in optiRum but getting it into
data.table is much more preferable. I've not included any unit tests as the
difference is significant but please see these testthat unit tests for it
https://github.com/stephlocke/optiRum/blob/master/tests/testthat/test-CJ.dt.R

I've written for the function

You can merge this Pull Request by running

git pull https://github.com/stephlocke/data.table master

Or view, comment on, or merge it at:

#814
Commit Summary

  • Create CJ.dt

File Changes

Patch Links:


Reply to this email directly or view it on GitHub
#814.

@eantonya
Copy link
Contributor

Here's a suggestion for how to do it using CJ, that can take an arbitrary
number of data.tables or vectors:

CJ.dt = function(...) {
rows = do.call(CJ, lapply(list(...), function(x) if(is.data.frame(x)) seq_len(nrow(x)) else seq_along(x)));
do.call(data.table, Map(function(x, y) x[y], list(...), rows))
}

(the subset part above can almost certainly be optimized)

On Tue, Sep 16, 2014 at 3:09 PM, Eduard Antonyan eduard.antonyan@gmail.com
wrote:

The sorting of your function does not match that of CJ itself.

On Tue, Sep 16, 2014 at 2:50 PM, stephlocke notifications@github.com
wrote:

Done as separate code on the assumption that it can later be deprecated
when better code is integrated into CJ to handle two data.tables

Was initially going to be a helper function in optiRum but getting it
into data.table is much more preferable. I've not included any unit tests
as the difference is significant but please see these testthat unit
tests for it
https://github.com/stephlocke/optiRum/blob/master/tests/testthat/test-CJ.dt.R

I've written for the function

You can merge this Pull Request by running

git pull https://github.com/stephlocke/data.table master

Or view, comment on, or merge it at:

#814
Commit Summary

  • Create CJ.dt

File Changes

Patch Links:


Reply to this email directly or view it on GitHub
#814.

@stephlocke
Copy link
Author

You're right it doesn't, it does not do any sorting and leaves data in the order of that the data.tables are already in.

Totally appreciate that the behaviour of CJ is to apply sorted=TRUE for backwards compatibility - it could do some sort of default sort, or the sorted argument could be made optional but with a default if/when CJ.dt was integrated?

@stephlocke
Copy link
Author

Just saw your suggestion - yes, making a more generic function is a much better idea 👍

@skanskan
Copy link

R documentation speaks about CJ.dt
https://www.rdocumentation.org/packages/optiRum/versions/0.40.1/topics/CJ.dt

But when I try o use it I get the error

Error in CJ.dt(z1, z2) : could not find function "CJ.dt"

@MichaelChirico
Copy link
Member

@skanskan I guess this is filed on the wrong issue tracker?

https://github.com/lockedata/optiRum/issues

I see CJ.dt in the NAMESPACE so maybe install from dev?

@stephlocke
Copy link
Author

Yes - I implemented this in optiRum initially and look forward to it being deprecated one day :)

@skanskan
Copy link

Anyway I'm getting the result I want with
myDT[,myvector, by=c("all columns")]
It works well if my vector is constant.

@MichaelChirico
Copy link
Member

Hmm I guess this is #1717

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants