Fixed duplicate column names in `merge()` when by.x in names(y) #2631
|
Also added an equivalent warning message to base for cases when duplicate column names are returned. With the patch above, the only way this can happen is if the user supplies identical values to the suffixes argument (e.g. |
| joined = merge(parents, children, by.x="name", by.y="parent") | ||
| test(1877.1, length(names(joined)), length(unique(names(joined)))) | ||
| test(1877.2, merge(parents, children, by.x="name", by.y="parent", suffixes=c("",""), | ||
| warning = "column names 'name', 'sex', 'age' are duplicated in the result"), |
HughParsonage
Feb 17, 2018
Member
Trailing , should be removed.
|
RE: failed checks: I can't work out how to write the test that checks for the appropriate warning message - would appreciate advice here since there's no documentation for data.table:::test(). I also can't run the tests on my local machine because the package fails to compile (I don't have OpenMP). |
|
Looks good. On |
This test now fails: the output of merge.data.table no longer matches the output of merge.data.frame, because base::merge.data.frame still produces duplicate column names when by.x is in names(y).
Codecov Report
@@            Coverage Diff             @@
##           master    #2631      +/-   ##
==========================================
+ Coverage   93.13%   93.14%   +<.01%
==========================================
  Files          61       61
  Lines       12120    12130      +10
==========================================
+ Hits        11288    11298      +10
  Misses        832      832
|
|
Thanks @mattdowle, I couldn't get the package to build using the RStudio package build system, but I probably have something misconfigured in my environment. I was able to get the tests to work with a bit of trial and error. |
|
I added a little bit to the Contributing wiki to make it a bit easier to understand how the testing regime works. Thanks for the PR. |
|
I love it, although it breaks consistency with |
|
@sritchie73 I assume the RStudio issue is the same as #2585. I've been using the command line to re-install lately. |
|
Thanks @MarkusBonsch, for what it's worth I've also sent a similar patch to the R-devel mailing list for merge.data.frame, but the two people who responded (neither on the core dev team) were pessimistic about my chances of getting the patch accepted. @MichaelChirico I get a different error:
In the end I managed to get data.table:::test() working almost the same way in my local R session, by copying the file preamble, then fixed the remaining issues that came up in Travis after pushing the changes out to my branch. |
|
If this PR makes merge.data.table less consistent with the data.frame method, then please describe that in the manual. You can also link your R-devel patch there so that anyone can track it later on if it happens to be merged into R-devel. |
|
Here is the thread on R-devel: http://r.789695.n4.nabble.com/Duplicate-column-names-created-by-base-merge-when-by-x-has-the-same-name-as-a-column-in-y-td4748345.html There is now a suggestion of just adding the suffix to the column name in y to keep backwards compatibility (i.e. any by.x column can still be referred to by its original name). If that patch is accepted I can similarly update merge.data.table to have this behaviour also. |
|
LGTM. You sometimes use |
|
Thanks @jangorecki - I had used |
|
Let's wait for R-devel; paste is not that important. |
|
Given Martin Maechler's reply today, it's looking likely to be accepted for R. Wow! Well navigated! I updated the manual page accordingly (including a link to the thread too, as Jan suggested) and will merge. |
|
Thanks! I am following up with Martin to clarify the functionality of the proposed |
When joining two `data.tables` using `merge()`, the resulting `data.table` will contain duplicate column names if `by.x != by.y` and `by.x` is also in `names(y)`. An example:
Output:
This behaviour is also present in `base:::merge.data.frame()`, but throws an additional warning:
This patch fixes this problem by checking for names shared between `by.x` and `names(y)`, and adding the appropriate `suffixes` to those column names.
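The name check described above can be sketched in plain base R, working on column names only. This is a minimal illustration and not the code from this patch: `merged_names()` is a hypothetical helper, the column names for `parents` and `children` are assumptions inferred from the warning text in test 1877.2, and the sketch suffixes only the clashing column from `y`, which is the variant suggested in the R-devel thread rather than necessarily what this PR does.

```r
# Hypothetical helper (not part of data.table or base R): computes the column
# names a merge would produce, suffixing any column of y that clashes with by.x.
merged_names <- function(x_names, y_names, by.x, by.y,
                         suffixes = c(".x", ".y")) {
  x_rest <- setdiff(x_names, by.x)  # non-key columns of x
  y_rest <- setdiff(y_names, by.y)  # non-key columns of y

  # Usual rule: non-key names present in both inputs get one suffix each
  both <- intersect(x_rest, y_rest)
  x_rest[x_rest %in% both] <- paste0(x_rest[x_rest %in% both], suffixes[1L])
  y_rest[y_rest %in% both] <- paste0(y_rest[y_rest %in% both], suffixes[2L])

  # Extra check: a non-key column of y that shares its name with a by.x column
  # (assumption: only the y side is suffixed, so by.x keeps its original name)
  clash <- y_rest %in% by.x
  y_rest[clash] <- paste0(y_rest[clash], suffixes[2L])

  c(by.x, x_rest, y_rest)
}

# Assumed column layouts for the tables used in the tests
parents_cols  <- c("name", "sex", "age")
children_cols <- c("parent", "name", "sex", "age")

fixed <- merged_names(parents_cols, children_cols, by.x = "name", by.y = "parent")
print(fixed)

# Identical suffixes are the remaining way to end up with duplicates
same <- merged_names(parents_cols, children_cols, by.x = "name", by.y = "parent",
                     suffixes = c("", ""))
print(unique(same[duplicated(same)]))
```

With the default suffixes every resulting name is unique, while `suffixes = c("", "")` leaves `name`, `sex` and `age` duplicated, which is the case the new warning is meant to report.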