Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

non-equi join returning incorrect column name? #1807

Closed
SymbolixAU opened this issue Aug 11, 2016 · 6 comments
Closed

non-equi join returning incorrect column name? #1807

SymbolixAU opened this issue Aug 11, 2016 · 6 comments

Comments

@SymbolixAU
Copy link

SymbolixAU commented Aug 11, 2016

When performing a non-equi join (in v1.9.7) on columns with different names, the returned column is given the name of the table on the left of the join, but contains values from the table on the right of the join - is this by design?

## baseline example where the join columns have the same name
library(data.table)

dt1 <- data.table(id = c(1,2,3),
                val = c(1,2,3))

dt2 <- data.table(id = c(1,2,3),
                val = c(2,3,4))

## the returned 'val' column comes from the table inside [ 
## - shown by the val == 4 (as it only exists in dt2)
dt1[ dt2, on = .(val < val), nomatch = 0]
#    id val i.id
#1:  1   2    1
#2:  1   3    2
#3:  2   3    2
#4:  1   4    3
#5:  2   4    3
#6:  3   4    3


## When join columns have different names,
## the returned column name is from the column outside [
dt1 <- data.table(id = c(1,2,3),
                val1 = c(1,2,3))

dt2 <- data.table(id = c(1,2,3),
                val2 = c(2,3,4))

dt1[ dt2, on = .(val1 < val2), nomatch = 0]

#   id val1 i.id
#1:  1    2    1
#2:  1    3    2
#3:  2    3    2
#4:  1    4    3
#5:  2    4    3
#6:  3    4    3
@franknarf1
Copy link
Contributor

franknarf1 commented Aug 11, 2016

the returned column is given the name of the table on the left of the join, but contains values from the table on the right of the join

This matches the behavior for rolling joins, so it's not new with non-equi joins:

library(data.table)
DT <- data.table(id = c(1,2,3), val = c(1,2,3))

DT[.(yada = 4), on=.(id = yada), roll=TRUE]
#     id val
#  1:  4   3

It makes sense if you read x[i] like i selecting a subset of x -- we want x's column names, but we want to use i's values.

Edit: By the way, when names overlap, you can use i.name or x.name. I think this was added recently.

@SymbolixAU
Copy link
Author

SymbolixAU commented Aug 11, 2016

Ah yes, I see that now. That leads to the question as to whether it should do it this way?

I see that from a data.table-way of doing it it makes sense, but from a human-readable way, maybe less so?

@jangorecki
Copy link
Member

This is a matter of being consistent to base R merge, related issue #1615. For base R it was not a problem as there was no non-equi merge. Before allowing x. there was no way to get those column, so at least that is now possible. I would say preferred way is to use x., on eventual change in column naming output (#1700), this one should not be affected.

@arunsrinivasan
Copy link
Member

I see that from a data.table-way of doing it it makes sense, but from a human-readable way, maybe less so?

The introduction of non-equi joins certainly brings the need to revisit this. Perhaps returning both x. and i. cols for non-equi joins (when used without by=.EACHI) would be a start.

@eantonya
Copy link
Contributor

eantonya commented Aug 15, 2016

Unless I'm missing smth, this is same as #1700, and imo non-equi joins just shine a bright light on this issue which has been bothering me since I started using rolling joins. I have to experiment with 2-3 combinations every time when using the rolling joins before I finally get the right one due to the confusing column naming.

@jangorecki
Copy link
Member

agree with @eantonya, closing as duplicate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants