Revisit allow.cartesian #1123

Closed
arunsrinivasan opened this issue Apr 21, 2015 · 1 comment

Comments

@arunsrinivasan
Member

Just came across some inconsistencies in allow.cartesian:

require(data.table) # v1.9.5, commit 1813
x = data.table(a=rep(1:2, each=2), b=10, key="a")
#    a  b
#1: 1 10
#2: 1 10
#3: 2 10
#4: 2 10
y = data.table(a=rep(1L, 4), b=5:6, key="a")
#    a b
#1: 1 5
#2: 1 6
#3: 1 5
#4: 1 6

y[x]
# Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__,  : 
# Join results in 10 rows; more than 8 = nrow(x)+nrow(i). Check for duplicate key values in i ...

y[x, nomatch=0L]
#    a b i.b
#1: 1 5  10
#2: 1 6  10
#3: 1 5  10
#4: 1 6  10
#5: 1 5  10
#6: 1 6  10
#7: 1 5  10
#8: 1 6  10

?data.table explains allow.cartesian as:

FALSE prevents joins that would result in more than max(nrow(x),nrow(i)) rows.

Both joins result in more than max(nrow(x), nrow(i)) = 4 rows: nomatch=NA gives 10 and nomatch=0L gives 8. So why does the second one work? And why does the error message compare the join size against nrow(x) + nrow(i) rather than the documented max(nrow(x), nrow(i))?
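For reference, the row counts above can be reproduced with plain arithmetic, without running the join. This is only a rough sketch in base R: the variable names are made up for illustration, and the assumption that the check is "more than nrow(x) + nrow(i)" is taken from the wording of the error message, not from reading the data.table source.

```r
# Key columns of the two tables in the example above.
i_key = rep(1:2, each = 2)   # key of x, which is used as i in y[x]
x_key = rep(1L, 4)           # key of y, which is the table being joined to

# Number of rows in y that match each row of i.
matches = vapply(i_key, function(k) sum(x_key == k), integer(1))

# nomatch=NA keeps one NA row for every unmatched row of i;
# nomatch=0L drops unmatched rows entirely.
rows_nomatch_na = sum(pmax(matches, 1L))
rows_nomatch_0  = sum(matches)

# Limit as worded in the error message vs. limit as worded in ?data.table.
limit_in_error = length(x_key) + length(i_key)
limit_in_docs  = max(length(x_key), length(i_key))

print(c(rows_nomatch_na, rows_nomatch_0, limit_in_error, limit_in_docs))
```

Under that assumption, the nomatch=0L result has exactly nrow(x) + nrow(i) rows, so it is not "more than" the limit in the error message, even though it exceeds the limit stated in the documentation.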

Additionally, if we are to rename allow.cartesian to allow.i.dups (#914), then the error should occur irrespective of the number of rows, depending only on whether i has duplicates in its key columns.

@jangorecki
Member

Btw, self-joins allow a cartesian result:

x[x]
#    a  b i.b
# 1: 1 10  10
# 2: 1 10  10
# 3: 1 10  10
# 4: 1 10  10
# 5: 2 10  10
# 6: 2 10  10
# 7: 2 10  10
# 8: 2 10  10
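The same counting applied to the self-join may explain why it goes through. This is a sketch only: join_rows is a hypothetical helper written for this comment (it is not a data.table function), and the nrow(x) + nrow(i) limit is again assumed from the error message quoted earlier.

```r
# Hypothetical helper: number of rows a keyed join x[i] would return,
# computed from the key columns alone.
join_rows = function(i_key, x_key, nomatch = NA) {
  matches = vapply(i_key, function(k) sum(x_key == k), integer(1))
  # nomatch=NA keeps one row per unmatched i row; nomatch=0L drops them.
  if (is.na(nomatch)) sum(pmax(matches, 1L)) else sum(matches)
}

a = rep(1:2, each = 2)       # key column of x in the example
n = join_rows(a, a)          # x[x]: each of the 4 rows matches 2 rows
limit = length(a) + length(a)

print(n)
print(n > limit)
```

Here the join size equals the assumed limit instead of exceeding it, which would be consistent with x[x] returning 8 rows without an error.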

arunsrinivasan added this to the v1.9.6 milestone Sep 17, 2015
arunsrinivasan self-assigned this Sep 17, 2015