Invalid row order after join #1766

Closed
mllg opened this Issue Jul 7, 2016 · 2 comments

Contributor

mllg commented Jul 7, 2016

MWE:

library(data.table)

A = data.table(i = 1:6, j = rep(1:2, 3), x = letters[1:6], key = "i")
B = data.table(j = 1:2, y = letters[1:2], key = "j")

res = A[B, on = "j"]

res still has key i, but its rows are not ordered by i:

   i j x y
1: 1 1 a a
2: 3 1 c a
3: 5 1 e a
4: 2 2 b b
5: 4 2 d b
6: 6 2 f b

Setting the key manually (setkeyv(res, "i")) yields a warning.

Session info:

R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux

locale:
 [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C               LC_TIME=de_DE.UTF-8        LC_COLLATE=C               LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=de_DE.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.9.6 nvimcom_0.9-19   setwidth_1.0-4

loaded via a namespace (and not attached):
[1] tools_3.3.1    parallel_3.3.1 chron_2.3-47

Contributor

mllg commented Jul 7, 2016

Side note: you can still subset the resulting table by key, but the lookup on i does not find the matching rows. E.g.,

res[.(2), nomatch = 0]

returns a data.table with 0 rows.

@jangorecki jangorecki added the bug label Jul 7, 2016

Member

jangorecki commented Jul 7, 2016

Thanks for reporting. Reproducible on 1.9.7. The actual warning:

In setkeyv(x, cols, verbose = verbose, physical = physical) :
  Already keyed by this key but had invalid row order, key rebuilt. If you didn't go under the hood please let datatable-help know so the root cause can be fixed.

Looks like the key is not removed during the join; it should be removed. Optionally, the result could have its key set on the j column, matching the key of B.
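Until a fix lands, a possible workaround is to check whether the key on the join result is stale and, if so, drop it before re-keying. This is only a sketch based on the diagnosis above, not the maintainers' fix; the variable name stale is made up for illustration, and it assumes a single-column key as in the MWE.

```r
library(data.table)

A = data.table(i = 1:6, j = rep(1:2, 3), x = letters[1:6], key = "i")
B = data.table(j = 1:2, y = letters[1:2], key = "j")
res = A[B, on = "j"]

# A key asserts that rows are sorted by the key column; check whether that holds.
# (Assumption: single-column key, so only the first key column is inspected.)
stale = haskey(res) && is.unsorted(res[[key(res)[1L]]])

# If the key is stale, remove the "sorted" attribute so that setkeyv()
# performs a regular sort instead of finding an inconsistent existing key.
if (stale) setattr(res, "sorted", NULL)
setkeyv(res, "i")

# Keyed lookup on i now finds the matching row.
res[.(2), nomatch = 0]
```

On versions where the join already drops or resets the key, stale is FALSE and the code simply sets the key on i.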

@jangorecki jangorecki added this to the v1.9.8 milestone Jul 7, 2016

@arunsrinivasan arunsrinivasan self-assigned this Jul 21, 2016
