Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multi-column non-equi join produces invalid results #2360

ethanbsmith opened this issue Sep 15, 2017 · 4 comments


Copy link

@ethanbsmith ethanbsmith commented Sep 15, 2017

I'm trying to do self non-equi join on a date and a numeric column. The single column version of both non-equi joins produce the expected results, but when i try both columns in the on clause, i get some
very funky output.

Sample script to reproduce the problem.

    d <- data.table(index = as.Date(c('2017-08-25', '2017-08-28', '2017-08-29', '2017-08-30', '2017-08-31', '2017-09-01', '2017-09-05', '2017-09-06', '2017-09-07', '2017-09-08', '2017-09-11', '2017-09-12', '2017-09-13')),
        High = c(52.85, 51.81, 51.86, 52.29, 52.59, 52.77, 51.9, 50.77, 50.69, 50.45, 50.67, 51.07, 51.105),
        FwdHi = c(51.81, 51.86, 52.29, 51.9, 50.77, 50.69, 50.45, 50.45, 50.45, 50.67, NA, NA, NA))
    setkey(d, index)
    d[d, .(FirstDate = min(x.index)), on = .(index > index), by = .EACHI]
    d[d, .(FirstDate = min(x.index)), on = .(FwdHi > High), by = .EACHI]
    d[d, .(FirstDate = min(x.index)), on = .(index > index, FwdHi > High), by = .EACHI]

The final line produces an FirstDate = "1970-01-1"


This comment has been minimized.

Copy link

@franknarf1 franknarf1 commented Sep 15, 2017

Quite odd. Also, if we zoom in on that row...

d[d, if (.GRP == 4) x.index, on = .(index > index, FwdHi > High), by = .EACHI]
#         index FwdHi         V1
# 1: 2017-08-30 52.29 1970-01-01
# same result OP sees

d[d[4], x.index, on = .(index > index, FwdHi > High), by = .EACHI]
#         index FwdHi x.index
# 1: 2017-08-30 52.29    <NA>
# correct result

Somehow the inclusion of other groups in the join affects the values in this group. I think it's similar to #2275


This comment has been minimized.

Copy link

@ethanbsmith ethanbsmith commented Sep 17, 2017

I have been doing some work on a workaround for this issues, by chaining d[...][...][...] the output of each condition into a discrete expression of the next. This produces correct output.

My real world case is more complex than the example above and i have been testing various combination of chaining schemes. What is interesting is that when i mix data-types on conditions. the processing is much slower than chaining those same conditions. eg:

d[d, on = .(index > index, FwdHi > High)]
where processing is on both date and numerics, is much slower than:
d[d, on = .(index > index)[FwdHi > High)]

I figured i'd note this, just in case it helps identify the source of the problem

@ethanbsmith ethanbsmith changed the title multi-column non-equi join produces invalid data multi-column non-equi join produces invalid results Sep 18, 2017
@arunsrinivasan arunsrinivasan added this to the v1.10.6 milestone Nov 2, 2017
arunsrinivasan added a commit that referenced this issue Nov 3, 2017
@arunsrinivasan arunsrinivasan added the bug label Nov 3, 2017

This comment has been minimized.

Copy link

@arunsrinivasan arunsrinivasan commented Nov 3, 2017

Thanks for the nice report. Fixed in PR. Should be merged by Matt ASAP.


This comment has been minimized.

Copy link

@mattdowle mattdowle commented Nov 8, 2017

Closed by Arun's PR #2461 (the "closes #2461" needs to appear in the PR's first comment).

@mattdowle mattdowle closed this Nov 8, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet
4 participants
You can’t perform that action at this time.