How is order determined in the result of a non-equi join? #1991
Comments
I cannot explain the logic, but the pattern is
Update SO post https://stackoverflow.com/a/47148117/559784
This question was originally posted on SO. Arun asked me to file an issue, so here we go. I'm not sure about the etiquette on the data.table GitHub, and whether an 'SO link only' is OK. Anyway, here's the question, verbatim from SO.
I'm trying to understand the underlying logic of how the result of a non-equi join in data.table is ordered within each level of the on-variable.
Just to make it clear from the start: I have no problem with the order itself, or with ordering the output in a desired way after the join. However, because I find the output from all other data.table operations highly consistent, I suspect there is an ordering pattern to be revealed in non-equi joins as well.
I will give two examples, where two different 'large' data sets are joined with a smaller one. I have tried to describe the most obvious patterns in the output, as well as instances where the pattern differs between the joins of the two data sets.
Non-equi join between the first large data set and the small one, on = .(y >= val):
The second 'large' data set:
Same non-equi join between the second large data set and the small one:
Can anyone explain the logic of (1) the order within each level of the on-variable, here especially within the second match, where the original order of the data isn't kept in the result, and (2) why the order between chunks within matches differs when the two different data sets are used?
An even smaller example, which makes it easier to track the re-ordering spotted by @franknarf1: in the result of the join, x is sorted by its join variable:
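One way to track this kind of re-ordering is sketched below with made-up data, not the data from the post. The names x, small, id, y and val are assumptions chosen for illustration; id simply records each row's original position in x so it can be followed through the join. No expected output is shown, since the ordering is exactly what is under question.

```r
library(data.table)

# Hypothetical data: `id` records each row's original position in `x`.
x <- data.table(id = 1:6, y = c(5L, 1L, 4L, 2L, 6L, 3L))
small <- data.table(val = c(2L, 4L))

# Non-equi join: for each row of `small`, match the rows of `x` with y >= val.
# In the result, column `y` holds the value taken from `small$val`, so `id`
# is the handle on which row of `x` each result row came from.
res <- x[small, on = .(y >= val)]

# Original row positions of `x`, in the order they appear within each match.
print(split(res$id, res$y))

# For comparison: the row positions of `x` when ordered by the join variable.
print(x[order(y), id])
```

Comparing the two printed orders shows whether, for this toy input, the rows within each match follow the original row order of x or the order of its join variable.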