Full sorting merge join, pt 1 #35796
Still need to investigate. UPD: it seems that the server just didn't have enough time to shut down.
This PR contains a bunch of changed and added tests, and lots of them are
bool type_equals
    = table_join->hasUsing() ? left_type->equals(*right_type) : removeNullable(left_type)->equals(*removeNullable(right_type));
Hm, does this mean we can't join, e.g., UInt32 and UInt64 columns?
Maybe we need to find the least common type (at least for numeric columns).
We add the conversion in previous steps, so UInt32 and UInt64 would work. This is just to double-check that the type conversions were added successfully.
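To illustrate the reply, here is a minimal sketch on a toy type model (`ToyType`, `leastSupertype`, `convertKeys`, and `joinKeyTypesMatch` are hypothetical names, not ClickHouse's `DataTypes` API). An earlier step casts both keys to the wider unsigned type, and the later check only confirms that this happened: for `USING` the types must be equal exactly, for `ON` they are compared with `Nullable` stripped, mirroring the reviewed line.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical toy model of a join key type: an unsigned integer of a given
// width, optionally wrapped in Nullable.
struct ToyType
{
    int bits;      // 8, 16, 32 or 64
    bool nullable; // true for Nullable(UIntN)

    bool equals(const ToyType & other) const
    {
        return bits == other.bits && nullable == other.nullable;
    }
};

ToyType removeNullableToy(ToyType t)
{
    t.nullable = false;
    return t;
}

// Least common type of two unsigned integer types: the wider one.
ToyType leastSupertype(const ToyType & a, const ToyType & b)
{
    return ToyType{std::max(a.bits, b.bits), a.nullable || b.nullable};
}

// Earlier step (sketch): widen both keys to the common width; each side keeps
// its own nullability.
void convertKeys(ToyType & left, ToyType & right)
{
    int common_bits = leastSupertype(left, right).bits;
    left.bits = common_bits;
    right.bits = common_bits;
}

// Later sanity check, mirroring `type_equals`: exact equality for USING,
// equality ignoring Nullable for ON.
bool joinKeyTypesMatch(const ToyType & left, const ToyType & right, bool has_using)
{
    return has_using ? left.equals(right)
                     : removeNullableToy(left).equals(removeNullableToy(right));
}
```

For example, `UInt32` and `UInt64` keys fail the check before `convertKeys` and pass it afterwards.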
}

/// Used just to get result header
void joinBlock(Block & block, std::shared_ptr<ExtraBlock> & /* not_processed */) override
We probably need to update the IJoin interface a little bit.
(later)
To have something like IJoin::transformHeader?
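A minimal sketch of what the suggested change could look like. `IJoin::transformHeader` is only proposed in this thread, so the interface below is hypothetical, and the header is modelled as a plain list of column names (`ToyHeader`, `IToyJoin`, `ToyJoin` are illustrative names). Instead of pushing an empty block through `joinBlock` just to learn the result header, the join computes the header directly.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a block header: just the column names.
using ToyHeader = std::vector<std::string>;

// Hypothetical interface: transformHeader() returns the output header
// without running joinBlock() on an empty block.
class IToyJoin
{
public:
    virtual ~IToyJoin() = default;
    virtual ToyHeader transformHeader(const ToyHeader & left_header) const = 0;
};

// Simplified join: output = left columns + right columns not already present.
class ToyJoin : public IToyJoin
{
public:
    explicit ToyJoin(ToyHeader right_columns_) : right_columns(std::move(right_columns_)) {}

    ToyHeader transformHeader(const ToyHeader & left_header) const override
    {
        ToyHeader result = left_header;
        for (const auto & name : right_columns)
            if (std::find(result.begin(), result.end(), name) == result.end())
                result.push_back(name);
        return result;
    }

private:
    ToyHeader right_columns;
};
```

With right columns `{"key", "r_value"}` and a left header `{"key", "l_value"}`, the sketch yields `{"key", "l_value", "r_value"}`.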
@@ -43,7 +64,8 @@ class IJoin

 /// StorageJoin/Dictionary is already filled. No need to call addJoinedBlock.
 /// Different query plan is used for such joins.
-virtual bool isFilled() const { return false; }
+virtual bool isFilled() const { return pipelineType() == JoinPipelineType::FilledRight; }
+virtual JoinPipelineType pipelineType() const { return JoinPipelineType::FillRightFirst; }
I like the JoinPipelineType enum, but could you add a comment?
There's already a comment at the enum class JoinPipelineType declaration, but I've added a short note here as well: https://github.com/vdimir/ClickHouse/blob/c262d4d2c7c3551d49d1b8d4f324e4a3a550698f/src/Interpreters/IJoin.h#L20-L39
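A sketch of how the enum and the derived `isFilled()` fit together. `FillRightFirst`, `FilledRight`, and the two `IJoin` method bodies come from the diff in this PR; the `YShaped` value is an assumption based on the name `joinPipelinesYShaped`, the comments paraphrase this thread rather than the linked header, and the `*Sketch` classes are hypothetical.

```cpp
#include <cassert>

enum class JoinPipelineType
{
    // The right stream is read first to build the join state; then the
    // left stream is processed against it.
    FillRightFirst,
    // The right side is already filled (StorageJoin/Dictionary), so only
    // the left stream is processed.
    FilledRight,
    // Assumed value: both streams are read together by a merging transform.
    YShaped,
};

class IJoinSketch
{
public:
    virtual ~IJoinSketch() = default;
    // As in the diff: isFilled() is derived from pipelineType().
    virtual bool isFilled() const { return pipelineType() == JoinPipelineType::FilledRight; }
    virtual JoinPipelineType pipelineType() const { return JoinPipelineType::FillRightFirst; }
};

// Keeps the default pipeline type.
class HashJoinSketch : public IJoinSketch {};

class StorageJoinSketch : public IJoinSketch
{
public:
    JoinPipelineType pipelineType() const override { return JoinPipelineType::FilledRight; }
};

class MergeJoinSketch : public IJoinSketch
{
public:
    JoinPipelineType pipelineType() const override { return JoinPipelineType::YShaped; }
};
```

One design benefit of deriving `isFilled()` this way is that a join type overrides a single method and the two properties cannot disagree.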
@@ -105,6 +115,14 @@ class QueryPipelineBuilder
    bool keep_left_read_in_order,
    Processors * collected_processors = nullptr);

static std::unique_ptr<QueryPipelineBuilder> joinPipelines2(
We need a better name here :)
Changed to QueryPipelineBuilder::joinPipelinesYShaped and QueryPipelineBuilder::joinPipelinesRightLeft.
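To make the two names concrete, here is a minimal sketch on in-memory key vectors. The function names and the `Rows`/`Matches` types are hypothetical; the real pipelines work on streams of blocks. "Right, then left" builds a hash table from the whole right input before streaming the left one. "Y-shaped" advances two key-sorted inputs together, like merging sorted lists, and emits the cross product of each equal-key run (ALL semantics).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

using Rows = std::vector<int>;                                    // join keys only
using Matches = std::vector<std::pair<std::size_t, std::size_t>>; // (left row, right row)

// Right, then left: consume the right input into a hash table, then probe it
// with each left row. Inputs need not be sorted.
Matches joinRightLeft(const Rows & left, const Rows & right)
{
    std::unordered_multimap<int, std::size_t> table;
    for (std::size_t r = 0; r < right.size(); ++r)
        table.emplace(right[r], r);

    Matches out;
    for (std::size_t l = 0; l < left.size(); ++l)
    {
        auto range = table.equal_range(left[l]);
        for (auto it = range.first; it != range.second; ++it)
            out.emplace_back(l, it->second);
    }
    // Hash-table iteration order is unspecified; sort for a deterministic result.
    std::sort(out.begin(), out.end());
    return out;
}

// Y-shaped: both inputs are sorted by key and consumed simultaneously.
Matches joinYShaped(const Rows & left, const Rows & right)
{
    Matches out;
    std::size_t l = 0, r = 0;
    while (l < left.size() && r < right.size())
    {
        if (left[l] < right[r])
            ++l;
        else if (right[r] < left[l])
            ++r;
        else
        {
            // Find the run of equal keys on each side and emit all pairs.
            int key = left[l];
            std::size_t l_end = l, r_end = r;
            while (l_end < left.size() && left[l_end] == key)
                ++l_end;
            while (r_end < right.size() && right[r_end] == key)
                ++r_end;
            for (std::size_t i = l; i < l_end; ++i)
                for (std::size_t j = r; j < r_end; ++j)
                    out.emplace_back(i, j);
            l = l_end;
            r = r_end;
        }
    }
    return out;
}
```

For `left = {1, 2, 2, 4}` and `right = {2, 2, 3, 4}` both functions return the same five pairs; only the Y-shaped variant requires sorted inputs, and it never materializes either side in full.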
LGTM mostly
@KochetovNicolai @yakov-olkhovskiy, let's merge?
Hi, I have a question about how this is supposed to work. I posted an issue here: #39542
if (!cursors[1]->cursor.isValid() && !cursors[1]->fullyCompleted())
    return Status(1);

if (auto result = handleAllJoinState())
Just curious: why is handleAllJoinState() called here instead of in allJoin(), while handleAnyJoinState() is called in anyJoin()?
To be honest, I don't remember exactly. Perhaps the handle*State functions rely on the code below (/// check if blocks are not intersecting at all) in different ways. It's also possible that there's no particular reason; I'd need to dive into the code to understand.
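For context on the check mentioned in the reply, a sketch of what "blocks are not intersecting at all" can mean when every block is sorted by the join key. `compareBlockRanges` is a hypothetical helper, not the function used in the PR. A nonzero result means every key of one block is smaller than every key of the other; since the other stream is sorted, the smaller block has no matches among the remaining rows. An inner join can skip it, while an outer join would emit it with defaults for the other side.

```cpp
#include <cassert>
#include <vector>

// Keys of one block, sorted ascending; assumed non-empty.
using SortedKeys = std::vector<int>;

// Returns -1 if all keys in `left` are smaller than all keys in `right`,
// 1 if all keys in `left` are greater, and 0 if the key ranges overlap,
// in which case the blocks must be merged row by row.
int compareBlockRanges(const SortedKeys & left, const SortedKeys & right)
{
    assert(!left.empty() && !right.empty());
    if (left.back() < right.front())
        return -1;
    if (right.back() < left.front())
        return 1;
    return 0;
}
```

Only the first and last keys of each block are inspected, so the check costs O(1) per block pair.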
{
    assert(state == nullptr);
    state = std::make_unique<AllJoinState>(left_cursor.cursor, lpos, right_cursor.cursor, rpos);
    state->addRange(0, left_cursor.getCurrent().clone(), lpos, lnum);
Shall we copy only the necessary rows from the left_cursor.getCurrent() and right_cursor.getCurrent() chunks?
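A sketch of the suggestion on a toy chunk type (`Column`, `Chunk`, and `cutRows` are hypothetical). In the PR's terms, the idea is to store a copy of rows `[lpos, lpos + lnum)` in the state instead of `getCurrent().clone()`, so the state holds only the rows it needs rather than the whole chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy chunk: a list of columns of equal length.
using Column = std::vector<int>;
using Chunk = std::vector<Column>;

// Copy only rows [start, start + length) of every column instead of
// cloning the whole chunk.
Chunk cutRows(const Chunk & chunk, std::size_t start, std::size_t length)
{
    Chunk result;
    result.reserve(chunk.size());
    for (const auto & column : chunk)
    {
        assert(start + length <= column.size());
        auto first = column.begin() + static_cast<std::ptrdiff_t>(start);
        auto last = first + static_cast<std::ptrdiff_t>(length);
        result.emplace_back(first, last);
    }
    return result;
}
```

For a chunk with columns `{10, 11, 12, 13}` and `{20, 21, 22, 23}`, `cutRows(chunk, 1, 2)` keeps `{11, 12}` and `{21, 22}`.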
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
hash algorithm.
Anti/Semi/Asof joins
insertFrom/insertDefault/... in internal loops (need benchmark)
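The benchmark item concerns per-row `insertFrom`-style calls inside the join's inner loops. Below is a sketch of one alternative that such a benchmark could compare against, on a toy column type (`ToyColumn`, `copyRowsOneByOne`, and `copyRowsBatched` are hypothetical; the method names only echo ClickHouse's `IColumn` ones, and this is not that API). Runs of consecutive selected rows are copied with one range insertion. Whether this is actually faster is what the benchmark would have to show.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy column offering single-row and range insertion.
struct ToyColumn
{
    std::vector<int> data;

    void insertFrom(const ToyColumn & src, std::size_t n) { data.push_back(src.data[n]); }

    void insertRangeFrom(const ToyColumn & src, std::size_t start, std::size_t length)
    {
        auto first = src.data.begin() + static_cast<std::ptrdiff_t>(start);
        data.insert(data.end(), first, first + static_cast<std::ptrdiff_t>(length));
    }
};

// Baseline: one call per selected row.
void copyRowsOneByOne(ToyColumn & dst, const ToyColumn & src, const std::vector<std::size_t> & rows)
{
    for (std::size_t row : rows)
        dst.insertFrom(src, row);
}

// Alternative: coalesce consecutive row indices into a single range copy.
// `rows` is assumed to be sorted ascending.
void copyRowsBatched(ToyColumn & dst, const ToyColumn & src, const std::vector<std::size_t> & rows)
{
    std::size_t i = 0;
    while (i < rows.size())
    {
        std::size_t j = i + 1;
        while (j < rows.size() && rows[j] == rows[j - 1] + 1)
            ++j;
        dst.insertRangeFrom(src, rows[i], j - i);
        i = j;
    }
}
```

Both functions produce the same column; for selected rows `{0, 1, 2, 4}` the batched version makes two range insertions instead of four single-row calls.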