-
Notifications
You must be signed in to change notification settings - Fork 13.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KAFKA-14209 : Rewrite self joins to use single state store 2/3 #12644
Conversation
final GraphNode parent = joinNode.parentNodes().stream().findFirst().get(); | ||
GraphNode left = null, right = null; | ||
for (final GraphNode child: parent.children()) { | ||
if (child instanceof WindowedStreamProcessorNode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current JoinNode might have other JoinNodes as siblings. We need to differentiate between the WindowedStreamProcessorNode
nodes that belong the current node versus others.
streams/src/main/java/org/apache/kafka/streams/kstream/internals/KStreamKStreamSelfJoin.java
Outdated
Show resolved
Hide resolved
I opened the follow-up ticket for improving runtime by doing a single-loop https://issues.apache.org/jira/browse/KAFKA-14251. |
streams/src/main/java/org/apache/kafka/streams/kstream/internals/KStreamKStreamSelfJoin.java
Show resolved
Hide resolved
b503a10
to
f66d80a
Compare
address comments
f66d80a
to
59ab634
Compare
.../main/java/org/apache/kafka/streams/kstream/internals/graph/WindowedStreamProcessorNode.java
Outdated
Show resolved
Hide resolved
streams/src/main/java/org/apache/kafka/streams/kstream/internals/InternalStreamsBuilder.java
Outdated
Show resolved
Hide resolved
...ams/src/main/java/org/apache/kafka/streams/kstream/internals/graph/StreamStreamJoinNode.java
Show resolved
Hide resolved
@@ -198,7 +207,8 @@ public <K, V1, V2, VOut> KStream<K, VOut> join(final KStream<K, V1> lhs, | |||
.withOtherWindowedStreamProcessorParameters(otherWindowStreamProcessorParams) | |||
.withOuterJoinWindowStoreBuilder(outerJoinWindowStore) | |||
.withValueJoiner(joiner) | |||
.withNodeName(joinMergeName); | |||
.withNodeName(joinMergeName) | |||
.withSelfJoinProcessorParameters(selfJoinProcessorParams); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a strong opinion, since I know we are messing the logical planner with physical info quite badly already.. but I'm wondering if we could defer this in StreamStreamJoinNode#writeToTopology
inside the isSelfJoin
condition? We still have all the pieces we need: 1) left store name, 2) both join window specs. 3) joiner, 4) time tracker, from other params, so that in writeToTopology
we can still generate the KStreamKStreamSelfJoin
object if self-join is enabled.
My motivation is just to not spill more physical node info into logical planning phase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @guozhangwang! You mean to move to StreamStreamJoinNode#writeToTopology
the code about the creation of the self-join node
final KStreamKStreamSelfJoin<K, V1, V2, VOut> selfJoin = new KStreamKStreamSelfJoin<>(
thisWindowStore.name(),
internalWindows,
joiner,
sharedTimeTracker
);
I tried it but I don't think it makes the code clearer. All the components of the StreamStreamJoinNode
are built in the KStreamJoinImpl
and are only visible there because they are not public
. So, I would have to special case the self-join part which fixes the problem of spilling physical information into the logical plan but does it only for a small special case. I am worried this will be confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, thanks a lot for letting me know :)
streams/src/main/java/org/apache/kafka/streams/kstream/internals/KStreamKStreamSelfJoin.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @vpapavas , I do not have further comments on the non-testing part. Made a final pass on the testing part, and just one more question. Otherwise I think it's good to go.
...ams/src/test/java/org/apache/kafka/streams/kstream/internals/KStreamKStreamSelfJoinTest.java
Show resolved
Hide resolved
These test failures are unrelated:
|
…e#12644) Implements KIP-862: https://cwiki.apache.org/confluence/x/WSf1D Reviewers: Guozhang Wang <guozhang@apache.org>, Austin Heyne <aheyne>, John Roesler <vvcephei@apache.org>
…e#12644) Implements KIP-862: https://cwiki.apache.org/confluence/x/WSf1D Reviewers: Guozhang Wang <guozhang@apache.org>, Austin Heyne <aheyne>, John Roesler <vvcephei@apache.org>
…e#12644) (#35) Implements KIP-862: https://cwiki.apache.org/confluence/x/WSf1D Reviewers: Guozhang Wang <guozhang@apache.org>, Austin Heyne <aheyne>, John Roesler <vvcephei@apache.org> Co-authored-by: Vicky Papavasileiou <vpapavas@users.noreply.github.com>
KIP can be found here: https://cwiki.apache.org/confluence/display/KAFKA/KIP-862%3A+Self-join
It only applies to Stream-Stream joins and not n-way self-joins.
This is an inner-join topology (without the optimization)
and this is the optimized self-join topology
Testing: Unit tests (Integration and upgrade test in follow up PR)
Committer Checklist (excluded from commit message)