[REVIEW NEEDED] - KAFKA-3705 Added a foreignKeyJoin implementation for KTable. #5527
Foreign Key Join:
Allows for a KTable to map its value to a given foreign key and join on another KTable keyed on that foreign key. Applies the joiner, then returns the tuples keyed on the original key. This supports updates from both sides of the join.
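To make the semantics described above concrete, here is a minimal in-memory sketch in plain Java. It does not use Kafka Streams or the API added in this PR; the class name `ForeignKeyJoinSketch` and its methods `putLeft`, `putRight`, and `result` are hypothetical and exist only for illustration. It assumes inner-join behavior and ignores partitioning, serialization, and out-of-order arrival.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Hypothetical in-memory model of a foreign-key join between two keyed tables. */
final class ForeignKeyJoinSketch<K, V, KO, VO, VR> {
    private final Map<K, V> left = new HashMap<>();    // "this" table, keyed on the original key
    private final Map<KO, VO> right = new HashMap<>(); // other table, keyed on the foreign key
    private final Map<K, VR> result = new HashMap<>(); // join output, keyed on the original key
    private final Function<V, KO> foreignKeyExtractor;
    private final BiFunction<V, VO, VR> joiner;

    ForeignKeyJoinSketch(Function<V, KO> foreignKeyExtractor, BiFunction<V, VO, VR> joiner) {
        this.foreignKeyExtractor = foreignKeyExtractor;
        this.joiner = joiner;
    }

    /** Update from the left side: re-evaluate the join for this one key. */
    void putLeft(K key, V value) {
        left.put(key, value);
        recompute(key);
    }

    /** Update from the right side: re-evaluate every left row that maps to this foreign key. */
    void putRight(KO foreignKey, VO value) {
        right.put(foreignKey, value);
        for (K key : left.keySet()) {
            if (foreignKey.equals(foreignKeyExtractor.apply(left.get(key)))) {
                recompute(key);
            }
        }
    }

    /** Inner-join semantics: emit a result only when both sides are present. */
    private void recompute(K key) {
        V value = left.get(key);
        VO other = right.get(foreignKeyExtractor.apply(value));
        if (other == null) {
            result.remove(key); // the foreign key has no match (yet), so drop any stale result
        } else {
            result.put(key, joiner.apply(value, other));
        }
    }

    Map<K, VR> result() {
        return result;
    }
}
```

With an extractor such as `v -> v.split(":")[0]`, calling `putLeft("order1", "custA:book")` followed by `putRight("custA", "Alice")` produces a result keyed on `order1`. Changing the left value so that it points at a different foreign key removes the old result until the new foreign key has a match. The linear scan in `putRight` is a simplification for this sketch, not a statement about how the PR resolves right-side updates.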
The intent of this design was to build a totally encapsulated function that operates very similarly to the regular join function. No further work is required by the user to obtain their foreignKeyJoin results after calling the function. That being said, there is increased cost in some of the topology components, especially in resolving out-of-order arrival caused by foreign key changes. I would appreciate any and all feedback on this approach, as my understanding is that the Kafka Streams DSL is meant to provide higher-level functionality without requiring users to know exactly what's going on under the hood.
Some points of note:
Testing is covered by two integration tests that exercise the foreign key join.
Important: The second test (KTableKTableForeignKeyInnerJoinMultiIntegrationTest) attempts to join using a foreign key twice. This results in a NullPointerException regarding a missing task, which must be resolved before this is committed.
Hi All - I'm at a point where I need some feedback on a couple of things:
Feedback is very much appreciated, as this is the first PR I've put up against Kafka and I'm sure I've violated a number of things.
@vvcephei Hi John - thanks for the feedback so far! I haven't had time to attend to this due to some recent personal matters, but I should be able to take a crack at it this upcoming week. I think that (in my mind) the discussion about the topic names has been resolved, so I don't think there are any impediments other than me getting this cleaned up and then rebased to trunk.
@vvcephei - Completed all your feedback so far John. Thanks so much.
Currently, I do not know enough about the underlying mechanisms to get variable partition counts working (i.e., joining a KTable with 7 partitions to one with 11 partitions). This is explained above in the April 12th post on KTableKTableForeignKeyInnerJoinMultiIntegrationTest. Support for differing partition counts could be added in a later revision if we wish to get this in for 2.3.
I will rebase this to trunk and commit that too shortly.
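As general background on the 7-versus-11 partition case mentioned above, the following sketch shows how the same key can land on different partition numbers when two topics have different partition counts. It is not taken from this PR and does not diagnose the specific failure. The class `CoPartitionSketch` is hypothetical, and `String.hashCode()` is used as a stand-in for a real partitioner's hash function.

```java
/** Hypothetical illustration of hash-modulo partition assignment. */
final class CoPartitionSketch {
    /** Stand-in partitioner: hash of the key modulo the partition count. */
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** True when the key maps to the same partition number in both topics. */
    static boolean sameIndex(String key, int leftPartitions, int rightPartitions) {
        return partitionFor(key, leftPartitions) == partitionFor(key, rightPartitions);
    }
}
```

With equal partition counts, `sameIndex` is always true, so a task reading partition n of both topics sees every record for a given key. With counts of 7 and 11 that no longer holds for most keys, which is why records typically have to be routed through a topic with a matching partition count before they can be joined.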
Rebasing to trunk has taken considerably longer than I planned. Dealing with the new timestamped data stores has been a bit of a nightmare. Additionally, data which used to be present in the KTableImpl class is no longer available. In the KTableImpl constructor, storebuilder and isQueryable have been replaced by materialized.queryableStoreName(), which means that I can no longer attach my resolver to the original, "this" materialized instance when a queryable name is not set. I will look at ways to resolve this, but I do not anticipate being done before 2.3. I have spent considerable time on it in the past day, and it looks like much more work is required.
Hey @bellemare ,
Thanks for the update, and for the rebase work. Yes, the new timestamped stores changed a lot of implementation classes. It's a bummer that it happened to get merged after you forked. I agree, it's unlikely that this will be able to get merged by Friday (the feature freeze for 2.3).
But no worries, it just means that we'll have more time to review it, write lots of tests, system tests, work on docs and blogs, etc., before it does get released, which decreases the overall risk of such a big feature.
Once you finish up the rebase, I'll take another pass. I didn't make it all the way through last time, and wound up just commenting on the little things I noticed along the way.
There's an issue with the changing of the KTableImpl API
2.0-trunk (what I had rebased it to last)
The problem is that the oneToMany joiner needs to query the underlying statestore previously represented by
bellemare left a comment
I will go back through it and clean up the comments, the commented-out code, etc. As it stands currently though, it does have a few issues that I have highlighted that I would love to get some specific feedback on.