-
Notifications
You must be signed in to change notification settings - Fork 14k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KAFKA-5045: KTable cleanup #2832
Conversation
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Test failure unrelated to streams or this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I share the same concern with @dguy that, with this change it seems no matter if user provide the queryable name or not we will always materialize every single KTable with an underlying store. This will kill the performance of Streams. I understand that this may be done in a follow-up PR but would like to discuss how we will change the underlying impl?
* @param prefix processor name prefix | ||
* @return a new unique name | ||
*/ | ||
public String newStoreName(final String prefix) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a JIRA for tracking this change, or any of the existing KIPs include this one?
@@ -47,7 +50,7 @@ | |||
* final KafkaStreams streams = ...; | |||
* streams.start() | |||
* ... | |||
* final String queryableStoreName = table.getStoreName(); // returns null if KTable is not queryable | |||
* final String queryableStoreName = table.queryableStoreName(); // returns null if KTable is not queryable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, right now even if the user does not provide a queryable name in the operator that generates this KTable, we would still return the internal name in this call. Is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, as before. No changes. Just name change (I've added name change to KIP)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will adjust to return null if no queryable name is passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* alphanumerics, '.', '_' and '-'. If {@code null} this is the equivalent of {@link KTable#through(String)()} | ||
* @return a {@code KTable} that contains the exact same (and potentially repartitioned) records as this {@code KTable} | ||
*/ | ||
KTable<K, V> through(final String topic, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand adding the store name / store supplier here is to achieve consistency with builder#table
, but I'm wondering what would be the recommended pattern for using through. I.e. with the following topology:
KTable table1 = table0.someOps(..., "name1");
KTable table2 = table1.through("topic", "name2");
What's the difference of querying store name1
and querying store name2
? Would those be the same stores? Shall we in this case save one store in the underlying impl?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'd have two queryable stores if you did that pattern.
We could add this to an optimization JIRA, sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is fine, just curious if we have ever thought about how users would leverage the APIs to determine which stores to query. We can discuss this in a follow-up JIRA.
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
@guozhangwang regarding "queryable name or not we will always materialize every single KTable with an underlying store". It's not true, we only materialize a KTable if a queryable name is provided, otherwise we don't by default (unless it must be materialized, like an aggregation). For example, if no name is provided in @dguy 's concern is different, he points out that we can have |
Refer to this link for build results (access rights to CI server needed): |
return new KTableImpl<>(topology, name, processorSupplier, sourceNodes, this.queryableStoreName); | ||
} | ||
|
||
private KTable<K, V> doQueryableFilter(final Predicate<? super K, ? super V> predicate, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we collapse the code path for having a queryable store name or not into the same function? For example:
filter(.. /*nothing*/) calls filter(.. (String) null);
filter(.. "storeName") calls filter(.. storeSupplier); // if storeName is not null, otherwise pass null as well
filter(.. supplier) do the actual impl, which checks if supplier is null or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the change, but it's not quite exactly as above since the code that takes the supplier does not accept a null supplier. But it's similar.
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
System tests passed again https://jenkins.confluent.io/job/system-test-kafka-branch-builder-2/287/ & https://jenkins.confluent.io/job/system-test-kafka-branch-builder-2/286/ @guozhangwang . One unit test is unrelated to streams, one is queryOnRebalance timeout again, which we see every now and then and I guess is not fixed yet. |
retest this please |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
kafka.api.ConsumerBounceTest > testConsumptionWithBrokerFailures FAILED |
LGTM! |
Merged to trunk. Thanks @enothereska ! |
This is the implementation of KIP-114: KTable state stores and improved semantics:
In this implementation, state stores are materialized if the user desires them to be queryable. In subsequent versions we can offer a second option, to have a view-like state store. The tradeoff then would be between storage space (materialize) and re-computation (view). That tradeoff can be exploited by later query optimizers.