-
Notifications
You must be signed in to change notification settings - Fork 13.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KAFKA-6936: Implicit Materialized for aggregates #5066
Conversation
Regarding the following ..
For the |
@debasishg How can you propagate the Serde if the resulting type of the |
@debasishg We are removing the deprecated APIs in Java that only takes a serde as parameters, so now aggregate operators only have two overloaded variant: with and without a @joan38 Currently there is no semantics difference between the two overloaded functions with and without the |
TimeWindowedKStream => TimeWindowedKStreamJ, | ||
KGroupedTable => KGroupedTableJ, _} | ||
|
||
import org.apache.kafka.streams.kstream.{KGroupedStream => KGroupedStreamJ, KGroupedTable => KGroupedTableJ, KStream => KStreamJ, KTable => KTableJ, SessionWindowedKStream => SessionWindowedKStreamJ, TimeWindowedKStream => TimeWindowedKStreamJ, _} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it intentional to put them in a single line? Multi-lines are more human readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this change has been introduced by my IDE and doesn't make sense. I will revert it.
c90332e
to
1cd7718
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a meta comment: to make aggregate
reduce
and count
API consistency, I'd still prefer either maintain two overloads for each, or one for each, instead of two for reduce
and count
and one for aggregate
. Personally I'd prefer two for each as it will benefit in the future to optimize away physical materialization, but I'm not sure if we can disambiguate aggregate(..)(implicit Materialized)
with aggregate(..)
, is it possible?
On the other we could consider making reduce
and count
also only having one overload function. E.g. in KGroupedTable
:
def count(implicit materialized: Materialized[K, Long, ByteArrayKeyValueStore]): KTable[K, Long]
def reduce(adder: (V, V) => V)(subtractor: (V, V) => V)(implicit materialized: Materialized[K, V, ByteArrayKeyValueStore]): KTable[K, V]
Note for reduce
I used the same pattern in the other PR trying to leverage on type inference, but again I'm not 100% sure if it would work.
@debasishg @joan38 wdyt?
@guozhangwang I agree in principle. I am on a vacation right now and don't have the infrastructure to code. It would be good if @joan38 could verify your concerns. |
@guozhangwang the first way of doing is what I tried, unfortunately it's not able to desambiguate the 2. I think the other way around unifying |
c95686a
to
fa72082
Compare
Should we do |
df38a90
to
7f7be17
Compare
@guozhangwang @debasishg Any news on this? |
Sorry for the late reply. If we cannot desambiguate the two overloaded functions for I'd prefer |
Ok so the first point is done. I will revert back to Thanks |
Note the jenkins failures are relevant:
|
@guozhangwang The errors are fixed now. This should be ready to go I guess? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One nit comment, otherwise LGTM.
.groupBy((k, v) => v) | ||
.count() | ||
.groupBy((_, v) => v) | ||
.count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should call count()
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arf, missed this one. Thanks for picking it
Thanks @guozhangwang |
…ache#5066) In apache#4919 we propagate the SerDes for each of these aggregation operators. As @guozhangwang mentioned in that PR: ``` reduce: inherit the key and value serdes from the parent XXImpl class. count: inherit the key serdes, enforce setting the Serdes.Long() for value serdes. aggregate: inherit the key serdes, do not set for value serdes internally. ``` Although it's all good for reduce and count, it is quiet unsafe to have aggregate without Materialized given. In fact I don't see why we would not give a Materialized for the aggregate since the result type will always be different (otherwise use reduce) and also the value Serde is simply not propagated. This has been discussed previously in a broader PR before but I believe for aggregate we could pass implicitly a Materialized the same way we pass a Joined, just to avoid the stupid case. Then if the user wants to specialize, he can give his own Materialized. Reviewers: Debasish Ghosh <dghosh@acm.org>, Guozhang Wang <guozhang@confluent.io>
In #4919 we propagate the SerDes for each of these aggregation operators.
As @guozhangwang mentioned in that PR:
Although it's all good for
reduce
andcount
, it is quiet unsafe to haveaggregate
withoutMaterialized
given. In fact I don't see why we would not give aMaterialized
for theaggregate
since the result type will always be different (otherwise usereduce
) and also the value Serde is simply not propagated.This has been discussed previously in a broader PR before but I believe for
aggregate
we could pass implicitly aMaterialized
the same way we pass aJoined
, just to avoid the stupid case. Then if the user wants to specialize, he can give his ownMaterialized
.@guozhangwang @debasishg @seglo Let me know your thoughts.
Committer Checklist (excluded from commit message)