Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-8410: Migrating stateful operators to new Processor API #10507

Closed
wants to merge 42 commits into from

Conversation

jeqo
Copy link
Contributor

@jeqo jeqo commented Apr 8, 2021

Continuation of #10381. Migration of Kafka Streams stateful operators (KTable, KStream aggregations, joins).

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@jeqo jeqo changed the title KAFKA-8410: Migrating KStream/KTable aggregations and joins to new Processor API KAFKA-8410: Migrating stateful operators to new Processor API Apr 12, 2021
@jeqo jeqo marked this pull request as ready for review April 12, 2021 16:26
@vvcephei vvcephei self-requested a review April 27, 2021 15:32
Copy link
Contributor

@vvcephei vvcephei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry it took so long for me to pick this up. Just a quick first batch of review comments.

* calling {@code forward()} (and some other methods) would result in a runtime exception.
*
* @param context processor context
* @param record record that failed deserialization
* @param exception the actual exception
*/
DeserializationHandlerResponse handle(final ProcessorContext context,
DeserializationHandlerResponse handle(final ProcessorContext<?, ?> context,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, man. I overlooked this in the KIP, and we can't just change this in-place, as it will break any subclasses.

What we need to do is deprecate this method and introduce a new one with a default implementation that calls back here. We can update the KIP with this change, since it's a simple oversight and follows established patterns for migrating interfaces.

Comment on lines -29 to 33
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.ProcessorSupplier;
import org.apache.kafka.streams.processor.StreamPartitioner;
import org.apache.kafka.streams.processor.TopicNameExtractor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.ProcessorSupplier;
import org.apache.kafka.streams.state.KeyValueStore;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @jeqo , based on my understanding of the KIP, nothing in this interface should have changed. Was this changeset intentional?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I skimmed over this interface and changes are in line breaks of comments and renaming of type parameters. In the interest of good reviews, I would not do those changes in this PR but rather open a separate PR for this interface. However, I might have missed an important part. @jeqo Could you clarify?

Regarding the comments, we usually add a break after each sentence.

@@ -16,6 +16,7 @@
*/
package org.apache.kafka.streams.kstream;

import java.util.function.Function;
import org.apache.kafka.common.serialization.Serde;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the same question here: do we need any changes to this interface?

Comment on lines -26 to -28
* @param <V1> first value type
* @param <V2> second value type
* @param <VR> joined value type
Copy link
Contributor

@vvcephei vvcephei May 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here: it doesn't seem strictly necessary to rename the generic parameters as part of this PR.

Specifically, funny story: these params used to be called V and V1, and we renamed them to V1 and V2 because we thought it made more sense :)

vvcephei added a commit that referenced this pull request May 28, 2021
* Lay the groundwork for migrating KTable Processors to the new PAPI.
* Migrate the KTableFilter processor to prove that the groundwork works.

This is an effort to help break up #10507 into multiple PRs.

Reviewers: Boyang Chen <boyang@apache.org>
@vvcephei
Copy link
Contributor

Hey @jeqo , now that #10744 is merged, what do you think about just closing this PR and breaking it up into a series of PRs that migrate one or two processors at a time?

Copy link
Contributor

@cadonna cadonna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeqo I started to review the PR but haven't finished yet.

Could you please rebase the PR because it has some conflicts?

I think you should undo the changes to KStream. AFAIS they are not required for this PR and pollute the PR a lot.

Please look carefully at my comments in CogroupedStreamAggregateBuilder. I am not sure if I missing something there or if there is a bug.

@@ -32,7 +32,7 @@
private static final Logger log = LoggerFactory.getLogger(LogAndContinueExceptionHandler.class);

@Override
public DeserializationHandlerResponse handle(final ProcessorContext context,
public DeserializationHandlerResponse handle(final ProcessorContext<?, ?> context,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to deprecate also this method and add a new one? Technically, it is a class of the public API that can be extended.

@@ -32,7 +32,7 @@
private static final Logger log = LoggerFactory.getLogger(LogAndFailExceptionHandler.class);

@Override
public DeserializationHandlerResponse handle(final ProcessorContext context,
public DeserializationHandlerResponse handle(final ProcessorContext<?, ?> context,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to deprecate also this method and add a new one? Technically, it is a class of the public API that can be extended.

Comment on lines -29 to 33
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.ProcessorSupplier;
import org.apache.kafka.streams.processor.StreamPartitioner;
import org.apache.kafka.streams.processor.TopicNameExtractor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.ProcessorSupplier;
import org.apache.kafka.streams.state.KeyValueStore;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I skimmed over this interface and changes are in line breaks of comments and renaming of type parameters. In the interest of good reviews, I would not do those changes in this PR but rather open a separate PR for this interface. However, I might have missed an important part. @jeqo Could you clarify?

Regarding the comments, we usually add a break after each sentence.

Comment on lines +19 to +22
import java.util.Collection;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In KAFKA-10787 we agreed on an import order kafka, org.apache.kafka, com, net, org, java, javax and static imports. Additionally, there should be a empty line between import blocks.

Note, PR #10428 introduces check and a formatter for this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeqo Yes, but don't feel so burdened - I am ready to expand the formatter to streams module as soon as the PR (currently formats core module only) is merged. 😉 @cadonna

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cadonna The sooner you merge the PR, I can start to apply the formatter to the streams module sooner. 😃

boolean stateCreated = false;
int counter = 0;
for (final Entry<KGroupedStreamImpl<K, ?>, Aggregator<? super K, Object, VOut>> kGroupedStream : groupPatterns.entrySet()) {
final KStreamAggProcessorSupplier<K, K, ?, ?> parentProcessor =
final KStreamAggregateProcessorSupplier<K, K, ?, ?> parentProcessor =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be KStreamAggregateProcessorSupplier<K, ?, K, ?>? The positions of the parameters KOut and VIn on KStreamAggregateProcessorSupplier changed with respect to KStreamAggProcessorSupplier.

boolean stateCreated = false;
int counter = 0;
for (final Entry<KGroupedStreamImpl<K, ?>, Aggregator<? super K, Object, VOut>> kGroupedStream : groupPatterns.entrySet()) {
final KStreamAggProcessorSupplier<K, K, ?, ?> parentProcessor =
(KStreamAggProcessorSupplier<K, K, ?, ?>) new KStreamWindowAggregate<K, K, VOut, W>(
final KStreamWindowAggregate<K, K, VOut, W> parentProcessor =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be KStreamWindowAggregate<K, VOut, K, W>? Here I am not sure if I am missing something since the type parameter positions did not change. Why is the type parameter for V in KStreamWindowAggregate K and not ??

Comment on lines +19 to +20
import java.util.Objects;
import java.util.Set;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment above about import order.

Comment on lines +185 to +187
new KStreamAggregate<>(materializedInternal.storeName(),
aggregateBuilder.countInitializer,
aggregateBuilder.countAggregator),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
new KStreamAggregate<>(materializedInternal.storeName(),
aggregateBuilder.countInitializer,
aggregateBuilder.countAggregator),
new KStreamAggregate<>(
materializedInternal.storeName(),
aggregateBuilder.countInitializer,
aggregateBuilder.countAggregator
),

or

Suggested change
new KStreamAggregate<>(materializedInternal.storeName(),
aggregateBuilder.countInitializer,
aggregateBuilder.countAggregator),
new KStreamAggregate<>(
materializedInternal.storeName(),
aggregateBuilder.countInitializer,
aggregateBuilder.countAggregator),

@vvcephei vvcephei mentioned this pull request Jun 11, 2021
3 tasks
@jeqo jeqo closed this Oct 19, 2021
@jeqo jeqo deleted the new-processor-kstream-2 branch October 19, 2021 14:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants