KAFKA-4923: Add Exactly-Once Semantics to Streams #2945

Closed
wants to merge 11 commits into
base: trunk
from

8 participants
@mjsax
Member

mjsax commented Apr 30, 2017

No description provided.

@mjsax

mjsax Apr 30, 2017

Member

Call for review @dguy @enothereska @guozhangwang
(it's again one cleanup and one actual commit)


@mjsax

mjsax Apr 30, 2017

Member

We need to add more tests, of course...


@asfbot

asfbot Apr 30, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/3310/
Test FAILed (JDK 8 and Scala 2.11).


@asfbot

asfbot Apr 30, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3301/
Test FAILed (JDK 8 and Scala 2.12).


@asfbot

asfbot Apr 30, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/3305/
Test FAILed (JDK 7 and Scala 2.10).


@asfbot

asfbot Apr 30, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/3310/
Test FAILed (JDK 7 and Scala 2.10).


@asfbot

asfbot Apr 30, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/3315/
Test PASSed (JDK 8 and Scala 2.11).


@asfbot

asfbot Apr 30, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3306/
Test PASSed (JDK 8 and Scala 2.12).


@mjsax

Member

mjsax commented May 1, 2017

streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java
if (time1 < time2) return -1;
if (time1 > time2) return 1;
public int compare(final RecordQueue queue1, final RecordQueue queue2) {
final long time1 = queue1.timestamp();


@enothereska

enothereska May 1, 2017

Contributor

I really don't like final for primitive variables used like this. I think we're going overboard with final. It's useful when passed to methods, but in other cases just clutters up code. cc @ijuma @dguy. What are we doing in the rest of the code about this, e.g., client etc? There is a risk we keep overwriting streams code to add or remove final.



@dguy

dguy May 3, 2017

Contributor

It should be used everywhere a local, field, or param is immutable. It is better for readability and can help avoid bugs. Granted, it adds 5 extra characters to a declaration, but there's not much we can do about that.



@ijuma

ijuma May 3, 2017

Contributor

Interesting, so you have seen cases where there were bugs because local variables or method parameters were not final? I don't remember that ever happening, so interested in your experience.

Unlike on fields, final on method params and local variables makes no difference at runtime, doesn't affect the memory model, and applies to a much smaller scope. Java 8 even introduced effectively final local variables so that one can use them in lambdas and inner classes without having to add the final keyword. So the readability point is arguable (and the Java language designers seem to think that the cost is perhaps not worth the benefit).
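To make the Java 8 point concrete, here is a minimal sketch (class and method names are made up for illustration): both lambdas compile, one capturing a local explicitly declared `final`, the other capturing a local that is merely effectively final because it is never reassigned.

```java
import java.util.function.Supplier;

public class EffectivelyFinalDemo {

    // Captures a local that is explicitly declared final.
    public static Supplier<String> withFinal(final String prefix) {
        final String message = prefix + "-explicit";
        return () -> message;
    }

    // Captures a local that is never reassigned, so the compiler treats it as
    // effectively final and the lambda compiles without the keyword.
    // Adding a reassignment such as `message = "other";` before the return
    // would turn the lambda into a compile error.
    public static Supplier<String> withoutFinal(String prefix) {
        String message = prefix + "-implicit";
        return () -> message;
    }

    public static void main(String[] args) {
        System.out.println(withFinal("a").get());    // a-explicit
        System.out.println(withoutFinal("b").get()); // b-implicit
    }
}
```

Either way the compiler enforces single assignment for captured locals; the keyword only changes whether that intent is written down.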



@dguy

dguy May 3, 2017

Contributor

I don't think you can argue with the readability. It is much easier to grok the code if you know that a param or local is final. It tells the reader that it is immutable without having to think any further - how is that not better for readability/understanding of the code?

As for bugs, yes, I've seen cases where people have accidentally changed or reused a variable when they shouldn't have. Params probably not so much, but locals definitely.



@ijuma

ijuma May 3, 2017

Contributor

Of course one can argue! :) Readability is subjective, and the sweet spot between the amount of information presented and conciseness varies from person to person (and often changes over time for the same person). Verbosity is additional noise: it makes it harder to understand what the code is doing and easier to miss important details.

To give a concrete example, I (and all the other reviewers) missed a bug introduced in a recent Streams PR because a removed line got lost in the noise of other clean-ups.

Finally, final only says that the variable doesn't change. In Java, collections are usually mutable, so mutability is ever present.
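A minimal sketch of that distinction (names are illustrative): `final` pins the reference, not the contents of the collection it points to.

```java
import java.util.ArrayList;
import java.util.List;

public class FinalReferenceDemo {

    public static List<String> collect() {
        final List<String> names = new ArrayList<>();
        // Allowed: final only forbids reassigning `names`;
        // the ArrayList itself stays fully mutable.
        names.add("first");
        names.add("second");
        names.remove("first");
        // names = new ArrayList<>();  // would not compile: cannot assign to a final variable
        return names;
    }

    public static void main(String[] args) {
        System.out.println(collect()); // [second]
    }
}
```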



@mjsax

mjsax May 3, 2017

Member

> To give a concrete example, I (and all the other reviewers) missed a bug introduced in a recent Streams PR because a removed line got lost in the noise of other clean-ups.

But that's not a problem with adding final; it's a problem in our process. We've started to separate commits to avoid this in the future.

And for collections, you are right. Where is my C/C++ const keyword? :(
(Btw: you would still use an unmodifiable collection for this case.)
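A minimal sketch of the unmodifiable-collection approach using `java.util.Collections.unmodifiableList` (the class and helper names are hypothetical). Unlike a C++ `const`, this is a read-only view: writes through it fail at runtime rather than at compile time, and changes made to the backing list still show through.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class UnmodifiableViewDemo {

    // Returns true if the list rejects add() with UnsupportedOperationException.
    public static boolean rejectsWrites(final List<String> list) {
        try {
            list.add("extra");
            return false;
        } catch (final UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        final List<String> backing = new ArrayList<>();
        final List<String> view = Collections.unmodifiableList(backing);

        System.out.println(rejectsWrites(view));    // true: the view is read-only
        System.out.println(rejectsWrites(backing)); // false: the backing list is not

        // The view is not a snapshot: it reflects later changes to the backing list.
        System.out.println(view.size());            // 1, since "extra" was added to backing
    }
}
```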


}
/**
 * Get the next record and queue
 *
 * @return StampedRecord
 */
public StampedRecord nextRecord(RecordInfo info) {
StampedRecord nextRecord(final RecordInfo info) {


@enothereska

enothereska May 1, 2017

Contributor

Good use of final.


int oldSize = recordQueue.size();
int newSize = recordQueue.addRawRecords(rawRecords);
final int oldSize = recordQueue.size();


@enothereska

enothereska May 1, 2017

Contributor

Not a good use of final.


.../main/java/org/apache/kafka/streams/processor/internals/StandbyTask.java
...main/java/org/apache/kafka/streams/processor/internals/AbstractTask.java
...c/main/java/org/apache/kafka/streams/processor/internals/StreamTask.java
@asfbot

asfbot May 1, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/3331/
Test PASSed (JDK 8 and Scala 2.11).


@asfbot

asfbot May 1, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/3326/
Test PASSed (JDK 7 and Scala 2.10).


@asfbot

asfbot May 1, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3322/
Test PASSed (JDK 8 and Scala 2.12).


@enothereska

enothereska May 2, 2017

Contributor

I think in general LGTM. I would like to see results from the Streams benchmarks to gauge the impact of these changes with EoS enabled, and also the Streams system tests with EoS enabled. Thanks.


@@ -136,20 +136,13 @@ public void flush() {
checkForException();


@enothereska

enothereska May 2, 2017

Contributor

@mjsax there are lots of places in this file where we check for exceptions and throw an exception, sometimes at unexpected times. There is a JIRA open for this at https://issues.apache.org/jira/browse/KAFKA-5006. With EoS, it probably doesn't make sense to throw these kinds of exceptions this way. Also, with EoS I guess we'll need to roll back the transaction. My actual question:

  • does it make sense to fix that JIRA as part of this PR?


@enothereska

enothereska May 2, 2017

Contributor

Alternatively, I can fix it by not checking for exceptions like that, as suggested in the JIRA, but could you double-check that's OK? Thanks.



@mjsax

mjsax May 2, 2017

Member

I guess this can be fixed independently, and it would be cleaner not to have it in this PR, but just to do it with KAFKA-5006. Thanks!


@mjsax

mjsax May 2, 2017

Member

We can't do end-to-end tests yet, as the broker code is not ready.


@dguy

I don't really like that we have checks like if(eosEnabled) sprinkled around various places in the code base. IMO, this is a bit of a smell. Ideally the check would be done once, at that one point we'd construct the appropriate implementations of interfaces, and then polymorphism FTW! However, I understand that is not possible without a major refactor. Anyway, just making it known.
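One way to read this suggestion: decide on the processing guarantee once, hand out an implementation, and never branch on the flag again. A minimal sketch; `CommitStrategy` and both implementations are hypothetical names, not classes in Kafka Streams, and the returned strings only stand in for the real commit work.

```java
public class CommitStrategySketch {

    // Hypothetical abstraction: one implementation per processing guarantee.
    public interface CommitStrategy {
        String commit(String taskId);
    }

    static final class AtLeastOnceCommit implements CommitStrategy {
        @Override
        public String commit(final String taskId) {
            return taskId + ": commit consumer offsets";
        }
    }

    static final class ExactlyOnceCommit implements CommitStrategy {
        @Override
        public String commit(final String taskId) {
            return taskId + ": commit offsets inside producer transaction";
        }
    }

    // The single place that inspects the flag; callers only see the interface.
    public static CommitStrategy forGuarantee(final boolean eosEnabled) {
        return eosEnabled ? new ExactlyOnceCommit() : new AtLeastOnceCommit();
    }

    public static void main(String[] args) {
        System.out.println(forGuarantee(false).commit("0_1"));
        System.out.println(forGuarantee(true).commit("0_1"));
    }
}
```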

log.trace("{} Committing", logPrefix);
metrics.metrics.measureLatencyNs(
    time,
    new Runnable() {
        @Override
        public void run() {
            flushState();
            stateMgr.checkpoint(recordCollectorOffsets());
            commitOffsets();
            if (!eosEnabled) {


@dguy

dguy May 3, 2017

Contributor

It appears we have both eosEnabled and exactlyOnceEnabled in the same class.



@mjsax

mjsax May 3, 2017

Member

Good catch! I moved eosEnabled to the base class and forgot to remove the exactlyOnceEnabled variable here.


streams/src/test/java/org/apache/kafka/streams/StreamsConfigTest.java
}
@Test(expected = ConfigException.class)
public void shouldThrowExceptionIfProducerMaxInFlightRequestPerConnectionsIsOverriddenIfEosEnabled() {


@dguy

dguy May 3, 2017

Contributor

ditto


...st/java/org/apache/kafka/streams/processor/internals/StreamTaskTest.java
.../java/org/apache/kafka/streams/processor/internals/StreamThreadTest.java
@mjsax

mjsax May 3, 2017

Member

@eno We could do a test run with just max.inflight.transaction=1, but we already know from the producer tests that there is a 20% hit. Do you think we would gain any insight from running the Streams tests with this setting?


@sriramsub

sriramsub May 3, 2017

Contributor

I agree with Damian about all the isEosEnabled checks. We should file a follow-up PR to tackle this and make the code cleaner, more maintainable, and easier to debug.


@mjsax

mjsax May 3, 2017

Member

@sriramsub Do you mean "cleanRun"? This affects only test code atm. We can also fix it right away in this PR.


@sriramsub

sriramsub May 3, 2017

Contributor

I meant that all the different code paths with the eosEnabled checks make the code hard to understand, hard to maintain, and not easy to debug. It would be good to think about restructuring the code after this release to make this better.


@asfbot

asfbot May 3, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/3446/
Test FAILed (JDK 8 and Scala 2.11).


@asfbot

asfbot May 3, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/3440/
Test FAILed (JDK 7 and Scala 2.10).


@asfbot

asfbot May 3, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3437/
Test PASSed (JDK 8 and Scala 2.12).


@mjsax

mjsax May 4, 2017

Member

@sriramsub I would be happy to do it differently, but the overall architecture makes it hard to refactor IMHO. The whole task life cycle with rebalances etc. is tricky (cf. https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Architecture). Btw: I think keeping this architectural picture in mind helps a lot in understanding the code (it needs some updates, as we already did some task refactoring to make the EOS code simpler).


@dguy

Overall LGTM. I've left a couple of replies to previous comments, but nothing that should stop this from going in.

@mjsax

mjsax May 5, 2017

Member

Updated this to address the latest comments/discussions


@asfbot

asfbot May 5, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/3508/
Test PASSed (JDK 7 and Scala 2.10).


@asfbot

asfbot May 5, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/3514/
Test PASSed (JDK 8 and Scala 2.11).


@asfbot

asfbot May 5, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3505/
Test FAILed (JDK 8 and Scala 2.12).


@guozhangwang

Left a first round of comments. One meta-comment: constrain the try-catch blocks across the thread/task classes more tightly and try to avoid catching Exception or Throwable; instead, list all the expected exceptions and handle each one accordingly. Also consider not wrapping code blocks in try-catch if we are not expecting exceptions from them.
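A minimal sketch of that meta-comment: the try block wraps only the call that can fail, each expected exception gets its own handler, and anything unexpected propagates. `FencedException` and `GroupRebalancedException` are hypothetical stand-ins defined here for illustration; they are not the Kafka client exception classes.

```java
public class NarrowCatchSketch {

    // Hypothetical stand-ins for client-side failures discussed in this review.
    public static class FencedException extends RuntimeException {
        public FencedException(final String msg) { super(msg); }
    }

    public static class GroupRebalancedException extends RuntimeException {
        public GroupRebalancedException(final String msg) { super(msg); }
    }

    public interface Committer {
        void commit();
    }

    public static String commitTask(final Committer committer) {
        // Only the call that is expected to fail sits inside the try block.
        try {
            committer.commit();
        } catch (final FencedException e) {
            // Another instance owns the task now: close it without committing.
            return "fenced: " + e.getMessage();
        } catch (final GroupRebalancedException e) {
            // We fell out of the group: rejoin instead of failing the thread.
            return "rebalance: " + e.getMessage();
        }
        // No catch (Exception) or catch (Throwable): other failures propagate.
        return "committed";
    }

    public static void main(String[] args) {
        System.out.println(commitTask(() -> { }));
        System.out.println(commitTask(() -> { throw new FencedException("epoch bumped"); }));
    }
}
```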

clients/src/main/java/org/apache/kafka/clients/producer/MockProducer.java
 * @param parsedValues unmodifiable map of current configuration
 * @return a map of updates that should be applied to the configuration (will be validated to prevent bad updates)
 */
protected Map<String, Object> postProcessParsedConfig(Map<String, Object> parsedValues) {


@guozhangwang

guozhangwang May 9, 2017

Contributor

Why do we want to leak this logic into the underlying class? Could we just modify StreamsConfig itself? Plus, having an underlying impl that ignores the passed-in parameter and always returns an empty map is not appropriate.

See my comments below.



@mjsax

mjsax May 10, 2017

Member

In the initial PR, I did not modify AbstractConfig and did everything in StreamsConfig. @ijuma did not like this and requested to do it in AbstractConfig. In the first version, I just made this.originals protected so I could modify the default value, but this had the disadvantage that originals was no longer immutable. Thus, we came up with this solution.

Personally, I am fine either way. Looking for further feedback before changing it back and forth again.



@ijuma

ijuma May 10, 2017

Contributor

@guozhangwang the original solution meant that if you iterated over the configs, you wouldn't get the right default, and the logged config values would also be wrong. That's why I suggested doing it this way.



@guozhangwang

guozhangwang May 16, 2017

Contributor

What I was thinking is actually different from the original solution: in getCommonConsumerProducerConfigs we check the EOS config value and, depending on it, modify the Streams default overrides before applying the user's overrides:

        final Map<String, Object> consumerProps = new HashMap<>(CONSUMER_DEFAULT_OVERRIDES);
        consumerProps.putAll(EOS_CONSUMER_DEFAULT_OVERRIDES); // this is the added line
        consumerProps.putAll(clientProvidedProps);

Does it make sense?
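The layering proposed above can be sketched with plain maps. This is a minimal sketch: the class and method names are hypothetical, `isolation.level=read_committed` is meant to mirror the consumer setting exactly-once relies on, and the `max.poll.records` value is purely illustrative. Each `putAll` overwrites the previous layer, so user settings beat the EOS defaults, which beat the general Streams defaults.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class LayeredDefaultsSketch {

    // Illustrative Streams-level consumer defaults.
    static final Map<String, Object> CONSUMER_DEFAULT_OVERRIDES =
        Collections.singletonMap("max.poll.records", 1000);

    // Extra defaults that apply only when exactly-once is enabled.
    static final Map<String, Object> EOS_CONSUMER_DEFAULT_OVERRIDES =
        Collections.singletonMap("isolation.level", "read_committed");

    public static Map<String, Object> buildConsumerProps(final boolean eosEnabled,
                                                         final Map<String, Object> clientProvidedProps) {
        final Map<String, Object> consumerProps = new HashMap<>(CONSUMER_DEFAULT_OVERRIDES);
        if (eosEnabled) {
            consumerProps.putAll(EOS_CONSUMER_DEFAULT_OVERRIDES); // EOS layer
        }
        consumerProps.putAll(clientProvidedProps); // user settings always win
        return consumerProps;
    }
}
```

This layering only shapes the maps handed to the clients; the Streams config object itself still reports whatever default it was built with.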



@mjsax

mjsax May 16, 2017

Member

But if there are no user overrides, the original config will still have the non-EOS default for this case. That is what we are trying to avoid.



@mjsax

mjsax May 16, 2017

Member

One more thing to add here: with EOS, your approach is the way to go for the consumer/producer configs, but we also have the Streams config COMMIT_INTERVAL_MS, which gets a different default value for EOS, and for this case we need to set the value correctly from the beginning.
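A minimal sketch of the hook-based approach, using simplified stand-in classes rather than the real AbstractConfig/StreamsConfig. The base class merges defaults with user values, asks the subclass for updates, and applies them before the config becomes visible, so iterating or logging the config already shows the EOS-specific default. The keys mirror the Streams config names; the 30000 ms and 100 ms values reflect my understanding of the at-least-once and exactly-once defaults and should be treated as illustrative.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class PostProcessHookSketch {

    // Simplified stand-in for a config base class with a post-processing hook.
    public static class BaseConfig {
        private final Map<String, Object> originals;
        private final Map<String, Object> values;

        protected BaseConfig(final Map<String, Object> defaults, final Map<String, Object> originals) {
            this.originals = Collections.unmodifiableMap(new HashMap<>(originals));
            final Map<String, Object> parsed = new HashMap<>(defaults);
            parsed.putAll(originals);
            // The subclass only returns updates; originals stay immutable.
            // This calls an overridable method from a constructor, so the override
            // must rely only on its argument, originals(), and static state.
            parsed.putAll(postProcessParsedConfig(Collections.unmodifiableMap(parsed)));
            this.values = Collections.unmodifiableMap(parsed);
        }

        protected Map<String, Object> originals() {
            return originals;
        }

        // Default hook: no updates.
        protected Map<String, Object> postProcessParsedConfig(final Map<String, Object> parsedValues) {
            return Collections.emptyMap();
        }

        public Object get(final String key) {
            return values.get(key);
        }
    }

    public static class StreamsLikeConfig extends BaseConfig {
        private static final Map<String, Object> DEFAULTS = new HashMap<>();
        static {
            DEFAULTS.put("processing.guarantee", "at_least_once");
            DEFAULTS.put("commit.interval.ms", 30000L);
        }

        public StreamsLikeConfig(final Map<String, Object> originals) {
            super(DEFAULTS, originals);
        }

        @Override
        protected Map<String, Object> postProcessParsedConfig(final Map<String, Object> parsedValues) {
            final Map<String, Object> updates = new HashMap<>();
            final boolean eosEnabled = "exactly_once".equals(parsedValues.get("processing.guarantee"));
            // Only swap the default; an explicit user setting is left alone.
            if (eosEnabled && !originals().containsKey("commit.interval.ms")) {
                updates.put("commit.interval.ms", 100L);
            }
            return updates;
        }
    }
}
```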


streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java
}
@Override
protected Map<String, Object> postProcessParsedConfig(final Map<String, Object> parsedValues) {


@guozhangwang

guozhangwang May 9, 2017

Contributor

Instead of using an overridden function as part of the constructor, we could simply override it based on eosEnabled after CONSUMER_DEFAULT_OVERRIDES are applied and before consumerProps.putAll(clientProvidedProps) is called.


}
});
if (e != null) {
throw e;


@guozhangwang

guozhangwang May 9, 2017

Contributor

Do we always want to rethrow all exceptions other than producerFenced? For example, consumer#commit could throw CommitFailedException here, which means:

> if the commit failed and cannot be retried. This can only occur if you are using automatic group management with {@link #subscribe(Collection)}, or if there is an active group with the same groupId which is using group management.

In this case, a rebalance has likely happened and hence the consumer no longer owns the partitions it tries to commit. Shouldn't we handle it internally rather than throwing it all the way up to the user?


This comment has been minimized.

@mjsax

mjsax May 11, 2017

Member

As discussed offline, this code behaves as before. I will do a separate PR to improve exception handling in general.

...main/java/org/apache/kafka/streams/processor/internals/StreamThread.java
try {
consumer.commitSync(consumedOffsetsAndMetadata);
} catch (final CommitFailedException e) {
log.warn("{} Failed offset commits {} due to {}", logPrefix, consumedOffsetsAndMetadata, e.getMessage());

@guozhangwang

guozhangwang May 9, 2017

Contributor

CommitFailedException can only be thrown if the consumer has fallen out of the group, so we should handle it inside Streams rather than throw the exception all the way up.

@mjsax

mjsax May 10, 2017

Member

I agree, but this PR does not change the behavior; this code is unmodified. I would rather fix this in a separate PR.

@mjsax

mjsax May 11, 2017

Member

Btw: this exception will be caught in StreamThread and swallowed there.

...c/main/java/org/apache/kafka/streams/processor/internals/StreamTask.java

@guozhangwang

guozhangwang May 10, 2017

Contributor

To give an example of what I said in the previous comment: inside RecordCollectorImpl#send() we call producer.send(), which could throw the following exceptions (assuming EOS is enabled):

  1. TimeoutException: fetching metadata timed out. This is a fatal error, since the topic may never exist. We should catch it higher in the code hierarchy, in StreamTask#process(), by wrapping currNode.process() in a try-catch that aborts the transaction / closes the task. We also need to rethrow this exception so that users are notified and can close the whole instance (we can reconsider global exception handling in another JIRA).

  2. SerializationException: this should never happen, since we are using byte[], byte[] serializers, so if it does happen it is a bug. We should fail fast on the whole instance, i.e. try-catch at the even higher level of StreamThread#run() and shut down the whole instance.

  3. ProducerFencedException: a rebalance has happened and the thread's consumer has fallen out of the group. We should catch it at the lower level, around currNode.process() in StreamTask#process(), and abort the transaction / close the task. However, we do not need to rethrow the exception in this case; the next consumer.poll() should rejoin the group and trigger the callbacks that revoke / reassign tasks.

So it's better to list all of these exceptions explicitly and catch them at different levels of the hierarchy, so that each one is handled appropriately. Then, when there is a bug in the future, it will be easier to trace the code path and locate its root cause.
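The three cases above could be laid out roughly as follows. This is a hedged sketch, not the Streams code: processTask() and runThread() are hypothetical stand-ins for StreamTask#process() and StreamThread#run(), the exception classes are local stand-ins for org.apache.kafka.common.errors.TimeoutException, SerializationException, and ProducerFencedException, and "abort/close" and "shut down" are only recorded as events.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-ins for the Kafka exception classes (so this compiles without Kafka jars).
class TimeoutException extends RuntimeException {
    TimeoutException(final String message) { super(message); }
}

class SerializationException extends RuntimeException {
    SerializationException(final String message) { super(message); }
}

class ProducerFencedException extends RuntimeException {
    ProducerFencedException(final String message) { super(message); }
}

class LayeredHandlingSketch {

    // Records the action taken, in place of real abort/close/shutdown logic.
    static final List<String> events = new ArrayList<>();

    // Task level, playing the role of StreamTask#process() around currNode.process().
    static void processTask(final Runnable currNodeProcess) {
        try {
            currNodeProcess.run();
        } catch (final ProducerFencedException e) {
            // Case 3: abort the transaction / close the task, but do not rethrow;
            // the next poll() is expected to rejoin the group.
            events.add("fenced: abort and close task");
        } catch (final TimeoutException e) {
            // Case 1: abort / close the task, then rethrow so the user is notified.
            events.add("timeout: abort and close task, rethrow");
            throw e;
        }
    }

    // Thread level, playing the role of StreamThread#run().
    static void runThread(final Runnable currNodeProcess) {
        try {
            processTask(currNodeProcess);
        } catch (final SerializationException e) {
            // Case 2: byte[]/byte[] serialization should never fail, so this is a bug;
            // fail fast by shutting down the whole instance.
            events.add("serialization: shut down instance");
        }
    }

    public static void main(final String[] args) {
        runThread(() -> { throw new ProducerFencedException("fenced"); });
        runThread(() -> { throw new SerializationException("bad bytes"); });
        try {
            runThread(() -> { throw new TimeoutException("metadata"); });
        } catch (final TimeoutException e) {
            events.add("user notified: " + e.getMessage());
        }
        events.forEach(System.out::println);
    }
}
```

Each exception type is caught at exactly one level, which is what makes the code path easy to trace when one of them shows up unexpectedly.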

@asfbot

asfbot May 11, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/3735/
Test PASSed (JDK 7 and Scala 2.10).

@asfbot

asfbot May 11, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3731/
Test FAILed (JDK 8 and Scala 2.12).

@asfbot

asfbot May 11, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/3741/
Test PASSed (JDK 8 and Scala 2.11).

@guozhangwang

guozhangwang May 16, 2017

Contributor

@mjsax could you rebase?

@asfbot

asfbot May 16, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/4008/
Test PASSed (JDK 7 and Scala 2.11).

@mjsax

mjsax May 16, 2017

Member

@guozhangwang Rebased. I also updated StreamsConfig to set the producer/consumer default values for EOS.

@asfbot

asfbot May 16, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/4018/
Test PASSed (JDK 7 and Scala 2.11).

@asfbot

asfbot May 16, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/4004/
Test PASSed (JDK 8 and Scala 2.12).

@asfbot

asfbot May 16, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/4012/
Test FAILed (JDK 8 and Scala 2.12).

@mjsax

mjsax May 16, 2017

Member

I could reproduce the failing test. It's a race condition caused by MockConsumer not being synchronized.
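To illustrate this kind of race, the following sketch uses a hypothetical mock (not Kafka's MockConsumer) that one thread feeds while another polls. Declaring both methods synchronized makes each add and each drain atomic, so no record is lost or duplicated; without it, concurrent access to the underlying ArrayList is unsafe. This is only an illustration of the pattern and does not claim to be the fix that was applied.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mock (NOT Kafka's MockConsumer): one thread adds records, another polls.
// Both methods lock on the same instance, so each add and each drain is atomic.
class SynchronizedMockSketch {

    private final List<String> pending = new ArrayList<>();

    synchronized void addRecord(final String record) {
        pending.add(record);
    }

    // Returns everything added so far and clears the buffer in one atomic step.
    synchronized List<String> poll() {
        final List<String> batch = new ArrayList<>(pending);
        pending.clear();
        return batch;
    }

    // Feeds n records from a second thread while this thread keeps polling;
    // returns how many records were consumed in total.
    static int drainConcurrently(final int n) {
        final SynchronizedMockSketch mock = new SynchronizedMockSketch();
        final List<String> consumed = new ArrayList<>();
        final Thread feeder = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                mock.addRecord("record-" + i);
            }
        });
        feeder.start();
        // Keep polling until the feeder has finished and everything has been drained.
        while (feeder.isAlive() || consumed.size() < n) {
            consumed.addAll(mock.poll());
        }
        return consumed.size();
    }

    public static void main(final String[] args) {
        System.out.println("consumed " + drainConcurrently(10_000) + " records");
    }
}
```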

@mjsax

Member

mjsax commented May 16, 2017

@asfbot

asfbot May 16, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/4027/
Test PASSed (JDK 7 and Scala 2.11).

@asfbot

asfbot May 17, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/4013/
Test PASSed (JDK 8 and Scala 2.12).

@guozhangwang

guozhangwang May 17, 2017

Contributor

LGTM. Merged to trunk.

@asfgit asfgit closed this in ebc7f7c May 17, 2017

@mjsax mjsax deleted the mjsax:kafka-4923-add-eos-to-streams branch May 17, 2017
