Skip to content

Conversation

@pnowojski
Copy link
Contributor

What is the purpose of the change

This pull request fixes a bug in all versions of KafkaConsumers, that resulted in application crash, if snapshot before KafkaConsumer was able to asynchronously properly initialize.

Verifying this change

This change adds additional unit test in AbstractFetcherTest and a check state to ensure/enforce contract that internal sentinel offsets will not be committed to Kafka.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@pnowojski
Copy link
Contributor Author

@tzulitai @aljoscha could you take a look?

Copy link
Contributor

@tzulitai tzulitai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch on this issue @pnowojski.

One question:
With this change, if we restore a savepoint from say 1.3, that has a sentinel offset in it, and after the restore a checkpoint also occurs before that sentinel is replaced, that partition state would then be dropped, correct? I think we might need to cover that case also before applying this fix.

return offset != KafkaTopicPartitionStateSentinel.OFFSET_NOT_SET;
}

public final boolean isSentinel() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would hasSentinelOffset be a better name here?

Copy link
Contributor Author

@pnowojski pnowojski Nov 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In new approach method was dropped, but indeed that would be a better name :)

public static final long GROUP_OFFSET = -915623761773L;

public static boolean isSentinel(long offset) {
return offset < 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this implementation could be a bit too broad. Could be a bit more specific by matching the static values in KafkaTopicPartitionStateSentinel.

Copy link
Contributor Author

@pnowojski pnowojski Nov 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kafka doesn't allow to commit any negative values, so this check makes sense.

@tzulitai
Copy link
Contributor

tzulitai commented Nov 1, 2017

By the way, what exactly was the error that caused the application crash in the described case?

@pnowojski
Copy link
Contributor Author

@tzulitai please check the details in the ticket: https://issues.apache.org/jira/browse/FLINK-7732

I have changed the approach as we discussed and now we filtering out happens just before committing offsets.

Copy link
Contributor

@tzulitai tzulitai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pnowojski, changes LGTM. Will wait for Travis to finish a test run.

Map<KafkaTopicPartition, Long> offsets,
@Nonnull KafkaCommitCallback commitCallback) throws Exception;

private Map<KafkaTopicPartition, Long> filerOutSentinels(Map<KafkaTopicPartition, Long> offsets) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: filterOutSentinels, missing t.

@tzulitai
Copy link
Contributor

tzulitai commented Nov 2, 2017

Travis passes, merging ...

@asfgit asfgit closed this in c61d186 Nov 2, 2017
@pnowojski
Copy link
Contributor Author

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants