[FLINK-4000] Fix for checkpoint state restore at MessageAcknowledgingSourceBase #2062

Closed

asavartsov
Contributor

The documentation for MessageAcknowledgingSourceBase.restoreState() says:

This method is invoked when a function is executed as part of a recovery run. Note that restoreState() is called before open().

So the current implementation:

  1. Fails in restoreState() with a NullPointerException, so jobs fail to restart.
  2. Does not restore anything, because the subsequent open() call immediately erases all checkpoint data.
  3. As a consequence, violates the exactly-once guarantee, because the list of processed but not yet acknowledged messages is erased.

The proposed change fixes that.
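
For readers unfamiliar with the lifecycle order, the failure mode described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not Flink code: `BuggySource`, `FixedSource`, `pendingIds`, and the helper methods are hypothetical names, and the "fixed" variant shows only one possible shape of a fix, not necessarily what the commit in this pull request does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins (not Flink classes). They only mimic the call order
// restoreState() -> open() that the documentation quoted above describes.
class RestoreBeforeOpenSketch {

    /** Mirrors the reported problem: state is allocated, and reset, in open(). */
    static class BuggySource {
        private List<String> pendingIds; // still null when restoreState() runs

        void restoreState(List<String> checkpointed) {
            pendingIds.addAll(checkpointed); // NullPointerException: open() has not run yet
        }

        void open() {
            pendingIds = new ArrayList<>(); // would also discard anything restored earlier
        }
    }

    /** One possible fix (assumption): restore into a fresh list, never overwrite it in open(). */
    static class FixedSource {
        private List<String> pendingIds;

        void restoreState(List<String> checkpointed) {
            pendingIds = new ArrayList<>(checkpointed);
        }

        void open() {
            if (pendingIds == null) { // only initialize when nothing was restored
                pendingIds = new ArrayList<>();
            }
        }

        List<String> pending() {
            return pendingIds;
        }
    }

    /** Returns true if restoring into the buggy source throws a NullPointerException. */
    static boolean buggyRestoreFails() {
        try {
            new BuggySource().restoreState(Arrays.asList("id-1"));
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    /** Runs the recovery order restoreState() then open() and returns the surviving state. */
    static List<String> fixedRestore(List<String> checkpointed) {
        FixedSource source = new FixedSource();
        source.restoreState(checkpointed);
        source.open();
        return source.pending();
    }

    public static void main(String[] args) {
        System.out.println("buggy restore throws NPE: " + buggyRestoreFails());
        System.out.println("fixed source after open(): " + fixedRestore(Arrays.asList("id-1", "id-2")));
    }
}
```

In the sketch, the restored IDs survive `open()` only because initialization is guarded by a null check. Any approach that keeps `open()` from overwriting already-restored state would serve the same purpose.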

@StephanEwen
Contributor

Thanks, this is a critical fix.
+1

Merging this...

StephanEwen pushed a commit to StephanEwen/flink that referenced this pull request Jun 7, 2016
This closes apache#2062
@asfgit asfgit closed this in ae679bb Jun 8, 2016