[BEAM-1674] Fix Flink State GC #2217

Merged
merged 3 commits into from Mar 10, 2017


aljoscha
Contributor

This is a proper solution, as discussed in the Jira issue. If we merge this we can drop #2215. (Thanks for quickly providing that PR, though!)

R: @kennknowles

We now set the GC timer for window.maxTimestamp() + 1 to ensure that a
user timer set for window.maxTimestamp() still has all state.

This also adds tests for late data dropping and state GC specifically
for the Flink DoFnOperator.
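
To illustrate the off-by-one, here is a minimal sketch that uses plain `long` millisecond timestamps instead of Beam's classes. The class `GcTimerSketch` and its methods are hypothetical names made up for this illustration, allowed lateness is ignored, and the tie-breaking rule (GC timer first when timestamps are equal) is an assumed worst case, not a statement about how Flink orders timers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the Flink runner's actual code.
final class GcTimerSketch {

  /** GC time as described in this PR: one millisecond past the window's max timestamp. */
  static long gcTime(long windowMaxTimestamp) {
    return windowMaxTimestamp + 1;
  }

  /**
   * Fires a GC timer and a user timer in timestamp order and reports whether
   * the user timer still saw the window state when it fired.
   */
  static boolean userTimerSeesState(long userTimerTime, long gcTimerTime) {
    // Each entry is {timestamp, kind}; kind 0 = GC timer, kind 1 = user timer.
    List<long[]> timers = new ArrayList<>();
    timers.add(new long[] {gcTimerTime, 0});
    timers.add(new long[] {userTimerTime, 1});
    // List.sort is stable, so on equal timestamps the GC timer stays first
    // (assumed worst case for the user timer).
    timers.sort((a, b) -> Long.compare(a[0], b[0]));

    boolean statePresent = true;
    boolean seenByUserTimer = false;
    for (long[] timer : timers) {
      if (timer[1] == 0) {
        statePresent = false; // GC timer clears all state for the window
      } else {
        seenByUserTimer = statePresent; // user timer observes whatever is left
      }
    }
    return seenByUserTimer;
  }
}
```

With the GC timer at the same timestamp as the user timer, `userTimerSeesState(9, 9)` returns `false` under the assumed tie-breaking, while `userTimerSeesState(9, GcTimerSketch.gcTime(9))` returns `true` because the GC timer can only fire strictly later.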
@coveralls

Coverage Status

Coverage decreased (-0.03%) to 70.138% when pulling 1a8e1f7 on aljoscha:jira-1674-fix-flink-gc into 2c2424c on apache:master.

@asfbot

asfbot commented Mar 10, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/8315/
--none--

@aljoscha
Contributor Author

Run Flink RunnableOnService

@asfbot

asfbot commented Mar 10, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Flink/1879/
--none--

@JingsongLi
Contributor

Sorry, I had some private matters to attend to for the past few days. I have looked at it now and left a comment here.

doFnRunner.onTimer(timerId, window, timestamp, timeDomain);
}
// a timer can never be late because we don't allow setting timers after GC time
doFnRunner.onTimer(timerId, window, timestamp, timeDomain);
Contributor

An event-time timer can never be late, but a processing-time timer may be. We need to drop the late processing-time timer here. What do you think? @aljoscha

Contributor Author

Ahh, you mean a processing-time timer that fires for a window that was already garbage collected?

I'll add this again. Thanks for spotting this!
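
The case being discussed can be sketched roughly as follows. This is a simplified model under stated assumptions, not the code in this PR: `LateTimerSketch`, `isWindowExpired`, and `shouldFire` are hypothetical names, timestamps are plain `long` milliseconds, allowed lateness is ignored, and a window is assumed to count as garbage collected once the watermark has reached `maxTimestamp + 1`.

```java
// Hypothetical illustration only; not the Flink runner's actual code.
final class LateTimerSketch {

  enum TimeDomain { EVENT_TIME, PROCESSING_TIME }

  /** Assumption: the window is garbage collected once the watermark reaches maxTimestamp + 1. */
  static boolean isWindowExpired(long windowMaxTimestamp, long watermark) {
    long gcTime = windowMaxTimestamp + 1;
    return watermark >= gcTime;
  }

  /**
   * Decides whether a timer should be passed on to the user's DoFn.
   * Event-time timers are always delivered, because they cannot be set past GC time.
   * Processing-time timers are dropped when their window has already expired.
   */
  static boolean shouldFire(TimeDomain domain, long windowMaxTimestamp, long watermark) {
    boolean isEventTimer = domain == TimeDomain.EVENT_TIME;
    return isEventTimer || !isWindowExpired(windowMaxTimestamp, watermark);
  }
}
```

For a window with max timestamp 9, a processing-time timer that fires when the watermark is at 20 is dropped, while the same timer firing at watermark 5 is delivered.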

@aljoscha
Contributor Author

@JingsongLi Don't worry. 😃

@aljoscha
Contributor Author

@JingsongLi I pushed another commit for the processing-time timer thing.

@JingsongLi
Contributor

This is good~

@kennknowles
Member

Run Flink RunnableOnService

@coveralls

Coverage Status

Coverage decreased (-0.03%) to 70.14% when pulling dbfcf4b on aljoscha:jira-1674-fix-flink-gc into 2c2424c on apache:master.

}

/**
* A {@link StatefulDoFnRunner.StateCleaner} implemented by StateInternals.
Member

javadoc still thinks it is in the other file

}

/**
* A {@link StatefulDoFnRunner.StateCleaner} implemented by StateInternals.
Member

Why is it in two places now?

Contributor Author

I moved the original one to the Flink Runner because I'm changing the GC time to window.max + 1, which seemed like a Flink-specific thing. Do you think I should simply leave it as is and change to the +1 behaviour for the default implementation?

Member

Yes, I think you should just change the common implementation to do a +1. I bet most runners leveraging it might have similar troubles, and likely the same solution will work a lot of the time. It should be clearly documented that it can only be used if timers are delivered in order.

@asfbot

asfbot commented Mar 10, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Flink/1883/
--none--

@asfbot

asfbot commented Mar 10, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/8325/
--none--

@kennknowles
Member

kennknowles commented Mar 10, 2017

Since it is release blocking, and it is already the weekend in many time zones, let's merge and tidy up later.

@asfgit asfgit merged commit dbfcf4b into apache:master Mar 10, 2017
asfgit pushed a commit that referenced this pull request Mar 10, 2017
  Properly deal with late processing-time timers
  Introduce Flink-specific state GC implementations
  Move GC timer checking to StatefulDoFnRunner.CleanupTimer
@kennknowles
Member

Filed BEAM-1689 to do the cleanup.

@aljoscha
Contributor Author

Thanks! 😃

@aljoscha aljoscha deleted the jira-1674-fix-flink-gc branch March 10, 2017 23:53
stateCleaner.clearForWindow(window);
// There should invoke the onWindowExpiration of DoFn
} else {
if (isEventTimer || !dropLateData(window)) {
// An event-time timer can never be late because we don't allow setting timers after GC time.
// Ot can happen that a processing-time time fires for a late window, we need to ignore
Contributor

time fires -> timer fires


7 participants