Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-8543] Dataflow streaming timers are not strictly time ordered when set earlier mid-bundle #11924

Merged
merged 7 commits into from Jul 29, 2020

Conversation

rehmanmuradali
Copy link
Contributor


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
--- Build Status
Build Status
Build Status
Build Status
Build Status
--- --- Build Status
XLang --- --- --- Build Status --- --- Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status
Build Status
Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@rehmanmuradali
Copy link
Contributor Author

R: @reuvenlax
Could you please take a look that I am on right track?

if (userTimerInternals.hasTimerBefore(currentInputWatermark)) {
while (!toBeFiredTimersOrdered.isEmpty()) {
userTimerInternals.setTimer(toBeFiredTimersOrdered.poll());
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kennknowles for comment. This doesn't look right to me, as I don't think we should be modifying the WindmillTimerInternals here. I think we just want to merge the timer modifications from processing the workitem into this priority queue; note that if timers are deleted, we need to detect that as well and remove from the priority queue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I don't actually understand what this block is for.

FWIW to do timer deletion/reset cheaply without building a bespoke data structure just keep a map from id to firing time or tombstone. This way, whenever a timer comes up in the prio queue you pull out the actual time for it from the map. If it is actually set for another time, don't fire it. If it is obsolete, don't fire it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reuvenlax done

@@ -577,12 +583,21 @@ public void flushState() {
WindmillTimerInternals.windmillTimerToTimerData(
WindmillNamespacePrefix.USER_NAMESPACE_PREFIX, timer, windowCoder))
.iterator();

cachedFiredUserTimers.forEachRemaining(toBeFiredTimersOrdered::add);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we even need cachedFiredUserTimers? It seems obsolete if we populate the priority queue. The name is also wrong - even before this PR it wasn't a cache. It is a lazily initialized iterator. Instead, we should have a lazily initialized priority queue (like you do) and just a flag to say whether the incoming timers have been loaded yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if (userTimerInternals.hasTimerBefore(currentInputWatermark)) {
while (!toBeFiredTimersOrdered.isEmpty()) {
userTimerInternals.setTimer(toBeFiredTimersOrdered.poll());
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I don't actually understand what this block is for.

FWIW to do timer deletion/reset cheaply without building a bespoke data structure just keep a map from id to firing time or tombstone. This way, whenever a timer comes up in the prio queue you pull out the actual time for it from the map. If it is actually set for another time, don't fire it. If it is obsolete, don't fire it.

@ajamato
Copy link

ajamato commented Jun 18, 2020

@kennknowles Would you mind following up on this PR?

@kennknowles kennknowles self-requested a review July 14, 2020 22:34
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.*;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep the imports all explicit.

private Iterator<TimerData> cachedFiredUserTimers = null;
private PriorityQueue<TimerData> toBeFiredTimersOrdered = null;

// to track if timer is reset earlier mid-bundle.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment about what are the keys and values of this map?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

}

private void testEventTimeTimerOrderingWithInputPTransform(
Instant now,
int numTestElements,
PTransform<PBegin, PCollection<KV<String, String>>> transform)
PTransform<PBegin, PCollection<KV<String, String>>> transform,
boolean isStreaming)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't depend on streaming or not, but just controls whether the pcollection should be bounded or unbounded. For clarity, you can just make this parameter IsBounded isBounded

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to IsBounded

}

private final Instant start;
private final Instant end;
private final boolean isStreaming;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Instant start,
Instant end,
PTransform<PBegin, PCollection<KV<Void, Void>>> input,
boolean isStreaming) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -4331,6 +4378,7 @@ public PDone expand(PBegin input) {
PCollection<String> result =
input
.apply(inputPTransform)
.setIsBoundedInternal(isStreaming ? IsBounded.UNBOUNDED : IsBounded.BOUNDED)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way to do this that might be better is to use TestStream in the unbounded case. This will probably give best coverage. Even for an unbounded PCollection the watermark might instantly move to infinity.

# Conflicts:
#	runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingModeExecutionContext.java
#	runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillTimerInternals.java
@rehmanmuradali
Copy link
Contributor Author

rehmanmuradali commented Jul 29, 2020

R: @kennknowles @reuvenlax

Copy link
Member

@kennknowles kennknowles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ready to merge this. Thanks for sticking with us through the review. I will squash the commits since they are all incremental and the overall PR is one change.

@kennknowles
Copy link
Member

run dataflow validatesrunner

@lukecwik
Copy link
Member

lukecwik commented Aug 7, 2020

This PR enabled new tests which are failing with some of the validates runner tests. for example:
https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7773/testReport/
https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/7805/testReport/

Expanded existing JIRA: https://issues.apache.org/jira/browse/BEAM-8460, opened #12503 to disable failing test category.

kennknowles added a commit to kennknowles/beam that referenced this pull request Sep 30, 2020
…rdered when set earlier mid-bundle (apache#11924)"

This reverts commit 88acc52.

There is a bug in this commit that causes deleted timers to not clear their
watermark holds, resulting in stuck pipelines. See
https://issues.apache.org/jira/browse/BEAM-10991
kennknowles added a commit that referenced this pull request Oct 1, 2020
…streaming timers are not strictly time ordered when set earlier mid-bundle (#11924)"
kennknowles added a commit to kennknowles/beam that referenced this pull request Oct 1, 2020
…rdered when set earlier mid-bundle (apache#11924)"

This reverts commit 88acc52.

There is a bug in this commit that causes deleted timers to not clear their
watermark holds, resulting in stuck pipelines. See
https://issues.apache.org/jira/browse/BEAM-10991
robinyqiu pushed a commit that referenced this pull request Oct 5, 2020
[BEAM-10991] Cherrypick #12980 to 2.25.0: Revert "[BEAM-8543] Dataflow streaming timers are not strictly time ordered when set earlier mid-bundle (#11924)"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants