Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-11144] Fix trigger prefetching so that the correct trigger index #13221

Merged
merged 1 commit into from Nov 3, 2020

Conversation

scwhittle
Copy link
Contributor

is used for the state namespace.

Previously prefetching was fetching non-existent state, possibly adding
unnecessary fetches as well as neglecting to prefetch actual state used.

Another way to accomplish this would be to change the TriggerStateMachine
prefetch methods to take a TriggerContext instead of a raw state accessor.
This would allow for TriggerStateMachine to get the correct state for
subtriggers. That seems more inline with how the nonprefetch methods work
however that is a public interface change.

Please add a meaningful description for your change here


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

Post-Commit Tests Status (on master branch)

Lang SDK Dataflow Flink Samza Spark Twister2
Go Build Status --- Build Status --- Build Status ---
Java Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status
Build Status
Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
--- Build Status ---
XLang Build Status --- Build Status --- Build Status ---

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website Whitespace Typescript
Non-portable Build Status Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status Build Status
Portable --- Build Status --- --- --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@scwhittle
Copy link
Contributor Author

R: @kennknowles Let me know if it is safe to change the parameter for TriggerStateMachine.prefetch* to a TriggerContext. If so If so I think fixing it that way would be more consistent with how the non-prefetch methods access state.

Copy link
Member

@kennknowles kennknowles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overarching concern: I'd like to move ownership of each trigger's behavior into that trigger, rather than having a trigger-agnostic loop on the outside. TBH I'd love to delete the subtriggers accessor. For prefetch it is mostly harmless, fetching extra data that won't be observed, but I don't want to add usage of this pattern.

Perhaps the prefetchXYZ methods should be abstract in the base TriggerStateMachine?

@scwhittle
Copy link
Contributor Author

R: @kennknowles I agree. I think this lines up with my comment above

"Let me know if it is safe to change the parameter for TriggerStateMachine.prefetch* to a TriggerContext."

If that is true, then we can make it abstract and TriggerStateMachine methods will have enough to do all the prefetching for subtriggers as desired. As a side-benefit they could be more sophisticated in what they prefetch if desired instead of all subtriggers.

Can you confirm it is safe to change the interfaces here without concerns about compatibility? If so, I'll change this to an abstract methods taking a TriggerContext

@kennknowles
Copy link
Member

Oh, yes, the whole module is OK to change. It is just for core runners (those that are in this repo and can be adjusted if needed).

is used for the state namespace.

Previously prefetching was fetching non-existent state, possibly adding
unnecessary fetches as well as neglecting to prefetch actual state used.

The prefetch methods are made abstract on TriggerStateMachine and changed
to take a PrefetchContext which has enough information to fetch the correct
state namespaces for subtriggers.
@scwhittle
Copy link
Contributor Author

R: @kennknowles PTAL, I added a new context type PrefetchContext since the finished set etc in TriggerContext are not available at prefetch time. The TriggerStateMachine.prefetch* methods are now abstract. Classes with subtriggers still have to iterate over subtriggers from the context to get access to a ExecutableTriggerStateMachine to get access to the index but at least it is more contained.

Copy link
Member

@kennknowles kennknowles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I just asked you to write a bunch of boilerplate. Sorry about that. I do think it is slightly better that being totally inessential. A state machine should own its own prefetch logic, even if they are all largely the same and any intelligence I imagined is not possible anyhow...

@@ -48,6 +48,13 @@ private AfterEachStateMachine(List<TriggerStateMachine> subTriggers) {
checkArgument(subTriggers.size() > 1);
}

@Override
public void prefetchOnElement(PrefetchContext c) {
for (ExecutableTriggerStateMachine subTrigger : c.trigger().subTriggers()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was one of the cases where I was thinking it only made sense to prefetch the active subtrigger. Of course, I was missing the whole point that you can't know which one that is until you do a fetch. So maybe my whole idea was foolish.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah when I embarked on this I was thinking also it might be possible to do better but it didn't pan out.

One idea I had was adding a isCached() method on the state objects that would indicate if the value was local (cached, in-memory, etc) and didn't require a fetch. In such cases we could trim the trigger state to prefetch by looking at the closed triggers. However if everything we actually needed was local, we've only called readLater on something we never actually read and thus never issue a fetch anyway. And if we do have to issue a fetch for anything, fetching an additional unused tag to it is likely not much more overhead.

@@ -91,6 +91,13 @@ public AfterWatermarkEarlyAndLate withLateFirings(TriggerStateMachine lateTrigge
return new AfterWatermarkEarlyAndLate(earlyTrigger, lateTrigger);
}

@Override
public void prefetchOnElement(PrefetchContext c) {
for (ExecutableTriggerStateMachine subTrigger : c.trigger().subTriggers()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the other one. Same problem with my thinking.

@kennknowles kennknowles merged commit 9dba074 into apache:master Nov 3, 2020
@scwhittle scwhittle deleted the trigger_prefetch branch November 3, 2020 19:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants