New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-11144] Fix trigger prefetching so that the correct trigger index #13221
Conversation
R: @kennknowles Let me know if it is safe to change the parameter for TriggerStateMachine.prefetch* to a TriggerContext. If so If so I think fixing it that way would be more consistent with how the non-prefetch methods access state. |
8fd63bb
to
33d8068
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overarching concern: I'd like to move ownership of each trigger's behavior into that trigger, rather than having a trigger-agnostic loop on the outside. TBH I'd love to delete the subtriggers
accessor. For prefetch it is mostly harmless, fetching extra data that won't be observed, but I don't want to add usage of this pattern.
Perhaps the prefetchXYZ
methods should be abstract in the base TriggerStateMachine
?
R: @kennknowles I agree. I think this lines up with my comment above "Let me know if it is safe to change the parameter for TriggerStateMachine.prefetch* to a TriggerContext." If that is true, then we can make it abstract and TriggerStateMachine methods will have enough to do all the prefetching for subtriggers as desired. As a side-benefit they could be more sophisticated in what they prefetch if desired instead of all subtriggers. Can you confirm it is safe to change the interfaces here without concerns about compatibility? If so, I'll change this to an abstract methods taking a TriggerContext |
Oh, yes, the whole module is OK to change. It is just for core runners (those that are in this repo and can be adjusted if needed). |
is used for the state namespace. Previously prefetching was fetching non-existent state, possibly adding unnecessary fetches as well as neglecting to prefetch actual state used. The prefetch methods are made abstract on TriggerStateMachine and changed to take a PrefetchContext which has enough information to fetch the correct state namespaces for subtriggers.
33d8068
to
8e093b5
Compare
R: @kennknowles PTAL, I added a new context type PrefetchContext since the finished set etc in TriggerContext are not available at prefetch time. The TriggerStateMachine.prefetch* methods are now abstract. Classes with subtriggers still have to iterate over subtriggers from the context to get access to a ExecutableTriggerStateMachine to get access to the index but at least it is more contained. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I just asked you to write a bunch of boilerplate. Sorry about that. I do think it is slightly better that being totally inessential. A state machine should own its own prefetch logic, even if they are all largely the same and any intelligence I imagined is not possible anyhow...
@@ -48,6 +48,13 @@ private AfterEachStateMachine(List<TriggerStateMachine> subTriggers) { | |||
checkArgument(subTriggers.size() > 1); | |||
} | |||
|
|||
@Override | |||
public void prefetchOnElement(PrefetchContext c) { | |||
for (ExecutableTriggerStateMachine subTrigger : c.trigger().subTriggers()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was one of the cases where I was thinking it only made sense to prefetch the active subtrigger. Of course, I was missing the whole point that you can't know which one that is until you do a fetch. So maybe my whole idea was foolish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah when I embarked on this I was thinking also it might be possible to do better but it didn't pan out.
One idea I had was adding a isCached() method on the state objects that would indicate if the value was local (cached, in-memory, etc) and didn't require a fetch. In such cases we could trim the trigger state to prefetch by looking at the closed triggers. However if everything we actually needed was local, we've only called readLater on something we never actually read and thus never issue a fetch anyway. And if we do have to issue a fetch for anything, fetching an additional unused tag to it is likely not much more overhead.
@@ -91,6 +91,13 @@ public AfterWatermarkEarlyAndLate withLateFirings(TriggerStateMachine lateTrigge | |||
return new AfterWatermarkEarlyAndLate(earlyTrigger, lateTrigger); | |||
} | |||
|
|||
@Override | |||
public void prefetchOnElement(PrefetchContext c) { | |||
for (ExecutableTriggerStateMachine subTrigger : c.trigger().subTriggers()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the other one. Same problem with my thinking.
is used for the state namespace.
Previously prefetching was fetching non-existent state, possibly adding
unnecessary fetches as well as neglecting to prefetch actual state used.
Another way to accomplish this would be to change the TriggerStateMachine
prefetch methods to take a TriggerContext instead of a raw state accessor.
This would allow for TriggerStateMachine to get the correct state for
subtriggers. That seems more inline with how the nonprefetch methods work
however that is a public interface change.
Please add a meaningful description for your change here
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.