-
Notifications
You must be signed in to change notification settings - Fork 13.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-33810][Runtime] Propagate RecordAttributes that contains isProcessingBacklog status #23919
Conversation
4488647
to
af38369
Compare
af38369
to
3cdd56b
Compare
…utes downstream when backlog status changed
…ethod of PushingAsyncDataInput#DataOutput
3cdd56b
to
c9cef9c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Sxnan, thanks for addressing my comments in PR#23521. The code quality looks quite good to me. Nice work, very impressive.
I think I have one last question regarding the initial status before receiving the first RecordAttributes. Currently, this seems undefined, or depends on the operators. Shall we explicitly define the initial status? Would it be a problem if we don't? (E.g., any consistency issues?)
/** If any of the input channels is backlog, the combined RecordAttributes is backlog. */ | ||
private boolean combineIsBacklog( | ||
RecordAttributes lastRecordAttributes, RecordAttributes recordAttributes) { | ||
if (lastRecordAttributes == null | ||
|| lastRecordAttributes.isBacklog() != recordAttributes.isBacklog()) { | ||
if (lastRecordAttributes != null && recordAttributes.isBacklog()) { | ||
nonBacklogChannelsCnt -= 1; | ||
} | ||
if (!recordAttributes.isBacklog()) { | ||
nonBacklogChannelsCnt += 1; | ||
} | ||
} | ||
|
||
return nonBacklogChannelsCnt < numInputChannels; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to understand the behavior during job initialization. According to this method, this will be:
-
As soon as the first
RecordAttributes
is received, despite itsisBacklog
is true or false, the combiner will emit aRecordAttributes
withisBacklog
being false to the downstream, unless there's only one input channel. Is that correct? -
Then the questions is, what happens before the first
RecordAttributes
is received? What is the initial status, and how should the operators behave? Would it be possible that the operators are initialized for one mode (e.g., non-backlog) and have to switch to another mode (e.g., backlog) before receiving any records? Or even worse, different operators might be initialized with inconsistent modes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review!
I updated the RecordAttributesCombiner
to address the initialization issue. The isBacklog of each input channel has three states, namely undefined, isBacklog=true, and isBacklog=false. We decide the isBacklog of the input as the following:
- if any input channel isBacklog=true, then the input is isBacklog=true
- otherwise, if any input channel is undefined, the isBacklog status of the input is unchanged
- otherwise (all the channel is defined and isBacklog=false), the input is isBacklog=false
Currently, all the operators are initialized with non-backlog mode. I agree that, ideally, we should determine the initial status before receiving the first RecordAttributes
so that we don't have to initialize the operator in non-backlog mode and immediately switch to backlog mode before processing any records. However, It turns out that it is non-trivial and I don't think it should block this PR. Thus, I prefer to keep the current PR simple and address the problem in the future. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me. Could you please open a JIRA ticket about this future work, so that we don't lose track on it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JIRA ticket is created.
c9cef9c
to
84d0516
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Merging.
What is the purpose of the change
This PR introduces RecordAttributes that contain information about whether the data is backlog. The RecordAttributes will propagate through the job graph along with the data at runtime.
Brief change log
Verifying this change
Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: yesDocumentation