[FLINK-33681] Reuse input/output metrics of SourceOperator/SinkWriterOperator for task #23998
+45
−4
What is the purpose of the change
Currently, the numRecordsIn & numBytesIn metrics for sources and the numRecordsOut & numBytesOut metrics for sinks are always 0 on the Flink web dashboard.
FLINK-11576 introduced these metrics at the operator level, but did not integrate them at the task level. However, the summary metrics on the job overview page are based on the task-level I/O metrics. As a result, even though new connectors supporting FLIP-33 metrics report operator-level I/O metrics, we still cannot see them on the dashboard.
This PR attempts to reuse the operator-level source/sink I/O metrics at the task level so that they can be viewed on the Flink web dashboard.
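To illustrate the idea of reusing operator-level counters at the task level, here is a minimal, self-contained sketch. All types in it (`Counter`, `SimpleCounter`, `OperatorIoMetrics`, `TaskIoMetrics`) are hypothetical stand-ins written for this example and are not Flink's actual metric classes; the real change wires this up inside Flink's own metric groups.

```java
public class TaskMetricReuseSketch {

    /** Hypothetical stand-in for a metric counter (not Flink's Counter interface). */
    public interface Counter {
        void inc(long n);
        long getCount();
    }

    /** Trivial single-threaded counter, sufficient for the illustration. */
    public static final class SimpleCounter implements Counter {
        private long count;
        @Override public void inc(long n) { count += n; }
        @Override public long getCount() { return count; }
    }

    /** Hypothetical operator-level I/O metrics of a source operator. */
    public static final class OperatorIoMetrics {
        public final Counter numRecordsIn = new SimpleCounter();
        public final Counter numBytesIn = new SimpleCounter();
    }

    /** Hypothetical task-level I/O metrics, which the dashboard summary reads. */
    public static final class TaskIoMetrics {
        // Without reuse, these stay at 0 for a source task, because nothing
        // at the task level ever increments them.
        private Counter numRecordsIn = new SimpleCounter();
        private Counter numBytesIn = new SimpleCounter();

        /** Point the task-level counters at the operator's counters (no copying). */
        public void reuseInputMetrics(OperatorIoMetrics operatorMetrics) {
            this.numRecordsIn = operatorMetrics.numRecordsIn;
            this.numBytesIn = operatorMetrics.numBytesIn;
        }

        public long recordsIn() { return numRecordsIn.getCount(); }
        public long bytesIn() { return numBytesIn.getCount(); }
    }

    public static void main(String[] args) {
        OperatorIoMetrics sourceOperator = new OperatorIoMetrics();
        TaskIoMetrics task = new TaskIoMetrics();
        task.reuseInputMetrics(sourceOperator);

        // The connector reports three records of 100 bytes each at the operator level.
        for (int i = 0; i < 3; i++) {
            sourceOperator.numRecordsIn.inc(1);
            sourceOperator.numBytesIn.inc(100);
        }

        // The task-level view reflects the same counts, counted exactly once.
        System.out.println("task numRecordsIn = " + task.recordsIn()
                + ", numBytesIn = " + task.bytesIn());
        // prints "task numRecordsIn = 3, numBytesIn = 300"
    }
}
```

Because the task-level field references the same counter object instead of summing a second one, each record is counted exactly once, which matches the no-duplication argument made in the change log.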
Brief change log
Reuse input/output metrics of SourceOperator/SinkWriterOperator for task.
Previously, Flink only accounted for internal traffic in the input/output bytes metrics, so the reuse won't cause duplication in the I/O bytes metrics. Also, since the output records metric is intentionally dropped for SinkWriterOperator in OperatorChain#getOperatorRecordsOutCounter and no input records metric is collected for SourceOperatorStreamTask, no duplication in the I/O records metrics will take place.
Verifying this change
Manually ran a Kafka2Kafka job on a test cluster on K8s and verified that the source/sink input/output metrics can be seen on the web dashboard.
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (yes / no)
Documentation