Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-33681] Reuse input/output metrics of SourceOperator/SinkWriterOperator for task #23998

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

X-czh
Copy link
Contributor

@X-czh X-czh commented Dec 27, 2023

What is the purpose of the change

Currently, the numRecordsIn & numBytesIn metrics for sources and the numRecordsOut & numBytesOut metrics for sinks are always 0 on the Flink web dashboard.

FLINK-11576 brings us these metrics on the opeartor level, but it does not integrate them on the task level. On the other hand, the summay metrics on the job overview page is based on the task level I/O metrics. As a result, even though new connectors supporting FLIP-33 metrics will report operator-level I/O metrics, we still cannot see the metrics on dashboard.

This MR attempts to reuse the operator-level source/sink I/O metrics for task so that they can be viewed on Flink web dashboard.

Brief change log

Reuse input/output metrics of SourceOperator/SinkWriterOperator for task.

Since Flink only accounts for internal traffic for input/output bytes metrics before, the reuse won't cause duplication in the I/O bytes metrics. Also, as the output records metric is intentionally dropped for SinkWriterOperator in OperatorChain#getOperatorRecordsOutCounter and no input records metric is collected for SourceOperatorStreamTask, no duplication in the I/O records metrics will take place.

Verifying this change

Manually run Kafka2Kafka job on a testing cluster on K8s, verified that the source/sink input/output metrics can be seen on the web dashboard.

image

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@flinkbot
Copy link
Collaborator

flinkbot commented Dec 27, 2023

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

Copy link
Contributor

@affo affo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice job, thank you!
Though I am not committer, I stumbled upon this PR :)

I find the logic of reuseing the input metrics a bit convoluted, but that is nothing to deal with this PR.

@X-czh
Copy link
Contributor Author

X-czh commented Apr 22, 2024

@affo Thanks! @huwh Could you help take a look at the PR?

@X-czh
Copy link
Contributor Author

X-czh commented May 23, 2024

Hi @gyfora, do you think it a good way to go? Many Flink beginners have been complaining about this here and there for years. Given that we've already have the right stats in operator-level metrics, it'll be good to expose it on UI directly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants