[FLINK-27420] Recreate metric groups for each new RM to avoid metric loss #19646

@baugarten baugarten commented May 5, 2022

See #19607

This is the same change as #19607, but based on release-1.15. It appears safe to cherry-pick to release-1.14 as well (azure). I can open another PR for 1.14 if that's preferable.

What is the purpose of the change

Fixes the bug identified in https://issues.apache.org/jira/browse/FLINK-27420 by recreating the metric groups for the slot manager and the resource manager each time leadership is granted.

Brief change log

  • The MetricRegistry and Hostname are stored in the ResourceManagerProcessContext
  • The SlotManagerMetricGroup and ResourceManagerMetricGroup are created in the ResourceManagerFactory
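
The two changes above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: every `Sketch*` class, the metric name `registeredTaskManagers`, and the scope string are hypothetical stand-ins, and the sketch assumes the metric loss comes from reusing a group that was closed together with the previous resource manager. The long-lived context keeps only the registry and hostname, and the factory builds a fresh group for every new resource manager.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a metric registry (not Flink's MetricRegistry API).
final class SketchRegistry {
    private final Set<String> registered = new HashSet<>();

    void register(String name) { registered.add(name); }
    void unregister(String name) { registered.remove(name); }
    boolean isRegistered(String name) { return registered.contains(name); }
}

// Hypothetical metric group: once closed, it ignores further registrations.
final class SketchMetricGroup {
    private final SketchRegistry registry;
    private final String scope;
    private final List<String> names = new ArrayList<>();
    private boolean closed = false;

    SketchMetricGroup(SketchRegistry registry, String scope) {
        this.registry = registry;
        this.scope = scope;
    }

    void addMetric(String name) {
        if (closed) {
            return; // assumed behavior: a closed group silently drops new metrics
        }
        String fullName = scope + "." + name;
        registry.register(fullName);
        names.add(fullName);
    }

    void close() {
        for (String fullName : names) {
            registry.unregister(fullName);
        }
        names.clear();
        closed = true;
    }
}

// Long-lived context: stores only the inputs needed to build a fresh group.
final class SketchProcessContext {
    final SketchRegistry registry;
    final String hostname;

    SketchProcessContext(SketchRegistry registry, String hostname) {
        this.registry = registry;
        this.hostname = hostname;
    }
}

// Short-lived resource manager: one instance per leadership grant.
final class SketchResourceManager {
    private final SketchMetricGroup metricGroup;

    SketchResourceManager(SketchMetricGroup metricGroup) {
        this.metricGroup = metricGroup;
        metricGroup.addMetric("registeredTaskManagers"); // hypothetical metric name
    }

    void stop() {
        metricGroup.close(); // closes the group owned by this instance only
    }
}

// The factory creates a new metric group for every resource manager it builds.
final class SketchResourceManagerFactory {
    static SketchResourceManager create(SketchProcessContext context) {
        SketchMetricGroup group =
                new SketchMetricGroup(context.registry, context.hostname + ".resourcemanager");
        return new SketchResourceManager(group);
    }
}
```

In this sketch, stopping the first resource manager and calling `SketchResourceManagerFactory.create(context)` again registers the metric anew, because the second instance never sees the closed group.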

Verifying this change

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: yes
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

@flinkbot
Collaborator

flinkbot commented May 5, 2022

CI report:

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot run azure: re-run the last Azure build

@xintongsong
Contributor

@baugarten,
It seems we still have some compilation issues for 1.14, most likely related to the JUnit 4 -> 5 migration, which started in 1.15.

I'll merge this PR for the 1.15 branch.

xintongsong pushed a commit that referenced this pull request May 5, 2022
fbc8e46

@xintongsong xintongsong closed this May 5, 2022
czy006 pushed a commit to czy006/flink that referenced this pull request Jun 30, 2022
czy006 pushed a commit to czy006/flink that referenced this pull request Jul 7, 2022
czy006 pushed a commit to czy006/flink that referenced this pull request Jul 11, 2022
czy006 pushed a commit to czy006/flink that referenced this pull request Jul 15, 2022