mgr: Fix for dashboard/prometheus failure due to laggy pg state #37909

Merged: 1 commit, Dec 10, 2020


alexandrsushko
Contributor

PG_STATES in pybind/mgr/mgr_module.py is now synced with osd/osd_types.cc.
pybind/mgr/prometheus/module.py: use a safe increment for the pg_%STATE% metrics.

Fixes: https://tracker.ceph.com/issues/46142
Signed-off-by: Alexander Sushko <alexandrsushko@gmail.com>

@callithea added the mgr label Nov 9, 2020
@callithea changed the title from "pybind/mgr: Fix for dashboard/prometheus failure due to laggy pg state" to "mgr: Fix for dashboard/prometheus failure due to laggy pg state" Nov 9, 2020
Member

@neha-ojha neha-ojha left a comment


IIUC, the first commit is cosmetic and the second commit is needed because of 5cdadf1 (it'd be good to include a reference to this commit in the fix)? Should we backport it all the way until N?

@tchaikov
Contributor

@alexandrsushko ping?

@tchaikov
Contributor

mypy run-test: commands[0] | mypy --config-file=../../mypy.ini cephadm/module.py mgr_module.py dashboard/module.py prometheus/module.py mgr_util.py orchestrator/__init__.py progress/module.py rook/module.py snap_schedule/module.py stats/module.py test_orchestrator/module.py mds_autoscaler/module.py volumes/__init__.py
prometheus/module.py: note: In member "get_pg_status" of class "Module":
prometheus/module.py:627: error: Need type annotation for 'num_by_state'
Found 1 error in 1 file (checked 13 source files)
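
The mypy failure above is the usual complaint about an empty container whose element types cannot be inferred. A minimal sketch of the kind of annotation that silences it, assuming `num_by_state` is a `defaultdict` of integer counters keyed by state name (the `tally` helper is hypothetical and is not the code in `prometheus/module.py`):

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple


def tally(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    # mypy cannot infer the key type of an empty defaultdict, so the
    # variable is annotated explicitly. The names here are illustrative.
    num_by_state: DefaultDict[str, int] = defaultdict(int)
    for state, count in pairs:
        num_by_state[state] += count
    return dict(num_by_state)
```

On code bases that still need the older syntax, the same annotation can be written as a trailing `# type: DefaultDict[str, int]` comment.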

@tchaikov
Contributor

tchaikov commented Nov 28, 2020

Contributor

@p-se p-se left a comment


lgtm

@tchaikov
Contributor

tchaikov commented Dec 9, 2020

@alexandrsushko I took the liberty of applying the suggested change in the hope of getting this PR merged sooner. I hope that's fine with you.

@tchaikov
Contributor

tchaikov commented Dec 9, 2020

jenkins test api

@tchaikov
Contributor

tchaikov commented Dec 9, 2020

jenkins test doc

@tchaikov tchaikov self-assigned this Dec 9, 2020
@tchaikov
Contributor

retest this please

num_by_state[state] += count in the get_pg_status method raises KeyError
if the pg state is not in the PG_STATES list. PG_STATES should be kept in
sync with osd_types.cc:pg_state_string(), but sometimes it is not. Once the
KeyError is raised, mgr metrics are not available at all.

Fixes: https://tracker.ceph.com/issues/46142

Signed-off-by: Alexander Sushko <alexandrsushko@gmail.com>
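
The failure mode described in the commit message can be reproduced outside Ceph. The sketch below is illustrative only: `KNOWN_STATES`, `unsafe_counts`, `safe_counts`, and the sample data are hypothetical stand-ins, and it assumes compound state names are joined with `+` (as in `active+clean+laggy`). It is not the code from `prometheus/module.py`.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List

# Hypothetical, deliberately stale list of known states: "laggy" is missing.
KNOWN_STATES: List[str] = ["active", "clean", "peering"]


def unsafe_counts(by_state: Dict[str, int]) -> Dict[str, int]:
    # Plain dict pre-filled from the known list; any other key is an error.
    counts = {state: 0 for state in KNOWN_STATES}
    for state_name, count in by_state.items():
        for state in state_name.split("+"):
            counts[state] += count  # raises KeyError for an unknown state
    return counts


def safe_counts(by_state: Dict[str, int]) -> Dict[str, int]:
    # defaultdict(int) starts unknown states at 0 instead of raising.
    counts: DefaultDict[str, int] = defaultdict(int)
    for state in KNOWN_STATES:
        counts[state] = 0  # known states are still reported, even at zero
    for state_name, count in by_state.items():
        for state in state_name.split("+"):
            counts[state] += count
    return dict(counts)


sample = {"active+clean": 10, "active+clean+laggy": 2}
```

With this sample, `unsafe_counts(sample)` fails on `laggy`, while `safe_counts(sample)` still returns a count for every state, including the one missing from the list. The trade-off is that a state missing from the list is counted silently rather than flagged.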
7 participants