Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mgr/telemetry: fix waiting for mgr to warm up #43864

Merged
merged 1 commit into from Dec 13, 2021

Conversation

yaarith
Copy link
Contributor

@yaarith yaarith commented Nov 9, 2021

  1. The implementation of config_notify() in telemetry module sets the
    flag for event, which is supposed to wake up the 'serve' thread whenever
    a config option is changed. The problem is that we call config_notify()
    at the beginning of serve(), before we enter its 'run' loop. This call
    sets the event which cancels the 10 seconds wait for the mgr to warm up.
    To fix this, we extract the logic of updating the config options to a
    separate function (config_update_module_option()), and call it on
    serve(), instead of config_notify().

  2. We should always wait for the mgr to warm up here (10 seconds). In
    case of a sporadic event (e.g. a config option change via CLI) the event
    will be set, and wait will return immediately. We enforce this wait by
    using time.sleep(10) instead of event.wait(10).

Fixes: https://tracker.ceph.com/issues/53204
Signed-off-by: Yaarit Hatuka yaarit@redhat.com

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

1. The implementation of config_notify() in telemetry module sets the
flag for event, which is supposed to wake up the 'serve' thread whenever
a config option is changed. The problem is that we call config_notify()
at the beginning of serve(), before we enter its 'run' loop. This call
sets the event which cancels the 10 seconds wait for the mgr to warm up.
To fix this, we extract the logic of updating the config options to a
separate function (config_update_module_option()), and call it on
__init__, instead of calling config_notify() in serve().

2. We should always wait for the mgr to warm up here (10 seconds). In
case of a sporadic event (e.g. a config option change via CLI) the event
will be set, and wait will return immediately. We enforce this wait by
using time.sleep(10) instead of event.wait(10).

Fixes: https://tracker.ceph.com/issues/53204
Signed-off-by: Yaarit Hatuka <yaarit@redhat.com>
@ljflores
Copy link
Contributor

ljflores commented Nov 9, 2021

jenkins test make check arm64

@yuriw
Copy link
Contributor

yuriw commented Dec 13, 2021

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants