Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mgr/Zabbix: Various fixes to Zabbix module #19452

Merged
merged 4 commits into from Jan 2, 2018

Conversation

wido
Copy link
Member

@wido wido commented Dec 12, 2017

This pull request contains multiple improvements and fixes for the Zabbix Manager Module:

  • Use fsid of Ceph to identify with Zabbix
  • Configuration improvements of the module to make it run more smoothly
  • Logging fixes

@wido
Copy link
Member Author

wido commented Dec 12, 2017

The reason I added 'bugfix' as a label is that without this fix the Zabbix module doesn't start after setting zabbix_host and restarting the manager.

Copy link
Member

@liewegas liewegas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks ok to me, but I'm not sure if we're trying to converge on logging errors vs throwing exceptions. IIRC there is a PR in flight (or merged?) that will raise a health alert if a module throws an exception from serve(); dropping the exceptions here would lose that.

@wido
Copy link
Member Author

wido commented Dec 12, 2017

@liewegas Ah I see, you mean where I dropped the RuntimeErrors. I can add those back easily.

Just need to know what the convention is.

@jcsp
Copy link
Contributor

jcsp commented Dec 13, 2017

The PR with the error handling is #19235

It introduces a behaviour where exceptions from serve() or init() cause a health check to fail + get reported. However, it also prevents the module running until the mgr restarts, so it's meant for fatal/unexpected errors.

Modules are also able to raise their own health checks with MgrModule.set_health_checks, so maybe that's the right thing to do for runtime-fixable conditions like a bad configuration option?

@wido
Copy link
Member Author

wido commented Dec 14, 2017

@jcsp: Ok. But that PR isn't merged yet.

What to do in the meantime? Merge this and send a new PR when #19235 is merged?

@jcsp
Copy link
Contributor

jcsp commented Dec 15, 2017

The set_health_checks bits are already in master, so I think it would be neat to try using those to raise a friendly health message to tell the user to set their zabbix config when it's missing/invalid.

For anything really fatal, just raise the exception and it'll be handled more gracefully when #19235 merges.

Users can still override this parameter, but by default the fsid
of the cluster will be used to send data to Zabbix.

This makes it even easier to use the Zabbix module.

Signed-off-by: Wido den Hollander <wido@42on.com>
The module refused to run/start if not all configuration was properly set

This commit makes sure the module is initialized properly and allows it
to run much more smoothly.

Signed-off-by: Wido den Hollander <wido@42on.com>
Just to make sure more logging can be done when needed.

Signed-off-by: Wido den Hollander <wido@42on.com>
@wido
Copy link
Member Author

wido commented Dec 19, 2017

@jcsp I've brought back the RunTimeError exceptions again.

How does it look now?

@jcsp
Copy link
Contributor

jcsp commented Dec 19, 2017

We may want to revisit the runtimeerrors in the future, but clearly it's not a regression for this PR so 👍

@jcsp jcsp added the needs-qa label Dec 19, 2017
@wido
Copy link
Member Author

wido commented Dec 19, 2017

I see that the docs fail to build, but that's due to a 404 during the test itself.

Reading package lists...
E: Failed to fetch http://mirror.cs.uchicago.edu/ubuntu-toolchain-r/dists/xenial/main/binary-i386/Packages  404  Not Found
E: Failed to fetch http://mirror.yandex.ru/mirrors/launchpad/ubuntu-toolchain-r/dists/xenial/main/i18n/Translation-en  404  Not Found [IP: 213.180.204.183 80]
E: Some index files failed to download. They have been ignored, or old ones used instead.
Build step 'Execute shell' marked build as failure

Signed-off-by: Wido den Hollander <wido@42on.com>
@wido
Copy link
Member Author

wido commented Dec 19, 2017

I pushed again to trigger the checks. They succeeded now.

@tchaikov tchaikov merged commit 9fb89a8 into ceph:master Jan 2, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants