New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mgr/telegraf: Use Python generator and catch OSError #22418
Conversation
retest this please |
1 similar comment
retest this please |
Ping @hydro-b :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could put something like
try:
next(self.gather_measurements())
except StopIteration:
raise RuntimeError('No measurements found')
at https://github.com/ceph/ceph/pull/22418/files#diff-2d45266c801d406eac40c4bbce5afe5eR287
otherwise we have http://pulpito.ceph.com/kchai-2018-06-10_02:09:19-rados-wip-kefu-testing-2018-06-09-2340-distro-basic-smithi/2648089/
48767c3
to
372194e
Compare
Done! |
On Mon, Jun 11, 2018 at 04:59:28AM -0700, Wido den Hollander wrote:
Done!
I'll re-test tomorrow!
|
src/pybind/mgr/telegraf/module.py
Outdated
measurements = self.gather_measurements() | ||
if len(measurements) == 0: | ||
raise RuntimeError('No measurements found') | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry: IndentationError: expected an indented block (module.py, line 286)
src/pybind/mgr/telegraf/module.py
Outdated
if len(measurements) == 0: | ||
raise RuntimeError('No measurements found') | ||
try: | ||
next(self.gather_measurements()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling next() here will only result in evaluating the first element (I think?) -- maybe this should be list() instead to ensure hitting all the entries in the iterators is working
Instead of collection all data in a list prior to sending use generators inside Python to send out data as soon as the Mgr has supplied us with the data. On larger systems with thousands of OSDs this can reduce the amount of time it takes to collect and send the data. Telegraf also likes this more as we send less data in one go. Signed-off-by: Wido den Hollander <wido@42on.com>
372194e
to
481c116
Compare
A OSError will be thrown (and not catched) if the module starts for the first time and the UNIX socket of the Telegraf Agent does not exist (yet). This will render the module useless because it can not accept commands for configuration at that point. If we catch an exception, simply log it but do not raise it. Signed-off-by: Wido den Hollander <wido@42on.com>
481c116
to
e8a9357
Compare
@jcsp I'm casting it to a list now and then check if it's empty. This way all iterators are called to ensure we try to gather all data. |
the failure are due to test env: tracked by http://tracker.ceph.com/issues/24670 |
I saw two problems with the Telegraf module for the Mgr.
The OSError not being catched was the main issue as this will be thrown after module startup if the Telegraf socket does not exist (yet).
Performance wise I've also seen that using generators improves the performance as we can streamline how data is send to Telegraf. Instead of collecting everything in lists first we now use a pipeline where we fetch from the Manager and then directly send it to Telegraf.