mgr/crash: raise warning about recent crashes and other improvements #29034
liewegas
commented
Jul 14, 2019
- raise health warning about recent crashes
- 'ls-new' as well as 'ls' command
- 'archive' and 'archive-all' commands
- keep crashes in mgr memory
- automatic pruning
@alfredodeza I had some battles with python 3 here, mainly with all of the comprehensions from dicts. IIUC, I have to do |
I believe in Python 2, "for x in d" would give you the keys, the same as "for x in d.keys()". You always needed items() or iteritems() to get (k, v) pairs. In Python 3, items() behaves like Python 2's iteritems() (it returns a lazy view rather than a list) and iteritems() itself is gone, so it is now just items().
Signed-off-by: Sage Weil <sage@redhat.com>
e7773d4
to
cbedfbc
Compare
This is compatible with both Python versions (as Dan explains): for key, value in dictionary.items()
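To make the dict-iteration point above concrete, here is a minimal sketch that runs unchanged on Python 2.7 and Python 3. The `crashes` dict and its field names are hypothetical and are not taken from the crash module.

```python
# Hypothetical crash records keyed by crash id; the field names are
# illustrative only and are not the crash module's actual schema.
crashes = {
    "crash-a": {"entity": "osd.0", "archived": False},
    "crash-b": {"entity": "mon.a", "archived": True},
}

# Iterating a dict directly yields its keys on both Python 2 and Python 3.
ids = sorted(key for key in crashes)

# items() yields (key, value) pairs on both versions. On Python 3 it is a
# lazy view; on Python 2 it builds a list, which is fine for small dicts.
new_ids = sorted(
    crash_id for crash_id, crash in crashes.items() if not crash["archived"]
)

# The same pattern works inside a dict comprehension.
entity_by_id = {
    crash_id: crash["entity"] for crash_id, crash in crashes.items()
}
```

Using items() everywhere trades a little memory on Python 2 for code that does not need a compatibility shim.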
b28067b
to
0619d26
Compare
0619d26
to
08be693
Compare
@noahdesu I removed the telemetry test since the crash module now does a better validity check on the crash dump. Is that OK?
08be693
to
9941e28
Compare
9941e28
to
209ce4e
Compare
Yup
The time period for what "recent" means is controlled by the option
``mgr/crash/warn_recent_interval`` (default: two weeks).
should we also add retain_interval here?
I pushed another commit that adds it to the mgr/crash.rst file. I don't think it belongs in the health report section since it's unrelated to the report (or mitigating it).
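For readers following the distinction between the two intervals, a rough sketch of how they might interact is below. It is not the module's code: the function names, the record layout, and the assumption that archived crashes do not count as "recent" are all illustrative. The two-week and one-year values come from the doc text and commit titles in this PR.

```python
from datetime import datetime, timedelta

# Defaults as described in this PR: "two weeks" for the warning window and
# "after a year" for pruning. The constant names are hypothetical.
WARN_RECENT_INTERVAL = timedelta(weeks=2)
RETAIN_INTERVAL = timedelta(days=365)


def recent_new_crashes(crashes, now):
    """Return ids of crashes that would count toward a recent-crash warning.

    Assumption: a crash counts only if it is not archived and its
    timestamp falls inside the warning window.
    """
    cutoff = now - WARN_RECENT_INTERVAL
    return sorted(
        crash_id
        for crash_id, crash in crashes.items()
        if not crash["archived"] and crash["timestamp"] >= cutoff
    )


def prune(crashes, now):
    """Return a copy of crashes without entries older than the retention window."""
    cutoff = now - RETAIN_INTERVAL
    return {
        crash_id: crash
        for crash_id, crash in crashes.items()
        if crash["timestamp"] >= cutoff
    }


# Hypothetical in-memory crash records.
now = datetime(2019, 7, 19)
crashes = {
    "a": {"timestamp": datetime(2019, 7, 15), "archived": False},
    "b": {"timestamp": datetime(2019, 7, 16), "archived": True},
    "c": {"timestamp": datetime(2019, 6, 1), "archived": False},
    "d": {"timestamp": datetime(2018, 1, 1), "archived": False},
}

recent = recent_new_crashes(crashes, now)
kept = prune(crashes, now)
```

In this sketch only crash "a" would trigger the warning, while pruning drops only "d". That matches the point above that the retention interval is unrelated to the health report itself.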
looks good
* refs/pull/29034/head:
  doc/mgr/crash: document missing commands, options
  qa/suites/rados/singleton/all/test-crash: whitelist RECENT_CRASH
  qa/suites/rados/mgr/tasks/insights: whitelist RECENT_CRASH
  qa/tasks/mgr/test_insights: crash module now rejects bad crash reports
  mgr/telemetry: fix remote into crash do_ls()
  mgr/crash: don't make these methods static
  mgr/BaseMgrModule: handle unicode health detail strings
  mgr/crash: verify timestamp is valid
  qa/suites/mgr: whitelist RECENT_CRASH
  mgr/crash: remove unused var
  mgr/crash: remove unused import 'six'
  qa/workunits/rados/test_crash: health check
  mgr/crash: improve validation on post
  mgr/crash: automatically prune old crashes after a year
  mgr/crash: raise RECENT_CRASH warning for recent (new) crashes
  mgr/crash: add 'crash ls-new'
  mgr/crash: add option and serve infra
  mgr/crash: keep copy of crashes in memory
  mgr/pg_autoscaler: adjust style to match built-in tables
  mgr/crash: make 'crash ls' a nice table with a NEW column
  mgr/crash: nicely format 'crash info' output
  mgr/crash: add 'crash archive <id>', 'crash archive-all' commands

Reviewed-by: Neha Ojha <nojha@redhat.com>