mimic: mgr: crashdump feature backport #24639

Merged
merged 23 commits into ceph:mimic on Apr 5, 2019

6 participants
@dmick
Member

commented Oct 17, 2018

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

liewegas and others added some commits Jun 15, 2018

debian,rpm: /var/lib/ceph/crash
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit e37e640)
log: do not discard recent after dumping it
We want to call dump_recent multiple times without discarding those
log events.  Just iterate the list; don't discard it!

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 76802af)
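To illustrate the change, here is a minimal Python model (not the daemon's logging code) of a bounded "recent events" buffer whose dump copies the entries out instead of popping them, so it can be called more than once. `RecentLog`, `submit`, and `max_recent` are hypothetical names used only for this sketch.

```python
from collections import deque


class RecentLog:
    """Hypothetical stand-in for a daemon's in-memory ring of recent log events."""

    def __init__(self, max_recent=3):
        # A bounded deque drops the oldest entry once it is full.
        self._recent = deque(maxlen=max_recent)

    def submit(self, entry):
        self._recent.append(entry)

    def dump_recent(self):
        # Copy the entries out without popping them, so a later call
        # (for example, from a second crash-reporting path) sees the same events.
        return list(self._recent)


log = RecentLog(max_recent=3)
for i in range(5):
    log.submit(f"event {i}")

print(log.dump_recent())  # ['event 2', 'event 3', 'event 4']
print(log.dump_recent())  # same list again; nothing was discarded
```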
common/BackTrace: add dump()
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 4998b5b)
common: add crash_dir option
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 72139f0)
common/assert: get rid of duplicate log dump from assert handler
When we abort() we get the log dump.  Only do it from the assert handler
if the fatal_signal_handlers are disabled for some reason.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 3a7b9e4)
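The resulting control flow can be sketched as follows, assuming the fatal signal handler for SIGABRT performs its own log dump. `handle_assert_failure`, `dump_log`, and `abort` are hypothetical callbacks, not Ceph functions; the point is that the log is dumped exactly once whichever path runs.

```python
def handle_assert_failure(fatal_signal_handlers_enabled, dump_log, abort):
    """Model of the assert path after this change (callbacks are hypothetical)."""
    if not fatal_signal_handlers_enabled:
        # No SIGABRT handler will run, so dump the log here.
        dump_log()
    # Otherwise the SIGABRT handler triggered by abort() does the one dump.
    abort()


def simulate(handlers_enabled):
    events = []

    def dump_log():
        events.append("dump")

    def abort():
        events.append("abort")
        if handlers_enabled:
            # Stand-in for the fatal signal handler reacting to SIGABRT.
            dump_log()

    handle_assert_failure(handlers_enabled, dump_log, abort)
    return events


print(simulate(True))   # ['abort', 'dump']
print(simulate(False))  # ['dump', 'abort']
```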
common/assert: record assert info in g_assert_* globals
These will then be available for others who are interested.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit b0e59ce)
common/ceph_context: add "assert" and "abort" asok commands
These commands require 'debug_asok_assert_abort = true'.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 14699d2)
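A minimal sketch of the gating, assuming a dispatcher that receives the command name and a dict of options. Only the option name `debug_asok_assert_abort` comes from the commit; the function name, return convention, and error codes are illustrative assumptions.

```python
import errno
import os


def handle_asok_command(command, config):
    """Hypothetical admin-socket dispatcher for the debug 'assert'/'abort' commands.

    `config` is a plain dict standing in for the daemon's option table.
    Returns (retval, message) for the cases that do not crash the process.
    """
    if command not in ("assert", "abort"):
        return -errno.EINVAL, f"unknown command {command!r}"
    if not config.get("debug_asok_assert_abort", False):
        # Refuse unless the operator explicitly opted in.
        return -errno.EPERM, "debug_asok_assert_abort is not enabled"
    if command == "abort":
        os.abort()  # raises SIGABRT and terminates the process
    raise AssertionError("assert triggered via admin socket")
```

With the option unset, both commands are refused rather than crashing the daemon, which keeps them safe to leave compiled in.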
global/signal_handler: write crash dumps to /var/lib/ceph/crash/$uuid/
Include two files:

- meta, a JSON blob with everything interesting we can think of
- log, the dump_recent log events

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 90a46dc)
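A Python sketch of the on-disk layout described above: one directory per crash under the crash directory, holding a JSON `meta` file and a plain-text `log` file. The meta fields shown (`crash_id`, `timestamp`, `entity_name`, `backtrace`) and the helper name are assumptions for illustration; the real dump records more.

```python
import json
import os
import tempfile
import uuid
from datetime import datetime, timezone


def write_crash_dump(crash_dir, entity_name, backtrace, recent_log):
    """Hypothetical writer: create crash_dir/<uuid>/ with 'meta' and 'log'."""
    crash_id = str(uuid.uuid4())
    path = os.path.join(crash_dir, crash_id)
    os.makedirs(path)
    meta = {
        "crash_id": crash_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "entity_name": entity_name,
        "backtrace": backtrace,
    }
    with open(os.path.join(path, "meta"), "w") as f:
        json.dump(meta, f, indent=4)
    # The log file holds the recent in-memory events, one per line.
    with open(os.path.join(path, "log"), "w") as f:
        f.write("\n".join(recent_log) + "\n")
    return path


with tempfile.TemporaryDirectory() as d:
    p = write_crash_dump(d, "osd.0", ["frame0"], ["event 1", "event 2"])
    print(sorted(os.listdir(p)))  # ['log', 'meta']
```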
global/signal_handler: one less frame of context
    "backtrace": [
        "(()+0x942e6e) [0x55859889ae6e]",
        "(()+0x11fc0) [0x7f955b1aafc0]",
        "(gsignal()+0x10b) [0x7f9559e91f2b]",
        "(abort()+0x12b) [0x7f9559e7c561]",
        ...

Drop one frame; it's not really helpful.  We could probably drop up through
gsignal(), but 1 seems safer across platforms.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 7566c2e)
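Trimming amounts to slicing off the innermost frame. This hypothetical helper uses the frames quoted in the commit message and keeps the skip count at one, matching the conservative choice described above.

```python
def trim_backtrace(frames, skip=1):
    """Drop the innermost `skip` frames (hypothetical helper).

    Skipping only one frame mirrors the commit's choice; frames such as
    gsignal() may not appear at the same depth on every platform.
    """
    return frames[skip:]


frames = [
    "(()+0x942e6e) [0x55859889ae6e]",
    "(()+0x11fc0) [0x7f955b1aafc0]",
    "(gsignal()+0x10b) [0x7f9559e91f2b]",
    "(abort()+0x12b) [0x7f9559e7c561]",
]
print(trim_backtrace(frames)[0])  # (()+0x11fc0) [0x7f955b1aafc0]
```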
mgr, pybind/mgr: pass inbuf (ceph -i <file>) to modules
Modules may wish to receive bulk data; allow it.

Signed-off-by: Dan Mick <dan.mick@redhat.com>
(cherry picked from commit 140761f)
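A sketch of a module command handler that receives the bulk input alongside the parsed command. The class name, the `crash post` prefix check, and the `(retval, stdout, stderr)` return tuple are assumptions for illustration, not a statement of the mgr module API.

```python
import errno
import json


class CrashPostSketch:
    """Hypothetical module whose command handler also receives bulk input."""

    def handle_command(self, inbuf, cmd):
        # inbuf holds the contents of `ceph -i <file>` (or None if absent);
        # cmd is the already-parsed command dict.
        if cmd.get("prefix") != "crash post":
            return -errno.EINVAL, "", "unknown command"
        try:
            meta = json.loads(inbuf)
        except (TypeError, ValueError) as e:
            return -errno.EINVAL, "", f"malformed crash metadata: {e}"
        if not isinstance(meta, dict):
            return -errno.EINVAL, "", "crash metadata must be a JSON object"
        return 0, "", f"received crash {meta.get('crash_id')}"
```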
qa/tasks/{ceph_manager.py,vstart_runner.py}: allow kwargs in raw_*
Allow passing kwargs (like stdin=) to the local and teuthology
clusters when running tests.

Signed-off-by: Dan Mick <dan.mick@redhat.com>
(cherry picked from commit 7fc8714)
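The kwargs pass-through can be sketched with `subprocess.run`, which accepts `stdin=` as a file object. `raw_cluster_cmd` here is a hypothetical local stand-in, not the teuthology or vstart_runner implementation; a Python child process is used instead of `ceph` so the example runs anywhere.

```python
import subprocess
import sys
import tempfile


def raw_cluster_cmd(*args, **kwargs):
    """Hypothetical local runner that forwards extra kwargs (such as stdin=)
    straight to subprocess.run and returns the command's stdout as bytes."""
    result = subprocess.run(list(args), stdout=subprocess.PIPE, check=True, **kwargs)
    return result.stdout


# Feed a file to the child's stdin, the way `ceph -i <file>` supplies bulk input.
with tempfile.TemporaryFile() as f:
    f.write(b'{"crash_id": "abc"}')
    f.seek(0)
    out = raw_cluster_cmd(
        sys.executable, "-c", "import sys; print(len(sys.stdin.read()))", stdin=f
    )
print(out.strip())  # b'19'
```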
mgr/pybind/crash: handle crashdumps
Signed-off-by: Dan Mick <dan.mick@redhat.com>
(cherry picked from commit 29209a3)
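As a rough picture of what handling crashdumps involves, this hypothetical in-memory store accepts a posted `meta` blob and supports listing, inspecting, and removing reports by `crash_id`. The class and method names are illustrative, and a dict stands in for whatever persistent storage the real module uses.

```python
import json


class CrashStore:
    """Hypothetical in-memory crash registry keyed by crash_id."""

    def __init__(self):
        self._crashes = {}

    def post(self, inbuf):
        # inbuf is the JSON 'meta' blob, e.g. supplied via `ceph -i <file>`.
        meta = json.loads(inbuf)
        crash_id = meta["crash_id"]
        if crash_id in self._crashes:
            raise ValueError(f"crash {crash_id} already posted")
        self._crashes[crash_id] = meta

    def ls(self):
        return sorted(self._crashes)

    def info(self, crash_id):
        return self._crashes[crash_id]

    def rm(self, crash_id):
        # Removing an unknown id is treated as a no-op.
        self._crashes.pop(crash_id, None)
```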
qa/tasks/mgr: add test_crash, call from test_module_selftest
Signed-off-by: Dan Mick <dan.mick@redhat.com>
(cherry picked from commit 8145598)
qa/suites/rados/mgr: Add test_crash
Signed-off-by: Dan Mick <dan.mick@redhat.com>
(cherry picked from commit 85ab978)
doc/mgr: add doc for crash mgr module
Signed-off-by: Dan Mick <dan.mick@redhat.com>
(cherry picked from commit eb1103e)
mgr/crash: add timestamp filter helper
Signed-off-by: Noah Watkins <nwatkins@redhat.com>
(cherry picked from commit 4dda7cb)
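A timestamp filter can be as simple as comparing each report against a cutoff. This sketch assumes each crash dict carries a naive ISO-8601 `timestamp` string; the field name, its format, and the `crashes_since` helper are hypothetical.

```python
from datetime import datetime, timedelta


def crashes_since(crashes, now, window):
    """Return the crashes whose timestamp falls within `window` before `now`."""
    cutoff = now - window
    return [c for c in crashes if datetime.fromisoformat(c["timestamp"]) >= cutoff]


crashes = [
    {"crash_id": "a", "timestamp": "2019-04-05T11:30:00"},
    {"crash_id": "b", "timestamp": "2019-04-04T09:00:00"},
]
recent = crashes_since(crashes, datetime(2019, 4, 5, 12, 0), timedelta(hours=1))
print([c["crash_id"] for c in recent])  # ['a']
```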
mgr/crash: json report of recent crashes
Reports crashes per entity type with hour granularity, primarily for
consumption by the insights module.

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
(cherry picked from commit caa121b)
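One way to build a per-entity-type, hour-granularity summary is to bucket each crash by how many whole hours ago it happened. The output layout below (entity type mapped to hours-ago counts) and the input field names are assumptions, not the module's actual report format.

```python
from collections import Counter, defaultdict
from datetime import datetime


def json_report(crashes, now, hours):
    """Count crashes per entity type, bucketed by whole hours before `now`.

    Assumes each crash has an 'entity_name' like 'osd.3' and a naive
    ISO-8601 'timestamp'.
    """
    report = defaultdict(Counter)
    for c in crashes:
        age = now - datetime.fromisoformat(c["timestamp"])
        hours_ago = int(age.total_seconds() // 3600)
        if 0 <= hours_ago < hours:
            entity_type = c["entity_name"].split(".", 1)[0]
            report[entity_type][hours_ago] += 1
    return {etype: dict(buckets) for etype, buckets in report.items()}


crashes = [
    {"entity_name": "osd.0", "timestamp": "2019-04-05T11:30:00"},
    {"entity_name": "osd.3", "timestamp": "2019-04-05T10:15:00"},
    {"entity_name": "mon.a", "timestamp": "2019-04-05T11:59:00"},
    {"entity_name": "osd.1", "timestamp": "2019-04-04T09:00:00"},
]
print(json_report(crashes, datetime(2019, 4, 5, 12, 0), hours=24))
# {'osd': {0: 1, 1: 1}, 'mon': {0: 1}}
```

Integer bucket keys become strings if the result is passed through `json.dumps`.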
qa/suites/rados, qa/workunits/rados: Add suite/workunit for ceph-crash
Signed-off-by: Dan Mick <dan.mick@redhat.com>
(cherry picked from commit 298a1d9)
common/options: enable mgr 'crash' module by default
Signed-off-by: Dan Mick <dan.mick@redhat.com>
(cherry picked from commit 34e2853)
add ceph-crash service
ceph-crash runs from systemd and watches /var/lib/ceph/crash
for crashdumps, posting them to the mgr through the mgr's
crash plugin.

Signed-off-by: Dan Mick <dan.mick@redhat.com>
(cherry picked from commit da20184)
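The scraping loop can be sketched as one polling pass over the crash directory: skip dumps that are still being written (no `done` marker, as introduced elsewhere in this PR), post the rest, and move posted dumps aside. The `post` callback (for example, one wrapping a `ceph crash post -i <meta>` call) and the `posted/` subdirectory are assumptions for this sketch.

```python
import os
import shutil


def scrape_once(crash_dir, post):
    """One hypothetical polling pass; returns the names of dumps it posted.

    `post(meta_path)` should return True when the mgr accepted the report.
    """
    posted_dir = os.path.join(crash_dir, "posted")
    os.makedirs(posted_dir, exist_ok=True)
    posted = []
    for name in sorted(os.listdir(crash_dir)):
        path = os.path.join(crash_dir, name)
        if name == "posted" or not os.path.isdir(path):
            continue
        if not os.path.exists(os.path.join(path, "done")):
            continue  # writer has not finished; try again on the next pass
        if post(os.path.join(path, "meta")):
            # Move it aside so it is not posted twice.
            shutil.move(path, os.path.join(posted_dir, name))
            posted.append(name)
    return posted
```

A long-running service would call this in a loop with a sleep between passes.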
global/signal_handler: add 'done' file to signal that the crashdump is ready
The file is for an asynchronous crash scraper to use for synchronization.

Signed-off-by: Dan Mick <dan.mick@redhat.com>
(cherry picked from commit 1bb5c63)
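On the writer side, ordering is what matters: write `meta` and `log` first, then create `done`, so a scraper that sees the marker never reads a half-written dump. This is a Python sketch under that assumption; the helper names are hypothetical, and the fsync calls are an extra precaution of this sketch rather than something the commit describes.

```python
import os


def _write_synced(path, text):
    # Flush Python's buffer and ask the OS to persist the data before returning.
    with open(path, "w") as f:
        f.write(text)
        f.flush()
        os.fsync(f.fileno())


def write_dump_with_marker(path, meta_text, log_text):
    """Hypothetical writer: payload files first, empty 'done' marker last."""
    os.makedirs(path, exist_ok=True)
    _write_synced(os.path.join(path, "meta"), meta_text)
    _write_synced(os.path.join(path, "log"), log_text)
    # A scraper that sees 'done' can assume meta and log are complete.
    _write_synced(os.path.join(path, "done"), "")


def dump_is_ready(path):
    return os.path.isfile(os.path.join(path, "done"))
```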
global/signal_handler.cc: report assert_file as correct name
A boilerplate error led to two 'assert_line' items in the dump.

Signed-off-by: Dan Mick <dan.mick@redhat.com>
(cherry picked from commit f6a4897)
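Why the duplicated key loses information: many JSON parsers, including Python's `json` module, keep only the last value for a repeated key, so the file name written under the first `assert_line` silently disappears. The file name and line number below are made-up values, and the key order in the buggy text is an assumption.

```python
import json

# Hypothetical meta text with the bug: the file name is emitted under
# 'assert_line', followed by the real line number under the same key.
buggy = '{"assert_line": "osd/PG.cc", "assert_line": 1234}'
fixed = '{"assert_file": "osd/PG.cc", "assert_line": 1234}'

print(json.loads(buggy))  # {'assert_line': 1234}
print(json.loads(fixed))  # {'assert_file': 'osd/PG.cc', 'assert_line': 1234}
```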
mgr/telemetry: add crashdump info to report
Signed-off-by: Dan Mick <dan.mick@redhat.com>
(cherry picked from commit 2c17bc9)

@dmick dmick added this to the mimic milestone Oct 17, 2018

@dmick dmick requested a review from liewegas Oct 17, 2018

@smithfarm

Contributor

commented Nov 2, 2018

@liewegas Is this something that should go into 13.2.3?

@smithfarm

Contributor

commented Nov 15, 2018

@dmick @liewegas Ping. Should this be included in a normal mimic integration testing run?

@dmick

Member Author

commented Nov 16, 2018

@smithfarm I believe so.

@smithfarm

Contributor

commented Jan 25, 2019

@dmick @liewegas We're now collecting PRs for 13.2.5 - is this ready to go?

@dmick

Member Author

commented Jan 28, 2019

as far as I know, yes

@dmick

Member Author

commented Mar 14, 2019

Did this never get merged?

@smithfarm smithfarm changed the title mimic: crashdump feature backport mimic: mgr: crashdump feature backport Mar 15, 2019

@yuriw

Contributor

commented Mar 20, 2019

@yuriw

Contributor

commented Apr 2, 2019

@yuriw

Contributor

commented Apr 5, 2019

@liewegas This passed tests and is ready for merge.

@liewegas liewegas merged commit 5ae3e4b into ceph:mimic Apr 5, 2019

4 checks passed

Docs: build check: OK - docs built
Signed-off-by: all commits in this PR are signed
Unmodified Submodules: submodules for project are unmodified
make check: make check succeeded

@sebastian-philipp

Member

commented Apr 15, 2019
