New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mgr/crash: show timestamps in more user-friendly format #36671
Conversation
In order to test this, I would need to somehow precipitate a crash. Any ideas how to do that? |
[global]
debug asok assert abort = true $ ceph tell mon.a abort |
or just |
src/pybind/mgr/crash/module.py
Outdated
@@ -80,7 +80,7 @@ def _refresh_health_checks(self): | |||
'%s crashed on host %s at %s' % ( | |||
crash.get('entity_name', 'unidentified daemon'), | |||
crash.get('utsname_hostname', '(unknown)'), | |||
crash.get('timestamp', 'unknown time')) | |||
self.user_friendly_timestamp(crash.get('timestamp', 'unknown time'))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: A little cleaner to me looks to just pass crash.get('timestamp')
to user_friendly_timestamp
and make user_friendly_timestamp
return "unknown time" if not timestr
.
src/pybind/mgr/crash/module.py
Outdated
return timestr | ||
datetime_obj = self.time_from_string(timestr) | ||
return "{}+00:00".format( | ||
datetime_obj.isoformat(' ', timespec='seconds') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of manually formatting (adding timezone) you could just properly set the timezone in datetime_obj
:
datetime_obj.replace(tzinfo=datetime.timezone.utc).isoformat(' ', timespec='seconds')
Or probably time_from_string
could be fixed to set timezone to utc when creating the datetime
object.
src/pybind/mgr/crash/module.py
Outdated
@@ -114,6 +114,16 @@ def time_from_string(self, timestr): | |||
except ValueError: | |||
return datetime.datetime.strptime(timestr, OLD_DATEFMT) | |||
|
|||
def user_friendly_timestamp(self, timestr): | |||
# Converts timestamp as stored in crash metadata into a more | |||
# user-friendly format. See https://tracker.ceph.com/issues/43897 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I don't see much value in this comment here, especially in adding the tracker url. The purpose of the function is self evident.
Fixes: https://tracker.ceph.com/issues/43897 Signed-off-by: Nathan Cutler <ncutler@suse.com>
I tried killing a daemon, but when the daemon is running inside a container, it has PID 1 and |
Thanks, @tchaikov! Have you tried it when the daemon is running in a container? I'm getting:
I guess this doesn't work for the same reason that |
|
jenkins test make check |
no. i haven't tried this for killing a daemon in container.
this error is returned by MonClient when it failed to talk to monitor after tried "mon_client_directed_command_retry" times. can you try something like ceph -s ?
yeah, i am afraid so. this command is handled right in a daemon by calling |
@tchaikov Nevermind, this just indicates that the command succeeded:
|
i see, monitor failed to reply |
return datetime.datetime.strptime(timestr, OLD_DATEFMT) | ||
datetime_obj = datetime.datetime.strptime(timestr, OLD_DATEFMT) | ||
# add time zone to datetime object before returning it | ||
return datetime_obj.replace(tzinfo=datetime.timezone.utc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this breaks the self_test()
.
2020-08-18T15:07:02.661 INFO:teuthology.orchestra.run.smithi115.stderr:2020-08-18T15:07:02.654+0000 7fe288575700 1 -- 172.21.15.115:0/1247744208 --> [v2:172.21.15.115:6808/33840,v1:172.21.15.115:6809/338
40] -- mgr_command(tid 0: {"prefix": "mgr self-test module", "module": "crash", "target": ["mon-mgr", ""]}) v1 -- 0x7fe280063560 con 0x7fe258077b50
2020-08-18T15:07:02.662 INFO:teuthology.orchestra.run.smithi115.stderr:2020-08-18T15:07:02.656+0000 7fe26e7fc700 1 -- 172.21.15.115:0/1247744208 <== mgr.4499 v2:172.21.15.115:6808/33840 1 ==== mgr_comman
d_reply(tid 0: -1 Test failed: Remote method threw exception: Traceback (most recent call last):
2020-08-18T15:07:02.662 INFO:teuthology.orchestra.run.smithi115.stderr: File "/usr/share/ceph/mgr/crash/module.py", line 353, in self_test
2020-08-18T15:07:02.662 INFO:teuthology.orchestra.run.smithi115.stderr: raise RuntimeError('time_from_string() failed')
2020-08-18T15:07:02.662 INFO:teuthology.orchestra.run.smithi115.stderr:RuntimeError: time_from_string() failed
2020-08-18T15:07:02.662 INFO:teuthology.orchestra.run.smithi115.stderr:) v1 ==== 248+0+0 (secure 0 0 0) 0x7fe27c00d410 con 0x7fe258077b50
2020-08-18T15:07:02.663 INFO:teuthology.orchestra.run.smithi115.stderr:2020-08-18T15:07:02.656+0000 7fe26e7fc700 20 mgrc handle_command_reply tid 0 r -1
2020-08-18T15:07:02.663 INFO:teuthology.orchestra.run.smithi115.stderr:Error EPERM: Test failed: Remote method threw exception: Traceback (most recent call last):
2020-08-18T15:07:02.663 INFO:teuthology.orchestra.run.smithi115.stderr: File "/usr/share/ceph/mgr/crash/module.py", line 353, in self_test
2020-08-18T15:07:02.663 INFO:teuthology.orchestra.run.smithi115.stderr: raise RuntimeError('time_from_string() failed')
2020-08-18T15:07:02.663 INFO:teuthology.orchestra.run.smithi115.stderr:RuntimeError: time_from_string() failed
2020-08-18T15:07:02.663 INFO:teuthology.orchestra.run.smithi115.stderr:
2020-08-18T15:07:02.664 INFO:tasks.ceph.mgr.x.smithi115.stderr:2020-08-18T15:07:02.656+0000 7f4ae3639700 -1 Remote method threw exception: Traceback (most recent call last):
2020-08-18T15:07:02.664 INFO:tasks.ceph.mgr.x.smithi115.stderr: File "/usr/share/ceph/mgr/crash/module.py", line 353, in self_test
2020-08-18T15:07:02.664 INFO:tasks.ceph.mgr.x.smithi115.stderr: raise RuntimeError('time_from_string() failed')
2020-08-18T15:07:02.664 INFO:tasks.ceph.mgr.x.smithi115.stderr:RuntimeError: time_from_string() failed
2020-08-18T15:07:02.665 INFO:tasks.ceph.mgr.x.smithi115.stderr:
2020-08-18T15:07:02.665 INFO:tasks.ceph.mgr.x.smithi115.stderr:2020-08-18T15:07:02.656+0000 7f4ae3639700 -1 mgr.server reply reply (1) Operation not permitted Test failed: Remote method threw exception: T
raceback (most recent call last):
2020-08-18T15:07:02.666 INFO:tasks.ceph.mgr.x.smithi115.stderr: File "/usr/share/ceph/mgr/crash/module.py", line 353, in self_test
2020-08-18T15:07:02.666 INFO:tasks.ceph.mgr.x.smithi115.stderr: raise RuntimeError('time_from_string() failed')
2020-08-18T15:07:02.668 INFO:tasks.ceph.mgr.x.smithi115.stderr:RuntimeError: time_from_string() failed
seems it also breaks test in qa/tasks/mgr/test_insights.py
,
2020-08-18T14:50:49.983 INFO:tasks.cephfs_test_runner:test_crash_history (tasks.mgr.test_insights.TestInsights) ... FAIL
2020-08-18T14:50:49.984 INFO:tasks.cephfs_test_runner:
2020-08-18T14:50:49.985 INFO:tasks.cephfs_test_runner:======================================================================
2020-08-18T14:50:49.985 INFO:tasks.cephfs_test_runner:FAIL: test_crash_history (tasks.mgr.test_insights.TestInsights)
2020-08-18T14:50:49.985 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2020-08-18T14:50:49.986 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2020-08-18T14:50:49.986 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-kefu-testing-2020-08-18-1658/qa/tasks/mgr/test_insights.py", line 197, in test_crash_history
2020-08-18T14:50:49.986 INFO:tasks.cephfs_test_runner: self._add_crash(1)
2020-08-18T14:50:49.987 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-kefu-testing-2020-08-18-1658/qa/tasks/mgr/test_insights.py", line 45, in _add_crash
2020-08-18T14:50:49.987 INFO:tasks.cephfs_test_runner: self.assertEqual(0, ret)
2020-08-18T14:50:49.987 INFO:tasks.cephfs_test_runner:AssertionError: 0 != 22
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
unstale |
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
Fixes: https://tracker.ceph.com/issues/43897
Signed-off-by: Nathan Cutler ncutler@suse.com