OSD::Modify OSD Fast-Shutdown to work safely i.e. quiesce all activit… #44913
Looking pretty good; the teuthology piece could be a separate PR.
There are 2 tests showing corruption in the allocation file. I need to understand what those tests did and why we ended up with a corrupted allocation file, but for now we should stop the merge :-(
It seems that the problem is an old race condition in the NCB code, unrelated to safe fast-shutdown. It might be a race in the way we free up space on BlueFS during compaction.
extra changes look good
@benhanokh please see this failure https://pulpito.ceph.com/nojha-2022-02-23_17:58:44-rados-GBH_safe_shutdown_v2_basecode_sanity_check_disabled_2-distro-basic-smithi/6702922/
quiesce all activities and destage allocations to disk before killing the OSD
1) keep the old (unsafe) fast-shutdown when we are not using NCB (non null-manager())
2) skip service.prepare_to_stop(), which can take as long as 10 seconds
3) skip debug options in fast-shutdown
4) set_state(STATE_STOPPING), which stops accepting new tasks to this OSD
5) clear op_shardedwq queues; this is safe since we haven't started processing them
6) stop timer
7) drain osd_op_tp (no new items will be added)
8) now we can safely call umount, which will close_db/bluefs and destage allocations to disk
9) skip _shutdown_cache() when we are in the middle of a fast-shutdown
10) increase debug level on fast-shutdown
11) add option for bluestore_qfsck_on_mount to force scan on mount for all tests
12) disable fsck-on-umount when running fast-shutdown
13) add an option to increase debug level at fast-shutdown umount()
14) set a time limit for fast-shutdown
15) Bug fix: BlueStore::pool_statfs must not access the DB after it was removed
16) Fix error message for qfsck (error was caused by PR ceph#44563)
17) make shutdown-timeout configurable

Fixes: https://tracker.ceph.com/issues/53266
Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>
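The ordering in steps 4 to 8 (refuse new work, drop queued-but-unstarted ops, drain in-flight ops, and only then unmount so allocations are destaged) can be illustrated with a small, self-contained model. This is a hedged sketch, not the OSD code: `ToyOsd`, `submit`, `fast_shutdown` and `allocations_destaged` are hypothetical names, a `std::deque` stands in for `op_shardedwq`, a few `std::thread` workers stand in for `osd_op_tp`, and the shutdown time limit is modeled as a deadline on the drain.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical model of the fast-shutdown ordering; none of these names are Ceph APIs.
class ToyOsd {
 public:
  explicit ToyOsd(int workers) {
    for (int i = 0; i < workers; ++i) pool_.emplace_back([this] { worker_loop(); });
  }
  ~ToyOsd() {
    {
      std::lock_guard<std::mutex> l(mu_);
      quit_ = true;
    }
    work_cv_.notify_all();
    for (auto& t : pool_) t.join();
  }

  // Like STATE_STOPPING: once stopping, new ops are refused.
  bool submit(std::function<void()> op) {
    std::lock_guard<std::mutex> l(mu_);
    if (stopping_) return false;
    queue_.push_back(std::move(op));
    work_cv_.notify_one();
    return true;
  }

  // Returns true if in-flight ops drained before the deadline and the
  // allocation state was "destaged"; false if the time limit was hit.
  bool fast_shutdown(std::chrono::milliseconds limit) {
    const auto deadline = std::chrono::steady_clock::now() + limit;
    std::unique_lock<std::mutex> l(mu_);
    stopping_ = true;  // stop accepting new ops
    queue_.clear();    // queued ops never started, so dropping them is safe
    // Drain: wait only for ops already running; nothing new can start.
    if (!idle_cv_.wait_until(l, deadline, [this] { return in_flight_ == 0; }))
      return false;
    // Stands in for umount(): persist the allocator state while quiesced.
    assert(in_flight_ == 0);
    destaged_ = true;
    return true;
  }

  bool allocations_destaged() const { return destaged_.load(); }

 private:
  void worker_loop() {
    std::unique_lock<std::mutex> l(mu_);
    for (;;) {
      work_cv_.wait(l, [this] { return quit_ || !queue_.empty(); });
      if (quit_) return;
      std::function<void()> op = std::move(queue_.front());
      queue_.pop_front();
      ++in_flight_;  // counted under the lock, so the drain sees it
      l.unlock();
      op();  // run the op without holding the lock
      l.lock();
      if (--in_flight_ == 0) idle_cv_.notify_all();
    }
  }

  std::mutex mu_;
  std::condition_variable work_cv_, idle_cv_;
  std::deque<std::function<void()>> queue_;
  std::vector<std::thread> pool_;
  int in_flight_ = 0;
  bool stopping_ = false;
  bool quit_ = false;
  std::atomic<bool> destaged_{false};
};
```

Clearing the queue before draining is what keeps the drain bounded in this model: only ops that had already started must finish, so the deadline only has to cover the longest in-flight op.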
Fixes a problem with sync compaction (_rewrite_log_and_layout_sync).
log_seq was not updated after compacting the log, which caused _replay to stop right after the first transaction:
... 20 bluefs _replay 0x0: op_dir_create sharding
... 20 bluefs _replay 0x0: op_dir_link sharding/def to 21
... 20 bluefs _replay 0x0: op_jump_seq 1025
... 10 bluefs _read h 0x555557c46400 0x1000~1000 from file(ino 1 size 0x1000 mtime 0.000000 allocated 410000 alloc_commit 410000 extents [1:0x1540000~410000])
... 20 bluefs _read left 0xff000 len 0x1000
... 20 bluefs _read got 4096
... 10 bluefs _replay 0x1000: stop: seq 1025 != expected 1026
This is a product of the BlueFS fine-grained locks refactor.
Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
(cherry picked from commit 2f8e370)
Conflicts: src/test/objectstore/test_bluefs.cc
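The failure in the log above (the compacted log ends with `op_jump_seq 1025`, but the next transaction is stamped 1025 instead of 1026) can be reproduced in a toy model. This is a hedged sketch under simplifying assumptions, not BlueFS code: `ToyLog`, `LogRecord` and `replay` are hypothetical, the compacted transaction is assumed to carry seq 1, and compaction is assumed to jump the sequence to `log_seq + 1`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model of a sequence-numbered log; not BlueFS code.
struct LogRecord {
  uint64_t seq;       // sequence number stamped on the transaction
  uint64_t jump_seq;  // nonzero if the transaction carries an op_jump_seq
};

// Replays records in order and stops at the first unexpected seq, like
// "stop: seq 1025 != expected 1026". Returns the number of transactions applied.
std::size_t replay(const std::vector<LogRecord>& log) {
  uint64_t expected = 1;  // assumption: a compacted log starts at seq 1
  std::size_t applied = 0;
  for (const LogRecord& rec : log) {
    if (rec.seq != expected) break;  // treated as the end of the valid log
    ++applied;
    expected = rec.jump_seq != 0 ? rec.jump_seq + 1 : expected + 1;
  }
  return applied;
}

struct ToyLog {
  uint64_t log_seq = 0;  // seq of the last transaction written
  std::vector<LogRecord> records;

  void append() {
    ++log_seq;
    records.push_back({log_seq, 0});
  }

  // Rewrites the log as one transaction that jumps the sequence forward.
  // With update_log_seq == false this mirrors the bug: the writer keeps its
  // old log_seq, so the next append reuses the jumped-to number.
  void compact(bool update_log_seq) {
    const uint64_t jump_to = log_seq + 1;  // assumption for the toy model
    records.clear();
    records.push_back({1, jump_to});
    if (update_log_seq) log_seq = jump_to;
  }
};
```

With 1024 appends before compaction, the buggy path writes the next transaction as 1025 while replay expects 1026, so only the compacted transaction survives replay; advancing `log_seq` to the jump target keeps every later transaction replayable.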
Close the window in which the allocator state and the BlueFS state could be captured while out of sync.
Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>
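The window described in that commit is a torn-read problem: if the allocator state and the BlueFS state are read at different moments, a space transfer that lands between the two reads yields a snapshot in which that space is counted twice or not at all. One common way to close such a window, shown here as a hedged sketch rather than the actual change, is to update and capture both sides under a single lock. `ToySpace`, `give_to_bluefs`, `reclaim_from_bluefs`, `capture` and `captures_stay_consistent` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical model: free space in the main allocator plus space owned by
// BlueFS always adds up to the device size. Not Ceph code.
class ToySpace {
 public:
  struct Snapshot {
    uint64_t alloc_free;
    uint64_t bluefs_owned;
  };

  explicit ToySpace(uint64_t total) : alloc_free_(total) {}

  // Moves space from the allocator to BlueFS; both sides change under one lock.
  bool give_to_bluefs(uint64_t bytes) {
    std::lock_guard<std::mutex> l(mu_);
    if (bytes > alloc_free_) return false;
    alloc_free_ -= bytes;
    bluefs_owned_ += bytes;
    return true;
  }

  // Returns space from BlueFS to the allocator (e.g. after BlueFS compaction).
  bool reclaim_from_bluefs(uint64_t bytes) {
    std::lock_guard<std::mutex> l(mu_);
    if (bytes > bluefs_owned_) return false;
    bluefs_owned_ -= bytes;
    alloc_free_ += bytes;
    return true;
  }

  // Both values are read under the same lock, so the snapshot describes a
  // single instant; no transfer can land between the two reads.
  Snapshot capture() const {
    std::lock_guard<std::mutex> l(mu_);
    return {alloc_free_, bluefs_owned_};
  }

 private:
  mutable std::mutex mu_;
  uint64_t alloc_free_;
  uint64_t bluefs_owned_ = 0;
};

// Races a mover thread against repeated captures and reports whether every
// snapshot still satisfied alloc_free + bluefs_owned == total.
bool captures_stay_consistent(uint64_t total, int rounds) {
  ToySpace space(total);
  std::thread mover([&space, rounds] {
    for (int i = 0; i < rounds; ++i) {
      space.give_to_bluefs(7);
      space.reclaim_from_bluefs(7);
    }
  });
  bool ok = true;
  for (int i = 0; i < rounds; ++i) {
    const ToySpace::Snapshot s = space.capture();
    ok = ok && (s.alloc_free + s.bluefs_owned == total);
  }
  mover.join();
  return ok;
}
```

If `capture()` instead took the lock twice, once per field, the mover could run between the two reads and the sum check in `captures_stay_consistent` could fail.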
Looks good.
_close_db_and_around();
if (cct->_conf->bluestore_fsck_on_umount) {
// disable fsck on fast-shutdown
"skip" is better word then "disable" here
@@ -4448,6 +4504,9 @@ int OSD::shutdown()
hb_front_server_messenger->shutdown();
hb_back_server_messenger->shutdown();

utime_t duration = ceph_clock_now() - start_time_func;
dout(0) <<"Slow Shutdown duration:" << duration << " seconds" << dendl;
Proposal: How about calling it "Full" or "Orderly" instead of "Slow"?
// vstart overwrites osd_fast_shutdown value in the conf file -> force the value here!
//cct->_conf->osd_fast_shutdown = true;

dout(0) << "Fast Shutdown: - cct->_conf->osd_fast_shutdown = "
Need to improve output, show something like:
"Shutdown: Fast, null-fm=true"
@@ -4258,27 +4258,44 @@ PerfCounters* OSD::create_recoverystate_perf()

int OSD::shutdown()
{
// vstart overwrites osd_fast_shutdown value in the conf file -> force the value here!
Suspicious comments. Is there something missing in this PR?
If this is for testing purposes, then it should be a separate commit.
set_state(STATE_STOPPING);

// Debugging
if (cct->_conf.get_val<bool>("osd_debug_shutdown")) {
// Disabled debugging during fast-shutdown
We do not disable any debugging here...
Failures, unrelated.
jenkins test windows
PR was backported to quincy: #45342
OSD::Modify OSD Fast-Shutdown to work safely i.e. quiesce all activities and destage allocations to disk before killing the OSD
Fixes: https://tracker.ceph.com/issues/53266
Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>