os/bluestore: allow ceph-bluestore-tool to coalesce, add and migrate BlueFS backing volumes #23103
src/os/bluestore/BlueFS.cc
Outdated
<< " : own 0x" << block_all[i]
<< " = 0x" << owned
<< " : using 0x" << owned - free

extra blank line, weird indentation
88de445
to
dade37a
Compare
retest this please
dade37a
to
17599fa
Compare
17599fa
to
1bd27c1
Compare
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
38f6734
to
271fdc8
Compare
271fdc8
to
3cd5254
Compare
src/os/bluestore/BlueStore.cc
Outdated
_dump_alloc_on_rebalance_failure();
return 0;
} else if (alloc_len < (int64_t)gift) {
dout(0) << __func__ << " insufficient allocate on 0x" << std::hex << gift
<< " min_alloc_size 0x" << min_alloc_size
<< " min_alloc_size 0x" << cct->_conf->bluefs_alloc_size |
Why the switch here? This seems dangerous because the config (which only affects mkfs) may disagree with the on-disk store's min_alloc_size.
This is to better reflect the preceding allocation failure: that allocation uses cct->_conf->bluefs_alloc_size, not min_alloc_size. Maybe rename the caption?
Oh right, I misread the option name as the bluestore option.
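The caption rename suggested in this thread could look roughly like the sketch below. It is only an illustration, not the code that was merged: `insufficient_alloc_msg` is a hypothetical standalone helper, and a plain `std::ostringstream` stands in for Ceph's `dout` logging. The point is that the label printed in the message matches the option whose value is printed, so a reader of the log does not mistake `bluefs_alloc_size` for the store's `min_alloc_size`.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper (not part of BlueStore): formats the
// "insufficient allocate" message with a caption that names the
// option actually being printed.
std::string insufficient_alloc_msg(const char* func,
                                   uint64_t gift,
                                   uint64_t bluefs_alloc_size) {
  std::ostringstream os;
  os << func << " insufficient allocate on 0x" << std::hex << gift
     // Caption says bluefs_alloc_size because that is the value shown,
     // rather than reusing the "min_alloc_size" label.
     << " bluefs_alloc_size 0x" << bluefs_alloc_size << std::dec;
  return os.str();
}
```

Calling `insufficient_alloc_msg("_balance_bluefs_freespace", 0x100000, 0x10000)` (the function name passed here is just an example string) yields a single line whose second field is labelled `bluefs_alloc_size`.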
src/os/bluestore/BlueFS.cc
Outdated
/*if (bdev[BDEV_WAL]) {
_write_super(BDEV_NEWDB);
flush_bdev();
} else {*/
remove this dead code?
3cd5254
to
81065e9
Compare
4b64762
to
b4639eb
Compare
ceph-bluestore-tool. Signed-off-by: Igor Fedotov <ifedotov@suse.com>
amount of volumes is configured. This allows both coalescence and split for BlueFS backing volumes. Signed-off-by: Igor Fedotov <ifedotov@suse.com>
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
b4639eb
to
91c73df
Compare
@liewegas - this is ready for the final review.
I want this feature so badly, to get our test cluster to move DBs to the correct device after the 13.2.0 bug that caused them to spill to db.slow, that I even rebased your PR on mimic and tried it on one of the OSDs :)) It passed fsck, but it seems a) it moved nothing and b) the OSD didn't start after that :) Maybe my rebase isn't that good :) I'll try to test it a bit more... By the way, does your tool move SSTs from db.slow to db when migrating if only some of them are on db.slow? I.e. my 2 TB OSDs use 1.5 GB of db and 2 GB of db.slow. Will bluefs-bdev-migrate move these 2 GB to db? Or should I first migrate db -> db.slow and then db.slow -> db?
@vitalif - could you please share the complete command line you used for migration, and the original volume layout? To answer your questions: no matter how the SSTs are spread over the volumes, once you specify slow as the source and db as the target, everything should work properly.
The command was
I did
After that I also tried
I created a PR with rebased commits as you've suggested, it's here: #24784
@iliul - I'm looking into your issue. Meanwhile, could you please open a ticket in the tracker? Communicating here is a bit inconvenient...
@iliul - IMO you don't need to specify block.db in --devs-source, just block is enough.
@ifed01 Thanks for your reply. Yes, it's working fine after I removed it. The metadata that had overflowed onto the slow device has been moved back. BTW, I have a doubt about using
I tried to modify it with
Then I put these two parameters into the global section of the configuration file, restarted the OSD, and
@iliul - you can specify Ceph params in an environment variable, e.g.:
Options to split and/or migrate bluefs data between volumes will follow.