
os/bluestore: allow ceph-bluestore-tool to coalesce, add and migrate BlueFS backing volumes #23103

Merged
merged 5 commits into ceph:master from wip-ifed-bluefs-migrate on Oct 22, 2018

Conversation

ifed01
Contributor

@ifed01 ifed01 commented Jul 17, 2018

Options to split and/or migrate bluefs data between volumes will follow.

<< " : own 0x" << block_all[i]
<< " = 0x" << owned
<< " : using 0x" << owned - free

Member


extra blank line, weird indentation

@ifed01
Contributor Author

ifed01 commented Jul 18, 2018

retest this please

Signed-off-by: Igor Fedotov <ifedotov@suse.com>
@ifed01 ifed01 force-pushed the wip-ifed-bluefs-migrate branch 4 times, most recently from 38f6734 to 271fdc8 Compare October 10, 2018 17:19
@ifed01 ifed01 changed the title WIP: os/bluestore: allow ceph-bluestore-tool to coalesce BlueFS backing volumes WIP: os/bluestore: allow ceph-bluestore-tool to coalesce and add BlueFS backing volumes Oct 10, 2018
_dump_alloc_on_rebalance_failure();
return 0;
} else if (alloc_len < (int64_t)gift) {
dout(0) << __func__ << " insufficient allocate on 0x" << std::hex << gift
<< " min_alloc_size 0x" << min_alloc_size
<< " min_alloc_size 0x" << cct->_conf->bluefs_alloc_size
Member


Why the switch here? This seems dangerous because the config (which only affects mkfs) may disagree with the on-disk store's min_alloc_size.

Contributor Author


This is to better reflect the previous allocation failure. It uses cct->_conf->bluefs_alloc_size, not min_alloc_size. Maybe rename the caption?

Member


Oh right, I misread the option name as the bluestore option.

/*if (bdev[BDEV_WAL]) {
_write_super(BDEV_NEWDB);
flush_bdev();
} else {*/
Member


remove this dead code?

ceph-bluestore-tool.

Signed-off-by: Igor Fedotov <ifedotov@suse.com>
amount of volumes is configured.

This allows both coalescence and split for BlueFS backing volumes.

Signed-off-by: Igor Fedotov <ifedotov@suse.com>
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
@ifed01 ifed01 changed the title WIP: os/bluestore: allow ceph-bluestore-tool to coalesce and add BlueFS backing volumes os/bluestore: allow ceph-bluestore-tool to coalesce and add BlueFS backing volumes Oct 17, 2018
@ifed01 ifed01 changed the title os/bluestore: allow ceph-bluestore-tool to coalesce and add BlueFS backing volumes os/bluestore: allow ceph-bluestore-tool to coalesce, add and migrate BlueFS backing volumes Oct 17, 2018
@ifed01
Contributor Author

ifed01 commented Oct 18, 2018

@liewegas - this is ready for the final review.

@tchaikov tchaikov merged commit 4af71e7 into ceph:master Oct 22, 2018
@vitalif

vitalif commented Oct 24, 2018

I want this feature so badly (to make our test cluster move DBs back to the proper device after the 13.2.0 bug that caused them to spill to db.slow) that I even rebased your PR on mimic and tried it on one of the OSDs :)) It passed fsck, but it seems that a) it moved nothing and b) the OSD didn't start afterwards :) Maybe my rebase isn't that good :) I'll try to test it a bit more...

By the way, does your tool move SSTs from db.slow to db when migrating if only some of them are on db.slow? E.g., my 2 TB OSDs use 1.5 GB of db and 2 GB of db.slow. Will bluefs-bdev-migrate move those 2 GB to db? Or should I first migrate db -> db.slow and then db.slow -> db?

@ifed01
Contributor Author

ifed01 commented Oct 24, 2018

@vitalif - could you please share the complete command line you used for migration? And the original volume layout.
Also, please provide more details on a) and b) - log output or other facts that make you think it doesn't work.

Answering your questions - no matter how SSTs are spread over the volumes, once you specify slow as a source and db as a target, everything should work properly.
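
For instance, a sketch of the slow-to-db migration described here, following the command-line form that appears later in this thread (the OSD path is illustrative):

ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-0 --devs-source /var/lib/ceph/osd/ceph-0/block --dev-target /var/lib/ceph/osd/ceph-0/block.db --command=bluefs-bdev-migrate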
I suggest you open both a PR with your rebase and a corresponding ticket in the tracker with all the details. It will be easier to investigate this way.

@ifed01 ifed01 deleted the wip-ifed-bluefs-migrate branch October 24, 2018 14:20
@vitalif

vitalif commented Oct 26, 2018

The command was

./build/bin/ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-0 --devs-source /var/lib/ceph/osd/ceph-0/block --dev-target /var/lib/ceph/osd/ceph-0/block.db --command=bluefs-bdev-migrate

I did fsck before and after this command and it passed, but the OSD failed to start with the following messages:

2018-10-24 15:26:53.321 7fef72f421c0  1 freelist init
2018-10-24 15:26:53.341 7fef72f421c0  1 bluestore(/var/lib/ceph/osd/ceph-0) _open_alloc opening allocation metadata
2018-10-24 15:26:53.958 7fef72f421c0  1 bluestore(/var/lib/ceph/osd/ceph-0) _open_alloc loaded 597 GiB in 167765 extents
2018-10-24 15:26:53.960 7fef72f421c0 -1 bluestore(/var/lib/ceph/osd/ceph-0) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x6efd615c, expected 0xadc1fde4, device location [0x10000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
2018-10-24 15:26:53.960 7fef72f421c0 -1 osd.0 0 OSD::init() : unable to read osd superblock

After that I also tried bluefs-export and it produced a directory with some SSTs still residing in db.slow.
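
(For reference, the bluefs-export step above corresponds to an invocation roughly like the following; the output directory is illustrative and the flags are assumed, not taken from this thread:)

ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-0 --out-dir /tmp/osd-0-bluefs-export --command=bluefs-export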

I created a PR with the rebased commits as you suggested; it's here: #24784

@ifed01
Contributor Author

ifed01 commented Jun 19, 2019

@iliul - I'm looking into your issue. Meanwhile, could you please open a ticket in the tracker? Communicating here is a bit inconvenient...

@ifed01
Contributor Author

ifed01 commented Jun 19, 2019

@iliul - IMO you don't need to specify block.db in --devs-source, just block is enough.

@iliul
Contributor

iliul commented Jun 19, 2019

@iliul - IMO you don't need to specify block.db in --devs-source, just block is enough.

@ifed01 Thanks for your reply.

Yes, it's working fine after I removed it. The metadata that had overflowed onto the slow device has been moved back.

BTW, I have a question about using bluefs-bdev-new-db: it requires bluestore_block_db_create = true and bluestore_block_db_size > 0, but my cluster was initialized with the default values (false, 0). Can I modify these two parameters dynamically?

I tried to modify them with ceph daemon osd.x config set, but unfortunately it returned "function not implemented". Are these two parameters only settable when the OSD is initialized?

Then I put these two parameters into the global section of the configuration file and restarted the OSD; ceph daemon osd.x config get shows they have taken effect, but bluefs-bdev-new-db still outputs the error "DB size isn't specified, please set Ceph bluestore-block-db-size config parameter".

@ifed01
Contributor Author

ifed01 commented Jun 19, 2019

@iliul - you can specify Ceph params via an environment variable, e.g.:
CEPH_ARGS="--bluestore_block_db_size=12345 --bluestore_block_db_create=true" ceph-bluestore-tool ...
Please use direct email or the mailing list for future communication rather than cluttering the PR...
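
For completeness, a sketch of how that could look when adding a new DB volume with bluefs-bdev-new-db, combining the CEPH_ARGS hint above with the --path/--dev-target syntax used earlier in this thread (the device path and size are illustrative assumptions, not taken from this PR):

CEPH_ARGS="--bluestore_block_db_create=true --bluestore_block_db_size=21474836480" ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-0 --dev-target /dev/sdb1 --command=bluefs-bdev-new-db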
