
nautilus: bluestore: 50-100% iops lost due to bluefs_preextend_wal_files = false #28573

Merged
merged 1 commit into ceph:nautilus from smithfarm:wip-40281-nautilus Aug 2, 2019

Conversation

@smithfarm
Contributor

commented Jun 15, 2019

osd/bluestore: Actually wait until completion in write_sync
This function is only used for RocksDB WAL writing, so it must sync data.

This fixes #18338 and thus allows actually setting `bluefs_preextend_wal_files`
to true, gaining +100% single-thread write IOPS in disk-bound (HDD or slow SSD) setups.
To my knowledge it doesn't hurt performance in other cases.
Test it yourself on any HDD with `fio -ioengine=rbd -direct=1 -bs=4k -iodepth=1`.

Issue #18338 is easily reproduced without this patch by issuing a `kill -9` to the OSD
while doing `fio -ioengine=rbd -direct=1 -bs=4M -iodepth=16`.

Fixes: https://tracker.ceph.com/issues/18338 https://tracker.ceph.com/issues/38559
Signed-off-by: Vitaliy Filippov <vitalif@yourcmc.ru>
(cherry picked from commit c703cf9)
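A note on the fio commands above: the rbd ioengine also needs a client name, a pool and an image to run against. Below is a minimal sketch of the single-thread test; `admin`, `rbd` and `bench` are placeholder names, not anything from this PR, so substitute your own.

```
# 4k single-threaded random-write test (the "single-thread write iops" case above).
# clientname/pool/rbdname are placeholders for an existing cephx user, pool and RBD image.
fio -name=preextend-test -ioengine=rbd -clientname=admin -pool=rbd -rbdname=bench \
    -direct=1 -rw=randwrite -bs=4k -iodepth=1 -numjobs=1 -runtime=60 -time_based
```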

@smithfarm smithfarm self-assigned this Jun 15, 2019

@smithfarm smithfarm added this to the nautilus milestone Jun 15, 2019

@smithfarm smithfarm changed the title from "nautilus: 50-100% iops lost due to bluefs_preextend_wal_files = false" to "nautilus: bluestore: 50-100% iops lost due to bluefs_preextend_wal_files = false" Jun 15, 2019

@smithfarm smithfarm added bluestore and removed core labels Jun 15, 2019

@smithfarm smithfarm requested review from liewegas and ifed01 Jun 15, 2019

@ifed01
ifed01 approved these changes Jun 17, 2019

@yuriw yuriw merged commit 94e8a40 into ceph:nautilus Aug 2, 2019

4 checks passed

Docs: build check OK - docs built
Signed-off-by: all commits in this PR are signed
Unmodified Submodules: submodules for project are unmodified
make check: make check succeeded

@smithfarm smithfarm deleted the smithfarm:wip-40281-nautilus branch Aug 21, 2019

@umuzhaohui


commented Sep 18, 2019

Random IOPS increase by +50..+100%? Really?
Data, DB and WAL are all on HDD.
Ceph version: 13.2.1 (Mimic)
fio config:
[global]
direct=1
thread
refill_buffers
norandommap
randrepeat=0
numjobs=1
ioengine=rbd
clientname=admin
pool=rbd
rbdname=image0
invalidate=0   
rw=randwrite
bs=4k
size=100G
runtime=300
ramp_time=60
[rbd_iodepth32]
iodepth=1

"bluefs_preextend_wal_files" option is true:
IOPS=391, BW=1565KiB/s, clat=2521.76us

"bluefs_preextend_wal_files" option is false:
IOPS=368, BW=1474KiB/s (1510kB/s), clat=2679.74us
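For context, the difference measured here works out to (391 - 368) / 368 ≈ 6%, well short of +50..100%; the replies below explain why (this fix is not in 13.2.1, and the effect mainly shows up on a freshly created bluestore instance).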

@smithfarm

Contributor Author

commented Sep 18, 2019

@umuzhaohui

> ceph version: 13.2.1 mimic

This fix was released in Nautilus 14.2.3 - it's definitely not in Mimic 13.2.1. That version is very old.
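As an aside (not part of this PR): on a release that does contain the fix, one way to check and flip the option is sketched below; `osd.0` is just an example daemon id.

```
# Check the value currently in effect on a running OSD (admin socket):
ceph daemon osd.0 config get bluefs_preextend_wal_files

# Enable it via the cluster configuration database; only do this on
# releases that include this fix (e.g. Nautilus 14.2.3 or later).
ceph config set osd bluefs_preextend_wal_files true
```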

@liewegas

Member

commented Sep 18, 2019

This performance difference only happens for the first bit of data written after a brand-new bluestore instance is created. It mostly only matters for performance testing on test clusters, and for users who run a performance test as the very first thing on an empty cluster. Once rocksdb has written a few WAL files, the performance is the same.
