New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
os/bluestore: use direct write in BlueStore::_write_bdev_label #48092
Conversation
On AArch64 with kernel page size 64K, it occurs occasionally "OSD::init(): unable to read osd superblock" when deploying osd. As bluestore use direct write to write the superblock at 0x2000~1000 and BlueStore::_write_bdev_label use buffer write to write label at 0x0~1000, The OS flush the buffer write algined to page size, it will overwrite the superblock(0x2000~1000). Use driect write to avoid overwriting the superblock. Fixes: https://tracker.ceph.com/issues/57537 Signed-off-by: luo rixin <luorixin@huawei.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
@@ -5375,13 +5375,14 @@ int BlueStore::_write_bdev_label(CephContext *cct, | |||
z.zero(); | |||
bl.append(std::move(z)); | |||
|
|||
int fd = TEMP_FAILURE_RETRY(::open(path.c_str(), O_WRONLY|O_CLOEXEC)); | |||
int fd = TEMP_FAILURE_RETRY(::open(path.c_str(), O_WRONLY|O_CLOEXEC|O_DIRECT)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From https://man7.org/linux/man-pages/man2/open.2.html this flag "Try to minimize cache effects of the I/O ... but does not give the guarantees of the O_SYNC flag that data and necessary metadata are transferred"
Given that comment wouldn't be better to use O_SYNC flag and/or call fsync before closing the file descriptor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit confused. Do you advise that we should use O_SYNC here? In this func Line 5392, it does use fsync() to flush the data to device.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah.. my bad - there is fsync indeed. O_DIRECT + fsync should be sufficient then...
Rados suite review: http://pulpito.front.sepia.ceph.com/?branch=wip-yuri3-testing-2022-09-21-0921 Failures, unrealted: Details: |
On AArch64 with kernel page size 64K, it occurs occasionally "OSD::init(): unable to read osd superblock" when deploying osd. As bluestore use direct write to write the superblock at 0x2000~1000 and BlueStore::_write_bdev_label use buffer write to write label at 0x0~1000, The OS flush the buffer write algined to page size, it will overwrite the superblock(0x2000~1000). Use driect write to avoid overwriting the superblock.
Fixes: https://tracker.ceph.com/issues/57537
Signed-off-by: luo rixin luorixin@huawei.com
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "pacific"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox
jenkins test windows