-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
os/bluestore/BlueFS: support write time hint. #21511
os/bluestore/BlueFS: support write time hint. #21511
Conversation
@liewegas . please review. |
@liewegas . ping |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
b5bd7cd
to
12f6c18
Compare
@liewegas . Have you time to review this? Thanks! |
src/os/bluestore/BlueRocksEnv.cc
Outdated
@@ -227,6 +227,13 @@ class BlueRocksWritableFile : public rocksdb::WritableFile { | |||
return false; | |||
} | |||
|
|||
void SetWriteLifeTimeHint(rocksdb::Env::WriteLifeTimeHint hint) override { | |||
if (hint == h->write_hint) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this check?
src/os/bluestore/KernelDevice.cc
Outdated
@@ -47,6 +45,12 @@ KernelDevice::KernelDevice(CephContext* cct, aio_callback_t cb, void *cbpriv, ai | |||
discard_thread(this), | |||
injecting_crash(0) | |||
{ | |||
fd_directs.resize(WRITE_LIFE_MAX); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use vector::resize(size, val) for to initialize here
src/os/bluestore/KernelDevice.cc
Outdated
} | ||
|
||
for (i = 0; i < WRITE_LIFE_MAX; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably start from WRITE_LIFE_NONE
} | ||
|
||
for (i = 0; i < WRITE_LIFE_MAX; i++) { | ||
if (fcntl(fd_directs[i], F_SET_FILE_RW_HINT, &i) < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this fail on old kernels that we care about? Should we tolerate EOPNOTSUPP or similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If failed on old kernel, we disable enable_wrt and later in choose_fd, we only use fd_directs/fd_buffered[WRITE_LIFE_NOT_SET].
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this!
12f6c18
to
5a1d2f1
Compare
Rocksdb already supported this feature for posix fs backend. Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
* refs/pull/21511/head: os/bluefs: set logfile w/ WRITE_LIFE_MEDIUM os/bluefs: make super block w/ WRITE_LIFE_TIME_SHORT. os/bluestore: BlueFS support write_life_time feature of SSD. Reviewed-by: Sage Weil <sage@redhat.com> Reviewed-by: Igor Fedotov <ifedotov@suse.com>
Now rocksdb support this feature in io_posix.cc. This feature make SSD better understand the data life and based on this to do prefer allocate and do correctly GC.
In kernel, this feature based on file or inode. For BlueFs, it use the same raw device for multi files. So i open multi rad-write for different write-hint. Then for write based on hint to choose fd.