New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os/bluestore: use aio for reads #13066

Merged
merged 6 commits into from Jan 31, 2017

Conversation

Projects
None yet
6 participants
@liewegas
Copy link
Member

liewegas commented Jan 23, 2017

http://tracker.ceph.com/issues/19030

Dispatch all raw blob reads via aio, in parallel. This should accellerate
reads on fragmented objects (incl compressed objects) significantly.

@liewegas

This comment has been minimized.

Copy link
Member Author

liewegas commented Jan 23, 2017

[==========] 67 tests from 1 test case ran. (3558968 ms total)
[ PASSED ] 67 tests.

@liewegas liewegas requested a review from ifed01 Jan 23, 2017

@liewegas

This comment has been minimized.

Copy link
Member Author

liewegas commented Jan 23, 2017

@yuyuyu101 mind reviewing/testing the NVMe aio_read piece?

@liewegas liewegas requested a review from yuyuyu101 Jan 23, 2017

#ifdef HAVE_LIBAIO
if (aio && dio) {
_aio_log_start(ioc, off, len);
ioc->pending_aios.push_back(FS::aio_t(ioc, fd_direct));

This comment has been minimized.

@ifed01

ifed01 Jan 23, 2017

Contributor

You might want to use emplace_back instead.

This comment has been minimized.

@liewegas

liewegas Jan 24, 2017

Author Member

not yet, c++17

This comment has been minimized.

@varadakari

varadakari Jan 30, 2017

Contributor

@liewegas emplace_back() is C++11.

_aio_log_start(ioc, off, len);
ioc->pending_aios.push_back(FS::aio_t(ioc, fd_direct));
++ioc->num_pending;
FS::aio_t& aio = ioc->pending_aios.back();

This comment has been minimized.

@ifed01

ifed01 Jan 23, 2017

Contributor

emplace_back returns a referent to the new item hence no need for back() call

This comment has been minimized.

@Liuchang0812

Liuchang0812 Jan 30, 2017

Contributor

emplace_back returns void since cpp11 until cpp17, and returns referent since cpp17. FYI cppreference

int NVMEDevice::aio_read(
uint64_t off,
uint64_t len,
bufferlist *pbl,

This comment has been minimized.

@ifed01

ifed01 Jan 23, 2017

Contributor

Perhaps I missed something but I don't see any use of pbl below...

@liewegas liewegas force-pushed the liewegas:wip-aio-read branch from 64a891a to 8d9c229 Jan 23, 2017

@liewegas

This comment has been minimized.

Copy link
Member Author

liewegas commented Jan 23, 2017

@liewegas liewegas force-pushed the liewegas:wip-aio-read branch from 546ef13 to e8c755b Jan 23, 2017

assert(off < size);
assert(off + len <= size);

Task *t = new Task(this, IOCommand::WRITE_READ, off, len);

This comment has been minimized.

@yuyuyu101

yuyuyu101 Jan 24, 2017

Member

READ_COMMAND

IOContext ioc(cct, NULL); // FIXME?
// read the whole thing
compressed_blob_bls.push_back(bufferlist());
bufferlist& bl = compressed_blob_bls.back();

This comment has been minimized.

@ifed01

ifed01 Jan 24, 2017

Contributor

IMO this is a bad approach since next push_back might reallocate vector and hence invalidate bufferlist references. With the current aio_read implementations that probably not a bug though.
But a good candidate for that if someone decides to store that reference in a future aio_read implementation....
Multiple allocations might happen and negatively impact performance too.
Suggest to call vector::resize using blobs2read::size() before entering the loop and handle holes in this vector properly.
Or update blobs2read_t container and include per-entry bufferlist member there.

BlobRef bptr = b2r_it->first;
dout(20) << __func__ << " blob " << *bptr << std::hex
<< " need 0x" << b2r_it->second << std::dec << dendl;
if (bptr->get_blob().has_flag(bluestore_blob_t::FLAG_COMPRESSED)) {

This comment has been minimized.

@ifed01

ifed01 Jan 24, 2017

Contributor

suggest to use bluestore_blob_t::is_compressed() call here and throughout.

@liewegas liewegas force-pushed the liewegas:wip-aio-read branch from e8c755b to 891412d Jan 24, 2017

@@ -5316,7 +5318,12 @@ int BlueStore::_do_read(
r = bptr->get_blob().map(
0, bptr->get_blob().get_ondisk_length(),
[&](uint64_t offset, uint64_t length) {
int r = bdev->aio_read(offset, length, &bl, &ioc);
int r;
if (num_blobs > p.second.size()) {

This comment has been minimized.

@ifed01

ifed01 Jan 24, 2017

Contributor

What's the rationale for this condition?

@liewegas liewegas force-pushed the liewegas:wip-aio-read branch 2 times, most recently from 31c4b55 to 333a63f Jan 24, 2017

@ifed01

ifed01 approved these changes Jan 24, 2017

liewegas added some commits Jan 23, 2017

os/bluestore/BlockDevice: add aio_read API
NVMEDevice not implemented yet.

Signed-off-by: Sage Weil <sage@redhat.com>
os/bluestore/NVMEDevice: implement aio_read
Signed-off-by: Sage Weil <sage@redhat.com>
os/bluestore/KernelDevice: avoid possible use-after-free for ioc->priv
If aio_wake() triggers destruction, ioc->priv might be a
use-after-free (this is the case for the BlueStore read path).

Signed-off-by: Sage Weil <sage@redhat.com>
os/bluestore: do read io via aio in parallel
Dispatch all blob reads in parallel via aio.

Signed-off-by: Sage Weil <sage@redhat.com>
os/bluestore: only use aio for read if there are >1 blobs
If we have a single blob to read it is not worth the context switch.

Signed-off-by: Sage Weil <sage@redhat.com>
os/bluestore/BlockDevice: allow sync read to accumulate on bl
Do not clear the target bl; instead, append to it.  This makes our
behavior consistent with aio_read, which does the same.

Signed-off-by: Sage Weil <sage@redhat.com>

@liewegas liewegas force-pushed the liewegas:wip-aio-read branch from 333a63f to 73f5d2b Jan 27, 2017

@liewegas

This comment has been minimized.

Copy link
Member Author

liewegas commented Jan 30, 2017

@liewegas liewegas merged commit 67e48e2 into ceph:master Jan 31, 2017

3 checks passed

Signed-off-by all commits in this PR are signed
Details
Unmodifed Submodules submodules for project are unmodified
Details
default Build finished.
Details

@liewegas liewegas deleted the liewegas:wip-aio-read branch Jan 31, 2017

bassam added a commit to bassam/ceph that referenced this pull request Jan 31, 2017

os/bluestore: use aio for reads
backport of ceph#13066 for kraken

Signed-off-by: Bassam Tabbara <bassam.tabbara@quantum.com>

bassam added a commit to rook/ceph that referenced this pull request Jan 31, 2017

os/bluestore: use aio for reads
backport of ceph#13066 for kraken

Signed-off-by: Bassam Tabbara <bassam.tabbara@quantum.com>
@liuhongtong

This comment has been minimized.

Copy link

liuhongtong commented on dbaa2b0 Feb 7, 2017

@liewegas compile error:
/home/hongtong.liu/ceph_src/ceph/src/os/bluestore/NVMEDevice.cc: In member function ‘virtual int NVMEDevice::aio_read(uint64_t, uint64_t, ceph::bufferlist*, IOContext*)’:
/home/hongtong.liu/ceph_src/ceph/src/os/bluestore/NVMEDevice.cc:950:12: error: declaration of ‘uint64_t len’ shadows a parameter
uint64_t len = bl.length();
^
/home/hongtong.liu/ceph_src/ceph/src/os/bluestore/NVMEDevice.cc:950:18: error: ‘bl’ was not declared in this scope
uint64_t len = bl.length();
^
/home/hongtong.liu/ceph_src/ceph/src/os/bluestore/NVMEDevice.cc:962:7: warning: unused variable ‘r’ [-Wunused-variable]
int r = 0;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment