
librbd: scatter/gather support for the C API #13447

Merged
trociny merged 3 commits into ceph:master from dillaman:wip-13025 on Feb 20, 2017

Conversation

@dillaman
Contributor

dillaman commented Feb 15, 2017

No description provided.

@dillaman


Contributor

dillaman commented Feb 16, 2017

retest this please

@trociny trociny self-assigned this Feb 16, 2017

```diff
-#define LIBRBD_VER_MINOR 1
-#define LIBRBD_VER_EXTRA 11
+#define LIBRBD_VER_MAJOR 1
+#define LIBRBD_VER_MINOR 12
```


@trociny

trociny Feb 19, 2017

Contributor

@dillaman Just wondering why the minor version is not reset when the major version is increased? And what was the reason to bump the major version in the first place? I thought that is done only when an API incompatibility is introduced, which does not appear to be the case here?


@trociny

trociny Feb 19, 2017

Contributor

Ah, I suppose there is no actual major version bump: LIBRBD_VER_MAJOR 0 was just a bug, and now it matches the librbd SOVERSION. BTW, shouldn't the library VERSION (currently 1.0.0) match major.minor.extra?

@trociny


Contributor

trociny commented Feb 19, 2017

@dillaman

```
beton:~/ceph.ci/build% ./bin/ceph_test_librbd --gtest_repeat=10 --gtest_filter=TestLibRBD.TestScatterGatherIO

Repeating all tests (iteration 1) . . .

Note: Google Test filter = TestLibRBD.TestScatterGatherIO
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from TestLibRBD
seed 1931
[ RUN      ] TestLibRBD.TestScatterGatherIO
using old format
[       OK ] TestLibRBD.TestScatterGatherIO (3102 ms)
/home/mgolub/ceph.ci/src/log/SubsystemMap.h: In function 'bool ceph::logging::SubsystemMap::should_gather(unsigned int, int)' thread 7f1aa4ff9700 time 2017-02-19 10:12:56.878671
/home/mgolub/ceph.ci/src/log/SubsystemMap.h: 62: FAILED assert(sub < m_subsys.size())
 ceph version 12.0.0-537-gc1fe62f (c1fe62f57ac0a974470d698a3eb71a685e495122)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7f1b404bc582]
 2: (ObjectCacher::flusher_entry()+0xef2) [0x55774206ce82]
 3: (ObjectCacher::FlusherThread::entry()+0xd) [0x55774207976d]
 4: (()+0x76ba) [0x7f1b48f4b6ba]
 5: (clone()+0x6d) [0x7f1b3f4b682d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
zsh: abort (core dumped)  ./bin/ceph_test_librbd --gtest_repeat=10
```
@dillaman


Contributor

dillaman commented Feb 20, 2017

@trociny unit test fixed by closing the image properly, and the shared library version updated for consistency

```diff
@@ -204,7 +205,7 @@ void CopyupRequest::send()
            << ", oid " << m_oid
            << ", extents " << m_image_extents
            << dendl;
-  AioImageRequest<>::aio_read(m_ictx->parent, comp, std::move(m_image_extents),
+  ImageRequest<>::aio_read(m_ictx->parent, comp, std::move(m_image_extents),
                               nullptr, &m_copyup_data, 0);
```


@trociny

trociny Feb 20, 2017

Contributor

indentation

```cpp
                                  uint64_t object_no, uint64_t object_off,
                                  const ::SnapContext &snapc,
                                  Context *completion) {
  return new ObjectTruncateRequest(util::get_image_ctx(ictx), oid, object_no,
                                 object_off, snapc, completion);
```


@trociny

trociny Feb 20, 2017

Contributor

indentation

```cpp
                               const ceph::bufferlist &data,
                               const ::SnapContext &snapc,
                               Context *completion, int op_flags) {
  return new ObjectWriteRequest(util::get_image_ctx(ictx), oid, object_no,
                              object_off, data, snapc, completion, op_flags);
```


@trociny

trociny Feb 20, 2017

Contributor

indentation

```cpp
                              uint64_t object_len,
                              const ::SnapContext &snapc,
                              Context *completion) {
  return new ObjectZeroRequest(util::get_image_ctx(ictx), oid, object_no,
                             object_off, object_len, snapc, completion);
```


@trociny

trociny Feb 20, 2017

Contributor

indentation

dillaman added some commits Feb 15, 2017

librbd: move IO-related classes to new io namespace
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
librbd: new rbd_aio_readv/rbd_aio_writev API methods
The RBD C API now has scatter/gather IO support via the API. The C++
API uses bufferlists which can also support scatter/gather IO.

Fixes: http://tracker.ceph.com/issues/13025
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
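The gather half of the new C API can be illustrated without a cluster at hand. The sketch below (hypothetical helper name, not librbd code) shows how iovec segments might be gathered into one contiguous buffer, which is conceptually what the writev path does before handing the data to the bufferlist-based C++ API:

```cpp
// Hypothetical sketch (not librbd code): gather the caller's iovec
// segments into a single contiguous byte vector, the way a writev-style
// C shim conceptually assembles data before passing it down.
#include <sys/uio.h>
#include <cstring>
#include <string>
#include <vector>

std::vector<char> gather_iovecs(const struct iovec *iov, int iovcnt) {
  // first pass: compute the total payload length
  size_t total = 0;
  for (int i = 0; i < iovcnt; ++i) {
    total += iov[i].iov_len;
  }
  // second pass: copy each segment into place
  std::vector<char> out(total);
  size_t off = 0;
  for (int i = 0; i < iovcnt; ++i) {
    std::memcpy(out.data() + off, iov[i].iov_base, iov[i].iov_len);
    off += iov[i].iov_len;
  }
  return out;
}
```

Note this copy is exactly the overhead the later claim_buffer discussion in this thread is about avoiding.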
librbd: ReadResult helper class for reassembling partial reads
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
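The reassembly mentioned in the commit above can be sketched the same way. The helper below is hypothetical (not the actual ReadResult class): it scatters a contiguous read buffer back into the caller's iovec segments, which is the kind of work a readv-style C path must do after partial reads have been stitched together:

```cpp
// Hypothetical sketch (not the actual librbd::io::ReadResult): scatter
// a contiguous read result back into the caller-supplied iovec list.
#include <sys/uio.h>
#include <algorithm>
#include <cstring>

// Copies up to `len` bytes of `data` into the iovec segments in order;
// returns the number of bytes actually copied.
size_t scatter_to_iovecs(const char *data, size_t len,
                         const struct iovec *iov, int iovcnt) {
  size_t copied = 0;
  for (int i = 0; i < iovcnt && copied < len; ++i) {
    size_t n = std::min(len - copied, iov[i].iov_len);
    std::memcpy(iov[i].iov_base, data + copied, n);
    copied += n;
  }
  return copied;
}
```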
@dillaman


Contributor

dillaman commented Feb 20, 2017

@trociny updated

@trociny trociny merged commit 357cea2 into ceph:master Feb 20, 2017

3 checks passed:
- Signed-off-by: all commits in this PR are signed
- Unmodifed Submodules: submodules for project are unmodified
- default: Build finished.

@dillaman dillaman deleted the dillaman:wip-13025 branch Feb 20, 2017

@yuyuyu101


Member

yuyuyu101 commented Feb 21, 2017

@dillaman have you considered using buffer::claim_buffer (https://github.com/ceph/ceph/blob/master/src/common/buffer.cc#L646) to solve the data-copy problem?

Example usage can be found at https://github.com/ceph/ceph/blob/master/src/msg/async/dpdk/DPDKStack.h#L116.

The deleter could be a lambda that is called when the ptr is released.
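For illustration, here is a minimal self-contained sketch of that claim-with-deleter idea (a hypothetical ClaimedBuffer type, not ceph::buffer::claim_buffer itself): wrap externally owned memory without copying it, and run a caller-supplied deleter when the last reference goes away:

```cpp
// Hypothetical sketch of the "claim" idea: reference external memory
// zero-copy; the deleter fires only when the last reference is dropped.
#include <cstddef>
#include <functional>
#include <memory>

class ClaimedBuffer {
 public:
  using Deleter = std::function<void(char*)>;

  // Takes a non-owning view of `data`; `d` is invoked with `data` when
  // the last ClaimedBuffer copy referencing it is destroyed.
  ClaimedBuffer(char *data, size_t len, Deleter d)
    : m_data(data, std::move(d)), m_len(len) {}

  char *data() const { return m_data.get(); }
  size_t length() const { return m_len; }

 private:
  std::shared_ptr<char> m_data;  // custom deleter runs on last release
  size_t m_len;
};
```

This mirrors the tension discussed below: the deleter may fire long after the I/O completion has been delivered, so the caller cannot safely reuse the memory at completion time.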

@dillaman


Contributor

dillaman commented Feb 21, 2017

@yuyuyu101 Cool -- didn't know that was already implemented. I brought up an option similar to this on the ceph-devel list [1] (combined w/ delaying the AioCompletion) and Sage mentioned that it would be possible for the memory reference to be held for a potentially long period of time (where such a completion delay wouldn't be feasible).

[1] https://www.spinics.net/lists/ceph-devel/msg34596.html

@yuyuyu101


Member

yuyuyu101 commented Feb 21, 2017

@dillaman yes, we can't release the memory when the AioCompletion is released. I suggest the user could provide a callback as a revoke method; this callback could be a member of AioCompletion.

@dillaman


Contributor

dillaman commented Feb 21, 2017

@yuyuyu101 Hmm -- not sure where you are going w/ this. Once we fire the AioCompletion back to the end-user (e.g. QEMU), there is an expectation that the memory buffers they provided to the read/write operation are now free for re-use by the end-user. Providing a mechanism for "revoking" the buffers doesn't solve the problem because semantically the buffers should be revoked from Ceph-usage before we fire the completion. Even if we provided a way to revoke the buffers, bufferlist doesn't have any locks to protect its ptr collection from concurrent access.

@yuyuyu101


Member

yuyuyu101 commented Feb 21, 2017

Hmm, no, I mean the callback is related to the bufferptr, not the AioCompletion. The AioCompletion is just used to pass the callback down to the bufferptr:

```cpp
// user-provided callback, invoked once Ceph releases the buffer
void revoke_buffer(char *buffer)
{
}

comp = create_rbd_completion(cct, revoke_buffer);
r = aio_write(image, offset, length, comp, iovecs);
// ... inside ceph ...
auto callback = comp->revoke_func;
bl.push_back(claim_buffer(offset, length, iovec->buffer,
                          [callback, iovec]() { callback(iovec->buffer); }));
complete_completion(comp);
// ... later ...
// ~bufferptr() -> revoke_buffer
```

@dillaman


Contributor

dillaman commented Feb 21, 2017

@yuyuyu101 Still not understanding what you are proposing. As Sage highlighted, the bufferptr can be held for a potentially long time after the AioCompletion has completed. In the case of something like QEMU w/ virtio, this memory is coming from the ring buffer between the guest OS and QEMU. Therefore, w/o copying the memory, the only safe thing to do would be to stall the AioCompletion until Ceph is no longer referencing the memory.
