Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

librbd: Zipkin tracing [GSOC 2016] #10637

Closed
wants to merge 33 commits into from
Closed

Conversation

@vears91
Copy link

vears91 commented Aug 9, 2016

Pass trace information from librbd to librados. This is part of the end-to-end performance visualization Google Summer of Code project, to initialize traces in fio rbd engine and pass them to librados. It is built on the wip-blkin-osdc branch that added Zipkin tracing support to Messenger and Objecter.

Trace information is passed to librbd through new rbd_aio_write_traced and rbd_aio_read_traced functions, that take trace information as one of the parameters.

Signed-off-by: Victor Araujo ve.ar91@gmail.com

agshew and others added 19 commits Mar 11, 2015
 * Adds --with-blkin to autoconf (default without)
 * Adds BLKIN_LIBS to necessary automake files
 * Adds documentation for testing BlkKin tracing

Signed-off-by: Andrew Shewmaker <agshew@gmail.com>
Signed-off-by: Marios-Evaggelos Kogias <marioskogias@gmail.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
zipkin_trace.h is a wrapper around ztracer.hpp, which provides a stub
implementation when WITH_BLKIN is not defined

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
when set, Message::decode_trace() will always create a trace for
incoming messages, even if they didn't pass trace information

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Victor Araujo <ve.ar91@gmail.com>
A rbd_aio_write_traced function was created in the RBD API to be used from fio rbd engine.

A  blkin_trace_infomation* field was added to AioImageRequest, which passes the trace information to the cache requests. A similar field was added to the Bufferhead to be able to pass the trace information when an AioObjectRequest is created.

Signed-off-by: Victor Araujo <ve.ar91@gmail.com>
vh4x and others added 5 commits Jul 18, 2016
Signed-off-by: Victor Araujo <ve.ar91@gmail.com>
These changes add an optional blkin_trace_info* parameter for aio_operate and initializes a trace if the trace information is present in librados::IoCtxImpl::aio_operate.

Signed-off-by: Victor Araujo <ve.ar91@gmail.com>
A rbd_aio_read_traced function was added to librbd be able to pass trace information from clients to rbd. The trace information is passed through librbd to librados.

Signed-off-by: Victor Araujo <ve.ar91@gmail.com>
The template changes in AioObjectRequests are incorporated and trace information is passed along.

Signed-off-by: Victor Araujo <ve.ar91@gmail.com>
Signed-off-by: Victor Araujo <ve.ar91@gmail.com>
@vears91 vears91 force-pushed the vears91:wip-rebase-gsoc branch from b5ed3cd to 9de5eed Aug 10, 2016
Signed-off-by: Victor Araujo <ve.ar91@gmail.com>
@alimaredia alimaredia self-assigned this Aug 10, 2016
@alimaredia

This comment has been minimized.

Copy link
Contributor

alimaredia commented Aug 10, 2016

@alimaredia

This comment has been minimized.

Copy link
Contributor

alimaredia commented Aug 10, 2016

5b5d443 might be worth removing. The only part of it that could be useful is the .rst file that gets created, everything else should be removed.

@@ -76,6 +76,7 @@
#define CEPH_FEATURE_OSD_HITSET_GMT (1ULL<<54)
#define CEPH_FEATURE_HAMMER_0_94_4 (1ULL<<55)
#define CEPH_FEATURE_NEW_OSDOP_ENCODING (1ULL<<56) /* New, v7 encoding */
#define CEPH_FEATURE_BLKIN_TRACING (1ULL<<57) // enabled by ifdef, don't overlap

This comment has been minimized.

Copy link
@cbodley

cbodley Aug 10, 2016

Contributor

this was intended to have its own feature bit, rather than share a bit with other flags. the idea was to allow endpoints to interoperate if one was built with blkin support, and the other was not. but i'm not sure that's actually useful - if we're including blkin as a submodule, we might as well enable and package it by default

so i think we should get rid of the conditional CEPH_FEATURES_BLKIN stuff below, and just update the value of this bit. i'm guessing that we want to overlap with the KRAKEN bit (can @athanatos confirm?):

#define CEPH_FEATURE_BLKIN_TRACING CEPH_FEATURE_SERVER_KRAKEN

This comment has been minimized.

Copy link
@jdurgin

jdurgin Aug 10, 2016

Member

We can include this with the kraken feature bit if blkin will always be enabled. However, if we make blkin optional in the build there are potential issues with upgrades when some endpoints have blkin compiled in and others do not. E.g. a post-kraken osd without blkin and a kraken osd with blkin - one sends no tracing info, the other assumes tracing info is there based on the KRAKEN bit and can't decode the non-blkin message.

Keeping it as a separate feature bit that's only turned on in connections where it's compiled in, like this diff shows, seems to avoid this kind of problem.

This comment has been minimized.

Copy link
@cbodley

cbodley Aug 11, 2016

Contributor

@jdurgin okay, thanks!

@vears91 could you please update the value of CEPH_FEATURE_BLKIN_TRACING here to the first unused bit you can find?

@cbodley

This comment has been minimized.

Copy link
Contributor

cbodley commented Aug 10, 2016

5b5d443 might be worth removing

i'd prefer to keep it, at least in some form, so the original author/sign-offs remain in the git history

@@ -33,3 +33,6 @@
[submodule "src/isa-l"]
path = src/isa-l
url = https://github.com/ceph/isa-l
[submodule "src/blkin"]
path = src/blkin
url = https://github.com/linuxbox2/blkin

This comment has been minimized.

Copy link
@jdurgin

jdurgin Aug 10, 2016

Member

should we move this under the ceph github org? I can create the repo there

This comment has been minimized.

* @returns 0 on success, -EROFS if the io context specifies a snap_seq
* other than LIBRADOS_SNAP_HEAD
*/
struct blkin_trace_info;

This comment has been minimized.

Copy link
@jdurgin

jdurgin Aug 10, 2016

Member

does this compile when blkin isn't configured?

This comment has been minimized.

Copy link
@vears91

vears91 Aug 11, 2016

Author

It does, I have tested it without configuring blkin.

* @param info blkin trace information
* @returns 0 on success, negative error code on failure
*/
CEPH_RADOS_API int rados_aio_read_traced(rados_ioctx_t io,

This comment has been minimized.

Copy link
@jdurgin

jdurgin Aug 11, 2016

Member

it'd be great to have tests using these new librados and librbd apis in test/librados/io.cc and test/librbd/test_librbd.cc (good place to show example blkin usage too)

vear91 added 2 commits Aug 11, 2016
…ibrbd.cc

The tests follow the same strategy as rbd_aio_write and rbd_aio_read. These tests check that the call doesn't fail but tracing functionalities would need to be checked with LTTng.

Signed-off-by: Victor Araujo <ve.ar91@gmail.com>
Signed-off-by: Victor Araujo <ve.ar91@gmail.com>
@@ -1003,7 +1003,7 @@ namespace librados
*/
int aio_operate(const std::string& oid, AioCompletion *c,
ObjectWriteOperation *op, snap_t seq,
std::vector<snap_t>& snaps);
std::vector<snap_t>& snaps, const blkin_trace_info *trace_info = nullptr);

This comment has been minimized.

Copy link
@dillaman

dillaman Aug 15, 2016

Contributor

This breaks the current API/ABI (e.g. an older app linked against this would not have put a null pointer on the stack). Same comment below.

This comment has been minimized.

Copy link
@vears91

vears91 Aug 15, 2016

Author

Could it be fixed by adding an aio_operate_traced function in librados, and calling it from AioObjectRequest if there is trace information, or else defaulting to the regular aio_operate?

This comment has been minimized.

Copy link
@dillaman

dillaman Aug 15, 2016

Contributor

Since aio_operate is already overloaded many times over, you could just add a new overload which librbd would always call.

@dillaman

This comment has been minimized.

Copy link
Contributor

dillaman commented Aug 15, 2016

I am having trouble understanding the life cycle of blkin_trace_info. I am assuming it's the caller's responsibility to allocate one of these objects, but when would it get freed? In the librbd case with the client-side cache enabled, you cannot free it when the read/write's associated AioCompletion fires since it might still be held in ObjectCacher. You most likely would need to implement reference counting.

@cbodley

This comment has been minimized.

Copy link
Contributor

cbodley commented Aug 15, 2016

@dillaman struct blkin_trace_info is just 3 integers, and should be stack allocated and copied, rather than allocated on the heap. the same goes for struct blkin_trace and class Trace

@dillaman

This comment has been minimized.

Copy link
Contributor

dillaman commented Aug 15, 2016

@cbodley (1) right now it is being passed around as a pointer and (2) if it should be passed by value would we ever need to worry about the blkin_trace_info structure changing since it will now be part of the librados/librbd API as pass by value?

@@ -1681,6 +1686,12 @@ void FileJournal::submit_entry(uint64_t seq, bufferlist& e, uint32_t orig_len,
logger->inc(l_os_j_bytes, e.length());
}

if (osd_op) {
osd_op->mark_event("commit_queued_for_journal_write");
osd_op->journal_trace.init("journal", &trace_endpoint, &osd_op->store_trace);

This comment has been minimized.

Copy link
@cbodley

cbodley Aug 15, 2016

Contributor

this journal_trace.init() should be conditional on if (osd_op->store_trace) instead of just if (osd_op), or we'll create separate traces that aren't attached to a parent

(the call to osd_op->mark_event() should remain outside of if (osd_op->store_trace))

vear91 and others added 5 commits Aug 15, 2016
… info

This avoids breaking the API for older apps that are linked against it.

Signed-off-by: Victor Araujo <ve.ar91@gmail.com>
the blkin trace information needs to be present after initial
call to decode_payload()

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Victor Araujo <ve.ar91@gmail.com>
Signed-off-by: Victor Araujo <ve.ar91@gmail.com>
Calling aio_operate without trace information caused problems in the rbd
unit tests.

Signed-off-by: Victor Araujo <ve.ar91@gmail.com>
@@ -93,6 +96,7 @@ class AioObjectRequest : public AioObjectRequestHandle {
Context *m_completion;
Extents m_parent_extents;
bool m_hide_enoent;
const blkin_trace_info *m_trace_info;

This comment has been minimized.

Copy link
@cbodley

cbodley Aug 18, 2016

Contributor

as @dillaman pointed out, it's not safe to hold on to the trace info pointers like this

take a look at how Objecter and Op work for an example; they pass optional traces around by ZTracer::Trace*, and store them by value as ZTracer::Trace. we should only be using blkin_trace_info in our public interface (anything in src/include/rados/), then turning it into a Trace object as soon as we can

This comment has been minimized.

Copy link
@vears91

vears91 Aug 18, 2016

Author

I wondered why ZTracer::Trace wasn't being used but I was told to pass the trace info to send_write_op. I will try to change it, can you please explain more about the issue with the trace info pointers? What problems can it cause?

This comment has been minimized.

Copy link
@cbodley

cbodley Aug 19, 2016

Contributor

looking at your aio_write_test_data_traced() test as an example, it's starting the rbd write and waiting for the completion at the same time:

void aio_write_test_data_traced(...) {
  struct blkin_trace_info trace_info;
  rbd_aio_write_traced(image, off, len, test_data, comp, &trace_info);
  rbd_aio_wait_for_complete(comp);

we're passing a pointer to a blkin_trace_info that's on the stack here. but it's not a problem, because the stack stays valid until the operation completes - this operation is basically synchronous. but consider an asynchronous case, where we start the operation in one function, and wait for it later in another:

void start_write(...) {
  struct blkin_trace_info trace_info;
  rbd_aio_write_traced(image, off, len, test_data, comp, &trace_info);
}
void wait_for_write(...) {
  rbd_aio_wait_for_complete(comp);
}

again we pass a pointer to blkin_trace_info on the stack, but this time it goes out of scope right after it starts the write. so if rbd holds on to this pointer and tries it dereference it later, it won't be pointing to valid memory. that's why we need to store a copy of the trace info instead of just the pointer

A ZTracer::Trace is generated and stored by value in AioImageRequest.
ZTracer::Trace* is passed around instead of blkin_trace_info*

Signed-off-by: Victor Araujo <ve.ar91@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants
You can’t perform that action at this time.