New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DNM] tracing: use LTTng for logging #22458
Conversation
src/ceph_osd.cc
Outdated
chan.attr.read_timer_interval = 200; | ||
chan.attr.output = LTTNG_EVENT_MMAP; | ||
|
||
dout(1) << "mogeb creating session " << tracing_session << " in " << (string("/var/data/ceph/build/out/") + tracing_session) << dendl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe replace "mogeb" with something else? :)
src/bootstrap-tracing.sh
Outdated
@@ -0,0 +1,28 @@ | |||
#!/bin/bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file can be ignored, I've been using it locally so I don't have to do_cmake every time to switch between backends.
src/common/hobject.h
Outdated
@@ -445,6 +445,18 @@ struct ghobject_t { | |||
(*this) = temp; | |||
} | |||
|
|||
operator std::string() { | |||
std::stringstream out; | |||
out << this; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this will write the address of "this" to the stream, rather than the value, *this. Is that what you want? Is there some other magic happening here that writes the value instead? Do I just need coffee? :-)
(There are a number of these, so I'm guessing that /is/ what you want, but I saw at least one exception so I figure it's worth asking!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OT, but these operator whatever() methods make me very nervous. IIUC, this will silently cast/convert the object to a string if it's in a context that wants a string (e.g., passing to void func(string foo)), right? I've been bitten by silent/accidental conversions in the past. Do we need these, or would an explicit std::string str() method suffice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is correct, it's for the implicit cast to string for unknown types. A simple str()
method would also work, and we'd append the .str()
call in tracetool.
src/osd/osd_types.h
Outdated
@@ -3792,6 +3809,7 @@ class pg_missing_const_i { | |||
virtual bool is_missing(const hobject_t& oid, pg_missing_item *out = nullptr) const = 0; | |||
virtual bool is_missing(const hobject_t& oid, eversion_t v) const = 0; | |||
virtual ~pg_missing_const_i() {} | |||
virtual operator std::string() { return "needs implementation"; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this class is already pure virtual, why not just make this pure virtual as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're obviously right. I was either not thinking, or this was a note to myself to look into implementing the operator in the base class. Either way, thanks!
src/common/hobject.h
Outdated
return out.str(); | ||
} | ||
|
||
operator std::string() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need two operator std::string()
? i think the const
one would suffice.
hey @mogeb may i learn your plan on this PR? please let me know if you need more hands. probably i can help on it. |
@tchaikov sorry for the delay, I got side-tracked on another project for a couple of weeks now, but I'm planning on coming back to this PR next week. I already have a second iteration of this PR ready and working, I just need to clean it a bit and probably ask you for a code review and to try it for usability. The second iteration gets rid of the |
@tchaikov I've updated the PR. This is still experimental so please excuse some less than optimal code. You should be able to try it, and I think it's better to look at the usability before doing a full code review. For now, only |
src/include/config-h.in.cmake
Outdated
@@ -178,6 +178,9 @@ | |||
/* Define if you want to use LTTng */ | |||
#cmakedefine WITH_LTTNG | |||
|
|||
/* Degine if you want to use LTTng for logging instead of dout */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Degine/Define/
src/include/interval_set.h
Outdated
@@ -723,6 +723,13 @@ class interval_set { | |||
other = std::move(m); | |||
} | |||
|
|||
public: | |||
operator std::string() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mark it const
if you please.
@@ -355,7 +355,11 @@ int cmp(const hobject_t& l, const hobject_t& r) | |||
return 0; | |||
} | |||
|
|||
|
|||
ghobject_t::operator std::string() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, we could use CRTP for this idiom, like
template <typename T>
struct Printable {
operator std::string() const {
std::stringstream out;
out << *static_cast<const T*>(this);
return out.str();
}
};
struct Bar : Printable<Bar>
{};
ostream& operator<<(ostream& os, const Bar&) {
return os << "bar!";
}
so we don't need to repeat this method for every class that needs to be printed in the new logging system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that would make sense. OTOH this comment suggests to use a str()
method instead of relying on the implicit operator std::string()
to avoid surprises. What are your thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am also for the str() method
, or at least make the operator std::string
explicit, like
explicit operator std::string() const {
/// ....
}
. @mogeb have you considered reusing a single trace point everywhere and send the fmt string to it, i.e. use it as an alternative of syslog(3)
without the ...
parameters. if we are fine with this, we can avoid all these troubles. but the downside of this apporach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tchaikov sure, that's the very first solution I explored (pre-Cephalocon). I initially changed the implementation of dout()
to just record a tracepoint. With that approach, none of the actual has to be changed. Another alternative could be to use LTTng's tracef()
(similar to printf
), but it also needs/uses strings.
My objective here though is to avoid passing strings around for performance reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. I think if the legacy dout lines degenerate into a slow string op, we can focus our efforts on replacing dout calls in the fast path and leave the less frequently-run and slow-path code alone until later (e.g., peering).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks guys. after re-reading tracing/tracetool/{README.md,parsetool.py}
, now i understand the objective and rationale behind this design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mogeb i agree with @liewegas on this. and i was thinking about how to ease the pain to add a logging message in ceph in future. i even explored the possibility to use libclang to parse the source for creating the .tp
file. with it, we can coin the event name using the filename, and lineno, and deduce the name of argument type from the argument in the trace_*(...)
expression, so we can extract information interesting to us and create the .tp
, .h
and _impl.h
files for us. but this dependency is kind of heavy weight for us at this moment. probably we can revisit it in future.
but, we can always have a compromise here by differentiating following two use cases:
- the logging in critical path: to use
trace_${event_name}(loglevel, subsys, [arg_type, arg_name]...)
, so we don't need to render the values of primitive types in string before passing them to LTTng. - the logging in non-critical path: to use
trace(loglevel, subsys, fmt_str, [arg]...)
. we can usefmt::format()
offered by libfmt in thepasrsetool.py
. as libfmt is already included by ceph as a submodule. and maketrace()
a template function. which callstracepoint(subsys, ${default_event_name}_at_${log_evel})
, assuming all combinations of${default_event_name}_at_${log_level}
TRACEPOINT_EVENT
s of different${subsys}
s and${log_level}
s combinations have been defined byparsetool.py
. they only have a singleTP_FIELD
.
in this way, we can have
- a unified logging backend, so developers or users don't need to check different sources for tracing an issue, and
- a flexible front-end, so developers don't need to worry about the name of a trace event, and the argument type of event payload for all logging messages. they only need to do so when it's really needed.
what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the clang approach would be great. Sounds good, let's rethink it later as you said.
For having a critical and non-critical paths: just to make sure I understand, the non-critical path uses a formatted string using the arguments' operator <<()
functions, correct? If that's the case, we could change the implementation of dout
to record a tracepoint instead of using trace(loglevel, subsys, fmt_str, [arg]...)
. This becomes even easier for the developers as nothing changes: dout()
just takes the string passed to it and ends up calling tracepoint()
at the end. I'd need to figure out the details, but I was able to achieve that at the very beginning. Any thought?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the non-critical path uses a formatted string using the arguments'
operator <<()
functions, correct?
yes. at this moment, we can use the argument's operator<<()
, so minimum change is needed. in short and long term, i'd suggest use libfmt's fmt::format()
instead, it has better performance than that of std::iostream
. probably we can expose two more interfaces
- just for the backward compatibility: `dout(1) << "ceph rock" << 5 << '!' << dendl;
- as the faster one:
trace(bluestore, 1, "{0} rock{1}{2}", "ceph", 5, '!');
- as the fatest one:
trace_ceph_rocks(1, bluestore, string, who, "ceph", int, s, 5, uint_8, wow, '!', "%s rock %d%d");
so, i'd suggest start using the trace()
or trace_${event_name}
interface in the new code. and replace the implementaion of dout(1)
for the existing code.
src/tracing/tracetool/parsetool.py
Outdated
'static inline void __%(api)s(%(args)s)', | ||
'{', | ||
api=e.api(e.CEPH_TRACE), | ||
args=", ".join([' '.join(arg[:2]) if is_valid_type(arg[0]) else ' '.join(('string', arg[1])) for arg in e.args])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might want to differentiate the args which are already std::string
or string
in Event.build()
. probably we could have three Arg
types to represent them?
so we don't create a temporary copy of them just for passing them to ctf_string()
.
@mogeb i will try your patch out. probably the best way to improve it is to use it. |
src/osd/CMakeLists.txt
Outdated
if(WITH_LTTNG_LOGGING) | ||
add_dependencies(osd bluestore-parsetool) | ||
target_link_libraries(osd logging_tp) | ||
target_link_libraries(osd lttng-ust-fork) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might want to add dependencies from os
to *_impl.h
, as BlueStore.cc
now includes them. could use set_source_files_properties()
for this purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted, thanks.
@tchaikov any thought on the usability part? The LTTng usage, the resulting trace, reading/parsing it, etc. |
src/ceph_osd.cc
Outdated
@@ -19,6 +19,7 @@ | |||
|
|||
#include <iostream> | |||
#include <string> | |||
#include <lttng/lttng.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might want to add liblttng-ctl-dev
as a build dependency.
@mogeb i am stilling playing around with it. will reply you tomorrow. |
@mogeb following is what we have in the output of
i think this the best compact representation of a (log) event we can have in the fast path, but from a typical user's point of view, it's still not very friendly. we can ship the also, to integrate with existing logging infrastructure:
but we can leave these tooling for following-up PRs. i guess, the first thing we need to have is once this change gets merged. we will
and then switch OSD over to LTTng based logging. what do you think? |
@tchaikov I really like the approach, especially having a tool that knows how to rewrite the trace's events as more human-readable ones. Note however that the A couple of small things to clear:
|
@mogeb not yet. i will try to reproduce your test in my test bed.
i think it'd be hard for us to read the logging messages if we have a hybrid logging: plain text logging + LTTng tracing. to ease the transition. i'd recommend a runtime option for switching between the plain-text backend and the LTTng backend.
i don't think there is a simple criteria for telling if a log entry is in the fast path by looking at its loglevel. for instance, we use dout(1) << "not feeling well.. i am about to re-boot" << dendl; and use dout(30) << "tick, tick. i am healthy!" << dendl; i think, to ease the pain of migration, before we have the tool to "rewrite the trace's events as more human-readable ones.", we'd better leave the logging messages as they are. and leave the tool and the changes (for instance, ff26e95, 2b00fe2, 8e227fb) to LTTngize logging message for another PR. |
@mogeb in my case, lttng based log outperforms the existing dout(). tested with
while with existing dout backend
|
@tchaikov so the performance is twice as good with LTTng compared to current dout? This is great. What's the loglevel you set for dout? Something came up on my side for so I got sidetracked for a few days, but I'll update this PR afterwards. Edit: Sorry, I missed a part of your comment at first ;) |
looks like.
fio w/ bluestore backend.
debug-bluestore = 10/0
will double check my settings. and rerun the test .
sure. thanks! |
src/test/fio/fio_ceph_objectstore.cc
Outdated
#ifndef typeof | ||
#define typeof __typeof__ | ||
#endif | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure why this is necessary. src/test/fio/CMakeLists.txt has a workaround for this:
# fio headers use typeof(), which requires c++11 extensions
set_target_properties(fio_ceph_objectstore PROPERTIES
CXX_EXTENSIONS ON
COMPILE_FLAGS "${FIO_CFLAGS}")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably won't be it in the final version of this PR. However, I was planning on submitting it alone as fio_ceph_objectstore.cc wouldn't build in my environment. I haven't tried it since though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i also ran into this issue while compiling libfio_ceph_objectstore.so
with GCC 8.2, hence axboe/fio#628 , which renders both workaround not necessary anymore.
what i don't understand is why my SSD is 2x slower than it was last week... it was |
@tchaikov exactly.. even if a trace wasn't generated last time (tracing disabled), it still wouldn't cause that much of a difference. In any case, I will do proper benchmarking once I'm back on this. |
@mogeb ping? |
ad7ea41
to
1b37fe4
Compare
@tchaikov there are now three ways to write an event:
3 is exactly like 2, but the loglevel is not required. For 1, there is no string manipulation, and raw basic types have to be provided. The result in the trace is a compact event. For 2 and 3, arguments are formatted to a string and passed to the event. Here's an example. The only issue right now is that for 2 and 3, you have to use |
src/os/bluestore/BlueStore.cc
Outdated
dout(10) << __func__ << " " << cid << " " << oid | ||
<< " 0x" << std::hex << offset << "~" << length << std::dec | ||
<< " = " << r << dendl; | ||
trace(10, "bluestore", "read {}, {}, {}, {} and {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also replace susbsys
with __func__
.
Rebased. I'm trying to run this on a real cluster now. |
ca62802
to
be4a212
Compare
sorry for the latency, i will review this PR early tomorrow. |
@tchaikov no problem, take your time. On my side, I'm trying to build it with a proper build system to deploy it on a real cluster and see what impact it has. I'll update you with the results soon. |
Update: my hardware is currently broken so I'm waiting for it to be fixed to report performance numbers on a real cluster. I'll report soon and sorry for the delay. |
f13a782
to
42668c2
Compare
Signed-off-by: Mohamad Gebai <mgebai@suse.com>
New results here |
This was discussed at the perf weekly 2019-06-20, where @mogeb shared those results. Given the difficulty of use of lttng for both users and developers, we decided that it would make more sense to concentrate on optimizing the log statements in the hot path (e.g. reducing the memcpys and stringification of many structures there) rather than converting the logging machinery to lttng. I'd update the trello card earlier: https://trello.com/c/WAu7nsH4/362-osd-make-logging-on-the-i-o-path-lightweight but this PR being open caused some confusion, so closing it now. |
@mogeb thanks for all of your efforts in exploring this! |
See
src/tracing/tracetool/README
for details. Use-DWITH_LTTNG_LOGGING=ON
with cmake to use LTTng for logging.Using fio's objectstore backend, with the following job:
I get the following results:
fio is writing to Bluestore on an SSD and the log is being written to an HDD for both dout and LTTng. The throughput difference varies. I've seen LTTng up to 10% faster with more iodepth and numjobs (more contention on dout's internal structures), and I've seen it pretty much the same as dout (with lower iodepth and lower numjobs). However, the size on disk of the log is consistently 50%-60% smaller with LTTng (as expected).