common, osd, tools: Add histograms to performance counters #12829
Conversation
src/common/perf_histogram.cc (outdated):

    case PerfHistogramCommon::SCALE_LOG2:
      return int64_t(1) << (i - 1);
  }
  assert(false && "Invalid quantization type");
You could also try to output what quantization type was passed to ease debugging later.
Thanks for the comment. Since `assert` generates the assertion message at compile time, it would be hard to format anything here. An alternative would be to add some logging, but that's a bit too heavy machinery for this class, in my opinion.
src/common/perf_histogram.cc (outdated):

      return ac.m_buckets - 1;

    default:
      assert(false && "Invalid scale type");
See above.
src/kv/RocksDBStore.cc (outdated):

@@ -394,7 +394,7 @@ void RocksDBStore::get_statistics(Formatter *f)
     f->close_section();
   }
   f->open_object_section("rocksdbstore_perf_counters");
-  logger->dump_formatted(f,0);
+  logger->dump_formatted(f,0,false);
Can you add missing spaces between args?
Fixed
src/common/perf_histogram.h (outdated):

  return get_raw_index_internal<0>(
      [](int64_t bucket, const axis_config_d &ac) {
        assert(bucket >= 0 && "Bucket index can not be negative");
        assert(bucket < ac.m_buckets && "Bucket index to large");
s/to/too/g
Fixed
      [](int){}
  );

  for (size_t i=0; i<rawValues.size(); ++i) {
Missing spaces
Applied clang-format to the file. Now it's much better.
Force-pushed from 26c8a82 to 24795af.
Updated PR, mostly minor changes
Jenkins, retest this please
Force-pushed from 24795af to c86cc46.
some trivial comments
src/tools/histogram_dump.py (outdated):

    def shorten(val):
        if type(val) is str:
is `isinstance` better?
True, converted to `isinstance` (although this is a bit tricky to get right and keep compatible with both Python 2 and Python 3 at the same time).
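For reference, the usual idiom for a version-agnostic string check looks roughly like this (a sketch, not the PR's actual code; the `string_types`/`is_string` names are made up for illustration):

```python
import sys

# Python 2 text can be str or unicode (both subclass basestring);
# Python 3 has no basestring, and str covers all text.
if sys.version_info[0] >= 3:
    string_types = (str,)
else:
    string_types = (basestring,)  # noqa: F821 -- only defined on Python 2

def is_string(val):
    # Hypothetical helper: works unchanged under both interpreters.
    return isinstance(val, string_types)
```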
        type=str,
        default='/var/run/ceph/*.asok',
        help='Path to asok file, can use wildcards')
    parser.add_argument(
seems that `logger` is not used now?
Used in line 100; this could be counter-intuitive since it's not a Python logger but the name of the performance counter logger inside Ceph.
src/tools/histogram_dump.py (outdated):

    def shorten(val):
        if type(val) is str:
            return val
        for u in ((3, ''), (6, 'k'), (9, 'M'), (12, 'G'), (15, 'T')):
(6, 'k') or (6, 'K')?
Lower-case intentionally, since it's 10^3, not 2^10 (as per https://en.wikipedia.org/wiki/K_(disambiguation)#Unit). This tool doesn't really take axis units into account and always treats values as raw integers - that seems like a good future improvement, though.
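For illustration, a `shorten()` along the lines discussed here might look like this (a sketch assuming decimal SI prefixes and integer inputs, not the exact code from the PR):

```python
def shorten(val):
    # Strings pass through untouched.
    if isinstance(val, str):
        return val
    # Decimal SI prefixes: lower-case 'k' because these are powers of 10,
    # not powers of 2, as noted in the thread. Each (exp, unit) pair means
    # "values below 10**exp are printed in units of 10**(exp-3)".
    for exp, unit in ((3, ''), (6, 'k'), (9, 'M'), (12, 'G'), (15, 'T')):
        if val < 10 ** exp:
            return '%d%s' % (val // 10 ** (exp - 3), unit)
    return str(val)  # beyond the table: fall back to the raw number
```

Under these assumptions `shorten(1500)` yields `'1k'` and `shorten(2500000)` yields `'2M'`.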
retest this please
This change adds a new performance counter type: histogram. Currently it measures latency + request size (in bytes) of an op, similarly to l_osd_op_(r|w|rw)_lat and l_osd_op_(r|w|rw)_(inb|outb). Histograms are 2-dimensional and gather probes for both values at the same time. This allows a more detailed analysis than what could be done with two separate 1-D histograms, yet the memory footprint remains negligible.

Since the new data could break existing tools and the amount of data dumped is rather large, two new admin socket commands are introduced: "perf histogram schema" and "perf histogram dump". Invocation of the original command should remain compatible.

There's also a new simple tool in src/tools/histogram_dump.py which does a live dump of one histogram in a local daemon.

The current configuration of histograms is hard-coded and should cover all common cases. If it turns out to be useful, in the future it could be loaded from the configuration.

Signed-off-by: Bartłomiej Święcki <bartlomiej.swiecki@corp.ovh.com>
Force-pushed from c86cc46 to 5034024.
    "Latency (usec)",
    PerfHistogramCommon::SCALE_LOG2, ///< Latency in logarithmic scale
    0,                               ///< Start at 0
    100000,                          ///< Quantization unit is 100usec
I think a fixed quantization value is not suitable for latency histograms. For instance, one wants to distinguish between 0.1 msec and 1 msec, but measuring latencies ~ 1 sec with that precision hardly makes any sense. Perhaps we should stop re-inventing the wheel and re-use HdrHistogram.
Although the quantization is fixed, the axis scale is logarithmic here, so there's much higher granularity for lower values and much lower for the big ones.
I took a brief look at HdrHistogram and it does something very similar, just configured in a different way. I think it won't be a good fit here, though, since they don't support atomic/concurrent histograms. Maybe something else would be a better fit - do you have any experience in this area?
Also, I'm pretty sure most implementations do histograms over one value; I try to gather it over two values simultaneously, which may enable much more fine-grained analysis.
I also remember Milosz Tanski was talking on the ML about Count Min Sketch, which we could look into.
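To make the granularity trade-off concrete, here is a toy model of a log2 axis (the bucket semantics are an assumption for illustration, not the PR's exact mapping): bucket 0 holds value 0, and bucket i (i >= 1) holds [2^(i-1), 2^i), so bucket width doubles at every step.

```python
def log2_bucket(value):
    # Toy log2 axis: bucket 0 holds 0; bucket i (i >= 1) holds
    # [2**(i-1), 2**i). Width doubles each step, so small values get
    # fine resolution and large values get coarse resolution.
    if value <= 0:
        return 0
    return value.bit_length()

def bucket_range(i):
    # Value range covered by bucket i of the toy axis above.
    return (0, 1) if i == 0 else (1 << (i - 1), 1 << i)
```

With this model, values 1 through 16 spread over five buckets, while the single bucket holding 100,000 spans roughly 65,536 to 131,072 - the granularity asymmetry being debated in this thread.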
> the axis scale is logarithmic here so that there's much higher granularity for lower values and much lower for the big ones.

But this prevents one from measuring maximal latency accurately enough. For instance, all values in [0.1 sec, 1 sec) end up in the same bin; it's impossible to distinguish between 0.11 sec and 0.99 sec, which kind of makes the data useless.
> I think it won't be a good fit here though since they don't support atomic/concurrent histograms.

The Java version does support thread-safe (concurrent) and atomic versions (nobody bothered to port those to C yet).
Looking at HdrHistogram, it looks like the space requirements would be significantly higher if you want that level of precision, right?
I haven't looked at this in much detail, but I'm confused about what the quantization value is used for... isn't the data effectively quantized by the bin size anyway, which is logarithmic?
If we were to swap in HdrHistogram (or similar) later, would/can the reporting interface remain the same?
@asheplyakov I think in both HdrHistogram and the implementation I did, it's a matter of space usage: if you want better resolution, you have to use more buckets. For probabilistic data structures (like the Count Min Sketch mentioned before) this could also be adjusted by the amount of allowed noise.
Ok. I don't want great to be the enemy of good, so unless there are outstanding issues with the current dump interface, let's merge this!
@liewegas To clarify the quantization stuff: I'm finding the bucket in two steps. The first is to quantize and shift the input space to normalize it. In the next step, the quantized value is converted to a final bucket number by moving to the correct scale - either leaving it linear or making it logarithmic; this second transformation always operates on a uniform input space.
Regarding the output: for each bucket it also dumps the range of values it holds, so I guess switching to something else would just require dumping it in a compatible format. Dumping a probabilistic data structure could be a bit trickier, though, as long as we don't want to output large amounts of data.
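The two-step mapping described above can be sketched roughly as follows (the names, the clamping behaviour, and the exact linear/log bucket placement are my assumptions, not the PR's actual C++ code):

```python
def find_bucket(value, minimum, quant_size, buckets, log_scale):
    # Step 1: shift by the axis minimum and quantize into uniform units.
    q = (value - minimum) // quant_size
    if q < 0:
        return 0                      # below-range values land in bucket 0
    # Step 2: map the (now uniform) quantized value onto the axis scale.
    if log_scale:
        idx = q.bit_length()          # logarithmic: 0->0, 1->1, 2..3->2, ...
    else:
        idx = q + 1                   # linear: bucket 0 reserved for underflow
    return min(idx, buckets - 1)      # clamp overflow into the last bucket
```

Whatever the exact placement rules, the key point from the comment is that the scale transformation in step 2 always sees a normalized, uniformly quantized input.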
OKed by @sageweil1 to merge http://pulpito.ceph.com/yuriw-2017-02-08_06:01:00-rados-wip-yuri-testing2_2017_2_8---basic-smithi/
  scale_type_d m_scale_type;
  int64_t m_min;
  int64_t m_quant_size;
  int32_t m_buckets;
@byo where is m_buckets initialized?
@byo okay, I was trying to fix the issues reported by Coverity, which complains that the fields of axis_config_d are not initialized. I know this is a false alarm now, but shall we set some safe values in axis_config_d to silence this warning? Like:

    struct axis_config_d {
      const char *m_name = nullptr;
      scale_type_d m_scale_type = 0;
      int64_t m_min = 0;
      int64_t m_quant_size = 0;
      int32_t m_buckets = 0;
    };

what do you think?