Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

common, osd, tools: Add histograms to performance counters #12829

Merged
merged 1 commit into from Feb 9, 2017

Conversation

byo
Copy link

@byo byo commented Jan 9, 2017

This change adds new performance counter type: histogram.

Currently it does measure latency + request size (in bytes) of an op similarly to
l_osd_op_(r|w|rw)_lat and l_osd_op_(r|w|rw)_(inb|outb). Histograms
are 2-dimensional and gather probes for both values at the same time. This allows
a more detailed analysis than what could be done with two separate 1-D histograms.
Still the memory footprint is negligible.

Since new data could break existing tools and the amount of data dumped is rather large,
two new admin socket commands are introduced: perf histogram schema and
perf histogram dump. Invocation of original command should remain compatible.
There's also a new simple tool in src/tools/histogram_dump.py which does a live dump
of one histogram in local daemon.

Current configuration of histograms is hard-coded and should cover all common cases.
If it turns out to be useful, in the future it could be loaded from the configuration.

Signed-off-by: Bartłomiej Święcki bartlomiej.swiecki@corp.ovh.com

case PerfHistogramCommon::SCALE_LOG2:
return int64_t(1) << (i - 1);
}
assert(false && "Invalid quantization type");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also try to output what quantization type was passed to ease debugging later.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for comment, since assert generates assertion message at compile time it would be hard to format anything here. alternative would be to add some logging but that's a bit to heavy machinery for this class in my opinion.

return ac.m_buckets - 1;

default:
assert(false && "Invalid scale type");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

@@ -394,7 +394,7 @@ void RocksDBStore::get_statistics(Formatter *f)
f->close_section();
}
f->open_object_section("rocksdbstore_perf_counters");
logger->dump_formatted(f,0);
logger->dump_formatted(f,0,false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add missing spaces between args?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

return get_raw_index_internal<0>(
[](int64_t bucket, const axis_config_d &ac) {
assert(bucket >= 0 && "Bucket index can not be negative");
assert(bucket < ac.m_buckets && "Bucket index to large");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/to/too/g

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

[](int){}
);

for (size_t i=0; i<rawValues.size(); ++i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing spaces

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied clang-format to the file. Now it's much better.

@byo byo force-pushed the wip-perf-counters-histogram branch 2 times, most recently from 26c8a82 to 24795af Compare January 10, 2017 11:14
@byo
Copy link
Author

byo commented Jan 10, 2017

Updated PR, mostly minor changes

@branch-predictor
Copy link
Contributor

Jenkins, retest this please

@byo
Copy link
Author

byo commented Jan 18, 2017

@liewegas @tchaikov @dachary can you take a look at this?

@byo byo force-pushed the wip-perf-counters-histogram branch from 24795af to c86cc46 Compare January 31, 2017 15:12
Copy link
Contributor

@liuchang0812 liuchang0812 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some trival comments



def shorten(val):
if type(val) is str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is instanceof better ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, converted to instanceof (although this is a bit tricky to get right and compatible with both python2 and python3 at the same time)

type=str,
default='/var/run/ceph/*.asok',
help='Path to asok file, can use wildcards')
parser.add_argument(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems that logger is not used now?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used in line 100, could be counter-intuitive since it's not python logger but the name of performance counter logger inside ceph.

def shorten(val):
if type(val) is str:
return val
for u in ((3, ''), (6, 'k'), (9, 'M'), (12, 'G'), (15, 'T')):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(6, 'k') or (6, 'K')?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lower-case intentionally since it's 10^3, not 2^10 (as per https://en.wikipedia.org/wiki/K_(disambiguation)#Unit) - this tool doesn't really take into account axis units, always threats as raw integer values - seems like a good future improvement though.

@liewegas
Copy link
Member

retest this please

This change adds new performance counter type: histogram.

Currently it does measure latency + request size (in bytes) of
an op similarly to l_osd_op_(r|w|rw)_lat and
l_osd_op_(r|w|rw)_(inb|outb). Histograms are 2-dimensional and
gather probes for both values at the same time. This allows
a more detailed analysis than what could be done with two
separate 1-D histograms. Still the memory footprint is negligible.

Since new data could break existing tools and the amount of data
dumped is rather large, two new admin socket commands are
introduced: perf histogram schema and perf histogram dump.
Invocation of original command should remain compatible.
There's also a new simple tool in src/tools/histogram_dump.py
which does a live dump of one histogram in local daemon.

Current configuration of histograms is hard-coded and should
cover all common cases. If it turns out to be useful, in the
future it could be loaded from the configuration.

Signed-off-by: Bartłomiej Święcki <bartlomiej.swiecki@corp.ovh.com>
@byo byo force-pushed the wip-perf-counters-histogram branch from c86cc46 to 5034024 Compare February 1, 2017 09:05
"Latency (usec)",
PerfHistogramCommon::SCALE_LOG2, ///< Latency in logarithmic scale
0, ///< Start at 0
100000, ///< Quantization unit is 100usec

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a fixed quantization value is not suitable for latency histograms. For instance, one wants to distinguish between 0.1 msec and 1 msec, but measuring latencies ~ 1 sec with that precision hardly makes any sense. Perhaps we should stop re-inventing the wheel and re-use HdrHistogram

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the quantization is fixed, the axis scale is logarithmic here so that there's much higher granularity for lower values and much lower for the big ones.

I took a brief look at HdrHistogram and it does something very similar, just configured in different way. I think it won't be a good fit here though since they don't support atomic/concurrent histograms. Maybe something other would be a better fit, do you have any experience in this area?

Also I'm pretty sure most of implementations would do histograms over one value, I try to gather it over two values simultaneously which may help much more fine-grained analysis.

I also remember Milosz Tanski was talking in ML about Count Min Sketch which we could look into.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the axis scale is logarithmic here so that there's much higher granularity for lower values and much lower for the big ones.

But this prevents one from measuring maximal latency accurately enough. For instance, all values in [0.1 sec, 1 sec) end up in the same bin, it's impossible to distinguish between 0.11 sec and 0.99 sec, which kind of makes the data useless.

I think it won't be a good fit here though since they don't support atomic/concurrent histograms.

The Java version does support thread safe (concurrent) and atomic versions (nobody bothered to port those to C yet).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at HdrHistogram it looks like the space requirements will be significantly higher if you want that level of precision, right?

I haven't looked at this is much detail, but I'm confused what the quantization value is used for... isn't the data effectively quantized by the bin size anyway, which is logarithmic?

If we were to swap in HdrHistogram (or similar) later, would/can the reporting interface remain the same?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asheplyakov I think in both HdrHistogram and the implementation I did it's a matter of space usage, if you want better resolution you have to use more buckets. For probabilistic data structures (like Count Min Sketch mentioned before) this could also be adjusted by the amount of allowed noise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I don't want great to be the enemy of good, so unless there are outstanding issues with the current dump interface, let's merge this!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liewegas To clarify the quantization stuff: I'm finding bucket in two steps, first is to quantize and shift the input space to normalize it. In the next step the quantized value is converted to a final bucket number by moving to a correct scale - either leave it as linear or make logarithmic, this second transformation always operate on uniform input space then.

Regarding the output - for each bucket it also dumps the range of values it holds so I guess switching to something else would just require dumping it in compatible format. Dumping probabilistic data structure could be a bit trickier though as long as we don't want to output large amounts of data.

@tchaikov tchaikov self-assigned this Feb 7, 2017
@yuriw yuriw merged commit 6ff5c5f into ceph:master Feb 9, 2017
@yuriw
Copy link
Contributor

yuriw commented Feb 9, 2017

@byo byo deleted the wip-perf-counters-histogram branch February 9, 2017 09:09
scale_type_d m_scale_type;
int64_t m_min;
int64_t m_quant_size;
int32_t m_buckets;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@byo where is m_buckets initialized?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently it's configured here and here

Copy link
Contributor

@tchaikov tchaikov Feb 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@byo okay, i was trying to fix the issues reported by coverity, which complains that the fields of axis_config_d are not initialized. i know this is a false alarm now, but shall we set some safe values in axis_config_d to silence this warning ? like

  struct axis_config_d {
    const char *m_name = nullptr;
    scale_type_d m_scale_type = 0;
    int64_t m_min = 0;
    int64_t m_quant_size = 0;
    int32_t m_buckets =0;
}

what do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tchaikov Seems like a great idea.
Those values would also create invalid configuration object and would assert. This would show up quickly when starting OSD.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@byo thanks! i just added a commit to #13473.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tchaikov I added a comment in #13473 - a small fix in this patch is needed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants