feature: Health warnings on long network ping times, add "dump_osd_network" to get a report #28755

Merged: 18 commits merged on Sep 5, 2019

Changes from all commits (18 commits)
66d44e7
osd mon: Track heartbeat ping times and report health warning
dzafman Jun 26, 2019
025b10a
osd: Add "dump_osd_network" osd admin request to get a sorted report
dzafman Jul 10, 2019
5d3c185
mgr: Add "dump_osd_network" mgr admin request to get a sorted report
dzafman Jul 9, 2019
0d1bbd3
osd mgr mon: Add mon_warn_on_slow_ping_ratio config as 5% of osd_hear…
dzafman Jul 11, 2019
f4a0be2
doc: Add documentation and release notes
dzafman Jul 11, 2019
297a0e7
osd mgr: Add minimum and maximum tracking to network ping time
dzafman Jul 12, 2019
3f846d7
osd mgr: Store last pingtime for possible graphing
dzafman Jul 15, 2019
6555699
osd: After first interval populate vectors so 5min/15min values aren't 0
dzafman Jul 16, 2019
ea20d35
osd mon: Add last_update to osd_stat_t heartbeat info
dzafman Jul 19, 2019
5ab145d
mon: Indicate when an osd with slow ping time is down
dzafman Jul 19, 2019
048f809
osd mgr: Add osd_mon_heartbeat_stat_stale option to time out ping info
dzafman Jul 22, 2019
f2b26d8
osd: Add debug_disable_randomized_ping config for use in testing
dzafman Jul 24, 2019
573aea2
osd: Add debug_heartbeat_testing_span to allow quicker testing
dzafman Jul 24, 2019
4fb42ea
test: Add basic test for network ping tracking
dzafman Jul 13, 2019
8ac1562
common: Add support routines to generate strings for fixed point
dzafman Aug 9, 2019
9d02e5d
osd mon mgr: Convert all network ping time output to milliseconds
dzafman Aug 6, 2019
5f83a61
osd doc mon mgr: To milliseconds for config value, user input and thr…
dzafman Sep 4, 2019
71015b9
doc: Document network performance monitoring
dzafman Sep 4, 2019
16 changes: 16 additions & 0 deletions PendingReleaseNotes
@@ -108,3 +108,19 @@
* The structured output of ``ceph status`` or ``ceph -s`` is now more
concise, particularly the `mgrmap` and `monmap` sections, and the
structure of the `osdmap` section has been cleaned up.

* A health warning is now generated if the average osd heartbeat ping
time exceeds a configurable threshold for any of the intervals
computed. The OSD computes 1 minute, 5 minute and 15 minute
intervals with average, minimum and maximum values. New configuration
option ``mon_warn_on_slow_ping_ratio`` specifies a percentage of
``osd_heartbeat_grace`` to determine the threshold. A value of zero
disables the warning. New configuration option
``mon_warn_on_slow_ping_time``, specified in milliseconds, overrides the
computed value and causes a warning when OSD heartbeat pings take longer
than the specified amount.
New admin command ``ceph daemon mgr.# dump_osd_network [threshold]`` will
(Review comment: might make sense to add a little about the admin commands in the main docs too - maybe in doc/rados/operations/monitoring.rst)

list all connections whose average ping time, over any of the 3 intervals,
is longer than the specified threshold (or, if none is given, the value determined by the config options).
New admin command ``ceph daemon osd.# dump_osd_network [threshold]`` will
do the same, but include only heartbeats initiated by the specified OSD.
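
For illustration only (the daemon names below, and the assumption that these
options can be changed at runtime with ``ceph config set``, are examples rather
than part of this change), the new threshold and admin commands could be
exercised as::

    ceph config set mon mon_warn_on_slow_ping_time 500   # warn above 500 msec
    ceph daemon mgr.x dump_osd_network 0                  # every tracked connection
    ceph daemon osd.0 dump_osd_network 0                  # only heartbeats initiated by osd.0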
19 changes: 19 additions & 0 deletions doc/rados/configuration/mon-config-ref.rst
@@ -393,6 +393,25 @@ by setting it in the ``[mon]`` section of the configuration file.
:Default: True


``mon warn on slow ping ratio``

:Description: Issue a ``HEALTH_WARN`` in cluster log if any heartbeat
between OSDs exceeds ``mon warn on slow ping ratio``
of ``osd heartbeat grace``. The default is 5%.
:Type: Float
:Default: ``0.05``


``mon warn on slow ping time``

:Description: Override ``mon warn on slow ping ratio`` with a specific value.
Issue a ``HEALTH_WARN`` in cluster log if any heartbeat
between OSDs exceeds ``mon warn on slow ping time``
milliseconds. The default is 0 (disabled).
:Type: Integer
:Default: ``0``
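
A worked example of how these two options interact, assuming the default
``osd heartbeat grace`` of 20 seconds (the ``ceph config set`` invocations are
illustrative only)::

    # computed threshold = osd_heartbeat_grace * mon_warn_on_slow_ping_ratio
    #                    = 20 s * 0.05 = 1 s = 1000 msec
    ceph config set mon mon_warn_on_slow_ping_time 250   # warn above 250 msec instead
    ceph config set mon mon_warn_on_slow_ping_ratio 0    # disable the warning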


``mon cache target full warn ratio``

:Description: Position between pool's ``cache_target_full`` and
15 changes: 11 additions & 4 deletions doc/rados/configuration/mon-osd-interaction.rst
@@ -24,10 +24,8 @@ monitoring the Ceph Storage Cluster.
OSDs Check Heartbeats
=====================

(removed)
Each Ceph OSD Daemon checks the heartbeat of other Ceph OSD Daemons every 6
seconds. You can change the heartbeat interval by adding an ``osd heartbeat
interval`` setting under the ``[osd]`` section of your Ceph configuration file,
or by setting the value at runtime. If a neighboring Ceph OSD Daemon doesn't
(added)
Each Ceph OSD Daemon checks the heartbeat of other Ceph OSD Daemons at random
intervals less than every 6 seconds. If a neighboring Ceph OSD Daemon doesn't
show a heartbeat within a 20 second grace period, the Ceph OSD Daemon may
consider the neighboring Ceph OSD Daemon ``down`` and report it back to a Ceph
Monitor, which will update the Ceph Cluster Map. You may change this grace
@@ -379,6 +377,15 @@ OSD Settings
:Default: ``30``


``osd mon heartbeat stat stale``

:Description: Stop reporting on heartbeat ping times which haven't been updated for
this many seconds. Set to zero to disable this action.

:Type: 32-bit Integer
:Default: ``3600``
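
For example (illustrative only; it is assumed the option can be changed at
runtime with ``ceph config set``)::

    ceph config set osd osd_mon_heartbeat_stat_stale 7200   # keep entries for 2 hours
    ceph config set osd osd_mon_heartbeat_stat_stale 0      # never drop stale entries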


``osd mon report interval``

:Description: The number of seconds a Ceph OSD Daemon may wait
110 changes: 110 additions & 0 deletions doc/rados/operations/monitoring.rst
@@ -159,6 +159,116 @@ to a health state:
2017-07-25 10:11:13.535493 mon.a mon.0 172.21.9.34:6789/0 110 : cluster [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized)
2017-07-25 10:11:13.535577 mon.a mon.0 172.21.9.34:6789/0 111 : cluster [INF] Cluster is now healthy

Network Performance Checks
--------------------------

Ceph OSDs send heartbeat ping messages amongst themselves to monitor daemon availability. We
also use the response times to monitor network performance.
While it is possible that a busy OSD could delay a ping response, we can assume
that if a network switch fails, multiple delays will be detected between distinct pairs of OSDs.

By default we will warn about ping times which exceed 1 second (1000 milliseconds).

::

HEALTH_WARN Long heartbeat ping times on back interface seen, longest is 1118.001 msec

The health detail will add the combination of OSDs that are seeing the delays and by how much. There is a limit of 10
detail line items.

::

[WRN] OSD_SLOW_PING_TIME_BACK: Long heartbeat ping times on back interface seen, longest is 1118.001 msec
Slow heartbeat ping on back interface from osd.0 to osd.1 1118.001 msec
Slow heartbeat ping on back interface from osd.0 to osd.2 1030.123 msec
Slow heartbeat ping on back interface from osd.2 to osd.1 1015.321 msec
Slow heartbeat ping on back interface from osd.1 to osd.0 1010.456 msec

To see even more detail and a complete dump of network performance information, the ``dump_osd_network`` command can be used. Typically, this would be
sent to a mgr, but it can be limited to a particular OSD's interactions by issuing it to that OSD. The current threshold, which defaults to 1 second
(1000 milliseconds), can be overridden as an argument in milliseconds.

The following command will show all gathered network performance data by specifying a threshold of 0 and sending it to the mgr.

::

$ ceph daemon /var/run/ceph/ceph-mgr.x.asok dump_osd_network 0
{
"threshold": 0,
"entries": [
{
"last update": "Wed Sep 4 17:04:49 2019",
"stale": false,
"from osd": 2,
"to osd": 0,
"interface": "front",
"average": {
"1min": 1.023,
"5min": 0.860,
"15min": 0.883
},
"min": {
"1min": 0.818,
"5min": 0.607,
"15min": 0.607
},
"max": {
"1min": 1.164,
"5min": 1.173,
"15min": 1.544
},
"last": 0.924
},
{
"last update": "Wed Sep 4 17:04:49 2019",
"stale": false,
"from osd": 2,
"to osd": 0,
"interface": "back",
"average": {
"1min": 0.968,
"5min": 0.897,
"15min": 0.830
},
"min": {
"1min": 0.860,
"5min": 0.563,
"15min": 0.502
},
"max": {
"1min": 1.171,
"5min": 1.216,
"15min": 1.456
},
"last": 0.845
},
{
"last update": "Wed Sep 4 17:04:48 2019",
"stale": false,
"from osd": 0,
"to osd": 1,
"interface": "front",
"average": {
"1min": 0.965,
"5min": 0.811,
"15min": 0.850
},
"min": {
"1min": 0.650,
"5min": 0.488,
"15min": 0.466
},
"max": {
"1min": 1.252,
"5min": 1.252,
"15min": 1.362
},
"last": 0.791
},
...
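
The JSON output can be post-processed with standard tools. As a sketch (this
``jq`` invocation is an example only; the field names are taken from the output
above), the worst 1 minute averages can be listed with::

    ceph daemon /var/run/ceph/ceph-mgr.x.asok dump_osd_network 0 |
        jq -r '.entries[] | (.["from osd"]|tostring) + " -> " + (.["to osd"]|tostring) + " " + .interface + ": " + (.average["1min"]|tostring) + " msec"' |
        sort -t: -k2 -rn | head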



Muting health checks
--------------------

145 changes: 145 additions & 0 deletions qa/standalone/misc/network-ping.sh
@@ -0,0 +1,145 @@
#!/usr/bin/env bash

source $CEPH_ROOT/qa/standalone/ceph-helpers.sh

function run() {
local dir=$1
shift

export CEPH_MON="127.0.0.1:7146" # git grep '\<7146\>' : there must be only one
export CEPH_ARGS
CEPH_ARGS+="--fsid=$(uuidgen) --auth-supported=none "
CEPH_ARGS+="--mon-host=$CEPH_MON "
CEPH_ARGS+="--debug_disable_randomized_ping=true "
CEPH_ARGS+="--debug_heartbeat_testing_span=5 "
CEPH_ARGS+="--osd_heartbeat_interval=1 "
local funcs=${@:-$(set | sed -n -e 's/^\(TEST_[0-9a-z_]*\) .*/\1/p')}
for func in $funcs ; do
setup $dir || return 1
$func $dir || return 1
teardown $dir || return 1
done
}

function TEST_network_ping_test1() {
local dir=$1

run_mon $dir a || return 1
run_mgr $dir x || return 1
run_osd $dir 0 || return 1
run_osd $dir 1 || return 1
run_osd $dir 2 || return 1

sleep 5

create_pool foo 16

# write some objects
timeout 20 rados bench -p foo 10 write -b 4096 --no-cleanup || return 1

# Get 1 cycle worth of ping data "1 minute"
sleep 10
flush_pg_stats

CEPH_ARGS='' ceph daemon $(get_asok_path osd.0) dump_osd_network | tee $dir/json
test "$(cat $dir/json | jq '.entries | length')" = "0" || return 1
test "$(cat $dir/json | jq '.threshold')" = "1000" || return 1

CEPH_ARGS='' ceph daemon $(get_asok_path mgr.x) dump_osd_network | tee $dir/json
test "$(cat $dir/json | jq '.entries | length')" = "0" || return 1
test "$(cat $dir/json | jq '.threshold')" = "1000" || return 1

CEPH_ARGS='' ceph daemon $(get_asok_path osd.0) dump_osd_network 0 | tee $dir/json
test "$(cat $dir/json | jq '.entries | length')" = "4" || return 1
test "$(cat $dir/json | jq '.threshold')" = "0" || return 1

CEPH_ARGS='' ceph daemon $(get_asok_path mgr.x) dump_osd_network 0 | tee $dir/json
test "$(cat $dir/json | jq '.entries | length')" = "12" || return 1
test "$(cat $dir/json | jq '.threshold')" = "0" || return 1

# Wait another 4 cycles to get "5 minute interval"
sleep 20
flush_pg_stats
CEPH_ARGS='' ceph daemon $(get_asok_path osd.0) dump_osd_network | tee $dir/json
test "$(cat $dir/json | jq '.entries | length')" = "0" || return 1
test "$(cat $dir/json | jq '.threshold')" = "1000" || return 1

CEPH_ARGS='' ceph daemon $(get_asok_path mgr.x) dump_osd_network | tee $dir/json
test "$(cat $dir/json | jq '.entries | length')" = "0" || return 1
test "$(cat $dir/json | jq '.threshold')" = "1000" || return 1

CEPH_ARGS='' ceph daemon $(get_asok_path osd.0) dump_osd_network 0 | tee $dir/json
test "$(cat $dir/json | jq '.entries | length')" = "4" || return 1
test "$(cat $dir/json | jq '.threshold')" = "0" || return 1

CEPH_ARGS='' ceph daemon $(get_asok_path mgr.x) dump_osd_network 0 | tee $dir/json
test "$(cat $dir/json | jq '.entries | length')" = "12" || return 1
test "$(cat $dir/json | jq '.threshold')" = "0" || return 1


# Wait another 10 cycles to get "15 minute interval"
sleep 50
flush_pg_stats
CEPH_ARGS='' ceph daemon $(get_asok_path osd.0) dump_osd_network | tee $dir/json
test "$(cat $dir/json | jq '.entries | length')" = "0" || return 1
test "$(cat $dir/json | jq '.threshold')" = "1000" || return 1

CEPH_ARGS='' ceph daemon $(get_asok_path mgr.x) dump_osd_network | tee $dir/json
test "$(cat $dir/json | jq '.entries | length')" = "0" || return 1
test "$(cat $dir/json | jq '.threshold')" = "1000" || return 1

CEPH_ARGS='' ceph daemon $(get_asok_path osd.0) dump_osd_network 0 | tee $dir/json
test "$(cat $dir/json | jq '.entries | length')" = "4" || return 1
test "$(cat $dir/json | jq '.threshold')" = "0" || return 1

CEPH_ARGS='' ceph daemon $(get_asok_path mgr.x) dump_osd_network 0 | tee $dir/json
test "$(cat $dir/json | jq '.entries | length')" = "12" || return 1
test "$(cat $dir/json | jq '.threshold')" = "0" || return 1

# Just check the threshold output matches the input
CEPH_ARGS='' ceph daemon $(get_asok_path mgr.x) dump_osd_network 99 | tee $dir/json
test "$(cat $dir/json | jq '.threshold')" = "99" || return 1
CEPH_ARGS='' ceph daemon $(get_asok_path osd.0) dump_osd_network 98 | tee $dir/json
test "$(cat $dir/json | jq '.threshold')" = "98" || return 1

rm -f $dir/json
}

# Test setting of mon_warn_on_slow_ping_time very low to
# get health warning
function TEST_network_ping_test2() {
local dir=$1

export CEPH_ARGS
export EXTRA_OPTS=" --mon_warn_on_slow_ping_time=1"
run_mon $dir a || return 1
run_mgr $dir x || return 1
run_osd $dir 0 || return 1
run_osd $dir 1 || return 1
run_osd $dir 2 || return 1

sleep 5

create_pool foo 16

# write some objects
timeout 20 rados bench -p foo 10 write -b 4096 --no-cleanup || return 1

# Get at least 1 cycle of ping data (this test runs with 5 second cycles of 1 second pings)
sleep 10
flush_pg_stats

ceph health | tee $dir/health
grep -q "Long heartbeat" $dir/health || return 1

ceph health detail | tee $dir/health
grep -q "OSD_SLOW_PING_TIME_BACK" $dir/health || return 1
grep -q "OSD_SLOW_PING_TIME_FRONT" $dir/health || return 1
rm -f $dir/health
}

main network-ping "$@"

# Local Variables:
# compile-command: "cd ../.. ; make -j4 && ../qa/run-standalone.sh network-ping.sh"
# End:
26 changes: 26 additions & 0 deletions src/common/Formatter.cc
@@ -25,6 +25,32 @@
// -----------------------
namespace ceph {

// Render an unsigned value that is scaled by 10^scale as a fixed-point decimal
// string, e.g. fixed_u_to_string(1118001, 3) == "1118.001".
std::string
fixed_u_to_string(uint64_t num, int scale)
{
  std::ostringstream t;

  // Pad with leading zeros so there is always at least one digit before the point.
  t.fill('0');
  t.width(scale + 1);
  t << num;
  int len = t.str().size();
  return t.str().substr(0,len - scale) + "." + t.str().substr(len - scale);
}

// Signed variant: format the magnitude as above, then re-apply the sign.
std::string
fixed_to_string(int64_t num, int scale)
{
  std::ostringstream t;
  bool neg = num < 0;
  if (neg) num = -num;

  t.fill('0');
  t.width(scale + 1);
  t << num;
  int len = t.str().size();
  return (neg ? "-" : "") + t.str().substr(0,len - scale) + "." + t.str().substr(len - scale);
}

/*
* FormatterAttrs(const char *attr, ...)
*
2 changes: 2 additions & 0 deletions src/common/Formatter.h
@@ -299,5 +299,7 @@ namespace ceph {
std::vector< std::string > m_column_name;
};

std::string fixed_to_string(int64_t num, int scale);
std::string fixed_u_to_string(uint64_t num, int scale);
}
#endif