osd,mon: misc full fixes and cleanups #13968

Merged
merged 24 commits on Apr 17, 2017
Changes from all commits
Commits
24 commits
1e7d227
osd: Fix log message
dzafman Feb 28, 2017
3a66f1f
ceph-objectstore-tool: cleanup comment
dzafman Apr 3, 2017
6c930f7
osd: Increase osd_backfill_retry_interval to 30 seconds
dzafman Mar 16, 2017
d024ab0
osd: Remove unused argument to clear_queued_recovery
dzafman Mar 16, 2017
811f89a
test: Switch from pg to osd for set-*-ratio commands
dzafman Apr 7, 2017
e927cd2
test: Fix intended test flow and restore nearfull-ratio
dzafman Apr 7, 2017
5baf7ab
osd: Fail-safe full is a hard stop even for mds
dzafman Mar 30, 2017
7912433
osd: too_full_for_backfill() returns ostream for reason
dzafman Mar 30, 2017
79a4ac4
common: Remove unused config option osd_recovery_threads
dzafman Mar 30, 2017
9dd6952
common: Bump ratio for backfillfull from 85% to 90%
dzafman Apr 3, 2017
0264bbd
osd: For testing full disks add injectfull socket command
dzafman Mar 30, 2017
a573107
osd: Handle backfillfull_ratio just like nearfull and full
dzafman Mar 30, 2017
1e2fde1
osd: Revamp injectfull op to support all full states
dzafman Mar 31, 2017
1711ccd
osd: Check failsafe full and crash on push/pull
dzafman Apr 3, 2017
94e253c
osd: Rename backfill_request_* to recovery_request_*
dzafman Mar 16, 2017
c7e8dca
osd: Add check_osdmap_full() to check for shard OSD fullness
dzafman Mar 16, 2017
27e1450
osd: Add PG state and flag for too full for recovery
dzafman Apr 5, 2017
8408856
osd: Check whether any OSD is full before starting recovery
dzafman Apr 5, 2017
1fafec2
osd: check_full_status() remove bogus comment and use equivalent comp…
dzafman Apr 12, 2017
afd739b
mon: Use currently configure full ratio to determine available space
dzafman Apr 13, 2017
c83f11d
mon: Always fix-up full ratios when specified incorrectly in config
dzafman Apr 13, 2017
e4cf10d
mon: Issue warning or error if a full ratio out of order
dzafman Apr 13, 2017
2522307
mon, osd: Add detailed full information for now in the mon
dzafman Apr 14, 2017
3becdd3
test: Test health check output for full ratios
dzafman Apr 15, 2017
7 changes: 4 additions & 3 deletions doc/dev/osd_internals/recovery_reservation.rst
@@ -34,8 +34,8 @@ the typical process.

Once the primary has its local reservation, it requests a remote
reservation from the backfill target. This reservation CAN be rejected,
for instance if the OSD is too full (osd_backfill_full_ratio config
option). If the reservation is rejected, the primary drops its local
for instance if the OSD is too full (backfillfull_ratio osd setting).
If the reservation is rejected, the primary drops its local
reservation, waits (osd_backfill_retry_interval), and then retries. It
will retry indefinitely.

@@ -62,9 +62,10 @@ to the monitor. The state chart can set:

- recovery_wait: waiting for local/remote reservations
- recovering: recovering
- recovery_toofull: recovery stopped, OSD(s) above full ratio
- backfill_wait: waiting for remote backfill reservations
- backfilling: backfilling
- backfill_toofull: backfill reservation rejected, OSD too full
- backfill_toofull: backfill stopped, OSD(s) above backfillfull ratio


--------
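
The new ``recovery_toofull`` and ``backfill_toofull`` states appear in regular PG status output. As a rough sketch (not part of this PR), an operator could watch for them and inspect the retry interval that gates re-requesting reservations; the admin-socket path below is an assumption and should be adjusted per deployment::

    # List any PGs currently held back because an OSD is over a full threshold.
    ceph pg dump pgs_brief | egrep 'backfill_toofull|recovery_toofull'

    # Inspect the retry interval (raised to 30s by this PR) on one OSD.
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config get osd_backfill_retry_interval
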
6 changes: 6 additions & 0 deletions doc/man/8/ceph.rst
@@ -1166,6 +1166,12 @@ Usage::

ceph pg set_full_ratio <float[0.0-1.0]>

Subcommand ``set_backfillfull_ratio`` sets ratio at which pgs are considered too full to backfill.

Usage::

ceph pg set_backfillfull_ratio <float[0.0-1.0]>

Subcommand ``set_nearfull_ratio`` sets ratio at which pgs are considered nearly
full.

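
The same threshold can also be managed with the newer ``ceph osd`` subcommands added in this PR's MonCommands.h change and exercised by the cephtool tests below; a minimal sketch::

    ceph osd set-backfillfull-ratio 0.90
    ceph osd dump | grep '^backfillfull_ratio'
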
10 changes: 10 additions & 0 deletions doc/rados/configuration/mon-config-ref.rst
@@ -400,6 +400,7 @@ a reasonable number for a near full ratio.
[global]

mon osd full ratio = .80
mon osd backfillfull ratio = .75
mon osd nearfull ratio = .70


@@ -412,6 +413,15 @@ a reasonable number for a near full ratio.
:Default: ``.95``


``mon osd backfillfull ratio``

:Description: The percentage of disk space used before an OSD is
considered too ``full`` to backfill.

:Type: Float
:Default: ``.90``


``mon osd nearfull ratio``

:Description: The percentage of disk space used before an OSD is
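
The three ratios are expected to stay ordered (``nearfull`` < ``backfillfull`` < ``full``); per the cephtool test changes below, the monitor reports ``HEALTH_ERR Full ratio(s) out of order`` when they are not. A quick way to check the effective values on a running cluster (a sketch, not part of the docs being changed)::

    ceph osd dump | egrep '^(full|backfillfull|nearfull)_ratio'
    ceph health detail
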
16 changes: 0 additions & 16 deletions doc/rados/configuration/osd-config-ref.rst
@@ -560,15 +560,6 @@ priority than requests to read or write data.
:Default: ``512``


``osd backfill full ratio``

:Description: Refuse to accept backfill requests when the Ceph OSD Daemon's
full ratio is above this value.

:Type: Float
:Default: ``0.85``


``osd backfill retry interval``

:Description: The number of seconds to wait before retrying backfill requests.
@@ -673,13 +664,6 @@ perform well in a degraded state.
:Default: ``8 << 20``


``osd recovery threads``

:Description: The number of threads for recovering data.
:Type: 32-bit Integer
:Default: ``1``


``osd recovery thread timeout``

:Description: The maximum time in seconds before timing out a recovery thread.
8 changes: 4 additions & 4 deletions doc/rados/operations/monitoring-osd-pg.rst
@@ -468,8 +468,7 @@ Ceph provides a number of settings to balance the resource contention between
new service requests and the need to recover data objects and restore the
placement groups to the current state. The ``osd recovery delay start`` setting
allows an OSD to restart, re-peer and even process some replay requests before
starting the recovery process. The ``osd recovery threads`` setting limits the
number of threads for the recovery process (1 thread by default). The ``osd
starting the recovery process. The ``osd
recovery thread timeout`` sets a thread timeout, because multiple OSDs may fail,
restart and re-peer at staggered rates. The ``osd recovery max active`` setting
limits the number of recovery requests an OSD will entertain simultaneously to
@@ -497,8 +496,9 @@ placement group can't be backfilled, it may be considered ``incomplete``.
Ceph provides a number of settings to manage the load spike associated with
reassigning placement groups to an OSD (especially a new OSD). By default,
``osd_max_backfills`` sets the maximum number of concurrent backfills to or from
an OSD to 10. The ``osd backfill full ratio`` enables an OSD to refuse a
backfill request if the OSD is approaching its full ratio (85%, by default).
an OSD to 10. The ``backfill full ratio`` enables an OSD to refuse a
backfill request if the OSD is approaching its full ratio (90%, by default);
the ratio can be changed with the ``ceph osd set-backfillfull-ratio`` command.
If an OSD refuses a backfill request, the ``osd backfill retry interval``
enables an OSD to retry the request (after 10 seconds, by default). OSDs can
also set ``osd backfill scan min`` and ``osd backfill scan max`` to manage scan
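
For reference, the backfill knobs named above can be adjusted on a live cluster; a sketch using ``injectargs`` with illustrative values only::

    ceph tell osd.* injectargs '--osd-max-backfills 2'
    ceph tell osd.* injectargs '--osd-backfill-retry-interval 30'
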
17 changes: 10 additions & 7 deletions doc/rados/troubleshooting/troubleshooting-osd.rst
@@ -206,28 +206,31 @@ Ceph prevents you from writing to a full OSD so that you don't lose data.
In an operational cluster, you should receive a warning when your cluster
is getting near its full ratio. The ``mon osd full ratio`` defaults to
``0.95``, or 95% of capacity before it stops clients from writing data.
The ``mon osd nearfull ratio`` defaults to ``0.85``, or 85% of capacity
The ``mon osd backfillfull ratio`` defaults to ``0.90``, or 90% of
capacity when it blocks backfills from starting. The
``mon osd nearfull ratio`` defaults to ``0.85``, or 85% of capacity
when it generates a health warning.

Full cluster issues usually arise when testing how Ceph handles an OSD
failure on a small cluster. When one node has a high percentage of the
cluster's data, the cluster can easily eclipse its nearfull and full ratio
immediately. If you are testing how Ceph reacts to OSD failures on a small
cluster, you should leave ample free disk space and consider temporarily
lowering the ``mon osd full ratio`` and ``mon osd nearfull ratio``.
lowering the ``mon osd full ratio``, ``mon osd backfillfull ratio`` and
``mon osd nearfull ratio``.

Full ``ceph-osds`` will be reported by ``ceph health``::

ceph health
HEALTH_WARN 1 nearfull osds
osd.2 is near full at 85%
HEALTH_WARN 1 nearfull osd(s)

Or::

ceph health
HEALTH_ERR 1 nearfull osds, 1 full osds
osd.2 is near full at 85%
ceph health detail
HEALTH_ERR 1 full osd(s); 1 backfillfull osd(s); 1 nearfull osd(s)
osd.3 is full at 97%
osd.4 is backfill full at 91%
osd.2 is near full at 87%

The best way to deal with a full cluster is to add new ``ceph-osds``, allowing
the cluster to redistribute data to the newly available storage.
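
Before adding capacity it usually helps to confirm which OSDs are over which threshold; a sketch (output formats vary by release)::

    ceph osd df          # per-OSD utilization
    ceph health detail   # names the full / backfillfull / nearfull OSDs, as above
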
14 changes: 8 additions & 6 deletions qa/tasks/ceph_manager.py
@@ -696,7 +696,7 @@ def test_backfill_full(self):
"""
Test backfills stopping when the replica fills up.

First, use osd_backfill_full_ratio to simulate a now full
First, use injectfull admin command to simulate a now full
osd by setting it to 0 on all of the OSDs.

Second, on a random subset, set
@@ -705,13 +705,14 @@

Then, verify that all backfills stop.
"""
self.log("injecting osd_backfill_full_ratio = 0")
self.log("injecting backfill full")
for i in self.live_osds:
self.ceph_manager.set_config(
i,
osd_debug_skip_full_check_in_backfill_reservation=
random.choice(['false', 'true']),
osd_backfill_full_ratio=0)
random.choice(['false', 'true']))
self.ceph_manager.osd_admin_socket(i, command=['injectfull', 'backfillfull'],
check_status=True, timeout=30, stdout=DEVNULL)
for i in range(30):
status = self.ceph_manager.compile_pg_status()
if 'backfill' not in status.keys():
@@ -724,8 +725,9 @@
for i in self.live_osds:
self.ceph_manager.set_config(
i,
osd_debug_skip_full_check_in_backfill_reservation='false',
osd_backfill_full_ratio=0.85)
osd_debug_skip_full_check_in_backfill_reservation='false')
self.ceph_manager.osd_admin_socket(i, command=['injectfull', 'none'],
check_status=True, timeout=30, stdout=DEVNULL)

def test_map_discontinuity(self):
"""
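
Outside the test harness, the same fault injection can be driven by hand through the OSD admin socket, mirroring what cephtool/test.sh does below; the socket path here is an assumption::

    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok injectfull backfillfull
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok injectfull none   # clear the injection
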
2 changes: 1 addition & 1 deletion qa/workunits/ceph-helpers.sh
@@ -400,6 +400,7 @@ EOF
if test -z "$(get_config mon $id mon_initial_members)" ; then
ceph osd pool delete rbd rbd --yes-i-really-really-mean-it || return 1
ceph osd pool create rbd $PG_NUM || return 1
ceph osd set-backfillfull-ratio .99
fi
}

@@ -634,7 +635,6 @@ function activate_osd() {
ceph_disk_args+=" --prepend-to-path="

local ceph_args="$CEPH_ARGS"
ceph_args+=" --osd-backfill-full-ratio=.99"
ceph_args+=" --osd-failsafe-full-ratio=.99"
ceph_args+=" --osd-journal-size=100"
ceph_args+=" --osd-scrub-load-threshold=2000"
35 changes: 35 additions & 0 deletions qa/workunits/cephtool/test.sh
@@ -1419,9 +1419,44 @@ function test_mon_pg()

ceph osd set-full-ratio .962
ceph osd dump | grep '^full_ratio 0.962'
ceph osd set-backfillfull-ratio .912
ceph osd dump | grep '^backfillfull_ratio 0.912'
ceph osd set-nearfull-ratio .892
ceph osd dump | grep '^nearfull_ratio 0.892'

# Check health status
ceph osd set-nearfull-ratio .913
ceph health | grep 'HEALTH_ERR Full ratio(s) out of order'
ceph health detail | grep 'backfill_ratio (0.912) < nearfull_ratio (0.913), increased'
ceph osd set-nearfull-ratio .892
ceph osd set-backfillfull-ratio .963
ceph health detail | grep 'full_ratio (0.962) < backfillfull_ratio (0.963), increased'
ceph osd set-backfillfull-ratio .912

# Check injected full results
WAITFORFULL=10
ceph --admin-daemon $CEPH_OUT_DIR/osd.0.asok injectfull nearfull
sleep $WAITFORFULL
ceph health | grep "HEALTH_WARN.*1 nearfull osd(s)"
ceph --admin-daemon $CEPH_OUT_DIR/osd.1.asok injectfull backfillfull
sleep $WAITFORFULL
ceph health | grep "HEALTH_WARN.*1 backfillfull osd(s)"
ceph --admin-daemon $CEPH_OUT_DIR/osd.2.asok injectfull failsafe
sleep $WAITFORFULL
# failsafe and full are the same as far as the monitor is concerned
ceph health | grep "HEALTH_ERR.*1 full osd(s)"
ceph --admin-daemon $CEPH_OUT_DIR/osd.0.asok injectfull full
sleep $WAITFORFULL
ceph health | grep "HEALTH_ERR.*2 full osd(s)"
ceph health detail | grep "osd.0 is full at.*%"
ceph health detail | grep "osd.2 is full at.*%"
ceph health detail | grep "osd.1 is backfill full at.*%"
ceph --admin-daemon $CEPH_OUT_DIR/osd.0.asok injectfull none
ceph --admin-daemon $CEPH_OUT_DIR/osd.1.asok injectfull none
ceph --admin-daemon $CEPH_OUT_DIR/osd.2.asok injectfull none
sleep $WAITFORFULL
ceph health | grep HEALTH_OK

ceph pg stat | grep 'pgs:'
ceph pg 0.0 query
ceph tell 0.0 query
6 changes: 5 additions & 1 deletion qa/workunits/rest/test.py
@@ -359,10 +359,14 @@ def expect_nofail(url, method, respcode, contenttype, extra_hdrs=None,
r = expect('osd/dump', 'GET', 200, 'json', JSONHDR)
assert(float(r.myjson['output']['full_ratio']) == 0.90)
expect('osd/set-full-ratio?ratio=0.95', 'PUT', 200, '')
expect('osd/set-backfillfull-ratio?ratio=0.88', 'PUT', 200, '')
r = expect('osd/dump', 'GET', 200, 'json', JSONHDR)
assert(float(r.myjson['output']['backfillfull_ratio']) == 0.88)
expect('osd/set-backfillfull-ratio?ratio=0.90', 'PUT', 200, '')
expect('osd/set-nearfull-ratio?ratio=0.90', 'PUT', 200, '')
r = expect('osd/dump', 'GET', 200, 'json', JSONHDR)
assert(float(r.myjson['output']['nearfull_ratio']) == 0.90)
expect('osd/set-full-ratio?ratio=0.85', 'PUT', 200, '')
expect('osd/set-nearfull-ratio?ratio=0.85', 'PUT', 200, '')

r = expect('pg/stat', 'GET', 200, 'json', JSONHDR)
assert('num_pgs' in r.myjson['output'])
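
Equivalent calls against a running ``ceph-rest-api`` might look like the following; the host, port, and ``/api/v0.1`` prefix are assumptions based on that service's defaults, not something this PR changes::

    curl -X PUT 'http://localhost:5000/api/v0.1/osd/set-backfillfull-ratio?ratio=0.90'
    curl -H 'Accept: application/json' 'http://localhost:5000/api/v0.1/osd/dump'
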
2 changes: 2 additions & 0 deletions src/common/ceph_strings.cc
@@ -42,6 +42,8 @@ const char *ceph_osd_state_name(int s)
return "full";
case CEPH_OSD_NEARFULL:
return "nearfull";
case CEPH_OSD_BACKFILLFULL:
return "backfillfull";
default:
return "???";
}
11 changes: 6 additions & 5 deletions src/common/config_opts.h
@@ -308,6 +308,7 @@ OPTION(mon_pg_warn_min_pool_objects, OPT_INT, 1000) // do not warn on pools bel
OPTION(mon_pg_check_down_all_threshold, OPT_FLOAT, .5) // threshold of down osds after which we check all pgs
OPTION(mon_cache_target_full_warn_ratio, OPT_FLOAT, .66) // position between pool cache_target_full and max where we start warning
OPTION(mon_osd_full_ratio, OPT_FLOAT, .95) // what % full makes an OSD "full"
OPTION(mon_osd_backfillfull_ratio, OPT_FLOAT, .90) // what % full makes an OSD backfill full (backfill halted)
OPTION(mon_osd_nearfull_ratio, OPT_FLOAT, .85) // what % full makes an OSD near full
OPTION(mon_allow_pool_delete, OPT_BOOL, false) // allow pool deletion
OPTION(mon_globalid_prealloc, OPT_U32, 10000) // how many globalids to prealloc
@@ -626,11 +627,11 @@ OPTION(osd_max_backfills, OPT_U64, 1)
// Minimum recovery priority (255 = max, smaller = lower)
OPTION(osd_min_recovery_priority, OPT_INT, 0)

// Refuse backfills when OSD full ratio is above this value
OPTION(osd_backfill_full_ratio, OPT_FLOAT, 0.85)

// Seconds to wait before retrying refused backfills
OPTION(osd_backfill_retry_interval, OPT_DOUBLE, 10.0)
OPTION(osd_backfill_retry_interval, OPT_DOUBLE, 30.0)

// Seconds to wait before retrying refused recovery
OPTION(osd_recovery_retry_interval, OPT_DOUBLE, 30.0)

// max agent flush ops
OPTION(osd_agent_max_ops, OPT_INT, 4)
@@ -742,7 +743,6 @@ OPTION(osd_op_pq_min_cost, OPT_U64, 65536)
OPTION(osd_disk_threads, OPT_INT, 1)
OPTION(osd_disk_thread_ioprio_class, OPT_STR, "") // rt realtime be best effort idle
OPTION(osd_disk_thread_ioprio_priority, OPT_INT, -1) // 0-7
OPTION(osd_recovery_threads, OPT_INT, 1)
OPTION(osd_recover_clone_overlap, OPT_BOOL, true) // preserve clone_overlap during recovery/migration
OPTION(osd_op_num_threads_per_shard, OPT_INT, 2)
OPTION(osd_op_num_shards, OPT_INT, 5)
@@ -871,6 +871,7 @@ OPTION(osd_debug_skip_full_check_in_backfill_reservation, OPT_BOOL, false)
OPTION(osd_debug_reject_backfill_probability, OPT_DOUBLE, 0)
OPTION(osd_debug_inject_copyfrom_error, OPT_BOOL, false) // inject failure during copyfrom completion
OPTION(osd_debug_misdirected_ops, OPT_BOOL, false)
OPTION(osd_debug_skip_full_check_in_recovery, OPT_BOOL, false)
OPTION(osd_enxio_on_misdirected_op, OPT_BOOL, false)
OPTION(osd_debug_verify_cached_snaps, OPT_BOOL, false)
OPTION(osd_enable_op_tracker, OPT_BOOL, true) // enable/disable OSD op tracking
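
A quick way to confirm the new defaults (30-second backfill retry, the new ``osd_recovery_retry_interval``, and removal of ``osd_backfill_full_ratio``) on a live daemon, assuming the ``ceph daemon`` wrapper is available (a sketch)::

    ceph daemon osd.0 config get osd_backfill_retry_interval
    ceph daemon osd.0 config get osd_recovery_retry_interval
    ceph daemon osd.0 config get osd_backfill_full_ratio   # expected to error out after this PR
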
1 change: 1 addition & 0 deletions src/include/rados.h
@@ -116,6 +116,7 @@ struct ceph_eversion {
#define CEPH_OSD_NEW (1<<3) /* osd is new, never marked in */
#define CEPH_OSD_FULL (1<<4) /* osd is at or above full threshold */
#define CEPH_OSD_NEARFULL (1<<5) /* osd is at or above nearfull threshold */
#define CEPH_OSD_BACKFILLFULL (1<<6) /* osd is at or above backfillfull threshold */

extern const char *ceph_osd_state_name(int s);

4 changes: 4 additions & 0 deletions src/mon/MonCommands.h
@@ -592,6 +592,10 @@ COMMAND("osd set-full-ratio " \
"name=ratio,type=CephFloat,range=0.0|1.0", \
"set usage ratio at which OSDs are marked full",
"osd", "rw", "cli,rest")
COMMAND("osd set-backfillfull-ratio " \
"name=ratio,type=CephFloat,range=0.0|1.0", \
"set usage ratio at which OSDs are marked too full to backfill",
"osd", "rw", "cli,rest")
COMMAND("osd set-nearfull-ratio " \
"name=ratio,type=CephFloat,range=0.0|1.0", \
"set usage ratio at which OSDs are marked near-full",