restart_osd_daemon.sh.j2 - Reset RETRIES between calls of check_pgs
Previously RETRIES was set (by default to 40) once at the start of the
script; this meant that it would only ever wait for up to 40 lots of
30s across *all* the OSDs on a host before bombing out. In fact, we
want to be prepared to wait for the same amount of time after each OSD
restart for the cluster's pgs to be happy again before continuing.

Closes: #3154
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
(cherry picked from commit aa97ecf)
mcv21 authored and mergify[bot] committed Sep 24, 2018
1 parent 4ce11a8 commit 93bc69e
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion roles/ceph-defaults/templates/restart_osd_daemon.sh.j2
@@ -1,6 +1,5 @@
#!/bin/bash

-RETRIES="{{ handler_health_osd_check_retries }}"
DELAY="{{ handler_health_osd_check_delay }}"
CEPH_CLI="--name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/{{ cluster }}.keyring --cluster {{ cluster }}"

@@ -78,6 +77,7 @@ for unit in $(systemctl list-units | grep -E "loaded * active" | grep -oE "ceph-
{% endif %}
SOCKET=/var/run/ceph/{{ cluster }}-osd.${osd_id}.asok
while [ $COUNT -ne 0 ]; do
+RETRIES="{{ handler_health_osd_check_retries }}"
$docker_exec test -S "$SOCKET" && check_pgs && continue 2
sleep $DELAY
let COUNT=COUNT-1
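
The effect of moving the assignment can be illustrated with a small stand-alone simulation. This is only a sketch, not code from ceph-ansible: `BUDGET`, `simulate_check` and `restart_all` are hypothetical names, and it assumes that `check_pgs` decrements the global `RETRIES` each time it polls, which the commit message implies but the visible hunks do not show.

```shell
#!/bin/bash
# Hypothetical simulation of the retry budget; none of these helper names
# exist in restart_osd_daemon.sh.j2.

BUDGET=3   # stands in for handler_health_osd_check_retries

# simulate_check NEEDED: pretend the PGs become clean after NEEDED polls.
# Each poll consumes one unit of the global RETRIES (assumed behaviour of
# the real check_pgs).
simulate_check() {
  local needed=$1
  while [ "$RETRIES" -gt 0 ]; do
    RETRIES=$((RETRIES - 1))
    needed=$((needed - 1))
    if [ "$needed" -le 0 ]; then
      return 0
    fi
  done
  return 1
}

# restart_all MODE NEEDED...: one NEEDED value per simulated OSD.
# MODE "shared" sets RETRIES once (old behaviour); MODE "per_osd" resets it
# before every check (behaviour after this commit).
restart_all() {
  local mode=$1
  shift
  local osd=0 needed
  RETRIES=$BUDGET
  for needed in "$@"; do
    if [ "$mode" = "per_osd" ]; then
      RETRIES=$BUDGET
    fi
    if simulate_check "$needed"; then
      echo "osd.$osd ok"
    else
      echo "osd.$osd timed out"
      return 1
    fi
    osd=$((osd + 1))
  done
}

# Two OSDs that each need two polls, with a budget of three.
restart_all shared 2 2 || true
restart_all per_osd 2 2
```

With a shared budget the first OSD uses two of the three retries, so the second one times out after a single poll. With a per-OSD reset both complete.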