restart_osd_daemon.sh.j2 - Reset RETRIES between calls of check_pgs

Previously RETRIES was set (to 40 by default) once at the start of the script, so the script would wait for at most 40 lots of 30s in total across *all* the OSDs on a host before bombing out. What we actually want is to be prepared to wait that long after each OSD restart for the cluster's PGs to be happy again before continuing.

Closes: #3154
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
(cherry picked from commit aa97ecf)
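To illustrate the difference between a shared retry budget and one that is reset per OSD, here is a minimal simulation. It is only a sketch: `check_pgs_sim`, `restart_all`, `RETRIES_DEFAULT` and the failure counts are hypothetical stand-ins, it makes no Ceph calls and does no sleeping, and it assumes (as in the template) that the check function consumes a global RETRIES counter on each failed poll.

```shell
#!/bin/bash
# Hypothetical simulation of the retry budget; not the real template.
RETRIES_DEFAULT=3   # stands in for handler_health_osd_check_retries

# Simplified stand-in for check_pgs: the PGs count as clean once
# $1 polls have failed. Every failed poll uses up one global RETRIES.
check_pgs_sim() {
  local failures_left=$1
  while [ "$RETRIES" -ne 0 ]; do
    if [ "$failures_left" -eq 0 ]; then
      return 0                      # PGs are clean
    fi
    failures_left=$((failures_left - 1))
    RETRIES=$((RETRIES - 1))        # one retry spent on a failed poll
  done
  return 1                          # budget exhausted
}

# $1 is "shared" (old behaviour) or "per_osd" (new behaviour);
# the remaining arguments are the failed polls each OSD needs.
restart_all() {
  local mode=$1; shift
  local osd=0 failures
  RETRIES=$RETRIES_DEFAULT          # set once, as before the change
  for failures in "$@"; do
    if [ "$mode" = "per_osd" ]; then
      RETRIES=$RETRIES_DEFAULT      # reset before each OSD's check
    fi
    if ! check_pgs_sim "$failures"; then
      echo "gave up at OSD $osd"
      return 1
    fi
    osd=$((osd + 1))
  done
  echo "all OSDs restarted"
}

shared_result=$(restart_all shared 2 2 2) || true
per_osd_result=$(restart_all per_osd 2 2 2)
echo "shared budget:  $shared_result"
echo "per-OSD budget: $per_osd_result"
```

With three OSDs that each need two failed polls and a budget of three, the shared counter runs out during the second OSD, while the per-OSD reset lets all three complete.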
ceph · Sep 24, 2018 · 93bc69e
1 parent 4ce11a8
commit 93bc69e
Showing 1 changed file with 1 addition and 1 deletion.
diff --git a/roles/ceph-defaults/templates/restart_osd_daemon.sh.j2 b/roles/ceph-defaults/templates/restart_osd_daemon.sh.j2
@@ -1,6 +1,5 @@
 #!/bin/bash
 
-RETRIES="{{ handler_health_osd_check_retries }}"
 DELAY="{{ handler_health_osd_check_delay }}"
 CEPH_CLI="--name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/{{ cluster }}.keyring --cluster {{ cluster }}"
 
@@ -78,6 +77,7 @@ for unit in $(systemctl list-units | grep -E "loaded * active" | grep -oE "ceph-
   {% endif %}
   SOCKET=/var/run/ceph/{{ cluster }}-osd.${osd_id}.asok
   while [ $COUNT -ne 0 ]; do
+    RETRIES="{{ handler_health_osd_check_retries }}"
     $docker_exec test -S "$SOCKET" && check_pgs && continue 2
     sleep $DELAY
     let COUNT=COUNT-1