This repository has been archived by the owner. It is now read-only.
Rearrange and lengthen the watchdog delay
I did not fully appreciate that code upgrades are not atomic across all
code. This watchdog ended up rebooting a node into an unusable state
because it killed couch_db_update_notifier handlers before the new code
had been installed for every app.

This led to mem3 rapidly cycling while trying to use
couch_db_update_notifier, which eventually took down the mem3 app and,
with it, the node. The node would then reboot into 1202, but the
databases had already upgraded their headers, which prevented the node
from booting correctly.

By extending the timeout to five minutes and moving it before the first
attempt to terminate couch_db_update handlers, I hope to give the release
enough time to complete before telling each handler to upgrade.
davisp authored and rnewson committed Jul 30, 2014
1 parent de23171 commit 707997e37db11aa8194b00c0a432e49c7071b1f2
Showing 1 changed file with 1 addition and 1 deletion.
@@ -123,6 +123,7 @@ code_change(_OldVsn, St, _Extra) ->
 
 
 watchdog() ->
+    timer:sleep(300000),
     Handlers = gen_event:which_handlers(couch_db_update),
     case length(Handlers) > 0 of
         true ->
@@ -133,7 +134,6 @@ watchdog() ->
         false ->
             ok
     end,
-    timer:sleep(5000),
     ?MODULE:watchdog().
 
 