Do not crash couch_log application when gen_* servers send extra args #1662
Conversation
Might also be a good idea to change the supervision tree to something like 5/10 or 10/10 for the restart intensity to avoid restarting the node. |
@@ -45,7 +45,7 @@ gen_server_error_test() -> | |||
{ | |||
Pid, | |||
"** Generic server and some stuff", | |||
[a_gen_server, {foo, bar}, server_state, some_reason] | |||
[a_gen_server, {foo, bar}, server_state, some_reason, sad, args] |
davisp
Oct 17, 2018
Member
You should add a new test case so that we're testing with and without the extra args.
You should add a new test case so that we're testing with and without the extra args.
@@ -71,7 +72,7 @@ gen_fsm_error_test() -> | |||
{ | |||
Pid, | |||
"** State machine did a thing", | |||
[a_gen_fsm, {ohai,there}, state_name, curr_state, barf] | |||
[a_gen_fsm, {ohai,there}, state_name, curr_state, barf, sad, args] |
davisp
Oct 17, 2018
Member
Samesies as the other test.
Samesies as the other test.
@@ -58,18 +58,18 @@ format(Level, Pid, Msg) -> | |||
|
|||
format({error, _GL, {Pid, "** Generic server " ++ _, Args}}) -> |
davisp
Oct 17, 2018
Member
I also think it'd be a good idea to have this function be do_format, and then we can wrap it in something ilke:
format(Arg) ->
try
do_format(Arg)
catch T:R ->
log_that_it_bew_up(T, R)
end.
I also think it'd be a good idea to have this function be do_format, and then we can wrap it in something ilke:
format(Arg) ->
try
do_format(Arg)
catch T:R ->
log_that_it_bew_up(T, R)
end.
gen_server, gen_fsm and gen_statem might send extra args when terminating. This is a recent behavior and not handling these extra args could lead to couch_log application crashing and taking down the whole VM with it. There are two improvements to fix the issue: 1) Handle the extra args. Format them and log as they might have useful information included. 2) Wrap the whole `format` function in a `try ... catch` statement. This will avoid any other cases where the logger itself if crashing when attepting to format error events.
Previously it was too easy to crash the whole node when any of couch_log's children restarted. To improve resiliency, let couch_log application restart a few more times before taking down the whole node with it.
Nice fix! |
+1 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
gen_server
,gen_fsm
andgen_statem
might send extra args when terminating. Thisis a recent behavior, and not handling these extra args could lead to couch_log
application crashing and taking down the whole VM with it.