Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

error_logger_lager_h discards trapped EXIT's and hangs the system #198

Closed
yfyf opened this Issue · 2 comments

2 participants

@yfyf

Hi,

This is a rather bizarre bug report.

When the high watermark is exceeded, discard_messages/2 will discard all
messages:
https://github.com/basho/lager/blob/b6bfaca5f3629b657ae944b6b5483e7bd6842947/src/error_logger_lager_h.erl#L134

This means it will also discard trapped EXITs and other gen_event internal
messaging, which leads to undefined behaviour. In practice, I've only observed
this in tests hanging (when a test suite tries to clean up by stopping all
applications), but you can reproduce this in general too, though not very
easily, here's how:

  1. Start an Erlang shell with ERL_FLAGS="-smp disabled" (disabling SMP seems to make it easier to reproduce, at least for me)
  2. (Optional) apply this patch to see more informative output: yfyf@b8914e2
  3. Run the following:

    error_logger:tty(false),
    application:load(lager),
    application:set_env(lager, error_logger_hwm, 2),
    ok = lager:start(),
    ok = application:stop(lager),
    {ok, Tref} = timer:apply_interval(1, error_logger, error_msg, ["Foo"]),
    Fun = fun(N) ->
        erlang:display({round, N}),
        ok = application:start(lager),
        ok = application:stop(lager)
    end,
    %% eliminate the possibility that the handler is getting messages faster
    %% than it can discard them:
    spawn(fun () -> timer:apply_after(1000, timer, cancel, [Tref]) end),
    
    lists:foreach(Fun, lists:seq(1, 1000)).
  4. You should now observe something like:

    {adding_handler,error_logger}
    {round,63}
    12:29:48.752 [error] Foo
    12:29:48.752 [error] Foo
    {discard,{notify,{error,<0.8.0>,{<0.1780.0>,"Foo",[]}}}}
    {discard,{notify,{error,<0.8.0>,{<0.1781.0>,"Foo",[]}}}}
    {discard,{notify,{error,<0.8.0>,{<0.1782.0>,"Foo",[]}}}}
    {discard,{'EXIT',<0.1774.0>,shutdown}} %% OUCH!
    {adding_handler,error_logger}
    {round,64}
    {discard,{notify,{error,<0.8.0>,{<0.1805.0>,"Foo",[]}}}}
    {discard,{notify,{error,<0.8.0>,{<0.1807.0>,"Foo",[]}}}}
    {adding_handler,error_logger}
    {adding_handler,error_logger_lager_h} %% Duplicate error_logger_lager_h entries, ouch!
    {round,65}
    {discard,{notify,{error,<0.8.0>,{<0.1829.0>,"Foo",[]}}}}
    {adding_handler,error_logger}
    {discard,{notify,{error,<0.8.0>,{<0.1831.0>,"Foo",[]}}}}
    {discard,{<0.1810.0>,{<0.1810.0>,#Ref<0.0.0.4795>},{add_handler,error_logger,[]}}}
    <SYSTEM HANGS>

A possible solution is to make sure you are only discarding gen_event events,
i.e. in the receive to pattern match only on {notify, Event} and {From, Tag,
{sync_notify, Event}}

(https://github.com/erlang/otp/blob/maint/lib/stdlib/src/gen_event.erl#L265)

This then relies on the internals of gen_event, but I don't think there's any
better alternative. I'm happy to provide a patch if you agree with this
solution.

@yfyf

Ah, I forgot to add the main point: setting {error_logger_hwm, undefined} makes the problems go away.

@Vagabond
Collaborator

Yes, make a PR, that sounds like a reasonable fix to this problem.

@yfyf yfyf referenced this issue from a commit in yfyf/lager
@yfyf yfyf Only discard gen_event notifications on high watermark
Previously all messages were being flushed, which meant that trapped exits
and internal gen_event messaging was also flushed, leading to undefined
behaviour.

closes #198
5862805
@yfyf yfyf referenced this issue from a commit in yfyf/lager
@yfyf yfyf Only discard gen_event notifications on high watermark
Previously all messages were being flushed, which meant that trapped exits
and internal gen_event messaging was also flushed, leading to undefined
behaviour.

closes #198
b3544a7
@yfyf yfyf referenced this issue from a commit in yfyf/lager
@yfyf yfyf Only discard gen_event notifications on high watermark
Previously all messages were being flushed, which meant that trapped exits
and internal gen_event messaging was also flushed, leading to undefined
behaviour.

closes #198
9f844c9
@yfyf yfyf referenced this issue from a commit in yfyf/lager
@yfyf yfyf Only discard gen_event notifications on high watermark
Previously all messages were being flushed, which meant that trapped exits
and internal gen_event messaging was also flushed, leading to undefined
behaviour.

closes #198
b07d6b5
@Vagabond Vagabond closed this in #201
@yfyf yfyf referenced this issue in spilgames/lager
Merged

Upstream patch #4

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.