Switch to notify, not sync_notify. #113

Closed
ewanmellor wants to merge 1 commit

2 participants

@ewanmellor

Despite the commit message on d19d927, I can't see how sync_notify
could possibly be faster (at least since OTP-8623), and I'm seeing performance issues when logging.

Now that we are logging asynchronously, we also need to add flushes in
critical places in the tests.
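The flush works because gen_event handles events in order: a synchronous no-op event cannot complete until every earlier async event has already been processed by the handlers. Sketched with the `wait_for_pending` atom this patch's tests use (the handler just has to accept and ignore it):

```erlang
lager:log(error, self(), "Test message"),
%% Barrier: sync_notify returns only after all handlers have seen every
%% event queued before it, including the async log message above.
gen_event:sync_notify(lager_event, wait_for_pending),
{ok, Bin} = file:read_file("test.log").
```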

@ewanmellor ewanmellor Switch to notify, not sync_notify.
Despite the commit message on d19d927, I can't see how sync_notify
could possibly be faster, and I'm seeing performance issues when logging.

Now that we are logging asynchronously, we also need to add flushes in
critical places in the tests.
7cf0442
@Vagabond
Collaborator

So this is similar to #97, unfortunately the tradeoffs aren't as clear cut as they seem. If we use notify instead of sync_notify, we obviously get a performance increase, but in the case when you get a lot of messages getting sent to the gen_event, things slow to a crawl. With sync_notify, each process can put at most one message in the gen_event's mailbox, instead of as many as it wants.
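For readers following along, the two calls differ like this (standard OTP gen_event semantics):

```erlang
%% notify/2 is a cast: it returns immediately after the event lands in
%% the gen_event's mailbox, so a caller can enqueue unboundedly.
gen_event:notify(lager_event, {log, Msg}),

%% sync_notify/2 blocks until every installed handler has processed the
%% event, so each caller has at most one event in flight at a time.
gen_event:sync_notify(lager_event, {log, Msg}).
```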

I did some benchmarking work quite a while back where I just fired 10,000 log messages from a single process into various logging libraries and saw how long it took for all of them to be logged. Here's one of the results:

[benchmark chart: time to log 10,000 messages into various logging libraries]

'el' is regular error_logger, and 'sync_el' is error_logger hacked to use sync_notify rather than notify. As you can see, when the mailbox fills up, bad things start to happen.

Also, besides trying to print giant terms, another good way to crash an erlang node is to create an enormous mailbox that triggers an OOM condition. sync_notify protects against this.

I'm more willing to entertain something like #97, but with less complexity, for example being able to toggle the behaviour purely at compile time, via a parse transform option or something. I'm not willing to change the default behaviour, especially not in such an irreversible way.

That said, I'm open to other suggestions on this front. I just haven't seen anything that is unequivocally better than sticking with sync_notify.

@ewanmellor

How far back is "quite a while back"? I expect that the version of Erlang will make a big difference here (especially before OTP-8623 vs after).

The problem with the current behavior is that it just pushes the problem onto the rest of the app. Sure, you don't have queuing in lager's mailbox, but now the rest of the app is slowed down instead. This isn't the right tradeoff for me at all. Logging should have minimal impact on the performance of the rest of the system, but instead we're effectively blocking at every log statement (in my case, blocking for syslog too!)

@Vagabond
Collaborator

OTP-8623, the change to reduce mailbox scans if the receive statement contains a newly created ref, doesn't apply here. If you actually read the gen_event source, you see that the receive statement for pulling messages out of the mailbox is selective and does not use the OTP-8623 trick to reduce the number of messages to be scanned:

https://github.com/erlang/otp/blob/maint/lib/stdlib/src/gen_event.erl#L231

This means that the entire mailbox is scanned for messages matching {system, ...} or {'EXIT', ...} before handling any regular messages. When the mailbox is large, this becomes incredibly expensive (and means the mailbox grows even faster).
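For contrast, the OTP-8623 optimization applies when a freshly created ref is matched in the receive: the VM can then skip every message that arrived before the ref existed. A minimal sketch of the pattern the optimization recognizes:

```erlang
%% OTP-8623: because Ref is created immediately before the selective
%% receive on it, the VM knows no earlier mailbox message can match,
%% and skips scanning them.
Ref = make_ref(),
Pid ! {request, self(), Ref},
receive
    {Ref, Reply} -> Reply
end.
```

gen_event's own fetch loop doesn't follow this pattern, which is why it pays the full scan cost on a large mailbox.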

So, assuming you're logging at a rate faster than you can actually write those logs to disk/syslog/whatever, your choice is either to slow down the rest of the app to compensate, or to let the mailbox balloon, have logging slow to a crawl anyway, and possibly OOM the node.

Now, for the case where you aren't logging faster than you can handle, async messages are obviously better. The problem is that a system can go from one state to the other without warning, and if you're in async mode when that happens, you're in trouble.

However, as I said before, I'm not against a solution that lets you change the mode somehow. One idea would be to use mailbox size as a guideline for which mode to use: stay in async mode until a threshold is hit, at which point, for lack of anything better to do, we switch to synchronous logging. A simpler solution would be either a compile-time or runtime switch that made the decision.
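A hypothetical sketch of the threshold idea, checking the gen_event's mailbox depth before each dispatch (function and threshold names are illustrative, not lager's actual API):

```erlang
%% Stay async while the sink's mailbox is small; fall back to sync
%% (backpressure) once it crosses the threshold.
dispatch(Pid, Event, Threshold) ->
    case erlang:process_info(Pid, message_queue_len) of
        {message_queue_len, Len} when Len < Threshold ->
            gen_event:notify(Pid, Event);
        _ ->
            gen_event:sync_notify(Pid, Event)
    end.
```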

I'm open to suggestions and pull-requests as long as they don't change the current choice for safety over performance as the default.

@Vagabond
Collaborator

And here are some benchmarking results from Vagabond/logbench I just ran, comparing your branch vs master using the console output, with a patch to logbench that waits for the gen_event's mailbox to be empty:

./bin/bench lager_console simple 10000

Master: 8696 ops/sec
Branch: 4179 ops/sec

I ran the test 3 times for each version and took the median value.

@ewanmellor

Rereading your comment above, you said "I'm more willing to entertain something like #97, but with less complexity, for example being able to toggle the behaviour purely at compile time, via a parse transform option or something." Looking at the patches on there, it looks like they used to use an application env, but the latest patch introduces a compile-time macro to either force it at build time or use the env. Is that what you were referring to?

Having both the app env and the macro would allow people to choose at runtime or at compile time which behavior they wanted. That seems like everyone would be happy then!

I like the idea of making it dynamic based on mailbox size. It would be perfectly reasonable to have that behavior by default on the async path. I don't know the cost of calling erlang:process_info/2 though, and whether we would be able to do that on every message.
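A sketch of the "both app env and macro" combination (the `LAGER_FORCE_ASYNC` macro and `async_logging` env key are hypothetical names, not lager's actual API):

```erlang
%% Compile-time override wins; otherwise consult the app env at runtime.
-ifdef(LAGER_FORCE_ASYNC).
send(Pid, Event) -> gen_event:notify(Pid, Event).
-else.
send(Pid, Event) ->
    case application:get_env(lager, async_logging) of
        {ok, true} -> gen_event:notify(Pid, Event);
        _          -> gen_event:sync_notify(Pid, Event)
    end.
-endif.
```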

@ewanmellor

Maybe what we need is a buffer process, whose only job is to take messages and queue them. That way, the mailbox at each end would be small, but you'd still get the behavior where messages are buffered in RAM instead of blocking the process. (I thought that the selective receive work was intended so that you didn't have to do this, but it's obviously not happening in this case.)

This would mean that instead of a single sync_notify call, it would be sync_notify -> add to queue, followed by pop from queue -> sync_notify. Extra straight-line cost, but lower latency on the app side.

If the "pop from queue" sent a whole batch at once, it might even work out cheaper overall, depending on what the backend is doing. You could avoid a per-message disk or network flush, for example.
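A hypothetical sketch of that buffer process, draining its own mailbox into a batch before forwarding (the `{log_batch, ...}` protocol is illustrative; lager's backends don't accept batched events as-is):

```erlang
%% Block for the first log message, then opportunistically drain any
%% others already queued, and forward the whole batch in one call.
buffer_loop(Sink) ->
    receive
        {log, _} = First ->
            Batch = drain([First]),
            gen_event:sync_notify(Sink, {log_batch, lists:reverse(Batch)}),
            buffer_loop(Sink)
    end.

drain(Acc) ->
    receive
        {log, _} = Msg -> drain([Msg | Acc])
    after 0 ->
        Acc
    end.
```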

@Vagabond
Collaborator

How does #117 look to you? It adaptively switches between async and sync logging, based on the gen_event mailbox size.

@Vagabond
Collaborator

Closing due to inactivity and in favor of #117. If you want fully async messages, you can just set the threshold really, really high.

@Vagabond Vagabond closed this
Commits on Mar 2, 2013
  1. @ewanmellor

    Switch to notify, not sync_notify.

    ewanmellor authored
    Despite the commit message on d19d927, I can't see how sync_notify
    could possibly be faster, and I'm seeing performance issues when logging.
    
    Now that we are logging asynchronously, we also need to add flushes in
    critical places in the tests.
Showing with 17 additions and 2 deletions.
  1. +1 −1  src/lager.erl
  2. +16 −1 src/lager_file_backend.erl
2  src/lager.erl
@@ -78,7 +78,7 @@ dispatch_log(Severity, Metadata, Format, Args, Size) when is_atom(Severity)->
_ ->
Format
end,
- gen_event:sync_notify(Pid, {log, lager_msg:new(Msg, Timestamp,
+ gen_event:notify(Pid, {log, lager_msg:new(Msg, Timestamp,
Severity, Metadata, Destinations)});
false ->
ok
17 src/lager_file_backend.erl
@@ -279,6 +279,7 @@ filesystem_test_() ->
fun() ->
gen_event:add_handler(lager_event, lager_file_backend, {"test.log", info}),
lager:log(error, self(), "Test message"),
+ gen_event:sync_notify(lager_event, wait_for_pending),
{ok, Bin} = file:read_file("test.log"),
Pid = pid_to_list(self()),
?assertMatch([_, _, "[error]", Pid, "Test message\n"], re:split(Bin, " ", [{return, list}, {parts, 5}]))
@@ -299,12 +300,14 @@ filesystem_test_() ->
gen_event:add_handler(lager_event, lager_file_backend, {"test.log", info}),
?assertEqual(0, lager_test_backend:count()),
lager:log(error, self(), "Test message"),
+ gen_event:sync_notify(lager_event, wait_for_pending),
?assertEqual(1, lager_test_backend:count()),
file:delete("test.log"),
file:write_file("test.log", ""),
{ok, FInfo} = file:read_file_info("test.log"),
file:write_file_info("test.log", FInfo#file_info{mode = 0}),
lager:log(error, self(), "Test message"),
+ gen_event:sync_notify(lager_event, wait_for_pending),
?assertEqual(3, lager_test_backend:count()),
lager_test_backend:pop(),
lager_test_backend:pop(),
@@ -323,6 +326,7 @@ filesystem_test_() ->
?assertEqual("Failed to open log file test.log with error permission denied", lists:flatten(Message)),
file:write_file_info("test.log", FInfo#file_info{mode = OldPerms}),
lager:log(error, self(), "Test message"),
+ gen_event:sync_notify(lager_event, wait_for_pending),
{ok, Bin} = file:read_file("test.log"),
Pid = pid_to_list(self()),
?assertMatch([_, _, "[error]", Pid, "Test message\n"], re:split(Bin, " ", [{return, list}, {parts, 5}]))
@@ -333,15 +337,18 @@ filesystem_test_() ->
gen_event:add_handler(lager_event, lager_file_backend, {"test.log", info}),
?assertEqual(0, lager_test_backend:count()),
lager:log(error, self(), "Test message1"),
+ gen_event:sync_notify(lager_event, wait_for_pending),
?assertEqual(1, lager_test_backend:count()),
file:delete("test.log"),
file:write_file("test.log", ""),
lager:log(error, self(), "Test message2"),
+ gen_event:sync_notify(lager_event, wait_for_pending),
{ok, Bin} = file:read_file("test.log"),
Pid = pid_to_list(self()),
?assertMatch([_, _, "[error]", Pid, "Test message2\n"], re:split(Bin, " ", [{return, list}, {parts, 5}])),
file:rename("test.log", "test.log.0"),
lager:log(error, self(), "Test message3"),
+ gen_event:sync_notify(lager_event, wait_for_pending),
{ok, Bin2} = file:read_file("test.log"),
?assertMatch([_, _, "[error]", Pid, "Test message3\n"], re:split(Bin2, " ", [{return, list}, {parts, 5}]))
end
@@ -352,12 +359,14 @@ filesystem_test_() ->
?assertEqual(0, lager_test_backend:count()),
lager:log(info, self(), "Test message1"),
lager:log(error, self(), "Test message2"),
+ gen_event:sync_notify(lager_event, wait_for_pending),
{ok, Bin} = file:read_file("test.log"),
Lines = length(re:split(Bin, "\n", [{return, list}, trim])),
?assertEqual(Lines, 2),
?assertEqual(ok, lager:set_loglevel(lager_file_backend, "test.log", warning)),
lager:log(info, self(), "Test message3"), %% this won't get logged
lager:log(error, self(), "Test message4"),
+ gen_event:sync_notify(lager_event, wait_for_pending),
{ok, Bin2} = file:read_file("test.log"),
Lines2 = length(re:split(Bin2, "\n", [{return, list}, trim])),
?assertEqual(Lines2, 3)
@@ -375,13 +384,14 @@ filesystem_test_() ->
gen_event:add_handler(lager_event, lager_file_backend,
{"test.log", critical}),
lager:error("Test message"),
+ gen_event:sync_notify(lager_event, wait_for_pending),
?assertEqual({ok, <<>>}, file:read_file("test.log")),
{Level, _} = lager_config:get(loglevel),
lager_config:set(loglevel, {Level, [{[{module,
?MODULE}], ?DEBUG,
{lager_file_backend, "test.log"}}]}),
lager:error("Test message"),
- timer:sleep(1000),
+ gen_event:sync_notify(lager_event, wait_for_pending),
{ok, Bin} = file:read_file("test.log"),
?assertMatch([_, _, "[error]", _, "Test message\n"], re:split(Bin, " ", [{return, list}, {parts, 5}]))
end
@@ -391,6 +401,7 @@ filesystem_test_() ->
gen_event:add_handler(lager_event, lager_file_backend,
{"test.log", critical}),
lager:critical("Test message"),
+ gen_event:sync_notify(lager_event, wait_for_pending),
{ok, Bin1} = file:read_file("test.log"),
?assertMatch([_, _, "[critical]", _, "Test message\n"], re:split(Bin1, " ", [{return, list}, {parts, 5}])),
ok = file:delete("test.log"),
@@ -399,10 +410,12 @@ filesystem_test_() ->
?MODULE}], ?DEBUG,
{lager_file_backend, "test.log"}}]}),
lager:critical("Test message"),
+ gen_event:sync_notify(lager_event, wait_for_pending),
{ok, Bin2} = file:read_file("test.log"),
?assertMatch([_, _, "[critical]", _, "Test message\n"], re:split(Bin2, " ", [{return, list}, {parts, 5}])),
ok = file:delete("test.log"),
lager:error("Test message"),
+ gen_event:sync_notify(lager_event, wait_for_pending),
{ok, Bin3} = file:read_file("test.log"),
?assertMatch([_, _, "[error]", _, "Test message\n"], re:split(Bin3, " ", [{return, list}, {parts, 5}]))
end
@@ -412,6 +425,7 @@ filesystem_test_() ->
file:delete("foo.log"),
{ok, _} = lager:trace_file("foo.log", [{module, ?MODULE}]),
lager:error("Test message"),
+ gen_event:sync_notify(lager_event, wait_for_pending),
%% not elegible for trace
lager:log(error, self(), "Test message"),
{ok, Bin3} = file:read_file("foo.log"),
@@ -443,6 +457,7 @@ formatting_test_() ->
gen_event:add_handler(lager_event, lager_file_backend,[{"test.log", debug},{lager_default_formatter,["[",severity,"] ", message, "\n"]}]),
gen_event:add_handler(lager_event, lager_file_backend,[{"test2.log", debug},{lager_default_formatter,["2> [",severity,"] ", message, "\n"]}]),
lager:log(error, self(), "Test message"),
+ gen_event:sync_notify(lager_event, wait_for_pending),
?assertMatch({ok, <<"[error] Test message\n">>},file:read_file("test.log")),
?assertMatch({ok, <<"2> [error] Test message\n">>},file:read_file("test2.log"))
end