
Add logging and pretty printing of log file to background manager. #501

Open
wants to merge 4 commits into from

4 participants

@buddhisthead

The background manager already contains functions to view the current set of locks and tokens, but lacks any historical view of how subsystems have interacted. It would be highly useful to examine past get_lock/get_token patterns in the event of performance problems in the participating subsystems that use the background manager, especially on nodes under extreme overload or on crashed nodes.

This PR adds logging to disk, which persists after a node crash. All writes are done using Erlang's disk_log module and are asynchronous, so that normal operations are not blocked by slow disk writes. Losing a little data from the disk_log cache is preferable to blocking a caller of the background manager.

Three logging levels are provided, with on being the default value of the config variable background_manager_loglevel:

  • off - no logging is done
  • on - logs only the get_lock, get_token, enable/disable, and bypass calls as events.
  • debug - logs all operations

Logfile entries are in disk_log's "internal" format under riak_core_bg_manager/events.xxx in the data directory of each node. Writing is fast and the log files are repairable when read. The logs are wrap logs and thus occupy a fixed maximum space on disk (and memory for the disk_log write buffer). The total size of the logs is controlled by the background_manager_log_size application configuration variable and defaults to 5 files of 1 MB each, so the maximum disk space used is 5 MB, which should give days of logged events for postmortem analysis.
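For reference, both knobs live in the riak_core section of the node's application configuration. This excerpt is illustrative; only the two background_manager_* keys come from this PR, the surrounding structure is standard Erlang app.config:

```erlang
%% Illustrative app.config excerpt. Only the background_manager_* keys
%% are introduced by this PR; the rest is ordinary Erlang application
%% configuration boilerplate.
[
 {riak_core, [
    %% off | on | debug ('on' is the default)
    {background_manager_loglevel, on},
    %% {MaxBytesPerFile, MaxFiles} for the disk_log wrap log;
    %% this {1 MB, 5 files} pair mirrors the defaults described above.
    {background_manager_log_size, {1048576, 5}}
 ]}
].
```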

Testing was done using both the bg_manager_eqc.erl EQC test as a driver (find log files in /tmp/riak_core_bg_manager/events.xxx) and riak_test -t rtee replication2 (find log files in devN/data/riak_core_bg_manager/events.xxx). After running the replication2 test, we see entries like the following:

6/Jan/2014:18:29:0 get_lock {vnode_lock, 685078892498860742907977265335757665463718379520} [{task, handoff}] ok
6/Jan/2014:18:29:0 get_lock {vnode_lock, 685078892498860742907977265335757665463718379520} [{task,  repl_fullsync}] max_concurrency
6/Jan/2014:18:29:0 get_lock {vnode_lock, 68507889249886074290797726533575766546371837952} [{task,  repl_fullsync}] ok
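Entries like the above can be reproduced from an attached node shell with the pretty printer added in this PR. The second call sketches a hypothetical use of iter_log/1 (the filter fun shown is illustrative, not part of the PR):

```erlang
%% Dump the whole log in human-readable form (entry point from this PR):
riak_core_bg_manager:print_log().

%% Or iterate the raw {Timestamp, Op, Args, Status} terms yourself --
%% this callback, printing only denied acquisitions, is illustrative.
riak_core_bg_manager:iter_log(
    fun({_TS, Op, Args, max_concurrency}) ->
            io:format("denied ~p ~p~n", [Op, Args]);
       (_Entry) ->
            ok
    end).
```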
buddhisthead added some commits
@buddhisthead buddhisthead Add logging with rotation and pretty printer.
* Fix race condition in EQC test that arose from queries happening outside of the gen_server
* Implement disk_log logging of events based on log level
* log level = on | off | debug
* normal (on) logs get_token and get_lock events
* status is included in log message so we can see when processes get denied resources
* included a term to ascii pretty printer that can be executed from the shell
* it's possible (and probable!) to generate message sequence charts from this data
* Use extended date log file format in pretty printing
d6a682e
@buddhisthead buddhisthead Change default logging level to on, not debug 2ea062d
src/riak_core_bg_manager.erl
((50 lines not shown))
+maybe_log(get_resource=Op, Args, Status, State=#state{loglevel=on}) ->
+ log(Op, Args, Status, State);
+maybe_log(bypass=Op, Args, Status, State=#state{loglevel=on}) ->
+ log(Op, Args, Status, State);
+maybe_log(enable=Op, Args, Status, State=#state{loglevel=on}) ->
+ log(Op, Args, Status, State);
+maybe_log(_Op, _Args, _Status, _State) ->
+ ok.
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%% Called from outside of riak
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+%% @private
+%% @doc Format the timestamp into human readable date string.
+format_utc_timestamp(TS) ->
@Vagabond
Vagabond added a note

You may want to consider using lager_util:format_time and friends so that you don't have a completely different timestamp format to the rest of riak (and the Erlang ecosystem in general).

Addressed by aefe367. Good call. Besides the benefit of using consistent formats, it also adds support for both local and UTC time zone pretty printing based on the app env sasl/utc_log

@Vagabond Vagabond commented on the diff
src/riak_core_bg_manager.erl
((72 lines not shown))
+
+%% @private
+%% @doc
+%% Print a single term from the background manager's log file.
+print_entry({TS, get_resource, {Resource, Type, Meta}, Status}) ->
+ io:format("~s get_~p ~p ~p ~p~n", [format_utc_timestamp(TS), Type, Resource, Meta, Status]);
+print_entry({TS, Op, {Resource, Value}, Status}) ->
+ io:format("~s ~p ~p ~p ~p~n", [format_utc_timestamp(TS), Op, Resource, Value, Status]);
+print_entry({TS, Op, Args, Status}) ->
+ io:format("~s ~p ~p ~p~n", [format_utc_timestamp(TS), Op, Args, Status]);
+print_entry(Term) ->
+ io:format("Unexpected: ~p~n", [Term]).
+
+%% @doc
+%% Pretty print background manager's logfile.
+print_log() ->
@Vagabond
Vagabond added a note

What is the intended use of this, and will it io:format all 5mb worth of logs, or just the latest one?

As of now, it's the only way to dump the logs in a human readable way. I'd like to add more in the future, but I don't think we can jam it into 2.0 at this point. Yes, it will dump all of the logs. I could easily add an entry point that will only print N entries, but I'm trying to keep this dirt simple. Since the iterator walks over chunks, it won't io:format anything bigger than a single entry from the list at a time.

I guess it's not so easy to add the print(N) thing. It would be easy to print the first N, but not the last N without doing a massive list creation and reversal in memory. Boo. While it's possible to index to the end of the log minus some amount, that seems complicated for a PR that's last minute. We can always print_log() | tail -N. It's only 5 MB.

@Vagabond
Vagabond added a note

I'm still unsure about the wisdom of pushing 5mb through erlang's IO subsystem.

Fair. I'm running a test now to generate the full 5 MB of logs and then I'll pretty print it. Since it's an iterator, it should do fine, but we'll see.

Well, it was hard generating that much data, so I turned down the file size to 100K and it prints the log with no effort at all. That gives 500K worth of log data.

So, hopefully 1a759c6 addresses your concerns.

@Vagabond

I guess I'm sort of conceptually struggling with this feature. If we want Riak to provide historical information, shouldn't it be more structured than just a sequential event log? For example, I would think it would be more interesting to inspect via vnode ID, as well as show the current status of all of the vnodes.

If each vnode had a bounded buffer for its own events, rather than having all of them in a global one, it would also help with debugging stuck ones. If, for example, a vnode gets stuck, that goes unnoticed for a day or two, and the cluster is under a workload that uses the bgmgr, the chances are high that the logs for the stuck vnode will have been rotated out by all the successful messages. One way to do this would be one disk_log per vnode, or even just an in-memory dict with a sized list of events (e.g. 20 events per vnode).
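A rough sketch of that in-memory variant, assuming a dict keyed by vnode id; none of these function or macro names exist in the PR:

```erlang
%% Hypothetical bounded per-vnode event buffer, as suggested above.
%% Buffers is a dict() mapping VnodeId -> [Event], newest first,
%% trimmed to a fixed bound so a stuck vnode's history survives
%% even under a busy workload.
-define(MAX_VNODE_EVENTS, 20).

record_event(VnodeId, Event, Buffers) ->
    Old = case dict:find(VnodeId, Buffers) of
              {ok, Events} -> Events;
              error        -> []
          end,
    dict:store(VnodeId,
               lists:sublist([Event | Old], ?MAX_VNODE_EVENTS),
               Buffers).

events_for(VnodeId, Buffers) ->
    case dict:find(VnodeId, Buffers) of
        {ok, Events} -> Events;
        error        -> []
    end.
```

The trade-off, as noted below, is that an in-memory buffer vanishes on node crash, which is exactly the scenario the disk_log approach is meant to cover.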

This logging approach also doesn't seem to help much when looking for the current status of all the locks, without a bit of mental bookkeeping.

I think if we want to punt on something more complicated we should at least remove the print() call (or not add it to any console commands). The more I think about it, the worse it seems it could be on a busy cluster.

@buddhisthead

A separate log per vnode is a good idea and I should have done that straight away. But I don't like the in-memory approach because it makes the logging more complicated, doing calculations to trim the size. I had originally stored history in a queue which is really easy to write to, but again wanted to avoid in-memory solutions that vanish on node crash. I think that we could reduce the file size way down from 1 MB or even reduce the number of rotation files. That would also reduce the impact of the print call a lot. It's probably ok to just leave this accessible from the shell console and not make a riak-admin command for it, maybe not until after we have more experience with it.

The current status of all locks is available via existing api. See riak_core_bg_manager:all_locks() and friends.

@buddhisthead

Well, writing a separate log for each vnode is a huge pain and I don't think it's worth the code complexity. For one thing, we just happen to care about vnodes right now, but there is nothing vnode-centric about the background manager. If a particular vnode is stuck, we'll just have to grep the logs for that partition number. The rate of lock and token acquisition is not high for the vnode lock, and if there is a super busy lock, then at least it will survive in the log the longest. I think some operational experience during the testing phase will help determine the best log file size needed. I am still due to write a riak_test for the background manager, but we have the RC looming as well.
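Grepping the log for a single partition can also be done in the shell via iter_log/1. The predicate below is illustrative, though the entry shape matches what log/4 writes in this PR, and the partition id is just the one from the sample output above:

```erlang
%% Illustrative: print only get_lock/get_token events for one vnode.
%% Entries are {Timestamp, Op, Args, Status}; for get_resource the
%% Args are {Resource, Type, Meta} with Resource = {vnode_lock, P}.
Partition = 685078892498860742907977265335757665463718379520,
riak_core_bg_manager:iter_log(
    fun({_TS, get_resource, {{vnode_lock, P}, Type, Meta}, Status})
          when P =:= Partition ->
            io:format("get_~p ~p ~p~n", [Type, Meta, Status]);
       (_Entry) ->
            ok
    end).
```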

@jonmeredith
Owner

@Vagabond does anything need to happen with this?

@jrwest

Moved to milestone 2.0.1. This would have been a nice addition (in some form), but it's not going to make it, I don't think.

@jrwest jrwest modified the milestone: 2.0.1, 2.0
Commits on Jan 4, 2014
  1. @buddhisthead

    Add logging with rotation and pretty printer.

    buddhisthead authored
    * Fix race condition in EQC test that arose from queries happening outside of the gen_server
    * Implement disk_log logging of events based on log level
    * log level = on | off | debug
    * normal (on) logs get_token and get_lock events
    * status is included in log message so we can see when processes get denied resources
    * included a term to ascii pretty printer that can be executed from the shell
    * it's possible (and probable!) to generate message sequence charts from this data
    * Use extended date log file format in pretty printing
Commits on Jan 6, 2014
  1. @buddhisthead
Commits on Jan 7, 2014
  1. @buddhisthead
Commits on Jan 8, 2014
  1. @buddhisthead
Showing with 150 additions and 5 deletions.
  1. +144 −5 src/riak_core_bg_manager.erl
  2. +6 −0 test/bg_manager_eqc.erl
149 src/riak_core_bg_manager.erl
@@ -115,7 +115,9 @@
all_locks/0,
all_locks/1,
all_tokens/0,
- all_tokens/1
+ all_tokens/1,
+ print_log/0,
+ iter_log/1
]).
%% Convenience
@@ -131,6 +133,9 @@
-define(NOT_TRANSFERED(S), S#state.info_table == undefined orelse S#state.entry_table == undefined).
+-define(MAX_LOG_BYTES, 100 * 1024). %% maximum size of a single logfile
+-define(MAX_LOG_FILES, 5). %% maximum number of logfiles in the wrap rotation
+
%%%===================================================================
%%% API
%%%===================================================================
@@ -404,7 +409,9 @@ query_resource(Resource, Types) ->
{info_table:: ets:tid(), %% TableID of ?BG_INFO_ETS_TABLE
entry_table:: ets:tid(), %% TableID of ?BG_ENTRY_ETS_TABLE
enabled :: boolean(), %% Global enable/disable switch, true at startup
- bypassed:: boolean() %% Global kill switch. false at startup
+ bypassed:: boolean(), %% Global kill switch. false at startup
+ loglevel::atom(), %% Logging level = on | off | debug
+ log::disk_log:log()
}).
%%%===================================================================
@@ -423,10 +430,14 @@ init([]) ->
%% We have two to claim...
ok = riak_core_table_manager:claim_table(?BG_INFO_ETS_TABLE),
ok = riak_core_table_manager:claim_table(?BG_ENTRY_ETS_TABLE),
+ {ok, Log} = open_disk_log(read_write),
State = #state{info_table=undefined, %% resolved in the ETS-TRANSFER handler
entry_table=undefined, %% resolved in the ETS-TRANSFER handler
enabled=true,
- bypassed=false},
+ bypassed=false,
+ loglevel=app_helper:get_env(riak_core, background_manager_loglevel, on),
+ log=Log
+ },
{ok, State}.
%% @private
@@ -458,7 +469,7 @@ handle_call(disable, _From, State) ->
State2 = update_enabled(false, State),
{reply, status_of(true, State2), State2};
handle_call({get_lock, Lock, Pid, Meta}, _From, State) ->
- do_handle_call_exception(fun do_get_resource/5, [Lock, lock, Pid, Meta, State], State);
+ do_handle_call_exception(fun logged_do_get_resource/5, [Lock, lock, Pid, Meta, State], State);
handle_call({lock_count, Lock}, _From, State) ->
{reply, held_count(Lock, State), State};
handle_call({lock_limit_reached, Lock}, _From, State) ->
@@ -480,7 +491,7 @@ handle_call({token_info, Token}, _From, State) ->
handle_call({set_token_rate, Token, Rate}, _From, State) ->
do_handle_call_exception(fun do_set_token_rate/3, [Token, Rate, State], State);
handle_call({get_token, Token, Pid, Meta}, _From, State) ->
- do_handle_call_exception(fun do_get_resource/5, [Token, token, Pid, Meta, State], State).
+ do_handle_call_exception(fun logged_do_get_resource/5, [Token, token, Pid, Meta, State], State).
%% @private
%% @doc Handling cast messages
@@ -583,6 +594,7 @@ validate_hold(_Obj, _TableId) -> %% tokens don't monitor processes
update_bypassed(_Bypassed, State) when ?NOT_TRANSFERED(State) ->
State;
update_bypassed(Bypassed, State=#state{info_table=TableId}) ->
+ maybe_log(bypass, Bypassed, ok, State),
ets:insert(TableId, {bypassed, Bypassed}),
State#state{bypassed=Bypassed}.
@@ -590,6 +602,7 @@ update_bypassed(Bypassed, State=#state{info_table=TableId}) ->
update_enabled(_Enabled, State) when ?NOT_TRANSFERED(State) ->
State;
update_enabled(Enabled, State=#state{info_table=TableId}) ->
+ maybe_log(enable, Enabled, ok, State),
ets:insert(TableId, {enabled, Enabled}),
State#state{enabled=Enabled}.
@@ -749,6 +762,7 @@ do_enabled(Resource, State) ->
do_enable_resource(Resource, Enabled, State) ->
Info = resource_info(Resource, State),
+ maybe_log(enable, {Resource, Enabled}, ok, State),
State2 = update_resource_enabled(Resource, Enabled, Info, State),
{reply, status_of(Enabled, State2), State2}.
@@ -759,6 +773,7 @@ update_resource_enabled(Resource, Value, Default, State) ->
State).
update_limit(Resource, Limit, Default, State) ->
+ maybe_log(set_limit, {Resource, Limit}, ok, State),
update_resource_info(Resource,
fun(Info) -> Info#resource_info{limit=Limit} end,
Default#resource_info{limit=Limit},
@@ -852,6 +867,20 @@ try_get_resource(true, Resource, Type, Pid, Meta, State) ->
end.
%% @private
+%% @doc
+%% Wrap the call to do_get_resource/5 so we can log its operation
+%% and status, depending on the logging level in effect.
+logged_do_get_resource(Resource, Type, Pid, Meta, State) ->
+ {_R, Result, _S} = Reply = do_get_resource(Resource, Type, Pid, Meta, State),
+ Status = case Result of
+ ok -> ok;
+ {ok, _Ref} -> ok;
+ Other -> Other
+ end,
+ maybe_log(get_resource, {Resource, Type, Meta}, Status, State),
+ Reply.
+
+%% @private
%% @doc reply now if resource is available. Returns max_concurrency
%% if resource not available or globally or specifically disabled.
-spec do_get_resource(bg_resource(), bg_resource_type(), pid(), [{atom(), any()}], #state{}) ->
@@ -943,3 +972,113 @@ use_bg_mgr() ->
-spec use_bg_mgr(atom(), atom()) -> boolean().
use_bg_mgr(Dependency, Key) ->
use_bg_mgr() andalso app_helper:get_env(Dependency, Key, true).
+
+%%%%%%%%%%
+%% Logging
+%%%%%%%%%%
+
+%% %% @private
+%% %% @doc Create a logfile in the platform-dir/log/
+open_disk_log(RWorRO) ->
+ DataDir = app_helper:get_env(riak_core, platform_data_dir, "/tmp"),
+ Log = now(),
+ LogFile = lists:flatten(io_lib:format("~s/~s/events.log",
+ [DataDir, ?MODULE])),
+ ok = filelib:ensure_dir(LogFile),
+ case open_disk_log(Log, LogFile, RWorRO) of
+ {ok, Log} ->
+ {ok, Log};
+ {repaired, _L, Recovered, Bad} ->
+ lager:info("Repaired damaged logfile, ~p, ~p", [Recovered, Bad]),
+ {ok, Log};
+ Error ->
+ lager:error("Failed to open logfile: '~p' error: ~p", [LogFile, Error]),
+ {ok, undefined}
+ end.
+
+open_disk_log(Name, Path, RWorRO) ->
+ Size = app_helper:get_env(riak_core, background_manager_log_size, {?MAX_LOG_BYTES, ?MAX_LOG_FILES}),
+ open_disk_log(Name, Path, RWorRO, [{type, wrap}, {format, internal}, {size, Size}]).
+
+open_disk_log(Name, Path, RWorRO, OtherOpts) ->
+ disk_log:open([{name, Name}, {file, Path}, {mode, RWorRO}|OtherOpts]).
+
+%% @private
+%% @doc Write binary terms to disk_log file.
+%% Terms are always written asynchronously. If some get lost due to a crash,
+%% that's not bad, but we never want to block the operational path.
+log(_Op, _Args, _Status, #state{log=undefined}) ->
+ ok;
+log(Op, Args, Status, #state{log=WriteLog}) ->
+ Timestamp = erlang:now(),
+ disk_log:alog_terms(WriteLog, [{Timestamp, Op, Args, Status}]).
+
+%% @private
+%% @doc Log an operation if log level is "debug", none if we're "off",
+%% and only specific operations if we're "on".
+maybe_log(Op, Args, Status, State=#state{loglevel=debug}) ->
+ log(Op, Args, Status, State);
+maybe_log(get_resource=Op, Args, Status, State=#state{loglevel=on}) ->
+ log(Op, Args, Status, State);
+maybe_log(bypass=Op, Args, Status, State=#state{loglevel=on}) ->
+ log(Op, Args, Status, State);
+maybe_log(enable=Op, Args, Status, State=#state{loglevel=on}) ->
+ log(Op, Args, Status, State);
+maybe_log(_Op, _Args, _Status, _State) ->
+ ok.
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%% Called from outside of riak
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+%% @private
+%% @doc Format the timestamp into a human readable date string. Timezone depends
+%% on the value of the app env var 'sasl/utc_log' which is true or false. If true,
+%% then universal time is returned, otherwise local time prevails.
+%% @link https://github.com/basho/lager/blob/master/src/lager_stdlib.erl#L80
+format_utc_timestamp(TS) ->
+ DateTimeMsecs = lager_util:localtime_ms(TS),
+ {D,T} = lager_util:format_time(lager_util:maybe_utc(DateTimeMsecs)),
+ lists:flatten([D,$ ,T]).
+
+%% @private
+%% @doc
+%% Print a single term from the background manager's log file.
+print_entry({TS, get_resource, {Resource, Type, Meta}, Status}) ->
+ io:format("~s get_~p ~p ~p ~p~n", [format_utc_timestamp(TS), Type, Resource, Meta, Status]);
+print_entry({TS, Op, {Resource, Value}, Status}) ->
+ io:format("~s ~p ~p ~p ~p~n", [format_utc_timestamp(TS), Op, Resource, Value, Status]);
+print_entry({TS, Op, Args, Status}) ->
+ io:format("~s ~p ~p ~p~n", [format_utc_timestamp(TS), Op, Args, Status]);
+print_entry(Term) ->
+ io:format("Unexpected: ~p~n", [Term]).
+
+%% @doc
+%% Pretty print background manager's logfile.
+print_log() ->
+ iter_log(fun print_entry/1).
+
+%% @doc
+%% Iterate function Fun over the terms in the background manager's logfile.
+iter_log(Fun) ->
+ {ok, ReadLog} = open_disk_log(read_only),
+ iter_disk_log(Fun, ReadLog).
+
+%% @private
+%% @doc
+%% Iterate a function over the background manager's logfile terms.
+iter_disk_log(Fun, DiskLog) ->
+ iter_disk_log(disk_log:chunk(DiskLog, start), Fun, DiskLog).
+
+iter_disk_log(eof, _Fun, _DiskLog) ->
+ ok;
+iter_disk_log({Cont, Terms}, Fun, DiskLog) ->
+ try
+ [Fun(Term) || Term <- Terms],
+ ok
+ catch X:Y ->
+ io:format("~s:fold_disk_log: caught ~p ~p @ ~p\n",
+ [?MODULE, X, Y, erlang:get_stacktrace()]),
+ ok
+ end,
+ iter_disk_log(disk_log:chunk(DiskLog, Cont), Fun, DiskLog).
6 test/bg_manager_eqc.erl
@@ -783,6 +783,12 @@ wait_for_pid(Pid) ->
Mref = erlang:monitor(process, Pid),
receive
{'DOWN', Mref, process, _, _} ->
+ %% listing all resources is no longer in the gen_server, so
+ %% we need to let the DOWN handler have a chance to complete
+ %% and delete the entry from the table. It's now a race to
+ %% see who notices the dead pid. Maybe bg-mgr should verify
+ %% that a process is still alive when doing the query.
+ timer:sleep(10),
ok
after
5000 ->