Race condition in some metrics (counter and gauge are fine) #20

etrepum · 2012-05-02T01:01:29Z

These metrics are ok:

folsom_metrics_counter (update_counter is atomic)
folsom_metrics_gauge (insert is atomic)

These metrics can drop data due to race conditions (interleaved get_value / insert):

folsom_metrics_histogram
folsom_metrics_meter
folsom_metrics_meter_reader

And this one can grow and shrink incorrectly (interleaved ets:info(…, size) / delete / insert):

folsom_metrics_history

There are some similar problems with folsom_sample_exdec, folsom_sample_none, folsom_sample_uniform but those problems would go away if folsom_metrics_histogram usage was serialized by key.

The most straightforward way out of this predicament is to set up a process per metric (of these types) for update serialization. The metrics are basically determined at compile time and their count should be smallish (hundreds maybe), so a fixed size pool isn't necessary. Most reads can still go directly to the table since it's updated atomically, but some of the histogram implementations may need some kind of get_values serialization.

folsom_sample_exdec: can delete a table before you get its contents, maybe ets:delete_all_objects/1 would be preferable to cycling the table. This would change worst case to returning somewhere between 0 and Size samples instead of crashing. Inserting a list instead of calling insert in a loop would change it such that 0 or Size samples could be returned, since inserting a list is atomic.
folsom_sample_none: could be in a state where there are Size - 1 items in the pool (this is likely acceptable)
folsom_sample_uniform: this looks like it would always be in a consistent state, so definitely acceptable.

joewilliams · 2012-05-02T01:06:38Z

Yup, I am aware of these issues and have started work to resolve them. Here is one option for exdec samples:

https://github.com/boundary/folsom/tree/exdec_gen_server

I haven't merged this yet as I am not convinced this is the best way to go yet.

etrepum · 2012-05-02T01:34:53Z

If you're open to dependencies, a straightforward solution would be to use gproc. This would make it super easy to register worker processes with the name of the metric that they own and do the pub/sub.

Not sure that making exdec a process is the best solution, although it looks like it might solve the problem in that particular place.

etrepum · 2012-05-02T02:07:18Z

Also, which of the notify interfaces in folsom_metrics / folsom_ets are actually used? folsom_metrics:notify/1 is the only one that's documented.

joewilliams · 2012-05-02T02:07:22Z

Right, I was looking at that last week when I was writing the exdec_gen_server branch. Seemed like a decent way to go.

joewilliams · 2012-05-02T02:11:59Z

folsom_metrics is the API module I prefer folks use.

etrepum · 2012-05-02T02:35:01Z

What I mean is that there are notify/1, notify/2, notify/3, and notify_existing_metric/3.

notify/1 and notify/2 are essentially the same thing, and notify_existing_metric/3 has similar semantics. The comments lead me to believe that notify/2 is actually preferred, but the documentation only shows notify/1. notify_existing_metric/3 seems reasonable to have, but it's not clear if it's actually worth the trouble. notify/2 could surely be made faster by essentially passing down the result of get_info(Name) to notify/4 and not calling handler_exists at all.

notify/3 would be a good thing to get rid of if you could, or at least clean up all the implementation bloat it causes. I could see more use cases for it if metrics were named by terms and not atoms.

etrepum · 2012-05-02T17:37:58Z

So here's what I was thinking last night, I started sketching it out last night but didn't finish it.

Have a function that maps types to a way to create metrics from them. Sometimes the state will be meaningless, sometimes it will be a pid. If it was a behaviour I'd have it look like this:

-module(folsom_metric_constructor).
%% The returned module here is a folsom_metric_handler
-callback new(Name :: atom(), Opts :: [{atom(), term()}]) -> {Handler :: module(), State :: term()}.

Define a behaviour to encapsulate what a metric must do:

-module(folsom_metric_handler).

-callback delete(Name :: atom(), State :: term()) -> ok.
-callback get_value(Name :: atom(), State :: term()) -> term().
-callback get_value(Name :: atom(), Count :: integer(), State :: term()) -> term().
-callback notify(Name :: atom(), Value :: term(), State :: term()) -> term().

folsom_ets should change its #metric{} to include the state and the handler module returned by the constructor. An example module might look like this:

-module(folsom_metrics_counter).
-behaviour(folsom_metric_handler).
-export([new/2,
         notify/3,
         delete/2,
         get_value/2]).

-include("folsom.hrl").

new(Name, []) ->
    {?MODULE, ets:insert(?COUNTER_TABLE, {Name, 0})}.

notify(Name, Value, _State) when is_integer(Value) ->
    inc(Name, Value);
notify(Name, Value, _State) ->
    case Value of
        inc ->
            inc(Name, 1);
        {inc, Value} ->
            inc(Name, Value);
        dec ->
            inc(Name, -1);
        {dec, Value} ->
            inc(Name, -Value)
    end.

delete(Name, _State) ->
    ets:delete(?COUNTER_TABLE, Name).

get_value(Name, _State) ->
    [{_, Value}] = ets:lookup(?COUNTER_TABLE, Name),
    Value.

get_value(Name, _Count, _State) ->
    get_value(Name, _State).

inc(Name, Value) ->
    ets:update_counter(?COUNTER_TABLE, Name, Value).

There would be a wrapper module that constructs a metric in its own gen_server process, this would itself be a metric that has State as its pid. Metrics that it wraps would have an init/1 to return an initial {ok, State}, and notify casts would return the new state for the gen_server (will often be faster than using ets since we are already serialized).

-module(folsom_metric_proc).
-behaviour(gen_server).

%% API
-export([start_link/0, new/2]).

-export([new/2,
         notify/3,
         delete/2,
         get_value/2,
         get_value/3]).

%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

-record(state, {name :: atom(),
                handler :: module(),
                state :: term()}).

new(Name, Opts) ->
    {ok, Pid} = gen_server:start(?MODULE, [{name, Name} | Opts]),
    {?MODULE, Pid}.

notify(Name, Value, Pid) ->
    gen_server:cast(Pid, {notify, Name, Value}).

delete(Name, Pid) ->
    gen_server:call(Pid, {delete, Name}).

get_value(Name, Pid) ->
    gen_server:call(Pid, {get_value, Name}).

get_value(Name, Count, Pid) ->
    gen_server:call(Pid, {get_value, Name, Count}).

init(Args) ->
    {_, Name} = lists:keyfind(name, 1, Args),
    {_, Handler} = lists:keyfind(handler, 1, Args),
    {_, Opts} = lists:keyfind(opts, 1, Args),
    {ok, S} = Handler:init(Opts),
    {ok, #state{name=Name, handler=Handler, state=S}}.

handle_call({delete, Name}, _From
            State=#state{handler=Handler, state=S}) ->
    Reply = Handler:delete(Name, S),
    {stop, normal, Reply, S#state{state=undefined}};
handle_call({get_history_values, Name, Count}, _From,
            State=#state{handler=Handler, state=S}) ->
    Reply = Handler:get_history_values(Name, Count, S),
    {reply, Reply, State};
handle_call({get_value, Name}, _From,
            State=#state{handler=Handler, state=S}) ->
    Reply = Handler:get_value(Name, S),
    {reply, Reply, State}.

handle_cast({notify, Name, Event},
            State=#state{handler=Handler, state=S}) ->
    S1 = Handler:notify(Name, Event, S),
    {noreply, State#state{state=S1}}.

handle_info(_Info, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.

An example that uses this might look something like this:

-module(folsom_metrics_histogram).
-behaviour(folsom_metric_handler).
-export([init/2,
         new/2,
         notify/3,
         delete/2,
         get_value/2,
         get_value/3]).

-include("folsom.hrl").

new(Name, Opts) ->
    folsom_metric_proc:new(Name, [{handler, ?MODULE}, {opts, Opts}]).

init(Opts) ->
    {Type, Size, Alpha} = lists:foldl(
        fun opt_fold/2,
        {?DEFAULT_SAMPLE_TYPE, ?DEFAULT_SIZE, ?DEFAULT_ALPHA},
        Opts),
    Sample = folsom_sample:new(Type, Size, Alpha),
    {ok, #histogram{type=Type, sample=Sample}}.

notify(_Name, Value, Hist=#histogram{type=Type, sample=Sample}) ->
    Hist#histogram{sample=folsom_sample:update(Type, Sample, Value)}.

delete(_Name, _State) ->
    ok.

% pulls the sample out of the record gotten from ets
get_value(_Name, #histogram{type=Type, sample=Sample}) ->
    folsom_sample:get_values(Type, Sample).

get_value(Name, _Count, Hist) ->
    get_value(Name, Hist).

%% Internal API

opt_fold({type, Type}, {_Type, Size, Alpha}) ->
    {Type, Size, Alpha};
opt_fold({size, Size}, {Type, _Size, Alpha}) ->
    {Type, Size, Alpha};
opt_fold({alpha, Alpha}, {Type, Size, _Alpha}) ->
    {Type, Size, Alpha}.

joewilliams · 2012-05-04T23:39:49Z

This seems like a pretty reasonable solution. I assume we also want a metric supervisor to keep an eye on each new metric gen_server that we start up (supervisor:start_child/2).

etrepum · 2012-05-05T00:16:41Z

Yeah, a simple_one_for_one or something like that

joewilliams · 2015-10-21T16:36:55Z

Test message, ignore me.

joewilliams · 2015-10-28T14:05:46Z

Folsom has moved, please resubmit your issue at https://github.com/folsom-project Thanks!

ghost assigned joewilliams Aug 10, 2012

joewilliams removed their assignment Oct 21, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Race condition in some metrics (counter and gauge are fine) #20

Race condition in some metrics (counter and gauge are fine) #20

etrepum commented May 2, 2012

joewilliams commented May 2, 2012

etrepum commented May 2, 2012

etrepum commented May 2, 2012

joewilliams commented May 2, 2012

joewilliams commented May 2, 2012

etrepum commented May 2, 2012

etrepum commented May 2, 2012

joewilliams commented May 4, 2012

etrepum commented May 5, 2012

joewilliams commented Oct 21, 2015

joewilliams commented Oct 28, 2015

Race condition in some metrics (counter and gauge are fine) #20

Race condition in some metrics (counter and gauge are fine) #20

Comments

etrepum commented May 2, 2012

joewilliams commented May 2, 2012

etrepum commented May 2, 2012

etrepum commented May 2, 2012

joewilliams commented May 2, 2012

joewilliams commented May 2, 2012

etrepum commented May 2, 2012

etrepum commented May 2, 2012

joewilliams commented May 4, 2012

etrepum commented May 5, 2012

joewilliams commented Oct 21, 2015

joewilliams commented Oct 28, 2015