Skip to content

Race condition in some metrics (counter and gauge are fine) #20

Open
etrepum opened this Issue May 2, 2012 · 11 comments

2 participants

@etrepum
etrepum commented May 2, 2012

These metrics are ok:

  • folsom_metrics_counter (update_counter is atomic)
  • folsom_metrics_gauge (insert is atomic)

These metrics can drop data due to race conditions (interleaved get_value / insert):

  • folsom_metrics_histogram
  • folsom_metrics_meter
  • folsom_metrics_meter_reader

And this one can grow and shrink incorrectly (interleaved ets:info(…, size) / delete / insert):

  • folsom_metrics_history

There are some similar problems with folsom_sample_exdec, folsom_sample_none, folsom_sample_uniform but those problems would go away if folsom_metrics_histogram usage was serialized by key.

The most straightforward way out of this predicament is to set up a process per metric (of these types) for update serialization. The metrics are basically determined at compile time and their count should be smallish (hundreds maybe), so a fixed size pool isn't necessary. Most reads can still go directly to the table since it's updated atomically, but some of the histogram implementations may need some kind of get_values serialization.

  • folsom_sample_exdec: can delete a table before you get its contents, maybe ets:delete_all_objects/1 would be preferable to cycling the table. This would change worst case to returning somewhere between 0 and Size samples instead of crashing. Inserting a list instead of calling insert in a loop would change it such that 0 or Size samples could be returned, since inserting a list is atomic.
  • folsom_sample_none: could be in a state where there are Size - 1 items in the pool (this is likely acceptable)
  • folsom_sample_uniform: this looks like it would always be in a consistent state, so definitely acceptable.
@joewilliams

Yup, I am aware of these issues and have started work to resolve them. Here is one option for exdec samples:

https://github.com/boundary/folsom/tree/exdec_gen_server

I haven't merged this yet as I am not convinced this is the best way to go yet.

@etrepum
etrepum commented May 2, 2012

If you're open to dependencies, a straightforward solution would be to use gproc. This would make it super easy to register worker processes with the name of the metric that they own and do the pub/sub.

Not sure that making exdec a process is the best solution, although it looks like it might solve the problem in that particular place.

@etrepum
etrepum commented May 2, 2012

Also, which of the notify interfaces in folsom_metrics / folsom_ets are actually used? folsom_metrics:notify/1 is the only one that's documented.

@joewilliams

Right, I was looking at that last week when I was writing the exdec_gen_server branch. Seemed like a decent way to go.

@joewilliams

folsom_metrics is the API module I prefer folks use.

@etrepum
etrepum commented May 2, 2012

What I mean is that there are notify/1, notify/2, notify/3, and notify_existing_metric/3.

notify/1 and notify/2 are essentially the same thing, and notify_existing_metric/3 has similar semantics. The comments lead me to believe that notify/2 is actually preferred, but the documentation only shows notify/1. notify_existing_metric/3 seems reasonable to have, but it's not clear if it's actually worth the trouble. notify/2 could surely be made faster by essentially passing down the result of get_info(Name) to notify/4 and not calling handler_exists at all.

notify/3 would be a good thing to get rid of if you could, or at least clean up all the implementation bloat it causes. I could see more use cases for it if metrics were named by terms and not atoms.

@etrepum
etrepum commented May 2, 2012

So here's what I was thinking last night, I started sketching it out last night but didn't finish it.

Have a function that maps types to a way to create metrics from them. Sometimes the state will be meaningless, sometimes it will be a pid. If it was a behaviour I'd have it look like this:

-module(folsom_metric_constructor).
%% The returned module here is a folsom_metric_handler
-callback new(Name :: atom(), Opts :: [{atom(), term()}]) -> {Handler :: module(), State :: term()}.

Define a behaviour to encapsulate what a metric must do:

-module(folsom_metric_handler).

-callback delete(Name :: atom(), State :: term()) -> ok.
-callback get_value(Name :: atom(), State :: term()) -> term().
-callback get_value(Name :: atom(), Count :: integer(), State :: term()) -> term().
-callback notify(Name :: atom(), Value :: term(), State :: term()) -> term().

folsom_ets should change its #metric{} to include the state and the handler module returned by the constructor. An example module might look like this:

-module(folsom_metrics_counter).
-behaviour(folsom_metric_handler).
-export([new/2,
         notify/3,
         delete/2,
         get_value/2]).

-include("folsom.hrl").

new(Name, []) ->
    {?MODULE, ets:insert(?COUNTER_TABLE, {Name, 0})}.

notify(Name, Value, _State) when is_integer(Value) ->
    inc(Name, Value);
notify(Name, Value, _State) ->
    case Value of
        inc ->
            inc(Name, 1);
        {inc, Value} ->
            inc(Name, Value);
        dec ->
            inc(Name, -1);
        {dec, Value} ->
            inc(Name, -Value)
    end.

delete(Name, _State) ->
    ets:delete(?COUNTER_TABLE, Name).

get_value(Name, _State) ->
    [{_, Value}] = ets:lookup(?COUNTER_TABLE, Name),
    Value.

get_value(Name, _Count, _State) ->
    get_value(Name, _State).

inc(Name, Value) ->
    ets:update_counter(?COUNTER_TABLE, Name, Value).

There would be a wrapper module that constructs a metric in its own gen_server process, this would itself be a metric that has State as its pid. Metrics that it wraps would have an init/1 to return an initial {ok, State}, and notify casts would return the new state for the gen_server (will often be faster than using ets since we are already serialized).

-module(folsom_metric_proc).
-behaviour(gen_server).

%% API
-export([start_link/0, new/2]).

-export([new/2,
         notify/3,
         delete/2,
         get_value/2,
         get_value/3]).

%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

-record(state, {name :: atom(),
                handler :: module(),
                state :: term()}).

new(Name, Opts) ->
    {ok, Pid} = gen_server:start(?MODULE, [{name, Name} | Opts]),
    {?MODULE, Pid}.

notify(Name, Value, Pid) ->
    gen_server:cast(Pid, {notify, Name, Value}).

delete(Name, Pid) ->
    gen_server:call(Pid, {delete, Name}).

get_value(Name, Pid) ->
    gen_server:call(Pid, {get_value, Name}).

get_value(Name, Count, Pid) ->
    gen_server:call(Pid, {get_value, Name, Count}).

init(Args) ->
    {_, Name} = lists:keyfind(name, 1, Args),
    {_, Handler} = lists:keyfind(handler, 1, Args),
    {_, Opts} = lists:keyfind(opts, 1, Args),
    {ok, S} = Handler:init(Opts),
    {ok, #state{name=Name, handler=Handler, state=S}}.

handle_call({delete, Name}, _From
            State=#state{handler=Handler, state=S}) ->
    Reply = Handler:delete(Name, S),
    {stop, normal, Reply, S#state{state=undefined}};
handle_call({get_history_values, Name, Count}, _From,
            State=#state{handler=Handler, state=S}) ->
    Reply = Handler:get_history_values(Name, Count, S),
    {reply, Reply, State};
handle_call({get_value, Name}, _From,
            State=#state{handler=Handler, state=S}) ->
    Reply = Handler:get_value(Name, S),
    {reply, Reply, State}.

handle_cast({notify, Name, Event},
            State=#state{handler=Handler, state=S}) ->
    S1 = Handler:notify(Name, Event, S),
    {noreply, State#state{state=S1}}.

handle_info(_Info, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.

An example that uses this might look something like this:

-module(folsom_metrics_histogram).
-behaviour(folsom_metric_handler).
-export([init/2,
         new/2,
         notify/3,
         delete/2,
         get_value/2,
         get_value/3]).

-include("folsom.hrl").

new(Name, Opts) ->
    folsom_metric_proc:new(Name, [{handler, ?MODULE}, {opts, Opts}]).

init(Opts) ->
    {Type, Size, Alpha} = lists:foldl(
        fun opt_fold/2,
        {?DEFAULT_SAMPLE_TYPE, ?DEFAULT_SIZE, ?DEFAULT_ALPHA},
        Opts),
    Sample = folsom_sample:new(Type, Size, Alpha),
    {ok, #histogram{type=Type, sample=Sample}}.

notify(_Name, Value, Hist=#histogram{type=Type, sample=Sample}) ->
    Hist#histogram{sample=folsom_sample:update(Type, Sample, Value)}.

delete(_Name, _State) ->
    ok.

% pulls the sample out of the record gotten from ets
get_value(_Name, #histogram{type=Type, sample=Sample}) ->
    folsom_sample:get_values(Type, Sample).

get_value(Name, _Count, Hist) ->
    get_value(Name, Hist).

%% Internal API

opt_fold({type, Type}, {_Type, Size, Alpha}) ->
    {Type, Size, Alpha};
opt_fold({size, Size}, {Type, _Size, Alpha}) ->
    {Type, Size, Alpha};
opt_fold({alpha, Alpha}, {Type, Size, _Alpha}) ->
    {Type, Size, Alpha}.
@joewilliams

This seems like a pretty reasonable solution. I assume we also want a metric supervisor to keep an eye on each new metric gen_server that we start up (supervisor:start_child/2).

@etrepum
etrepum commented May 5, 2012

Yeah, a simple_one_for_one or something like that

@joewilliams joewilliams was assigned Aug 10, 2012
@joewilliams

Test message, ignore me.

@joewilliams joewilliams removed their assignment Oct 21, 2015
@joewilliams

Folsom has moved, please resubmit your issue at https://github.com/folsom-project Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.