
Add more dimensions to fsm stats #414

Merged
merged 4 commits into from
Nov 15, 2012
Conversation

russelldb
Member

Exposed the put_fsm per stage timings
Added per stage timings to the get_fsm

sorry about the trailing whitespace clean-ups. Maybe I should disable that in my .emacs.

@russelldb russelldb mentioned this pull request Oct 25, 2012
@russelldb
Member Author

To verify this works, do some gets/puts then attach to the riak console and run something like:

 [{Stat, folsom_metrics:get_metric_value(Stat)} || Stat <- folsom_metrics:get_metrics(),
                                                       is_tuple(Stat), 
                                                       element(1, Stat) == riak_kv, 
                                                       element(2, Stat) == node_get_fsm].

Or

[{Stat, folsom_metrics:get_metric_value(Stat)} || Stat <- folsom_metrics:get_metrics(),
                                                       is_tuple(Stat),
                                                       element(1, Stat) == riak_kv,
                                                       element(2, Stat) == node_put_fsm].

Until I have the query / read stuff written.

Also, if you want to see per-bucket stats, you need to set the stat_tracked bucket property to true for the bucket(s) you're interested in.

Attach to a node and

 riak_core_bucket:set_bucket(<<"a">>, [{stat_tracked, true}]).

Then after putting/getting that bucket you should see stats like:

 {riak_kv,node_put_fsm,time,a,postcommit}

Where one element in the tuple is the bucket in question.

Again, attached to the node:

folsom_metrics:get_histogram_statistics({riak_kv,node_put_fsm,time,a}).

@@ -66,7 +66,11 @@
          bucket_props,
          startnow :: {non_neg_integer(), non_neg_integer(), non_neg_integer()},
          get_usecs :: non_neg_integer(),
-         tracked_bucket=false :: boolean() %% is per bucket stats enabled for this bucket
+         tracked_bucket=false :: boolean(), %% is per bucket stats enabled for this bucket
+         timing = [] :: [{atom(), {non_neg_integer(), non_neg_integer(),


Is this really [{atom(), timestamp()}] (erlang:timestamp/0)?

Member Author


Yes, yes it is, for shame.

@coderoshi
Contributor

Works as advertised +1

calc_timing(Timing, Now, ReplyNow, []).

%% Each timing stage has a start time.
calc_timing([], StageEnd, ReplyNow, Stages) ->


I see that this has been cut-and-pasted from another module, but it seems possible that it might someday be called with ReplyNow as undefined, which will blow up in the call to timer:now_diff/2. Is there a reason that can never happen? Should it be documented in case other things change?

Member Author


The case seems to be (and you're right, I didn't fully understand the function when I hoisted it for re-use) that {reply, TS} is always the ultimate or penultimate entry when you calculate timings. This means line 49 then gets the reply timestamp from the list, since undefined is matched as the third argument to the function.

Options are:

  1. Refactor so the function does what it seems to be supposed to do (calculate times for all states up to reply time), or
  2. Document the assumption that reply is the final or penultimate state in the list of timings.

I think 1 is the right thing to do, but will require more review from you.

Member Author


Ok, found more problems here. The calc_timing fun returns two timing entries for reply on the put_fsm, which means two updates to that stat. I really need to refactor this function so it does what I think it is meant to do. (Time each stage, and the total time from init to reply)
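
For illustration, a minimal sketch of what the refactored function is meant to compute: per-stage durations (summing duplicates rather than reporting them twice) plus the total time from init to reply. The module and function names here are assumptions for the sketch, not the merged riak_kv code; it assumes timings arrive most-recent-first and that a `reply` entry is present, as discussed above.

```erlang
-module(fsm_timing_sketch).
-export([calc_timing/1]).

%% Timings :: [{StageName :: atom(), erlang:timestamp()}], newest first.
%% Returns {TotalMicros, [{StageName, StageMicros}]}.
calc_timing(Timings) ->
    %% A reply entry is required; this match documents (and enforces) it.
    {reply, ReplyNow} = lists:keyfind(reply, 1, Timings),
    %% Reverse into chronological order; the head is the init/start stage.
    [{_FirstStage, StartNow} | _] = Ordered = lists:reverse(Timings),
    Total = timer:now_diff(ReplyNow, StartNow),
    {Total, stage_durations(Ordered, orddict:new())}.

%% Each stage lasts until the next stage's timestamp; duplicate stage
%% names are accumulated with orddict:update_counter/3, so reply is
%% only ever reported once.
stage_durations([{Stage, Now}, {NextStage, Next} | Rest], Acc) ->
    Acc1 = orddict:update_counter(Stage, timer:now_diff(Next, Now), Acc),
    stage_durations([{NextStage, Next} | Rest], Acc1);
stage_durations([{_LastStage, _Now}], Acc) ->
    orddict:to_list(Acc).
```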

@beerriot

Should there be a utility function for deleting tracked-bucket stats? That way we can turn them on for a while, then turn them off, and not have to worry about unused memory consumption?

@russelldb
Member Author

Yes there should. Good point. Same utility should provide a way to turn on / off bucket tracking too.

@russelldb
Member Author

I've opened an issue (#434) for this "utility to delete tracked-bucket stats", it will depend on code in basho/riak_core#252. It may or may not get into the code freeze. Is that acceptable, or does a +1 for this PR depend on it?

riak_kv_fsm_timing seems a better name for the former `util` module
since all it does is timing right now.
Refactored the `calc_timing` function to handle duplicate states
and made explicit the need for a `reply` tuple in the timings list.
@russelldb
Member Author

Added commit that I hope addresses your comments.

@beerriot

Issue #434 works for me. I see no need to hold up this PR for it.

There is no need to do an orddict find, update, and store;
orddict:update_counter/3 exists for that, and this change uses it.
@russelldb
Member Author

Squashed that fun.

@beerriot

It might be a good idea to loosen the function head match on riak_kv_stat:do_put_bucket and :do_get_bucket. As it stands, where only true and false are allowed for the first argument, it's easy to lose all of your stats by setting a bucket property incorrectly.

For instance, using curl:

# put something to update stats
$ curl -X PUT -H "content-type: text/plain" http://localhost:8098/riak/a/1 --data "okay"

# see stats increment
$ curl http://localhost:8098/stats | jsonpp | grep node_puts_total
  "vnode_puts_total": 3,
  "node_puts_total": 1,

# set the 'stat_tracked' bucket property to something not true or false
$ curl -X PUT -H "content-type: application/json" http://localhost:8098/riak/a --data "{\"props\":{\"stat_tracked\":\"not a boolean\"}}"

# put something to update stats again
$ curl -X PUT -H "content-type: text/plain" http://localhost:8098/riak/a/2 --data "bummer"

# see the stats clear
$ curl http://localhost:8098/stats | jsonpp | grep node_puts_total
  "vnode_puts_total": 0,
  "node_puts_total": 0,

Either rewriting the false clause of those functions as a don't-care/catch-all, or using PerBucket==true instead of just PerBucket should fix this.
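
A sketch of the suggested loosening, with hypothetical names (not the actual riak_kv_stat clauses): only `true` enables tracking, and any other property value, including a mis-set non-boolean, falls through to a don't-care clause instead of crashing the stats process.

```erlang
-module(loose_head_sketch).
-export([do_put_bucket/2]).

%% Strict heads matching only true/false would crash on any other
%% term; the catch-all below makes a bad property value harmless.
do_put_bucket(true, Bucket) ->
    {tracked, Bucket};
do_put_bucket(_PerBucket, _Bucket) ->
    untracked.
```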

@russelldb
Member Author

Good catch, maybe here is the right place to handle this (https://github.com/basho/riak_kv/blob/master/src/riak_kv_bucket.erl)?

@beerriot

Aha! I totally forgot that there is bucket-property validation now. Good plan.

@beerriot

One bit about per-bucket stats that's bugging me is the creation of an atom from the bucket name. For tracking a constant handful of buckets, this isn't bad, but for tracking lots of buckets over time, it may lead to atom table exhaustion (think about the case of enabling it as a default bucket prop). It seems like it would be safer (if slightly less efficient during update/query) to use the binary bucket name instead. Am I worrying too much? How hard would such a change be?
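
To make the trade-off concrete, here is a hedged sketch (stat-key shape taken from the examples above; function names are illustrative) contrasting the two key-building strategies. The Erlang atom table is never garbage collected, so the first version can exhaust it when many distinct buckets are tracked; the second keeps the bucket name as a binary, which is garbage collected.

```erlang
-module(atom_vs_binary_key).
-export([atom_key/1, binary_key/1]).

%% Risky: mints a brand-new atom for every tracked bucket name.
atom_key(Bucket) when is_binary(Bucket) ->
    {riak_kv, node_put_fsm, time, binary_to_atom(Bucket, utf8)}.

%% Safer: the binary bucket name sits directly in the stat-key tuple.
binary_key(Bucket) when is_binary(Bucket) ->
    {riak_kv, node_put_fsm, time, Bucket}.
```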

@russelldb
Member Author

Pushed change to validate that the stat_tracked property is a boolean.

I spoke with Jon about the risks of allowing people to track buckets in this way. We're going to warn against tracking too many. In fact, it should be emphasised as a temporary / debugging measure.

I agree with your concerns about the atom table and am investigating making the change to using binaries.

@ghost ghost assigned beerriot Nov 14, 2012
@russelldb
Member Author

I'm going to deal with the atoms for bucket names in a different branch. This PR is the first of a few that build on it, and I think #428 is the place to make that change.

Can I get a view on this PR as it stands, worries about atoms aside?

@russelldb
Member Author

I've made the changes to gh412-tree-stats to use binaries for bucket names in stat keys and it all works fine. I'll pay the price when it comes to offering a REST query interface, since I can't just use list_to_existing_atom/1 for each path token. Discerning what should be a binary vs. what should be an atom will be…fun.

@beerriot

Indeed, +1 for this branch. Thanks!

@russelldb russelldb merged commit 28c937d into master Nov 15, 2012
@ghost ghost assigned russelldb Nov 16, 2012
@seancribbs seancribbs deleted the rdb-fsm-timing-stats branch April 1, 2015 23:28