-
Notifications
You must be signed in to change notification settings - Fork 232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add more dimensions to fsm stats #414
Conversation
To verify this works, do some gets/puts then attach to the riak console and run something like:
Or
Until I have the query / read stuff written. Also, if you want to see per-bucket stats, you need to set the Attach to a node and
Then after putting/getting that bucket you should see stats like:
Where one element in the tuple is the bucket in question. Again, attached to the node:
|
@@ -66,7 +66,11 @@ | |||
bucket_props, | |||
startnow :: {non_neg_integer(), non_neg_integer(), non_neg_integer()}, | |||
get_usecs :: non_neg_integer(), | |||
tracked_bucket=false :: boolean() %% is per bucket stats enabled for this bucket | |||
tracked_bucket=false :: boolean(), %% is per bucket stats enabled for this bucket | |||
timing = [] :: [{atom(), {non_neg_integer(), non_neg_integer(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really [{atom(), timestamp()}]
(erlang:timestamp/0)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, yes it is , for shame.
Works as advertised +1 |
calc_timing(Timing, Now, ReplyNow, []). | ||
|
||
%% Each timing stage has a start time. | ||
calc_timing([], StageEnd, ReplyNow, Stages) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that this has been cut-and-pasted from another module, but it seems like it's possible that this might some day be called with ReplyNow
as undefined
, which will blow up in the call to timer:now_diff/2
. Is there a reason that will never happen? Should it be documented in case other things change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The case seems to be (and you're right, I didn't fully understand the function when I hoisted it for re-use) that {repl, TS} is always the ultimate or penultimate state when you calculate timings. Meaning that line 49 then gets the reply timestamp from the list, as undefined
is matched as the 3rd argument to the function.
Options are:
- refactor so the function does what it seems to be supposed to do (calculate times for all states up to reply time or
- Document the assumption that
reply
is the final or pen-ultimate` state in the list of timings
I think 1 is the right thing to do, but will require more review from you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, found more problems here. The calc_timing
fun returns two timing entries for reply
on the put_fsm, which means two updates to that stat. I really need to refactor this function so it does what I think it is meant to do. (Time each stage, and the total time from init to reply)
Should there be a utility function for deleting tracked-bucket stats? That way we can turn them on for a while, then turn them off, and not have to worry about unused memory consumption? |
Yes there should. Good point. Same utility should provide a way to turn on / off bucket tracking too. |
I've opened an issue (#434) for this "utility to delete tracked-bucket stats", it will depend on code in basho/riak_core#252. It may or may not get into the code freeze. Is that acceptable, or does a +1 for this PR depend on it? |
riak_kv_fsm_timing seems a better name for the former `util` module since all it does is timing right now. Refactored the `cal_timing` function to handle duplicate states and made explicit the need for a `reply` tuple in the timings list.
Added commit that I hope addresses your comments. |
Issue #434 works for me. I see no need to hold up this PR for it. |
There is no need to do a ordict find, update, and store orddict:update_counter is there for that, this change use it.
Squashed that fun. |
It might be a good idea to loosen the function head match on For instance, using curl: # put something to update stats
$ curl -X PUT -H "content-type: text/plain" http://localhost:8098/riak/a/1 --data "okay"
# see stats increment
$ curl http://localhost:8098/stats | jsonpp | grep node_puts_total
"vnode_puts_total": 3,
"node_puts_total": 1,
# set the 'stat_tracked' bucket property to something not true or false
$ curl -X PUT -H "content-type: application/json" http://localhost:8098/riak/a --data "{\"props\":{\"stat_tracked\":\"not a boolean\"}}"
# put something to update stats again
$ curl -X PUT -H "content-type: text/plain" http://localhost:8098/riak/a/2 --data "bummer"
# see the stats clear
$ curl http://localhost:8098/stats | jsonpp | grep node_puts_total
"vnode_puts_total": 0,
"node_puts_total": 0, Either rewriting the |
Good catch, maybe here is the right place to handle this (https://github.com/basho/riak_kv/blob/master/src/riak_kv_bucket.erl)? |
Aha! I totally forgot that there is bucket-property validation now. Good plan. |
One bit about per-bucket stats that's bugging me is the creation of an atom from the bucket name. For tracking a constant handful of buckets, this isn't bad, but for tracking lots of buckets over time, it may lead to atom table exhaustion (think about the case of enabling it as a default bucket prop). It seems like it would be safer (if slightly less efficient during update/query) to use the binary bucket name instead. Am I worrying too much? How hard would such a change be? |
Pushed change to validate that the I spoke with Jon about the risks of allowing people to track buckets in this way. We're going to warn against tracking too many. In fact, it should be emphasised as a temporary / debugging measure. I agree with your concerns about the atom table and am investigating making the change to using atoms. |
I'm going to deal with the atoms for bucket names in a different branch. This PR is the first of a few that build on it, and I think #428 is the place to make that change. Can I get a view on this PR as it stands, worries about atoms aside? |
I've made the changes to gh412-tree-stats to use binaries for bucket names in stat keys and it all works fine. I'll pay the price when it comes to offering a REST query interface, and I can't just use |
Indeed, +1 for this branch. Thanks! |
Exposed the put_fsm per stage timings
Added per stage timings to the get_fsm
sorry about the trailing whitespace clean-ups. Maybe I should disable that in my .emacs.