Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce Exometer #1165

Merged
merged 5 commits into from
Jun 24, 2015
Merged

Introduce Exometer #1165

merged 5 commits into from
Jun 24, 2015

Conversation

kuenishi
Copy link
Contributor

  • Change result of /riak-cs/stats to flat JSON object style
  • Add `riak-cs-admin status' command
  • Move from folsom to Exometer
  • Change names of several metrics to Riak way

@Basho-JIRA
Copy link

Added riak-cs-admin status and changed some result format of /riak-cs/stats .

_[posted via JIRA by Kota Uenishi]_

@kuenishi kuenishi added this to the 2.1.0 milestone Jun 16, 2015
@kuenishi
Copy link
Contributor Author

TODOs

  • include 'status' in riak-cs-debug
  • do some micro bench
  • check some commits between 1.0 and 1.2
  • read some implementation

* Change result of /riak-cs/stats to flat JSON object style
* Add `riak-cs-admin status' command
* Move from folsom to Exometer
* Change names of several metrics to Riak way
* Add result of 'riak-cs-admin status' in riak-cs-debug pack
@kuenishi
Copy link
Contributor Author

(riak-cs@127.0.0.1)20> Rep = fun(List) -> timer:tc(fun() -> [riak_cs_stats:update([block, get, retry], 10000)||_<-List] end)end.
#Fun<erl_eval.6.80484245>
(riak-cs@127.0.0.1)21> M = fun() -> io:format("~p~n", [element(1, Rep(lists:seq(1,1000000)))]) end.                             
#Fun<erl_eval.20.80484245>
(riak-cs@127.0.0.1)17> [spawn(M)||_<-lists:seq(1,10)].                                                                          
[<0.718.0>,<0.719.0>,<0.720.0>,<0.721.0>,<0.722.0>,
 <0.723.0>,<0.724.0>,<0.725.0>,<0.726.0>,<0.727.0>]
12275031               
12447881               
12566408               
12791156               
15421717               
15301850               
15380236               
15430211               
15720915               
16101178               
(riak-cs@127.0.0.1)18> [spawn(M)||_<-lists:seq(1,3)]. 
[<0.729.0>,<0.730.0>,<0.731.0>]
7994062                
8005765                
8313832  

A single process, requires ~6 seconds to process a million riak_cs_stats:update/2. The penalty is about twice more when running in 10 concurrent processes.

@kuenishi
Copy link
Contributor Author

Another round of microbenchmark:

(riak-cs@127.0.0.1)5> Self = self().
<0.578.0>
(riak-cs@127.0.0.1)6>  TestFun = fun(Concurrency, Rep) -> {Lat,_} = timer:tc(fun() ->[receive done -> ok end || _Pid <- [spawn(fun()                      
(riak-cs@127.0.0.1)6>  ->[riak_cs_stats:update([block, get, retry], 10000)||_<- lists:seq(1, Rep)], Self!done end)||_<- lists:seq(1, Concurrency)]] end), 
(riak-cs@127.0.0.1)6> Concurrency * Rep * 1000000 / Lat end.
#Fun<erl_eval.12.80484245>
(riak-cs@127.0.0.1)7> TestFun(1,1).
4098.360655737705
(riak-cs@127.0.0.1)8> TestFun(10,10).
42087.54208754209
(riak-cs@127.0.0.1)9> TestFun(10,100).
180603.21473722233
(riak-cs@127.0.0.1)10> TestFun(10,1000).
316896.9451134491
(riak-cs@127.0.0.1)11> TestFun(1,1000000). 
147176.8463740114
(riak-cs@127.0.0.1)12> TestFun(2,1000000).
273493.4272692092
(riak-cs@127.0.0.1)13> TestFun(4,1000000).
488571.27955718833
(riak-cs@127.0.0.1)14> TestFun(8,1000000).
512455.58531427843
(riak-cs@127.0.0.1)15> TestFun(16,1000000).
562100.4344193208
(riak-cs@127.0.0.1)16> TestFun(32,1000000).
541355.426770107

On my 8 core i7 Desktop box, it looks like riak_cs_stats:update/2 can handle ~500k calls / second. Isn't it enough for production?

@Basho-JIRA
Copy link

I believe this is ready for review.

_[posted via JIRA by Kota Uenishi]_

@@ -357,7 +357,6 @@ maybe_backpressure_sleep(Siblings, _BackpressureThreshold) ->
Delta = MeanSleepMS div 2,
SleepMS = crypto:rand_uniform(MeanSleepMS - Delta, MeanSleepMS + Delta),
lager:debug("maybe_backpressure_sleep: Siblings=~p, SleepMS=~p~n", [Siblings, SleepMS]),
ok = riak_cs_stats:update(manifest_siblings_bp_sleep, SleepMS * 1000),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is kludge, bue we have no means whether backpressure sleep happened or not except this stats.
If new stats system has counter, it is sufficient just to count up here.
This PR's aim is to introduce exometer, so It can be deferred to a part of #961.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will deprecate this. And remove in future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In another pull request.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@shino
Copy link
Contributor

shino commented Jun 19, 2015

Now riak_cs_stats has no (meaningful) state and does not handle messages. Turn it to plain module instead of gen_server callback?

@kuenishi
Copy link
Contributor Author

Updated ^^;

@kuenishi
Copy link
Contributor Author

Updated again, found a typo bug ^^;

@shino
Copy link
Contributor

shino commented Jun 24, 2015

make test-int does not work. It tests put fsm with live riak nodes, which is covered in some of riak_test modules. It seems that just removing it does no harm.

@Basho-JIRA
Copy link

Will add a commit removing that.

_[posted via JIRA by Kota Uenishi]_

@@ -33,6 +34,11 @@
%%% Public API
%%%===================================================================

status([]) ->
Stats = lists:sort(riak_cs_stats:get_stats()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need sort here? This sorting orders item as 100 -> 95 -> 99:

block_get_retry_time_100 : 0
block_get_retry_time_95 : 0
block_get_retry_time_99 : 0
block_get_retry_time_mean : 0
block_get_retry_time_median : 0

On the other hand, JSON output is, for me, more human friendly:

    "block_get_retry_time_mean": 0,
    "block_get_retry_time_median": 0,
    "block_get_retry_time_95": 0,
    "block_get_retry_time_99": 0,
    "block_get_retry_time_100": 0,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ffmm, maybe sort could be done by external command like sort(1). Will update.

@shino
Copy link
Contributor

shino commented Jun 24, 2015

All riak_test passed. Code diff looks nice ➕

@shino
Copy link
Contributor

shino commented Jun 24, 2015

Just a memo for self, license for added repos.

| exometer_core     | MPL v2.0 |
| setup             | MPL v2.0 |
| parse_trans       | EPL      |
| edown, NOTICE     | EPL      |
| edown, some files | APL v2   |

borshop added a commit that referenced this pull request Jun 24, 2015
Introduce Exometer

Reviewed-by: shino
@kuenishi
Copy link
Contributor Author

@borshop merge

@borshop borshop merged commit a1bb8ae into develop Jun 24, 2015
@Basho-JIRA
Copy link

See RCS-110 (#654) for release notes.

_[posted via JIRA by Kota Uenishi]_

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants