Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add latency stats for riak pb client operations [JIRA: RCS-243] #1189

Merged
merged 6 commits into from
Jul 16, 2015

Conversation

shino
Copy link
Contributor

@shino shino commented Jul 15, 2015

Part 3 of stats improvement which addresses #961 (RCS-37) (RCS-217) (RCS-217) (RCS-37) (RCS-37) (RCS-217) partially.

This PR adds latency/counter stats for (almost) every riak pb client operations.

Excluded operations and reasons are as follows:

  • Ignored because not in hot path
  ./riak_cs_app.erl:115:    case catch riakc_pb_socket:get_bucket(MasterPbc, Bucket) of
  • Ignored because strongly deprecated
./riak_cs_list_objects_fsm.erl:308:    riakc_pb_socket:stream_list_keys(ManifestPbc,
./riak_cs_list_objects_fsm.erl:496:    riakc_pb_socket:mapred_stream(ManifestPbc,

Also this PR fixes a bug that timeout is not properly set from
application environment for multipart upload listing call. Please
refer the commit message of dc637e1
for details.

2i for listing MP uploads should use equal query but the function
riakc_pb_socket:get_range/5 was used which is for range query.  But,
so luckily, this bug did not manifests for user facing response.

The reason was two-fold,
1. only 2i key is important and value is always <<"1">> for MP 2i, and
2. Timeout value 60000 (integer) is larger than <<"1">> after
   converted to binary.

So this commit fixes a bug that has not became reality ever.
@shino shino added this to the 2.1.0 milestone Jul 15, 2015
{ok, _} = Success ->
ProceedFun(Success);

{error, Why} when Why == disconnected;
Why == timeout ->
_ = lager:debug("get_block_local/5 failed: {error, ~p}", [Why]),
_ = lager:debug("get_block_local() failed: {error, ~p}", [Why]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this changed to () ? Just /6 looks nice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The arity does not have information. I like log to be minimal but sufficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I want to use standard or de-fact notation to refer a function that does not specify (or emphasize) its arity. Sometimes, function_name/* is used in Erlang, but it seems like a group of functions with the name (right?).
Any recommendation is appreciated 😅

Parenthesis with empty args may be not familiar.. Hmm... I saw the notation in exometer_core README.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't make change about this.

@kuenishi
Copy link
Contributor

What is include/riak_cs_stats.hrl for? To include from multibag?

@@ -113,6 +114,36 @@ pbc_pool_name(undefined) ->
pbc_pool_name(BagId) when is_binary(BagId) ->
list_to_atom(lists:flatten(io_lib:format("pbc_pool_~s", [BagId]))).

%% @doc Make a thunk that looks up samples for a given bucket+prefix.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/prefix/postfix/ or s/prefix/suffix/ ?

@kuenishi
Copy link
Contributor

Code looks nice, a lot simpler than I thought. Are you gonna update this branch? Or I'll just +1 now and go to next round.

@kuenishi
Copy link
Contributor

riak_test and manual tests have passed.

@kuenishi
Copy link
Contributor

Won't need fix for this right now, but Lifecycle mock fails like this:

DEBUG: format_uri(): http://aaa.s3.amazonaws.com/?lifecycle
DEBUG: Sending request method_string='GET', uri='http://aaa.s3.amazonaws.com/?lifecycle', headers={'Authorization': 'AWS admin-key:IpJege
EZQC8J39r0D9gnjVMwEFM=', 'x-amz-date': 'Thu, 16 Jul 2015 05:34:30 +0000'}, body=(0 bytes)
DEBUG: Response: {'status': 403, 'headers': {'date': 'Thu, 16 Jul 2015 05:34:31 GMT', 'content-length': '158', 'content-type': 'applicati
on/xml', 'server': 'Riak CS'}, 'reason': 'Forbidden', 'data': '<?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><Mes
sage>Access Denied</Message><Resource>/aaa</Resource><RequestId></RequestId></Error>'}
DEBUG: ConnMan.put(): closing proxy connection (keep-alive not yet supported)
ERROR: S3 error: Access Denied
   Expiration Rule: none

@shino
Copy link
Contributor Author

shino commented Jul 16, 2015

Addressed your comments and pushed. Thanks for review, it helps much 😄

@shino
Copy link
Contributor Author

shino commented Jul 16, 2015

Ah, I don't look at location endpoint error. Will you file it as a new issue?

@shino
Copy link
Contributor Author

shino commented Jul 16, 2015

Could not reproduce 403 for lifecycle endpoint in straightforward manner. My s3curl (I guess everyone owns his/her own s3curl, right?) does not understand ?lifecycle as one of subresources and riak cs does not either. If lifecycle is added to s3curl for string to sign, them I had 403.

h/t to @ksauzz for pointing out subresource mismatch for 403 error

@kuenishi
Copy link
Contributor

My laptop eunit failed:

$ ./rebar eunit skip_deps=true suites=riak_cs_list_objects_fsm_v2_eqc
(snip)
........................................................x................................................................................
.........................................................................................................................................
.....................x........................
OK, passed 1000 tests
riak_cs_list_objects_fsm_v2_eqc:60: eqc_test_...*failed*
in function eqc:quickcheck/1 (../src/eqc.erl, line 1160)
in call from riak_cs_list_objects_fsm_v2_eqc:'-eqc_test_/0-fun-9-'/0 (test/riak_cs_list_objects_fsm_v2_eqc.erl, line 60)
**exit:{badarg,
    [{ets,lookup,[exometer_1,[riak_cs,in,riakc,fold_manifest_objs]],[]},
     {exometer,update_,2,[{file,"src/exometer.erl"},{line,213}]},
     {riak_cs_stats,safe_update,2,[{file,"src/riak_cs_stats.erl"},{line,220}]},
     {riak_cs_list_objects_fsm_v2,update_profiling_state_with_start,4,
         [{file,"src/riak_cs_list_objects_fsm_v2.erl"},{line,570}]},
     {riak_cs_list_objects_fsm_v2,make_2i_request,2,
         [{file,"src/riak_cs_list_objects_fsm_v2.erl"},{line,360}]},
     {riak_cs_list_objects_fsm_v2,prepare,2,
         [{file,"src/riak_cs_list_objects_fsm_v2.erl"},{line,138}]},
     {gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,505}]},
     {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}


riak_cs_list_objects_fsm_v2_eqc:64: eqc_test_...*failed*
in function eqc:quickcheck/1 (../src/eqc.erl, line 1160)
in call from riak_cs_list_objects_fsm_v2_eqc:'-eqc_test_/0-fun-12-'/0 (test/riak_cs_list_objects_fsm_v2_eqc.erl, line 64)
**exit:{badarg,
    [{ets,lookup,[exometer_2,[riak_cs,in,riakc,fold_manifest_objs]],[]},
     {exometer,update_,2,[{file,"src/exometer.erl"},{line,213}]},
     {riak_cs_stats,safe_update,2,[{file,"src/riak_cs_stats.erl"},{line,220}]},
     {riak_cs_list_objects_fsm_v2,update_profiling_state_with_start,4,
         [{file,"src/riak_cs_list_objects_fsm_v2.erl"},{line,570}]},
     {riak_cs_list_objects_fsm_v2,make_2i_request,2,
         [{file,"src/riak_cs_list_objects_fsm_v2.erl"},{line,360}]},
     {riak_cs_list_objects_fsm_v2,prepare,2,
         [{file,"src/riak_cs_list_objects_fsm_v2.erl"},{line,138}]},
     {gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,505}]},
     {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}


=======================================================
  Failed: 2.  Skipped: 0.  Passed: 2.
ERROR: One or more eunit tests failed.
ERROR: eunit failed while processing /home/kuenishi/cs-2.1/riak_cs: rebar_abort

borshop added a commit that referenced this pull request Jul 16, 2015
Add latency stats for riak pb client operations

Reviewed-by: kuenishi
@kuenishi
Copy link
Contributor

@borshop merge

@borshop borshop merged commit bfe6871 into develop Jul 16, 2015
@kuenishi kuenishi deleted the feature/stats-for-riak-pb-client branch July 16, 2015 10:13
@Basho-JIRA Basho-JIRA changed the title Add latency stats for riak pb client operations Add latency stats for riak pb client operations [JIRA: RCS-243] Jul 16, 2015
@Basho-JIRA
Copy link

This is a part of RCS-11 work; release notes will be written there.

_[posted via JIRA by Kota Uenishi]_

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants