Slf dtrace take1 #322

Merged
merged 3 commits into from Jun 10, 2012

Projects

None yet

3 participants

@slfritchie
Contributor

NOTE: This is a work in progress, to be picked up by someone else while I'm away on vacation.

NOTE: Depends on basho/riak_core#174

All trace messages are funneled first into a function called dtrace()
in order to make tracing via Redbug or other Erlang-based tracing tools
a bit easier to use, e.g. redbug:start({riak_kv_get_fsm, dtrace}).

However, Erlang tracing syntax (e.g. via Redbug) will be clumsy, but
at least it's doable (I hope). For example, for the get FSM probe category
numbers between 500 & 512:

redbug:start("riak_kv_get_fsm:dtrace('_', X, '_', '_') when 500 =< X, X =< 512", {msgs, 100}).

Eventually, I want to see libusdt used for Erlang-triggerable probes, but
we aren't quite there yet, alas. As a temporary measure, it's likely that
the Erlang virtual machine will have 1024 numbered probes instead of
the current 1. If/when the 1024 numbered probes arrive, the category
code can be used for the 0-to-1023 probe "name" mapping. In the long
term, the code can be used to derive a real ASCII string name for a
probe created by libusdt.

Contributor

Next steps, some possibilities. Items beginning with "X" have been done already or are missing just one little thing.

Things for DTrace probes in Riak KV

* backend ops: get, put, delete.  
  - bucket, key, # sec indexes?, # siblings!, object size!, status?
  - Also folding? hrm, sync vs. async folding, what are useful probe conditions?
  - What about doing this in riak_core instead?  In addition?
  - Read repair events: bucket, key, vnode to repair

* bitcask backend
  - merge start, merge end, via bitcask_merge_worker:merge()

* buckets FSM
  - init & finish?

X get FSM
  - init & finish: bucket, key, # siblings?, object size!, status!
  - vnode list is very interesting, but implicit as replies arrive?
  - timeouts from vnodes as separate event?
  - read repair as separate event?
  - what about tombstone handling?
  - separate events as each reply arrives: vnode, status (ok, notfound, other?)

X delete
  - bucket, key, put time (separate event?)

* index FSM
  - Hrm, this module refers to luke ... is it still used??
  - init & finish: bucket, query (if it's a string or stringifyable), clienttype
  - as result messages arrive: 

X keys FSM
  - init & finish: bucket, key, # siblings?, object size!, status!
  - separate events as each reply arrives: vnode, # keys

X put FSM
  - init & finish: bucket, key, # siblings?, object size!, status!, t'stone?
  - vnode list is very interesting, but implicit as replies arrive?
  - timeouts from vnodes as separate event?
  - separate events as each reply arrives: vnode, status (ok, notfound, other?)

* erlang_js manager
  - VM in use, VM idle: VM number, bytes handled, other computation info?
  - blocking_dispatch(): reserve failures & successes
  - what's the batch stuff all about?

* riak_kv_lru?
  - There appear to be no ops stats at all ... are they needed?

* riak_kv_mrc_pipe
  - pipe setup: # of stages?, method of specifying input?, bucket name
  - pipe teardown (collect_outputs())

* riak_kv_pb_listener
  - accept connection: peer info
  - terminate connection

* riak_kv_pb_socket
  - process_message(): major type with bucket, key, ...
  - bytes input, bytes output?

* riak_kv_vnode
  - handle_command: begin & end, type, vnode, bucket, key, object size, # sibs?
  - handle_coverage: begin & end, type, vnode, bucket, other?
  - handle_handoff_command: begin & end, vnode, type, bucket, key, obj size & sibs?
  - handoff_starting & cancelled: vnode
  - handle_handoff_data: vnode, bucket, key, obj size, #sibs?
  - encode_handoff_item: vnode, bucket, key, obj size, #sibs?
  - handle_info: final_delete stuff: vnode, bucket, key, other?

* WM resources
  - ?
@metadave metadave was assigned Jun 4, 2012
@beerriot beerriot commented on the diff Jun 7, 2012
src/riak_kv_put_fsm.erl
new_state_timeout(postcommit, StateData2);
{ok, _} ->
+ Values = riak_object:get_values(RObj),
+ %% TODO: more accurate sizing method
beerriot
beerriot Jun 7, 2012 Contributor

I know the comment says "TODO", but I think the biggest item missing from this sum is metadata, which is specifically interesting because of its use for storing secondary index information (and links, and other user-defined things, but 2i is the place people are really starting to use it a bunch).

slfritchie
slfritchie Jun 8, 2012 Contributor

Hrm. So, how would I calculate the size? The best thing that I can think of is this:

        ApproxBytes = size(Bucket) + size(Key) +
            lists:sum([size(V) || V <- Values]),
        NumSibs = length(Values),
        %% 2 new lines below, 1->2 edited line after that
        ApproxMD = lists:sum([size(term_to_binary(MD)) ||
                                 MD <- riak_object:get_metadatas(RObj)]),
        ?DTRACE(?C_PUT_FSM_PROCESS_REPLY,
                [1, ApproxBytes, ApproxMD, NumSibs], []),

term_to_binary/1 is going to allocate memory that will live for only a brief amount of time. In the common case where the number of siblings < 5, that shouldn't be horrible. In bad cases where the number of siblings is 10K or more, then there's extra expense when we know that we're a CPU hog already.

The erts_debug:size/1 will create less garbage, but it's going to execute a lot more BEAM instructions than erlang:term_to_binary/1. I'm lazy and haven't measured either approach yet.

One option would be to report a number > 0 if the # of siblings is very small, e.g. less than 5, and report 0 if the number of siblings is too big?

beerriot
beerriot Jun 8, 2012 Contributor

Agreed. Let's leave it out, until we have a better (less expensive) way to do the computation.

@beerriot beerriot was assigned Jun 8, 2012
@beerriot beerriot and 1 other commented on an outdated diff Jun 8, 2012
src/riak_kv_keys_fsm.erl
@@ -132,6 +148,7 @@ finish(clean,
plain ->
ClientPid ! {ReqId, done}
end,
+ ?DTRACE(?C_KEYS_FINISH, [-1], []),
beerriot
beerriot Jun 8, 2012 Contributor

This finish ought to be 0 to represent success, yes?

slfritchie
slfritchie Jun 8, 2012 Contributor

I'd used the -1 in there to differentiate that case from the ok clause earlier. I'm ambivalent about which to use, repeating 0 or using a unique -1. Opinions?

slfritchie
slfritchie Jun 8, 2012 Contributor

Sorry, I was looking at riak_kv_put_fsm.erl, bad Scott.

Contributor
beerriot commented Jun 8, 2012

Except for my two minor comments, I'm +1 on this PR. Great work, Scott & Ryan!

@slfritchie slfritchie was assigned Jun 8, 2012
Contributor

Gadz, now @jonmeredith has found a legitimate problem with the R15B01 flavor and its hidden use (inside dyntrace.erl) of unicode:characters_to_binary/1. {sigh} Yet more review to do, sorry folks.

@slfritchie slfritchie Fix DTrace + R15B01 problem with hidden Unicode problem
In the R15B01 version of DTrace, the OTP folks had slipped in a change to
dyntrace.erl that I wasn't fully aware of: instead of using the
iolist_to_binary/1 BIF to set the user-defined process tag, R15B01 uses
unicode:characters_to_binary/1.  The Unicode function will throw a badarg
exception if the argument isn't 100% Unicode kosher.

Fix by using io_lib:format/2 on all places where a string might have
unkosher bytes.  A side-effect of this extra formatting is that any D
script that collects probe events (namely the copyinstr() function) will
not be confused by NUL bytes in the middle of an Erlang string.
d1478f7
@slfritchie slfritchie merged commit d1478f7 into master Jun 10, 2012
@seancribbs seancribbs deleted the slf-dtrace-take1 branch Apr 1, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment