
Execution traces

Paul Khuong edited this page Aug 29, 2020 · 3 revisions

It is hard to generate realistic calls to a hash function for microbenchmarking. We recorded traces of a Backtrace server's initial hash table population phase with a synthetic data set that is representative of (some of) our production workload. We also recorded UMASH calls during steady state, again with synthetic but representative data and requests.

Format

This is the raw output from perf script with UMASH tracepoints enabled. Each line has the same general shape:

  coronerd/3 18410 [002] 6998482.097358:        sdt_libumash:umash_full: (5630566f8919) arg1=94765633077920 arg2=1 arg3=139594106184935 arg4=64
  coronerd/8 18415 [012] 6998481.440851:     sdt_libumash:umash_fp_init: (5630566f90b4) arg1=139667898878592 arg2=94765624153984
  coronerd/2 18409 [001] 6998481.440856: sdt_libumash:umash_sink_update: (5630566f8621) arg1=139668257941328 arg2=16 arg3=139668257941216 arg4=16
  coronerd/3 18410 [002] 6998481.440947:   sdt_libumash:umash_fp_digest: (5630566f91dc) arg1=139668146702976

Everything left of the first colon is metadata: thread name, thread id, core id, and timestamp. I expect only the tid will be useful, to reconstruct per-thread call sequences from this parallel trace.

The tracepoint name after the first colon matches the first two arguments to DTRACE_PROBEn in umash.c. After that comes the address of the trace instruction (not useful for benchmarking), and finally the tracepoint arguments provided to DTRACE_PROBEn.
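The line shape described above is regular enough to parse mechanically. Here is a minimal sketch (not part of the UMASH repository; the regular expression and function names are my own) of splitting one `perf script` line into its thread id, tracepoint name, and integer arguments:

```python
import re

# Matches the perf script line shape shown above: comm, tid, [cpu],
# timestamp, "sdt_libumash:<probe>:", "(<hex address>)", then "argN=..." pairs.
LINE_RE = re.compile(
    r"^\s*(?P<comm>\S+)\s+(?P<tid>\d+)\s+\[(?P<cpu>\d+)\]\s+"
    r"(?P<ts>[\d.]+):\s+sdt_libumash:(?P<probe>\w+):\s+"
    r"\((?P<addr>[0-9a-f]+)\)\s*(?P<args>.*)$"
)

def parse_line(line):
    """Return (tid, probe, [arg1, arg2, ...]) for one trace line, or None."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    args = [int(v) for v in re.findall(r"arg\d+=(\d+)", m.group("args"))]
    return int(m.group("tid")), m.group("probe"), args

line = ("coronerd/3 18410 [002] 6998482.097358:        "
        "sdt_libumash:umash_full: (5630566f8919) "
        "arg1=94765633077920 arg2=1 arg3=139594106184935 arg4=64")
print(parse_line(line))
# (18410, 'umash_full', [94765633077920, 1, 139594106184935, 64])
```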

Each call has a different list of arguments (see umash.c for confirmation):

  • sdt_libumash:umash_full: address of the umash_params, which UMASH to compute (0 or 1), address of the bytes to hash, number of bytes to hash.

  • sdt_libumash:umash_fp_init: address of the initialised umash_fp_state, address of the umash_params.

  • sdt_libumash:umash_sink_update: address of the sink umash_state or umash_fp_state, number of bytes remaining in the internal buffer, address of the bytes to hash, number of bytes to hash.

  • sdt_libumash:umash_fp_digest: address of the umash_fp_state to digest.

  • sdt_libumash:umash_fprint: address of the umash_params, address of the bytes to hash, number of bytes to hash (does not appear in the trace files).

  • sdt_libumash:umash_init: address of the initialised umash_state, address of the umash_params, which UMASH to compute (0 or 1) (does not appear in the trace files).

  • sdt_libumash:umash_digest: address of the digested umash_state (does not appear in the trace files).
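The argument lists above can be transcribed into a lookup table to turn raw `argN=` values into named fields. The field names below are my own shorthand for the descriptions in the list, not identifiers from umash.c:

```python
# Shorthand field names for each tracepoint's arguments, transcribed from
# the list above. Only the first four probes appear in the trace files.
PROBE_ARGS = {
    "umash_full": ("params", "which", "data", "len"),
    "umash_fp_init": ("fp_state", "params"),
    "umash_sink_update": ("sink", "buffered", "data", "len"),
    "umash_fp_digest": ("fp_state",),
    "umash_fprint": ("params", "data", "len"),
    "umash_init": ("state", "params", "which"),
    "umash_digest": ("state",),
}

def decode(probe, args):
    """Map a probe's positional arguments to named fields."""
    return dict(zip(PROBE_ARGS[probe], args))

print(decode("umash_full", [94765633077920, 1, 139594106184935, 64]))
# {'params': 94765633077920, 'which': 1, 'data': 139594106184935, 'len': 64}
```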

When traces include incremental fingerprinting calls, we can expect each thread to only work on one umash_fp_state at a time: the trace for each thread will always match the regular expression (([umash_fp_init] [umash_sink_update]* [umash_fp_digest]) | [umash_full])*, although recording issues may have introduced irregularities.
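That per-thread pattern can be checked with a small state machine. The sketch below (my own code, assuming events are `(tid, probe)` pairs in trace order) counts events that violate the pattern, e.g. a umash_sink_update with no preceding umash_fp_init, as might happen if perf dropped records:

```python
from collections import defaultdict

def count_irregularities(events):
    """events: iterable of (tid, probe) pairs, in trace order.

    Returns the number of events that break the per-thread pattern
    (fp_init sink_update* fp_digest | full)*.
    """
    in_fp = defaultdict(bool)  # tid -> currently inside a fingerprint?
    irregular = 0
    for tid, probe in events:
        if in_fp[tid]:
            if probe == "umash_fp_digest":
                in_fp[tid] = False
            elif probe != "umash_sink_update":
                irregular += 1  # e.g. a dropped umash_fp_digest record
                in_fp[tid] = (probe == "umash_fp_init")
        else:
            if probe == "umash_fp_init":
                in_fp[tid] = True
            elif probe != "umash_full":
                irregular += 1  # update/digest without a matching init
    return irregular

events = [
    (1, "umash_fp_init"), (2, "umash_full"),
    (1, "umash_sink_update"), (1, "umash_fp_digest"),
]
print(count_irregularities(events))  # 0
```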

Batch hash table filling

Our synthetic hash table population phase makes more than 150 million calls to UMASH. It's not clear that there is any large-scale pattern here (at the small end, we do observe rehashing calls), and that's a lot of data, so we only have truncated subsequences of the trace.

During this phase, we only call umash_full, arguably with an off-by-one, since we include NUL terminators in the hashed data. It seems likely that we will fix that off-by-one, and always subtract one from the input length, rather than focusing on "odd" input lengths.
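When replaying these umash_full calls in a benchmark, one might want to model that anticipated fix by shaving the NUL terminator off each recorded length. A tiny sketch (my own helper, assuming `(probe, args)` pairs with the length in the fourth argument slot, as in the trace format above):

```python
def replay_lengths(calls, drop_nul=True):
    """calls: iterable of (probe, args) pairs from the trace.

    Yield the input length of each umash_full call, optionally minus one
    byte to model dropping the NUL terminator.
    """
    for probe, args in calls:
        if probe == "umash_full":
            yield args[3] - 1 if drop_nul else args[3]

calls = [("umash_full", [0, 0, 0, 64]), ("umash_fp_digest", [0])]
print(list(replay_lengths(calls)))  # [63]
```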

Steady state request processing

Steady state sees a more diverse set of calls to UMASH: we interleave phases of heavy (string) hashing with other phases that use the incremental fingerprinting interface with different call patterns.

steady-state.2020-08-28.trace.bz2 records all of these calls during a representative workload (2.5 million calls into UMASH over 26 seconds).

This phase exercises umash_full (often with C-string arguments that must first be passed to strlen), and incremental fingerprinting: umash_fp_init, umash_sink_update, and umash_fp_digest.
