Skip to content

Commit

Permalink
Move optimizations and pitfalls into 'advanced'
Browse files Browse the repository at this point in the history
  • Loading branch information
hainest committed Apr 3, 2024
1 parent 4663753 commit b6883c0
Show file tree
Hide file tree
Showing 3 changed files with 248 additions and 0 deletions.
203 changes: 203 additions & 0 deletions docs/advanced/optimizations.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,203 @@

==============================
Optimizing Dyninst Performance
==============================

This section describes how to tune Dyninst for optimum performance.
During the course of a run, Dyninst will perform several types of
analysis on the binary, make safety assumptions about instrumentation
that is inserted, and rewrite the binary (perhaps several times). Given
some guidance from the user, Dyninst can make assumptions about what
work it needs to do and can deliver significant performance
improvements.

There are two areas of Dyninst performance users typically care about.
First, the time it takes Dyninst to parse and instrument a program. This
is typically the time it takes Dyninst to start and analyze a program,
and the time it takes to modify the program when putting in
instrumentation. Second, many users care about the time instrumentation
takes in the modified mutatee. This time is highly dependent on both the
amount and type of instrumentation put it, but it is still possible to
eliminate some of the Dyninst overhead around the instrumentation.

The following subsections describe techniques for improving the
performance of these two areas.

Optimizing Mutator Performance
------------------------------

CPU time in the Dyninst mutator is usually consumed by either parsing or
instrumenting binaries. When a new binary is loaded, Dyninst will
analyze the code looking for instrumentation points, global variables,
and attempting to identify functions in areas of code that may not have
symbols. Upon user request, Dyninst will also parse debug information
from the binary, which includes local variable, line, and type
information.

Since Dyninst 10.0.0, Dyninst supports parsing binaries in parallel,
which significantly improve the analysis speed. We typically have about
4X speedup when analyzing binaries with 8 threads. By default, Dyninst
will use all the available cores on your system. Please set environment
variable OMP_NUM_THREADS to the number of desired threads.

Debugging information is lazily parsed separately from the rest of the
binary parsing. Accessing line, type, or local variable information will
cause Dyninst to parse the debug information for all three of these.

Another common source of mutator time is spent re-writing the mutatee to
add instrumentation. When instrumentation is inserted into a function,
Dyninst may need to rewrite some or all of the function to fit the
instrumentation in. If multiple pieces of instrumentation are being
inserted into a function, Dyninst may need to rewrite that function
multiple times.

If the user knows that they will be inserting multiple pieces of
instrumentation into one function, they can batch the instrumentation
into one bundle, so that the function will only be re-written once,
using the BPatch_process::beginInsertionSet and
BPatch_­process::end­Inser­tion­Set functions (see section 4.4). Using
these functions can result in a significant performance win when
inserting instrumentation in many locations.

To use the insertion set functions, add a call to beginInsertionSet
before inserting instrumentation. Dyninst will start buffering up all
instrumentation insertions. After the last piece of instrumentation is
inserted, call finalizeInsertionSet, and all instrumentation will be
atomically inserted into the mutatee, with each function being rewritten
at most once.

Optimizing Mutatee Performance
------------------------------

As instrumentation is inserted into a mutatee, it will start to run
slower. The slowdown is heavily influenced by three factors: the number
of points being instrumented, the instrumentation itself, and the
Dyninst overhead around each piece of instrumentation. The Dyninst
overhead comes from pieces of protection code (described in more detail
below) that do things such as saving/restoring registers around
instrumentation, checking for instrumentation recursion, and performing
thread safety checks.

The factor by which Dyninst overhead influences mutatee run-time depends
on the type of instrumentation being inserted. When inserting
instrumentation that runs a memory cache simulator, the Dyninst overhead
may be negligible. On the other-hand, when inserting instrumentation
that increments a counter, the Dyninst overhead will dominate the time
spent in instrumentation. Remember, optimizing the instrumentation being
inserted may sometimes be more important than optimizing the Dyninst
overhead. Many users have had success writing tools that make use of
Dyninst’s ability to dynamically remove instrumentation as a performance
improvement.

The instrumentation overhead results from safety and correctness checks
inserted by Dyninst around instrumentation. Dyninst will automatically
attempt to remove as much of this overhead as possible, however it
sometimes must make a conservative decision to leave the overhead in.
Given additional, user-provided information Dyninst can make better
choices about what safety checks to leave in. An unoptimized
post-Dyninst 5.0 instrumentation snippet looks like the following:

+----------------------------------+----------------------------------+
| **Save General Purpose | In order to ensure that |
| Registers** | instrumentation doesn’t corrupt |
| | the program, Dyninst saves all |
| | live general purpose registers. |
+----------------------------------+----------------------------------+
| **Save Floating Point | Dyninst may decide to separately |
| Registers** | save any floating point |
| | registers that may be corrupted |
| | by instrumentation. |
+----------------------------------+----------------------------------+
| **Generate A Stack Frame** | Dyninst builds a stack frame for |
| | instrumentation to run under. |
| | This provides the illusion to |
| | instrumentation that it is |
| | running as its own function. |
+----------------------------------+----------------------------------+
| **Calculate Thread Index** | Calculate an index value that |
| | identifies the current thread. |
| | This is primarily used as input |
| | to the Trampoline Guard. |
+----------------------------------+----------------------------------+
| **Test and Set Trampoline | Test to see if we are already |
| Guard** | recursively executing under |
| | instrumentation, and skip the |
| | user instrumentation if we are. |
+----------------------------------+----------------------------------+
| **Execute User Instrumentation** | Execute any BPatch_snippet code. |
+----------------------------------+----------------------------------+
| **Unset Trampoline Guard** | Marks the this thread as no |
| | longer being in instrumentation |
+----------------------------------+----------------------------------+
| **Clean Stack Frame** | Clean the stack frame that was |
| | generated for instrumentation. |
+----------------------------------+----------------------------------+
| **Restore Floating Point | Restore the floating point |
| Registers** | registers to their original |
| | state. |
+----------------------------------+----------------------------------+
| **Restore General Purpose | Restore the general purpose |
| Registers** | registers to their original |
| | state. |
+----------------------------------+----------------------------------+

Dyninst will attempt to eliminate as much of its overhead as is
possible. The Dyninst user can assist Dyninst by doing the following:

- **Write BPatch_snippet code that avoids making function calls.**
Dyninst will attempt to perform analysis on the user written
instrumentation to determine which general purpose and floating point
registers can be saved. It is difficult to analyze function calls
that may be nested arbitrarily deep. Dyninst will not analyze any
deeper than two levels of function calls before assuming that the
instrumentation clobbers all registers and it needs to save
everything.

..
In addition, not making function calls from instrumentation allows
Dyninst to eliminate its tramp guard and thread index calculation.
Instrumentation that does not make a function call cannot recursively
execute more instrumentation.

- **Call BPatch::setTrampRecursive(true) if instrumentation cannot
execute recursively.** If instrumentation must make a function call,
but will not execute recursively, then enable trampoline recursion.
This will cause Dyninst to stop generating a trampoline guard and
thread index calculation on all future pieces of instrumentation. An
example of instrumentation recursion would be instrumenting a call to
write with instrumentation that calls printf—write will start calling
printf printf will re-call write.

- **Call BPatch::setSaveFPR(false) if instrumentation will not clobber
floating point registers**. This will cause Dyninst to stop saving
floating point registers, which can be a significant win on some
platforms.

- **Use simple BPatch_snippet objects when possible**. Dyninst will
attempt to recognize, peep-hole optimize, and simplify frequently
used code snippets when it finds them. For example, on x86 based
platforms Dyninst will recognize snippets that do operations like
‘var = constant’ or ‘var++’ and turn these into optimized assembly
instructions that take advantage of CISC machine instructions.

- **Call BPatch::setInstrStackFrames(false) before inserting
instrumentation that does not need to set up stack frames. Dyninst
allows you to force stack frames to be generated for all
instrumentation. This is useful for some applications (e.g.,
debugging your instrumentation code) but allowing Dyninst to omit
stack frames wherever possible will improve performance. This flag is
false by default; it should be enabled for as little instrumentation
as possible in order to maximize the benefit from optimizing away
stack frames.**

- **Avoid conditional instrumentation wherever possible.** Conditional
logic in your instrumentation makes it more difficult to avoid saving
the state of the flags.

- **Avoid unnecessary instrumentation.** Dyninst provides you with all
kinds of information that you can use to select only the points of
actual interest for instrumentation. Use this information to
instrument as selectively as possible. The best way to optimize your
instrumentation, ultimately, is to know *a priori* that it was
unnecessary and not insert it.
36 changes: 36 additions & 0 deletions docs/advanced/pitfalls.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@

===============
Common pitfalls
===============

These are some common pitfalls that users
have reported when using the Dyninst system. Many of these are either
due to limitations in the current implementations, or reflect design
decisions that may not produce the expected behavior from the system.

Attach followed by detach
-------------------------

If a mutator attaches to a mutatee, and immediately exits, the current
behavior is that the mutatee is left suspended. To make sure the
application continues, call detach with the appropriate flags.

Attaching to a program that has already been modified by Dyninst
----------------------------------------------------------------

If a mutator attaches to a program that has already been modified by a
previous mutator, a warning message will be issued. We are working to
fix this problem, but the correct semantics are still being specified.
Currently, a message is printed to indicate that this has been
attempted, and the attach will fail.

Dyninst is event-driven
-----------------------

Dyninst must sometimes handle events that take place in the mutatee, for
instance when a new shared library is loaded, or when the mutatee
executes a fork or exec. Dyninst handles events when it checks the
status of the mutatee, so to allow this the mutator should periodically
call one of the functions BPatch::pollForStatusChange,
BPatch::wait­ForStatusChange, BPatch_thread::isStopped, or
BPatch_­thread::is­Termin­ated.
9 changes: 9 additions & 0 deletions docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -56,3 +56,12 @@ Developed by
basics/building
basics/using
basics/first_mutator

.. toctree::
:caption: advanced
:name: advanced
:hidden:
:maxdepth: 2

advanced/optimizations
advanced/pitfalls

0 comments on commit b6883c0

Please sign in to comment.