Skip to content

oaken-source/graprof

Repository files navigation

graprof - a profiling and trace analysis tool
-------------------------------------------

Please send graprof bug reports via 
<http://github.com/oaken-source/graprof/issues/>.

Or mail us at <graprof@grapentin.org>!

 Table of Contents:
   1. Introduction
   2. Quick-Start Guide (TM)
   3. Tracing
     3.1. Function Instrumentation
     3.2. Allocation Hooks
     3.3. Measuring Time
     3.4. Trace Files
   4. Analysis
     4.1. Flat Profile
     4.2. Call Graph
     4.3. Memory Profile
     4.4. Tracing GUI
     4.5. Debug Symbols
   5. Limitations
     5.1. Runtime Error
     5.2. Memory Footprint
     5.3. Profilee Termination
     5.4. Multithreaded Applications
   6. Tricks and Hints
     6.1. Profiling Libraries


1. Introduction
---------------

 Graprof is a profiling and trace analysis tool for C projects.

 The development of graprof has been inspired by the profiling tools gprof and
 mtrace, and their inability to work together. Based on the most basic 
 profiling methods the GNU toolchain has to offer, graprof aims to provide a
 low-impact easy to use, high-precision profiling method, as well as a 
 sophisticated trace analysis tool which, sadly, is not yet implemented as of 
 the current 0.5 release.

 The information in this file, as well as some additional documentation, can
 be found on <http://graprof.grapentin.org>.

 If you have found a bug, please do not hesitate to report it at graprof's 
 bug tracker found on <http://github.com/oaken-source/graprof/issues>.

 If you have any suggestions, on how graprof can be improved, or want to 
 submit a bug but loathe to create an account at the bug tracking system, just
 send an email to <graprof@grapentin.org> and I will see what I can do.

 If you want to support graprof, or its developers (me), have a look at
 <http://graprof.grapentin.org?page=support>, there are numerous ways to 
 support graprof, from donating to just using it and spreading the word! I
 would appreciate it :)


2. Quick-Start Guide
--------------------

 First, you need to install graprof. Refer to INSTALL for detailed installation
 instructions. Usually `./configure && make && sudo make install' does the job.
 If you try to build from the git repository instead of the tarball, run
 `./autogen.sh' first, to generate configure.

 To instruct your application to write program traces, you have to add
  to CFLAGS:   -finstrument-functions -g
  to LDFLAGS:  -lgraprof
 and recompile.

 This instrumented binary should now be invoked with GRAPROF_OUT set to the
 filename you want the trace data to be written to, for example

  GRAPROF_OUT=gp.out ./your_app [OPTIONS ...]

 To analyze the trace, invoke the graprof application, and pass the path to
 your binary, and the path to the trace file as arguments, for example:

  graprof -t gp.out -FCMg -o your_app.profile your_app
 
 The -FCMg switch tells graprof to print the full analysis result to the 
 specified file. 
 
 Alternatively, you can merge these two steps together and run:

  graprof -FCMg -o your_app.profile [--] your_app [OPTIONS ...]

 This will by default persist the trace information for reuse in a file called
 graprof.out, which can be changed by setting GRAPROF_OUT in the environment.

 For more information on the command line parameters of graprof, run:

  graprof --help

 More information on edge cases, limitations and implementation details can
 be found below this point, and on <http://graprof.grapentin.org>. I suggest
 that you read these, before sending angry mail to me because something broke.


3. Tracing
----------

 The tracing library of graprof is called libgraprof. Its purpose is the 
 collection of events during the runtime of the profilee, and writing these
 information out to file specified by the environment variable GRAPROF_OUT on
 program termination. The information gathered includes function entry and 
 exit events, and memory allocation, reallocation and deallocation events.

 The tracing is only enabled, if the file specified in GRAPROF_OUT is writable
 by the profilee. Otherwise, nothing is traced.


3.1. Function Instrumentation
-----------------------------

 Tracing function entry and exit events with graprof requires the profilee to
 be compiled by gcc with --finstrument-functions enabled. This switch causes
 the resulting binary to have instrumented function calls, that are preceded 
 with a call to __cyg_profile_func_enter, and followed by a call to the analog
 __cyg_profile_func_exit, both of which are defined in libgraprof.
 these functions recieve the address of the called function, and the address
 of the calling function.
 
 __cyg_profile_func_enter 
  This function appends a function entry event to the trace. An entry event 
  is identified by the character `e' and contains the address of the called 
  function, the address of the calling function, and the current timestamp.

 __cyg_profile_func_exit
  This function appends a function exit event to the trace. An exit event is
  identified by the character `x' and contains only the current timestamp.
  This is sufficient, because graprof assumes that only the function o top of
  the call stack may be exited, and this has to be the one last entered.

 There may be functions, that a developer feels should not be instrumented,
 for whatever reason. For this, it is possible to add the following attribute
 to the function definition:

  __attribute__((no_instrument_function))

 Alternatively, if the code is not to be altered in any way, there are command
 line options for gcc that exclude functions or entire files from the #
 instrumentation, for example:

  --finstrument-functions-exclude-file-list=file,file,...
  --finstrument-functions-exclude-function-list=sym,sym,...

 For more information on this, see `man gcc'.


3.2. Allocation Hooks
---------------------

 libgraprof provides two different methods to intercept and log memory 
 allocation methods. The default method works by overriding the weak symbols
 of the allocation functions `malloc', `calloc', `realloc' and `free' with
 its own methods. These methods call their original couterparts, ivestigate 
 the results, and append their respective events to the trace file. The second
 method uses glibc's `malloc_hook' functionality, which has been deprecated.
 Both approaches depend directly on a recent glibc version, and the first one
 has been chosen over the second because of `malloc_hook's deprecation and the
 poor behaviour in multithreaded environments. However, as overriding weak 
 symbols in glibc tends to break in statically linked applications, the
 `malloc_hook' approach has been included as a backup.

 malloc / calloc
  These functions append a memory allocation event to the trace. An allocation
  event is identified by the character `+' and contains the size of the 
  allocation, the address of the caller, the result of __libc_malloc and a
  timestamp. 
 
 realloc
  This function appends a memory reallocation event to the trace. A 
  reallocation event is identified by the character `*' and contains the `ptr'
  argument passed to realloc, the size of the reallocation, the address of the
  caller, the result of __libc_realloc and a timestamp.

 free
  This function appends a memory deallocation event to the trace. A
  deallocation event is identified by the character `-' and contains the `ptr'
  argument passed to free, the caller address and a timestamp.

 This results in all allocation related events being enveloped in entry and 
 exit events, and hence the information of the last instrumented calling 
 function. By comparison with the caller address gathered by calling
 __builtin_return_address(0), which is a builtin of gcc that yields the return
 address of the current function, the analysis module of graprof is able to 
 decide if an allocation method is called directly by the application, or a 
 non-instrumented third party library function. In that case, the analysis 
 tool can output the last known instrumented function below which the 
 allocation was made, which should be more useful to a developer than an
 obscure address.


3.3. Measuring Time
-------------------

 Measuring time is difficult in profiling, because all times are inaccurate,
 subject to statistical error, and even worse, the profiling itself has a
 time footprint. For graprof, we decided against the usual sampling approach
 and instead measured real time. This decision is subject to change, depending
 on further studies on the reliability of either method.

 The current implementation of libgraprof uses clock_gettime called with the
 clock_id CLOCK_MONOTONIC_RAW to get approximate timing values with nanosecond
 precision.


3.4. Trace Files
----------------

 During the execution of the profilee, libgraprof builds up an in memory buffer
 of all relevant events. When the profilee returns from main, or calls exit, 
 libgraprof writes this buffer directly to a file, preceded by an unsigned long
 long containing the buffers size. This file is not in human-readable format,
 but should be processed by the graprof analysis tool. 
 Additionally, the tracefiles are not necessarily portable. Size and endianness
 of the contained data fields may be different on two systems, resuling in the 
 traces written on System A to not be readable on System B. If you want to 
 share profiling information, share the output of the analysis and not the 
 trace files.


4. Analysis
-----------

 Analyzing traces is the most complex part of graprof. To invoke the analyzer,
 call `graprof <your binary> <your tracefile> [OPTIONS ...]', for example:

  graprof a.out gp.out -FCMg

 This call will invoke the analysis tool on the trace file gp.out, which was 
 generated by the binary a.out, and generate the flat profile, call graph and
 memory profile, and not open the tracing gui. 

 Graprofs analyis is separated into four parts, each of which can be activated
 separately on the command line, that are designed to tackle different problems
 in profiling an application.


4.1. Flat Profile
-----------------

 The flat profile is a very coarse-grained representation of the profilees 
 runtime behaviour, not unlike the one provided by gprof. All instrumented 
 functions are listed, in order of the total runtime spent in each one. 

 Each line contains function related information that has been aggregated from
 the trace file. Detailed information on the different columns is contained in
 the footer of this listing.


4.2. Call Graph
---------------

 The call graph is a more fine-grained representation of the profilees runtime
 behaviour, again not unlike the one provided by gprof, that in addition to
 listing all instrumented functions, also lists their callers and callees, 
 putting them in a context in the program. 

 This listing contains one block per function, which are ordered by the id of
 the function. Each block consists of three parts. Firstly, the caller 
 information, secondly the function information, and thirdly the callee 
 information. Each of these sections has slightly different semantics for the
 different columns of the listing, all of which are described in detail in the
 footer of the listing.


4.3. Memory Profile
-------------------

 The memory profile aims to give an overview of the memory footprint of the
 profilee, as well as any notable failures of allocation or deallocation, not
 unlike the memory profile provided by mtrace.

 The listing starts with the total number of bytes allocated and freed, and the
 maximum number of bytes allocated at any given time of the profilees runtime,
 followed by the number of calls to malloc realloc and free. 
 
 The memory profile counts calls to calloc as calls to malloc.

 Following these general aggregated values are notable allocation failures, 
 reallocation failures, free failures, and a list of unfreed blocks.


4.4. Tracing GUI
----------------

 As of graprof version 0.5, this is still a planned feature. The main goal of
 graprof is to provide an intuitive, efficient tracing gui that is backed with
 the profiling information gathered by the analysis tool. With this gui, 
 developers should be enabled to instantly view the call behaviour of their 
 applications, as well as hot-spots in the code where allocation or runtime
 related problems reside.
 The finishing of this feature will mark the 1.0 release of graprof.


4.5. Debug Symbols
------------------

 The events in the trace file usually contain one or more addresses, that
 represent locations in the profiled application. These addresses carry little
 semantical value, so they need to be translated to human-readable function 
 names, and if possible also file names and lines, so that developers can 
 instantly see what function an event is about, and where this function is
 defined. The information required to translate addresses to function names is
 contained in the binary of the profilee, if it was compiled with debug symbols
 enabled. This is the reason why graprof requires the profilee to be compiled
 with the `-g' switch. 

 To extract the debug information from the binary, graprof uses libbfd, which
 invokes a lot of black magic on the binary that shall not be discussed in 
 detail in this manual. If you are interested in the details, have a look at
 the manual of libbfd, or read the documentation of addr2line that uses these
 same features excessively.

 If you have no debug symbols available in the profilees binary, graprof will
 still work, but the profile will be less useful.


5. Limitations
--------------

 Graprof is still mainly untested. There is only so much testing a single 
 person can do, so this is mainly where graprof needs support from YOU!
 If you find any suspicious, reproducable behaviour, please file a bug on
 <http://github.com/oaken-source/graprof/issues>.
 If you want to send suggestions, regarding graprof itself, the website, or
 just want to chat with me a little, you can also mail <graprof@grapentin.org>
 and I will answer as soon as I can.

 Apart from the lack of testing, graprof also has a couple of conceptual
 limitations, all of which are decribed in detail in the following chapters.

5.1. Runtime Error
------------------

 All times in graprof are taken directly from the hardware clock. This 
 eliminates the statistical error created by sampling times, for example with
 the `profil' system call, but instead introduces a conceptional error by 
 overestimating times. The instrumented function layout can be understood as 
 follows:

  <enter>
   <setup stackframe>
   <call __cyg_profile_func_enter> 
    <actual function body>
   <call __cyg_profile_func_exit>
   <tear down stackframe>
  <return>

 The entry and exit times of the function are measured somewhere in the entry
 and exit calls, wich nanosecond precision. This introduces two problems. 
 Firstly, the function takes longer to execute than it would without the 
 instrumentation, as the calls to enter and exit take time. Secondly, the 
 measurements for entry and exit times are taken in the enter and exit
 routines, so that the time spent for setting up and tearing down the stack 
 frame, as well as a necessary part of the __cyg_profile routines is accounted
 to the runtime of the parent function. This results in the time accounted to
 functions that call lots and lots of children and do nothing else to be highly
 overestimated, hence the resulting times should be taken with a grain of salt.

 Additionally, graprof is as of yet not good in handling the time aggregation
 of recursive function calls, meaning that the self and children time values 
 for recursive methods might be unintuitive, or just plain wrong.


5.2. Memory Footprint
---------------------

 As stated before, during the runtime of the profilee all tracing information
 is stored in an in-memory buffer that unfortunately grows very quickly, and
 thus increases the memory footprint of the profilee. Because of that, it is
 not recommended to profile RAM-bound applications with graprof, unless you 
 know what you are doing and realize that profiling with graprof decreases your
 upper memory limit for your application.

 To clarify this, here are a few numbers. A function entry and exit pair has 
 a memory footprint of 34 bytes on an amd64 linux system. This means with a 
 million function calls and upwards, the memory footprint of the profiling 
 reaches up into a couple of megabytes, and a million function calls is 
 gathered surprisingly fast. Memory events take even more space, but usually
 the call numbers of these are orders of magnitude smaller.

 You will probably run into memory problems when profiling applications built
 on extensive GUI frameworks, because these usually perform loads of 
 allocations and deallocations, all of which are recorded by the tracer.


5.3. Profilee Termination
-------------------------

 The in-memory buffer generated during profilee execution is written to disk
 upon clean profilee termination, clean termination meaning either return from
 main, or a call to exit. This means that if the profilee is terminated any 
 other way, the trace data is very likely lost.

 The reason for this is, that the method to write the trace is implemented as
 a function with __attribute__((destructor)), which is not executed when the
 program terminates for example on SIGKILL.

 This probably makes it difficult to profile deamons with graprof, unless you
 add a hook to explicitly exit cleanly.


5.4. Multithreaded Applications
-------------------------------

 I don't dare to think about this. Try it if you're feeling lucky, but expect 
 things to break.


6. Tricks and Hints
-------------------

 Using graprof is relatively straightforward. However, there are a couple of 
 tricks to try if things go downhill. Some of these tricks of the trade  are
 listed in this section.


6.1. Profiling Libraries
---------------------

 The process of profiling a library is a little different than profiling an
 application. The library should be compiled with 
 
  -finstrument-functions -g

 appended to CFLAGS, and

  -lgraprof

 appended to LDFLAGS. This is very similar to profiling an application. 

 Now, depending on if the application should be profiled as well, the same
 additions could or could not be made to CFLAGS and LDFLAGS of the application.

 The main problem is, that the analysis tool will not find debug symbols for
 addresses of instrumented functions gathered from a shared library. To remedy
 this, the profilee library should be linked statically to the host application
 so that the debug symbols are included in the resulting binary.

 For this to work, make sure that your build system installs a statically 
 linkable version of your instrumented library, and add the following to the 
 LDFLAGS of the host application:

  -Wl,-Bstatic -lyour-library -Wl,-Bdynamic

 This instructs the linker to statically link your library to the host 
 application, while leaving the rest of the linkage untouched. To profile, 
 you have to invoke your host application with the environment variable 
 GRAPROF_OUT set to your trace file, and the rest of the process works just 
 like normal.

About

a profiling and trace analysis tool - see the website for downloads and additional information

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published