Debugging OSv

justinc1 edited this page Jul 10, 2017 · 56 revisions

This document explains various techniques that have proved useful for debugging OSV itself, or applications running on OSV.


Booting OSv with verbose option enabled

This lets you see debug messages that are only printed when the verbose option is enabled.

./scripts/run.py --verbose

OSV build modes

OSV’s build process offers two modes, “make mode=debug” and “make mode=release”. The “debug” build is a slight misnomer, as all of the debugging techniques described in this document work on both debug and release builds. A better description of the debug build is an unoptimized build: release builds use the compiler’s most sophisticated optimizations, which can cause code to be rearranged, inlined, or even eliminated, and this can be confusing while debugging (e.g., a stack trace does not show a function that was inlined). Debug builds avoid these optimizations, which yields significantly slower code, but keeps the generated code closer to the source code, making debugging a bit easier.

The fact that we allow all debugging techniques to also work on release builds was a deliberate design decision - we consider it important to be able to debug and profile a running system in the field - and those will always be release builds. Even OSV developers more often than not run, and debug, release builds, avoiding the slow debug builds. We recommend that you also default to release builds, and only use a debug build on rare occasions.

Note that the debug and release builds do not differ in their inclusion or exclusion of debugging symbols (i.e., “debug” doesn’t correspond to gcc’s “-g” option). For both build types, two binaries are produced - loader.elf, which contains OSV compiled with full debugging information (line number information, local variable names, etc.), and loader-stripped.elf, which is stripped of all this debugging information. The OSV image contains the stripped version, but the non-stripped version is used (as explained below) for debugging.

To choose build mode, pass the “mode=debug” or “mode=release” option to “make” or "scripts/build". Release mode is the default, so “mode=release” can simply be omitted. For example, to clean the debug build, compile it, and finally run it, use:

scripts/build clean mode=debug
scripts/build -j4 mode=debug
scripts/run.py -d           # “-d” runs the debug build

The debug and release builds can coexist (their results are saved in separate directories - build/debug and build/release).

Debugging OSV with GDB

This section explains how to debug an OSV guest running under KVM on a Linux host, using the familiar gdb debugger.

Getting started with GDB

The script we use to run an OSV guest on a Linux host, scripts/run.py, asks qemu to enable a “gdb stub” on port 1234. Gdb knows how to connect to this stub and use it to inspect the program’s memory, registers, and so on, which allows debugging of the running OSV.

First build OSV (as explained above, either the “release” or “debug” build modes are fine for debugging), and then run it normally.

The OSV executable requests gdb to load the python script scripts/loader.py on startup. This is done via a debug_gdb_scripts section in the executable - see loader.cc for the assembly magic that does it.

The loader.py script tells gdb how to connect to the guest (via port 1234), and adds a bunch of new commands to gdb especially for OSV debugging (we’ll survey these new commands below). Most gdb installations refuse to load arbitrary scripts, for security reasons, and complain that “auto-loading has been declined”. So before running gdb, add the following line to your ~/.gdbinit file:

add-auto-load-safe-path scripts/loader.py

If your application depends on the Go runtime, also enable auto-loading of GDB convenience functions for Go's runtime structures:

add-auto-load-safe-path /usr/local/go/src/runtime/runtime-gdb.py

(this is, of course, something you’ll only need to do once).

To start the debugger, run in the OSV source directory,

gdb build/release/loader.elf

This starts the debugger on the unstripped executable from the release build (replace, of course, the “release” by “debug” if you are running the debug build).

After starting gdb, you’ll want to attach to a running OSV guest with the “connect” command. Connect will stop the running guest, and attach to it.

At this point, gdb only knows about the symbols present in loader.elf, which is the OSV kernel. However, most likely the running application loaded additional shared objects, and we should let gdb know of these additional objects, so it will be able to show symbolic names when encountering functions from these objects. The new gdb command

osv syms

fetches the list of loaded shared objects from OSV, and loads the symbols of each into gdb. This command prints a bunch of diagnostic and warning messages, which can be ignored.

A simple example

As a simple example, consider a single-vcpu OSV guest, running a Java program (the “sunflow” part of the SPECjvm2008 benchmark), as follows

scripts/run.py -c1 -m2G -e "java.so -jar specjvm.jar -ikv -wt 30 -it 240 -bt 1 sunflow"

Note how without the “-d” option, run.py runs the release build by default. We now want to attach to it with gdb at some random point, and look at the stack trace. The gdb session will look something like this (with messages from gdb edited for brevity):

$ gdb build/release/loader.elf
...
Reading symbols from build/release/loader.elf...done.
(gdb) connect
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.
0x0000100000a87b72 in ?? ()
(gdb) osv syms
/home/nyh/osv/build/release/java/java.so 0x100000000000L
/home/nyh/osv/build/release/../../external/openjdk.bin/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/jre/lib/amd64/server/libjvm.so 0x100000400000L
/home/nyh/osv/build/release/../../external/openjdk.bin/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/jre/lib/amd64/libverify.so 0x100001200000L
...
(gdb) where
#0  0x0000100000a87b72 in MemNode::Ideal_common(PhaseGVN*, bool) ()
#1  0x0000100000a8f912 in LoadNode::Ideal(PhaseGVN*, bool) ()
#2  0x0000100000b528fe in PhaseIterGVN::transform_old(Node*) ()
#3  0x0000100000b4f85c in PhaseIterGVN::optimize() ()
#4  0x0000100000a72acc in PhaseMacroExpand::expand_macro_nodes() ()
#5  0x000010000079fa0c in Compile::Optimize() ()
#6  0x00001000007a0c22 in Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool) ()
#7  0x000010000071792a in C2Compiler::compile_method(ciEnv*, ciMethod*, int) ()
#8  0x00001000007a6f5f in CompileBroker::invoke_compiler_on_method(CompileTask*) ()
#9  0x00001000007a7e2d in CompileBroker::compiler_thread_loop() ()
#10 0x0000100000c2cae1 in JavaThread::thread_main_inner() ()
#11 0x0000100000b09572 in java_start(Thread*) ()
#12 0x0000000000379ca6 in operator() (__closure=0xffffc0006e9d0e00)
    at ../../libc/pthread.cc:59
#13 std::_Function_handler::_M_invoke(const std::_Any_data &) (__functor=...)
    at ../../external/gcc.bin/usr/include/c++/4.8.1/functional:2071
#14 0x000000000033108b in main (this=0xffffc0006e642010) at ../../core/sched.cc:548
#15 sched::thread_main_c (t=0xffffc0006e642010) at ../../arch/x64/arch-switch.hh:133
#16 0x000000000031a5de in thread_main () at ../../arch/x64/entry.S:101

Here we caught the program running the JVM’s compiler code - note how “osv syms” helped us see symbol names for JVM functions (frames #0 to #11 above), but line number information is not available for the JVM (the shared object was compiled without it). Frames #12 to #16 are OSV code, which does have line number information.

Multiple virtual CPUs

Gdb sees the OSV guest through Qemu’s eyes. Qemu has one thread per virtual CPU in the guest VM, so this is what we’ll also see in gdb’s native thread commands such as “info threads”. For example, on a 2-vcpu guest we might see:

(gdb) info threads
  Id   Target Id         Frame
  2    Thread 2 (CPU#1 [running]) 0x0000200000287c97 in ?? ()
* 1    Thread 1 (CPU#0 [running]) 0x0000200000274e2a in ?? ()

To reiterate, these “threads” are actually vCPUs, not the threads running inside OSV, of which there are probably many more than two.

The vcpu marked with the asterisk (*) is the current one, and commands such as “where” (to see a call trace) are applied to the code currently running in that vcpu. To switch focus to another vcpu, use the “thread” command:

(gdb) thread 2
[Switching to thread 2 (Thread 2)]
#0  0x0000200000287c97 in ?? ()

(for some mysterious reason, one needs to use the info threads command at least once before being able to switch threads with the thread command).

The fact that our gdb support works with the unit of cpus, not OSV threads, makes it convenient to debug OSV itself, but less convenient to debug application code running inside OSV. Particularly problematic is single-stepping, which isn’t currently well supported: when you try to look at a single OSV thread (see the next section), a single-step operation (“s” or “n”) makes one step inside the vcpu, not the OSV thread, and this step might include switching to a different thread. One way to work around this problem is to compile OSV with preemption disabled (not recommended, but a convenient trick in some debugging situations). A better approach would be to include a gdb stub in OSV itself, allowing gdb to see OSV threads directly and single-step them. We’ve made some progress in implementing such a stub, but it isn’t ready for prime time yet.

Debugging aborts

When OSV detects a fatal condition or unexpected fault, it calls the abort() function. This prints a message and hangs the system - with the intention that a developer will attach gdb and look at where the abort() was called, how we got there (the stack trace), the values of various variables at the time, and so on. Afterwards, you can exit Qemu using Ctrl-a, x.

When the guest has just one vcpu, looking at the cause of the abort() is straightforward. For example:

(gdb) connect
(gdb) osv syms
(gdb) where
#0  0x000000000033f872 in halt_no_interrupts () at ../../arch/x64/processor.hh:232
#1  osv::halt () at ../../core/power.cc:20
#2  0x0000000000215142 in abort (msg=msg@entry=0x51a3c7 "Aborted\n")
    at ../../runtime.cc:98
#3  0x000000000021515e in abort () at ../../runtime.cc:89
#4  0x000000000032fe17 in sched::cpu::do_idle (this=this@entry=0xffffc0003ffea000)
    at ../../core/sched.cc:225
#5  0x0000000000330f38 in sched::cpu::idle (this=0xffffc0003ffea000)
    at ../../core/sched.cc:251
...

We can see here that the abort (frame #3) was called from sched::cpu::do_idle (sched.cc line 225). We can now, for example, ask gdb to look into the context of frame #4 (with the “fr 4” command), or use any of the usual gdb debugging tools.

When the guest has several vcpus, getting abort()’s stack trace becomes slightly trickier. The problem is that abort() was only called on one vcpu, so we need to switch to that vcpu (with gdb’s “thread” command) to see abort’s backtrace. On the other cpus we won’t see abort(), but rather an NMI handler (abort() sends NMIs to all other cpus to ensure that all CPUs stop). So we basically need to switch to each cpu in sequence, running “where” on each, and stopping when we find the one with the call to abort(). For example:

(gdb) connect
(gdb) osv syms
(gdb) where
(gdb) info threads
  Id   Target Id         Frame
  2    Thread 2 (CPU#1 [halted ]) 0x000000000031889a in halt_no_interrupts ()
    at ../../arch/x64/processor.hh:232
* 1    Thread 1 (CPU#0 [halted ]) 0x000000000033f872 in halt_no_interrupts ()
    at ../../arch/x64/processor.hh:232

Note that both cpus are “halted”, but we don’t know yet which was halted by abort() and which was halted by the NMI sent by abort(). Let’s try the first CPU first:

(gdb) where
#0  0x000000000031889a in halt_no_interrupts () at ../../arch/x64/processor.hh:232
#1  nmi (ef=0xffffc0003ffe7008) at ../../arch/x64/exceptions.cc:195
#2  
...

No, this one was halted by an NMI. So let’s try the second CPU:

(gdb) thread 2
[Switching to thread 2 (Thread 2)]
#0  0x000000000031889a in halt_no_interrupts () at ../../arch/x64/processor.hh:232
232        asm volatile ("cli; hlt" : : : "memory");
(gdb) where
#0  0x000000000033f872 in halt_no_interrupts () at ../../arch/x64/processor.hh:232
#1  osv::halt () at ../../core/power.cc:20
#2  0x0000000000215142 in abort (msg=msg@entry=0x51a3c7 "Aborted\n")
    at ../../runtime.cc:98
#3  0x000000000021515e in abort () at ../../runtime.cc:89
#4  0x000000000032fe17 in sched::cpu::do_idle (this=this@entry=0xffffc0003ffea000)
    at ../../core/sched.cc:225
...

So indeed here we can see where abort() was called. Note that the “info threads” command in the above example is necessary, even if you already know that there are two cpus and have no need to list them. This is because, surprisingly, gdb only learns about the existence of the second thread (vcpu) when the “info threads” command is run, and without it the “thread 2” command wouldn’t work.

The above process of finding the CPU on which abort() was called can be easily automated. In the future we’ll add such a function to loader.py.

Breakpoints

To use breakpoints, use the hbreak gdb command (not break).

To put a breakpoint early in the OSv run, a useful trick is to let OSv run past the breakpoint, and then set the breakpoint and reboot the VM:

(gdb) hbreak function_name
(gdb) monitor system_reset
(gdb) c

OSV threads

As we explained above, gdb sees everything that happens on one vcpu as being one “thread” (it is indeed one thread of Qemu). When we inspect each of these vcpus, we can see which code is running on it right now. But OSV also has many more threads (the OSV threads) that aren’t currently running on any CPU (there are many more threads than CPUs).

We want to be able to list the OSV threads, and also to switch to the context of one and inspect what it was last doing. We can’t use the “info threads” or “thread ..” commands, as those deal (as explained above) with cpus. So loader.py gives us two gdb extensions: osv info threads and osv thread.

“osv info threads” gives us the full list of osv threads:

(gdb) osv info threads
   0 (0xa297e88) cpu0 waking     mmu::tlb_flush() at core/mmu.cc:337 vruntime            0
   1 (0xffffc0003fea0000) cpu0 waiting    console::console_poll() at drivers/console.cc:88 vruntime            0
   2 (0xffffc0003fe8c000) cpu0 waiting    ?? at arch/x64/entry.S:101 vruntime     10002203
  timers: 1378759012.856634855
   3 (0xffffc0003fe78000) cpu0 running    osv::halt() at arch/x64/processor.hh:232 vruntime 9223372036854775807
   4 (0xffffc0003fe4f000) cpu1 running    ?? at arch/x64/entry.S:94 vruntime 9223372036854775807
   5 (0xffffc0003fe3b000) cpu1 waiting    ?? at arch/x64/entry.S:101 vruntime     10011081
  timers: 1378759012.815964699
...

For each thread we see:

  1. A numeric thread id
  2. The address of the thread structure (can be cast to sched::thread*).
  3. Which CPU this thread was last running on (or running on now, if it is running now).
  4. The thread’s state. For example “running” is a currently running thread (there can only be one like this for each cpu), “waiting” means the thread is sleeping until woken - e.g., a thread can be in this state while waiting on a mutex, condition variable, or a timer.
  5. Some attempt of showing what the thread was last doing.
  6. A list of timers which may wake this thread.

The above output of “osv info threads” didn’t tell us much about thread 2 (for example): what it was doing, or what it is waiting for. So we can use the new command “osv thread 2” to inspect thread 2:

(gdb) osv thread 2
(gdb) where
#0  sched::thread::switch_to (this=0x7e8, this@entry=0xffffc00035d0b000)
    at ../../arch/x64/arch-switch.hh:65
#1  0x0000000000330186 in sched::cpu::reschedule_from_interrupt (
    this=0xffffc0003ffea000, preempt=preempt@entry=false) at ../../core/sched.cc:148
#2  0x0000000000330995 in schedule (yield=true, this=)
    at ../../core/sched.cc:97
#3  sched::schedule (yield=yield@entry=true) at ../../core/sched.cc:387
#4  0x000000000033124c in wait (this=0xffffc0003fe8c000) at ../../core/sched.cc:581
#5  do_wait_until
    (mtx=..., pred=...) at ../../include/sched.hh:428
#6  wait_until (pred=...)
    at ../../include/sched.hh:439
#7  sched::cpu::load_balance (this=0xffffc0003ffea000) at ../../core/sched.cc:321

Frames #0 to #3 show that the last thing this thread did was to call schedule(), and eventually switch_to(), which switched to running a different thread. This is what we expect to see as the first frames for every thread which isn’t currently running (the last thing it did must have been to let another thread run). Frames #6 and #7 show us that this is a load_balance() thread (from sched.cc), and it is waiting on a wait_until() which was called in sched.cc line 321.

Caveat Emptor:

Since gdb only knows about vcpus, the “osv thread” command uses the vcpu that ran this thread, but overwrites its registers (instruction pointer, stack pointer, and frame pointer) to simulate a switch to that non-running thread. After this, the regular “info threads” and “thread” commands no longer show the original information they used to show.

More gdb extensions from loader.py

We’ve already seen above that the python script loader.py is used to add to gdb some OSV-specific commands, starting with the “osv” prefix, that are useful for debugging OSV. We’ve seen the osv syms, osv info threads, and osv thread commands. loader.py adds more commands which help inspect OSV’s internal state:

  • osv trace, osv logtrace, osv trace2file Shows the list of tracepoints currently in the tracepoint buffer. See the “Tracepoints” section below for an explanation of how to set up tracepoints, and how to view their results.
  • osv memory This command summarizes the amount of available memory. For example,
   (gdb)  osv memory
   Total Memory: 2147081216 Bytes
   Mmap Memory:  2125824 Bytes (0.10%)
   Free Memory:  1852944384 Bytes (86.30%)

The “Total Memory” is the amount of physical memory the hypervisor gave the virtual machine. “Free Memory” is how much of this is available for new allocations - i.e., the total memory minus the code size, and space taken up by allocations like malloc() and mmap(). “Mmap memory” counts just the amount of memory allocated by mmap (OSV doesn’t do lazy map population, so physical memory is always allocated for an entire mmap() region).
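The percentages in the output are simply each quantity divided by the total; a quick sanity check against the example numbers above:

```python
# recompute the percentages from the example "osv memory" output above
total = 2147081216       # Total Memory
mmap_bytes = 2125824     # Mmap Memory
free_bytes = 1852944384  # Free Memory

print(f"Mmap: {100 * mmap_bytes / total:.2f}%")  # Mmap: 0.10%
print(f"Free: {100 * free_bytes / total:.2f}%")  # Free: 86.30%
```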

  • osv heap This command shows the ranges of available physical memory. It can be useful if physical memory fragmentation is suspected to be causing problems (a large malloc() needs to find a large contiguous chunk of physical memory, and a large mmap() needs 2MB chunks of memory). If all you care about is the total amount of free memory, use “osv memory” instead. An example of osv heap’s output:
   (gdb) osv heap
   0xffffc000086d0000 0x000000002dc38000
   0xffffc0003636f000 0x0000000000004000
   0xffffc0003639d000 0x0000000000005000
   0xffffc000363a4000 0x000000000000e000
   …

Each line of output is a free physical memory chunk. The first column is the beginning of the free chunk (given as a virtual address equal to 0xffffc00000000000 plus the physical address), and the second column is the length of the free chunk (in hexadecimal).

  • osv mmap This command shows the currently mmap’ed ranges. An example of osv mmap’s output:
  (gdb) osv mmap
  ...
  0x0000200000000000 0x0000200000100000 [1024 kB]
  ...

We see a line for each mmap’ed address range. The first and second columns are the start and one-past-end (virtual) address of the mapped range. At the end we see the length of this range.
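The bracketed size is just the difference between the two addresses; for the example line above:

```python
start = 0x0000200000000000  # start of the mapped range
end   = 0x0000200000100000  # one-past-end of the mapped range

print((end - start) // 1024, "kB")  # 1024 kB, matching "[1024 kB]" above
```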

  • osv info callouts Shows information about “callouts”, a BSD-derived mechanism for scheduling work, which is used heavily in our network stack.
  • osv info virtio Shows information about virtio devices and the status of their queues. Example output:
(gdb) osv info virtio 
virtio::virtio_net at 0xffffc0003ff0ec00
  queue 0 at 0xffffc0003ff2ba00
    avail g=0xbe09 h=0xbe09 (0)
    used   h=0xbe09 g=0xbd11 (248)
    used notifications: enabled
  queue 1 at 0xffffc0003ff2bc00
    avail g=0xc951 h=0xc951 (0)
    used   h=0xc951 g=0xc8fd (84)
    used notifications: enabled
  queue 2 at 0xffffc0003ff2bd00
    avail g=0x0 h=0x0 (0)
    used   h=0x0 g=0x0 (0)
    used notifications: enabled
virtio::virtio_blk at 0xffffc0003fefd400
  queue 0 at 0xffffc0003fee5100
    avail g=0x15f h=0x15f (0)
    used   h=0x15f g=0x15f (0)
    used notifications: enabled

Here we have a single virtio-net device and a single virtio-blk device. The receive queue of the virtio-net device (queue 1) has 84 used descriptors that have not been processed by the guest yet, indicating that processing is either in progress, or that some bug (for example a deadlock) prevents the driver thread from processing the ring.
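The counts in parentheses appear to be the difference between the host (h) and guest (g) ring indices; since virtio ring indices are free-running 16-bit counters, the difference is taken modulo 2^16. A quick check against the numbers above:

```python
def pending(h, g):
    # virtio ring indices are free-running 16-bit counters; the number of
    # outstanding entries is their difference modulo 2**16
    return (h - g) & 0xffff

print(pending(0xc951, 0xc8fd))  # 84, the unprocessed used entries on queue 1
print(pending(0xbe09, 0xbd11))  # 248, the used count shown for queue 0
```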

  • osv thread apply all This command can be used to get more information about each thread than “osv info threads” supplies. A useful example is “osv thread apply all where”, which shows the backtrace of each OSV thread.
  • osv leak on|off|show This command is used to enable or disable memory leak detection, and to show the information collected so far by the leak detector. See “Leak Detection” section below on how to use memory leak detection.
  • osv pagetable walk <addr> Does a page table walk on the given address.
  • osv runqueue Shows the runqueue on each CPU. The runqueue is the list of threads which are ready to run as soon as the currently-running thread sleeps or is preempted (the currently-running thread is not listed on the runqueue). Each runqueue is listed in order of dynamic priority (vruntime) - the first listed thread would run next if the running thread went to sleep now. The threads listed by “osv runqueue” are the same as those listed as “queued” in the output of “osv info threads”.
  • osv percpu <expr> [fields to descend to] Prints the value of a PERCPU(typ, expr) variable for all CPUs. If you want a specific field from the variable, specify the field after the variable name. For example, (gdb) osv percpu memory::percpu_page_buffer would print the percpu_page_buffer value on all CPUs, and (gdb) osv percpu memory::percpu_page_buffer nr would print the nr field from percpu_page_buffer on all CPUs.

Core dumps

We currently only support attaching to a running guest - dumping a core file and debugging from it later isn’t currently supported.

Reading values of thread-local variables

For some unknown reason you cannot read values of thread-local variables in GDB:

gdb$ p preempt_counter 
Cannot find executable file `(...)/osv/build/debug/loader.elf' in dynamic linker's load module list

There is a workaround. Thread-local variables which are defined in the core object are placed relative to the thread's TCB block. To read them we need to find out the address of the thread's TCB and the offset.

First, find a function which reads the variable you want to check, and disassemble the access site:

gdb$ x/10i preempt_disable
   ...
   0x3429c6 <sched::preempt_disable()+4>:	add    DWORD PTR fs:0xfffffffffffff9c0,0x1

On x86, the fs register points to the TCB. The offset is 0xfffffffffffff9c0. We can read the TCB address from the sched::thread::_tcb field, so let's find out what the current thread is:

gdb$ bt
...
#6  0x0000000000346806 in sched::thread_main_c (t=0xffffc0003fe0e000) 

The pointer to current sched::thread is 0xffffc0003fe0e000. Now we can read variable value, assuming it's of type long:

gdb$ p *(long*)((ulong)((sched::thread*)0xffffc0003fe0e000)->_tcb + 0xfffffffffffff9c0)
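The fs-relative offset from the disassembly is really a small negative displacement printed as an unsigned 64-bit number; gdb's unsigned addition wraps around to the same address. A sketch of the arithmetic (the tcb value below is hypothetical, standing in for what you read from _tcb):

```python
RAW_OFFSET = 0xfffffffffffff9c0  # fs-relative offset from the disassembly
tcb = 0xffffc0003fe0e000         # hypothetical _tcb value for this session

# interpret the offset as a signed 64-bit displacement
offset = RAW_OFFSET - (1 << 64)
print(hex(offset))               # -0x640, i.e. 1600 bytes below the TCB

# gdb's unsigned add wraps modulo 2**64 to the same address
addr = (tcb + RAW_OFFSET) & ((1 << 64) - 1)
assert addr == tcb + offset
```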

Debug printouts, aborts and assertions

Tracepoints

To add a static tracepoint, use the following code:

#include <osv/trace.hh> 
TRACEPOINT(trace_foo, "x=%x y=%d", u64, int);
 ...

void foo(u64 x, int y)
{
   trace_foo(x, y);
   ...
}

Where

  • trace_foo: an internal identifier (normally begins with “trace_”).
  • foo: the name of the tracepoint as it will be visible in the log (derived from the identifier by dropping the “trace_” prefix)
  • u64, int: parameters for the tracepoint
  • "x=%x y=%d": format string for the tracepoint; size modifiers unneeded

To enable tracepoints at runtime, use the --trace= switch:

scripts/run.py --trace=sched\*

Use commas to separate multiple tracepoint names; shell-style wildcards allow enabling a whole group of tracepoints at once (as in the example).

If you want to gather a backtrace on every tracepoint call, pass the --trace-backtrace option. You will then be able to see where each trace sample was called from.

To view a trace, connect with gdb, and:

(gdb) osv syms
(gdb) set pagination off
(gdb) set logging on
(gdb) osv trace

gdb.txt will contain the trace.

Alternatively, you can save trace to file and use scripts/trace.py to view it:

(gdb) osv trace save trace_file

To list all trace samples:

$ scripts/trace.py list trace_file

For more information, check out Trace analysis using trace.py.

Tracepoint control via REST Api

Available trace points and their status (on/off, backtrace) can be listed through the REST API.

  • GET http://<address>:<port>/trace/status[?match=<regex>]
    Returns a list of available trace points. You can use an optional regular expression to match against the trace point id to control the returned set contents.
  • GET http://<address>:<port>/trace/event/{eventid}
    Returns information for a single trace point based on the eventid parameter

Trace point status can also be controlled, during run time, using either regex selectors or for individual ids:

  • POST http://<address>:<port>/trace/status?enabled=<bool>[&backtrace=<bool>][&match=<regex>]
    Sets status for a set of trace points matching the optional regex (or all, if empty).

  • POST http://<address>:<port>/trace/event/{eventid}?enabled=<bool>[&backtrace=<bool>]
    Sets status for a single trace point based on the eventid parameter.

Both modifying calls return the previous status for the affected trace points.
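As a sketch, a client could assemble these URLs like the following (the host, port, and boolean formatting are assumptions; adjust them to wherever your OSv httpserver instance listens):

```python
from urllib.parse import urlencode

def trace_status_url(host, port, enabled, match=None, backtrace=None):
    # assemble the /trace/status query string described above
    params = {"enabled": "true" if enabled else "false"}
    if backtrace is not None:
        params["backtrace"] = "true" if backtrace else "false"
    if match is not None:
        params["match"] = match
    return f"http://{host}:{port}/trace/status?{urlencode(params)}"

url = trace_status_url("192.168.122.89", 8000, enabled=True, match="sched.*")
# POST this URL (e.g. with urllib.request and method="POST") to enable all
# trace points whose id matches sched.*
```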

Downloading trace data via REST

Trace data can be downloaded in the trace dump format via the rest call: GET http://<address>:<port>/trace/buffers

This will collect all available trace point data from the instance, along with various meta information, and return this as a binary blob. The data can be used and visualized with the tracing scripts

Dump format

The trace dump consists of a series of RIFF/IFF-like chunks, the file itself being one (thus embeddable in a binary stream).

Note: All chunks are aligned on eight bytes (8, sizeof(uint64_t)). Otherwise, data is aligned on its "natural" alignment. Semi-structs such as strings are aligned on their first member.

Chunks are typed with a 32-bit FOURCC tag, written in native endian, i.e. 'ROCK' on LE becomes the string "KCOR". Size is a somewhat overzealous uint64_t value, thus in effect there is a "reserved" 32-bit value between tag and size.

Any chunk type may appear more than once, and in no particular order, except that actual trace data chunks should appear last.


string = {
  uint16_t len;
  char data[len];
};

dump =  {
  uint32_t tag = 'OSVT';
  uint64_t size = ;
  uint32_t format_version;

  trace_dictionary =  {
    uint32_t tag = 'TRCD';
    uint64_t size = ;
    uint32_t backtrace_len; // how long are all backtraces
    // array of trace point definitions
    uint32_t n_types;
    struct {
      uint64_t tag;
      string id;
      string name;
      string prov;
      string format;
      // array of arguments
      uint32_t n_args;
      struct {
        string name;
        char type;
      } [n_args];
    } [n_types];
  } +; // one or more, may repeat

  loaded_modules =  {
    uint32_t tag = 'MODS';
    uint64_t size = ;
    // array of module info
    uint32_t n_modules;
    struct {
      string path;
      uint64_t base;
      // array of loaded segments
      uint32_t n_segments;
      struct {
        string name;
        uint32_t type;
        uint32_t info;
        uint64_t flags;
        uint64_t address;
        uint64_t offset;
        uint64_t size;
      } [n_segments];
    } [n_modules];
  } *; // zero or more, may repeat

  symbol_table =  {
    uint32_t tag = 'SYMB';
    uint64_t size = ;
    // array of symbol entries
    uint32_t n_symbols;
    struct {
      string name;
      uint64_t address;
      uint64_t size;
    } [n_symbols];
  } *; // zero or more, may repeat

  trace_data =  {
    uint32_t tag = 'TRCS';
    uint64_t size = ;
    
    //
  } +; // 1 or more
};
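To make the layout concrete, here is a minimal sketch of how a reader might decode a chunk header and a string, assuming the dump was produced on a little-endian machine (so the four raw tag bytes appear reversed, as noted above):

```python
import struct

def align8(n):
    # all chunks are aligned on 8 bytes (sizeof(uint64_t))
    return (n + 7) & ~7

def read_chunk_header(buf, off):
    # 4-byte native-endian FOURCC tag, 4 reserved bytes (padding before the
    # uint64_t), then the 8-byte chunk size
    tag = buf[off:off + 4][::-1].decode("ascii")  # undo little-endian reversal
    (size,) = struct.unpack_from("<Q", buf, off + 8)
    return tag, size, off + 16

def read_string(buf, off):
    # the "string" semi-struct: a uint16_t length followed by the bytes
    (n,) = struct.unpack_from("<H", buf, off)
    return buf[off + 2:off + 2 + n].decode(), off + 2 + n

# a fabricated header: 'OSVT' stored little-endian, reserved gap, size = 4
hdr = b"TVSO" + b"\0" * 4 + struct.pack("<Q", 4)
tag, size, payload_off = read_chunk_header(hdr, 0)
# tag == "OSVT", size == 4, payload data starts at offset 16
```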



Tracepoint counts

Tracepoint call-graph histograms

Function tracing

To trace all function entries/returns in the program, build with conf-tracing=1 (a clean build is needed), and enable the "function*" tracepoints, e.g. with --trace=function\*.

Memory leak detection

Memory allocation tracking can be enabled/disabled with the gdb commands "osv leak on" and "osv leak off", but even easier is to add the "--leak" parameter to the loader, to have it start leak detection when entering the payload (not during OSv boot). For example:

scripts/run.py -e "--leak java.so -jar cli.jar"
scripts/run.py -e "--leak tests/tst-leak.so"

Leak detection can be used in either the debug or release build, though note that the release builds' optimizations may somewhat modify the reported call traces.

When the run ends, connect to OSV with the debugger to see where the remaining allocations come from:

$ gdb build/release/loader.elf       # or build/debug/loader.elf
(gdb) connect
(gdb) osv syms
(gdb) set logging on    # optional, if you want to save the output to a file
(gdb) set height 0      # optional, useful if using "set logging on"
(gdb) osv leak show

Please note that "osv leak show" can take a L-O-N-G time - even a few minutes, especially when running Java, as it allocates myriads of objects that GDB will now have to inspect.

Memory debugging

Set conf-debug_memory=1 in conf/base.mak, and perform a clean build. A use-after-free will result in an immediate segmentation fault.

Note that the detector requires a lot of memory, so you may need to start a larger guest.

Profiling

You can use 'perf kvm' for profiling:

perf kvm --guestvmlinux=build/release/loader.elf top

You can also use a timer-based sampling profiler described here.

Profiling Java applications

This section describes how to profile a JVM inside OSv using jvisualvm, a standard monitoring tool which has shipped with the OpenJDK since version 1.6.

First you need to start the JVM in the guest with JMX exposed by passing the following options in the java command line:

-Djava.rmi.server.hostname=192.168.122.89
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=9010
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false

This assumes that the VM has IP address 192.168.122.89 and that the JMX server will listen on port 9010.

Then start jvisualvm on the host. It ships with the JDK and should be on the PATH.

To connect with the JVM inside OSv you need to first configure a connection:

  • Right click on the "Remote" node in the "Applications" tab on the left and choose "Add Remote Host...". A dialog opens. In the "display name" field type some name, e.g. "osv", and click "OK".
  • Right click on the newly created node representing the remote host and choose "Add JMX connection...". A dialog opens. In the "Connection" field enter "192.168.122.89:9010" and click "OK".
  • Right click on the newly created node, which will have a JMX URL label, and choose "Open". This will create a new tab in the main panel for this JVM.

To start sampling, go to the "Sampler" tab and click on the "CPU" button. By default, thread stacks are sampled every 100ms and the view is updated every second. You can change that in the settings: tick the "Settings" checkbox in the upper right corner.

At any time you can create a snapshot by clicking on the “Snapshot” button just above the samples table.

ZFS Stats

There is a GDB command that shows ZFS statistics in general (e.g. ZFS tunables, ARC stats) which may be useful for understanding the ZFS behavior.

$ gdb build/release/loader.elf       # or build/debug/loader.elf
(gdb) connect
(gdb) osv zfs