Skip to content

dkogan/FindGlobals

master
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Overview

This is a proof-of-concept program that is able to find all global state in a program by parsing its debug information (which is assumed to be available).

The global state I’m looking for is all persistent data:

  • globals variables (static or otherwise)
  • static local variables in functions

I’m only interested in data that can change, so I look only at the writeable segment. This program is beta-quality, but works for me, and is very useful.

Example

tst1.c:

// ....
struct S1 v1[10];

volatile const char volatile_const_char;
const volatile char const_volatile_char;
const char          const_char = 1;

void print_tst1(void)
{
    showvar(v1);
    showvar(volatile_const_char);
    showvar(const_volatile_char);
    showvar_readonly(const_char);
}

tst2.c:

// ....
int v2 = 5;

const char*       const_char_pointer       = NULL;
const char const* const_char_const_pointer = NULL;
char const*       char_const_pointer       = NULL;

const        char  const_string_array[]        = "456";
static const char  static_const_string_array[] = "abc";

struct S0 s0;
int       x0[0];
int       x1[30];



void print_tst2(void)
{
    static int    static_int_5[5];
    static double static_double;
    double dynamic_d;

    showvar(static_int_5);
    showvar(static_double);
    showvar(v2);
    showvar(const_char_pointer);
    showvar_readonly(const_char_const_pointer);
    showvar_readonly(char_const_pointer);
    showvar_readonly(const_string_array);
    showvar_readonly(static_const_string_array);
    showvar(s0);
    showvar(x0);
    showvar(x1);
}

show_globals_from_this_process.c:

// ...
#include "getglobals.h"

int main(int argc, char* argv[])
{
    printf("=================== Program sees: ===================\n");

    print_tst1();
    print_tst2();

    printf("=================== DWARF sees: ===================\n");

    get_addrs_from_this_process_from_DSO_with_function((void (*)())&print_tst1, "tst");
    return 0;
}

Results:

$ ./show_globals_from_this_process 2>/dev/null

=================== Program sees: ===================
v1 at 0x55756bc8e240, size 80
volatile_const_char at 0x55756bc8e220, size 1
const_volatile_char at 0x55756bc8e290, size 1
readonly const_char at 0x55756ba8cc65, size 1
static_int_5 at 0x55756bc8e040, size 20
static_double at 0x55756bc8e038, size 8
v2 at 0x55756bc8e010, size 4
const_char_pointer at 0x55756bc8e030, size 8
readonly const_char_pointer_const at 0x55756ba8cbd0, size 8
readonly char_pointer_const at 0x55756ba8cbc8, size 8
readonly const_string_array at 0x55756ba8cbc4, size 4
readonly static_const_string_array at 0x55756ba8cbc0, size 4
s0 at 0x55756bc8e120, size 100
x0 at 0x55756bc8e184, size 0
x1 at 0x55756bc8e1a0, size 120
=================== DWARF sees: ===================
v2 at 0x55756bc8e010, size 4
const_char_pointer at 0x55756bc8e030, size 8
readonly const_char_pointer_const at 0x55756ba8cbd0, size 8
readonly char_pointer_const at 0x55756ba8cbc8, size 8
readonly const_string_array at 0x55756ba8cbc4, size 4
readonly static_const_string_array at 0x55756ba8cbc0, size 4
s0 at 0x55756bc8e120, size 100
x1 at 0x55756bc8e1a0, size 120
static_int_5 at 0x55756bc8e040, size 20
static_double at 0x55756bc8e038, size 8
v1 at 0x55756bc8e240, size 80
volatile_const_char at 0x55756bc8e220, size 1
const_volatile_char at 0x55756bc8e290, size 1
readonly const_char at 0x55756ba8cc65, size 1


dima@shorty:$ ./show_globals_from_this_process 2>/dev/null | sort | uniq -c | sort

      1 =================== DWARF sees: ===================
      1 =================== Program sees: ===================
      1 x0 at 0x55be5ba3d184, size 0
      2 const_char_pointer at 0x55be5ba3d030, size 8
      2 const_volatile_char at 0x55be5ba3d290, size 1
      2 readonly char_pointer_const at 0x55be5b83bbc8, size 8
      2 readonly const_char at 0x55be5b83bc65, size 1
      2 readonly const_char_pointer_const at 0x55be5b83bbd0, size 8
      2 readonly const_string_array at 0x55be5b83bbc4, size 4
      2 readonly static_const_string_array at 0x55be5b83bbc0, size 4
      2 s0 at 0x55be5ba3d120, size 100
      2 static_double at 0x55be5ba3d038, size 8
      2 static_int_5 at 0x55be5ba3d040, size 20
      2 v1 at 0x55be5ba3d240, size 80
      2 v2 at 0x55be5ba3d010, size 4
      2 volatile_const_char at 0x55be5ba3d220, size 1
      2 x1 at 0x55be5ba3d1a0, size 120

In the example above we see the test program report the locations and addreses of the data in the test DSO. Then the instrumentation reports the addresses and data that it sees. These should be identical: the instrumentation should figure out the correct addresses. We then sort and count duplicate lines in the output. If the two sets of reports were identical, we should see all lines appear 2 times. We see this with 2 trivial exceptions:

  • The lines containing ======= are delimiters
  • x0 has size 0, so the instrumentation doesn’t even bother to report it

It is also possible to query non-loaded ELF files by running a different utility, and passing the query DSO on the commandline:

dima@shorty:$ ./show_globals_from_elf_file tst.so 2>/dev/null

v2 at 0x212028, size 4
const_char_pointer at 0x212050, size 8
readonly const_char_pointer_const at 0x10cc8, size 8
readonly char_pointer_const at 0x10cc0, size 8
readonly const_string_array at 0x10cbc, size 4
readonly static_const_string_array at 0x10cb8, size 4
s0 at 0x212080, size 100
x1 at 0x212100, size 120
static_int_5 at 0x212060, size 20
static_double at 0x212058, size 8
v1 at 0x2121a0, size 80
volatile_const_char at 0x212180, size 1
const_volatile_char at 0x2121f0, size 1
readonly const_char at 0x10d60, size 1

This is the same data, but since this DSO wasn’t loaded, the addresses all have offsets.

Finally, the variable ranges can be visualized with something like this:

dima@shorty:$ ./show_globals_from_elf_file tst.so 2>/dev/null | \
               awk -n '!/readonly/ && /, size/ \
                 { addr = and($3,0xFFFFFFFF); \
                   print addr,"range", 0,addr, addr+$5; \
                   print addr,"labels",1,$1 }' | \
               feedgnuplot --domain --dataid --rangesize range 3 --rangesize labels 2 \
                           --style range 'with xerrorbars' \
                           --style labels 'with labels' \
                           --ymin -2 --ymax 3 \
                           --set 'xtics format "0x%08x"'

Notes

  • The example in this tree works with both static linking (show_globals_from_this_process) and dynamic linking (show_globals_from_this_process_viaso).
  • This is proof-of-concept software. It’s an excellent starting point if you need this functionality, but do thoroughly test it for your use case.
  • This was written with C code in mind, and tested to work with Fortran as well. I can imagine that C++ can produce persistent state in more ways. Again, test it thoroughly
  • The libraries I’m using to parse the DWARF are woefully underdocumented, and I’m probably not doing everything 100% correctly. At the risk of repeating myself: test it thoroughly.
  • For ELF objects linked into the executable normally (whether statically or dynamically) this works. If we’re doing something funny like loading libraries ourselves with libdl, then it probably works too, but I’d test it before assuming.

This effort uncovered a bug in gcc, and one in elfutils.

gcc bug

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78100

It turns out that gcc erroneously omits the sizing information if you have an array object pre-declared as an extern of unknown size (as was done in my specific library). So if you have this source:

#include <stdio.h>

extern int s[];
int s[] = { 1,2,3 };

int main(void)
{
    printf("%zd\n", sizeof(s));
    return 0;
}

then the object produced by gcc <= 6.2 doesn’t know that s has 3 integers.

elfutils bug

Apparently elfutils was misreporting the sizes of multi-dimensional arrays. Reported and fixed here:

https://sourceware.org/bugzilla/show_bug.cgi?id=22546

Copyright and License

Copyright 2016 Dima Kogan 2017-2018 California Institute of Technology

Released under an MIT-style license:

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so.

About

Finds all global state in a program by looking at the DWARF data

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published