crossbowerbt edited this page Oct 10, 2011 · 27 revisions

For good and lazy programmers, this section documents the library following the approach "one function, one example".

For every feature exposed by the library you will find a brief explanation and an example of how to use it... Let's start ;)

read_string(address, count)

This function tries to read an ASCII string, no longer than count bytes, from the process memory. You can safely provide a large count value, since the function stops at the terminating NUL character.

Example:

rax_address = gdb.parse_and_eval("$rax")
string = gdb_utils.read_string(rax_address, 1024)
print string

In the example we first read the address held in the RAX register, which points to the string, and then pass that address to the function to read the actual string. The result can be safely printed.
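The NUL-termination behaviour can also be illustrated outside GDB. The following is only a minimal sketch, not the library's implementation: cut_at_nul is a hypothetical helper that works on a byte buffer that has already been read from memory.

```python
def cut_at_nul(buf, count):
    """Return the ASCII string stored in buf, stopping at the first NUL
    byte or after count bytes, whichever comes first.

    Hypothetical helper for illustration; not part of gdb_utils.
    """
    data = bytes(buf[:count])
    end = data.find(b'\x00')      # -1 when no terminator is present
    if end != -1:
        data = data[:end]
    return data.decode('ascii', 'replace')


raw = b'/bin/cat\x00leftover bytes'
full = cut_at_nul(raw, 1024)   # a large count stops at the terminator
short = cut_at_nul(raw, 4)     # a small count stops after count bytes
```

This is why a generous count is harmless: the terminator, when present, always wins.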

execute_output(command)

This is perhaps the most important function of the library, on which the majority of the other functions are built.

Its behavior is very similar to gdb.execute(), since both functions execute a GDB command, but execute_output also returns the output of the command.

This is extremely important because it lets you exploit many GDB features that are not exported by the standard gdb Python library.

Example:

output = gdb_utils.execute_output('info registers')
print output

The result will be:

rax            0x610	1552
rbx            0x8000	32768
rcx            0x60f020	6352928
rdx            0x8000	32768
rsi            0x610000	6356992
rdi            0x10000	0
...

You can use this function to examine process status, control its execution or set GDB options: the possibilities are endless...
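Because execute_output returns plain text, the caller usually has to parse it. As a rough sketch, assuming the output looks like the 'info registers' listing above, a hypothetical parse_registers helper (not part of gdb_utils) could turn it into a dictionary:

```python
def parse_registers(output):
    """Turn 'info registers' text into a {name: value} dictionary.

    Hypothetical helper; it only reads the name and the hexadecimal
    column, and skips lines that do not look like register rows.
    """
    registers = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1].startswith('0x'):
            registers[fields[0]] = int(fields[1], 16)
    return registers


# Two rows copied from the listing above, plus the elided remainder.
sample = (
    "rax            0x610\t1552\n"
    "rbx            0x8000\t32768\n"
    "...\n"
)
registers = parse_registers(sample)
```

Inside GDB the sample string would be replaced by the value returned from gdb_utils.execute_output('info registers').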

execute_external(command), execute_external_output(command)

These functions are just utilities, since they do not use GDB. They simply execute an external shell command with the possibility to capture its output.

They can be useful if you want to call an external program to analyze system status, or other useful things that help the debugging activity.

Example:

gdb_utils.execute_external('kill -9 <pid>')

output = gdb_utils.execute_external_output('free')
print output

For the second call the result will be similar to:

             total       used       free     shared    buffers     cached
Mem:       4062352     819340    3243012          0      36244     336448
-/+ buffers/cache:     446648    3615704
Swap:            0          0          0
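The captured text can then be processed like any other string. The sketch below assumes the column layout shown in the sample above (newer versions of free print different columns); parse_free_mem is a hypothetical helper, not part of gdb_utils.

```python
def parse_free_mem(output):
    """Parse the 'Mem:' row of 'free' output into a dictionary.

    Hypothetical helper; assumes the first line names the columns and
    the 'Mem:' row holds one integer per column.
    """
    lines = output.splitlines()
    if not lines:
        return {}
    columns = lines[0].split()
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == 'Mem:':
            return dict(zip(columns, (int(f) for f in fields[1:])))
    return {}


free_output = (
    "             total       used       free     shared    buffers     cached\n"
    "Mem:       4062352     819340    3243012          0      36244     336448\n"
    "-/+ buffers/cache:     446648    3615704\n"
    "Swap:            0          0          0\n"
)
mem = parse_free_mem(free_output)
```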

search_functions(regex='')

This function searches the functions of the program and returns their names and addresses. It's possible to specify a regular expression to filter out unwanted results.

The return value is a Python dictionary, where the key is the name and the value is the address of the function.

Example:

functions = gdb_utils.search_functions('@plt$')
print functions

The output (a bit reformatted...):

{
  'open@plt': 4200224,
  'fwrite@plt': 4200112,
  'fclose@plt': 4200080,
  'mbrtowc@plt': 4199328,
  '__cxa_atexit@plt': 4199616,
  'malloc@plt': 4199536,
  'realloc@plt': 4200128,
  'strlen@plt': 4199712,
  ...
}

In this case we only searched for functions contained in the Procedure Linkage Table, but the regular expression is totally arbitrary.
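Since the result is a plain dictionary, it can also be filtered again after the call. The following sketch assumes re.search semantics, which may differ from what gdb_utils does internally; filter_names is a hypothetical helper and the address given for 'main' is made up for illustration.

```python
import re


def filter_names(symbols, regex=''):
    """Keep only the entries of a {name: address} dictionary whose name
    matches regex. An empty regex matches every name.

    Hypothetical helper, not part of gdb_utils.
    """
    pattern = re.compile(regex)
    return dict((name, addr) for name, addr in symbols.items()
                if pattern.search(name))


symbols = {
    'open@plt': 4200224,
    'malloc@plt': 4199536,
    'main': 4201000,        # made-up address, for illustration only
}
plt_only = filter_names(symbols, '@plt$')
```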

search_processes(regex='')

This function returns a list of the currently running processes, optionally filtered by the provided regular expression.

It's based on the external ps command, whose output is parsed into a (slightly complex) data structure. The data structure is essentially a list of dictionaries: every dictionary contains information about one process.

The available fields are (snippet of code from the library):

# add process info to the list
processes.append({
    'user': field[0],
    'pid': int(field[1]),
    'percentage_cpu': eval(field[2]),
    'percentage_mem': eval(field[3]),
    'vsz': int(field[4]),
    'rss': int(field[5]),
    'tty': field[6],
    'stat': field[7],
    'start': field[8],
    'time': field[9],
    'command': field[10],
    'args': field[11:] if len(field) > 11 else ''
})

Example:

processes = gdb_utils.search_processes('^g')
print processes

The output (slightly reformatted...):

[
  {
   'tty': '?',
   'pid': 3453,
   'vsz': 280220,
   'args': '',
   'percentage_mem': 0.29999999999999999,
   'stat': 'Ssl',
   'start': '08:37',
   'command': 'gnome-settings-daemon',
   'user': 'geek',
   'time': '0:00',
   'percentage_cpu': 0.0,
   'rss': 13636
  },
  {
   'tty': '?',
   'pid': 3473,
   'vsz': 168484,
   'args': '',
   'percentage_mem': 0.0,
   'stat': 'Ss',
   'start': '08:37',
   'command': 'gnome-screensaver',
   'user': 'geek',
   'time': '0:00',
   'percentage_cpu': 0.0,
   'rss': 2816
  },
  ...
]

The regular expression is applied only to the process command field.
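Working with the returned list needs nothing beyond ordinary list and dictionary operations. A small sketch, using a trimmed copy of the sample output above; pids_by_command is a hypothetical helper, not part of gdb_utils.

```python
def pids_by_command(processes, command):
    """Return the PIDs of every process whose 'command' field equals
    command. Hypothetical helper, not part of gdb_utils."""
    return [p['pid'] for p in processes if p['command'] == command]


# Trimmed copy of the sample output (only the fields used here).
processes = [
    {'pid': 3453, 'command': 'gnome-settings-daemon', 'rss': 13636},
    {'pid': 3473, 'command': 'gnome-screensaver', 'rss': 2816},
]

screensaver_pids = pids_by_command(processes, 'gnome-screensaver')
largest = max(processes, key=lambda p: p['rss'])   # biggest resident set
```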

parse_disassembled_output(output, regex='')

This function is an internal utility, and should be used only if you are writing an assembly parsing function.

gdb_utils already contains several functions to get the assembly code of programs; they are discussed below.

disassemble functions

Wow, a lot of functions... We will treat them all in this section since they are similar.

All these functions are based on parse_disassembled_output, and all serve to disassemble parts of the program memory and obtain the corresponding assembly instructions.

All the functions take an optional regular expression, so that only the instructions that interest us are returned.

Let's go through them one by one...

disassemble_function(func_name, regex='')

This function disassembles a function. The func_name parameter can be either the name of the function or a memory address contained in the function; in the second case, take care to prefix the address with the character '*'.

Example:

instructions = gdb_utils.disassemble_function('main')
instructions = gdb_utils.disassemble_function('*0x40100a', 'mov.+%eax')
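If the address is available as a number rather than a string, the '*' form has to be built by hand. A minimal sketch, where function_argument is a hypothetical helper and not part of gdb_utils:

```python
def function_argument(target):
    """Build the func_name argument for disassemble_function.

    Names are passed through unchanged; integer addresses are rendered
    as '*0x...'. Hypothetical helper, not part of gdb_utils.
    """
    if isinstance(target, int):
        return '*0x%x' % target
    return target


by_name = function_argument('main')
by_address = function_argument(0x40100a)
```

The resulting string would then be passed as the first argument of gdb_utils.disassemble_function.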

disassemble_range(start, end, regex='')

This function disassembles a memory range. The start and end parameters are the numeric start and end addresses of the range.

Example:

instructions = gdb_utils.disassemble_range(4010090, 4010100)

disassemble_count(start, count, regex='')

This function is similar to disassemble_range, since it also disassembles a memory range, but instead of the end address you specify the total number of instructions to disassemble.

Example:

instructions = gdb_utils.disassemble_count(4010090, 5)

disassemble_current_instruction(regex='')

This function disassembles the current instruction, that is, the one pointed to by the program counter register.

Example:

curr_inst = gdb_utils.disassemble_current_instruction()

disassemble_current_instructions(count, regex='')

This function is similar to disassemble_current_instruction but, instead of returning a single instruction, it disassembles count instructions starting from the one pointed to by the program counter register.

Example:

curr_insts = gdb_utils.disassemble_current_instructions(4)

output of the disassemble functions

All the functions that disassemble memory regions return their output in the same format.

The returned data structure is a dictionary, where the key is the address of the instruction and the value is its mnemonic code.

Example:

instructions = gdb_utils.disassemble_function('read')
print instructions

The output (slightly reformatted...):

{
 4199520: 'jmpq *0x20bdfa(%rip) # 0x60d260 <read@got.plt>',
 4199531: 'jmpq 0x401370',
 4199526: 'pushq $0xe'
}

You may need to sort the keys of the dictionary if you want to access the instructions in order:

instructions = gdb_utils.disassemble_function('read')
keys = instructions.keys()
keys.sort()
for key in keys:
    print instructions[key]

The output:

jmpq *0x20bdfa(%rip) # 0x60d260 <read@got.plt>
pushq $0xe
jmpq 0x401370
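The same dictionary can be searched after the fact, for example to list only the jumps in address order. This is a sketch using the sample output above; matching_instructions is a hypothetical helper, not part of gdb_utils.

```python
import re


def matching_instructions(instructions, regex):
    """Return (address, text) pairs, sorted by address, for every
    instruction whose text matches regex. Hypothetical helper."""
    pattern = re.compile(regex)
    return [(addr, text) for addr, text in sorted(instructions.items())
            if pattern.search(text)]


# Copy of the sample dictionary shown above.
instructions = {
    4199520: 'jmpq *0x20bdfa(%rip) # 0x60d260 <read@got.plt>',
    4199531: 'jmpq 0x401370',
    4199526: 'pushq $0xe',
}
jumps = matching_instructions(instructions, r'^jmpq')
```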

process_mappings(regex='')

This function returns the currently mapped memory regions of the debugged process, optionally filtered by a regular expression matched against the name of the memory region.

The output is similar to /proc/<pid>/maps but actually uses the internal GDB command 'info proc mappings'.

Example:

mappings = gdb_utils.process_mappings()
print mappings   

The output (slightly reformatted...):

[
  {
   'start': 4194304,
   'offset': 0,
   'end': 4247552,
   'objfile': '/bin/cat',
   'size': 53248
  },
  {
   'start': 6344704,
   'offset': 53248,
   'end': 6348800,
   'objfile': '/bin/cat',
   'size': 4096
  },
  {
   'start': 6348800,
   'offset': 0,
   'end': 6483968,
   'objfile': '[heap]',
   'size': 135168
  },
  ...
]
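A typical use of this list is finding which region an address belongs to. The sketch below assumes that 'end' is exclusive, which is consistent with size being equal to end minus start in the sample; find_mapping is a hypothetical helper, not part of gdb_utils.

```python
def find_mapping(mappings, address):
    """Return the mapping whose [start, end) range contains address,
    or None when no mapping matches. Hypothetical helper."""
    for mapping in mappings:
        if mapping['start'] <= address < mapping['end']:
            return mapping
    return None


# Copy of the sample output shown above.
mappings = [
    {'start': 4194304, 'offset': 0, 'end': 4247552,
     'objfile': '/bin/cat', 'size': 53248},
    {'start': 6344704, 'offset': 53248, 'end': 6348800,
     'objfile': '/bin/cat', 'size': 4096},
    {'start': 6348800, 'offset': 0, 'end': 6483968,
     'objfile': '[heap]', 'size': 135168},
]

code_region = find_mapping(mappings, 4200224)   # address of open@plt
heap_region = find_mapping(mappings, 6348800)   # first byte of the heap
```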

assemble_instructions(instructions)

This function assembles the given assembly listing and returns a buffer containing the assembled machine code.

It's based on the external tool gcc, which is present on almost every *nix system on earth. This allows the assembly code to be written in a very flexible way.

Example:

asm_code = '''
    movq $0x100,%rax
    push %rax
  label_1:
    pop  %rbx
    jmp label_1
'''

payload = gdb_utils.assemble_instructions(asm_code)

The assembled payload:

48 c7 c0 00 01 00 00 50 5b eb fd

The assembly code must be in AT&T style.
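To display a payload in the byte notation used above, a short formatter is enough. This sketch assumes the returned buffer behaves like a byte string; hex_bytes is a hypothetical helper, and the payload literal simply repeats the bytes listed above.

```python
def hex_bytes(payload):
    """Format a byte buffer as space-separated two-digit hex values."""
    return ' '.join('%02x' % b for b in bytearray(payload))


# The bytes from the assembled payload shown above.
payload = b'\x48\xc7\xc0\x00\x01\x00\x00\x50\x5b\xeb\xfd'
dump = hex_bytes(payload)
```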

normalized_argv()

This is just a utility to normalize sys.argv. You should use it when launching a script with gdb -P as the interpreter.

Usage:

sys.argv = gdb_utils.normalized_argv()

If you want to know what problem this solves, see the "formal" documentation.