
BinaryHardening/cfgrip


cfgrip

PE/ELF x86/x64 CFG extractor. Takes a binary in, disassembles it, resolves its jumps and calls, including indirect ones (via GOT lookup, jump-table decoding, and register tracing), and exports the full control flow graph as structured JSON.

why

You need to know exactly where every branch goes, not to read the code but to patch it. Take the addresses from the JSON, locate the exact instruction you need to hook or modify, rewrite it with Zydis or AsmJit, and write it back. Anti-cheat teams use it to map out game binaries. RE people use it to lift code into their own analysis pipelines. Software analysts trace execution paths without running the binary.

cfgrip gives you the map. What you do with it is up to you.

what it extracts

For every binary cfgrip processes, it produces:

  • Function list with addresses, optional end address (from .pdata), names (entry point, exports, discovered), and thunk tagging (PLT stubs)
  • Basic blocks per function — instruction sequences terminated by branches, returns, or traps (calls stay inside a block, as the sample output below shows)
  • Control flow edges — successors of each block (direct branches, fall-through, indirect targets)
  • Import table — resolved library imports with addresses
  • Indirect targets — GOT-resolved calls, jump table entries, register-traced branches, and unresolved ones (marked as such)
  • Function boundaries — multiple detection passes (prolog patterns, call targets, tail calls, .pdata entries, data-section function pointers) for maximum coverage
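The .pdata pass gives exact boundaries because each x64 RUNTIME_FUNCTION record is three little-endian 32-bit RVAs: BeginAddress, EndAddress (exclusive) and UnwindInfoAddress. Below is a minimal sketch of decoding those records. It assumes you already have the raw .pdata bytes and the image base from the PE optional header; parsePdata and FunctionRange are hypothetical names, not cfgrip's code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Function extent recovered from one RUNTIME_FUNCTION record (hypothetical type).
struct FunctionRange {
    uint64_t start;  // VA of the first byte of the function
    uint64_t end;    // VA one past the last byte (EndAddress is exclusive)
};

// Read a little-endian uint32 without relying on host byte order or alignment.
static uint32_t readLe32(const std::vector<uint8_t>& buf, size_t off) {
    return uint32_t(buf[off]) | (uint32_t(buf[off + 1]) << 8) |
           (uint32_t(buf[off + 2]) << 16) | (uint32_t(buf[off + 3]) << 24);
}

// Hypothetical helper: decode raw .pdata bytes into function ranges (VAs).
std::vector<FunctionRange> parsePdata(const std::vector<uint8_t>& pdata,
                                      uint64_t imageBase) {
    std::vector<FunctionRange> out;
    constexpr size_t kEntrySize = 12;  // BeginAddress, EndAddress, UnwindInfoAddress
    for (size_t off = 0; off + kEntrySize <= pdata.size(); off += kEntrySize) {
        uint32_t begin = readLe32(pdata, off);
        uint32_t end = readLe32(pdata, off + 4);
        if (begin == 0 || end <= begin) continue;  // zero padding or malformed entry
        out.push_back({imageBase + begin, imageBase + end});
    }
    return out;
}
```

Leaf functions that touch neither the stack nor nonvolatile registers need no .pdata entry, which is one reason the other passes are still useful on PE files.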

With --clean, it additionally:

  • Jump-threads jmp→jmp chains into direct edges
  • Prunes dead basic blocks (no incoming edges)
  • Annotates each instruction with stack_offset (RSP delta from function entry)
  • Builds an xrefs section mapping every call/jump target back to its callers
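Jump threading, the first --clean step, amounts to following chains of blocks that contain nothing but an unconditional jmp and pointing each edge at the final destination. Here is a sketch over a simplified block model. The Block struct, threadTarget and jumpThread are illustrative assumptions, not cfgrip's types. The visited set keeps a jmp cycle from looping forever.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Simplified block model for this sketch (not cfgrip's internal types).
struct Block {
    bool onlyJmp = false;              // block is a single unconditional direct jmp
    std::vector<uint64_t> successors;  // successor block addresses
};

// Follow a chain of jmp-only blocks starting at addr and return the final
// destination. The visited set stops the walk on jmp cycles.
uint64_t threadTarget(const std::map<uint64_t, Block>& cfg, uint64_t addr) {
    std::set<uint64_t> seen;
    while (true) {
        auto it = cfg.find(addr);
        if (it == cfg.end() || !it->second.onlyJmp ||
            it->second.successors.size() != 1 || !seen.insert(addr).second)
            return addr;
        addr = it->second.successors.front();
    }
}

// Rewrite every edge so it skips intermediate jmp-only blocks.
void jumpThread(std::map<uint64_t, Block>& cfg) {
    for (auto& entry : cfg)
        for (auto& succ : entry.second.successors)
            succ = threadTarget(cfg, succ);
}
```

After threading, the intermediate jmp-only blocks often lose all their predecessors, which is what the dead-block pruning step then removes.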

It handles indirect branches by:

  1. Checking the GOT (Global Offset Table) for known imports
  2. Scanning backward for LEA instructions to locate jump tables, then reading table entries
  3. Tracing registers backward through mov/lea chains to find concrete addresses
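Step 2 depends on how the compiler encodes table entries. Two common x64 layouts are 32-bit offsets relative to the table itself (typical of GCC/Clang position-independent code) and 32-bit RVAs added to the image base (typical of MSVC). The sketch below is a hypothetical readJumpTable helper, not cfgrip's implementation. It decodes either layout starting from the table address the backward LEA scan produced, and it stops at the first entry that does not land in the code section. Non-PIC code that uses 8-byte absolute entries is not handled here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// How each 4-byte entry is interpreted (assumed encodings):
//   RelativeToTable: target = tableVA + int32 entry   (GCC/Clang PIC)
//   ImageBaseRva:    target = imageBase + uint32 entry (MSVC x64)
enum class EntryKind { RelativeToTable, ImageBaseRva };

struct TableInfo {
    uint64_t tableVA;    // address found via the backward LEA scan
    size_t tableOffset;  // offset of the table inside `bytes`
    EntryKind kind;
    uint64_t imageBase;  // only used for ImageBaseRva
};

// Decode entries until one falls outside [textStart, textEnd) or maxEntries
// is reached. A bound taken from the preceding cmp/ja guard would be better.
std::vector<uint64_t> readJumpTable(const std::vector<uint8_t>& bytes,
                                    const TableInfo& t, uint64_t textStart,
                                    uint64_t textEnd, size_t maxEntries = 512) {
    std::vector<uint64_t> targets;
    for (size_t i = 0; i < maxEntries; ++i) {
        size_t off = t.tableOffset + i * 4;
        if (off + 4 > bytes.size()) break;
        uint32_t raw = uint32_t(bytes[off]) | (uint32_t(bytes[off + 1]) << 8) |
                       (uint32_t(bytes[off + 2]) << 16) |
                       (uint32_t(bytes[off + 3]) << 24);
        uint64_t target = t.kind == EntryKind::RelativeToTable
                              ? t.tableVA + int64_t(int32_t(raw))  // signed offset
                              : t.imageBase + raw;                 // unsigned RVA
        if (target < textStart || target >= textEnd) break;  // ran past the table
        targets.push_back(target);
    }
    return targets;
}
```

Stopping at the first out-of-range entry is a heuristic: adjacent data can happen to decode to a code address, so an index bound recovered from the guarding comparison is more reliable when one exists.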

usage

cfgrip [--subs-only] [--clean] <binary>

Feed it a binary, get <binary>.cfg as output.
Use --subs-only to extract only functions reachable from the entry point through the call graph — skips unreachable exports and prolog candidates.
Use --clean to apply jump-threading, dead-block pruning, stack-offset tracking, and cross-reference analysis.


Example:

cfgrip.exe tests\example1.exe
format: PE
arch: x86-64
entry: 0x1400054bc
imports: 85
  0x140020000 EncodePointer (KERNEL32.dll)
  0x140020008 DecodePointer (KERNEL32.dll)
  0x140020010 EnterCriticalSection (KERNEL32.dll)
  0x140020018 LeaveCriticalSection (KERNEL32.dll)
  0x140020020 InitializeCriticalSectionEx (KERNEL32.dll)
  0x140020028 DeleteCriticalSection (KERNEL32.dll)
  0x140020030 MultiByteToWideChar (KERNEL32.dll)
  0x140020038 WideCharToMultiByte (KERNEL32.dll)
  0x140020040 LCMapStringEx (KERNEL32.dll)
  0x140020048 GetStringTypeW (KERNEL32.dll)
  0x140020050 GetCPInfo (KERNEL32.dll)
  0x140020058 RtlCaptureContext (KERNEL32.dll)
  0x140020060 RtlLookupFunctionEntry (KERNEL32.dll)
  0x140020068 RtlVirtualUnwind (KERNEL32.dll)
  0x140020070 UnhandledExceptionFilter (KERNEL32.dll)
  0x140020078 SetUnhandledExceptionFilter (KERNEL32.dll)
  0x140020080 GetCurrentProcess (KERNEL32.dll)
  0x140020088 TerminateProcess (KERNEL32.dll)
  0x140020090 IsProcessorFeaturePresent (KERNEL32.dll)
  0x140020098 QueryPerformanceCounter (KERNEL32.dll)
  0x1400200a0 GetCurrentProcessId (KERNEL32.dll)
  0x1400200a8 GetCurrentThreadId (KERNEL32.dll)
  0x1400200b0 GetSystemTimeAsFileTime (KERNEL32.dll)
  0x1400200b8 InitializeSListHead (KERNEL32.dll)
  0x1400200c0 IsDebuggerPresent (KERNEL32.dll)
  0x1400200c8 GetStartupInfoW (KERNEL32.dll)
  0x1400200d0 GetModuleHandleW (KERNEL32.dll)
  0x1400200d8 WriteConsoleW (KERNEL32.dll)
  0x1400200e0 RtlPcToFileHeader (KERNEL32.dll)
  0x1400200e8 RaiseException (KERNEL32.dll)
  0x1400200f0 RtlUnwindEx (KERNEL32.dll)
  0x1400200f8 GetLastError (KERNEL32.dll)
  0x140020100 SetLastError (KERNEL32.dll)
  0x140020108 InitializeCriticalSectionAndSpinCount (KERNEL32.dll)
  0x140020110 TlsAlloc (KERNEL32.dll)
  0x140020118 TlsGetValue (KERNEL32.dll)
  0x140020120 TlsSetValue (KERNEL32.dll)
  0x140020128 TlsFree (KERNEL32.dll)
  0x140020130 FreeLibrary (KERNEL32.dll)
  0x140020138 GetProcAddress (KERNEL32.dll)
  0x140020140 LoadLibraryExW (KERNEL32.dll)
  0x140020148 GetStdHandle (KERNEL32.dll)
  0x140020150 WriteFile (KERNEL32.dll)
  0x140020158 GetModuleFileNameW (KERNEL32.dll)
  0x140020160 ExitProcess (KERNEL32.dll)
  0x140020168 GetModuleHandleExW (KERNEL32.dll)
  0x140020170 GetCommandLineA (KERNEL32.dll)
  0x140020178 GetCommandLineW (KERNEL32.dll)
  0x140020180 HeapAlloc (KERNEL32.dll)
  0x140020188 HeapFree (KERNEL32.dll)
  0x140020190 FlsAlloc (KERNEL32.dll)
  0x140020198 FlsGetValue (KERNEL32.dll)
  0x1400201a0 FlsSetValue (KERNEL32.dll)
  0x1400201a8 FlsFree (KERNEL32.dll)
  0x1400201b0 VirtualProtect (KERNEL32.dll)
  0x1400201b8 CompareStringW (KERNEL32.dll)
  0x1400201c0 LCMapStringW (KERNEL32.dll)
  0x1400201c8 GetLocaleInfoW (KERNEL32.dll)
  0x1400201d0 IsValidLocale (KERNEL32.dll)
  0x1400201d8 GetUserDefaultLCID (KERNEL32.dll)
  0x1400201e0 EnumSystemLocalesW (KERNEL32.dll)
  0x1400201e8 GetFileType (KERNEL32.dll)
  0x1400201f0 CloseHandle (KERNEL32.dll)
  0x1400201f8 FlushFileBuffers (KERNEL32.dll)
  0x140020200 GetConsoleOutputCP (KERNEL32.dll)
  0x140020208 GetConsoleMode (KERNEL32.dll)
  0x140020210 ReadFile (KERNEL32.dll)
  0x140020218 GetFileSizeEx (KERNEL32.dll)
  0x140020220 SetFilePointerEx (KERNEL32.dll)
  0x140020228 ReadConsoleW (KERNEL32.dll)
  0x140020230 HeapReAlloc (KERNEL32.dll)
  0x140020238 FindClose (KERNEL32.dll)
  0x140020240 FindFirstFileExW (KERNEL32.dll)
  0x140020248 FindNextFileW (KERNEL32.dll)
  0x140020250 IsValidCodePage (KERNEL32.dll)
  0x140020258 GetACP (KERNEL32.dll)
  0x140020260 GetOEMCP (KERNEL32.dll)
  0x140020268 GetEnvironmentStringsW (KERNEL32.dll)
  0x140020270 FreeEnvironmentStringsW (KERNEL32.dll)
  0x140020278 SetEnvironmentVariableW (KERNEL32.dll)
  0x140020280 SetStdHandle (KERNEL32.dll)
  0x140020288 GetProcessHeap (KERNEL32.dll)
  0x140020290 HeapSize (KERNEL32.dll)
  0x140020298 CreateFileW (KERNEL32.dll)
  0x1400202a0 RtlUnwind (KERNEL32.dll)
functions: 1975
indirect targets: 3453
cfg written to: tests\example1.exe.cfg

the output format

The .cfg file is structured JSON. Here's what it looks like:

{
  "binary": "tests\\example1.exe",
  "mode": "full",
  "arch": "x86-64",
  "format": "PE",
  "entry_point": "0x1400054bc",
  "imports": [
    {
      "address": "0x140020000",
      "name": "EncodePointer",
      "library": "KERNEL32.dll"
    },
    {
      "address": "0x140020008",
      "name": "DecodePointer",
      "library": "KERNEL32.dll"
    },
    {
      "address": "0x140020010",
      "name": "EnterCriticalSection",
      "library": "KERNEL32.dll"
    },
    {
      "address": "0x140020018",
      "name": "LeaveCriticalSection",
      "library": "KERNEL32.dll"
    },
    {
      "address": "0x140020020",
      "name": "InitializeCriticalSectionEx",
      "library": "KERNEL32.dll"
    },
    ...
    ...
    ...
  "functions": [
    {
      "address": "0x1400054bc",
      "name": "entry",
      "blocks": [
        {
          "address": "0x1400054bc",
          "size": 4,
          "is_prolog": false,
          "is_epilog": false,
          "instructions": [
            {
              "address": "0x1400054bc",
              "size": 4,
              "mnemonic": "sub",
              "operands": "rsp, 0x28"
            },
            {
              "address": "0x1400054c0",
              "size": 5,
              "mnemonic": "call",
              "operands": "0x140005d30"
            },
            {
              "address": "0x1400054c5",
              "size": 4,
              "mnemonic": "add",
              "operands": "rsp, 0x28"
            },
            {
              "address": "0x1400054c9",
              "size": 5,
              "mnemonic": "jmp",
              "operands": "0x140005340"
            }
          ],
          "successors": [
            "0x140005340"
          ]
        },
        {
          "address": "0x140005340",
          "size": 8,
          "is_prolog": false,
          "is_epilog": false,
          "instructions": [
            {
              "address": "0x140005340",
              "size": 5,
              "mnemonic": "mov",
              "operands": "qword ptr [rsp + 8], rbx"
            },
            {
              "address": "0x140005345",
              "size": 5,
              "mnemonic": "mov",
              "operands": "qword ptr [rsp + 0x10], rsi"
            },
            {
              "address": "0x14000534a",
              "size": 1,
              "mnemonic": "push",
              "operands": "rdi"
            },
            {
              "address": "0x14000534b",
              "size": 4,
              "mnemonic": "sub",
              "operands": "rsp, 0x30"
            },
            {
              "address": "0x14000534f",
              "size": 5,
              "mnemonic": "mov",
              "operands": "ecx, 1"
            },
            {
              "address": "0x140005354",
              "size": 5,
              "mnemonic": "call",
              "operands": "0x14000550c"
            },
            {
              "address": "0x140005359",
              "size": 2,
              "mnemonic": "test",
              "operands": "al, al"
            },
            {
              "address": "0x14000535b",
              "size": 6,
              "mnemonic": "je",
              "operands": "0x140005497"
            }
          ],
          "successors": [
            "0x140005497",
            "0x140005361"
          ]
        },
        {
          "address": "0x140005497",
          "size": 15,
          "is_prolog": false,
          "is_epilog": false,
          "instructions": [
            {
              "address": "0x140005497",
              "size": 5,
              "mnemonic": "mov",
              "operands": "ecx, 7"
            },
            {
              "address": "0x14000549c",
              "size": 5,
              "mnemonic": "call",
              "operands": "0x140005e44"
            },
            {
              "address": "0x1400054a1",
              "size": 1,
              "mnemonic": "nop",
              "operands": ""
            },
            {
              "address": "0x1400054a2",
              "size": 5,
              "mnemonic": "mov",
              "operands": "ecx, 7"
            },
            {
              "address": "0x1400054a7",
              "size": 5,
              "mnemonic": "call",
              "operands": "0x140005e44"
            },
            {
              "address": "0x1400054ac",
              "size": 2,
              "mnemonic": "mov",
              "operands": "ecx, ebx"
            },
            {
              "address": "0x1400054ae",
              "size": 5,
              "mnemonic": "call",
              "operands": "0x14000ec14"
            },
            {
              "address": "0x1400054b3",
              "size": 1,
              "mnemonic": "nop",
              "operands": ""
            },
            {
              "address": "0x1400054b4",
              "size": 2,
              "mnemonic": "mov",
              "operands": "ecx, ebx"
            },
            {
              "address": "0x1400054b6",
              "size": 5,
              "mnemonic": "call",
              "operands": "0x14000ebcc"
            },
            {
              "address": "0x1400054bb",
              "size": 1,
              "mnemonic": "nop",
              "operands": ""
            },
            {
              "address": "0x1400054bc",
              "size": 4,
              "mnemonic": "sub",
              "operands": "rsp, 0x28"
            },
            {
              "address": "0x1400054c0",
              "size": 5,
              "mnemonic": "call",
              "operands": "0x140005d30"
            },
            {
              "address": "0x1400054c5",
              "size": 4,
              "mnemonic": "add",
              "operands": "rsp, 0x28"
            },
            {
              "address": "0x1400054c9",
              "size": 5,
              "mnemonic": "jmp",
              "operands": "0x140005340"
            }
          ],
          "successors": [
            "0x140005340"
          ]
        },
        ...
        ...
        ...

Each function may also carry these optional fields (not shown in the excerpt above):

  • end_address — precise function end when available (from PE .pdata exception table), otherwise computed as the maximum instruction address across all blocks
  • is_thunk: true for PLT stubs and import thunks (functions that only redirect to another address)
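The following sketch shows how these two fields could be derived, using a simplified function model (Function, BasicBlock and Insn are hypothetical). The end_address fallback follows the rule above literally. The thunk test treats a single block ending in jmp, optionally preceded by endbr64 or nop padding, as a redirect stub; what exactly qualifies is an assumption, not cfgrip's documented rule.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Simplified, hypothetical model of one function from the JSON output.
struct Insn { uint64_t address; std::string mnemonic; };
struct BasicBlock { uint64_t address; std::vector<Insn> insns; };
struct Function {
    uint64_t address = 0;
    std::vector<BasicBlock> blocks;
    std::optional<uint64_t> pdataEnd;  // set when a .pdata entry covers the function
};

// end_address: prefer the .pdata end; otherwise the highest instruction
// address found in any block.
uint64_t endAddress(const Function& f) {
    if (f.pdataEnd) return *f.pdataEnd;
    uint64_t maxAddr = f.address;
    for (const auto& block : f.blocks)
        for (const auto& insn : block.insns) maxAddr = std::max(maxAddr, insn.address);
    return maxAddr;
}

// is_thunk heuristic: exactly one block that ends in jmp, with only
// endbr64/nop padding before it (e.g. CET-style stubs).
bool isThunk(const Function& f) {
    if (f.blocks.size() != 1 || f.blocks[0].insns.empty()) return false;
    const auto& insns = f.blocks[0].insns;
    if (insns.back().mnemonic != "jmp") return false;
    return std::all_of(insns.begin(), insns.end() - 1, [](const Insn& i) {
        return i.mnemonic == "endbr64" || i.mnemonic == "nop";
    });
}
```

Because the fallback takes the highest instruction address rather than that address plus the instruction's size, it points at the start of the last instruction, not one byte past the function.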

--subs-only output

{
  "binary": "C:\\binaries\\target.exe",
  "mode": "subs-only",
  "arch": "x86-64",
  "format": "PE",
  "entry_point": "0x1400054bc",
  "imports": [ ... ],
  "indirect_targets": [ ... ],
  "functions": [
    {
      "address": "0x1400054bc",
      "name": "entry",
      "blocks": [ ... ]
    },
    ...
  ]
}

The "mode": "subs-only" field tells downstream tools this CFG only contains functions reachable from the entry point. Unreachable exports and prolog candidates are excluded — fewer functions, cleaner analysis surface.
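Conceptually, --subs-only is a graph reachability filter. Here is a minimal breadth-first sketch, assuming a call graph stored as function start → call targets. CallGraph and reachableFunctions are illustrative names, not cfgrip's API.

```cpp
#include <cstdint>
#include <map>
#include <queue>
#include <set>
#include <vector>

// Assumed layout: function start address -> call targets found in its blocks.
using CallGraph = std::map<uint64_t, std::vector<uint64_t>>;

// Breadth-first walk from the entry point; anything never reached is dropped.
std::set<uint64_t> reachableFunctions(const CallGraph& callGraph, uint64_t entry) {
    std::set<uint64_t> seen{entry};
    std::queue<uint64_t> work;
    work.push(entry);
    while (!work.empty()) {
        uint64_t fn = work.front();
        work.pop();
        auto it = callGraph.find(fn);
        if (it == callGraph.end()) continue;  // e.g. an import: reached, not expanded
        for (uint64_t callee : it->second)
            if (seen.insert(callee).second) work.push(callee);
    }
    return seen;
}
```

Whether tail-call jmp targets should also count as edges is a design choice; the description above only mentions the call graph.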

--clean output

{
  "binary": "C:\\binaries\\target.exe",
  "mode": "full+clean",
  "arch": "x86-64",
  "format": "PE",
  "entry_point": "0x1400054bc",
  "imports": [ ... ],
  "indirect_targets": [ ... ],
  "functions": [
    {
      "address": "0x1400054bc",
      "name": "entry",
      "blocks": [
        {
          "address": "0x1400054bc",
          "size": 4,
          "is_prolog": false,
          "is_epilog": false,
          "instructions": [
            {
              "address": "0x1400054bc",
              "size": 4,
              "mnemonic": "sub",
              "operands": "rsp, 0x28",
              "stack_offset": 0
            },
            {
              "address": "0x1400054c0",
              "size": 5,
              "mnemonic": "call",
              "operands": "0x140005d30",
              "stack_offset": -40
            },
            {
              "address": "0x1400054c5",
              "size": 4,
              "mnemonic": "add",
              "operands": "rsp, 0x28",
              "stack_offset": -40
            },
            {
              "address": "0x1400054c9",
              "size": 5,
              "mnemonic": "jmp",
              "operands": "0x140005340",
              "stack_offset": 0
            }
          ],
          "successors": [ "0x140005340" ]
        },
        ...
      ]
    },
    ...
  ],
  "xrefs": [
    {
      "target": "0x140011b00",
      "callers": [
        { "address": "0x14001a30f", "type": "call" },
        { "address": "0x140019ec1", "type": "call" },
        ...
      ]
    },
    {
      "target": "0x140007394",
      "callers": [
        { "address": "0x14001a31a", "type": "call" },
        ...
      ]
    },
    ...
  ]
}

Every instruction in --clean mode includes stack_offset, the RSP delta from function entry measured before that instruction executes: in the entry block above, sub rsp, 0x28 shows 0 and the call after it shows -40. The xrefs section maps each call/jump target back to every instruction that references it.
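The stack_offset values in the sample can be reproduced with a forward pass that records the delta before each instruction and then applies that instruction's effect on RSP. This sketch uses a hypothetical Insn struct holding Capstone-style mnemonic and operand strings. It models only push, pop, and add/sub rsp with an immediate, and it assumes 8-byte pushes (x86-64). cfgrip may track more cases.

```cpp
#include <cctype>
#include <cstdint>
#include <string>
#include <vector>

// Minimal decoded instruction, e.g. {"sub", "rsp, 0x28"} (hypothetical type).
struct Insn { std::string mnemonic; std::string operands; };

// Record the RSP delta *before* each instruction, starting at 0 on entry.
// Unmodeled instructions (mov rsp, rbp; and rsp, ...) leave the delta unchanged.
std::vector<int64_t> stackOffsets(const std::vector<Insn>& insns) {
    std::vector<int64_t> out;
    int64_t delta = 0;
    for (const auto& in : insns) {
        out.push_back(delta);
        if (in.mnemonic == "push") {
            delta -= 8;
        } else if (in.mnemonic == "pop") {
            delta += 8;
        } else if ((in.mnemonic == "sub" || in.mnemonic == "add") &&
                   in.operands.rfind("rsp, ", 0) == 0) {
            std::string imm = in.operands.substr(5);
            // Only immediates are modeled; "sub rsp, rax" is skipped.
            if (!imm.empty() && std::isdigit(static_cast<unsigned char>(imm[0]))) {
                int64_t value = std::stoll(imm, nullptr, 0);  // base 0: "0x28" or "40"
                delta += (in.mnemonic == "sub") ? -value : value;
            }
        }
    }
    return out;
}
```

Given the four instructions of the entry block above, it returns 0, -40, -40, 0, matching the sample output.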

--subs-only --clean combined output

{
  "binary": "C:\\binaries\\target.exe",
  "mode": "subs-only+clean",
  ...
}

Same structure as --clean, but with "mode": "subs-only+clean" to indicate both options were applied. Only entry-point-reachable functions remain, each of their instructions carries stack_offset, and the output includes the xrefs section.
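The xrefs section is an inversion of the branch edges: every call or jump is grouped under its target. Here is a sketch, assuming a flat list of resolved branches as input. Branch, Ref and buildXrefs are hypothetical, and the "jump" type string is a guess, since the sample output only shows "call".

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical flattened input: one resolved direct call or jump.
struct Branch { uint64_t from; uint64_t to; bool isCall; };

// One caller entry; mirrors {"address", "type"} in the JSON.
struct Ref { uint64_t address; std::string type; };

// Invert branch edges into target -> callers, the shape of the xrefs section.
std::map<uint64_t, std::vector<Ref>> buildXrefs(const std::vector<Branch>& branches) {
    std::map<uint64_t, std::vector<Ref>> xrefs;
    for (const auto& b : branches)
        xrefs[b.to].push_back(Ref{b.from, b.isCall ? "call" : "jump"});
    return xrefs;
}
```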

function boundary detection

cfgrip discovers functions through multiple detection passes:

| Pass | What it detects | Covers |
|---|---|---|
| Prolog scanning | push rbp; push r15/r14/r13/r12/rbx/rdi/rsi; sub rsp, imm (imm >= 0x20); enter | MSVC x64, GCC, leaf functions, CET (endbr64) |
| Call targets | Every call instruction target is a function start | Direct and GOT-resolved indirect calls |
| Tail calls | jmp instructions targeting prolog candidates | Optimized tail-call chains |
| .pdata (PE) | Runtime function entries from the exception handler table | Precise start/end for x64 PE functions that have unwind data (leaf functions may have no entry) |
| Data pointers | 8-byte values in .rdata/.data pointing into executable code | Function pointers, vtables, callbacks |
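To show what the prolog-scanning row matches at the byte level, here is a hypothetical looksLikeProlog check over raw x64 code. It is a sketch, not cfgrip's isProlog(); the encodings are the standard x86-64 ones (endbr64 F3 0F 1E FA, push rbp 55, push r12..r15 41 54..57, sub rsp 48 83 EC ib / 48 81 EC id, enter C8).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Does the code at `off` start with one of the prolog patterns listed above?
bool looksLikeProlog(const std::vector<uint8_t>& code, size_t off) {
    // Byte at pos, or -1 past the end of the buffer.
    auto byteAt = [&code](size_t pos) -> int {
        return pos < code.size() ? code[pos] : -1;
    };
    // Skip a leading CET endbr64 marker (F3 0F 1E FA).
    if (byteAt(off) == 0xF3 && byteAt(off + 1) == 0x0F &&
        byteAt(off + 2) == 0x1E && byteAt(off + 3) == 0xFA)
        off += 4;
    int b0 = byteAt(off), b1 = byteAt(off + 1), b2 = byteAt(off + 2);
    if (b0 == 0x55 || b0 == 0x53 || b0 == 0x56 || b0 == 0x57)
        return true;                                   // push rbp / rbx / rsi / rdi
    if (b0 == 0x41 && b1 >= 0x54 && b1 <= 0x57)
        return true;                                   // push r12..r15
    if (b0 == 0xC8 && off + 3 < code.size())
        return true;                                   // enter imm16, imm8
    if (b0 == 0x48 && b1 == 0x83 && b2 == 0xEC) {      // sub rsp, imm8
        int imm = byteAt(off + 3);
        return imm >= 0x20 && imm < 0x80;              // imm8 is sign-extended
    }
    if (b0 == 0x48 && b1 == 0x81 && b2 == 0xEC && off + 6 < code.size()) {  // sub rsp, imm32
        uint32_t imm = uint32_t(code[off + 3]) | (uint32_t(code[off + 4]) << 8) |
                       (uint32_t(code[off + 5]) << 16) | (uint32_t(code[off + 6]) << 24);
        return imm >= 0x20 && imm < 0x80000000u;
    }
    return false;
}
```

Single-byte patterns such as 0x53 also occur in data and inside other instructions, so a match is only a candidate. This is why prolog scanning is combined with the other passes.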

Functions with is_thunk: true are PLT stubs or import thunks — single-block functions that redirect to another address.

building

Requires CMake and a C++17 compiler. Capstone is fetched automatically.

cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release
./build/cfgrip <binary>

Or on Windows with Visual Studio:

cmake -B build -S .
cmake --build build --config Release
.\build\Release\cfgrip.exe <binary>


what it supports

| Category | Feature | Supported |
|---|---|---|
| Formats | PE (32/64-bit) | YES |
| | ELF (64-bit) | YES |
| Architectures | x86 | YES |
| | x86-64 | YES |
| Indirect calls | GOT resolution | YES |
| | Jump table detection | YES |
| | Backward register tracing | YES |
| Function discovery | Entry point | YES |
| | Exports | YES |
| | Call targets | YES |
| | Prolog scanning (MSVC x64, GCC, CET endbr64) | YES |
| | Tail-call detection (jmp → function) | YES |
| | PE .pdata (exception handler table) | YES |
| | Data-section function pointer scan | YES |
| Thunk detection | PLT stubs / import thunks (is_thunk) | YES |
| Function boundaries | end_address from .pdata or max instruction address | YES |
| Subs-only mode | --subs-only flag | YES |
| CFG cleaning | --clean (jump-thread, dead-block prune, stack deltas, xrefs) | YES |

research

Function boundary detection in this tool is based on the approach described in "Function Boundary Detection in Stripped Binaries" (Alves-Foss & Song, 2019), which introduces a multi-heuristic algorithm for locating function starts and ends in stripped x86/x64 binaries.

The paper is available at papers/Function_Boundary_Detection_in_Stripped_Binaries.pdf.

How our implementation maps to the paper's heuristics:

| Heuristic | Paper description | Our implementation |
|---|---|---|
| H1–H4 | Prolog signatures (push rbp, callee-saved regs, stack sub, enter) | isProlog() in disasm/engine.cpp: detects push rbp, push r15..rbx, sub rsp >= 0x20, enter |
| H5 | Call-target seeding | Every direct call target is a function start |
| H6 | Jump-to-function (tail call) detection | jmp to prolog candidates adds target to function queue |
| H7 | Exception table parsing | PE .pdata RUNTIME_FUNCTION entries give precise start/end |
| H8 | Data reference analysis | scanDataPointers() walks data sections for code pointers |

The paper's key insight is that algorithmic heuristics — without machine learning — can achieve high accuracy on stripped binaries. Our implementation follows this philosophy, using a multi-pass approach where each pass catches functions the others might miss.
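The H8 data-reference pass can be pictured as an aligned walk over a data section that keeps every 8-byte value falling inside an executable range. Here is a minimal sketch under those assumptions. scanForCodePointers and Range are hypothetical names, deliberately distinct from the tool's scanDataPointers(), and the section is assumed to start 8-byte aligned.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// [start, end) of an executable section, as virtual addresses.
struct Range { uint64_t start; uint64_t end; };

// Read each 8-byte little-endian slot and keep values that point into code.
std::vector<uint64_t> scanForCodePointers(const std::vector<uint8_t>& data,
                                          const std::vector<Range>& code) {
    std::vector<uint64_t> hits;
    for (size_t off = 0; off + 8 <= data.size(); off += 8) {
        uint64_t value = 0;
        for (int i = 7; i >= 0; --i) value = (value << 8) | data[off + i];
        for (const auto& r : code) {
            if (value >= r.start && value < r.end) {
                hits.push_back(value);
                break;
            }
        }
    }
    return hits;
}
```

Hits are only candidates: a constant that happens to look like a code address, or a pointer into the middle of a function, still has to be confirmed by disassembly.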