Skip to content

[WIP] Feature request: OCaml bindings for SVF (alongside Python) #42

@ExcitedSpider

Description

@ExcitedSpider

Summary

Request for OCaml language bindings to SVF, following the same approach as pysvf. OCaml is the dominant language for static analysis tools (Infer, Frama-C, Astrée) and would benefit from access to SVF's pointer analysis and SVFIR infrastructure.

Motivation

We built an abstract interpreter on top of pysvf (Python prototype) and then translated it to C++ for performance. During the language evaluation, OCaml emerged as the optimal choice for abstract interpretation workloads — but no SVF bindings exist for OCaml.

Why OCaml for static analysis

Python (pysvf) C++ OCaml
Performance Baseline ~8x faster ~3-5x faster
Domain modeling Dynamic typing, isinstance() Class hierarchies, visitor pattern ADTs + pattern matching — purpose-built
Memory GC + boxing overhead Manual move/clone GC + persistent structures (clone is free)
Code volume Baseline ~3x more ~1.2x more
Analysis ecosystem Growing SVF, LLVM Infer, Frama-C, Astrée (dominant)

OCaml's algebraic data types are a natural fit for SVF's type hierarchies. For example, SVFStmt's 15 subclasses become:

type svf_stmt =
  | CopyStmt of copy_stmt
  | BinaryOPStmt of binop_stmt
  | CmpStmt of cmp_stmt
  | PhiStmt of phi_stmt
  | CallPE of call_pe
  | LoadStmt of load_stmt
  | StoreStmt of store_stmt
  | AddrStmt of addr_stmt
  | GepStmt of gep_stmt
  | RetPE of ret_pe
  | ...

This replaces both Python's isinstance() chains and C++'s dyn_cast<>() with exhaustive pattern matching — the compiler catches missing cases.

OCaml Examples: Forward Interval Analysis

To illustrate how analysis code looks in OCaml vs Python/C++, here are canonical forward abstract interpretation examples on the interval domain.

Interval lattice

(** Bounded integer interval [lo, hi] with ±∞ *)
type bound = NegInf | Fin of int | PosInf

type interval = Bot | Itv of bound * bound

let top = Itv (NegInf, PosInf)

(** Lattice operations *)
let join a b = match a, b with
  | Bot, x | x, Bot -> x
  | Itv (l1, h1), Itv (l2, h2) -> Itv (min l1 l2, max h1 h2)

let meet a b = match a, b with
  | Bot, _ | _, Bot -> Bot
  | Itv (l1, h1), Itv (l2, h2) ->
      let l, h = max l1 l2, min h1 h2 in
      if l > h then Bot else Itv (l, h)

let widen a b = match a, b with
  | Bot, x -> x
  | x, Bot -> x
  | Itv (l1, h1), Itv (l2, h2) ->
      Itv ((if l2 < l1 then NegInf else l1),
           (if h2 > h1 then PosInf else h1))

let leq a b = join a b = b

15 lines for the complete lattice. C++ needs separate classes, operator overloads, copy/move constructors for the same semantics.

Abstract state as a persistent map

module VarMap = Map.Make(Int)
type state = interval VarMap.t

(** Lookup with default ⊤ — untracked variables are unconstrained *)
let find_var x st = VarMap.find_opt x st |> Option.value ~default:top

(** "Clone" is free — OCaml maps share structure.
    In C++/Python, clone() deep-copies the entire map. *)

Forward transfer functions

(** Forward abstract semantics for statements.
    Pattern matching replaces isinstance() / dyn_cast<>() chains. *)
let transfer (stmt : svf_stmt) (st : state) : state =
  match stmt with
  | CopyStmt { lhs; rhs } ->
      VarMap.add lhs (find_var rhs st) st

  | BinaryOPStmt { res; op0; op1; opcode } ->
      let a = find_var op0 st and b = find_var op1 st in
      let v = match opcode with
        | Add -> itv_add a b
        | Sub -> itv_sub a b
        | Mul -> itv_mul a b
        | _ -> top
      in
      VarMap.add res v st

  | PhiStmt { res; operands } ->
      let v = List.fold_left
        (fun acc (_, var) -> join acc (find_var var st))
        Bot operands
      in
      VarMap.add res v st

  | LoadStmt _ | StoreStmt _ -> st  (* handled via points-to *)
  | _ -> st

The compiler warns if a statement type is unhandled. The immutable map updates (VarMap.add) return a new map sharing structure with the original — no explicit clone needed.

Fixpoint iteration

(** Chaotic iteration with widening at loop heads *)
let rec fixpoint wto st =
  match wto with
  | [] -> st
  | Vertex node :: rest ->
      let st' = transfer_node node st in
      fixpoint rest st'
  | Cycle (head, body) :: rest ->
      let rec iterate prev =
        let entered = transfer_node head prev in
        let after_body = fixpoint body entered in
        let widened = widen_state prev after_body in
        if leq_state widened prev then prev
        else iterate widened
      in
      fixpoint rest (iterate st)

WTO-guided fixpoint in 12 lines. The recursive structure mirrors the mathematical definition directly.

Proposed Approach

Since pysvf binds directly to SVF's C++ classes via pybind11 (no intermediate C API), the OCaml binding would need:

  1. C shim layer (svf_capi.h): flat C functions with opaque handles wrapping SVF's C++ API. This layer could also benefit future bindings for other languages (Rust, Haskell, etc.)

  2. OCaml ctypes bindings: mechanical mapping of the C shim to OCaml's foreign function interface

  3. Idiomatic OCaml types: convert C++ inheritance hierarchies to OCaml variants at the FFI boundary

The existing pysvf.pyi type stub (1400+ lines) serves as a complete API specification to translate from.

Architecture

OCaml user code
    |
    v
svf_ocaml.ml  (ctypes bindings + variant types)
    |
    v
svf_capi.so   (C shim: opaque handles + flat C functions)
    |
    v
libSvfCore.so + libSvfLLVM.so
    |
    v
LLVM

Scope

The initial binding could focus on the subset most useful for analysis clients:

  • SVFIR: PAG access, SVFStmt hierarchy, SVFVar hierarchy
  • ICFG: nodes, edges, WTO computation
  • Pointer analysis: AndersenWaveDiff, CallGraph, PointsTo
  • Module lifecycle: buildSVFModule, build PAG, release

Abstract domains (intervals, powersets, etc.) would be implemented natively in OCaml — these are more naturally expressed as OCaml types than as FFI wrappers around C++ classes.

This is roughly ~30-40 classes and ~150 methods out of pysvf's full ~136 classes / ~835 methods.

Benefits to the SVF ecosystem

  • Opens SVF to the large OCaml static analysis community
  • The C shim layer would be reusable for Rust, Haskell, and other language bindings
  • OCaml 5's multicore support enables parallel analyses over SVF's IR
  • Could attract contributors from the Infer/Frama-C communities who are already familiar with OCaml-based analysis

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions