Feature: add call graph to analysis output (Jedi + CodeQL, provenance-tracked)

**Is your feature request related to a problem? Please describe.**

The Python analyzer currently emits only a symbol table — there's no call graph. Downstream consumers (the python-sdk's path/caller/callee queries, dead-code analysis, attack-surface mapping, retrieval-augmented codebase tooling) all need caller→callee information. The Java analyzer already produces one; until the Python analyzer does too, SDK code can't be language-uniform.

Jedi alone resolves only a narrow slice of call sites (locally typed, statically obvious). Many edges (dynamic dispatch, framework callbacks, calls into third-party libraries) go unresolved. We need a richer resolver, and a way to record which backend(s) resolved each edge so consumers can decide how much to trust it.

**Describe the solution you'd like**

Add a `call_graph: List[PyCallEdge]` field to `PyApplication`, where each `PyCallEdge` has `source` / `target` (both `PyCallable.signature` strings), `weight`, and `provenance: List[Literal["jedi","codeql","joern"]]`. Nodes are the existing `PyCallable` entries — no separate vertex type. External / RPC / third-party callees surface as ghost nodes downstream via `to_digraph`, so those edges are preserved rather than dropped.
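A minimal sketch of the proposed shape, for discussion. It uses stdlib dataclasses only to stay dependency-free; the real models are Pydantic. The `int` type and default of `1` for `weight` are assumptions (the semantics of `weight` are not pinned down above), and the example signatures are made up.

```python
from dataclasses import dataclass, field
from typing import List, Literal

# The three backend names come from the proposed provenance enum.
Backend = Literal["jedi", "codeql", "joern"]


@dataclass
class PyCallEdge:
    source: str  # caller's PyCallable.signature
    target: str  # callee's PyCallable.signature (may name an external callee)
    weight: int = 1  # assumption: type and default are illustrative only
    provenance: List[Backend] = field(default_factory=list)


@dataclass
class PyApplication:
    # Existing symbol-table fields are elided in this sketch.
    call_graph: List[PyCallEdge] = field(default_factory=list)


# Hypothetical signatures, purely for illustration.
edge = PyCallEdge(source="pkg.mod.main", target="pkg.mod.helper", provenance=["jedi"])
app = PyApplication(call_graph=[edge])
```

Keeping `source` and `target` as plain signature strings is what lets the edge list stay flat and serializable, with no separate vertex type.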

Pipeline per run:

1. Build symbol table (Jedi, as today).
2. If `--codeql` is enabled: auto-download the CLI into `<cache_dir>/codeql/bin/`, install `codeql/python-all` via `codeql pack install` into a per-project qlpack, execute the call-edge query, and augment `PyCallsite.callee_signature` in-place for sites Jedi couldn't resolve.
3. Heuristic constructor fallback for sites neither Jedi nor CodeQL caught (most often classes nested inside functions/methods).
4. Derive Jedi-side edges from the fully-augmented symbol table.
5. Merge with CodeQL edges; when both backends report the same edge, union their provenance lists.
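The merge in step 5 could look roughly like the sketch below. It is not the implementation on the branch: `merge_edges`, the plain `(source, target)` tuples, the dict output, and the example signatures are all hypothetical, and `weight` is left out for brevity.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (source signature, target signature)

# Fixed order so merged provenance lists are deterministic.
BACKEND_ORDER = ("jedi", "codeql", "joern")


def merge_edges(jedi_edges: Iterable[Edge], codeql_edges: Iterable[Edge]) -> List[dict]:
    """Union two edge sets, recording which backend(s) reported each edge."""
    seen: Dict[Edge, Set[str]] = {}
    for backend, edges in (("jedi", jedi_edges), ("codeql", codeql_edges)):
        for edge in edges:
            seen.setdefault(edge, set()).add(backend)
    return [
        {
            "source": source,
            "target": target,
            "provenance": [b for b in BACKEND_ORDER if b in backends],
        }
        for (source, target), backends in seen.items()
    ]


merged = merge_edges(
    jedi_edges=[("app.main", "app.load"), ("app.main", "app.run")],
    codeql_edges=[("app.main", "app.run"), ("app.run", "requests.get")],
)
```

Here `app.main -> app.run` ends up with provenance `["jedi", "codeql"]`, while the other two edges each carry the single backend that found them.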

An SDK-facing `to_digraph` / `from_digraph` adapter rehydrates the persisted edges into a `networkx.DiGraph` for path/caller/callee queries.
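As a rough illustration of the adapter idea, the sketch below rehydrates a flat edge list into a `networkx.DiGraph` and marks any endpoint missing from the symbol table as a ghost node. The function signatures, the `ghost` node attribute, the dict-shaped edges, and the example data are assumptions for this sketch, not the SDK's actual API.

```python
import networkx as nx


def to_digraph(edges, known_signatures):
    """Build a DiGraph; endpoints absent from the symbol table become ghost nodes."""
    graph = nx.DiGraph()
    for signature in known_signatures:
        graph.add_node(signature, ghost=False)
    for edge in edges:
        for endpoint in (edge["source"], edge["target"]):
            if endpoint not in graph:
                graph.add_node(endpoint, ghost=True)
        graph.add_edge(
            edge["source"],
            edge["target"],
            weight=edge.get("weight", 1),  # assumption: default weight of 1
            provenance=list(edge["provenance"]),
        )
    return graph


def from_digraph(graph):
    """Flatten the graph back into a serializable edge list."""
    return [
        {"source": u, "target": v, "weight": d["weight"], "provenance": list(d["provenance"])}
        for u, v, d in graph.edges(data=True)
    ]


# Hypothetical data: requests.get is not in the symbol table, so it is a ghost.
edges = [
    {"source": "app.main", "target": "app.run", "weight": 1, "provenance": ["jedi", "codeql"]},
    {"source": "app.run", "target": "requests.get", "weight": 1, "provenance": ["codeql"]},
]
graph = to_digraph(edges, known_signatures=["app.main", "app.run"])

callers = list(graph.predecessors("requests.get"))
reachable = nx.has_path(graph, "app.main", "requests.get")
```

With the graph in hand, caller, callee, and path queries reduce to `predecessors`, `successors`, and `nx.has_path` / `nx.shortest_path`, and consumers can filter on the `ghost` flag or on edge `provenance`.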

**Describe alternatives you've considered**

- **Symbol-table only, build the call graph in the SDK.** Pushes the resolution problem onto every consumer and forces each to re-implement the Jedi/CodeQL plumbing. Rejected.
- **Promote `PyClass` to a separate call-graph node type so `A()` lands on the class.** Doubles the node taxonomy. Rejected in favor of `<class>.__init__` as the constructor target — keeps nodes uniform with where method `PyCallable`s already live in the symbol table.
- **Drop edges to external/library callees.** Loses RPC / framework / third-party calls, which are exactly the most interesting edges for security/architecture analysis. Rejected; ghost nodes solve this.
- **Use an `nx.DiGraph` in the persisted schema directly.** Not Pydantic-serializable, and would force every consumer onto networkx. Rejected; persist a flat edge list and adapt at the boundaries.

**Additional context**

- Mirrors the Java analyzer's `CallableVertex` + `CallDependency` shape, adapted to Python (signature-keyed, no separate vertex type).
- Implementation is on `feat/add-codeql-call-graph` (committed retroactively for provenance).
- Breaking change: `--analysis-level` is removed. The call graph is built unconditionally; CodeQL participation is controlled by the existing `--codeql / --no-codeql` flag.
- Follow-ups (out of scope here): auto-synthesizing `PyCallable` entries for implicit `__init__`s on Pydantic/dataclass-style classes (these currently surface as ghost constructor targets); a Joern backend (currently only a placeholder in the provenance enum).
