Is your feature request related to a problem? Please describe.
The Python analyzer currently emits only a symbol table — there's no call graph. Downstream consumers (the python-sdk's path/caller/callee queries, dead-code analysis, attack-surface mapping, retrieval-augmented codebase tooling) all need caller→callee information. The Java analyzer already produces one; until the Python analyzer does too, SDK code can't be language-uniform.
Jedi alone covers a narrow slice of call sites (locally-typed, statically obvious). A lot of edges — dynamic dispatch, framework callbacks, calls into third-party libraries — go unresolved. We need a richer resolver, and a way to record which backend(s) resolved each edge so consumers can decide how much to trust them.
Describe the solution you'd like
Add a call_graph: List[PyCallEdge] field to PyApplication, where each PyCallEdge has source / target (both PyCallable.signature strings), weight, and provenance: List[Literal["jedi","codeql","joern"]]. Nodes are the existing PyCallable entries — no separate vertex type. External / RPC / third-party callees surface as ghost nodes downstream via to_digraph, so those edges are preserved rather than dropped.
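A minimal sketch of the proposed edge record, for illustration only. It uses a stdlib dataclass as a stand-in for the analyzer's Pydantic model. The default weight, its meaning (assumed here to count collapsed call sites), and the example signature strings are assumptions, not part of the proposal.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Literal

# Backends named in the proposal; "joern" is a placeholder only.
Backend = Literal["jedi", "codeql", "joern"]


@dataclass
class PyCallEdge:
    source: str  # PyCallable.signature of the caller
    target: str  # PyCallable.signature of the callee (may be external)
    weight: int = 1  # ASSUMPTION: number of call sites collapsed into this edge
    provenance: List[Backend] = field(default_factory=list)


# Hypothetical signatures; the real signature format is defined by the analyzer.
edge = PyCallEdge(
    source="app.main",
    target="app.service.Service.__init__",
    provenance=["jedi", "codeql"],
)

# A flat record like this serializes without any graph library.
record = asdict(edge)
```

Because each edge is a flat record keyed by signature strings, a list of them can be persisted as plain JSON alongside the symbol table.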
Pipeline per run:
- Build symbol table (Jedi, as today).
- If --codeql is enabled: auto-download the CLI into <cache_dir>/codeql/bin/, install codeql/python-all via codeql pack install into a per-project qlpack, execute the call-edge query, and augment PyCallsite.callee_signature in-place for sites Jedi couldn't resolve.
- Heuristic constructor fallback for sites neither Jedi nor CodeQL caught (most often classes nested inside functions/methods).
- Derive Jedi-side edges from the fully-augmented symbol table.
- Merge with CodeQL edges; for an edge both backends saw, provenance is the union of the two.
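The merge step at the end of the pipeline could look roughly like the sketch below. The function name merge_edges, the tuple layout, and the rule for combining weights (taking the maximum) are assumptions made for illustration; the proposal only specifies that provenance is unioned.

```python
from typing import Dict, Iterable, List, Tuple

# (source signature, target signature, weight, provenance)
Edge = Tuple[str, str, int, List[str]]


def merge_edges(*edge_sets: Iterable[Edge]) -> List[Edge]:
    """Merge edge lists keyed by (source, target), unioning provenance."""
    merged: Dict[Tuple[str, str], Tuple[int, List[str]]] = {}
    for edges in edge_sets:
        for source, target, weight, provenance in edges:
            key = (source, target)
            if key not in merged:
                merged[key] = (weight, list(provenance))
                continue
            old_weight, old_prov = merged[key]
            # Union of backends, keeping first-seen order and no duplicates.
            union = old_prov + [p for p in provenance if p not in old_prov]
            # ASSUMPTION: keep the larger weight when both backends report one.
            merged[key] = (max(old_weight, weight), union)
    return [(s, t, w, p) for (s, t), (w, p) in merged.items()]


# Hypothetical edges from each backend.
jedi_edges = [("a.f", "a.g", 1, ["jedi"])]
codeql_edges = [
    ("a.f", "a.g", 1, ["codeql"]),
    ("a.f", "requests.get", 2, ["codeql"]),
]
merged = merge_edges(jedi_edges, codeql_edges)
```

Here the a.f to a.g edge ends up with both backends in its provenance, while the edge into requests.get is attributed to CodeQL alone.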
An SDK-facing to_digraph / from_digraph adapter rehydrates the persisted edges into a networkx.DiGraph for path/caller/callee queries.
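To illustrate how ghost nodes keep external edges alive during rehydration, here is a dependency-free sketch. It deliberately uses plain dictionaries in place of networkx.DiGraph, and the names to_adjacency, callees, and callers are hypothetical, not the SDK's API.

```python
from typing import Dict, Iterable, List, Tuple


def to_adjacency(
    known_signatures: Iterable[str],
    edges: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, dict], Dict[str, List[str]]]:
    """Rebuild node attributes and successor lists from a flat edge list."""
    # Every PyCallable in the symbol table becomes a regular node.
    nodes = {sig: {"ghost": False} for sig in known_signatures}
    successors: Dict[str, List[str]] = {sig: [] for sig in nodes}
    for source, target in edges:
        for sig in (source, target):
            if sig not in nodes:
                # Endpoint missing from the symbol table: keep it as a ghost
                # node instead of dropping the edge.
                nodes[sig] = {"ghost": True}
                successors[sig] = []
        successors[source].append(target)
    return nodes, successors


def callees(successors: Dict[str, List[str]], signature: str) -> List[str]:
    return list(successors.get(signature, []))


def callers(successors: Dict[str, List[str]], signature: str) -> List[str]:
    return [src for src, targets in successors.items() if signature in targets]


# Hypothetical signatures.
known = ["app.main", "app.db.save"]
edge_list = [
    ("app.main", "app.db.save"),
    ("app.db.save", "sqlalchemy.orm.Session.commit"),
]
nodes, successors = to_adjacency(known, edge_list)
```

In a networkx-based adapter the same idea would presumably map onto node attributes on a DiGraph, with the ghost flag letting consumers filter external callees in or out.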
Describe alternatives you've considered
- Symbol-table only, build the call graph in the SDK. Pushes the resolution problem onto every consumer and forces each to re-implement the Jedi/CodeQL plumbing. Rejected.
- Promote PyClass to a separate call-graph node type so A() lands on the class. Doubles the node taxonomy. Rejected in favor of <class>.__init__ as the constructor target, which keeps nodes uniform with where method PyCallables already live in the symbol table.
- Drop edges to external/library callees. Loses RPC / framework / third-party calls, which are exactly the most interesting edges for security/architecture analysis. Rejected; ghost nodes solve this.
- Use an nx.DiGraph in the persisted schema directly. Not Pydantic-serializable, and would force every consumer onto networkx. Rejected; persist a flat edge list and adapt at the boundaries.
Additional context
- Mirrors the Java analyzer's CallableVertex + CallDependency shape, adapted to Python (signature-keyed, no separate vertex type).
- Implementation is on feat/add-codeql-call-graph (committed retroactively for provenance).
- Breaking change: --analysis-level is removed. The call graph is built unconditionally; CodeQL participation is controlled by the existing --codeql / --no-codeql flag.
- Follow-ups (out of scope here): auto-synthesizing PyCallable entries for implicit __init__s on Pydantic/dataclass-style classes (currently surfaces as ghost constructor targets); a Joern backend (placeholder in the provenance enum only).