22 changes: 22 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,28 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.1.14] - 2026-05-13

### Added
- **Call graph in analysis output**: `PyApplication.call_graph: List[PyCallEdge]`. Every run now produces a call graph in addition to the symbol table. Edges carry `source`, `target` (both `PyCallable.signature`), `weight`, and `provenance` (`jedi` / `codeql` / `joern`).
- **`call_graph` module** (`codeanalyzer.semantic_analysis.call_graph`) with `to_digraph` / `from_digraph` networkx adapters, `jedi_call_graph_edges`, and `merge_edges`. Endpoints absent from the symbol table become ghost nodes so RPC / third-party / framework edges are preserved.
- **CodeQL Python query** rewritten against the CodeQL Python library (was Java idioms before). Resolves direct calls and constructor calls via `ClassValue.lookup("__init__")`, using the modern `Value.getACall()` predicate (CodeQL Python 7.x).
- **`augment_call_sites`**: when `--codeql` is enabled, CodeQL backfills `PyCallsite.callee_signature` entries Jedi left unresolved.
- **`resolve_unresolved_constructors`**: heuristic fallback that walks the symbol table by class short-name and scope to fill in constructor sites neither Jedi nor CodeQL resolved (common for classes nested inside functions/methods). Synthesizes `<class>.__init__` signatures.
- **`iter_classes_in_symbol_table`**: full recursive walker over classes — including inner classes, classes nested in functions, and classes nested in class methods.
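
To make the edge shape and the ghost-node idea above concrete, here is a minimal stdlib-only sketch. `Edge`, `merge`, and `ghost_nodes` are hypothetical stand-ins written for illustration, not the project's `PyCallEdge`, `merge_edges`, or `to_digraph`; only the four field names come from the changelog entry. The merge policy shown (collapse duplicates on source, target, and provenance, summing weights) is an assumption, since the changelog does not state how `merge_edges` combines edges.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Edge:
    """Hypothetical stand-in for PyCallEdge (field names from the changelog)."""
    source: str
    target: str
    weight: int = 1
    provenance: str = "jedi"


def merge(*edge_lists: Iterable[Edge]) -> List[Edge]:
    """Assumed policy: collapse duplicates on (source, target, provenance), summing weights."""
    totals: Dict[Tuple[str, str, str], int] = {}
    for edges in edge_lists:
        for e in edges:
            key = (e.source, e.target, e.provenance)
            totals[key] = totals.get(key, 0) + e.weight
    return [Edge(s, t, w, p) for (s, t, p), w in totals.items()]


def ghost_nodes(edges: Iterable[Edge], known: Set[str]) -> Set[str]:
    """Endpoints that appear on some edge but are missing from the symbol table."""
    endpoints = {sig for e in edges for sig in (e.source, e.target)}
    return endpoints - known


# Two identical Jedi edges plus one CodeQL edge to a third-party target.
jedi = [Edge("app.main", "app.Service.__init__"), Edge("app.main", "app.Service.__init__")]
codeql = [Edge("app.main", "requests.get", provenance="codeql")]
merged = merge(jedi, codeql)
ghosts = ghost_nodes(merged, known={"app.main", "app.Service.__init__"})
```

Keeping the ghost endpoints as ordinary nodes means an edge to a third-party callee survives even though no `PyCallable` exists for it.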

### Changed
- **BREAKING**: Removed `--analysis-level` / `analysis_level`. The call graph is built unconditionally; use `--codeql/--no-codeql` to control CodeQL participation. Jedi-derived edges are always available.
- **Jedi constructor calls now resolve to `<class>.__init__`** (was: bare `<class>`). When `script.infer()` returns a class, the qualified name is rewritten to point at the constructor — matching where method `PyCallable`s actually live in the symbol table. `PyCallsite.is_constructor_call` now reflects Jedi's type inference (was: `method_name == "__init__"`, only true for explicit `obj.__init__()` calls).
- **`_call_sites` scope correctness**: replaced naive `ast.walk` with `_iter_calls_in_scope`, which stops at nested `FunctionDef` / `AsyncFunctionDef` / `ClassDef` bodies (those have their own `PyCallable.call_sites`). Decorators, default arguments, return annotations, base classes and class keyword args are still walked since they execute in the enclosing scope. Previously, outer functions over-attributed every call from every nested definition.
- CodeQL CLI binary is now downloaded into `<cache_dir>/codeql/bin/` (per-project, respecting `--cache-dir`) and discovered before any CodeQL operation — including when the database cache is reused. The downloaded archive is removed after extraction.
- `CodeQLQueryRunner` now accepts the resolved binary path instead of relying on `PATH`. The temporary `.ql` file is written **inside** a per-project qlpack (`<cache_dir>/codeql/qlpack/`) whose `codeql/python-all` dependency is resolved once via `codeql pack install`, eliminating the lockfile / search-path gymnastics.
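
The scope rule described for `_iter_calls_in_scope` can be sketched with the standard `ast` module. This is an illustrative reimplementation under stated assumptions, not the project's code: the function names are hypothetical, and it covers only the parts the changelog lists (decorators, default arguments, return annotations, base classes, class keyword args).

```python
import ast
from typing import Iterator, List, Union

_SCOPES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
Scope = Union[ast.Module, ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef]


def _enclosing_scope_parts(node: ast.AST) -> List[ast.AST]:
    """Parts of a nested definition that execute in the enclosing scope."""
    parts: List[ast.AST] = list(node.decorator_list)
    if isinstance(node, ast.ClassDef):
        parts += node.bases
        parts += [kw.value for kw in node.keywords]
    else:
        parts += node.args.defaults
        parts += [d for d in node.args.kw_defaults if d is not None]
        if node.returns is not None:
            parts.append(node.returns)
    return parts


def iter_calls_in_scope(scope: Scope) -> Iterator[ast.Call]:
    """Yield calls owned by `scope`, without descending into nested bodies."""
    stack: List[ast.AST] = list(scope.body)
    while stack:
        node = stack.pop()
        if isinstance(node, _SCOPES):
            # Skip the nested body; keep only what runs in the enclosing scope.
            stack.extend(_enclosing_scope_parts(node))
            continue
        if isinstance(node, ast.Call):
            yield node
        stack.extend(ast.iter_child_nodes(node))


SOURCE = '''
def outer(x=make_default()):
    helper()
    @decorate()
    def inner(y=inner_default()) -> annotate():
        hidden()
    class K(base_factory(), metaclass=meta()):
        z = class_body_call()
    return finish()
'''
tree = ast.parse(SOURCE)
outer_calls = sorted(c.func.id for c in iter_calls_in_scope(tree.body[0]))
module_calls = sorted(c.func.id for c in iter_calls_in_scope(tree))
```

In this sketch `hidden()` and `class_body_call()` are not attributed to `outer`, while `make_default()` is attributed to the module because a default argument is evaluated where the `def` statement runs.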

### Fixed
- **`zipfile` extraction dropped Unix permissions** on the CodeQL CLI launcher, causing `PermissionError` on first query run. Entries are now extracted with their stored `external_attr` mode applied, plus a defensive `chmod +x` on the resolved binary.
- **`rglob("codeql")` matched the bundled `codeql/codeql/` directory** before the launcher file, returning a directory instead of an executable. Both `CodeQLLoader` and `_ensure_codeql_bin` now filter to `is_file()`.
- **`CodeQLQueryRunner` crashed on subprocess errors** with `'NoneType' object has no attribute 'stderr'` because `stderr=None` returns `None` from `communicate()`. Now captures `stderr=PIPE` and decodes bytes safely.
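
The first two fixes above can be illustrated with a short stdlib sketch. The helper names `extract_with_modes` and `find_launcher` are hypothetical and not taken from the project. The sketch assumes a POSIX filesystem and relies on the convention that archives created on Unix store the file mode in the high 16 bits of `ZipInfo.external_attr`.

```python
import os
import stat
import zipfile
from pathlib import Path


def extract_with_modes(archive: Path, dest: Path) -> None:
    """Extract `archive` into `dest`, re-applying stored Unix permission bits."""
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            target = Path(zf.extract(info, dest))
            # High 16 bits of external_attr hold the Unix mode, when present.
            mode = (info.external_attr >> 16) & 0o7777
            if mode and not info.is_dir():
                os.chmod(target, mode)


def find_launcher(root: Path, name: str = "codeql") -> Path:
    """Return the first regular file called `name`, skipping same-named directories."""
    for candidate in sorted(root.rglob(name)):
        if candidate.is_file():
            # Defensive: make sure the launcher is executable by its owner.
            candidate.chmod(candidate.stat().st_mode | stat.S_IXUSR)
            return candidate
    raise FileNotFoundError(f"no file named {name!r} under {root}")
```

Filtering on `is_file()` matters because `rglob` also yields directories, so a directory named `codeql` would otherwise be returned ahead of the launcher file inside it.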

## [0.1.13] - 2025-07-22

### Improved
42 changes: 10 additions & 32 deletions README.md
@@ -80,7 +80,6 @@ To view the available options and commands, run `codeanalyzer --help`. You shoul
│ * --input -i PATH Path to the project root directory. [default: None] [required] │
│ --output -o PATH Output directory for artifacts. [default: None] │
│ --format -f [json|msgpack] Output format: json or msgpack. [default: json] │
│ --analysis-level -a INTEGER 1: symbol table, 2: call graph. [default: 1] │
│ --codeql --no-codeql Enable CodeQL-based analysis. [default: no-codeql] │
│ --eager --lazy Enable eager or lazy analysis. Defaults to lazy. [default: lazy] │
│ --cache-dir -c PATH Directory to store analysis cache. [default: None] │
@@ -112,33 +111,23 @@ To view the available options and commands, run `codeanalyzer --help`. You shoul

This will save the analysis results in `analysis.msgpack` in the specified directory.

3. **Toggle analysis levels with `--analysis-level`:**
```bash
codeanalyzer --input ./my-python-project --analysis-level 1 # Symbol table only
```
Call graph analysis can be enabled by setting the level to `2`:
```bash
codeanalyzer --input ./my-python-project --analysis-level 2 # Symbol table + Call graph
```
***Note: The `--analysis-level=2` is not yet implemented in this version.***

4. **Analysis with CodeQL enabled:**
3. **Analysis with CodeQL enabled:**
```bash
codeanalyzer --input ./my-python-project --codeql
```
This will perform CodeQL-based analysis in addition to the standard symbol table generation.
Every run produces a symbol table **and** a call graph. By default, edges come from Jedi's lexical analysis. Adding `--codeql` resolves additional edges (including RPC / third-party / dynamically-dispatched targets) and merges them with the Jedi-derived edges. CodeQL also backfills resolved callees on Jedi-emitted call sites where Jedi couldn't resolve them.

***Note: Not yet fully implemented. Please refrain from using this option until further notice.***
***Note: CodeQL integration is experimental. The CLI is downloaded into `<cache_dir>/codeql/` on first use and reused thereafter.***

5. **Eager analysis with custom cache directory:**
4. **Eager analysis with custom cache directory:**
```bash
codeanalyzer --input ./my-python-project --eager --cache-dir /path/to/custom-cache
```
This will rebuild the analysis cache at every run and store it in `/path/to/custom-cache/.codeanalyzer`. The cache will be cleared by default after analysis unless you specify `--keep-cache`.

If you provide `--cache-dir`, the cache will be stored in that directory. If not specified, it defaults to `.codeanalyzer` in the current working directory (`$PWD`).

6. **Quiet mode (minimal output):**
5. **Quiet mode (minimal output):**
```bash
codeanalyzer --input /path/to/my-python-project --quiet
```
@@ -236,7 +225,6 @@ To view the available options and commands, run `codeanalyzer --help`. You shoul
│ * --input -i PATH Path to the project root directory. [default: None] [required] │
│ --output -o PATH Output directory for artifacts. [default: None] │
│ --format -f [json|msgpack] Output format: json or msgpack. [default: json]. │
│ --analysis-level -a INTEGER 1: symbol table, 2: call graph. [default: 1] │
│ --codeql --no-codeql Enable CodeQL-based analysis. [default: no-codeql] │
│ --eager --lazy Enable eager or lazy analysis. Defaults to lazy. [default: lazy] │
│ --cache-dir -c PATH Directory to store analysis cache. [default: None] │
@@ -261,33 +249,23 @@ To view the available options and commands, run `codeanalyzer --help`. You shoul

Now, you can find the analysis results in `analysis.json` in the specified directory.

2. **Toggle analysis levels with `--analysis-level`:**
```bash
codeanalyzer --input ./my-python-project --analysis-level 1 # Symbol table only
```
Call graph analysis can be enabled by setting the level to `2`:
```bash
codeanalyzer --input ./my-python-project --analysis-level 2 # Symbol table + Call graph
```
***Note: The `--analysis-level=2` is not yet implemented in this version.***

3. **Analysis with CodeQL enabled:**
2. **Analysis with CodeQL enabled:**
```bash
codeanalyzer --input ./my-python-project --codeql
```
This will perform CodeQL-based analysis in addition to the standard symbol table generation.
Every run produces a symbol table **and** a call graph. By default, edges come from Jedi's lexical analysis. Adding `--codeql` resolves additional edges (including RPC / third-party / dynamically-dispatched targets) and merges them with the Jedi-derived edges. CodeQL also backfills resolved callees on Jedi-emitted call sites where Jedi couldn't resolve them.

***Note: Not yet fully implemented. Please refrain from using this option until further notice.***
***Note: CodeQL integration is experimental. The CLI is downloaded into `<cache_dir>/codeql/` on first use and reused thereafter.***

4. **Eager analysis with custom cache directory:**
3. **Eager analysis with custom cache directory:**
```bash
codeanalyzer --input ./my-python-project --eager --cache-dir /path/to/custom-cache
```
This will rebuild the analysis cache at every run and store it in `/path/to/custom-cache/.codeanalyzer`. The cache will be cleared by default after analysis unless you specify `--keep-cache`.

If you provide `--cache-dir`, the cache will be stored in that directory. If not specified, it defaults to `.codeanalyzer` in the current working directory (`$PWD`).

5. **Save output in msgpack format:**
4. **Save output in msgpack format:**
```bash
codeanalyzer --input ./my-python-project --output /path/to/analysis-results --format msgpack
```
5 changes: 0 additions & 5 deletions codeanalyzer/__main__.py
@@ -27,10 +27,6 @@ def main(
case_sensitive=False,
),
] = OutputFormat.JSON,
analysis_level: Annotated[
int,
typer.Option("-a", "--analysis-level", help="1: symbol table, 2: call graph."),
] = 1,
using_codeql: Annotated[
bool, typer.Option("--codeql/--no-codeql", help="Enable CodeQL-based analysis.")
] = False,
@@ -82,7 +78,6 @@ def main(
input=input,
output=output,
format=format,
analysis_level=analysis_level,
using_codeql=using_codeql,
using_ray=using_ray,
rebuild_analysis=rebuild_analysis,