A static analysis tool for measuring architectural context exposure in codebases.
Context Footprint is a research prototype that computes a Context Footprint (CF) metric for functions and types in a codebase.
CF approximates the amount of code context that must be traversed to analyze a given symbol, based on language-level dependencies and abstraction boundaries.
This tool is developed alongside an ongoing academic study and is intended for measurement, comparison, and empirical analysis, rather than production use.
- The CF definition and traversal rules are still evolving.
- APIs, outputs, and heuristics may change before publication.
- The repository currently exists to support reproducibility and to gather early feedback.
Given a semantic index of a codebase, the tool can:
- Compute the distribution of CF values across all functions or types
- Identify symbols with unusually large context exposure
- Query the CF of a specific symbol
- Print the source code that contributes to a symbol’s CF
CF is computed via conservative graph traversal over language-level dependencies, with configurable pruning rules.
The tool consumes semantic data (JSON) produced by language-specific extractors (e.g. LSP-based), and is therefore language-agnostic in principle.
Tested languages include:
- Python
- TypeScript
Support for additional languages depends on the availability of semantic data extractors that output the SemanticData JSON format.
Install as a Python tool, which includes the cf-extract command for extracting semantic data from Python projects:
uv tool install cftool
# or: pip install cftool
Requires Python 3.9+.
Build from source:
git clone https://github.com/yourusername/context-footprint.git
cd context-footprint
cargo build --release
Requires Rust 1.70+.
- A semantic data JSON file for the target project (e.g. from cf-extract for Python)
For Python projects, use the bundled extractor:
cf-extract /path/to/python/project > semantic_data.json
Or use another extractor (e.g. LSP-based) that outputs the SemanticData JSON format.
cftool semantic_data.json stats
# or with cargo build: ./target/release/cftool semantic_data.json stats
cftool semantic_data.json top --limit 10
cftool semantic_data.json compute "<symbol-id>"
cftool semantic_data.json context "<symbol-id>"
The tool reports CF values as token counts, using a configurable size function. Output includes percentile distributions and summary statistics for large codebases.
Example:
Functions - Context Footprint Distribution:
Count: 856
Median: 245 tokens
90th percentile: 20,567 tokens
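To show how a summary like the one above can be derived from per-symbol CF values, here is a minimal Python sketch. It is not the tool's implementation: the function names, the sample values, and the choice of the nearest-rank percentile method are all assumptions made for illustration.

```python
import statistics


def nearest_rank_percentile(values, pct):
    """Return the pct-th percentile (integer 1..100) by the nearest-rank method."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    # Ceiling division in integers avoids floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100  # 1-based rank
    return ordered[max(rank, 1) - 1]


def summarize(cf_values):
    """Summarize a list of per-symbol CF values (token counts)."""
    return {
        "count": len(cf_values),
        "median": statistics.median(cf_values),
        "p90": nearest_rank_percentile(cf_values, 90),
    }


# Hypothetical CF values for ten functions, in tokens.
cf_values = [120, 80, 245, 300, 95, 20567, 410, 150, 60, 1800]
summary = summarize(cf_values)

print(f"Count: {summary['count']}")
print(f"Median: {summary['median']:,} tokens")
print(f"90th percentile: {summary['p90']:,} tokens")
```

A large gap between the median and the upper percentiles, as in the sample data, points to a small number of symbols with very large context exposure.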
- A directed dependency graph is constructed from the semantic data (JSON).
- Starting from a target symbol, dependencies are traversed conservatively.
- Traversal stops at:
  - External libraries
  - Explicit abstraction boundaries defined by the pruning policy
- The size of the reachable subgraph is summed.
The default pruning policy is intentionally conservative and favors soundness over precision.
For a formal definition, see docs/design.md.
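The traversal described above can be sketched in a few lines of Python. This is an illustrative approximation, not the tool's code: the graph and size dictionaries, the `is_boundary` callback, and the symbol names are hypothetical. It also assumes that a boundary symbol's own size is counted but its dependencies are not expanded, and that a symbol with no size entry is external. docs/design.md remains the authoritative definition.

```python
from collections import deque


def context_footprint(graph, sizes, target, is_boundary):
    """Sum the sizes of the symbols reachable from `target`.

    graph:       dict mapping a symbol id to the symbol ids it depends on
    sizes:       dict mapping a symbol id to its size (e.g. token count);
                 symbols missing from this dict are treated as external
    is_boundary: callable(symbol id) -> bool, standing in for the pruning policy
    """
    visited = {target}
    queue = deque([target])
    total = 0
    while queue:
        node = queue.popleft()
        total += sizes[node]
        # Assumption: a boundary is counted but not expanded.
        # The target itself is always expanded.
        if node != target and is_boundary(node):
            continue
        for dep in graph.get(node, ()):
            if dep not in sizes:  # external library: stop here
                continue
            if dep not in visited:  # the visited set also guards against cycles
                visited.add(dep)
                queue.append(dep)
    return total


# Hypothetical project: handler -> parse -> tokenize (cycle back to parse),
# and handler -> Store (an interface) -> SqlStore (a large implementation).
graph = {
    "app.handler": ["app.parse", "app.Store"],
    "app.parse": ["app.tokenize", "json.loads"],
    "app.tokenize": ["app.parse"],
    "app.Store": ["app.SqlStore"],
}
sizes = {
    "app.handler": 40,
    "app.parse": 30,
    "app.tokenize": 25,
    "app.Store": 10,
    "app.SqlStore": 500,
}

pruned = context_footprint(graph, sizes, "app.handler", lambda s: s == "app.Store")
unpruned = context_footprint(graph, sizes, "app.handler", lambda s: False)
print(pruned, unpruned)
```

In this toy graph, treating `app.Store` as an abstraction boundary keeps the large `app.SqlStore` implementation out of the footprint of `app.handler`, which is the effect the pruning policy is meant to capture.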
The implementation separates core analysis logic from language-specific adapters:
src/
├─ domain/ # Graph model and traversal logic
└─ adapters/ # Size functions, doc scoring, test detection
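As an illustration of why size functions sit on the adapter side, the sketch below treats a size function as a plain callable that the core logic receives as a parameter. It is written in Python for brevity, whereas the tool itself is built with Cargo. The names `SizeFunction`, `whitespace_tokens`, `line_count`, and `total_size` are hypothetical, and whitespace splitting is only a stand-in for whatever tokenizer the tool actually uses.

```python
from typing import Callable, Iterable

# A size function maps a piece of source text to a non-negative size.
SizeFunction = Callable[[str], int]


def whitespace_tokens(source: str) -> int:
    """Crude token count: number of whitespace-separated chunks."""
    return len(source.split())


def line_count(source: str) -> int:
    """Alternative size measure: number of lines."""
    return len(source.splitlines())


def total_size(sources: Iterable[str], size: SizeFunction) -> int:
    """Core-side helper: it knows nothing about how size is measured."""
    return sum(size(s) for s in sources)


# Hypothetical source snippets for two symbols.
snippets = [
    "def f(x):\n    return x + 1\n",
    "class A:\n    pass\n",
]

by_tokens = total_size(snippets, whitespace_tokens)
by_lines = total_size(snippets, line_count)
print(by_tokens, by_lines)
```

Swapping the size function changes the unit of the reported CF without touching the graph model or the traversal.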
Apache 2.0
Semantic data is consumed as JSON (e.g. from LSP-based extractors).