Explore formal foundations: authority lattice, incremental graph reachability, abstract interpretation #61
Context
capsec's cross-crate propagation and MIR analysis implement several patterns that have formal counterparts in programming language theory. Grounding the implementation in these frameworks could improve precision, enable incremental analysis, and provide correctness guarantees.
This issue tracks exploration — not immediate implementation.
1. Authority Lattice (Denning, 1976)
Current state: Findings have a single Category (FS, NET, ENV, PROC, FFI). A function that does both filesystem and network access produces two separate findings. Dedup, propagation, and reporting all treat categories independently.
Proposal: Model authority as a set (or lattice element) per function, not per call site. The lattice has:
- Bottom: {} (pure, no authority)
- Atoms: {FS}, {NET}, {ENV}, {PROC}, {FFI}
- Top: {FS, NET, ENV, PROC, FFI} (ambient; can do anything)
- Join: set union
Cross-crate propagation becomes: for each function, compute the join of all callee authority sets. This is more precise than the current per-call-site model and eliminates redundant findings.
Concrete benefit: git2::Repository::open() would have authority set {FS, FFI} instead of two separate findings. The report would show one entry with multiple categories. Filtering by category would be set intersection, not linear scan.
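The lattice above can be sketched as a bitset. This is an illustrative sketch only; the Authority type, its constants, and the join/leq/intersects operations are hypothetical names, not capsec's actual API.

```rust
// Hypothetical sketch: authority as a bitset lattice.
// All names here are illustrative, not capsec's real types.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Authority(u8);

impl Authority {
    const BOTTOM: Authority = Authority(0); // {} — pure, no authority
    const FS: Authority = Authority(1 << 0);
    const NET: Authority = Authority(1 << 1);
    const ENV: Authority = Authority(1 << 2);
    const PROC: Authority = Authority(1 << 3);
    const FFI: Authority = Authority(1 << 4);
    const TOP: Authority = Authority(0b1_1111); // ambient — anything

    /// Lattice join: set union.
    fn join(self, other: Authority) -> Authority {
        Authority(self.0 | other.0)
    }

    /// Partial order: self ⊑ other iff self is a subset of other.
    fn leq(self, other: Authority) -> bool {
        self.0 & !other.0 == 0
    }

    /// Category filtering as set intersection, not a linear scan.
    fn intersects(self, other: Authority) -> bool {
        self.0 & other.0 != 0
    }
}

fn main() {
    // git2::Repository::open(): one authority set {FS, FFI},
    // instead of two separate findings.
    let open = Authority::FS.join(Authority::FFI);
    assert!(Authority::FS.leq(open));
    assert!(!Authority::NET.leq(open));
    assert!(open.intersects(Authority::FFI));
    assert!(open.leq(Authority::TOP));
}
```

A u8 bitset gives join, subset tests, and intersection in single instructions, and the finite lattice height (5 atoms) guarantees that any monotone propagation over it terminates.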
Reference: Dorothy Denning, "A Lattice Model of Secure Information Flow" (1976). Also: Myers & Liskov, "Protecting Privacy using the Decentralized Label Model" (2000) for a practical label/authority system.
2. Incremental Graph Reachability
Current state: cargo capsec diff crate@v1 crate@v2 scans both versions from scratch. Export maps are cached per crate version, but the diff doesn't use them — it rescans and compares findings.
Proposal: The dependency graph + export maps form an authority propagation graph. A version bump changes one node's export map. Instead of rescanning everything, compute which downstream nodes are affected and re-propagate only those.
Algorithm:
- Load cached export maps for all deps at their current versions
- For the bumped crate, compute the new export map (scan only that crate)
- Diff the old and new export maps (added/removed authority entries)
- For each changed entry, walk the reverse dependency graph and update affected workspace findings
This would make cargo capsec diff near-instant on cached data — O(changed entries × affected dependents) instead of O(all source files × 2).
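The "walk the reverse dependency graph" step can be sketched as a BFS from the bumped crate. This is a minimal sketch under assumed data structures; affected_dependents and the map layout are hypothetical, not capsec's implementation.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Hypothetical sketch: given the reverse dependency graph, find the
// set of crates that must be re-propagated after one crate's export
// map changes. Names and structures are illustrative.
//
// reverse_deps[c] = crates that depend directly on c.
fn affected_dependents(
    reverse_deps: &HashMap<&str, Vec<&str>>,
    changed: &str,
) -> HashSet<String> {
    let mut affected = HashSet::new();
    let mut queue = VecDeque::from([changed]);
    while let Some(krate) = queue.pop_front() {
        for &dep in reverse_deps.get(krate).into_iter().flatten() {
            // insert() returns false if already visited, so each
            // node is processed at most once.
            if affected.insert(dep.to_string()) {
                queue.push_back(dep);
            }
        }
    }
    affected
}

fn main() {
    // workspace -> app -> git2, and workspace -> serde.
    let reverse_deps = HashMap::from([
        ("git2", vec!["app"]),
        ("app", vec!["workspace"]),
        ("serde", vec!["workspace"]),
    ]);
    // Bumping git2 touches only app and workspace, never serde.
    let affected = affected_dependents(&reverse_deps, "git2");
    assert!(affected.contains("app"));
    assert!(affected.contains("workspace"));
    assert!(!affected.contains("serde"));
}
```

Only the crates in the returned set need their findings re-joined against the diffed export-map entries; everything else reuses cached results.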
Reference: Italiano, "Amortized efficiency of a path retrieval data structure" (1986). Also: La Poutré & van Leeuwen, "Maintenance of transitive closures and transitive reductions of graphs" (1988).
3. Abstract Interpretation Framework (Cousot & Cousot, 1977)
Current state: The MIR walker computes an over-approximation of authority: if a function might call std::fs::read on any code path, it's flagged. Dead branches, unreachable paths, and conditional authority are all treated as exercised.
Proposal: Formalize the MIR analysis as an abstract interpreter over an authority domain. The concrete domain is "set of syscalls actually made at runtime." The abstract domain is "set of syscalls syntactically reachable." The abstraction function maps concrete execution traces to authority sets. The analysis computes a fixed point over the MIR control flow graph.
This framing doesn't change what capsec does today, but it:
- Provides a correctness argument (the analysis is a sound over-approximation)
- Identifies where precision is lost (join points in the CFG, unresolved dynamic dispatch)
- Suggests refinements (path-sensitive analysis, conditional authority: "FS only if config.enable_cache is true")
Concrete benefit: Could enable a --precise mode that uses path-sensitive analysis to reduce false positives. For example, if cfg!(feature = "network") { connect() } would only report NET authority when the feature is enabled.
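The fixed-point computation over the CFG can be sketched on a toy graph. This is a hand-rolled illustration of the abstract-interpretation framing, not capsec's MIR walker; block IDs, the effects/succs maps, and authority_fixed_point are assumed names.

```rust
use std::collections::HashMap;

// Hypothetical sketch: authority sets as a u8 bitset domain.
const FS: u8 = 1 << 0;
const NET: u8 = 1 << 1;

// Compute, for the entry block, the join of the local authority of
// every block reachable from it: a sound over-approximation of the
// syscalls any concrete execution could make.
fn authority_fixed_point(
    effects: &HashMap<u32, u8>,     // per-block local authority
    succs: &HashMap<u32, Vec<u32>>, // CFG edges
    entry: u32,
) -> u8 {
    let mut state: HashMap<u32, u8> = HashMap::new();
    // Iterate to a fixed point; the lattice is finite and the
    // transfer function is monotone, so this terminates.
    loop {
        let mut changed = false;
        for (&block, &local) in effects {
            let mut new = local;
            for &s in succs.get(&block).into_iter().flatten() {
                new |= state.get(&s).copied().unwrap_or(0); // join
            }
            if state.get(&block).copied().unwrap_or(0) != new {
                state.insert(block, new);
                changed = true;
            }
        }
        if !changed {
            return state.get(&entry).copied().unwrap_or(0);
        }
    }
}

fn main() {
    // Block 0 branches to block 1 (reads a file) and block 2
    // (opens a socket). The join point loses path information:
    let effects = HashMap::from([(0, 0), (1, FS), (2, NET)]);
    let succs = HashMap::from([(0, vec![1, 2])]);
    assert_eq!(authority_fixed_point(&effects, &succs, 0), FS | NET);
}
```

The example also shows exactly where precision is lost: the join at block 0 reports FS | NET even if the branch condition makes the paths mutually exclusive, which is what a path-sensitive --precise mode would refine.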
Reference: Cousot & Cousot, "Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs" (1977). Also: Might & Shivers, "Abstracting Abstract Machines" (2010) for a modern accessible treatment.
4. Capability Safety (Mark Miller, 2006)
Current state: capsec's type system (Cap<P>, CapProvider<P>) implements the object-capability model. The audit tool detects violations of capability discipline (ambient authority usage). These are two sides of the same coin.
Proposal: Formalize the relationship between the audit tool and the type system using Miller's capability safety framework. A "capability-safe" module is one where the audit tool finds zero ambient authority findings and all I/O goes through CapProvider<P> bounds. The type system guarantees this at compile time; the audit tool verifies it for code that hasn't adopted the type system yet.
This framing would let capsec report a capability safety score per crate: percentage of functions that are capability-safe (all I/O via caps) vs. ambient (direct std:: calls). Migration from ambient to capability-safe becomes a measurable metric.
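The score itself is a simple ratio. A minimal sketch, assuming a per-function audit record; FnAudit, its fields, and capability_safety_score are hypothetical names, not part of capsec today.

```rust
// Hypothetical sketch of a per-crate capability safety score:
// the fraction of functions with zero ambient-authority findings.
struct FnAudit {
    name: &'static str,
    ambient_findings: usize, // direct std::fs / std::net / ... calls
}

fn capability_safety_score(fns: &[FnAudit]) -> f64 {
    if fns.is_empty() {
        return 1.0; // vacuously capability-safe
    }
    let safe = fns.iter().filter(|f| f.ambient_findings == 0).count();
    safe as f64 / fns.len() as f64
}

fn main() {
    let fns = [
        FnAudit { name: "read_config", ambient_findings: 1 },
        FnAudit { name: "parse", ambient_findings: 0 },
        FnAudit { name: "render", ambient_findings: 0 },
        FnAudit { name: "connect", ambient_findings: 2 },
    ];
    // 2 of 4 functions are capability-safe.
    assert_eq!(capability_safety_score(&fns), 0.5);
    println!("first audited fn: {}", fns[0].name);
}
```

Tracked across versions, the score turns "migrate from ambient to capability-safe" into a number that can move in CI rather than a qualitative goal.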
Reference: Mark Miller, "Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control" (2006). Also: the Wyvern capability module system (Melicher et al., ECOOP 2017) which capsec's design already draws from.
Priority
- Lattice model (#1): Most practical. Could improve the data model and reduce finding noise. Worth prototyping.
- Incremental reachability (#2): High value for diff command performance. Worth exploring after the export map format stabilizes.
- Abstract interpretation (#3): Theoretical grounding. Useful for documentation and correctness arguments. Low urgency.
- Capability safety (#4): Already partially implemented. Formalizing it would improve the README and academic positioning.
Non-goals
This is not about making capsec an academic tool. It's about borrowing proven frameworks to make the engineering better — fewer false positives, faster incremental analysis, and a clearer story about what the tool guarantees.