Description
Is your feature request related to a problem? Please describe.
Currently, codeanalyzer-python provides basic symbol table generation, and call graph analysis is planned but marked as not yet implemented for --analysis-level=2. It lacks the program flow analysis capabilities needed to understand code behavior and dependencies:
- Call Graph Construction: While planned, the current implementation doesn't provide comprehensive call graph analysis that handles Python's dynamic features (higher-order functions, nested definitions, dynamic calls)
- Control Flow Graphs (CFG): No support for intra-procedural or inter-procedural control flow analysis
- Data Flow Analysis: Missing data flow tracking capabilities for understanding how data moves through the program
These limitations prevent users from performing advanced static analysis tasks like vulnerability propagation analysis, refactoring impact assessment, and comprehensive dependency tracking.
Describe the solution you'd like
I would like to integrate specific components from the Scalpel Python Static Analysis Framework (https://github.com/SMAT-Lab/Scalpel) to enhance codeanalyzer-python with robust graph-based analysis:
1. Enhanced Analysis Levels
--analysis-level 2 # Call graph analysis (implement using Scalpel)
--analysis-level 3 # Call graph + Control flow graphs
--analysis-level 4 # Call graph + CFG + Data flow analysis
2. New CLI Options
--call-graph # Generate comprehensive call graphs
--control-flow # Generate control flow graphs
--data-flow # Perform data flow analysis
--inter-procedural # Enable inter-procedural analysis
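As a rough sketch of how the cumulative levels and the individual flags above could be reconciled, the snippet below maps each level to a set of analyses and treats the flags as additive. The names `Analysis`, `LEVELS`, and `resolve_analyses` are hypothetical, and the additive behavior is an assumption; this issue does not specify how flags and levels interact. `--inter-procedural` is left out because the issue does not say what it toggles.

```python
from enum import Flag, auto


class Analysis(Flag):
    """Hypothetical set of analyses a run can enable."""
    SYMBOL_TABLE = auto()
    CALL_GRAPH = auto()
    CONTROL_FLOW = auto()
    DATA_FLOW = auto()


# Each level includes everything from the level below it.
LEVELS = {
    1: Analysis.SYMBOL_TABLE,
    2: Analysis.SYMBOL_TABLE | Analysis.CALL_GRAPH,
    3: Analysis.SYMBOL_TABLE | Analysis.CALL_GRAPH | Analysis.CONTROL_FLOW,
    4: (Analysis.SYMBOL_TABLE | Analysis.CALL_GRAPH
        | Analysis.CONTROL_FLOW | Analysis.DATA_FLOW),
}


def resolve_analyses(level=1, call_graph=False, control_flow=False, data_flow=False):
    """Combine --analysis-level with the individual flags (assumed additive)."""
    if level not in LEVELS:
        raise ValueError(f"unsupported analysis level: {level}")
    selected = LEVELS[level]
    if call_graph:
        selected |= Analysis.CALL_GRAPH
    if control_flow:
        selected |= Analysis.CONTROL_FLOW
    if data_flow:
        selected |= Analysis.DATA_FLOW
    return selected
```

Under this assumption, `--analysis-level 1 --call-graph` would behave like level 2, and a flag could never remove an analysis that the level already enables.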
3. Scalpel Integration Focus
Target specific Scalpel capabilities:
- Function 8: Call Graph Construction - Handles Python's dynamic features like higher-order functions and nested definitions
- Function 2: Control-Flow Graph Construction - Generates intra-procedural CFGs that can be combined for inter-procedural analysis
- Function 5: Constant Propagation - Provides data flow analysis capabilities
4. Enhanced Output Schema
from typing import Dict, List

from pydantic import BaseModel

# CallNode, CallEdge, CFG, and BasicBlock are not defined in this proposal.

class PyCallGraph(BaseModel):
    nodes: List[CallNode]  # Function/method nodes
    edges: List[CallEdge]  # Call relationships
    entry_points: List[str]  # Program entry points

class PyControlFlowGraph(BaseModel):
    function_cfgs: Dict[str, CFG]  # Per-function CFGs
    basic_blocks: List[BasicBlock]  # Code basic blocks

class PyDataFlow(BaseModel):
    def_use_chains: Dict[str, List]  # Variable definitions and uses
    reaching_definitions: Dict  # Reaching definition analysis
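To make the intended shape of `def_use_chains` more concrete, here is a minimal sketch that computes definition/use pairs with the standard `ast` module. It is not Scalpel's algorithm and it only handles straight-line, module-level code; branches and loops would need a CFG-based reaching-definitions pass. The function name and the `(definition line, [use lines])` layout are assumptions for illustration.

```python
import ast
from collections import defaultdict


def def_use_chains(source: str) -> dict[str, list[tuple[int, list[int]]]]:
    """Map each variable to (definition line, [use lines]) pairs.

    Straight-line code only: every use is attributed to the most
    recent definition of that name.
    """
    chains = defaultdict(list)
    for stmt in ast.parse(source).body:
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        # Handle loads first so that `x = x + 1` uses the previous definition.
        for name in names:
            if isinstance(name.ctx, ast.Load) and chains.get(name.id):
                chains[name.id][-1][1].append(name.lineno)
        for name in names:
            if isinstance(name.ctx, ast.Store):
                chains[name.id].append((name.lineno, []))
    return dict(chains)


SAMPLE = "x = 1\ny = x + 2\nx = y * x\nprint(x)\n"
print(def_use_chains(SAMPLE))
# {'x': [(1, [2, 3]), (3, [4])], 'y': [(2, [3])]}
```

Names that are never assigned in the snippet, such as `print`, are skipped rather than reported as uses.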
Describe alternatives you've considered
1. NetworkX-based custom implementation
The project already uses NetworkX, but building CFG/call graph analysis from scratch would be time-intensive and error-prone.
2. AST-only analysis
Python's AST module provides basic structure but lacks the sophisticated analysis needed for accurate call graphs in dynamic Python code.
3. Existing call graph tools
- pycg: Good for call graphs but limited CFG support
- code2flow: Visualization-focused, not programmatic analysis
- vulture: Dead code detection, not comprehensive flow analysis
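The limitation of the AST-only alternative can be shown with a small experiment. The sketch below uses only the standard `ast` module to collect caller/callee pairs by name; `naive_call_edges` and the sample program are hypothetical. Because it never resolves what a name refers to, a call through a function passed as an argument is recorded under the parameter name instead of the real target.

```python
import ast


def naive_call_edges(source: str) -> set[tuple[str, str]]:
    """Collect (caller, callee-name) pairs without resolving any names."""
    edges = set()
    for func in ast.walk(ast.parse(source)):
        if isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Note: calls inside nested functions are also attributed to the outer one.
            for node in ast.walk(func):
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    edges.add((func.name, node.func.id))
    return edges


SAMPLE = '''
def greet():
    return "hi"

def apply(fn):
    return fn()

def main():
    return apply(greet)
'''

edges = naive_call_edges(SAMPLE)
print(sorted(edges))
# [('apply', 'fn'), ('main', 'apply')]
```

The edge from `apply` to `greet` is missing, which is the kind of higher-order call that a dedicated call graph analysis is expected to resolve.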
Additional context
Specific Scalpel Advantages for Graph Analysis
- Call Graph: Handles Python's complex dynamic features (decorators, metaclasses, dynamic imports)
- CFG Construction: Provides precise basic block identification and control flow edges
- Inter-procedural Analysis: Can combine function-level CFGs into program-wide flow graphs
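For readers less familiar with the terms, the following sketch shows what basic blocks and control flow edges look like for a single function. It is a toy builder written against the standard `ast` module, not Scalpel's implementation, and `Block` and `TinyCFG` are hypothetical names. It only understands straight-line statements and `if`/`else`; loops, `return`, and exceptions are ignored.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class Block:
    id: int
    lines: list[int] = field(default_factory=list)  # source lines in this block
    succ: list[int] = field(default_factory=list)   # ids of successor blocks


class TinyCFG:
    def __init__(self):
        self.blocks: list[Block] = []

    def new_block(self) -> Block:
        block = Block(len(self.blocks))
        self.blocks.append(block)
        return block

    def build(self, stmts, current: Block) -> Block:
        """Add stmts starting in `current`; return the block where control continues."""
        for stmt in stmts:
            current.lines.append(stmt.lineno)
            if isinstance(stmt, ast.If):
                # The condition ends the current block and forks into two branches.
                then_entry = self.new_block()
                else_entry = self.new_block()
                join = self.new_block()
                current.succ += [then_entry.id, else_entry.id]
                self.build(stmt.body, then_entry).succ.append(join.id)
                self.build(stmt.orelse, else_entry).succ.append(join.id)
                current = join
        return current


SRC = "def f(x):\n    y = 0\n    if x:\n        y = 1\n    else:\n        y = 2\n    return y\n"
func = ast.parse(SRC).body[0]
cfg = TinyCFG()
cfg.build(func.body, cfg.new_block())
for block in cfg.blocks:
    print(block.id, block.lines, block.succ)
# 0 [2, 3] [1, 2]
# 1 [4] [3]
# 2 [6] [3]
# 3 [7] []
```

An inter-procedural graph would additionally link call sites in one function's blocks to the entry block of the callee, which is where the call graph and the per-function CFGs meet.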
Current Project Readiness
- Already has a placeholder for call graph analysis (--analysis-level=2)
- Uses NetworkX for graph operations
- Extensible CLI architecture with typer
- Established pattern for multiple analysis backends
Implementation Plan
# New module: codeanalyzer/semantic_analysis/scalpel/
├── __init__.py
├── scalpel_analyzer.py # Main integration class
├── call_graph_builder.py # Scalpel call graph integration
├── cfg_builder.py # Control flow graph integration
└── data_flow_analyzer.py # Data flow analysis integration
Expected Output Enhancement
# Current (Level 1)
codeanalyzer --input project --analysis-level 1 # Symbol table only
# Enhanced (Levels 2-4 with Scalpel)
codeanalyzer --input project --analysis-level 2 # + Call graphs
codeanalyzer --input project --analysis-level 3 # + Control flow graphs
codeanalyzer --input project --analysis-level 4 # + Data flow analysis
Example Usage Scenarios
- Security Analysis:
  codeanalyzer --input webapp --analysis-level 4 --data-flow # Trace data flow from user inputs to sensitive operations
- Refactoring Impact Assessment:
  codeanalyzer --input legacy_code --call-graph --inter-procedural # Understand function dependencies before refactoring
- Performance Analysis:
  codeanalyzer --input application --control-flow --analysis-level 3 # Identify performance bottlenecks through CFG analysis
Benefits
- Comprehensive Analysis: Complete the missing call graph functionality and add powerful control/data flow analysis
- Python-Specific: Handles Python's dynamic nature better than generic tools
- Research-Backed: Scalpel is described in a published paper (arXiv:2202.11840)
- Compatible: Both projects use Python 3.12+ and have compatible licenses
- Modular: Can integrate specific components without full framework overhead
This focused integration would fill in the planned call graph analysis and add control and data flow analysis, making codeanalyzer-python a comprehensive tool for program flow analysis without pulling in the full Scalpel framework.