Tree Sitter Intelligence Design
The Tree-Sitter Intelligence feature provides comprehensive code analysis capabilities for Coda, enabling users to understand code structure, dependencies, and relationships across their codebase. This document outlines the architecture, design decisions, and implementation details.
coda/intelligence/
├── __init__.py # Public API exports
├── tree_sitter_analyzer.py # Main analyzer interface
├── tree_sitter_query_analyzer.py # Query-based implementation
├── repo_map.py # Repository structure mapping
├── dependency_graph.py # Dependency analysis
├── cli.py # CLI command interface
└── queries/ # Language-specific query files
    ├── python-tags.scm
    ├── javascript-tags.scm
    ├── kotlin-tags.scm
    └── ... (30+ languages)
- Language Agnostic: Uses tree-sitter grammars to support 30+ languages uniformly
- Query-Based Extraction: SCM query files define what to extract for each language
- Fallback Support: Regex-based fallback for when tree-sitter isn't available
- Incremental Analysis: Can analyze single files or entire directories
- Extensible: Easy to add new languages by adding query files
The system extracts various code elements:
- Definitions: Classes, functions, methods, variables, constants
- References: Function calls, class instantiations, property access
- Imports: Import statements and dependencies
- Documentation: Docstrings and comments
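
To make the categories above concrete, here is a minimal sketch of how extracted elements might be represented and grouped. The `CodeElement` class, its field names, and `group_by_kind` are illustrative assumptions, not the actual data model used by the analyzer.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeElement:
    """One extracted item; the field names here are illustrative assumptions."""
    kind: str  # "definition", "reference", "import", or "documentation"
    name: str
    line: int


def group_by_kind(elements):
    """Bucket extracted elements by kind, keeping source order within each bucket."""
    grouped = defaultdict(list)
    for element in elements:
        grouped[element.kind].append(element)
    return dict(grouped)


elements = [
    CodeElement("import", "os", 1),
    CodeElement("definition", "Analyzer", 4),
    CodeElement("reference", "os.path.join", 9),
    CodeElement("definition", "analyze", 12),
]
grouped = group_by_kind(elements)
```

A frozen dataclass keeps each element hashable and immutable, which is convenient if elements are later deduplicated or used as dictionary keys.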
RepoMap provides repository-wide analysis:
- File discovery and categorization
- Language statistics
- Git integration for repository metadata
- Aggregate metrics (size, lines, file counts)
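
File discovery and language statistics of the kind listed above could be sketched as follows. The `summarize_repo` function and the `EXTENSION_TO_LANGUAGE` table are hypothetical stand-ins. The real RepoMap interface is not shown in this document, and Git metadata is left out of the sketch.

```python
import os
from collections import Counter
from pathlib import Path

# Hypothetical extension-to-language table; the real mapping is not shown here.
EXTENSION_TO_LANGUAGE = {".py": "python", ".js": "javascript", ".kt": "kotlin"}


def summarize_repo(root):
    """Walk root and return per-language file counts plus total size in bytes."""
    files_per_language = Counter()
    total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories such as .git in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for filename in filenames:
            path = Path(dirpath) / filename
            language = EXTENSION_TO_LANGUAGE.get(path.suffix.lower())
            if language is None:
                continue  # unrecognized file types are not counted
            files_per_language[language] += 1
            total_bytes += path.stat().st_size
    return {"files": dict(files_per_language), "total_bytes": total_bytes}
```

Calling `summarize_repo(".")` returns a dictionary with a `files` count per language and a `total_bytes` figure covering only the recognized files.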
DependencyGraph builds and analyzes dependency relationships:
- Import resolution
- Circular dependency detection
- Dependency depth calculation
- Ranking of the most depended-upon files
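
As a rough illustration of two of these analyses, the sketch below detects a circular dependency with a depth-first search and ranks files by how many others import them. It assumes the graph is a plain dictionary mapping each file to the files it imports. The function names are hypothetical and the actual DependencyGraph implementation may work differently.

```python
from collections import Counter


def find_cycle(graph):
    """Return one import cycle as a list of files, or None if the graph is acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {}
    stack = []

    def visit(node):
        color[node] = GRAY
        stack.append(node)
        for dep in graph.get(node, ()):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path, so the path closes a cycle.
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


def most_depended_upon(graph, top=3):
    """Rank files by how many distinct files import them."""
    counts = Counter(dep for deps in graph.values() for dep in dict.fromkeys(deps))
    return counts.most_common(top)


graph = {
    "app.py": ["models.py", "utils.py"],
    "models.py": ["utils.py"],
    "utils.py": ["models.py"],  # deliberately introduces a cycle
}
cycle = find_cycle(graph)
```

The recursive search is fine for a sketch, but a very deep import chain could hit Python's recursion limit, so an iterative traversal may be preferable on large repositories.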
We implement two types of tools:
Native Tools (@tool decorator):
- Simple, synchronous functions
- Instant execution
- String-based output
- Perfect for interactive use
MCP Tools (class-based):
- Comprehensive analysis
- Rich metadata
- Async execution
- Structured results
This hybrid approach provides both speed and depth as needed.
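
The contrast between the two styles might look roughly like this. The `tool` decorator below is a stand-in written for this sketch, not Coda's actual decorator. Likewise, `TOOL_REGISTRY`, `CountDefinitionsTool`, and its `execute` method are hypothetical names.

```python
import asyncio

TOOL_REGISTRY = {}  # stand-in registry for this sketch, not Coda's actual one


def tool(func):
    """Hypothetical stand-in for the @tool decorator: registers a function by name."""
    TOOL_REGISTRY[func.__name__] = func
    return func


def _is_definition(line):
    return line.lstrip().startswith(("def ", "class "))


@tool
def count_definitions(source: str) -> str:
    """Native-style tool: synchronous, returns a short string."""
    count = sum(1 for line in source.splitlines() if _is_definition(line))
    return f"{count} definitions"


class CountDefinitionsTool:
    """MCP-style tool: async, returns structured data with extra metadata."""

    name = "count_definitions"

    async def execute(self, source: str) -> dict:
        lines = source.splitlines()
        names = [
            line.split()[1].split("(")[0].rstrip(":")
            for line in lines
            if _is_definition(line)
        ]
        return {"count": len(names), "names": names, "lines_scanned": len(lines)}


SOURCE = "class A:\n    def run(self):\n        pass\n"
text_result = TOOL_REGISTRY["count_definitions"](SOURCE)
structured_result = asyncio.run(CountDefinitionsTool().execute(SOURCE))
```

The native-style function is cheap to call from an interactive prompt, while the class-based version can carry metadata that downstream automation may want.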
    # Query-based extraction
    query = language.query(query_scm)
    captures = query.captures(tree.root_node)

    # Process captures: each capture pairs a syntax node with its tag name
    for node, tag in captures:
        if tag.startswith("definition."):
            ...  # Extract definition
        elif tag == "import":
            ...  # Extract import

Example Python query:

    ;; Class definitions
    (class_definition
      name: (identifier) @name.definition.class) @definition.class

    ;; Function definitions
    (function_definition
      name: (identifier) @name.definition.function) @definition.function

- Graceful Degradation: Falls back to regex when tree-sitter fails
- Language Detection: Multiple methods (extension, content-based)
- Parser Caching: Reuses parsers for performance
- Query Validation: Validates SCM queries on load
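
Graceful degradation could be structured along these lines. This is a sketch under assumptions: `extract_definitions`, `regex_definitions`, and the single regular expression are illustrative, and a real fallback would need per-language patterns.

```python
import re

# Deliberately simple pattern; a regex fallback recovers far less than a real parse.
_PY_DEF = re.compile(r"^\s*(def|class)\s+([A-Za-z_]\w*)", re.MULTILINE)


def regex_definitions(source):
    """Fallback extraction: (kind, name) pairs found with a regular expression."""
    return [(m.group(1), m.group(2)) for m in _PY_DEF.finditer(source)]


def extract_definitions(source, primary=None):
    """Try the primary (tree-sitter style) extractor first, then fall back to regex."""
    if primary is not None:
        try:
            return primary(source), "tree-sitter"
        except Exception:
            # Any parser failure (missing grammar, bad query) triggers the fallback.
            pass
    return regex_definitions(source), "regex"


def broken_parser(source):
    """Simulates a tree-sitter failure for demonstration purposes."""
    raise RuntimeError("grammar not installed")


result, method = extract_definitions(
    "class A:\n    def run(self):\n        pass\n", broken_parser
)
```

Returning the method alongside the result lets callers report when they are working from the less precise fallback.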
- Parser Caching: Each language parser initialized once
- Query Caching: Compiled queries cached per language
- Lazy Loading: Only loads necessary language support
- Progress Indicators: For long-running operations
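
One simple way to get per-language caching with lazy loading is `functools.lru_cache`, sketched below. The `get_parser` and `get_query` helpers return placeholders rather than real tree-sitter objects, and they are assumptions made for illustration. The document does not say how the caches are actually implemented.

```python
from functools import lru_cache

load_count = {"parser": 0, "query": 0}  # counters only to make the caching visible


@lru_cache(maxsize=None)
def get_parser(language):
    """Build a parser once per language; the returned object is a placeholder."""
    load_count["parser"] += 1
    return {"language": language}


@lru_cache(maxsize=None)
def get_query(language):
    """Compile the query once per language; nothing is loaded until first use."""
    load_count["query"] += 1
    return f"compiled:{language}-tags.scm"


for _ in range(3):
    get_parser("python")
    get_query("python")
get_parser("kotlin")
```

After this runs, the Python parser and query have each been built once despite three requests, and no Kotlin query has been compiled because none was requested.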
- Create queries/[language]-tags.scm file
- Add extension mappings in _get_extension_map()
- Write test coverage
- Document supported constructs
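
The relationship between the extension map and the query files can be sketched as below. The contents and return type of `_get_extension_map()` are assumed for illustration, and `query_file_for` is a hypothetical helper. Only the function name and the `queries/[language]-tags.scm` naming pattern come from this document.

```python
from pathlib import Path

QUERIES_DIR = Path("queries")


def _get_extension_map():
    """Illustrative contents only; the real mapping may differ."""
    return {".py": "python", ".js": "javascript", ".kt": "kotlin"}


def query_file_for(source_path, extension_map=None):
    """Return the tags query path for a source file, or None if the language is unmapped."""
    extension_map = extension_map or _get_extension_map()
    language = extension_map.get(Path(source_path).suffix.lower())
    if language is None:
        return None
    return QUERIES_DIR / f"{language}-tags.scm"


# Adding a language: one new mapping entry plus a matching query file on disk.
extended = {**_get_extension_map(), ".rb": "ruby"}
```

With the extended map, a `.rb` file resolves to `queries/ruby-tags.scm`, which is the file the first step in the list would create.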
The modular design allows:
- Custom query patterns
- Additional extraction rules
- Language-specific handling
- Plugin-based extensions
- Unit Tests: Each component tested independently
- Integration Tests: Full pipeline testing
- Language Tests: Per-language test coverage
- Query Validation: Automated query syntax checking
- Incremental Parsing: Only reparse changed portions
- Cross-File Resolution: Resolve symbols across files
- Type Information: Extract and track type annotations
- Semantic Search: Search by meaning not just text
- IDE Integration: Language server protocol support
- File Access: Respects file permissions
- Path Validation: Prevents directory traversal
- Resource Limits: Caps on file sizes and counts
- Safe Parsing: Handles malformed code gracefully
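
Path validation and a size cap could be sketched as follows. The `resolve_inside` and `within_size_limit` helpers and the `MAX_FILE_BYTES` value are assumptions made for illustration. The actual limits and checks are not specified in this document.

```python
from pathlib import Path

MAX_FILE_BYTES = 1_000_000  # hypothetical cap; the real limit is not documented here


def resolve_inside(root, candidate):
    """Resolve candidate against root and reject paths that escape it."""
    root_path = Path(root).resolve()
    target = (root_path / candidate).resolve()
    # After resolving symlinks and "..", the target must be root or live beneath it.
    if target != root_path and root_path not in target.parents:
        raise ValueError(f"path escapes root: {candidate}")
    return target


def within_size_limit(path, limit=MAX_FILE_BYTES):
    """Return True when the file is small enough to analyze."""
    return Path(path).stat().st_size <= limit
```

Resolving before comparing matters: a string-prefix check on the raw path would miss `..` segments and symlinks that point outside the repository.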
The Tree-Sitter Intelligence feature provides a robust foundation for code analysis in Coda. The hybrid tool approach, comprehensive language support, and extensible architecture make it suitable for both interactive use and automated analysis workflows.