feat: Add AST parsing with tree-sitter for code symbol extraction #14

Merged
gulivan merged 12 commits into main from feat/ast-parsing-with-tree-sitter
Oct 23, 2025

@gulivan gulivan commented Oct 21, 2025

New Feature: AST Parsing with Tree-Sitter

Adds a new --output ast mode that extracts and displays high-level code entities (functions, classes, interfaces, etc.) using tree-sitter AST parsing.

📊 What's New

New CLI Option

# Analyze code symbols
contextcalc . --output ast

# Works with all existing filters
contextcalc src --output ast --mode code --depth 2

Example Output

src/core/scanner.ts (1750 tokens, 258 lines, 7.6KB)
├─ ← from "node:path" { join, relative } line 1
├─ C DirectoryScanner lines 11-258
│  ├─ v cache line 12
│  ├─ v tokenizer line 13
│  ├─ ƒ constructor(...) lines 19-32
│  ├─ ƒ async initialize(useGitignore: boolean, ...): Promise<void> lines 34-45
│  ├─ ƒ async scan(): Promise<ScanResult> lines 47-69
│  └─ ƒ static calculatePercentages(...): Node[] lines 237-257
└─ c CACHE_VERSION line 7

Found 326 symbols across 16 files
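
The tree layout in the sample output can be reproduced with a small recursive renderer. The following is a minimal sketch, not the PR's astFormatter: `SketchSymbol` and `renderSymbols` are hypothetical names, and the symbol shape is simplified to an icon, a label, and a location string.

```typescript
// Hypothetical, simplified symbol shape; the PR's real types live in src/types/index.ts.
interface SketchSymbol {
  icon: string;      // e.g. "C", "ƒ", "v"
  label: string;     // name or signature
  location: string;  // e.g. "lines 11-258"
  members?: SketchSymbol[];
}

// Render symbols as tree lines: "├─" for inner entries, "└─" for the last one.
function renderSymbols(symbols: SketchSymbol[], indent = ""): string[] {
  const lines: string[] = [];
  symbols.forEach((symbol, index) => {
    const isLast = index === symbols.length - 1;
    const connector = isLast ? "└─" : "├─";
    lines.push(`${indent}${connector} ${symbol.icon} ${symbol.label} ${symbol.location}`);
    // Children keep the vertical bar only while more siblings follow the parent.
    const childIndent = indent + (isLast ? "   " : "│  ");
    lines.push(...renderSymbols(symbol.members ?? [], childIndent));
  });
  return lines;
}

const sample: SketchSymbol[] = [
  {
    icon: "C",
    label: "DirectoryScanner",
    location: "lines 11-258",
    members: [
      { icon: "v", label: "cache", location: "line 12" },
      { icon: "ƒ", label: "async scan(): Promise<ScanResult>", location: "lines 47-69" },
    ],
  },
  { icon: "c", label: "CACHE_VERSION", location: "line 7" },
];

console.log(renderSymbols(sample).join("\n"));
```

Passing the child indent down the recursion lets the renderer handle any nesting depth without tracking state outside the call.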

🚀 Features

Extracted Symbol Types

  • Functions/Methods (ƒ) - with parameters, return types, async/generator flags
  • Classes (C) - with extends, implements, abstract modifiers
  • Interfaces (I) - with extends and members
  • Types (T) - type aliases and definitions
  • Enums (E) - with member values
  • Variables/Constants (v/c) - with type annotations
  • Imports (←) - showing source and imported items
  • Exports (→) - showing exported items
  • Namespaces (N) - for languages that support them

Language Support

  • TypeScript/TSX - Fully implemented
  • JavaScript/JSX - Fully implemented
  • Python - Fully implemented
  • 🔧 Go, Rust, Java, C++, C#, Ruby, PHP - Stubs ready for expansion

Technical Highlights

  • Lazy loading - Grammars only loaded when needed
  • Performance - Maintains parallel file processing
  • Caching - Integrated with existing cache system
  • Type-safe - Comprehensive TypeScript types
  • Extensible - Easy to add new languages
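
As a rough illustration of the lazy-loading point, a grammar cache can store the loading promise per language so each grammar is fetched at most once. This is a sketch under assumptions: `LazyGrammarCache` and `GrammarLoader` are hypothetical names, and the loader below returns a stand-in object rather than a real tree-sitter grammar.

```typescript
// Hypothetical names throughout; this is not the PR's ASTParser.
type GrammarLoader<G> = () => Promise<G>;

class LazyGrammarCache<G> {
  private readonly loaders: Record<string, GrammarLoader<G>>;
  private readonly loaded = new Map<string, Promise<G>>();

  constructor(loaders: Record<string, GrammarLoader<G>>) {
    this.loaders = loaders;
  }

  // Load a grammar on first request and reuse the same promise afterwards.
  load(language: string): Promise<G> {
    const cached = this.loaded.get(language);
    if (cached) return cached;

    const loader = this.loaders[language];
    if (!loader) {
      return Promise.reject(new Error(`No grammar registered for ${language}`));
    }
    const pending = loader();
    this.loaded.set(language, pending);
    return pending;
  }
}

// Usage: count how often the stand-in loader actually runs.
let pythonLoads = 0;
const grammarCache = new LazyGrammarCache<{ name: string }>({
  // A real loader might dynamically import the tree-sitter-python package instead.
  python: async () => {
    pythonLoads += 1;
    return { name: "python" };
  },
});

async function demo(): Promise<number> {
  await grammarCache.load("python");
  await grammarCache.load("python");
  return pythonLoads; // the loader ran once despite two requests
}

demo().then((count) => console.log(`python grammar loaded ${count} time(s)`));
```

Caching the promise rather than the resolved value means concurrent requests for the same language share one load, which matters when files are processed in parallel.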

📦 Dependencies

Added tree-sitter and language grammars to dependencies. No user action required - these are automatically installed with npm install and include prebuilt binaries for common platforms.

🧪 Testing

# Type checking
bun run typecheck  # ✅ Passes

# Test on real codebase
bun src/cli.ts src/core --output ast  # ✅ Works perfectly

📝 Implementation Details

New Files

  • src/core/astParser.ts - Main AST parser
  • src/core/languages/ - Language configurations
    • typescript.ts - Full TS/TSX symbol extraction
    • javascript.ts - Full JS/JSX symbol extraction
    • python.ts - Full Python symbol extraction
    • go.ts, rust.ts, java.ts, etc. - Stubs for expansion
  • src/formatters/astFormatter.ts - AST output formatter

Modified Files

  • package.json - Added tree-sitter dependencies
  • src/types/index.ts - Added AST symbol types
  • src/cli.ts - Added --output ast support
  • src/core/scanner.ts - Integrated AST parsing

🔄 Backward Compatibility

✅ Fully backward compatible - all existing functionality unchanged. AST parsing is opt-in via --output ast.

📚 Use Cases

  • Code exploration - Quickly understand codebase structure
  • Documentation - Generate symbol lists
  • Analysis - Identify patterns and architecture
  • Code review - Get high-level overview of changes
  • LLM context - Provide structured code summaries

🎯 Next Steps (Future)

  • Implement symbol extraction for remaining languages
  • Add token counts per symbol
  • Support custom symbol filtering
  • Generate symbol documentation

Installation note: All dependencies are declared in package.json. Users don't need to install tree-sitter separately - it's installed automatically by npm/bun.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added AST (Abstract Syntax Tree) output format for detailed code symbol extraction and visualization
    • Introduced support for Python, TypeScript, and JavaScript code analysis
    • Symbol output now includes functions, classes, interfaces, imports, enums, and variables with metadata such as parameters, return types, and source locations
  • Documentation

    • Updated README with AST output format usage and examples

Implements a new --output ast mode that extracts and displays high-level
code entities (functions, classes, interfaces, etc.) using tree-sitter.

Features:
- New output format: --output ast displays parsed code symbols
- Multi-language support: TypeScript, JavaScript, Python (fully implemented)
- Extensible architecture: Ready for 7+ additional languages (Go, Rust, Java, C++, C#, Ruby, PHP)
- Symbol extraction: Functions, classes, interfaces, types, enums, variables, imports, exports
- Hierarchical display: Shows nested symbols (classes → methods → properties)
- Location tracking: Displays line numbers and ranges for each symbol
- Signature formatting: Full type signatures with parameters and return types

Technical implementation:
- AST parser with lazy-loaded grammars for performance
- Language configuration system for easy extensibility
- Integrated with existing cache and scanner infrastructure
- Type-safe implementation with comprehensive TypeScript types
- Zero additional user installation required (dependencies are installed with the package)

Symbol types extracted:
- Functions/Methods (ƒ) - with parameters, return types, async/generator flags
- Classes (C) - with extends, implements, abstract modifiers
- Interfaces (I) - with extends and members
- Types (T) - type aliases and definitions
- Enums (E) - with member values
- Variables/Constants (v/c) - with type annotations
- Imports (←) - showing source and imported items
- Exports (→) - showing exported items
- Namespaces (N) - for languages that support them

Dependencies added:
- tree-sitter: Core parsing library
- tree-sitter-typescript: TypeScript/TSX grammar
- tree-sitter-javascript: JavaScript/JSX grammar
- tree-sitter-python: Python grammar
- tree-sitter-go: Go grammar
- tree-sitter-rust: Rust grammar
- tree-sitter-java: Java grammar
- tree-sitter-cpp: C++ grammar
- tree-sitter-c-sharp: C# grammar
- tree-sitter-ruby: Ruby grammar
- tree-sitter-php: PHP grammar

coderabbitai bot commented Oct 21, 2025

Important

Review skipped

Review was skipped due to path filters

⛔ Files ignored due to path filters (1)
  • bun.lock is excluded by !**/*.lock

CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including **/dist/** will override the default block on the dist directory, by removing the pattern from both the lists.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR introduces AST parsing and symbol extraction capabilities using tree-sitter. It adds language-specific parsers for TypeScript, JavaScript, and Python; a new ASTParser core module; comprehensive symbol type definitions; integration with the existing scanning pipeline; and AST-formatted output rendering. Cache versioning is incremented to reflect the schema changes.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Dependencies & Build Configuration: package.json | Added tree-sitter family packages and configured externals to prevent bundling of tree-sitter modules. |
| Type System Expansion: src/types/index.ts | Introduced SymbolType enum, SourceLocation, Parameter, BaseSymbol, and twelve symbol-specific interfaces (FunctionSymbol, ClassSymbol, InterfaceSymbol, etc.). Extended OutputFormat with AST, added entities property to FileNode/CacheEntry, and added ASTOptions interface. |
| Core AST Infrastructure: src/core/astParser.ts, src/core/languages/index.ts, src/core/languages/typescript.ts, src/core/languages/javascript.ts, src/core/languages/python.ts | Added ASTParser class with file/text parsing, lazy language initialization, and grammar caching. Implemented language registry with LanguageConfig interface and three language implementations (TypeScript, JavaScript, Python) providing grammar loading and symbol extraction. |
| Pipeline Integration: src/core/scanner.ts, src/cli.ts | Extended DirectoryScanner with enableAST flag, AST parsing per-file, and entity attachment to FileNode. Updated CLI to support AST output format, route to ASTParser, and pass enableAST to scanner. |
| Output Formatting: src/formatters/astFormatter.ts | New formatAsAST function providing ASCII tree rendering of symbols with icons, signatures, optional locations, and token counts. |
| Utility & Maintenance: src/core/cache.ts | Bumped CACHE_VERSION from '1.0' to '1.2' to invalidate existing caches. |
| Documentation: README.md | Added AST Output section covering command syntax, supported languages, symbol icons, sample output, and features. |
| Test Coverage: test/astParser.test.ts, test/astFormatter.test.ts, test/languages/typescript.test.ts, test/languages/python.test.ts | Comprehensive test suites for ASTParser lifecycle and parsing accuracy, AST formatter output, and language-specific symbol extraction across TypeScript, Python, and JavaScript. |

Sequence Diagram

sequenceDiagram
    participant User as User/CLI
    participant CLI as cli.ts
    participant Scanner as DirectoryScanner
    participant ASTParser as ASTParser
    participant LangReg as Language Registry
    participant Formatter as astFormatter
    
    User->>CLI: Request AST output
    CLI->>CLI: Parse args, set enableAST=true
    CLI->>Scanner: Instantiate with enableAST
    Scanner->>ASTParser: Create instance
    Scanner->>Scanner: Start scanning
    
    loop For each file
        Scanner->>ASTParser: parseFile(filePath)
        ASTParser->>LangReg: getLanguageByExtension()
        LangReg-->>ASTParser: LanguageConfig
        ASTParser->>ASTParser: loadGrammar (cached)
        ASTParser->>ASTParser: parse with tree-sitter
        ASTParser->>ASTParser: extractSymbols()
        ASTParser-->>Scanner: ASTSymbol[]
        Scanner->>Scanner: Attach entities to FileNode
    end
    
    Scanner-->>CLI: ScanResult with entities
    CLI->>Formatter: formatAsAST(result, options)
    Formatter->>Formatter: Traverse files/symbols
    Formatter->>Formatter: Build ASCII tree
    Formatter-->>CLI: Formatted string
    CLI-->>User: Output AST representation
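
The per-file loop in the diagram hinges on an extension lookup followed by symbol extraction. The sketch below illustrates that flow under stated assumptions: `SketchLanguageConfig` drops the grammar and tree-sitter tree from the PR's LanguageConfig so it runs without tree-sitter, `registerLanguage` is a hypothetical helper, and the regex extractor is a toy stand-in for walking a syntax tree. Returning an empty list for unsupported extensions is a design choice of the sketch, not confirmed behavior.

```typescript
// Simplified stand-in: the PR's LanguageConfig receives a tree-sitter Parser.Tree,
// replaced here by raw source text so the sketch has no dependencies.
interface SketchLanguageConfig {
  name: string;
  extensions: string[];
  extractSymbols: (sourceCode: string) => string[];
}

const registry = new Map<string, SketchLanguageConfig>();

// Hypothetical helper: index a config under each of its extensions.
function registerLanguage(config: SketchLanguageConfig): void {
  for (const ext of config.extensions) registry.set(ext.toLowerCase(), config);
}

// Plays the role of getLanguageByExtension in the diagram.
function getLanguageByExtension(ext: string): SketchLanguageConfig | undefined {
  return registry.get(ext.toLowerCase());
}

// Per-file step: unsupported extensions yield no entities instead of failing the scan.
function parseText(sourceCode: string, ext: string): string[] {
  const language = getLanguageByExtension(ext);
  return language ? language.extractSymbols(sourceCode) : [];
}

registerLanguage({
  name: "Python",
  extensions: [".py"],
  // Toy extractor: collects names of top-level "def" statements.
  extractSymbols: (src) =>
    src
      .split("\n")
      .map((line) => /^def\s+(\w+)/.exec(line))
      .filter((m): m is RegExpExecArray => m !== null)
      .map((m) => m[1] ?? ""),
});

console.log(parseText("def scan():\n    pass\n", ".py")); // one symbol: "scan"
console.log(parseText("fn main() {}", ".rs"));            // no symbols: unregistered extension
```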

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Rationale: This PR introduces substantial new infrastructure (ASTParser class, language registry system, symbol type definitions) alongside deep integration into existing scanning and CLI pipelines. While individual language implementations follow consistent patterns, the diverse mix of core logic, type definitions, integration points, and grammar-handling mechanics requires careful reasoning across multiple domains. Comprehensive test coverage and incremental design patterns reduce friction, but the interconnected nature and novel dependencies (tree-sitter) demand thorough review.

Poem

🐰 Parsing symbols in the trees so fine,
With tree-sitter's grammar, dancing in a line,
TypeScript, Python, JavaScript too—
AST extraction brings code into view!
From scanners to formatters, the symbols shine bright,
A rabbit's delight in the code's true sight. 🌲✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title Check | ✅ Passed | The pull request title "feat: Add AST parsing with tree-sitter for code symbol extraction" accurately and directly summarizes the main change in the changeset. The PR's primary objective is to introduce AST parsing capabilities using tree-sitter to extract code symbols (functions, classes, interfaces, imports, etc.), which is precisely what the title conveys. The title is concise, specific, and uses clear language without noise or vague terms. A teammate scanning the git history would immediately understand that this commit adds a new feature for code symbol extraction via tree-sitter AST parsing. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 12

🧹 Nitpick comments (12)
src/core/scanner.ts (1)

162-170: Avoid triple I/O (hash + tokenize + AST).

Current flow likely reads file up to 3x (hashFile, tokenizer.countTokens, astParser.parseFile). Consider a small perf refactor:

  • Read once (e.g., via Bun.file(path).text()) and feed:
    • tokenizer.countTokensFromText(text) [new helper]
    • astParser.parseText(text, ext)
    • hashFromText(text) [optional helper]

Even switching only AST to parseText saves one read. Also consider always initializing entities: [] to simplify consumers.

Also applies to: 172-186
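
A rough sketch of the read-once idea follows. The helper names `hashFromText`, `countTokensFromText`, and `parseText` come from the suggestion above, but their bodies here are toy stand-ins (an FNV-1a hash, a whitespace split, and a regex), and `analyzeText` and `FileAnalysis` are hypothetical. The caller is assumed to have read the file text exactly once before calling in.

```typescript
// Stand-in hash: FNV-1a over UTF-16 code units, not the hash the cache really uses.
function hashFromText(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Stand-in tokenizer: whitespace-separated chunks instead of model tokens.
function countTokensFromText(text: string): number {
  return text.split(/\s+/).filter(Boolean).length;
}

// Stand-in for AST parsing: collects names after "function" in .ts files only.
function parseText(text: string, ext: string): string[] {
  if (ext !== ".ts") return [];
  const names: string[] = [];
  const pattern = /function\s+(\w+)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) names.push(match[1] ?? "");
  return names;
}

interface FileAnalysis {
  hash: string;
  tokens: number;
  entities: string[]; // always an array, so consumers need no undefined check
}

// All three consumers share the same in-memory text; no further file reads happen here.
function analyzeText(text: string, ext: string): FileAnalysis {
  return {
    hash: hashFromText(text),
    tokens: countTokensFromText(text),
    entities: parseText(text, ext),
  };
}

const analysis = analyzeText("export function scan() { return 1; }", ".ts");
console.log(analysis.tokens, analysis.entities);
```

Taking text rather than a path also makes each step easy to unit test without touching the filesystem.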

src/core/languages/rust.ts (1)

1-18: Fix lints: alias Symbol type and underscore unused args.

Prevents global Symbol shadowing and unused-arg errors while keeping the stub.

Apply:

-import type Parser from 'tree-sitter';
-import type { Symbol } from '../../types/index.js';
+import type Parser from 'tree-sitter';
+import type { Symbol as CodeSymbol } from '../../types/index.js';

 export const RustConfig: LanguageConfig = {
@@
-  extractSymbols: (tree: Parser.Tree, sourceCode: string): Symbol[] => {
+  extractSymbols: (_tree: Parser.Tree, _sourceCode: string): CodeSymbol[] => {
     // TODO: Implement Rust-specific symbol extraction
     return [];
   }
 };
src/core/languages/typescript.ts (4)

1-5: Resolve lints: alias Symbol type; drop unused SymbolType import.

Prevents global Symbol shadowing and removes an unused type import.

-import type Parser from 'tree-sitter';
-import type { Symbol, SymbolType, FunctionSymbol, ClassSymbol, InterfaceSymbol, TypeSymbol, EnumSymbol, VariableSymbol, ImportSymbol, ExportSymbol, SourceLocation, Parameter } from '../../types/index.js';
+import type Parser from 'tree-sitter';
+import type { Symbol as CodeSymbol, FunctionSymbol, ClassSymbol, InterfaceSymbol, TypeSymbol, EnumSymbol, VariableSymbol, ImportSymbol, ExportSymbol, SourceLocation, Parameter } from '../../types/index.js';
 import type { LanguageConfig } from './index.js';
 import { SymbolType as ST } from '../../types/index.js';

Also update return/locals to CodeSymbol:

-  extractSymbols: (tree: Parser.Tree, sourceCode: string): Symbol[] => {
-    const symbols: Symbol[] = [];
+  extractSymbols: (tree: Parser.Tree, sourceCode: string): CodeSymbol[] => {
+    const symbols: CodeSymbol[] = [];
@@
-      const members: Symbol[] = [];
+      const members: CodeSymbol[] = [];
@@
-      const members: Symbol[] = [];
+      const members: CodeSymbol[] = [];

67-68: Generator detection likely brittle.

Checking for a child with type '*' may miss generator markers; prefer grammar token check (e.g., c.type === 'asterisk') or drop generator until verified.

Would you like me to cross‑check the tree-sitter TypeScript grammar nodes for generator functions and adjust?
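
One possible shape for a sturdier check is sketched below. It assumes the tree-sitter JavaScript/TypeScript grammars expose `generator_function_declaration` and `generator_function` node types, and that a generator method carries its marker as an anonymous child token whose type is the literal `*`; both assumptions should be verified against the grammar version in use. `NodeLike` and `isGenerator` are hypothetical names, with `NodeLike` covering only the two fields the check reads.

```typescript
// Minimal structural view of a syntax node; only the fields used below.
interface NodeLike {
  type: string;
  children: NodeLike[];
}

// Assumed node type names; verify against the installed grammar.
const GENERATOR_NODE_TYPES = new Set([
  "generator_function_declaration",
  "generator_function",
]);

function isGenerator(node: NodeLike): boolean {
  // Generator functions are assumed to have their own node types.
  if (GENERATOR_NODE_TYPES.has(node.type)) return true;
  // For methods, look for a direct "*" token child (assumption, see lead-in).
  return node.type === "method_definition" && node.children.some((c) => c.type === "*");
}

// Hand-built nodes standing in for parser output.
const generatorMethod: NodeLike = {
  type: "method_definition",
  children: [
    { type: "*", children: [] },
    { type: "property_identifier", children: [] },
  ],
};
console.log(isGenerator(generatorMethod));
```

Restricting the `*` check to method nodes avoids matching unrelated asterisks, such as a multiplication operator somewhere in a function body.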


321-323: Robust const detection.

Using startsWith('const') on node text can be wrong with leading modifiers (e.g., export). Prefer token check.

-                type: node.type === 'lexical_declaration' && getNodeText(node).startsWith('const') ? ST.CONSTANT : ST.VARIABLE,
+                type: node.type === 'lexical_declaration' && node.children.some(c => c.type === 'const')
+                  ? ST.CONSTANT
+                  : ST.VARIABLE,

87-90: Refine class heritage parsing (extends vs implements).

Reading whole class_heritage may mix extends/implements. Consider extracting specific child nodes (extends_clause, implements_clause) for cleaner strings and arrays.

Also applies to: 114-123
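
A minimal sketch of clause-based extraction follows, assuming the TypeScript grammar places `extends_clause` and `implements_clause` nodes directly under `class_heritage` (verify against the grammar version in use). `NodeLike`, `Heritage`, and `parseHeritage` are hypothetical names, and the nodes in the usage example are hand-built rather than parser output.

```typescript
// Minimal structural view of a syntax node; only the fields used below.
interface NodeLike {
  type: string;
  text: string;
  children: NodeLike[];
  namedChildren: NodeLike[];
}

interface Heritage {
  extends?: string;
  implements: string[];
}

function parseHeritage(classHeritage: NodeLike): Heritage {
  const heritage: Heritage = { implements: [] };
  for (const clause of classHeritage.children) {
    if (clause.type === "extends_clause") {
      // First named child is taken as the base class; type arguments are ignored here.
      const base = clause.namedChildren[0];
      if (base) heritage.extends = base.text;
    } else if (clause.type === "implements_clause") {
      heritage.implements = clause.namedChildren.map((n) => n.text);
    }
  }
  return heritage;
}

// Hand-built nodes for: class X extends Base implements Readable, Closable
const leafNode = (type: string, text: string): NodeLike => ({
  type,
  text,
  children: [],
  namedChildren: [],
});
const baseId = leafNode("identifier", "Base");
const readableId = leafNode("type_identifier", "Readable");
const closableId = leafNode("type_identifier", "Closable");
const heritageNode: NodeLike = {
  type: "class_heritage",
  text: "extends Base implements Readable, Closable",
  children: [
    { type: "extends_clause", text: "extends Base", children: [baseId], namedChildren: [baseId] },
    {
      type: "implements_clause",
      text: "implements Readable, Closable",
      children: [readableId, closableId],
      namedChildren: [readableId, closableId],
    },
  ],
  namedChildren: [],
};

console.log(parseHeritage(heritageNode));
```

Reading named children keeps keywords and commas out of the result, so `extends` ends up as a single name and `implements` as an array.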

src/formatters/astFormatter.ts (3)

1-4: Clean imports: alias Symbol type and drop unused FolderNode.

Prevents global Symbol shadowing and removes an unused type.

-import chalk from 'chalk';
-import type { ScanResult, Node, FileNode, FolderNode, Symbol, TreeOptions, FunctionSymbol, ClassSymbol, InterfaceSymbol, EnumSymbol, ImportSymbol, ExportSymbol, NamespaceSymbol } from '../types/index.js';
+import chalk from 'chalk';
+import type { ScanResult, Node, FileNode, Symbol as CodeSymbol, TreeOptions, FunctionSymbol, ClassSymbol, InterfaceSymbol, EnumSymbol, ImportSymbol, ExportSymbol, NamespaceSymbol } from '../types/index.js';

Update function signatures/uses to CodeSymbol (examples):

-function formatSymbolSignature(symbol: Symbol): string {
+function formatSymbolSignature(symbol: CodeSymbol): string {
@@
-function formatSymbol(symbol: Symbol, indent: string, isLast: boolean, showLocation: boolean = true): string[] {
+function formatSymbol(symbol: CodeSymbol, indent: string, isLast: boolean, showLocation: boolean = true): string[] {
@@
-function getNestedSymbols(symbol: Symbol): Symbol[] {
+function getNestedSymbols(symbol: CodeSymbol): CodeSymbol[] {
@@
-function countNestedSymbols(symbol: Symbol): number {
+function countNestedSymbols(symbol: CodeSymbol): number {

47-109: Wrap case blocks to avoid lexical leakage in switch.

Adds braces to satisfy no-case-declarations and Biome’s noSwitchDeclarations.

-  function formatSymbolSignature(symbol: Symbol): string {
+  function formatSymbolSignature(symbol: CodeSymbol): string {
     switch (symbol.type) {
       case SymbolType.FUNCTION:
       case SymbolType.METHOD:
-        const funcSymbol = symbol as FunctionSymbol;
-        if (funcSymbol.signature) {
-          return funcSymbol.signature;
-        }
-        const params = funcSymbol.parameters.map(p => {
-          let param = p.name;
-          if (p.type) param += `: ${p.type}`;
-          if (p.optional) param += '?';
-          if (p.defaultValue) param += ` = ${p.defaultValue}`;
-          return param;
-        }).join(', ');
-        let sig = `${funcSymbol.name}(${params})`;
-        if (funcSymbol.returnType) sig += `: ${funcSymbol.returnType}`;
-        if (funcSymbol.async) sig = `async ${sig}`;
-        return sig;
+        {
+          const funcSymbol = symbol as FunctionSymbol;
+          if (funcSymbol.signature) {
+            return funcSymbol.signature;
+          }
+          const params = funcSymbol.parameters.map(p => {
+            let param = p.name;
+            if (p.type) param += `: ${p.type}`;
+            if (p.optional) param += '?';
+            if (p.defaultValue) param += ` = ${p.defaultValue}`;
+            return param;
+          }).join(', ');
+          let sig = `${funcSymbol.name}(${params})`;
+          if (funcSymbol.returnType) sig += `: ${funcSymbol.returnType}`;
+          if (funcSymbol.async) sig = `async ${sig}`;
+          return sig;
+        }
 
       case SymbolType.CLASS:
-        const classSymbol = symbol as ClassSymbol;
-        let classSig = classSymbol.name;
-        if (classSymbol.abstract) classSig = `abstract ${classSig}`;
-        if (classSymbol.extends) classSig += ` extends ${classSymbol.extends}`;
-        if (classSymbol.implements && classSymbol.implements.length > 0) {
-          classSig += ` implements ${classSymbol.implements.join(', ')}`;
-        }
-        return classSig;
+        {
+          const classSymbol = symbol as ClassSymbol;
+          let classSig = classSymbol.name;
+          if (classSymbol.abstract) classSig = `abstract ${classSig}`;
+          if (classSymbol.extends) classSig += ` extends ${classSymbol.extends}`;
+          if (classSymbol.implements && classSymbol.implements.length > 0) {
+            classSig += ` implements ${classSymbol.implements.join(', ')}`;
+          }
+          return classSig;
+        }
 
       case SymbolType.INTERFACE:
-        const ifaceSymbol = symbol as InterfaceSymbol;
-        let ifaceSig = ifaceSymbol.name;
-        if (ifaceSymbol.extends && ifaceSymbol.extends.length > 0) {
-          ifaceSig += ` extends ${ifaceSymbol.extends.join(', ')}`;
-        }
-        return ifaceSig;
+        {
+          const ifaceSymbol = symbol as InterfaceSymbol;
+          let ifaceSig = ifaceSymbol.name;
+          if (ifaceSymbol.extends && ifaceSymbol.extends.length > 0) {
+            ifaceSig += ` extends ${ifaceSymbol.extends.join(', ')}`;
+          }
+          return ifaceSig;
+        }
 
       case SymbolType.ENUM:
-        const enumSymbol = symbol as EnumSymbol;
-        return `${enumSymbol.name} { ${enumSymbol.members.length} members }`;
+        {
+          const enumSymbol = symbol as EnumSymbol;
+          return `${enumSymbol.name} { ${enumSymbol.members.length} members }`;
+        }
 
       case SymbolType.IMPORT:
-        const importSymbol = symbol as ImportSymbol;
-        let importSig = `from "${importSymbol.from}"`;
-        if (importSymbol.default) importSig = `${importSymbol.default} ${importSig}`;
-        if (importSymbol.imports.length > 0) importSig += ` { ${importSymbol.imports.join(', ')} }`;
-        if (importSymbol.namespace) importSig += ` as ${importSymbol.namespace}`;
-        return importSig;
+        {
+          const importSymbol = symbol as ImportSymbol;
+          let importSig = `from "${importSymbol.from}"`;
+          if (importSymbol.default) importSig = `${importSymbol.default} ${importSig}`;
+          if (importSymbol.imports.length > 0) importSig += ` { ${importSymbol.imports.join(', ')} }`;
+          if (importSymbol.namespace) importSig += ` as ${importSymbol.namespace}`;
+          return importSig;
+        }
 
       case SymbolType.EXPORT:
-        const exportSymbol = symbol as ExportSymbol;
-        if (exportSymbol.default) return `default ${exportSymbol.default}`;
-        return `{ ${exportSymbol.exports.join(', ')} }`;
+        {
+          const exportSymbol = symbol as ExportSymbol;
+          if (exportSymbol.default) return `default ${exportSymbol.default}`;
+          return `{ ${exportSymbol.exports.join(', ')} }`;
+        }
 
       case SymbolType.NAMESPACE:
-        const nsSymbol = symbol as NamespaceSymbol;
-        return `${nsSymbol.name} { ${nsSymbol.members.length} members }`;
+        {
+          const nsSymbol = symbol as NamespaceSymbol;
+          return `${nsSymbol.name} { ${nsSymbol.members.length} members }`;
+        }

6-9: Option not used.

ASTFormatterOptions.showTokensPerSymbol is unused. Either implement it or remove to avoid confusion.

src/core/languages/python.ts (1)

1-5: Alias Symbol type to avoid shadowing.

Prevents global Symbol shadowing; no behavior change.

-import type Parser from 'tree-sitter';
-import type { Symbol, FunctionSymbol, ClassSymbol, ImportSymbol, SourceLocation, Parameter } from '../../types/index.js';
+import type Parser from 'tree-sitter';
+import type { Symbol as CodeSymbol, FunctionSymbol, ClassSymbol, ImportSymbol, SourceLocation, Parameter } from '../../types/index.js';
@@
-  extractSymbols: (tree: Parser.Tree, sourceCode: string): Symbol[] => {
-    const symbols: Symbol[] = [];
+  extractSymbols: (tree: Parser.Tree, sourceCode: string): CodeSymbol[] => {
+    const symbols: CodeSymbol[] = [];

Also applies to: 15-17

src/core/languages/index.ts (1)

4-9: Consider typing loadGrammar return value more specifically.

The loadGrammar function returns Promise<any>, which loses type safety. If tree-sitter provides a Language type, use it:

+import type Parser from 'tree-sitter';

 export interface LanguageConfig {
   name: string;
   extensions: string[];
-  loadGrammar: () => Promise<any>;
+  loadGrammar: () => Promise<Parser.Language>;
   extractSymbols: (tree: Parser.Tree, sourceCode: string) => Symbol[];
 }

If Parser.Language is not exported or the grammars have incompatible types, document why any is necessary with a comment.

Based on static analysis (GitHub Check: Test and Build).

src/core/astParser.ts (1)

7-11: Type grammarCache more specifically.

The Map<string, any> loses type safety. If tree-sitter provides a Language type:

+import type Parser from 'tree-sitter';

 export class ASTParser {
   private parser: Parser | null = null;
   private initialized = false;
-  private grammarCache: Map<string, any> = new Map();
+  private grammarCache: Map<string, Parser.Language> = new Map();

Based on static analysis (GitHub Check: Test and Build).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between edc552d and 76c0d1b.

⛔ Files ignored due to path filters (1)
  • bun.lock is excluded by !**/*.lock
📒 Files selected for processing (17)
  • package.json (2 hunks)
  • src/cli.ts (4 hunks)
  • src/core/astParser.ts (1 hunks)
  • src/core/languages/cpp.ts (1 hunks)
  • src/core/languages/csharp.ts (1 hunks)
  • src/core/languages/go.ts (1 hunks)
  • src/core/languages/index.ts (1 hunks)
  • src/core/languages/java.ts (1 hunks)
  • src/core/languages/javascript.ts (1 hunks)
  • src/core/languages/php.ts (1 hunks)
  • src/core/languages/python.ts (1 hunks)
  • src/core/languages/ruby.ts (1 hunks)
  • src/core/languages/rust.ts (1 hunks)
  • src/core/languages/typescript.ts (1 hunks)
  • src/core/scanner.ts (4 hunks)
  • src/formatters/astFormatter.ts (1 hunks)
  • src/types/index.ts (3 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx,js,jsx}: Do not use dotenv; Bun loads .env automatically
Use Bun.serve() for HTTP/WebSocket/HTTPS routes; do not use Express
Use bun:sqlite for SQLite; do not use better-sqlite3
Use Bun.redis for Redis; do not use ioredis
Use Bun.sql for Postgres; do not use pg or postgres.js
Use built-in WebSocket; do not use ws
Prefer Bun.file over node:fs readFile/writeFile
Use Bun.$ for shelling out instead of execa

Files:

  • src/core/languages/csharp.ts
  • src/core/languages/rust.ts
  • src/core/languages/php.ts
  • src/core/languages/javascript.ts
  • src/core/languages/cpp.ts
  • src/core/languages/typescript.ts
  • src/core/languages/python.ts
  • src/core/languages/go.ts
  • src/core/languages/ruby.ts
  • src/core/scanner.ts
  • src/core/astParser.ts
  • src/formatters/astFormatter.ts
  • src/core/languages/java.ts
  • src/core/languages/index.ts
  • src/cli.ts
  • src/types/index.ts
src/core/scanner.ts

📄 CodeRabbit inference engine (CLAUDE.md)

Implement file scanning logic in src/core/scanner.ts

Files:

  • src/core/scanner.ts
src/formatters/**

📄 CodeRabbit inference engine (CLAUDE.md)

Place output formatters (tree, flat, json, csv) under src/formatters/

Files:

  • src/formatters/astFormatter.ts
src/cli.ts

📄 CodeRabbit inference engine (CLAUDE.md)

ContextCalc CLI entry point is src/cli.ts

Files:

  • src/cli.ts
src/types/index.ts

📄 CodeRabbit inference engine (CLAUDE.md)

Keep core type definitions in src/types/index.ts

Files:

  • src/types/index.ts
🧠 Learnings (1)
📚 Learning: 2025-09-12T14:25:55.847Z
Learnt from: CR
PR: agentinit/contextcalc#0
File: CLAUDE.md:0-0
Timestamp: 2025-09-12T14:25:55.847Z
Learning: Applies to src/core/scanner.ts : Implement file scanning logic in src/core/scanner.ts

Applied to files:

  • src/core/scanner.ts
  • src/cli.ts
🧬 Code graph analysis (15)
src/core/languages/csharp.ts (1)
src/core/languages/index.ts (1)
  • LanguageConfig (4-9)
src/core/languages/rust.ts (1)
src/core/languages/index.ts (1)
  • LanguageConfig (4-9)
src/core/languages/php.ts (1)
src/core/languages/index.ts (1)
  • LanguageConfig (4-9)
src/core/languages/javascript.ts (2)
src/core/languages/index.ts (1)
  • LanguageConfig (4-9)
src/core/languages/typescript.ts (1)
  • TypeScriptConfig (6-341)
src/core/languages/cpp.ts (1)
src/core/languages/index.ts (1)
  • LanguageConfig (4-9)
src/core/languages/typescript.ts (2)
src/core/languages/index.ts (1)
  • LanguageConfig (4-9)
src/types/index.ts (10)
  • SourceLocation (129-136)
  • Parameter (138-143)
  • FunctionSymbol (153-160)
  • ClassSymbol (162-168)
  • VariableSymbol (186-190)
  • InterfaceSymbol (170-174)
  • TypeSymbol (176-179)
  • EnumSymbol (181-184)
  • ImportSymbol (192-198)
  • ExportSymbol (200-204)
src/core/languages/python.ts (2)
src/core/languages/index.ts (1)
  • LanguageConfig (4-9)
src/types/index.ts (5)
  • SourceLocation (129-136)
  • Parameter (138-143)
  • FunctionSymbol (153-160)
  • ClassSymbol (162-168)
  • ImportSymbol (192-198)
src/core/languages/go.ts (1)
src/core/languages/index.ts (1)
  • LanguageConfig (4-9)
src/core/languages/ruby.ts (1)
src/core/languages/index.ts (1)
  • LanguageConfig (4-9)
src/core/scanner.ts (2)
src/core/astParser.ts (1)
  • ASTParser (7-147)
src/types/index.ts (1)
  • FileNode (1-11)
src/core/astParser.ts (2)
src/core/languages/index.ts (2)
  • initializeLanguages (32-56)
  • getLanguageByExtension (24-26)
src/types/index.ts (1)
  • ASTOptions (234-239)
src/formatters/astFormatter.ts (2)
src/types/index.ts (11)
  • TreeOptions (79-92)
  • ScanResult (104-110)
  • FunctionSymbol (153-160)
  • ClassSymbol (162-168)
  • InterfaceSymbol (170-174)
  • EnumSymbol (181-184)
  • ImportSymbol (192-198)
  • ExportSymbol (200-204)
  • NamespaceSymbol (206-209)
  • FileNode (1-11)
  • Node (23-23)
src/utils/formatUtils.ts (1)
  • formatFileSize (1-13)
src/core/languages/java.ts (1)
src/core/languages/index.ts (1)
  • LanguageConfig (4-9)
src/core/languages/index.ts (10)
src/core/languages/typescript.ts (1)
  • TypeScriptConfig (6-341)
src/core/languages/javascript.ts (1)
  • JavaScriptConfig (5-20)
src/core/languages/python.ts (1)
  • PythonConfig (6-163)
src/core/languages/go.ts (1)
  • GoConfig (5-18)
src/core/languages/rust.ts (1)
  • RustConfig (5-18)
src/core/languages/java.ts (1)
  • JavaConfig (5-18)
src/core/languages/cpp.ts (1)
  • CppConfig (5-18)
src/core/languages/csharp.ts (1)
  • CSharpConfig (5-18)
src/core/languages/ruby.ts (1)
  • RubyConfig (5-18)
src/core/languages/php.ts (1)
  • PhpConfig (5-18)
src/cli.ts (3)
src/core/scanner.ts (1)
  • DirectoryScanner (11-258)
src/types/index.ts (1)
  • TreeOptions (79-92)
src/formatters/astFormatter.ts (1)
  • formatAsAST (11-222)
🪛 Biome (2.1.2)
src/core/languages/csharp.ts

[error] 2-2: Do not shadow the global "Symbol" property.

Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.

(lint/suspicious/noShadowRestrictedNames)

src/core/languages/rust.ts

[error] 2-2: Do not shadow the global "Symbol" property.

Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.

(lint/suspicious/noShadowRestrictedNames)

src/core/languages/php.ts

[error] 2-2: Do not shadow the global "Symbol" property.

Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.

(lint/suspicious/noShadowRestrictedNames)

src/core/languages/javascript.ts

[error] 2-2: Do not shadow the global "Symbol" property.

Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.

(lint/suspicious/noShadowRestrictedNames)

src/core/languages/cpp.ts

[error] 2-2: Do not shadow the global "Symbol" property.

Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.

(lint/suspicious/noShadowRestrictedNames)

src/core/languages/typescript.ts

[error] 2-2: Do not shadow the global "Symbol" property.

Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.

(lint/suspicious/noShadowRestrictedNames)

src/core/languages/python.ts

[error] 2-2: Do not shadow the global "Symbol" property.

Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.

(lint/suspicious/noShadowRestrictedNames)

src/core/languages/go.ts

[error] 2-2: Do not shadow the global "Symbol" property.

Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.

(lint/suspicious/noShadowRestrictedNames)

src/core/languages/ruby.ts

[error] 2-2: Do not shadow the global "Symbol" property.

Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.

(lint/suspicious/noShadowRestrictedNames)

src/core/astParser.ts

[error] 4-4: Do not shadow the global "Symbol" property.

Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.

(lint/suspicious/noShadowRestrictedNames)

src/formatters/astFormatter.ts

[error] 2-2: Do not shadow the global "Symbol" property.

Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.

(lint/suspicious/noShadowRestrictedNames)


[error] 51-52: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)


[error] 55-62: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)


[error] 62-63: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)


[error] 68-69: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)


[error] 69-70: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)


[error] 78-79: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)


[error] 79-80: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)


[error] 86-87: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)


[error] 90-91: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)


[error] 91-92: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)


[error] 98-99: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)


[error] 103-104: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)

src/core/languages/java.ts

[error] 2-2: Do not shadow the global "Symbol" property.

Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.

(lint/suspicious/noShadowRestrictedNames)

src/core/languages/index.ts

[error] 2-2: Do not shadow the global "Symbol" property.

Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.

(lint/suspicious/noShadowRestrictedNames)

🪛 ESLint
src/core/languages/csharp.ts

[error] 14-14: 'tree' is defined but never used. Allowed unused args must match /^_/u.

(@typescript-eslint/no-unused-vars)


[error] 14-14: 'sourceCode' is defined but never used. Allowed unused args must match /^_/u.

(@typescript-eslint/no-unused-vars)

src/core/languages/rust.ts

[error] 14-14: 'tree' is defined but never used. Allowed unused args must match /^_/u.

(@typescript-eslint/no-unused-vars)


[error] 14-14: 'sourceCode' is defined but never used. Allowed unused args must match /^_/u.

(@typescript-eslint/no-unused-vars)

src/core/languages/php.ts

[error] 14-14: 'tree' is defined but never used. Allowed unused args must match /^_/u.

(@typescript-eslint/no-unused-vars)


[error] 14-14: 'sourceCode' is defined but never used. Allowed unused args must match /^_/u.

(@typescript-eslint/no-unused-vars)

src/core/languages/javascript.ts

[error] 17-17: A require() style import is forbidden.

(@typescript-eslint/no-require-imports)

src/core/languages/cpp.ts

[error] 14-14: 'tree' is defined but never used. Allowed unused args must match /^_/u.

(@typescript-eslint/no-unused-vars)


[error] 14-14: 'sourceCode' is defined but never used. Allowed unused args must match /^_/u.

(@typescript-eslint/no-unused-vars)

src/core/languages/typescript.ts

[error] 2-2: 'SymbolType' is defined but never used. Allowed unused vars must match /^_/u.

(@typescript-eslint/no-unused-vars)

src/core/languages/go.ts

[error] 14-14: 'tree' is defined but never used. Allowed unused args must match /^_/u.

(@typescript-eslint/no-unused-vars)


[error] 14-14: 'sourceCode' is defined but never used. Allowed unused args must match /^_/u.

(@typescript-eslint/no-unused-vars)

src/core/languages/ruby.ts

[error] 14-14: 'tree' is defined but never used. Allowed unused args must match /^_/u.

(@typescript-eslint/no-unused-vars)


[error] 14-14: 'sourceCode' is defined but never used. Allowed unused args must match /^_/u.

(@typescript-eslint/no-unused-vars)

src/core/astParser.ts

[error] 20-20: 'options' is assigned a value but never used. Allowed unused args must match /^_/u.

(@typescript-eslint/no-unused-vars)


[error] 85-85: 'options' is assigned a value but never used. Allowed unused args must match /^_/u.

(@typescript-eslint/no-unused-vars)

src/formatters/astFormatter.ts

[error] 2-2: 'FolderNode' is defined but never used. Allowed unused vars must match /^_/u.

(@typescript-eslint/no-unused-vars)


[error] 51-51: Unexpected lexical declaration in case block.

(no-case-declarations)


[error] 55-61: Unexpected lexical declaration in case block.

(no-case-declarations)


[error] 62-62: Unexpected lexical declaration in case block.

(no-case-declarations)


[error] 68-68: Unexpected lexical declaration in case block.

(no-case-declarations)


[error] 69-69: Unexpected lexical declaration in case block.

(no-case-declarations)


[error] 78-78: Unexpected lexical declaration in case block.

(no-case-declarations)


[error] 79-79: Unexpected lexical declaration in case block.

(no-case-declarations)


[error] 86-86: Unexpected lexical declaration in case block.

(no-case-declarations)


[error] 90-90: Unexpected lexical declaration in case block.

(no-case-declarations)


[error] 91-91: Unexpected lexical declaration in case block.

(no-case-declarations)


[error] 98-98: Unexpected lexical declaration in case block.

(no-case-declarations)


[error] 103-103: Unexpected lexical declaration in case block.

(no-case-declarations)

src/core/languages/java.ts

[error] 14-14: 'tree' is defined but never used. Allowed unused args must match /^_/u.

(@typescript-eslint/no-unused-vars)


[error] 14-14: 'sourceCode' is defined but never used. Allowed unused args must match /^_/u.

(@typescript-eslint/no-unused-vars)

🪛 GitHub Actions: CI
src/core/astParser.ts

[warning] 10-10: ESLint: Unexpected any. Specify a different type. @typescript-eslint/no-explicit-any


[error] 20-20: ESLint: 'options' is assigned a value but never used. Allowed unused args must match /^_/ (no-unused-vars)

🪛 GitHub Check: Test and Build (18)
src/core/languages/csharp.ts

[failure] 14-14:
'sourceCode' is defined but never used. Allowed unused args must match /^_/u


[failure] 14-14:
'tree' is defined but never used. Allowed unused args must match /^_/u

src/core/languages/cpp.ts

[failure] 14-14:
'sourceCode' is defined but never used. Allowed unused args must match /^_/u


[failure] 14-14:
'tree' is defined but never used. Allowed unused args must match /^_/u

src/core/languages/go.ts

[failure] 14-14:
'sourceCode' is defined but never used. Allowed unused args must match /^_/u


[failure] 14-14:
'tree' is defined but never used. Allowed unused args must match /^_/u

src/core/astParser.ts

[failure] 85-85:
'options' is assigned a value but never used. Allowed unused args must match /^_/u


[failure] 20-20:
'options' is assigned a value but never used. Allowed unused args must match /^_/u


[warning] 10-10:
Unexpected any. Specify a different type

src/core/languages/java.ts

[failure] 14-14:
'sourceCode' is defined but never used. Allowed unused args must match /^_/u


[failure] 14-14:
'tree' is defined but never used. Allowed unused args must match /^_/u

src/core/languages/index.ts

[warning] 7-7:
Unexpected any. Specify a different type

🪛 GitHub Check: Test and Build (22)
src/core/languages/csharp.ts

[failure] 14-14:
'sourceCode' is defined but never used. Allowed unused args must match /^_/u


[failure] 14-14:
'tree' is defined but never used. Allowed unused args must match /^_/u

src/core/languages/cpp.ts

[failure] 14-14:
'sourceCode' is defined but never used. Allowed unused args must match /^_/u


[failure] 14-14:
'tree' is defined but never used. Allowed unused args must match /^_/u

src/core/languages/go.ts

[failure] 14-14:
'sourceCode' is defined but never used. Allowed unused args must match /^_/u


[failure] 14-14:
'tree' is defined but never used. Allowed unused args must match /^_/u

src/core/astParser.ts

[failure] 85-85:
'options' is assigned a value but never used. Allowed unused args must match /^_/u


[failure] 20-20:
'options' is assigned a value but never used. Allowed unused args must match /^_/u


[warning] 10-10:
Unexpected any. Specify a different type

src/core/languages/java.ts

[failure] 14-14:
'sourceCode' is defined but never used. Allowed unused args must match /^_/u


[failure] 14-14:
'tree' is defined but never used. Allowed unused args must match /^_/u

src/core/languages/index.ts

[warning] 7-7:
Unexpected any. Specify a different type

🔇 Additional comments (7)
package.json (1)

21-22: LGTM! Tree-sitter integration properly configured.

The externals configuration correctly prevents bundling of tree-sitter native modules, and the dependency versions are consistently pinned across all language grammars.

Also applies to: 55-66

src/core/scanner.ts (1)

15-31: All constructor call sites verified and correct.

The single DirectoryScanner instantiation at src/cli.ts:279 properly includes the new enableAST parameter, correctly derived from outputFormat === OutputFormat.AST at line 278. The constructor signature matches the call site, and enableAST lifecycle gating is consistent throughout initialization and disposal.

src/cli.ts (1)

10-10: LGTM! Clean integration of AST output support.

The AST output format integration follows the existing patterns for other formatters (CSV, JSON, etc.) and correctly enables AST parsing in the scanner when needed. The CLI help text and completion message are appropriately updated.

Also applies to: 109-109, 277-279, 287-304, 371-371

src/core/languages/index.ts (1)

32-56: LGTM! Language initialization is well-structured.

The lazy-loading approach with dynamic imports is efficient and the registration logic is straightforward.
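A minimal sketch of the lazy-loading idea, assuming each language registers an async loader and the grammar is fetched only on first request. `GrammarCache` and `GrammarLoader` are hypothetical names for illustration, not the project's actual registry:

```typescript
// Hypothetical loader type: in the real code this would wrap a dynamic import().
type GrammarLoader = () => Promise<unknown>;

class GrammarCache {
  private readonly loaders: Record<string, GrammarLoader>;
  private readonly pending = new Map<string, Promise<unknown>>();

  constructor(loaders: Record<string, GrammarLoader>) {
    this.loaders = loaders;
  }

  // Starts loading on the first call and hands back the same promise afterwards,
  // so a grammar that is never requested is never imported.
  load(language: string): Promise<unknown> {
    const cached = this.pending.get(language);
    if (cached) return cached;

    const loader = this.loaders[language];
    if (!loader) {
      return Promise.reject(new Error(`No grammar registered for ${language}`));
    }
    const promise = loader();
    this.pending.set(language, promise);
    return promise;
  }
}
```

Caching the promise rather than the resolved value means concurrent requests for the same language share one load.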

src/core/astParser.ts (1)

20-83: LGTM! Robust parsing implementation with graceful error handling.

The parser implementation properly handles initialization, grammar loading with caching, and parsing errors by returning empty arrays. The error handling strategy is appropriate for this use case.

Also applies to: 85-139, 141-146

src/types/index.ts (2)

112-239: LGTM! Comprehensive and well-structured AST type system.

The AST symbol type definitions are thorough and properly use discriminated unions. The type hierarchy (BaseSymbol → specialized interfaces → union type) is clean and extensible.


43-49: LGTM! Clean additions to support AST functionality.

The OutputFormat enum extension and ASTOptions interface are straightforward and integrate well with the existing type system.

Also applies to: 234-239

Fixes all 31 linting errors and 2 warnings identified by ESLint:

Symbol Naming Conflict:
- Renamed 'Symbol' type to 'ASTSymbol' throughout codebase to avoid shadowing JavaScript's global Symbol
- Updated all imports, type annotations, and usages across 14 files

Unused Parameters:
- Prefixed unused parameters with underscore in stub language implementations
- Fixed _options parameter in astParser.ts parseFile and parseText methods

ESLint Violations:
- Replaced require() with ESM import in javascript.ts
- Added braces to all case blocks in astFormatter.ts formatSymbolSignature
- Changed 'any' types to 'unknown' in grammarCache and loadGrammar

Removed Unused Imports:
- Removed FolderNode from astFormatter.ts imports

Test Results:
✅ All 83 tests passing
✅ TypeScript compilation successful
✅ ESLint: 0 errors, 0 warnings
✅ Functionality verified: AST parsing working correctly
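
As a rough illustration of the fix patterns listed in this commit message, the sketch below shows a symbol type named ASTSymbol rather than Symbol, underscore-prefixed unused parameters, and braced case blocks. The two-variant union, `stubExtract`, and `describeSymbol` are hypothetical stand-ins, not the project's definitions:

```typescript
// Hypothetical, simplified stand-in for the project's symbol union.
// Naming it ASTSymbol (not Symbol) avoids shadowing the global Symbol.
type ASTSymbol =
  | { type: "function"; name: string; async: boolean }
  | { type: "class"; name: string; extends?: string };

// Stub extractor: unused parameters carry a leading underscore so that
// @typescript-eslint/no-unused-vars (with the /^_/ allowance) accepts them.
function stubExtract(_tree: unknown, _sourceCode: string): ASTSymbol[] {
  return [];
}

// Each case body is wrapped in braces, so its const declarations are scoped
// to that clause (no-case-declarations / noSwitchDeclarations).
function describeSymbol(symbol: ASTSymbol): string {
  switch (symbol.type) {
    case "function": {
      const prefix = symbol.async ? "async " : "";
      return `${prefix}${symbol.name}()`;
    }
    case "class": {
      const suffix = symbol.extends ? ` extends ${symbol.extends}` : "";
      return `${symbol.name}${suffix}`;
    }
  }
}
```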

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
src/core/languages/go.ts (1)

14-17: Unused params prefix fix — LGTM.

The underscore-prefixed parameters resolve the lint violations noted earlier.

🧹 Nitpick comments (5)
src/types/index.ts (1)

10-10: Solid AST type surface and OutputFormat extension.

Types are cohesive and align with the formatter and language extractors; FileNode.entities and OutputFormat.AST look correct.

Optional enhancement: in ImportSymbol, model aliasing (e.g., { original: string; alias?: string }[]) to preserve “as” names for named imports.
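
One possible reading of the aliasing suggestion, sketched under the assumption that the specifier text between the braces is available as a string. `ImportedName` and `parseNamedImports` are hypothetical names, and type-only specifiers are not handled:

```typescript
// Hypothetical shape for one named import; `alias` is set only for "x as y".
interface ImportedName {
  original: string;
  alias?: string;
}

// Parses the text between the braces of a named import,
// e.g. "join, relative as rel".
function parseNamedImports(specifiers: string): ImportedName[] {
  return specifiers
    .split(",")
    .map(part => part.trim())
    .filter(part => part.length > 0)
    .map(part => {
      const [original, alias] = part.split(/\s+as\s+/);
      return alias ? { original, alias } : { original };
    });
}
```

With this shape a formatter could print the alias next to the original name instead of dropping it.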

Also applies to: 47-49, 112-239

src/core/languages/go.ts (1)

9-12: Make grammar loading resilient to export shape differences.

tree-sitter grammars vary (default export vs named). Guard for both to avoid runtime failures.

Apply:

-  loadGrammar: async () => {
-    const GoLanguage = await import('tree-sitter-go');
-    return GoLanguage.default;
-  },
+  loadGrammar: async () => {
+    const mod = await import('tree-sitter-go');
+    // Support both ESM default and named exports
+    return (mod as any).default ?? (mod as any).Go ?? (mod as any).language ?? mod;
+  },

Please run a quick smoke parse on a tiny Go file to confirm Parser.setLanguage(await loadGrammar()) succeeds.
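
The same guard can be written without `any`, which would keep it in line with the no-explicit-any rule reported above. This is only a sketch: `resolveGrammar` is a hypothetical helper, and the candidate keys ("default", "language", plus any caller-supplied names) are assumptions about how a grammar package might export itself:

```typescript
// Picks the grammar object out of a dynamically imported module.
// Key names are assumptions, not a documented contract of any grammar package.
function resolveGrammar(mod: unknown, extraKeys: string[] = []): unknown {
  if (typeof mod !== "object" || mod === null) return mod;

  const record = mod as Record<string, unknown>;
  for (const key of ["default", ...extraKeys, "language"]) {
    const candidate = record[key];
    if (candidate !== undefined && candidate !== null) return candidate;
  }
  return mod; // fall back to the module namespace itself
}
```

A loader could then read `loadGrammar: async () => resolveGrammar(await import('tree-sitter-go'), ['Go'])`, with the smoke parse still needed to confirm the result is accepted by the parser.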

src/core/languages/typescript.ts (3)

85-123: Parse extends/implements into discrete names.

Currently extends/implements are captured as full clauses (e.g., “implements A, B”). Normalize to arrays of identifiers.

-      const extendsNode = node.children.find(c => c.type === 'class_heritage');
-      const implementsNode = node.children.find(c => c.type === 'implements_clause');
+      const extendsClause = node.children.find(c => c.type === 'extends_clause') 
+        ?? node.children.find(c => c.type === 'class_heritage' && c.text.startsWith('extends'));
+      const implementsClause = node.children.find(c => c.type === 'implements_clause')
+        ?? node.children.find(c => c.type === 'class_heritage' && c.text.includes('implements'));
@@
-        extends: extendsNode ? getNodeText(extendsNode) : undefined,
-        implements: implementsNode ? [getNodeText(implementsNode)] : undefined,
+        extends: extendsClause
+          ? getNodeText(extendsClause).replace(/^extends\s+/,'').split(',').map(s => s.trim())
+          : undefined,
+        implements: implementsClause
+          ? getNodeText(implementsClause).replace(/^implements\s+/,'').split(',').map(s => s.trim())
+          : undefined,
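
A plain `split(',')` would also cut through generic arguments such as `Map<K, V>`. The sketch below is one way to avoid that, assuming the clause text holds a single keyword followed by a comma-separated list; `splitHeritage` is a hypothetical helper and does not handle a combined "extends A implements B" string:

```typescript
// Splits a heritage clause such as "implements A, B<K, V>" into names,
// ignoring commas nested inside generic argument lists.
function splitHeritage(clauseText: string, keyword: "extends" | "implements"): string[] {
  const body = clauseText.trim().replace(new RegExp(`^${keyword}\\s+`), "");
  const names: string[] = [];
  let depth = 0;
  let current = "";

  for (const ch of body) {
    if (ch === "<") depth++;
    if (ch === ">") depth--;
    if (ch === "," && depth === 0) {
      names.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  if (current.trim()) names.push(current.trim());
  return names;
}
```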

60-79: Derive function signature via body field (avoid split('{') heuristics).

Splitting on “{” truncates the signature when a generic constraint or parameter type contains an object type literal. Slice up to the start of the body instead.

-      return {
+      const body = node.childForFieldName('body');
+      const header = body ? sourceCode.slice(node.startIndex, body.startIndex).trim() : getNodeText(node);
+      return {
         name: getNodeText(nameNode),
         type: ST.FUNCTION,
         location: getLocation(node),
         parameters,
         returnType: returnTypeNode ? getNodeText(returnTypeNode) : undefined,
         async: isAsync,
         generator: isGenerator,
-        signature: getNodeText(node).split('{')[0]?.trim() || getNodeText(node)
+        signature: header
       };
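
To make the difference concrete, here is a small sketch that works on plain offsets rather than real syntax nodes. `NodeSpan` and `headerText` are hypothetical, and the sketch assumes the node offsets index into the same string that is being sliced:

```typescript
// Minimal structural view of a syntax node: only the offsets are needed here.
interface NodeSpan {
  startIndex: number;
  endIndex: number;
}

// Returns the declaration text up to (not including) its body,
// or the whole node text when there is no body.
function headerText(sourceCode: string, node: NodeSpan, body: NodeSpan | null): string {
  const end = body ? body.startIndex : node.endIndex;
  return sourceCode.slice(node.startIndex, end).trim();
}

const example = "function f(opts: { a: number }): void { return; }";
const exampleNode: NodeSpan = { startIndex: 0, endIndex: example.length };
const exampleBody: NodeSpan = { startIndex: example.indexOf("{ return"), endIndex: example.length };

// Slicing at the body keeps the object type in the parameter list intact.
const bySlice = headerText(example, exampleNode, exampleBody);
// The split heuristic stops at the first brace, inside the parameter type.
const bySplit = example.split("{")[0].trim();
```

Here `bySlice` keeps the full header, while `bySplit` ends at `function f(opts:`.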

34-58: Handle rest parameters and destructuring in extractParameters.

Support rest_parameter and keep names readable for patterns.

-      if (formalParams) {
+      if (formalParams) {
         for (const child of formalParams.namedChildren) {
-          if (child.type === 'required_parameter' || child.type === 'optional_parameter') {
+          if (child.type === 'required_parameter' || child.type === 'optional_parameter' || child.type === 'rest_parameter') {
             const nameNode = child.childForFieldName('pattern') || child.children.find(c => c.type === 'identifier');
             const typeNode = child.childForFieldName('type');
             const defaultValue = child.children.find(c => c.type === 'initializer');
+            const isRest = child.type === 'rest_parameter';
             if (nameNode) {
               params.push({
-                name: getNodeText(nameNode),
+                name: (isRest ? '...' : '') + getNodeText(nameNode),
                 type: typeNode ? getNodeText(typeNode) : undefined,
                 optional: child.type === 'optional_parameter',
                 defaultValue: defaultValue ? getNodeText(defaultValue) : undefined
               });
             }
           }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 76c0d1b and 3f29a7e.

📒 Files selected for processing (14)
  • src/core/astParser.ts (1 hunks)
  • src/core/languages/cpp.ts (1 hunks)
  • src/core/languages/csharp.ts (1 hunks)
  • src/core/languages/go.ts (1 hunks)
  • src/core/languages/index.ts (1 hunks)
  • src/core/languages/java.ts (1 hunks)
  • src/core/languages/javascript.ts (1 hunks)
  • src/core/languages/php.ts (1 hunks)
  • src/core/languages/python.ts (1 hunks)
  • src/core/languages/ruby.ts (1 hunks)
  • src/core/languages/rust.ts (1 hunks)
  • src/core/languages/typescript.ts (1 hunks)
  • src/formatters/astFormatter.ts (1 hunks)
  • src/types/index.ts (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (10)
  • src/core/languages/ruby.ts
  • src/core/astParser.ts
  • src/core/languages/csharp.ts
  • src/core/languages/php.ts
  • src/core/languages/java.ts
  • src/core/languages/cpp.ts
  • src/core/languages/index.ts
  • src/core/languages/javascript.ts
  • src/core/languages/rust.ts
  • src/core/languages/python.ts
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx,js,jsx}: Do not use dotenv; Bun loads .env automatically
Use Bun.serve() for HTTP/WebSocket/HTTPS routes; do not use Express
Use bun:sqlite for SQLite; do not use better-sqlite3
Use Bun.redis for Redis; do not use ioredis
Use Bun.sql for Postgres; do not use pg or postgres.js
Use built-in WebSocket; do not use ws
Prefer Bun.file over node:fs readFile/writeFile
Use Bun.$ for shelling out instead of execa

Files:

  • src/core/languages/go.ts
  • src/core/languages/typescript.ts
  • src/types/index.ts
  • src/formatters/astFormatter.ts
src/types/index.ts

📄 CodeRabbit inference engine (CLAUDE.md)

Keep core type definitions in src/types/index.ts

Files:

  • src/types/index.ts
src/formatters/**

📄 CodeRabbit inference engine (CLAUDE.md)

Place output formatters (tree, flat, json, csv) under src/formatters/

Files:

  • src/formatters/astFormatter.ts
🧬 Code graph analysis (3)
src/core/languages/go.ts (2)
src/core/languages/index.ts (1)
  • LanguageConfig (4-9)
src/types/index.ts (1)
  • ASTSymbol (221-232)
src/core/languages/typescript.ts (2)
src/core/languages/index.ts (1)
  • LanguageConfig (4-9)
src/types/index.ts (11)
  • ASTSymbol (221-232)
  • SourceLocation (129-136)
  • Parameter (138-143)
  • FunctionSymbol (153-160)
  • ClassSymbol (162-168)
  • VariableSymbol (186-190)
  • InterfaceSymbol (170-174)
  • TypeSymbol (176-179)
  • EnumSymbol (181-184)
  • ImportSymbol (192-198)
  • ExportSymbol (200-204)
src/formatters/astFormatter.ts (2)
src/types/index.ts (12)
  • TreeOptions (79-92)
  • ScanResult (104-110)
  • ASTSymbol (221-232)
  • FunctionSymbol (153-160)
  • ClassSymbol (162-168)
  • InterfaceSymbol (170-174)
  • EnumSymbol (181-184)
  • ImportSymbol (192-198)
  • ExportSymbol (200-204)
  • NamespaceSymbol (206-209)
  • FileNode (1-11)
  • Node (23-23)
src/utils/formatUtils.ts (1)
  • formatFileSize (1-13)

- Add top-level guard to prevent nested variables, functions, classes, etc. from being extracted as top-level symbols
- Implement showTokensPerSymbol option in AST formatter to display per-symbol token counts when enabled

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
src/formatters/astFormatter.ts (1)

130-156: LGTM! Token-per-symbol display has been implemented.

The showTokensPerSymbol option is now properly utilized (lines 135-138) and included in the symbol output (line 143). This addresses the previous review comment about the unused option.

Based on past review comment.

src/core/languages/typescript.ts (1)

289-344: Make the top-level check more robust to handle different root node types.

The isTopLevel check (line 291) only tests for 'program', but tree-sitter grammars can use different root node types (e.g., 'source_file'). The previous review suggested checking both to ensure compatibility.

Apply this diff:

     function traverse(node: Parser.SyntaxNode) {
-      // Check if this node is at the top level (parent is program)
-      const isTopLevel = node.parent?.type === 'program';
+      // Check if this node is at the top level (parent is program or source_file, or is root itself)
+      const isTopLevel = !node.parent || node.parent === rootNode || 
+                         node.parent.type === 'program' || node.parent.type === 'source_file';

Based on past review comment.
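
The check can be exercised in isolation with a minimal node shape. This is a sketch only: `TreeNode`, `ROOT_TYPES`, and this standalone `isTopLevel` are hypothetical, and a declaration wrapped in another statement (for example an export) would need extra handling that is not shown:

```typescript
// Minimal stand-in for a syntax node: just a type and a parent link.
interface TreeNode {
  type: string;
  parent: TreeNode | null;
}

// Root node types to accept; the set is an assumption and can be extended.
const ROOT_TYPES = new Set(["program", "source_file"]);

// A node is top-level when it is the root or sits directly under a root node.
function isTopLevel(node: TreeNode): boolean {
  return node.parent === null || ROOT_TYPES.has(node.parent.type);
}
```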

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3f29a7e and 807d11d.

📒 Files selected for processing (2)
  • src/core/languages/typescript.ts (1 hunks)
  • src/formatters/astFormatter.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx,js,jsx}: Do not use dotenv; Bun loads .env automatically
Use Bun.serve() for HTTP/WebSocket/HTTPS routes; do not use Express
Use bun:sqlite for SQLite; do not use better-sqlite3
Use Bun.redis for Redis; do not use ioredis
Use Bun.sql for Postgres; do not use pg or postgres.js
Use built-in WebSocket; do not use ws
Prefer Bun.file over node:fs readFile/writeFile
Use Bun.$ for shelling out instead of execa

Files:

  • src/formatters/astFormatter.ts
  • src/core/languages/typescript.ts
src/formatters/**

📄 CodeRabbit inference engine (CLAUDE.md)

Place output formatters (tree, flat, json, csv) under src/formatters/

Files:

  • src/formatters/astFormatter.ts
🧬 Code graph analysis (2)
src/formatters/astFormatter.ts (2)
src/types/index.ts (12)
  • TreeOptions (79-92)
  • ScanResult (104-110)
  • ASTSymbol (221-232)
  • FunctionSymbol (153-160)
  • ClassSymbol (162-168)
  • InterfaceSymbol (170-174)
  • EnumSymbol (181-184)
  • ImportSymbol (192-198)
  • ExportSymbol (200-204)
  • NamespaceSymbol (206-209)
  • FileNode (1-11)
  • Node (23-23)
src/utils/formatUtils.ts (1)
  • formatFileSize (1-13)
src/core/languages/typescript.ts (2)
src/core/languages/index.ts (1)
  • LanguageConfig (4-9)
src/types/index.ts (11)
  • ASTSymbol (221-232)
  • SourceLocation (129-136)
  • Parameter (138-143)
  • FunctionSymbol (153-160)
  • ClassSymbol (162-168)
  • VariableSymbol (186-190)
  • InterfaceSymbol (170-174)
  • TypeSymbol (176-179)
  • EnumSymbol (181-184)
  • ImportSymbol (192-198)
  • ExportSymbol (200-204)
🔇 Additional comments (1)
src/formatters/astFormatter.ts (1)

235-270: LGTM! Symbol counting correctly handles nested structures.

The counting logic properly distinguishes between enum members (lines 256-258) and nested ASTSymbol members (lines 260-263), which aligns with the type definitions.
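
For readers without the file open, the distinction can be sketched roughly as follows. `SketchSymbol` is a hypothetical, cut-down union, and counting each enum member as one symbol is an assumption made for this illustration, not something confirmed by the review:

```typescript
// Hypothetical, simplified symbol union: enum members are plain records,
// while class-like members are symbols that may nest further.
type SketchSymbol =
  | { type: "enum"; name: string; members: { name: string; value?: string }[] }
  | { type: "class" | "interface" | "namespace"; name: string; members: SketchSymbol[] }
  | { type: "function" | "variable"; name: string };

function countSymbols(symbols: SketchSymbol[]): number {
  let total = 0;
  for (const symbol of symbols) {
    total += 1;
    switch (symbol.type) {
      case "enum":
        // Enum members are leaves: count them without recursing (assumption).
        total += symbol.members.length;
        break;
      case "class":
      case "interface":
      case "namespace":
        // Nested symbols can have members of their own, so recurse.
        total += countSymbols(symbol.members);
        break;
    }
  }
  return total;
}
```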

Comment on lines 81 to 123
function extractClass(node: Parser.SyntaxNode): ClassSymbol | null {
  const nameNode = node.childForFieldName('name');
  if (!nameNode) return null;

  const members: ASTSymbol[] = [];
  const bodyNode = node.childForFieldName('body');
  const extendsNode = node.children.find(c => c.type === 'class_heritage');
  const implementsNode = node.children.find(c => c.type === 'implements_clause');
  const isAbstract = node.children.some(c => c.type === 'abstract');

  if (bodyNode) {
    for (const child of bodyNode.namedChildren) {
      if (child.type === 'method_definition') {
        const method = extractFunction(child);
        if (method) {
          method.type = ST.METHOD;
          members.push(method);
        }
      } else if (child.type === 'public_field_definition' || child.type === 'field_definition') {
        const propName = child.childForFieldName('name');
        const propType = child.childForFieldName('type');
        if (propName) {
          members.push({
            name: getNodeText(propName),
            type: ST.VARIABLE,
            location: getLocation(child),
            variableType: propType ? getNodeText(propType) : undefined
          } as VariableSymbol);
        }
      }
    }
  }

  return {
    name: getNodeText(nameNode),
    type: ST.CLASS,
    location: getLocation(node),
    extends: extendsNode ? getNodeText(extendsNode) : undefined,
    implements: implementsNode ? [getNodeText(implementsNode)] : undefined,
    members,
    abstract: isAbstract
  };
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Consider using field-based lookups for class metadata.

Lines 87-89 search through node.children to find heritage clauses and abstract modifiers. Tree-sitter's field-based API (childForFieldName()) is typically more reliable than searching by type. Additionally, 'abstract' is unlikely to be a node type (similar to the async/generator issue).

Verify the correct field names and node types in tree-sitter-typescript documentation:

If field names are available (e.g., heritage, decorator, modifiers), refactor to use childForFieldName() instead of children.find().


🌐 Web query:

tree-sitter-typescript class_declaration node structure and field names

💡 Result:

Summary of the tree-sitter‑typescript class_declaration node (common field names)

Typical named fields you can expect on a class_declaration node in the tree-sitter TypeScript grammar:

  • name — the class identifier (identifier node). [1]
  • type_parameters — generic type parameter list (type_parameters node). [1]
  • superclass / superclass: (superclass ...) — the extends clause / superclass name(s). [2]
  • interfaces / super_interfaces / implements — implemented interfaces (super_interfaces / type_list style). [2]
  • body — the class body (class_body node). [1][2]
  • decorators / modifiers (when present) — decorator or modifier tokens attached to the class (grammar-dependent). [1][2]

Notes:

  • Exact field keys may vary slightly between versions (e.g., superclass vs. superclass_name or super_interfaces vs. interfaces), but the canonical node-types.json for the tree-sitter-typescript grammar exposes the name, type_parameters, superclass/super_interfaces (extends/implements), and body fields used by queries. See the grammar/node-types data for the precise names in the version you use. [1][2]

Sources:
[1] tree-sitter static node-types / tree-sitter-typescript node-types (node-types.json) documentation.
[2] Example tagging / stack-graph queries that match class_declaration fields (shows name, body, superclass/superclass_name, type_parameters, interfaces/super_interfaces).



🏁 Script executed:

#!/bin/bash
# Find tree-sitter-typescript version and extract field name details from node-types
if [ -f "package.json" ]; then
  cat package.json | grep -A2 -B2 "tree-sitter"
fi

if [ -f "package-lock.json" ]; then
  echo "=== tree-sitter packages in lock file ==="
  grep -A5 '"tree-sitter' package-lock.json | head -30
fi

# Look for node_modules or grammar definitions
find . -name "node-types.json" -o -name "grammar.js" 2>/dev/null | head -10

Length of output: 1032


🏁 Script executed:

#!/bin/bash
# Examine the actual extractClass implementation at lines 81-123
cat src/core/languages/typescript.ts | sed -n '81,123p'

Length of output: 1678


🏁 Script executed:

#!/bin/bash
# Search for other uses of class heritage/implements in the file to see pattern
rg -n 'class_heritage|implements_clause|superclass|super_interfaces' src/core/languages/typescript.ts -C2

Length of output: 351



🏁 Script executed:

#!/bin/bash
# Search for modifier/abstract handling patterns elsewhere in the file
rg -n 'modifiers|async|generator' src/core/languages/typescript.ts -C2 | head -40

Length of output: 705


🏁 Script executed:

#!/bin/bash
# Check if there are test files or examples showing tree-sitter usage
find . -type f \( -name "*.test.ts" -o -name "*.spec.ts" -o -name "*example*" \) -exec grep -l "class_declaration\|extractClass" {} \; 2>/dev/null | head -5

Length of output: 0


Fix class metadata extraction using field-based lookups.

Lines 87-89 search for class metadata using incorrect node types (class_heritage, implements_clause, abstract). These type names don't exist in tree-sitter-typescript v0.23.0; the search will fail silently, preventing extends/implements/abstract from being detected.

Replace with field-based lookups:

  • Use childForFieldName('superclass') instead of children.find(c => c.type === 'class_heritage')
  • Use childForFieldName('super_interfaces') instead of children.find(c => c.type === 'implements_clause')
  • Check the modifiers field (via childForFieldName('modifiers')) for the abstract modifier, not a standalone abstract node type
🤖 Prompt for AI Agents
In src/core/languages/typescript.ts around lines 81 to 123, the class extractor
currently looks for extends/implements/abstract via children.find(...) using
non-existent node types; replace those lookups with field-based calls: use
node.childForFieldName('superclass') for extends,
node.childForFieldName('super_interfaces') for implements (if present extract
the individual interfaces from that node, e.g., map its namedChildren to
getNodeText), and use node.childForFieldName('modifiers') to detect abstract
(check modifiers.namedChildren for a child with type 'abstract' or whose text
equals 'abstract'); update the returned extends/implements/abstract values
accordingly so they are populated when present.

… Ruby, PHP, Swift)

- Implement full symbol extraction for Go: functions, methods, structs, interfaces, imports, constants, variables
- Implement full symbol extraction for Rust: functions, structs, enums, traits, impl blocks, imports
- Implement full symbol extraction for Java: classes, interfaces, methods, enums, imports, fields
- Implement full symbol extraction for C++: functions, classes, structs, namespaces, enums, includes
- Implement full symbol extraction for C#: classes, interfaces, structs, methods, properties, namespaces
- Implement full symbol extraction for Ruby: classes, modules, methods, constants, requires
- Implement full symbol extraction for PHP: classes, interfaces, traits, functions, namespaces
- Add new Swift language support with full AST parsing (functions, classes, structs, enums, protocols)
- Add tree-sitter-swift@0.6.0 dependency
- Register Swift in language registry

All implementations follow the comprehensive pattern established by Python/TypeScript parsers,
extracting detailed metadata about functions, classes, types, and their members for accurate
codebase analysis.
…mprehensive tests

- Add file size limit to AST parser (default 10MB) to prevent OOM issues
- Add comprehensive JSDoc documentation to ASTParser class methods
- Improve error handling and logging throughout AST parsing pipeline
- Update README with AST output documentation and supported languages list
- Add symbol icons reference (ƒ functions, C classes, I interfaces, T types, E enums, v variables)
- Add CSV output format documentation to README

Tests:
- Add comprehensive AST parser test suite (astParser.test.ts)
- Add AST formatter test suite with 25+ test cases (astFormatter.test.ts)
- Add language-specific tests for Python and TypeScript parsers
- All tests passing (159 tests, 424 expectations)

Core improvements:
- Enhance cache version handling for AST symbols
- Improve scanner integration with AST extraction
- Add STRUCT and TRAIT symbol types to type definitions
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 13

🧹 Nitpick comments (23)
src/core/languages/java.ts (1)

201-225: Consider enhancing import extraction for Java-specific patterns.

The current implementation splits the import path and extracts the last component (e.g., "List" from "java.util.List"), which works for simple cases. However, it doesn't distinguish between:

  • Regular imports: import java.util.List;
  • Wildcard imports: import java.util.*;
  • Static imports: import static java.lang.Math.PI;

Consider parsing these patterns separately to provide more detailed import information (e.g., marking wildcard imports or static imports with metadata).
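
As a rough illustration of the three forms listed above, the sketch below classifies an import statement from its raw text. It is not code from this PR: `classifyJavaImport` and the `JavaImportInfo` shape are hypothetical names, and a real implementation would more likely inspect the tree-sitter node's children than use a regular expression.

```typescript
// Hypothetical shape; the PR's ImportSymbol may carry this metadata differently.
interface JavaImportInfo {
  from: string;       // package or enclosing class the import refers to
  imported: string;   // last path component, or "*" for wildcard imports
  isStatic: boolean;
  isWildcard: boolean;
}

// Classifies a single Java import statement given as raw text.
function classifyJavaImport(statement: string): JavaImportInfo | null {
  // Matches: import [static] a.b.C[.*];
  const match = /^\s*import\s+(static\s+)?([\w.]+?)(\.\*)?\s*;\s*$/.exec(statement);
  if (!match) return null;

  const path = match[2] ?? '';
  const isWildcard = match[3] !== undefined;
  const lastDot = path.lastIndexOf('.');

  return {
    // For wildcards the whole path is the package; otherwise drop the last component.
    from: isWildcard || lastDot === -1 ? path : path.slice(0, lastDot),
    imported: isWildcard ? '*' : path.slice(lastDot + 1),
    isStatic: match[1] !== undefined,
    isWildcard,
  };
}
```

The `isStatic` and `isWildcard` flags could then be attached to the import symbol as extra metadata, assuming the formatter is updated to display them.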

src/core/languages/cpp.ts (3)

44-54: Avoid calling childForFieldName twice.

The parameter name extraction calls childForFieldName('declarator') twice—once in the condition (line 46) and again on line 47. Store the result in a variable to improve efficiency and readability.

Apply this diff:

           if (child.type === 'parameter_declaration') {
             const declaratorNode = child.childForFieldName('declarator');
             const typeNode = child.childForFieldName('type');
 
             if (declaratorNode) {
-              const name = declaratorNode.type === 'identifier' ? getNodeText(declaratorNode) :
-                           declaratorNode.childForFieldName('declarator') ?
-                           getNodeText(declaratorNode.childForFieldName('declarator')!) :
-                           getNodeText(declaratorNode);
+              let name: string;
+              if (declaratorNode.type === 'identifier') {
+                name = getNodeText(declaratorNode);
+              } else {
+                const nestedDeclarator = declaratorNode.childForFieldName('declarator');
+                name = nestedDeclarator ? getNodeText(nestedDeclarator) : getNodeText(declaratorNode);
+              }
 
               params.push({
                 name,

221-239: Consider more robust include parsing.

The regex-based parsing handles common cases but may miss edge cases like escaped quotes, macros in include paths, or unusual whitespace. For the initial implementation, this is acceptable.

If you encounter parsing issues in real-world code, consider using tree-sitter's AST structure to extract the include path more reliably:

function extractImport(node: Parser.SyntaxNode): ImportSymbol | null {
  // tree-sitter-cpp parses the path as a string or system_lib_string child node
  const pathNode = node.childForFieldName('path');
  if (!pathNode) return null;
  
  const from = getNodeText(pathNode).replace(/^["<]|[">]$/g, '');
  
  return {
    name: from,
    type: ST.IMPORT,
    location: getLocation(node),
    from,
    imports: [from]
  };
}

111-120: Consider iterating direct children instead of descendantsOfType.

Using descendantsOfType recursively searches all descendants, which can be less efficient than iterating through direct children or using childForFieldName when the structure is predictable.

For field declarations, you can use the tree-sitter field name directly:

// In extractClass (line 111-120)
} else if (child.type === 'field_declaration') {
  const declarator = child.childForFieldName('declarator');
  if (declarator) {
    const fieldName = declarator.type === 'field_identifier' ? 
                      getNodeText(declarator) : 
                      getNodeText(declarator.childForFieldName('declarator') || declarator);
    members.push({
      name: fieldName,
      type: ST.VARIABLE,
      location: getLocation(child)
    } as VariableSymbol);
  }
}

// Similar pattern for extractStruct (line 140-151)

Also applies to: 140-151

src/core/languages/csharp.ts (2)

236-261: Namespace extraction is incomplete.

The extractNamespace function only extracts classes and interfaces (lines 245-251), but C# namespaces can contain other top-level declarations such as structs, enums, delegates, and nested namespaces.

Consider extending the namespace extraction to handle all declaration types:

     function extractNamespace(node: Parser.SyntaxNode): NamespaceSymbol | null {
       const nameNode = node.childForFieldName('name');
       if (!nameNode) return null;
 
       const members: ASTSymbol[] = [];
       const bodyNode = node.childForFieldName('body');
 
       if (bodyNode) {
         for (const child of bodyNode.namedChildren) {
           if (child.type === 'class_declaration') {
             const cls = extractClass(child);
             if (cls) members.push(cls);
           } else if (child.type === 'interface_declaration') {
             const iface = extractInterface(child);
             if (iface) members.push(iface);
+          } else if (child.type === 'struct_declaration') {
+            const struct = extractStruct(child);
+            if (struct) members.push(struct);
+          } else if (child.type === 'enum_declaration') {
+            const enumDecl = extractEnum(child);
+            if (enumDecl) members.push(enumDecl);
+          } else if (child.type === 'namespace_declaration') {
+            const ns = extractNamespace(child);
+            if (ns) members.push(ns);
           }
         }
       }
 
       return {
         name: getNodeText(nameNode),
         type: ST.NAMESPACE,
         location: getLocation(node),
         members
       };
     }

263-280: Using directive extraction is overly simplistic.

C# using directives have different forms that aren't captured by the current implementation:

  1. Namespace import: using System.Collections.Generic;
  2. Static using: using static System.Math;
  3. Alias directive: using Alias = Some.Long.Namespace;

The current implementation only extracts the name field and treats it as both the from and a single imports entry, which doesn't accurately represent these different forms.

Consider enhancing the extraction to handle different using directive types:

function extractImport(node: Parser.SyntaxNode): ImportSymbol | null {
  const nameNode = node.childForFieldName('name');
  if (!nameNode) return null;

  const namespaceOrType = getNodeText(nameNode);
  const isStatic = node.children.some(c => c.type === 'static');
  
  // Check for alias (using Alias = Target;)
  const aliasNode = node.childForFieldName('alias');
  
  if (aliasNode) {
    return {
      name: getNodeText(aliasNode),
      type: ST.IMPORT,
      location: getLocation(node),
      from: namespaceOrType,
      imports: [getNodeText(aliasNode)]
    };
  }

  return {
    name: namespaceOrType,
    type: ST.IMPORT,
    location: getLocation(node),
    from: namespaceOrType,
    imports: isStatic ? [`static ${namespaceOrType}`] : [namespaceOrType]
  };
}

Note: Verify the tree-sitter-c-sharp grammar's field names for using directives to ensure accurate extraction.

src/core/languages/rust.ts (2)

19-32: Extract shared helper functions to reduce duplication.

The getLocation and getNodeText functions are identical across multiple language configs (rust.ts, go.ts, and likely others). Consider extracting these to a shared utility module to improve maintainability and reduce code duplication.

For example, create src/core/languages/utils.ts:

import type Parser from 'tree-sitter';
import type { SourceLocation } from '../../types/index.js';

export function getLocation(node: Parser.SyntaxNode): SourceLocation {
  return {
    startLine: node.startPosition.row + 1,
    startColumn: node.startPosition.column,
    endLine: node.endPosition.row + 1,
    endColumn: node.endPosition.column,
    startByte: node.startIndex,
    endByte: node.endIndex
  };
}

export function getNodeText(node: Parser.SyntaxNode, sourceCode: string): string {
  return sourceCode.slice(node.startIndex, node.endIndex);
}

Then import and use in each language config.
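
The shared getNodeText above takes sourceCode explicitly, whereas the extractors in this PR call getNodeText(node) with a single argument. One possible way to bridge the two is a small factory that binds the source text once per file. This is only a sketch: `makeNodeText` and the `NodeSpan` stand-in type are hypothetical and are used here so the example runs without tree-sitter installed.

```typescript
// Minimal structural stand-in for the part of Parser.SyntaxNode used here.
interface NodeSpan {
  startIndex: number;
  endIndex: number;
}

// Hypothetical factory: binds the source text once so call sites keep the
// single-argument getNodeText(node) form used in the existing extractors.
function makeNodeText(sourceCode: string): (node: NodeSpan) => string {
  return (node) => sourceCode.slice(node.startIndex, node.endIndex);
}

const getNodeText = makeNodeText('fn main() {}');
console.log(getNodeText({ startIndex: 3, endIndex: 7 })); // prints "main"
```

Because the returned function takes exactly one parameter, it can also be passed directly to `Array.prototype.map` without the index argument being mistaken for the source text.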


34-61: Consider simplifying the parameter name extraction.

The nested ternary operator at lines 45-47 is complex and reduces readability. Consider refactoring for clarity.

Apply this diff:

-            if (patternNode) {
-              const name = patternNode.type === 'identifier' ? getNodeText(patternNode) :
-                           patternNode.childForFieldName('name') ? getNodeText(patternNode.childForFieldName('name')!) :
-                           getNodeText(patternNode);
+            if (patternNode) {
+              let name: string;
+              if (patternNode.type === 'identifier') {
+                name = getNodeText(patternNode);
+              } else {
+                const nameChild = patternNode.childForFieldName('name');
+                name = nameChild ? getNodeText(nameChild) : getNodeText(patternNode);
+              }
src/core/languages/go.ts (1)

73-89: Method signature could include parameters and return type.

The method signature at line 87 only includes the receiver and method name, but excludes parameters and return type. Consider including these for a more complete signature representation.

For example:

       return {
         name: getNodeText(nameNode),
         type: ST.METHOD,
         location: getLocation(node),
         parameters,
         returnType: resultNode ? getNodeText(resultNode) : undefined,
-        signature: receiverNode ? `func ${getNodeText(receiverNode)} ${getNodeText(nameNode)}` : undefined
+        signature: receiverNode ? 
+          `func ${getNodeText(receiverNode)} ${getNodeText(nameNode)}(${parameters.map(p => (p.type ? `${p.name} ${p.type}` : p.name)).join(', ')})${resultNode ? ` ${getNodeText(resultNode)}` : ''}`
+          : undefined
       };

This would produce signatures like func (r *Receiver) methodName(param1 string, param2 int) error.
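
If the inline template literal becomes hard to read, the same formatting could be pulled into a helper. The sketch below is illustrative: `formatGoSignature` and `GoParam` are hypothetical names, and the receiver and result are assumed to be the raw node texts already extracted by the caller.

```typescript
// Hypothetical parameter shape, mirroring the name/type pair used above.
interface GoParam {
  name: string;
  type?: string;
}

// Builds a Go-style method signature string from already-extracted pieces.
function formatGoSignature(
  receiver: string,
  name: string,
  params: GoParam[],
  result?: string,
): string {
  // Omit the separator when a parameter has no type, so no stray space is emitted.
  const paramList = params
    .map((p) => (p.type ? `${p.name} ${p.type}` : p.name))
    .join(', ');
  return `func ${receiver} ${name}(${paramList})${result ? ` ${result}` : ''}`;
}

console.log(
  formatGoSignature(
    '(r *Receiver)',
    'methodName',
    [{ name: 'param1', type: 'string' }, { name: 'param2', type: 'int' }],
    'error',
  ),
); // prints "func (r *Receiver) methodName(param1 string, param2 int) error"
```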

src/core/languages/ruby.ts (8)

106-124: Prefer NamespaceSymbol/ST.NAMESPACE for Ruby modules

Modules are namespaces/mixins; typing them as CLASS confuses downstream formatters. Switch to NamespaceSymbol + ST.NAMESPACE (types appear supported in src/types/index.ts ASTSymbol union).

Apply:

-import type { ASTSymbol, FunctionSymbol, ClassSymbol, ImportSymbol, VariableSymbol, SourceLocation, Parameter } from '../../types/index.js';
+import type { ASTSymbol, FunctionSymbol, ClassSymbol, NamespaceSymbol, ImportSymbol, VariableSymbol, SourceLocation, Parameter } from '../../types/index.js';
@@
-    function extractModule(node: Parser.SyntaxNode): ClassSymbol | null {
+    function extractModule(node: Parser.SyntaxNode): NamespaceSymbol | null {
@@
-      return {
+      return {
         name: getNodeText(nameNode),
-        type: ST.CLASS, // Use CLASS type for modules as they're similar
+        type: ST.NAMESPACE,
         location: getLocation(node),
         members
       };
     }

Confirm ST.NAMESPACE and NamespaceSymbol exist; if not, keep CLASS as a fallback and open a types follow-up.


34-56: Parameter extraction: handle method_parameters and splat/block/kw variants

Ruby exposes parameters under method.parameters: method_parameters; also supports splat/hash_splat/block params. Make parsing resilient. (tree-sitter.github.io)

Apply:

-    function extractParameters(node: Parser.SyntaxNode): Parameter[] {
-      const params: Parameter[] = [];
-      const paramsNode = node.childForFieldName('parameters');
-      if (paramsNode) {
-        for (const child of paramsNode.namedChildren) {
-          if (child.type === 'identifier' || child.type === 'optional_parameter' || child.type === 'keyword_parameter') {
-            const name = child.type === 'identifier' ? getNodeText(child) :
-                         child.childForFieldName('name') ? getNodeText(child.childForFieldName('name')!) :
-                         getNodeText(child);
-            const defaultValue = child.childForFieldName('value');
-            params.push({
-              name,
-              defaultValue: defaultValue ? getNodeText(defaultValue) : undefined
-            });
-          }
-        }
-      }
-      return params;
-    }
+    function extractParameters(node: Parser.SyntaxNode): Parameter[] {
+      const params: Parameter[] = [];
+      const paramsNode =
+        node.childForFieldName('parameters') ??
+        node.descendantsOfType('method_parameters')[0] ??
+        node.descendantsOfType('parameters')[0];
+      if (!paramsNode) return params;
+      const supported = new Set([
+        'identifier','optional_parameter','keyword_parameter',
+        'splat_parameter','hash_splat_parameter','block_parameter'
+      ]);
+      for (const p of paramsNode.namedChildren) {
+        if (!supported.has(p.type)) continue;
+        const nameNode = p.childForFieldName('name') ?? (p.type === 'identifier' ? p : p.child(0));
+        const defaultNode = p.childForFieldName('value') ?? p.childForFieldName('default') ?? null;
+        params.push({
+          name: nameNode ? getNodeText(nameNode) : '?',
+          optional: p.type.includes('optional') || p.type.includes('keyword'),
+          defaultValue: defaultNode ? getNodeText(defaultNode) : undefined,
+        });
+      }
+      return params;
+    }

85-95: Deduplicate instance variables and add variableType metadata

Instance variables can appear many times; emit once per name and tag as instance for clarity.

Apply:

-      for (const child of node.descendantsOfType('instance_variable')) {
+      const seenIvars = new Set<string>();
+      for (const child of node.descendantsOfType('instance_variable')) {
         const varName = getNodeText(child);
         if (varName) {
-          members.push({
+          if (seenIvars.has(varName)) continue;
+          seenIvars.add(varName);
+          members.push({
             name: varName,
             type: ST.VARIABLE,
-            location: getLocation(child)
+            location: getLocation(child),
+            variableType: 'instance'
           } as VariableSymbol);
         }
       }

126-153: Require/include parsing: support argument_list, constants, and multiple args

Ruby call nodes expose method and arguments: argument_list; include often uses constants (not strings) and can take multiple modules. Parse all args and normalize text; set imports to all, from to first. (tree-sitter.github.io)

Apply:

-    function extractRequire(node: Parser.SyntaxNode): ImportSymbol | null {
+    function extractRequire(node: Parser.SyntaxNode): ImportSymbol | null {
       // Look for method calls with 'require' or 'require_relative'
       const methodNode = node.childForFieldName('method');
       if (!methodNode) return null;
 
       const methodName = getNodeText(methodNode);
       if (methodName !== 'require' && methodName !== 'require_relative' && methodName !== 'include') {
         return null;
       }
 
-      const args = node.childForFieldName('arguments');
-      let from = '';
-
-      if (args) {
-        const stringNode = args.namedChildren[0];
-        if (stringNode) {
-          from = getNodeText(stringNode).replace(/['"]/g, '');
-        }
-      }
-
-      return {
-        name: from,
-        type: ST.IMPORT,
-        location: getLocation(node),
-        from,
-        imports: [from]
-      };
+      const args = node.childForFieldName('arguments');
+      const imports: string[] = [];
+      if (args) {
+        for (const a of args.namedChildren) {
+          const raw = getNodeText(a);
+          imports.push(raw.replace(/['"]/g, ''));
+        }
+      }
+      const from = imports[0] ?? methodName;
+      return {
+        name: from,
+        type: ST.IMPORT,
+        location: getLocation(node),
+        from,
+        imports: imports.length ? imports : [from],
+      };
     }

To be safe, confirm tree-sitter-ruby emits fields method and arguments: argument_list on call, as shown in docs. If your local grammar differs, adjust field names accordingly.


10-13: Make loadGrammar robust to ESM/CJS export shapes

Some language packages export default, others export language; guard both to avoid runtime errors in Bun/Node.

Apply:

-  loadGrammar: async () => {
-    const RubyLanguage = await import('tree-sitter-ruby');
-    return RubyLanguage.default;
-  },
+  loadGrammar: async () => {
+    const mod: any = await import('tree-sitter-ruby');
+    return (mod?.default ?? mod?.language ?? mod);
+  },

If you’ve standardized on one export shape across languages, align with that instead.
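
Since the same guard would be needed in every language config, it could live in one shared helper. The following is a sketch under that assumption; `resolveGrammar` is a hypothetical name, and it avoids `any` by narrowing from `unknown`.

```typescript
// Hypothetical helper: picks the grammar object out of a dynamically imported
// module, whichever export shape the package happens to use.
function resolveGrammar(mod: unknown): unknown {
  if (mod !== null && typeof mod === 'object') {
    const record = mod as Record<string, unknown>;
    // Prefer a default export, then a named `language` export, then the module itself.
    return record['default'] ?? record['language'] ?? mod;
  }
  return mod;
}

// Possible use inside a language config (not executed here):
//   loadGrammar: async () => resolveGrammar(await import('tree-sitter-ruby')),
```

The caller would still need to cast the result to the grammar type the parser expects, since the helper deliberately makes no claim about the returned shape.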


169-201: Traversal: consider top-level-only loop to reduce work

You recurse through the entire tree but only handle top-level nodes. Iterate rootNode.namedChildren and drop the recursion for a small speed win.

Apply:

-    function traverse(node: Parser.SyntaxNode) {
-      const isTopLevel = node.parent?.type === 'program';
-      if (isTopLevel) {
-        // existing cases...
-      }
-      for (const child of node.children) {
-        traverse(child);
-      }
-    }
-    traverse(rootNode);
+    for (const node of rootNode.namedChildren) {
+      // existing top-level cases...
+    }

34-70: Method metadata: consider capturing receiver for singleton methods and signature

Optional: if method has a receiver (def self.foo), record it (e.g., signature or name like self.foo) for clarity.

Apply:

-    function extractMethod(node: Parser.SyntaxNode): FunctionSymbol | null {
+    function extractMethod(node: Parser.SyntaxNode): FunctionSymbol | null {
       const nameNode = node.childForFieldName('name');
       if (!nameNode) return null;
+      const receiver = node.childForFieldName('receiver');
@@
-      return {
+      return {
         name: getNodeText(nameNode),
         type: ST.METHOD,
         location: getLocation(node),
-        parameters
+        parameters,
+        signature: receiver ? `${getNodeText(receiver)}.${getNodeText(nameNode)}` : undefined,
       };
     }

85-95: Minor: sort/dedupe imports and ivars for stable output

After building members, consider sorting and Set-based dedupe to keep formatter output deterministic.

Also applies to: 126-153
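
A minimal sketch of such a pass, assuming symbols carry a name and a location with a 1-based startLine as elsewhere in this PR. `dedupeAndSort` and `NamedSymbol` are hypothetical names, not existing helpers.

```typescript
// Minimal stand-in for the fields of ASTSymbol that this pass relies on.
interface NamedSymbol {
  name: string;
  location: { startLine: number };
}

// Keeps the first occurrence of each name, then orders by start line
// (ties broken by name) so repeated runs produce identical output.
function dedupeAndSort<T extends NamedSymbol>(symbols: T[]): T[] {
  const seen = new Set<string>();
  const unique = symbols.filter((s) => {
    if (seen.has(s.name)) return false;
    seen.add(s.name);
    return true;
  });
  return unique.sort((a, b) => {
    const byLine = a.location.startLine - b.location.startLine;
    if (byLine !== 0) return byLine;
    // Plain code-unit comparison avoids locale-dependent ordering.
    return a.name < b.name ? -1 : a.name > b.name ? 1 : 0;
  });
}
```

Deduplicating by name alone is a simplification: it would also merge, for example, an instance variable and a method that share a name, so a real implementation might key on symbol type plus name.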

src/core/languages/swift.ts (6)

77-118: Split class inheritance into extends (superclass) and implements (protocols).

Currently you store the entire inheritance clause in extends. Parse identifiers; first is superclass (if any), remainder are protocols. ClassSymbol supports implements?: string[] (see src/types/index.ts).

Apply this diff:

@@
-      const inheritanceNode = node.childForFieldName('inheritance');
+      const inheritanceNode = node.childForFieldName('inheritance');
+      const heritage = inheritanceNode
+        ? inheritanceNode.descendantsOfType(['type_identifier', 'scoped_identifier']).map(getNodeText)
+        : [];
@@
-        extends: inheritanceNode ? getNodeText(inheritanceNode) : undefined,
+        extends: heritage.length ? heritage[0] : undefined,
+        implements: heritage.length > 1 ? heritage.slice(1) : undefined,
         members
       };
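
The split itself can be expressed as a small pure function over the collected identifier texts. This is a sketch with a hypothetical name (`splitHeritage`). Note that treating the first entry as the superclass is a heuristic: Swift syntax alone does not distinguish a superclass from a protocol, so a declaration such as `class Foo: Codable` would be reported with `extends: 'Codable'`.

```typescript
// Hypothetical helper: first inherited type is treated as the superclass,
// the remainder as protocol conformances.
function splitHeritage(
  heritage: string[],
): { extends: string | undefined; implements: string[] | undefined } {
  const [first, ...rest] = heritage;
  return {
    extends: first,
    implements: rest.length > 0 ? rest : undefined,
  };
}

const result = splitHeritage(['UIViewController', 'UITableViewDataSource', 'Codable']);
console.log(result.extends);    // prints "UIViewController"
console.log(result.implements); // prints [ 'UITableViewDataSource', 'Codable' ]
```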

185-226: Protocol inheritance should be an array of protocols, not a single joined string.

Emit InterfaceSymbol.extends as string[] of type identifiers.

Apply this diff:

@@
-      const inheritanceNode = node.childForFieldName('inheritance');
+      const inheritanceNode = node.childForFieldName('inheritance');
@@
-      return {
+      const inherited = inheritanceNode
+        ? inheritanceNode.descendantsOfType(['type_identifier', 'scoped_identifier']).map(getNodeText)
+        : [];
+      return {
         name: getNodeText(nameNode),
         type: ST.INTERFACE,
         location: getLocation(node),
-        extends: inheritanceNode ? [getNodeText(inheritanceNode)] : undefined,
+        extends: inherited.length ? inherited : undefined,
         members
       };

297-299: Avoid traversing punctuation/anonymous nodes.

Use namedChildren to cut noise and speed up traversal.

Apply this diff:

-      for (const child of node.children) {
+      for (const child of node.namedChildren) {
         traverse(child);
       }

228-247: Import representation duplicates module name in both from and imports.

Swift imports usually bring a module (optionally with kind: struct/class/enum). Consider:

  • from: module name (e.g., Foundation)
  • imports: [] (or specific symbol when using import kind syntax)

This avoids redundancy and matches ImportSymbol semantics.

Please confirm how the formatter expects ImportSymbol for single-module imports to avoid regressions.


153-183: Enum members: raw values/associated values are dropped.

If feasible, capture raw-value expressions (e.g., case a = 1) in members[].value; associated values could be reflected in value as a signature string.

I can extend this with a minimal extractor for raw values using descendantsOfType on enum_case patterns.


10-13: Grammar import interop/typing.

Dynamic import shape can vary (CJS vs ESM). Add a safe fallback and clarify return typing.

Apply this diff:

   loadGrammar: async () => {
-    const SwiftLanguage = await import('tree-sitter-swift');
-    return SwiftLanguage.default;
+    const mod: any = await import('tree-sitter-swift');
+    return (mod.default ?? mod) as unknown; // Parser.Language
   },
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 807d11d and 082043a.

⛔ Files ignored due to path filters (1)
  • bun.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • package.json (2 hunks)
  • src/core/languages/cpp.ts (1 hunks)
  • src/core/languages/csharp.ts (1 hunks)
  • src/core/languages/go.ts (1 hunks)
  • src/core/languages/index.ts (1 hunks)
  • src/core/languages/java.ts (1 hunks)
  • src/core/languages/php.ts (1 hunks)
  • src/core/languages/ruby.ts (1 hunks)
  • src/core/languages/rust.ts (1 hunks)
  • src/core/languages/swift.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/core/languages/index.ts
  • package.json
  • src/core/languages/php.ts
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx,js,jsx}: Do not use dotenv; Bun loads .env automatically
Use Bun.serve() for HTTP/WebSocket/HTTPS routes; do not use Express
Use bun:sqlite for SQLite; do not use better-sqlite3
Use Bun.redis for Redis; do not use ioredis
Use Bun.sql for Postgres; do not use pg or postgres.js
Use built-in WebSocket; do not use ws
Prefer Bun.file over node:fs readFile/writeFile
Use Bun.$ for shelling out instead of execa

Files:

  • src/core/languages/swift.ts
  • src/core/languages/go.ts
  • src/core/languages/rust.ts
  • src/core/languages/ruby.ts
  • src/core/languages/java.ts
  • src/core/languages/cpp.ts
  • src/core/languages/csharp.ts
🧬 Code graph analysis (7)
src/core/languages/swift.ts (2)
src/core/languages/index.ts (1)
  • LanguageConfig (4-9)
src/types/index.ts (10)
  • ASTSymbol (221-232)
  • SourceLocation (129-136)
  • Parameter (138-143)
  • FunctionSymbol (153-160)
  • ClassSymbol (162-168)
  • VariableSymbol (186-190)
  • StructSymbol (211-214)
  • EnumSymbol (181-184)
  • InterfaceSymbol (170-174)
  • ImportSymbol (192-198)
src/core/languages/go.ts (2)
src/core/languages/index.ts (1)
  • LanguageConfig (4-9)
src/types/index.ts (8)
  • ASTSymbol (221-232)
  • SourceLocation (129-136)
  • Parameter (138-143)
  • FunctionSymbol (153-160)
  • StructSymbol (211-214)
  • InterfaceSymbol (170-174)
  • ImportSymbol (192-198)
  • VariableSymbol (186-190)
src/core/languages/rust.ts (2)
src/core/languages/index.ts (1)
  • LanguageConfig (4-9)
src/types/index.ts (9)
  • ASTSymbol (221-232)
  • SourceLocation (129-136)
  • Parameter (138-143)
  • FunctionSymbol (153-160)
  • StructSymbol (211-214)
  • EnumSymbol (181-184)
  • TraitSymbol (216-219)
  • ImportSymbol (192-198)
  • VariableSymbol (186-190)
src/core/languages/ruby.ts (2)
src/core/languages/index.ts (1)
  • LanguageConfig (4-9)
src/types/index.ts (7)
  • ASTSymbol (221-232)
  • SourceLocation (129-136)
  • Parameter (138-143)
  • FunctionSymbol (153-160)
  • ClassSymbol (162-168)
  • VariableSymbol (186-190)
  • ImportSymbol (192-198)
src/core/languages/java.ts (2)
src/core/languages/index.ts (1)
  • LanguageConfig (4-9)
src/types/index.ts (9)
  • ASTSymbol (221-232)
  • SourceLocation (129-136)
  • Parameter (138-143)
  • FunctionSymbol (153-160)
  • ClassSymbol (162-168)
  • VariableSymbol (186-190)
  • InterfaceSymbol (170-174)
  • EnumSymbol (181-184)
  • ImportSymbol (192-198)
src/core/languages/cpp.ts (2)
src/core/languages/index.ts (1)
  • LanguageConfig (4-9)
src/types/index.ts (10)
  • ASTSymbol (221-232)
  • SourceLocation (129-136)
  • Parameter (138-143)
  • FunctionSymbol (153-160)
  • ClassSymbol (162-168)
  • VariableSymbol (186-190)
  • StructSymbol (211-214)
  • EnumSymbol (181-184)
  • NamespaceSymbol (206-209)
  • ImportSymbol (192-198)
src/core/languages/csharp.ts (2)
src/core/languages/index.ts (1)
  • LanguageConfig (4-9)
src/types/index.ts (11)
  • ASTSymbol (221-232)
  • SourceLocation (129-136)
  • Parameter (138-143)
  • FunctionSymbol (153-160)
  • VariableSymbol (186-190)
  • ClassSymbol (162-168)
  • InterfaceSymbol (170-174)
  • StructSymbol (211-214)
  • EnumSymbol (181-184)
  • NamespaceSymbol (206-209)
  • ImportSymbol (192-198)
🪛 Biome (2.1.2)
src/core/languages/java.ts

[error] 99-99: Do not shadow the global "constructor" property.

Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.

(lint/suspicious/noShadowRestrictedNames)

src/core/languages/csharp.ts

[error] 106-106: Do not shadow the global "constructor" property.

Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.

(lint/suspicious/noShadowRestrictedNames)

🔇 Additional comments (21)
src/core/languages/java.ts (7)

1-13: LGTM!

The imports and configuration setup follow the established pattern. Dynamic grammar loading is correct.


15-32: LGTM!

The helper functions for location mapping and text extraction are correctly implemented.


34-65: LGTM!

Parameter extraction correctly handles both regular parameters and Java varargs (spread_parameter).


67-81: LGTM!

Method extraction is correctly implemented with proper null handling and type defaults.


173-199: LGTM!

Enum extraction correctly handles enum constants and matches the EnumSymbol interface.


227-249: LGTM!

The traversal logic correctly filters top-level declarations and recursively processes the AST.


251-253: LGTM!

The extraction execution is straightforward and correct.

src/core/languages/cpp.ts (2)

6-13: LGTM! Configuration is well-structured.

The language configuration correctly defines C++ support with comprehensive file extensions and lazy grammar loading.


15-273: Excellent implementation of C++ symbol extraction!

The extractSymbols function is comprehensive and well-organized. It correctly:

  • Extracts all major C++ constructs (functions, classes, structs, enums, namespaces, includes)
  • Uses helper functions for code reusability
  • Follows the LanguageConfig interface contract
  • Handles AST traversal with proper top-level scope checking

The minor refactoring suggestions above will enhance efficiency and robustness, but the core implementation is solid and ready for use.

src/core/languages/csharp.ts (3)

1-13: LGTM!

The imports and configuration structure are correct, following the established pattern for language configs with lazy grammar loading.


19-89: LGTM!

The helper functions correctly handle location mapping, text extraction, and parameter/method/property extraction with appropriate type information.


282-314: LGTM!

The traverse function correctly identifies top-level declarations and recursively processes the AST to extract all symbols.

src/core/languages/rust.ts (3)

1-13: LGTM!

The imports and grammar loading follow the established pattern from other language configs. The async dynamic import ensures lazy loading of the tree-sitter grammar.


63-166: LGTM!

The symbol extraction functions correctly handle Rust-specific constructs (functions, structs, enums, traits) and properly transform function types to methods within traits. The logic aligns well with Rust's AST structure.


212-261: LGTM!

The traversal logic comprehensively handles Rust's top-level declarations and correctly extracts methods from impl blocks. The recursive traversal ensures all nested structures are visited.

src/core/languages/go.ts (3)

1-13: LGTM!

The imports and grammar loading are correct and consistent with the other language configs. The previous linting issue with unused parameters has been properly addressed.


91-159: LGTM!

The struct and interface extraction logic correctly handles Go's type system and AST structure. The method spec extraction properly creates METHOD symbols with parameters and return types.


213-259: LGTM!

The traversal logic comprehensively handles Go's declaration types and correctly processes grouped constant and variable declarations. The type distinction between structs and interfaces is handled properly.

src/core/languages/ruby.ts (1)

126-153: Field names are correct per tree-sitter-ruby documentation.

The code correctly uses method and arguments fields on call nodes, matching the confirmed tree-sitter-ruby grammar. No adjustments needed.

src/core/languages/swift.ts (2)

19-28: Indexing convention is consistent; no changes needed.

The review comment assumes inconsistency across language configurations, but verification confirms all 10 language files (swift, typescript, ruby, rust, python, php, java, go, cpp, csharp) follow an identical, intentional convention: 1-based line indexing and 0-based column indexing. Swift.ts is already correctly aligned with the established global convention. No changes are needed.

Likely an incorrect or invalid review comment.
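
For reference, the convention described above (1-based lines, 0-based columns) can be sketched as follows. The `Point` and `NodeLike` shapes mirror the 0-based row/column positions that tree-sitter reports; the `SourceLocation` field names and the `toLocation` helper are assumptions for illustration, not the project's actual `getLocation` implementation.

```typescript
// Minimal stand-ins for tree-sitter's position data (rows and columns are 0-based there).
interface Point {
  row: number;
  column: number;
}

interface NodeLike {
  startPosition: Point;
  endPosition: Point;
}

// Hypothetical location shape; the real project type may use different field names.
interface SourceLocation {
  startLine: number;   // 1-based
  startColumn: number; // 0-based
  endLine: number;     // 1-based
  endColumn: number;   // 0-based
}

// Shift rows by one so lines are 1-based, and pass columns through unchanged.
function toLocation(node: NodeLike): SourceLocation {
  return {
    startLine: node.startPosition.row + 1,
    startColumn: node.startPosition.column,
    endLine: node.endPosition.row + 1,
    endColumn: node.endPosition.column,
  };
}
```

Keeping the conversion in a single helper per language file is what makes the convention easy to verify across all configs.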


6-9: Registry integration verified successfully.

All checks confirmed:

  • SwiftConfig is properly exported and imported in src/core/languages/index.ts (line 44)
  • SwiftConfig is registered via registerLanguage(SwiftConfig) (line 57)
  • Grammar declaration tree-sitter-swift is present in package.json
  • SwiftConfig includes complete loadGrammar and extractSymbols implementations

Comment on lines 91 to 140
function extractClass(node: Parser.SyntaxNode): ClassSymbol | null {
  const nameNode = node.childForFieldName('name');
  if (!nameNode) return null;

  const members: ASTSymbol[] = [];
  const bodyNode = node.childForFieldName('body');
  const basesNode = node.childForFieldName('bases');
  const isAbstract = node.children.some(c => c.type === 'abstract');

  if (bodyNode) {
    for (const child of bodyNode.namedChildren) {
      if (child.type === 'method_declaration') {
        const method = extractMethod(child);
        if (method) members.push(method);
      } else if (child.type === 'constructor_declaration') {
        const constructor = extractMethod(child);
        if (constructor) {
          constructor.name = 'constructor';
          members.push(constructor);
        }
      } else if (child.type === 'property_declaration') {
        const property = extractProperty(child);
        if (property) members.push(property);
      } else if (child.type === 'field_declaration') {
        const declarator = child.descendantsOfType('variable_declarator')[0];
        if (declarator) {
          const fieldName = declarator.childForFieldName('name');
          const typeNode = child.childForFieldName('type');
          if (fieldName) {
            members.push({
              name: getNodeText(fieldName),
              type: ST.VARIABLE,
              location: getLocation(declarator),
              variableType: typeNode ? getNodeText(typeNode) : undefined
            } as VariableSymbol);
          }
        }
      }
    }
  }

  return {
    name: getNodeText(nameNode),
    type: ST.CLASS,
    location: getLocation(node),
    extends: basesNode ? getNodeText(basesNode) : undefined,
    members,
    abstract: isAbstract
  };
}

⚠️ Potential issue | 🟠 Major

Incorrect handling of C# class inheritance and interface implementation.

In C#, a class declaration can have both a base class and implemented interfaces (e.g., class Foo : BaseClass, IInterface1, IInterface2). The current implementation at line 136 assigns the entire bases clause as a single string to the extends field, but:

  1. The ClassSymbol type has separate fields: extends?: string (for the base class) and implements?: string[] (for interfaces)
  2. The first item in the bases list is the base class (if it exists), and subsequent items are interfaces

The bases need to be parsed and split appropriately to populate both extends and implements fields.

Consider parsing the bases clause to distinguish between the base class and interfaces:

     function extractClass(node: Parser.SyntaxNode): ClassSymbol | null {
       const nameNode = node.childForFieldName('name');
       if (!nameNode) return null;
 
       const members: ASTSymbol[] = [];
       const bodyNode = node.childForFieldName('body');
       const basesNode = node.childForFieldName('bases');
       const isAbstract = node.children.some(c => c.type === 'abstract');
+
+      let extendsClass: string | undefined;
+      let implementsInterfaces: string[] | undefined;
+
+      if (basesNode) {
+        const basesList = basesNode.namedChildren.map(c => getNodeText(c));
+        if (basesList.length > 0) {
+          // In C#, if there's a base class, it comes first.
+          // Simplification: treat the first item as the base class unconditionally.
+          // A class that only implements interfaces will be misclassified here;
+          // a more robust version would check tree-sitter node types or use semantic analysis.
+          extendsClass = basesList[0];
+          if (basesList.length > 1) {
+            implementsInterfaces = basesList.slice(1);
+          }
+        }
+      }
 
       if (bodyNode) {
         // ... member extraction ...
       }
 
       return {
         name: getNodeText(nameNode),
         type: ST.CLASS,
         location: getLocation(node),
-        extends: basesNode ? getNodeText(basesNode) : undefined,
+        extends: extendsClass,
+        implements: implementsInterfaces,
         members,
         abstract: isAbstract
       };
     }

Note: Distinguishing between a base class and interfaces may require additional logic or tree-sitter node type analysis, as C# naming conventions (interfaces starting with 'I') are not always reliable.
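
One possible way to avoid relying on the `I` prefix is sketched below. It assumes the extractor can first collect the names of interfaces declared in the same file; the `splitBases` helper, the `SplitBases` shape, and the `knownInterfaces` parameter are all hypothetical and not part of the PR. Interfaces declared in other files would still be misclassified as a base class when listed first, so this is a partial improvement only.

```typescript
// Hypothetical result shape, mirroring the extends/implements fields of ClassSymbol.
interface SplitBases {
  extends?: string;
  implements?: string[];
}

// Split a C# base list into a base class and interfaces.
// `knownInterfaces` is an assumed input, e.g. interface names found in the same file.
function splitBases(
  bases: string[],
  knownInterfaces: ReadonlySet<string>
): SplitBases {
  if (bases.length === 0) return {};

  const [first, ...rest] = bases;

  // C# requires the base class, if present, to be listed first.
  // If the first entry is a known interface, there is no explicit base class.
  if (knownInterfaces.has(first)) {
    return { implements: [...bases] };
  }

  return {
    extends: first,
    implements: rest.length > 0 ? rest : undefined,
  };
}
```

For `class Foo : BaseClass, IInterface1, IInterface2` this yields `BaseClass` as the base class and the two interfaces under `implements`, provided both interface names are in the known set or simply follow the first entry.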

🧰 Tools
🪛 Biome (2.1.2)

[error] 106-106: Do not shadow the global "constructor" property.

Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.

(lint/suspicious/noShadowRestrictedNames)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (3)
test/languages/python.test.ts (1)

11-18: Initialize once with a simple guard; acceptable as-is, but consider beforeAll for clarity.

Current initParser works; beforeAll(async () => { ... }) would be more idiomatic and avoids repeated awaits.

test/astParser.test.ts (1)

236-240: Simplify redundant non-null assertion check.

The non-null assertions in if (funcSymbol! && 'parameters' in funcSymbol!) are redundant. The ! operator is a compile-time-only assertion with no runtime effect, so the plain truthiness check already runs and already narrows funcSymbol for the rest of the condition.

Apply this diff to simplify:

-      if (funcSymbol! && 'parameters' in funcSymbol!) {
+      if (funcSymbol && 'parameters' in funcSymbol) {
         expect(funcSymbol!.parameters.length).toBe(3);
         expect(funcSymbol.parameters[0].name).toBe('id');
         // Note: default values and optional flags may vary by tree-sitter parser version

Alternatively, if you're confident funcSymbol is defined (due to the previous assertion on line 235), you can keep the non-null assertion but remove the redundant check:

-      if (funcSymbol! && 'parameters' in funcSymbol!) {
+      if ('parameters' in funcSymbol!) {
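
A minimal sketch of the narrowing this suggestion relies on, using simplified stand-in types (`FunctionSymbolSketch`, `VariableSymbolSketch`, and `parameterNames` are hypothetical names, not the project's actual definitions):

```typescript
// Simplified, hypothetical symbol shapes for illustration only.
interface FunctionSymbolSketch {
  name: string;
  parameters: { name: string }[];
}

interface VariableSymbolSketch {
  name: string;
  variableType?: string;
}

type SymbolSketch = FunctionSymbolSketch | VariableSymbolSketch;

function parameterNames(symbol: SymbolSketch | undefined): string[] {
  // The truthiness check removes `undefined`; the `in` check then narrows
  // the union to the member that declares `parameters`. No `!` is needed.
  if (symbol && 'parameters' in symbol) {
    return symbol.parameters.map(p => p.name);
  }
  return [];
}
```

Inside the `if` body the compiler treats `symbol` as the function-like member, so property access type-checks without any assertion.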
src/core/astParser.ts (1)

159-168: Avoid hardcoding the list of implemented languages.

The hardcoded list of implemented languages (['TypeScript', 'JavaScript', 'Python']) at lines 162-163 creates a maintenance burden. When new language extractors are added, developers must remember to update this list, or users will see misleading "not yet implemented" warnings for working implementations.

Consider one of these approaches:

Option 1 (Preferred): Add an isImplemented flag to the LanguageConfig interface:

In src/core/languages/index.ts:

export interface LanguageConfig {
  name: string;
  extensions: string[];
  loadGrammar: () => Promise<unknown>;
  extractSymbols: (tree: Parser.Tree, sourceCode: string) => ASTSymbol[];
  isImplemented?: boolean;  // Add this flag
}

Then update this code to:

-        const isStubImplementation = languageConfig.name &&
-          !['TypeScript', 'JavaScript', 'Python'].includes(languageConfig.name);
+        const isStubImplementation = languageConfig.isImplemented === false;

Option 2: Remove the warning entirely and let users discover unsupported languages through empty results, which is already the documented behavior ("returns empty array if language unsupported").
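
To illustrate Option 1, here is a rough sketch of how such a flag might drive the warning. `LanguageConfigSketch` is a trimmed-down, hypothetical version of the interface, and `isStub` and `stubWarning` are illustrative helper names; the warning text is also an assumption.

```typescript
// Trimmed-down, hypothetical version of LanguageConfig with only the fields used here.
interface LanguageConfigSketch {
  name: string;
  extensions: string[];
  isImplemented?: boolean; // the flag proposed in Option 1
}

// A config counts as a stub only when it explicitly opts out.
// Configs that omit the flag are treated as implemented.
function isStub(config: LanguageConfigSketch): boolean {
  return config.isImplemented === false;
}

// Returns a warning message for stubs, or undefined when no warning is needed.
function stubWarning(config: LanguageConfigSketch): string | undefined {
  return isStub(config)
    ? `AST extraction for ${config.name} is not yet implemented`
    : undefined;
}
```

Comparing against `false` rather than testing for falsiness means existing configs need no changes; only stub configs have to set the flag.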

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 082043a and bfc6e6b.

📒 Files selected for processing (9)
  • README.md (2 hunks)
  • src/core/astParser.ts (1 hunks)
  • src/core/cache.ts (1 hunks)
  • src/core/scanner.ts (3 hunks)
  • src/types/index.ts (4 hunks)
  • test/astFormatter.test.ts (1 hunks)
  • test/astParser.test.ts (1 hunks)
  • test/languages/python.test.ts (1 hunks)
  • test/languages/typescript.test.ts (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/core/cache.ts
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx,js,jsx}: Do not use dotenv; Bun loads .env automatically
Use Bun.serve() for HTTP/WebSocket/HTTPS routes; do not use Express
Use bun:sqlite for SQLite; do not use better-sqlite3
Use Bun.redis for Redis; do not use ioredis
Use Bun.sql for Postgres; do not use pg or postgres.js
Use built-in WebSocket; do not use ws
Prefer Bun.file over node:fs readFile/writeFile
Use Bun.$ for shelling out instead of execa

Files:

  • src/core/astParser.ts
  • test/languages/typescript.test.ts
  • test/astFormatter.test.ts
  • test/languages/python.test.ts
  • src/types/index.ts
  • test/astParser.test.ts
  • src/core/scanner.ts
**/*.test.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Write tests using Bun’s test API: import from 'bun:test' and use test/expect

Files:

  • test/languages/typescript.test.ts
  • test/astFormatter.test.ts
  • test/languages/python.test.ts
  • test/astParser.test.ts
src/types/index.ts

📄 CodeRabbit inference engine (CLAUDE.md)

Keep core type definitions in src/types/index.ts

Files:

  • src/types/index.ts
src/core/scanner.ts

📄 CodeRabbit inference engine (CLAUDE.md)

Implement file scanning logic in src/core/scanner.ts

Files:

  • src/core/scanner.ts
🧠 Learnings (2)
📚 Learning: 2025-09-12T14:25:55.847Z
Learnt from: CR
PR: agentinit/contextcalc#0
File: CLAUDE.md:0-0
Timestamp: 2025-09-12T14:25:55.847Z
Learning: Applies to **/*.test.{ts,tsx,js,jsx} : Write tests using Bun’s test API: import from 'bun:test' and use test/expect

Applied to files:

  • test/astParser.test.ts
📚 Learning: 2025-09-12T14:25:55.847Z
Learnt from: CR
PR: agentinit/contextcalc#0
File: CLAUDE.md:0-0
Timestamp: 2025-09-12T14:25:55.847Z
Learning: Applies to src/core/scanner.ts : Implement file scanning logic in src/core/scanner.ts

Applied to files:

  • src/core/scanner.ts
🧬 Code graph analysis (6)
src/core/astParser.ts (3)
src/utils/pathUtils.ts (1)
  • parseFileSize (28-51)
src/core/languages/index.ts (2)
  • initializeLanguages (32-58)
  • getLanguageByExtension (24-26)
src/types/index.ts (2)
  • ASTOptions (235-240)
  • ASTSymbol (222-233)
test/languages/typescript.test.ts (1)
src/core/languages/typescript.ts (1)
  • TypeScriptConfig (6-349)
test/astFormatter.test.ts (2)
src/types/index.ts (7)
  • FileNode (1-11)
  • ScanResult (105-111)
  • FunctionSymbol (154-161)
  • ClassSymbol (163-169)
  • InterfaceSymbol (171-175)
  • ImportSymbol (193-199)
  • VariableSymbol (187-191)
src/formatters/astFormatter.ts (1)
  • formatAsAST (11-233)
test/languages/python.test.ts (1)
src/core/languages/python.ts (1)
  • PythonConfig (6-163)
test/astParser.test.ts (1)
src/core/astParser.ts (1)
  • ASTParser (12-187)
src/core/scanner.ts (7)
src/core/cache.ts (1)
  • CacheManager (9-111)
src/core/tokenizer.ts (1)
  • Tokenizer (6-78)
src/utils/ignoreParser.ts (1)
  • IgnoreManager (148-210)
src/core/astParser.ts (1)
  • ASTParser (12-187)
src/utils/pathUtils.ts (1)
  • parseFileSize (28-51)
src/types/index.ts (1)
  • FileNode (1-11)
src/utils/fileDetector.ts (1)
  • getFileTypeFromExtension (120-123)
🪛 ESLint
test/languages/typescript.test.ts

[error] 1-1: Do not use "@ts-nocheck" because it alters compilation errors.

(@typescript-eslint/ban-ts-comment)

test/astFormatter.test.ts

[error] 1-1: Do not use "@ts-nocheck" because it alters compilation errors.

(@typescript-eslint/ban-ts-comment)

test/languages/python.test.ts

[error] 1-1: Do not use "@ts-nocheck" because it alters compilation errors.

(@typescript-eslint/ban-ts-comment)

test/astParser.test.ts

[error] 1-1: Do not use "@ts-nocheck" because it alters compilation errors.

(@typescript-eslint/ban-ts-comment)

🪛 GitHub Actions: CI
test/astFormatter.test.ts

[error] 1-1: ESLint (ban-ts-comment): Do not use '@ts-nocheck' because it alters compilation errors. (error triggered by command 'eslint src/ test/')

🪛 GitHub Check: Test and Build (18)
test/languages/typescript.test.ts

[failure] 1-1:
Do not use "@ts-nocheck" because it alters compilation errors

test/astFormatter.test.ts

[failure] 1-1:
Do not use "@ts-nocheck" because it alters compilation errors

test/languages/python.test.ts

[failure] 1-1:
Do not use "@ts-nocheck" because it alters compilation errors

test/astParser.test.ts

[failure] 1-1:
Do not use "@ts-nocheck" because it alters compilation errors

🪛 markdownlint-cli2 (0.18.1)
README.md

186-186: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (9)
README.md (2)

181-201: AST output docs look clear and consistent with CLI and formatter.

Section reads well and matches the new --output ast mode.


240-240: Options table correctly adds ast.

Accurate default and values. No further action.

test/languages/typescript.test.ts (2)

117-133: Notes on expectations.

Comments acknowledge parser-specific nuances (implements/abstract). Tests are reasonable and won’t be flaky.

Also applies to: 135-149, 150-163


1-1: Replace @ts-nocheck with proper types (fixes ESLint error).

Align with Python test: type the grammar as Language.

-// @ts-nocheck
-import Parser from 'tree-sitter';
+import Parser, { type Language } from 'tree-sitter';
 ...
-  let parser: Parser;
-  let grammar: unknown;
+  let parser: Parser;
+  let grammar: Language;

Also applies to: 4-4, 8-16

src/core/scanner.ts (4)

15-32: AST parser lifecycle is correctly gated and initialized.

Lazy creation and initialize-on-demand are appropriate; disposal in finally is good hygiene. Based on learnings.

Also applies to: 34-45


156-163: Good: AST parse is size-guarded and failure-tolerant.

AST is parsed before caching, skipped on error, and attached to FileNode. This prevents rework and keeps scans resilient.

Also applies to: 175-187


238-258: Percentage calculation remains consistent; no AST coupling issues.

Method continues to operate on tokens only; safe with optional entities.


135-172: CacheEntry type is properly updated with entities field; schema versioning is in place.

Verification confirms the implementation is correct:

  • CacheEntry interface (src/types/index.ts) includes entities?: ASTSymbol[]
  • CACHE_VERSION ('1.1') is defined in src/core/cache.ts and validated on cache load
  • Schema versioning mechanism prevents stale reads
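
The version check described above can be sketched roughly as follows. This assumes the cache is stored as JSON with a top-level `version` field; the `CacheFileSketch` shape and the `loadEntries` helper are hypothetical and may not match the real `CacheManager`.

```typescript
// Version value quoted in the review; bump it whenever the entry schema changes.
const CACHE_VERSION = '1.1';

// Hypothetical on-disk shape, assumed to be JSON.
interface CacheFileSketch {
  version: string;
  entries: Record<string, unknown>;
}

// Return cached entries only when the stored version matches;
// otherwise start from an empty cache so stale data is never read.
function loadEntries(raw: string): Record<string, unknown> {
  try {
    const parsed = JSON.parse(raw) as Partial<CacheFileSketch>;
    if (
      parsed.version !== CACHE_VERSION ||
      typeof parsed.entries !== 'object' ||
      parsed.entries === null
    ) {
      return {};
    }
    return parsed.entries;
  } catch {
    // Corrupt or non-JSON cache files are treated like a version mismatch.
    return {};
  }
}
```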
src/types/index.ts (1)

112-240: Excellent AST type system design.

The comprehensive AST type definitions are well-structured with:

  • Clean discriminated union via SymbolType enum
  • Proper inheritance hierarchy with BaseSymbol
  • Rich metadata support (locations, parameters, members, doc comments)
  • Good coverage of language constructs across multiple programming languages

The type system provides a solid foundation for the tree-sitter AST parsing feature.

…type safety

Rust imports:
- Replace broad descendantsOfType scan with targeted childForFieldName queries
- Add recursive processUseTree to handle simple, grouped, renamed, and wildcard imports
- Extract only actual imported items, not path components

Swift functions & properties:
- Fix return type field: change 'result' to 'return_type'
- Rewrite extractParameters to properly navigate function_value_parameters structure
- Read default_value from parent function_value_parameter node
- Add let/var classification for class and protocol properties (ST.CONSTANT vs ST.VARIABLE)
- Fix multi-binding support by refactoring extractVariable to extractVariableFromBinding

Tests:
- Remove @ts-nocheck from test files to enable type checking
- Add proper type guards to fix all TypeScript compilation errors
- All 159 tests passing with clean linting
- Change LanguageConfig.loadGrammar return type from Promise<unknown> to Promise<any> to match tree-sitter API
- Update grammar variable type from unknown to any in Python and TypeScript test files
- Replace 'any' type with proper TreeOptions interface in astFormatter tests
- Fix MetricSettings to use showPercentages instead of non-existent primary field
- Add language tag 'text' to README code fence to satisfy markdownlint MD040 rule

All tests passing, typecheck clean.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
README.md (1)

186-186: Add language identifier to fenced code block.

The fenced code block at line 186 lacks a language identifier, causing markdownlint MD040 violations.

-```
+```text
🧹 Nitpick comments (2)
src/core/languages/index.ts (1)

18-22: Consider warning on extension conflicts.

If two language configurations register the same extension, the second registration silently overwrites the first. This could lead to unexpected behavior.

Consider adding a check:

 export function registerLanguage(config: LanguageConfig): void {
   for (const ext of config.extensions) {
+    if (languages[ext]) {
+      console.warn(`Extension ${ext} already registered for ${languages[ext].name}, overwriting with ${config.name}`);
+    }
     languages[ext] = config;
   }
 }
test/languages/python.test.ts (1)

8-8: Consider typing grammar as unknown for better type safety.

While the current any type matches the return type of PythonConfig.loadGrammar(), using unknown would provide slightly better type safety without functional changes.

-  let grammar: any;
+  let grammar: unknown;

This is consistent with the broader effort to reduce any usage, though it's acceptable as-is given the upstream typing.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bfc6e6b and 3d1e607.

📒 Files selected for processing (8)
  • README.md (2 hunks)
  • src/core/languages/index.ts (1 hunks)
  • src/core/languages/rust.ts (1 hunks)
  • src/core/languages/swift.ts (1 hunks)
  • test/astFormatter.test.ts (1 hunks)
  • test/astParser.test.ts (1 hunks)
  • test/languages/python.test.ts (1 hunks)
  • test/languages/typescript.test.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/core/languages/rust.ts
  • src/core/languages/swift.ts
  • test/languages/typescript.test.ts
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx,js,jsx}: Do not use dotenv; Bun loads .env automatically
Use Bun.serve() for HTTP/WebSocket/HTTPS routes; do not use Express
Use bun:sqlite for SQLite; do not use better-sqlite3
Use Bun.redis for Redis; do not use ioredis
Use Bun.sql for Postgres; do not use pg or postgres.js
Use built-in WebSocket; do not use ws
Prefer Bun.file over node:fs readFile/writeFile
Use Bun.$ for shelling out instead of execa

Files:

  • test/languages/python.test.ts
  • test/astParser.test.ts
  • test/astFormatter.test.ts
  • src/core/languages/index.ts
**/*.test.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Write tests using Bun’s test API: import from 'bun:test' and use test/expect

Files:

  • test/languages/python.test.ts
  • test/astParser.test.ts
  • test/astFormatter.test.ts
🧠 Learnings (1)
📚 Learning: 2025-09-12T14:25:55.847Z
Learnt from: CR
PR: agentinit/contextcalc#0
File: CLAUDE.md:0-0
Timestamp: 2025-09-12T14:25:55.847Z
Learning: Applies to **/*.test.{ts,tsx,js,jsx} : Write tests using Bun’s test API: import from 'bun:test' and use test/expect

Applied to files:

  • test/astParser.test.ts
🧬 Code graph analysis (4)
test/languages/python.test.ts (1)
src/core/languages/python.ts (1)
  • PythonConfig (6-163)
test/astParser.test.ts (1)
src/core/astParser.ts (1)
  • ASTParser (12-187)
test/astFormatter.test.ts (2)
src/types/index.ts (8)
  • FileNode (1-11)
  • ScanResult (105-111)
  • TreeOptions (80-93)
  • FunctionSymbol (154-161)
  • ClassSymbol (163-169)
  • InterfaceSymbol (171-175)
  • ImportSymbol (193-199)
  • VariableSymbol (187-191)
src/formatters/astFormatter.ts (1)
  • formatAsAST (11-233)
src/core/languages/index.ts (3)
src/types/index.ts (1)
  • ASTSymbol (222-233)
src/core/languages/typescript.ts (1)
  • TypeScriptConfig (6-349)
src/core/languages/javascript.ts (1)
  • JavaScriptConfig (6-19)
🪛 GitHub Check: Test and Build (18)
src/core/languages/index.ts

[warning] 7-7:
Unexpected any. Specify a different type

🪛 GitHub Check: Test and Build (20)
src/core/languages/index.ts

[warning] 7-7:
Unexpected any. Specify a different type

🪛 GitHub Check: Test and Build (22)
src/core/languages/index.ts

[warning] 7-7:
Unexpected any. Specify a different type

🔇 Additional comments (11)
src/core/languages/index.ts (3)

1-2: LGTM!

The imports are correct, and the ASTSymbol type naming properly avoids shadowing JavaScript's built-in Symbol.


32-58: LGTM!

The lazy initialization pattern with dynamic imports is well-implemented. The function loads and registers all language configurations efficiently, enabling code splitting and deferred grammar loading.


7-7: Remove suggestion to import Language type; keep recommendation to use Promise<unknown>.

The codebase uses the native node-tree-sitter binding (v0.21.1), which does not export a standalone Language type. The Language type exists only in the web-tree-sitter WebAssembly binding, which is a different package.

Using Promise<unknown> is a valid improvement here since grammar objects are dynamically loaded from language-specific packages and return opaque types that cannot be unified. This provides better type safety than any without requiring unification of package-specific types.

Likely an incorrect or invalid review comment.

test/astFormatter.test.ts (2)

1-31: LGTM!

The test setup correctly uses Bun's test API per the coding guidelines. The mock helpers are well-structured and properly typed without requiring type suppression directives.

Based on coding guidelines.


476-633: Excellent test coverage for summary and location formatting.

The test suite comprehensively validates symbol counting (including nested members), file counting, and location range formatting. The tests correctly verify both single-line and multi-line symbol location displays.
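
As a rough illustration of the two behaviours these tests exercise, the sketch below counts symbols recursively and formats a line range. `CountableSymbol`, `countSymbols`, and `formatLineRange` are hypothetical names; the output strings follow the `line 7` and `lines 11-258` style shown in the PR description.

```typescript
// Hypothetical minimal symbol shape: only the fields needed for counting.
interface CountableSymbol {
  name: string;
  members?: CountableSymbol[];
}

// Count each symbol plus all of its nested members, recursively.
function countSymbols(symbols: CountableSymbol[]): number {
  let total = 0;
  for (const symbol of symbols) {
    total += 1 + countSymbols(symbol.members ?? []);
  }
  return total;
}

// Single-line symbols print as "line N"; multi-line symbols as "lines A-B".
function formatLineRange(startLine: number, endLine: number): string {
  return startLine === endLine
    ? `line ${startLine}`
    : `lines ${startLine}-${endLine}`;
}
```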

test/languages/python.test.ts (2)

19-127: Excellent Python-specific test coverage.

The function extraction tests thoroughly validate Python-specific features, including type hints, async functions, default parameters, and the critical self/cls parameter filtering for methods. This is exactly the kind of language-specific testing needed.


262-304: Critical test coverage for extraction scope.

These tests correctly verify that only top-level declarations are extracted, while nested functions are appropriately excluded. This prevents symbol pollution and ensures the AST output remains clean and manageable. The distinction between nested functions (excluded) and class methods (included) is particularly well-tested.
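
The scoping rule can be sketched over a simplified tree. `TreeNodeSketch` is a hypothetical stand-in in which each node lists its declarations directly as children; real tree-sitter nodes wrap bodies in extra block nodes, so this is not the project's extractor.

```typescript
// Hypothetical simplified tree: real tree-sitter nodes carry more structure.
interface TreeNodeSketch {
  type: string;
  name?: string;
  children: TreeNodeSketch[];
}

// Collect top-level functions and classes, plus methods declared directly
// in a class body. Function bodies are never descended into, so nested
// functions are excluded.
function extractTopLevel(root: TreeNodeSketch): string[] {
  const names: string[] = [];
  for (const child of root.children) {
    if (!child.name) continue;
    if (child.type === 'function_definition') {
      names.push(child.name);
    } else if (child.type === 'class_definition') {
      names.push(child.name);
      for (const member of child.children) {
        if (member.type === 'function_definition' && member.name) {
          names.push(`${child.name}.${member.name}`);
        }
      }
    }
  }
  return names;
}
```

The key design choice is the absence of a generic recursive walk: only the module level and class bodies are inspected.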

test/astParser.test.ts (4)

1-26: LGTM!

The test setup correctly uses Bun's test API and implements proper test isolation with temporary directories. The lifecycle hooks ensure clean setup and teardown for each test.

Based on coding guidelines.


48-130: Comprehensive parseFile test coverage.

The tests thoroughly validate file parsing across multiple languages, symbol types, error conditions, and size limits. The size limit test at lines 116-129 is particularly valuable for preventing OOM issues in production.


132-184: Excellent parseText API coverage.

The tests validate text parsing with strong emphasis on error handling and edge cases. The language identifier normalization test (lines 156-163) is particularly important for API usability, ensuring users can specify languages with or without the leading dot.
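
A minimal sketch of the kind of normalization being tested, assuming identifiers are matched against dot-prefixed extensions. The `normalizeExtension` name, the trimming, and the lowercasing are assumptions rather than confirmed behaviour of `parseText`.

```typescript
// Accept "ts" or ".ts" and always return the dot-prefixed form.
// Trimming and lowercasing are assumptions made for this sketch.
function normalizeExtension(language: string): string {
  const trimmed = language.trim().toLowerCase();
  return trimmed.startsWith('.') ? trimmed : `.${trimmed}`;
}
```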


223-280: Strong integration testing for symbol extraction.

These tests validate the end-to-end symbol extraction pipeline, covering functions with complex parameters, class members, and imports. The comment at line 241 appropriately documents expected variations across tree-sitter parser versions, making the tests more maintainable.

…uages

- TypeScript: Fix async/generator detection and class extends/implements/abstract extraction
- C#: Rename constructor variable to avoid shadowing, extract individual interface names
- Go: Handle empty imports to avoid creating symbols with empty names
- Java: Extract individual interface names instead of wrapping entire text

All fixes improve type safety and accuracy of AST symbol extraction.
- Replace any with unknown in loadGrammar type signature
- Fix TypeScript parser to handle inline export declarations
- Bump cache version from 1.1 to 1.2 to invalidate old AST data

The TypeScript AST parser now properly extracts exported interfaces,
classes, types, and enums instead of creating empty export symbols.
This fixes the issue where AST output showed only 5 symbols instead
of 500+ across the codebase.
- Fix single file AST output support (was ignoring --output ast flag)
- Fix TypeScript class extraction to properly parse extends/implements/abstract
- Remove 8 untested language parsers (Go, Rust, Java, C++, C#, Ruby, PHP, Swift)
- Clean up error handling (remove noisy warnings, silent failure for expected cases)
- Update README to reflect only supported languages (TS/JS/Python)
- Remove unused tree-sitter dependencies from package.json

All tests passing (159/159). TypeScript type checking passes.
…for AST output

- Add comprehensive language support matrix to README showing feature coverage
- Integrate AST parser with DEBUG=1 flag for detailed parsing diagnostics
- Add file statistics showing processed/skipped files with categorized reasons
- Track AST parsing stats (files processed, skipped, skip reasons)
- Display grouped skip reasons (unsupported extensions, file size, errors)
- Improve user feedback with clear summary of parsing results

Example output:
Found 28 symbols across 2 files
Skipped 2 files
  - Unsupported extensions: .json (1), .md (1)
@gulivan gulivan merged commit 41053c9 into main Oct 23, 2025
4 checks passed
github-actions bot pushed a commit that referenced this pull request Oct 23, 2025
# [1.4.0](v1.3.6...v1.4.0) (2025-10-23)

### Features

* Add AST parsing with tree-sitter for code symbol extraction ([#14](#14)) ([41053c9](41053c9))
@github-actions

🎉 This PR is included in version 1.4.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@coderabbitai coderabbitai bot mentioned this pull request Oct 23, 2025
gulivan added a commit that referenced this pull request Oct 23, 2025
Fixed critical bug where AST parsing was failing due to incorrect grammar
loading when using ES module imports on CommonJS tree-sitter modules.

Changes:
- Fix grammar loading in TypeScript, JavaScript, and Python language configs
  to handle both ESM and CJS module formats
- Add cache invalidation when switching to AST output mode
- Fix AST formatter to count files with symbols instead of only freshly
  parsed files (which excluded cache hits)
- Add countFilesWithSymbols() helper to accurately report files in summary

This fixes the issue where `contextcalc . --output ast` would only show
1 file instead of all parseable code files in the project.

Fixes #14
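
The ESM/CJS handling mentioned in this commit message is commonly done by unwrapping a `default` export when one is present. The sketch below shows that general pattern; `unwrapDefault` is a hypothetical helper and not necessarily how the language configs implement it.

```typescript
// When a CommonJS module is loaded through a dynamic import(), the actual
// export may sit under `default`. Fall back to the namespace object itself
// when there is no `default` property.
function unwrapDefault(mod: unknown): unknown {
  if (typeof mod === 'object' && mod !== null && 'default' in mod) {
    return (mod as { default: unknown }).default;
  }
  return mod;
}
```

A loader could then pass the result of a dynamic import through `unwrapDefault` before handing the grammar to the parser. A module whose real export legitimately has its own `default` property would need special handling.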
github-actions bot pushed a commit that referenced this pull request Oct 23, 2025
## [1.4.3](v1.4.2...v1.4.3) (2025-10-23)

### Bug Fixes

* Fix AST grammar loading and improve file counting ([#17](#17)) ([9e3f9b2](9e3f9b2)), closes [#14](#14)
@github-actions

🎉 This PR is included in version 1.4.3 🎉

The release is available on:

Your semantic-release bot 📦🚀
