Skip to content

Comments

fix: prevent entities field in cache when AST is disabled#19

Merged
gulivan merged 1 commit intomainfrom
fix/ast-cache-reparse-bug
Oct 23, 2025
Merged

fix: prevent entities field in cache when AST is disabled#19
gulivan merged 1 commit intomainfrom
fix/ast-cache-reparse-bug

Conversation

@gulivan
Copy link
Contributor

@gulivan gulivan commented Oct 23, 2025

Fixes a bug where cache entries would include the entities field even when AST parsing was not enabled, preventing re-parsing when switching to AST mode.

The cache now only includes the entities field when AST parsing is actually enabled, ensuring that files are properly re-parsed for AST symbols when switching from non-AST to AST output mode.

This resolves an issue where scanning from the project root would find significantly fewer symbols than scanning from a subdirectory due to stale cache entries.

Summary by CodeRabbit

Release Notes

  • Performance

    • Optimized cache management to conditionally store AST data based on configuration, reducing cache overhead when AST parsing is disabled.
  • Refactor

    • Updated cache retrieval logic to properly handle optional AST metadata and cache validation.

Fixes a bug where cache entries would include the entities field even when
AST parsing was not enabled, preventing re-parsing when switching to AST mode.

The cache now only includes the entities field when AST parsing is actually
enabled, ensuring that files are properly re-parsed for AST symbols when
switching from non-AST to AST output mode.

This resolves an issue where scanning from the project root would find
significantly fewer symbols than scanning from a subdirectory due to stale
cache entries.
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 23, 2025

Walkthrough

Cache entry creation in the scanner is now conditional, including AST entities only when AST parsing is enabled. Retrieval logic updated to handle optional entities, triggering re-parsing when AST is enabled and cached entities are missing. Cache type definition now supports optional entities field.

Changes

Cohort / File(s) Summary
AST Cache Entry Conditional Handling
src/core/scanner.ts
Modified cache entry creation to conditionally include entities field based on enableAST flag. Updated retrieval path to read optional entities from cached entry and trigger re-parse when AST is enabled but entities are undefined. Cache type now supports optional entities field.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant Scanner
    participant Cache
    participant AST Parser

    Caller->>Scanner: scan(file, enableAST=true)
    Scanner->>Cache: check cache hit
    
    alt Cache Hit
        Cache-->>Scanner: return cached entry
        alt enableAST && entities undefined
            Scanner->>AST Parser: re-parse
            AST Parser-->>Scanner: entities
            Scanner->>Cache: update with entities
        else enableAST && entities exist
            Scanner-->>Caller: return with entities
        else enableAST disabled
            Scanner-->>Caller: return without entities
        end
    else Cache Miss
        Scanner->>Scanner: compute tokens, lines
        alt enableAST
            Scanner->>AST Parser: parse AST
            AST Parser-->>Scanner: entities
            Scanner->>Cache: store (hash, tokens, lines, entities)
        else enableAST disabled
            Scanner->>Cache: store (hash, tokens, lines)
        end
        Scanner-->>Caller: return result
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

The changes are localized to a single file but involve critical caching logic with conditional branching based on configuration. Requires verification that optional entity handling is correct across all code paths, cache invalidation logic is sound, and type safety is maintained.

Possibly related PRs

Poem

🐰 Cache entries now dance with grace,
AST entities find their place,
Optional when parsing's slow,
Re-parsing when we need to know!
Smart caching hops ahead

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "fix: prevent entities field in cache when AST is disabled" directly and accurately reflects the main change in the changeset. The raw summary confirms that the primary modification is making cache entries conditionally include the entities field only when AST parsing is enabled, which is exactly what the title conveys. The title is concise, uses clear and specific language without vague terms or noise, and provides sufficient context for a developer reviewing history to immediately understand the core purpose of the change.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/ast-cache-reparse-bug

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/core/scanner.ts (2)

221-230: Prevent AST data leakage when AST is disabled

If an old cache contains entities, the current code will still surface them in the output even when enableAST === false. Gate the file node emission to respect the mode.

       const fileNode: FileNode = {
         path: relativePath,
         hash: fileHash,
         tokens,
         lines,
         size: stats.size,
         type: 'file',
         filetype: getFileTypeFromExtension(filePath),
-        entities
+        entities: this.enableAST ? entities : undefined
       };

184-188: Let me search for more information about the codebase to understand what the intended behavior should be:

Add public method to record cached AST usage

The empty code block at lines 184-188 contradicts the comment claiming manual increment. ASTParser lacks a public API to record cached usage, causing AST stats to undercount. Add a public method (e.g., recordCachedFile()) to ASTParser and call it when cached entities are used.

🧹 Nitpick comments (4)
src/core/scanner.ts (4)

154-171: Recompute zero token/line counts on cache hits (even without AST)

Old/stale cache entries may have tokens or lines as 0. Today you only fix that during AST reparse. Add a general guard so non‑AST runs also recover.

       if (cachedEntry) {
         tokens = cachedEntry.tokens;
         lines = cachedEntry.lines;
         entities = cachedEntry.entities;

+        // Recover from stale caches with zero counts
+        if (tokens === 0 || lines === 0) {
+          const fresh = await this.tokenizer.countTokens(filePath);
+          tokens = fresh.tokens;
+          lines = fresh.lines;
+        }
 
         // If AST is enabled but cached entry has never been parsed (entities === undefined), re-parse

199-203: Align error logging with debug flag

Cache‑hit AST parse errors are debug‑gated, but cache‑miss errors are always logged. Make both consistent to reduce noise.

-          } catch (error) {
-            console.warn(`Warning: Failed to parse AST for ${filePath}:`, error instanceof Error ? error.message : 'Unknown error');
-          }
+          } catch (error) {
+            if (this.debug) {
+              console.warn(
+                `Warning: Failed to parse AST for ${filePath}:`,
+                error instanceof Error ? error.message : 'Unknown error'
+              );
+            }
+          }

213-215: Only add entities to cache when present

Minor cleanliness: avoid setting entities: undefined on the in‑memory object; it’ll be dropped in JSON anyway.

-        if (this.enableAST) {
-          cacheEntry.entities = entities;
-        }
+        if (this.enableAST && entities !== undefined) {
+          cacheEntry.entities = entities;
+        }

241-242: Avoid dynamic fs import in hot path

Dynamic importing node:fs/promises inside readDirectory adds overhead in deep trees. Prefer a top‑level import.

@@
-import { join, relative } from 'node:path';
+import { join, relative } from 'node:path';
+import * as fs from 'node:fs/promises';
@@
-      const fs = await import('node:fs/promises');
       const dirents = await fs.readdir(dirPath, { withFileTypes: true });
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7953947 and 48fe20d.

📒 Files selected for processing (1)
  • src/core/scanner.ts (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx,js,jsx}: Do not use dotenv; Bun loads .env automatically
Use Bun.serve() for HTTP/WebSocket/HTTPS routes; do not use Express
Use bun:sqlite for SQLite; do not use better-sqlite3
Use Bun.redis for Redis; do not use ioredis
Use Bun.sql for Postgres; do not use pg or postgres.js
Use built-in WebSocket; do not use ws
Prefer Bun.file over node:fs readFile/writeFile
Use Bun.$ for shelling out instead of execa

Files:

  • src/core/scanner.ts
src/core/scanner.ts

📄 CodeRabbit inference engine (CLAUDE.md)

Implement file scanning logic in src/core/scanner.ts

Files:

  • src/core/scanner.ts
🔇 Additional comments (1)
src/core/scanner.ts (1)

205-218: Conditional cache shape looks good

The core fix—only adding entities to cache entries when AST is enabled—correctly forces reparse on later AST runs. Nice.

@gulivan gulivan merged commit c8ac0a5 into main Oct 23, 2025
4 checks passed
github-actions bot pushed a commit that referenced this pull request Oct 23, 2025
## [1.4.4](v1.4.3...v1.4.4) (2025-10-23)

### Bug Fixes

* prevent entities field in cache when AST is disabled ([#19](#19)) ([c8ac0a5](c8ac0a5))
@github-actions
Copy link

🎉 This PR is included in version 1.4.4 🎉

The release is available on:

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant