Cache invalid-workspace alias inference per storage fingerprint (#116)#125
Conversation
Compute invalid_workspace_aliases once per mtime-keyed fingerprint and reuse it in assemble_single_tab, tab summaries, and workspace listing instead of re-scanning all composerData rows on every request.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds disk-backed caching for invalid workspace alias mappings. ChangesInvalid workspace alias caching
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@services/summary_cache.py`:
- Around line 242-261: The cached alias loader in
get_cached_invalid_workspace_aliases is accepting malformed entries and
converting them with str(), which can turn bad cache values into bogus aliases.
Update the alias validation in this function to strictly require
string-to-string mappings, rejecting any entry whose key or value is not already
a string so the function returns None and recomputes instead of using corrupt
cache data.
In `@services/workspace_context.py`:
- Around line 157-160: The fast path in WorkspaceContext is not honoring
precomputed invalid_workspace_aliases and only short-circuits on
invalid_workspace_ids, which can still trigger fingerprint recomputation and DB
rescans. Update infer_invalid_workspace_aliases and any related accessors to
first return the already-populated invalid_workspace_aliases from
WorkspaceContext, keeping with_invalid_workspace_aliases as the source of that
enriched state. Ensure the cache/lookup path consistently prefers the
precomputed aliases before falling back to storage fingerprint checks or
scanning composer rows.
In `@services/workspace_listing.py`:
- Around line 127-133: Preserve the effective nocache behavior when resolving
invalid workspace aliases: `list_workspace_projects(nocache=True)` currently
still lets `resolve_invalid_workspace_aliases_cached` use the alias disk cache
because `_build_workspace_projects_uncached` does not pass the flag through.
Thread the effective nocache value into `_build_workspace_projects_uncached`,
then pass it into `resolve_invalid_workspace_aliases_cached` so the helper
respects the caller’s cache policy.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6130cf01-64e7-4fca-b1a3-261b4cddc45d
📒 Files selected for processing (7)
services/summary_cache.pyservices/workspace_context.pyservices/workspace_listing.pyservices/workspace_tabs.pytests/test_summary_cache.pytests/test_workspace_context.pytests/test_workspace_tabs_summary.py
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/test_workspace_listing_performance.py (1)
140-145: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAssert alias cache writes are bypassed too.
Line 140 only patches
get_cached_invalid_workspace_aliases, so a regression that still writes alias results undernocache=Truewould pass. Patchset_cached_invalid_workspace_aliasesas well.Proposed test tightening
- with patch( - "services.summary_cache.get_cached_invalid_workspace_aliases", - ) as mock_get: + with ( + patch("services.summary_cache.get_cached_invalid_workspace_aliases") as mock_get, + patch("services.summary_cache.set_cached_invalid_workspace_aliases") as mock_set, + ): mock_get.return_value = {"invalid-ws": "global"} list_workspace_projects(ws_path, rules=[], nocache=True) mock_get.assert_not_called() + mock_set.assert_not_called()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_workspace_listing_performance.py` around lines 140 - 145, The nocache test around list_workspace_projects only verifies that get_cached_invalid_workspace_aliases is not read, so it can miss regressions where alias results are still written. Update the test to also patch and assert set_cached_invalid_workspace_aliases is not called when nocache=True, alongside the existing mock_get check, so the cache bypass is covered end-to-end.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/test_workspace_listing_performance.py`:
- Around line 140-145: The nocache test around list_workspace_projects only
verifies that get_cached_invalid_workspace_aliases is not read, so it can miss
regressions where alias results are still written. Update the test to also patch
and assert set_cached_invalid_workspace_aliases is not called when nocache=True,
alongside the existing mock_get check, so the cache bypass is covered
end-to-end.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c9dc800f-dae2-491f-981d-21ff73895073
📒 Files selected for processing (6)
services/summary_cache.pyservices/workspace_context.pyservices/workspace_listing.pytests/test_summary_cache.pytests/test_workspace_context.pytests/test_workspace_listing_performance.py
🚧 Files skipped from review as they are similar to previous changes (4)
- tests/test_summary_cache.py
- services/summary_cache.py
- tests/test_workspace_context.py
- services/workspace_listing.py
bradjin8
left a comment
There was a problem hiding this comment.
services/export_engine.py line 200:
Still calls infer_invalid_workspace_aliases inline on every export. Issue #116 scope lists export_engine as a call site; PR defers search but not export. Large exports with invalid workspaces will keep paying the full composer scan — consider resolve_invalid_workspace_aliases_cached here in a follow-up.
services/search.py line 386:
Still uses inline infer_invalid_workspace_aliases in _load_search_workspace_assigner. PR marks search as out of scope; document in issue/PR that search deep-links may still cold-scan until a follow-up.
wired resolve_invalid_workspace_aliases_cached |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
services/export_engine.py (1)
203-210: 🚀 Performance & Scalability | 🔵 Trivial | 🏗️ Heavy liftReuse the already loaded composer rows on cache misses.
Line 201 has already loaded
ide_composer_rows, butresolve_invalid_workspace_aliases_cachedfetchesCOMPOSER_ROWS_WITH_HEADERS_SQLagain on cold cache/no-cache paths. Consider extending the resolver to accept preloadedcomposer_rowsand passide_composer_rowshere to avoid a second full global DB scan during export.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/export_engine.py` around lines 203 - 210, The export path is doing a second global DB scan for composer rows on cold-cache/no-cache calls even though ide_composer_rows is already loaded. Update resolve_invalid_workspace_aliases_cached to accept preloaded composer_rows and thread that through its internal alias-resolution logic, then pass ide_composer_rows from the export_engine flow so the cached/no-cache path reuses the existing data instead of rerunning COMPOSER_ROWS_WITH_HEADERS_SQL.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@services/export_engine.py`:
- Around line 203-210: The export path is doing a second global DB scan for
composer rows on cold-cache/no-cache calls even though ide_composer_rows is
already loaded. Update resolve_invalid_workspace_aliases_cached to accept
preloaded composer_rows and thread that through its internal alias-resolution
logic, then pass ide_composer_rows from the export_engine flow so the
cached/no-cache path reuses the existing data instead of rerunning
COMPOSER_ROWS_WITH_HEADERS_SQL.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8cb2639d-3e84-4ad1-b594-9303fddc8cba
📒 Files selected for processing (4)
services/export_engine.pyservices/search.pyservices/summary_cache.pytests/test_summary_cache.py
✅ Files skipped from review due to trivial changes (1)
- services/search.py
🚧 Files skipped from review as they are similar to previous changes (2)
- services/summary_cache.py
- tests/test_summary_cache.py
Summary
invalid_workspace_aliasesinsummary_cache.py, using the same storage fingerprint invalidation flow as composer-map and tab-summary caches.WorkspaceContextviaresolve_invalid_workspace_aliases_cached()andwith_invalid_workspace_aliases().assemble_single_tab, tab summaries, and workspace listing to read precomputed aliases instead of re-scanning allcomposerData:*rows on every request.Problem
When
invalid_workspace_idsis non-empty, hot paths such asassemble_single_tabran a full globalcomposerData:*scan on every deep-link request solely to buildinvalid_workspace_aliases. On largestate.vscdbfiles this produced O(tabs × keys) I/O.Solution
Compute
invalid_workspace_aliasesonce per storage fingerprint and serve subsequent requests from the disk cache until the fingerprint changes (DB replaced or storage mtimes change).Test plan
pytest tests/test_workspace_tabs_summary.py tests/test_workspace_list_count_alignment.py -qpytest -q(532 passed, 21 skipped)mypy .assemble_single_tabno longer runs globalcomposerData:%scan on cache hit when invalid workspaces existOut of scope
search.py) — deferred per operatorinfer_invalid_workspace_aliasesCloses #116
Summary by CodeRabbit
New Features
nocacheoption that bypasses the alias disk cache for workspace tab/listing and per-tab routes, and for export orchestration.Bug Fixes