feat: fix Semantic Caching with GPTCache for Offline Mode #299
Conversation
Note: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds an in-process SQLite-backed semantic cache with LRU eviction and similarity matching, integrates caching and an offline mode into the command interpreter and CLI (including a cache stats command), adds unit tests and docs, and removes the CodeQL GitHub Actions workflow.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant CLI as CortexCLI
    participant Interp as CommandInterpreter
    participant Cache as SemanticCache
    participant Provider as LLM Provider
    User->>CLI: cortex "prompt" [--offline]
    CLI->>Interp: parse(prompt, provider, model)
    Interp->>Cache: get_commands(prompt, provider, model, system_prompt)
    alt Cache Hit
        Cache-->>Interp: cached commands
    else Cache Miss
        Cache-->>Interp: None
        alt Offline Mode
            Interp-->>CLI: raise RuntimeError (no cached command)
        else Online Mode
            Interp->>Provider: call LLM API
            Provider-->>Interp: generated commands
            Interp->>Cache: put_commands(prompt, provider, model, system_prompt, commands)
            Cache-->>Interp: ack
        end
    end
    Interp-->>CLI: validated commands/result
    CLI-->>User: display output
    User->>CLI: cortex cache stats
    CLI->>Cache: stats()
    Cache-->>CLI: CacheStats(hits, misses, hit_rate)
    CLI-->>User: display stats
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 2 inconclusive)
✅ Passed checks (2 passed)
Actionable comments posted: 0
🧹 Nitpick comments (5)
LLM/interpreter.py (2)
18-34: Add type hint for `cache` parameter.

The `cache` parameter lacks a type hint, which is required per the coding guidelines. Consider adding `Optional["SemanticCache"]`, or use a protocol/ABC if you want to avoid circular imports.

```diff
 def __init__(
     self,
     api_key: str,
     provider: str = "openai",
     model: Optional[str] = None,
     offline: bool = False,
-    cache=None,
+    cache: Optional["SemanticCache"] = None,
 ):
```

You may need to add a forward reference or import at the top:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from cortex.semantic_cache import SemanticCache
```
218-229: Consider logging cache failures in verbose mode.

The silent `pass` on cache failures is acceptable for graceful degradation, but could make debugging cache issues difficult. Consider logging at debug level when caching fails.

```diff
 if self.cache is not None and commands:
     try:
         self.cache.put_commands(
             prompt=user_input,
             provider=self.provider.value,
             model=self.model,
             system_prompt=cache_system_prompt,
             commands=commands,
         )
-    except Exception:
-        pass
+    except Exception as e:
+        # Caching is best-effort; log but don't fail
+        import logging
+        logging.getLogger(__name__).debug("Cache put failed: %s", e)
```

cortex/cli.py (1)
653-653: Minor: Simplify offline flag assignment.

Since `--offline` uses `store_true`, `args.offline` will always be a boolean. The `getattr` with a default and the `bool()` wrapper are defensive but unnecessary.

```diff
-    cli.offline = bool(getattr(args, 'offline', False))
+    cli.offline = args.offline
```

cortex/semantic_cache.py (2)
45-52: Consider catching broader OS errors in directory creation.

Only `PermissionError` triggers the fallback to `~/.cortex`. Other `OSError` subclasses (e.g., `FileNotFoundError` on certain systems, or `OSError` for a full disk) could also occur during `mkdir`.

```diff
 def _ensure_db_directory(self) -> None:
     db_dir = Path(self.db_path).parent
     try:
         db_dir.mkdir(parents=True, exist_ok=True)
-    except PermissionError:
+    except OSError:
         user_dir = Path.home() / ".cortex"
         user_dir.mkdir(parents=True, exist_ok=True)
         self.db_path = str(user_dir / "cache.db")
```
101-103: `datetime.utcnow()` is deprecated in Python 3.12+.

Use `datetime.now(timezone.utc)` for forward compatibility:

```diff
-from datetime import datetime
+from datetime import datetime, timezone

 @staticmethod
 def _utcnow_iso() -> str:
-    return datetime.utcnow().replace(microsecond=0).isoformat() + "Z"
+    return datetime.now(timezone.utc).replace(microsecond=0).isoformat().replace("+00:00", "Z")
```
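The suggested replacement can be exercised standalone; this is a minimal sketch (the helper name mirrors the one in the diff, and the exact output format is an assumption):

```python
from datetime import datetime, timezone

def utcnow_iso() -> str:
    """Second-precision, timezone-aware UTC timestamp like 2024-01-01T00:00:00Z."""
    # datetime.now(timezone.utc) yields an aware datetime; isoformat() then
    # ends in "+00:00", which we swap for the conventional "Z" suffix.
    return (
        datetime.now(timezone.utc)
        .replace(microsecond=0)
        .isoformat()
        .replace("+00:00", "Z")
    )

print(utcnow_iso())
```

Unlike `utcnow()`, the result is derived from an aware datetime, so arithmetic against other aware datetimes stays correct.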
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- .github/workflows/codeql.yml (0 hunks)
- LLM/interpreter.py (3 hunks)
- cortex/cli.py (7 hunks)
- cortex/semantic_cache.py (1 hunks)
- docs/ISSUE-268-TESTING.md (1 hunks)
💤 Files with no reviewable changes (1)
- .github/workflows/codeql.yml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs
Files:
- LLM/interpreter.py
- cortex/semantic_cache.py
- cortex/cli.py
🧬 Code graph analysis (1)
LLM/interpreter.py (2)
tests/test_graceful_degradation.py (1)
- cache (31-34)

cortex/semantic_cache.py (2)
- get_commands (169-245)
- put_commands (247-292)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Agent
🔇 Additional comments (10)
docs/ISSUE-268-TESTING.md (1)
1-60: LGTM! Clear and comprehensive testing guide.

The documentation accurately reflects the implementation details, including the cache location fallback logic and environment variable names (`CORTEX_CACHE_MAX_ENTRIES`, `CORTEX_CACHE_SIMILARITY_THRESHOLD`). The three test scenarios cover the key functionality: warming the cache, checking stats, and offline mode behavior.

LLM/interpreter.py (1)
191-206: LGTM! Cache integration looks correct.

Good design decisions:
- Including the `validate` flag in the cache key prevents returning unvalidated commands when validation is expected
- The offline mode fail-fast behavior with a clear error message is appropriate
- Cache lookup before the API call correctly reduces unnecessary network calls
cortex/cli.py (3)
315-331: LGTM! Cache stats implementation is clean.

The method correctly handles errors and displays meaningful statistics. Using `stats.hits` as the "Saved calls (approx)" metric is a reasonable approximation, since each cache hit avoids an API call.
675-679: LGTM! Cache command dispatch handles edge cases correctly.

The fallback to `parser.print_help()` when no subcommand is provided is consistent with the existing patterns in the codebase.
203-203: LGTM! Offline mode correctly propagated to interpreter.

cortex/semantic_cache.py (5)
12-25: LGTM! Clean CacheStats implementation.

The frozen dataclass with computed properties is well-designed, and the division-by-zero guard in `hit_rate` is correct.
127-144: Clarification: Hash-based embeddings provide limited semantic similarity.

The `_embed` method uses hash-based token embeddings rather than learned semantic embeddings (like those from sentence-transformers or OpenAI). This approach:
- Works well for exact or near-exact prompt matches
- May miss semantically similar but lexically different queries (e.g., "install nginx" vs "setup nginx web server")

This is acceptable for reducing duplicate API calls, but consider documenting this limitation, or potentially integrating a lightweight embedding model (e.g., sentence-transformers) in a future iteration if true semantic matching is desired.
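A toy sketch illustrates why hash-based token embeddings reward lexical overlap rather than meaning. This is not the project's actual `_embed`; the dimension, tokenization, and hash choice here are arbitrary assumptions:

```python
import hashlib
import math

DIM = 64  # illustrative dimension only

def embed(text: str) -> list[float]:
    """Toy hash embedding: each whitespace token bumps one bucket, then L2-normalize."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Plain dot product suffices because embed() returns unit-length vectors.
    return sum(x * y for x, y in zip(a, b))

# Identical prompts always score 1.0 (up to float error)...
print(cosine(embed("install nginx"), embed("install nginx")))
# ...but a paraphrase only scores on shared tokens ("nginx" here),
# so semantically equivalent wordings can fall below the threshold.
print(cosine(embed("install nginx"), embed("setup nginx web server")))
```

The second score depends entirely on token overlap, which is exactly the limitation the comment above describes.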
169-245: LGTM! Two-phase lookup is well-designed.

The exact-match-first strategy is a good optimization that avoids computing embeddings for repeated identical prompts. The `candidate_limit` parameter bounds memory usage during similarity search.
247-292: LGTM! Put operation correctly preserves hit_count on updates.

The `COALESCE` subquery elegantly preserves the existing `hit_count` when replacing an entry, which is important for LRU eviction accuracy.
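One robust way to preserve a counter across re-puts is SQLite's `ON CONFLICT DO UPDATE` upsert, sketched here with `sqlite3` (schema and column names are illustrative, not the project's actual table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE cache (
        key TEXT PRIMARY KEY,
        commands TEXT NOT NULL,
        hit_count INTEGER NOT NULL DEFAULT 0
    )
    """
)

def put(key: str, commands: str) -> None:
    # ON CONFLICT updates the existing row in place, so hit_count survives.
    # (INSERT OR REPLACE would delete the old row first, resetting the counter.)
    conn.execute(
        """
        INSERT INTO cache (key, commands, hit_count) VALUES (?, ?, 0)
        ON CONFLICT(key) DO UPDATE SET commands = excluded.commands
        """,
        (key, commands),
    )

put("install nginx", "apt install nginx")
conn.execute("UPDATE cache SET hit_count = hit_count + 1 WHERE key = ?", ("install nginx",))
put("install nginx", "apt-get install -y nginx")  # re-put with a new payload
row = conn.execute(
    "SELECT commands, hit_count FROM cache WHERE key = ?", ("install nginx",)
).fetchone()
print(row)  # ('apt-get install -y nginx', 1) — payload updated, hit_count preserved
```

`ON CONFLICT DO UPDATE` requires SQLite 3.24+; the `excluded.` prefix refers to the values the failed insert attempted.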
294-312: LGTM! LRU eviction logic is correct.

The eviction removes only the necessary number of entries. For higher-throughput scenarios, you might consider evicting a small percentage buffer (e.g., 5%) to reduce churn, but the current implementation is functionally correct.
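The count-then-delete eviction described above can be sketched with plain `sqlite3` (illustrative schema; the real cache stores more columns):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache (key TEXT PRIMARY KEY, last_accessed INTEGER NOT NULL)")
for i in range(5):
    conn.execute("INSERT INTO cache VALUES (?, ?)", (f"k{i}", i))  # k0 is oldest

def evict(conn: sqlite3.Connection, max_entries: int) -> None:
    """Delete only as many least-recently-used rows as needed to fit the cap."""
    (count,) = conn.execute("SELECT COUNT(*) FROM cache").fetchone()
    excess = count - max_entries
    if excess > 0:
        conn.execute(
            """
            DELETE FROM cache WHERE key IN (
                SELECT key FROM cache ORDER BY last_accessed ASC LIMIT ?
            )
            """,
            (excess,),
        )

evict(conn, 3)
print([k for (k,) in conn.execute("SELECT key FROM cache ORDER BY last_accessed")])
# ['k2', 'k3', 'k4'] — only the two oldest entries were evicted
```

An index on `last_accessed` keeps the `ORDER BY ... LIMIT` subquery cheap as the cache grows.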
Pull request overview
This PR implements semantic caching for LLM responses to reduce repeated API calls and introduces an offline mode for cached-only operation. The implementation adds a lightweight SQLite-based cache with LRU eviction, cosine similarity-based semantic matching, and a CLI command to view cache statistics.
Key changes:
- Adds `SemanticCache` class with embedding-based similarity search and LRU eviction
- Introduces `--offline` flag to fail fast when no cached match exists
- Adds `cortex cache stats` command to display cache performance metrics
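The `--offline` flag behavior can be sketched with `argparse`; this is a minimal stand-in for the real CLI (argument names beyond `--offline` are illustrative):

```python
import argparse

parser = argparse.ArgumentParser(prog="cortex")
parser.add_argument("prompt", nargs="?", help="natural-language request")
parser.add_argument(
    "--offline",
    action="store_true",
    help="serve only cached matches; fail fast when nothing is cached",
)

args = parser.parse_args(["install nginx", "--offline"])
# store_true guarantees a real bool on the namespace (default False),
# so no getattr/bool() defensiveness is needed downstream.
print(args.offline)  # True
```

This is the same property one review comment relies on when suggesting `cli.offline = args.offline`.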
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| docs/ISSUE-268-TESTING.md | New end-user testing guide documenting cache warming, stats checking, and offline mode |
| cortex/semantic_cache.py | New semantic cache implementation with SQLite storage, custom embeddings, and similarity matching |
| cortex/cli.py | Adds offline flag support and cache stats command integration |
| LLM/interpreter.py | Integrates semantic cache into command parsing flow with offline mode support |
| .github/workflows/codeql.yml | Removes CodeQL security scanning workflow (unrelated to PR purpose) |
Comments suppressed due to low confidence (1)
LLM/interpreter.py:228
- 'except' clause does nothing but pass and there is no explanatory comment.
except Exception:
Actionable comments posted: 1
♻️ Duplicate comments (9)
cortex/semantic_cache.py (7)
62-67: Add validation for environment variable values.

The `max_entries` and `similarity_threshold` values read from environment variables are not validated. Invalid values (negative numbers, non-numeric strings, or thresholds outside [0, 1]) could cause runtime errors or unexpected behavior.
71-78: Silent fallback to user directory should be logged.

When permissions fail on the default path, the code silently switches to `~/.cortex/cache.db`. Users may be confused about which cache is being used. Consider logging a warning.
127-129: `datetime.utcnow()` is deprecated in Python 3.12+.

Consider using `datetime.now(timezone.utc)` for proper timezone handling.
160-165: Sign calculation using bit 63 works but could be simplified.

The bit 63 check is valid for unsigned 64-bit integers (values ≥ 2^63 have bit 63 set). However, using `value % 2`, as suggested in a past review, would be simpler and achieve a similar distribution.
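The two sign conventions can be compared side by side. This is an illustrative sketch; taking the first 8 digest bytes as the 64-bit value is an assumption, not necessarily what the project does:

```python
import hashlib

digest = hashlib.sha256(b"nginx").digest()
value = int.from_bytes(digest[:8], "big")  # unsigned 64-bit slice of the hash

# Top bit: values >= 2**63 have bit 63 set.
sign_via_bit63 = -1.0 if (value >> 63) & 1 else 1.0
# Bottom bit: simpler expression, same roughly uniform 50/50 split
# since every bit of a cryptographic hash is near-uniform.
sign_via_parity = -1.0 if value % 2 else 1.0

print(sign_via_bit63, sign_via_parity)
```

Both pick an effectively random sign per token; the parity form just avoids the shift-and-mask.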
221-222: Each operation opens a new database connection.

For high-frequency cache operations, this creates overhead. Consider connection pooling or a persistent connection.
250-259: Semantic search loads up to 200 candidates into memory.

For large caches, computing cosine similarity in Python for each candidate could be a performance bottleneck. Consider early termination when a high-confidence match is found.
312-320: `INSERT OR REPLACE` deletes the row before inserting, breaking the hit_count preservation.

The subquery will return NULL because the row is deleted first. Use `INSERT ... ON CONFLICT DO UPDATE` instead.

LLM/interpreter.py (2)
45-54: Consider logging cache initialization failures.

When `SemanticCache` initialization fails, the exception is silently caught. This makes diagnosing cache issues difficult.
252-263: Silent cache write failures make debugging difficult.

Consider logging exceptions at debug level while still allowing the operation to continue.
🧹 Nitpick comments (3)
cortex/semantic_cache.py (1)
180-187: Consider documenting the normalization assumption.

The `_cosine` method works correctly because `_embed` produces normalized vectors. A brief docstring noting this assumption would improve clarity.

```diff
 @staticmethod
 def _cosine(a: List[float], b: List[float]) -> float:
+    """Compute cosine similarity (assumes pre-normalized vectors)."""
     if not a or not b or len(a) != len(b):
         return 0.0
```

LLM/interpreter.py (1)
189-206: Consider notifying when dangerous commands are filtered.

Silently removing commands matching dangerous patterns could confuse users who expect to see their requested operation. Consider returning filtered commands separately or logging a warning.
tests/test_semantic_cache.py (1)
12-214: Consider adding tests for additional edge cases.

The test suite covers core functionality well. Consider adding:
- Model isolation (same provider, different models)
- System prompt isolation (same provider/model, different system prompts)
- Edge cases: empty commands list, very long prompts
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- LLM/interpreter.py (4 hunks)
- cortex/semantic_cache.py (1 hunks)
- docs/ISSUE-268-TESTING.md (1 hunks)
- tests/test_semantic_cache.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- docs/ISSUE-268-TESTING.md
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs
Files:
- tests/test_semantic_cache.py
- cortex/semantic_cache.py
- LLM/interpreter.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Maintain >80% test coverage for pull requests
Files:
tests/test_semantic_cache.py
🧬 Code graph analysis (2)
tests/test_semantic_cache.py (1)
cortex/semantic_cache.py (8)
- CacheStats (19-39)
- stats (363-378)
- total (30-32)
- hit_rate (35-39)
- put_commands (287-341)
- get_commands (195-285)
- _embed (154-170)
- _cosine (181-187)
LLM/interpreter.py (1)
cortex/semantic_cache.py (3)
- SemanticCache (42-378)
- get_commands (195-285)
- put_commands (287-341)
🔇 Additional comments (13)
cortex/semantic_cache.py (5)
1-16: Module imports and structure look appropriate.

The imports are well-organized and necessary for the functionality. Using `dataclass` for `CacheStats` and standard library modules for persistence is a good choice.
18-39: Clean dataclass implementation with proper computed properties.

Using `frozen=True` for immutability and handling the division-by-zero case in `hit_rate` are good practices.
80-125: Database initialization is well-structured.

The schema includes appropriate indexes for both the unique constraint and LRU ordering. The `CHECK (id = 1)` constraint ensures a single-row stats table.
343-361: LRU eviction implementation is correct.

The approach of counting entries and deleting the oldest by `last_accessed` is sound. The subquery-based deletion correctly removes the least recently used entries.
363-378: Stats retrieval is implemented correctly.

The fallback to `CacheStats(hits=0, misses=0)` when the row is missing provides a safe default.

LLM/interpreter.py (5)
3-8: TYPE_CHECKING pattern correctly used for forward reference typing.

The `sqlite3` import is needed for the exception handler on line 261. The conditional import of `SemanticCache` avoids circular dependencies.
86-102: System prompt is well-structured with clear formatting rules.

The JSON format requirement and example help ensure consistent, parseable LLM responses.
172-187: Robust command parsing with markdown handling.

The code correctly strips JSON from markdown code blocks and validates the command list type. Filtering out empty/non-string values is a good defensive measure.
225-235: Cache key properly includes validation state.

The `[cortex-cache-validate={bool(validate)}]` suffix ensures cached commands are segregated by validation setting, preventing mixing of validated and unvalidated results.
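The suffix technique generalizes to any key-derivation scheme: fold every result-shaping flag into the hashed material so differently-configured results can never collide. A minimal illustrative sketch (not the project's exact key format):

```python
import hashlib

def cache_key(prompt: str, provider: str, model: str, validate: bool) -> str:
    """Hypothetical key builder: the validate flag is part of the hashed material."""
    material = f"{provider}|{model}|{prompt}|[cortex-cache-validate={bool(validate)}]"
    return hashlib.sha256(material.encode()).hexdigest()

k1 = cache_key("install nginx", "openai", "gpt-4o-mini", validate=True)
k2 = cache_key("install nginx", "openai", "gpt-4o-mini", validate=False)
print(k1 != k2)  # True — validated and unvalidated entries occupy distinct slots
```

The same argument applies to provider, model, and system prompt: anything that changes the answer must change the key.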
138-170: Ollama integration uses the standard library for HTTP, avoiding extra dependencies.

The 60-second timeout is reasonable for local LLM inference, which can be slow. Error handling differentiates network issues from other failures.
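A stdlib-only request to a local Ollama server might be assembled like this. This sketch assumes Ollama's standard `POST /api/generate` endpoint and default port; the model name is illustrative:

```python
import json
import urllib.request

payload = json.dumps(
    {"model": "llama3", "prompt": "install nginx", "stream": False}
).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# urllib.request.urlopen(req, timeout=60) would perform the call; a generous
# timeout matters because local inference can take tens of seconds.
print(req.get_method(), req.get_full_url())
```

Keeping `stream` false yields a single JSON response body instead of newline-delimited chunks, which keeps the parsing side simple.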
tests/test_semantic_cache.py (3)
15-29: Test fixture setup is well-designed for isolation.

Using `tempfile.mkdtemp()` ensures each test run uses a fresh database, and `tearDown` properly cleans up. The explicit configuration values (`max_entries=10`, `similarity_threshold=0.85`) make tests deterministic.
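The isolation pattern looks roughly like this as a self-contained sketch (not the actual test file; the table and class names are illustrative):

```python
import shutil
import sqlite3
import tempfile
import unittest
from pathlib import Path

class TempDbTest(unittest.TestCase):
    """Each test gets a throwaway directory for its SQLite file."""

    def setUp(self) -> None:
        self.tmp = tempfile.mkdtemp()  # fresh directory per test
        self.db_path = str(Path(self.tmp) / "cache.db")

    def tearDown(self) -> None:
        # ignore_errors avoids masking the real test failure on cleanup trouble
        shutil.rmtree(self.tmp, ignore_errors=True)

    def test_fresh_database(self) -> None:
        conn = sqlite3.connect(self.db_path)
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.close()
        self.assertTrue(Path(self.db_path).exists())
```

Because no state is shared between tests, the suite can run in any order, which is what makes the eviction and stats assertions deterministic.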
161-189: LRU eviction test correctly verifies cache size enforcement.

The test fills the cache to capacity and verifies that adding one more entry triggers eviction while maintaining the max size.
191-214: Embedding and cosine similarity tests validate mathematical properties.

The tests verify embedding dimension, type, L2 normalization, and cosine similarity for edge cases (identical and orthogonal vectors).
Review Request

@dhvll - This PR touches core files. CI Status: All checks passing.



Related Issue
Closes #268
Summary
Optimizing tool selection.
- `--offline` mode to run cached-only (fails fast if there’s no cached match).
- `cortex cache stats` to display cache hits/misses and approximate saved calls.

Checklist
- Tests pass (`pytest tests/`)
- [#XX]

Description video:
https://drive.google.com/file/d/1KA2BepHiR05p0kncMkA4MYctRXXl7zlE/view?usp=sharing
Summary by CodeRabbit
New Features
- `stats` view for hits, misses, and hit rate

Documentation
Tests
Chores