
Conversation

@Sahilbhatane Sahilbhatane (Collaborator) commented Dec 14, 2025

Related Issue

Closes #268

Summary

Adds semantic caching and an offline mode to reduce repeated LLM API calls.

  • Adds a lightweight semantic cache (SQLite + LRU eviction) for LLM responses to reduce repeated API calls.
  • Introduces --offline mode to run cached-only (fails fast if there’s no cached match).
  • Adds cortex cache stats to display cache hits/misses and approximate saved calls.
  • Includes a single end-user testing guide: ISSUE-268-TESTING.md

Checklist

  • Tests pass (pytest tests/)
  • PR title format: [#XX] Description
  • MVP label added if closing MVP issue

Demo video:

https://drive.google.com/file/d/1KA2BepHiR05p0kncMkA4MYctRXXl7zlE/view?usp=sharing

Summary by CodeRabbit

  • New Features

    • Offline mode that serves cached LLM responses without network access
    • Semantic caching to speed up and reduce redundant LLM calls
    • New top-level cache command with a stats view for hits, misses, and hit rate
    • Small CLI UX improvements (help entries and install flow enhancements)
  • Documentation

    • Added a testing guide for offline mode and caching
  • Tests

    • Added unit tests covering the semantic cache behavior and metrics
  • Chores

    • Removed the CodeQL GitHub Actions workflow


Copilot AI review requested due to automatic review settings December 14, 2025 12:20
@Sahilbhatane Sahilbhatane self-assigned this Dec 14, 2025
@coderabbitai coderabbitai bot (Contributor) commented Dec 14, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds an in-process SQLite-backed semantic cache with LRU eviction and similarity matching, integrates caching and an offline mode into the command interpreter and CLI (including a cache stats command), adds unit tests and docs, and removes the CodeQL GitHub Actions workflow.

Changes

  • Semantic Caching Infrastructure (cortex/semantic_cache.py): New SemanticCache class and CacheStats dataclass. SQLite-backed persistent cache with deterministic embeddings, cosine-similarity semantic lookup, exact-key fallback, LRU eviction, stats tracking, and configurable db_path/max_entries/similarity_threshold (with env var fallbacks). A usage sketch of this API follows the list.
  • Interpreter Integration (LLM/interpreter.py): CommandInterpreter constructor extended with offline and cache parameters (lazy import of SemanticCache). Adds provider-specific client init, system prompt normalization, cache read-before-API, offline-only behavior (raise on miss), caching of generated commands, and enhanced error handling and parsing.
  • CLI Integration (cortex/cli.py): Adds a global --offline flag and an offline attribute on CortexCLI, passes offline to the interpreter, adds a new cache stats subcommand and cache_stats() method to display hits/misses/hit rate, and updates help output.
  • Tests (tests/test_semantic_cache.py): New comprehensive unit tests covering cache initialization, put/get exact matches, semantic similarity, provider isolation, LRU eviction, embedding generation, cosine similarity, and stats. Uses a temporary SQLite DB per test.
  • Documentation (docs/ISSUE-268-TESTING.md): New end-user testing guide describing scenarios (warm cache, cache stats, offline mode), commands to run, expected outcomes, and notes on cache location, size, and similarity threshold.
  • CI/CD Cleanup (.github/workflows/codeql.yml): Removed the CodeQL security analysis workflow file.
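
To make the new surface area concrete, here is a minimal usage sketch of the cache API described above. The constructor parameters and method signatures are taken from this PR's diffs and review notes; the specific values, model name, and printed fields are illustrative assumptions, not defaults from the code.

from cortex.semantic_cache import SemanticCache

# Configuration knobs per this PR (values here are illustrative).
cache = SemanticCache(
    db_path="/tmp/cortex-cache.db",
    max_entries=1000,
    similarity_threshold=0.85,
)

# Store the commands generated for a prompt (signature per the interpreter diff).
cache.put_commands(
    prompt="install nginx",
    provider="openai",
    model="gpt-4",
    system_prompt="You are a Linux command generator.",
    commands=["sudo apt update", "sudo apt install -y nginx"],
)

# Exact or sufficiently similar prompts return the cached commands; misses return None.
commands = cache.get_commands(
    prompt="install nginx",
    provider="openai",
    model="gpt-4",
    system_prompt="You are a Linux command generator.",
)

stats = cache.stats()
print(stats.hits, stats.misses, stats.hit_rate)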

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant CLI as CortexCLI
    participant Interp as CommandInterpreter
    participant Cache as SemanticCache
    participant Provider as LLM Provider

    User->>CLI: cortex "prompt" [--offline]
    CLI->>Interp: parse(prompt, provider, model)
    Interp->>Cache: get_commands(prompt, provider, model, system_prompt)
    alt Cache Hit
        Cache-->>Interp: cached commands
    else Cache Miss
        Cache-->>Interp: None
        alt Offline Mode
            Interp-->>CLI: raise RuntimeError (no cached command)
        else Online Mode
            Interp->>Provider: call LLM API
            Provider-->>Interp: generated commands
            Interp->>Cache: put_commands(prompt, provider, model, system_prompt, commands)
            Cache-->>Interp: ack
        end
    end
    Interp-->>CLI: validated commands/result
    CLI-->>User: display output

    User->>CLI: cortex cache stats
    CLI->>Cache: stats()
    Cache-->>CLI: CacheStats(hits, misses, hit_rate)
    CLI-->>User: display stats

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • Areas needing focused review:
    • cortex/semantic_cache.py: embedding algorithm, normalization, SQLite schema, concurrency/transaction safety, and eviction correctness.
    • LLM/interpreter.py: lazy cache import, offline-mode semantics, cache read/write error handling, and provider client initialization.
    • cortex/cli.py: CLI argument wiring and stats display formatting.
    • Tests: ensure coverage matches intended edge cases and DB cleanup.

Suggested labels

enhancement

Poem

🐇 I cached a prompt beneath the moonlight bright,
SQLite hummed softly through the silent night,
Embeddings twined like roots beneath the hill,
Offline we wander, wallets calm and still,
Hop, hit, miss — the rabbit sorts it right.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 2 inconclusive)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 55.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
  • Description check (❓ Inconclusive): The description covers the key changes and includes the required issue reference, summary of work, and checklist. However, the current title, 'feat: fix Semantic Caching with GPTCache for Offline Mode', uses conventional-commit style rather than the template's [#XX] Description format. Resolution: verify whether the [#268] title format is strictly enforced.
  • Out of Scope Changes check (❓ Inconclusive): The PR implements semantic caching with offline support per issue #268, but the removal of .github/workflows/codeql.yml appears unrelated to the issue's objectives. Resolution: clarify the rationale for removing this security workflow, which is not mentioned in issue #268 or the PR description.
✅ Passed checks (2 passed)
  • Title check (✅ Passed): The title clearly reflects the main feature added, semantic caching with offline mode support, matching the primary objective of the PR.
  • Linked Issues check (✅ Passed): The PR implements all core requirements from issue #268: semantic caching with SQLite/LRU eviction, offline mode via the --offline flag, and a cache stats command showing hit/miss metrics.


@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (5)
LLM/interpreter.py (2)

18-34: Add type hint for cache parameter.

The cache parameter lacks a type hint, which is required per the coding guidelines. Consider adding Optional["SemanticCache"] or using a protocol/ABC if you want to avoid circular imports.

     def __init__(
         self,
         api_key: str,
         provider: str = "openai",
         model: Optional[str] = None,
         offline: bool = False,
-        cache=None,
+        cache: Optional["SemanticCache"] = None,
     ):

You may need to add a forward reference or import at the top:

from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from cortex.semantic_cache import SemanticCache

218-229: Consider logging cache failures in verbose mode.

The silent pass on cache failures is acceptable for graceful degradation, but could make debugging cache issues difficult. Consider logging at debug level when caching fails.

         if self.cache is not None and commands:
             try:
                 self.cache.put_commands(
                     prompt=user_input,
                     provider=self.provider.value,
                     model=self.model,
                     system_prompt=cache_system_prompt,
                     commands=commands,
                 )
-            except Exception:
-                pass
+            except Exception as e:
+                # Caching is best-effort; log but don't fail
+                import logging
+                logging.getLogger(__name__).debug("Cache put failed: %s", e)
cortex/cli.py (1)

653-653: Minor: Simplify offline flag assignment.

Since --offline uses store_true, args.offline will always be a boolean. The getattr with default and bool() wrapper are defensive but unnecessary.

-    cli.offline = bool(getattr(args, 'offline', False))
+    cli.offline = args.offline
cortex/semantic_cache.py (2)

45-52: Consider catching broader OS errors in directory creation.

Only PermissionError triggers the fallback to ~/.cortex. Other OSError subclasses (e.g., FileNotFoundError on certain systems, OSError for disk full) could also occur during mkdir.

     def _ensure_db_directory(self) -> None:
         db_dir = Path(self.db_path).parent
         try:
             db_dir.mkdir(parents=True, exist_ok=True)
-        except PermissionError:
+        except OSError:
             user_dir = Path.home() / ".cortex"
             user_dir.mkdir(parents=True, exist_ok=True)
             self.db_path = str(user_dir / "cache.db")

101-103: datetime.utcnow() is deprecated in Python 3.12+.

Use datetime.now(timezone.utc) for forward compatibility:

-from datetime import datetime
+from datetime import datetime, timezone
 
 @staticmethod
 def _utcnow_iso() -> str:
-    return datetime.utcnow().replace(microsecond=0).isoformat() + "Z"
+    return datetime.now(timezone.utc).replace(microsecond=0).isoformat().replace("+00:00", "Z")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1b8a042 and b822144.

📒 Files selected for processing (5)
  • .github/workflows/codeql.yml (0 hunks)
  • LLM/interpreter.py (3 hunks)
  • cortex/cli.py (7 hunks)
  • cortex/semantic_cache.py (1 hunks)
  • docs/ISSUE-268-TESTING.md (1 hunks)
💤 Files with no reviewable changes (1)
  • .github/workflows/codeql.yml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs

Files:

  • LLM/interpreter.py
  • cortex/semantic_cache.py
  • cortex/cli.py
🧬 Code graph analysis (1)
LLM/interpreter.py (2)
tests/test_graceful_degradation.py (1)
  • cache (31-34)
cortex/semantic_cache.py (2)
  • get_commands (169-245)
  • put_commands (247-292)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Agent
🔇 Additional comments (10)
docs/ISSUE-268-TESTING.md (1)

1-60: LGTM! Clear and comprehensive testing guide.

The documentation accurately reflects the implementation details, including the cache location fallback logic and environment variable names (CORTEX_CACHE_MAX_ENTRIES, CORTEX_CACHE_SIMILARITY_THRESHOLD). The three test scenarios cover the key functionality: warming the cache, checking stats, and offline mode behavior.

LLM/interpreter.py (1)

191-206: LGTM! Cache integration looks correct.

Good design decisions:

  • Including the validate flag in the cache key prevents returning unvalidated commands when validation is expected
  • The offline mode fail-fast behavior with a clear error message is appropriate
  • Cache lookup before API call correctly reduces unnecessary network calls
cortex/cli.py (3)

315-331: LGTM! Cache stats implementation is clean.

The method correctly handles errors and displays meaningful statistics. Using stats.hits as the "Saved calls (approx)" metric is a reasonable approximation since each cache hit avoids an API call.


675-679: LGTM! Cache command dispatch handles edge cases correctly.

The fallback to parser.print_help() when no subcommand is provided is consistent with the existing patterns in the codebase.


203-203: LGTM! Offline mode correctly propagated to interpreter.

cortex/semantic_cache.py (5)

12-25: LGTM! Clean CacheStats implementation.

The frozen dataclass with computed properties is well-designed, and the division-by-zero guard in hit_rate is correct.
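
For readers skimming the review, the shape being praised is roughly the following. This is a sketch reconstructed from the field, property, and guard details mentioned in this review thread, not the PR's verbatim code.

from dataclasses import dataclass

@dataclass(frozen=True)
class CacheStats:
    hits: int
    misses: int

    @property
    def total(self) -> int:
        return self.hits + self.misses

    @property
    def hit_rate(self) -> float:
        # Guard against division by zero when the cache has seen no lookups.
        return self.hits / self.total if self.total else 0.0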


127-144: Clarification: Hash-based embeddings provide limited semantic similarity.

The _embed method uses hash-based token embeddings rather than learned semantic embeddings (like those from sentence-transformers or OpenAI). This approach:

  • Works well for exact or near-exact prompt matches
  • May miss semantically similar but lexically different queries (e.g., "install nginx" vs "setup nginx web server")

This is acceptable for reducing duplicate API calls, but consider documenting this limitation or potentially integrating a lightweight embedding model (e.g., sentence-transformers) in a future iteration if true semantic matching is desired.
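
For reviewers unfamiliar with the technique, a minimal sketch of a deterministic hash-bucket embedding of this kind follows. The dimension, tokenizer, and hash function here are assumptions; only the bit-63 sign trick and the L2 normalization are taken from observations elsewhere in this review.

import hashlib
from typing import List

def embed(text: str, dim: int = 256) -> List[float]:
    """Deterministically map tokens into a fixed-size vector, then L2-normalize."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "big")   # unsigned 64-bit integer
        bucket = value % dim                        # dimension this token lands in
        sign = -1.0 if (value >> 63) & 1 else 1.0   # sign taken from bit 63
        vec[bucket] += sign
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec] if norm else vec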


169-245: LGTM! Two-phase lookup is well-designed.

The exact-match-first strategy is a good optimization that avoids computing embeddings for repeated identical prompts. The candidate_limit parameter bounds memory usage during similarity search.
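
Schematically, the two-phase flow looks like this. It is a self-contained sketch: the in-memory dict and candidate list stand in for the PR's SQLite queries, and all names here are hypothetical.

from typing import Dict, List, Optional, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    # A dot product suffices because the stored embeddings are pre-normalized.
    return sum(x * y for x, y in zip(a, b))

def lookup(
    exact_index: Dict[str, List[str]],
    candidates: List[Tuple[List[float], List[str]]],
    key: str,
    query_vec: List[float],
    threshold: float,
) -> Optional[List[str]]:
    # Phase 1: an exact-key hit skips embedding math entirely.
    if key in exact_index:
        return exact_index[key]
    # Phase 2: scan a bounded candidate set for the best cosine score.
    best: Optional[List[str]] = None
    best_score = 0.0
    for embedding, commands in candidates[:200]:  # candidate_limit bounds memory
        score = cosine(query_vec, embedding)
        if score > best_score:
            best, best_score = commands, score
    return best if best_score >= threshold else None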


247-292: LGTM! Put operation correctly preserves hit_count on updates.

The COALESCE subquery elegantly preserves the existing hit_count when replacing an entry, which is important for LRU eviction accuracy.


294-312: LGTM! LRU eviction logic is correct.

The eviction removes only the necessary number of entries. For higher throughput scenarios, you might consider evicting a small percentage buffer (e.g., 5%) to reduce churn, but the current implementation is functionally correct.
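
The count-then-delete pattern described here looks roughly like the following sketch. The last_accessed column matches this review; the table and id column names are assumptions.

import sqlite3

def evict_lru(conn: sqlite3.Connection, max_entries: int) -> None:
    """Delete the least recently used rows once the table exceeds max_entries."""
    (count,) = conn.execute("SELECT COUNT(*) FROM cache_entries").fetchone()
    excess = count - max_entries
    if excess > 0:
        conn.execute(
            """
            DELETE FROM cache_entries
            WHERE id IN (
                SELECT id FROM cache_entries
                ORDER BY last_accessed ASC
                LIMIT ?
            )
            """,
            (excess,),
        )
        conn.commit()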

Copilot AI left a comment
Pull request overview

This PR implements semantic caching for LLM responses to reduce repeated API calls and introduces an offline mode for cached-only operation. The implementation adds a lightweight SQLite-based cache with LRU eviction, cosine similarity-based semantic matching, and a CLI command to view cache statistics.

Key changes:

  • Adds SemanticCache class with embedding-based similarity search and LRU eviction
  • Introduces --offline flag to fail fast when no cached match exists
  • Adds cortex cache stats command to display cache performance metrics

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
docs/ISSUE-268-TESTING.md New end-user testing guide documenting cache warming, stats checking, and offline mode
cortex/semantic_cache.py New semantic cache implementation with SQLite storage, custom embeddings, and similarity matching
cortex/cli.py Adds offline flag support and cache stats command integration
LLM/interpreter.py Integrates semantic cache into command parsing flow with offline mode support
.github/workflows/codeql.yml Removes CodeQL security scanning workflow (unrelated to PR purpose)
Comments suppressed due to low confidence (1)

LLM/interpreter.py:228

  • 'except' clause does nothing but pass and there is no explanatory comment.
            except Exception:


@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (9)
cortex/semantic_cache.py (7)

62-67: Add validation for environment variable values.

The max_entries and similarity_threshold values read from environment variables are not validated. Invalid values (negative numbers, non-numeric strings, or thresholds outside [0,1]) could cause runtime errors or unexpected behavior.
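
A defensive parsing helper along these lines would cover the concern. This is a sketch: the environment variable names are the ones this PR documents, while the fallback defaults and the clamping policy are assumptions.

import os

def _read_int_env(name: str, default: int) -> int:
    """Parse a positive integer from the environment, falling back on bad input."""
    try:
        value = int(os.environ.get(name, default))
        return value if value > 0 else default
    except (TypeError, ValueError):
        return default

def _read_threshold_env(name: str, default: float) -> float:
    """Parse a similarity threshold and clamp it into [0, 1]."""
    try:
        value = float(os.environ.get(name, default))
        return min(max(value, 0.0), 1.0)
    except (TypeError, ValueError):
        return default

max_entries = _read_int_env("CORTEX_CACHE_MAX_ENTRIES", 1000)
similarity_threshold = _read_threshold_env("CORTEX_CACHE_SIMILARITY_THRESHOLD", 0.85)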


71-78: Silent fallback to user directory should be logged.

When permissions fail on the default path, the code silently switches to ~/.cortex/cache.db. Users may be confused about which cache is being used. Consider logging a warning.


127-129: datetime.utcnow() is deprecated in Python 3.12+.

Consider using datetime.now(timezone.utc) for proper timezone handling.


160-165: Sign calculation using bit 63 works but could be simplified.

The bit 63 check is valid for unsigned 64-bit integers (values ≥ 2^63 have bit 63 set). However, using value % 2 as suggested in past review would be simpler and achieve similar distribution.


221-222: Each operation opens a new database connection.

For high-frequency cache operations, this creates overhead. Consider connection pooling or a persistent connection.


250-259: Semantic search loads up to 200 candidates into memory.

For large caches, computing cosine similarity in Python for each candidate could be a performance bottleneck. Consider early termination when a high-confidence match is found.
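
The suggested early exit could be as small as this sketch; the 0.99 high-confidence cutoff is an illustrative assumption.

from typing import List, Optional, Tuple

def best_match(
    candidates: List[Tuple[List[float], List[str]]],
    query_vec: List[float],
    threshold: float,
    confident: float = 0.99,
) -> Optional[List[str]]:
    best: Optional[List[str]] = None
    best_score = 0.0
    for embedding, commands in candidates:
        score = sum(x * y for x, y in zip(query_vec, embedding))  # pre-normalized vectors
        if score > best_score:
            best, best_score = commands, score
        if best_score >= confident:
            break  # good enough: stop scanning the remaining candidates
    return best if best_score >= threshold else None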


312-320: INSERT OR REPLACE deletes the row before inserting, breaking the hit_count preservation.

The subquery will return NULL because the row is deleted first. Use INSERT ... ON CONFLICT DO UPDATE instead.
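
The recommended fix looks like this in SQLite. It is a sketch assuming a UNIQUE constraint on a hypothetical cache_key column; the hit_count and last_accessed columns match this review.

import sqlite3

def put_entry(conn: sqlite3.Connection, cache_key: str, commands_json: str, now: str) -> None:
    """Upsert that refreshes the payload without resetting hit_count."""
    conn.execute(
        """
        INSERT INTO cache_entries (cache_key, commands, hit_count, last_accessed)
        VALUES (?, ?, 0, ?)
        ON CONFLICT(cache_key) DO UPDATE SET
            commands = excluded.commands,
            last_accessed = excluded.last_accessed
        """,
        (cache_key, commands_json, now),
    )
    conn.commit()
    # hit_count is deliberately absent from the UPDATE clause, so an existing
    # row keeps its count and LRU/stats accuracy is preserved.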

LLM/interpreter.py (2)

45-54: Consider logging cache initialization failures.

When SemanticCache initialization fails, the exception is silently caught. This makes diagnosing cache issues difficult.


252-263: Silent cache write failures make debugging difficult.

Consider logging exceptions at debug level while still allowing the operation to continue.

🧹 Nitpick comments (3)
cortex/semantic_cache.py (1)

180-187: Consider documenting the normalization assumption.

The _cosine method works correctly because _embed produces normalized vectors. A brief docstring noting this assumption would improve clarity.

     @staticmethod
     def _cosine(a: List[float], b: List[float]) -> float:
+        """Compute cosine similarity (assumes pre-normalized vectors)."""
         if not a or not b or len(a) != len(b):
             return 0.0
LLM/interpreter.py (1)

189-206: Consider notifying when dangerous commands are filtered.

Silently removing commands matching dangerous patterns could confuse users who expect to see their requested operation. Consider returning filtered commands separately or logging a warning.
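
A lightweight way to surface the drop, as a sketch; the pattern list here is illustrative, not the interpreter's actual rules.

import logging
import re
from typing import List

logger = logging.getLogger(__name__)

# Illustrative patterns only; the interpreter's real rules are not shown here.
DANGEROUS = re.compile(r"rm\s+-rf\s+/|mkfs|\bdd\s+if=")

def filter_commands(commands: List[str]) -> List[str]:
    safe = []
    for cmd in commands:
        if DANGEROUS.search(cmd):
            logger.warning("Filtered dangerous command: %s", cmd)  # surface the drop
            continue
        safe.append(cmd)
    return safe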

tests/test_semantic_cache.py (1)

12-214: Consider adding tests for additional edge cases.

The test suite covers core functionality well. Consider adding the following; a sketch of the first case appears after this list:

  • Model isolation (same provider, different models)
  • System prompt isolation (same provider/model, different system prompts)
  • Edge cases: empty commands list, very long prompts
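
For instance, a model-isolation test in the style of the existing suite might look like this. It is a sketch: it assumes the self.cache fixture from setUp and the put/get signatures quoted elsewhere in this review.

def test_model_isolation(self):
    """Entries cached for one model must not be served to another."""
    self.cache.put_commands(
        prompt="install nginx",
        provider="openai",
        model="gpt-4",
        system_prompt="sys",
        commands=["sudo apt install -y nginx"],
    )
    result = self.cache.get_commands(
        prompt="install nginx",
        provider="openai",
        model="gpt-3.5-turbo",  # same provider, different model
        system_prompt="sys",
    )
    self.assertIsNone(result)
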
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b822144 and 7f7f97f.

📒 Files selected for processing (4)
  • LLM/interpreter.py (4 hunks)
  • cortex/semantic_cache.py (1 hunks)
  • docs/ISSUE-268-TESTING.md (1 hunks)
  • tests/test_semantic_cache.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/ISSUE-268-TESTING.md
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs

Files:

  • tests/test_semantic_cache.py
  • cortex/semantic_cache.py
  • LLM/interpreter.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Maintain >80% test coverage for pull requests

Files:

  • tests/test_semantic_cache.py
🧬 Code graph analysis (2)
tests/test_semantic_cache.py (1)
cortex/semantic_cache.py (8)
  • CacheStats (19-39)
  • stats (363-378)
  • total (30-32)
  • hit_rate (35-39)
  • put_commands (287-341)
  • get_commands (195-285)
  • _embed (154-170)
  • _cosine (181-187)
LLM/interpreter.py (1)
cortex/semantic_cache.py (3)
  • SemanticCache (42-378)
  • get_commands (195-285)
  • put_commands (287-341)
🔇 Additional comments (13)
cortex/semantic_cache.py (5)

1-16: Module imports and structure look appropriate.

The imports are well-organized and necessary for the functionality. Using dataclass for CacheStats and standard library modules for persistence is a good choice.


18-39: Clean dataclass implementation with proper computed properties.

Using frozen=True for immutability and handling the division-by-zero case in hit_rate are good practices.


80-125: Database initialization is well-structured.

The schema includes appropriate indexes for both the unique constraint and LRU ordering. The CHECK (id = 1) constraint ensures single-row stats table.


343-361: LRU eviction implementation is correct.

The approach of counting entries and deleting the oldest by last_accessed is sound. The subquery-based deletion correctly removes the least recently used entries.


363-378: Stats retrieval is implemented correctly.

The fallback to CacheStats(hits=0, misses=0) when the row is missing provides a safe default.

LLM/interpreter.py (5)

3-8: TYPE_CHECKING pattern correctly used for forward reference typing.

The sqlite3 import is needed for the exception handler on line 261. The conditional import of SemanticCache avoids circular dependencies.


86-102: System prompt is well-structured with clear formatting rules.

The JSON format requirement and example help ensure consistent, parseable LLM responses.


172-187: Robust command parsing with markdown handling.

The code correctly strips JSON from markdown code blocks and validates the command list type. Filtering out empty/non-string values is a good defensive measure.


225-235: Cache key properly includes validation state.

The [cortex-cache-validate={bool(validate)}] suffix ensures cached commands are segregated by validation setting, preventing validated/unvalidated result mixing.


138-170: Ollama integration uses standard library for HTTP, avoiding extra dependencies.

The 60-second timeout is reasonable for local LLM inference which can be slow. Error handling differentiates network issues from other failures.
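
For reference, a standard-library call against Ollama's public /api/generate endpoint looks roughly like this. This is a sketch of the approach, not the PR's exact request shape; the model name is an assumption.

import json
import urllib.request

def ollama_generate(prompt: str, model: str = "llama3") -> str:
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    # 60-second timeout: local inference can be slow on modest hardware.
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["response"]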

tests/test_semantic_cache.py (3)

15-29: Test fixture setup is well-designed for isolation.

Using tempfile.mkdtemp() ensures each test run uses a fresh database, and tearDown properly cleans up. The explicit configuration values (max_entries=10, similarity_threshold=0.85) make tests deterministic.


161-189: LRU eviction test correctly verifies cache size enforcement.

The test fills the cache to capacity and verifies that adding one more entry triggers eviction while maintaining the max size.


191-214: Embedding and cosine similarity tests validate mathematical properties.

The tests verify embedding dimension, type, L2 normalization, and cosine similarity for edge cases (identical and orthogonal vectors).

@Sahilbhatane Sahilbhatane added the priority: critical (Must have for MVP - work on these first) and MVP (Killer feature sprint) labels Dec 14, 2025
@mikejmorgan-ai mikejmorgan-ai requested a review from dhvll December 16, 2025 16:41
@mikejmorgan-ai mikejmorgan-ai (Member) commented

Review Request

@dhvll - This PR touches core files (cortex/cli.py, LLM/interpreter.py). Please review before merge.

CI Status: All checks passing
CodeRabbit: Review completed

@mikejmorgan-ai mikejmorgan-ai merged commit ab8db91 into cortexlinux:main Dec 16, 2025
8 checks passed