Adding Semantic Cache #6

Merged — dewitt4 merged 8 commits into main from semantic-cache on Oct 27, 2025
Conversation

@dewitt4 (Contributor) commented Oct 27, 2025

No description provided.

@dewitt4 dewitt4 self-assigned this Oct 27, 2025
Copilot AI review requested due to automatic review settings October 27, 2025 18:27
@dewitt4 dewitt4 added the documentation (Improvements or additions to documentation) and enhancement (New feature or request) labels Oct 27, 2025
Copilot AI left a comment

Pull Request Overview

This PR implements a two-tier semantic caching system that reduces AWS Bedrock costs by 30-50% through intelligent response caching using AWS Bedrock Titan Embeddings. The system attempts exact hash-based matching first (O(1) lookup), then falls back to embedding-based semantic similarity matching (cosine similarity with 0.95 threshold) to identify and serve cached responses for similar queries.

Key Changes:

  • Added embedding service integration with AWS Bedrock Titan Embeddings for generating 1536-dimensional vectors
  • Implemented vector similarity utilities (cosine similarity, Euclidean distance, batch operations) using NumPy
  • Enhanced cache service to support both exact and semantic matching with separate TTLs and comprehensive statistics tracking
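The two-tier lookup described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the names `lookup`, `exact_cache`, `semantic_cache`, and `embed` are assumptions, and the real cache service presumably persists entries rather than using in-memory structures.

```python
import hashlib

import numpy as np

SIMILARITY_THRESHOLD = 0.95  # threshold cited in the PR description


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def lookup(prompt, model_id, exact_cache, semantic_cache, embed):
    """Two-tier lookup: exact hash match first, semantic fallback second."""
    # Tier 1: O(1) exact match on a hash of model + prompt.
    key = hashlib.sha256(f"{model_id}:{prompt}".encode()).hexdigest()
    if key in exact_cache:
        return exact_cache[key]

    # Tier 2: embed the prompt and scan cached embeddings for a
    # semantically similar query above the similarity threshold.
    query_vec = embed(prompt)  # e.g. a 1536-dim Titan embedding
    best, best_score = None, 0.0
    for entry in semantic_cache:  # entries: {"embedding": ..., "response": ...}
        score = cosine_similarity(query_vec, entry["embedding"])
        if score > best_score:
            best, best_score = entry, score
    if best is not None and best_score >= SIMILARITY_THRESHOLD:
        return best["response"]
    return None  # miss: caller invokes the model and populates both tiers
```

The linear scan in tier 2 keeps the sketch simple; a production cache would typically batch the similarity computation or use an index.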

Reviewed Changes

Copilot reviewed 10 out of 12 changed files in this pull request and generated 5 comments.

Show a summary per file

| File | Description |
| --- | --- |
| app/utils/vector.py | New vector similarity utilities for cosine similarity, normalization, Euclidean distance, and batch operations |
| app/utils/__init__.py | New utils package initialization |
| app/services/embeddings.py | New embedding service using AWS Bedrock Titan Embeddings with retry logic and cost estimation |
| app/services/cache.py | Enhanced cache service with semantic matching, dual-tier lookup strategy, and detailed statistics |
| app/models/schemas.py | Updated cache statistics response schema with semantic-specific fields |
| app/core/config.py | Added semantic cache and embedding configuration settings |
| app/api/v1/endpoints/cache.py | Updated cache stats endpoint to return semantic cache metrics |
| SEMANTIC_CACHE.md | Comprehensive documentation on semantic cache implementation, architecture, and usage |
| README.md | Updated with semantic cache features and documentation links |
| .env.example | Added semantic cache and embedding configuration variables |
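The operations listed for app/utils/vector.py could look roughly like this NumPy sketch. Only the operation names come from the file summary; the function signatures are assumptions.

```python
import numpy as np


def normalize(v: np.ndarray) -> np.ndarray:
    """L2-normalize a vector, or each row of a matrix."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def batch_cosine_similarity(query: np.ndarray, matrix: np.ndarray) -> np.ndarray:
    """Cosine similarity of one query against every row of `matrix`.

    On normalized vectors this reduces to a single matrix-vector
    product, so a batch lookup avoids a Python-level loop.
    """
    return normalize(matrix) @ normalize(query)


def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean (L2) distance between two vectors."""
    return float(np.linalg.norm(a - b))
```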


Comment thread app/services/embeddings.py Outdated
Comment thread app/services/cache.py Outdated
Comment thread app/services/cache.py Outdated
```python
if self.semantic_enabled:
    # Generate unique cache ID
    cache_id = hashlib.sha256(
        f"{model_id}:{prompt}:{response.get('timestamp', '')}".encode()
```

Copilot AI commented Oct 27, 2025

Using response.get('timestamp', '') for cache_id generation is unreliable because the response may not have a timestamp field, resulting in duplicate cache IDs for the same model+prompt combination. This could cause cache collisions. Consider using a more reliable unique identifier such as uuid4() or the current timestamp from time.time().
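A sketch of the suggested fix, mixing `uuid4()` into the hash so the ID no longer depends on an optional timestamp field. The helper name `make_cache_id` is hypothetical, not the PR's actual function.

```python
import hashlib
import uuid


def make_cache_id(model_id: str, prompt: str) -> str:
    """Derive a collision-resistant cache entry ID.

    uuid4() guarantees uniqueness even when the same model + prompt
    pair is stored twice, unlike an optional response timestamp that
    may be missing and collapse to the same hash input.
    """
    return hashlib.sha256(
        f"{model_id}:{prompt}:{uuid.uuid4().hex}".encode()
    ).hexdigest()
```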

Comment thread app/services/cache.py Outdated
Comment on lines +289 to +290
```python
# $0.0001 per 1K tokens, average prompt ~200 tokens
embedding_cost = (semantic_hits * 200 / 1000) * 0.0001
```

Copilot AI commented Oct 27, 2025

The magic number 200 for average prompt tokens is hardcoded. Consider extracting this as a configuration constant or calculating it from actual prompt lengths to improve accuracy of cost estimates.
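One way to follow the suggestion, lifting both literals into a configuration object. `EmbeddingCostConfig` and its field names are hypothetical, not part of the PR's app/core/config.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingCostConfig:
    # Defaults mirror the reviewed code's literals; both are
    # estimates that should be tuned per deployment.
    cost_per_1k_tokens: float = 0.0001
    avg_prompt_tokens: int = 200


def estimate_embedding_cost(
    semantic_hits: int,
    cfg: EmbeddingCostConfig = EmbeddingCostConfig(),
) -> float:
    """Estimated embedding spend for the semantic cache tier."""
    return (semantic_hits * cfg.avg_prompt_tokens / 1000) * cfg.cost_per_1k_tokens
```

Tracking actual prompt lengths per hit, as the comment also suggests, would replace `avg_prompt_tokens` with a running sum of real token counts.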

Comment thread SEMANTIC_CACHE.md Outdated
dewitt4 and others added 4 commits October 27, 2025 11:29
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@dewitt4 dewitt4 requested a review from Copilot October 27, 2025 18:32
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 10 out of 12 changed files in this pull request and generated 4 comments.



Comment thread app/services/cache.py Outdated
Comment thread app/services/cache.py Outdated
Comment thread app/services/cache.py
Comment thread app/services/embeddings.py Outdated
dewitt4 and others added 3 commits October 27, 2025 11:55
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@dewitt4 dewitt4 merged commit d6aa244 into main Oct 27, 2025
@dewitt4 dewitt4 deleted the semantic-cache branch October 27, 2025 18:58

Labels

documentation (Improvements or additions to documentation), enhancement (New feature or request)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants