Conversation
Pull Request Overview
This PR implements a two-tier semantic caching system that reduces AWS Bedrock costs by 30-50% through intelligent response caching using AWS Bedrock Titan Embeddings. The system attempts exact hash-based matching first (O(1) lookup), then falls back to embedding-based semantic similarity matching (cosine similarity with 0.95 threshold) to identify and serve cached responses for similar queries.
Key Changes:
- Added embedding service integration with AWS Bedrock Titan Embeddings for generating 1536-dimensional vectors
- Implemented vector similarity utilities (cosine similarity, Euclidean distance, batch operations) using NumPy
- Enhanced cache service to support both exact and semantic matching with separate TTLs and comprehensive statistics tracking
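The dual-tier lookup described above could be sketched roughly as follows. This is an illustrative outline only, not the PR's actual API: the `lookup` signature, `embed` callback, and cache data structures are hypothetical stand-ins, while the SHA-256 keying, O(1) exact tier, and 0.95 cosine-similarity threshold come from the PR description.

```python
import hashlib

import numpy as np

SIMILARITY_THRESHOLD = 0.95  # cosine-similarity cutoff from the PR description


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def lookup(prompt: str, exact_cache: dict, semantic_cache: list, embed):
    """Tier 1: O(1) exact hash match; tier 2: embedding similarity scan."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in exact_cache:  # tier 1: exact match, no embedding call needed
        return exact_cache[key]

    query_vec = embed(prompt)  # tier 2: embed and compare against cached vectors
    for cached_vec, cached_response in semantic_cache:
        if cosine_similarity(query_vec, cached_vec) >= SIMILARITY_THRESHOLD:
            return cached_response
    return None  # miss on both tiers
```

Note that tier 1 avoids the embedding call entirely, which is where much of the claimed cost saving would come from on repeated identical prompts.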
Reviewed Changes
Copilot reviewed 10 out of 12 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| app/utils/vector.py | New vector similarity utilities for cosine similarity, normalization, Euclidean distance, and batch operations |
| app/utils/__init__.py | New utils package initialization |
| app/services/embeddings.py | New embedding service using AWS Bedrock Titan Embeddings with retry logic and cost estimation |
| app/services/cache.py | Enhanced cache service with semantic matching, dual-tier lookup strategy, and detailed statistics |
| app/models/schemas.py | Updated cache statistics response schema with semantic-specific fields |
| app/core/config.py | Added semantic cache and embedding configuration settings |
| app/api/v1/endpoints/cache.py | Updated cache stats endpoint to return semantic cache metrics |
| SEMANTIC_CACHE.md | Comprehensive documentation on semantic cache implementation, architecture, and usage |
| README.md | Updated with semantic cache features and documentation links |
| .env.example | Added semantic cache and embedding configuration variables |
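The batch operations listed for `app/utils/vector.py` might look something like the sketch below. The function names `batch_cosine_similarity` and `best_match` are assumptions for illustration, not the file's actual contents; only the use of NumPy and the 0.95 default threshold are taken from the PR.

```python
import numpy as np


def batch_cosine_similarity(query: np.ndarray, matrix: np.ndarray) -> np.ndarray:
    """Cosine similarity of one query vector against each row of a matrix."""
    q = query / np.linalg.norm(query)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return m @ q  # one vectorized pass instead of a Python loop


def best_match(query: np.ndarray, matrix: np.ndarray, threshold: float = 0.95):
    """Index of the most similar row, or None if nothing clears the threshold."""
    sims = batch_cosine_similarity(query, matrix)
    idx = int(np.argmax(sims))
    return idx if sims[idx] >= threshold else None
```

Normalizing both sides up front reduces each comparison to a dot product, which matters once the semantic cache holds many 1536-dimensional vectors.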
```python
if self.semantic_enabled:
    # Generate unique cache ID
    cache_id = hashlib.sha256(
        f"{model_id}:{prompt}:{response.get('timestamp', '')}".encode()
```
Using response.get('timestamp', '') for cache_id generation is unreliable because the response may not have a timestamp field, resulting in duplicate cache IDs for the same model+prompt combination. This could cause cache collisions. Consider using a more reliable unique identifier such as uuid4() or the current timestamp from time.time().
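One way to apply the suggested fix is to mix a `uuid4()` into the hashed string, which guarantees uniqueness even when two responses share a model, prompt, and (missing) timestamp. The helper name `make_cache_id` is hypothetical; only the SHA-256-over-`model_id:prompt` pattern comes from the quoted code.

```python
import hashlib
import uuid


def make_cache_id(model_id: str, prompt: str) -> str:
    """Collision-free cache ID: a fresh UUID replaces the unreliable timestamp."""
    return hashlib.sha256(
        f"{model_id}:{prompt}:{uuid.uuid4()}".encode()
    ).hexdigest()
```

With this change, generating two IDs for the same model+prompt pair always yields distinct keys, so a missing `timestamp` field can no longer cause cache collisions.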
```python
# $0.0001 per 1K tokens, average prompt ~200 tokens
embedding_cost = (semantic_hits * 200 / 1000) * 0.0001
```
The magic number 200 for average prompt tokens is hardcoded. Consider extracting this as a configuration constant or calculating it from actual prompt lengths to improve accuracy of cost estimates.
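Extracting the constant into settings might look like this sketch. The `EmbeddingSettings` class and field names are hypothetical (the PR's actual config lives in `app/core/config.py`, whose structure isn't shown here); the $0.0001-per-1K-tokens rate and the 200-token average are taken from the quoted comment.

```python
from dataclasses import dataclass


@dataclass
class EmbeddingSettings:
    # Assumed defaults, mirroring the hardcoded values in the quoted snippet.
    cost_per_1k_tokens: float = 0.0001  # Titan Embeddings rate from the comment
    avg_prompt_tokens: int = 200        # now configurable instead of a magic number


def estimate_embedding_cost(semantic_hits: int, settings: EmbeddingSettings) -> float:
    """Same arithmetic as the original line, with both constants injected."""
    return (semantic_hits * settings.avg_prompt_tokens / 1000) * settings.cost_per_1k_tokens
```

Tracking the true running average of prompt lengths (instead of a fixed 200) would tighten the estimate further, at the cost of a little bookkeeping per request.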
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull Request Overview
Copilot reviewed 10 out of 12 changed files in this pull request and generated 4 comments.