
chore(caching): improve semantic cache #198

Merged
giorgi-pro merged 17 commits into main from chore/improve_semantich_cache
Apr 3, 2026

Conversation

@giorgi-pro
Collaborator

@giorgi-pro giorgi-pro commented Mar 31, 2026

Semantic cache: external vector backends (replace sqlite-vec)

Summary

Replaces sqlite-vec with explicit external vector backends for the semantic response cache: Qdrant, pgvector, Pinecone, and Weaviate. Semantic caching now requires a non-empty vector_store.type and the matching nested config; there is no embedded default store.

Changes

  • Config (config/config.go): Removed SQLiteVec; extended VectorStoreConfig with Pinecone and Weaviate; pgvector gains required dimension; validation branches per type; mergeSemanticResponseDefaults no longer defaults to sqlite-vec; env adds SEMANTIC_CACHE_* for pgvector/pinecone/weaviate and drops SEMANTIC_CACHE_SQLITE_PATH.
  • Vec stores (internal/responsecache/): Deleted sqlite-vec implementations; added Qdrant (REST, lazy collection, cosine), pgvector (pgx + extension/table), Pinecone (data-plane HTTP, metadata size guard), Weaviate (REST + GraphQL search); shared hourly DeleteExpired ticker via cleanup helper; NewVecStore switches on the four types only.
  • Dependencies: go mod tidy — removed sqlite-vec / unused sqlite driver deps tied to semantic cache (storage still uses modernc.org/sqlite where applicable).
  • Tests & examples: config tests and cache_validation_test updated for the new model; config.example.yaml documents all four backends (commented alternatives); checked-in config.yaml aligned if it had semantic/sqlite paths.
  • Docs: ADR-0006 updated for API-only embedder, CredentialResolvedProviders / base_url, vector matrix, TTL/cleanup, and consequences (no sqlite-vec / zero-infra story); CLAUDE.md / README note four backends and explicit vector_store.type; README roadmap treats semantic cache as shipped where appropriate.
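
Taken together, the config changes mean vector_store.type must explicitly name one of the four backends. A minimal Go sketch of that rule (validateVecStoreType and supportedVecStores are illustrative names, not the actual validation code):

```go
package main

import "fmt"

// The four supported backends; there is intentionally no embedded default.
var supportedVecStores = map[string]bool{
	"qdrant":   true,
	"pgvector": true,
	"pinecone": true,
	"weaviate": true,
}

// validateVecStoreType illustrates the new rule: an empty vector_store.type
// is a configuration error rather than a silent fallback to sqlite-vec.
func validateVecStoreType(t string) error {
	switch {
	case t == "":
		return fmt.Errorf("vector_store.type is required: one of qdrant, pgvector, pinecone, weaviate")
	case !supportedVecStores[t]:
		return fmt.Errorf("unsupported vector_store.type %q", t)
	}
	return nil
}

func main() {
	fmt.Println(validateVecStoreType("qdrant")) // <nil>
	fmt.Println(validateVecStoreType(""))       // configuration error
}
```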

Summary by CodeRabbit

  • New Features

    • Response caching split into exact-match (env opt-in) and semantic layers; semantic caching now requires an external embedding provider (no local embedder) and supports qdrant, pgvector, pinecone, and weaviate. Background expiration cleanup added for vector stores.
  • Infrastructure

    • Docker Compose supports infra-only vs full-stack via profiles; new Make targets to start infra or full stack. Helm exposes the simple-cache opt-in when Redis is enabled.
  • Documentation

    • README, examples, ADR, and roadmap updated to reflect new cache modes, env opt-ins, provider/embedder requirements, and supported vector backends.
  • Tests

    • Added/updated tests for semantic cache validation, env opt-ins, provider resolution, and vector-store behavior.
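
The split into an env-gated exact-match layer and a semantic layer hinges on the "nil pointer block means absent" activation check mentioned above. A hedged sketch, using assumed names (SimpleCacheConfig and simpleCacheEnabled are illustrative, not the actual helper):

```go
package main

import "fmt"

// SimpleCacheConfig sketches the "optional pointer block" pattern: a nil
// block means the layer is not configured at all, and the opt-in flag only
// matters when the block exists.
type SimpleCacheConfig struct {
	Enabled bool
}

func simpleCacheEnabled(c *SimpleCacheConfig) bool {
	return c != nil && c.Enabled
}

func main() {
	fmt.Println(simpleCacheEnabled(nil))                               // false: block absent
	fmt.Println(simpleCacheEnabled(&SimpleCacheConfig{}))              // false: present but not opted in
	fmt.Println(simpleCacheEnabled(&SimpleCacheConfig{Enabled: true})) // true
}
```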

@coderabbitai
Contributor

coderabbitai Bot commented Mar 31, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Response-cache config moved from env-tagged scalars to optional YAML pointer blocks; .env.template documents a RESPONSE_CACHE_SIMPLE_ENABLED opt-in. Semantic embeddings require an API provider (no local ONNX embedder) and explicit vector-store backends (qdrant, pgvector, pinecone, weaviate). Docker Compose was split into infra vs full-stack; provider resolution now exposes credential-resolved providers for embedder/cache.

Changes

  • Cache config & examples (.env.template, config/config.go, config/config.example.yaml, config/cache_validation_test.go, config/config_test.go): Response-cache layers become optional pointer blocks (Simple, Semantic). .env.template adds a commented RESPONSE_CACHE_SIMPLE_ENABLED opt-in. Many env tags removed from the schema; an explicit vector_store.type is now required; pgvector/pinecone/weaviate configs and validation added; tests updated to the pointer shapes and new validation cases.
  • Embedding & provider plumbing (internal/embedding/embedding.go, internal/embedding/embedding_test.go, internal/providers/config.go, internal/providers/config_test.go, internal/providers/init.go): The embedder now requires a resolved provider key (rejects local), builds the OpenAI-compatible embeddings endpoint from the provider base_url, and normalizes Gemini defaults. resolveProviders returns a credential-filtered raw provider map; InitResult exposes CredentialResolvedProviders.
  • Response-cache middleware & app wiring (internal/responsecache/responsecache.go, internal/app/app.go, internal/app/app_test.go, internal/responsecache/handle_request_test.go): Middleware signature switched to accept credential-resolved providers; simple/semantic activation checks tightened (nil check plus enabled helper); initialization/teardown improved to close partial resources on error. The app passes credential-resolved providers into cache construction.
  • Semantic parsing, fingerprinting & hashing (internal/responsecache/semantic.go, internal/responsecache/semantic_extract_test.go, internal/responsecache/semantic_test.go): Request parsing extended to support messages and input (string or array); conversation-invariant fingerprinting adjusted; params hashing expanded to include max_output_tokens, reasoning, and instructions. Unit tests added and updated.
  • Vector store core & helpers (internal/responsecache/vecstore.go, internal/responsecache/vecstore_helpers.go): NewVecStore requires an explicit type (no sqlite default) and dispatches to the supported backends. Added helpers: vecPointID64, pgvectorLiteral, trimSlash.
  • Vecstore implementations (internal/responsecache/vecstore_qdrant.go, internal/responsecache/vecstore_pgvector.go, internal/responsecache/vecstore_pinecone.go, internal/responsecache/vecstore_weaviate.go): Added production implementations for Qdrant, pgvector (pgx), Pinecone, and Weaviate (insert/search/delete, TTL handling, schema/collection validation, background cleanup hooks).
  • Cleanup loop (internal/responsecache/vecstore_cleanup.go): Added a periodic background DeleteExpired runner (~1 hour) with cancel/wait semantics, used by the vecstores.
  • Removed sqlite vecstore & deps (internal/responsecache/vecstore_sqlite.go (deleted), go.mod): Removed the embedded SQLite vecstore implementation, the related ONNX/sqlite modules, and the replace directive.
  • Tests (internal/responsecache/*_test.go, internal/embedding/*_test.go, config/*_test.go, internal/providers/*_test.go): Updated and added tests covering pointer-based cache blocks, the provider-resolution return value, embedder endpoint behavior, semantic extraction/hashing, and vector-store validation.
  • Infra / Deployment & docs (docker-compose.yaml, Makefile, README.md, CLAUDE.md, helm/templates/configmap.yaml): Split Compose into infra vs app profiles (--profile app), added make infra / make image targets, updated docs for the semantic cache, providers, and backends, and added RESPONSE_CACHE_SIMPLE_ENABLED: "true" to the Helm ConfigMap when Redis is enabled.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Cache as Response Cache
    participant Sem as Semantic Layer
    participant Embed as Embedder (API)
    participant Provider as Resolved Provider
    participant Vec as Vector Store

    Client->>Cache: Request
    Cache->>Cache: Check exact (Redis) if simple enabled
    alt Simple hit
        Cache-->>Client: Return cached response
    else
        Cache->>Sem: Normalize input/messages, compute params hash
        Sem->>Embed: Request embedding for text
        Embed->>Provider: Use credential-resolved provider (base_url + /v1/embeddings)
        Embed-->>Sem: Embedding vector
        Sem->>Vec: Search by params_hash + similarity
        alt Semantic hit
            Vec-->>Sem: Return best response
            Sem-->>Cache: Provide cached response
            Cache-->>Client: Return cached response
        else
            Cache-->>Client: Forward to app
            Client->>Cache: App response
            Cache->>Sem: Insert vector+response with TTL
            Sem->>Vec: Upsert vector (store expires_at)
            Vec-->>Sem: Stored
            Cache-->>Client: Return response
        end
    end
    Note over Vec: Background cleanup deletes expired entries ~hourly

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I hopped through configs, pointers in tow,

No local ONNX where embedders used to grow.
Redis peeks first, then vectors take flight,
Qdrant, pg, Pinecone, Weaviate—what a sight!
I tidy expired keys and nibble a byte.

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 19.54%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
  • Title Check — ❓ Inconclusive: the title 'chore(caching): improve semantic cache' is vague, using 'improve' without naming the key architectural change of replacing embedded sqlite-vec with external vector backends. Consider a more descriptive title such as 'chore(caching): replace embedded sqlite-vec with external vector stores'.

✅ Passed checks (1 passed)

  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.



@giorgi-pro giorgi-pro marked this pull request as draft March 31, 2026 23:06
@giorgi-pro giorgi-pro self-assigned this Mar 31, 2026
@giorgi-pro giorgi-pro marked this pull request as ready for review March 31, 2026 23:06
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
internal/responsecache/responsecache.go (1)

58-68: ⚠️ Potential issue | 🟠 Major

Tear down the simple cache on semantic-cache init errors.

If exact-cache initialization succeeds and semantic init later fails, the Redis store already held by m.simple is leaked. That is a common startup misconfiguration path and leaves open connections behind.

♻️ Proposed fix
 	sem := cfg.Semantic
 	if sem != nil && config.SemanticCacheActive(sem) {
 		emb, err := embedding.NewEmbedder(sem.Embedder, resolvedProviders)
 		if err != nil {
+			if m.simple != nil {
+				_ = m.simple.close()
+			}
 			return nil, err
 		}
 		vs, err := NewVecStore(sem.VectorStore)
 		if err != nil {
 			_ = emb.Close()
+			if m.simple != nil {
+				_ = m.simple.close()
+			}
 			return nil, err
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/responsecache/responsecache.go` around lines 58 - 68, The
simple-cache (m.simple) is leaked when semantic-cache init fails; ensure you
tear down m.simple on any early return during semantic initialization by calling
its close/teardown method (e.g., m.simple.Close() or the appropriate shutdown
method) before returning; specifically update the semantic init path around
embedding.NewEmbedder, NewVecStore, and newSemanticCacheMiddleware so that if
embedding.NewEmbedder(sem.Embedder, ...), NewVecStore(sem.VectorStore) or
subsequent steps return an error you first close/teardown m.simple (and also
close emb when created) to avoid leaving open Redis connections.
internal/responsecache/semantic.go (1)

38-45: ⚠️ Potential issue | 🟠 Major

Hash the effective embedder identity, not the raw config.

internal/embedding.NewEmbedder trims/defaults/normalizes the model, but embedderIdentity still hashes cfg.Embedder.Model verbatim. That splits the semantic namespace for the same actual embedder ("" vs an explicit default model, whitespace variants, Gemini normalization), and every vecstore lookup matches params_hash exactly. Please derive this from the resolved provider plus the effective model actually used by the embedder.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/responsecache/semantic.go` around lines 38 - 45, The
embedderIdentity is currently built from cfg.Embedder.Model (in
newSemanticCacheMiddleware) which can differ from the actual normalized model
used by embedding.NewEmbedder; change it to derive the identity from the
resolved embedder instance (the emb parameter) — i.e., obtain the embedder's
effective provider and model from emb (using the embedder's API such as its
resolved Provider/Model or Identity accessor) and hash/concatenate those values
into semanticCacheMiddleware.embedderIdentity so lookups use the embedder's
normalized identity rather than the raw config.
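
Under the assumption that the resolved provider and effective model are available from the embedder instance, the suggested identity derivation could look like the sketch below (function name, separator, and the default model are all hypothetical):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// embedderIdentity sketches the fix suggested above: hash the resolved
// provider plus the effective (trimmed, defaulted) model rather than the raw
// config string, so "" and an explicit default model land in the same
// semantic namespace.
func embedderIdentity(provider, model string) string {
	m := strings.TrimSpace(model)
	if m == "" {
		m = "text-embedding-3-small" // hypothetical provider default
	}
	h := sha256.Sum256([]byte(provider + "\x00" + m))
	return hex.EncodeToString(h[:])
}

func main() {
	// Whitespace and default-model variants collapse to one identity.
	fmt.Println(embedderIdentity("openai", "") ==
		embedderIdentity("openai", " text-embedding-3-small ")) // true
}
```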
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@config/config.go`:
- Around line 538-540: The validator currently treats zero as missing because
defaults are applied during merge, breaking intended semantics where sem.TTL ==
0 and sem.MaxConversationMessages == 0 are valid; change the semantic cache
config to track presence instead of using zero-as-missing (e.g., make TTL and
MaxConversationMessages pointer types or add explicit boolean flags like
TTLSet/MaxConversationMessagesSet) so merges preserve an explicit 0, update the
merge/constructor logic that sets defaults to only apply defaults when the
presence flag or pointer is nil, and keep the existing validator check (sem.TTL
< 0) but use sem.TTL presence to decide whether to use a default value.
- Around line 618-624: The env parsing silently ignores malformed numeric values
(e.g., REDIS_TTL_RESPONSES) causing silent fallback to defaults; update the
parsing blocks (e.g., handling REDIS_TTL_RESPONSES into RedisResponseConfig.TTL
and the other numeric envs in the 646-705 range) to validate errors from
strconv.Atoi/ParseFloat and fail-fast on parse errors—propagate or return a
startup error (or log and exit) instead of swallowing the error, and ensure
behavior is consistent with applyEnvOverrides by referencing the same
validation/failure pattern for functions/structs like RedisResponseConfig,
applyEnvOverrides and the Redis TTL/threshold/dimension env handlers.
- Around line 554-561: The environment override for
RESPONSE_CACHE_SIMPLE_ENABLED isn't being applied when resp.Simple already
exists because the merge helper skips updating the existing SimpleCacheConfig;
update the merge/override logic to explicitly set or overwrite s.Enabled on an
existing *SimpleCacheConfig based on the env var (i.e., when parsing the env
var, write the boolean into resp.Simple.Enabled even if resp.Simple != nil) so
SimpleCacheEnabled will observe the override, and apply the same change to the
other analogous merge helper referenced (the same pattern at the other
SimpleCache handling site).

In `@internal/responsecache/vecstore_cleanup.go`:
- Around line 17-43: The cleanup goroutine started by startVecCleanup currently
uses context.Background() and can block vecCleanup.close() forever; change the
lifecycle to be caller-controlled by (1) changing startVecCleanup to accept a
context (ctx) or instead expose a Run(ctx) / Start(ctx) method on vecCleanup so
the caller provides the cancelable context, (2) call store.DeleteExpired(ctx)
(not context.Background()) inside the ticker loop so the backend call is
cancellable, (3) stop creating a detached goroutine that hides work—have the
caller start the goroutine using the provided ctx or provide a Cancel/Close that
cancels the ctx so wg.Wait() cannot block, and (4) ensure any errors returned
from DeleteExpired are wrapped/returned as typed core.GatewayError (rather than
only logged) so callers can handle them; update references to
vecStoreCleanupInterval, startVecCleanup, vecCleanup.close, and DeleteExpired
accordingly.

In `@internal/responsecache/vecstore_pgvector.go`:
- Around line 53-61: The table DDL only defines PRIMARY KEY (cache_key,
params_hash) so queries that filter by params_hash (used by Search) and by
expires_at (used by DeleteExpired) cannot use efficient b-tree indexes; add
separate indexes on params_hash and expires_at (e.g., CREATE INDEX ON <table>
(params_hash); CREATE INDEX ON <table> (expires_at);) so Search and
DeleteExpired can use index scans; update the DDL building code that creates the
table (the ddl variable in vecstore_pgvector.go) to include these CREATE INDEX
statements or include them as part of migration logic.
- Around line 66-67: The constructor currently starts a detached cleanup
goroutine via startVecCleanup(s) which calls DeleteExpired(context.Background())
and only logs errors; remove the call to startVecCleanup from the constructor
(do not hide cleanup work), instead expose an explicit, caller-invokable cleanup
function or method on the returned store (e.g., VecStore.Cleanup or return the
cleanup func) that accepts a context; implement that cleanup to call
s.DeleteExpired(ctx) and propagate any error as a typed core.GatewayError rather
than silently logging it so callers control cancellation and error handling;
update references to the cleanup field (s.cleanup) and startVecCleanup to match
this new explicit, context-respecting API.

In `@internal/responsecache/vecstore_pinecone.go`:
- Around line 127-183: Search currently removes expired matches only after
Pinecone returns results in func (s *pineconeStore) Search, which can make
limit==1 miss live neighbors; fix by adding an expires_at filter to the query
body (or by over-fetching until you collect limit live results). Concretely, in
Search update the request body construction (the body variable) to include a
filter condition for expires_at > now (compute now := time.Now().Unix() and add
something like "expires_at": {"$gt": now} alongside the existing "params_hash"
filter) so Pinecone excludes expired vectors server-side; alternatively
implement over-fetching by increasing "topK" (e.g., limit*3 with a safe cap) and
in the parsed.Matches loop collect up to limit live VecResult entries before
returning. Ensure the change references body, topK, and the parsed.Matches
processing in Search.
- Around line 44-52: The constructor for pineconeStore (where pineconeStore is
created and s.cleanup = startVecCleanup(s) is invoked) must not start background
cleanup work; remove the call to startVecCleanup from the vecstore constructor
and stop spawning detached goroutines there. Instead expose a method (or return
the store without scheduling) so the owning service can explicitly start the
DeleteExpired scheduling via startVecCleanup(s) or a StartCleanup method during
service startup and call the store.Close/cleanup function during shutdown; apply
the same change pattern to the other new vecstores so cleanup scheduling is
owned by the service lifecycle and ensure returned errors follow the typed
core.GatewayError convention rather than hiding async failures.

In `@internal/responsecache/vecstore_qdrant.go`:
- Around line 76-111: In ensureCollection (method on qdrantStore) change the GET
response handling to parse the response body JSON when resp.StatusCode ==
http.StatusOK, extract the existing vectors.size (and ensure distance if
desired), compare it to dim and return an error if they differ (instead of
silently accepting); only attempt to create the collection when resp.StatusCode
== http.StatusNotFound (404); for other non-200 responses (e.g., auth/outage)
return an error including status and response body; keep updating s.vectorSize =
dim after successful validation or creation.

In `@internal/responsecache/vecstore_weaviate.go`:
- Around line 170-193: Update the GraphQL queries so they filter out expired
rows instead of relying on post-fetch filtering: in the Search method build a
where clause that includes an additional condition on "expires_at" (e.g.
operator GreaterThan / valueDateTime compared to time.Now().UTC() formatted
RFC3339) alongside the params_hash condition so expired entries are never
returned by the query; make the same change to the other query block referenced
around the 267-269 diff. Locate and modify the Search function and the other
GraphQL-builder (both of which call runGraphQLSearch) to inject the current UTC
timestamp into the where filter for expires_at.
- Around line 93-121: The current logic attempts to create the Weaviate class on
any non-200 from the GET /v1/schema/{class}; change it so only a 404 triggers
the POST create flow: after the initial GET (where resp is returned), read
resp.Body into raw and if resp.StatusCode == http.StatusOK return nil; if
resp.StatusCode == http.StatusNotFound continue to build and POST the schema
(existing req2/resp2 flow); otherwise return an error (e.g.,
fmt.Errorf("vecstore weaviate: get class: %s: %s", resp.Status, raw)) so
401/403/5xx and transient errors are surfaced instead of attempting a create.
Ensure you reference resp, resp.Body, s.class and s.baseURL when implementing
this.
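
Several of the comments above ask for server-side expiry filtering rather than post-fetch pruning. A sketch of the Pinecone query body with the suggested expires_at condition (field names follow the review comments, not necessarily the shipped code):

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// buildPineconeQuery sketches the suggested fix for Search: alongside the
// existing params_hash equality filter, exclude vectors whose expires_at is
// not strictly in the future, so expired entries never consume topK slots.
func buildPineconeQuery(vec []float32, paramsHash string, limit int) map[string]any {
	now := time.Now().Unix()
	return map[string]any{
		"vector":          vec,
		"topK":            limit,
		"includeMetadata": true,
		"filter": map[string]any{
			"params_hash": map[string]any{"$eq": paramsHash},
			"expires_at":  map[string]any{"$gt": now},
		},
	}
}

func main() {
	body, _ := json.Marshal(buildPineconeQuery([]float32{0.1, 0.2}, "abc123", 1))
	fmt.Println(string(body)) // request body with both filter conditions
}
```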

---

Outside diff comments:
In `@internal/responsecache/responsecache.go`:
- Around line 58-68: The simple-cache (m.simple) is leaked when semantic-cache
init fails; ensure you tear down m.simple on any early return during semantic
initialization by calling its close/teardown method (e.g., m.simple.Close() or
the appropriate shutdown method) before returning; specifically update the
semantic init path around embedding.NewEmbedder, NewVecStore, and
newSemanticCacheMiddleware so that if embedding.NewEmbedder(sem.Embedder, ...),
NewVecStore(sem.VectorStore) or subsequent steps return an error you first
close/teardown m.simple (and also close emb when created) to avoid leaving open
Redis connections.

In `@internal/responsecache/semantic.go`:
- Around line 38-45: The embedderIdentity is currently built from
cfg.Embedder.Model (in newSemanticCacheMiddleware) which can differ from the
actual normalized model used by embedding.NewEmbedder; change it to derive the
identity from the resolved embedder instance (the emb parameter) — i.e., obtain
the embedder's effective provider and model from emb (using the embedder's API
such as its resolved Provider/Model or Identity accessor) and hash/concatenate
those values into semanticCacheMiddleware.embedderIdentity so lookups use the
embedder's normalized identity rather than the raw config.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7e5a1c4e-d446-4536-adbf-54a07dac3d89

📥 Commits

Reviewing files that changed from the base of the PR and between ce6b390 and 7ef8d6b.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (31)
  • .env.template
  • CLAUDE.md
  • Makefile
  • README.md
  • config/cache_validation_test.go
  • config/config.example.yaml
  • config/config.go
  • config/config_test.go
  • docker-compose.yaml
  • docs/adr/0006-semantic-response-cache.md
  • go.mod
  • helm/templates/configmap.yaml
  • internal/app/app.go
  • internal/embedding/embedding.go
  • internal/embedding/embedding_test.go
  • internal/providers/config.go
  • internal/providers/config_test.go
  • internal/providers/init.go
  • internal/responsecache/handle_request_test.go
  • internal/responsecache/responsecache.go
  • internal/responsecache/semantic.go
  • internal/responsecache/semantic_extract_test.go
  • internal/responsecache/semantic_test.go
  • internal/responsecache/vecstore.go
  • internal/responsecache/vecstore_cleanup.go
  • internal/responsecache/vecstore_helpers.go
  • internal/responsecache/vecstore_pgvector.go
  • internal/responsecache/vecstore_pinecone.go
  • internal/responsecache/vecstore_qdrant.go
  • internal/responsecache/vecstore_sqlite.go
  • internal/responsecache/vecstore_weaviate.go
💤 Files with no reviewable changes (4)
  • internal/responsecache/semantic_test.go
  • internal/responsecache/handle_request_test.go
  • go.mod
  • internal/responsecache/vecstore_sqlite.go

# Conflicts:
#	go.mod
#	go.sum
#	internal/app/app.go
#	internal/embedding/embedding.go
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/app/app.go`:
- Line 341: The review says the extra warning about missing providers is
redundant; remove any added warning/log around
responsecache.NewResponseCacheMiddleware or embedding.NewEmbedder and rely on
the existing synchronous error returned from
responsecache.NewResponseCacheMiddleware (rcm, err) which already surfaces
embedding.NewEmbedder's detailed error that includes the provider name and
troubleshooting guidance; keep the current error propagation flow using
providerResult.CredentialResolvedProviders and do not introduce an additional
log or warning for missing providers.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 144af659-e38c-40b2-9139-af171b554b46

📥 Commits

Reviewing files that changed from the base of the PR and between 7ef8d6b and f40fb62.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (9)
  • CLAUDE.md
  • Makefile
  • go.mod
  • internal/app/app.go
  • internal/app/app_test.go
  • internal/embedding/embedding.go
  • internal/responsecache/handle_request_test.go
  • internal/responsecache/semantic.go
  • internal/responsecache/semantic_test.go
💤 Files with no reviewable changes (3)
  • internal/responsecache/handle_request_test.go
  • internal/responsecache/semantic_test.go
  • go.mod

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

♻️ Duplicate comments (1)
internal/responsecache/vecstore_qdrant.go (1)

85-113: ⚠️ Potential issue | 🟠 Major

Handle collection GET statuses explicitly and validate existing vector size before accepting.

Current flow treats any non-200 as create-path and accepts 200 without checking existing vectors.size. This can mask auth/outage errors and delay dimension mismatch failure until later operations.

Suggested fix
-	resp, err := s.req(ctx, http.MethodGet, "/collections/"+s.collection, nil)
+	resp, err := s.req(ctx, http.MethodGet, "/collections/"+s.collection, nil)
 	if err != nil {
 		return err
 	}
-	defer resp.Body.Close()
-	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
-		return err
-	}
+	raw, _ := io.ReadAll(resp.Body)
+	resp.Body.Close()
 	if resp.StatusCode == http.StatusOK {
+		var meta struct {
+			Result struct {
+				Config struct {
+					Params struct {
+						Vectors struct {
+							Size int `json:"size"`
+						} `json:"vectors"`
+					} `json:"params"`
+				} `json:"config"`
+			} `json:"result"`
+		}
+		if err := json.Unmarshal(raw, &meta); err != nil {
+			return fmt.Errorf("vecstore qdrant: decode collection: %w", err)
+		}
+		if meta.Result.Config.Params.Vectors.Size != dim {
+			return fmt.Errorf("vecstore qdrant: collection %q has vector size %d, expected %d",
+				s.collection, meta.Result.Config.Params.Vectors.Size, dim)
+		}
 		s.vectorSize = dim
 		return nil
 	}
+	if resp.StatusCode != http.StatusNotFound {
+		return fmt.Errorf("vecstore qdrant: get collection: %s: %s", resp.Status, string(raw))
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/responsecache/vecstore_qdrant.go` around lines 85 - 113, When
initializing collection in vecstore_qdrant.go, change the GET/handle logic (the
block using s.req with http.MethodGet and s.vectorSize) to explicitly handle
response statuses: if 200, parse the response JSON to validate that the existing
"vectors.size" equals the provided dim and return an error if it mismatches; if
404, proceed to the create-path (the subsequent http.MethodPut call) to create
the collection; for any other non-200/404 status return an immediate error
including resp.Status and body to surface auth/outage issues. Keep setting
s.vectorSize = dim only after successful validation or successful creation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/responsecache/vecstore_qdrant.go`:
- Around line 204-210: Validate required payload fields on each hit before
decoding and appending: check that hit.Payload["response_b64"] exists and is a
non-empty string and that hit.Payload["cache_key"] is a string (and any other
required fields) before calling base64.StdEncoding.DecodeString or appending to
out; if validation fails, skip the hit. Apply this guard in the loop around the
existing code that sets b64, rb, k and appends VecResult (use the same variables
and the VecResult constructor), so you only decode and append when the types and
values are present and valid.
- Around line 199-203: The loop over parsed.Result in the Search implementation
(the code using parsed.Result, payloadInt64 and now) currently skips expired
points instead of removing them; modify it to collect expired point IDs (from
hit.ID) as you detect them (where exp > 0 && exp < now) and issue a delete call
to the Qdrant client (use the same client method used elsewhere for deletions,
e.g., DeletePoints/Delete with the collection name and the collected IDs) rather
than just continue; ensure you handle and log any delete errors but do not let
delete failures break the search response, and keep the existing
payloadInt64/parsed.Result logic and now check intact.
- Line 45: The constructor currently starts detached background work by
assigning s.cleanup = startVecCleanup(s); remove that call from the constructor
and do not spawn background goroutines there; instead add an explicit lifecycle
API (for example a method StartVecCleanup(ctx context.Context) error or
StartVecCleanup(ctx context.Context) core.GatewayError on the same type) which
invokes startVecCleanup with the provided context and returns a typed error
rather than hiding failures, and ensure you provide a corresponding Stop/Close
method to cancel the cleanup (or return a cleanup cancel function) so callers
manage context, error handling, and cancellation explicitly rather than via the
constructor.

In `@internal/responsecache/vecstore_weaviate.go`:
- Around line 127-130: Replace the hardcoded DNS namespace UUID in objectID with
a custom namespace constant specific to this cache: define a package-level
constant (e.g., cacheNamespaceUUID) holding a freshly-generated UUID and use
uuid.MustParse(cacheNamespaceUUID) in the objectID method; update the
weaviateStore.objectID function to reference that constant so the cache uses a
semantically distinct namespace instead of the RFC4122 DNS namespace.
- Around line 273-276: Add a debug/warn log when base64 decoding fails instead
of silently continuing: in the block where rb, err :=
base64.StdEncoding.DecodeString(it.ResponseB64) is attempted (use the
surrounding function handling Weaviate result items), log the error and the
offending it.ResponseB64 (or its identifier) via slog (add "log/slog" to
imports) before continuing so decode failures are visible for troubleshooting.
- Around line 277-289: The current code mixes Weaviate's legacy certainty and
modern distance semantics (score variable, it.Additional.Certainty,
it.Additional.Distance), causing inconsistent thresholds; remove the legacy
certainty fallback and standardize on distance-based scoring: compute score = 1
- distance (clamped 0..1) when it.Additional.Distance is present, and if
Distance is absent, either skip the result or set score to 0 and log a warning
so we don't mix incompatible certainty semantics into threshold logic. Update
the block referencing score / it.Additional.Certainty / it.Additional.Distance
accordingly.
- Around line 295-312: The current weaviateStore.DeleteExpired builds a GraphQL
delete mutation (unsupported) and returns plain errors; replace it to call the
REST batch delete endpoint by POST/DELETE against s.baseURL+"/v1/batch/objects"
with a JSON body containing the "match" object for class s.class and the "where"
operands (expires_at > 0 and expires_at < now), marshal that body and create an
http.NewRequestWithContext(ctx, http.MethodDelete, ... , bytes.NewReader(b)),
set Content-Type: application/json, invoke s.authHeader(req), perform the
request via s.httpClient.Do(req), read resp body, and convert all failures into
core.GatewayError using core.NewInvalidRequestError for marshal errors and
core.NewProviderError ("weaviate", appropriate HTTP status like 500 or 502,
message, err) for request/response errors; return nil on 2xx. Also add the
"gomodel/core" import.

---

Duplicate comments:
In `@internal/responsecache/vecstore_qdrant.go`:
- Around line 85-113: When initializing collection in vecstore_qdrant.go, change
the GET/handle logic (the block using s.req with http.MethodGet and
s.vectorSize) to explicitly handle response statuses: if 200, parse the response
JSON to validate that the existing "vectors.size" equals the provided dim and
return an error if it mismatches; if 404, proceed to the create-path (the
subsequent http.MethodPut call) to create the collection; for any other
non-200/404 status return an immediate error including resp.Status and body to
surface auth/outage issues. Keep setting s.vectorSize = dim only after
successful validation or successful creation.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: fd2abc8f-566c-4ec5-9d2d-33992d32557b

📥 Commits

Reviewing files that changed from the base of the PR and between f40fb62 and 6a68f14.

📒 Files selected for processing (2)
  • internal/responsecache/vecstore_qdrant.go
  • internal/responsecache/vecstore_weaviate.go

# Conflicts:
#	internal/app/app.go
#	internal/responsecache/responsecache.go
#	internal/responsecache/semantic.go
#	internal/responsecache/semantic_test.go
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (4)
internal/responsecache/vecstore_weaviate.go (2)

292-304: ⚠️ Potential issue | 🟠 Major

Don't mix certainty with distance-derived scores.

Weaviate documents distance as the preferred vector metric and certainty as a cosine-only normalized compatibility field. Using whichever one happens to be present means the same hit can cross your semantic threshold differently depending on server behavior; normalize from distance only, or reject results that don't return it. (docs.weaviate.io)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/responsecache/vecstore_weaviate.go` around lines 292 - 304, The code
mixes Weaviate's `certainty` with `distance` when computing `score` (see `score`
assignment and `it.Additional.Certainty` / `it.Additional.Distance`); change
this to rely exclusively on `distance`: if `it.Additional.Distance` is present,
compute score = 1 - float32(*it.Additional.Distance) and clamp to [0,1]; if
`it.Additional.Distance` is nil, treat the hit as invalid (skip it or return an
error) instead of using `certainty`, ensuring downstream logic only sees scores
derived from distance.

310-357: ⚠️ Potential issue | 🔴 Critical

Move expiry cleanup off the GraphQL path.

The current Weaviate docs describe /v1/graphql as the query/search surface, while object deletion is documented under the object-management APIs. Posting a Delete mutation here makes expiry cleanup depend on an undocumented path instead of the supported delete API, so stale cache rows are likely to accumulate once cleanup starts running. (docs.weaviate.io)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/responsecache/vecstore_weaviate.go` around lines 310 - 357, The
DeleteExpired function currently issues a GraphQL Delete mutation (in
weaviateStore.DeleteExpired) which uses /v1/graphql; change it to use Weaviate's
object-management REST API instead: query for expired object IDs using the
Objects API (filter by class s.class and expires_at < now) and then delete them
via the object-management delete endpoint (DELETE /v1/objects/{id} or use the
batch delete endpoint if available), calling s.authHeader(req) and s.httpClient
for each request, handling non-2xx responses and JSON error envelopes the same
way as before; update the function to build and issue the REST query + per-ID
deletions (or batch deletion), collect and return any errors, and remove the
GraphQL mutation construction and GraphQL POST to s.baseURL+"/v1/graphql".
internal/responsecache/vecstore_qdrant.go (2)

103-113: ⚠️ Potential issue | 🟠 Major

Reject existing collections you can't validate.

If the GET response can't be decoded, or vectors isn't the expected single-vector shape, this branch still sets s.vectorSize = dim and returns success. That allows an incompatible pre-existing collection through startup and defers the failure to the first upsert/search.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/responsecache/vecstore_qdrant.go` around lines 103 - 113, The code
currently sets s.vectorSize = dim even when the GET response or the vectors
field cannot be decoded; change this so we reject the collection when validation
fails: when json.Unmarshal(raw, &info) returns an error return a descriptive
error (including s.collection and the unmarshal error), and if
json.Unmarshal(info.Result.Config.Params.Vectors, &singleVec) fails or
singleVec.Size <= 0 return a descriptive error indicating we could not validate
the collection's vectors; only set s.vectorSize = dim after successfully
validating singleVec.Size and confirming it matches dim (or after validating it
in the existing comparisons).

189-226: ⚠️ Potential issue | 🟠 Major

Expired nearest neighbors can still force false misses.

semantic.go calls Search(..., 1), but this query filters only on params_hash and drops expired hits after the fetch. If the top hit is expired, you return an empty result even when the next live vector would satisfy the threshold. Add expiry to the Qdrant filter or over-fetch until you collect limit live hits.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/responsecache/vecstore_qdrant.go` around lines 189 - 226, The search
currently filters only by params_hash and drops expired hits after fetch, which
can let an expired top hit hide valid neighbors; update the Qdrant query to
filter out expired points server-side by adding an expires_at range condition
(e.g. include a "range" clause on "expires_at" with "gte": now) to the filter
map used when building body (refer to paramsHash and the body variable) so the
response's parsed.Result returns only non-expired points; alternatively
implement over-fetching in the same Search routine by requesting more than limit
and looping (using s.req, qdrantSearchResp/parsed.Result and payloadInt64 to
skip expired entries) until you collect limit live VecResult entries or exhaust
results.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@config/config.go`:
- Around line 651-654: The block that calls os.Getenv("SEMANTIC_CACHE_ENABLED")
and re-parses it is redundant because the same variable was already retrieved
and parsed into v and enabledKeySet earlier; remove the second getenv branch and
instead set sem.Enabled using the previously computed value (use enabledKeySet
to check presence and reuse the boolean parsed from v via parseBool), updating
the sem.Enabled pointer accordingly—refer to the variables enabledKeySet, v,
parseBool and the sem.Enabled assignment in the surrounding code to locate and
apply the change.

In `@internal/responsecache/semantic.go`:
- Around line 436-439: The code currently writes req.Reasoning bytes directly
into the hasher (h.Write(req.Reasoning)), which preserves whitespace and key
order; instead, parse req.Reasoning into an interface{} (or
map[string]interface{}), re-marshal it with json.Marshal to produce a stable
canonical representation, and write that marshaled bytes to h via h.Write; if
json.Unmarshal fails, fall back to writing the original req.Reasoning to avoid
breaking behavior. Apply this change in the block that currently checks
len(req.Reasoning) and calls h.Write, so the hashing uses the re-marshaled JSON.

---

Duplicate comments:
In `@internal/responsecache/vecstore_qdrant.go`:
- Around line 103-113: The code currently sets s.vectorSize = dim even when the
GET response or the vectors field cannot be decoded; change this so we reject
the collection when validation fails: when json.Unmarshal(raw, &info) returns an
error return a descriptive error (including s.collection and the unmarshal
error), and if json.Unmarshal(info.Result.Config.Params.Vectors, &singleVec)
fails or singleVec.Size <= 0 return a descriptive error indicating we could not
validate the collection's vectors; only set s.vectorSize = dim after
successfully validating singleVec.Size and confirming it matches dim (or after
validating it in the existing comparisons).
- Around line 189-226: The search currently filters only by params_hash and
drops expired hits after fetch, which can let an expired top hit hide valid
neighbors; update the Qdrant query to filter out expired points server-side by
adding an expires_at range condition (e.g. include a "range" clause on
"expires_at" with "gte": now) to the filter map used when building body (refer
to paramsHash and the body variable) so the response's parsed.Result returns
only non-expired points; alternatively implement over-fetching in the same
Search routine by requesting more than limit and looping (using s.req,
qdrantSearchResp/parsed.Result and payloadInt64 to skip expired entries) until
you collect limit live VecResult entries or exhaust results.

In `@internal/responsecache/vecstore_weaviate.go`:
- Around line 292-304: The code mixes Weaviate's `certainty` with `distance`
when computing `score` (see `score` assignment and `it.Additional.Certainty` /
`it.Additional.Distance`); change this to rely exclusively on `distance`: if
`it.Additional.Distance` is present, compute score = 1 -
float32(*it.Additional.Distance) and clamp to [0,1]; if `it.Additional.Distance`
is nil, treat the hit as invalid (skip it or return an error) instead of using
`certainty`, ensuring downstream logic only sees scores derived from distance.
- Around line 310-357: The DeleteExpired function currently issues a GraphQL
Delete mutation (in weaviateStore.DeleteExpired) which uses /v1/graphql; change
it to use Weaviate's object-management REST API instead: query for expired
object IDs using the Objects API (filter by class s.class and expires_at < now)
and then delete them via the object-management delete endpoint (DELETE
/v1/objects/{id} or use the batch delete endpoint if available), calling
s.authHeader(req) and s.httpClient for each request, handling non-2xx responses
and JSON error envelopes the same way as before; update the function to build
and issue the REST query + per-ID deletions (or batch deletion), collect and
return any errors, and remove the GraphQL mutation construction and GraphQL POST
to s.baseURL+"/v1/graphql".
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a514c3c2-22d5-48df-be3e-10559d5ec0fc

📥 Commits

Reviewing files that changed from the base of the PR and between 6a68f14 and 670e621.

📒 Files selected for processing (14)
  • config/cache_validation_test.go
  • config/config.go
  • internal/app/app.go
  • internal/app/app_test.go
  • internal/embedding/embedding.go
  • internal/responsecache/handle_request_test.go
  • internal/responsecache/responsecache.go
  • internal/responsecache/semantic.go
  • internal/responsecache/semantic_test.go
  • internal/responsecache/vecstore_cleanup.go
  • internal/responsecache/vecstore_pgvector.go
  • internal/responsecache/vecstore_pinecone.go
  • internal/responsecache/vecstore_qdrant.go
  • internal/responsecache/vecstore_weaviate.go

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (4)
internal/responsecache/vecstore_qdrant.go (4)

228-234: ⚠️ Potential issue | 🟡 Minor

Require payload fields before treating a hit as valid.

If response_b64 is missing, DecodeString("") succeeds and appends an empty cached response; missing cache_key becomes "" as well. Skip the hit unless both fields exist, are strings, and response_b64 is non-empty.

Suggested fix
-		b64, _ := hit.Payload["response_b64"].(string)
-		rb, err := base64.StdEncoding.DecodeString(b64)
+		b64, ok := hit.Payload["response_b64"].(string)
+		if !ok || b64 == "" {
+			continue
+		}
+		k, ok := hit.Payload["cache_key"].(string)
+		if !ok || k == "" {
+			continue
+		}
+		rb, err := base64.StdEncoding.DecodeString(b64)
 		if err != nil {
 			continue
 		}
-		k, _ := hit.Payload["cache_key"].(string)
 		out = append(out, VecResult{
 			Key:      k,
 			Score:    float32(hit.Score),
 			Response: rb,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/responsecache/vecstore_qdrant.go` around lines 228 - 234, Validate
payload fields before decoding: when iterating hits, check that hit.Payload
contains "response_b64" and "cache_key" as non-empty strings (use the map access
on hit.Payload to assert both are strings and that response_b64 != ""), and only
then call base64.StdEncoding.DecodeString on response_b64; if validation or
decoding fails, skip the hit. Update the block that builds VecResult (where out
is appended) to rely on these validated values so empty responses or missing
cache_key are not appended.

45-45: ⚠️ Potential issue | 🟠 Major

Do not start cleanup goroutines from newQdrantStore.

startVecCleanup(s) hides background work inside object construction, so callers cannot bind cleanup to their own context or decide when it should start. Move this to an explicit caller-managed lifecycle step instead of spawning it here.

As per coding guidelines "Do not hide work in detached goroutines; respect context synchronously and return typed core.GatewayError values".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/responsecache/vecstore_qdrant.go` at line 45, newQdrantStore
currently calls startVecCleanup(s) and spawns background work; remove that call
so construction is side-effect free and s.cleanup remains unstarted. Instead
expose an explicit lifecycle API (e.g., a StartVecCleanup(ctx context.Context)
method on the Qdrant store or return a cleanup-start function) that callers
invoke with their context; ensure startVecCleanup accepts context and returns a
typed core.GatewayError (or nil) rather than hiding goroutines, and set
s.cleanup only when the caller successfully starts cleanup so the caller
controls cancellation and lifecycle.

103-113: ⚠️ Potential issue | 🟠 Major

Fail fast when the existing collection config cannot be decoded.

On 200 OK, any decode failure or missing vectors.size still falls through to s.vectorSize = dim. That still accepts incompatible pre-existing collections and defers the breakage to later search/upsert calls. Return an error unless the current vector size is actually decoded and matches dim.

Suggested fix
-		if err := json.Unmarshal(raw, &info); err == nil {
-			var singleVec struct {
-				Size int `json:"size"`
-			}
-			if json.Unmarshal(info.Result.Config.Params.Vectors, &singleVec) == nil && singleVec.Size > 0 {
-				if singleVec.Size != dim {
-					return fmt.Errorf("vecstore qdrant: collection %q has vector size %d but embedder produces %d", s.collection, singleVec.Size, dim)
-				}
-			}
-		}
+		if err := json.Unmarshal(raw, &info); err != nil {
+			return fmt.Errorf("vecstore qdrant: decode collection: %w", err)
+		}
+		var singleVec struct {
+			Size int `json:"size"`
+		}
+		if err := json.Unmarshal(info.Result.Config.Params.Vectors, &singleVec); err != nil || singleVec.Size == 0 {
+			return fmt.Errorf("vecstore qdrant: collection %q returned an unsupported vector config", s.collection)
+		}
+		if singleVec.Size != dim {
+			return fmt.Errorf("vecstore qdrant: collection %q has vector size %d but embedder produces %d", s.collection, singleVec.Size, dim)
+		}
 		s.vectorSize = dim
 		return nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/responsecache/vecstore_qdrant.go` around lines 103 - 113, The code
currently ignores failures decoding the collection config and still sets
s.vectorSize = dim; change the logic in the block that unmarshals raw -> info
and info.Result.Config.Params.Vectors so that any decode failure or a
missing/zero vectors.size returns an error instead of falling through.
Specifically, in vecstore_qdrant.go around the json.Unmarshal of raw into info
and the subsequent unmarshal into singleVec, if the first unmarshal returns err
!= nil return a descriptive fmt.Errorf, if the second unmarshal fails or
singleVec.Size == 0 return an error, and only when singleVec.Size > 0 and equals
dim set s.vectorSize = dim (otherwise return the mismatched-size error using
s.collection and dim).

189-201: ⚠️ Potential issue | 🟠 Major

Client-side expiry pruning can return false misses.

The search asks the backend for only limit hits and then drops expired ones locally. The current call site in internal/responsecache/semantic.go always passes limit == 1, so one stale top hit makes this return a miss even when a lower-ranked live entry exists. Filter non-expired points before the backend enforces the limit, or overfetch and prune locally, and best-effort delete expired IDs you encounter.

Also applies to: 223-227

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/responsecache/vecstore_qdrant.go` around lines 189 - 201, The search
currently asks Qdrant for exactly "limit" results then prunes expired entries
locally so a stale top hit can hide a valid lower hit; update the code that
builds the search "body" and the surrounding search logic in vecstore_qdrant.go
(the map named "filter" and "body" which contains keys "vector", "limit",
"with_payload", and the params_hash clause) to avoid false misses: either add a
server-side filter clause that excludes expired points (e.g., add a condition on
your expiry payload field to be > current time) so Qdrant applies expiry before
enforcing "limit", or overfetch by requesting more results (e.g., multiply limit
by a safe factor or cap) and then locally prune expired entries and, for any
expired IDs encountered, issue a best-effort delete/remove call via your Qdrant
delete API; apply the same fix to the other search site around the second
snippet (the block noted at 223-227).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/responsecache/vecstore_qdrant.go`:
- Around line 275-283: The DeleteExpired code treats any non-2xx response as an
error; change it to treat http.StatusNotFound as a no-op (like Search does) so a
missing collection isn't returned as an error. Specifically, in the block around
the call to s.req(... "/points/delete?wait=true", body) and the subsequent
resp.StatusCode check, detect if resp.StatusCode == http.StatusNotFound and
simply close the response and return nil; otherwise keep the existing behavior
of reading the body and returning an error for other non-2xx statuses. Ensure
you still defer/close resp.Body and preserve the fmt.Errorf message for non-404
errors.

---

Duplicate comments:
In `@internal/responsecache/vecstore_qdrant.go`:
- Around line 228-234: Validate payload fields before decoding: when iterating
hits, check that hit.Payload contains "response_b64" and "cache_key" as
non-empty strings (use the map access on hit.Payload to assert both are strings
and that response_b64 != ""), and only then call base64.StdEncoding.DecodeString
on response_b64; if validation or decoding fails, skip the hit. Update the block
that builds VecResult (where out is appended) to rely on these validated values
so empty responses or missing cache_key are not appended.
- Line 45: newQdrantStore currently calls startVecCleanup(s) and spawns
background work; remove that call so construction is side-effect free and
s.cleanup remains unstarted. Instead expose an explicit lifecycle API (e.g., a
StartVecCleanup(ctx context.Context) method on the Qdrant store or return a
cleanup-start function) that callers invoke with their context; ensure
startVecCleanup accepts context and returns a typed core.GatewayError (or nil)
rather than hiding goroutines, and set s.cleanup only when the caller
successfully starts cleanup so the caller controls cancellation and lifecycle.
- Around line 103-113: The code currently ignores failures decoding the
collection config and still sets s.vectorSize = dim; change the logic in the
block that unmarshals raw -> info and info.Result.Config.Params.Vectors so that
any decode failure or a missing/zero vectors.size returns an error instead of
falling through. Specifically, in vecstore_qdrant.go around the json.Unmarshal
of raw into info and the subsequent unmarshal into singleVec, if the first
unmarshal returns err != nil return a descriptive fmt.Errorf, if the second
unmarshal fails or singleVec.Size == 0 return an error, and only when
singleVec.Size > 0 and equals dim set s.vectorSize = dim (otherwise return the
mismatched-size error using s.collection and dim).
- Around line 189-201: The search currently asks Qdrant for exactly "limit"
results then prunes expired entries locally so a stale top hit can hide a valid
lower hit; update the code that builds the search "body" and the surrounding
search logic in vecstore_qdrant.go (the map named "filter" and "body" which
contains keys "vector", "limit", "with_payload", and the params_hash clause) to
avoid false misses: either add a server-side filter clause that excludes expired
points (e.g., add a condition on your expiry payload field to be > current time)
so Qdrant applies expiry before enforcing "limit", or overfetch by requesting
more results (e.g., multiply limit by a safe factor or cap) and then locally
prune expired entries and, for any expired IDs encountered, issue a best-effort
delete/remove call via your Qdrant delete API; apply the same fix to the other
search site around the second snippet (the block noted at 223-227).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b0c32747-afa4-4d80-a076-fb5186d5c91a

📥 Commits

Reviewing files that changed from the base of the PR and between 670e621 and 6a15378.

📒 Files selected for processing (1)
  • internal/responsecache/vecstore_qdrant.go

giorgi-pro and others added 8 commits April 1, 2026 23:48
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
# Conflicts:
#	internal/responsecache/semantic.go
@giorgi-pro giorgi-pro merged commit 9b30694 into main Apr 3, 2026
15 checks passed
@giorgi-pro giorgi-pro deleted the chore/improve_semantich_cache branch April 3, 2026 08:26