chore(caching): improve semantic cache#198
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the CodeRabbit settings.
📝 Walkthrough

Response-cache config moved from env-tagged scalars to optional YAML pointer blocks.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant Cache as Response Cache
    participant Sem as Semantic Layer
    participant Embed as Embedder (API)
    participant Provider as Resolved Provider
    participant Vec as Vector Store
    Client->>Cache: Request
    Cache->>Cache: Check exact (Redis) if simple enabled
    alt Simple hit
        Cache-->>Client: Return cached response
    else
        Cache->>Sem: Normalize input/messages, compute params hash
        Sem->>Embed: Request embedding for text
        Embed->>Provider: Use credential-resolved provider (base_url + /v1/embeddings)
        Embed-->>Sem: Embedding vector
        Sem->>Vec: Search by params_hash + similarity
        alt Semantic hit
            Vec-->>Sem: Return best response
            Sem-->>Cache: Provide cached response
            Cache-->>Client: Return cached response
        else
            Cache-->>Client: Forward to app
            Client->>Cache: App response
            Cache->>Sem: Insert vector+response with TTL
            Sem->>Vec: Upsert vector (store expires_at)
            Vec-->>Sem: Stored
            Cache-->>Client: Return response
        end
    end
    Note over Vec: Background cleanup deletes expired entries ~hourly
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
internal/responsecache/responsecache.go (1)
58-68: ⚠️ Potential issue | 🟠 Major

Tear down the simple cache on semantic-cache init errors.

If exact-cache initialization succeeds and semantic init later fails, the Redis store already held by `m.simple` is leaked. That is a common startup misconfiguration path and leaves open connections behind.

♻️ Proposed fix

```diff
 sem := cfg.Semantic
 if sem != nil && config.SemanticCacheActive(sem) {
 	emb, err := embedding.NewEmbedder(sem.Embedder, resolvedProviders)
 	if err != nil {
+		if m.simple != nil {
+			_ = m.simple.close()
+		}
 		return nil, err
 	}
 	vs, err := NewVecStore(sem.VectorStore)
 	if err != nil {
 		_ = emb.Close()
+		if m.simple != nil {
+			_ = m.simple.close()
+		}
 		return nil, err
 	}
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@internal/responsecache/responsecache.go` around lines 58-68: The simple-cache (m.simple) is leaked when semantic-cache init fails; ensure you tear down m.simple on any early return during semantic initialization by calling its close/teardown method (e.g., m.simple.Close() or the appropriate shutdown method) before returning; specifically update the semantic init path around embedding.NewEmbedder, NewVecStore, and newSemanticCacheMiddleware so that if embedding.NewEmbedder(sem.Embedder, ...), NewVecStore(sem.VectorStore) or subsequent steps return an error you first close/teardown m.simple (and also close emb when created) to avoid leaving open Redis connections.

internal/responsecache/semantic.go (1)
38-45: ⚠️ Potential issue | 🟠 Major

Hash the effective embedder identity, not the raw config.

`internal/embedding.NewEmbedder` trims/defaults/normalizes the model, but `embedderIdentity` still hashes `cfg.Embedder.Model` verbatim. That splits the semantic namespace for the same actual embedder (`""` vs an explicit default model, whitespace variants, Gemini normalization), and every vecstore lookup matches `params_hash` exactly. Please derive this from the resolved provider plus the effective model actually used by the embedder.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/responsecache/semantic.go` around lines 38 - 45, The embedderIdentity is currently built from cfg.Embedder.Model (in newSemanticCacheMiddleware) which can differ from the actual normalized model used by embedding.NewEmbedder; change it to derive the identity from the resolved embedder instance (the emb parameter) — i.e., obtain the embedder's effective provider and model from emb (using the embedder's API such as its resolved Provider/Model or Identity accessor) and hash/concatenate those values into semanticCacheMiddleware.embedderIdentity so lookups use the embedder's normalized identity rather than the raw config.
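The normalization the comment above asks for can be sketched as follows. This is a minimal illustration, not the gateway's actual API: the function name and the default-model constant are assumptions; the point is that the identity hash is computed from the normalized provider/model pair rather than the raw config values, so `""`, whitespace variants, and the explicit default all land in the same semantic-cache namespace.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// effectiveIdentity hashes the normalized provider/model pair. The default
// model here is a placeholder for whatever embedding.NewEmbedder actually
// applies; in the real fix the value should come from the resolved embedder.
func effectiveIdentity(provider, model string) string {
	const defaultModel = "text-embedding-3-small" // assumed default, illustrative only
	m := strings.TrimSpace(model)
	if m == "" {
		m = defaultModel
	}
	sum := sha256.Sum256([]byte(strings.ToLower(strings.TrimSpace(provider)) + "|" + m))
	return hex.EncodeToString(sum[:])
}

func main() {
	// "" (defaulted) and the padded explicit default map to one identity.
	fmt.Println(effectiveIdentity("openai", "") ==
		effectiveIdentity("openai", " text-embedding-3-small "))
}
```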
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@config/config.go`:
- Around line 538-540: The validator currently treats zero as missing because
defaults are applied during merge, breaking intended semantics where sem.TTL ==
0 and sem.MaxConversationMessages == 0 are valid; change the semantic cache
config to track presence instead of using zero-as-missing (e.g., make TTL and
MaxConversationMessages pointer types or add explicit boolean flags like
TTLSet/MaxConversationMessagesSet) so merges preserve an explicit 0, update the
merge/constructor logic that sets defaults to only apply defaults when the
presence flag or pointer is nil, and keep the existing validator check (sem.TTL
< 0) but use sem.TTL presence to decide whether to use a default value.
- Around line 618-624: The env parsing silently ignores malformed numeric values
(e.g., REDIS_TTL_RESPONSES) causing silent fallback to defaults; update the
parsing blocks (e.g., handling REDIS_TTL_RESPONSES into RedisResponseConfig.TTL
and the other numeric envs in the 646-705 range) to validate errors from
strconv.Atoi/ParseFloat and fail-fast on parse errors—propagate or return a
startup error (or log and exit) instead of swallowing the error, and ensure
behavior is consistent with applyEnvOverrides by referencing the same
validation/failure pattern for functions/structs like RedisResponseConfig,
applyEnvOverrides and the Redis TTL/threshold/dimension env handlers.
- Around line 554-561: The environment override for
RESPONSE_CACHE_SIMPLE_ENABLED isn't being applied when resp.Simple already
exists because the merge helper skips updating the existing SimpleCacheConfig;
update the merge/override logic to explicitly set or overwrite s.Enabled on an
existing *SimpleCacheConfig based on the env var (i.e., when parsing the env
var, write the boolean into resp.Simple.Enabled even if resp.Simple != nil) so
SimpleCacheEnabled will observe the override, and apply the same change to the
other analogous merge helper referenced (the same pattern at the other
SimpleCache handling site).
In `@internal/responsecache/vecstore_cleanup.go`:
- Around line 17-43: The cleanup goroutine started by startVecCleanup currently
uses context.Background() and can block vecCleanup.close() forever; change the
lifecycle to be caller-controlled by (1) changing startVecCleanup to accept a
context (ctx) or instead expose a Run(ctx) / Start(ctx) method on vecCleanup so
the caller provides the cancelable context, (2) call store.DeleteExpired(ctx)
(not context.Background()) inside the ticker loop so the backend call is
cancellable, (3) stop creating a detached goroutine that hides work—have the
caller start the goroutine using the provided ctx or provide a Cancel/Close that
cancels the ctx so wg.Wait() cannot block, and (4) ensure any errors returned
from DeleteExpired are wrapped/returned as typed core.GatewayError (rather than
only logged) so callers can handle them; update references to
vecStoreCleanupInterval, startVecCleanup, vecCleanup.close, and DeleteExpired
accordingly.
In `@internal/responsecache/vecstore_pgvector.go`:
- Around line 53-61: The table DDL only defines PRIMARY KEY (cache_key,
params_hash) so queries that filter by params_hash (used by Search) and by
expires_at (used by DeleteExpired) cannot use efficient b-tree indexes; add
separate indexes on params_hash and expires_at (e.g., CREATE INDEX ON <table>
(params_hash); CREATE INDEX ON <table> (expires_at);) so Search and
DeleteExpired can use index scans; update the DDL building code that creates the
table (the ddl variable in vecstore_pgvector.go) to include these CREATE INDEX
statements or include them as part of migration logic.
- Around line 66-67: The constructor currently starts a detached cleanup
goroutine via startVecCleanup(s) which calls DeleteExpired(context.Background())
and only logs errors; remove the call to startVecCleanup from the constructor
(do not hide cleanup work), instead expose an explicit, caller-invokable cleanup
function or method on the returned store (e.g., VecStore.Cleanup or return the
cleanup func) that accepts a context; implement that cleanup to call
s.DeleteExpired(ctx) and propagate any error as a typed core.GatewayError rather
than silently logging it so callers control cancellation and error handling;
update references to the cleanup field (s.cleanup) and startVecCleanup to match
this new explicit, context-respecting API.
In `@internal/responsecache/vecstore_pinecone.go`:
- Around line 127-183: Search currently removes expired matches only after
Pinecone returns results in func (s *pineconeStore) Search, which can make
limit==1 miss live neighbors; fix by adding an expires_at filter to the query
body (or by over-fetching until you collect limit live results). Concretely, in
Search update the request body construction (the body variable) to include a
filter condition for expires_at > now (compute now := time.Now().Unix() and add
something like "expires_at": {"$gt": now} alongside the existing "params_hash"
filter) so Pinecone excludes expired vectors server-side; alternatively
implement over-fetching by increasing "topK" (e.g., limit*3 with a safe cap) and
in the parsed.Matches loop collect up to limit live VecResult entries before
returning. Ensure the change references body, topK, and the parsed.Matches
processing in Search.
- Around line 44-52: The constructor for pineconeStore (where pineconeStore is
created and s.cleanup = startVecCleanup(s) is invoked) must not start background
cleanup work; remove the call to startVecCleanup from the vecstore constructor
and stop spawning detached goroutines there. Instead expose a method (or return
the store without scheduling) so the owning service can explicitly start the
DeleteExpired scheduling via startVecCleanup(s) or a StartCleanup method during
service startup and call the store.Close/cleanup function during shutdown; apply
the same change pattern to the other new vecstores so cleanup scheduling is
owned by the service lifecycle and ensure returned errors follow the typed
core.GatewayError convention rather than hiding async failures.
In `@internal/responsecache/vecstore_qdrant.go`:
- Around line 76-111: In ensureCollection (method on qdrantStore) change the GET
response handling to parse the response body JSON when resp.StatusCode ==
http.StatusOK, extract the existing vectors.size (and ensure distance if
desired), compare it to dim and return an error if they differ (instead of
silently accepting); only attempt to create the collection when resp.StatusCode
== http.StatusNotFound (404); for other non-200 responses (e.g., auth/outage)
return an error including status and response body; keep updating s.vectorSize =
dim after successful validation or creation.
In `@internal/responsecache/vecstore_weaviate.go`:
- Around line 170-193: Update the GraphQL queries so they filter out expired
rows instead of relying on post-fetch filtering: in the Search method build a
where clause that includes an additional condition on "expires_at" (e.g.
operator GreaterThan / valueDateTime compared to time.Now().UTC() formatted
RFC3339) alongside the params_hash condition so expired entries are never
returned by the query; make the same change to the other query block referenced
around the 267-269 diff. Locate and modify the Search function and the other
GraphQL-builder (both of which call runGraphQLSearch) to inject the current UTC
timestamp into the where filter for expires_at.
- Around line 93-121: The current logic attempts to create the Weaviate class on
any non-200 from the GET /v1/schema/{class}; change it so only a 404 triggers
the POST create flow: after the initial GET (where resp is returned), read
resp.Body into raw and if resp.StatusCode == http.StatusOK return nil; if
resp.StatusCode == http.StatusNotFound continue to build and POST the schema
(existing req2/resp2 flow); otherwise return an error (e.g.,
fmt.Errorf("vecstore weaviate: get class: %s: %s", resp.Status, raw)) so
401/403/5xx and transient errors are surfaced instead of attempting a create.
Ensure you reference resp, resp.Body, s.class and s.baseURL when implementing
this.
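The combined where clause the Weaviate prompts describe can be sketched as a filter builder. This assumes `expires_at` is stored as a unix-seconds integer (other stores in this PR use `time.Now().Unix()`); if the property is a Weaviate date type, the operand would use `valueDate` with an RFC3339 timestamp instead, as the review comment suggests.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// buildWhere combines the params_hash equality with expires_at > now, so
// expired objects are excluded by the GraphQL query itself instead of
// being dropped after the fetch.
func buildWhere(paramsHash string, now int64) map[string]any {
	return map[string]any{
		"operator": "And",
		"operands": []map[string]any{
			{
				"path":        []string{"params_hash"},
				"operator":    "Equal",
				"valueString": paramsHash,
			},
			{
				"path":     []string{"expires_at"},
				"operator": "GreaterThan",
				"valueInt": now,
			},
		},
	}
}

func main() {
	b, _ := json.Marshal(buildWhere("abc", time.Now().Unix()))
	fmt.Println(string(b))
}
```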
---
Outside diff comments:
In `@internal/responsecache/responsecache.go`:
- Around line 58-68: The simple-cache (m.simple) is leaked when semantic-cache
init fails; ensure you tear down m.simple on any early return during semantic
initialization by calling its close/teardown method (e.g., m.simple.Close() or
the appropriate shutdown method) before returning; specifically update the
semantic init path around embedding.NewEmbedder, NewVecStore, and
newSemanticCacheMiddleware so that if embedding.NewEmbedder(sem.Embedder, ...),
NewVecStore(sem.VectorStore) or subsequent steps return an error you first
close/teardown m.simple (and also close emb when created) to avoid leaving open
Redis connections.
In `@internal/responsecache/semantic.go`:
- Around line 38-45: The embedderIdentity is currently built from
cfg.Embedder.Model (in newSemanticCacheMiddleware) which can differ from the
actual normalized model used by embedding.NewEmbedder; change it to derive the
identity from the resolved embedder instance (the emb parameter) — i.e., obtain
the embedder's effective provider and model from emb (using the embedder's API
such as its resolved Provider/Model or Identity accessor) and hash/concatenate
those values into semanticCacheMiddleware.embedderIdentity so lookups use the
embedder's normalized identity rather than the raw config.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7e5a1c4e-d446-4536-adbf-54a07dac3d89
⛔ Files ignored due to path filters (1)
`go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (31)
- .env.template
- CLAUDE.md
- Makefile
- README.md
- config/cache_validation_test.go
- config/config.example.yaml
- config/config.go
- config/config_test.go
- docker-compose.yaml
- docs/adr/0006-semantic-response-cache.md
- go.mod
- helm/templates/configmap.yaml
- internal/app/app.go
- internal/embedding/embedding.go
- internal/embedding/embedding_test.go
- internal/providers/config.go
- internal/providers/config_test.go
- internal/providers/init.go
- internal/responsecache/handle_request_test.go
- internal/responsecache/responsecache.go
- internal/responsecache/semantic.go
- internal/responsecache/semantic_extract_test.go
- internal/responsecache/semantic_test.go
- internal/responsecache/vecstore.go
- internal/responsecache/vecstore_cleanup.go
- internal/responsecache/vecstore_helpers.go
- internal/responsecache/vecstore_pgvector.go
- internal/responsecache/vecstore_pinecone.go
- internal/responsecache/vecstore_qdrant.go
- internal/responsecache/vecstore_sqlite.go
- internal/responsecache/vecstore_weaviate.go
💤 Files with no reviewable changes (4)
- internal/responsecache/semantic_test.go
- internal/responsecache/handle_request_test.go
- go.mod
- internal/responsecache/vecstore_sqlite.go
# Conflicts: # go.mod # go.sum # internal/app/app.go # internal/embedding/embedding.go
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/app/app.go`:
- Line 341: The review says the extra warning about missing providers is
redundant; remove any added warning/log around
responsecache.NewResponseCacheMiddleware or embedding.NewEmbedder and rely on
the existing synchronous error returned from
responsecache.NewResponseCacheMiddleware (rcm, err) which already surfaces
embedding.NewEmbedder's detailed error that includes the provider name and
troubleshooting guidance; keep the current error propagation flow using
providerResult.CredentialResolvedProviders and do not introduce an additional
log or warning for missing providers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 144af659-e38c-40b2-9139-af171b554b46
⛔ Files ignored due to path filters (1)
`go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (9)
- CLAUDE.md
- Makefile
- go.mod
- internal/app/app.go
- internal/app/app_test.go
- internal/embedding/embedding.go
- internal/responsecache/handle_request_test.go
- internal/responsecache/semantic.go
- internal/responsecache/semantic_test.go
💤 Files with no reviewable changes (3)
- internal/responsecache/handle_request_test.go
- internal/responsecache/semantic_test.go
- go.mod
Actionable comments posted: 7
♻️ Duplicate comments (1)
internal/responsecache/vecstore_qdrant.go (1)
85-113: ⚠️ Potential issue | 🟠 Major

Handle collection GET statuses explicitly and validate existing vector size before accepting.

Current flow treats any non-200 as create-path and accepts 200 without checking existing `vectors.size`. This can mask auth/outage errors and delay dimension mismatch failure until later operations.

Suggested fix

```diff
 resp, err := s.req(ctx, http.MethodGet, "/collections/"+s.collection, nil)
 if err != nil {
 	return err
 }
-defer resp.Body.Close()
-if _, err := io.Copy(io.Discard, resp.Body); err != nil {
-	return err
-}
+raw, _ := io.ReadAll(resp.Body)
+resp.Body.Close()
 if resp.StatusCode == http.StatusOK {
+	var meta struct {
+		Result struct {
+			Config struct {
+				Params struct {
+					Vectors struct {
+						Size int `json:"size"`
+					} `json:"vectors"`
+				} `json:"params"`
+			} `json:"config"`
+		} `json:"result"`
+	}
+	if err := json.Unmarshal(raw, &meta); err != nil {
+		return fmt.Errorf("vecstore qdrant: decode collection: %w", err)
+	}
+	if meta.Result.Config.Params.Vectors.Size != dim {
+		return fmt.Errorf("vecstore qdrant: collection %q has vector size %d, expected %d",
+			s.collection, meta.Result.Config.Params.Vectors.Size, dim)
+	}
 	s.vectorSize = dim
 	return nil
 }
+if resp.StatusCode != http.StatusNotFound {
+	return fmt.Errorf("vecstore qdrant: get collection: %s: %s", resp.Status, string(raw))
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/responsecache/vecstore_qdrant.go` around lines 85 - 113, When initializing collection in vecstore_qdrant.go, change the GET/handle logic (the block using s.req with http.MethodGet and s.vectorSize) to explicitly handle response statuses: if 200, parse the response JSON to validate that the existing "vectors.size" equals the provided dim and return an error if it mismatches; if 404, proceed to the create-path (the subsequent http.MethodPut call) to create the collection; for any other non-200/404 status return an immediate error including resp.Status and body to surface auth/outage issues. Keep setting s.vectorSize = dim only after successful validation or successful creation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/responsecache/vecstore_qdrant.go`:
- Around line 204-210: Validate required payload fields on each hit before
decoding and appending: check that hit.Payload["response_b64"] exists and is a
non-empty string and that hit.Payload["cache_key"] is a string (and any other
required fields) before calling base64.StdEncoding.DecodeString or appending to
out; if validation fails, skip the hit. Apply this guard in the loop around the
existing code that sets b64, rb, k and appends VecResult (use the same variables
and the VecResult constructor), so you only decode and append when the types and
values are present and valid.
- Around line 199-203: The loop over parsed.Result in the Search implementation
(the code using parsed.Result, payloadInt64 and now) currently skips expired
points instead of removing them; modify it to collect expired point IDs (from
hit.ID) as you detect them (where exp > 0 && exp < now) and issue a delete call
to the Qdrant client (use the same client method used elsewhere for deletions,
e.g., DeletePoints/Delete with the collection name and the collected IDs) rather
than just continue; ensure you handle and log any delete errors but do not let
delete failures break the search response, and keep the existing
payloadInt64/parsed.Result logic and now check intact.
- Line 45: The constructor currently starts detached background work by
assigning s.cleanup = startVecCleanup(s); remove that call from the constructor
and do not spawn background goroutines there; instead add an explicit lifecycle
API (for example a method StartVecCleanup(ctx context.Context) error or
StartVecCleanup(ctx context.Context) core.GatewayError on the same type) which
invokes startVecCleanup with the provided context and returns a typed error
rather than hiding failures, and ensure you provide a corresponding Stop/Close
method to cancel the cleanup (or return a cleanup cancel function) so callers
manage context, error handling, and cancellation explicitly rather than via the
constructor.
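The payload guard described in the first Qdrant comment above can be sketched as a small decode helper (the name is illustrative): every required field is type-checked and non-empty before base64 decoding, so a malformed point is skipped instead of producing a corrupt result.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// decodeHit validates cache_key and response_b64 before decoding; a hit
// with a missing, empty, or wrongly typed field is skipped (ok == false).
func decodeHit(payload map[string]any) (cacheKey string, response []byte, ok bool) {
	b64, ok1 := payload["response_b64"].(string)
	key, ok2 := payload["cache_key"].(string)
	if !ok1 || !ok2 || b64 == "" || key == "" {
		return "", nil, false
	}
	rb, err := base64.StdEncoding.DecodeString(b64)
	if err != nil {
		return "", nil, false
	}
	return key, rb, true
}

func main() {
	// cache_key has the wrong type, so the hit is skipped rather than decoded.
	_, _, ok := decodeHit(map[string]any{"cache_key": 42, "response_b64": "aGk="})
	fmt.Println(ok)
}
```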
In `@internal/responsecache/vecstore_weaviate.go`:
- Around line 127-130: Replace the hardcoded DNS namespace UUID in objectID with
a custom namespace constant specific to this cache: define a package-level
constant (e.g., cacheNamespaceUUID) holding a freshly-generated UUID and use
uuid.MustParse(cacheNamespaceUUID) in the objectID method; update the
weaviateStore.objectID function to reference that constant so the cache uses a
semantically distinct namespace instead of the RFC4122 DNS namespace.
- Around line 273-276: Add a debug/warn log when base64 decoding fails instead
of silently continuing: in the block where rb, err :=
base64.StdEncoding.DecodeString(it.ResponseB64) is attempted (use the
surrounding function handling Weaviate result items), log the error and the
offending it.ResponseB64 (or its identifier) via slog (add "log/slog" to
imports) before continuing so decode failures are visible for troubleshooting.
- Around line 277-289: The current code mixes Weaviate's legacy certainty and
modern distance semantics (score variable, it.Additional.Certainty,
it.Additional.Distance), causing inconsistent thresholds; remove the legacy
certainty fallback and standardize on distance-based scoring: compute score = 1
- distance (clamped 0..1) when it.Additional.Distance is present, and if
Distance is absent, either skip the result or set score to 0 and log a warning
so we don't mix incompatible certainty semantics into threshold logic. Update
the block referencing score / it.Additional.Certainty / it.Additional.Distance
accordingly.
- Around line 295-312: The current weaviateStore.DeleteExpired builds a GraphQL
delete mutation (unsupported) and returns plain errors; replace it to call the
REST batch delete endpoint by POST/DELETE against s.baseURL+"/v1/batch/objects"
with a JSON body containing the "match" object for class s.class and the "where"
operands (expires_at > 0 and expires_at < now), marshal that body and create an
http.NewRequestWithContext(ctx, http.MethodDelete, ... , bytes.NewReader(b)),
set Content-Type: application/json, invoke s.authHeader(req), perform the
request via s.httpClient.Do(req), read resp body, and convert all failures into
core.GatewayError using core.NewInvalidRequestError for marshal errors and
core.NewProviderError ("weaviate", appropriate HTTP status like 500 or 502,
message, err) for request/response errors; return nil on 2xx. Also add the
"gomodel/core" import.
---
Duplicate comments:
In `@internal/responsecache/vecstore_qdrant.go`:
- Around line 85-113: When initializing collection in vecstore_qdrant.go, change
the GET/handle logic (the block using s.req with http.MethodGet and
s.vectorSize) to explicitly handle response statuses: if 200, parse the response
JSON to validate that the existing "vectors.size" equals the provided dim and
return an error if it mismatches; if 404, proceed to the create-path (the
subsequent http.MethodPut call) to create the collection; for any other
non-200/404 status return an immediate error including resp.Status and body to
surface auth/outage issues. Keep setting s.vectorSize = dim only after
successful validation or successful creation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: fd2abc8f-566c-4ec5-9d2d-33992d32557b
📒 Files selected for processing (2)
- internal/responsecache/vecstore_qdrant.go
- internal/responsecache/vecstore_weaviate.go
# Conflicts: # internal/app/app.go # internal/responsecache/responsecache.go # internal/responsecache/semantic.go # internal/responsecache/semantic_test.go
Actionable comments posted: 2
♻️ Duplicate comments (4)
internal/responsecache/vecstore_weaviate.go (2)
292-304: ⚠️ Potential issue | 🟠 Major

Don't mix `certainty` with distance-derived scores.

Weaviate documents `distance` as the preferred vector metric and `certainty` as a cosine-only normalized compatibility field. Using whichever one happens to be present means the same hit can cross your semantic threshold differently depending on server behavior; normalize from `distance` only, or reject results that don't return it. (docs.weaviate.io)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/responsecache/vecstore_weaviate.go` around lines 292 - 304, The code mixes Weaviate's `certainty` with `distance` when computing `score` (see `score` assignment and `it.Additional.Certainty` / `it.Additional.Distance`); change this to rely exclusively on `distance`: if `it.Additional.Distance` is present, compute score = 1 - float32(*it.Additional.Distance) and clamp to [0,1]; if `it.Additional.Distance` is nil, treat the hit as invalid (skip it or return an error) instead of using `certainty`, ensuring downstream logic only sees scores derived from distance.
310-357: ⚠️ Potential issue | 🔴 Critical

Move expiry cleanup off the GraphQL path.

The current Weaviate docs describe `/v1/graphql` as the query/search surface, while object deletion is documented under the object-management APIs. Posting a `Delete` mutation here makes expiry cleanup depend on an undocumented path instead of the supported delete API, so stale cache rows are likely to accumulate once cleanup starts running. (docs.weaviate.io)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/responsecache/vecstore_weaviate.go` around lines 310 - 357, The DeleteExpired function currently issues a GraphQL Delete mutation (in weaviateStore.DeleteExpired) which uses /v1/graphql; change it to use Weaviate's object-management REST API instead: query for expired object IDs using the Objects API (filter by class s.class and expires_at < now) and then delete them via the object-management delete endpoint (DELETE /v1/objects/{id} or use the batch delete endpoint if available), calling s.authHeader(req) and s.httpClient for each request, handling non-2xx responses and JSON error envelopes the same way as before; update the function to build and issue the REST query + per-ID deletions (or batch deletion), collect and return any errors, and remove the GraphQL mutation construction and GraphQL POST to s.baseURL+"/v1/graphql".

internal/responsecache/vecstore_qdrant.go (2)
103-113: ⚠️ Potential issue | 🟠 Major

Reject existing collections you can't validate.

If the GET response can't be decoded, or `vectors` isn't the expected single-vector shape, this branch still sets `s.vectorSize = dim` and returns success. That allows an incompatible pre-existing collection through startup and defers the failure to the first upsert/search.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/responsecache/vecstore_qdrant.go` around lines 103 - 113, The code currently sets s.vectorSize = dim even when the GET response or the vectors field cannot be decoded; change this so we reject the collection when validation fails: when json.Unmarshal(raw, &info) returns an error return a descriptive error (including s.collection and the unmarshal error), and if json.Unmarshal(info.Result.Config.Params.Vectors, &singleVec) fails or singleVec.Size <= 0 return a descriptive error indicating we could not validate the collection's vectors; only set s.vectorSize = dim after successfully validating singleVec.Size and confirming it matches dim (or after validating it in the existing comparisons).
189-226: ⚠️ Potential issue | 🟠 Major

Expired nearest neighbors can still force false misses.

`semantic.go` calls `Search(..., 1)`, but this query filters only on `params_hash` and drops expired hits after the fetch. If the top hit is expired, you return an empty result even when the next live vector would satisfy the threshold. Add expiry to the Qdrant filter or over-fetch until you collect `limit` live hits.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/responsecache/vecstore_qdrant.go` around lines 189 - 226, The search currently filters only by params_hash and drops expired hits after fetch, which can let an expired top hit hide valid neighbors; update the Qdrant query to filter out expired points server-side by adding an expires_at range condition (e.g. include a "range" clause on "expires_at" with "gte": now) to the filter map used when building body (refer to paramsHash and the body variable) so the response's parsed.Result returns only non-expired points; alternatively implement over-fetching in the same Search routine by requesting more than limit and looping (using s.req, qdrantSearchResp/parsed.Result and payloadInt64 to skip expired entries) until you collect limit live VecResult entries or exhaust results.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@config/config.go`:
- Around line 651-654: The block that calls os.Getenv("SEMANTIC_CACHE_ENABLED")
and re-parses it is redundant because the same variable was already retrieved
and parsed into v and enabledKeySet earlier; remove the second getenv branch and
instead set sem.Enabled using the previously computed value (use enabledKeySet
to check presence and reuse the boolean parsed from v via parseBool), updating
the sem.Enabled pointer accordingly—refer to the variables enabledKeySet, v,
parseBool and the sem.Enabled assignment in the surrounding code to locate and
apply the change.
In `@internal/responsecache/semantic.go`:
- Around line 436-439: The code currently writes req.Reasoning bytes directly
into the hasher (h.Write(req.Reasoning)), which preserves whitespace and key
order; instead, parse req.Reasoning into an interface{} (or
map[string]interface{}), re-marshal it with json.Marshal to produce a stable
canonical representation, and write that marshaled bytes to h via h.Write; if
json.Unmarshal fails, fall back to writing the original req.Reasoning to avoid
breaking behavior. Apply this change in the block that currently checks
len(req.Reasoning) and calls h.Write, so the hashing uses the re-marshaled JSON.
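The canonicalization step this prompt describes can be sketched as follows; `canonicalJSON` is a hypothetical helper name. It relies on `encoding/json` marshaling map keys in sorted order, and falls back to the raw bytes when the input is not valid JSON, mirroring the prompt's fallback instruction:

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// canonicalJSON re-marshals raw JSON so whitespace and object key
// order no longer affect the bytes fed to the hasher: encoding/json
// marshals map keys in sorted order. On parse failure it returns the
// input unchanged so hashing behavior does not break on odd payloads.
func canonicalJSON(raw []byte) []byte {
	var v any
	if err := json.Unmarshal(raw, &v); err != nil {
		return raw
	}
	out, err := json.Marshal(v)
	if err != nil {
		return raw
	}
	return out
}

func main() {
	a := canonicalJSON([]byte(`{"effort": "high",  "summary": "auto"}`))
	b := canonicalJSON([]byte(`{"summary":"auto","effort":"high"}`))
	fmt.Println(sha256.Sum256(a) == sha256.Sum256(b)) // prints true
}
```

The hashing site would then call `h.Write(canonicalJSON(req.Reasoning))` instead of writing the raw bytes.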
---
Duplicate comments:
In `@internal/responsecache/vecstore_qdrant.go`:
- Around line 103-113: The code currently sets s.vectorSize = dim even when the
GET response or the vectors field cannot be decoded; change this so we reject
the collection when validation fails: when json.Unmarshal(raw, &info) returns an
error return a descriptive error (including s.collection and the unmarshal
error), and if json.Unmarshal(info.Result.Config.Params.Vectors, &singleVec)
fails or singleVec.Size <= 0 return a descriptive error indicating we could not
validate the collection's vectors; only set s.vectorSize = dim after
successfully validating singleVec.Size and confirming it matches dim (or after
validating it in the existing comparisons).
- Around line 189-226: The search currently filters only by params_hash and
drops expired hits after fetch, which can let an expired top hit hide valid
neighbors; update the Qdrant query to filter out expired points server-side by
adding an expires_at range condition (e.g. include a "range" clause on
"expires_at" with "gte": now) to the filter map used when building body (refer
to paramsHash and the body variable) so the response's parsed.Result returns
only non-expired points; alternatively implement over-fetching in the same
Search routine by requesting more than limit and looping (using s.req,
qdrantSearchResp/parsed.Result and payloadInt64 to skip expired entries) until
you collect limit live VecResult entries or exhaust results.
In `@internal/responsecache/vecstore_weaviate.go`:
- Around line 292-304: The code mixes Weaviate's `certainty` with `distance`
when computing `score` (see `score` assignment and `it.Additional.Certainty` /
`it.Additional.Distance`); change this to rely exclusively on `distance`: if
`it.Additional.Distance` is present, compute score = 1 -
float32(*it.Additional.Distance) and clamp to [0,1]; if `it.Additional.Distance`
is nil, treat the hit as invalid (skip it or return an error) instead of using
`certainty`, ensuring downstream logic only sees scores derived from distance.
- Around line 310-357: The DeleteExpired function currently issues a GraphQL
Delete mutation (in weaviateStore.DeleteExpired) which uses /v1/graphql; change
it to use Weaviate's object-management REST API instead: query for expired
object IDs using the Objects API (filter by class s.class and expires_at < now)
and then delete them via the object-management delete endpoint (DELETE
/v1/objects/{id} or use the batch delete endpoint if available), calling
s.authHeader(req) and s.httpClient for each request, handling non-2xx responses
and JSON error envelopes the same way as before; update the function to build
and issue the REST query + per-ID deletions (or batch deletion), collect and
return any errors, and remove the GraphQL mutation construction and GraphQL POST
to s.baseURL+"/v1/graphql".
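For the Weaviate scoring comment above, a distance-only conversion might look like this sketch; the function name and the choice to report invalid hits via an `ok` flag are illustrative, not the codebase's API. Score is derived solely from `distance`, clamped to [0,1], and a nil distance marks the hit invalid instead of falling back to `certainty`:

```go
package main

import "fmt"

// scoreFromDistance converts a Weaviate cosine distance into a
// similarity score in [0,1]: score = 1 - distance, clamped. A nil
// distance marks the hit invalid (ok == false) so callers skip it
// rather than mixing in certainty-derived values.
func scoreFromDistance(distance *float64) (score float32, ok bool) {
	if distance == nil {
		return 0, false
	}
	s := 1 - float32(*distance)
	if s < 0 {
		s = 0
	} else if s > 1 {
		s = 1
	}
	return s, true
}

func main() {
	d := 0.25
	s, ok := scoreFromDistance(&d)
	fmt.Println(s, ok) // 0.75 true
	_, ok = scoreFromDistance(nil)
	fmt.Println(ok) // false
}
```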
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: a514c3c2-22d5-48df-be3e-10559d5ec0fc
📒 Files selected for processing (14)
config/cache_validation_test.go
config/config.go
internal/app/app.go
internal/app/app_test.go
internal/embedding/embedding.go
internal/responsecache/handle_request_test.go
internal/responsecache/responsecache.go
internal/responsecache/semantic.go
internal/responsecache/semantic_test.go
internal/responsecache/vecstore_cleanup.go
internal/responsecache/vecstore_pgvector.go
internal/responsecache/vecstore_pinecone.go
internal/responsecache/vecstore_qdrant.go
internal/responsecache/vecstore_weaviate.go
Actionable comments posted: 1
♻️ Duplicate comments (4)
internal/responsecache/vecstore_qdrant.go (4)
228-234: ⚠️ Potential issue | 🟡 Minor: Require payload fields before treating a hit as valid.

If `response_b64` is missing, `DecodeString("")` succeeds and appends an empty cached response; a missing `cache_key` becomes `""` as well. Skip the hit unless both fields exist, are strings, and `response_b64` is non-empty.

Suggested fix

```diff
-		b64, _ := hit.Payload["response_b64"].(string)
-		rb, err := base64.StdEncoding.DecodeString(b64)
+		b64, ok := hit.Payload["response_b64"].(string)
+		if !ok || b64 == "" {
+			continue
+		}
+		k, ok := hit.Payload["cache_key"].(string)
+		if !ok || k == "" {
+			continue
+		}
+		rb, err := base64.StdEncoding.DecodeString(b64)
 		if err != nil {
 			continue
 		}
-		k, _ := hit.Payload["cache_key"].(string)
 		out = append(out, VecResult{
 			Key:      k,
 			Score:    float32(hit.Score),
 			Response: rb,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/responsecache/vecstore_qdrant.go` around lines 228 - 234, Validate payload fields before decoding: when iterating hits, check that hit.Payload contains "response_b64" and "cache_key" as non-empty strings (use the map access on hit.Payload to assert both are strings and that response_b64 != ""), and only then call base64.StdEncoding.DecodeString on response_b64; if validation or decoding fails, skip the hit. Update the block that builds VecResult (where out is appended) to rely on these validated values so empty responses or missing cache_key are not appended.
45-45: ⚠️ Potential issue | 🟠 Major: Do not start cleanup goroutines from `newQdrantStore`.

`startVecCleanup(s)` hides background work inside object construction, so callers cannot bind cleanup to their own context or decide when it should start. Move this to an explicit caller-managed lifecycle step instead of spawning it here.

As per coding guidelines: "Do not hide work in detached goroutines; respect context synchronously and return typed `core.GatewayError` values".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/responsecache/vecstore_qdrant.go` at line 45, newQdrantStore currently calls startVecCleanup(s) and spawns background work; remove that call so construction is side-effect free and s.cleanup remains unstarted. Instead expose an explicit lifecycle API (e.g., a StartVecCleanup(ctx context.Context) method on the Qdrant store or return a cleanup-start function) that callers invoke with their context; ensure startVecCleanup accepts context and returns a typed core.GatewayError (or nil) rather than hiding goroutines, and set s.cleanup only when the caller successfully starts cleanup so the caller controls cancellation and lifecycle.
103-113: ⚠️ Potential issue | 🟠 Major: Fail fast when the existing collection config cannot be decoded.

On 200 OK, any decode failure or missing `vectors.size` still falls through to `s.vectorSize = dim`. That still accepts incompatible pre-existing collections and defers the breakage to later search/upsert calls. Return an error unless the current vector size is actually decoded and matches `dim`.

Suggested fix

```diff
-	if err := json.Unmarshal(raw, &info); err == nil {
-		var singleVec struct {
-			Size int `json:"size"`
-		}
-		if json.Unmarshal(info.Result.Config.Params.Vectors, &singleVec) == nil && singleVec.Size > 0 {
-			if singleVec.Size != dim {
-				return fmt.Errorf("vecstore qdrant: collection %q has vector size %d but embedder produces %d", s.collection, singleVec.Size, dim)
-			}
-		}
-	}
+	if err := json.Unmarshal(raw, &info); err != nil {
+		return fmt.Errorf("vecstore qdrant: decode collection: %w", err)
+	}
+	var singleVec struct {
+		Size int `json:"size"`
+	}
+	if err := json.Unmarshal(info.Result.Config.Params.Vectors, &singleVec); err != nil || singleVec.Size == 0 {
+		return fmt.Errorf("vecstore qdrant: collection %q returned an unsupported vector config", s.collection)
+	}
+	if singleVec.Size != dim {
+		return fmt.Errorf("vecstore qdrant: collection %q has vector size %d but embedder produces %d", s.collection, singleVec.Size, dim)
+	}
 	s.vectorSize = dim
 	return nil
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/responsecache/vecstore_qdrant.go` around lines 103 - 113, The code currently ignores failures decoding the collection config and still sets s.vectorSize = dim; change the logic in the block that unmarshals raw -> info and info.Result.Config.Params.Vectors so that any decode failure or a missing/zero vectors.size returns an error instead of falling through. Specifically, in vecstore_qdrant.go around the json.Unmarshal of raw into info and the subsequent unmarshal into singleVec, if the first unmarshal returns err != nil return a descriptive fmt.Errorf, if the second unmarshal fails or singleVec.Size == 0 return an error, and only when singleVec.Size > 0 and equals dim set s.vectorSize = dim (otherwise return the mismatched-size error using s.collection and dim).
189-201: ⚠️ Potential issue | 🟠 Major: Client-side expiry pruning can return false misses.

The search asks the backend for only `limit` hits and then drops expired ones locally. The current call site in `internal/responsecache/semantic.go` always passes `limit == 1`, so one stale top hit makes this return a miss even when a lower-ranked live entry exists. Filter non-expired points before the backend enforces the limit, or overfetch and prune locally, and best-effort delete expired IDs you encounter.

Also applies to: 223-227
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/responsecache/vecstore_qdrant.go` around lines 189 - 201, The search currently asks Qdrant for exactly "limit" results then prunes expired entries locally so a stale top hit can hide a valid lower hit; update the code that builds the search "body" and the surrounding search logic in vecstore_qdrant.go (the map named "filter" and "body" which contains keys "vector", "limit", "with_payload", and the params_hash clause) to avoid false misses: either add a server-side filter clause that excludes expired points (e.g., add a condition on your expiry payload field to be > current time) so Qdrant applies expiry before enforcing "limit", or overfetch by requesting more results (e.g., multiply limit by a safe factor or cap) and then locally prune expired entries and, for any expired IDs encountered, issue a best-effort delete/remove call via your Qdrant delete API; apply the same fix to the other search site around the second snippet (the block noted at 223-227).
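The overfetch-and-prune alternative could be sketched like this; the `hit` struct and the idea of returning expired keys for best-effort deletion are illustrative assumptions, not the store's actual types:

```go
package main

import "fmt"

// hit is a stand-in for a backend search result carrying an expiry.
type hit struct {
	Key       string
	Score     float32
	ExpiresAt int64
}

// pruneLive keeps at most limit non-expired hits and separately
// collects the keys of expired ones so the caller can best-effort
// delete them. Overfetching (requesting, say, 4*limit from the
// backend) means one stale top hit no longer hides a live
// lower-ranked entry.
func pruneLive(hits []hit, limit int, now int64) (live []hit, expired []string) {
	for _, h := range hits {
		if h.ExpiresAt <= now {
			expired = append(expired, h.Key)
			continue
		}
		if len(live) < limit {
			live = append(live, h)
		}
	}
	return live, expired
}

func main() {
	hits := []hit{
		{Key: "stale", Score: 0.99, ExpiresAt: 50},
		{Key: "fresh", Score: 0.95, ExpiresAt: 200},
	}
	live, expired := pruneLive(hits, 1, 100)
	fmt.Println(live[0].Key, expired) // fresh [stale]
}
```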
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/responsecache/vecstore_qdrant.go`:
- Around line 275-283: The DeleteExpired code treats any non-2xx response as an
error; change it to treat http.StatusNotFound as a no-op (like Search does) so a
missing collection isn't returned as an error. Specifically, in the block around
the call to s.req(... "/points/delete?wait=true", body) and the subsequent
resp.StatusCode check, detect if resp.StatusCode == http.StatusNotFound and
simply close the response and return nil; otherwise keep the existing behavior
of reading the body and returning an error for other non-2xx statuses. Ensure
you still defer/close resp.Body and preserve the fmt.Errorf message for non-404
errors.
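The 404-as-no-op behavior described above can be isolated into a small status check, sketched here with a hypothetical helper name; the error message format mirrors the style of the surrounding review text:

```go
package main

import (
	"fmt"
	"net/http"
)

// checkDeleteStatus sketches the 404-as-no-op handling: a missing
// collection is not an error for DeleteExpired, while other non-2xx
// statuses still surface as errors with the response body attached.
func checkDeleteStatus(status int, body string) error {
	if status == http.StatusNotFound {
		return nil // collection does not exist: nothing to delete
	}
	if status < 200 || status >= 300 {
		return fmt.Errorf("vecstore qdrant: delete expired: status %d: %s", status, body)
	}
	return nil
}

func main() {
	fmt.Println(checkDeleteStatus(404, ""))     // <nil>
	fmt.Println(checkDeleteStatus(500, "boom")) // non-2xx error
}
```

The caller still closes `resp.Body` in all branches; only the interpretation of the status code changes.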
---
Duplicate comments:
In `@internal/responsecache/vecstore_qdrant.go`:
- Around line 228-234: Validate payload fields before decoding: when iterating
hits, check that hit.Payload contains "response_b64" and "cache_key" as
non-empty strings (use the map access on hit.Payload to assert both are strings
and that response_b64 != ""), and only then call base64.StdEncoding.DecodeString
on response_b64; if validation or decoding fails, skip the hit. Update the block
that builds VecResult (where out is appended) to rely on these validated values
so empty responses or missing cache_key are not appended.
- Line 45: newQdrantStore currently calls startVecCleanup(s) and spawns
background work; remove that call so construction is side-effect free and
s.cleanup remains unstarted. Instead expose an explicit lifecycle API (e.g., a
StartVecCleanup(ctx context.Context) method on the Qdrant store or return a
cleanup-start function) that callers invoke with their context; ensure
startVecCleanup accepts context and returns a typed core.GatewayError (or nil)
rather than hiding goroutines, and set s.cleanup only when the caller
successfully starts cleanup so the caller controls cancellation and lifecycle.
- Around line 103-113: The code currently ignores failures decoding the
collection config and still sets s.vectorSize = dim; change the logic in the
block that unmarshals raw -> info and info.Result.Config.Params.Vectors so that
any decode failure or a missing/zero vectors.size returns an error instead of
falling through. Specifically, in vecstore_qdrant.go around the json.Unmarshal
of raw into info and the subsequent unmarshal into singleVec, if the first
unmarshal returns err != nil return a descriptive fmt.Errorf, if the second
unmarshal fails or singleVec.Size == 0 return an error, and only when
singleVec.Size > 0 and equals dim set s.vectorSize = dim (otherwise return the
mismatched-size error using s.collection and dim).
- Around line 189-201: The search currently asks Qdrant for exactly "limit"
results then prunes expired entries locally so a stale top hit can hide a valid
lower hit; update the code that builds the search "body" and the surrounding
search logic in vecstore_qdrant.go (the map named "filter" and "body" which
contains keys "vector", "limit", "with_payload", and the params_hash clause) to
avoid false misses: either add a server-side filter clause that excludes expired
points (e.g., add a condition on your expiry payload field to be > current time)
so Qdrant applies expiry before enforcing "limit", or overfetch by requesting
more results (e.g., multiply limit by a safe factor or cap) and then locally
prune expired entries and, for any expired IDs encountered, issue a best-effort
delete/remove call via your Qdrant delete API; apply the same fix to the other
search site around the second snippet (the block noted at 223-227).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: b0c32747-afa4-4d80-a076-fb5186d5c91a
📒 Files selected for processing (1)
internal/responsecache/vecstore_qdrant.go
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
# Conflicts:
#	internal/responsecache/semantic.go
Semantic cache: external vector backends (replace sqlite-vec)
Summary
Replaces sqlite-vec with explicit external vector backends for the semantic response cache: Qdrant, pgvector, Pinecone, and Weaviate. Semantic caching now requires a non-empty `vector_store.type` and the matching nested config; there is no embedded default store.

Changes

- Config (`config/config.go`): removed `SQLiteVec`; extended `VectorStoreConfig` with `Pinecone` and `Weaviate`; pgvector gains a required `dimension`; validation branches per `type`; `mergeSemanticResponseDefaults` no longer defaults to sqlite-vec; env adds `SEMANTIC_CACHE_*` variables for pgvector/pinecone/weaviate and drops `SEMANTIC_CACHE_SQLITE_PATH`.
- Stores (`internal/responsecache/`): deleted sqlite-vec implementations; added Qdrant (REST, lazy collection, cosine), pgvector (pgx + extension/table), Pinecone (data-plane HTTP, metadata size guard), and Weaviate (REST + GraphQL search); shared hourly `DeleteExpired` ticker via a cleanup helper; `NewVecStore` switches on the four types only.
- Dependencies: `go mod tidy` removed sqlite-vec and unused sqlite driver deps tied to the semantic cache (storage still uses `modernc.org/sqlite` where applicable).
- Tests and examples: `config` tests and `cache_validation_test` updated for the new model; `config.example.yaml` documents all four backends (commented alternatives); checked-in `config.yaml` aligned if it had semantic/sqlite paths.
- Docs: `CredentialResolvedProviders`/`base_url`, the vector-backend matrix, TTL/cleanup, and consequences (no sqlite-vec / zero-infra story); CLAUDE.md and README note the four backends and the explicit `vector_store.type`; the README roadmap treats the semantic cache as shipped where appropriate.

Summary by CodeRabbit
New Features
Infrastructure
Documentation
Tests
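As a rough illustration of the nested config described in the Changes summary, a semantic-cache block with one active backend might look like the sketch below. The key names are inferred from this PR description, so treat `config.example.yaml` as the authoritative shape:

```yaml
# Hypothetical sketch: key names inferred from the PR summary, not verified.
semantic_cache:
  enabled: true
  vector_store:
    type: qdrant            # one of: qdrant | pgvector | pinecone | weaviate
    qdrant:
      url: http://localhost:6333
      collection: semantic_cache
    # pgvector:
    #   dimension: 1536     # required for pgvector per the summary
```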