
Persist endpoint UUID for vector_search_endpoints drift detection #5127

Merged
janniklasrose merged 3 commits into main from
janniklasrose/vs-endpoint-fixes
Apr 30, 2026

@janniklasrose janniklasrose commented Apr 29, 2026

Changes

Persist endpoint_uuid in state and detect identity drift on vector_search_endpoints.

The endpoint name is stable but its UUID changes if the endpoint is deleted and recreated by name (e.g. via the workspace UI). Without persisting the UUID:

  • The bundle silently rebound permissions to a different backing endpoint without recreating the endpoint resource.
  • Anything else referencing endpoint_uuid (most importantly the permissions object_id, but also indexes added on top in the next PR) raced the recreate.

VectorSearchEndpointState now embeds vectorsearch.CreateEndpoint and adds EndpointUuid. DoCreate records the UUID from the create response; DoUpdate copies it from entry.RemoteState so unrelated updates (e.g. min_qps) don't blank it out. OverrideChangeDesc classifies endpoint_uuid drift as Recreate when saved differs from remote, Skip otherwise.
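As a rough illustration of the state handling described above, here is a minimal, self-contained sketch. The types `endpointConfig` and `endpointState` and the helpers `afterCreate` and `afterUpdate` are hypothetical stand-ins for `vectorsearch.CreateEndpoint`, `VectorSearchEndpointState`, `DoCreate` and `DoUpdate`; they are not the code in this PR, and the `MinQps` field is only there to model an unrelated update.

```go
package main

import "fmt"

// endpointConfig is a hypothetical stand-in for the user-facing create
// request (vectorsearch.CreateEndpoint in the PR). Fields are illustrative.
type endpointConfig struct {
	Name   string
	MinQps int
}

// endpointState models the idea behind VectorSearchEndpointState:
// the embedded config plus the UUID observed for the backing endpoint.
type endpointState struct {
	endpointConfig
	EndpointUuid string
}

// afterCreate records the UUID reported by the (assumed) create response.
func afterCreate(cfg endpointConfig, createdUuid string) endpointState {
	return endpointState{endpointConfig: cfg, EndpointUuid: createdUuid}
}

// afterUpdate applies a new config but carries the UUID over from the
// remote state, so an unrelated update does not blank it out.
func afterUpdate(cfg endpointConfig, remote endpointState) endpointState {
	return endpointState{endpointConfig: cfg, EndpointUuid: remote.EndpointUuid}
}

func main() {
	s := afterCreate(endpointConfig{Name: "my-endpoint", MinQps: 1}, "uuid-1")
	s = afterUpdate(endpointConfig{Name: "my-endpoint", MinQps: 5}, s)
	fmt.Println(s.MinQps, s.EndpointUuid) // prints "5 uuid-1"
}
```

The point of the sketch is only that the UUID is written once at create time and then copied forward, which is what lets a later plan compare the saved value against the remote one.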

drift/recreated_same_name flips from a "badness snapshot" (which captured the old behavior of permissions silently rebinding) to the recreate behavior, with a permissions block on the endpoint to verify the cascade rebinds correctly.

drift/min_qps/out.plan.direct.json regenerates to include the new endpoint_uuid skip entry in the detailed plan.

Why

Splitting this out of the larger vector_search_indexes PR (#5123) so it can land independently. The index PR builds on the persisted UUID for orphan detection, but the endpoint UUID work stands on its own and is useful regardless.

Tests

  • make fmtfull, make checks, make lintfull: clean.
  • make test: green (libs/apps/runlocal needed NODE_OPTIONS= to work around the harness leak; unrelated). bundle/internal/schema TestRequiredAnnotationsForNewFields panics; it also fails on main, for unrelated reasons.
  • go test ./acceptance -run 'TestAccept/bundle/resources/vector_search_endpoints': all green, including the flipped drift/recreated_same_name.

This PR was written by Claude Code.

…ctor_search_endpoints

Two fixes that together make `vector_search_endpoints` plan/deploy correctly
across out-of-band changes the user can trigger from the console.

1) endpoint UUID drift detection.

   The endpoint name is stable but its UUID changes if the endpoint is
   deleted and recreated with the same name. Without persisting that UUID
   the planner couldn't detect:
   - the endpoint itself being replaced out-of-band (permissions silently
     rebound to a different backing endpoint);
   - any caller that depends on endpoint_uuid (e.g. permissions object_id)
     racing the recreate.

   VectorSearchEndpointState now embeds CreateEndpoint and adds
   EndpointUuid. DoCreate records the UUID from the create response;
   DoUpdate copies it from entry.RemoteState so unrelated updates (e.g.
   min_qps) don't blank it out. OverrideChangeDesc classifies
   endpoint_uuid drift as Recreate when saved != remote, Skip otherwise.

   drift/recreated_same_name flips from a "badness snapshot" to the
   recreate behavior, with a permissions block on the endpoint to verify
   the cascade rebinds correctly.

2) ignore_remote_changes for endpoint.budget_policy_id.

   The API returns effective_budget_policy_id on Get, which folds in
   workspace-inherited policy. That value rarely matches the user-set
   budget_policy_id, so every plan was seeing drift on a field the user
   never touched. drift/budget_policy now asserts the field is correctly
   ignored.

drift/min_qps/out.plan.direct.json regenerates to include the new
endpoint_uuid skip entry in the detailed plan.

Co-authored-by: Isaac
Will be handled in an SDK update. Keeping this PR focused on the
endpoint UUID drift fix.

Co-authored-by: Isaac
@janniklasrose janniklasrose changed the title from "Persist endpoint UUID and ignore budget_policy_id drift on vector_search_endpoints" to "Persist endpoint UUID for vector_search_endpoints drift detection" Apr 29, 2026
@janniklasrose janniklasrose marked this pull request as ready for review April 29, 2026 15:01
@janniklasrose janniklasrose requested a review from denik April 29, 2026 15:01
"action": "skip",
"reason": "state-only field",
"old": "[UUID]",
"remote": "[UUID]"

I guess both old and remote point to the same UUID? Can we use add_repl.py to add a replacement for the exact UUID that we're replacing, so that the test confirms it's the same?

"changes": {
"endpoint_uuid": {
"action": "skip",
"reason": "state-only field",

Is "state-only field" the correct description? The field also exists on the resource itself, e.g. you can see the value in "remote".

We should probably use the same reason as etag, since it's the same idea there? (Not that "custom" is a good name, but it makes sense to align.)

bundle/migrate/dashboards/out.plan_after_migrate.json: "etag": {
bundle/migrate/dashboards/out.plan_after_migrate.json- "action": "skip",
bundle/migrate/dashboards/out.plan_after_migrate.json- "reason": "custom",
bundle/migrate/dashboards/out.plan_after_migrate.json- "old": "[NUMID]",
bundle/migrate/dashboards/out.plan_after_migrate.json- "remote": "[NUMID]"
bundle/migrate/dashboards/out.plan_after_migrate.json- },

}
if savedUuid != "" && remoteUuid != "" && savedUuid != remoteUuid {
change.Action = deployplan.Recreate
change.Reason = "endpoint replaced out-of-band"

~/work/cli/bundle % git grep Reason deployplan/plan.go
deployplan/plan.go:     Reason string     `json:"reason,omitempty"`
deployplan/plan.go:// Possible values for Reason field
deployplan/plan.go:     ReasonBackendDefault   = "backend_default"
deployplan/plan.go:     ReasonAlias            = "alias"
deployplan/plan.go:     ReasonRemoteAlreadySet = "remote_already_set"
deployplan/plan.go:     ReasonEmpty            = "empty"
deployplan/plan.go:     ReasonCustom           = "custom"
deployplan/plan.go:     ReasonDrop = "!drop"

We should use a constant there. I'd just reuse ReasonCustom here, same as we do for dashboard.etag. We don't need to explain too much in the reason field; it's more of a debugging tool.

Also, we don't need separate reasons for the recreate and skip cases; the action already distinguishes those two.
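A minimal sketch of the suggestion above, under stated assumptions: the `action`, `changeDesc` and `classifyEndpointUuid` names are hypothetical and defined locally, the `"recreate"` string is assumed, and only the `"custom"` and `"skip"` values and the comparison itself come from this thread. It is not the `deployplan` package or the PR's `OverrideChangeDesc`.

```go
package main

import "fmt"

// action and its values are local stand-ins for the plan actions
// discussed in the review; the "recreate" spelling is an assumption.
type action string

const (
	actionSkip     action = "skip"
	actionRecreate action = "recreate"
)

// reasonCustom mirrors the value of ReasonCustom shown in the grep output.
const reasonCustom = "custom"

// changeDesc is a hypothetical, trimmed-down change description.
type changeDesc struct {
	Action action
	Reason string
}

// classifyEndpointUuid returns Recreate only when both UUIDs are known
// and differ; every other case is Skip. The reason is the same constant
// in both cases, since the action already tells them apart.
func classifyEndpointUuid(savedUuid, remoteUuid string) changeDesc {
	a := actionSkip
	if savedUuid != "" && remoteUuid != "" && savedUuid != remoteUuid {
		a = actionRecreate
	}
	return changeDesc{Action: a, Reason: reasonCustom}
}

func main() {
	fmt.Println(classifyEndpointUuid("uuid-1", "uuid-1").Action) // prints "skip"
	fmt.Println(classifyEndpointUuid("uuid-1", "uuid-2").Action) // prints "recreate"
	fmt.Println(classifyEndpointUuid("", "uuid-2").Action)       // prints "skip"
}
```

Note the empty-string guards: a state saved before the UUID was persisted has no saved value, and in this sketch that case is treated as Skip rather than forcing a recreate.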

Drop custom Reason strings from OverrideChangeDesc and let the framework
default to ReasonCustom when the action is changed (matches dashboard.etag).
Register MY_ENDPOINT_UUID via add_repl.py in the min_qps drift test so the
plan output shows both old and remote rendering as the same labeled token,
proving they're the same UUID rather than two arbitrary UUIDs both masked
to [UUID] by the generic regex. Regenerate refschema out.fields.txt to
include the new endpoint_uuid STATE classification.

Co-authored-by: Isaac
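
To illustrate why a labeled replacement proves more than the generic mask, here is a small sketch. It is not the acceptance harness or add_repl.py; `maskGeneric`, `maskLabeled` and the UUID regex are hypothetical helpers written only for this example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// uuidRe is an assumed pattern for lowercase hyphenated UUIDs.
var uuidRe = regexp.MustCompile(`[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}`)

// maskGeneric replaces every UUID with the same placeholder, so two
// different UUIDs become indistinguishable in the output.
func maskGeneric(s string) string {
	return uuidRe.ReplaceAllString(s, "[UUID]")
}

// maskLabeled first replaces one known UUID with a labeled token and
// then applies the generic mask to whatever UUIDs remain.
func maskLabeled(s, known, label string) string {
	return maskGeneric(strings.ReplaceAll(s, known, "["+label+"]"))
}

func main() {
	const a = "11111111-2222-3333-4444-555555555555"
	const b = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
	same := "old=" + a + " remote=" + a
	diff := "old=" + a + " remote=" + b

	// With the generic mask alone, both outputs look identical.
	fmt.Println(maskGeneric(same) == maskGeneric(diff)) // prints "true"

	// With a labeled token, a mismatch becomes visible.
	fmt.Println(maskLabeled(same, a, "MY_ENDPOINT_UUID")) // prints "old=[MY_ENDPOINT_UUID] remote=[MY_ENDPOINT_UUID]"
	fmt.Println(maskLabeled(diff, a, "MY_ENDPOINT_UUID")) // prints "old=[MY_ENDPOINT_UUID] remote=[UUID]"
}
```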
@janniklasrose janniklasrose merged commit 22be781 into main Apr 30, 2026
22 of 23 checks passed
@janniklasrose janniklasrose deleted the janniklasrose/vs-endpoint-fixes branch April 30, 2026 11:26
2 participants