Persist endpoint UUID for vector_search_endpoints drift detection #5127
janniklasrose merged 3 commits into main
…ctor_search_endpoints
Two fixes that together make `vector_search_endpoints` plan/deploy correctly
across out-of-band changes the user can trigger from the console.
1) endpoint UUID drift detection.
The endpoint name is stable but its UUID changes if the endpoint is
deleted and recreated with the same name. Without persisting that UUID
the planner couldn't detect:
- the endpoint itself being replaced out-of-band (permissions silently
rebound to a different backing endpoint);
- any caller that depends on endpoint_uuid (e.g. permissions object_id)
racing the recreate.
VectorSearchEndpointState now embeds CreateEndpoint and adds
EndpointUuid. DoCreate records the UUID from the create response;
DoUpdate copies it from entry.RemoteState so unrelated updates (e.g.
min_qps) don't blank it out. OverrideChangeDesc classifies
endpoint_uuid drift as Recreate when saved != remote, Skip otherwise.
drift/recreated_same_name flips from a "badness snapshot" to the
recreate behavior, with a permissions block on the endpoint to verify
the cascade rebinds correctly.
2) ignore_remote_changes for endpoint.budget_policy_id.
The API returns effective_budget_policy_id on Get, which folds in
workspace-inherited policy. That value rarely matches the user-set
budget_policy_id, so every plan was seeing drift on a field the user
never touched. drift/budget_policy now asserts the field is correctly
ignored.
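A minimal sketch of the idea behind item 2, assuming a per-field classifier that compares saved, configured, and remote values. The names `classifyField`, `ignoreRemoteChanges`, and the `Action` values are hypothetical and are not the CLI's actual types; the policy and QPS values are made up for illustration.

```go
package main

import "fmt"

// Action is a hypothetical stand-in for the planner's per-field decision.
type Action string

const (
	Skip   Action = "skip"
	Update Action = "update"
)

// ignoreRemoteChanges lists fields whose remote value should not count as
// drift. Hypothetical: it only mirrors the idea of ignore_remote_changes.
var ignoreRemoteChanges = map[string]bool{"budget_policy_id": true}

// classifyField decides what to do with one field given the value saved in
// state, the value in the user's config, and the value the API returned.
func classifyField(field, saved, config, remote string) Action {
	if config != saved {
		return Update // the user edited the field in config
	}
	if remote != saved && !ignoreRemoteChanges[field] {
		return Update // out-of-band drift on a field we do track
	}
	return Skip // no change, or remote-only change on an ignored field
}

func main() {
	// Remote reports an inherited policy the user never set: ignored.
	fmt.Println(classifyField("budget_policy_id", "policy-a", "policy-a", "policy-inherited")) // skip
	// Remote drift on a field that is not ignored still produces a change.
	fmt.Println(classifyField("min_qps", "1", "1", "5")) // update
}
```

A user edit to `budget_policy_id` in config would still be picked up in this sketch, since only the remote side of the comparison is ignored.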
drift/min_qps/out.plan.direct.json regenerates to include the new
endpoint_uuid skip entry in the detailed plan.
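The UUID handling from item 1 can be sketched as follows. The comparison follows the condition quoted in the review diff (`savedUuid != "" && remoteUuid != "" && savedUuid != remoteUuid`); `EndpointState`, `carryUUID`, and `classifyUUID` are simplified, hypothetical stand-ins and not the real `VectorSearchEndpointState`, `DoUpdate`, or `OverrideChangeDesc`.

```go
package main

import "fmt"

// Action is a hypothetical stand-in for the plan action type.
type Action string

const (
	Skip     Action = "skip"
	Recreate Action = "recreate"
)

// EndpointState is a simplified stand-in for the persisted endpoint state.
type EndpointState struct {
	Name         string
	MinQps       int
	EndpointUuid string
}

// carryUUID mimics what DoUpdate is described as doing: the state written
// after an update takes its UUID from the remote state, so an unrelated
// change (e.g. MinQps) does not blank it out.
func carryUUID(newState, remote EndpointState) EndpointState {
	newState.EndpointUuid = remote.EndpointUuid
	return newState
}

// classifyUUID returns Recreate only when both UUIDs are known and differ.
func classifyUUID(savedUUID, remoteUUID string) Action {
	if savedUUID != "" && remoteUUID != "" && savedUUID != remoteUUID {
		return Recreate
	}
	return Skip
}

func main() {
	remote := EndpointState{Name: "ep", MinQps: 1, EndpointUuid: "uuid-1"}
	updated := carryUUID(EndpointState{Name: "ep", MinQps: 2}, remote)
	fmt.Println(updated.EndpointUuid) // uuid-1

	fmt.Println(classifyUUID(updated.EndpointUuid, "uuid-1")) // skip
	fmt.Println(classifyUUID(updated.EndpointUuid, "uuid-2")) // recreate
	fmt.Println(classifyUUID("", "uuid-2"))                   // skip
}
```

The empty-string guards mean that state without a recorded UUID is not treated as drift in this sketch.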
Co-authored-by: Isaac
Will be handled in an SDK update. Keeping this PR focused on the endpoint UUID drift fix.
Co-authored-by: Isaac
    "action": "skip",
    "reason": "state-only field",
    "old": "[UUID]",
    "remote": "[UUID]"
I guess both old and remote point to the same UUID? Can we use add_repl.py to add a replacement for the exact UUID that we're replacing, so that the test confirms it's the same?
    "changes": {
      "endpoint_uuid": {
        "action": "skip",
        "reason": "state-only field",
Is "state-only field" the correct description? The field also exists on the resource itself, e.g. you can see the value in "remote".
We should probably use the same name as etag, since it's the same idea there (not that "custom" is a good name, but it makes sense to align).
bundle/migrate/dashboards/out.plan_after_migrate.json: "etag": {
bundle/migrate/dashboards/out.plan_after_migrate.json- "action": "skip",
bundle/migrate/dashboards/out.plan_after_migrate.json- "reason": "custom",
bundle/migrate/dashboards/out.plan_after_migrate.json- "old": "[NUMID]",
bundle/migrate/dashboards/out.plan_after_migrate.json- "remote": "[NUMID]"
bundle/migrate/dashboards/out.plan_after_migrate.json- },
    }
    if savedUuid != "" && remoteUuid != "" && savedUuid != remoteUuid {
        change.Action = deployplan.Recreate
        change.Reason = "endpoint replaced out-of-band"
~/work/cli/bundle % git grep Reason deployplan/plan.go
deployplan/plan.go: Reason string `json:"reason,omitempty"`
deployplan/plan.go:// Possible values for Reason field
deployplan/plan.go: ReasonBackendDefault = "backend_default"
deployplan/plan.go: ReasonAlias = "alias"
deployplan/plan.go: ReasonRemoteAlreadySet = "remote_already_set"
deployplan/plan.go: ReasonEmpty = "empty"
deployplan/plan.go: ReasonCustom = "custom"
deployplan/plan.go: ReasonDrop = "!drop"
We should use a constant there. I'd just reuse ReasonCustom here, same as we do for dashboard.etag. We don't need to explain too much in the reason field; it's more of a debugging tool.
Also, we don't need separate reasons for the recreate and skip cases; the action already distinguishes those two.
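A sketch of the suggestion above, assuming the override only sets the action and a single shared constant serves as the reason for both outcomes. `Change` and `applyOverride` are hypothetical simplifications; only the `ReasonCustom = "custom"` value is taken from the `git grep` output.

```go
package main

import "fmt"

// ReasonCustom matches the constant value listed in deployplan/plan.go.
const ReasonCustom = "custom"

// Change is a hypothetical, reduced version of a per-field plan entry.
type Change struct {
	Action string
	Reason string
}

// applyOverride runs a resource-specific override. If the override changed
// the action but left the reason empty, the shared constant is filled in,
// so the override never needs its own reason strings.
func applyOverride(c Change, override func(*Change)) Change {
	before := c.Action
	override(&c)
	if c.Action != before && c.Reason == "" {
		c.Reason = ReasonCustom
	}
	return c
}

// uuidOverride returns an override that picks recreate or skip from the
// saved and remote UUIDs, without setting any reason.
func uuidOverride(saved, remote string) func(*Change) {
	return func(c *Change) {
		if saved != "" && remote != "" && saved != remote {
			c.Action = "recreate"
		} else {
			c.Action = "skip"
		}
	}
}

func main() {
	same := applyOverride(Change{Action: "update"}, uuidOverride("a", "a"))
	diff := applyOverride(Change{Action: "update"}, uuidOverride("a", "b"))
	fmt.Println(same.Action, same.Reason) // skip custom
	fmt.Println(diff.Action, diff.Reason) // recreate custom
}
```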
Drop custom Reason strings from OverrideChangeDesc and let the framework default to ReasonCustom when the action is changed (matches dashboard.etag).
Register MY_ENDPOINT_UUID via add_repl.py in the min_qps drift test so the plan output shows both old and remote rendering as the same labeled token, proving they're the same UUID rather than two arbitrary UUIDs both masked to [UUID] by the generic regex.
Regenerate refschema out.fields.txt to include the new endpoint_uuid STATE classification.
Co-authored-by: Isaac
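To illustrate why a labeled token proves more than the generic mask, here is a small sketch of exact-value replacement applied before a generic UUID regex. This is not how add_repl.py or the acceptance test harness is implemented; `mask`, the regex, and the sample UUIDs are assumptions made for the example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// uuidRe is an assumed generic pattern for lowercase hex UUIDs.
var uuidRe = regexp.MustCompile(`[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}`)

// mask first replaces exact registered values with their labels, then masks
// every remaining UUID with the generic [UUID] token.
func mask(output string, labeled map[string]string) string {
	for value, label := range labeled {
		output = strings.ReplaceAll(output, value, "["+label+"]")
	}
	return uuidRe.ReplaceAllString(output, "[UUID]")
}

func main() {
	const endpoint = "11111111-2222-3333-4444-555555555555"
	const other = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
	plan := "old=" + endpoint + " remote=" + endpoint + " other=" + other

	// Generic masking alone cannot tell the UUIDs apart.
	fmt.Println(mask(plan, nil))
	// prints: old=[UUID] remote=[UUID] other=[UUID]

	// With the endpoint UUID registered, old and remote share one label.
	fmt.Println(mask(plan, map[string]string{endpoint: "MY_ENDPOINT_UUID"}))
	// prints: old=[MY_ENDPOINT_UUID] remote=[MY_ENDPOINT_UUID] other=[UUID]
}
```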
Changes
Persist `endpoint_uuid` in state and detect identity drift on `vector_search_endpoints`.
The endpoint name is stable but its UUID changes if the endpoint is deleted and recreated by name (e.g. via the workspace UI). Without persisting the UUID:
- the planner couldn't detect the endpoint itself being replaced out-of-band (permissions silently rebound to a different backing endpoint);
- any caller that depends on `endpoint_uuid` (most importantly the permissions object_id, but also indexes added on top in the next PR) raced the recreate.
`VectorSearchEndpointState` now embeds `vectorsearch.CreateEndpoint` and adds `EndpointUuid`. `DoCreate` records the UUID from the create response; `DoUpdate` copies it from `entry.RemoteState` so unrelated updates (e.g. `min_qps`) don't blank it out. `OverrideChangeDesc` classifies `endpoint_uuid` drift as `Recreate` when saved differs from remote, `Skip` otherwise.
`drift/recreated_same_name` flips from a "badness snapshot" (which captured the old behavior of permissions silently rebinding) to the recreate behavior, with a permissions block on the endpoint to verify the cascade rebinds correctly.
`drift/min_qps/out.plan.direct.json` regenerates to include the new `endpoint_uuid` skip entry in the detailed plan.
Why
Splitting this out of the larger `vector_search_indexes` PR (#5123) so it can land independently. The index PR builds on the persisted UUID for orphan detection, but the endpoint UUID work stands on its own and is useful regardless.
Tests
- `make fmtfull`, `make checks`, `make lintfull`: clean.
- `make test`: green (`libs/apps/runlocal` needed `NODE_OPTIONS=` for the harness leak; unrelated). `bundle/internal/schema` `TestRequiredAnnotationsForNewFields` panics; this is failing on `main` for unrelated reasons.
- `go test ./acceptance -run 'TestAccept/bundle/resources/vector_search_endpoints'`: all green, including the flipped `drift/recreated_same_name`.
This PR was written by Claude Code.