release-25.2: sqlinstance: deduplicate live rows by rpcAddr #169217
Open
blathers-crl[bot] wants to merge 1 commit into release-25.2 from
2879d02 to 6f4cb47
Author
Thanks for opening a backport. Before merging, please confirm that it falls into one of the following categories (select one):
Add a brief release justification to the PR description explaining your selection. Also, confirm that the change does not break backward compatibility and complies with all aspects of the backport policy. All backports must be reviewed by the TL and EM for the owning area.
Backport 1/1 commits from #169043 on behalf of @shubhamdhama.
selectDistinctLiveRows was deduplicating live SQL instance rows by sqlAddr (the SQL advertise address). This was the wrong key: the SQL advertise address (`--advertise-sql-addr`) is allowed to be non-unique across nodes. Kubernetes deployments commonly point it at a regional service DNS name shared by all pods in the region. Deduplication on sqlAddr silently collapsed distinct live instances down to one entry, breaking downstream consumers like DistSQL placement and execution-locality filtering.
Switch the dedup key to rpcAddr. RPC advertise addresses must be node-unique among live instances — gossip and KV peer dialing depend on this — so rpcAddr is the correct identity for the pod-restart race that motivates dedup in the first place (a SQL pod crashes and a new pod starts at the same rpcAddr with a fresh instance ID before the dead pod's session expires).
The wrong key was introduced when the RPC and SQL listen ports were split; before the split there was a single advertise address and the choice was unambiguous.
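To make the failure mode concrete, here is a minimal, self-contained sketch of deduplication by a configurable key. It is not the actual selectDistinctLiveRows code: the `instanceRow` type, the `dedupBy` helper, the sample addresses, and the tie-break rule (keep the highest instance ID per key) are all assumptions made for illustration.

```go
package main

import "fmt"

// instanceRow is a hypothetical, simplified stand-in for a live SQL
// instance row; the real row type carries more fields.
type instanceRow struct {
	instanceID int
	rpcAddr    string
	sqlAddr    string
}

// dedupBy keeps one row per key. As an illustrative assumption, the row
// with the highest instance ID wins; the real function may pick differently.
// Output order follows the first appearance of each key.
func dedupBy(rows []instanceRow, key func(instanceRow) string) []instanceRow {
	newest := make(map[string]instanceRow)
	var order []string
	for _, r := range rows {
		k := key(r)
		cur, seen := newest[k]
		if !seen {
			order = append(order, k)
		}
		if !seen || r.instanceID > cur.instanceID {
			newest[k] = r
		}
	}
	out := make([]instanceRow, 0, len(order))
	for _, k := range order {
		out = append(out, newest[k])
	}
	return out
}

// sampleRows models two pods behind one shared regional SQL address, plus
// a restarted pod (ID 3) that reuses pod 1's rpcAddr before pod 1's
// session has expired. All addresses are made up.
func sampleRows() []instanceRow {
	return []instanceRow{
		{instanceID: 1, rpcAddr: "10.0.0.1:26257", sqlAddr: "crdb-us-east.example:26257"},
		{instanceID: 2, rpcAddr: "10.0.0.2:26257", sqlAddr: "crdb-us-east.example:26257"},
		{instanceID: 3, rpcAddr: "10.0.0.1:26257", sqlAddr: "crdb-us-east.example:26257"},
	}
}

func main() {
	rows := sampleRows()
	bySQL := dedupBy(rows, func(r instanceRow) string { return r.sqlAddr })
	byRPC := dedupBy(rows, func(r instanceRow) string { return r.rpcAddr })
	fmt.Println("dedup by sqlAddr:", len(bySQL)) // prints "dedup by sqlAddr: 1"
	fmt.Println("dedup by rpcAddr:", len(byRPC)) // prints "dedup by rpcAddr: 2"
}
```

Keyed on the shared sqlAddr, the two distinct live pods collapse into a single entry. Keyed on rpcAddr, only the stale row from the restarted pod is dropped, which is the one case the dedup is meant to handle.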
Resolves: #168991
Epic: CRDB-63207
Release note (bug fix): Fixed a bug where setting `--advertise-sql-addr` to the same value across multiple SQL instances (for example, in a Kubernetes deployment where all pods in a region share a regional service DNS name) caused distributed SQL query plans to be placed on a single instance per region, and could cause changefeeds with execution_locality filters to fail with "no instances found matching locality filter".
Release justification: