Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crdb: add a connection-balancing retry-aware connection pool #1294

Merged
merged 3 commits into from May 22, 2023

Conversation

ecordell
Copy link
Contributor

@ecordell ecordell commented May 1, 2023

This PR:

  • Adds a RetryPool that exposes a query interface that can correctly perform CRDB connection reset / retry logic
  • Adds machinery to balance the connections between backend CRDB nodes: a pruner that attempts to kill connections to nodes that have too many connections (allowing the pool to refill with new connections to other nodes) and a separate health check connection pool that records any new nodes it finds (to discover newly added nodes to balance over).

Together, these allow SpiceDB to have long-lived connections to CRDB (longer than the 5m maximum without this) but still survive loss of cockroach nodes (especially during an upgrade). The additional connection management doesn't require admin permission and doesn't require any additional roundtrips (contrasting with previous approaches in #1283 and #1284).

Cockroach's view of connections during a CRDB upgrade:
Screenshot 2023-05-10 at 3 05 31 PM
vs. SpiceDB's view:
Screenshot 2023-05-10 at 3 03 06 PM

API availability during the upgrade:
Screenshot 2023-05-10 at 3 03 18 PM

It's always possible that a request will be unlucky and unable to find a good connection to use before timing out, but it seems to be within acceptable levels (and is expected when we're reactive to changes like this).

Fixes #1295

@github-actions github-actions bot added area/datastore Affects the storage system area/dependencies Affects dependencies area/tooling Affects the dev or user toolchain (e.g. tests, ci, build tools) labels May 1, 2023
@ecordell ecordell force-pushed the crdb-balanced-retry branch 2 times, most recently from 5379e47 to a720a00 Compare May 3, 2023 12:23
@github-actions github-actions bot added the area/CLI Affects the command line label May 3, 2023
@ecordell ecordell force-pushed the crdb-balanced-retry branch 10 times, most recently from 8306700 to dc0b09d Compare May 6, 2023 20:35
@ecordell ecordell marked this pull request as ready for review May 6, 2023 20:35
@ecordell ecordell requested a review from a team as a code owner May 6, 2023 20:35
@josephschorr josephschorr marked this pull request as draft May 9, 2023 19:21
@ecordell ecordell force-pushed the crdb-balanced-retry branch 4 times, most recently from c73972d to eb29ad2 Compare May 10, 2023 19:06
@ecordell ecordell marked this pull request as ready for review May 10, 2023 19:24
@ecordell ecordell requested a review from vroldanbet May 11, 2023 15:19
@ecordell ecordell force-pushed the crdb-balanced-retry branch 2 times, most recently from c574f6e to 3b1e714 Compare May 12, 2023 18:01
@ecordell ecordell force-pushed the crdb-balanced-retry branch 7 times, most recently from 90539ce to 6b5924d Compare May 18, 2023 21:29
vroldanbet
vroldanbet previously approved these changes May 19, 2023
Copy link
Contributor

@vroldanbet vroldanbet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some serious load-testing of this thing, I believe this is ready to go! Fantastic work @ecordell 👏🏻 💯

@ecordell ecordell marked this pull request as draft May 19, 2023 12:14
@ecordell ecordell marked this pull request as ready for review May 19, 2023 17:03
@ecordell ecordell force-pushed the crdb-balanced-retry branch 4 times, most recently from 56113ce to 336c67e Compare May 19, 2023 17:11
internal/datastore/crdb/crdb.go Show resolved Hide resolved
internal/datastore/crdb/pool/balancer.go Outdated Show resolved Hide resolved
internal/datastore/crdb/pool/health.go Outdated Show resolved Hide resolved
internal/datastore/crdb/pool/pool.go Show resolved Hide resolved
@ecordell ecordell force-pushed the crdb-balanced-retry branch 2 times, most recently from 3f922a9 to 98b5d4a Compare May 19, 2023 18:49
@ecordell ecordell enabled auto-merge May 19, 2023 19:01
we used a "transaction factory" everywhere but never used it to create a transaction (ever since we switched to implicit transactions).

this removes the extra abstraction. it also gives the DBReader interface
a better name (since it can be used for more than just reading)
this ensures that small network blips don't inadvertently mark a node
as unhealthy, and that new nodes coming online don't get flooded with
new connections
@ecordell ecordell added this pull request to the merge queue May 22, 2023
Merged via the queue into authzed:main with commit 8d7932b May 22, 2023
25 checks passed
@ecordell ecordell deleted the crdb-balanced-retry branch May 22, 2023 15:45
@github-actions github-actions bot locked and limited conversation to collaborators May 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area/CLI Affects the command line area/datastore Affects the storage system area/dependencies Affects dependencies area/tooling Affects the dev or user toolchain (e.g. tests, ci, build tools)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Survive CRDB Rollouts with long lifetimes
2 participants