Multi-endpoint feature #135

Merged
merged 11 commits into from Jul 12, 2022

Conversation

nimf
Collaborator

@nimf nimf commented Jul 1, 2022

The purpose of GcpMultiEndpointChannel is twofold:

  1. Fallback to an alternative endpoint (host:port) of a gRPC service when the original endpoint is completely unavailable.
  2. Be able to route an RPC call to a specific group of endpoints.

A group of endpoints is called a MultiEndpoint. It is essentially a list of endpoints in which priority is defined by position, with the first endpoint having top priority. A MultiEndpoint tracks the availability of its endpoints. When a MultiEndpoint is picked for an RPC call, it picks the top-priority endpoint that is currently available. See the MultiEndpoint class for more information.
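
The priority rule described above can be illustrated with a minimal, self-contained sketch. The class `PriorityEndpointList` and its methods are hypothetical names made up for this illustration and are not the library's MultiEndpoint API. Returning an empty result when nothing is available is a choice of the sketch, not a statement about the real class.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for illustration; not the library's MultiEndpoint class.
final class PriorityEndpointList {
  // Position in this list defines priority: index 0 is the top priority.
  private final List<String> endpoints;
  private final Set<String> available = new HashSet<>();

  PriorityEndpointList(List<String> endpoints) {
    this.endpoints = new ArrayList<>(endpoints);
  }

  // Records whether an endpoint is currently considered available.
  void setAvailable(String endpoint, boolean isAvailable) {
    if (isAvailable) {
      available.add(endpoint);
    } else {
      available.remove(endpoint);
    }
  }

  // Returns the highest-priority endpoint that is currently available, if any.
  Optional<String> pick() {
    return endpoints.stream().filter(available::contains).findFirst();
  }
}
```

With endpoints `a:443` and `b:443` where only `b:443` is marked available, `pick()` yields `b:443`; once `a:443` becomes available again, `pick()` goes back to `a:443`.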

GcpMultiEndpointChannel can have one or more MultiEndpoints, each identified by a name: an arbitrary string provided in the GcpMultiEndpointOptions when configuring MultiEndpoints. This name can be used to route an RPC call to that MultiEndpoint by setting the ME_KEY key value in the call's CallOptions.

GcpMultiEndpointChannel receives a list of GcpMultiEndpointOptions for initial configuration. An updated configuration can be provided at any later time using setMultiEndpoints(List). The first item in the GcpMultiEndpointOptions list defines the default MultiEndpoint, which is used when no MultiEndpoint name is provided with an RPC call.
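
A rough sketch of the name-based routing and the "first item is the default" rule follows. `NamedGroupRouter`, `setGroups`, and `resolve` are hypothetical names for this illustration only; the `requestedName` argument stands in for the ME_KEY value from CallOptions. Falling back to the default for an unknown name is an assumption of the sketch, not something stated in this PR.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical router for illustration only; not the library's implementation.
final class NamedGroupRouter {
  private Map<String, List<String>> groups = new LinkedHashMap<>();
  private String defaultName;

  // Plays the role of setMultiEndpoints(List): replaces the whole configuration.
  // The first entry in insertion order becomes the default group.
  synchronized void setGroups(LinkedHashMap<String, List<String>> newGroups) {
    if (newGroups.isEmpty()) {
      throw new IllegalArgumentException("at least one group is required");
    }
    groups = new LinkedHashMap<>(newGroups);
    defaultName = groups.keySet().iterator().next();
  }

  // requestedName stands in for the ME_KEY value; null means it was not set.
  // Assumption of this sketch: an unknown name also falls back to the default.
  synchronized List<String> resolve(String requestedName) {
    List<String> group = (requestedName == null) ? null : groups.get(requestedName);
    return (group != null) ? group : groups.get(defaultName);
  }
}
```

Calling `resolve(null)` returns the endpoints of the first configured group, while `resolve("read")` returns the endpoints of the group named "read" if one was configured.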

Example configuration:

  • MultiEndpoint named "default" with endpoints:
      1. service.example.com:443
      2. service-fallback.example.com:443
  • MultiEndpoint named "read" with endpoints:
      1. ro-service.example.com:443
      2. service-fallback.example.com:443
      3. service.example.com:443

Let's assume we have a service with read and write operations and the following backends:

  • service.example.com -- the main set of backends supporting all operations
  • service-fallback.example.com -- read-write replica supporting all operations
  • ro-service.example.com -- read-only replica supporting only read operations

With the configuration above, GcpMultiEndpointChannel uses the "default" MultiEndpoint unless another one is specified. RPC calls therefore go to the main endpoint and, if it is unavailable, to the read-write replica.

To offload some read calls to the read-only replica, we can specify the "read" MultiEndpoint in the CallOptions. These calls then use the read-only replica endpoint; if it is unavailable, the read-write replica; and if that is also unavailable, the main endpoint.

GcpMultiEndpointChannel creates a GcpManagedChannel channel pool for every unique endpoint. For the example above three channel pools will be created.
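
To make the pool count concrete, here is a small sketch that collects the unique endpoints across the two example MultiEndpoints. `UniqueEndpoints` is a hypothetical helper written for this illustration; it only counts endpoints and does not create any channels.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; it does not create channel pools.
final class UniqueEndpoints {
  // Collects every distinct endpoint across all groups, keeping first-seen order.
  static Set<String> of(List<List<String>> groups) {
    Set<String> unique = new LinkedHashSet<>();
    for (List<String> group : groups) {
      unique.addAll(group);
    }
    return unique;
  }
}
```

For the example configuration, the "default" and "read" lists contain five entries in total but only three distinct endpoints, which matches the three channel pools mentioned above.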

@nimf nimf requested a review from mohanli-ml July 1, 2022 19:27
Collaborator

@mohanli-ml mohanli-ml left a comment

Thanks for the PR! Please take a look at the comments, and let me know if you have any questions.

@nimf nimf merged commit b7e0ef2 into master Jul 12, 2022
Contributor

@wenbozhu wenbozhu left a comment

LG overall.

One thing I was looking for is a way to define a pluggable module to monitor the original endpoint and the underlying mechanism, e.g. /gen204 etc. The implementation will be subject to change.

@nimf
Collaborator Author

nimf commented Jul 13, 2022

@wenbozhu I found that we don't need to make /gen204 requests because we keep a minimum number of channels always connected. This removes the need for /gen204, since a few HTTP/2 connections are always alive to every endpoint. If all connections to an endpoint break, the endpoint is considered unavailable, and the minimum number of channels/connections keep trying to reconnect until a connection succeeds, at which point the endpoint is treated as available again. Does that make sense?
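
As a rough illustration of the availability rule in this comment (an endpoint counts as available while at least one connection to it is alive), consider the sketch below. `EndpointAvailability` and its callbacks are hypothetical names for this illustration and do not reflect how the PR wires channel state into MultiEndpoint.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical tracker for illustration; not the library's implementation.
final class EndpointAvailability {
  // Number of currently ready connections per endpoint; absent means zero.
  private final Map<String, Integer> readyConnections = new HashMap<>();

  // Call when a connection to the endpoint becomes ready.
  void onConnected(String endpoint) {
    readyConnections.merge(endpoint, 1, Integer::sum);
  }

  // Call when a previously ready connection to the endpoint breaks.
  // Returning null from the remapping function removes the entry at zero.
  void onDisconnected(String endpoint) {
    readyConnections.computeIfPresent(endpoint, (e, n) -> n > 1 ? n - 1 : null);
  }

  // An endpoint is available while at least one connection to it is ready.
  boolean isAvailable(String endpoint) {
    return readyConnections.containsKey(endpoint);
  }
}
```

Losing one of two connections leaves the endpoint available; losing the last one marks it unavailable until a reconnect succeeds.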

@wenbozhu
Contributor

Are those H2 connections gRPC channels? Do we rely on H2 PING to keep the connections alive, and what's the (default) interval?
Given that we will maintain a few (default?), the interval could be increased (maybe by 1/2 of the total number of connections) if we are concerned about the overhead.

@nimf
Collaborator Author

nimf commented Jul 14, 2022

Yes, those are gRPC channels. We do rely on HTTP/2 PING keepalive, but if there are no calls, no pings are sent because keepalive_permit_without_calls is expected to be false. So no pings will be sent on the few connections to an unused endpoint.

@wenbozhu
Contributor

Let's discuss this offline. Those channels (to the backup endpoint or to a broken primary endpoint) will not have active requests, so some form of keepalive ping is needed to detect the health of the endpoint.

@rahul2393

@nimf Any ETA for when this will be available in Go (https://github.com/GoogleCloudPlatform/grpc-gcp-go)?

@nimf
Collaborator Author

nimf commented Oct 27, 2022

@rahul2393 The preliminary ETA is sometime in Q1 2023.

@nimf nimf deleted the multi-endpoint branch November 8, 2023 00:18

4 participants