feat(core,grpc-sdk): rework service discovery #383

Merged
merged 8 commits into from
Oct 14, 2022

Contributor

@kon14 kon14 commented Oct 14, 2022

This PR reworks Conduit's service discovery.

  • Multi-instance services are not handled individually (instances are expected to sit behind a load balancer)
  • Online services are recovered on startup
  • Unresponsive services are instantly removed from the list of exposed services
  • Reconnection to recently removed services is attempted using linear backoff
  • Services that do not provide a gRPC health check service are assumed to be healthy
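For illustration, here is a minimal sketch of what a linear backoff schedule could look like with the default values. The function names are hypothetical, and the exact formula is an assumption (the delay before attempt n is taken to be n times the initial delay); the PR description does not spell it out.

```typescript
// Hypothetical helper, not taken from the PR: computes the wait before each
// reconnection attempt, assuming the delay grows by `initMs` per attempt.
function linearBackoffDelays(initMs: number, retries: number): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt <= retries; attempt++) {
    delays.push(initMs * attempt);
  }
  return delays;
}

// Total time spent waiting before a service is given up on,
// under the same assumption.
function totalWaitMs(initMs: number, retries: number): number {
  let total = 0;
  for (const delay of linearBackoffDelays(initMs, retries)) {
    total += delay;
  }
  return total;
}

// With an initial delay of 250 ms and 5 retries:
// linearBackoffDelays(250, 5) yields [250, 500, 750, 1000, 1250]
// totalWaitMs(250, 5) yields 3750
```

Under this assumption, an offline service would be dropped for good after roughly 3.75 seconds of cumulative waiting, plus however long each connection attempt itself takes.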

Added (optional) environment variables:
SERVICE_MONITOR_INTERVAL_MS: Service discovery monitor interval in ms (default: 30000)
SERVICE_RECONN_RETRIES: Reconnection attempts before removal of offline services (default: 5)
SERVICE_RECONN_INIT_MS: Initial delay for linear backoff reconnection to offline services in ms (default: 250)

Extremely short SERVICE_MONITOR_INTERVAL_MS values are expected to fail, but are not currently prohibited.
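As a rough sketch of how these variables and their defaults might be read, assuming a plain key/value environment object. The `ServiceDiscoveryConfig` interface, the helper names and the choice to fall back to the default on invalid input are all assumptions made for illustration, not Conduit's actual code.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface ServiceDiscoveryConfig {
  monitorIntervalMs: number;
  reconnRetries: number;
  reconnInitMs: number;
}

type Env = { [name: string]: string | undefined };

// Reads a positive integer from the environment. Falling back to the
// default on missing or invalid input is an assumption of this sketch.
function readPositiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") {
    return fallback;
  }
  const parsed = Number(raw);
  if (!isFinite(parsed) || Math.floor(parsed) !== parsed || parsed <= 0) {
    return fallback;
  }
  return parsed;
}

function loadServiceDiscoveryConfig(env: Env): ServiceDiscoveryConfig {
  return {
    // Defaults as listed in the PR description.
    monitorIntervalMs: readPositiveInt(env, "SERVICE_MONITOR_INTERVAL_MS", 30000),
    reconnRetries: readPositiveInt(env, "SERVICE_RECONN_RETRIES", 5),
    reconnInitMs: readPositiveInt(env, "SERVICE_RECONN_INIT_MS", 250),
  };
}
```

In a Node.js process this would be called as `loadServiceDiscoveryConfig(process.env)`. A lower bound on `monitorIntervalMs` is deliberately not enforced here, matching the note that very short intervals are not currently prohibited.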

This PR also introduces previously missing closeConnection() calls for removed modules in Core.
Modules already handled this through grpc-sdk, which in their case is configured to receive and act upon service updates from Core.

What kind of change does this PR introduce?

  • Bugfix
  • Feature
  • Code style update
  • Refactor
  • Build-related changes
  • Other (please describe)

Does this PR introduce a breaking change?

  • Yes
  • No

The PR fulfills these requirements:

  • It's submitted to the main branch
  • When resolving a specific issue, it's referenced in the PR's description (e.g. fix #xxx, where "xxx" is the issue number)

Specified via SERVICE_MONITOR_INTERVAL_MS.
Default: 30000ms
fix(core,grpc-sdk): core missing closeConnection() calls
feat(core): service discovery reconnection envs

Conduit Service Discovery:
- Multi-instance services are not handled individually (talk to Load Balancer)
- Services that are Online AND Serving are recovered on startup
- Services that go Offline OR Non-Serving are instantly removed from the list of exposed services
- Reconnection to recently removed services is attempted using exponential backoff

SERVICE_RECONN_RETRIES (default: 5)
SERVICE_RECONN_INIT_MS (default: 250ms)
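The "Online AND Serving" rule from the commit message above, combined with the PR description's point that services without a gRPC health check are assumed healthy, could be sketched roughly as follows. The `ProbeResult` type and `shouldExpose` function are hypothetical names, and treating an UNKNOWN status as non-serving is an assumption of this sketch. In the standard gRPC health checking protocol, a server that does not implement the health service typically fails the call with the UNIMPLEMENTED status code.

```typescript
// Hypothetical outcome of probing a service; not Conduit's actual types.
type ProbeResult =
  | { kind: "status"; status: "SERVING" | "NOT_SERVING" | "UNKNOWN" }
  | { kind: "no-health-service" } // e.g. the health call failed with UNIMPLEMENTED
  | { kind: "unreachable" };      // connection could not be established

// Decides whether a service stays in the list of exposed services.
function shouldExpose(result: ProbeResult): boolean {
  switch (result.kind) {
    case "status":
      // Online AND Serving; UNKNOWN is treated as non-serving (assumption).
      return result.status === "SERVING";
    case "no-health-service":
      // No health check service available: assumed healthy.
      return true;
    case "unreachable":
      // Offline: removed immediately, reconnection is attempted separately.
      return false;
  }
}
```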
@kon14 kon14 marked this pull request as draft October 14, 2022 13:12
@kon14 kon14 marked this pull request as ready for review October 14, 2022 14:30
2 participants