Alternate implementation to support workers that are referenced by host names rather than by IP addresses #2593
Open
… may resolve to different IP addresses during the worker's lifespan.
@quasiben, if you have a moment, could you take a look at this and share your opinion?
As the title says, this is an alternate implementation of pull request #2590.
This variation makes a few subtle changes. The scheduler's alias check now runs before the worker is registered, so if that check fails the worker remains completely unregistered. Previously the worker could end up partially registered, and later attempts would then fail with "worker is already registered" even though it had never fully registered.
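The ordering change can be illustrated with a toy registry. This is a minimal sketch, not the scheduler's actual code: `ToyScheduler`, its `workers`/`aliases` dictionaries, and the response dicts are hypothetical stand-ins. The point is that every check runs before any state is mutated, so a rejected registration leaves nothing behind and a later retry is not mistaken for a duplicate.

```python
class ToyScheduler:
    """Hypothetical, simplified stand-in for a scheduler's worker registry."""

    def __init__(self):
        self.workers = {}  # address -> worker name
        self.aliases = {}  # worker name -> address

    def add_worker(self, address, name):
        # Run every validation *before* touching self.workers/self.aliases,
        # so a failed check leaves the worker fully unregistered.
        if address in self.workers:
            return {"status": "error", "message": "worker already registered"}
        if name in self.aliases and self.aliases[name] != address:
            return {"status": "error", "message": f"name {name!r} already in use"}
        # All checks passed: commit the registration in one step.
        self.workers[address] = name
        self.aliases[name] = address
        return {"status": "OK"}

    def remove_worker(self, address):
        name = self.workers.pop(address, None)
        if name is not None:
            self.aliases.pop(name, None)


s = ToyScheduler()
s.add_worker("tcp://10.0.0.1:8786", "w1")
# The name is taken, so this attempt is rejected without leaving partial state.
print(s.add_worker("tcp://10.0.0.2:8786", "w1")["status"])  # error
s.remove_worker("tcp://10.0.0.1:8786")
# The retry is not misreported as "worker already registered".
print(s.add_worker("tcp://10.0.0.2:8786", "w1")["status"])  # OK
```

Had the registry recorded the address in `workers` before checking the alias, the second attempt would have left a stale entry, and the retry would have hit the "already registered" branch.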
Another change is that aliases are no longer used for address resolution (`coerce_address`). Otherwise the worker would always be resolved to the IP address it had at its first registration.
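To illustrate the idea, here is a hedged sketch of address normalization that keeps the host name instead of substituting a previously recorded IP. This is not the real `coerce_address`. The function name `normalize_address`, the default `tcp` scheme, and the lowercasing of host names are assumptions made for the example. Because the host name survives normalization, DNS is consulted again whenever a connection is opened, so a worker whose host name now points at a new IP is still reachable.

```python
def normalize_address(addr: str, default_scheme: str = "tcp") -> str:
    """Hypothetical sketch: normalize "host:port" without pinning it to an IP.

    The host part is kept as a name (only lowercased, since DNS names are
    case-insensitive), so it is resolved afresh at connection time rather
    than being replaced by whatever IP was seen at first registration.
    """
    if "://" not in addr:
        addr = f"{default_scheme}://{addr}"
    scheme, _, rest = addr.partition("://")
    host, sep, port = rest.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"address {addr!r} must look like host:port")
    return f"{scheme}://{host.lower()}:{int(port)}"


print(normalize_address("Worker-1.example.com:8786"))
print(normalize_address("tcp://10.0.0.5:8786"))
```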
To support this, the worker's registration with the scheduler must fail cleanly so that it can be re-attempted. Currently, a scheduler-side failure during worker registration permanently prevents the worker from registering. In my testing, the hostname lookup in `ensure_ip` failed because DNS had not yet propagated, and the worker became a zombie: still running, but never registered with the scheduler.
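A worker-side retry loop could tolerate that kind of transient DNS failure. The following is a minimal sketch under stated assumptions, not the worker's actual registration code. `register_with_retry` and the `register` callable are hypothetical. `register` is assumed to perform one complete attempt, including any hostname lookup, and to return a dict with a `"status"` key. Only `socket.gaierror` (the standard-library exception for failed address lookups) and explicit refusals are retried, using exponential backoff.

```python
import socket
import time
from typing import Callable, Optional


def register_with_retry(
    register: Callable[[], dict],
    attempts: int = 5,
    delay: float = 0.5,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call the hypothetical `register` until it reports success.

    Sleeps `delay` seconds between attempts, multiplying the delay by
    `backoff` each time, and gives up after `attempts` tries.
    """
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            response = register()
            if response.get("status") == "OK":
                return response
            # The scheduler refused this attempt; keep the reason and retry.
            last_error = RuntimeError(response.get("message", "registration refused"))
        except socket.gaierror as exc:
            # Name lookup failed, e.g. a new DNS record has not propagated yet.
            last_error = exc
        if attempt < attempts:
            sleep(delay)
            delay *= backoff
    raise RuntimeError(f"registration failed after {attempts} attempts") from last_error
```

For example, a `register` that raises `socket.gaierror` twice and then succeeds returns on the third call, after sleeping 0.5 s and then 1.0 s. Injecting `sleep` keeps the loop testable without real delays. This design only helps if the scheduler side leaves no partial state behind when an attempt fails.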