Improve slave-check: Check every configured master and AXFR from mast… #4956
…er with highest serial
Currently, PDNS as a slave has poor logic when there are multiple masters:
On an incoming NOTIFY, PDNS queues the zone for refresh. The SOA query then goes to a
randomly chosen master to get that master's serial. On timeout, PDNS does not try another master.
If the SOA query succeeds and the serial has increased, PDNS queues the domain for
AXFR, but this AXFR uses only the first master.
Thus, if the first master is offline, PDNS will never transfer the zone, even if the
second master sends NOTIFYs and has a newer version of it.
This patch does two things:
a) If a slave check is requested for a domain that has multiple masters, the SOA
check is performed against every master.
b) The name server that answered with
the highest serial is then used for the AXFR.
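
To make the intended behaviour concrete, here is a minimal sketch of the selection logic in Python. It is not the patch itself (the patch is C++ inside PDNS); `pick_master`, `query_soa_serial` and `fake_soa` are hypothetical names used only for illustration, and the addresses are documentation-range placeholders.

```python
from typing import Callable, Iterable, Optional, Tuple


def pick_master(
    masters: Iterable[str],
    query_soa_serial: Callable[[str], int],
) -> Optional[Tuple[str, int]]:
    """Query every master and return (master, serial) for the highest serial.

    `query_soa_serial` is a hypothetical callback that returns the SOA serial
    of one master, or raises OSError (TimeoutError is a subclass) on failure.
    Returns None if no master answered.
    """
    best: Optional[Tuple[str, int]] = None
    for master in masters:
        try:
            serial = query_soa_serial(master)
        except OSError:
            # Unreachable master: skip it and keep checking the others.
            continue
        if best is None or serial > best[1]:
            best = (master, serial)
    return best


# Illustration with a fake resolver: the first master is offline.
def fake_soa(master: str) -> int:
    serials = {"192.0.2.2": 2017010102, "192.0.2.3": 2017010101}
    if master not in serials:
        raise TimeoutError(f"no answer from {master}")
    return serials[master]


result = pick_master(["192.0.2.1", "192.0.2.2", "192.0.2.3"], fake_soa)
local_serial = 2017010101
# AXFR only if some master answered with a serial newer than the local one.
needs_axfr = result is not None and result[1] > local_serial
```

In this example the offline first master is skipped and the AXFR would be requested from `192.0.2.2`, the master that reported the highest serial.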
Known issue: after a serial rollover, choosing the highest serial is not the best
choice. But as soon as the rollover has happened on every master, the logic is correct again.
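
For illustration only, the rollover case can be shown with a small Python sketch. It contrasts a plain numeric maximum (the approach described in this pull request) with a comparison based on RFC 1982 serial number arithmetic. The RFC 1982 comparison is not part of this patch; `serial_gt` is a hypothetical helper name.

```python
SERIAL_BITS = 32
MOD = 1 << SERIAL_BITS          # serials wrap at 2**32
HALF = 1 << (SERIAL_BITS - 1)   # 2**31


def serial_gt(a: int, b: int) -> bool:
    """True if serial `a` is newer than `b` under RFC 1982 arithmetic.

    If the two serials differ by exactly 2**31 the order is undefined in
    RFC 1982; this sketch returns False in both directions for that case.
    """
    return a != b and ((a - b) % MOD) < HALF


# Master A has not rolled over yet; master B has already wrapped around.
serial_a = 4294967295
serial_b = 1

# Plain numeric maximum picks the stale, pre-rollover serial.
plain_choice = max(serial_a, serial_b)

# Serial arithmetic treats the wrapped serial as the newer one.
b_is_newer = serial_gt(serial_b, serial_a)
```

Here `plain_choice` is the pre-rollover value from master A, while `b_is_newer` is true. This mismatch is the window described above, and it closes once every master has rolled over.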
This pull request needs some more polishing (e.g. replacing some log messages with DLOG), but I wanted to ask first whether you would accept this.
The same patch for 3.4 has been running in production for about a year with nearly 1 million slave domains without problems.