
Auth: Incremental backoff for failed slave checks #4953

Merged
merged 1 commit into PowerDNS:master from pieterlexis:issue-349-602-slave-checking-backoff on Feb 21, 2017

Conversation

@pieterlexis
Member

pieterlexis commented Jan 27, 2017

Short description

When we cannot retrieve the SOA record for a slave domain, use an increasing interval before checking the domain again. This prevents hammering servers that are down or already busy (a rough sketch of the idea follows below).
Closes #349
Closes #602
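
For illustration only, a minimal sketch of the idea (the actual change lives in pdns/slavecommunicator.cc; the names below, such as g_failedSlaveChecks and shouldCheckAgain, are made up for this example):

#include <algorithm>
#include <cstdint>
#include <ctime>
#include <map>
#include <string>

// Illustrative only: track consecutive failed SOA checks per slave zone and
// wait one extra slave-cycle-interval per failure before retrying.
struct FailedCheckState {
  uint64_t numFails{0};  // consecutive failed SOA retrievals for this zone
  time_t lastCheck{0};   // time of the last attempt
};

std::map<std::string, FailedCheckState> g_failedSlaveChecks;  // keyed by zone name

bool shouldCheckAgain(const std::string& zone, time_t now, time_t slaveCycleInterval) {
  auto it = g_failedSlaveChecks.find(zone);
  if (it == g_failedSlaveChecks.end())
    return true;  // no recorded failures, check as usual
  // Back off linearly with the number of failures, capped at 60 cycles
  // (the "60 * slave-cycle-interval" cap discussed in the review below).
  uint64_t cycles = std::min<uint64_t>(it->second.numFails, 60);
  return now - it->second.lastCheck >= static_cast<time_t>(cycles) * slaveCycleInterval;
}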

Checklist

I have:

  • read the CONTRIBUTING.md document
  • compiled and tested this code
  • included documentation (including possible behaviour changes)
  • documented the code
  • added regression tests
  • added unit tests
  • checked that this code was merged to master
@mind04


Contributor

mind04 commented Jan 27, 2017

Is soa-retry-default not a better value for the backoff time if you cannot retrieve an SOA? With the current max of 60 * slave-cycle-interval the backoff time will grow quickly if you increase slave-cycle-interval (1 hour = max 2.5 days).
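
(For reference: with slave-cycle-interval set to 3600 seconds, the 60-cycle cap works out to 60 × 3600 s = 216000 s = 60 hours = 2.5 days.)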

(4 outdated review comments on pdns/slavecommunicator.cc not shown)
@pieterlexis


Member

pieterlexis commented Feb 14, 2017

Is soa-retry-default not a better value for the backoff time if you cannot retrieve an SOA? With the current max of 60 * slave-cycle-interval the backoff time will grow quickly if you increase slave-cycle-interval (1 hour = max 2.5 days).

Both soa-retry-default and slave-cycle-interval appear to be 'related' settings for this... We could limit the time between checks to the minimum of slave-cycle-interval*60 and soa-retry-default*2?

@pieterlexis


Member

pieterlexis commented Feb 14, 2017

Hrm, that is confusing... maybe it should be std::min(slave-cycle-interval*num_failures, soa-retry-default)
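
For illustration, a sketch of what that capped backoff could look like (hypothetical helper, not the merged code; the real logic is in pdns/slavecommunicator.cc):

#include <algorithm>
#include <cstdint>
#include <ctime>

// Illustrative helper: back off by one slave-cycle-interval per consecutive
// failure, but never wait longer than soa-retry-default.
time_t nextCheckDelay(uint64_t numFails, time_t slaveCycleInterval, time_t soaRetryDefault) {
  time_t backoff = static_cast<time_t>(numFails) * slaveCycleInterval;
  return std::min(backoff, soaRetryDefault);
}

// Example: with slave-cycle-interval=60 and soa-retry-default=3600, the delay
// grows 60, 120, 180, ... seconds and is capped at 3600 seconds after 60 failures.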

@pieterlexis pieterlexis merged commit 8679b32 into PowerDNS:master Feb 21, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

@pieterlexis pieterlexis deleted the pieterlexis:issue-349-602-slave-checking-backoff branch Feb 21, 2017

@klaus3000


klaus3000 commented Sep 16, 2017

Thanks for this great feature. Just a comment from our experience with failing masters: a zone which was removed from the master but not from the slave causes plenty of logging every slave-check cycle, but rarely caused problems on the slave. What really causes massive problems is a non-responsive master, because then the queries have to time out, which massively slows down the slave. We have customers with >200.000 domains; if their master is not available we have to manually remove these customers from the slave checks to avoid performance issues on our slave. Hence, it would be great if you could also implement a similar feature, not per "domain" but per "master IP": skipping SOA checks for some time for certain master IP addresses if a certain number of queries to them failed within a certain time.
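
For what it's worth, a rough sketch of what such per-master tracking could look like (purely hypothetical, not part of this PR; names like g_failedMasters are invented):

#include <ctime>
#include <map>
#include <string>

// Hypothetical: count recent failed SOA checks per master address and put the
// master "on hold" for a while once a threshold is reached, so its zones are
// skipped instead of waiting for per-query timeouts.
struct MasterFailures {
  unsigned int recentFailures{0};
  time_t holdUntil{0};
};

std::map<std::string, MasterFailures> g_failedMasters;  // keyed by master IP

bool masterOnHold(const std::string& masterIP, time_t now) {
  auto it = g_failedMasters.find(masterIP);
  return it != g_failedMasters.end() && now < it->second.holdUntil;
}

void recordMasterFailure(const std::string& masterIP, time_t now,
                         unsigned int threshold, time_t holdTime) {
  auto& st = g_failedMasters[masterIP];
  if (++st.recentFailures >= threshold)
    st.holdUntil = now + holdTime;  // skip SOA checks to this master for a while
}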

@pieterlexis


Member

pieterlexis commented Sep 18, 2017

That does not sound unreasonable at first glance; can you open a feature request?
