fix replication lag warning to be always lower than failover timeout #63

Merged

sjamgade
Contributor

Without this patch, the warning boundary could be set higher than the
failover timeout, so the warning file was never created. As a result, the
warning handling mechanism never came into play.

This patch improves that behaviour by setting the warning boundary to half
the failover timeout.

@sjamgade sjamgade force-pushed the warning-should-be-lower-than-timeout branch from d2e183a to 9673b46 Compare February 23, 2021 15:50
Contributor

@rikonen rikonen left a comment

In addition to ensuring that the warning threshold is smaller than the critical threshold, check_replication_lag also needs to be adjusted so that it does not check whether the value is over the critical threshold when the warning threshold was only just exceeded. Otherwise it is still possible for both limits to be exceeded at the same time, and promotion happens without going through the warning state.
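
The ordering requested in this review can be illustrated with a minimal sketch. This is not pglookout's code: the `LagMonitor` class, its fields, the returned strings, and the choice to clear the warning once lag drops are all assumptions made for illustration. Only the method name `check_replication_lag` comes from the review comment.

```python
from dataclasses import dataclass


@dataclass
class LagMonitor:
    """Hypothetical stand-in for the lag-checking state; not pglookout's class."""

    warning_boundary: float  # seconds of lag that trigger a warning
    failover_timeout: float  # seconds of lag that allow a failover
    warned: bool = False     # stands in for the warning file on disk

    def check_replication_lag(self, lag: float) -> str:
        # Lag is back under the warning boundary: clear any earlier warning
        # (assumed behaviour, chosen to keep the sketch self-contained).
        if lag < self.warning_boundary:
            self.warned = False
            return "ok"
        # The first check that sees excessive lag only warns, even when the
        # lag is already over the failover timeout.
        if not self.warned:
            self.warned = True
            return "warning"
        # A later check may fail over, because the warning state was visited.
        if lag >= self.failover_timeout:
            return "failover"
        return "warning"
```

With a 30 second warning boundary and a 120 second failover timeout, a first observation of 500 seconds of lag returns `"warning"`, and only the next check with lag still over the timeout returns `"failover"`.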

@sjamgade sjamgade force-pushed the warning-should-be-lower-than-timeout branch 3 times, most recently from 940ed43 to 5cd6bfb Compare February 24, 2021 10:37
Without this patch, the warning boundary could be set higher than the
failover timeout, so the warning file was never created. As a result, the
warning handling mechanism never came into play.

This patch improves that behaviour by setting the warning boundary to the
same value as the failover timeout.

If pglookout is unable to run some periodic checks and comes back to
find that replication lag has grown over the failover limit because the
replica was unresponsive, then it is better to force one more check than
to jump straight to failover.
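
The boundary adjustment described in the first of the two commit messages above might look roughly like the following. It is a sketch under stated assumptions: the function name `effective_warning_boundary`, its parameters, and the log message are hypothetical and are not taken from pglookout.

```python
import logging

log = logging.getLogger("lag-config-sketch")  # hypothetical logger name


def effective_warning_boundary(warning_boundary: float, failover_timeout: float) -> float:
    """Return a warning boundary that never exceeds the failover timeout."""
    if warning_boundary > failover_timeout:
        # A boundary above the timeout would let failover trigger before any
        # warning is raised, so fall back to the timeout itself.
        log.warning(
            "warning boundary %.1fs exceeds failover timeout %.1fs; clamping",
            warning_boundary,
            failover_timeout,
        )
        return failover_timeout
    return warning_boundary
```

Clamping to the timeout keeps a misconfigured value usable instead of rejecting the configuration, at the cost of the warning and failover limits coinciding, which is why the extra forced check in the second commit message matters.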
@sjamgade sjamgade force-pushed the warning-should-be-lower-than-timeout branch from 5cd6bfb to 3422690 Compare February 24, 2021 11:56
@rikonen rikonen merged commit 181b0d3 into Aiven-Open:master Feb 24, 2021