Add KSR recovery from transient Etcd errors #571
Merged
tiewei merged 5 commits into contiv:master from jmedved:master on Feb 5, 2018
lukasmacko
reviewed
Feb 5, 2018
// mockKeyProtoVaBroker is a mock implementation of KSR's interface to the
// key-value data store.
type mockKeyProtoVaBroker struct {
Collaborator
Typo: this should be mockKeyProtoValBroker (the l is missing).
lukasmacko
approved these changes
Feb 5, 2018
The main purpose of this pull request is to implement recovery from transient Etcd errors that go undetected by the agent status monitor, which also monitors Etcd. The agent status monitor does not detect fast Etcd restarts (lasting a couple of seconds), which may result in data loss.
To detect data loss in Etcd we use Etcd's capability to record a revision for every item it stores. A KSR status record was introduced in Etcd; for now it holds just KSR statistics (a collection of stats for the reflectors in the KSR). The KSR status record is updated periodically. With every update, the Etcd revision of the status record is compared with its last recorded value; if the new revision is not strictly greater than the last recorded value, a resync is triggered. This functionality is implemented in plugin_impl_ksr.go (new functions: monitorEtcdStatus(), processEtcdMonitorEvent(), checkEtcdTransientError(), ksrHasSynced() and getKsrStats()).
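The revision comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code in plugin_impl_ksr.go: the `revChecker` type and its `needsResync` method are hypothetical names, and the revisions are fed in by hand rather than read from Etcd.

```go
package main

import "fmt"

// revChecker is a hypothetical helper (not the PR's actual code) that
// remembers the last Etcd revision seen for the KSR status record.
type revChecker struct {
	lastRev int64
	seen    bool // false until the first revision has been recorded
}

// needsResync records rev and reports whether a resync should be triggered,
// i.e. whether rev is not strictly greater than the previously recorded one.
func (c *revChecker) needsResync(rev int64) bool {
	resync := c.seen && rev <= c.lastRev
	c.lastRev = rev
	c.seen = true
	return resync
}

func main() {
	c := &revChecker{}
	// Revisions observed after successive status record updates. The drop
	// from 12 to 3 simulates a fast Etcd restart that lost data.
	for _, rev := range []int64{10, 11, 12, 3, 4} {
		fmt.Printf("rev=%d resync=%v\n", rev, c.needsResync(rev))
	}
}
```

In this sketch only the fourth observation (revision 3 after revision 12) reports that a resync is needed; once the lower revision has been recorded, later updates are compared against it.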
To support the above changes, the code base also underwent some cleanup:
- The Writer and Lister interfaces and mocks were consolidated into a single interface/mock. This cleaned up reflector initialization and testing.
- Error injection into the mock was made more fine-grained (one switch for the ListValues() function, one for all other data operations) so that all error paths in KSR data resync can be exercised in unit tests.
- Reflector object type handling was cleaned up (constants were introduced for each object type and are now used consistently throughout the code).