etcd: start the status checker only after establishing the initial session #26363

Merged on Jun 22, 2023 (1 commit)

Commits on Jun 19, 2023

  1. etcd: start status checker only after establishing initial session

    Currently, the etcd status checker is started in a separate goroutine
    as soon as the client is created, without waiting for the establishment
    of the initial session. Yet, the status checker will never succeed until
    then, based on how it is implemented. This is problematic especially in
    the clustermesh case, because the errors propagated by the status
    checker cause the watchdog logic to restart the connection.
    
    Let's consider the following situation: a new etcd client is created,
    starting the corresponding status checker, while the remote
    clustermesh-apiserver is not yet running. The status checker will likely
    fail a few times, propagating the corresponding error through the
    StatusCheckErrors channel. Eventually, the clustermesh-apiserver pod
    boots up, the initial session is established and the different watchers
    are started. At this point, the watchdog logic starts to read from the
    channel, and restarts an otherwise working connection. It is worth
    mentioning that this issue affects clustermesh only because the other
    components ignore the errors returned through the StatusCheckErrors
    channel.
    
    This commit modifies the etcd client to start the status checker only
    after the initial connection has been established (the same applies to
    the heartbeat watcher), so that the issue above is prevented. An
    alternative might be to ignore all errors reported through the
    StatusCheckErrors channel when the watchdog is initially started, but
    this would be subject to race conditions, and could show incorrect
    statuses through `cilium status`.
    
    Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
    giorio94 committed Jun 19, 2023
    f4ef3d2