Delay removal of flow-restore-wait #6342

Commits on May 31, 2024

  1. Delay removal of flow-restore-wait

    Keep flow-restore-wait set until a set of "essential" flows has been
    installed. At the moment, this set includes NetworkPolicy flows (using
    podNetworkWait as the signal), Pod forwarding flows (reconciled by the
    CNIServer), and Node routing flows (installed by the
    NodeRouteController). The set can be extended in the future if desired.
    
    We use the wrapper around sync.WaitGroup that was introduced in
    antrea-io#5777. It simplifies unit testing and gives us some symmetry
    with podNetworkWait.
    
    We can also start leveraging this new wait group
    (flowRestoreCompleteWait) as the signal to delete flows from previous
    rounds. However, at the moment this is incomplete, as we don't wait for
    all controllers to signal that they have installed initial flows.
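The ordering this enables can be sketched as follows, under simplifying assumptions. `bridge` and `flow` are toy types, not Antrea types, and each flow carries the hypothetical "round" in which it was installed. The sketch assumes every component signals, which, as noted above, is not yet the case in practice: wait for all essential flows, clear flow-restore-wait, then delete flows from earlier rounds.

```go
package main

import (
	"fmt"
	"sync"
)

// flow is a hypothetical, simplified flow entry tagged with its install round.
type flow struct {
	name  string
	round uint64
}

// bridge is a toy stand-in for the OVS bridge.
type bridge struct {
	mu              sync.Mutex
	flowRestoreWait bool // stand-in for other_config:flow-restore-wait
	flows           []flow
}

func (b *bridge) install(name string, round uint64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.flows = append(b.flows, flow{name: name, round: round})
}

// deleteStaleFlows removes every flow installed during an earlier round and
// returns the number of flows removed.
func (b *bridge) deleteStaleFlows(currentRound uint64) int {
	b.mu.Lock()
	defer b.mu.Unlock()
	kept := b.flows[:0]
	for _, f := range b.flows {
		if f.round >= currentRound {
			kept = append(kept, f)
		}
	}
	removed := len(b.flows) - len(kept)
	b.flows = kept
	return removed
}

// startAgent waits for every essential component, clears flow-restore-wait,
// and only then deletes flows left over from previous rounds.
func startAgent(b *bridge, currentRound uint64, essential []string) int {
	var flowRestoreCompleteWait sync.WaitGroup
	for _, name := range essential {
		flowRestoreCompleteWait.Add(1)
		go func(name string) {
			defer flowRestoreCompleteWait.Done()
			b.install(name, currentRound) // initial flows for this component
		}(name)
	}
	flowRestoreCompleteWait.Wait()

	b.mu.Lock()
	b.flowRestoreWait = false
	b.mu.Unlock()

	return b.deleteStaleFlows(currentRound)
}

func main() {
	b := &bridge{
		flowRestoreWait: true,
		flows:           []flow{{"stale-pod-flow", 1}, {"stale-node-flow", 1}},
	}
	removed := startAgent(b, 2, []string{"NetworkPolicy", "Pod forwarding", "Node routing"})
	fmt.Println("flow-restore-wait:", b.flowRestoreWait, "stale removed:", removed, "remaining:", len(b.flows))
}
```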
    
    Because the NodeRouteController does not have an initial "reconcile"
    operation (like the CNIServer) to install flows for the initial Node
    list, we instead rely on a different mechanism that upstream K8s
    provides for controllers. When registering event handlers, we can
    request that the ADD handler receive a boolean flag indicating whether
    the object is part of the initial list retrieved by the informer. Using
    this mechanism, we can reliably signal through flowRestoreCompleteWait
    when this initial list of Nodes has been synced at least once.
    
    This change is possible because of antrea-io#6361, which removed the dependency
    on the proxy (kube-proxy or AntreaProxy) to access the Antrea
    Controller. Prior to antrea-io#6361, there would have been a circular dependency
    in the case where kube-proxy was removed: flow-restore-wait would not be
    removed until the Pod network was "ready", which would not happen until
    the NetworkPolicy controller had started its watchers, and that depended
    on Antrea Service reachability, which in turn depended on flow-restore-wait
    being removed.
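The circular dependency described above can be checked mechanically. Below is a small sketch that models each condition as a node in a dependency graph and uses a depth-first search to detect a cycle. The graph is a deliberate simplification of the prose, and the node labels are illustrative rather than Antrea identifiers. `afterFix` drops the edge that antrea-io#6361 removed, so starting the watchers no longer waits on the Service/proxy path.

```go
package main

import "fmt"

// findCycle runs a depth-first search from start and returns the first cycle
// found (first element repeated at the end), or nil if there is none.
func findCycle(deps map[string][]string, start string) []string {
	const (
		unvisited = iota
		inProgress
		done
	)
	state := map[string]int{}
	var stack, cycle []string
	var visit func(n string) bool
	visit = func(n string) bool {
		state[n] = inProgress
		stack = append(stack, n)
		for _, d := range deps[n] {
			switch state[d] {
			case inProgress: // back edge: d is on the current path
				for i, s := range stack {
					if s == d {
						cycle = append(append([]string{}, stack[i:]...), d)
						return true
					}
				}
			case unvisited:
				if visit(d) {
					return true
				}
			}
		}
		stack = stack[:len(stack)-1]
		state[n] = done
		return false
	}
	visit(start)
	return cycle
}

// beforeFix models the kube-proxy-less case prior to antrea-io#6361.
func beforeFix() map[string][]string {
	return map[string][]string{
		"flow-restore-wait removed":      {"Pod network ready"},
		"Pod network ready":              {"NetworkPolicy watchers started"},
		"NetworkPolicy watchers started": {"Antrea Service reachable"},
		"Antrea Service reachable":       {"flow-restore-wait removed"},
	}
}

// afterFix drops the watchers' dependency on the Service/proxy path.
func afterFix() map[string][]string {
	deps := beforeFix()
	delete(deps, "NetworkPolicy watchers started")
	return deps
}

func main() {
	fmt.Println("before:", findCycle(beforeFix(), "flow-restore-wait removed"))
	fmt.Println("after:", findCycle(afterFix(), "flow-restore-wait removed"))
}
```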
    
    Fixes antrea-io#6338
    
    Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>
    antoninbas committed May 31, 2024
    01b3092
  2. Address review comments

    Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>
    antoninbas committed May 31, 2024
    313eb8f