Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hubble/relay: improve peer connections handling #12556

Merged
merged 24 commits into from
Jul 22, 2020

Commits on Jul 21, 2020

  1. hubble/relay: decouple peer syncing logic from the server struct

    This is a first step that will allow implementing unit tests for the
    relay package. A new interface, `peerSyncer`, is defined and implemented
    by a `syncer` struct. Having the interface will allow creating a mock
    for testing purposes.
    
    Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
    rolinh committed Jul 21, 2020
    Configuration menu
    Copy the full SHA
    510b2e7 View commit details
    Browse the repository at this point in the history
  2. hubble/relay: move reconnection logic to peer syncer

    This provides a better abstraction as the client that calls `List()` is
    expected to receive a list of peers that are "ready" to connect to and
    does not have to manage re-connect.
    
    Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
    rolinh committed Jul 21, 2020
    Configuration menu
    Copy the full SHA
    49194dd View commit details
    Browse the repository at this point in the history
  3. hubble/relay: add a pool manager to handle peers connections

    This commit is an in-depth reimplementation of the peers
    synchronization/connections handling logic. All logic regarding
    management of peers and connections to them has been moved to a separate
    package with a clearly defined interface for the consumer
    (`pool.PeerManager`).
    
    The goal here is threefold:
    
      1. Separation of concerns (by hiding all internal peer handling logic
         into a dedicated package)
      2. Increased testability
      3. Improved connections handling
    
    The consumer of the `pool` package does not have to care whether a peer
    is connected or not anymore. If a peer is not connected, it's `Conn`
    attribute will be `<nil>`. If `Conn` is not `<nil>` but the peer cannot
    be reached, the consumer can signal it by using the manager's
    `ReportOffline(name string)` method.
    
    The new `pool.Manager` (which implements the `pool.PeerManager`
    interface) also implements a better mechanism to ensure peers connection
    are healthy by using a dedicated routine that monitors for
    offline/reported offline peers and attempts to (re-)connect to them.
    This improves the situation as with the older implementation,
    re-connecting to a peer required a new call to the Hubble Relay service
    (and the peers that were "woken up" were not available for the request).
    
    Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
    rolinh committed Jul 21, 2020
    Configuration menu
    Copy the full SHA
    ec924ef View commit details
    Browse the repository at this point in the history
  4. hubble/peer: define a Peer service Client interface and Builder

    Such interface are useful to create good abstractions with code that
    require a Peer service client. They can also easily be mocked which is
    essential to write good unit tests.
    
    Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
    rolinh committed Jul 21, 2020
    Configuration menu
    Copy the full SHA
    215bc08 View commit details
    Browse the repository at this point in the history
  5. hubble/peer: implement ClientBuilder for a local connection

    LocalClientBuilder creates a new Client which is suitable when
    the gRPC connection to the Peer service is local (typically a Unix
    Domain Socket).
    This scenario is the most common one and is directly useful to Hubble
    Relay.
    
    Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
    rolinh committed Jul 21, 2020
    Configuration menu
    Copy the full SHA
    0bb2c86 View commit details
    Browse the repository at this point in the history
  6. hubble/relay: use peer.ClientBuilder to build Peer service clients

    Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
    rolinh committed Jul 21, 2020
    Configuration menu
    Copy the full SHA
    68b45a1 View commit details
    Browse the repository at this point in the history
  7. hubble/relay: define and use a gRPC client connection builder

    This commits defines an interface for a gRPC client connection builder.
    When instantiating a new pool.Manager, the caller may specify a grpc
    client connection builder implementation to be used to create connection
    to Hubble peers. This allows for fine grained control over the client
    connection parameters by the caller. In addition, as this is an
    interface, it can easily be mocked which is useful when writing unit
    tests.
    
    Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
    rolinh committed Jul 21, 2020
    Configuration menu
    Copy the full SHA
    a1c87a7 View commit details
    Browse the repository at this point in the history
  8. hubble/relay: implement backoff for peer connection attempts

    With the rework of peer connections handling, a goroutine periodically
    checks whether peer connections are healthy or not. This check is
    necessary as there are scenarios where a connection can be lost without
    any peer service notification or client request (for instance, a node
    reboot).
    
    This commit improves reconnection attempts by implementing a backoff
    mechanism so attempts are not too frequent. There is a default backoff
    mechanism that can be configured by caller when instantiating a new
    pool.Manager.
    
    Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
    rolinh committed Jul 21, 2020
    Configuration menu
    Copy the full SHA
    7bf60d3 View commit details
    Browse the repository at this point in the history
  9. hubble/relay: make the connection check interval configurable

    Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
    rolinh committed Jul 21, 2020
    Configuration menu
    Copy the full SHA
    cfe2615 View commit details
    Browse the repository at this point in the history
  10. hubble/relay: report offline peers to the pool manager

    Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
    rolinh committed Jul 21, 2020
    Configuration menu
    Copy the full SHA
    1db1402 View commit details
    Browse the repository at this point in the history
  11. hubble/relay: ignore backoff on direct connection requests

    On peer change notifications or call to
    `pool.Manager.ReportOffline(name string)`, the backoff interval should
    be ignored to re-attempt a connection. This change typically addresses
    the following scenario:
    
      1. Node Foo is offline
      2. The connectivity checker attempts to reconnect several time; each
        time the backoff is increased.
      3. Node Foo becomes available; a peer change notification is received
      4. Oops, no connection attempt is made to Foo because of a large
         backoff duration
    
    Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
    rolinh committed Jul 21, 2020
    Configuration menu
    Copy the full SHA
    0427a13 View commit details
    Browse the repository at this point in the history
  12. hubble/relay: do not call disconnect before connection attempt

    This is not necessary as the connect function closes pre-existing
    connections.
    
    Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
    rolinh committed Jul 21, 2020
    Configuration menu
    Copy the full SHA
    c3107e0 View commit details
    Browse the repository at this point in the history
  13. hubble: move FakePeerNotifyServer to testutils

    Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
    rolinh committed Jul 21, 2020
    Configuration menu
    Copy the full SHA
    6e113c0 View commit details
    Browse the repository at this point in the history
  14. hubble: move peer interface definitions to hubble/peer/types

    This is necessary to avoid import cycles when implementing fakes for the
    peer interfaces in the `testutils` package.
    
    Note that the `types` sub-package is commonly found throughout Cilium's
    codebase so this change follows the convention.
    
    Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
    rolinh committed Jul 21, 2020
    Configuration menu
    Copy the full SHA
    ed450b3 View commit details
    Browse the repository at this point in the history
  15. hubble/testutils: implement fakers for peer/types/{Client,ClientBuilder}

    Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
    rolinh committed Jul 21, 2020
    Configuration menu
    Copy the full SHA
    c9a5df5 View commit details
    Browse the repository at this point in the history
  16. hubble/testutils: add FakeGRPCClientStream

    This will allow mocking gRPC clients which is useful for unit testing.
    
    Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
    rolinh committed Jul 21, 2020
    Configuration menu
    Copy the full SHA
    431382f View commit details
    Browse the repository at this point in the history
  17. hubble/testutils: implement FakePeerNotifyClient

    Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
    rolinh committed Jul 21, 2020
    Configuration menu
    Copy the full SHA
    c34a041 View commit details
    Browse the repository at this point in the history
  18. hubble/relay: implement unit tests for the peer pool manager

    These unit tests are not exhaustive although they still provide a decent
    coverage. However, it provides a good structure to implement more tests
    in the future such as complex peer connectivity issue scenarios or
    simply regression tests for bugs that are fixed.
    
    Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
    rolinh committed Jul 21, 2020
    Configuration menu
    Copy the full SHA
    a9b8845 View commit details
    Browse the repository at this point in the history
  19. hubble/[peer|relay]: remove Target() from the ClientBuilder interface

    This commit removes `Target()` from the peer `ClientBuilder` interface
    and, instead, adds the target as a parameter to `Client()`.
    
    As a consequence, Hubble Relay code is adapted to the change. The peer
    pool manager now has a new option to provide the address of the peer
    gRPC service.
    
    Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
    rolinh committed Jul 21, 2020
    Configuration menu
    Copy the full SHA
    d395dd2 View commit details
    Browse the repository at this point in the history
  20. hubble/relay: add pool option to set the logger and remove debug option

    The debug option was only used to set the log level of the logger. As
    the caller now has control over the logger to be used, this option is no
    longer necessary.
    
    Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
    rolinh committed Jul 21, 2020
    Configuration menu
    Copy the full SHA
    58d9524 View commit details
    Browse the repository at this point in the history
  21. hubble/relay: wait on goroutine to finish when pool.Stop() is called

    This change makes tests more reliable and is cleaner at the same time as
    once the function returns, the pool manager is really shut down.
    
    Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
    rolinh committed Jul 21, 2020
    Configuration menu
    Copy the full SHA
    334035b View commit details
    Browse the repository at this point in the history
  22. hubble/relay: assert logger messages in pool manager tests

    This commit tests the pool manager logger output for certain tests. This
    change revealed some flakiness in the tests that have been addressed.
    
    Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
    rolinh committed Jul 21, 2020
    Configuration menu
    Copy the full SHA
    7c4f4e5 View commit details
    Browse the repository at this point in the history
  23. hubble/relay: use a RWLock to protect the peers map in the pool Manager

    The peers map is expected to be more frequently read than written to,
    especially with the `List()` method being exposed and used by Hubble
    Relay for every request it handles. Therefore, a RWLock is more
    approriate than a regular lock.
    
    Suggested-by: André Martins <andre@cilium.io>
    Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
    rolinh committed Jul 21, 2020
    Configuration menu
    Copy the full SHA
    864f3f0 View commit details
    Browse the repository at this point in the history
  24. hubble/relay: rename struct member from pool to pm to avoid confusion

    The name `pool` clashes with the package `pool`. It is not a problem per
    se but creates a bit of confusion so rename it to `pm`.
    
    Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
    rolinh committed Jul 21, 2020
    Configuration menu
    Copy the full SHA
    f95e76a View commit details
    Browse the repository at this point in the history