New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hubble: implement peer service, enable it locally #10969
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the change looks good, but it wasn't clear to me how i can test this in my cluster. i guess nobody is actually calling rpc Notify(NotifyRequest)
yet?
You can use a gRPC client. For example, with grpcurl, you can invoke the service like this from the root of the git repository:
Note that this call "blocks". So one way to test this is to invoke this command then add/remove nodes from your cluster. Of course, the cilium-agent needs the
Not yet. I have an in-development version of |
pkg/hubble/peer/service.go
Outdated
// interface so that it can be mocked for the purpose of testing. | ||
type Service struct { | ||
stop chan (struct{}) | ||
mgr *manager.Manager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, consider using an interface instead of implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, this is required for testing, like mentioned in the struct comment just above. If it's fine with you, I'll add an interface in pkg/node/manager/manager.go
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The interface could be defined here, not in pkg/node/manager/manager.go
, this allows you to not need to import pkg/node/manager
in this package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added the interface with this commit faae24149afbb9da49688945123c09a79716f903 already and implemented tests of the service.
I think that importing the interface from pkg/node/manager
is not so bad because it's otherwise really hard to understand who implements it if you're not familiar enough with cilium's codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rolinh there are tools built for that purpose [0] so there's no point in keeping the interfaces in the same place as 1 of the implementations. What happens if in the future we will have a different node manager that implements the same interface? Where will the interface live?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind moving the interface to pkg/hubble/peer
if you have strong feelings about it; I don't think it matters much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The interface could be defined here, not in pkg/node/manager/manager.go, this allows you to not need to import pkg/node/manager in this package.
BTW, by moving the interface to pkg/hubble/peer
I could save on importing pkg/node/manager
but then I'll have to import pkg/datapath
as it's required for the interface definition. I'm not "saving" on import anyway.
c5b9ce6
to
2650dd5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! I do see an issue with blocking the Node Manager which I think must be addressed, but all my other comments are just comments and don't require any action.
case <-s.stop: | ||
return nil | ||
case cn := <-h.C: | ||
if err := stream.Send(cn); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stream.Send()
can block (it does I/O). Which means the unbuffered h.C
channel causes the NodeManager to be blocked as well in h.Node{Add,Update,Delete}
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't a context.Context
part of the Send
parameters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the ServerStream
interface doesn't use a context.Context
. SendMsg()
is used by Send()
, as can be seen in the generated code in peer.pb.go
:
type Peer_NotifyServer interface {
Send(*ChangeNotification) error
grpc.ServerStream
}
type peerNotifyServer struct {
grpc.ServerStream
}
func (x *peerNotifyServer) Send(m *ChangeNotification) error {
return x.ServerStream.SendMsg(m)
}
The interface defines a context but it's global to ServerStream
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gandro I have added a test that simulates a blocked send (TestService_NotifyWithBlockedSend
) and it failed (which is good as it tests what we want).
To fix the problem and have the test pass, I've added small buffer (of size 10 by default but this is configurable through service options) that, when full, triggers the server to stop steaming to the client.
This eventually pushes the problem back to the client which will have to call Notify()
again in such a case. I don't think it's a problem overall because upon calling Notify()
, the client receives the list of all peers before receiving continuous updates so maintaining a local cache client-side is easy.
Please, have a look again at the Notify()
implementation and tell me if you think this way of handling the problem is OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, yes, the code itself looks good! 🎉 We have to get a bit of experience with this, because I honestly have no clue how big the buffer size should be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to get a bit of experience with this, because I honestly have no clue how big the buffer size should be.
Neither do I. I assumed that since the connection is over a local unix domain socket, a buffer of 10 should be sufficient. I also don't want to make it unnecessarily large because if the caller can't keep up then there's probably a problem elsewhere.
@rolinh ok it passed my rigorous testing of scaling nodepool size from 1 to 3 and then scaling it back down to 1:
|
pkg/hubble/peer/service.go
Outdated
// interface so that it can be mocked for the purpose of testing. | ||
type Service struct { | ||
stop chan (struct{}) | ||
mgr *manager.Manager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The interface could be defined here, not in pkg/node/manager/manager.go
, this allows you to not need to import pkg/node/manager
in this package.
case <-s.stop: | ||
return nil | ||
case cn := <-h.C: | ||
if err := stream.Send(cn); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't a context.Context
part of the Send
parameters?
2650dd5
to
c586a3a
Compare
c301be8
to
01611db
Compare
test-me-please EDIT: test flake |
restart-ginkgo EDIT: hit flake again |
case <-s.stop: | ||
return nil | ||
case cn := <-h.C: | ||
if err := stream.Send(cn); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, yes, the code itself looks good! 🎉 We have to get a bit of experience with this, because I honestly have no clue how big the buffer size should be.
restart-ginkgo EDIT: yet another flake :( |
01611db
to
bfcc725
Compare
test-me-please EDIT: update to adapt to changes from e99049f |
bfcc725
to
c0137af
Compare
test-me-please EDIT: rebased against master again to fix conflict introduced by f332fb5 |
c0137af
to
85fd93d
Compare
test-me-please |
test-gke |
test-gke |
85fd93d
to
c7cd0ad
Compare
Rebased against master to fix conflict with fef319a. |
test-me-please |
test-gke |
c7cd0ad
to
dfc21b9
Compare
Found a bug in the peer service implementation where the wrong |
test-me-please |
This commits removes the global variable DefaultOptions which contained no defaults but was just initializing the listeners map. The map initialization is done in the constructor instead in order to avoid problems with shallow copy of default options (a 2nd call to NewServer would use modified defaults). Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Make sure to only require the grpc interface definition for service options. The server is a simple grpc server which serves 1 or more services and it doesn't need to know about the internal of each service implementation. This change makes this explicit. Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
This is useful for any package that needs to implement gRPC tests, thus move it to a more generic location. Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
This interface will make it easier to implement tests, notably for the peer service that uses the node manager. Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
This commit adds a new peer module for hubble that implements the peer gRPC server interface. This new gRPC service notifies of nodes being added, deleted or modified. This information is obtained by subscribing to the cilium node manager via a new datapath.NodeHandler interface implementation (peer.handler). This new service will be consumed by hubble proxy in order to connect to hubble peers. It is intended to only be served over a local unix domain socket. Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
This commits enables the peer service for the hubble gRPC server that listens on the local unix domain socket. This will allow node discovery for the peer proxy. For all other hubble listen addresses, the peer service is not enabled. Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
dfc21b9
to
0c5a9e9
Compare
rebased against master, fixed test failure uncovered by the context fix. |
test-me-please |
test-gke |
test-gke Edit: looks like it timed out: https://jenkins.cilium.io/job/Cilium-PR-K8s-GKE/546/execution/node/112/log/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
test-gke |
The GKE test was running for 10+ hours and stuck at this step (CI issue):
|
test-gke |
Please, see individual commits for details.