Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hubble[/relay]: add version to observer.ServerStatus and add and implement observer.GetNodes #13979

Merged
merged 8 commits into from Nov 27, 2020

Conversation

rolinh
Copy link
Member

@rolinh rolinh commented Nov 10, 2020

Add version information to observer.ServerStatus.

Additionally, add observer.GetNodes and implement it in Hubble Relay. This new endpoint provides a list of nodes with name, version, address and connection state and TLS information. This is mostly useful for Hubble UI to provide a list of all nodes with version and status on a status page but it could also be leveraged by Hubble CLI (eg hubble nodes list).

For reference, this is the output I get when using grpcurl to issue an observer.GetNodes request against Hubble Relay on my 4 nodes dev cluster with one unavailable node:

$ grpcurl -plaintext -use-reflection localhost:4245 observer.Observer.GetNodes
{
  "nodes": [
    {
      "name": "kind-worker",
      "version": "cilium v1.9.90-gccaaaa966",
      "address": "172.18.0.2:4244",
      "state": "NODE_CONNECTED",
      "tls": {
        "enabled": true,
        "serverName": "kind-worker.default.hubble-grpc.cilium.io"
      },
      "uptimeNs": "60622955603",
      "numFlows": "304",
      "maxFlows": "4095",
      "seenFlows": "304"
    },
    {
      "name": "kind-worker2",
      "address": "172.18.0.5:4244",
      "state": "NODE_UNAVAILABLE",
      "tls": {
        "enabled": true,
        "serverName": "kind-worker2.default.hubble-grpc.cilium.io"
      }
    },
    {
      "name": "kind-control-plane",
      "version": "cilium v1.9.90-gccaaaa966",
      "address": "172.18.0.4:4244",
      "state": "NODE_CONNECTED",
      "tls": {
        "enabled": true,
        "serverName": "kind-control-plane.default.hubble-grpc.cilium.io"
      },
      "uptimeNs": "60888864685",
      "numFlows": "85",
      "maxFlows": "4095",
      "seenFlows": "85"
    },
    {
      "name": "kind-worker3",
      "version": "cilium v1.9.90-gccaaaa966",
      "address": "172.18.0.3:4244",
      "state": "NODE_CONNECTED",
      "tls": {
        "enabled": true,
        "serverName": "kind-worker3.default.hubble-grpc.cilium.io"
      },
      "uptimeNs": "60685807845",
      "numFlows": "84",
      "maxFlows": "4095",
      "seenFlows": "84"
    }
  ]
}

And for completeness, here is the version information in observer.ServerStatus:

$ grpcurl -plaintext -use-reflection localhost:4245 observer.Observer.ServerStatus
{
  "numFlows": "3415",
  "maxFlows": "12285",
  "seenFlows": "3415",
  "uptimeNs": "461108917333",
  "numConnectedNodes": 3,
  "numUnavailableNodes": 1,
  "unavailableNodes": [
    "kind-worker3"
  ],
  "version": "hubble-relay v1.9.90-g59c7b8350"
}

When querying a local hubble instance from a Cilium pod:

$ grpcurl -plaintext -use-reflection -unix /var/run/cilium/hubble.sock observer.Observer.GetNodes    
ERROR:
  Code: Unimplemented
  Message: GetNodes not implemented

And for the status:

$ grpcurl -plaintext -use-reflection -unix /var/run/cilium/hubble.sock observer.Observer.ServerStatus
{
  "numFlows": "2443",
  "maxFlows": "4095",
  "seenFlows": "2443",
  "uptimeNs": "670048540549",
  "version": "cilium v1.9.90-g59c7b8350"
}

Closes: #13935

Hubble: add GetNodes rpc endpoint

@rolinh rolinh added release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/hubble Impacts hubble server or relay labels Nov 10, 2020
@rolinh rolinh requested a review from a team as a code owner November 10, 2020 23:01
@rolinh rolinh requested review from a team and ti-mo November 10, 2020 23:01
@maintainer-s-little-helper maintainer-s-little-helper bot added this to In progress in 1.10.0 Nov 10, 2020
@rolinh rolinh requested a review from glibsm November 10, 2020 23:01
@rolinh rolinh changed the title hubble[/relay]: add version information and a list of nodes name and version to observer status hubble[/relay]: add version information and a list of nodes with name and version to observer status Nov 10, 2020
@rolinh rolinh marked this pull request as draft November 11, 2020 10:43
@rolinh
Copy link
Member Author

rolinh commented Nov 11, 2020

Will rework this PR to add a new GetNodes rpc endpoint to the observer service.

@rolinh rolinh force-pushed the pr/rolinh/relay-all-nodes-status branch from f77e7e9 to 59c7b83 Compare November 12, 2020 13:44
@rolinh rolinh changed the title hubble[/relay]: add version information and a list of nodes with name and version to observer status hubble[/relay]: add version to observer.ServerStatus and add and implement observer.GetNodes Nov 12, 2020
@rolinh rolinh marked this pull request as ready for review November 12, 2020 13:57
@rolinh
Copy link
Member Author

rolinh commented Nov 12, 2020

Changes implemented. This PR is now ready for review.

pkg/hubble/build/version.go Outdated Show resolved Hide resolved
Copy link
Member

@kaworu kaworu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great patch, tested and it looks good! I have a couple of questions, and also would like to see a trivial test case for the codes.Unimplemented returned by Hubble's GetNodes.

pkg/hubble/relay/observer/server.go Outdated Show resolved Hide resolved
pkg/hubble/relay/observer/server.go Show resolved Hide resolved
@nathanjsweet
Copy link
Member

test-me-please

@rolinh rolinh force-pushed the pr/rolinh/relay-all-nodes-status branch from ccaaaa9 to efaa764 Compare November 19, 2020 13:11
@rolinh rolinh requested a review from a team as a code owner November 19, 2020 13:11
@rolinh rolinh requested a review from tklauser November 19, 2020 13:11
Copy link
Member

@tklauser tklauser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to blang/semver LGTM, one remark regarding missing test tag (see GH action failure).

pkg/hubble/build/version.go Show resolved Hide resolved
@rolinh rolinh force-pushed the pr/rolinh/relay-all-nodes-status branch from a8cef44 to c7c6f5d Compare November 19, 2020 14:00
@rolinh rolinh requested a review from tklauser November 19, 2020 14:00
Copy link
Member

@tklauser tklauser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

@kaworu kaworu added the kind/feature This introduces new functionality. label Nov 19, 2020
Copy link
Member

@kaworu kaworu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working great, thanks!

Knowing about the running version is useful, notably during a cluster
upgrade.

Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
This endpoint is intended to be implemented by Hubble Relay to provide
information about nodes and their status.

Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
The new generated code breaks implementations of observer server because
they are missing the new GetNodes method. To ensure that every commit
compiles on its own, add stubs to implementations of observer server.

Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
@rolinh rolinh force-pushed the pr/rolinh/relay-all-nodes-status branch from c7c6f5d to ab09bb6 Compare November 23, 2020 10:38
@rolinh
Copy link
Member Author

rolinh commented Nov 23, 2020

test-me-please

@rolinh
Copy link
Member Author

rolinh commented Nov 24, 2020

retest-4.9

EDIT: bah, I meant to re-test runtime. 4.9 run was ✔️

@rolinh
Copy link
Member Author

rolinh commented Nov 24, 2020

retest-runtime

This failure is a flake (completely unrelated to this PR), see #14149.

@rolinh
Copy link
Member Author

rolinh commented Nov 24, 2020

retest-runtime

@rolinh rolinh closed this Nov 25, 2020
@rolinh rolinh reopened this Nov 25, 2020
@rolinh
Copy link
Member Author

rolinh commented Nov 26, 2020

retest-net-next

Comment on lines +46 to +51
// component is the Hubble component (eg: hubble, hubble-relay).
component string
// Core represents the core version (eg: 1.9.0).
Core string
// Revision is the software revision, typically a Git commit SHA.
Revision string
Copy link
Member

@glibsm glibsm Nov 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is one unexported?

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Nov 26, 2020
MaxFlows: s.GetRingBuffer().Cap(),
NumFlows: s.GetRingBuffer().Len(),
SeenFlows: atomic.LoadUint64(&s.numObservedFlows),
UptimeNs: uint64(time.Since(s.startTime).Nanoseconds()),
}, nil
}

// GetNodes implements observerpb.ObserverClient.GetNodes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it implement it though :)

continue
}
n.State = relaypb.NodeState_NODE_CONNECTED
p := p
Copy link
Member

@glibsm glibsm Nov 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would suggest to at least now shadow the name.

EDIT: after further digging through faqs it seems like shadow is the recommendation directly from Rob Pike. golang/go@0cab7d5

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rolinh perhaps would be good to add a comment to explain the motivation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@errordeveloper This is common practice and even recommended as the "easier" way of achieving this in the Go FAQ. I could add a comment here but then there are many places in the codebase that would need the same comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, if it's actually part of the FAQ, comment is unnecessary

@errordeveloper errordeveloper merged commit ae069dc into master Nov 27, 2020
@errordeveloper errordeveloper deleted the pr/rolinh/relay-all-nodes-status branch November 27, 2020 12:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature This introduces new functionality. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/hubble Impacts hubble server or relay
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

It would be nice to have names of connected nodes in observer API
8 participants