Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make cilium status output more succinct by default #2821

Merged
merged 6 commits into from
Feb 18, 2018

Conversation

joestringer
Copy link
Member

@joestringer joestringer commented Feb 15, 2018

Simplify output of `cilium status` by default, add new `--verbose`, `--brief` options

Add --verbose mode and some extra options to choose how much output you want from the cilium status API. Gather the verbose output in bugtool. Fold in a minor change to phrase controller states in terms of success rather than failure.

Example output with this series:

$ cilium status                                                                                                                                                                                                                                                
KVStore:                    Ok         Consul: 172.17.0.2:8300                  
ContainerRuntime:           Ok                                                  
Kubernetes:                 Disabled                                            
Cilium:                     Ok         OK                                       
NodeMonitor:                Disabled                                            
Cilium health daemon:       Warning   Get http:///var/run/cilium/health.sock/v1beta/hello: dial unix /var/run/cilium/health.sock:
 connect: no such file or directory                                             
IPv4 address pool:   3/65535 allocated                                                   
IPv6 address pool:   2/65535 allocated                                                   
Controller Status:     3/3 healthy

There's also this command for a brief, one-line summary:

$ cilium status --brief
OK

@joestringer joestringer added pending-review area/misc Impacts miscellaneous areas of the code not otherwise owned by another area. labels Feb 15, 2018
@joestringer joestringer requested review from a team as code owners February 15, 2018 01:04
@joestringer
Copy link
Member Author

test-me-please

tgraf
tgraf previously requested changes Feb 15, 2018
Copy link
Member

@tgraf tgraf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a --brief option which prints the overall status on a single line? We need this for the Kubernetse healthcheck.

@joestringer
Copy link
Member Author

@tgraf sure, do you have an example of how that output should look?

From a quick search, it looks something like we need this on success:

ok

Or this on failure:

error: ...

@tgraf
Copy link
Member

tgraf commented Feb 15, 2018

@tgraf sure, do you have an example of how that output should look?

From a quick search, it looks something like we need this on success:

ACK

@aanm
Copy link
Member

aanm commented Feb 15, 2018

Since we are touching this can we also change it to something like

Allocated IPv4 addresses:   (3/65535 allocated)
Allocated IPv6 addresses:   (2/65535 allocated)
Known cluster nodes:        (1/1 connected (or "status ok"?))
 cilium-master (localhost):
  Primary Address:   10.0.2.15
   Type:             InternalIP
  AllocRange:        10.11.0.0/16
Controller Status (3/3 success)

Also, I'm assuming you are changing this because of the number of allocated IP addresses. Can we use @ianvernon 's IP "coalesce" functionality to present the IPs in CIDRs when possible?

@joestringer
Copy link
Member Author

@aanm looks good, one comment:

Known cluster nodes: (1/1 connected (or "status ok"?))

^ Writing 1 connected is trivial, which would reflect the number of nodes that we've learned through the orchestrator. Writing N/M connected suggests some kind of health check, but we don't have that information in the main daemon or the client at this time, are you proposing that cilium status makes another call to cilium-health to determine this on the client side?

@aanm
Copy link
Member

aanm commented Feb 15, 2018

I think so, we were already showing up all nodes in the cluster with cilium status so why not give the full result of cilium-health?

@joestringer
Copy link
Member Author

joestringer commented Feb 15, 2018

@aanm Seems like a reasonable idea. I'll play around with it a bit. Do we really need node information like allocrange for all other nodes in cilium status?

@joestringer
Copy link
Member Author

joestringer commented Feb 15, 2018

Here's a quick mockup:

When a node is unhealthy, regular output (only print unhealthy nodes):

$ cilium status                                                                 
...                                                                             
IPAM:                                                                           
 IPv4 addresses:   3/65535 allocated                                            
 IPv6 addresses:   2/65535 allocated                                            
Cluster health:    N/M nodes healthy
  cilium-test:                                                                  
    Host connectivity to 10.0.2.15:                                             
      ICMP:          OK, RTT=395.968µs                                          
      HTTP via L3:   Connection timed out     

With all nodes healthy, option to print all health details:

$ cilium status --all-health                                                    
...                                                                             
Cluster health:    1/1 nodes healthy                                            
  cilium-master (localhost):                                                                                                                                                                                                                                   
    Host connectivity to 10.0.2.15:                                             
      ICMP:          OK, RTT=395.968µs                                          
      HTTP via L3:   OK, RTT=3.211737ms 

When cilium-health isn't up:

$ cilium status
...                                                                                                                                                                                                                                                
Cluster health:    ?/1 nodes healthy (cilium-health: xxx)

@joestringer joestringer added release-note/minor This PR changes functionality that users may find relevant to operating Cilium. wip and removed pending-review labels Feb 15, 2018
@aanm
Copy link
Member

aanm commented Feb 16, 2018

@aanm Seems like a reasonable idea. I'll play around with it a bit. Do we really need node information like allocrange for all other nodes in cilium status?

@joestringer I was only thinking for the current node but we have the alloc range for all nodes in the pkg/node

@joestringer
Copy link
Member Author

@aanm right, that's what the current output in cilium status is, which we're currently printing. However I suspect that this information isn't really that interesting, so we could disable it (it's still exposed in the JSON if you fetch via cilium status -o json). Instead, we could place the cilium-health status there as a one-liner, or with a short list of failing nodes. Then we can achieve your suggestion of the output something like cluster health: N/M nodes connected

@joestringer
Copy link
Member Author

@aanm I implemented your suggestion in #2858. It needed a bit of refactoring, so I kept it out of this PR.

Frame controller output in terms of how many are OK, rather than how
many are failing.

Signed-off-by: Joe Stringer <joe@covalent.io>
Introduce three new arguments to 'cilium status':

* allAddresses: Print all addr allocations (default: print only the count)
* allNodes: Print all known nodes (default: only print localhost info)
* verbose: Equivalent to --all-addresses --all-controllers --all-nodes

This ensures that by default in large environments, 'cilium status' will
not flood the terminal of the person trying to get information, but
still allows more information to be gathered on demand.

Signed-off-by: Joe Stringer <joe@covalent.io>
Fixes: cilium#2749
Signed-off-by: Joe Stringer <joe@covalent.io>
The new `--brief` argument prints a single line summary of the status
which prints either "OK" or "error: ...", which prints only one of the
errors currently reported in the system, with preference in this order:

* Cilium daemon
* Container Runtime
* Kvstore
* Kubernetes
* Cilium-health daemon
* Controllers inside Cilium daemon

Signed-off-by: Joe Stringer <joe@covalent.io>
This will be used by an upcoming commit to report the max number of IPs
that may be allocated by cilium.

Signed-off-by: Joe Stringer <joe@covalent.io>
Signed-off-by: Joe Stringer <joe@covalent.io>
@joestringer
Copy link
Member Author

test-me-please

@tgraf tgraf merged commit 0e29a06 into cilium:master Feb 18, 2018
@joestringer joestringer deleted the submit/cilium-status branch February 19, 2018 04:56
ldelossa pushed a commit to ldelossa/cilium that referenced this pull request Mar 16, 2024
Due to cilium#2821 run the SRv6 datapath e2e tests only with SID Manager disabled.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/misc Impacts miscellaneous areas of the code not otherwise owned by another area. release-note/minor This PR changes functionality that users may find relevant to operating Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants