Ability to enable profiling in Antrea components #1434

Closed
antoninbas opened this issue Oct 23, 2020 · 7 comments · Fixed by #1452

Comments

@antoninbas
Contributor

Describe the problem/challenge you have
When a user runs into an issue like this one, it is hard to collect relevant information to troubleshoot the issue.

Describe the solution you'd like
We should have a way to enable profiling and the pprof HTTP server (https://golang.org/pkg/net/http/pprof/) for the Antrea components (Agent & Controller) after Antrea has been deployed in a cluster. We could consider having a config parameter for this. However, updating the ConfigMap would require a restart of the Agent / Controller. My preferred solution would be to use antctl to enable profiling. That way, profiling could be enabled selectively for a specific Agent or for the Controller (by exec'ing into the Pod and running antctl). It could look like this:

antctl profiling enable --port 8888
antctl profiling disable

Given that the antrea-agent Pod and the antrea-controller Pod both use the host network, I believe no changes to the Pod specs would be needed to enable access to the HTTP server using one of the Node's IP addresses.

Anything else you would like to add?
Maybe there is some integration possible with antctl support-bundle, but I don't think it's necessary at first.

antoninbas added the kind/feature and priority/important-soon labels Oct 23, 2020
@antoninbas
Contributor Author

@tnqn I think this is something that would be very helpful to have in 0.11. I can take care of it if no one else has cycles, but let me know first if you see any issue with the approach. I have not tested it or written any code yet.

@weiqiangt
Contributor

weiqiangt commented Oct 24, 2020 via email

@antoninbas
Contributor Author

@weiqiangt it does look like profiling is enabled by default. Could you look into it if you have the chance and maybe share some steps? It would be great to document it. There is probably some authentication required?

@weiqiangt
Contributor

Sure.
The /debug/pprof endpoint does require authentication. You can try it out with the following commands. We also have a feature that relies on this (#1110).

# Component to query: antrea-controller or antrea-agent
TARGET=""
# Name of the first matching Pod (NAME is the first column when -n is used)
POD=$(kubectl get pods -n kube-system -o wide | grep "$TARGET" | awk 'NR==1 {print $1}')
# No -it here: a TTY would append a carriage return to the captured token
TOKEN=$(kubectl exec "$POD" -n kube-system -- cat /var/run/antrea/apiserver/loopback-client-token)
# IP address of the agent/controller
IP=""
# Port number of the apiserver of the agent/controller
PORT=""
# heap / profile (for CPU sampling) / goroutine
CATEGORY=""
OUTPUT="filename"
wget --header="Authorization: Bearer $TOKEN" "https://$IP:$PORT/debug/pprof/$CATEGORY" --no-check-certificate -O "$OUTPUT"

Where would you like these steps to be documented?

@antoninbas
Contributor Author

Thanks for the information @weiqiangt. This document would be the right place: https://github.com/vmware-tanzu/antrea/blob/master/docs/troubleshooting.md

However it's not as simple as I had hoped, and it seems that it would not be possible to use go tool pprof or the browser directly? I wonder if antctl could provide some sort of proxy functionality in that case so that a user could do antctl proxy --controller --port 8888 or antctl proxy --agent=<node name> --port 8888 and then access the controller / agent APIs at http://localhost:8888 without worrying about getting an authentication token. So it would be very similar in spirit to kubectl proxy, but tailored for Antrea.
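
As a rough sketch of what the proxy idea above would enable: once a local proxy is listening, go tool pprof could be pointed at a plain HTTP URL with no token handling. The helper name pprof_proxy_url is hypothetical, and the antctl proxy invocation shown in the comments is the one proposed in this comment, not a confirmed final CLI.

```shell
#!/bin/sh
# Sketch only: assumes a proxy such as the proposed
#   antctl proxy --controller --port 8888
# is already running on this machine (command shape is a proposal).

# Build the URL of a pprof endpoint behind the local proxy.
# $1: local proxy port
# $2: pprof profile name (heap, profile, goroutine)
# pprof_proxy_url is a hypothetical helper used for illustration.
pprof_proxy_url() {
    printf 'http://localhost:%s/debug/pprof/%s\n' "$1" "$2"
}

# Example usage (requires a running proxy and the Go toolchain):
#   go tool pprof "$(pprof_proxy_url 8888 heap)"
```

The point of the design is that authentication stays inside the proxy, so standard tools that only understand plain URLs keep working.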

@weiqiangt
Contributor

Yes, that is a nice idea.
Do you have the bandwidth to implement this? If not, I can implement it.

@antoninbas
Contributor Author

I can take a look at it later this week. I think we may be able to use the kubectl proxy code for this. If other things come up that I have to take care of, I'll ask you for help.

@antoninbas antoninbas self-assigned this Oct 29, 2020
antoninbas added a commit to antoninbas/antrea that referenced this issue Oct 29, 2020
With this command antctl can operate as a reverse proxy for Antrea APIs,
similarly to "kubectl proxy" for the K8s APIs (we rely on the
k8s.io/kubectl Go module for the implementation). Thanks to this,
troubleshooting the APIs can become much easier and we hide complexity
from clients. One example is that it becomes much easier to use "go tool
pprof". The drawback here is that the HTTPS connection between the proxy
and the Antrea Agent / Controller is not secure (this can be fixed for
the Controller at least); in this it is very similar to the
supportbundle command implementation.

Fixes antrea-io#1434

TODO: documentation, tests
antoninbas added a commit that referenced this issue Nov 2, 2020
With this command antctl can operate as a reverse proxy for Antrea APIs,
similarly to "kubectl proxy" for the K8s APIs (we rely on the
k8s.io/kubectl Go module for the implementation). Thanks to this,
troubleshooting the APIs can become much easier and we hide complexity
from clients. One example is that it becomes much easier to use "go tool
pprof". The drawback here is that the HTTPS connection between the proxy
and the Antrea Agent / Controller is not secure (this can be fixed for
the Controller at least); in this it is very similar to the
supportbundle command implementation.

Fixes #1434