layout | pubDate | modDate | title | description | navOrder |
---|---|---|---|---|---|
src/layouts/Default.astro | 2024-05-08 | 2024-05-30 | Troubleshooting | How to troubleshoot common Kubernetes Agent issues | 40 |
This page will help you diagnose and solve issues with the Kubernetes agent.

The generated helm commands use the `--atomic` flag, which automatically rolls back the changes if the command fails to complete within a specified timeout (default 5 minutes).

If the helm command fails, it may print an error message containing `context deadline exceeded`. This indicates that the timeout was exceeded and the Kubernetes resources did not start correctly.
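On slow clusters or networks, a longer timeout can rule out a genuine startup failure. This is a sketch only: the release name, chart reference, and namespace below are placeholders, not the exact generated command, so substitute the values from your own install command.

```bash
# Hypothetical example: re-running an install command with a longer timeout.
# --atomic still rolls back on failure; --timeout extends the deadline to 10 minutes.
helm upgrade --install octopus-agent <chart-reference> \
  --atomic \
  --timeout 10m \
  --namespace [NAMESPACE]
```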
To help diagnose these issues, the kubectl commands `describe` and `logs` can be used while the helm command is executing:
```bash
# To get NFS CSI driver pod information (if using the NFS storage solution)
kubectl describe pods -l app.kubernetes.io/name=csi-driver-nfs -n kube-system
# To get pod information
kubectl describe pods -l app.kubernetes.io/name=octopus-agent -n [NAMESPACE]
# To get pod logs
kubectl logs -l app.kubernetes.io/name=octopus-agent -n [NAMESPACE]
```
Replace `[NAMESPACE]` with the namespace used in the agent installation command.
If the agent install command fails with a timeout error, it could be that:
- There is an error in the connection information provided
- The bearer token or API key has expired or has been revoked
- The agent is unable to connect to Octopus Server due to a networking issue
- (If using the NFS storage solution) The NFS CSI driver has not been installed
- (If using a custom storage class) The storage class name doesn't match
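The last two causes can be checked directly with kubectl. This sketch assumes the NFS CSI driver was installed into `kube-system` with its default labels:

```bash
# Check that the NFS CSI driver pods exist and are Running
kubectl get pods -n kube-system -l app.kubernetes.io/name=csi-driver-nfs

# List the storage classes in the cluster and compare against the
# storage class name supplied during agent installation
kubectl get storageclass
```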
This error indicates that the logs from the script pods are incomplete or malformed.
When scripts are executed, any outputs or logs are stored in the script pod's container logs. The Tentacle pod then reads from the container logs to feed back to Octopus Server.
There's a limit to the size of logs kept before they are rotated out. If a particular log line is rotated out before Octopus Server reads it, log lines are missing, so we fail the deployment to prevent unexpected changes from being hidden.
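Whether a line is rotated out in time depends on the kubelet's log rotation settings, such as `containerLogMaxSize` and `containerLogMaxFiles`. As a sketch (the node name is a placeholder, and the exact fields present depend on your kubelet version), a node's effective kubelet configuration can be inspected via the API server proxy:

```bash
# Fetch the running kubelet configuration from a node via the API server proxy;
# <node-name> is a placeholder for one of the names shown by `kubectl get nodes`
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/configz"
```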
This error indicates that the script pods were deleted unexpectedly, typically by being evicted or terminated by Kubernetes.

If you are using the default NFS storage, however, the script pods will also be deleted if the NFS server pod is restarted. Some possible causes of the NFS server pod restarting are:
- being evicted due to exceeding its storage quota
- being moved or restarted as part of routine cluster operation
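To find out why a pod was deleted or evicted, the recent events in the agent's namespace are a good starting point; this sketch uses the same `[NAMESPACE]` placeholder as the commands above:

```bash
# List recent events in the agent's namespace, sorted by creation time;
# evictions and terminations of script pods or the NFS server pod appear here
kubectl get events -n [NAMESPACE] --sort-by=.metadata.creationTimestamp
```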