kdebug is a command-line utility that helps troubleshoot a running Kubernetes cluster and the apps in it.
It focuses on DevOps scenarios and covers these areas:
- OS diagnostics
- Kubernetes components diagnostics
- Lightweight application diagnostics
kdebug runs in check mode by default. It runs a set of predefined checks, reports diagnostic information, and guides you to next steps.
Currently kdebug supports the following checks:
- Disk usage: Check disk usage and identify the largest files.
- Disk read-only: Check if the user home directory is read-only.
- DNS: Check cluster DNS.
- HTTP: Check HTTP connectivity to well known endpoints.
- Kube object size: Check configmap/secret object size.
- Kube pod: Check pod restart reasons.
- Kube pod scheduling: Check pods scheduling issues.
- OOM: Analyze out-of-memory events.
- System load: Check the CPU and memory usage of the VM and of some primary processes (etcd, kubelet, ...).
- TCP: Check whether a TCP connection can be established to exposed services (external load balancer, internal cluster service).
- Ping: Check whether ICMP ping works towards a public IP (8.8.8.8) and cluster IPs (node, pod).
- KMS cache size: Check if API server KMS cache size is sufficient to hold all secrets.
Run all checks:
kdebug
Run a specific check:
kdebug -c dns
List available checks:
kdebug --list
See full supported arguments and help:
kdebug -h
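When only a few checks are of interest, the single-check form can be wrapped in a loop. The following is a minimal sketch, not part of kdebug itself: `run_checks` is a hypothetical helper, and it assumes kdebug exits with a non-zero status when a check cannot be run, which is not confirmed here.

```shell
# Hypothetical helper: run each named check in turn via `kdebug -c <check>`.
# Assumption: kdebug exits non-zero when a check fails to run.
run_checks() {
  failed=""
  for check in "$@"; do
    echo "== kdebug -c $check =="
    if ! kdebug -c "$check"; then
      failed="$failed $check"
    fi
  done
  if [ -n "$failed" ]; then
    # Report the checks whose invocation returned a non-zero status.
    echo "checks with non-zero exit:$failed" >&2
    return 1
  fi
}
```

For example, `run_checks dns kubepod` runs the two checks one after the other.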
Kubernetes related checks require a working kubeconfig. You can either put it at the default location $HOME/.kube/config, or specify its path via --kube-config-path:
kdebug -c kubepod \
--kube-config-path /path/to/kubeconfig
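Before running a Kubernetes related check, it can help to confirm that the kubeconfig file is actually readable. The sketch below uses a hypothetical helper, `resolve_kubeconfig`, which is not part of kdebug: it prints the path passed as its first argument, or the default location when no argument is given, and fails if the file cannot be read.

```shell
# Hypothetical helper: print the kubeconfig path to use.
# Uses the first argument if given, otherwise $HOME/.kube/config.
# Returns 1 and prints a message to stderr if the file is not readable.
resolve_kubeconfig() {
  path="${1:-$HOME/.kube/config}"
  if [ ! -r "$path" ]; then
    echo "kubeconfig not readable: $path" >&2
    return 1
  fi
  printf '%s\n' "$path"
}
```

A possible use is `cfg=$(resolve_kubeconfig /path/to/kubeconfig) && kdebug -c kubepod --kube-config-path "$cfg"`, so that kdebug only starts when the file exists.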
kdebug supports running on a batch of remote machines simultaneously via SSH.
Explicitly specify a list of machine names:
kdebug -c dns \
--batch.machines=machine-1 \
--batch.machines=machine-2 \
--batch.concurrency=2 \
--batch.ssh-user=azureuser
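Because --batch.machines is repeated once per machine, a longer list is tedious to type. As a rough sketch, a small helper can generate the flags; `batch_machine_flags` is a hypothetical name and not something kdebug provides.

```shell
# Hypothetical helper: print one --batch.machines=<name> flag per argument.
batch_machine_flags() {
  for name in "$@"; do
    printf '%s%s\n' '--batch.machines=' "$name"
  done
}
```

It could be used as `kdebug -c dns $(batch_machine_flags machine-1 machine-2) --batch.ssh-user=azureuser`. The command substitution is left unquoted on purpose so that each flag becomes its own argument, which assumes machine names contain no whitespace.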
Read machine names list from a file or stdin:
# From file
kdebug -c dns \
--batch.machines-file=/path/to/machine/names/file
# From stdin
kubectl get nodes | grep NotReady | awk '{print $1}' | kdebug -c dns --batch.machines-file=-
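The grep in the stdin example matches "NotReady" anywhere on the line, including inside a node name. A slightly stricter variant, sketched here with a hypothetical helper named `not_ready_nodes`, matches only the STATUS column. It assumes the default `kubectl get nodes` table layout, with a header row and the status in the second column.

```shell
# Hypothetical helper: read `kubectl get nodes` output on stdin and print
# the names of nodes whose STATUS column starts with NotReady
# (this also covers values such as NotReady,SchedulingDisabled).
not_ready_nodes() {
  awk 'NR > 1 && $2 ~ /^NotReady/ { print $1 }'
}
```

The pipeline then becomes `kubectl get nodes | not_ready_nodes | kdebug -c dns --batch.machines-file=-`.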
Auto-discover the machine list via the Kubernetes API server:
kdebug -c dns --batch.kube-machines
In addition, you can specify a label selector:
kdebug -c dns \
--batch.kube-machines \
--batch.kube-machines-label=kubernetes.io/role=agent
Or select only the unready nodes:
kdebug -c dns \
--batch.kube-machines-unready
In addition to the default check mode, kdebug also supports a tool mode. Tool mode wraps useful commands and makes them easier to use in typical scenarios.
Currently kdebug provides the following tools:
- Tcpdump: Wraps the tcpdump command and provides a simpler interface for container scenarios.
- Reboot reason: Inspect last reboot reason.
- AAD SSH: SSH via AAD. This is a handy replacement for the original Azure CLI based implementation.
- NetExec: Execute a command in the same network namespace as a specific process or pod.
You can see a full list with:
kdebug --list
Use the following command to start a tool:
kdebug -t <tool>
Show tool specific options:
kdebug -t <tool> -h
Attach to the network namespace of the process with pid=100 and capture all traffic:
kdebug -t tcpdump --pid=100
With source and destination specified, and TCP only:
kdebug -t tcpdump \
--pid=100 \
--source=10.0.0.1:1000 \
--destination=10.0.0.2:2000 \
--tcponly
--host matches either source or destination:
kdebug -t tcpdump --host=10.0.0.1:1000
Check the VM's last reboot reason within the last day:
kdebug -t vmrebootdetector
Check the VM's last reboot reason within the last 100 days:
kdebug -t vmrebootdetector \
--checkdays=100
Check packages upgraded within the last 14 days:
kdebug --tool upgradeinspector --checkdays 14
Check packages upgraded within the last 7 days, limited to 10 records:
kdebug --tool upgradeinspector --recordlimit 10
SSH via AAD. See Azure Linux VMs and Azure AD.
This is a handy replacement for the original Azure CLI based implementation.
Login via interactive flow:
kdebug -t aadssh <user>@<tenant>@<hostname-or-ip>
A browser will pop up for credentials.
Login via Azure CLI credentials:
az login
kdebug -t aadssh --use-azure-cli <user>@<tenant>@<hostname-or-ip>
Execute a command in the same network namespace as a process. You need to be on the VM where the process is running.
kdebug -t netexec --pid=<process-pid>
Execute a command in the same network namespace as a pod. You need to have a kubeconfig.
kdebug -t netexec --pod=<pod-name> --namespace=<pod-namespace>
Specify the command with --command=. The default command is sh.
Prerequisite:
Build:
make build
Test:
make test
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.