Add troubleshooting section for the stuck PVCs
perk-sumo committed Apr 2, 2021
1 parent 08a8795 commit 4b31ff8
Showing 1 changed file with 16 additions and 0 deletions.
deploy/docs/Troubleshoot_Collection.md
@@ -16,6 +16,7 @@
- [Common Issues](#common-issues)
- [Missing metrics - cannot see cluster in Explore](#missing-metrics---cannot-see-cluster-in-explore)
- [Pod stuck in `ContainerCreating` state](#pod-stuck-in-containercreating-state)
- [Fluentd Pod stuck in `Pending` state after recreation](#fluentd-pod-stuck-in-pending-state-after-recreation)
- [Missing `kubelet` metrics](#missing-kubelet-metrics)
- [1. Enable the `authenticationTokenWebhook` flag in the cluster](#1-enable-the-authenticationtokenwebhook-flag-in-the-cluster)
- [2. Disable the `kubelet.serviceMonitor.https` flag in Kube Prometheus Stack](#2-disable-the-kubeletservicemonitorhttps-flag-in-kube-prometheus-stack)
@@ -260,6 +261,21 @@ Warning FailedCreatePodSandBox 29s kubelet, ip-172-20-87-45.us-west-1.comput

you have an unhealthy node. Killing the node should resolve this issue.

### Fluentd Pod stuck in `Pending` state after recreation

If a Fluentd Pod is stuck in the `Pending` state while using [file-based buffering](./Best_Practices.md#fluentd-file-based-buffer)
(the default since v2.0), and its events contain messages like

```
Warning FailedScheduling 16s (x23 over 31m) default-scheduler 0/6 nodes are available: 2 node(s) had volume node affinity conflict, 4 node(s) were unschedulable.
```
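
To confirm, inspect the stuck Pod's events. A minimal sketch, assuming a release named `collection` installed in the
`sumologic` namespace and Pod ordinal `1` being the one affected (these names are placeholders; substitute your own):

```
# Placeholder names: release "collection", namespace "sumologic"
# List the Fluentd logs Pods and look for any stuck in Pending
kubectl -n sumologic get pods | grep fluentd-logs

# The Events section at the bottom should show the FailedScheduling message
kubectl -n sumologic describe pod collection-sumologic-fluentd-logs-1
```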

you have a volume node affinity conflict. This can happen when the Fluentd Pod was running in one availability zone (AZ) and has been
rescheduled into another AZ, while its persistent volume remains pinned to the original zone. Deleting the existing PVC and then killing the Pod should resolve this issue.
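
As an illustration of that recovery, using the same assumed release name `collection`, namespace `sumologic`, and stuck ordinal `1` as above:

```
# Delete the PVC pinned to the old AZ; it stays in Terminating until the Pod
# releases it, so don't wait for the deletion to finish
kubectl -n sumologic delete pvc buffer-collection-sumologic-fluentd-logs-1 --wait=false

# Kill the stuck Pod; the StatefulSet controller recreates both the Pod and
# a fresh PVC in whichever AZ the new Pod lands in
kubectl -n sumologic delete pod collection-sumologic-fluentd-logs-1
```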

The Fluentd StatefulSet Pods and their PVCs are bound by their ordinal number: the `*-sumologic-fluentd-logs-1` Pod uses
the `buffer-*-sumologic-fluentd-logs-1` PVC.
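
To see why a particular volume cannot be attached, you can follow the PVC to its PV and read the node affinity that pins it to a zone. Illustrative commands, using the same assumed names as above:

```
# Find the PV backing the stuck Pod's PVC
kubectl -n sumologic get pvc buffer-collection-sumologic-fluentd-logs-1 -o jsonpath='{.spec.volumeName}'

# Show the node affinity (the AZ constraint causing the conflict)
kubectl get pv <pv-name-from-above> -o jsonpath='{.spec.nodeAffinity}'
```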

### Missing `kubelet` metrics

Navigate to the `kubelet` targets using the steps above. You may see that the targets are down with 401 errors. If so, there are two known workarounds you can try.
