Fails to start after a while #718
Comments
@sgandon I think there is something hoarked with this cluster. At first glance the connection to the api server somehow times out. K9s' default timeout is set to 5secs. Can you connect to this cluster using kubectl? If I recall correctly there is something funky in the way you connect to k3s in terms of where the kubeconfig is loaded, i.e. the command is wrapped |
@derailed I forgot to mention in my description that I have no issue at all using kubectl, and I eventually did use kubectl to inspect my resources. |
@sgandon Yes, you're not the only one with this problem :/ |
We are running our k8s cluster on AWS EKS and facing the same issue. |
It looks like this change introduced the issue. |
@sgandon Thanks for the details! Just wanted to make sure it was not a k3s config issue. I am baffled! I have created a docker-for-desktop cluster and left it running for ~an hour and of course no issues! So I am thinking it's not a cluster origination issue. Same revs as what you are running. I keep going back to a kubeconfig issue, either the api server url or the certs are messed up, but I can't seem to repro here at the ranch ;( Of the folks that upvoted here, would you mind sharing some repros and logs so you can help me track this down? Any details here would help... Thank you!! |
@matheussilva-hotmart This might be a different issue, as it would indicate an RBAC restriction on a given namespace. Would you mind including your logs when the error occurs? Thank you Matheus! |
When I start using |
I have updated k9s to 0.19.6 and also tried a |
Our issue also seems to be related to a VPN/firewall. Turning off the VPN greatly improves the connection stability. |
I have been experiencing the same issue today when I connect to AWS EKS, which was working fine yesterday. The only thing I remember doing was trying :ctx from k9s. kubectl get po works fine
|
@nat2k5us Thank you for sending this extra info! Let's try a few things here and see if we can shed some light... Seems to me this is a different issue. Guessing RBAC rules changed?? Also don't think this will change anything... but try moving your $HOME/.k9s/config.yml and see if k9s will start from scratch with a new config. In the same shell and same user as you're launching k9s, can you try these and report back on the results: kubectl auth can-i list pods -A --as user-tradestation.com
kubectl auth can-i list crds -A --as user-tradestation.com |
@sgandon Don't think this will change much but try moving your $HOME/.k9s/config.yml. Also, is it possible your api server on the cluster takes more than 5s to respond? i.e. in the same shell and user you launch k9s in, what does |
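For reference, a minimal sketch of those two checks (the exact commands are my assumptions, not quoted from the thread): back up the k9s config so a fresh one is generated, then time a raw kubectl call against k9s' 5-second default.
# move the existing k9s config aside so k9s regenerates it on next launch
mv ~/.k9s/config.yml ~/.k9s/config.yml.bak
# rough gauge of api server latency from the same shell/user that launches k9s;
# anything close to 5s would explain the startup timeouts
time kubectl get pods -A
time kubectl get crds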
@sgandon Also do you currently have any RBAC rules defined on this cluster? Is your admission controller active, or is this just a fresh and plain ol' docker-desktop cluster?? |
fwiw, when I run |
@elordahl Tx for the info Eric!! Can you attach the logs when running locally? Hard to pinpoint this issue at the moment as I can't seem to repro on any of my clusters, local or remote ;( |
yep -- see below: local/mac
docker
|
to be clear, I ran the following to get logs, which is a slight variation of the README command (which works): # first
docker run --entrypoint='' --rm -it -v ~/.kube/config:/root/.kube/config derailed/k9s sh
# then, from inside docker container
k9s -l debug
cat /tmp/k9s-root.log |
@derailed moving the config.yml out of ~/.k9s works; I am able to list pods in the EKS cluster. |
Hello,
And yes my admission controller is running when I am experiencing this issue. |
@sgandon Boy this is a mixed bag ;( Having a hard time figuring out what is what here. So on a fresh cluster without kyverno running, are you still unable to connect? The initial logs you've sent on this issue indicate a connection failure to the api server, not an actual RBAC issue. So are the k9s logs you're seeing now the same as the original logs, or are they different? i.e. k9s may fail to launch under different scenarios, we just need to figure out what's failing here. Also did you try |
Hi @derailed |
I have tried |
@sgandon Thank you for the logs info. Not sure I've got this but please give v0.19.7 a shot and see if we're happier. If not please reopen. Thanks! |
@derailed, I have updated to 0.19.7
So I suppose you could re-open this ticket; I am sorry about this. |
@sgandon I think we should rename this issue. Here are my steps: brew install k3d
k3d create -n k9s --workers 2
export KUBECONFIG="$(k3d get-kubeconfig --name='k9s')"
kubectl get no // => connection refused?? What am I missing?? |
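Side note for anyone retracing those steps on a newer k3d release: the CLI was reorganized in k3d v3+, so the v1.x commands above no longer exist verbatim. A rough modern equivalent (my approximation from current k3d usage, not part of the original thread):
brew install k3d
k3d cluster create k9s --agents 2   # replaces "k3d create -n k9s --workers 2"
kubectl get no                      # recent k3d merges the kubeconfig automatically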
@sgandon So after a reboot ;( I was able to get a k3d cluster up and running. Left k9s running for ~1 hour and no issues... Bare cluster running...
What am I missing? |
Well I knew that was going to be a tough one :) |
@sgandon Thank you!! That would be super helpful if you can as I can't see any issue here at the ranch with either k3d or docker k8s. Again I am not saying there is not a bug under all this... but can't seem to repro ;( |
@derailed, it seems this is a timeout issue because when I increase the timeout to 10 seconds here, k9s eventually starts. |
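If rebuilding k9s with a larger hard-coded timeout is awkward, a lighter experiment is to raise the per-request timeout from the command line; this assumes k9s exposes the standard client-go connection flags, which is not confirmed in this thread:
# hypothetical invocation; check k9s --help for the flag on your version
k9s --request-timeout 10s -l debug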
@sgandon Thank you for sending the details! Hmm.. obviously a 3sec POST is highly dubious. It appears you're still running kyverno on your cluster. Seems suspicious given the nature of the framework and the api server slowdown on these SubjectAccessReview calls. My best guess here is they are getting routed thru an admission controller that puts the brakes on. Could you try removing it from your cluster, retry k9s, and see if that makes a difference? Also check out the api server/kyverno deployment logs and see if these POSTs are somehow getting routed/decorated or trapped by their adm controller. My best guess is they are, given I was able to run k9s with no issues on vanilla k3s or docker-for-desktop clusters... |
@sgandon Also take a quick look at whether pods are restarting on your cluster. It could be that some kyverno components are in trouble on that cluster?? |
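A quick way to eyeball both suggestions (a sketch only; the kyverno namespace and deployment names are assumptions and vary by install method):
# look at the RESTARTS column for anything churning
kubectl get pods -A
# assumes kyverno lives in a "kyverno" namespace with a deployment named "kyverno"
kubectl get pods -n kyverno
kubectl logs -n kyverno deploy/kyverno --tail=100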
@derailed, indeed when I removed kyverno the POST taking 3s went down to 4ms. So we found our culprit. |
fwiw, I am no longer having issues after upgrading to v0.20.x (tested with 0.20.0 and 0.20.5). There are still some connectivity issues, which may be unrelated, but for the most part it's functional. |
A note to others that may view this with a similar issue: the |
I'm having this issue intermittently as well. |
For anyone else finding this through keyword searches, my issue was similar: a misbehaving ValidatingWebhook. I was using Gatekeeper, and had installed an over-restrictive NetworkPolicy which was blocking the webhook. Check for any Validating Webhooks with |
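One way to enumerate validating webhooks and spot a suspect (my assumption; the commenter's exact command is not shown above):
# list all validating admission webhooks registered with the api server
kubectl get validatingwebhookconfigurations
# inspect a specific one; <name> is a placeholder
kubectl describe validatingwebhookconfiguration <name>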
Describe the bug
running the version
0.19.5
I am having some issues that appear first after a while and then block the start of the tool completely (see the bottom of the issue for logs). So from a fresh cluster (docker-for-mac or k3d) everything is running fine until I get some error messages appearing at the bottom like
[list watch] access denied on resource "default":"v1/pods"
Then if I quit k9s and try to relaunch it, it fails with the logs below.
I am doing some experiments with a webhook admission controller so I wonder if this could be related.
If I delete my cluster and start a fresh one, the issue disappears and then comes back later somehow.
To Reproduce
It is hard to describe exact steps; I am playing with Kyverno ClusterPolicies and this issue happens after a while.
Expected behavior
not to crash
Screenshots
Versions (please complete the following information):
On macOS 10.15.3
It fails on both Docker-for-Desktop: 2.3.0.2
or on
k3d version v1.7.0
k9s version:
Version: 0.19.5
Commit: 9f1b099
Date: 2020-05-15T22:35:38Z
Additional context
start logs