
"microk8s.kubectl" failed: cannot create transient scope: DBus error "System.Error.E2BIG" #2194

Closed
LemurPwned opened this issue Apr 22, 2021 · 22 comments
Comments

@LemurPwned commented Apr 22, 2021

Problem description

When trying to execute any of the microk8s commands I get:

internal error, please report: running "microk8s.kubectl" failed: cannot create transient scope: DBus error "System.Error.E2BIG": [Argument list too long]

After some time of cluster activity, microk8s fails with the above error. The error effectively prevents any operation with the microk8s command, including microk8s inspect, microk8s.start/stop and microk8s.kubectl.
It seems that the only solution is to restart the system, after which the cluster comes back to a healthy state. However, after a couple of hours it goes back to the failed state again.

Interestingly enough, it seems that the services inside the cluster ARE working and responding -- I can query them and so on, they are working as expected, but I cannot check their state or logs.

System data

microk8s: 1.15/1.17 stable (both had the same problem occurring)
System: Ubuntu 18.04.1

Comments

  • Three of the services that run in the cluster mount on the disk
  • Other commands on that server work fine; it's just microk8s that crashes with this error.
  • I tried increasing the ulimit size, but it didn't help, and I'm not keen on raising it indefinitely.

Is there a way to at least recover the system permanently from that state?
I tried restarting each of the services from here: https://microk8s.io/docs/configuring-services
Only one helped: systemctl restart snap.microk8s.daemon-containerd, but the cluster falls back into the same failed state (DBus error) a couple of seconds after restarting the containerd service -- so ultimately it didn't help.
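For reference, a sketch of the per-service restart attempt described above. The service names are taken from the linked docs; the exact set varies by microk8s version, and daemon-apiserver/daemon-kubelet here are assumptions for the 1.17-era service list. The block only prints the commands, since the restarts themselves need root:

```shell
#!/bin/sh
# Sketch only: iterate over the microk8s systemd services (names per
# https://microk8s.io/docs/configuring-services; the set varies by version)
# and print the restart command for each.
for svc in snap.microk8s.daemon-containerd \
           snap.microk8s.daemon-apiserver \
           snap.microk8s.daemon-kubelet; do
  echo "sudo systemctl restart ${svc}"
done
```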

@balchua (Collaborator) commented Apr 22, 2021

Just curious, if you run a standalone kubectl (not through microk8s kubectl), do you see the same error?

@LemurPwned (Author)

Following your question, I have installed kubectl from snap and for every command with kubectl I get the same error:
internal error, please report: running "kubectl" failed: cannot create transient scope: DBus error "System.Error.E2BIG": [Argument list too long]

@balchua (Collaborator) commented Apr 24, 2021

I'm not really an expert on this; it looks like something has to be tuned on the system.
Naively googling it, most results point to increasing the ulimit size, but don't say how big.
You may have to try increasing it to 65535, or find a value by trial and error.

@LemurPwned (Author)

@balchua Like I mentioned in the issue, I have already tried increasing it substantially -- but it didn't help much. Besides, as you remarked yourself, it's not clear what the upper limit should be.
Either way, my gut feeling is that increasing ulimit just to appease the error may lead to some nasty consequences.

@LemurPwned (Author)

Using this post, I have found that you may place extra ulimit settings in /var/snap/microk8s/current/args/containerd-env. I tried increasing both -l and -n, unfortunately to no avail.
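For reference, the kind of lines tried in containerd-env -- a sketch only; the 65536/16384 values are illustrative assumptions, not a recommendation from this thread:

```shell
# /var/snap/microk8s/current/args/containerd-env (sketch; illustrative values)
# Raise the open-file and locked-memory limits for the containerd daemon.
# "|| true" keeps a failed raise (e.g. above the hard limit) non-fatal.
ulimit -n 65536 || true   # -n: max open file descriptors
ulimit -l 16384 || true   # -l: max locked memory (kB)
```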

One more thing I noticed may be helpful, though I can't figure it out.
Basically, if I restart the server, the cluster comes back to a healthy state for a couple of hours. However, if I just reinstall microk8s or restart the containerd service, the cluster falls back to the failed state almost immediately.
I'm wondering what restarting the server does, compared to microk8s/containerd restarts, that helps the cluster recover.

@balchua (Collaborator) commented Apr 27, 2021

Maybe try increasing the stack size with ulimit -s instead of -l or -n.

@LemurPwned (Author)

I noticed that if I switch to root then all the microk8s commands work again, even though for non-root users it still prints the same error.

@balchua (Collaborator) commented Apr 28, 2021

Perhaps the ulimit settings are user-session based. The ulimit you set in the containerd-env file applies to the root user. I'm probably saying something wrong, though.

@LemurPwned (Author) commented Apr 29, 2021

@balchua thanks. Yeah, that was my gut feeling also. But the weirdest thing is that ulimit -n, ulimit -l, ulimit -s are all the same both in the user session and the root session.

I also checked getconf ARG_MAX (to make sure it's not a bash arg issue) and it's the same too for both root and user. xargs --show-limits also yields identical results for root and user (with the exception of Maximum length of command, which is actually larger for the user session).
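A small script for the comparison described above -- run it once in the user session and once as root, then diff the output (nothing here is microk8s-specific):

```shell
#!/bin/sh
# Print the session limits discussed above for the current user.
arg_max=$(getconf ARG_MAX)   # kernel limit on argv + environment size
open_files=$(ulimit -n)      # max open file descriptors
stack=$(ulimit -s)           # stack size limit (kB, or "unlimited")
echo "ARG_MAX=${arg_max} open_files=${open_files} stack=${stack}"
```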
EDIT:
I'm still curious what is particular in microk8s that it hits one of the limits on the user session.

@LemurPwned (Author)

I think I can rule out ulimit. I matched ulimit -a for the root and the user session, and still, the problem is present in the user session but not in the root session. Also, for sanity (although I don't think that's the problem here), I checked that those limits are correctly picked up by the containers.

@balchua (Collaborator) commented May 4, 2021

Is it ok to upload the inspect tarball? I'm not sure it will reveal anything, though.

@robotrapta

I'm also seeing this on microk8s v1.20.7 (rev 2213, 1.20/stable, classic) on Ubuntu 18.04.5.

Inspect doesn't work either:

$ microk8s.inspect
internal error, please report: running "microk8s.inspect" failed: cannot create transient scope: DBus error "System.Error.E2BIG": [Argument list too long]

Rebooting brings it back, at least temporarily. I read here that running kubectl as root is a workaround, which is an interesting hint.

@LemurPwned (Author)

@robotrapta yes, you linked the issue here :)

@marner2 commented Jun 16, 2021

I'm having the same issue on my cluster. I'm also getting random errors with pods failing, and I now get DBus error "System.Error.E2BIG": [Argument list too long] when I run apt-get update on my machines.

@flyte commented Oct 4, 2021

I get this if I leave watch -n 1 kubectl get pods running overnight (or over a weekend 🙄) by accident and then try to run any kubectl command. I don't use microk8s, I'm using hosted clusters.

@LemurPwned (Author) commented Oct 4, 2021

@flyte same here -- but what would be the root cause? I remember this wasn't a problem before; I've had open watch sessions in tmux terminals many times.

@flyte commented Oct 4, 2021

I think it's specifically a problem with kubectl when it's installed as a snap. Beyond that I don't know.

@bboozzoo

I've provided some info in https://forum.snapcraft.io/t/dbus-error-system-error-e2big-error/29227/2, if possible please capture the session bus traffic, as well as the journal.

stale bot commented Feb 16, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the inactive label Feb 16, 2023
@khumps commented Mar 15, 2023

Just ran into this issue in 1.25
I was seeing the following messages in journalctl:

Mar 15 03:57:21 hostname systemd[1486]: snap.microk8s.microk8s.1a9164d0-355e-49e3-8b0e-1649f811dee7.scope: Couldn't move process 16149 to requested cgroup '/user.slice/user-1000.slice/user@1000>
Mar 15 03:57:21 hostname systemd[1486]: snap.microk8s.microk8s.1a9164d0-355e-49e3-8b0e-1649f811dee7.scope: Failed to add PIDs to scope's control group: Permission denied
Mar 15 03:57:21 hostname systemd[1486]: snap.microk8s.microk8s.1a9164d0-355e-49e3-8b0e-1649f811dee7.scope: Failed with result 'resources'.
Mar 15 03:57:21 hostname systemd[1486]: Failed to start snap.microk8s.microk8s.1a9164d0-355e-49e3-8b0e-1649f811dee7.scope.

Restarting user@1000.service solved this for me: sudo systemctl restart user@1000.service (I assume rebooting would have had a similar result).
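For anyone hitting this with a different UID, the unit names can be derived from id -u (user 1000 above is just that machine's first regular user). The block only prints the commands, since the restarts themselves need root:

```shell
#!/bin/sh
# Derive the per-user systemd unit names for the current user.
uid=$(id -u)
echo "sudo systemctl restart user@${uid}.service"  # per-user service manager
echo "sudo systemctl restart user-${uid}.slice"    # the user's whole slice
```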

@stale stale bot removed the inactive label Mar 15, 2023
@robotrapta

This started showing up for me again, and also appears to be related to journalctl. (For reasons that make sense to a colleague who understands linux better than I do, some users on a machine have this problem while others don't, even though they share the same snap installation of kubectl.) I was able to fix it (without rebooting) with:

sudo systemctl restart user-1000.slice

stale bot commented Feb 21, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the inactive label Feb 21, 2024
@stale stale bot closed this as completed Mar 22, 2024