"Steamwebhelper is not responding" crash menu with home folder on NFS #10431
Comments
I agree that this is not the same problem as #10412, despite the superficially similar symptoms. In #10412, we don't get as far as the container runtime starting. In this issue, the container starts up fine and hands over control to This in
8018 seems to be
I should mention here that one possible route towards solving #10412 is to use the I would recommend putting the Steam installation (usually |
Steam Beta is borked, I can reproduce this crash on Arch Linux with clean Steam (removed
Current workaround is to switch back to non-Beta:
|
@smcv I'm not sure what to look for with regard to the certificates. I don't believe anything is non-standard there. Pointers as to what to check would be appreciated. The machine is my daily driver and I've not seen anything else fail relating to certs/TLS/etc.
As to the flock issue: the core problem is that NFS doesn't support flock(), which only works for local filesystems. The previously mentioned workaround was a tiny C program someone contributed to #5788 called "fakeflock.c".
I'd rather not move everything to a local filesystem, as I have the NFS server for convenient backups and rolling snapshots (every 15 minutes) for the entire family (thankfully, they're not on the beta). Snapshot rollbacks have saved each of us from various catastrophes numerous times. I've been using it this way for well over a decade and it's worked very well for us so far. If NFS is the issue, we're likely not the only ones who will be impacted by this, given the popularity of various DIY/home NAS solutions out there.
In an attempt to test it out, I unmounted, unshared, and relocated my steam filesystem on the server to prevent automatic mounting, then created a Steam subdirectory on a local filesystem with enough space, linked ~/.local/share/Steam back to this location, removed ~/.steam*, then relaunched Steam to force a full re-install. I then logged into Steam and switched over to the beta, at which point it segfaulted (I grabbed the logs, below). I then restarted Steam again and the beta came right up. Crash text:
|
@davispuh Is your home directory (or Steam installation path) mounted via NFS? |
Well, it's a bit complicated 😂 My home directory is not using NFS but local Here's a summary of mounts:
That last
And
PS. These are just relevant excerpts |
@davispuh Unfortunately, I know very little about btrfs, so I don't know where the overlap between its capabilities and NFSv4 would be. @smcv I forgot to mention that I'm running NFSv4, which has different locking mechanisms than NFSv3 (neither of which natively supports flock). |
I did some testing due to getting pinged. After switching to the beta, at first the Steam UI didn't show up at all and I also didn't get this "not responding" dialog. This reproduced a couple of times. I then tried running Steam on a local drive (ext4 filesystem), which worked. After that running on the NFS mount worked too. To further confirm the functionality I tried it on another computer, with the same NFS mount. There I got the "not responding" dialog, but the Steam UI showed up and worked as well. After choosing to restart either steam or just steamwebhelper the dialog did not reappear. I have to get on with other things, but maybe I'll try a clean install on NFS later. Both computers are running Debian unstable and have Nvidia GPUs with driver version 525.147.05. Edit: I ran the local Steam installation by changing HOME to point at a local mount. It's not impossible that it could have affected the installation on NFS, though it definitely did run from the local drive. |
Following DataBeaver's lead, I redid my fresh local install per my above description (using a link). However, one thing I noticed was a large number of errors when running a diff between the local and NFS copies. I believe all of these were dead links.
The confusing part is that this is true of both installations, so I'm not even sure how sniper can run in the local installation if this is causal in the NFS installation (so this may very well be a red herring). Digging into them in more detail, most appear to be dead links to the /run/ hierarchy, so they can be ignored.
Filtering down to absolute path links, I see the certificate issue you mentioned. It appears to be looking for different filenames than I have on my system, as I see a mixture of near-misses (e.g. different numbers) and completely missing ones (e.g. Staat*). Next I see that it's linking to font configs that don't exist in /usr/share/fontconfig/conf.avail/ (I have 29 total in that directory; perhaps I'm missing a package?). Digging into a few of the oddballs at the end of the list, it's become apparent to me that this runs in some sort of chrooted environment with multiple filesystem overlays or something along those lines (sorry, I'm unfamiliar with exactly what the steam-runtime-* installations do), as I'm finding the absolute paths stuffed under various different hierarchies in the Steam installation. Extrapolating this back to the certs, it looks like they are actually there too, but in yet another subdirectory. For example, Backing up and filtering for relative paths, I find a few additional causes. Here's the list of dead links with the /run/ links stripped out: |
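The dead-link survey described in the comment above can be approximated with a short script. This is a minimal sketch, not the commands that were actually used in this thread: `dangling_symlinks` and `classify` are hypothetical helper names, and the three buckets (targets under /run/, other absolute targets, relative targets) only mirror the filtering the comment describes.

```python
import os


def dangling_symlinks(root):
    """Yield (link_path, target) for every symlink under root whose target is missing."""
    # os.walk does not follow symlinks by default: links to directories show
    # up in dirnames, while broken links show up in filenames.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() inspects the link itself; exists() follows it.
            if os.path.islink(path) and not os.path.exists(path):
                yield path, os.readlink(path)


def classify(target):
    """Bucket a link target the way the comment above filters the list."""
    if target.startswith("/run/"):
        return "run"
    if os.path.isabs(target):
        return "absolute"
    return "relative"
```

Calling `dangling_symlinks()` on the Steam installation directory and grouping the results by `classify(target)` gives the three lists discussed above. Note that a link that dangles on the host may still resolve inside the container, so a dead link found this way is not necessarily a bug.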
I can't tell at this stage whether the problem that @glabifrons is having is to do with NFS or not, so this might all be a red herring. But, we are going to need this sooner or later, so...
Does it support POSIX process-associated record locks ( We are going to need to put some sort of locking into place, otherwise we get bizarre failure modes like one process deleting a temporary runtime that another process is still using. Sorry, but avoiding that is more important than supporting NFS. If At the moment, the container runtime tries to use the Linux-specific
(If necessary, copy the whole The The last command (with
Disabling locking like this is not a solution. This will lead to concurrent processes all believing that they have the lock at the same time, and overwriting or deleting files that the other concurrent processes were using. |
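As a rough way to see which lock styles a particular mount accepts, here is a minimal Python sketch. It is not the lock test referred to in this thread, and `probe_locks` is a hypothetical helper name. It tries a BSD-style `flock()` lock and then a POSIX process-associated record lock on a scratch file in the given directory, and records any error.

```python
import fcntl
import os
import tempfile


def probe_locks(directory):
    """Try both lock styles on a scratch file created in `directory`.

    Returns a dict mapping "flock" and "posix" to None on success,
    or to the error text if the call was refused.
    """
    results = {}
    fd, path = tempfile.mkstemp(prefix="lock-probe-", dir=directory)
    try:
        # BSD-style whole-file lock: flock(2).
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            fcntl.flock(fd, fcntl.LOCK_UN)
            results["flock"] = None
        except OSError as exc:
            results["flock"] = str(exc)
        # POSIX record lock: lockf() in Python wraps fcntl(2) locking.
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            fcntl.lockf(fd, fcntl.LOCK_UN)
            results["posix"] = None
        except OSError as exc:
            results["posix"] = str(exc)
    finally:
        os.close(fd)
        os.unlink(path)
    return results
```

For example, `probe_locks(os.path.expanduser("~"))` probes the home directory. A success only shows that this client accepted the call; it does not prove that the lock excludes processes on other machines sharing the same export, which would need a two-client contention test.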
We do not have enough information on this issue to be able to guess whether the failure mode you are seeing with the beta is the same as @glabifrons is seeing, or the same as #10412, or some different thing. Please look at the logs in If we try to handle multiple different problems on the same issue number, it quickly becomes really confusing, which makes it take longer to solve any of the problems that were reported; so we should reserve this issue number for the specific problem that @glabifrons is experiencing (which unfortunately we have not yet been able to identify). If we can identify that something different is going wrong for you, please open a separate issue for that, with a title that is as specific as possible.
I don't know whether any of these will interfere with the container runtime. My first guess would be that RAID shouldn't matter, because that's at a lower level than anything we're doing, but the others might. If you can try launching Steam on the same system but from a home directory that is as "ordinary and boring" as possible (perhaps by creating a temporary user whose home directory is on local disk and is not NFS-exported, and logging in as that user) then that will help to narrow down whether any of these less-usual configurations are involved. |
Unfortunately, I think this is normal if it takes an unusually long time for the If your NFS mount has enough latency to make small metadata operations like |
Back to @glabifrons:
Yes, it's the Steam container runtime, which has quite a lot of code in common with Flatpak. It's normal that some of the files below Looking at your list of dangling symlinks, the majority of them are very likely to work as intended inside the container. I do notice one bug, but it's a bug that will only affect developers who are running this stuff in a non-default configuration that isn't relevant to end-user systems. If you are copying Steam installations between filesystems, you can delete all of You can verify that
This checks both metadata and content of all of the files in there, so expect it to take up to 30 seconds on HDD, and perhaps longer on NFS. You can also get an interactive shell inside the container by running:
Inside that It would be useful for me to see a detailed log from the container runtime framework, which you can get by running:
(You can just exit from the The log file will appear in |
I'm sure that's desirable, but remote filesystems have functionality and performance characteristics that are very much unlike local filesystems, and we can't support every possible scenario. As currently implemented, the whole At the moment the way it's implemented doesn't allow for it to be a symlink or a mount point, but I'll see whether that can become possible in future. |
A quick look at the relevant manpages tells me that while NFS doesn't support
This works for me on my NFS home directory. As does Steam itself. So I think NFS is at most a contributing factor, not the root cause. It will be interesting to see glabifrons's results for the lock test. Could be that we have some configuration differences.
Understandable. It's a relatively minor annoyance, but if you want to do something about it, maybe add an option to wait a bit longer without restarting anything? Or even keep checking for responsiveness while the dialog is up, and hide it if steamwebhelper starts responding after all. |
Hmm I thought this is only issue for My crash is not #10412 because my
but that didn't change anything and
works fine without issues. In logs nothing in particular stands out
And here are backtraces but I don't know how to get symbols for it?
|
The steamwebhelper has had several major changes in the new beta. Some issues caused by those changes (like #10412) are to do with the fact that it is now running inside a container runtime, like Counterstrike 2 and Dota 2 do. Others could be to do with changes inside the steamwebhelper itself. At the moment, unfortunately I don't see enough information here to be able to say whether you are experiencing the same crash as the original reporter of this particular issue or not.
In the beta that was active over the weekend, the log was truncated every time the steamwebhelper restarted, which was unhelpful because it meant that previous error messages could be lost. Please try updating Steam to the current beta 1706390103, which has stopped truncating the log every time, so should get you better logs. You might need to do this by swapping to the stable branch (completely exit from Steam,
The general public cannot get debug symbols for the proprietary parts of Steam (and neither can I), but Valve can. In your original report, you quoted log output that said Steam has uploaded a crash dump, |
All the information in my comment is from the latest Beta version (installed today); there you can see Looks like it crashes very early in startup, as there aren't any other log entries after In the non-Beta version I see
But this is not present in the Beta; we never reach it. The crash seems to be inside So they need to look into this. My latest |
@smcv I tried the adverb commands both on the local filesystem and on the NFS installation, and surprisingly it worked for both. So it looks like it's not a locking issue. Just in case the information is useful though, this post has the best description of the limitations of the NFSv4 calls (IIRC, there's one for read and one for write, but none for both):
Another thing you mentioned is compatibility with old kernels. I may actually have the opposite problem, as I'm running the HWE kernel: 6.5.0-15-generic. I wonder if there might be a negative interaction with the newer kernels. @davispuh what kernel are you using?
I ran pv-verify in both installations: on my NVMe drive it took 2.8s, and on NFS it took 5.5s. I tried your other command to launch an xterm within sniper and verified that all symlinks look good.
I tried to generate a log for you as requested, but no logs ever appear in the var directory under steam, on either the NFS or the local installation. There is only the .ref file and a temp subdirectory (tmp-$randomchars) in my local installation, and several of those subdirectories in my NFS one.
I like your thoughts on relocation. I do have one thought that I hope is a stupid question: Steam doesn't attempt to launch anything as root, does it?
One other thing... you mentioned you don't have access to debug symbols, but Valve does... with as much effort as you're putting into this, and as knowledgeable as you are about the inner workings and even the development direction, I thought you were an employee of Valve! |
I was thinking about NFS and root-squash and the overlays and I think I figured out at least part of what's going on. I don't pretend to understand what type of container sniper is using or how it's overlaying filesystems, but I find this really strange, as up until the 20th not only did it work, but many of the games I play use Proton (I recall you saying in the other issue thread that sniper is related), and I even use Proton Experimental for some of them, like Space Engineers. |
It's still difficult to tell from the information available, but the best diagnosis I can make so far is that @glabifrons might be seeing a @glabifrons, if you can find a I still think that @davispuh, am I correct to think that you don't see
Sorry, I was forgetting which layer is responsible for implementing
This should record a log like I said. Another way to get logs would be to exit from Steam completely, and then run it as:
which should record one log in
This is interesting. Normally (if you don't use It sounds as though your installation on a local disk is working correctly, but the old subdirectories are not being garbage-collected on your NFS installation. I'd be interested to see why not. If you can get a log file, it should tell us why.
Normally you would use As @davispuh said, one good indication is that each time Steam runs the
OK, good. It sounds as though we cannot rely on
This may seem weird, but is normal. When an unprivileged user creates a new user namespace, as we do in the Steam container runtime, the kernel will only allow us to create one user ID mapping (our own user ID) and one group ID mapping (our own primary group ID). Everything else is mapped to the "overflow uid" and "overflow gid" (normally nobody:nogroup), very similar to NFS root-squashing. Files owned by root and files owned by any other user will show up inside the container as though they were owned by the overflow uid, which you should interpret as meaning "owned by someone who is not me". Flatpak apps have the same behaviour, for the same reason.
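The mapping rule described above can be illustrated with a small sketch. It assumes the documented three-column format of `/proc/<pid>/uid_map` (ID inside the namespace, ID outside, range length) and the usual default overflow uid of 65534; `parse_uid_map` and `uid_seen_inside` are hypothetical helper names, and the single-entry map in the usage note is an illustrative example rather than output captured from the Steam container runtime.

```python
def parse_uid_map(text):
    """Parse uid_map text into (inside_start, outside_start, count) tuples."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            ranges.append(tuple(int(field) for field in fields))
    return ranges


def uid_seen_inside(outside_uid, ranges, overflow_uid=65534):
    """Return the uid that a file owned by outside_uid appears to have inside.

    Any owner not covered by a mapping is reported as the overflow uid,
    which is why root-owned files look like they belong to "nobody".
    """
    for inside_start, outside_start, count in ranges:
        if outside_start <= outside_uid < outside_start + count:
            return inside_start + (outside_uid - outside_start)
    return overflow_uid
```

With a single mapping such as `"1000 1000 1"`, `uid_seen_inside(1000, ranges)` gives 1000 (the user's own files), while `uid_seen_inside(0, ranges)` gives 65534: root-owned files appear to be owned by the overflow uid, much like NFS root-squashing.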
Not usually, and not on the critical path for basic UI functionality. In some situations (mainly either related to VR, or on the Steam Deck) it will try to run commands via
My main test environments for new container runtime releases are Ubuntu 22.04 (with the same HWE kernel you're using) and Arch (with a very new kernel, currently 6.7), so it's heavily tested on modern kernels.
I'm a consultant helping them with the Steam Runtime and related topics. If your particular issue is a problem with the container runtime, my team might be able to fix it; if it's a problem with |
I just launched the beta in the foreground again and got two crash-IDs.
I can provide more output for context if needed. I was able to get a log using the _v2-entry-point, so thank you for the correction. Thank you very much for the overflow uid explanation. That's a huge relief that it's not what I was afraid of, as that would have meant that solution wasn't NFS compatible. I'm glad you mentioned Steam's VR... I guess I'll be putting off playing with that for a while (was looking at it recently due to a deal on woot that I almost bought). Hopefully the above log is helpful, but if not, we now have crashdumps as well. |
Sifting through the log, I find the error about not finding libvdpau.so.1 to be interesting, as it placed a copy into Steam/ubuntu12_64/steam-runtime-sniper/var/tmp-92DII2/usr/lib/pressure-vessel/overrides/lib/x86_64-linux-gnu/. I saw it clean out the tmp dirs, then give errors that they're not empty. Most were empty by the time I looked. I did an rmdir * in the var directory under sniper and it removed most of them (7 remain of the 19 that were there before). This could be purely a timing/sync issue with a file being removed server-side. A 1 second sleep should be more than enough to solve that. As issues go, that's incredibly minor. Other than that, I don't see anything jumping out at me in that log. You'll know better than I what to look for though. |
Probably it found that your host system had a 64-bit
No, anything that is created in
Sorry, I am not going to slow down each container startup for every Steam-on-Linux user just to benefit NFS users. If it's leaving behind nearly-empty directories, then the disk space cost is trivially small. I suspect that what might be happening here is that we're deleting the directory |
Please could a Valve developer look up the backtraces for the two crash IDs referenced by @glabifrons in #10431 (comment)
and check whether they are the same thing that @davispuh is experiencing, which is this?
|
From the log in #10431 (comment), we are likely to be using the container's root CA certificates (derived from Debian 11's @glabifrons is using Ubuntu, which is Debian-derived, so this is not a mismatch between Debian and e.g. Fedora search paths for root CA certificates, or anything like that. A potentially interesting factor is that we have pulled in
so one possible factor for Valve developers to investigate would be whether these can somehow collide? [Edited to add: the fact that #10431 (comment) didn't solve this, according to #10431 (comment), suggests that this probably wasn't the problem.] |
@glabifrons, if you are comfortable with using unreleased software, one thing you could try is:
If that makes it work, then my theory about |
@smcv Thank you very much for coming up with more ways to narrow this down. |
i have temporarily created a local home directory instead of using the nfs mounted location. steam works now, so appears to be something related to having steam installed on an nfs mount. is it possible to select where to install the steam client? |
Try symlinking Note you might need to delete |
Actually it might be that issue is |
It's not the .config directory. My ~/.config/ is on NFS still with my workaround described (above and) below, and it works.
I created the directory /var/cache/fscache/Steam/ and changed the owner and group to my user (fscache is the mount point for my NFS cache, but this does not interfere with it).
So Steam seems perfectly fine with the steamapps subdirectory being replaced with a link as a workaround (if, like me, you want to keep your games NFS mounted). |
thanks @glabifrons. something in your description is missing for me (maybe?). i get that you created a new directory (/var/cache/...) and changed the permissions. how did you install the steam client to that directory? my client machines have <nfs_server>:/home mounted. so i can access my home directory from any client machine and the server via the same path. did you link ~/.local/share/Steam to /var/cache/fscache/Steam and then install? |
i did the above and it's working for me now. linked ~/.local/share/Steam to a directory on a local drive and then launched the client. it did the install and launches ok now. thanks for the help @glabifrons |
If the ~/.steam/ directory has those links pointing to that new directory, launching Steam will trigger it to install to that new directory.
No, I left ~/.local/share/Steam/ as-is, no changes are needed there. It's the links in ~/.steam/ that tell Steam where it's installed. |
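To see where the links in a directory such as ~/.steam/ currently point before changing anything, a short listing script is enough. This is a minimal sketch; `list_links` is a hypothetical helper name, and which link names appear depends entirely on the installation being inspected.

```python
import os


def list_links(directory):
    """Return {link_name: target} for every symlink directly inside directory."""
    links = {}
    for entry in os.scandir(directory):
        if entry.is_symlink():
            # readlink() returns the stored target without resolving it,
            # so a dangling link is still reported.
            links[entry.name] = os.readlink(entry.path)
    return links
```

For example, `list_links(os.path.expanduser("~/.steam"))` returns each link with its stored target, which makes it easy to confirm after a relocation that the links point at the new directory.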
That'll work too, but it ends up moving all of your games to the local drive too. The way I did it kept the games on NFS and only moved the Steam installation. |
in my case, i have storage settings in steam that put games on a local disk anyway, in a specific location. i want the steam client on the nfs mount so i can access my steam account from any computer. it can't work that way like this though. hope valve fixes this properly. |
I'm experiencing the very same issue on a system where users' home directories are stored on NFS. However, this mount is on a pretty fast network, so I doubt it's because of excessive latency. @smcv Is there any workaround to this particular issue at the moment? |
This might be a long-shot, but it's worth asking (note: I admit I know next to nothing about .Net). |
You say "pretty fast", but latency and throughput aren't the same thing. Some of the setup that the container runtime framework needs to do involves a lot of metadata operations ( If possible, I would recommend putting the actual Steam installation on a local Unix filesystem like ext4, xfs or btrfs (not a remote filesystem like NFS, and not a non-Unix filesystem like NTFS either). You can still install games on a remote filesystem, if you want, by adding another Steam library (I use It is possible to relocate the Steam installation to wherever you want, within reason, as long as you adjust both of the symbolic links For ordinary OS-level packages (.deb or similar) the default location for the actual Steam installation is For Flatpak users it's usually For Snap users it's usually
I can't tell what specific issue you are having (the "Steamwebhelper is not responding" dialog is a symptom, not a root cause), and you haven't attached any logs that I've seen. If whatever issue you are having is triggered by having Steam installed on NFS, the workaround would be to try having Steam not be installed on NFS, as above. |
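The point above about latency versus throughput can be checked roughly by timing many small metadata operations instead of one large transfer. The following is a minimal sketch under stated assumptions: `time_metadata_ops` is a hypothetical helper, `lstat()` stands in for the metadata calls the container runtime makes (the exact calls are not listed here), and the limit of 5000 entries is arbitrary.

```python
import os
import time


def time_metadata_ops(root, limit=5000):
    """lstat() up to `limit` entries under root and return (count, seconds)."""
    count = 0
    start = time.perf_counter()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            os.lstat(os.path.join(dirpath, name))
            count += 1
            if count >= limit:
                return count, time.perf_counter() - start
    return count, time.perf_counter() - start
```

Running it once against a directory on the NFS mount and once against a similar directory on a local filesystem gives a per-operation cost (seconds divided by count) that can be compared directly. Client-side attribute caching can make a second run look much faster than the first, so the first run after mounting is usually the more telling number.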
Not as far as I know. To the best of my knowledge, Quite a lot of games use .Net (notably, the popular Unity engine is .Net-based), so if you want to play Steam games, any incompatibility between NFS and .Net is likely to be something you will need to solve or work around sooner or later; but that shouldn't matter when you are still at the stage of trying to run Steam itself. |
I'm a fedora user. Try to execute steam from a terminal. |
If there is a root cause for steamwebhelper crashes that involves NFS, then I would be surprised by this workaround having any effect. I suspect you might be getting the same symptom, but for a different reason - something that involves the terminal. Please report a separate issue, with logs from the failing situation ( |
I think the script that calls steamwebhelper points to a location held in a variable that is null or refers to an invalid folder, and then gets an error with an absolute path. My log when running from a terminal:
|
This is my log when launching Steam from the shortcut:
I just posted before looking to help fellows in the same situation. |
I'm trying to get Before the January update it used to work fine, even on the NFS mount; has the root cause of the issue with $HOME being on NFS been identified at all? I attach below the |
I'm using Flatpak Steam on Fedora 39. I got the same issue 2 months ago. I still can't find a fix after 2 updates. Does anyone have any ideas for this error? |
Use vmtouch. I run a few diskless clients, and the race condition can't seem to be bypassed by any tmpfs, or even by using RDMA or any of the other fancy things I've tried. For anyone on Gentoo, just emerge dev-util/vmtouch. I signed up to GitHub just to post this. Stay strong, fellow Diskless Gamers! |
Your system information
Please describe your issue in as much detail as possible:
Expected: Steam launches as normal.
Result: "Steamwebhelper is not responding" crash dialog appears.
Steps for reproducing this issue:
Details:
This started several days back. By my logs, it looks like the last time I successfully launched Steam's beta was on 2024-01-19.
I went through issue #10412 which had the same initial dialog, but ruled out the same root cause.
While writing this up, I noticed another issue (#10417) that indicated some people were having better luck upgrading to NVidia driver 545 from 535 (which I was using).
I upgraded to 545 using Ubuntu's packages and tried switching back to Steam's beta after the upgrade (and reboot) with the same results reported above.
To be absolutely sure I followed each tip in #10412, I even removed steam-runtime-sniper before switching from release to beta on the last attempt. No change in symptoms.
Observation:
On a couple of attempts, I noticed that Steam was going through the various Proton installations (one by one) and running .local/share/Steam/ubuntu12_32/../bin/d3ddriverquery64.exe even after I selected the exit option from the dialog.
I left this running to completion hoping that would solve the issue (figuring that maybe it's an incomplete driver installation within Proton or something similar), but this appeared to make no difference.
Other:
I doubt this matters, but it was related to two Steam bugs in the past so I will note it here: My home directory is mounted via NFS with the Solaris server's backing filesystem being ZFS. Several years back I had to create a 2TB quota on my steam installation share to work around #4982. The other issue (with using flock on NFS) has since been resolved (I no longer use the workaround).
These are the only things I would consider odd or unusual about my installation.