
Launching Steam immediately caused fossilize_replay to consume all available memory #230

Closed
rhoot opened this issue Aug 31, 2023 · 9 comments

Comments

@rhoot

rhoot commented Aug 31, 2023

Not sure whether to treat this as a Steam or fossilize issue, but trying here first:

When I launched Steam today, my system got very laggy and unresponsive. Eventually it froze completely (video, audio, everything) for a few seconds until oom-killer kicked in. So I closed Steam and opened it again while keeping an eye on RAM.

I have 32 GiB of RAM plus 8 GiB of swap. Before opening Steam, 29 GiB of RAM was available. Within 5 seconds fossilize_replay had consumed all of it, as well as the swap. oom-killer then kicked in and killed some processes; a few seconds later memory was back at 100% usage, and this repeated for a couple of minutes.

System information
Computer Information:
Manufacturer: EVGA Corp.
Model: X570 FTW WIFI
Form Factor: Desktop
No Touch Input Detected

Processor Information:
CPU Vendor: AuthenticAMD
CPU Brand: AMD Ryzen 9 5950X 16-Core Processor
CPU Family: 0x19
CPU Model: 0x21
CPU Stepping: 0x0
CPU Type: 0x0
Speed: 5083 MHz
32 logical processors
16 physical processors
Hyper-threading: Supported
FCMOV: Supported
SSE2: Supported
SSE3: Supported
SSSE3: Supported
SSE4a: Supported
SSE41: Supported
SSE42: Supported
AES: Supported
AVX: Supported
AVX2: Supported
AVX512F: Unsupported
AVX512PF: Unsupported
AVX512ER: Unsupported
AVX512CD: Unsupported
AVX512VNNI: Unsupported
SHA: Supported
CMPXCHG16B: Supported
LAHF/SAHF: Supported
PrefetchW: Unsupported

Operating System Version:
"EndeavourOS Linux" (64 bit)
Kernel Name: Linux
Kernel Version: 6.4.12-zen1-1-zen
X Server Vendor: The X.Org Foundation
X Server Release: 12302000
X Window Manager: KWin
Steam Runtime Version: steam-runtime_0.20230606.51628

Video Card:
Driver: AMD AMD Radeon RX 6900 XT (navi21, LLVM 15.0.7, DRM 3.52, 6.4.12-zen1-1-zen)
Driver Version: 4.6 (Compatibility Profile) Mesa 23.1.6
OpenGL Version: 4.6
Desktop Color Depth: 24 bits per pixel
Monitor Refresh Rate: 174 Hz
VendorID: 0x10de
DeviceID: 0x1e07
Revision Not Detected
Number of Monitors: 2
Number of Logical Video Cards: 2
Primary Display Resolution: 3440 x 1440
Desktop Resolution: 3440 x 2520
Primary Display Size: 31.89" x 13.78" (34.72" diag), 81.0cm x 35.0cm (88.2cm diag)
Primary VRAM: 16384 MB

Sound card:
Audio device: USB Mixer

Memory:
RAM: 32018 Mb

VR Hardware:
VR Headset: None detected

Miscellaneous:
UI Language: English
LANG: en_US.UTF-8
Total Hard Disk Space Available: 255884 MB
Largest Free Hard Disk Block: 82750 MB

Storage:
Number of SSDs: 2
SSD sizes: 2000G,1000G
Number of HDDs: 0
Number of removable drives: 0
@kisak-valve
Member

Hello @rhoot, can you check if rebuilding mesa with https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24949 helps (or test mesa 23.1.5)?
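A rough sketch of one way to build mesa 23.1.6 with that merge request applied is below; it assumes a git/meson/ninja toolchain is already installed, and the exact configure options, build dependencies, and install step vary per distribution.

# Grab the 23.1.6 release and apply the merge request on top of it.
git clone https://gitlab.freedesktop.org/mesa/mesa.git
cd mesa
git checkout mesa-23.1.6
git fetch origin refs/merge-requests/24949/head
git cherry-pick FETCH_HEAD   # if the MR contains several commits, cherry-pick each; resolve conflicts if needed

# RADV-only release build, to keep the build small.
meson setup build -Dbuildtype=release -Dvulkan-drivers=amd -Dgallium-drivers=
ninja -C build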

@rhoot
Author

rhoot commented Aug 31, 2023

It's hard to say whether building with that patch applied helped or not. The issue worked itself out after a couple of minutes yesterday, so presumably the only shaders to compile today were for whatever games got updated.

So immediately after launching Steam today I didn't have this issue. But then some game updates got installed and memory use started to rise again. For the most part fossilize seems to sit at a much more reasonable 1 GiB of use, but there were a few spikes that brought it right up to the limit before dropping again almost immediately. I think the highest I saw was around 30 GiB used out of 31.3 GiB (on my system as a whole, not just fossilize). Similar situation to yesterday: only about 3 GiB was used before launching Steam.

Next time it happens I'll try to figure out from the fossilize command line which game may have triggered those spikes. I didn't realize the appid was contained in the path of one of the arguments passed to it until after it had stopped spiking.
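One way to do that is to dump the running fossilize_replay command lines while a spike is happening; the appid appears in the shader cache path passed as an argument. A rough sketch (the exact argument layout is an assumption and may differ between Steam versions):

# Show pid and full command line of every running fossilize_replay process.
pgrep -af fossilize_replay

# Pull the appid out of the shadercache path in those command lines
# (adjust the pattern if the path layout differs).
pgrep -af fossilize_replay | grep -o 'shadercache/[0-9]*' | sort -u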

Edit: Actually I'll just try wiping some shader caches later.
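For reference, the per-app caches live under the Steam library's shadercache directory, keyed by appid; the paths below assume the default library location under ~/.local/share/Steam and will differ for other library folders.

# List the per-app shader caches and their sizes (the appid is the directory name).
du -sh ~/.local/share/Steam/steamapps/shadercache/*

# Remove a single app's cache while Steam is closed; replace <appid> with the game's id.
# rm -rf ~/.local/share/Steam/steamapps/shadercache/<appid>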

@rhoot
Author

rhoot commented Sep 2, 2023

Okay, so I disabled and re-enabled the shader cache (in Steam settings). That caused it to start compiling some shaders, and memory usage did rise back up to basically all my memory (on mesa 23.1.6, with the patch from that MR applied). The game it was compiling shaders for was Deep Rock Galactic (appid 548430).

Eventually it dropped back down a bit, but then just... kind of stalled out:

[screenshot: memory usage after shader compilation stalled]

You can see the remnants of the memory usage in that screenshot too: the swap is completely full, and 100% of the free memory is now used for cache. As best I can tell, Steam was using the CPU to download cached shaders, so that CPU usage is likely expected.

But the fossilize processes just stayed sleeping like that for several minutes, until I eventually closed Steam. Once I loaded it up again, I managed to snap this before my system completely froze for a few seconds once more, until oom-killer kicked in and killed some processes:

[screenshot: memory usage shortly after relaunching Steam]

Edit: For reference/comparison, this is after closing Steam:

[screenshot: memory usage after closing Steam]

@rhoot
Author

rhoot commented Sep 2, 2023

I just tried building/installing mesa 23.1.5. I have had shaders compiling for over an hour now, and no fossilize_replay process has ever gone above 544M in the resident set. So it definitely seems like a regression in 23.1.6.
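A rough sketch of how such a peak can be tracked, for anyone wanting to reproduce the measurement (the one-second poll is just a convenient choice and can miss very short spikes):

# Poll once per second and report the largest fossilize_replay resident set seen so far (RSS in KiB).
peak=0
while sleep 1; do
    cur=$(ps -eo comm=,rss= | awk '/^fossilize/ { if ($2 > max) max = $2 } END { print max + 0 }')
    if [ "$cur" -gt "$peak" ]; then
        peak=$cur
        echo "new peak: ${peak} KiB"
    fi
done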

@kakra
Contributor

kakra commented Sep 2, 2023

You can see the remnants of the memory usage in that screenshot too: the swap is completely full, and 100% of the free memory is now used for cache. As best I can tell, Steam was using the CPU to download cached shaders, so that CPU usage is likely expected.

This looks a lot like the memory behavior I'm seeing on 6.x kernels, not only with fossilize but with all sorts of processes. In particular, it looks like fossilize does not use shared memory at all here, and memory usage is too high by a factor of 10.

Check whether your kernel has /sys/kernel/mm/transparent_hugepage, and if it does, run the following as root after a fresh reboot and before starting Steam:

# Use huge pages for shmem/tmpfs only while the mapping stays within the allocation size.
echo within_size >/sys/kernel/mm/transparent_hugepage/shmem_enabled
# Hand out transparent huge pages only to regions that ask for them via madvise().
echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
# Defer compaction to the background instead of stalling allocations, except for madvise() regions.
echo defer+madvise >/sys/kernel/mm/transparent_hugepage/defrag
# Limit how many empty, swapped-out, and shared PTEs khugepaged may pull into a huge page.
echo 64 >/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none
echo 8 >/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_swap
echo 32 >/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_shared

This should reduce memory pressure but if you're still seeing high swap pressure, try booting with the kernel cmdline cgroup_disable=memory and then try this test again. How to adjust the kernel cmdline or set the above settings permanently is specific to your distribution.
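As one example of making the settings permanent on a systemd-based distribution (the tmpfiles.d approach and the GRUB paths below are assumptions; other init systems and bootloaders differ):

# /etc/tmpfiles.d/thp.conf - reapply the THP knobs at every boot via systemd-tmpfiles.
w /sys/kernel/mm/transparent_hugepage/shmem_enabled - - - - within_size
w /sys/kernel/mm/transparent_hugepage/enabled - - - - madvise
w /sys/kernel/mm/transparent_hugepage/defrag - - - - defer+madvise
w /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none - - - - 64
w /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_swap - - - - 8
w /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_shared - - - - 32

# For the cgroup_disable=memory test, append it to GRUB_CMDLINE_LINUX_DEFAULT in
# /etc/default/grub and regenerate the config:
# grub-mkconfig -o /boot/grub/grub.cfg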

So it definitely seems like a regression in 23.1.6.

This is very possible, too.

@rhoot
Author

rhoot commented Sep 2, 2023

This should reduce memory pressure but if you're still seeing high swap pressure, [...]

To be clear, swap didn't start filling up until my physical RAM had been fully consumed.

Downgrading to mesa 23.1.5 without changing anything else about the system also caused shared memory usage to go up. This is what it looks like after the downgrade:

[screenshot: memory usage after downgrading to mesa 23.1.5]
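The anonymous vs. shared split can also be read directly from /proc while a replay is running, which makes the comparison between the two mesa versions more concrete than a process monitor screenshot. A rough sketch:

# Print anonymous, file-backed, and shmem-backed resident memory for each fossilize_replay process.
for pid in $(pgrep -f fossilize_replay); do
    echo "== pid $pid =="
    grep -E 'RssAnon|RssFile|RssShmem' "/proc/$pid/status"
done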

@NextGenRyo

I am suffering from this exact same problem after a system update on Manjaro; that update included mesa 23.1.6. Any updates on this?

@kisak-valve
Member

This is a mesa/RADV regression limited to mesa 23.1.6.

Caused by https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24579, and should be fixed by https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24949 and https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24896.

The practical fix is to update to mesa 23.1.7 or newer, which includes these.

There's nothing more to be done on Fossilize's side.
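A quick way to confirm which mesa version is actually in use after updating (glxinfo comes from the mesa-utils / mesa-demos package; the pacman query applies to Arch-based systems such as EndeavourOS and Manjaro):

# Mesa version reported by the active OpenGL driver.
glxinfo -B | grep -i 'opengl version'

# Installed package version on Arch-based distributions.
pacman -Q mesa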

@rhoot
Author

rhoot commented Sep 10, 2023

The practical fix is to update to mesa 23.1.7 or newer, which includes these.

There's nothing more to be done on Fossilize's side.

Yep, 23.1.7 seems to fix it. Thanks!
