[HW Accel Support]: Random out of memory errors #7934
Comments
You have not included the errors, so it is difficult to know what is going on. You should run |
I'm assuming you mean vaapi memory leak. And it depends on the GPU you have and the host driver version. Looks like your host driver is a bit out of date, but in any case I'd suggest watching your memory usage with something like |
Thanks Nick, I deleted that post, but it looks like you had already replied to it :D Anyway, the host had an older driver than the Docker container. I'm working on getting the driver updated on the host; hopefully that fixes it. I have been watching memory with htop, and Uptime Kuma emails me if the machine goes down.
Hopefully this fixes it. Frigate Docker vainfo: |
For one I would generally suggest setting memory limits on the container, but we'd need info on what specifically was using the memory. |
Absolutely, added a memory limit now. It is just tough to know what happened exactly within Frigate container, as the entire server was frozen and had to be cold booted to get back... |
Right, with the memory limit it should be easier to catch the container in that state (if it happens again) and use top or htop to see what is using memory in the container.
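As a rough alternative to eyeballing top or htop, the per-process check suggested above can be sketched in Python. This is only an illustration, not something from the thread: it assumes a Linux `/proc` filesystem and that Python 3 is available wherever it is run (for example inside the container), and the function names are made up for this example.

```python
import os
import re


def parse_rss_kib(status_text):
    """Return VmRSS in KiB from /proc/<pid>/status text, or None if absent
    (kernel threads do not report VmRSS)."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def parse_name(status_text):
    """Return the process name from /proc/<pid>/status text."""
    match = re.search(r"^Name:\s+(.+)$", status_text, re.MULTILINE)
    return match.group(1).strip() if match else "?"


def top_memory_processes(proc_root="/proc", limit=10):
    """List (rss_kib, pid, name) for the largest resident processes."""
    if not os.path.isdir(proc_root):
        return []  # not Linux, or /proc is not mounted
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                text = fh.read()
        except OSError:
            continue  # the process exited while we were scanning
        rss = parse_rss_kib(text)
        if rss is not None:
            rows.append((rss, int(entry), parse_name(text)))
    rows.sort(reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    for rss, pid, name in top_memory_processes():
        print(f"{rss / 1024:8.1f} MiB  pid={pid:<7} {name}")
```

Running it periodically and keeping the output makes it easier to see afterwards which process was growing before the limit was hit.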
I have been trying to chase down a machine lockup issue for a while now myself, and my setup is somewhat similar to OP's. Besides CPU gen, Proxmox, and Docker, my versions are the same. I have 6 identical Skylake NUCs with 16GB RAM running K3s on Debian. I started with Frigate v11 on Debian v11 and have since upgraded both to v12, and the lockups persist. I'm using the iHD driver with the preset-intel-qsv-h264 preset, but have also tried i915 and i965, as well as preset-vaapi, with no difference. I also started without any memory limits on Frigate, but noticed some OOM Killed conditions and set the limit to 4GB. I would expect the container to die and restart if it were just memory, but the node still locks up hard. As far as I can tell, hw acceleration is working just fine, as I also have Plex transcoding with the iHD driver. The problem seems to be around object detection (I'm using OpenVINO). With object detection disabled, the container will run for months without issue. When I enable object detection, with just the default person object, on a single camera, the node will lock up within a week, requiring a hard reset. If it is enabled for all 3 cameras with multiple objects, it can lock up in as little as 48 hours. It doesn't matter which node it's running on; that node will lock up. I have not found any discernible logs on the node beyond some GPU messages on the console after lockup (which led me nowhere), nor any Frigate logs beyond the OOM Killed container status.
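For anyone trying to confirm whether a container's memory limit is actually being hit, the cgroup counters can be read directly. The sketch below is an assumption-laden illustration rather than anything posted in this thread: it assumes cgroup v2 (where `memory.current`, `memory.max`, and `memory.events` exist for the container's cgroup) and that it is run inside the container; the function names are hypothetical.

```python
from pathlib import Path


def parse_memory_events(text):
    """Parse cgroup v2 memory.events text into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def read_cgroup_memory(cgroup_dir="/sys/fs/cgroup"):
    """Return current usage, limit, and event counters, or None if the
    cgroup v2 memory files are not available at cgroup_dir."""
    base = Path(cgroup_dir)
    try:
        current = int((base / "memory.current").read_text())
        raw_max = (base / "memory.max").read_text().strip()
        events = parse_memory_events((base / "memory.events").read_text())
    except (OSError, ValueError):
        return None  # cgroup v1 host, root cgroup, or files unreadable
    return {
        "current_bytes": current,
        "max_bytes": None if raw_max == "max" else int(raw_max),  # "max" means no limit
        "events": events,
    }


if __name__ == "__main__":
    info = read_cgroup_memory()
    if info is None:
        print("cgroup v2 memory files not found")
    else:
        print(f"usage:     {info['current_bytes'] / 2**20:.1f} MiB")
        print(f"limit:     {info['max_bytes']}")
        print(f"oom_kills: {info['events'].get('oom_kill', 0)}")
```

A rising `oom_kill` count would point to the container limit being reached, while a hard node lockup with the count unchanged would suggest the cause lies elsewhere.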
The main difference is that OP is using a Coral, which sort of highlights the fact that these issues are usually not related and each have their own unique cause. Other users have reported the opposite of what you are seeing and have said OpenVINO works fine but hwaccel causes high memory usage. A lot of these types of issues come from Proxmox users, though not all. Some users have reported that updating the host kernel, driver, etc. fixed it for them.
Thought I'd chime in: after updating the drivers I've had ZERO issues. Well, other than the fact that Intel are poopoo heads and not letting Rocket Lake CPUs split the GPU.
Thanks for confirming. Will close this as the original issue is not occurring. Anyone else can feel free to create their own issue. |
Describe the problem you are having
I'm getting some strange random out of memory errors and machine hard locks. I think it might be related to the VAAPI driver memory leak? But the threads I read on this were a few years old. Is this still a problem?
Version
0.12.1-367D724
Frigate config file
docker-compose file or Docker CLI command
Relevant log output
FFprobe output from your camera
Operating system
Debian
Install method
Docker Compose
Network connection
Wired
Camera make and model
Annke C800
Any other information that may be helpful
Proxmox 8
Ubuntu Server 23.04
Passing the raw GPU device to the VM
Passing the raw Coral M.2 B-key (PCIe adapter)
6 GB memory
6 CPUs (host = Intel 11th gen i5, Rocket Lake)
Tried the i965 driver as stated in the help, and also the i915 for kicks.
Also tried QSV, but it fails (hwaccel_args: preset-intel-qsv-h264).
VAAPI usage shows roughly 3-5% on average.