[Bug/Degradation]: High CPU Usage With Motion Detection #6853
7c1568f is known to cause issues with motion detection being overly sensitive, leading to more false positives. The motion PR before that made motion less sensitive in certain situations. Obviously it's going to be worked on, but for the time being, simply adjusting the motion settings in the config will fix the issue. |
for what it’s worth, here is my motion config, adjusted after reading a recent PR:

motion:
  threshold: 30
  contour_area: 50
  frame_height: 100 |
Can you give an example configuration? Honestly, the code of this functionality scares me :) |
basically:
It really depends on the camera as for some of mine the default settings are perfect and for others it's not. One thing to try is to disable improve_contrast and then make the threshold much more sensitive. |
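A config fragment along the lines of that suggestion might look like the following (values are illustrative, not a recommendation for every camera; `improve_contrast` and `threshold` are the options named above):

```yaml
motion:
  # turn off the contrast boost that is generating false positives
  improve_contrast: false
  # a lower threshold means more sensitive; compensates for losing the boost
  threshold: 20
```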
I just tried with the latest commit. It still seems too sensitive with default settings (I've never had to tweak motion settings before): https://capture.dropbox.com/JsrDL15BdHDBenxu There's only a tiny bit of wind right now, so motion on the edges of leaves would be fine, but there's no visible motion in the grass. It is interesting that all the suspected motion is at the edges of the shadow of the roof - very high-contrast areas at the moment. There also weren't any clouds around when it suddenly got bright, so it mistook the brightness change as motion. |
@ccutrer what is the detect resolution on those camera(s)? |
1280x720 |
I've turned off improve_contrast, and I'm not seeing nearly as many false positives. But my CPU usage is much higher than yesterday (with improve_contrast on by default, and before #6870 was merged). I'm now at 0% idle CPU, with the majority taken up by the frigate.process: process for cameras that have any motion (12-13% each). I do have latest snapshots for everything now, though. And on the system page, cameras without any motion have a capture FPS that matches their ffmpeg FPS (meaning it's keeping up, and not skipping any frames). Overall detector fps is now closer to 30-40 (was 60-90 yesterday). Cameras with consistent motion have a capture FPS of 0.4-2. |
Interesting. We are aware of a problem where the fast resize method is causing lots of noise on higher resolutions like 1080p, and setting a higher frame height would fix that. All my cameras are 720p in detect and the default settings in the latest have been working quite well for me. It's supposed to rain today so will be curious how it works with the rain, but also don't have any bright sun right now so can't test to see if I am seeing the same there. I think in general, the same set of motion settings won't work for every camera and some adjustments will need to be made to tune for certain cameras, and I don't really see a problem with that. |
Sun peeked out for a couple of minutes and I was seeing something similar on one of the cameras, so I adjusted the threshold to 50 and things are working well now. |
:nod:. I definitely need to find some time to become familiar with the motion settings and dial them in better. Are there any narrative-style docs for how best to tweak motion settings? Something like https://docs.frigate.video/guides/false_positives, https://docs.frigate.video/guides/stationary_objects, and https://docs.frigate.video/configuration/stationary_objects, which guide which settings to tweak first, and why. So far all I've found is the config file reference, which tells what the tweakable knobs are, but gives no recommendations on where to start in which situations.

I looked a bit more at which cameras were using high CPU, even if they don't necessarily have high motion. I noticed my theater room cameras are both in that bucket. Dark room, cameras in night vision mode, with a lot of noise, but zero actual motion. They definitely seem like candidates for tweaking motion settings. For other cameras I really have no idea why they would have high CPU if it's not a systemic problem across all cameras. Two other indoor, but bright, currently unoccupied rooms, for example. |
I think a
That might be a better question for Blake, I don't know what exactly would lead to higher CPU usage, mine are mostly pretty low but without much activity right now. |
@ccutrer let's try something here: For the cameras with high CPU usage but no motion
|
@ccutrer what is the detect resolution on that camera? Would you be able to install and run py-spy in the container to dig into where the CPU usage is actually coming from? |
1280x720
I'm not sure what format you would like, so I just did an SVG and let it run for about a minute. Let me know if you want me to run it some other way. |
I'm most interested in which function is taking the most time in |
i am experiencing the same increased CPU usage even on the dev-9e531b0 container with revised motion settings below:

motion:
  threshold: 30
  contour_area: 50
  frame_height: 100

this is on 2 cameras with a detect resolution of 1080p. if needed I can post my full config. what's odd is that at first i noticed a lot of motion boxes being created from a close-by tree fluttering in the wind, but even after masking that region the CPU usage hasn't tapered down. this evening i can also run py-spy as mentioned and follow up. |
So... one thing I found is that the default frame_height changed from 50 to 100 from 0.12.1 to current dev, which would explain why CPU usage is up on the image resize for motion detection. But changing it back to 50 doesn't seem to have had any effect on lowering my CPU usage, or to explain why some cameras on 0.12.1 seem to need almost no CPU for motion detection. |
Very interesting, I'll look at this and I'm sure Blake will know more than me. One thing I wanted to point out though is that we don't really know that this increase in time has to do with the transferring of a frame id through the queue. The |
No, that's why I used time.process_time, not time.perf_counter. It counts actual CPU time, not wall-clock time. So idle time spent waiting for a frame doesn't count, only actual processing time. I used perf_counter at first, and the wall-clock time was similar between 0.12 and 0.13 - about 0.2s. My frame rate is 6fps, which is slightly more than 1s/0.2s (5fps). It's almost like for some reason 0.13 the |
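The distinction being drawn here can be sketched in a few lines (a minimal illustration, not Frigate code; `timed_get` is a hypothetical helper):

```python
import time

def timed_get(get_fn):
    """Measure both CPU time and wall-clock time around a blocking call.

    Returns (cpu_seconds, wall_seconds, result). A call that idles waiting
    for data should show near-zero CPU time but the full wall-clock duration;
    a call that spins will show both climbing together.
    """
    cpu0, wall0 = time.process_time(), time.perf_counter()
    result = get_fn()
    return time.process_time() - cpu0, time.perf_counter() - wall0, result

# Example: a "frame wait" that idles for 100 ms registers almost no CPU time.
cpu, wall, _ = timed_get(lambda: time.sleep(0.1))
```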
I used that code and I did track it down to d81dd60 that increases my get frame time from 0.001 to 0.005 |
:( but WHY?? That commit doesn't touch anything related to the frame queue. It's almost like we're up against some sort of boundary condition, where slightly slowing down or speeding up the processing time changes how often it checks the queue, and that makes Queue.get decide to spin instead of entering an idle block. |
Thankfully the original motion detector was left in the code. I simply changed the motion detector in video.py to use FrigateMotionDetector instead of ImprovedMotionDetector, and on that commit it is back to 0.001. It seems there is something in the new motion detector, but I agree it is odd that the issue manifests itself there |
@NickM-27 try turning off motion detection with the old motion detector and see what happens. I'm not totally convinced that we are not just seeing increased wait times for the next frame because the newer motion detection actually does less work. It should be waiting longer for the next frame because it finishes the previous frame earlier. |
@blakeblackshear Yes indeed, turning off motion detection and it immediately falls back down
|
I do have a minor bug in 64cab90 - start_pop needs to be reset at the end of the loop. but fixing that doesn't measurably change my results. |
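The bug being described is the common pattern where a timing marker is set before a loop but never reset inside it, so each measurement accumulates all previous iterations. A minimal sketch (the name `start_pop` is borrowed from the discussion; the loop body is hypothetical):

```python
import time

def per_iteration_waits(frames, clock=time.perf_counter):
    """Measure how long each loop iteration waits/works.

    Without the reset at the end of the loop, waits[i] would measure
    time since loop entry rather than time for iteration i alone.
    """
    waits = []
    start_pop = clock()
    for _ in frames:
        # ... pop a frame and run motion detection here ...
        waits.append(clock() - start_pop)
        start_pop = clock()  # the fix: reset at the end of every iteration
    return waits
```

With a monotonically increasing fake clock, each iteration correctly reports only its own elapsed ticks.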
Okay, here's another counter-intuitive result. I have #6889 applied locally, so that I know when frames get skipped. On dev, when I use FrigateMotionDetector, my overall idle CPU is higher (~10%), but my skipped_fps are quite high - 4+ on many cameras. But using ImprovedMotionDetector, my overall idle CPU is 0.0% (completely pegged), but my skipped_fps is 0 for almost all cameras. So yes, I would agree that ImprovedMotionDetector itself is less CPU intensive given the same parameters. But somehow it takes my |
I wonder if CPU affinity has something to do with it? I.e. when the capture process and the process process are running on the same CPU, the semaphore locking is very fast. But if they're not, there's a larger penalty for multiple cores accessing the same memory. And somehow something about ImprovedMotionDetector causes the process process to hop CPU cores more often. Using |
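For anyone wanting to test the affinity hypothesis, pinning can be done from Python on Linux with `os.sched_setaffinity` (a sketch, not part of Frigate; `pin_to_cpu` is a hypothetical helper, and this API is Linux-only):

```python
import os

def pin_to_cpu(pid=0, cpu=None):
    """Pin a process to a single CPU core (Linux only).

    pid=0 means the calling process. If no core is given, pick the
    lowest-numbered core the process is currently allowed to run on.
    """
    allowed = os.sched_getaffinity(pid)
    target = cpu if cpu is not None else min(allowed)
    os.sched_setaffinity(pid, {target})
    return os.sched_getaffinity(pid)
```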
Just a quick update... setting CPU affinity for at least a small handful of cameras did not help. I did notice that there are a lot of individual mp.Value variables that are accessed on every iteration of both capture_frames and process_frames. These all essentially require a lock to read and write, and may contribute to system CPU. I've been working on reducing those, but so far it seems to have only marginally improved my user CPU time, and system CPU time still sucks up the rest. I'm still hopeful that I'll find some... something... that will improve the IPC and immediately drop my system CPU time significantly. |
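The locking overhead being described can be seen in miniature with `multiprocessing.Value`: by default every read and write acquires the value's `RLock`. A sketch of the two options (illustrative, not Frigate code):

```python
import multiprocessing as mp

# Default: the value is protected by an RLock, and every .value access
# acquires it. With many per-frame counters, that lock traffic can show
# up as system CPU time.
locked = mp.Value("d", 0.0)

# lock=False returns a raw shared ctypes object with no lock at all -
# safe only for single-writer or torn-read-tolerant use.
raw = mp.Value("d", 0.0, lock=False)

# Batching several updates under one explicit acquire amortizes the cost.
with locked.get_lock():
    locked.value += 1.0

raw.value = 2.0  # no lock acquired
```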
I spent some time trying to back out specific parts of the new motion detector to see what was driving the change in CPU usage. I found that if I removed both the gaussian blur and contrast improvement calls, the CPU usage dropped back to very low levels. What is interesting is that adding either of them increases CPU usage to the elevated levels. Adding a second one doesn't seem to increase levels further. It doesn't matter which one is enabled, just that at least one is. I'm wondering if there is some underlying overhead associated with the opencv interface to python that initializes once for a given scope. I did try updating to the latest opencv and it reduced the CPU usage by half, but it's still elevated with either of those function calls enabled. Just some weird python shit. One day I want to port this part of the code base to rust or something. Another idea would be to look at doing the same operations in numpy and optimizing with numba, but there is a bunch of other opencv stuff in there. |
so i had some time today to just watch the debug view, and things seem to be acting very... sporadic. see video below: vlc-record-2023-06-24-16h30m12s-vlc-record-2023-06-24-16h27m40s-test.mp4

this cam is outputting 1080p @ 10fps (detect set to the same resolution but 5 fps). i have a motion mask over the closer tree branch that is moving a lot. motion settings:

motion:
  threshold: 30
  contour_area: 50
  frame_height: 100

is this just motion settings that need more tuning, or something with the motion detector PRs being discussed here? |
Seems like numpy/numpy#6237 // https://stackoverflow.com/questions/40445983/why-does-just-importing-opencv-cause-massive-cpu-usage may be relevant |
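If the linked OpenBLAS busy-spinning issue were in play, the usual mitigation is to cap the BLAS/OpenMP thread pools before numpy/opencv are imported. A hedged sketch (assuming the BLAS build in use honors these environment variables; they must be set before the first import of the affected library):

```python
import os

# Worker threads in OpenBLAS/OpenMP pools can spin in sched_yield between
# small operations, inflating system CPU time. Capping the pools to one
# thread per process is a common mitigation for many small per-frame ops.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ.setdefault(var, "1")

# import numpy / cv2 only after the variables above are set
```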
I'm not convinced it's the numpy/openblas thing. I've straced the motion detection processes, and it's very clearly spending all of its time in |
Okay, I think I give up, and am just resigned to the fact that I have too many cameras on one processor. I did a WIP (e323288), combining the capture and motion detection processes into one, completely eliminating the queue between them. My CPU usage is about the same. I still don't understand why so much time is taken up in system CPU. |
I haven't given up yet. Just been busy the past week. |
I'll keep watching, but given what I know now, I don't think I was even keeping up under 0.12, I just was unaware I was falling behind. I'll hold off on ordering new hardware for a few days though, in case you pull out something I can't find :). |
I'm willing to help with troubleshooting, but will need above-average guidance due to unfamiliarity with the tools used. So it may not be worth much, up to you, but I suggest it because I see this issue reliably and use higher-resolution cameras - it suggests to me that the issue may not be the 30-some cameras being too much, but simply the "combined detect area". I think the issue may be CPU related, but I am also seeing what appears to be a memory leak.

Installation Method: Docker Compose

Reported Metrics: Host/Guest; CPU usage total 9%/37%, RAM 85%/98%, IO delay on host 0-0.6%. See below on Frigate's reported metrics.

Cameras: 5 total, all Dahua. 3x 4K exterior cameras, 2x Full HD indoor cameras, all streaming and detecting at native resolution.

Noteworthy: Hardware acceleration not enabled (yet). 2x USB Corals installed, but only one configured (haven't been able to make both work reliably, haven't looked into it yet). Otherwise a "normal" and simple config: properly applied motion masks for sky, timestamps, etc., and a few detection zones configured for each camera. Using experimental UI true and MQTT, cams configured with RTSP paths and the roles detect and record. Birdseye in objects mode.

The setup was rock-steady on version 0.11, which was my initial installation. Never a wrong move, despite not having a Coral connected then, so using CPU detectors on all cameras. Updating to 0.12.0-DA3E197 (current) as a fresh install, I noticed CPU usage being reported very high (e.g. 170% per camera and 300+% on the CPU detectors). Applying the Coral obviously removed the detector CPU usage, but ffmpeg continues to be around 170% for the 4Ks and 43%. I thought this was a small UI issue that would just get fixed, but after running for some time, I see the behavior described here: "No frames received, check logs" on random cameras that work fine if I access them via e.g. their built-in webserver, app, or VNC.

Nothing in the log except "connection refused" (the cameras have rate limits, so I think Frigate may cause them to drop the streams and reinitiate). The same cameras that report no frames in the Cameras view will work in Birdseye, but the stream lags by many seconds (maybe 30s or so), whereas they are normally near-instant, maybe a 1-2s lag. Restarting Frigate "fixes" things for a while, but typically not for long - maybe a day or two?

Observations: As mentioned, there seems to be a CPU component to this issue but also a memory leak, in that the guest OS reports 98% memory usage of 64 GB. This also helps explain why a restart fixes things for some time. I am writing this mainly because I have assigned an absurd amount of resources to Frigate (planning to optimise later, with plans to add many cameras, etc.) and currently don't really have many cameras, but do have some of higher resolution. If I can help, please tag me and let me know specifically what you'd like. It's probably easiest if you simply ask for a method of output (e.g. pasting text, screenshots, etc.) and then write actions or commands, like "paste your config" or "please paste the output of #> cat some command, cat some other command", etc., to avoid any confusion. |
@nicoleise make your own issue, this issue has to do with 0.13 dev branch changes not 0.12. also in general your setup sounds highly inefficient given that:
|
Closing this as the conversation in #6940 indicates the issue has been resolved. |
Thanks, Nick. I understood that this is about the latest version, but also that the issue seemed to have been introduced with 0.12 originally. Maybe my mistake. But I wasn't "fishing for" support in someone else's thread, just meant to help - also the reason I didn't open a separate issue (it would essentially just be spam when the problem is known and being worked on). Great that it seems resolved, though. I'll look into the docs for detecting on secondary streams, thanks. Since 0.12 I've only had time to install it and have it "working", not to set it up properly yet. Sadly. Thanks for your advice. |
Motion detection had remained unchanged for many versions until the current dev version (0.13), so it was not introduced in 0.12.
No problem, generally if an issue is being reported on a version newer than you are using it is better to create a new issue. |
@kirsch33 unless you have reason to believe it is due to motion detection, it would definitely not be reopened. also, my CPU usage has gone even lower after the recent PRs that have been merged |
@NickM-27 understood i'll make a new issue |
Describe the problem you are having
After the latest commits related to motion detection, it works worse than before.
Version
dev-7c1568f
Frigate config file
Relevant log output
FFprobe output from your camera
Frigate stats
No response
Operating system
UNRAID
Install method
Docker CLI
Coral version
USB
Network connection
Wired
Camera make and model
netatmo welcome
Any other information that may be helpful