
[Bug/Degradation]: High CPU Usage With Motion Detection #6853

Closed
skrashevich opened this issue Jun 19, 2023 · 51 comments · Fixed by #6870


@skrashevich
Contributor

Describe the problem you are having

After some of the latest commits related to motion detection, it works worse than before.

Version

dev-7c1568f

Frigate config file

same as in #6802

Relevant log output

Irrelevant; no errors/warnings, etc.

FFprobe output from your camera

irrelevant

Frigate stats

No response

Operating system

UNRAID

Install method

Docker CLI

Coral version

USB

Network connection

Wired

Camera make and model

netatmo welcome

Any other information that may be helpful

frigate-stationary-degradation

@kirsch33
Contributor

I am running this same dev build and have noticed a degradation in overall detection performance as well, with true positives specifically. It seems to be related to one of these PRs, #6516 or #6741, since if I downgrade to a build prior to those, it's back to normal.

@NickM-27
Sponsor Collaborator

I am running this same dev build and have noticed a degradation in overall detection performance as well, with true positives specifically. It seems to be related to one of these PRs, #6516 or #6741, since if I downgrade to a build prior to those, it's back to normal.

7c1568f is known to cause issues with motion detection being overly sensitive, leading to more false positives.

The motion PR before that made motion less sensitive in certain situations.

Obviously it's going to be worked on, but for the time being, simply adjusting the motion settings in the config will fix the issue.

@kirsch33
Contributor

I am running this same dev build and have noticed a degradation in overall detection performance as well, with true positives specifically. It seems to be related to one of these PRs, #6516 or #6741, since if I downgrade to a build prior to those, it's back to normal.

7c1568f is known to cause issues with motion detection being overly sensitive, leading to more false positives.

The motion PR before that made motion less sensitive in certain situations.

Obviously it's going to be worked on, but for the time being, simply adjusting the motion settings in the config will fix the issue.

For what it's worth, here is my motion config, adjusted after reading a recent PR:

motion:
  threshold: 30
  contour_area: 50
  frame_height: 100

@skrashevich
Contributor Author

Obviously it's going to be worked on, but for the time being, simply adjusting the motion settings in the config will fix the issue.

Can you give an example configuration? Honestly, the code for this functionality scares me :)

@NickM-27
Sponsor Collaborator

basically:

  • contour_area is how large an area of motion needs to be to be considered motion. Lower is more sensitive.

  • threshold is the difference in pixel value required to count as motion. Lower is more sensitive.

It really depends on the camera; for some of mine the default settings are perfect, and for others they're not.

One thing to try is to disable improve_contrast and then make the threshold much more sensitive.
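To make those two knobs concrete, here is a minimal, self-contained sketch of threshold/contour-area style motion detection with OpenCV. This is not Frigate's actual implementation (that lives in frigate/motion/); the frame sizes and values are made up purely for illustration:

# Conceptual sketch of threshold/contour_area-style motion detection.
# NOT Frigate's real code; frames and values are illustrative only.
import cv2
import numpy as np

THRESHOLD = 30      # minimum per-pixel difference to count as a changed pixel
CONTOUR_AREA = 50   # minimum area (in pixels) for a changed region to count as motion

def detect_motion(prev_gray, curr_gray):
    # Per-pixel absolute difference between consecutive grayscale frames.
    delta = cv2.absdiff(prev_gray, curr_gray)
    # Pixels whose difference exceeds THRESHOLD become white (255).
    _, mask = cv2.threshold(delta, THRESHOLD, 255, cv2.THRESH_BINARY)
    # Group changed pixels into contours and keep only the large ones.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [c for c in contours if cv2.contourArea(c) >= CONTOUR_AREA]

if __name__ == "__main__":
    # Two synthetic 100x180 grayscale frames with one small "moving" blob.
    prev = np.zeros((100, 180), dtype=np.uint8)
    curr = prev.copy()
    curr[40:60, 80:100] = 200  # simulated moving object
    boxes = [cv2.boundingRect(c) for c in detect_motion(prev, curr)]
    print(f"motion regions: {boxes}")

Lowering THRESHOLD lets smaller pixel differences through, and lowering CONTOUR_AREA lets smaller regions count as motion, which is exactly the "lower is more sensitive" behavior described above.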

@ccutrer
Contributor

ccutrer commented Jun 21, 2023

I just tried with the latest commit. It still seems too sensitive with default settings (I've never had to tweak motion settings before):

https://capture.dropbox.com/JsrDL15BdHDBenxu

There's only a tiny bit of wind right now, so motion on the edges of leaves would be fine, but there's no visible motion in the grass. It is interesting that all the suspected motion is at the edges of the shadow of the roof - very high-contrast areas at the moment. There also weren't any clouds around, so it isn't a case of the scene suddenly getting brighter and being mistaken for motion.

@ccutrer
Contributor

ccutrer commented Jun 21, 2023

Another instance where high contrast edges are seeing lots of motion:
dining-motion

I noticed on another camera it's seeing motion around the timestamp, even though the timestamp itself is masked out. I can only guess that either the contrast improvement process is blooming out from that change, or it's detecting small changes from compression artifacts as movement. I'm going to try disabling improve_contrast for now.

@NickM-27
Sponsor Collaborator

NickM-27 commented Jun 21, 2023

@ccutrer what is the detect resolution on those camera(s)?

@ccutrer
Contributor

ccutrer commented Jun 21, 2023

1280x720

@ccutrer
Contributor

ccutrer commented Jun 21, 2023

I've turned off improve_contrast, and I'm not seeing nearly as many false positives. But my CPU usage is much higher than yesterday (with improve_contrast on by default, and before #6870 was merged). I'm now at 0% idle CPU, with the majority taken up by the frigate.process: process for cameras that have any motion (12-13% each). I do have latest snapshots for everything now, though. And on the system page, cameras without any motion have a capture FPS that matches their ffmpeg FPS (meaning it's keeping up, and not skipping any frames). Overall detector fps is now closer to 30-40 (was 60-90 yesterday). Cameras with consistent motion have a capture FPS of 0.4-2.

@NickM-27
Sponsor Collaborator

Interesting. We are aware of a problem where the fast resize method is causing lots of noise on higher resolutions like 1080p, and setting a higher frame height would fix that.

All my cameras are 720p in detect and the default settings in the latest have been working quite well for me. It's supposed to rain today so will be curious how it works with the rain, but also don't have any bright sun right now so can't test to see if I am seeing the same there.

I think in general, the same set of motion settings won't work for every camera and some adjustments will need to be made to tune for certain cameras, and I don't really see a problem with that.

@NickM-27
Sponsor Collaborator

Sun peeked out for a couple minutes and I was seeing something similar on one of the cameras, so I adjusted the threshold to 50 and things are working well now.

@ccutrer
Contributor

ccutrer commented Jun 21, 2023

:nod:. I definitely need to find some time to become familiar with the motion settings and dial them in better. Are there any narrative-style docs for how best to tweak motion settings? Something like https://docs.frigate.video/guides/false_positives, https://docs.frigate.video/guides/stationary_objects, and https://docs.frigate.video/configuration/stationary_objects, which guide which settings to tweak first, and why. So far all I've found is the config file reference, which tells what the tweakable knobs are but gives no recommendations on where to start in which situations.

I looked a bit more at which cameras were using high CPU, even if they don't necessarily have high motion. I noticed my theater room cameras are both in that bucket. Dark room, cameras in night vision mode, with a lot of noise, but zero actual motion. Definitely seem like candidates for tweaking motion settings.

For other cameras I really have no idea why they would have high CPU if it's not a systemic problem across all cameras. Two other indoor, but bright, currently unoccupied rooms, for example.

@NickM-27
Sponsor Collaborator

NickM-27 commented Jun 21, 2023

Something like https://docs.frigate.video/guides/false_positives, https://docs.frigate.video/guides/stationary_objects, and https://docs.frigate.video/configuration/stationary_objects, which guide which settings to tweak first, and why

I think a Tuning Motion Detection guide would be a good one to write. The reference docs do say when to use what, but I think you need to be familiar with the terms to understand what they mean.

For other cameras I really have no idea why they would have high CPU if it's not a systemic problem across all cameras. Two other indoor, but bright, currently unoccupied rooms, for example.

That might be a better question for Blake; I don't know what exactly would lead to higher CPU usage. Mine are mostly pretty low, but there isn't much activity right now.

Screen Shot 2023-06-21 at 11 16 39 AM

@ccutrer
Contributor

ccutrer commented Jun 21, 2023

The ones that concern me are, say, Entry and Family in this screenshot - 12.x% CPU, with 0 frames even sent for object detection, and no actual motion at the moment.
Screenshot 2023-06-21 at 11 45 47 AM

@NickM-27
Sponsor Collaborator

@ccutrer let's try something here:

For the cameras with high CPU usage but no motion

  1. Using the HA integration or MQTT, turn off object detect and motion detect (they are separate). Make note of what the baseline is.
  2. Enable motion detect and let things settle for at least 1 minute, make note of what this is.
  3. Toggle improve contrast (this is another switch in HA / MQTT) and let things settle for at least 1 minute, make note of what CPU usage is.
  4. Finally, enable object detect and see how things change after that.
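If it's easier to script this than to click through HA, here is a hedged sketch using paho-mqtt. The broker hostname and camera name are placeholders, and the topic names assume Frigate's documented frigate/<camera>/<feature>/set convention; verify them against the MQTT docs for your version:

# Sketch of the four-step toggle test over MQTT (placeholders throughout).
# Assumes Frigate's frigate/<camera>/<feature>/set topics; check your version's docs.
import time
import paho.mqtt.publish as publish

BROKER = "mqtt.local"     # placeholder: your MQTT broker
CAMERA = "gaming_area"    # placeholder: camera name from your Frigate config

def set_feature(feature, state):
    # Publish ON/OFF to frigate/<camera>/<feature>/set.
    publish.single(f"frigate/{CAMERA}/{feature}/set", state, hostname=BROKER)

if __name__ == "__main__":
    # 1. Everything off to establish a CPU baseline.
    set_feature("detect", "OFF")
    set_feature("motion", "OFF")
    time.sleep(60)
    # 2. Motion detection only.
    set_feature("motion", "ON")
    time.sleep(60)
    # 3. Toggle improve_contrast on top of motion.
    set_feature("improve_contrast", "ON")
    time.sleep(60)
    # 4. Finally, enable object detection.
    set_feature("detect", "ON")

Record the per-process CPU (e.g. from the System page or top) after each sleep so the four readings line up with the four steps.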

@ccutrer
Contributor

ccutrer commented Jun 21, 2023

For gaming area camera:

Baseline CPU is 12-14% consistently, with no visible motion:
latest
Indeed, going to the debug view and watching it, no motion boxes are shown.

Disabling object detection, motion detection, and improve contrast: as expected CPU usage drops to ~0.3%.
Enabling just motion: It rises back to 12-13%
Enabling improve contrast: Stays steady at 12-13%
Enabling object detection: Stays steady at 12-13% (stats report 0 fps on detect, so that makes sense - no frames were even sent to object detection, because there is no motion).

@blakeblackshear
Owner

@ccutrer what is the detect resolution on that camera?

Would you be able to install and run py-spy in the container to dig into where the CPU usage is actually coming from?

@NickM-27 NickM-27 reopened this Jun 21, 2023
@ccutrer
Contributor

ccutrer commented Jun 21, 2023

@ccutrer what is the detect resolution on that camera?

Would you be able to install and run py-spy in the container to dig into where the CPU usage is actually coming from?

1280x720

py-spy top --pid <pid> has 12.00% 12.00% 3.30s 3.37s detect (frigate/motion/improved_motion.py) as the consistent top entry.

I'm not sure what format you would like, so I just did an SVG and let it run for about a minute. Let me know if you want me to run it some other way.
profile

@blakeblackshear
Owner

I'm most interested in which function is taking the most time in py-spy top --pid <pid>. You can sort by own time and total time.

@kirsch33
Contributor

I am experiencing the same increased CPU usage even on the dev-9e531b0 container with the revised motion settings below:

motion:
  threshold: 30
  contour_area: 50
  frame_height: 100

This is on 2 cameras with a detect resolution of 1080p. If needed, I can post my full config.

What's odd is that at first I noticed a lot of motion boxes being created by a nearby tree fluttering in the wind, but even after masking that region the CPU usage hasn't tapered down.

This evening I can also run py-spy as mentioned and follow up.

@NickM-27
Sponsor Collaborator

NickM-27 commented Jun 22, 2023

This is what I am getting with py-spy on one of my cameras that is pretty active right now.

Sorted by own %

Screen Shot 2023-06-22 at 09 07 39 AM

Sorted by own time

Screen Shot 2023-06-22 at 09 29 52 AM

@ccutrer
Contributor

ccutrer commented Jun 22, 2023

I ran an experiment this morning. I rolled back to 0.12.1 (copying the same config, but letting it start with a fresh database and no prior recordings). My overall CPU usage is 10-15% idle. Interestingly, active cameras are actually using 12-30% CPU. I'm using a camera that my kids are currently playing in front of, so lots of motion. Py-spy looks like this (OwnTime sorted):
Screenshot 2023-06-22 at 9 37 16 AM
TotalTime sorted:
Screenshot 2023-06-22 at 9 39 18 AM

A relatively idle camera, OwnTime:
Screenshot 2023-06-22 at 9 40 11 AM
TotalTime:
Screenshot 2023-06-22 at 9 40 35 AM

Now I repeat with 0.13-9e531b0. Overall CPU is 4-5% idle, with the cameras that use lots of processing time maxing at 12-25% (which is actually lower CPU usage than 0.12.1).

OwnTime of busy camera:
Screenshot 2023-06-22 at 9 43 45 AM
TotalTime:
Screenshot 2023-06-22 at 9 44 25 AM

Idle camera, OwnTime:
Screenshot 2023-06-22 at 9 45 10 AM
TotalTime:
Screenshot 2023-06-22 at 9 45 30 AM

The major difference seems to be that almost all cameras are using either ~12% CPU (movement) or ~6% CPU (idle), whereas with 0.12.1 idle cameras are using ~0.3% CPU (but busy cameras use as much as 30% CPU). And overall, I think I'm running closer to redline than I thought I was, with any version of Frigate.

@ccutrer
Contributor

ccutrer commented Jun 22, 2023

So... one thing I found is that the default frame_height changed from 50 to 100 between 0.12.1 and current dev, which would explain why CPU usage would be up on the image resize for motion detection. But changing it back to 50 doesn't seem to have had any effect on lowering my CPU usage, and it doesn't explain why some cameras on 0.12.1 seem to need almost no CPU for motion detection.

@NickM-27
Sponsor Collaborator

NickM-27 commented Jun 22, 2023

Very interesting, I'll look at this and I'm sure Blake will know more than me. One thing I wanted to point out though is that we don't really know that this increase in time has to do with the transferring of a frame id through the queue. The frame_queue.get(True, 1) means there is a timeout of 1 second, so the delay could also be a slowdown in the frame ids being put into the queue instead.

@ccutrer
Contributor

ccutrer commented Jun 22, 2023

The frame_queue.get(True, 1) means there is a timeout of 1 second, so the delay could also be a slowdown in the frames being put into the queue instead.

No, that's why I used time.process_time, not time.perf_counter. It counts actual CPU time, not wall-clock time, so idle time spent waiting for a frame doesn't count, only actual processing time. I used perf_counter at first, and the wall-clock time was similar between 0.12 and 0.13 - about 0.2s. My frame rate is 6 fps, which is slightly more than 1s/0.2s. It's almost like, for some reason, in 0.13 the frame_queue.get call is never entering an actual wait state, but instead is always spinning.
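For anyone who wants to reproduce this measurement outside Frigate, here is a standalone sketch of the process_time vs perf_counter instrumentation around a blocking Queue.get. It is not Frigate's code; a toy producer stands in for the capture process, and the 6 fps rate mirrors the setup described above:

# Standalone sketch: CPU time vs wall time spent inside Queue.get.
# Not Frigate's code; a toy producer stands in for the capture process.
import multiprocessing as mp
import queue
import time

def producer(q):
    # Simulate a capture process putting a frame id on the queue at ~6 fps.
    for i in range(30):
        q.put(f"frame-{i}")
        time.sleep(1 / 6)

def consumer(q):
    while True:
        start_cpu, start_wall = time.process_time(), time.perf_counter()
        try:
            q.get(True, 1)  # same call shape as frame_queue.get(True, 1)
        except queue.Empty:
            break
        cpu = time.process_time() - start_cpu    # actual CPU burned inside get()
        wall = time.perf_counter() - start_wall  # includes time spent idly waiting
        print(f"get(): cpu={cpu:.6f}s wall={wall:.6f}s")

if __name__ == "__main__":
    q = mp.Queue()
    p = mp.Process(target=producer, args=(q,))
    p.start()
    consumer(q)  # a properly blocking get() should show wall >> cpu
    p.join()

A healthy blocking wait shows wall time near the frame interval with CPU time close to zero; CPU time creeping toward the wall time is the spinning behavior described above.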

@NickM-27
Sponsor Collaborator

I used that code and I did track it down to d81dd60 that increases my get frame time from 0.001 to 0.005

@ccutrer
Contributor

ccutrer commented Jun 22, 2023

I used that code and I did track it down to d81dd60 that increases my get frame time from 0.001 to 0.005

:( but WHY?? That commit doesn't touch anything with the frame queue. It's almost like we're up against some sort of boundary condition, where slightly slowing down or speeding up the processing time changes how often it checks the queue, and that makes Queue.get decide to spin instead of entering an idle block.

@NickM-27
Sponsor Collaborator

Thankfully the original motion detector was left in the code; I simply changed the motion detector in video.py to use FrigateMotionDetector instead of ImprovedMotionDetector, and on that commit it is back to 0.001. It seems there is something in the new motion detector, but I agree it is odd that the issue manifests itself there.

@blakeblackshear
Owner

@NickM-27 try turning off motion detection with the old motion detector and see what happens.

I'm not totally convinced that we are not just seeing increased wait times for the next frame because the newer motion detection actually does less work. It should be waiting longer for the next frame because it finishes the previous frame earlier.

@NickM-27
Sponsor Collaborator

NickM-27 commented Jun 22, 2023

@blakeblackshear Yes indeed, turning off motion detection and it immediately falls back down:

[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.005485291000000059s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.004979210000000012s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.005536291999999943s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.005919828999999988s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.0072390829999999795s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.0006049149999999837s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.006045665000000033s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.00805941599999993s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.006102372999999939s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.006026333000000106s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.005916017999999967s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.0007802920000000158s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.005774002000000111s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.006351564000000032s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.007430503000000033s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.0004538760000000197s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.0007422499999999443s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.005950585000000008s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.0010318340000000648s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.0016696659999999586s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.000742751000000097s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.0006653739999999964s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.0008575009999999272s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.0008053350000000181s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.0012079590000000362s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.0009174979999999611s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.0006456679999999881s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.005710542999999957s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.0004921660000000161s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.0009218770000000154s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.0008889579999999953s of CPU time to get a frame
[2023-06-22 16:22:42] frigate.video                  INFO    : Took 0.0006090829999999547s of CPU time to get a frame

@ccutrer
Contributor

ccutrer commented Jun 22, 2023

I do have a minor bug in 64cab90 - start_pop needs to be reset at the end of the loop. But fixing that doesn't measurably change my results.

@ccutrer
Contributor

ccutrer commented Jun 22, 2023

Okay, here's another counter-intuitive result. I have #6889 applied locally, so that I know when frames get skipped. On dev, when I use FrigateMotionDetector, my overall idle CPU is higher (~10%), but my skipped_fps are quite high - 4+ on many cameras. But using ImprovedMotionDetector, my overall idle CPU is 0.0% (completely pegged), but my skipped_fps is 0 for almost all cameras. So yes, I would agree that ImprovedMotionDetector itself is less CPU intensive given the same parameters. But somehow it takes my frame_queue.get times from 7e-5s (yes, it's even faster when I corrected my start_pop bug) with FrigateMotionDetector to ~0.2s with ImprovedMotionDetector. I looked through multiprocessing.Queue's source, and on through to multiprocessing.Connection.wait, and I don't see where it might decide to spin instead of block (https://github.com/python/cpython/blob/1a79785e3e8fea80bcf6a800b45a04e06c787480/Lib/multiprocessing/connection.py#L922). My system CPU % is also ~10% higher (accounting for almost all of the previously idle % with FrigateMotionDetector) when using ImprovedMotionDetector.

@ccutrer
Contributor

ccutrer commented Jun 22, 2023

I wonder if CPU affinity has something to do with it? I.e. when the capture process and the process process are running on the same CPU, the semaphore locking is very fast, but if they're not, there's a larger penalty for multiple cores accessing the same memory. And somehow something about ImprovedMotionDetector causes the process process to hop CPU cores more often. Using pidstat -u -p <pid>, I can see that the capture process does not hop CPU cores at all. The process process does, for both motion detectors, but it seems to do so more often with the latter - though that's anecdotal at best; I don't know how to actually measure it. If this is the case, setting the CPU affinity for both the capture and process processes to be the same per camera might fix it.
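For reference, the affinity experiment described here can be done from Python with os.sched_setaffinity on Linux. This is only a sketch of that experiment - the PIDs are placeholders you would look up per camera (e.g. from the pidstat output above), and it is not something Frigate itself does:

# Sketch of the per-camera affinity experiment (Linux only).
# PIDs are placeholders; Frigate does not pin processes itself.
import os

def pin_pair(capture_pid, process_pid, cpu):
    # Pin a camera's capture process and its frame-processing process
    # to the same core so they stop hopping between cores.
    os.sched_setaffinity(capture_pid, {cpu})
    os.sched_setaffinity(process_pid, {cpu})

def current_affinity(pid):
    # The set of cores the scheduler may currently place this process on.
    return os.sched_getaffinity(pid)

if __name__ == "__main__":
    capture_pid, process_pid = 12345, 12346  # placeholders for one camera's PIDs
    pin_pair(capture_pid, process_pid, cpu=3)
    print(current_affinity(capture_pid), current_affinity(process_pid))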

@ccutrer
Contributor

ccutrer commented Jun 23, 2023

Just a quick update... setting CPU affinity for at least a small handful of cameras did not help. I did notice that there are a lot of individual mp.Value variables that are accessed on every iteration of both capture_frames and process_frames. These all essentially require a lock to read and write, and may contribute to system CPU. I've been working on reducing those, but so far it seems to have only marginally improved my user CPU time, and system CPU time still sucks up the rest. I'm still hopeful that I'll find some... something... that will improve the IPC and immediately drop my system CPU time significantly.
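As an illustration of the kind of change being described, here is a hedged sketch (not Frigate's actual code; the names are made up) of reading a locked mp.Value once per loop iteration instead of repeatedly, and of using lock=False where a value has a single writer. Whether either change is actually safe depends on how the real shared values are used:

# Hedged sketch of trimming per-iteration locked reads of shared values.
# Not Frigate's code; the variable names are illustrative only.
import multiprocessing as mp

# A default Value carries a lock; every .value access acquires it.
detection_enabled = mp.Value("i", 1)

# lock=False returns a raw shared ctypes value with no lock at all --
# only reasonable with a single writer where torn reads don't matter.
current_frame_time = mp.Value("d", 0.0, lock=False)

def process_frames(frame_times):
    for frame_time in frame_times:
        # Read the locked value once per iteration and reuse the local,
        # instead of touching .value several times in the loop body.
        enabled = detection_enabled.value
        if not enabled:
            continue
        # ... motion detection / object detection would happen here ...
        current_frame_time.value = frame_time  # single-writer, unlocked write

if __name__ == "__main__":
    process_frames([0.1, 0.2, 0.3])
    print(current_frame_time.value)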

@blakeblackshear
Owner

I spent some time trying to back out specific parts of the new motion detector to see what was driving the change in CPU usage.

I found that if I removed both the gaussian blur and contrast improvement calls, the CPU usage dropped back to very low levels.

What is interesting is that adding either of them increases CPU usage to the elevated levels. Adding a second one doesn't seem to increase levels further. It doesn't matter which one is enabled, just that at least one is.

I'm wondering if there is some underlying overhead associated with the opencv interface to python that initializes once for a given scope. I did try updating to the latest opencv and it reduced the CPU usage by half, but it's still elevated with either of those function calls enabled.

Just some weird python shit. One day I want to port this part of the code base to rust or something.

Another idea would be to look at doing the same operations in numpy and optimizing with numba, but there is a bunch of other opencv stuff in there.
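One way to poke at the "underlying overhead" hypothesis is a micro-benchmark that isolates the two calls and compares OpenCV's default thread pool against a single thread. GaussianBlur and normalize are stand-ins for whatever the motion detector actually calls, and the frame shape roughly matches a frame_height of 100; treat the numbers as a probe, not a conclusion:

# Hedged micro-benchmark of the "one OpenCV call is enough" observation.
# GaussianBlur/normalize are stand-ins; frame shape and iteration count are guesses.
import time
import cv2
import numpy as np

def bench(label, frame, iterations=500):
    start = time.process_time()  # counts CPU across all of OpenCV's worker threads
    for _ in range(iterations):
        blurred = cv2.GaussianBlur(frame, (3, 3), 0)
        cv2.normalize(blurred, None, 0, 255, cv2.NORM_MINMAX)
    print(f"{label}: {time.process_time() - start:.3f}s CPU for {iterations} iterations")

if __name__ == "__main__":
    # Roughly the shape of a downscaled motion frame (frame_height=100).
    frame = np.random.randint(0, 255, (100, 180), dtype=np.uint8)

    bench(f"default threads ({cv2.getNumThreads()})", frame)

    # Force OpenCV to a single worker thread and compare: if the elevated CPU
    # comes from thread-pool overhead, this number should drop noticeably.
    cv2.setNumThreads(1)
    bench("setNumThreads(1)", frame)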

@kirsch33
Contributor

kirsch33 commented Jun 24, 2023

So I had some time today to just watch the debug view, and things seem to be acting very... sporadic. See the video below:

vlc-record-2023-06-24-16h30m12s-vlc-record-2023-06-24-16h27m40s-test.mp4-.mp4-.mp4

This cam is outputting 1080p @ 10fps (detect is set to the same resolution but 5 fps). I have a motion mask over the closer tree branch that is moving a lot.

motion settings:

motion:
  threshold: 30
  contour_area: 50
  frame_height: 100

Is this just motion settings that need more tuning, or something with the motion detector PRs being discussed here?

@NickM-27 NickM-27 changed the title [Bug/Degradation]: Static object detection [Bug/Degradation]: High CPU Usage With Motion Detection Jun 25, 2023
@NickM-27
Sponsor Collaborator

NickM-27 commented Jun 25, 2023

@ccutrer
Contributor

ccutrer commented Jun 26, 2023

I'm not convinced it's the numpy/openblas thing. I've straced the motion detection processes, and they're very clearly spending all of their time in poll (or pselect6, when I temporarily forced multiprocessing.Connection to use select instead of poll; it made no difference), not in gettimeofday. Note that I've also looked through the kernel's implementation of poll, and while there are provisions to use a busy-loop instead of waiting to be woken up, they shouldn't apply to the pipe vfs, which is the type of fd we're polling. One possibility: I noticed ~5% of my overall CPU time is spent in software IRQs, so I'm wondering if it's because I have so many different poll requests running, and when any one has a frame ready, it wakes up all of them in the kernel, which then have to loop and realize their particular FD still isn't ready. But I'm not sure on that - I haven't traced through how the wakeups actually happen in the kernel. And it's still weird how tweaking the motion detector -- all user code -- somehow seems to affect this quite drastically.

@ccutrer
Contributor

ccutrer commented Jun 26, 2023

Okay, I think I give up, and am just resigned to the fact that I have too many cameras on one processor. I did a WIP (e323288), combining the capture and motion detection processes into one, completely eliminating the queue between them. My CPU usage is about the same. I still don't understand why so much time is taken up in system CPU.

@blakeblackshear
Owner

I haven't given up yet. Just been busy the past week.

@ccutrer
Contributor

ccutrer commented Jun 26, 2023

I'll keep watching, but given what I know now, I don't think I was even keeping up under 0.12, I just was unaware I was falling behind. I'll hold off on ordering new hardware for a few days though, in case you pull out something I can't find :).

@nicoleise

I'm willing to help with troubleshooting, but will need above-average guidance due to unfamiliarity with the tools used, so this may not be worth much - up to you. But I suggest it because I see this issue reliably and use higher-resolution cameras; it suggests to me that the issue may not be the 30-some cameras being too much, but simply the "combined detect area".

I think the issue may be CPU related, but am also seeing what appears to be a memory leak.

Installation Method: Docker Compose
Target: Debian 11 OS guest VM running on Proxmox 7.4
Assigned Resources: 16 cores out of 2x Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz, 64 GB RAM out of 256 GB. 6x 1TB HDDs in RAID10 ZFS for the recordings, and 32 GB out of a 1.6 TB SSD in RAID1 (a mirror of two 1.6 TB SSDs). Currently only on a 1x 1G connection, planning for 2x 10G, but there is nothing on the server that can saturate a 1G link in the slightest - host netin is under 2M. In other words, a fairly high "allowance" of resources and, more importantly, plenty of headroom on the host system.

Reported Metrics: Host/Guest; total CPU usage 9%/37%, RAM 85%/98%, IO delay on host 0-0.6%. See below for Frigate's reported metrics.

Cameras: 5 total, all Dahua. 3x 4K exterior cameras and 2x full HD indoor cameras, all streaming and detecting at native resolution.

Noteworthy: Hardware acceleration not enabled (yet). 2x USB Corals installed, but only one configured (haven't been able to make both work reliably; haven't looked into it yet). Otherwise a "normal" and simple config: properly applied motion masks for sky, timestamps, etc., and a few detection zones configured for each camera. Using experimental UI true and MQTT; cams configured with RTSP paths and the roles detect and record. Birdseye in objects mode.

The setup was rock-steady on version 0.11, which was my initial installation. Never a wrong move, despite not having a Coral connected then, so using CPU detectors on all cameras.

Updating to 0.12.0-DA3E197 (current) as a fresh install, I noticed CPU usage being reported as very high (e.g. 170% per camera and 300+% on the CPU detectors). Applying the Coral obviously removed the detector CPU usage, but ffmpeg continues to be around 170% for the 4K cameras and 43% for the full HD ones.

I thought this was a small UI issue that would just get fixed, but after running for some time, I see the behavior described here: "No frames received, check logs" on random cameras that work fine if I access them via e.g. their built-in webserver, the app, or VNC. Nothing in the log except "connection refused" (the cameras have rate limits, and I think Frigate may cause them to drop the streams and reinitiate). The same cameras that report no frames in the Cameras view will work in Birdseye, but the stream lags by many seconds (maybe 30 s or so), whereas they are normally near-instant, with maybe a 1-2 s lag.

Restarting Frigate "fixes" things for a while. But typically not for long - maybe a day or two?

Observations:

  1. I see similarly that the ffmpeg FPS are as specified while the capture and detect FPS are very low (0.2). One camera hardly sees motion; this one shows 5/5/0 (0 skipped).

  2. No CPU usage reported for the Coral detector. When frigate is restarted, usage is reported.

As mentioned, there seems to be a CPU component to this issue but also a memory leak, in that the guest OS reports 98% memory usage of 64 GB. This also helps explain why a restart fixes things for some time.

As mentioned, I am writing this mainly because I have assigned an absurd amount of resources to Frigate (planning to optimise later, but I have plans to add many cameras, etc.) and currently don't really have many cameras, but do have some of higher resolution.

If I can help, please tag me and let me know specifically what you'd like. It's probably easiest if you simply ask for a method of output (e.g. pasting text, screenshots, etc.) and then write out the actions or commands, like "paste your config" or "please paste the output of cat some command, cat some other command, etc.", to avoid any confusion.

@NickM-27
Sponsor Collaborator

NickM-27 commented Jul 6, 2023

@nicoleise make your own issue; this issue has to do with 0.13 dev branch changes, not 0.12.

also in general your setup sounds highly inefficient given that:

  1. It is highly discouraged to run detect on the main stream; there are very few cases where there is any benefit, and it vastly increases CPU usage.
  2. If you are not going to listen to 1, then hardware acceleration is very important.

@NickM-27
Sponsor Collaborator

NickM-27 commented Jul 6, 2023

Closing this as the conversation in #6940 indicates the issue has been resolved.

@NickM-27 NickM-27 closed this as completed Jul 6, 2023
@nicoleise

@nicoleise make your own issue; this issue has to do with 0.13 dev branch changes, not 0.12.

also in general your setup sounds highly inefficient given that:

  1. It is highly discouraged to run detect on the main stream; there are very few cases where there is any benefit, and it vastly increases CPU usage.
  2. If you are not going to listen to 1, then hardware acceleration is very important.

Thanks, Nick. I understood that this is about the latest version, but also that the issue seemed to have been introduced with 0.12 originally. Maybe my mistake. But I wasn't "fishing for" support in someone else's thread, I just meant to help - which is also the reason I didn't open a separate issue (it would essentially just be spam when the problem is known and being worked on).

Great that it seems resolved though.

I'll look into the docs for detecting on secondary streams, thanks. I've only had time (since 0.12) to install it and get it "working", not to set it up properly yet. Sadly. Thanks for your advice.

@NickM-27
Sponsor Collaborator

NickM-27 commented Jul 6, 2023

Thanks, Nick. I understood that this is about the latest version, but also that the issue seemed to have been introduced with 0.12 originally. Maybe my mistake.

Motion detection has remained unchanged for many versions until this current dev version (0.13), so it was not introduced in 0.12.

But wasn't "fishing for" support in someone elses thread, just meant to help - also the reason I didn't open a separate issue (would essentially just be spam when the problem is known and worked on).

No problem; generally, if an issue is being reported on a version newer than the one you are using, it is better to create a new issue.

@kirsch33
Contributor

kirsch33 commented Jul 6, 2023

@NickM-27 Not sure if this should be re-opened or not, but I pulled down the latest dev build dev-baf671b and it seems a new set of issues is at hand. Here is CPU usage before and after (before was running dev-c3b313a):
Capture
Not sure if it's related to #6986 or #6890, or maybe #7022.

@NickM-27
Sponsor Collaborator

NickM-27 commented Jul 6, 2023

@kirsch33 unless you have reason to believe it is due to motion detection, it would definitely not be reopened.

Also, my CPU usage has gone even lower after the recent PRs that have merged.

@kirsch33
Contributor

kirsch33 commented Jul 6, 2023

@NickM-27 understood, I'll make a new issue.
