New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
optimize output_frame_resize #6917
base: dev
Are you sure you want to change the base?
optimize output_frame_resize #6917
Conversation
…threads used by OpenBLAS library in Dockerfile
✅ Deploy Preview for frigate-docs canceled.
|
…mage.Resampling.NEAREST in copy_yuv_to_position function
I have ~x2 performance improvement with this modifications |
to be clear, that means ~ one half CPU usage? I'll try this in the morning and see how it goes in my setup |
that means ~25% CPU utilisation by frigate.output process before, and ~12% after
so relative... :-) |
So I ended up trying this now, wanted to clarify the above because I think #6853 got steered into the direction of the motion detection CPU usage which is separate from frigate.output. My motion detection CPU usage is unaffected by this PR. In any case, this PR brought the frigate.output CPU usage in my setup from 20% of to 13% in top. I am just wondering if the openblas changes have affect that at all or if it is just the other changes, in which case maybe the openblas changes aren't needed? |
test within latest commit (7ee53df) |
Given in mind my mistake in pushed Dockerfile (openblas and env vars were installed only in wheels step, but not in the final step) and your feedback on performance improvements - apparently, openblas does not significantly affect the result :) |
This has caused my CPU usage to go up for each camera detect process by ~ 2 - 3 % |
Might be better for this PR to keep things focused on the frame_resize improvements which reduced CPU usage by frigate.output and potentially work on other changes in a separate PR |
You suggest rollback this PR to 4f9cf22, and move other commits to separate PR? |
Based on previous conversation I would also suggest removing openblas, but yeah I think it makes sense to keep it separate since the frigate.output is separate from the motion detector |
…mber of threads used by OpenBLAS library in Dockerfile" This reverts commit 2bc977a.
7ee53df
to
9de8f3f
Compare
What do you think about create configuration option for interpolation algorithm, used in internal calculations? |
I think it is too advanced for most users to understand / our docs to explain which one to use. I'm also not sure in which cases a user would want to change that |
this decreased my output process CPU usage by at least half its previous usage. good work! |
frigate/util.py
Outdated
position, | ||
resize_dim, | ||
offset, | ||
interpolation=Image.Resampling.NEAREST, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my testing, this interpolation for a resize is faster, but the quality is really bad. I would be curious to see if there is still a performance improvement if using Image.Resampling.BILINEAR
instead. I believe that is the equivalent to opencv's INTER_LINEAR which was used previously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is specifically for birdseye, no? Running this PR, when I only have a single camera active (which is rare), I can notice a quality degradation. Though that's possibly because my birdseye resolution (1920x1080) is bigger than my detect resolution (1280x720) that drives it. When I have multiple cameras, I can't notice poor quality at all. Perhaps if we're resizing larger, we use bilinear, but if we're resizing smaller the faster/simpler algorithm can be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my testing, this interpolation for a resize is faster, but the quality is really bad
describe your testing configuration
frigate/util.py
Outdated
def assign_resized_frame( | ||
destination_slice, source_slice, resize_dim, interpolation | ||
): | ||
source_img = Image.fromarray(source_slice) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its likely the performance gain here has nothing to do with using PIL. Using opencv's cv2.INTER_NEAREST
probably gets the same performance gain without casting back and forth from a numpy array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some optimisations about casting numpy array made in a2b30d0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would still prefer to use opencv's resize unless there is a good reason to switch since we use that in almost every other part of the code base.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you repeat this test with #7053 merged in? This can significantly affect the result.And will be wonderful to see py-spy output
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try it, but if that PR made a difference it would lower the CPU usage for all cases not just the PIL case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
of course. This PR provides 2 optimisations. Maybe, in some cases, an error in the second optimization neutralizes the benefits of the first one. need to check
@blakeblackshear this PR is related to #6986 |
954e6c6
to
2d00bd4
Compare
up? |
#6853