Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimize output_frame_resize #6917

Open
wants to merge 10 commits into
base: dev
Choose a base branch
from

Conversation

skrashevich
Copy link
Contributor

@netlify
Copy link

netlify bot commented Jun 26, 2023

Deploy Preview for frigate-docs canceled.

Name Link
🔨 Latest commit ebcdb63
🔍 Latest deploy log https://app.netlify.com/sites/frigate-docs/deploys/6563d8d74fee5d0008eed1f6

…mage.Resampling.NEAREST in copy_yuv_to_position function
@skrashevich
Copy link
Contributor Author

I have ~x2 performance improvement with this modifications

@NickM-27
Copy link
Sponsor Collaborator

I have ~x2 performance improvement with this modifications

to be clear, that means ~ one half CPU usage? I'll try this in the morning and see how it goes in my setup

@skrashevich
Copy link
Contributor Author

skrashevich commented Jun 26, 2023

to be clear, that means ~ one half CPU usage? I'll try this in the morning and see how it goes in my setup

that means ~25% CPU utilisation by frigate.output process before, and ~12% after

in the morning

so relative... :-)

@NickM-27
Copy link
Sponsor Collaborator

So I ended up trying this now, wanted to clarify the above because I think #6853 got steered into the direction of the motion detection CPU usage which is separate from frigate.output. My motion detection CPU usage is unaffected by this PR.

In any case, this PR brought the frigate.output CPU usage in my setup from 20% of to 13% in top. I am just wondering if the openblas changes have affect that at all or if it is just the other changes, in which case maybe the openblas changes aren't needed?

@skrashevich
Copy link
Contributor Author

skrashevich commented Jun 26, 2023

My motion detection CPU usage is unaffected by this PR.

test within latest commit (7ee53df)

@skrashevich skrashevich changed the title openblas & optimize output_frame_resize openblas & optimize output_frame_resize & motion detect & process_frames Jun 26, 2023
@skrashevich
Copy link
Contributor Author

skrashevich commented Jun 26, 2023

maybe the openblas changes aren't needed?

Given in mind my mistake in pushed Dockerfile (openblas and env vars were installed only in wheels step, but not in the final step) and your feedback on performance improvements - apparently, openblas does not significantly affect the result :)

@NickM-27
Copy link
Sponsor Collaborator

My motion detection CPU usage is unaffected by this PR.

test within latest commit (7ee53df)

This has caused my CPU usage to go up for each camera detect process by ~ 2 - 3 %

@NickM-27
Copy link
Sponsor Collaborator

Might be better for this PR to keep things focused on the frame_resize improvements which reduced CPU usage by frigate.output and potentially work on other changes in a separate PR

@skrashevich
Copy link
Contributor Author

skrashevich commented Jun 26, 2023

Might be better for this PR to keep things focused on the frame_resize improvements which reduced CPU usage by frigate.output and potentially work on other changes in a separate PR

You suggest rollback this PR to 4f9cf22, and move other commits to separate PR?

@NickM-27
Copy link
Sponsor Collaborator

Might be better for this PR to keep things focused on the frame_resize improvements which reduced CPU usage by frigate.output and potentially work on other changes in a separate PR

You suggest rollback this PR to 4f9cf22, and move other commits to separate PR?

Based on previous conversation I would also suggest removing openblas, but yeah I think it makes sense to keep it separate since the frigate.output is separate from the motion detector

…mber of threads used by OpenBLAS library in Dockerfile"

This reverts commit 2bc977a.
@skrashevich skrashevich force-pushed the 230626-optimize-output-frame-resize branch from 7ee53df to 9de8f3f Compare June 26, 2023 13:04
@skrashevich skrashevich changed the title openblas & optimize output_frame_resize & motion detect & process_frames optimize output_frame_resize Jun 26, 2023
@skrashevich
Copy link
Contributor Author

Might be better for this PR to keep things focused on the frame_resize improvements which reduced CPU usage by frigate.output and potentially work on other changes in a separate PR

You suggest rollback this PR to 4f9cf22, and move other commits to separate PR?

Based on previous conversation I would also suggest removing openblas, but yeah I think it makes sense to keep it separate since the frigate.output is separate from the motion detector

What do you think about create configuration option for interpolation algorithm, used in internal calculations?

@NickM-27
Copy link
Sponsor Collaborator

I think it is too advanced for most users to understand / our docs to explain which one to use. I'm also not sure in which cases a user would want to change that

@ccutrer
Copy link
Contributor

ccutrer commented Jun 26, 2023

this decreased my output process CPU usage by at least half its previous usage. good work!

frigate/util.py Outdated
position,
resize_dim,
offset,
interpolation=Image.Resampling.NEAREST,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my testing, this interpolation for a resize is faster, but the quality is really bad. I would be curious to see if there is still a performance improvement if using Image.Resampling.BILINEAR instead. I believe that is the equivalent to opencv's INTER_LINEAR which was used previously.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is specifically for birdseye, no? Running this PR, when I only have a single camera active (which is rare), I can notice a quality degradation. Though that's possibly because my birdseye resolution (1920x1080) is bigger than my detect resolution (1280x720) that drives it. When I have multiple cameras, I can't notice poor quality at all. Perhaps if we're resizing larger, we use bilinear, but if we're resizing smaller the faster/simpler algorithm can be used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my testing, this interpolation for a resize is faster, but the quality is really bad

describe your testing configuration

frigate/util.py Outdated
def assign_resized_frame(
destination_slice, source_slice, resize_dim, interpolation
):
source_img = Image.fromarray(source_slice)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its likely the performance gain here has nothing to do with using PIL. Using opencv's cv2.INTER_NEAREST probably gets the same performance gain without casting back and forth from a numpy array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some optimisations about casting numpy array made in a2b30d0

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still prefer to use opencv's resize unless there is a good reason to switch since we use that in almost every other part of the code base.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but it give performance improvement... Especially with enhanced multiprocessing (#6986 and #6936)
It's allows to get rid of unnecessary context switching

Copy link
Sponsor Collaborator

@NickM-27 NickM-27 Jul 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running with the current dev branch this PR provides higher CPU usage for me, opencv INTER_NEAREST provides a slight reduction in CPU usage

Current dev branch:

Screen Shot 2023-07-06 at 08 11 31 AM

Current dev branch and using opencv INTER_NEAREST

Screen Shot 2023-07-06 at 08 09 20 AM

This PR:

Screen Shot 2023-07-06 at 08 14 35 AM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you repeat this test with #7053 merged in? This can significantly affect the result.And will be wonderful to see py-spy output

Copy link
Sponsor Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try it, but if that PR made a difference it would lower the CPU usage for all cases not just the PIL case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of course. This PR provides 2 optimisations. Maybe, in some cases, an error in the second optimization neutralizes the benefits of the first one. need to check

@skrashevich
Copy link
Contributor Author

@blakeblackshear this PR is related to #6986
I didn't test them separately

@skrashevich skrashevich force-pushed the 230626-optimize-output-frame-resize branch from 954e6c6 to 2d00bd4 Compare July 6, 2023 15:12
@skrashevich
Copy link
Contributor Author

up?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants