Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zero-copy scene detection #627

Draft
wants to merge 22 commits into
base: master
Choose a base branch
from
Draft

Conversation

redzic
Copy link
Collaborator

@redzic redzic commented May 6, 2022

Scene detection, especially at higher resolutions, is currently bottlenecked by many allocations and copying of uncompressed frames. This PR uses the ffmpeg API directly to decode frames and removes copies frame data during scene detection, which allows for large speedups compared to the previous code (which generally seem to increase with resolution and higher bit depth).

Considerations before merging

  • Implement --sc-downscale-height and --sc-pix-format, this now needs to be manually done with the ffmpeg API as well
  • Upstream the changes to rav1e, av-scenechange, and possibly av-metrics-decoders (the code changes from av-metrics-decoders could be included in av1an or av-scenechange instead).

Other possible improvements

  • Decode next few frames in parallel of the scene detection analysis itself, instead of decoding frames as needed and then performing analysis in serial. This seems difficult, so this could possibly be done in a future PR
  • If possible, remove uses of Arc<T> in analyze_next_frame and remove temporary vectors holding Arc<Frame<T>>s. This would need to be done in addition to some work on rav1e to allow creating Plane<T> that are not owned
  • Run callback function (incrementing the progress bar) on another thread

@master-of-zen
Copy link
Owner

@redzic Amazing work 🥇 💯

@redzic redzic requested a review from shssoichiro May 6, 2022 08:35
@shssoichiro
Copy link
Collaborator

I haven't seen the av-scenechange code yet but here are some of my thoughts:

Fix standard mode of scene detection. Currently, only the fast mode of scene detection actually works because the function estimate_inter_costs seems to require the raw frame data to be padded to a power of 2.

I also think the ideal solution is to not require the padding. I looked into this recently because I realized there's actually a lot of padding on each frame in rav1e and I wanted to reduce memory usage. I got as far as seeing that the motion estimation requires padding because it looks outside of the visible portion of the video when searching. Removing this requirement would probably speed up motion estimation in rav1e (and by extension estimate_inter_costs) as well. But I don't think it's a small change.

Failing that, I think number 2 is the solution we are stuck with. I don't like 1 because as you mentioned it changes the results to be less accurate. And 3, unfortunately, I don't think is possible, or at the very least would be pretty messy to implement (a lot of raw pointers and indirection).

Upstream the changes to rav1e, av-scenechange, and possibly av-metrics-decoders (the code changes from av-metrics-decoders could be included in av1an or av-scenechange instead).

One of the major challenges here I think will be that rav1e does not currently depend on ffmpeg, and I'm not sure that they want to. It might be necessary to implement something similar in rav1e's y4m implementation but without relying on the ffmpeg API (which is cursed).

Decode next few frames in parallel of the scene detection analysis itself, instead of decoding frames as needed and then performing analysis in serial. This seems difficult, so this could possibly be done in a future PR

I don't think this is as difficult as it seems. But I agree putting it in a future PR makes sense, since it is a separate optimization from the zero-copy changes.

If possible, remove uses of Arc in analyze_next_frame and remove temporary vectors holding Arc<Frame>s. This would need to be done in addition to some work on rav1e to allow creating Plane that are not owned

Not sure how feasible this is... It might be worth investigating in a follow-up PR. But I also think the gains from it would be small, if any.

@redzic
Copy link
Collaborator Author

redzic commented May 18, 2022

It seems like estimate_inter_costs actually needed more vertical padding (not horizontal padding which is what I thought originally), which is pretty easy to add without having to do a bunch of extra copying. I've added it and now the standard scene detect works (and is also much faster) on many videos I've tried, but it seems to segfault (apparently in a call to copy_nonoverlapping) on HBD videos with standard scene detect, and apparently an assertion fails sometimes too according to CI. I'll look into this further.

@redzic
Copy link
Collaborator Author

redzic commented May 18, 2022

I've fixed the standard scene detection mode, now I just need to address the other issues (mainly vapoursynth input support and downscaling for scene detection).

@BlueSwordM
Copy link
Contributor

Incredible.

@redzic
Copy link
Collaborator Author

redzic commented Jun 3, 2022

Surprisingly ffmpeg isn't actually the problem here, it's mostly vapoursynth which has a limited (at least, for our specific use case) decoding API. Vapoursynth seems to be based on the idea of returning "frame refs", which is counter to the idea of decoding into a specific buffer that av1an allocates with specific dimensions to avoid copying altogether.

I think maybe it's still possible to support zerocopy fast scene detect with vapoursynth input, but in the case of standard scene detect and vapoursynth input I think we are forced to copy into a new buffer. There might be some additional optimizations though, like avoiding copying when combined with --sc-downscale-height, but I'll have to see later what can be done about that.

@redzic redzic force-pushed the zerocopy branch 2 times, most recently from 90ee178 to 52c1af8 Compare June 26, 2022 09:46
@redzic redzic force-pushed the zerocopy branch 2 times, most recently from d6fbf97 to 5c8b172 Compare July 9, 2022 05:48
@redzic redzic force-pushed the zerocopy branch 3 times, most recently from 67f44f4 to 437b2f5 Compare July 13, 2022 08:18
@redzic
Copy link
Collaborator Author

redzic commented Jul 13, 2022

Update: I've added support in this PR for vapoursynth input. It seems to work on the scripts I've tried so far (even with HBD and --sc-method standard), but it might not be 100% correct yet. I'll have to double check why it's even working though to be honest, as I'm legitimately not even sure how it's producing the same results as before when I didn't even add any extra special handling for padding.

@redzic
Copy link
Collaborator Author

redzic commented Jul 13, 2022

It looks like the standard scene detection mode with VS only happens to work with certain dimensions (but not with others, for example 930x734), so some kind of check will be needed to add the padding when needed (and unfortunately we have no choice to copy the buffer in this case unless there's a way to configure the padding on the returned frames in VS).

@silverbacknet
Copy link

unless there's a way to configure the padding on the returned frames in VS

Definitely possible, that's how we did it in AvsP. I have to go back and refresh my memory on exactly how that was done, but vital for colorspace conversion for the UI and such. On the other hand, not sure how useful it would be here, since it would be a wash whether VS or Av1an does the bitblt onto a padded region. Maybe there's a fix for the scene detection? I'll take a look at it.

@redzic
Copy link
Collaborator Author

redzic commented Aug 30, 2022

Definitely possible, that's how we did it in AvsP. I have to go back and refresh my memory on exactly how that was done, but vital for colorspace conversion for the UI and such.

Definitely let me know if you figure it out/remember, it would simplify a lot of code if it's possible

@0xBA5E64
Copy link
Contributor

0xBA5E64 commented May 22, 2024

With @shssoichiro just now committing f259c44, is it possible we'll see a pulse on this getting merged as well?

Edit: I'm admittedly way out of my league with this but, I'm guessing the real question is if redzic's zerocopy branch of av-scenechange can be merged now that shssoichiro's figured out vapoursynth scene-detection?

@shssoichiro
Copy link
Collaborator

I do hope to integrate the ffmpeg portions of this as well, at least the portions that we know will work correctly. There are some limitations on making it truly zero-copy, which I believe is why this got stalled in the first place, but pipeless should be possible for ffmpeg as well.

@redzic
Copy link
Collaborator Author

redzic commented May 22, 2024

Vapoursynth scene detection was working in this branch as well, with it being pipeless all the time but only being zerocopy if the strides between what vapoursynth returns and what rav1e expects matched. FFmpeg decoding with true zerocopy worked even with standard scenechange mode as it configured padding to what rav1e expects before calling the decoder. I can't dedicate time to this now but if someone else wants to pick this up and sort out the edge cases and do a little more testing I'm all good with that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants