Implement hardware acceleration #412

Open
wants to merge 5 commits into
base: integration


@tacheometry tacheometry commented Mar 26, 2024

This PR addresses Issue #392.

So far I've only added NVIDIA CUDA FFmpeg arguments for the mp4 format, but it's a good start. What remains is:

  • Supporting AMD acceleration
  • Supporting NVIDIA acceleration
  • Implementing the arguments in other video formats as well (right now only for mp4)
  • Use localization for the option strings
  • Adding a setting in the program interface to control what type of acceleration to use (off by default)
  • Maybe: hardware acceleration for the entire transcoding process. Right now only encoding and decoding are accelerated.
    • This isn't trivial. For CUDA it requires changing filter names, for example scale to scale_cuda, and options such as crf to qp. I tried doing this but couldn't get it to run without errors. It needs a lot of experimentation.

All help is greatly appreciated 😀 This is the first C# program I've edited...

With these arguments (which accelerate only the encode/decode part of the process), conversion was 2-3x faster than before when transcoding a 176 MB video into ~8 MB using the To Mp4 (low quality) preset.

Development

I couldn't find much information on how to contribute to this program, but this is what I've learned so far:

Installation

The Magick.Native-Q16-x64.dll file must be copied to bin/x64/Debug for the program to compile. To obtain this file you need to follow the Magick.NET compilation guide and then grab it from C:\Users\xxxx\.nuget\packages\magick.native.

Testing

The project can be built in its entirety, and then the installer can be run, but this requires a system restart. A more efficient option is calling FileConverter.exe directly. After building the FileConverter solution, you should be able to find Application/FileConverter/bin/x64/Debug/FileConverter.exe

Opening this file directly only shows the tutorial window. Running it from the command line with a preset and an input file, however, is equivalent to right-clicking the file and picking a preset in the context menu, without all the extra steps:

.\FileConverter.exe --verbose --conversion-preset "To Mp4 (low quality)" "C:\Users\DevAccount\Desktop\cs.mp4"
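For repeated test runs, the same invocation can be scripted. The following is a minimal sketch that only reuses the flags shown in the command above; the helper names `build_converter_command` and `run_preset` are hypothetical, and the executable path is assumed to be the debug build location mentioned earlier.

```python
import subprocess


def build_converter_command(exe, preset, input_file, verbose=True):
    """Assemble the FileConverter.exe argument list used in the example above."""
    cmd = [exe]
    if verbose:
        cmd.append("--verbose")
    cmd += ["--conversion-preset", preset, input_file]
    return cmd


def run_preset(exe, preset, input_file):
    """Run one conversion and raise if the process exits with an error code."""
    return subprocess.run(build_converter_command(exe, preset, input_file), check=True)


if __name__ == "__main__":
    # Paths are illustrative; adjust them to your own checkout and test file.
    print(build_converter_command(
        r"Application\FileConverter\bin\x64\Debug\FileConverter.exe",
        "To Mp4 (low quality)",
        r"C:\Users\DevAccount\Desktop\cs.mp4",
    ))
```

Passing the arguments as a list avoids quoting problems with the spaces and parentheses in the preset name.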

Resources:

@tacheometry tacheometry mentioned this pull request Mar 26, 2024
@tacheometry
Author

Managed to get full transcoding working. It doesn't provide as much of a speed up as accelerated encoding and decoding, but for long videos it'll definitely be super useful.


tacheometry commented Mar 27, 2024

Benchmarks

Note: ffmpeg.exe below refers to FileConverter\Application\FileConverter\bin\x64\Debug\ffmpeg.exe; I'm not using the system ffmpeg.

Commands

Instead of modifying the program and testing every single time, I modified the effective ffmpeg command used, and implemented my modifications after getting everything working.

The following commands are what the To Mp4 (low quality) preset executes (except that I changed -n to -y and added -benchmark). To see the execution time, look at rtime=1.234s in the ffmpeg benchmark output.
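When comparing several runs, reading the rtime value by hand gets tedious. Here is a small sketch that extracts it from captured ffmpeg output (ffmpeg writes its log to stderr). The function name `parse_rtime` and the sample line are illustrative, not taken from an actual run.

```python
import re
from typing import Optional

# Matches the wall-clock field of ffmpeg's benchmark line, e.g. "rtime=1.234s".
RTIME_RE = re.compile(r"rtime=(\d+(?:\.\d+)?)s")


def parse_rtime(ffmpeg_output: str) -> Optional[float]:
    """Return the rtime value in seconds, or None if no benchmark line is present."""
    match = RTIME_RE.search(ffmpeg_output)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    # Illustrative sample shaped like a benchmark line.
    sample = "bench: utime=20.000s stime=0.500s rtime=1.234s"
    print(parse_rtime(sample))
```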

Acceleration off

This is used by FileConverter by default.

ffmpeg.exe -y -stats -i "input.mp4" -c:v libx264 -preset medium -crf 31 -c:a aac -qscale:a 0.75 -vf "scale=trunc(iw*1/2)*2:trunc(ih*1/2)*2,format=yuv420p" "output.mp4" -benchmark
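The scale expression in this command can look like a 0.5x downscale, but it is the 1x case: trunc(iw*1/2)*2 multiplies by the scale factor (1), halves, truncates, and doubles again, which rounds each dimension down to an even number as yuv420p with libx264 requires. A sketch of the same arithmetic follows; the function name `even_scaled` is hypothetical, and putting 0.5 in place of the factor is an assumption about how the 50% preset is expressed.

```python
import math


def even_scaled(size: int, factor: float = 1.0) -> int:
    """Mirror the ffmpeg expression trunc(size*factor/2)*2."""
    return math.trunc(size * factor / 2) * 2


if __name__ == "__main__":
    for width, height, factor in [(1920, 1080, 1.0), (1921, 1081, 1.0), (1920, 1080, 0.5)]:
        print(width, height, factor, "->", even_scaled(width, factor), even_scaled(height, factor))
```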

HW accelerated encoding and decoding, CPU scaling

ffmpeg.exe -y -stats -hwaccel cuda -i "input.mp4" -c:v h264_nvenc -preset medium -crf 31 -c:a aac -qscale:a 0.75 -vf "scale=trunc(iw*1/2)*2:trunc(ih*1/2)*2,format=yuv420p" "output.mp4" -benchmark

Fully HW accelerated transcoding

ffmpeg.exe -y -stats -hwaccel cuda -hwaccel_output_format cuda -i "input.mp4" -c:v h264_nvenc -preset medium -crf 31 -c:a aac -qscale:a 0.75 -vf "scale_cuda=trunc(iw*1/2)*2:trunc(ih*1/2)*2:format=yuv420p" "output.mp4" -benchmark
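The three commands differ only in the input-side hwaccel flags, the video encoder, and the filter string. The sketch below makes that difference explicit by building the argument list from a mode name. It is not the C# code in this PR; the function name `build_ffmpeg_args` and the mode names are hypothetical, and the argument values are copied from the commands above.

```python
def build_ffmpeg_args(mode, input_path, output_path):
    """Build the ffmpeg argument list for one of the three benchmarked variants."""
    scale = "trunc(iw*1/2)*2:trunc(ih*1/2)*2"
    args = ["-y", "-stats"]
    if mode == "off":
        codec = "libx264"
        video_filter = f"scale={scale},format=yuv420p"
    elif mode == "cuda-codec":
        # Decode and encode on the GPU, but scale on the CPU.
        args += ["-hwaccel", "cuda"]
        codec = "h264_nvenc"
        video_filter = f"scale={scale},format=yuv420p"
    elif mode == "cuda-full":
        # Keep frames in GPU memory so scale_cuda can work on them directly.
        args += ["-hwaccel", "cuda", "-hwaccel_output_format", "cuda"]
        codec = "h264_nvenc"
        video_filter = f"scale_cuda={scale}:format=yuv420p"
    else:
        raise ValueError(f"unknown mode: {mode}")
    args += ["-i", input_path, "-c:v", codec, "-preset", "medium", "-crf", "31",
             "-c:a", "aac", "-qscale:a", "0.75", "-vf", video_filter,
             output_path, "-benchmark"]
    return args


if __name__ == "__main__":
    for mode in ("off", "cuda-codec", "cuda-full"):
        print("ffmpeg.exe", " ".join(build_ffmpeg_args(mode, "input.mp4", "output.mp4")))
```

Note that in the fully accelerated variant the pixel format moves from a separate format filter into a scale_cuda option, which is why the separator changes from a comma to a colon.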

Results

input.mp4 is a 30-second, 1920x1080, 176 MB file.

To Mp4 (low quality) (1x scaling)

  • HW accel off: 14.6s
  • HW accelerated encode/decode: 5.7s (2.56x faster than base)
  • Fully accelerated transcode: 5.3s (2.75x faster than base)

To Mp4 (lowER quality) (0.5x scaling)

This preset I made changes the scaling from 100% to 50%.

  • HW accel off: 6.2s
  • HW accelerated encode/decode: 4.3s (1.44x faster than base)
  • Fully accelerated transcode: 3.3s (1.87x faster than base)
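The "x faster than base" figures are the baseline time divided by the accelerated time. A short sketch of that calculation using the times listed above follows; the function name `speedup` is hypothetical. Because the listed times are rounded to one decimal, the last digit of a recomputed ratio can differ slightly from the figures quoted.

```python
def speedup(baseline_s: float, accelerated_s: float) -> float:
    """Return how many times faster the accelerated run is than the baseline."""
    if accelerated_s <= 0:
        raise ValueError("accelerated time must be positive")
    return baseline_s / accelerated_s


if __name__ == "__main__":
    # Times in seconds, copied from the results above: (HW accel off, encode/decode, full transcode).
    results = {
        "1x scaling": (14.6, 5.7, 5.3),
        "0.5x scaling": (6.2, 4.3, 3.3),
    }
    for preset, (base, codec_only, full) in results.items():
        print(f"{preset}: encode/decode {speedup(base, codec_only):.2f}x, full {speedup(base, full):.2f}x")
```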

@tacheometry tacheometry marked this pull request as ready for review March 27, 2024 18:20

broscoi commented Mar 27, 2024

I compiled @tacheometry's version and ran a few tests: three runs with Hardware acceleration mode = Nvidia (CUDA) and three runs with Hardware acceleration mode = Off. Each time, the CUDA results were at least 2 times faster than with CPU processing.

[Screenshots: nohardwareacceleration, hardwareacceleration]

@tacheometry
Author

@Tichau Can I get a review on this?

@ItsukaHiro

I think the easiest way is to replace the ffmpeg binary in file-converter with a self-compiled build that has all GPU video processing features enabled. That way it will work on any machine, whether it has an NVIDIA, Intel, or AMD graphics card.


3 participants