Implement hardware acceleration #412

tacheometry · 2024-03-26T02:29:19Z

This PR addresses Issue #392.

So far I've only added some NVIDIA CUDA FFmpeg arguments for the mp4 format, but it's a good start. What remains:

- Supporting AMD acceleration
- Supporting NVIDIA acceleration
- Implementing the arguments for other video formats as well (right now only mp4)
- Using localization for the option strings
- Adding a setting in the program interface to control what type of acceleration to use (off by default)
- Maybe: hardware acceleration for the entire transcoding process. Right now only encoding and decoding are accelerated.
  - This isn't trivial. For CUDA it requires changes such as renaming filters from scale to scale_cuda, or replacing crf with qp. I tried this but couldn't get it to run without errors. It needs a lot of experimentation.

All help is greatly appreciated 😀 This is the first C# program I've edited...

With these arguments (which accelerate only the encode/decode part of the process), transcoding a 176 MB video into ~8 MB using the To Mp4 (low quality) preset was 2-3x faster than before.

Development

I couldn't find much information on how to contribute to this program, but this is what I've learned so far:

Installation

The Magick.Native-Q16-x64.dll file must be copied to bin/x64/Debug for the program to compile. To obtain this file you need to follow the Magick.NET compilation guide and then grab it from C:\Users\xxxx\.nuget\packages\magick.native.
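
A minimal sketch of that copy step, assuming the DLL sits somewhere below the magick.native folder of the NuGet package cache (the exact subfolder depends on the package version, so the script searches for it). The function names are hypothetical and not part of FileConverter.

```python
import shutil
from pathlib import Path

DLL_NAME = "Magick.Native-Q16-x64.dll"


def find_native_dll(packages_root: Path) -> list[Path]:
    """Return every copy of the DLL below <packages_root>/magick.native, sorted by path."""
    return sorted((packages_root / "magick.native").rglob(DLL_NAME))


def copy_native_dll(packages_root: Path, debug_dir: Path) -> Path:
    """Copy the last match (by sorted path) into debug_dir and return the new path."""
    matches = find_native_dll(packages_root)
    if not matches:
        raise FileNotFoundError(f"{DLL_NAME} not found under {packages_root}")
    debug_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(matches[-1], debug_dir / DLL_NAME))
```

It could be called as `copy_native_dll(Path.home() / ".nuget" / "packages", Path("bin/x64/Debug"))`, with both paths adjusted to your own machine and checkout.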

Testing

The project can be built in its entirety, and then the installer can be run, but this requires a system restart. A more efficient option is calling FileConverter.exe directly. After building the FileConverter solution, you should be able to find Application/FileConverter/bin/x64/Debug/FileConverter.exe

Opening this file directly shows the tutorial window. To trigger a conversion instead, run it from the command line like so:

.\FileConverter.exe --verbose --conversion-preset "To Mp4 (low quality)" "C:\Users\DevAccount\Desktop\cs.mp4"

This is equivalent to right-clicking a file and picking a preset from the context menu, without all the extra steps.
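
For repeated test runs, the same invocation can be wrapped in a small script. This is only a sketch: the helper names are made up, and it assumes nothing about FileConverter beyond the two flags shown in the command above.

```python
import subprocess
from pathlib import Path


def conversion_command(exe: Path, preset: str, input_file: Path) -> list[str]:
    """Build the argument list for one FileConverter run with the given preset."""
    return [str(exe), "--verbose", "--conversion-preset", preset, str(input_file)]


def run_conversion(exe: Path, preset: str, input_file: Path) -> int:
    """Run FileConverter with the given preset and return its exit code."""
    return subprocess.run(conversion_command(exe, preset, input_file)).returncode
```

Passing the arguments as a list avoids having to quote preset names that contain spaces and parentheses.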

Resources:

https://docs.nvidia.com/video-technologies/video-codec-sdk/12.0/ffmpeg-with-nvidia-gpu/index.html
https://trac.ffmpeg.org/wiki/HWAccelIntro
https://github.com/HeiSir2014/ffmpeg-wiki
https://stackoverflow.com/a/55747785
https://lists.ffmpeg.org/pipermail/ffmpeg-user/2017-July/036820.html
various StackExchange answers you might find

tacheometry · 2024-03-27T18:04:15Z

Managed to get full transcoding working. It doesn't add as much of a speedup as accelerating encoding and decoding did, but for long videos it'll definitely be super useful.

tacheometry · 2024-03-27T18:19:23Z

Benchmarks

Note: ffmpeg.exe here refers to FileConverter\Application\FileConverter\bin\x64\Debug\ffmpeg.exe; I'm not using the system ffmpeg.

Commands

Instead of modifying the program and testing every single time, I modified the effective ffmpeg command used, and implemented my modifications after getting everything working.

The following commands are the ones executed for the To Mp4 (low quality) preset (except that I changed -n to -y and added -benchmark). To see the execution time, look at rtime=1.234s in the ffmpeg benchmark output.
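
If you want to collect the timing automatically, the rtime value can be pulled out of the captured log with a regular expression. A rough sketch, assuming the log text has already been captured (ffmpeg writes it to stderr); the function name is hypothetical and the utime/stime values in the sample line are made up for illustration.

```python
import re

RTIME_PATTERN = re.compile(r"rtime=(\d+(?:\.\d+)?)s")


def parse_rtime(ffmpeg_output: str) -> float:
    """Return the last rtime value (in seconds) found in ffmpeg's benchmark output."""
    matches = RTIME_PATTERN.findall(ffmpeg_output)
    if not matches:
        raise ValueError("no rtime= entry found; was -benchmark passed?")
    return float(matches[-1])


# Illustrative sample line; only the rtime field matters here.
sample = "bench: utime=10.000s stime=0.500s rtime=1.234s"
print(parse_rtime(sample))  # 1.234
```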

Acceleration off

This is used by FileConverter by default.

ffmpeg.exe -y -stats -i "input.mp4" -c:v libx264 -preset medium -crf 31 -c:a aac -qscale:a 0.75 -vf "scale=trunc(iw*1/2)*2:trunc(ih*1/2)*2,format=yuv420p" "output.mp4" -benchmark

HW accelerated encoding and decoding, CPU scaling

ffmpeg.exe -y -stats -hwaccel cuda -i "input.mp4" -c:v h264_nvenc -preset medium -crf 31 -c:a aac -qscale:a 0.75 -vf "scale=trunc(iw*1/2)*2:trunc(ih*1/2)*2,format=yuv420p" "output.mp4" -benchmark

Fully HW accelerated transcoding

ffmpeg.exe -y -stats -hwaccel cuda -hwaccel_output_format cuda -i "input.mp4" -c:v h264_nvenc -preset medium -crf 31 -c:a aac -qscale:a 0.75 -vf "scale_cuda=trunc(iw*1/2)*2:trunc(ih*1/2)*2:format=yuv420p" "output.mp4" -benchmark
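
The three commands differ only in the hwaccel input flags, the video encoder, and the scaling filter. The sketch below reproduces them from a single mode switch. It illustrates the differences only and is not the C# code in this PR; the enum and function names are hypothetical.

```python
from enum import Enum

SCALE = "trunc(iw*1/2)*2:trunc(ih*1/2)*2"


class HwAccel(Enum):
    OFF = "off"                # CPU decode, scale, and encode
    CUDA_CODEC = "cuda-codec"  # GPU decode/encode, CPU scaling
    CUDA_FULL = "cuda-full"    # frames stay on the GPU for scaling too


def build_args(mode: HwAccel, src: str, dst: str) -> list[str]:
    """Return the ffmpeg arguments (without the executable) for the given mode."""
    args = ["-y", "-stats"]
    if mode is not HwAccel.OFF:
        args += ["-hwaccel", "cuda"]
    if mode is HwAccel.CUDA_FULL:
        args += ["-hwaccel_output_format", "cuda"]
    args += ["-i", src]

    encoder = "libx264" if mode is HwAccel.OFF else "h264_nvenc"
    if mode is HwAccel.CUDA_FULL:
        # scale_cuda takes the pixel format as a filter option.
        video_filter = f"scale_cuda={SCALE}:format=yuv420p"
    else:
        # CPU path: scale, then a separate format filter.
        video_filter = f"scale={SCALE},format=yuv420p"

    args += ["-c:v", encoder, "-preset", "medium", "-crf", "31",
             "-c:a", "aac", "-qscale:a", "0.75",
             "-vf", video_filter, dst, "-benchmark"]
    return args
```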

Results

input.mp4 is a 30-second, 1920x1080, 176 MB file.

To Mp4 (low quality) (1x scaling)

HW accel off: 14.6s
HW accelerated encode/decode: 5.7s (2.56x faster than base)
Fully accelerated transcode: 5.3s (2.75x faster than base)

To Mp4 (lowER quality) (0.5x scaling)

This is a preset I made that changes the scaling from 100% to 50%.

HW accel off: 6.2s
HW accelerated encode/decode: 4.3s (1.44x faster than base)
Fully accelerated transcode: 3.3s (1.88x faster than base)
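
The "x faster" figures are the baseline time divided by the accelerated time. A small sketch for recomputing them from the timings above; the ratios are rounded to two decimals, so the last digit can differ from a truncated figure. The dictionary layout and names are just for illustration.

```python
def speedup(baseline_s: float, accelerated_s: float) -> float:
    """How many times faster the accelerated run is than the baseline."""
    return baseline_s / accelerated_s


# Timings in seconds, copied from the results above.
results = {
    "1x scaling": {"off": 14.6, "encode/decode": 5.7, "full": 5.3},
    "0.5x scaling": {"off": 6.2, "encode/decode": 4.3, "full": 3.3},
}

for preset, times in results.items():
    base = times["off"]
    for mode in ("encode/decode", "full"):
        print(f"{preset}, {mode}: {speedup(base, times[mode]):.2f}x")
```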

broscoi · 2024-03-27T19:00:30Z

I compiled @tacheometry's version and ran a few tests: three runs with Hardware acceleration mode = Nvidia (CUDA) and three runs with Hardware acceleration mode = Off. Each time, the CUDA results were at least 2 times faster than with CPU processing.

tacheometry · 2024-04-05T22:34:41Z

@Tichau Can I get a review on this?

ItsukaHiro · 2024-04-28T00:31:11Z

I think the easiest way is to replace the ffmpeg binary in file-converter with a self-compiled build that has all GPU video processing features enabled. That way it will work on any machine, whether it has an NVIDIA, Intel, or AMD graphics card.

Try some cuda stuff

1952c18

tacheometry mentioned this pull request Mar 26, 2024

Allow use of GPU #392

Open

tacheometry added 3 commits March 26, 2024 16:10

Add setting for hardware acceleration mode

5f7bcad

Use hardware accel setting

57b1b79

Full CUDA transcoding

b8da1fe

tacheometry marked this pull request as ready for review March 27, 2024 18:20

Use localization keys

811dcfc
