Transition VMAF to use nv-codec-headers instead of driver linking#1436
kylophone merged 1 commit into Netflix:master
Conversation
|
You can PR them to https://code.ffmpeg.org/FFmpeg/nv-codec-headers if you like, adding new driver functions is never a huge deal. You can most likely also get rid of the nvcc dependency by just using clang, like FFmpeg does. Practically every distribution of clang comes with nvptx support. You'll need to build a small header that defines all the interfaces you use though, but usually it's not all that many. |
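The "small header" mentioned above can be sketched along the lines of FFmpeg's compat/cuda/cuda_runtime.h. The sketch below is illustrative only — the macro names mirror FFmpeg's compat header, but the exact set of definitions libvmaf would need is an assumption:

```c
/* Hedged sketch of a minimal compat header for clang CUDA compilation,
 * loosely modeled on FFmpeg's compat/cuda/cuda_runtime.h. The coverage
 * libvmaf actually needs is an assumption. */
#ifndef COMPAT_CUDA_RUNTIME_H
#define COMPAT_CUDA_RUNTIME_H

#if defined(__clang__) && defined(__CUDA__)
/* clang understands these attributes when compiling CUDA source */
#define __global__ __attribute__((global))
#define __device__ __attribute__((device))
#define __shared__ __attribute__((shared))
#else
/* let host-side compilers parse the header without CUDA support */
#define __global__
#define __device__
#define __shared__
#endif

/* minimal stand-in for the dim3 type that nvcc normally provides */
typedef struct dim3 { unsigned int x, y, z; } dim3;

#endif /* COMPAT_CUDA_RUNTIME_H */
```

Each device intrinsic the kernels use would need a similar declaration, which is why the header stays small in practice.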
|
@BtbN would it be possible for you to help me with these changes? For me this contribution has to go through legal etc.; I am working with the Aachen DevTech team :)
|
Something I just noticed: You are still using cudart, aren't you? |
|
I am only querying for cudart to get the path to nvcc. But yes, you are right, I forgot to delete it.
|
Interesting, I'd have thought that adding cuda_rt_api_dependency to cuda_dependency, which ultimately ends up in the dependencies of the lib, would link against it. I might have misread the meson script then.
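For reference, the pattern under discussion looks roughly like this in meson — a hypothetical sketch, not libvmaf's actual meson.build; the variable and target names are only illustrative:

```meson
# Hypothetical sketch, not libvmaf's actual meson.build.
# Anything in this list becomes a link dependency of the library:
cuda_dependency = [cuda_drv_api_dependency]

# Appending the runtime API here would pull in a cudart link dependency:
# cuda_dependency += cuda_rt_api_dependency

libvmaf = library('vmaf', sources, dependencies: cuda_dependency)
```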
|
I do not understand meson very well :D I just relied on …
Made a PR here: https://code.ffmpeg.org/FFmpeg/nv-codec-headers/pulls/1. Let's review over on Forgejo. @gedoensmax Could you please test your libvmaf changes with the updated nv-codec-headers and let me know if everything is working? |
|
@kylophone I adopted the remaining functions from your patch - thanks for the quick turnaround. @BtbN Also thanks for noting that there was still a dependency on CUDA RT; it was indeed introduced accidentally, and the build was failing without it since, from debugging, it was still adding the …
|
I managed to use clang to compile all CUDA code, but I did not manage to compile without the CUDA toolkit present, since a lot of device intrinsics are not defined in the ffmpeg cuda_runtime headers. NVCC compilation is much more reliable and gives a better user experience, since no PTX compilation at runtime is needed. I also updated the Dockerfiles so that they will work as shown below; to explicitly disable GPU support one can also use the …
|
The main advantage of using clang to compile to PTX code is that you can build for a really old SM if you don't need any more modern features, and the resulting PTX code will work with a wide range of drivers. The latest nvcc, by contrast, can only compile for sm75 and up, locking out all GPUs older than the RTX 2000 series, even though older ones could easily run the kernels.
|
@BtbN I get the reasoning behind clang, which is why I made it easy to switch to clang compilation. Due to the PTX compilation time at startup I decided to keep NVCC as the default; especially inside a container, that would otherwise be an issue.
|
@gedoensmax If this is ready to merge, please squash and I can push. |
Force-pushed 6acf6d2 to 565ac41
|
All squashed and ready to be merged. |
Force-pushed ab24ad7 to ee7b952
|
Sorry for the few force pushes - I introduced a problem with the NVTX compile. It is now possible to compile with NVTX enabled but CUDA disabled. That could help with CPU time measurement, for example, or with mixed CUDA and CPU profiling.
Force-pushed ee7b952 to 7a016d6
|
@gedoensmax Let me know when this is ready. No rush, I will wait for your ping. |
|
@kylophone It is ready. I just noticed some minor things on the same day, which I fixed - hence the force pushes. Otherwise I am happy to merge this.
… clang compilation

Details:
- updated docker docs to no longer require separate CUDA container
- enable CUDA compilation using clang (CUDA Toolkit libs and headers still required)
- use nv-codec-headers for runtime loading (thanks @BtbN)
- remove redundant CUDA event recreation
Force-pushed 685124b to 9b0ad54
This still requires NVCC to be installed for now, but since the CUDA driver is loaded dynamically, I believe it would at least enable shipping a CUDA prebuilt library with ffmpeg by default.
@kylophone Would it be possible to build this setup in the current CI?
@BtbN To dynamically load the CUDA driver I relied on your nv-codec-headers, but they are missing some driver functions we have been using in VMAF. Would it be possible to add these?
While there is no graph usage so far, it would be great to have CUDA graph support as well, though that seems to involve quite a few functions.
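The dynamic loading that nv-codec-headers performs can be sketched in plain C with dlopen/dlsym. This is a minimal illustration, not the actual generated loader: try_load_driver is a hypothetical helper, while cuInit and libcuda.so.1 are the real driver API symbol and library name.

```c
#include <dlfcn.h>

/* CUDA driver API entry point, resolved at runtime (real symbol name). */
typedef int (*cuInit_t)(unsigned int flags);

/* Hypothetical helper sketch: return a status string instead of failing
 * hard, so a prebuilt binary still starts on machines with no NVIDIA
 * driver installed — the point of loading the driver dynamically. */
static const char *try_load_driver(void)
{
    void *lib = dlopen("libcuda.so.1", RTLD_LAZY);
    if (!lib)
        return "no-driver";        /* CUDA code paths stay disabled */

    cuInit_t cu_init = (cuInit_t)dlsym(lib, "cuInit");
    if (!cu_init) {
        dlclose(lib);
        return "symbol-missing";
    }
    return cu_init(0) == 0 ? "ok" : "init-failed";
}
```

With this pattern the hard link dependency on the driver disappears, which is what allows shipping one prebuilt library regardless of whether the target machine has an NVIDIA driver.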