CUDA feature extraction #1080

kylophone · 2022-08-03T18:53:21Z

Initial import for CUDA feature extraction on GPU. Three new feature extractors are implemented:

integer_adm_cuda.c
integer_motion_cuda.c
integer_vif_cuda.c

Speed-up depends on hardware, but in general the results show a large improvement. For example:

# 1920x1080, Tesla T4
VMAF version 9474f69c
500 frames ⠀⢐ 324.41 FPS

libcuda is required to build. Use meson -Denable_cuda=true to enable.

Co-authored-by: Roman Arzumanyan <rarzumanyan@nvidia.com>
Co-authored-by: Sven Middelberg <smiddelberg@nvidia.com>
Co-authored-by: Cem Moluluo <cmoluluo@nvidia.com>
Co-authored-by: Maximilian Mueller <maximilianm@nvidia.com>
Co-authored-by: Kyle Swanson <kswanson@netflix.com>
Co-authored-by: Markus Tavenrath <matavenrath@nvidia.com>

libvmaf/meson_options.txt

1480c1

That's a lot of code, congrats

libvmaf/src/cuda/common.h

libvmaf/src/cuda/cuda_helper.cuh

rarzumanyan · 2022-08-19T09:00:29Z

@kylophone @gedoensmax @1480c1:

ninja -vC build install fails with an error: it cannot find version.h during the install phase.

How to repro:

git clone https://github.com/Netflix/vmaf.git
cd vmaf
git checkout cuda_feature_extraction

python3 -m pip install virtualenv
python3 -m virtualenv .venv
source .venv/bin/activate
pip install meson
sudo apt install nasm ninja-build doxygen xxd

cd libvmaf
mkdir -p build
export PATH=/usr/local/cuda/bin:$PATH
meson -Denable_cuda=true build --buildtype release --prefix $(pwd)/build

ninja -vC build
# FAILS HERE #
ninja -vC build install

Error message:

Installing src/libvmaf.so.1.1.3 to /home/roman/Git/vmaf/libvmaf/build/lib/x86_64-linux-gnu
Installing src/libvmaf.a to /home/roman/Git/vmaf/libvmaf/build/lib/x86_64-linux-gnu
Installing tools/vmaf to /home/roman/Git/vmaf/libvmaf/build/bin
Installing /home/roman/Git/vmaf/libvmaf/include/libvmaf/libvmaf.h to /home/roman/Git/vmaf/libvmaf/build/include/libvmaf/
Installing /home/roman/Git/vmaf/libvmaf/include/libvmaf/compute_vmaf.h to /home/roman/Git/vmaf/libvmaf/build/include/libvmaf/
Installing /home/roman/Git/vmaf/libvmaf/include/libvmaf/feature.h to /home/roman/Git/vmaf/libvmaf/build/include/libvmaf/
Installing /home/roman/Git/vmaf/libvmaf/include/libvmaf/model.h to /home/roman/Git/vmaf/libvmaf/build/include/libvmaf/
Installing /home/roman/Git/vmaf/libvmaf/include/libvmaf/picture.h to /home/roman/Git/vmaf/libvmaf/build/include/libvmaf/
Installing /home/roman/Git/vmaf/libvmaf/build/include/libvmaf/version.h to /home/roman/Git/vmaf/libvmaf/build/include/libvmaf
Traceback (most recent call last):
  File "/home/roman/Git/vmaf/libvmaf/.venv/lib/python3.8/site-packages/mesonbuild/mesonmain.py", line 153, in run
    return options.run_func(options)
  File "/home/roman/Git/vmaf/libvmaf/.venv/lib/python3.8/site-packages/mesonbuild/minstall.py", line 766, in run
    installer.do_install(datafilename)
  File "/home/roman/Git/vmaf/libvmaf/.venv/lib/python3.8/site-packages/mesonbuild/minstall.py", line 546, in do_install
    self.install_headers(d, dm, destdir, fullprefix)
  File "/home/roman/Git/vmaf/libvmaf/.venv/lib/python3.8/site-packages/mesonbuild/minstall.py", line 639, in install_headers
    if self.do_copyfile(fullfilename, outfilename, makedirs=(dm, outdir)):
  File "/home/roman/Git/vmaf/libvmaf/.venv/lib/python3.8/site-packages/mesonbuild/minstall.py", line 424, in do_copyfile
    self.copy2(from_file, to_file)
  File "/home/roman/Git/vmaf/libvmaf/.venv/lib/python3.8/site-packages/mesonbuild/minstall.py", line 324, in copy2
    shutil.copy2(*args, **kwargs)
  File "/usr/lib/python3.8/shutil.py", line 435, in copy2
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/usr/lib/python3.8/shutil.py", line 264, in copyfile
    with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: '/home/roman/Git/vmaf/libvmaf/build/include/libvmaf/version.h'

ERROR: Unhandled python exception
No such file or directory - (2, 'No such file or directory')
this is probably not a Meson bug.
FAILED: meson-internal__install 
/home/roman/Git/vmaf/libvmaf/.venv/bin/meson install --no-rebuild
ninja: build stopped: subcommand failed.

However, there's a version.h.in file which (I assume) is used by the install script to generate version.h.

kylophone · 2022-08-22T15:31:31Z

meson -Denable_cuda=true build --buildtype release --prefix $(pwd)/build

ninja -vC build
# FAILS HERE #
ninja -vC build install

So it looks like it is building just fine, but you are not able to install. Your install prefix is the same as your build directory, which to me seems like it would be a problem. If you want a local install, try --prefix $(pwd)/install and that should work.

rarzumanyan · 2022-08-23T11:30:21Z

@kylophone

Thanks a lot, that indeed solved the issue.

Now I'm facing another:
To use CUDA-accelerated libvmaf in my application I have to call vmaf_cuda_init(), which is declared in cuda.h. IMO that isn't the best name, because the cuda.h header is part of the CUDA SDK and libvmaf/cuda.h itself includes cuda.h. Could you please rename it? Any meaningful unique name will do, e.g. vmaf_cuda.h.

Also, libvmaf/cuda.h isn't installed by meson to $(prefix)/include/libvmaf, so it is not accessible to applications linked against libvmaf.

gedoensmax · 2022-08-23T12:17:29Z

Some of these issues are fixed by PR #1086. I did not look into the issue with meson install target.

kylophone · 2022-08-23T16:08:42Z

Some of these issues are fixed by PR #1086. I did not look into the issue with meson install target.

Ope, I didn't see this PR in time, so I addressed header name and installation with 8ab1119

fyi @rarzumanyan, you should be unblocked now

rarzumanyan · 2022-08-23T16:55:43Z

@kylophone
One more thing, now regarding the HAVE_CUDA macro:

It's defined in the meson file and passed to the compiler as an additional symbol:

vmaf/libvmaf/src/meson.build

Lines 164 to 167 in 8ab1119

    
    if is_cuda_enabled
        add_languages('cuda')
        cdata.set10('HAVE_CUDA', is_cuda_enabled)
    endif

But it's not written into any of the header files, so when a 3rd-party application includes the libvmaf headers, HAVE_CUDA is undefined and the preprocessor evaluates it as 0:

In file included from /path/to/my/project/calc_vmaf.c:128:
/home/roman/Git/vmaf/libvmaf/install/include/libvmaf/vmaf_cuda.h:24:5: warning: "HAVE_CUDA" is not defined, evaluates to 0 [-Wundef]
   24 | #if HAVE_CUDA
      |     ^~~~~~~~~

Could you please make this macro definition available to 3rd-party applications? I see 2 approaches to this:

1. Write the macro definition into a header file during the meson build and then include this auto-generated header, e.g. in the mentioned vmaf_cuda.h.
2. Modify the libvmaf.pc pkgconfig file with more cflags:

Cflags: -I${includedir} -I${includedir}/libvmaf -DHAVE_CUDA=1
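
A minimal sketch of how either approach looks from the application side. It assumes a generated header would carry a 0/1 definition of HAVE_CUDA; that definition is inlined here so the example is self-contained. The helper name use_cuda is hypothetical and not part of libvmaf.

```c
#include <assert.h>

/* Stand-in for an auto-generated header (hypothetical). A CUDA-enabled
 * build would record 1 here; passing -DHAVE_CUDA=1 through Cflags, as in
 * the second approach, has the same effect on this file. */
#ifndef HAVE_CUDA
#define HAVE_CUDA 0
#endif

/* Hypothetical application helper: returns 1 only when the CUDA path is
 * both compiled in and requested by the caller. HAVE_CUDA is always
 * defined at this point, so "#if HAVE_CUDA" does not trigger -Wundef. */
static int use_cuda(int requested)
{
    assert(requested == 0 || requested == 1);
#if HAVE_CUDA
    return requested;
#else
    return 0;
#endif
}
```

Compiled without any extra flags, use_cuda(1) returns 0. Compiled with -DHAVE_CUDA=1, it returns 1.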

BTW there's a hard-coded macro redefinition in these files:

vmaf/libvmaf/src/feature/integer_adm_cuda.c

Line 22 in 8ab1119

#define HAVE_CUDA

vmaf/libvmaf/test/test_cuda_pic_preallocation.c

Line 21 in 8ab1119

#define HAVE_CUDA 1

vmaf/libvmaf/test/test_ring_buffer.c

Line 19 in 8ab1119

#define HAVE_CUDA 1
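
The two spellings above are not interchangeable when a header tests the macro with #if rather than #ifdef. A small sketch of the difference, using stand-in macro names (FLAG_EMPTY and FLAG_ONE are made up for illustration) so that it compiles regardless of how HAVE_CUDA is set:

```c
#include <assert.h>

#define FLAG_EMPTY      /* no value, like "#define HAVE_CUDA" */
#define FLAG_ONE 1      /* value 1, like "#define HAVE_CUDA 1" */

/* #ifdef only asks whether the name is defined, so an empty macro passes. */
#ifdef FLAG_EMPTY
#define EMPTY_IS_DEFINED 1
#else
#define EMPTY_IS_DEFINED 0
#endif

/* A bare "#if FLAG_EMPTY" is rejected by the preprocessor because no
 * expression is left after expansion. Appending "+ 0" is a common idiom
 * that makes the empty case evaluate to 0 instead. */
#if (FLAG_EMPTY + 0)
#define EMPTY_IS_TRUE 1
#else
#define EMPTY_IS_TRUE 0
#endif

/* A macro defined as 1 works with a plain #if. */
#if FLAG_ONE
#define ONE_IS_TRUE 1
#else
#define ONE_IS_TRUE 0
#endif

/* Runtime check of the three outcomes described in the comments above. */
static void check_flags(void)
{
    assert(EMPTY_IS_DEFINED == 1);
    assert(EMPTY_IS_TRUE == 0);
    assert(ONE_IS_TRUE == 1);
}
```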

kylophone · 2022-08-23T17:13:04Z

I think I will just remove the HAVE_CUDA guard from the public header altogether, since it shouldn't be part of the libvmaf API IMO. I don't think there should be any issues with that. Let me know what you think, @rarzumanyan.

rarzumanyan · 2022-08-23T17:36:16Z

I think I will just remove the HAVE_CUDA guard from the public header altogether, since it shouldn't be part of the libvmaf API IMO. I don't think there should be any issues with that. Let me know what you think, @rarzumanyan.

IMO that may have licensing implications for 3rd-party applications compiled against libvmaf headers. In meson we only add source files, so one would have to be very careful not to include CUDA by accident via headers. Since the CUDA SDK is proprietary, I think it's better to keep it surrounded by a macro guard.

I lean towards keeping the HAVE_CUDA macro guard and adding -DHAVE_CUDA=1 cflag to pkgconfig file. At least this is what I'm doing in my test application and it doesn't cause any compilation issues.

kylophone · 2022-08-24T18:12:58Z

IMO that may have licensing implications for 3rd-party applications compiled against libvmaf headers. In meson we only add source files, so one would have to be very careful not to include CUDA by accident via headers. Since the CUDA SDK is proprietary, I think it's better to keep it surrounded by a macro guard.

I believe we avoid that problem at build time already. Setting -Denable_cuda conditionally controls both compilation and header installation.

gedoensmax · 2022-10-27T08:11:12Z

@kylophone what else is needed to merge this into master? We would love to finish this up.

kylophone · 2022-10-27T17:22:15Z

@kylophone what else is needed to merge this into master? We would love to finish this up.

I'll send you a mail.

bluedot-io · 2022-11-01T04:29:29Z

Hello.
I am building the GPU version of libvmaf on an AWS g4dn instance, following the steps in this comment:
#1080 (comment)

An error occurs when running "ninja -vC build":

[13/180] /home/ec2-user/vmaf/.venv/bin/meson --internal vcstagger ../include/vcs_version.h.in include/vcs_version.h 2.3.1 /home/ec2-user/vmaf/libvmaf/include @VCS_TAG@ '(.*)' /usr/bin/git --git-dir /home/ec2-user/vmaf/libvmaf/../.git describe --tags --long --match '?.*.*' --always
[14/180] cc -Isrc/libx86_avx2.a.p -Isrc -I../src -I../src/feature/common -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Wextra -std=c11 -O3 -D_GNU_SOURCE -fPIC -mavx -mavx2 -pedantic -DOC_NEW_STYLE_INCLUDES -MD -MQ src/libx86_avx2.a.p/feature_x86_adm_avx2.c.o -MF src/libx86_avx2.a.p/feature_x86_adm_avx2.c.o.d -o src/libx86_avx2.a.p/feature_x86_adm_avx2.c.o -c ../src/feature/x86/adm_avx2.c
[15/180] nvcc -Isrc/libcuda_common_vmaf_lib.a.p -Xcompiler=-Wall,-Winvalid-pch,-Wnon-virtual-dtor,-Wextra -O3 -Xcompiler=-fPIC -I/usr/local/cuda/include -I../src/cuda -I../src/feature -I../src/cuda -I../src/feature/common -I../src -Isrc -I../include -Iinclude -I../src -Isrc -Isrc/libcuda_common_vmaf_lib.a.p -o src/libcuda_common_vmaf_lib.a.p/cuda_integer_adm_adm_csf.cu.o -c ../src/cuda/integer_adm/adm_csf.cu
FAILED: src/libcuda_common_vmaf_lib.a.p/cuda_integer_adm_adm_csf.cu.o
nvcc -Isrc/libcuda_common_vmaf_lib.a.p -Xcompiler=-Wall,-Winvalid-pch,-Wnon-virtual-dtor,-Wextra -O3 -Xcompiler=-fPIC -I/usr/local/cuda/include -I../src/cuda -I../src/feature -I../src/cuda -I../src/feature/common -I../src -Isrc -I../include -Iinclude -I../src -Isrc -Isrc/libcuda_common_vmaf_lib.a.p -o src/libcuda_common_vmaf_lib.a.p/cuda_integer_adm_adm_csf.cu.o -c ../src/cuda/integer_adm/adm_csf.cu
../src/cuda/integer_adm/adm_csf.cu(69): warning: integer conversion resulted in a change of sign

../src/cuda/integer_adm/adm_csf.cu(28): error: calling a __host__ function("__builtin_assume_aligned") from a __device__ function("copy_vec_4<(int)4> ") is not allowed

../src/cuda/integer_adm/adm_csf.cu(29): error: calling a __host__ function("__builtin_assume_aligned") from a __device__ function("copy_vec_4<(int)4> ") is not allowed

../src/cuda/integer_adm/adm_csf.cu(41): error: calling a __host__ function("__builtin_assume_aligned") from a __device__ function("copy_vec_4<(int)4> ") is not allowed

../src/cuda/integer_adm/adm_csf.cu(42): error: calling a __host__ function("__builtin_assume_aligned") from a __device__ function("copy_vec_4<(int)4> ") is not allowed

4 errors detected in the compilation of "../src/cuda/integer_adm/adm_csf.cu".

This is my full log.
libvmaf_gpu_instal_error.txt

(.venv) [ec2-user@ip-172-31-91-240 libvmaf]$ cat /etc/*release*
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
Amazon Linux release 2 (Karoo)
cpe:2.3:o:amazon:amazon_linux:2

(.venv) [ec2-user@ip-172-31-91-240 libvmaf]$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Tue_Sep_15_19:10:02_PDT_2020
Cuda compilation tools, release 11.1, V11.1.74
Build cuda_11.1.TC455_06.29069683_0
(.venv) [ec2-user@ip-172-31-91-240 libvmaf]$

Can I get help?

gedoensmax · 2022-11-01T11:20:22Z

@bluedot-io To me this looks a lot like a bug in nvcc. Could you try upgrading to a more recent version? Another option is to just comment out the __builtin_assume_aligned statements for now. Those are performance-related primitives and should only affect the runtime a little.
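
Instead of commenting the statements out by hand, the hint could be routed through a macro that can be switched off at build time. The sketch below is plain C for host compilers and is not taken from the libvmaf sources; ASSUME_ALIGNED, DISABLE_ASSUME_ALIGNED and copy4_aligned are hypothetical names. In a .cu file the result of the builtin would additionally need a cast, because C++ does not convert void * implicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper: expands to the GCC/Clang alignment hint when it is
 * available and not disabled, otherwise to the unchanged pointer. */
#if defined(DISABLE_ASSUME_ALIGNED) || !(defined(__GNUC__) || defined(__clang__))
#define ASSUME_ALIGNED(ptr, n) (ptr)
#else
#define ASSUME_ALIGNED(ptr, n) __builtin_assume_aligned((ptr), (n))
#endif

/* Copies four 32-bit values. Both pointers must be 16-byte aligned; the
 * hint only informs the optimizer and does not change the result. */
static void copy4_aligned(int32_t *dst, const int32_t *src)
{
    assert((uintptr_t)dst % 16 == 0 && (uintptr_t)src % 16 == 0);
    int32_t *d = ASSUME_ALIGNED(dst, 16);
    const int32_t *s = ASSUME_ALIGNED(src, 16);
    for (size_t i = 0; i < 4; i++)
        d[i] = s[i];
}
```

Building with -DDISABLE_ASSUME_ALIGNED removes the builtin everywhere without touching the call sites, which keeps the original code intact for compilers that accept it.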

bluedot-io · 2022-11-01T12:18:05Z

@gedoensmax
Thank you for your reply :)
Can you recommend a version of nvcc that is known to build successfully?
I would like the build to succeed without modifying the original code.

gedoensmax · 2022-11-01T12:40:58Z

I can recommend 11.6. Just use docker run --gpus all --rm -v $PWD:/src -w /src -it nvidia/cuda:11.6.1-devel-ubuntu18.04 bash and run the commands below. That worked for me.

apt-get update 
apt-get -y install nasm ninja-build doxygen xxd python3.7 python3-pip python3-venv
python3.7 -m pip install venv
python3.7 -m venv /venv
source /venv/bin/activate
python3.7 -m pip install meson
cd /src

mkdir -p install
export PATH=/usr/local/cuda/bin:$PATH
rm -rf ./libvmaf/build
meson libvmaf/build libvmaf -Denable_cuda=true --buildtype release --prefix $(pwd)/install
ninja -vC libvmaf/build 
ninja -vC libvmaf/build  install

bluedot-io · 2022-11-07T09:27:51Z

@gedoensmax
I installed it successfully thanks to your installation steps.
We measured FPS for 4K (500 frames) and 2K (2000 frames).

======== CPU Run
<4K>

(venv) root@d764404a4d1d:/src# ./vmaf/install/bin/vmaf -r ./ref_500_yuv420p.yuv -d ./dis_500_yuv420p.yuv -w 3840 -h 2160 -p 420 -b 8
VMAF version 428f5c82
500 frames ⠀⢐ 2.64 FPS
vmaf_v0.6.1: 97.428043

<2K>

(venv) root@d764404a4d1d:/src# ./vmaf/install/bin/vmaf -r ./ref_500_yuv420p.yuv -d ./dis_500_yuv420p.yuv -w 1920 -h 1080 -p 420 -b 8
VMAF version 428f5c82
2000 frames ⠉⠙ 10.52 FPS
vmaf_v0.6.1: 97.428043

========= GPU Run
<4K>

ubuntu@ip-172-31-24-61:~$ ./vmaf/install/bin/vmaf --cuda -r ./ref_500_yuv420p.yuv -d ./dis_500_yuv420p.yuv -w 3840 -h 2160 -p 420 -b 8
VMAF version 428f5c82
500 frames ⠀⢐ 69.71 FPS
vmaf_v0.6.1: 97.428043

<2K>

ubuntu@ip-172-31-24-61:~$ ./vmaf/install/bin/vmaf --cuda -r ./ref_500_yuv420p.yuv -d ./dis_500_yuv420p.yuv -w 1920 -h 1080 -p 420 -b 8
VMAF version 428f5c82
2000 frames ⠉⠙ 287.30 FPS
vmaf_v0.6.1: 97.428043

I tested it on an AWS g4dn.2xlarge.
The FPS results of the GPU port are very impressive.
Nice job!! Thank you!
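
The FPS figures above translate into speed-up factors by simple division. A small helper, with a hypothetical name, makes the arithmetic explicit:

```c
#include <assert.h>

/* Ratio of GPU throughput to CPU throughput. Both arguments are frame
 * rates in FPS and must be positive. */
static double speedup(double gpu_fps, double cpu_fps)
{
    assert(gpu_fps > 0.0 && cpu_fps > 0.0);
    return gpu_fps / cpu_fps;
}
```

With the numbers reported here, speedup(69.71, 2.64) is about 26.4 for the 4K run and speedup(287.30, 10.52) is about 27.3 for the 2K run.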

gedoensmax · 2022-11-09T07:24:35Z

@bluedot-io It is likely that the speedup will be even more impressive when using ffmpeg, hopefully in the near future. With the way the vmaf tool is implemented, we noticed that file I/O cannot keep up with a fast GPU. E.g. on my RTX6000 a profile looks like this:

[profiler screenshot]

The highlighted part depicts the GPU processing time of the CUDA kernels, and the CUDA API section shows how much time the CPU takes to launch those kernels. In the OS runtime section we can see how much time fread takes up.
FFmpeg will solve this issue and also enable decoding to GPU memory directly.

bluedot-io · 2022-11-10T02:04:38Z

@gedoensmax
Thank you for the detailed description.
Wow.. so currently the "./vmaf" executable cannot hide the processing time of file I/O.

File I/O is pipelined in ffmpeg, so faster fps results are expected.
In addition, since NVCodec is included in the GPU, the decoding time and memcpy time in GPU can also be reduced.
Overall, I'm looking forward to seeing how much the processing speed will improve.
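
Why overlapping file I/O with GPU work helps can be seen in a simplified steady-state model: run back to back, each frame costs the read time plus the compute time; pipelined, it costs only the slower of the two. The function names and the timings in the usage note are illustrative assumptions, not measurements from this thread.

```c
#include <assert.h>

/* Frames per second when the read and the GPU work for a frame run one
 * after the other. Arguments are seconds per frame. */
static double serial_fps(double io_s, double gpu_s)
{
    assert(io_s > 0.0 && gpu_s > 0.0);
    return 1.0 / (io_s + gpu_s);
}

/* Frames per second when the read of the next frame overlaps the GPU work
 * on the current one: the slower stage sets the pace. */
static double pipelined_fps(double io_s, double gpu_s)
{
    assert(io_s > 0.0 && gpu_s > 0.0);
    double slowest = io_s > gpu_s ? io_s : gpu_s;
    return 1.0 / slowest;
}
```

For example, with a made-up 4 ms read and 1 ms of GPU work per frame, serial_fps(0.004, 0.001) gives about 200 FPS and pipelined_fps(0.004, 0.001) gives about 250 FPS. The pipelined rate is still bounded by the read, which is why decoding directly into GPU memory matters as well.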

I expect vmaf on GPU to come soon in ffmpeg!!

alexdns1 · 2022-12-28T18:02:25Z

@gedoensmax @kylophone is there any update on your ffmpeg vmaf work ?

HunterAP23 · 2022-12-28T19:30:14Z

@gedoensmax @kylophone is there any update on your ffmpeg vmaf work ?

It's been 15 days since their last commit - give them time to work on it. People may have a lot more going on in their lives, especially with the holidays, that would take their attention away. Just subscribe to this PR and you'll see when an update comes up.

alexdns1 · 2022-12-28T20:19:45Z

@HunterAP23 hey, I'm not rushing them, don't get me wrong :)

kylophone · 2023-01-07T23:31:38Z

Replaced by #1152.

1480c1 reviewed Aug 3, 2022

libvmaf/meson_options.txt (outdated, resolved)

1480c1 reviewed Aug 3, 2022

libvmaf/src/cuda/common.h (outdated, resolved)

libvmaf/src/cuda/cuda_helper.cuh (outdated, resolved)

libvmaf/src/cuda/cuda_helper.cuh (outdated, resolved)

kylophone added 2 commits August 9, 2022 09:53

meson_options: enable_cuda default to false

773162d

meson: fix cuda detection

c33c9cf

fomat changes and adding header guards

add9990

rename libvmaf cuda header, install

8ab1119

gedoensmax added 3 commits August 24, 2022 12:03

fix: input value checks in ring_buffer cuda common

b930977

enable dynamic choice of GPU processing

118a566

removing hard coded HAVE_CUDA

d6e7da2

gedoensmax and others added 7 commits September 18, 2022 13:11

introducing a gpumask for dynamic FEX selection

815ce0d

api: do not export or require HAVE_CUDA macro

a2d09dc

api: introduce vmaf_cuda_state.h

2fdb7e6

api: do not export or require HAVE_CUDA macro

d4d9d4f

api: introduce vmaf_cuda_state.h

fc05e67

Merge branch 'cuda_feature_extraction' into cuda_feature_extraction

71d1fcd

fix: alignemnt misspecified

428f5c8

gedoensmax mentioned this pull request Nov 11, 2022

How to use Nvidia GPU to accelerate VMAF measuring? #535

Closed

gedoensmax and others added 9 commits November 22, 2022 14:17

motion_cuda: change to driver API kernel launch

0a31d62

removed CUDA runtime dependency

8a546fb

removed unneeded header files

9464a63

minor code cleanup

1bc51f4

libvmaf/cuda: gpumask logic, enable parallel cpu/gpu feature extraction

7a07896

fix cpu-only build

2f46b48

cuda/common: cleanup

cbb2f81

s/CudaVmafBuffer/VmafCudaBuffer

a8a528c

vmaf_get_feature_extractor_by_feature_name: fix header

03ee30c

kylophone mentioned this pull request Jan 7, 2023

CUDA feature extraction #1152

Merged

kylophone closed this Jan 7, 2023

teb mentioned this pull request Mar 8, 2023

Add support for VMAF Cuda processing? fifonik/FFMetrics#114

Open

Mlypse mentioned this pull request Jul 17, 2023

ninja build fails when i enable_cuda #1191

Closed
