CUDA feature extraction #1080

Closed
wants to merge 24 commits into from

kylophone
Collaborator

@kylophone kylophone commented Aug 3, 2022

Initial import for CUDA feature extraction on GPU. Three new feature extractors are implemented:

  • integer_adm_cuda.c
  • integer_motion_cuda.c
  • integer_vif_cuda.c

The speed-up depends on hardware, but in general the results show a large improvement. For example:

# 1920x1080, Tesla T4
VMAF version 9474f69c
500 frames ⠀⢐ 324.41 FPS

libcuda is required to build. Pass -Denable_cuda=true to meson to enable CUDA support.

Co-authored-by: Roman Arzumanyan <rarzumanyan@nvidia.com>
Co-authored-by: Sven Middelberg <smiddelberg@nvidia.com>
Co-authored-by: Cem Moluluo <cmoluluo@nvidia.com>
Co-authored-by: Maximilian Mueller <maximilianm@nvidia.com>
Co-authored-by: Kyle Swanson <kswanson@netflix.com>
Co-authored-by: Markus Tavenrath <matavenrath@nvidia.com>
Contributor

@1480c1 1480c1 left a comment

That's a lot of code, congrats

@rarzumanyan
Contributor

@kylophone @gedoensmax @1480c1:

ninja -vC build install fails with an error: it cannot find version.h during the install phase.

How to repro:

git clone https://github.com/Netflix/vmaf.git
cd vmaf
git checkout cuda_feature_extraction

python3 -m pip install virtualenv
python3 -m virtualenv .venv
source .venv/bin/activate
pip install meson
sudo apt install nasm ninja-build doxygen xxd

cd libvmaf
mkdir -p build
export PATH=/usr/local/cuda/bin:$PATH
meson -Denable_cuda=true build --buildtype release --prefix $(pwd)/build

ninja -vC build
# FAILS HERE #
ninja -vC build install

Error message:

Installing src/libvmaf.so.1.1.3 to /home/roman/Git/vmaf/libvmaf/build/lib/x86_64-linux-gnu
Installing src/libvmaf.a to /home/roman/Git/vmaf/libvmaf/build/lib/x86_64-linux-gnu
Installing tools/vmaf to /home/roman/Git/vmaf/libvmaf/build/bin
Installing /home/roman/Git/vmaf/libvmaf/include/libvmaf/libvmaf.h to /home/roman/Git/vmaf/libvmaf/build/include/libvmaf/
Installing /home/roman/Git/vmaf/libvmaf/include/libvmaf/compute_vmaf.h to /home/roman/Git/vmaf/libvmaf/build/include/libvmaf/
Installing /home/roman/Git/vmaf/libvmaf/include/libvmaf/feature.h to /home/roman/Git/vmaf/libvmaf/build/include/libvmaf/
Installing /home/roman/Git/vmaf/libvmaf/include/libvmaf/model.h to /home/roman/Git/vmaf/libvmaf/build/include/libvmaf/
Installing /home/roman/Git/vmaf/libvmaf/include/libvmaf/picture.h to /home/roman/Git/vmaf/libvmaf/build/include/libvmaf/
Installing /home/roman/Git/vmaf/libvmaf/build/include/libvmaf/version.h to /home/roman/Git/vmaf/libvmaf/build/include/libvmaf
Traceback (most recent call last):
  File "/home/roman/Git/vmaf/libvmaf/.venv/lib/python3.8/site-packages/mesonbuild/mesonmain.py", line 153, in run
    return options.run_func(options)
  File "/home/roman/Git/vmaf/libvmaf/.venv/lib/python3.8/site-packages/mesonbuild/minstall.py", line 766, in run
    installer.do_install(datafilename)
  File "/home/roman/Git/vmaf/libvmaf/.venv/lib/python3.8/site-packages/mesonbuild/minstall.py", line 546, in do_install
    self.install_headers(d, dm, destdir, fullprefix)
  File "/home/roman/Git/vmaf/libvmaf/.venv/lib/python3.8/site-packages/mesonbuild/minstall.py", line 639, in install_headers
    if self.do_copyfile(fullfilename, outfilename, makedirs=(dm, outdir)):
  File "/home/roman/Git/vmaf/libvmaf/.venv/lib/python3.8/site-packages/mesonbuild/minstall.py", line 424, in do_copyfile
    self.copy2(from_file, to_file)
  File "/home/roman/Git/vmaf/libvmaf/.venv/lib/python3.8/site-packages/mesonbuild/minstall.py", line 324, in copy2
    shutil.copy2(*args, **kwargs)
  File "/usr/lib/python3.8/shutil.py", line 435, in copy2
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/usr/lib/python3.8/shutil.py", line 264, in copyfile
    with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: '/home/roman/Git/vmaf/libvmaf/build/include/libvmaf/version.h'

ERROR: Unhandled python exception
No such file or directory - (2, 'No such file or directory')
this is probably not a Meson bug.
FAILED: meson-internal__install 
/home/roman/Git/vmaf/libvmaf/.venv/bin/meson install --no-rebuild
ninja: build stopped: subcommand failed.

However, there is a version.h.in file which (I assume) the install script uses to generate version.h.

@kylophone
Collaborator Author

kylophone commented Aug 22, 2022

> meson -Denable_cuda=true build --buildtype release --prefix $(pwd)/build
>
> ninja -vC build
> # FAILS HERE #
> ninja -vC build install

So it looks like it is building just fine, but you are not able to install. Your install prefix is the same as your build directory, which to me seems like it would be a problem. If you want a local install, try --prefix $(pwd)/install and that should work.

@rarzumanyan
Contributor

rarzumanyan commented Aug 23, 2022

@kylophone

Thanks a lot, that indeed solved the issue.

Now I'm facing another:
To use CUDA-accelerated libvmaf in my application I have to call vmaf_cuda_init(), which is declared in cuda.h. IMO that isn't the best name, because a cuda.h header is already part of the CUDA SDK, and libvmaf/cuda.h itself includes cuda.h. Could you please rename it? Any meaningful unique name will do, e.g. vmaf_cuda.h.

The other point is that meson does not install libvmaf/cuda.h to $(prefix)/include/libvmaf, so it is not accessible to applications linked against libvmaf.

@gedoensmax
Contributor

Some of these issues are fixed by PR #1086. I did not look into the issue with meson install target.

@kylophone
Collaborator Author

kylophone commented Aug 23, 2022

> Some of these issues are fixed by PR #1086. I did not look into the issue with meson install target.

Ope, I didn't see this PR in time, so I addressed header name and installation with 8ab1119

fyi @rarzumanyan, you should be unblocked now

@rarzumanyan
Contributor

rarzumanyan commented Aug 23, 2022

@kylophone
One more thing, now regarding the HAVE_CUDA macro:

It's defined in the meson file and passed to the compiler as an additional symbol:

if is_cuda_enabled
    add_languages('cuda')
    cdata.set10('HAVE_CUDA', is_cuda_enabled)
endif

But it isn't written into any of the installed header files, so when a 3rd-party application includes the libvmaf headers, the compiler evaluates it as 0:

In file included from /path/to/my/project/calc_vmaf.c:128:
/home/roman/Git/vmaf/libvmaf/install/include/libvmaf/vmaf_cuda.h:24:5: warning: "HAVE_CUDA" is not defined, evaluates to 0 [-Wundef]
   24 | #if HAVE_CUDA
      |     ^~~~~~~~~

Could you please make this macro definition available to 3rd-party applications? I see 2 approaches to this:

  1. During the meson build, write the macro definition into a header file and then include this auto-generated header, e.g. in the vmaf_cuda.h mentioned above
  2. Add the definition to the Cflags in the libvmaf.pc pkgconfig file:
Cflags: -I${includedir} -I${includedir}/libvmaf -DHAVE_CUDA=1

BTW there's a hard-coded macro redefinition in these files:

#define HAVE_CUDA 1
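
As a minimal sketch of how a 3rd-party application could cope with the undefined macro on its own side (an assumption for illustration, not what this PR implements): give HAVE_CUDA an explicit default before any libvmaf header is included, so that -Wundef stays quiet and the value is 1 only when the build system passes -DHAVE_CUDA=1, e.g. through the pkg-config Cflags proposed above. The helper names built_with_cuda() and backend_name() are hypothetical.

```c
/* Application-side guard; none of this is part of libvmaf. */

/* HAVE_CUDA normally arrives from the build system, e.g. -DHAVE_CUDA=1.
 * When it is absent, default it to 0 so that "#if HAVE_CUDA" no longer
 * triggers -Wundef. */
#ifndef HAVE_CUDA
#define HAVE_CUDA 0
#endif

/* Hypothetical helper: 1 if this translation unit was built with
 * HAVE_CUDA enabled, 0 otherwise. */
static int built_with_cuda(void)
{
#if HAVE_CUDA
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: name of the backend the application would pick. */
static const char *backend_name(void)
{
    return built_with_cuda() ? "cuda" : "cpu";
}
```

Compiled without extra flags, built_with_cuda() returns 0; compiled with -DHAVE_CUDA=1 it returns 1.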

@kylophone
Collaborator Author

I think I will just remove the HAVE_CUDA guard from the public header altogether, since it shouldn't be part of the libvmaf API IMO. I don't think there should be any issues with that. Let me know what you think, @rarzumanyan.

@rarzumanyan
Contributor

> I think I will just remove the HAVE_CUDA guard from the public header altogether, since it shouldn't be part of the libvmaf API IMO. I don't think there should be any issues with that. Let me know what you think, @rarzumanyan.

IMO that may have licensing implications for 3rd-party applications compiled against libvmaf headers. In meson we only add source files, so one would have to be very careful not to include CUDA by accident via headers. Since the CUDA SDK is proprietary, I think it's better to keep it surrounded by a macro guard.

I lean towards keeping the HAVE_CUDA macro guard and adding -DHAVE_CUDA=1 cflag to pkgconfig file. At least this is what I'm doing in my test application and it doesn't cause any compilation issues.

@kylophone
Collaborator Author

> IMO that may have licensing implications for 3rd-party applications compiled against libvmaf headers. In meson we only add source files, so one would have to be very careful not to include CUDA by accident via headers. Since the CUDA SDK is proprietary, I think it's better to keep it surrounded by a macro guard.

I believe we avoid that problem at build time already. Setting -Denable_cuda conditionally controls both compilation and header installation.

@gedoensmax
Contributor

@kylophone what else will be needed to merge this into master? We would love to finish this up.

@kylophone
Collaborator Author

> @kylophone what else will be needed to merge this into master? We would love to finish this up.

I'll send you a mail.

@bluedot-io

bluedot-io commented Nov 1, 2022

Hello.
I am currently building the GPU version of libvmaf on an AWS g4dn instance, following the steps in this comment:
#1080 (comment)

After running "ninja -vC build", the following error occurred:

[13/180] /home/ec2-user/vmaf/.venv/bin/meson --internal vcstagger ../include/vcs_version.h.in include/vcs_version.h 2.3.1 /home/ec2-user/vmaf/libvmaf/include @VCS_TAG@ '(.*)' /usr/bin/git --git-dir /home/ec2-user/vmaf/libvmaf/../.git describe --tags --long --match '?.*.*' --always
[14/180] cc -Isrc/libx86_avx2.a.p -Isrc -I../src -I../src/feature/common -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Wextra -std=c11 -O3 -D_GNU_SOURCE -fPIC -mavx -mavx2 -pedantic -DOC_NEW_STYLE_INCLUDES -MD -MQ src/libx86_avx2.a.p/feature_x86_adm_avx2.c.o -MF src/libx86_avx2.a.p/feature_x86_adm_avx2.c.o.d -o src/libx86_avx2.a.p/feature_x86_adm_avx2.c.o -c ../src/feature/x86/adm_avx2.c
[15/180] nvcc -Isrc/libcuda_common_vmaf_lib.a.p -Xcompiler=-Wall,-Winvalid-pch,-Wnon-virtual-dtor,-Wextra -O3 -Xcompiler=-fPIC -I/usr/local/cuda/include -I../src/cuda -I../src/feature -I../src/cuda -I../src/feature/common -I../src -Isrc -I../include -Iinclude -I../src -Isrc -Isrc/libcuda_common_vmaf_lib.a.p -o src/libcuda_common_vmaf_lib.a.p/cuda_integer_adm_adm_csf.cu.o -c ../src/cuda/integer_adm/adm_csf.cu
FAILED: src/libcuda_common_vmaf_lib.a.p/cuda_integer_adm_adm_csf.cu.o
nvcc -Isrc/libcuda_common_vmaf_lib.a.p -Xcompiler=-Wall,-Winvalid-pch,-Wnon-virtual-dtor,-Wextra -O3 -Xcompiler=-fPIC -I/usr/local/cuda/include -I../src/cuda -I../src/feature -I../src/cuda -I../src/feature/common -I../src -Isrc -I../include -Iinclude -I../src -Isrc -Isrc/libcuda_common_vmaf_lib.a.p -o src/libcuda_common_vmaf_lib.a.p/cuda_integer_adm_adm_csf.cu.o -c ../src/cuda/integer_adm/adm_csf.cu
../src/cuda/integer_adm/adm_csf.cu(69): warning: integer conversion resulted in a change of sign

../src/cuda/integer_adm/adm_csf.cu(28): error: calling a __host__ function("__builtin_assume_aligned") from a __device__ function("copy_vec_4<(int)4> ") is not allowed

../src/cuda/integer_adm/adm_csf.cu(29): error: calling a __host__ function("__builtin_assume_aligned") from a __device__ function("copy_vec_4<(int)4> ") is not allowed

../src/cuda/integer_adm/adm_csf.cu(41): error: calling a __host__ function("__builtin_assume_aligned") from a __device__ function("copy_vec_4<(int)4> ") is not allowed

../src/cuda/integer_adm/adm_csf.cu(42): error: calling a __host__ function("__builtin_assume_aligned") from a __device__ function("copy_vec_4<(int)4> ") is not allowed

4 errors detected in the compilation of "../src/cuda/integer_adm/adm_csf.cu".

This is my full log.
libvmaf_gpu_instal_error.txt

(.venv) [ec2-user@ip-172-31-91-240 libvmaf]$ cat /etc/*release*
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
Amazon Linux release 2 (Karoo)
cpe:2.3:o:amazon:amazon_linux:2
(.venv) [ec2-user@ip-172-31-91-240 libvmaf]$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Tue_Sep_15_19:10:02_PDT_2020
Cuda compilation tools, release 11.1, V11.1.74
Build cuda_11.1.TC455_06.29069683_0
(.venv) [ec2-user@ip-172-31-91-240 libvmaf]$

Can I get help?

@gedoensmax
Contributor

@bluedot-io To me this looks a lot like a bug in nvcc. Could you try upgrading to a more recent version? Another option is to comment out the __builtin_assume_aligned statements for now. Those are performance-related hints and should only affect the runtime a little.
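
For anyone who prefers not to delete the hints by hand, one possible workaround is to hide them behind a macro that expands to a no-op while device code is compiled. The sketch below is an assumption, not the change made in this PR: ASSUME_ALIGNED is a hypothetical name, copy_vec_4 here is a plain-C stand-in for the templated device function named in the error log, and the 16-byte alignment is illustrative. __CUDA_ARCH__ is the macro nvcc defines during device compilation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper around the alignment hint.
 * - Device pass (nvcc defines __CUDA_ARCH__): drop the hint entirely.
 * - GCC/Clang host code: forward to __builtin_assume_aligned.
 * - Anything else: drop the hint. */
#if defined(__CUDA_ARCH__)
#define ASSUME_ALIGNED(ptr, n) (ptr)
#elif defined(__GNUC__) || defined(__clang__)
#define ASSUME_ALIGNED(ptr, n) __builtin_assume_aligned((ptr), (n))
#else
#define ASSUME_ALIGNED(ptr, n) (ptr)
#endif

/* Stand-in for the real kernel helper: copies four 32-bit values.
 * Both pointers must really be 16-byte aligned, otherwise the hint
 * makes the behaviour undefined. */
static void copy_vec_4(int32_t *dst, const int32_t *src)
{
    const int32_t *s = (const int32_t *)ASSUME_ALIGNED(src, 16);
    int32_t *d = (int32_t *)ASSUME_ALIGNED(dst, 16);

    for (size_t i = 0; i < 4; ++i)
        d[i] = s[i];
}
```

The guard is deliberately conservative: it removes the hint from every device pass, including on nvcc versions that accept it, so it trades a little performance for a build that succeeds.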

@bluedot-io

@gedoensmax
Thank you for your reply :)
Can you recommend a version of nvcc that builds successfully?
I want the build to succeed without modifying the original code.

@gedoensmax
Contributor

I can recommend 11.6. Just use docker run --gpus all --rm -v $PWD:/src -w /src -it nvidia/cuda:11.6.1-devel-ubuntu18.04 bash and run the commands below. That worked for me.

apt-get update 
apt-get -y install nasm ninja-build doxygen xxd python3.7 python3-pip python3-venv
python3.7 -m pip install venv
python3.7 -m venv /venv
source /venv/bin/activate
python3.7 -m pip install meson
cd /src

mkdir -p install
export PATH=/usr/local/cuda/bin:$PATH
rm -rf ./libvmaf/build
meson libvmaf/build libvmaf -Denable_cuda=true --buildtype release --prefix $(pwd)/install
ninja -vC libvmaf/build
ninja -vC libvmaf/build install

@bluedot-io

bluedot-io commented Nov 7, 2022

@gedoensmax
I installed it successfully thanks to your installation instructions.
We measured fps for 4K (500 frames) and 2K (2000 frames).

======== CPU Run
<4K>

(venv) root@d764404a4d1d:/src# ./vmaf/install/bin/vmaf -r ./ref_500_yuv420p.yuv -d ./dis_500_yuv420p.yuv -w 3840 -h 2160 -p 420 -b 8
VMAF version 428f5c82
500 frames ⠀⢐ 2.64 FPS
vmaf_v0.6.1: 97.428043

<2K>

(venv) root@d764404a4d1d:/src# ./vmaf/install/bin/vmaf -r ./ref_500_yuv420p.yuv -d ./dis_500_yuv420p.yuv -w 1920 -h 1080 -p 420 -b 8
VMAF version 428f5c82
2000 frames ⠉⠙ 10.52 FPS
vmaf_v0.6.1: 97.428043

========= GPU Run
<4K>

ubuntu@ip-172-31-24-61:~$ ./vmaf/install/bin/vmaf --cuda -r ./ref_500_yuv420p.yuv -d ./dis_500_yuv420p.yuv -w 3840 -h 2160 -p 420 -b 8
VMAF version 428f5c82
500 frames ⠀⢐ 69.71 FPS
vmaf_v0.6.1: 97.428043

<2K>

ubuntu@ip-172-31-24-61:~$ ./vmaf/install/bin/vmaf --cuda -r ./ref_500_yuv420p.yuv -d ./dis_500_yuv420p.yuv -w 1920 -h 1080 -p 420 -b 8
VMAF version 428f5c82
2000 frames ⠉⠙ 287.30 FPS
vmaf_v0.6.1: 97.428043

I tested it on an AWS g4dn.2xlarge.
The FPS results of the GPU port are very impressive.
Nice job!! Thank you!
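
For reference, the speed-up implied by these figures can be worked out directly. This is only a sketch: speedup() and the constant names are hypothetical, and the ratio says nothing about how many CPU threads the CPU run used, which the post does not state.

```c
/* Ratio of GPU throughput to CPU throughput, both in frames per second. */
static double speedup(double gpu_fps, double cpu_fps)
{
    return gpu_fps / cpu_fps;
}

/* FPS figures copied from the runs above. */
static const double CPU_FPS_4K = 2.64;
static const double GPU_FPS_4K = 69.71;
static const double CPU_FPS_2K = 10.52;
static const double GPU_FPS_2K = 287.30;
```

speedup(GPU_FPS_4K, CPU_FPS_4K) comes out at about 26.4 and speedup(GPU_FPS_2K, CPU_FPS_2K) at about 27.3.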

@gedoensmax
Contributor

@bluedot-io The speed-up will likely be even more impressive with ffmpeg, hopefully in the near future. With the way the vmaf tool is implemented, we noticed that file I/O cannot keep up with a fast GPU. E.g. on my RTX6000 a profile looks like this:
[image: profiling screenshot]
The highlighted part depicts the GPU processing time of the CUDA kernels, and the CUDA API section shows how much time the CPU takes to launch those kernels. The OS runtime section shows how much time fread takes up.
FFmpeg will solve this issue and also enable decoding directly to GPU memory.

@bluedot-io

bluedot-io commented Nov 10, 2022

@gedoensmax
Thank you for the detailed description.
Wow, so currently the "./vmaf" executable cannot hide the file I/O time.

File I/O is pipelined in ffmpeg, so higher FPS is expected.
In addition, since NVCodec runs on the GPU, the decoding time and the memcpy time on the GPU can also be reduced.
Overall, I'm looking forward to seeing how much the processing speed will improve.

I hope VMAF on GPU comes to ffmpeg soon!!

@alexdns1

@gedoensmax @kylophone is there any update on your ffmpeg vmaf work?

@HunterAP23

> @gedoensmax @kylophone is there any update on your ffmpeg vmaf work?

It's been 15 days since their last commit - give them time to work on it. People may have a lot more going on in their lives, especially with the holidays, that would take their attention away. Just subscribe to this PR and you'll see when an update comes up.

@alexdns1

@HunterAP23 hey, I'm not rushing them, don't get me wrong :)

@kylophone
Collaborator Author

Replaced by #1152.

8 participants