Include dynlink_nvcuvid.h for CUDA >= 9.0. #13

Closed
wants to merge 154 commits

Conversation

renewagner

CUDA >= 9.0 no longer ships nvcuvid.h, so depending on the value of CUDART_VERSION we include dynlink_nvcuvid.h instead.

Fixes #12.
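
For reference, a minimal sketch of the conditional include this change describes (the exact guard and surrounding lines in the actual diff may differ):

```c
/* Sketch only: pick the header based on the CUDA runtime version.
   CUDA >= 9.0 ships dynlink_nvcuvid.h instead of nvcuvid.h. */
#include <cuda_runtime_api.h>   /* defines CUDART_VERSION */

#if CUDART_VERSION >= 9000
#  include <dynlink_nvcuvid.h>
#else
#  include <nvcuvid.h>
#endif
```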

__Initial setup

        Comments from Tom in ./diff.review.txt
        Next steps:
                1. Retouch the exposed interface; get rid of the C layer.
                2. Remove the singleton:
                        -> check whether multiple initializations of ffmpeg would crash
                        -> otherwise, use a static member
                3. Investigate buffer passing with ffmpeg:
                        -> otherwise, copy the buffer after encoding/decoding

        Later issues:
                1. CMake issue (platform/lib) -> handle later
                2. Use thorough comments for the API interface
__restructured API

    Next step:
        Filling in nvpipecodec264:
                1. Multiple registration of ffmpeg is safe!
                2. Investigate buffer passing with ffmpeg.
        Issue:
                The encoder has delay!

        Next step:
                1. nvenc -> the encoder waits for multiple input frames
                   before it starts emitting packets.
                2. Register frame data.
                3. Use send_frame/receive_packet and
                   send_packet/receive_frame (see the sketch below).
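
A minimal sketch of that send/receive loop, assuming an already opened encoder context; this is the generic FFmpeg pattern, not the actual nvpipecodec264 code, and encode_one is an illustrative name:

```c
#include <libavcodec/avcodec.h>

/* Generic FFmpeg encode loop: the encoder may buffer several input
   frames before it emits its first packet, which is the delay noted
   above.  Error handling is abbreviated. */
static int encode_one(AVCodecContext *enc_ctx, AVFrame *frame, AVPacket *pkt)
{
    int ret = avcodec_send_frame(enc_ctx, frame);   /* frame == NULL flushes */
    if (ret < 0)
        return ret;
    while ((ret = avcodec_receive_packet(enc_ctx, pkt)) == 0) {
        /* ... consume pkt here ... */
        av_packet_unref(pkt);
    }
    return (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) ? 0 : ret;
}
```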
__basic functionality working

        Basic encoding/decoding is working.
        Cleaned up the code, added comments.

        Next step:
                1. Decoding latency --> probably just some flag.
                2. Resizing images:
                        check the ffmpeg example;
                        what would be the best strategy?
                3. Format conversion.
        Format conversion finished;
        conversion now uses CUDA.

        Pending work:
                1. Isolate the encode/decode formats.
        Next:
                1. Decoding part: update frame parameters -> how?
                2. Encoding part: 1 frame -> 1 packet.
                3. Dummy access delimiter NAL.
    Benchmark application added.

    To do:
        1.  Add an image conversion class/struct to avoid
            unnecessary reallocation of CUDA memory.
        2.  A window resize should require re-allocating the codec
            context.
        Issue resolved:
                CUDA memory reallocation: added a function that reuses
                existing memory (sketched below).
                A change of the encoding frame does not necessarily lead to
                rewrapping the encoding AVFrame when format conversion is needed.
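
A minimal sketch of that reuse strategy (illustrative names, not the actual NvPipe internals): the device buffer is reallocated only when the requested size exceeds the current capacity.

```c
#include <cuda_runtime_api.h>
#include <stddef.h>

typedef struct {
    void  *ptr;        /* device pointer */
    size_t capacity;   /* bytes currently allocated */
} reuse_buffer;

/* Grow-only reservation: keep the existing allocation whenever it is
   already large enough. */
static cudaError_t reuse_buffer_reserve(reuse_buffer *buf, size_t nbytes)
{
    if (nbytes <= buf->capacity)
        return cudaSuccess;          /* reuse the existing allocation */
    cudaFree(buf->ptr);              /* cudaFree(NULL) is a no-op */
    buf->ptr = NULL;
    buf->capacity = 0;
    cudaError_t err = cudaMalloc(&buf->ptr, nbytes);
    if (err == cudaSuccess)
        buf->capacity = nbytes;
    return err;
}
```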

        Todo:
                Resizing -> close and re-open the context.
        Resizing finished. Currently seeing a latency of 1.5-2 s for
        image resizing.

        ToDo:
                Figure out the overhead from FFmpeg.
                Get data!
        Experiment data collected.

        Next:
                1. NV12 Cr/Cb channels seem to be off.
                2. ParaView!
                3. Write a script for generic data collection.
        h264 working inside ParaView.

        Next:
                Temporary solution:
                expose an interface to choose between h264 and nvcodec;
                made it work by adding a YUV420P to RGBA conversion.
	modified:   libnvpipeutil/kernels.cu
	modified:   libnvpipeutil/util.cxx
        Working version with ParaView.
        h264 with both the libx and nv codecs works fine inside ParaView.

        To have it built on Tom's machine for a benchmark test...
        Next:
                Modify CMakeLists.txt to enable:
                an optional CUDA directory,
                an optional FFmpeg directory.
        nvprofiler/nvtx flags added.
        Config file added for local installation.
	modified:   cmake/Config.cmake.in
        nvprofiler/nvtx flags added;
        might as well place a flag for it.
        Config file added for local installation.
	modified:   cmake/Config.cmake.in
        Built as shared libraries. Tested compatibility.

        Next:
                Clean the code and prepare for an early release.
        Format conversion fixed to enable line alignment for ffmpeg (see the sketch below).
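
A sketch of what line alignment means on the allocation side, assuming the fix concerns FFmpeg's linesize (row pitch) requirement; the pixel format, alignment value, and function name below are illustrative:

```c
#include <libavutil/imgutils.h>

/* Allocate an image whose rows are padded so each linesize is a multiple
   of 32 bytes, which libavcodec generally expects for its buffers. */
static int alloc_aligned_nv12(uint8_t *data[4], int linesize[4],
                              int width, int height)
{
    return av_image_alloc(data, linesize, width, height,
                          AV_PIX_FMT_NV12, 32 /* alignment in bytes */);
}
```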

        The library now works with different image sizes.
        Code cleaned.
                Appending Tom's approval.

        Next: (consult Eric)
                1. Resizing
                2. Odd resolutions
                3. H265 + lossless H264
        Added client/server code for a streaming test.
        Swapped server/client (for testing with dt07 and the desktop).
        Cleaned up the server/client for the ParaView test.
        Code modified per Tom's comments.

        Next:
                Change kernels.cu from clamp to saturate (see the sketch below).
                Clean up the FFmpeg library dependency.
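
One possible reading of the clamp -> saturate item, sketched below; this is illustrative and not the actual kernels.cu change:

```c
/* Device-side sketch: replace an explicit clamp with CUDA's saturate
   intrinsic, which clamps a float to the range [0, 1]. */
__device__ float to_unit_range(float x)
{
    /* before: return fminf(fmaxf(x, 0.0f), 1.0f); */
    return __saturatef(x);
}
```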
        Fixed the default bitrate by using f_m = 4 instead of 1.
        Updated some other issues.
        Brought back /doc/example/test.c for testing.
tfogal and others added 24 commits June 5, 2017 09:44
They could improve with future HW / SDK APIs.
should've been from the beginning.
This changes the error codes to use cudaError codes, where possible.
Notably, the underlying NvPipe errors used to be a subset of the
CUresult codes; now they are a subset of the cudaError_t codes.
Simplifies things.  The only reason we did it separately earlier
was to give the user an opportunity to set up PTX paths.  But we
don't need PTX paths anymore, because we just use the runtime API
now.
support input from device memory.  The issue is that currently the
video codec SDK APIs *only* support input from host memory.  As
such, we needed to do a synchronous copy before starting our work.

While convenient for the user, it was deemed better to force them
to do the copy themselves, in the hope that they will do so
intelligently and asynchronously, with a sync point before calling
NvPipe.

We should revisit if/when the video codec SDK ever supports device
memory for input data on decode.  In the meantime, at least input
data is rather small.

Note no change for output data: we can and still do support
efficiently pushing that to a device buffer.
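
A sketch of the caller-side pattern described above (the function name is a placeholder and the NvPipe call itself is elided): copy asynchronously, overlap independent work, then synchronize right before decoding.

```c
#include <cuda_runtime_api.h>
#include <stddef.h>

/* Stage compressed input from device to host memory asynchronously and
   finish with the sync point mentioned above. */
static cudaError_t stage_input(void *host_buf, const void *dev_buf,
                               size_t nbytes, cudaStream_t stream)
{
    cudaError_t err = cudaMemcpyAsync(host_buf, dev_buf, nbytes,
                                      cudaMemcpyDeviceToHost, stream);
    if (err != cudaSuccess)
        return err;
    /* ... independent work can overlap with the copy here ... */
    return cudaStreamSynchronize(stream);   /* sync point before decoding */
}
```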
Decoder is now properly resized when the frame size increases.
NvPipe now supports decode output to RGBA8888-structured memory.
The alpha channel is set to 255.

Additionally, an EGL-based demo application is now included, which
should come in handy for headless clusters without X.
Small tiles are automatically padded to the minimum size, i.e. 48x32.
3.2 doesn't have the policy *to* set, so just use 3.3 instead for the
`IN_LIST` operator.
CUDA >= 9.0 no longer ships nvcuvid.h, so depending on the value of CUDART_VERSION we include dynlink_nvcuvid.h instead.

Fixes NVIDIA#12.
@Luyang1125

Hi,
Thank you very much! But I still get this error, can you help me?

[ 12%] Building C object CMakeFiles/nvpipe.dir/decode.c.o
In file included from /usr/local/cuda/include/dynlink_nvcuvid.h:38:0,
from /home/luyang/Documents/NvPipe-master/decode.c:44:
/usr/local/cuda/include/dynlink_cuviddec.h:811:1: error: unknown type name ‘class’
class CCtxAutoLock
^
/usr/local/cuda/include/dynlink_cuviddec.h:812:1: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘{’ token
{
^
CMakeFiles/nvpipe.dir/build.make:224: recipe for target 'CMakeFiles/nvpipe.dir/decode.c.o' failed
make[2]: *** [CMakeFiles/nvpipe.dir/decode.c.o] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/nvpipe.dir/all' failed
make[1]: *** [CMakeFiles/nvpipe.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2

@tfogal
Contributor

tfogal commented Apr 18, 2018

@renewagner: thanks much for the fix! Will take a look over the weekend.

@Luyang1125: yeah, sorry about this. Internally this is bug 1937795, should you want to ask an NVIDIAn about this in the future. I cannot comment on a timeline but I did ping some people internally.
In the meantime: the problem is that some C++ code is not properly guarded by #ifdef __cplusplus extern "C" guards, so a potential fix is to just hack your C++ compiler in for CMAKE_C_COMPILER when you invoke CMake.
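
For illustration, roughly the kind of guard the header is missing (not the actual dynlink_cuviddec.h contents); with it, a plain C compiler would never see the `class`:

```c
/* Hypothetical header excerpt: hide C++-only helpers from C translation
   units and expose the C declarations to C++ via extern "C". */
#ifdef __cplusplus
extern "C" {
#endif

/* ... C API declarations ... */

#ifdef __cplusplus
}  /* extern "C" */

/* A C++-only RAII helper such as CCtxAutoLock would live here,
   invisible to a C compiler. */
#endif
```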

@Luyang1125

@tfogal Thanks.
Do you think it should work if I use CUDA 8.0 instead?

@tfogal
Contributor

tfogal commented Apr 18, 2018

It kills me to recommend that people use CUDA 8.0 at this point; there are a lot of great things in the 9 series, and the next release will have a lot more that I'm excited to see.

... but it is certainly true that NVPipe has seen more vetting on CUDA 8 than CUDA 9, at present.
