Include dynlink_nvcuvid.h for CUDA >= 9.0. #13

Closed
wants to merge 154 commits

Conversation

renewagner

CUDA >= 9.0 no longer ships nvcuvid.h, so depending on the value of CUDART_VERSION we include dynlink_nvcuvid.h instead.

Fixes #12.
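
For reference, a minimal sketch of the conditional include this change describes (the exact guard and surrounding lines in the actual diff may differ):

```c
/* Sketch only: pick the header based on the CUDA runtime version.
   CUDA >= 9.0 ships dynlink_nvcuvid.h instead of nvcuvid.h. */
#include <cuda_runtime_api.h>   /* defines CUDART_VERSION */

#if CUDART_VERSION >= 9000
#  include <dynlink_nvcuvid.h>
#else
#  include <nvcuvid.h>
#endif
```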

__Initial setup

        Comments from Tom in ./diff.review.txt
        Next steps:
                1. Retouch the exposed interface; get rid of the C layer.
                2. Remove the singleton:
                        -> check whether multiple initializations of ffmpeg would crash
                        -> otherwise, use a static member
                3. Investigate buffer passing with ffmpeg:
                        -> otherwise, copy the buffer after encoding/decoding

        Later issues:
                1. CMake issue (platform/lib) -> handle later
                2. Use thorough comments for the API interface
__restructured API

    Next step:
        Filling in nvpipecodec264:
                1. Multiple registration of ffmpeg is safe!
                2. Investigate buffer passing with ffmpeg.
        Issue:
                The encoder has delay!

        Next step:
                1. nvenc -> the encoder waits for multiple input frames
                   before it starts emitting packets.
                2. Register frame data.
                3. Use send_frame/receive_packet and
                   send_packet/receive_frame (see the sketch below).
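
A minimal sketch of that send/receive loop, assuming an already opened encoder context; this is the generic FFmpeg pattern, not the actual nvpipecodec264 code, and encode_one is an illustrative name:

```c
#include <libavcodec/avcodec.h>

/* Generic FFmpeg encode loop: the encoder may buffer several input
   frames before it emits its first packet, which is the delay noted
   above.  Error handling is abbreviated. */
static int encode_one(AVCodecContext *enc_ctx, AVFrame *frame, AVPacket *pkt)
{
    int ret = avcodec_send_frame(enc_ctx, frame);   /* frame == NULL flushes */
    if (ret < 0)
        return ret;
    while ((ret = avcodec_receive_packet(enc_ctx, pkt)) == 0) {
        /* ... consume pkt here ... */
        av_packet_unref(pkt);
    }
    return (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) ? 0 : ret;
}
```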
__basic functionality working

        Basic encoding/decoding is working.
        Cleaned up the code, added comments.

        Next step:
                1. Decoding latency --> probably just some flag.
                2. Resizing images:
                        check the ffmpeg example;
                        what would be the best strategy?
                3. Format conversion.
        Format conversion finished;
        conversion now uses CUDA.

        Pending work:
                1. Isolate the encode/decode formats.
        Next:
                1. Decoding part: update frame parameters -> how?
                2. Encoding part: 1 frame -> 1 packet.
                3. Dummy access delimiter NAL.
    Benchmark application added.

    To do:
        1.  Add an image conversion class/struct to avoid
            unnecessary reallocation of CUDA memory.
        2.  A window resize should require re-allocating the codec
            context.
        Issue resolved:
                CUDA memory reallocation: added a function that reuses
                existing memory (sketched below).
                A change of the encoding frame does not necessarily lead to
                rewrapping the encoding AVFrame when format conversion is needed.
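
A minimal sketch of that reuse strategy (illustrative names, not the actual NvPipe internals): the device buffer is reallocated only when the requested size exceeds the current capacity.

```c
#include <cuda_runtime_api.h>
#include <stddef.h>

typedef struct {
    void  *ptr;        /* device pointer */
    size_t capacity;   /* bytes currently allocated */
} reuse_buffer;

/* Grow-only reservation: keep the existing allocation whenever it is
   already large enough. */
static cudaError_t reuse_buffer_reserve(reuse_buffer *buf, size_t nbytes)
{
    if (nbytes <= buf->capacity)
        return cudaSuccess;          /* reuse the existing allocation */
    cudaFree(buf->ptr);              /* cudaFree(NULL) is a no-op */
    buf->ptr = NULL;
    buf->capacity = 0;
    cudaError_t err = cudaMalloc(&buf->ptr, nbytes);
    if (err == cudaSuccess)
        buf->capacity = nbytes;
    return err;
}
```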

        Todo:
                Resizing -> close and re-open the context.
        Resizing finished. Currently seeing a latency of 1.5-2 s for
        image resizing.

        ToDo:
                Figure out the overhead from FFmpeg.
                Get data!
        Experiment data collected.

        Next:
                1. NV12 Cr/Cb channels seem to be off.
                2. ParaView!
                3. Write a script for generic data collection.
        h264 working inside ParaView.

        Next:
                Temporary solution:
                expose an interface to choose between h264 and nvcodec;
                made it work by adding a YUV420P to RGBA conversion.
	modified:   libnvpipeutil/kernels.cu
	modified:   libnvpipeutil/util.cxx
        Working version with ParaView.
        h264 with both the libx and nv codecs works fine inside ParaView.

        To have it built on Tom's machine for a benchmark test...
        Next:
                Modify CMakeLists.txt to enable:
                an optional CUDA directory,
                an optional FFmpeg directory.
        nvprofiler/nvtx flags added.
        Config file added for local installation.
	modified:   cmake/Config.cmake.in
        nvprofiler/nvtx flags added;
        might as well place a flag for it.
        Config file added for local installation.
	modified:   cmake/Config.cmake.in
        Built as shared libraries. Tested compatibility.

        Next:
                Clean the code and prepare for an early release.
        Format conversion fixed to enable line alignment for ffmpeg (see the sketch below).
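
A sketch of what line alignment means on the allocation side, assuming the fix concerns FFmpeg's linesize (row pitch) requirement; the pixel format, alignment value, and function name below are illustrative:

```c
#include <libavutil/imgutils.h>

/* Allocate an image whose rows are padded so each linesize is a multiple
   of 32 bytes, which libavcodec generally expects for its buffers. */
static int alloc_aligned_nv12(uint8_t *data[4], int linesize[4],
                              int width, int height)
{
    return av_image_alloc(data, linesize, width, height,
                          AV_PIX_FMT_NV12, 32 /* alignment in bytes */);
}
```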

        The library now works with different image sizes.
        Code cleaned.
                Appending Tom's approval.

        Next: (consult Eric)
                1. Resizing
                2. Odd resolutions
                3. H265 + lossless H264
        Added client/server code for a streaming test.
        Swapped server/client (for testing with dt07 and the desktop).
        Cleaned up the server/client for the ParaView test.
        Code modified per Tom's comments.

        Next:
                Change kernels.cu from clamp to saturate (see the sketch below).
                Clean up the FFmpeg library dependency.
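
One possible reading of the clamp -> saturate item, sketched below; this is illustrative and not the actual kernels.cu change:

```c
/* Device-side sketch: replace an explicit clamp with CUDA's saturate
   intrinsic, which clamps a float to the range [0, 1]. */
__device__ float to_unit_range(float x)
{
    /* before: return fminf(fmaxf(x, 0.0f), 1.0f); */
    return __saturatef(x);
}
```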
        Fixed the default bitrate by using f_m = 4 instead of 1.
        Updated some other issues.
        Brought back /doc/example/test.c for testing.
tfogal and others added 24 commits June 5, 2017 09:44
They could improve with future HW / SDK APIs.
should've been from the beginning.
This changes the error codes to use cudaError codes, where possible.
Notably, the underlying NvPipe errors used to be a subset of the
CUresult codes; now they are a subset of the cudaError_t codes.
Simplifies things.  The only reason we did it separately earlier
was to give the user an opportunity to set up PTX paths.  But we
don't need PTX paths anymore, because we just use the runtime API
now.
support input from device memory.  The issue is that currently the
video codec SDK APIs *only* support input from host memory.  As
such, we needed to do a synchronous copy before starting our work.

While convenient for the user, it was deemed better to force them
to do the copy themselves, in the hope that they will do so
intelligently and asynchronously, with a sync point before calling
NvPipe.

We should revisit if/when the video codec SDK ever supports device
memory for input data on decode.  In the meantime, at least input
data is rather small.

Note no change for output data: we can and still do support
efficiently pushing that to a device buffer.
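
A sketch of the caller-side pattern described above (the function name is a placeholder and the NvPipe call itself is elided): copy asynchronously, overlap independent work, then synchronize right before decoding.

```c
#include <cuda_runtime_api.h>
#include <stddef.h>

/* Stage compressed input from device to host memory asynchronously and
   finish with the sync point mentioned above. */
static cudaError_t stage_input(void *host_buf, const void *dev_buf,
                               size_t nbytes, cudaStream_t stream)
{
    cudaError_t err = cudaMemcpyAsync(host_buf, dev_buf, nbytes,
                                      cudaMemcpyDeviceToHost, stream);
    if (err != cudaSuccess)
        return err;
    /* ... independent work can overlap with the copy here ... */
    return cudaStreamSynchronize(stream);   /* sync point before decoding */
}
```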
Decoder is now properly resized when the frame size increases.
NvPipe now supports decode output to RGBA8888-structured memory.
The alpha channel is set to 255.

Additionally, an EGL-based demo application is now included, which
should come in handy for headless clusters without X.
Small tiles are automatically padded to the minimum size, i.e. 48x32.
3.2 doesn't have the policy *to* set, so just use 3.3 instead for the
`IN_LIST` operator.
CUDA >= 9.0 no longer ships nvcuvid.h, so depending on the value of CUDART_VERSION we include dynlink_nvcuvid.h instead.

Fixes NVIDIA#12.
@Luyang1125

Hi,
Thank you very much! But I still get this error, can you help me?

[ 12%] Building C object CMakeFiles/nvpipe.dir/decode.c.o
In file included from /usr/local/cuda/include/dynlink_nvcuvid.h:38:0,
from /home/luyang/Documents/NvPipe-master/decode.c:44:
/usr/local/cuda/include/dynlink_cuviddec.h:811:1: error: unknown type name ‘class’
class CCtxAutoLock
^
/usr/local/cuda/include/dynlink_cuviddec.h:812:1: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘{’ token
{
^
CMakeFiles/nvpipe.dir/build.make:224: recipe for target 'CMakeFiles/nvpipe.dir/decode.c.o' failed
make[2]: *** [CMakeFiles/nvpipe.dir/decode.c.o] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/nvpipe.dir/all' failed
make[1]: *** [CMakeFiles/nvpipe.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2

@tfogal
Contributor

tfogal commented Apr 18, 2018

@renewagner: thanks much for the fix! Will take a look over the weekend.

@Luyang1125: yeah, sorry about this. Internally this is bug 1937795, should you want to ask an NVIDIAn about this in the future. I cannot comment on a timeline but I did ping some people internally.
In the meantime: the problem is that some C++ code is not properly guarded by #ifdef __cplusplus extern "C" guards, so a potential fix is to just hack your C++ compiler in for CMAKE_C_COMPILER when you invoke CMake.
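
For illustration, roughly the kind of guard the header is missing (not the actual dynlink_cuviddec.h contents); with it, a plain C compiler would never see the `class`:

```c
/* Hypothetical header excerpt: hide C++-only helpers from C translation
   units and expose the C declarations to C++ via extern "C". */
#ifdef __cplusplus
extern "C" {
#endif

/* ... C API declarations ... */

#ifdef __cplusplus
}  /* extern "C" */

/* A C++-only RAII helper such as CCtxAutoLock would live here,
   invisible to a C compiler. */
#endif
```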

@Luyang1125

@tfogal Thanks.
Do you think it should work if I use CUDA 8.0 instead?

@tfogal
Contributor

tfogal commented Apr 18, 2018

It kills me to recommend that people use CUDA 8.0 at this point; there are a lot of great things in the 9 series, and the next release will have a lot more that I'm excited to see.

... but it is certainly true that NVPipe has seen more vetting on CUDA 8 than CUDA 9, at present.
