Include dynlink_nvcuvid.h for CUDA >= 9.0. #13
Initial setup comments from Tom ./diff.review.txt

Next step:
1. retouch exposed interface. get rid of the c layer
2. remove singleton -> check if multiple initialization would crash ffmpeg -> otherwise, static member
3. investigate buffer passing with ffmpeg -> otherwise, copy buffer after encoding/decoding

Later issue:
1. CMake issue (platform/lib) -> handle later
2. use heavy comment for API interface
They could improve with future HW / SDK APIs.
should've been from the beginning.
This changes the error codes to use cudaError_t values where possible. Notably, the underlying NvPipe errors used to be a subset of the CUresult codes; now they are a subset of the cudaError_t codes.
Simplifies things. The only reason we did it separately earlier was to give the user an opportunity to set up PTX paths. But we don't need PTX paths anymore, because we just use the runtime API now.
support input from device memory. The issue is that currently the video codec SDK APIs *only* support input from host memory. As such, we needed to do a synchronous copy before starting our work. While convenient for the user, it was deemed better to force them to do the copy themselves, in the hope that they will do so intelligently and asynchronously, with a sync point before calling NVPipe. We should revisit if/when the video codec SDK ever supports device memory for input data on decode. In the meantime, at least input data is rather small. Note no change for output data: we can and still do support efficiently pushing that to a device buffer.
Decoder is now properly resized when the frame size increases.
NvPipe now supports decode output to RGBA8888-structured memory. The alpha channel is set to 255. Additionally, an EGL-based demo application is now included, which should come in handy for headless clusters without X.
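To make the RGBA8888 layout concrete, here is a minimal CPU-side sketch of expanding packed RGB into RGBA with the alpha byte fixed at 255. The helper name `rgb_to_rgba8888` is hypothetical and not part of NvPipe, and NvPipe's own conversion presumably runs on the GPU; this only illustrates the byte layout described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an NvPipe function): expand tightly packed
 * RGB888 pixels into RGBA8888, setting every alpha byte to 255 (opaque). */
void rgb_to_rgba8888(const uint8_t *rgb, uint8_t *rgba, size_t n_pixels)
{
    assert(rgb != NULL && rgba != NULL);
    for (size_t i = 0; i < n_pixels; ++i) {
        rgba[4 * i + 0] = rgb[3 * i + 0]; /* R */
        rgba[4 * i + 1] = rgb[3 * i + 1]; /* G */
        rgba[4 * i + 2] = rgb[3 * i + 2]; /* B */
        rgba[4 * i + 3] = 255;            /* A: always fully opaque */
    }
}
```

The caller is assumed to provide an output buffer of at least `4 * n_pixels` bytes.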
Small tiles are automatically padded to the minimum size, i.e. 48x32.
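The padding rule can be sketched as a simple clamp of each dimension to the minimum. This is an illustration only: the `tile_size` type, the `pad_tile_size` function, and the macro names are hypothetical, and reading "48x32" as width x height is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum tile size from the commit message; width x height is assumed. */
#define MIN_TILE_WIDTH  ((size_t)48)
#define MIN_TILE_HEIGHT ((size_t)32)

/* Hypothetical type for illustration, not an NvPipe structure. */
typedef struct {
    size_t width;
    size_t height;
} tile_size;

/* Return the dimensions a tile would be padded to: each dimension is
 * raised to the minimum if it is smaller, and left unchanged otherwise. */
tile_size pad_tile_size(size_t width, size_t height)
{
    assert(width > 0 && height > 0);
    tile_size padded;
    padded.width  = width  < MIN_TILE_WIDTH  ? MIN_TILE_WIDTH  : width;
    padded.height = height < MIN_TILE_HEIGHT ? MIN_TILE_HEIGHT : height;
    return padded;
}
```

For example, a 16x16 tile would be padded to 48x32, while a 1920x1080 frame would pass through unchanged.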
CMake 3.2 doesn't have the policy *to* set, so just use 3.3 instead for the `IN_LIST` operator.
CMake fixes
CUDA >= 9.0 no longer ships nvcuvid.h, so depending on the value of CUDART_VERSION we include dynlink_nvcuvid.h instead. Fixes NVIDIA#12.
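The version check described here can be sketched roughly as follows. This is not the actual diff: the macro `NVPIPE_CUVID_HEADER` and the function `cuvid_header_name` are hypothetical names used for illustration, and `CUDART_VERSION` is given a fallback definition only so the sketch compiles without the CUDA toolkit. In a real build it comes from the CUDA runtime headers, where CUDA 9.0 corresponds to the value 9000.

```c
/* Assumption for illustration: CUDART_VERSION is normally provided by
 * <cuda_runtime_api.h>. The fallback below exists only so this sketch
 * builds without CUDA installed; 9000 corresponds to CUDA 9.0. */
#ifndef CUDART_VERSION
#define CUDART_VERSION 9000
#endif

/* Pick the cuvid header by toolkit version (hypothetical macro name). */
#if CUDART_VERSION >= 9000
#  define NVPIPE_CUVID_HEADER "dynlink_nvcuvid.h"
#else
#  define NVPIPE_CUVID_HEADER "nvcuvid.h"
#endif

/* A real source file would follow this with
 *     #include NVPIPE_CUVID_HEADER
 * or put the two #include lines directly in the #if branches. */

/* Report which header was selected, so the choice can be inspected. */
const char *cuvid_header_name(void)
{
    return NVPIPE_CUVID_HEADER;
}
```

Keeping the check in the preprocessor means the same source builds against both older and newer toolkits without any build-system changes.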
Hi, [ 12%] Building C object CMakeFiles/nvpipe.dir/decode.c.o
@renewagner: thanks much for the fix! Will take a look over the weekend.
@Luyang1125: yeah, sorry about this. Internally this is bug 1937795, should you want to ask an NVIDIAn about this in the future. I cannot comment on a timeline but I did ping some people internally.
@tfogal Thanks.
It kills me to recommend that people use CUDA 8.0 at this point; there are a lot of great things in the 9 series, and the next release will have a lot more that I'm excited to see. ... but it is certainly true that NVPipe has seen more vetting on CUDA 8 than CUDA 9, at present.