Feedback on the video decoding extension(s) #1694

@cyanreg

Hi,

I'm writing hardware decoding code for FFmpeg (code), and in the process I've found some issues with the extensions.
The first one is simple: VkVideoDecodeH264ProfileEXT.pictureLayout is required to be non-zero, but VkVideoDecodeH264PictureLayoutFlagBitsEXT defines VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_PROGRESSIVE_EXT as 0 (spec), so progressive content cannot be requested without violating that requirement.
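
To make the conflict concrete, here is a minimal sketch in C. It uses local stand-in names rather than the real Vulkan headers: only the progressive value of 0 comes from the text above, the interlaced bit is a placeholder, and the check function is a hypothetical model of how a "must not be 0" rule could be validated.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins, NOT the real Vulkan definitions. Only the progressive
 * value (0) is taken from the issue text; the other bit is a placeholder. */
typedef uint32_t PictureLayoutFlags;
enum {
    PICTURE_LAYOUT_PROGRESSIVE            = 0,
    PICTURE_LAYOUT_INTERLACED_PLACEHOLDER = 0x1,
};

/* Hypothetical model of the "pictureLayout must not be 0" requirement. */
bool picture_layout_is_nonzero(PictureLayoutFlags layout)
{
    return layout != 0;
}
```

Under these assumptions, picture_layout_is_nonzero(PICTURE_LAYOUT_PROGRESSIVE) returns false, so the only way to ask for progressive decoding is to break the rule.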

In the blog post you asked for feedback on sub-picture level decoding:

Currently only picture-level decode commands are supported (as specified by the appropriate codec-specific EXT extension structures for decode operations, for example VkVideoDecodeH264PictureInfoEXT). We are interested to hear of use cases that need to request more fine grained operations!

The main use case is reduced latency. At 60 fps, you wait roughly 16 milliseconds to receive a whole frame from a realtime stream before you can submit it for decoding, and then spend another 16 milliseconds or so while it is scheduled for presentation, which doubles the latency. At the cost of screen tearing, you can redraw parts of the screen, so decoding a row of slices at a time cuts latency considerably.
FFmpeg supports sub-frame decoding and can output a row at a time. It would be nice if we could support that using Vulkan hardware decoding.
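
The arithmetic behind the 16 millisecond figure can be sketched as follows. This is a simplified model, not a measurement: it assumes the bitstream arrives at a constant rate and that a frame splits into equally sized rows of slices. Both function names are made up for illustration.

```c
/* Time between frames in milliseconds at a given frame rate. */
double frame_interval_ms(double fps)
{
    return 1000.0 / fps;
}

/* Assumption: data arrives at a constant rate and the frame is split into
 * `rows` equally sized rows of slices, so the first row is fully received
 * after one frame interval divided by the number of rows. */
double first_row_wait_ms(double fps, unsigned rows)
{
    return frame_interval_ms(fps) / (double)rows;
}
```

At 60 fps the frame interval is about 16.7 ms. With four rows of slices, the first row would be available after about 4.2 ms under this model, instead of after the whole frame has arrived.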

But the biggest issue with the decoding API is that all slices must sit in a single VkBuffer. Sure, you can solve that with sparse buffers, but many devices do not support them, so your only choice is to copy each slice into a large enough allocation in system memory and then upload that to the GPU. Some bitstreams reach bitrates of hundreds, if not thousands, of megabits per second, so these copies can consume a significant amount of resources. It would be much better if you could decode a slice at a time as you receive the slices, which would hugely reduce the allocations and copies needed.
Even when still decoding an entire picture at a time, this would let the GPU start decoding sooner, rather than wasting a whole 16 milliseconds or so waiting to receive every slice in the frame.
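
For illustration, the extra copy described above looks roughly like the following sketch. SliceView and gather_slices are hypothetical names, not FFmpeg or Vulkan types, and the sketch ignores any alignment rules a real bitstream buffer may impose. The destination stands in for the staging memory that would afterwards be uploaded to the GPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of one received slice; not an FFmpeg or Vulkan type. */
typedef struct SliceView {
    const uint8_t *data;
    size_t         size;
} SliceView;

/* Packs all slices back to back into dst and records where each one starts.
 * Returns the total number of bytes written, or 0 if dst is too small
 * (or if there was nothing to copy). */
size_t gather_slices(const SliceView *slices, size_t count,
                     uint8_t *dst, size_t dst_capacity, size_t *offsets)
{
    size_t pos = 0;

    assert(slices != NULL && dst != NULL && offsets != NULL);

    for (size_t i = 0; i < count; i++) {
        /* pos never exceeds dst_capacity, so the subtraction cannot wrap. */
        if (slices[i].size > dst_capacity - pos)
            return 0;
        memcpy(dst + pos, slices[i].data, slices[i].size);
        offsets[i] = pos;
        pos += slices[i].size;
    }
    return pos;
}
```

Every byte of every slice passes through memcpy once here and once more during the upload, which is the overhead that per-slice submission would avoid.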

So, my recommendation would be to introduce a new vkCmdDecodeSlice command, which takes a new VkVideoDecodeSliceInfo structure and lets users feed in one slice at a time, each inside a single VkBuffer. Additionally, vkCmdStartFrame and vkCmdEndFrame commands would signal when decoding of a frame starts and ends. All of this must still take place in a single submission, of course, so calling vkCmdStartFrame without calling vkCmdEndFrame would be an error.
If slice-level decoding is found to be within scope, the user could signal VkEvents inside the submission to learn when a slice has been decoded and the contents of the VkImage corresponding to that slice are valid.
Additionally, the VkVideoDecodeInfoKHR structure ought to be modified to permit having each slice in a separate VkBuffer object.
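
The pairing rule for the proposed commands can be sketched as a small state machine. Nothing below is real Vulkan: the mock_* functions and FrameRecorder are hypothetical stand-ins for the proposed vkCmdStartFrame, vkCmdDecodeSlice and vkCmdEndFrame, and they only model which call orders would be valid within one submission.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mock recorder for the proposed command sequence; hypothetical, not Vulkan. */
typedef struct FrameRecorder {
    bool   in_frame;        /* true between start and end of a frame */
    size_t slices_in_frame; /* slices recorded since the last start */
    size_t frames_done;     /* frames that were started and ended */
} FrameRecorder;

/* Stand-in for the proposed vkCmdStartFrame. */
bool mock_start_frame(FrameRecorder *r)
{
    if (r->in_frame)
        return false;       /* starting a frame twice is an error */
    r->in_frame = true;
    r->slices_in_frame = 0;
    return true;
}

/* Stand-in for the proposed vkCmdDecodeSlice. */
bool mock_decode_slice(FrameRecorder *r)
{
    if (!r->in_frame)
        return false;       /* slices are only valid inside a frame */
    r->slices_in_frame++;
    return true;
}

/* Stand-in for the proposed vkCmdEndFrame. */
bool mock_end_frame(FrameRecorder *r)
{
    if (!r->in_frame)
        return false;       /* nothing to end */
    r->in_frame = false;
    r->frames_done++;
    return true;
}

/* A submission is valid only if every started frame was also ended. */
bool mock_can_submit(const FrameRecorder *r)
{
    return !r->in_frame;
}
```

A zero-initialized FrameRecorder rejects mock_decode_slice until mock_start_frame has been called, and mock_can_submit stays false until the matching mock_end_frame, which mirrors the "start without end is an error" rule.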
