New NVDEC and VIC implementation #1384
Merged 4d02a2d into Ryujinx:master
13 checks passed
SeraUQ added a commit to Doctorwho1909/Ryujinx that referenced this pull request on Jul 12, 2020: New NVDEC and VIC implementation (Ryujinx#1384)
Definitions
From NVIDIA documentation:
The Tegra host1x module is the DMA engine for register access to Tegra’s graphics- and multimedia-related modules. The modules served by host1x are referred to as clients. host1x includes some other functionality, such as synchronization.
host1x contains a FIFO (in fact, both a write and a read FIFO) for each client module. These FIFOs contain the register addresses and values for data written to, or read from, the client modules.
host1x contains memory-mapped registers that allow the CPU to control these FIFOs. Accessing client module registers in this manner can be described as indirect access, since the client’s registers are accessed indirectly through the host1x FIFO, rather than directly through a memory map. However, use of this access mechanism is discouraged.
NVDEC stands for NVIDIA Decoder, and is the module responsible for video decoding on the Nintendo Switch. Most games that use videos rely on it for fast decoding. The version of NVDEC found on the Nintendo Switch supports the following video codecs: MPEG, MPEG4, VC1, VP8, VP9, H264 and H265. Of those, Nintendo only exposes VP8, VP9, H264 and H265. H264 and H265 are also known as AVC and HEVC, respectively.
Most games use H264, the most common video codec. This choice is probably because it has the best hardware support/efficiency ratio (that is, it is widely supported and has somewhat high compression ratios). For this reason, it is much more likely to be used in ports than, say, VP9. A considerable number of first party games also make use of VP9, for example Super Smash Bros Ultimate, Pokemon Let's Go Eevee/Pikachu and Fire Emblem Three Houses, to name a few.
VIC stands for Video Image Composer. It can do many things, but here I will focus only on VIC's role in video decoding.
NVDEC does not support a configurable output frame format. This means that NVDEC will always output the same format, known as NV12, where the luma and chroma planes are stored in separate buffers. Luma contains the luminance information of the picture, so it is basically a grayscale version of the video, while chroma contains all the color information. The two combined produce the final, colored picture that will be displayed.
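As a rough illustration of the NV12 layout described above, the following Python sketch computes the size of each plane and locates the chroma sample that covers a given luma pixel. It assumes 8-bit samples, even frame dimensions, tightly packed rows (no stride padding) and a plain linear layout; the function names are hypothetical and are not taken from the Ryujinx code.

```python
def nv12_plane_sizes(width, height):
    """Return (luma_bytes, chroma_bytes) for an 8-bit NV12 frame.

    Assumes even dimensions and no row padding.
    """
    if width % 2 or height % 2:
        raise ValueError("this sketch assumes even dimensions")
    # Luma: one byte per pixel.
    luma_bytes = width * height
    # Chroma: one interleaved (U, V) pair per 2x2 block of pixels.
    chroma_bytes = (width // 2) * (height // 2) * 2
    return luma_bytes, chroma_bytes


def chroma_offset(x, y, width):
    """Byte offset of the (U, V) pair covering luma pixel (x, y).

    A chroma row holds width // 2 pairs of 2 bytes, so it is width bytes long.
    """
    return (y // 2) * width + (x // 2) * 2


luma_bytes, chroma_bytes = nv12_plane_sizes(1920, 1080)
print(luma_bytes, chroma_bytes)
```

The chroma plane is half the size of the luma plane, which is why NV12 needs 1.5 bytes per pixel overall.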
VIC's role in the decode process is converting the NVDEC output into something that the GPU can consume. It usually converts from the NVIDIA block linear memory layout to a more standard linear layout (one line after the other in memory), and it may also perform color conversion.
The pictures below illustrate the optional color conversion that VIC may perform (this is done when the game requests an RGBA format instead of NV12).
Input luma plane (1 channel):



Input chroma plane (2 channels for interleaved planes):
Output RGBA image (4 channels):
Some games do this color conversion using VIC, while others do it on the GPU using a shader (in those cases, VIC outputs a picture with the same format as the input, NV12). There are some cases where VIC will only perform a raw copy of the picture data. This is all (indirectly) configured by the game, and depends on which format it requests through the high level Nintendo API.
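The NV12 to RGBA conversion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the common BT.601 limited-range coefficients, 8-bit samples, even dimensions and linear, tightly packed planes. The actual conversion matrix is configured through VIC by the game and may differ, and the function names here are hypothetical.

```python
def clamp8(value):
    """Round to the nearest integer and clamp to the 0..255 range."""
    return max(0, min(255, int(round(value))))


def yuv_to_rgba(y, u, v):
    """Convert one YUV sample to RGBA (BT.601 limited range, an assumption)."""
    c = y - 16
    d = u - 128
    e = v - 128
    r = 1.164 * c + 1.596 * e
    g = 1.164 * c - 0.392 * d - 0.813 * e
    b = 1.164 * c + 2.017 * d
    return clamp8(r), clamp8(g), clamp8(b), 255


def nv12_to_rgba(luma, chroma, width, height):
    """Convert separate NV12 luma and chroma planes into packed RGBA bytes."""
    out = bytearray(width * height * 4)
    for py in range(height):
        for px in range(width):
            y = luma[py * width + px]
            # Each (U, V) pair is shared by a 2x2 block of luma pixels.
            pair = (py // 2) * width + (px // 2) * 2
            u, v = chroma[pair], chroma[pair + 1]
            dst = (py * width + px) * 4
            out[dst:dst + 4] = bytes(yuv_to_rgba(y, u, v))
    return bytes(out)
```

A per-pixel loop like this is far too slow for real video; it is only meant to show how the 1-channel luma plane and the 2-channel chroma plane combine into a 4-channel image.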
Implementation
To understand the decisions taken regarding VP9 support, we first need to understand a bit of the VP9 bitstream. Each frame can be thought of as 3 parts: the uncompressed header, the compressed header and the tile data. The uncompressed and compressed headers are parsed by the NVIDIA library on the game side, so NVDEC never receives them; it only receives the tile data. For the most part, the data gathered from those headers is written into custom structures that are passed to NVDEC, so we have access to the decoded data.
In order to decode the frame using conventional libraries, such as FFmpeg, it would be necessary to fully reconstruct the bitstream, including the uncompressed and compressed headers. We have most, but not all, of the data required to do that. For this reason, using such libraries is not feasible for emulating the NVDEC VP9 decoder. Instead, we use a custom decoder that can easily be adapted to our needs: the decoding and parsing of the uncompressed and compressed headers is skipped, and the already decoded information is fed directly to the decoder.
This approach also allows passing reference frame data directly to the decoder. Reference frames are nothing more than previously decoded frames. The pixels in those frames are used to construct future frames, a compression technique that takes advantage of redundancy between frames to reduce the size of subsequent frames by copying previously decoded blocks of pixels. VP9 has 8 reference frame slots, of which only 3 can be used at once. The bitstream headers indicate which slots should be updated with the current frame. This information is consumed by the user library code and never reaches NVDEC. The only thing that is sent to NVDEC is the raw address of the 3 reference frames in memory. This is where the software decoder proves useful, as we can load the reference frame data directly from memory and pass it to the decoder.
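The following Python sketch illustrates both sides of this. `ReferenceSlots` models the slot bookkeeping that happens in the user library (8 slots, a bitmask choosing which ones the current frame overwrites, 3 slots selected per frame), while `load_reference_frames` models what the emulated decoder can do with the 3 raw addresses it receives. All names are hypothetical, and guest memory is modelled as a plain bytes object for simplicity.

```python
NUM_SLOTS = 8    # VP9 keeps 8 reference frame slots
ACTIVE_REFS = 3  # only 3 of them can be used by a single frame


class ReferenceSlots:
    """Game-side bookkeeping; this state never reaches NVDEC."""

    def __init__(self):
        self.slots = [None] * NUM_SLOTS

    def refresh(self, refresh_flags, frame):
        """Store frame in every slot whose bit is set in refresh_flags."""
        for index in range(NUM_SLOTS):
            if refresh_flags & (1 << index):
                self.slots[index] = frame

    def select(self, indices):
        """Return the 3 frames referenced by the current frame."""
        if len(indices) != ACTIVE_REFS:
            raise ValueError("exactly 3 reference indices are expected")
        return [self.slots[i] for i in indices]


def load_reference_frames(memory, addresses, frame_size):
    """Emulator side: copy the 3 reference frames out of guest memory."""
    if len(addresses) != ACTIVE_REFS:
        raise ValueError("exactly 3 reference addresses are expected")
    return [bytes(memory[a:a + frame_size]) for a in addresses]
```

Because the emulated decoder only ever sees the addresses, it does not need to track the slots itself; reading the frames from memory on every decode is enough.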
H264 is considerably different from VP9, and implementing it using FFmpeg is possible (but this also has its own limitations; I would rather have a custom software implementation as well). The H264 equivalents of the VP9 compressed and uncompressed headers described earlier are the SPS (Sequence Parameter Set) and PPS (Picture Parameter Set). However, they are slightly different. Unlike VP9, H264 videos usually have only one SPS and one PPS for the entire video, instead of one per frame. Furthermore, H264 does not make use of probability tables that need to be adapted per frame like VP9 does, and all the information required to update reference frames is passed to NVDEC. So, one can take a naive (but not accurate) approach of reconstructing the SPS and PPS, prepending them to the frame data, and sending the result to FFmpeg for decoding.
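The prepending step of this naive approach can be sketched as follows. The sketch assumes an Annex B byte stream (NAL units separated by `00 00 00 01` start codes), that the SPS and PPS have already been rebuilt as raw NAL units, and that the frame data already begins with its own start code. Whether these assumptions hold for the actual implementation is not stated here, and `build_access_unit` is a hypothetical name.

```python
START_CODE = b"\x00\x00\x00\x01"  # Annex B start code

NAL_TYPE_SPS = 7
NAL_TYPE_PPS = 8


def nal_unit_type(nal):
    """The NAL unit type is stored in the low 5 bits of the first byte."""
    return nal[0] & 0x1F


def build_access_unit(sps, pps, frame_data):
    """Prepend the reconstructed SPS and PPS to the slice data.

    Assumes frame_data already starts with its own start code.
    """
    if nal_unit_type(sps) != NAL_TYPE_SPS:
        raise ValueError("first NAL unit is not an SPS")
    if nal_unit_type(pps) != NAL_TYPE_PPS:
        raise ValueError("second NAL unit is not a PPS")
    return START_CODE + sps + START_CODE + pps + frame_data
```

The hard part, rebuilding the SPS and PPS bit by bit from the structures passed to NVDEC, is deliberately left out of this sketch.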
This approach works well when only one video is decoded at a time (which is what has been observed so far in all tested games). However, it does not work when multiple videos are decoded at once, unless we propagate channel information from the HLE driver to NVDEC. This would allow proper separation of the FFmpeg decoding contexts for each video, since each video will be decoded on a separate channel.
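One possible shape for that separation is a map from channel to decoding context, created lazily on first use. The sketch below is an assumption about how it could look, not a description of the merged code; `DecoderContexts`, `channel_id` and the factory callable are hypothetical names, and the context object stands in for whatever state the FFmpeg wrapper would keep.

```python
class DecoderContexts:
    """Keeps one decoding context per channel so streams do not mix."""

    def __init__(self, context_factory):
        # context_factory is any callable that creates a fresh context.
        self._context_factory = context_factory
        self._by_channel = {}

    def get(self, channel_id):
        """Return the context for a channel, creating it on first use."""
        context = self._by_channel.get(channel_id)
        if context is None:
            context = self._context_factory()
            self._by_channel[channel_id] = context
        return context

    def close(self, channel_id):
        """Drop the context when the channel is closed."""
        self._by_channel.pop(channel_id, None)


contexts = DecoderContexts(dict)  # dict stands in for a real decoder context
first = contexts.get(1)
second = contexts.get(2)
print(first is contexts.get(1), first is second)
```

With this in place, frames submitted on different channels would each reach their own decoder state, so reference frames from one video could not leak into another.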
Common issues
Games behave differently when decode is not fast enough. Each game will have a different response, and some of them are documented below:
Those issues are usually easier to observe with VP9, since it uses software decoding, which is slower and more likely to struggle, especially when decoding 1080p content.