New NVDEC and VIC implementation #1384

Merged
merged 25 commits into from Jul 12, 2020

@gdkchan gdkchan commented Jul 12, 2020

Definitions

  • What is Host1x?

From NVIDIA documentation:

The Tegra host1x module is the DMA engine for register access to Tegra’s graphics- and multimedia-related modules. The modules served by host1x are referred to as clients. host1x includes some other functionality, such as synchronization.

host1x contains a FIFO (in fact, both a write and a read FIFO) for each client module. These FIFOs contain the register addresses and values for data written to, or read from, the client modules.

host1x contains memory-mapped registers that allow the CPU to control these FIFOs. Accessing client module registers in this manner can be described as indirect access, since the client’s registers are accessed indirectly through the host1x FIFO, rather than directly through a memory map. However, use of this access mechanism is discouraged.
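
As a rough mental model of the write FIFO described above, the sketch below queues (register address, value) pairs and later delivers them to a client in order. The `ClientModule` and `Host1xFifo` names are hypothetical and only serve the illustration; they do not mirror the real hardware interface or any Ryujinx class.

```python
from collections import deque


class ClientModule:
    """Hypothetical host1x client (e.g. NVDEC or VIC), modeled as a register file."""

    def __init__(self, name):
        self.name = name
        self.registers = {}

    def write_register(self, address, value):
        self.registers[address] = value


class Host1xFifo:
    """Toy write FIFO holding (address, value) pairs for a single client."""

    def __init__(self, client):
        self.client = client
        self.pending = deque()

    def push(self, address, value):
        # Queue a register write instead of touching the client directly.
        self.pending.append((address, value))

    def drain(self):
        """Deliver queued writes to the client in FIFO order; return how many."""
        count = 0
        while self.pending:
            address, value = self.pending.popleft()
            self.client.write_register(address, value)
            count += 1
        return count
```

The point of the model is only that the client never sees a direct memory-mapped write; every register update arrives through the queue.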

  • What is NVDEC?

NVDEC stands for NVIDIA Decoder, and is the module responsible for video decoding on the Nintendo Switch. It provides fast video decoding for most games that use videos. The version of NVDEC on the Nintendo Switch supports the following video codecs: MPEG, MPEG4, VC1, VP8, VP9, H264 and H265. Of those, Nintendo only exposes VP8, VP9, H264 and H265. H264 and H265 are also known as AVC and HEVC, respectively.

Most games use H264, the most common video codec. This choice is probably because it has the best hardware support/efficiency ratio (that is, it is widely supported and has reasonably high compression ratios). For this reason, it is much more likely to be used in ports than, say, VP9. A considerable number of first-party games also use VP9, for example Super Smash Bros Ultimate, Pokemon Let's Go Eevee/Pikachu and Fire Emblem Three Houses, to name a few.

  • What is VIC?

VIC stands for Video Image Composer. It can do many things, but here I will focus only on the role of VIC in video decoding.
NVDEC does not support a configurable output frame format. This means that NVDEC will always output the same format, known as NV12, where the luma and chroma planes are stored in separate buffers. Luma contains the luminance information of the picture, so it is basically a grayscale version of the video, while chroma contains all the color information. The two combined produce the final, colored picture that will be displayed.
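
To make the NV12 layout concrete, here is a minimal sketch that computes the size of each plane and looks up the chroma pair covering a given pixel. It assumes 8-bit samples, even dimensions and tightly packed rows in plain linear order (no stride padding and no block linear layout), which is a simplification of what NVDEC actually writes. The function names are hypothetical.

```python
def nv12_plane_sizes(width, height):
    """Return (luma_bytes, chroma_bytes) for an 8-bit NV12 frame.

    Assumes tightly packed rows; real surfaces may have padded strides.
    """
    if width % 2 or height % 2:
        raise ValueError("NV12 needs even dimensions")
    luma_bytes = width * height
    # Chroma is subsampled 2x in both directions, but U and V are interleaved,
    # so each chroma row is still `width` bytes and there are height / 2 rows.
    chroma_bytes = width * (height // 2)
    return luma_bytes, chroma_bytes


def chroma_at(chroma, width, x, y):
    """Return the (U, V) pair shared by the 2x2 luma block containing (x, y)."""
    offset = (y // 2) * width + (x // 2) * 2
    return chroma[offset], chroma[offset + 1]
```

For a 1920x1080 frame this gives a 2,073,600 byte luma plane and a chroma plane half that size.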

The role of VIC in the decode process is to convert the NVDEC output into something that the GPU can consume. It usually converts from the NVIDIA block linear memory layout to the more standard linear layout (one line after the other in memory), and it may also perform color conversion.

The pictures below illustrate the optional color conversion that may be performed by VIC (this is done when the game requests an RGBA format instead of NV12).

Input luma plane (1 channel):
image
Input chroma plane (2 channels for interleaved planes):
image
Output RGBA image (4 channels):
image
Some games do this color conversion using VIC, others do it on the GPU using a shader (in those cases, VIC outputs a picture with the same format as the input, NV12). There are some cases where VIC will only perform a raw copy of the picture data. This is all (indirectly) configured by the game, and depends on which format it requests through the high level Nintendo API.
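
The NV12 to RGBA conversion can be sketched as below. The coefficients are the common full-range BT.601 ones, chosen here purely as an assumption; which matrix and range VIC applies depends on its configuration and is not stated in this description. Planes are assumed to be tightly packed and linear, and the function names are hypothetical.

```python
def clamp8(value):
    """Round to the nearest integer and clamp to the 0..255 range."""
    return max(0, min(255, int(round(value))))


def yuv_to_rgba(y, u, v):
    """Convert one YUV sample to RGBA using full-range BT.601 (assumed)."""
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    return clamp8(r), clamp8(g), clamp8(b), 255


def nv12_to_rgba(luma, chroma, width, height):
    """Convert linear, tightly packed NV12 planes to an RGBA byte string."""
    out = bytearray(width * height * 4)
    for py in range(height):
        for px in range(width):
            y = luma[py * width + px]
            # Each interleaved (U, V) pair covers a 2x2 block of luma pixels.
            c = (py // 2) * width + (px // 2) * 2
            o = (py * width + px) * 4
            out[o:o + 4] = bytes(yuv_to_rgba(y, chroma[c], chroma[c + 1]))
    return bytes(out)
```

A shader doing the same job on the GPU would apply the same per-pixel arithmetic, only in parallel.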

Implementation

  • NVDEC - VP9 support:

To understand the decisions taken regarding VP9 support, we first need to understand a bit of the VP9 bitstream. Each frame can be thought of as 3 parts: uncompressed header, compressed header and tile data. The uncompressed and compressed headers are parsed by the NVIDIA library on the game side, so NVDEC never receives this data; it only receives the tile data. For the most part, the data gathered from those headers is written into custom structures that are passed to NVDEC, so we have access to the decoded data.

In order to decode the frame using conventional libraries, such as FFMPEG, it would be necessary to fully reconstruct the bitstream, including the uncompressed and compressed headers. We have most, but not all, of the data required to do that. For this reason, using such libraries is not feasible for emulating the NVDEC VP9 decoder. Instead, we use a custom decoder that can easily adapt to our needs: the parsing and decoding of the uncompressed and compressed headers is skipped, and the already decoded information is fed directly to the decoder.

This approach also allows passing reference frame data directly to the decoder. Reference frames are nothing more than previously decoded frames. The pixels of those frames are used to construct future frames, a compression technique that takes advantage of redundancy between frames to reduce the size of subsequent frames by copying previously decoded blocks of pixels. VP9 has 8 reference frame slots, of which only 3 can be used at once. The bitstream headers indicate which slots should be updated with the current frame. This information is consumed by the user library code and never reaches NVDEC. The only thing that is sent to NVDEC is the raw address of the 3 reference frames in memory. This is where the software decoder proves useful, as we can load the reference frame data directly from memory and pass it to the decoder.
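
For context, the slot bookkeeping that the header describes (and that stays on the game side) can be sketched as follows. The field names `refresh_frame_flags` and `ref_frame_idx` follow the VP9 bitstream specification; the `ReferencePool` class itself is hypothetical and is not part of the decoder in this PR, which only ever sees the 3 resolved frame addresses.

```python
NUM_REF_SLOTS = 8   # VP9 keeps 8 reference frame slots
ACTIVE_REFS = 3     # only 3 of them are usable by any single frame


class ReferencePool:
    """Toy model of VP9 reference slot management."""

    def __init__(self):
        self.slots = [None] * NUM_REF_SLOTS

    def active_references(self, ref_frame_idx):
        """Resolve the 3 slot indices from the header into frames."""
        if len(ref_frame_idx) != ACTIVE_REFS:
            raise ValueError("exactly 3 reference indices are expected")
        return [self.slots[index] for index in ref_frame_idx]

    def refresh(self, refresh_frame_flags, frame):
        """Store `frame` in every slot whose bit is set in the 8-bit mask."""
        for slot in range(NUM_REF_SLOTS):
            if refresh_frame_flags & (1 << slot):
                self.slots[slot] = frame
```

Because this step happens before the data reaches NVDEC, the emulated decoder can skip it entirely and read the three reference frames from the addresses it is given.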

  • NVDEC - H264 support:

H264 is considerably different from VP9, and implementing it using FFMPEG is possible (although this has its own limitations, and I would rather have a custom software implementation as well). The H264 equivalents of the VP9 uncompressed and compressed headers described earlier are the SPS (Sequence Parameter Set) and PPS (Picture Parameter Set), although they work slightly differently. Unlike VP9, H264 videos usually have only one SPS and one PPS for the entire video, instead of one per frame. Furthermore, H264 does not use probability tables that need to be adapted per frame like VP9 does, and all the information required to update reference frames is passed to NVDEC. So, one can take a naive (but not accurate) approach of reconstructing the SPS and PPS, prepending them to the frame data, and sending the result to FFMPEG for decoding.
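
The "prepend SPS and PPS" step can be sketched as below, assuming the decoder is fed an Annex B byte stream (NAL units separated by start codes). It also assumes the SPS and PPS payloads have already been rebuilt elsewhere, including emulation prevention bytes, and that the slice data is available as bare NAL units without start codes; neither assumption is confirmed by this description. The `build_annex_b` helper is hypothetical.

```python
START_CODE = b"\x00\x00\x00\x01"
NAL_TYPE_SPS = 7
NAL_TYPE_PPS = 8


def nal_unit_type(nal):
    """The NAL unit type is stored in the low 5 bits of the first byte."""
    return nal[0] & 0x1F


def build_annex_b(sps, pps, slice_nals):
    """Concatenate SPS, PPS and slice NAL units into one Annex B stream."""
    if nal_unit_type(sps) != NAL_TYPE_SPS:
        raise ValueError("first unit is not an SPS")
    if nal_unit_type(pps) != NAL_TYPE_PPS:
        raise ValueError("second unit is not a PPS")
    stream = bytearray()
    for nal in (sps, pps, *slice_nals):
        stream += START_CODE
        stream += nal
    return bytes(stream)
```

The hard part in practice is rebuilding the SPS and PPS contents from the structures passed to NVDEC, which this sketch deliberately leaves out.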

This approach works well when only one video is decoded at once (which is what has been observed so far on all tested games). However, it does not work when multiple videos are decoded at once, unless we propagate channel information from the HLE driver to NVDEC. This would allow appropriate separation of the FFMPEG decoding contexts for each video, since each video will be decoded on a separate channel.
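
If channel information were propagated, the separation of decoding contexts could look roughly like the sketch below: one context per channel, created lazily. `DecoderContext` is a hypothetical placeholder standing in for an FFMPEG decoding context, and `channel_id` is an assumed identifier that the HLE driver would have to provide.

```python
class DecoderContext:
    """Hypothetical stand-in for per-video decoder state."""

    def __init__(self, channel_id):
        self.channel_id = channel_id
        self.frames_decoded = 0

    def decode(self, frame_data):
        # A real context would hand frame_data to the decoder here.
        self.frames_decoded += 1
        return self.channel_id, self.frames_decoded, len(frame_data)


class ContextRegistry:
    """Keeps one decoder context per channel so videos never share state."""

    def __init__(self):
        self._contexts = {}

    def get(self, channel_id):
        context = self._contexts.get(channel_id)
        if context is None:
            context = DecoderContext(channel_id)
            self._contexts[channel_id] = context
        return context

    def close(self, channel_id):
        # Drop the context when the channel is closed; unknown ids are ignored.
        self._contexts.pop(channel_id, None)
```

With this arrangement, frames from two videos decoded at once would each see only their own reference frames and parameter sets.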

Common issues

Games behave differently when decode is not fast enough. Each game will have a different response, and some of them are documented below:

  • Frames are dropped, and only a black (or white) frame is shown: Kirby Star Allies, Hatsune Miku and Star Ocean First Departure R do this.
  • Frames are dropped, and only the last frame is shown: Zelda Link's Awakening, Xenoblade Chronicles: Definitive Edition and Fire Emblem Three Houses do this.
  • Frames are not dropped, and the game tries to sync the audio to the video (or there is no audio to sync): this seems to be the behavior of Super Mario Odyssey and Mario Kart 8 Deluxe (frames are not dropped, but the videos also don't have audio?), and also 1-2 Switch.

Those issues are usually easier to observe with VP9, since it uses software decoding, which is slower and more likely to struggle, especially when decoding 1080p content.

@Thog
Thog approved these changes Jul 12, 2020
Contributor

@jduncanator jduncanator left a comment

lgtm 👍 💡

@AcK77
AcK77 approved these changes Jul 12, 2020
@Thog Thog merged commit 4d02a2d into Ryujinx:master Jul 12, 2020
13 checks passed
SeraUQ added a commit to Doctorwho1909/Ryujinx that referenced this pull request Jul 12, 2020
New NVDEC and VIC implementation (Ryujinx#1384)
@gdkchan gdkchan deleted the gdkchan:nvdecv2 branch Jul 12, 2020