New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Texture Cache Rework #3281

Merged
merged 32 commits into from Mar 5, 2018

Conversation

Projects
None yet
@jroweboy
Copy link
Member

jroweboy commented Dec 12, 2017

(Please note this code was written by @Phanto-m and not me. I've studied it extensively so I can be ready to answer questions in review, but I am still noob and may not know everything/got something wrong. Please bear with me)

Goal:

Eliminate as many superfluous CPU <-> GPU flushes as possible. This is primarily accomplished by adding an Invalidate (without flush) method on the texture cache that is preferred to use instead of the InvalidateAndFlush that was in use everywhere before. This also adds support for hw texture copy which fixes upscaling in many games and makes several others full speed (and removes the need to use the CPU implementation.)

Whats new:

  • The entire texture cache interface got a face lift. Within the texture cache, lookup for surfaces is now done with easier to use helper methods to find the most appropriate surface. The parameters used to find/create a surface is now split from the actual cached surface as well, which should make it a bit easier to get a surface from the cache now.
  • Optimized morton copy to copy in tiles
  • Resolution scaling is now locked to integer sizes
  • GL_SCISSOR and viewport tests to try to avoid drawing out of the draw area if not needed
  • Invalidate without flushing which will attempt to revalidate the surface by copying from compatible surfaces if it can.
  • HW texture copy (yay!)

General Notes:

  • The coordinate system in the texture cache was changed to match GL_ORIGIN instead of the PICA coordinate system. I didn't bother to split this into its own commit. Please be aware of this when reviewing. Heres what phantom said when i asked them about it: "getsurfacerect would return top-left origin coords when dealing with nontiled textures which made was a real PITA, now it's consistent and matches gl's origin. the flipping is now done in displaytransfer since it's the only place where that conversion is required. i kept the kinda broken texcoords in acceleratedisplay, it's weird how it expects rotated coors or i just didnt understand how the rotating is done in the rendered, didnt bother looking too much into it"
  • The final commit is a hacky workaround that I'm going to leave up the reviewers to decide what to do with it. Its put at the end there so we can drop it if we decide its too much. It works by ignoring any regions that are dirty when revalidating (and being in the dirty region implies that the region was created only on the GPU), and if every place we are trying to validate in the interval is either copyable from another surface, or in the dirty regions, then it gets ignored. This makes things like the after match screen in smash 4 much faster with no visible differences, and also removes pokemon sun/moon outlines as well.

I did my best to split all the changes into separate commits, so the reviewers can just read them one at a time, but as they can probably tell, I got a little tired of doing this towards the end. "Add the rest of the cache methods" "Update the rasterizer to use the new cache" are probably the most "draw the rest of the owl" that this pr gets. I tried to at least make the diffs line up so they are easier to read :)


This change is Reviewable

@Subv

This comment has been minimized.

Copy link
Member

Subv commented Dec 12, 2017

I see a memory write will now only Invalidate the region in the GPU instead of Flushing + Invalidating.

What happens if the CPU does a Write, and then a Read? The comment in InvalidateRegion says that it removes the surface from the cache, if this is the case then how will it be flushed to memory during the Read?

@Subv

This comment has been minimized.

Copy link
Member

Subv commented Dec 12, 2017

Reviewed 9 of 9 files at r1, 8 of 8 files at r2.
Review status: 16 of 24 files reviewed at latest revision, 4 unresolved discussions.


src/core/hw/gpu.cpp, line 360 at r2 (raw file):

    const auto FlushInvalidate_fn = (output_gap != 0) ? Memory::RasterizerFlushAndInvalidateRegion
                                                      : Memory::RasterizerInvalidateRegion;
    FlushInvalidate_fn(config.GetPhysicalOutputAddress(), static_cast<u32>(contiguous_output_size));

I'd much prefer that you use a normal if for this


src/video_core/renderer_opengl/gl_rasterizer_cache.cpp, line 528 at r1 (raw file):

}

// If the resolution

?


src/video_core/renderer_opengl/gl_rasterizer_cache.cpp, line 530 at r1 (raw file):

// If the resolution
static u16 GetResolutionScaleFactor() {
    return !Settings::values.resolution_factor

Settings::values.resolution_factor == 0


src/video_core/renderer_opengl/gl_rasterizer_cache.cpp, line 543 at r1 (raw file):

    // update resolution_scale_factor and reset cache if changed
    static u16 resolution_scale_factor = GetResolutionScaleFactor();
    if (resolution_scale_factor != GetResolutionScaleFactor()) {

this is never true?


Comments from Reviewable

@valentinvanelslande

This comment has been minimized.

Copy link
Contributor

valentinvanelslande commented Dec 12, 2017

Fixes #3055

@jroweboy

This comment has been minimized.

Copy link
Member

jroweboy commented Dec 12, 2017

Review status: 16 of 24 files reviewed at latest revision, 4 unresolved discussions.


src/video_core/renderer_opengl/gl_rasterizer_cache.cpp, line 528 at r1 (raw file):

Previously, Subv (Sebastian Valle) wrote…

?

Oops :D


src/video_core/renderer_opengl/gl_rasterizer_cache.cpp, line 543 at r1 (raw file):

Previously, Subv (Sebastian Valle) wrote…

this is never true?

resolution_scale_factor is a static variable, meaning it'll get set to the current scale factor only on the first initialization.


Comments from Reviewable

@B3n30

This comment has been minimized.

Copy link
Contributor

B3n30 commented Dec 12, 2017

Reviewed 7 of 9 files at r1.
Review status: 16 of 24 files reviewed at latest revision, 7 unresolved discussions.


src/common/math_util.h, line 32 at r1 (raw file):

    T bottom;

    Rectangle() = default;

unrelated (but fine)


src/core/frontend/framebuffer_layout.cpp, line 20 at r1 (raw file):

u16 FramebufferLayout::GetScalingRatio() const {
    return static_cast<u16>(((top_screen.GetWidth() - 1) / Core::kScreenTopWidth) + 1);

Can the width get 0? If so this will fail.


src/video_core/renderer_opengl/gl_shader_gen.cpp, line 44 at r1 (raw file):

layout (std140) uniform shader_data {
    int framebuffer_scale;

Should probably belong to another commit?


Comments from Reviewable

@LexLight

This comment has been minimized.

Copy link

LexLight commented Dec 12, 2017

In "Paper Mario: Sticker Star", most of the low screen is white during the title, file select screens, and ingame, too, making the stickers unseeable.
https://cdn.discordapp.com/attachments/242442830486700045/390224478312988672/unknown.png
imagen
In "Professor Layton and the Miracles Mask", in Chapter 1, after they read the letter, the background becomes black, through after the scene, the background is fine.
imagen
I will test more if I can.

@Hexagon12

This comment has been minimized.

Copy link
Contributor

Hexagon12 commented Dec 12, 2017

Oh boy.
Fixes #3260, #2846, #2640, #2505, #2375, #2229, #2220, #1830, #1521.

New bugs:

Fixed issues within the PR over the time:

  • Pokemon Mystery Dungeon games crash on zero width input once again
  • Pokemon S/M/US/UM crash either when accessing Poke Refresh, or getting a bird encounter on Route 3
  • Virtual Console games don't display the top screen anymore

Updated on: 2017-12-14

@B3n30

This comment has been minimized.

Copy link
Contributor

B3n30 commented Dec 12, 2017

Also surprisingly fixes #2416

@Subv

This comment has been minimized.

Copy link
Member

Subv commented Dec 12, 2017

How does this fix #2262, a crash in the software renderer?

@wwylele

This comment has been minimized.

Copy link
Member

wwylele commented Dec 12, 2017

Just want to emphasizers a principle here:

Removing outlines in Pokemon games would be considered as a bug, not a feature.

@jroweboy

This comment has been minimized.

Copy link
Member

jroweboy commented Dec 12, 2017

@wwylele Yes, I apologize that I didn't make it clear that it was a bug. There is a follow PR that does d24s8 to rgba8 conversion with a PBO, which is an actual fix for pokemon outlines. I was debating on including that change with this PR, but I felt like it should be in a separate pull request entirely so I didn't add it. That code will be run before the skip dirty regions check, so they are compatible and with it outlines will be drawn.

@ds84182

This comment has been minimized.

Copy link
Contributor

ds84182 commented Dec 12, 2017

The first screen transition that Art of Balance TOUCH! does is broken with this PR. Instead of having a copy of the framebuffer it just shows black. Subsequent screen transitions work properly.

This can be reproduced by opening the game, selecting a save file, then clicking either "Arcade", "Endurance", or "Credits". Going back and selecting those options a second time shows the transition properly though.

@jroweboy

This comment has been minimized.

Copy link
Member

jroweboy commented Dec 12, 2017

Review status: 16 of 24 files reviewed at latest revision, 7 unresolved discussions, some commit checks failed.


src/common/math_util.h, line 32 at r1 (raw file):

Previously, B3n30 wrote…

unrelated (but fine)

It had to go somewhere (and I didn't think it was anything special enough to get into a different PR)


src/core/frontend/framebuffer_layout.cpp, line 20 at r1 (raw file):

Previously, B3n30 wrote…

Can the width get 0? If so this will fail.

It might be possible in some weird case when the game is just booting, so I probably should handle that. But we do have a minimum size for the window so it doesn't happen in practice right now.


src/video_core/renderer_opengl/gl_shader_gen.cpp, line 44 at r1 (raw file):

Previously, B3n30 wrote…

Should probably belong to another commit?

no, this is part of changing the resolution scale from a float to an int. originally it was actually two floats, one for horizontal and one for vertical hence the vec2, but now its just a constant int.


Comments from Reviewable


u8* dst_buffer = Memory::GetPhysicalPointer(surface->addr);

This comment has been minimized.

@Subv

Subv Dec 12, 2017

Member

Please add a more descriptive comment here, this is what phantom said about it: true, that comment doesn't tell much. the point is to avoid thousands of small writes every frame if the cpu decides to access that region, anything higher than 8 you're guaranteed it comes from a service (dma, failed texcopy ...)

@AuraOfTheDawn

This comment has been minimized.

Copy link

AuraOfTheDawn commented Dec 12, 2017

Breaks color on enemy hp meteres in Puzzle And Dragons Mario Bros Edition. Previously enemy hp meters were colored after their element, now all hp meters are black (and empty to grey)

@Dragios

This comment has been minimized.

Copy link
Contributor

Dragios commented Dec 13, 2017

There are a few things I wanna share.

What @Hexagon12 said was partially correct regarding Virtual Console. It does not display anything, true. But if you upscale, it will show the previous image buffer. Same goes with downscaling back. It just shows nothing initially so I assume the texture is not forwarding properly. Same issue occurs with Tales of the Abyss.
Pokémon Yellow

Kind of confused that this fixes #2262 as the crash involve purely software renderer. Mind explain how you get the result?


Most of the bugs I found have been mentioned already so I'm kind of late to the party. I just highlight the new bugs I discovered.

  • I think Cooking Mama 5 - Bon Appetit! has similar issue as the Virtual Console but instead it shows turquoise screen instead initially. Once I upscale it will show the previous image buffer and are the same for both screen. This only occurs when there is no save file like #2235 but at a much worse state.
    Cooking Mama 5
    Looks similar to #3102 before this PR

  • Metal Gear Solid 3D have a few spot of flickering when on native resolution and occurs on nVidia GPU only. Intel GPU is unaffected. Not so sure with AMD GPU.

Fixes #2741 & #3102

I notice this log appears in some games on Intel GPU. Will occur on nVidia GPU if I quit the game during the right moment.

Render.OpenGL <Warning> video_core/renderer_opengl/renderer_opengl.cpp:DebugHandler:472:
API PERFORMANCE 131154: Pixel-path performance warning: Pixel transfer is synchronized with
3D rendering.
@@ -419,6 +409,12 @@ void RasterizerFlushRegion(PAddr start, u32 size) {
}
}

void RasterizerInvalidateRegion(PAddr start, u32 size) {
if (VideoCore::g_renderer != nullptr) {

This comment has been minimized.

@lioncash

lioncash Dec 13, 2017

Member

This can be inverted, which is usually what's done (early out on the error case).

dst_params.res_scale = src_surface->res_scale;
dst_params.UpdateParams();

const bool load_gap = output_gap != 0; // Since we are going to invalidate the gap if there is

This comment has been minimized.

@lioncash

lioncash Dec 13, 2017

Member

This comment should likely go above the variable, not to the side of it.

using SurfaceType = SurfaceParams::SurfaceType;
using PixelFormat = SurfaceParams::PixelFormat;

static std::array<OGLFramebuffer, 2> transfer_framebuffers;

This comment has been minimized.

@lioncash

lioncash Dec 13, 2017

Member

ugh @ non-const static state

glbuf_next_tile();
}

u8* const buffer_end = tile_buffer + aligned_end - aligned_start;

This comment has been minimized.

@lioncash

lioncash Dec 13, 2017

Member

This can also be a pointer to const const u8* as well as a const pointer (i.e. const u8* const)


if (gl_buffer == nullptr) {
gl_buffer_size = width * height * GetGLBytesPerPixel(pixel_format);
gl_buffer.reset(new u8[gl_buffer_size]);

This comment has been minimized.

@lioncash

lioncash Dec 13, 2017

Member
gl_buffer = std::make_unique<u8[]>(gl_buffer_size);

This comment has been minimized.

@Phanto-m

Phanto-m Dec 13, 2017

Contributor

You don't want to pay extra for default initialization here, which was the point of using unique_ptr in the first place

This comment has been minimized.

@lioncash

lioncash Dec 13, 2017

Member

Ah, fair point.

true);
if (gl_buffer == nullptr) {
gl_buffer_size = width * height * GetGLBytesPerPixel(pixel_format);
gl_buffer.reset(new u8[gl_buffer_size]);

This comment has been minimized.

@lioncash

lioncash Dec 13, 2017

Member
gl_buffer = std::make_unique<u8[]>(gl_buffer_size);

u8* dst_buffer = Memory::GetPhysicalPointer(surface->addr);

This comment has been minimized.

@lioncash

lioncash Dec 13, 2017

Member

while (true) (applies to the for (;;) near the line, I must have inserted the comment incorrectly).

}

u32 PixelsInBytes(u32 size) const {
return size * 8 / GetFormatBpp(pixel_format);

This comment has been minimized.

@lioncash

lioncash Dec 13, 2017

Member

size * CHAR_BIT

}

u32 BytesInPixels(u32 pixels) const {
return pixels * GetFormatBpp(pixel_format) / 8;

This comment has been minimized.

@lioncash

lioncash Dec 13, 2017

Member

/ CHAR_BIT

static const u32 xlut[] = {0x00, 0x01, 0x04, 0x05, 0x10, 0x11, 0x14, 0x15};
static const u32 ylut[] = {0x00, 0x02, 0x08, 0x0a, 0x20, 0x22, 0x28, 0x2a};
static constexpr u32 MortonInterleave(u32 x, u32 y) {
constexpr u32 xlut[] = {0x00, 0x01, 0x04, 0x05, 0x10, 0x11, 0x14, 0x15};

This comment has been minimized.

@lioncash

lioncash Dec 13, 2017

Member

From Discord:

Does MortonInterleave's arrays being non-static constexpr in that PR really improve anything? (as in, if the compiler optimizes all its calls despite the loop its in can't be fully compile-time optimized away).

I know GCC will always load those array values onto the stack every time now if it can't run the expression at compile-time (clang does better in this regard and would still store the values fixed without the static keyword).

@jroweboy

This comment has been minimized.

Copy link
Member

jroweboy commented Dec 13, 2017

Okay, I removed the re interpretation hack since there seems to be more problems with it than originally noticed. With that gone, please retest the games and see if they still have the same errors.

I also (hopefully) fixed compilation for linux and mac on the build bots. I'll get around to the other review comments in about 24 hours.

@emmauss

This comment has been minimized.

Copy link
Contributor

emmauss commented Dec 13, 2017

Comparing Canary 253(previous build) and 256(first texture cache build), using Pokemon Sun,
Area tested is the Pokecenter on route 8
253
Integrated Intel HD 4400 : Average Speed = 55%
Nvidia Geforce 840m: Average Speed = 43%

256
Integrated Intel HD 4400 : Average Speed = 47%
Nvidia Geforce 840m: Average Speed = 60%

Looks like caching is faster on discrete than integrated.

@jroweboy

This comment has been minimized.

Copy link
Member

jroweboy commented Dec 13, 2017

@Phanto-m added in some more checks for accelerated texture copy which should fix pokemon super mystery dungeon and also fixed a mistake i made when removing the reinterpretation hack

@Dragios

This comment has been minimized.

Copy link
Contributor

Dragios commented Dec 14, 2017

Ok, here is the latest result based on my testing.

  • Virtual Console's display works as it should now
  • Pokémon outline is still missing
  • Pokémon Mystery Dungeon series no longer crashes
  • Metal Gear Solid 3D's few spot flickering still exist (nothing changed)
  • Cooking Mama 5's display issue remains as shown here (nothing changed)
  • Performance warning log remains

@jroweboy jroweboy force-pushed the jroweboy:texcache-pt2 branch from 33f0607 to e957e15 Dec 14, 2017

@jroweboy

This comment has been minimized.

Copy link
Member

jroweboy commented Dec 14, 2017

Rebased off master to fix a conflict so it can go back on canary. Thanks for all the testing so far, I will try to address everyones feedback when I get time.

@tywald

This comment has been minimized.

Copy link

tywald commented Dec 14, 2017

A portion of the top screen(on the far right) isn't displayed correctly in Dragon Quest VII and Monster Hunter 4 Ultimate. In addition, the bottom screen is mirrored on the top screen in Dragon Quest Monsters: Terry's Wonderland 3D. Introduced in the last two commits by @Phanto-m.

Tested on the appveyor build, both msvc and mingw.

3
2
1

@Hexagon12

This comment has been minimized.

Copy link
Contributor

Hexagon12 commented Dec 14, 2017

I updated my first comment to indicate which bugs still need to be fixed, and which bugs are already fixed inside the PR.

Fire Emblem Echoes: Shadows of Valentia is also broken by those two Phanto-m commits.
citra-qt_2017-12-14_17-34-52

Those same 2 commits also make a huge performance regression in Capcom games.
Tested on Ace Attorney: Dual Destinies, in a court scene at 5x internal resolution.
Before those two commits:
Speed: 55%
FPS: 34 FPS
Frametime: 29.82 ms

After commits:
Speed: 28%
FPS: 17 FPS
Frametime: 59.48 ms

@jroweboy

This comment has been minimized.

Copy link
Member

jroweboy commented Dec 14, 2017

@tywald @Hexagon12 It seems that I was accidentally removing the wrong interval when validating a surface. This is fixed now, which should fix those few regressions from the last few commits.

@Hexagon12

This comment has been minimized.

Copy link
Contributor

Hexagon12 commented Dec 14, 2017

Can confirm that the graphical bugs are gone with Shadows of Valentia and Monster Hunter Stories (from @tywald's picture).

@jroweboy

This comment has been minimized.

Copy link
Member

jroweboy commented Feb 6, 2018

TODO List before merging:

  • I've addressed most of @lioncash's comments. I haven't done much research into the const vs constexpr for MortonInterleave so I'm iffy about reverting it. I also did not split those functions as recommended as it seemed like a stylistic choice, and I personally don't enjoy having to hunt down an extra function to find what I'm looking for. If you insist, I will gladly make the change though.
  • wwylele suggested adding more comments to the header to describe the expected usage. I still need to do this.

Last call for reviews! Will be looking to merge this in the next couple of weeks 👍

@wwylele
Copy link
Member

wwylele left a comment

LGTM as I don't see anything bad. Feel free to merge when you feel it is ready (if you are willing to fix small nits and add doc first)

glPixelStorei(GL_PACK_ROW_LENGTH, 0);
}

enum MatchFlags {

This comment has been minimized.

@wwylele

wwylele Feb 28, 2018

Member

enum class

Edit: I see that there are some bit flag operation that is hard to deal with enum class in a clear way so nvm

@wwylele wwylele added pr:reviewed and removed pr:needs-review labels Feb 28, 2018

@Senjosei

This comment has been minimized.

Copy link
Contributor

Senjosei commented Feb 28, 2018

I dont remember if uve taken a look at this but theres this https://community.citra-emu.org/t/cf-vanguard-stride-to-victory-canary-version-crash/10366

@wwylele

This comment has been minimized.

Copy link
Member

wwylele commented Mar 1, 2018

Here is some detail information about the issue above (thanks god I have a friend having this game on hand):

The assertion failure is on

void RasterizerCacheOpenGL::InvalidateRegion(...) {
    ....
    if (region_owner != nullptr) {
        ASSERT(region_owner->type != SurfaceType::Texture); // here
        ....
    }
}

The call stack is
GPU::TextureCopy->RasterizerOpenGL::AccelerateTextureCopy->InvaidateRegion

The copy dst has pixel_format = SurfaceParams::PixelFormat::ETC1 (inherited from src format?). I feel that this is an oversight of texture-only formats handling in TextureCopy, but I am not sure.

cc @jroweboy & @Phanto-m

@cluezbot

This comment has been minimized.

Copy link

cluezbot commented Mar 5, 2018

Hi, this is neobot, using neobrain's account. I'm keeping an archive of versions of this PR:

2018-03-05T05:06:26Z: c3c684c...jroweboy:1d419bac1b33afc0b9e76177f44dba8a29736474

@jroweboy

This comment has been minimized.

Copy link
Member

jroweboy commented Mar 5, 2018

@wwylele pushed a quick fix for the issue. Let me know if it resolves it and we can merge this 👍

@cluezbot

This comment has been minimized.

Copy link

cluezbot commented Mar 5, 2018

Hi, this is neobot, using neobrain's account. I'm keeping an archive of versions of this PR:

2018-03-05T09:09:36Z: c3c684c...jroweboy:c2515ff39d2123e4ac3aba904c477f8781884117

@wwylele

This comment has been minimized.

Copy link
Member

wwylele commented Mar 5, 2018

The issue is resolved. I pushed a clang-format fix commit. Will merge after CI check.

@wwylele wwylele merged commit 4befbdd into citra-emu:master Mar 5, 2018

2 of 3 checks passed

code-review/reviewable 16 files, 32 discussions left (B3n30, jroweboy, lioncash, Phanto-m, Subv, wwylele)
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment