Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert PICA Shaders to GLSL #3499

Closed
wants to merge 6 commits into from

Conversation

Projects
None yet
@jroweboy
Copy link
Member

commented Mar 10, 2018

Notice: this code was written by @Phanto-m and not me.

Foreword

This is a mega PR with all of the individual portions of the hardware renderer updates in one convenient location. The next step is to split this PR into smaller, manageable chunks, and submit them as individual PRs. When the small PR is merged, simply rebase this PR off the latest master and force push over my branch (maintainers have privileges to push to this branch on my fork) so people testing in Canary don't run into builds without the new code. Special thanks to @MerryMage for splitting the code into smaller commits!

Path forward

Here are some suggested future PRs that can be broken out of this one and reviewed/tested/merged one at a time. Please update this description with a link to the PR, so we can track progress.

  • PR #3522 -- Update GLAD to include extensions (fast to merge so get it out of the way)
  • Add Decompile Program
  • Prepare to use decompiled shaders (force fallback to CPU shaders)
  • Add streaming buffer support for vertices (New feature suggested by degasus. Integrate the streaming buffers with the rest of the code)
  • Add AccelerateDrawBatch (This is where we need to make sure everything is debugged before merging)
  • Add settings to UI (maybe just put this in as part of the previous pr so its easier to test?)

Any of the smaller commits that merry wrote could either be PRed as part of one of the larger changes or as a standalone PR (I have no preference)

Design

I will be writing it up as part of each of the follow up prs (or whoever else is planning on making any of these prs can do it) Hopefully this will be much easier to review and track changes.

Apologies for the lack of communication. Please let me know if you have any questions or concerns about my plan of action.


This change is Reviewable

@cluezbot

This comment has been minimized.

Copy link

commented Mar 10, 2018

Hi, this is neobot, using neobrain's account. I'm keeping an archive of versions of this PR:

2018-03-10T16:13:37Z: 5dd8ac9...jroweboy:88b1f64907ee45842d672358a4a60d1018b7726e

@adityaruplaha

This comment has been minimized.

Copy link
Contributor

commented Mar 10, 2018

So it dynamically recompiles all PICA shader commands to GLSL?

PrimitiveAssembler<Shader::OutputVertex>& primitive_assembler = g_state.primitive_assembler;

auto hw_shaders_setting = Settings::values.hw_shaders;
bool accelerate_draw = hw_shaders_setting != Settings::HwShaders::Off &&

This comment has been minimized.

Copy link
@Subv

Subv Mar 10, 2018

Member

This condition can be made clearer

This comment has been minimized.

Copy link
@lioncash

lioncash Mar 10, 2018

Member

This and the following code up to and including the switch statement should probably be broken out into its own function this case statement is getting ridiculously long.

@FearlessTobi

This comment has been minimized.

Copy link
Contributor

commented Mar 10, 2018

Enabling GPU Shader Emulation seems to fix #3130, #2443, #3244 and #2297.

@Tilka

This comment has been minimized.

Copy link
Contributor

commented Mar 10, 2018


Turns out there is a boolean overload of mix() which is basically a component-wise select, nice.
@jroweboy

This comment has been minimized.

Copy link
Member Author

commented Mar 10, 2018

@FearlessTobi note that this doesn't fix any issues in the shader jit. this is just a different implementation that runs instead of the shader jit. Those bugs will still need fixing as the shader jit isn't going away any time soon.

@FearlessTobi

This comment has been minimized.

Copy link
Contributor

commented Mar 10, 2018

@jroweboy I'm aware of that and I know they probably won't be closed, because the underlying issues still need to be fixed but I think anyway it's a good idea to document which games will be "fixed" by this for the sake of documentation.

@lioncash
Copy link
Member

left a comment

I have to say, I'm disappointed.

I'm disappointed because the bulk of the commit messages explain literally nothing about what work has been done. When work on something is done, it should be explained in the commits themselves, not the main message on the PR. Unlike the main message on a PR, they're actually preserved in the repository's commit history. I don't think dropping 24K+ LoC into a codebase with zero explanation given about the implementation is a good idea, ever.

This contributes to a problem that should be avoided called "tribal" knowledge, where only one or two people actually know how the central thing is supposed to work (because they were the one that designed it) meanwhile anyone else wanting to get into that area has to go and rediscover everything for themselves. If those two people left for any explicable reason, can you confidently say it would be easy to understand how everything works to someone without prior history at all to the code? In other words it raises the barrier-to-entry from a documentation perspective for no reason.

Anyway, I've left some review comments (which I hope are properly appended into the relevant commits and not globbed into an addressing commit).

@@ -7,11 +7,17 @@
#include "common/common_types.h"
#include "core/hw/gpu.h"


This comment has been minimized.

Copy link
@lioncash

lioncash Mar 10, 2018

Member

This is kind of gratuitous spacing just for a vector include

@@ -188,7 +291,13 @@ RasterizerOpenGL::RasterizerOpenGL() : shader_dirty(true) {
SyncDepthWriteMask();
}

RasterizerOpenGL::~RasterizerOpenGL() {}
RasterizerOpenGL::~RasterizerOpenGL() {
if (stream_buffer != nullptr) {

This comment has been minimized.

Copy link
@lioncash

lioncash Mar 10, 2018

Member
if (stream_buffer == nullptr) {
    return;
}
GLint vs_input_index_max;
GLsizeiptr vs_input_size;

void AnalyzeVertexArray(bool is_indexed);

This comment has been minimized.

Copy link
@lioncash

lioncash Mar 10, 2018

Member

Functions shouldn't be intermixed with variable declarations

public:
OGLPipeline() = default;
OGLPipeline(OGLPipeline&& o) {
std::swap(handle, o.handle);

This comment has been minimized.

Copy link
@lioncash

lioncash Mar 10, 2018

Member

this should probably invalidate the other handle, rather than just swapping it with the other one.

Release();
}
OGLPipeline& operator=(OGLPipeline&& o) {
std::swap(handle, o.handle);

This comment has been minimized.

Copy link
@lioncash
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.

#include <memory>

This comment has been minimized.

Copy link
@lioncash

lioncash Mar 10, 2018

Member

<cstddef> and <utility> need to be included.

};

OGLStreamBuffer::OGLStreamBuffer(GLenum target) {
gl_target = target;

This comment has been minimized.

Copy link
@lioncash

lioncash Mar 10, 2018

Member

This can be assigned in a constructor initializer list.

@@ -196,14 +197,37 @@ struct ShaderSetup {
}

std::array<u32, MAX_PROGRAM_CODE_LENGTH> program_code;
bool program_code_hash_dirty = true;

u64 GetProgramCodeHash() {

This comment has been minimized.

Copy link
@lioncash

lioncash Mar 10, 2018

Member

Functions shouldn't be intermixed with variable declarations.

std::array<u32, MAX_SWIZZLE_DATA_LENGTH> swizzle_data;
bool swizzle_data_hash_dirty = true;

u64 GetSwizzleDataHash() {

This comment has been minimized.

Copy link
@lioncash

/// Data private to ShaderEngines
struct EngineData {
unsigned int entry_point;
/// Used by the JIT, points to a compiled shader object.
const void* cached_shader = nullptr;
} engine_data;

private:
u64 program_code_hash;

This comment has been minimized.

Copy link
@lioncash

lioncash Mar 10, 2018

Member

These should probably have default values.

@wwylele

This comment has been minimized.

Copy link
Member

commented Mar 10, 2018

Agree with what Lioncash said (plus what I expressed on discord). I understand we are riding a hype train and you worked hard to manage to release all the stuff together. However, I hope that we can still chill here and do what healthy development needs, as we have been always required for all PRs: good code quality, justifiable design, descriptive commit message and documentation etc.

I am going to start reading the code tomorrow, hoping we can eventually sort this out. Last time for the texcache I kinda let it slip away because that part of code wasn't well understood anyway. But for this part I will review into finer details since I worked on it before.

@adityaruplaha

This comment has been minimized.

Copy link
Contributor

commented Mar 11, 2018

@LogPod there wasn't any PR description. But now the article's out, so I get it. But basically it's what I thought.

degasus added a commit to degasus/citra that referenced this pull request Mar 11, 2018

@wwylele

This comment has been minimized.

Copy link
Member

commented Mar 11, 2018

First request: explain what the glad update does, and if the update is indeed necessary, separate it into another PR and we can insta-merge it first.

@MerryMage MerryMage force-pushed the jroweboy:glvtx-pr branch from 88b1f64 to 2e9d58a Mar 11, 2018

@cluezbot

This comment has been minimized.

Copy link

commented Mar 11, 2018

Hi, this is neobot, using neobrain's account. I'm keeping an archive of versions of this PR:

2018-03-11T15:37:05Z: 141c007...jroweboy:2e9d58ace3a0d6b7534e045e4db1ef9fc8bda5b4

@MerryMage

This comment has been minimized.

Copy link
Member

commented Mar 11, 2018

Changes made:

  • Split the glvtx diff 1 commit into multiple commits.
  • Addressed a subset of Lioncash's comments.
  • I didn't like was going on with the s/private/public thing in primitive assembler. Added IsEmpty and GetTopology member functions instead.
  • Removed some dead code.

Overall diff between the previous version and my changes below:

diff --git a/src/video_core/command_processor.cpp b/src/video_core/command_processor.cpp
index b282c150f..476b88337 100644
--- a/src/video_core/command_processor.cpp
+++ b/src/video_core/command_processor.cpp
@@ -297,11 +297,10 @@ static void WritePicaReg(u32 id, u32 value, u32 mask) {
                                (regs.pipeline.use_gs == PipelineRegs::UseGS::No ||
                                 hw_shaders_setting == Settings::HwShaders::All);
 
-        accelerate_draw &=
-            primitive_assembler.buffer_index == 0 && !primitive_assembler.strip_ready;
+        accelerate_draw &= primitive_assembler.IsEmpty();
 
         if (regs.pipeline.use_gs == PipelineRegs::UseGS::No) {
-            switch (primitive_assembler.topology) {
+            switch (primitive_assembler.GetTopology()) {
             case PipelineRegs::TriangleTopology::Shader:
             case PipelineRegs::TriangleTopology::List:
                 accelerate_draw &= (regs.pipeline.num_vertices % 3) ==
diff --git a/src/video_core/primitive_assembly.cpp b/src/video_core/primitive_assembly.cpp
index 52fc84363..b282ccca0 100644
--- a/src/video_core/primitive_assembly.cpp
+++ b/src/video_core/primitive_assembly.cpp
@@ -71,6 +71,16 @@ void PrimitiveAssembler<VertexType>::Reconfigure(PipelineRegs::TriangleTopology
     this->topology = topology;
 }
 
+template <typename VertexType>
+bool PrimitiveAssembler<VertexType>::IsEmpty() const {
+    return buffer_index == 0 && strip_ready == false;
+}
+
+template <typename VertexType>
+PipelineRegs::TriangleTopology PrimitiveAssembler<VertexType>::GetTopology() const {
+    return topology;
+}
+
 // explicitly instantiate use cases
 template struct PrimitiveAssembler<Shader::OutputVertex>;
 
diff --git a/src/video_core/primitive_assembly.h b/src/video_core/primitive_assembly.h
index 33efdeb94..5c1d6f5d4 100644
--- a/src/video_core/primitive_assembly.h
+++ b/src/video_core/primitive_assembly.h
@@ -45,7 +45,17 @@ struct PrimitiveAssembler {
      */
     void Reconfigure(PipelineRegs::TriangleTopology topology);
 
-public:
+    /**
+     * Is our internal state empty?
+     */
+    bool IsEmpty() const;
+
+    /**
+     * What is our triangle topology?
+     */
+    PipelineRegs::TriangleTopology GetTopology() const;
+
+private:
     PipelineRegs::TriangleTopology topology;
 
     int buffer_index;
diff --git a/src/video_core/rasterizer_interface.h b/src/video_core/rasterizer_interface.h
index 6a66933e9..0f7a905b9 100644
--- a/src/video_core/rasterizer_interface.h
+++ b/src/video_core/rasterizer_interface.h
@@ -7,17 +7,11 @@
 #include "common/common_types.h"
 #include "core/hw/gpu.h"
 
-
-
-#include <vector>
-
-
 struct ScreenInfo;
 
 namespace Pica {
 namespace Shader {
 struct OutputVertex;
-struct AttributeBuffer;
 }
 } // namespace Pica
 
diff --git a/src/video_core/regs_pipeline.h b/src/video_core/regs_pipeline.h
index c24002cb1..e78c3e331 100644
--- a/src/video_core/regs_pipeline.h
+++ b/src/video_core/regs_pipeline.h
@@ -46,7 +46,6 @@ struct PipelineRegs {
             BitField<26, 2, u32> size6;
             BitField<28, 2, VertexAttributeFormat> format7;
             BitField<30, 2, u32> size7;
-            u32 format_low_raw;
         };
 
         union {
@@ -63,8 +62,6 @@ struct PipelineRegs {
 
             // number of total attributes minus 1
             BitField<28, 4, u32> max_attribute_index;
-
-            u32 format_high_raw;
         };
 
         inline VertexAttributeFormat GetFormat(int n) const {
@@ -113,8 +110,6 @@ struct PipelineRegs {
                 BitField<20, 4, u32> comp5;
                 BitField<24, 4, u32> comp6;
                 BitField<28, 4, u32> comp7;
-
-                u32 config_low_raw;
             };
 
             union {
@@ -127,8 +122,6 @@ struct PipelineRegs {
                 BitField<16, 8, u32> byte_count;
 
                 BitField<28, 4, u32> component_count;
-
-                u32 config_high_raw;
             };
 
             inline int GetComponent(int n) const {
diff --git a/src/video_core/renderer_opengl/gl_rasterizer.cpp b/src/video_core/renderer_opengl/gl_rasterizer.cpp
index f2e4a2093..5617da495 100644
--- a/src/video_core/renderer_opengl/gl_rasterizer.cpp
+++ b/src/video_core/renderer_opengl/gl_rasterizer.cpp
@@ -292,11 +292,13 @@ RasterizerOpenGL::RasterizerOpenGL() {
 }
 
 RasterizerOpenGL::~RasterizerOpenGL() {
-    if (stream_buffer != nullptr) {
-        state.draw.vertex_buffer = stream_buffer->GetHandle();
-        state.Apply();
-        stream_buffer->Release();
+    if (stream_buffer == nullptr) {
+        return;
     }
+
+    state.draw.vertex_buffer = stream_buffer->GetHandle();
+    state.Apply();
+    stream_buffer->Release();
 }
 
 /**
diff --git a/src/video_core/renderer_opengl/gl_resource_manager.h b/src/video_core/renderer_opengl/gl_resource_manager.h
index cc7fa5d57..3edd6545f 100644
--- a/src/video_core/renderer_opengl/gl_resource_manager.h
+++ b/src/video_core/renderer_opengl/gl_resource_manager.h
@@ -115,13 +115,13 @@ class OGLPipeline : private NonCopyable {
 public:
     OGLPipeline() = default;
     OGLPipeline(OGLPipeline&& o) {
-        std::swap(handle, o.handle);
+        handle = std::exchange<GLuint>(o.handle, 0);
     }
     ~OGLPipeline() {
         Release();
     }
     OGLPipeline& operator=(OGLPipeline&& o) {
-        std::swap(handle, o.handle);
+        handle = std::exchange<GLuint>(o.handle, 0);
         return *this;
     }
 
@@ -181,13 +181,13 @@ class OGLSync : private NonCopyable {
 public:
     OGLSync() = default;
     OGLSync(OGLSync&& o) {
-        std::swap(handle, o.handle);
+        handle = std::exchange<GLsync>(o.handle, nullptr);
     }
     ~OGLSync() {
         Release();
     }
     OGLSync& operator=(OGLSync&& o) {
-        std::swap(handle, o.handle);
+        handle = std::exchange<GLsync>(o.handle, nullptr);
         return *this;
     }
 
diff --git a/src/video_core/renderer_opengl/gl_shader_decompiler.cpp b/src/video_core/renderer_opengl/gl_shader_decompiler.cpp
index 053d9ed83..5de9cf903 100644
--- a/src/video_core/renderer_opengl/gl_shader_decompiler.cpp
+++ b/src/video_core/renderer_opengl/gl_shader_decompiler.cpp
@@ -843,31 +843,32 @@ std::string DecompileProgram(const std::array<u32, MAX_PROGRAM_CODE_LENGTH>& pro
     call_subroutine = [&](const Subroutine& subroutine) -> u32 {
         if (subroutine.IsInline()) {
             return compile_range(subroutine.begin, subroutine.end, [](u32) { UNREACHABLE(); });
+        }
+
+        std::function<bool(const Subroutine&)> maybe_end_instr =
+            [&maybe_end_instr](const Subroutine& subroutine) -> bool {
+            for (auto& callee : subroutine.calls) {
+                if (maybe_end_instr(*callee.second)) {
+                    return true;
+                }
+            }
+            for (auto& branch : subroutine.branches) {
+                if (maybe_end_instr(*branch.second)) {
+                    return true;
+                }
+            }
+            return subroutine.end_instr_distance.is_initialized();
+        };
+
+        if (subroutine.end_instr_distance) {
+            add_line(subroutine.GetName() + "();");
+            add_line("return true;");
+        } else if (maybe_end_instr(subroutine)) {
+            add_line("if (" + subroutine.GetName() + "()) { return true; }");
         } else {
-            std::function<bool(const Subroutine&)> maybe_end_instr =
-                [&maybe_end_instr](const Subroutine& subroutine) -> bool {
-                for (auto& callee : subroutine.calls) {
-                    if (maybe_end_instr(*callee.second)) {
-                        return true;
-                    }
-                }
-                for (auto& branch : subroutine.branches) {
-                    if (maybe_end_instr(*branch.second)) {
-                        return true;
-                    }
-                }
-                return subroutine.end_instr_distance.is_initialized();
-            };
-
-            if (subroutine.end_instr_distance) {
-                add_line(subroutine.GetName() + "();");
-                add_line("return true;");
-            } else if (maybe_end_instr(subroutine)) {
-                add_line("if (" + subroutine.GetName() + "()) { return true; }");
-            } else {
-                add_line(subroutine.GetName() + "();");
-            }
+            add_line(subroutine.GetName() + "();");
         }
+
         return subroutine.end;
     };
 
diff --git a/src/video_core/renderer_opengl/gl_shader_decompiler.h b/src/video_core/renderer_opengl/gl_shader_decompiler.h
index dc097ac9e..c74b850bb 100644
--- a/src/video_core/renderer_opengl/gl_shader_decompiler.h
+++ b/src/video_core/renderer_opengl/gl_shader_decompiler.h
@@ -2,8 +2,11 @@
 // Licensed under GPLv2 or any later version
 // Refer to the license.txt file included.
 
+#include <array>
 #include <functional>
-#include "../shader/shader.h"
+#include <string>
+#include "common/common_types.h"
+#include "video_core/shader/shader.h"
 
 namespace Pica {
 namespace Shader {
#include <glad/glad.h>

namespace GLShader {

/**
* Utility function to create and compile an OpenGL GLSL shader program (vertex + fragment shader)
* @param vertex_shader String of the GLSL vertex shader program
* @param geometry_shader String of the GLSL geometry shader program
* @param fragment_shader String of the GLSL fragment shader program
* @returns Handle of the newly created OpenGL shader object

This comment has been minimized.

Copy link
@MerryMage

MerryMage Mar 11, 2018

Member

Document feedback_vars and separable_program.

constexpr bool PRINT_DEBUG = true;
constexpr u32 PROGRAM_END = MAX_PROGRAM_CODE_LENGTH;

std::function<boost::optional<u32>(u32, u32)> find_end_instr =

This comment has been minimized.

Copy link
@MerryMage

MerryMage Mar 11, 2018

Member

perhaps const auto instead, since there's no real reason to use std::function.

case PipelineRegs::TriangleTopology::Strip:
case PipelineRegs::TriangleTopology::Fan:
// accelerate_draw &= peek_into_cmdlist_for_reset_primitive();
// accelerate_draw = false;

This comment has been minimized.

Copy link
@MerryMage

MerryMage Mar 11, 2018

Member

Commented code?

@@ -28,6 +29,8 @@
#include "video_core/vertex_loader.h"
#include "video_core/video_core.h"

#include "common/bit_set.h"

This comment has been minimized.

Copy link
@MerryMage

MerryMage Mar 11, 2018

Member

Is this #include used?

@cluezbot

This comment has been minimized.

Copy link

commented Mar 11, 2018

Hi, this is neobot, using neobrain's account. I'm keeping an archive of versions of this PR:

2018-03-11T17:00:03Z: 6c63bb1...jroweboy:9de84c966b00439d625a9841220eeebc97bd4556

@MerryMage MerryMage force-pushed the jroweboy:glvtx-pr branch from 2e9d58a to 9de84c9 Mar 11, 2018

@MerryMage

This comment has been minimized.

Copy link
Member

commented Mar 11, 2018

Split diff 2. No change to diff.

template <>
struct hash<GLShader::PicaVSConfig> {
size_t operator()(const GLShader::PicaVSConfig& k) const {
return Common::ComputeHash64(&k, sizeof(GLShader::PicaVSConfig));

This comment has been minimized.

Copy link
@MerryMage

MerryMage Mar 11, 2018

Member

Padding dependent, ditto below.

(vertex_min * loader.byte_count);
u32 data_size = loader.byte_count * vertex_num;

// TODO: cache this too

This comment has been minimized.

Copy link
@MerryMage

MerryMage Mar 11, 2018

Member

A bit more context would be nice.

  • Is this required for accuracy or is it a nice-to-have for speed?
  • I'm assuming that this involves having a set of (data_addr, data_size) -> buffer_offset entries.
@MYSTERlOUSGUY

This comment has been minimized.

Copy link

commented Mar 11, 2018

Not sure if im posting in correct location but im getting freezes and display driver crashes every time i transition to next area in pokemon sun/moon games (lastest Cannary). This is followed by citra not emulating anything and if i try to close it it crashes. This does not happen if i use CPU for shader emulation. While speed boost is incredible something is seriously wrong with GPU implementation. I have ryzen 5 1600 and Radeon RX 560.

else {
LOG_ERROR(Render_OpenGL, "Error compiling fragment shader:\n%s",
&fragment_shader_error[0]);
}

This comment has been minimized.

Copy link
@wwylele

wwylele Mar 11, 2018

Member

Can we fold the three compilation blocks into one loop?

glGetIntegerv(GL_NUM_EXTENSIONS, &ext_num);
for (GLint i = 0; i < ext_num; i++) {
if (std::string("GL_ARB_separate_shader_objects")
.compare(reinterpret_cast<const char*>(glGetStringi(GL_EXTENSIONS, i))) == 0) {

This comment has been minimized.

Copy link
@wwylele

wwylele Mar 11, 2018

Member

Can this be replaced by if (GLAD_GL_ARB_separate_shader_objects)?

uniform sampler2D tex[3];
uniform sampler2D tex0;
uniform sampler2D tex1;
uniform sampler2D tex2;

This comment has been minimized.

Copy link
@wwylele

wwylele Mar 11, 2018

Member

Is this change intentional? I am fine with the change, just want to know if there is a particular reason.

This comment has been minimized.

Copy link
@MerryMage

MerryMage Mar 11, 2018

Member

The change is intentional. Shader code, now instead of doing "tex["+index+"]", does "tex"+index instead.

: PicaShaderConfigCommon(regs.vs, setup) {}

bool operator==(const PicaVSConfig& o) const {
return std::memcmp(this, &o, sizeof(PicaVSConfig)) == 0;

This comment has been minimized.

Copy link
@wwylele

wwylele Mar 11, 2018

Member

If you want to use memcmp and hash over the config struct, please refer to current gl_shader_gen.h/PicaShaderConfig to see how padding issue is avoided (generally, wrap by union and memset on init)

@wwylele

This comment has been minimized.

Copy link
Member

commented Mar 11, 2018

Would it reduce the code complexity if we remove the PICA geometry shader -> GLSL support (= always fall back to old shader emulation when GS enabled) in this initial implementation, and try to add it in the future? I am proposing this because:

As we once discussed, PICA geometry shader doesn't always work even in Point mode, due to potential reliance on register preservation across invocation. I haven't heard we had a solution to this and I don't see in the code where it solve / at least detect this (point me out if I am wrong!). This means that we require user to decide if there is graphics problem potentially caused by this and to switch the mode to partial glvtx, which is a bad user experience, and for this reason this option is actually removed now, leaving user unable to avoid the problem by changing the configuration.

PICA geometry shader isn't heavily used by games, as we were already able to play games before GS was even implemented, so removing the support for it for now shouldn't cause a large regression in performance (and no performance regression at all in no-GS games)

@@ -20,8 +20,8 @@ ConfigureGraphics::ConfigureGraphics(QWidget* parent)

ui->layoutBox->setEnabled(!Settings::values.custom_layout);

connect(ui->renderer_comboBox, QOverload<int>::of(&QComboBox::currentIndexChanged), this,
&ConfigureGraphics::UpdateRenderer);
connect(ui->renderer_comboBox, SIGNAL(currentIndexChanged(int)), this,

This comment has been minimized.

Copy link
@MerryMage

MerryMage Mar 11, 2018

Member

static_cast to the proper signature would be preferable, instead of reverting to old-style SIGNAL

@Lokuzt

This comment has been minimized.

Copy link

commented Mar 11, 2018

citra cpu vs gpu mhgen
(Left is OpenGL CPU mode and Right is OpenGL GPU mode with Accurate Hardware Shader, when enable it looks like CPU mode but with higher framerate)

Monster Hunter Generations GUI glitch with OpenGL in GPU mode. While in CPU it works fine, in GPU it renders everything but GUI. When disabled "Accurate Hardware Shader" it works normaly. I'm using an:

Windows 7 64-bit
Intel i3 530 @2.93 GHz
AMD HD Radeon 5670
4 GB RAM

@TTFH

This comment has been minimized.

Copy link

commented Mar 11, 2018

Some games crash because u8* GetPhysicalPointer(PAddr) is called with an Invalid FCRAM address from the next functions:

RasterizerOpenGL::
void AnalyzeVertexArray(bool)
void SetupVertexArray(u8*, GLintptr)
void DrawTriangles()

This is not an issue specific from this PR because the same happen with CPU Shader Emulation on:

void WritePicaReg(u32, u32, u32)
void VertexLoader::LoadVertex(u32, int, int, AttributeBuffer&, MemoryAccessTracker&)

Example game: Hyrule Warriors Legends

A fix may be return nullptr in GetPhysicalPointer and skip the above functions in that cases.
But that don't fix the main problem, just made games playables.

@wwylele wwylele force-pushed the jroweboy:glvtx-pr branch from 60f488c to 716aadc May 6, 2018

@greggameplayer

This comment has been minimized.

Copy link

commented May 6, 2018

@wwylele When i enable the accurate hardware shader option the problem is solved but hw_shaders = 2 do the same as hw_shaders = 1

@wwylele wwylele force-pushed the jroweboy:glvtx-pr branch 4 times, most recently from 3218d08 to 14f331c May 6, 2018

@LOuroboros

This comment has been minimized.

Copy link

commented May 7, 2018

So, thanks to the latest code that was added, the newest Citra Canary build can now boot games when the user sets Shader Emulation to GPU with an Intel Celeron N2840 and its Integrated GPU.
However, there's this small issue that keeps appearing from time to time.
It looks like it mainly happens when you stand still. I thought I should mention it just in case.
https://streamable.com/3iz3p

So yeah: Intel Celeron N2840 and its iGPU, 2 GBs of RAM, Windows 8.1 (64 Bits), Citra Canary 482 (MinGW).

EDIT: Shoot, I forgot to mention. There's also some special effects that seem to be missing.
Here's 2 quick examples:

Flame Wheel with Shader Emulation set to CPU: https://streamable.com/rx4i6
Flame Wheel with Shader Emulation set to GPU: https://streamable.com/m3jkx

Double Kick with Shader Emulation set to CPU: https://streamable.com/9u3xr
Double Kick with Shader Emulation set to GPU: https://streamable.com/kylzy

EDIT2 (08/05/2018): If I set hw_shaders to 2 inside qt-config.ini, the effects do show up correctly though.
In fact, I feel like the game runs better than before now, which is weird.
https://streamable.com/jfj9d

used_bytes += uniform_size_aligned_gs;
}

if (sync_fs || invalidate) {

This comment has been minimized.

Copy link
@wwylele

wwylele May 7, 2018

Member

@degasus Users using NVIDIA GeForce GTX 750 Ti reports that some random artifact in Pokemon Sun/Mon after the recent uniform buffer change. I ask them to do an experiment: change this condition to true (=always pass), and the result is that the artifact is gone. What's your thought about this?

I didn't test other GPU and haven't got other reports.

This comment has been minimized.

Copy link
@degasus

degasus May 7, 2018

Contributor

I can't reproduce this issue with Pokemon Sun on my MX150. Do you have any additional infos?

This comment has been minimized.

Copy link
@Regai

Regai May 7, 2018

Can confirm flickering bug under Pokemon Sun for GTX 1060 since canary-481 (2763adf) for Windows.
canary-478 (db43205) works fine.
Canary CPU mode doesn't have the issues, Nightly works also without issues.

In my case only characters are flickering:
pokebug

I like to try the suggested experiment, but I lack knowledge in building or coding myself.

@wwylele wwylele force-pushed the jroweboy:glvtx-pr branch from 14f331c to 6594f4c May 9, 2018

@Hoshiyuu

This comment has been minimized.

Copy link

commented May 9, 2018

Not sure if this falls under this PR but here just in case, artifacts started showing up in Monster Hunter XX on canary-481-2763adf (last issue-free version tested is canary-478-db43205, no windows build was made available for 479 and 480 so i can't confirm)

image

Please advice if this is unrelated to this PR and I should open a new issue instead.

@wwylele

This comment has been minimized.

Copy link
Member

commented May 10, 2018

@Hoshiyuu that kind of artifact has been there for some Nvidia GPU since the first time citra was able to run Monster Hunter. It is in the texturing/rasterizer and is about the tiled cache. It was somehow "fixed" by the original version of this PR. However, it is likely that this PR didn't fix it, because this PR didn't touch the texturing/rasterizer part; instead, it was some unusual GPU usage, such as sharing different data in the same buffer, that disabled the (erroneous?) Nvidia optimization and removed the artifact. In recent commits I changed the code so that it worked in a more common way, which might enable the optimization again, and therefore the artifact is back.

In short, it is out of scope of this PR.

@Hoshiyuu

This comment has been minimized.

Copy link

commented May 10, 2018

Alright, thank you! I'll file a issue then referencing your comment.

@degasus

This comment has been minimized.

Copy link
Contributor

commented May 10, 2018

@wwylele Sampling a texture while it is used as framebuffer explains all of this issues on Monster Hunter. It triggers undefined behavior which yields wrong rendering if some draw calls runs out-of-order. Maxwell v2 and Pascal GPUs have optimized their rasterizer to increase this out-of-order rasterization (and shading) to optimize their cache usage. As the first implementation here used glCopyBufferSubData to update the vertex + geometry uniforms on all draw calls, the out-of-order window was cut between all draw calls. As my updated uniform handling don't emit a GPU copy any more, the GPU is able to reorder all pixels over uniform updates again, so the issue regressed.

This is of course out of scope of this PR, but there is a simple and cheap workaround: Iff the framebuffer texture shall be used for sampling, emit a glTextureBarrier. So you may fix this issue without any big performance hit and likely within 10 lines of code.

@wwylele

This comment has been minimized.

Copy link
Member

commented May 10, 2018

@degasus please move this discussion to #3731. I am going to play with your idea, but not gonna mix it with this PR.

@wwylele wwylele force-pushed the jroweboy:glvtx-pr branch from 6594f4c to d95b4a4 May 11, 2018

@wwylele wwylele force-pushed the jroweboy:glvtx-pr branch from d95b4a4 to 107ab43 May 11, 2018

@RocketRobz

This comment has been minimized.

Copy link

commented May 12, 2018

Hardware Shader (without Accurate Multiplication) now works on NVIDIA GeForce GTX 260!
Game speed has been improved.

Games tested with Canary Build 6b84ed9:

  • AR Games: 8-14FPS -> no change
  • Style Savvy: Fashion Forward: 24FPS (title screen only) -> 30FPS
  • Style Savvy: Styling Star: 20-24FPS (areas with 3D renders) -> 24-30FPS
  • Super Mario 3D Land: 30-40FPS (rendered in 15FPS) -> 55-60FPS (rendered in 30FPS)
@wwylele

This comment has been minimized.

Copy link
Member

commented May 13, 2018

Split into #3741 and #3742. Closing this mega PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.