New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GSdx-HW: Optimize RT buffer size calculation for custom resolutions #1942

Merged
merged 2 commits into from Jun 4, 2017

Conversation

@ssakash
Member

ssakash commented May 12, 2017

Summary of changes:

  • Revamped custom resolution buffer scaling code making it more efficient, should provide decent performance boost when large framebuffer is enabled at GS limited scenarios.

  • Allow future resizing of the buffer size for custom resolution when the buffer required is increased due to the changes in CRTC/Framebuffer dimensions.

Honestly, string formatting is awful in C++ without any external libraries like Boost, It's so simple in Python. They should really consider providing any easier alternative instead of using sprintf/snprintf, using the streams for formatting is even more worse. I've made the most stupid use of smart pointers ever. :P

Here are some benchmarks of the improvement, (Mostly done at GS limited scenarios)

Master Branch:

Game Resolution Large Framebuffer disabled Large framebuffer enabled
ICO 1024x1024 28 fps 28 fps
ICO 3840x2160 15 fps 12 fps
Digimon Rumble Arena 1024x1024 46 fps 46 fps
Digimon Rumble Arena 3840x2160 34 fps 32 fps
Ben10 Alien Force: Vilgax Attacks 1024x1024 95 fps 95 fps
Ben10 Alien Force: Vilgax Attacks 3840x2160 26 fps 2 fps

Pull Request Branch:

Game Resolution Large Framebuffer disabled Large framebuffer enabled
ICO 1024x1024 28 fps 28 fps
ICO 3840x2160 15 fps 12 fps
Digimon Rumble Arena 1024x1024 46 fps 46 fps
Digimon Rumble Arena 3840x2160 34 fps 34 fps
Ben10 Alien Force: Vilgax Attacks 1024x1024 95 fps 95 fps
Ben10 Alien Force: Vilgax Attacks 3840x2160 26 fps 26 fps

To be less verbose, the new buffer size calculation algorithm only increases the buffer size when necessary (Like at ICO) and doesn't increase them unnecessarily. (giving performance boost in GS limited games like Ben10 Alien Force: Vilgax Attacks)

plugins/GSdx/GSRendererHW.cpp Outdated
const int overhead = framebuffer_height[1] - framebuffer_height[0];
std::unique_ptr<char[]> buffer = std::make_unique<char[]>(150);
int size = snprintf(buffer.get(), 150, "(Custom resolution) Framebuffer size set to %dx%d (%dx%d)\n", crtc_width, crtc_height, m_width, m_height);

This comment has been minimized.

@gregory38

gregory38 May 12, 2017

Contributor

Oh please don't do this kind of code. You can use std::string foo = "my message " + std::to_string(integer_var) + " x " .. ...

This comment has been minimized.

@ssakash

ssakash May 12, 2017

Member

Initially I wanted to do it in C++ style, something like,

std::stringstream s;
s >> "(Custom resolution) Framebuffer size set to" << crtc_width << "x" ...;

The thing is that on both this and your solution, the code will be too long horizontally, so I decided to just format it using snprintf. What's wrong with the current way? This is the traditional C way to format strings. (Though aside from using smart pointer instead of bare one)

This comment has been minimized.

@gregory38

gregory38 May 12, 2017

Contributor

you know you can hit enter ;)

Or you can do

message = "foo";
message += to_string(2);
message += "esfsfs";

C is awful. And you're code might not compile on Linux.

This comment has been minimized.

@ssakash

ssakash May 12, 2017

Member

In python, it's as simple as,

s = "(Custom resolution) Framebuffer size set to {}x{} ({}x{})".format(crtc_width..)

Standard string library of C++ Sucks :/

Though fair enough, I'll convert it to std::string as you suggested.

This comment has been minimized.

@gregory38

gregory38 May 12, 2017

Contributor

Otherwise just use

const char* first_word = (large) ? "large" : "save";
fprintf(.., "%s ... %d", first_word... 14);

Edit: first_word can be inlined
All but the above code ;)

This comment has been minimized.

@danilaml

danilaml May 12, 2017

@ssakash might want to take a look: http://fmtlib.net/latest/index.html
The author is currently working on a std proposal: http://fmtlib.net/Text%20Formatting.html

This comment has been minimized.

@gregory38

gregory38 May 12, 2017

Contributor

Oh, I didn't know the std proposal. Interesting, however best case is 2020/2021 so it won't available in the standard soon (if it is ever accepted)

@gregory38

This comment has been minimized.

Contributor

gregory38 commented May 12, 2017

Anyway, I'm not sure it is a good solution (but I need to understand what you did first). Maybe it would be possible to dynamically allocate some buffers.

The constrain is that size of the rendering on the GPU is intersection of all buffers (color + depth).

We can monitor the real size of the GS draw call, Here a kind of algo (might need some tuning)

  • if depth enabled
    • if cache hit, increase the size of the buffer is needed (copy)
    • else allocate both (color depth) buffers based on max FBW/Display BW values ever reported
  • else
    • if cache hit, increase the size of the buffer is needed (copy)
    • else if sprite rendering and draw is small (let's say 128x128) allocate a RT of 128x128
    • else allocate a buffer based on max FBW/Display BW values ever reported
plugins/GSdx/GSRendererHW.cpp Outdated
m_tc->RemovePartial();
m_width = max(m_width, native_buffer.x);
m_height = max(framebuffer_height[m_large_framebuffer], native_buffer.y);

This comment has been minimized.

@turtleli
GSdx-HW: Revamp buffer size calculation
Added a more robust buffer size calculation mechanism for custom
resolutions. Improves performance in higher resolutions for games
which don't need a big buffer. There's a great boost in performance
at GS limited scenarios.

I don't even feel there's a need for the large framebuffer option right
now, For future - I plan on making the large framebuffer enabled version
as the default as the overhead is there only at situations when it's
necessary. Until then keeping the original code just to be on the safe
side in case any issue pops up.

@ssakash ssakash force-pushed the ssakash:fb_resize branch May 19, 2017

@ssakash

This comment has been minimized.

Member

ssakash commented May 19, 2017

Anyway, I'm not sure it is a good solution (but I need to understand what you did first). Maybe it would be possible to dynamically allocate some buffers.
The constrain is that size of the rendering on the GPU is intersection of all buffers (color + depth).

We can monitor the real size of the GS draw call, Here a kind of algo (might need some tuning)

if depth enabled
if cache hit, increase the size of the buffer is needed (copy)
else allocate both (color depth) buffers based on max FBW/Display BW values ever reported
else
if cache hit, increase the size of the buffer is needed (copy)
else if sprite rendering and draw is small (let's say 128x128) allocate a RT of 128x128
else allocate a buffer based on max FBW/Display BW values ever reported

I'm not sure if your idea will be reliable, there are too much assumptions on it which makes the algorithm seem a bit volatile, though I haven't really looked much into it (lack of free time) and plus not too keen on adding a new algorithm just before a release, don't want too much regressions on my head. ;)

I'd prefer to just keep the current reliable algorithm for now.

@ssakash ssakash force-pushed the ssakash:fb_resize branch 2 times, most recently May 19, 2017

@lightningterror

This comment has been minimized.

Member

lightningterror commented May 19, 2017

I'm not sure if your idea will be reliable, there are too much assumptions on it which makes the algorithm seem a bit volatile, though I haven't really looked much into it (lack of free time) and plus not too keen on adding a new algorithm just before a release, don't want too much regressions on my head. ;)

There's gonna be plenty of other releases for that, save it for dev 1.7 xD

@gregory38

This comment has been minimized.

Contributor

gregory38 commented May 21, 2017

Sorry, I didn't mean to change it now.

// Avoid using a scissor value which is too high, developers can even leave the scissor to max (2047)
// at some cases when they don't want to limit the rendering size. Our assumption is that developers
// set the scissor to the actual data in the buffer. Let's use the scissoring value only at such cases
const int scissor_height = std::min(640u, (m_context->SCISSOR.SCAY1 - m_context->SCISSOR.SCAY0) + 1);

This comment has been minimized.

@gregory38

gregory38 May 21, 2017

Contributor

I don't like to rely on scissor data here. Scissor is a per-draw call parameter. You can't infer anything with it. At least not like that. You can do it, if you check the scissor value of all draw calls.

Union on all draw calls of (scissor inter r_draw)

This comment has been minimized.

@gregory38

gregory38 May 21, 2017

Contributor

But I guess most of your gain come from it. What happen if you keep 640 instead.

@ssakash

This comment has been minimized.

Member

ssakash commented May 22, 2017

But I guess most of your gain come from it. What happen if you keep 640 instead.

It'll work but will consume more memory than necessary, like cases at ICO where scissor is 512 and the output circuit height is 256. And also similar situations at VP2/SH2 where output circuit is 448 but scissor 512 (needed for proper scaling).

I think using the scissor should be safe here, we have a safe lower limit (output circuit height) and a safe upper limit (640), nothing could possibly go wrong here. We're only using the scissor when it has a sensible value.

plugins/GSdx/GSRendererHW.cpp Outdated
std::string overhead = to_string(framebuffer_height[1] - framebuffer_height[0]);
std::string message = "(Custom resolution) Framebuffer size set to " + to_string(crtc_width) + "x" + to_string(crtc_height);
message += " (" + to_string(m_width) + "x" + to_string(m_height) + ")\n";

This comment has been minimized.

@turtleli

turtleli May 22, 2017

Member

std::to_string

@gregory38

This comment has been minimized.

Contributor

gregory38 commented May 25, 2017

@gregory38

This comment has been minimized.

Contributor

gregory38 commented May 27, 2017

@ssakash I let you handle the tutleli feedback and then we could merge it.

@ssakash

This comment has been minimized.

Member

ssakash commented May 27, 2017

A bit busy due to exams, It'll take me some time to update the code. (Preferably at 1st of June)

@ssakash

Reminder for myself to also modify these instances of lack of std::

plugins/GSdx/GSRendererHW.cpp Outdated
{
const int crtc_width = GetDisplayRect().width();
const int crtc_height = GetDisplayRect().height();
const float scaling_ratio = ceil(static_cast<float>(m_custom_height) / crtc_height);

This comment has been minimized.

@ssakash

ssakash Jun 1, 2017

Member

std::ceil

plugins/GSdx/GSRendererHW.cpp Outdated
// When Large Framebuffer is disabled - Let's only consider the height of the display rectangle
// as the base (This is wrong implementation when we consider it theoretically as CRTC has no relation
// to the Framebuffer size)
framebuffer_height[0] = static_cast<int>(round((crtc_height * scaling_ratio)));

This comment has been minimized.

@ssakash

ssakash Jun 1, 2017

Member

std::round

plugins/GSdx/GSRendererHW.cpp Outdated
// the size of the actual image data stored. (Helps ICO to properly scale to right size by help of the
// scissoring values) Display rectangle has a height of 256 but scissor has a height of 512 which seems to
// be the real buffer size.
framebuffer_height[1] = static_cast<int>(round(single_buffer_size * scaling_ratio));

This comment has been minimized.

@ssakash

ssakash Jun 1, 2017

Member

std::round

GSdx-HW: Move scaling code to separate subroutine
Move the custom resolution scaling code to a separate subroutine and
allow future RT buffer resize calls when the buffer size isn't enough.

(Example: when a game's CRTC/Framebuffer size changes. The older code
didn't consider such cases)

@ssakash ssakash force-pushed the ssakash:fb_resize branch to 00b9ed5 Jun 1, 2017

@gregory38

This comment has been minimized.

Contributor

gregory38 commented Jun 1, 2017

@ssakash
Maybe removes the use std statement and fixes all missing std:: in a future commit :)

@ssakash

This comment has been minimized.

Member

ssakash commented Jun 1, 2017

Maybe removes the use std statement and fixes all missing std:: in a future commit :)

I figure @turtleli would be more interested, I don't really care much about it. I only fixed these instances due to turtleli's request.

@turtleli

This comment has been minimized.

Member

turtleli commented Jun 1, 2017

Yeah, I've already removed using namespace std; in one of my branches ;)

@gregory38

This comment has been minimized.

Contributor

gregory38 commented Jun 3, 2017

Are we good ?

@ssakash

This comment has been minimized.

Member

ssakash commented Jun 3, 2017

Are we good ?

I think we're good to go.

@gregory38 gregory38 merged commit 23fa065 into PCSX2:master Jun 4, 2017

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@ssakash ssakash deleted the ssakash:fb_resize branch Jun 4, 2017

@lightningterror

This comment has been minimized.

Member

lightningterror commented Jun 4, 2017

1-2 fps difference in GT4. More fps is always welcome.

ssakash added a commit that referenced this pull request Oct 27, 2018

GSdx-HW: Remove inefficient scaling algorithm
Only impacts custom resolution, there used to be a much weaker algorithm
which doens't consider scissor sizes and gives a minor performance boost
in costs of accuracy (which was used when large framebuffer was disabled
in custom resolutions)

I've removed this as the performance tradeoff is rather negligible after
the implementation of #1942 and the older one is no longer necessary.
Also added an extra parameter for considering the horizontal scissor,
I'm not sure where this might be useful so this is disabled in code for
now till I discover a testcase where this helps or run it on random data
ensuring it's working properly.

Also porting this to the general scaling function might help with memory
spikes which are experienced when large framebuffer is enabled.

arcum42 added a commit that referenced this pull request Nov 11, 2018

GSdx-HW: Remove inefficient scaling algorithm
Only impacts custom resolution, there used to be a much weaker algorithm
which doens't consider scissor sizes and gives a minor performance boost
in costs of accuracy (which was used when large framebuffer was disabled
in custom resolutions)

I've removed this as the performance tradeoff is rather negligible after
the implementation of #1942 and the older one is no longer necessary.
Also added an extra parameter for considering the horizontal scissor,
I'm not sure where this might be useful so this is disabled in code for
now till I discover a testcase where this helps or run it on random data
ensuring it's working properly.

Also porting this to the general scaling function might help with memory
spikes which are experienced when large framebuffer is enabled.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment