Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RDY] Use SDL_gpu library for faster SDL rendering #19412

Closed
wants to merge 9 commits into from

Conversation

mutability
Copy link
Contributor

@mutability mutability commented Nov 22, 2016

Fixes #19071

This changes all the SDL code from using the SDL_render interface to using SDL_gpu (https://github.com/grimfang4/sdl-gpu), which is a separate library that provides a rendering abstraction on top of SDL. This obviously adds a new build dependency for the library, and it also currently requires support for OpenGL 2+ or better. That should be very common these days, anything with any sort of 3D acceleration will have OpenGL 2+ support.

SDL_gpu can be much faster if you are careful about how you arrange your blitting. In particular if you do many blits from the same texture to the same target, it will batch those together, which produces a huge speedup.

This PR first does the obvious changes to use SDL_gpu, and then implements a class that maintains an atlas texture that gathers up all the tiles in a tileset, or all the glyphs in a font, onto a single texture to take advantage of the blitting speedup.

Benchmarked here on a discrete Nvidia GTX 670M, and on the integrated graphics (Intel HD Graphics 4000). The SDL_gpu code is faster in all cases except one. The exception is when using tiles, maximum zoomed in, minimap on; the minimap code is currently somewhat slower in the SDL_gpu version and that cost dominates that case. The difference there is pretty small, 109fps vs 102fps.

In all other case the SDL_gpu code ranges from moderately faster with tiles (48fps vs 29fps for maximum-zoom-out, no minimap) to dramatically faster for "SDL ASCII" (52fps vs 5fps, a 10x speedup)

Tested on Linux (Ubuntu 16.04) only. I have done a compile test for WIndows with MXE but I can't test the result as the Windows VM I have available doesn't have suitable opengl acceleration. @BrettDong tested OS X in #19071.

This needs testing on Windows and on a wider variety of video cards.

@jasonmbrown
Copy link

jasonmbrown commented Nov 23, 2016

Mutability can you link the compiled executable for windows builds? Im willing to test and let ya know the benchmark differences I get compared to normal build. (Im running an Intel HD 3000 Series gpu) Im in the middle of setting up cross compiling (linux -> win) so If you cant, Il test it once I get all my cross compile setup done.

@macrosblackd
Copy link
Contributor

Looks interesting, but is there a way to build with Windows (not mxe)?

@mutability
Copy link
Contributor Author

mutability commented Nov 23, 2016

Presumably it can be built for Windows via the other mechanisms, but I don't have those set up so I don't know the details. You need to (a) build SDL_gpu (which is cmake-based and I believe is set up to build with VS) and (b) add the resulting library to the set of libraries that C:DDA links.

(If you do work out the details, a patch to COMPILING.md would be greatly appreciated!)

@mutability
Copy link
Contributor Author

mutability commented Nov 23, 2016

mxe build is here: (edit: outdated, please use the build below)

I checked the curses build works OK, but the tiles version will not start in my VM (just hangs with a black window at the point where it tries to initialize opengl); I don't know whether than is a VM problem or a build problem.

If you get a window that pops up then immediately disappears, check the debug log in config/, any SDL_gpu init errors are logged there.

@mutability
Copy link
Contributor Author

Resurrected my real-hardware Windows install and the tiles version runs OK there, so it was just a VM problem.

@jasonmbrown
Copy link

jasonmbrown commented Nov 23, 2016

Comparisons Were Run on Clean Worlds and Saves tested using the debug Draw Benchmark command
They were done at Default Zoom, with all default settings.

SDL_GPU build:
SDL_GPU FPS 22 With Minimap
SDL_GPU FPS 55 Without Minimap


Comparison Build:
Version:0.C-19675-gb950780 (tiles)
Build:5896
Hardware Render:
41 FPS With Minimap
54 FPS Without Minimap

Software Render:
8 FPS With Minimap
12.5 FPS Without Minimap


Results:
GPU_SDL ran 50% slower for me With minimap.
GPU_SDL Ran Mildly faster Without minimap.

GPU_SDL Fixed Software Rendering Black lines on tiles (in windows).


Notes:
Even though the boost Doesn't appear to be a Huge amount (probably due to my integrated GPU Intel HD 3000), This fixed all my Draw Related Menu lag (I had a good 2-3 second delay when using hardware render, on basically any menu that was displaying at the same time tiles were displaying).
Hopefully this gets added to the Experimental build soon.

@mutability
Copy link
Contributor Author

Was that with a tileset, or with "use tiles"=False?

@mutability
Copy link
Contributor Author

mutability commented Nov 23, 2016

Here's my benchmark results:

tileset zoom minimap nvidia sdlrender nvidia sdlgpu intel sdlrender intel sdlgpu
chesthole normal yes 76 83 43 59
chesthole max yes 109 102 94 68
chesthole min yes 27 41 8.1 35
none n/a yes 5.4 52 4.6 42
chesthole normal no 100 123 52 117
chesthole max no 167 170 142 153
chesthole min no 29 48 8.3 48
none n/a no 5.5 65 4.7 65

(Intel HD Graphics 4000; Nvidia GeForce GTX 670M; CPU is a i7-3610QM)

@BrettDong
Copy link
Member

There are some problems displaying wide characters:

  1. Blue selection cursor only appears on the left half side on each character in screen capture 1
  2. The center prompt box in screen capture 1 seems transparent, the background ASCII art is visible
  3. The wide characters are not redrawed completely and making words overlapped together in screen capture 2
  4. A - mark appears after the period punctuation in screen capture 3 and I have no idea of the reason.

2016-11-23 10 40 06
2016-11-23 10 40 41
2016-11-23 10 40 47

@BrettDong
Copy link
Member

2016-11-23 10 47 26

Another screenshot FYI

@mutability
Copy link
Contributor Author

I think all of those are manifestations of the same problem: only half of the background of a double-width glyph is drawn.

@mutability
Copy link
Contributor Author

Try that, I don't have a suitable font to test with here so it was all placeholders but it did seem to handle the double-width background correctly.

@mutability
Copy link
Contributor Author

Updated windows builds with the wide-character fix and a faster minimap:

https://www.dropbox.com/s/ici4v4ih3dan0k9/cdda-sdlgpu-win-5dc16ff6.zip?dl=0 (full package)
https://www.dropbox.com/s/p4a0ut4e6f64gq7/cataclysm-tiles-5dc16ff6.exe?dl=0 (just the exe)

@mutability
Copy link
Contributor Author

mutability commented Nov 23, 2016

tileset zoom minimap intel sdlrender intel sdlgpu e05c996 intel sdlgpu 5dc16ff
chesthole normal yes 43 59 86
chesthole max yes 94 68 102
chesthole min yes 8.1 35 42
none n/a yes 4.6 42 54

(Intel HD Graphics 4000, i7-3610QM)

@jasonmbrown
Copy link

jasonmbrown commented Nov 24, 2016

Muta my benchmarks were done using chesthole32 tileset, and your linked bin from above.

@BrettDong
Copy link
Member

BrettDong commented Nov 24, 2016 via email

@mutability
Copy link
Contributor Author

@jasonmbrown can you check the updated build in my recent comment above and see if that helps performance on your system with the minimap on?

@DeadLeaves
Copy link
Contributor

This fixes a drawing bug I've been having on my laptop with Intel integrated graphics card. (Weird black lines between sprites) If also delivers on performance which I haven't had time to test, this seems great.

@Zireael07
Copy link
Contributor

This fixes a drawing bug I've been having on my laptop with Intel integrated graphics card. (Weird black lines between sprites)

That's great! The bug has been plaguing quite a number of players and no one could figure out what it was...

@jasonmbrown
Copy link

@mutability I tested again with the more recent build you linked. ran 5 benchmarks got between 30fps and 34fps with the minimap and between 50fps and 60fps with it turned off. But All around it looks and feels alot smoother. Im really looking forwards to this getting merged

@patternoia
Copy link
Contributor

patternoia commented Nov 24, 2016

Great work @mutability !
I need to warn you that if merged this will break the macOS builds, because they are built with FRAMEWORK=1.
sdl-gpu already supports building the SDL2_gpu.framework if built with cmake -G "Unix Makefiles".
You would need to update the Makefile accordingly to link against this framework like it is now done for regular SDL2.framework

@mutability
Copy link
Contributor Author

I can't build on macOS (or Visual Studio, or native mingw/msys) so patches to the Makefile and/or COMPILING.md would be appreciated

@patternoia
Copy link
Contributor

@BrettDong since you are already into this PR, could you provide us such a patch?

@mutability
Copy link
Contributor Author

Except for build stuff I think this is ready to go.

A word of caution: the software rendering option no longer does anything. The tiles version will only work if you have OpenGL 2 or higher available. If we want to retain support for software rendering in some form, I have some work in progress to restore software rendering by adding a SDL_gpu renderer that maps back to SDL_render's software renderer, but that's not ready yet and I'd rather handle that that as a separate PR.

@patternoia
Copy link
Contributor

Do you mean an abstract renderer factory class which allows to switch between (and instantiate a proper instance of a) software renderer, accelerated SDL2 renderer or new SDL2_gpu?
That would be a good non-breaking transition approach which will allow to obsolete others with time.

@mutability
Copy link
Contributor Author

mutability commented Nov 24, 2016

No, hooking in another abstraction layer like that is pretty annoying. I mean everything uses SDL_gpu, and providing a SDL_gpu renderer that implements the SDL_gpu calls in terms of SDL_render. It's not very efficient but it works and it means we don't have to juggle two implementations of the main code.

(That is, the software renderer calls GPU_RegisterRenderer to register itself with SDL_gpu, and provides a GPU_RendererImpl populated with functions that end up calling SDL_Render)

@BrettDong
Copy link
Member

BrettDong commented Nov 25, 2016 via email

@patternoia
Copy link
Contributor

I'll then provide the patch for frameworks next week if no one will do it faster.

@mutability
Copy link
Contributor Author

Seeing something odd with the alpha channel on semitransparent sprites - most obvious with gas fields (bloated zombie) - my current guess is that we're doing everything right but the driver is picking a 1-bit alpha channel format or something silly, so it's either fully transparent or fully opaque.

Trying some SDL_gpu changes to tell it to use 8-bit alpha.

@mutability
Copy link
Contributor Author

Unfortunately, I am unlikely to do any of that.

If you want to eliminate the development-time requirement to build SDL_gpu (there is no runtime requirement) then you could include a copy of it in the source tree as is done for some other code.

If you want to remove the use of SDL_gpu and write your own tile batcher, that'd be great, but that's not something I will be doing. I don't have the necessary opengl knowledge and I don't really want to acquire it right now; and opengl is notoriously picky to get right across lots of systems. The reason I picked SDL_gpu is exactly so that we didn't need to rediscover that knowledge.

If you want to organize some more widespread testing, that'd be great, but it's not something I can really do.

I am very, very unlikely to split this PR into multiple PRs given the past history of trying to get anything merged into this project. I would guess that it would take 6 months or more of wrangling PRs to get that done. I already have other changes that require this PR to be committed that are on hold.

I took care to split the changes into logically separate commits that should individually work - you should review those.

I can't do much about it if you just don't understand the changes. If some areas need more comments, let me know where.

@mutability
Copy link
Contributor Author

The software rendering branch linked above basically works except for some tile alignment glitches that are looking like a SDL_render bug. I would want everything else sorted out before spending more time on that.

This is only the necessary changes to get it working, there are
no speedups here.
The submap images are individually small (typically 36x36) and making
them subregions of a single larger texture makes the final series
of blits, to assemble the final display from from the submap images,
much faster as they can be batched.
I plan to reimplement this in a future PR, so this just hides the
option while retaining the infrastructure.
@mutability
Copy link
Contributor Author

Rebased to current master, waiting on a merge or feedback to my comment above.

Updated, untested windows build: https://www.dropbox.com/s/fovgy4bv347g2c5/cdda-sdlgpu-win-922a5cfa.zip?dl=0

@kevingranade
Copy link
Member

There's no feedback to give, "these things must happen for this to be merged", "I won't do those things". The obvious result is no merge.
Anyone can feel free to reopen if they want to meet the above requirements.

@mutability
Copy link
Contributor Author

Please do me the courtesy of actually reading and responding to my comment above; I made several suggestions about how to make progress on this issue.

@kevingranade
Copy link
Member

kevingranade commented Dec 30, 2016 via email

@mutability
Copy link
Contributor Author

Well, I tried. I just don't see how multiple PRs would work, TBH: each PR would include all changes of the dependent PRs and it would become a rebase nightmare. I have some time to spare to work on this, but not that much time!

Like you, I have no interest in running a testing campaign.

I'll just rebase this against master periodically and put up new builds and hope it gets some organic testing.

@mutability
Copy link
Contributor Author

I can do a naive decomposition into half a dozen PRs if you think that's really a useful thing - one commit per PR.

@kevingranade
Copy link
Member

I'm not interested in discussing this unless there's a solution available for both of the major issues.

  1. Software rendering must be supported.
  2. SDL_gpu must be an optional build dependency.

@mutability
Copy link
Contributor Author

(1) I can do - see the branch I linked above.
(2) you said wasn't a dealbreaker, has something changed since your previous comment?

@mutability
Copy link
Contributor Author

mutability commented Jan 2, 2017

For (2) would you accept "include a copy of SDL_gpu"? (The cmake dependency is easy to break).

I would prefer "include a copy" over "two implementations", as if we have two parallel implementations only one of which is actually built in the CI builds, then (a) the pure SDL implementation is going to rot and (b) we cannot do things that SDL_gpu makes possible but which are not easy to do in vanilla SDL (e.g. premultiplied alpha)

@mutability
Copy link
Contributor Author

Rebased, updated windows builds (untested): https://www.dropbox.com/s/scco5axx4blwkfn/cdda-sdlgpu-win-31d99bde.zip?dl=0

@kevingranade: Would you accept "include a copy of SDL_gpu"?

@kevingranade
Copy link
Member

I'm not interested in discussing this unless there's a solution available for both of the major issues.

Software rendering must be supported.
SDL_gpu must be an optional build dependency.

Importing the SDL_gpu source partially addresses the dependency issue, but is a poor practice and also adds a hard build dependency on cmake, which is unacceptable.

@mutability
Copy link
Contributor Author

mutability commented Jan 23, 2017 via email

@kevingranade
Copy link
Member

kevingranade commented Jan 23, 2017 via email

@mutability
Copy link
Contributor Author

I mean you are making it organizationally hard. If I go ahead and do my own design, it gets vetoed after the fact. If I ask for input, it takes a month and quite a bit of fighting to get even a minimal response.

I don't think we are going to reach agreement on this particular issue, since the changes do fundamentally require sdlgpu, but you don't want the dependency. I don't see maintaining two copies of the graphics code as practical. So this PR is effectively dead.

I will maintain the changes in my own repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet