[RDY] Use SDL_gpu library for faster SDL rendering #19412

mutability · 2016-11-22T22:36:54Z

Fixes #19071

This changes all the SDL code from using the SDL_render interface to using SDL_gpu (https://github.com/grimfang4/sdl-gpu), which is a separate library that provides a rendering abstraction on top of SDL. This obviously adds a new build dependency for the library, and it also currently requires support for OpenGL 2+ or better. That should be very common these days, anything with any sort of 3D acceleration will have OpenGL 2+ support.

SDL_gpu can be much faster if you are careful about how you arrange your blitting. In particular if you do many blits from the same texture to the same target, it will batch those together, which produces a huge speedup.

This PR first does the obvious changes to use SDL_gpu, and then implements a class that maintains an atlas texture that gathers up all the tiles in a tileset, or all the glyphs in a font, onto a single texture to take advantage of the blitting speedup.

Benchmarked here on a discrete Nvidia GTX 670M, and on the integrated graphics (Intel HD Graphics 4000). The SDL_gpu code is faster in all cases except one. The exception is when using tiles, maximum zoomed in, minimap on; the minimap code is currently somewhat slower in the SDL_gpu version and that cost dominates that case. The difference there is pretty small, 109fps vs 102fps.

In all other case the SDL_gpu code ranges from moderately faster with tiles (48fps vs 29fps for maximum-zoom-out, no minimap) to dramatically faster for "SDL ASCII" (52fps vs 5fps, a 10x speedup)

Tested on Linux (Ubuntu 16.04) only. I have done a compile test for WIndows with MXE but I can't test the result as the Windows VM I have available doesn't have suitable opengl acceleration. @BrettDong tested OS X in #19071.

This needs testing on Windows and on a wider variety of video cards.

jasonmbrown · 2016-11-23T01:51:49Z

Mutability can you link the compiled executable for windows builds? Im willing to test and let ya know the benchmark differences I get compared to normal build. (Im running an Intel HD 3000 Series gpu) Im in the middle of setting up cross compiling (linux -> win) so If you cant, Il test it once I get all my cross compile setup done.

macrosblackd · 2016-11-23T02:14:47Z

Looks interesting, but is there a way to build with Windows (not mxe)?

mutability · 2016-11-23T02:18:47Z

Presumably it can be built for Windows via the other mechanisms, but I don't have those set up so I don't know the details. You need to (a) build SDL_gpu (which is cmake-based and I believe is set up to build with VS) and (b) add the resulting library to the set of libraries that C:DDA links.

(If you do work out the details, a patch to COMPILING.md would be greatly appreciated!)

mutability · 2016-11-23T02:31:17Z

mxe build is here: (edit: outdated, please use the build below)

I checked the curses build works OK, but the tiles version will not start in my VM (just hangs with a black window at the point where it tries to initialize opengl); I don't know whether than is a VM problem or a build problem.

If you get a window that pops up then immediately disappears, check the debug log in config/, any SDL_gpu init errors are logged there.

mutability · 2016-11-23T02:47:44Z

Resurrected my real-hardware Windows install and the tiles version runs OK there, so it was just a VM problem.

jasonmbrown · 2016-11-23T07:49:39Z

Comparisons Were Run on Clean Worlds and Saves tested using the debug Draw Benchmark command
They were done at Default Zoom, with all default settings.

SDL_GPU build:
SDL_GPU FPS 22 With Minimap
SDL_GPU FPS 55 Without Minimap

Comparison Build:
Version:0.C-19675-gb950780 (tiles)
Build:5896
Hardware Render:
41 FPS With Minimap
54 FPS Without Minimap

Software Render:
8 FPS With Minimap
12.5 FPS Without Minimap

Results:
GPU_SDL ran 50% slower for me With minimap.
GPU_SDL Ran Mildly faster Without minimap.

GPU_SDL Fixed Software Rendering Black lines on tiles (in windows).

Notes:
Even though the boost Doesn't appear to be a Huge amount (probably due to my integrated GPU Intel HD 3000), This fixed all my Draw Related Menu lag (I had a good 2-3 second delay when using hardware render, on basically any menu that was displaying at the same time tiles were displaying).
Hopefully this gets added to the Experimental build soon.

mutability · 2016-11-23T12:37:02Z

Was that with a tileset, or with "use tiles"=False?

mutability · 2016-11-23T13:32:39Z

Here's my benchmark results:

tileset	zoom	minimap	nvidia sdlrender	nvidia sdlgpu	intel sdlrender	intel sdlgpu
chesthole	normal	yes	76	83	43	59
chesthole	max	yes	109	102	94	68
chesthole	min	yes	27	41	8.1	35
none	n/a	yes	5.4	52	4.6	42
chesthole	normal	no	100	123	52	117
chesthole	max	no	167	170	142	153
chesthole	min	no	29	48	8.3	48
none	n/a	no	5.5	65	4.7	65

(Intel HD Graphics 4000; Nvidia GeForce GTX 670M; CPU is a i7-3610QM)

BrettDong · 2016-11-23T14:46:11Z

There are some problems displaying wide characters:

Blue selection cursor only appears on the left half side on each character in screen capture 1
The center prompt box in screen capture 1 seems transparent, the background ASCII art is visible
The wide characters are not redrawed completely and making words overlapped together in screen capture 2
A - mark appears after the period punctuation in screen capture 3 and I have no idea of the reason.

BrettDong · 2016-11-23T14:48:32Z

Another screenshot FYI

mutability · 2016-11-23T14:55:25Z

I think all of those are manifestations of the same problem: only half of the background of a double-width glyph is drawn.

mutability · 2016-11-23T15:11:30Z

Try that, I don't have a suitable font to test with here so it was all placeholders but it did seem to handle the double-width background correctly.

mutability · 2016-11-23T20:35:11Z

Updated windows builds with the wide-character fix and a faster minimap:

https://www.dropbox.com/s/ici4v4ih3dan0k9/cdda-sdlgpu-win-5dc16ff6.zip?dl=0 (full package)
https://www.dropbox.com/s/p4a0ut4e6f64gq7/cataclysm-tiles-5dc16ff6.exe?dl=0 (just the exe)

mutability · 2016-11-23T20:47:58Z

tileset	zoom	minimap	intel sdlrender	intel sdlgpu e05c996	intel sdlgpu 5dc16ff
chesthole	normal	yes	43	59	86
chesthole	max	yes	94	68	102
chesthole	min	yes	8.1	35	42
none	n/a	yes	4.6	42	54

(Intel HD Graphics 4000, i7-3610QM)

jasonmbrown · 2016-11-24T08:53:42Z

Muta my benchmarks were done using chesthole32 tileset, and your linked bin from above.

BrettDong · 2016-11-24T10:36:24Z

Nice, now work perfectly. MacBook Air, macOS Sierra 10.12.1, Intel HD Graphics 6000 With minimap on: 74 FPS Without minimap: 93.5 FPS

mutability · 2016-11-24T13:17:41Z

@jasonmbrown can you check the updated build in my recent comment above and see if that helps performance on your system with the minimap on?

DeadLeaves · 2016-11-24T15:15:06Z

This fixes a drawing bug I've been having on my laptop with Intel integrated graphics card. (Weird black lines between sprites) If also delivers on performance which I haven't had time to test, this seems great.

Zireael07 · 2016-11-24T16:17:31Z

This fixes a drawing bug I've been having on my laptop with Intel integrated graphics card. (Weird black lines between sprites)

That's great! The bug has been plaguing quite a number of players and no one could figure out what it was...

jasonmbrown · 2016-11-24T16:19:59Z

@mutability I tested again with the more recent build you linked. ran 5 benchmarks got between 30fps and 34fps with the minimap and between 50fps and 60fps with it turned off. But All around it looks and feels alot smoother. Im really looking forwards to this getting merged

patternoia · 2016-11-24T16:43:21Z

Great work @mutability !
I need to warn you that if merged this will break the macOS builds, because they are built with FRAMEWORK=1.
sdl-gpu already supports building the SDL2_gpu.framework if built with cmake -G "Unix Makefiles".
You would need to update the Makefile accordingly to link against this framework like it is now done for regular SDL2.framework

mutability · 2016-11-24T16:49:13Z

I can't build on macOS (or Visual Studio, or native mingw/msys) so patches to the Makefile and/or COMPILING.md would be appreciated

patternoia · 2016-11-24T16:50:28Z

@BrettDong since you are already into this PR, could you provide us such a patch?

mutability · 2016-11-24T22:06:02Z

Except for build stuff I think this is ready to go.

A word of caution: the software rendering option no longer does anything. The tiles version will only work if you have OpenGL 2 or higher available. If we want to retain support for software rendering in some form, I have some work in progress to restore software rendering by adding a SDL_gpu renderer that maps back to SDL_render's software renderer, but that's not ready yet and I'd rather handle that that as a separate PR.

patternoia · 2016-11-24T22:25:10Z

Do you mean an abstract renderer factory class which allows to switch between (and instantiate a proper instance of a) software renderer, accelerated SDL2 renderer or new SDL2_gpu?
That would be a good non-breaking transition approach which will allow to obsolete others with time.

mutability · 2016-11-24T22:38:06Z

No, hooking in another abstraction layer like that is pretty annoying. I mean everything uses SDL_gpu, and providing a SDL_gpu renderer that implements the SDL_gpu calls in terms of SDL_render. It's not very efficient but it works and it means we don't have to juggle two implementations of the main code.

(That is, the software renderer calls GPU_RegisterRenderer to register itself with SDL_gpu, and provides a GPU_RendererImpl populated with functions that end up calling SDL_Render)

BrettDong · 2016-11-25T12:33:20Z

Sorry I build CDDA with homebrewed SDL in /usr/local/include & /usr/local/lib, and I totaly have no experience with framework SDL and no idea on how to update Makefile to build with framework SDL2_gpu. Do you know how to build? 在 2016/11/25 上午12:43, patternoia 写道:

…

Great work @mutability <https://github.com/mutability> ! I need to warn you that if merged, this will break the macOS builds, because they are built with FRAMEWORK=1. |sdl-gpu| already supports building the |SDL2_gpu.framework| if build with |cmake -G "Unix Makefiles"|. You would need to update the |Makefile| accordingly to link against this framework like it is now done for regular |SDL2.framework| — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#19412 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AUGWLuq3DMeyRPLdTbpu9wmcO8qugDzQks5rBb6sgaJpZM4K6BoA>.

patternoia · 2016-11-25T23:00:39Z

I'll then provide the patch for frameworks next week if no one will do it faster.

mutability · 2016-11-26T02:58:47Z

Seeing something odd with the alpha channel on semitransparent sprites - most obvious with gas fields (bloated zombie) - my current guess is that we're doing everything right but the driver is picking a 1-bit alpha channel format or something silly, so it's either fully transparent or fully opaque.

Trying some SDL_gpu changes to tell it to use 8-bit alpha.

mutability · 2016-12-24T12:33:06Z

Unfortunately, I am unlikely to do any of that.

If you want to eliminate the development-time requirement to build SDL_gpu (there is no runtime requirement) then you could include a copy of it in the source tree as is done for some other code.

If you want to remove the use of SDL_gpu and write your own tile batcher, that'd be great, but that's not something I will be doing. I don't have the necessary opengl knowledge and I don't really want to acquire it right now; and opengl is notoriously picky to get right across lots of systems. The reason I picked SDL_gpu is exactly so that we didn't need to rediscover that knowledge.

If you want to organize some more widespread testing, that'd be great, but it's not something I can really do.

I am very, very unlikely to split this PR into multiple PRs given the past history of trying to get anything merged into this project. I would guess that it would take 6 months or more of wrangling PRs to get that done. I already have other changes that require this PR to be committed that are on hold.

I took care to split the changes into logically separate commits that should individually work - you should review those.

I can't do much about it if you just don't understand the changes. If some areas need more comments, let me know where.

mutability · 2016-12-24T12:45:51Z

The software rendering branch linked above basically works except for some tile alignment glitches that are looking like a SDL_render bug. I would want everything else sorted out before spending more time on that.

This is only the necessary changes to get it working, there are no speedups here.

The submap images are individually small (typically 36x36) and making them subregions of a single larger texture makes the final series of blits, to assemble the final display from from the submap images, much faster as they can be batched.

sensible include paths

I plan to reimplement this in a future PR, so this just hides the option while retaining the infrastructure.

mutability · 2016-12-30T18:26:43Z

Rebased to current master, waiting on a merge or feedback to my comment above.

Updated, untested windows build: https://www.dropbox.com/s/fovgy4bv347g2c5/cdda-sdlgpu-win-922a5cfa.zip?dl=0

kevingranade · 2016-12-30T20:18:48Z

There's no feedback to give, "these things must happen for this to be merged", "I won't do those things". The obvious result is no merge.
Anyone can feel free to reopen if they want to meet the above requirements.

mutability · 2016-12-30T21:40:56Z

Please do me the courtesy of actually reading and responding to my comment above; I made several suggestions about how to make progress on this issue.

kevingranade · 2016-12-30T22:37:25Z

I had no intention of being rude, I took this:

Unfortunately, I am unlikely to do any of that.

To mean your post was suggestions for other people to take up, not yourself, and that no detailed response was needed. Sorry for misunderstanding. Importing the SDL_gpu source partially addresses the dependency issue, but is a poor practice and also adds a hard build dependency on cmake, which is unacceptable. On the off chance you mean including a SDL_GPU binary, that's simply not possible due to our variety of build targets and shifting compiler options. A native tile batching solution would be independent of this pr so is irrelevant to it staying open. I have no interest in running a campaign to determine whether this change does what I expect it to by failing to load in the face of poor hardware support. If I were to start working on this change, it would be to fragment it into manegable chunks that don't disable the existing functionality. I have no intention of doing this in the near future. As for pr organization, I want you to break it up and comment it more before I review it, you want me to review it and tell you what needs improvement, it's an impasse.

mutability · 2016-12-30T22:47:23Z

Well, I tried. I just don't see how multiple PRs would work, TBH: each PR would include all changes of the dependent PRs and it would become a rebase nightmare. I have some time to spare to work on this, but not that much time!

Like you, I have no interest in running a testing campaign.

I'll just rebase this against master periodically and put up new builds and hope it gets some organic testing.

mutability · 2016-12-30T23:05:47Z

I can do a naive decomposition into half a dozen PRs if you think that's really a useful thing - one commit per PR.

kevingranade · 2017-01-02T01:40:30Z

I'm not interested in discussing this unless there's a solution available for both of the major issues.

Software rendering must be supported.
SDL_gpu must be an optional build dependency.

mutability · 2017-01-02T01:48:41Z

(1) I can do - see the branch I linked above.
(2) you said wasn't a dealbreaker, has something changed since your previous comment?

mutability · 2017-01-02T15:10:30Z

For (2) would you accept "include a copy of SDL_gpu"? (The cmake dependency is easy to break).

I would prefer "include a copy" over "two implementations", as if we have two parallel implementations only one of which is actually built in the CI builds, then (a) the pure SDL implementation is going to rot and (b) we cannot do things that SDL_gpu makes possible but which are not easy to do in vanilla SDL (e.g. premultiplied alpha)

mutability · 2017-01-22T14:30:07Z

Rebased, updated windows builds (untested): https://www.dropbox.com/s/scco5axx4blwkfn/cdda-sdlgpu-win-31d99bde.zip?dl=0

@kevingranade: Would you accept "include a copy of SDL_gpu"?

kevingranade · 2017-01-23T03:29:03Z

I'm not interested in discussing this unless there's a solution available for both of the major issues.

Software rendering must be supported.
SDL_gpu must be an optional build dependency.

Importing the SDL_gpu source partially addresses the dependency issue, but is a poor practice and also adds a hard build dependency on cmake, which is unacceptable.

mutability · 2017-01-23T11:43:29Z

So is importing sdlgpu and removing the cmake dependency something that you would accept or not? I still don't know! I asked the question because your comment was ambiguous, just quoting it back at me does not help me make any progress here. (Edit: What is unclear is: did you mean "the hard build dependency on cmake is unacceptable", or "the whole concept of importing sdlgpu is unacceptable"?) I am not being obtuse here, I just do not want to start down that path only to find out later that it was not something you would accept. You are making it very difficult to contribute code to the project.

kevingranade · 2017-01-23T17:12:23Z

On Jan 23, 2017 3:43 AM, "Oliver Jowett" <notifications@github.com> wrote: So is importing sdlgpu and removing the cmake dependency something that you would accept or not? As stated, no. Importing code from another project is pretty iffy already, you're suggesting making a fork of the project, which means we would have to keep updating this new build system if we want to keep in sync with upstream. If you convince upstream to drop the cmake dependency, that'd be fine, but it's also incredibly unlikely. You are making it very difficult to contribute code to the project. I'm making it difficult to take a new, HARD dependency on a graphics library I've never heard of before, in a project that has been intentionally and explicitly dependency averse for its entire history.

mutability · 2017-01-23T18:02:26Z

I mean you are making it organizationally hard. If I go ahead and do my own design, it gets vetoed after the fact. If I ask for input, it takes a month and quite a bit of fighting to get even a minimal response.

I don't think we are going to reach agreement on this particular issue, since the changes do fundamentally require sdlgpu, but you don't want the dependency. I don't see maintaining two copies of the graphics code as practical. So this PR is effectively dead.

I will maintain the changes in my own repo.

DanmakuDan mentioned this pull request Nov 24, 2016

[CR] Performance boost for SDL ASCII drawing. (SDL Direct3D currently) #19165

Closed

mutability added 9 commits December 30, 2016 18:03

Switch from using the SDL render API to using SDL_gpu.

8740de0

This is only the necessary changes to get it working, there are no speedups here.

Add Atlas, a class to manage many sprites on a single texture.

642b52b

Use Atlas for caching font glyphs and backgrounds.

a042334

Use Atlas for tiles.

ec53227

Makefile updates for libSDL2_gpu

0feb667

Point compiling docs at a sdl-gpu fork with pkg-config support /

bde6c10

sensible include paths

Fix window resizing / scaling settings.

cc71daa

Hide the software rendering option.

922a5cf

I plan to reimplement this in a future PR, so this just hides the option while retaining the infrastructure.

mutability force-pushed the sdlgpu branch from 510c137 to 922a5cf Compare December 30, 2016 18:04

kevingranade closed this Dec 30, 2016

mutability mentioned this pull request Jan 23, 2017

ASCII gaming in SDL build is terribly slow #19071

Closed

codemime mentioned this pull request Apr 2, 2017

Utilize GPU core for more performance #20775

Closed

ZhilkinSerg mentioned this pull request Aug 23, 2017

Losslessly compress all tiles #21685

Merged

[RDY] Use SDL_gpu library for faster SDL rendering #19412

[RDY] Use SDL_gpu library for faster SDL rendering #19412

Conversation

mutability commented Nov 22, 2016 • edited

jasonmbrown commented Nov 23, 2016 • edited

macrosblackd commented Nov 23, 2016

mutability commented Nov 23, 2016 • edited

mutability commented Nov 23, 2016 • edited

mutability commented Nov 23, 2016

jasonmbrown commented Nov 23, 2016 • edited

mutability commented Nov 23, 2016

mutability commented Nov 23, 2016 • edited

BrettDong commented Nov 23, 2016

BrettDong commented Nov 23, 2016

mutability commented Nov 23, 2016

mutability commented Nov 23, 2016

mutability commented Nov 23, 2016

mutability commented Nov 23, 2016 • edited

jasonmbrown commented Nov 24, 2016 • edited

BrettDong commented Nov 24, 2016 via email

mutability commented Nov 24, 2016

DeadLeaves commented Nov 24, 2016

Zireael07 commented Nov 24, 2016

jasonmbrown commented Nov 24, 2016

patternoia commented Nov 24, 2016 • edited

mutability commented Nov 24, 2016

patternoia commented Nov 24, 2016

mutability commented Nov 24, 2016

patternoia commented Nov 24, 2016

mutability commented Nov 24, 2016 • edited

BrettDong commented Nov 25, 2016 via email

patternoia commented Nov 25, 2016

mutability commented Nov 26, 2016

mutability commented Dec 24, 2016

mutability commented Dec 24, 2016

mutability commented Dec 30, 2016

kevingranade commented Dec 30, 2016

mutability commented Dec 30, 2016

kevingranade commented Dec 30, 2016 via email

mutability commented Dec 30, 2016

mutability commented Dec 30, 2016

kevingranade commented Jan 2, 2017

mutability commented Jan 2, 2017

mutability commented Jan 2, 2017 • edited

mutability commented Jan 22, 2017

kevingranade commented Jan 23, 2017

mutability commented Jan 23, 2017 via email • edited

kevingranade commented Jan 23, 2017 via email

mutability commented Jan 23, 2017

mutability commented Nov 22, 2016 •

edited

jasonmbrown commented Nov 23, 2016 •

edited

mutability commented Nov 23, 2016 •

edited

mutability commented Nov 23, 2016 •

edited

jasonmbrown commented Nov 23, 2016 •

edited

mutability commented Nov 23, 2016 •

edited

mutability commented Nov 23, 2016 •

edited

jasonmbrown commented Nov 24, 2016 •

edited

patternoia commented Nov 24, 2016 •

edited

mutability commented Nov 24, 2016 •

edited

mutability commented Jan 2, 2017 •

edited

mutability commented Jan 23, 2017 via email •

edited