[RDY] Use SDL_gpu library for faster SDL rendering #19412
Conversation
Mutability, can you link the compiled executable for Windows builds? I'm willing to test and let you know the benchmark differences I get compared to a normal build. (I'm running an Intel HD 3000 series GPU.) I'm in the middle of setting up cross compiling (Linux -> Win), so if you can't, I'll test it once I get all my cross-compile setup done. |
Looks interesting, but is there a way to build with Windows (not mxe)? |
Presumably it can be built for Windows via the other mechanisms, but I don't have those set up so I don't know the details. You need to (a) build SDL_gpu (which is cmake-based and I believe is set up to build with VS) and (b) add the resulting library to the set of libraries that C:DDA links. (If you do work out the details, a patch to COMPILING.md would be greatly appreciated!) |
mxe build is here: (edit: outdated, please use the build below) I checked the curses build works OK, but the tiles version will not start in my VM (it just hangs with a black window at the point where it tries to initialize OpenGL); I don't know whether that is a VM problem or a build problem. If you get a window that pops up then immediately disappears, check the debug log in config/; any SDL_gpu init errors are logged there. |
Resurrected my real-hardware Windows install and the tiles version runs OK there, so it was just a VM problem. |
Comparisons were run on clean worlds and saves, tested using the debug Draw Benchmark command.
SDL_GPU build:
Comparison build:
Software render:
Results: the SDL_GPU build fixed the black lines on tiles seen with software rendering (in Windows).
Notes: |
Was that with a tileset, or with "use tiles"=False? |
Here's my benchmark results:
(Intel HD Graphics 4000; Nvidia GeForce GTX 670M; CPU is a i7-3610QM) |
There are some problems displaying wide characters:
|
I think all of those are manifestations of the same problem: only half of the background of a double-width glyph is drawn. |
Try that; I don't have a suitable font to test with here, so it was all placeholders, but it did seem to handle the double-width background correctly. |
Updated windows builds with the wide-character fix and a faster minimap: https://www.dropbox.com/s/ici4v4ih3dan0k9/cdda-sdlgpu-win-5dc16ff6.zip?dl=0 (full package) |
(Intel HD Graphics 4000, i7-3610QM) |
Muta, my benchmarks were done using the chesthole32 tileset and your linked bin from above. |
Nice, now it works perfectly.
MacBook Air, macOS Sierra 10.12.1, Intel HD Graphics 6000
With minimap on: 74 FPS
Without minimap: 93.5 FPS
|
@jasonmbrown can you check the updated build in my recent comment above and see if that helps performance on your system with the minimap on? |
This fixes a drawing bug I've been having on my laptop with an Intel integrated graphics card (weird black lines between sprites). If it also delivers on performance, which I haven't had time to test, this seems great. |
That's great! The bug has been plaguing quite a number of players and no one could figure out what it was... |
@mutability I tested again with the more recent build you linked. I ran 5 benchmarks and got between 30fps and 34fps with the minimap, and between 50fps and 60fps with it turned off. But all around it looks and feels a lot smoother. I'm really looking forward to this getting merged. |
Great work @mutability ! |
I can't build on macOS (or Visual Studio, or native mingw/msys), so patches to the Makefile and/or COMPILING.md would be appreciated. |
@BrettDong since you are already into this PR, could you provide us such a patch? |
Except for build stuff I think this is ready to go. A word of caution: the software rendering option no longer does anything. The tiles version will only work if you have OpenGL 2 or higher available. If we want to retain support for software rendering in some form, I have some work in progress to restore software rendering by adding an SDL_gpu renderer that maps back to SDL_render's software renderer, but that's not ready yet and I'd rather handle that as a separate PR. |
Do you mean an abstract renderer factory class which allows switching between (and instantiating a proper instance of) a software renderer, an accelerated SDL2 renderer, or the new SDL2_gpu one? |
No, hooking in another abstraction layer like that is pretty annoying. I mean everything uses SDL_gpu, and providing a SDL_gpu renderer that implements the SDL_gpu calls in terms of SDL_render. It's not very efficient but it works and it means we don't have to juggle two implementations of the main code. (That is, the software renderer calls GPU_RegisterRenderer to register itself with SDL_gpu, and provides a GPU_RendererImpl populated with functions that end up calling SDL_Render) |
Sorry, I build CDDA with homebrewed SDL in /usr/local/include & /usr/local/lib, and I totally have no experience with framework SDL and no idea how to update the Makefile to build with the framework SDL2_gpu. Do you know how to build it?
On 2016/11/25 at 12:43 AM, patternoia wrote:
…
Great work @mutability !
I need to warn you that if merged, this will break the macOS builds, because they are built with FRAMEWORK=1.
sdl-gpu already supports building the SDL2_gpu.framework if built with cmake -G "Unix Makefiles".
You would need to update the Makefile accordingly to link against this framework, like it is now done for the regular SDL2.framework.
|
I'll then provide the patch for frameworks next week if no one will do it faster. |
Seeing something odd with the alpha channel on semitransparent sprites - most obvious with gas fields (bloated zombie) - my current guess is that we're doing everything right but the driver is picking a 1-bit alpha channel format or something silly, so it's either fully transparent or fully opaque. Trying some SDL_gpu changes to tell it to use 8-bit alpha. |
Unfortunately, I am unlikely to do any of that.
If you want to eliminate the development-time requirement to build SDL_gpu (there is no runtime requirement), then you could include a copy of it in the source tree, as is done for some other code.
If you want to remove the use of SDL_gpu and write your own tile batcher, that'd be great, but that's not something I will be doing. I don't have the necessary OpenGL knowledge and I don't really want to acquire it right now; and OpenGL is notoriously picky to get right across lots of systems. The reason I picked SDL_gpu is exactly so that we didn't need to rediscover that knowledge.
If you want to organize some more widespread testing, that'd be great, but it's not something I can really do.
I am very, very unlikely to split this PR into multiple PRs, given the past history of trying to get anything merged into this project. I would guess that it would take 6 months or more of wrangling PRs to get that done. I already have other changes that require this PR to be committed that are on hold.
I took care to split the changes into logically separate commits that should individually work; you should review those. I can't do much about it if you just don't understand the changes. If some areas need more comments, let me know where. |
The software rendering branch linked above basically works except for some tile alignment glitches that are looking like a SDL_render bug. I would want everything else sorted out before spending more time on that. |
These are only the necessary changes to get it working; there are no speedups here.
The submap images are individually small (typically 36x36), and making them subregions of a single larger texture makes the final series of blits, to assemble the final display from the submap images, much faster as they can be batched.
sensible include paths
I plan to reimplement this in a future PR, so this just hides the option while retaining the infrastructure.
Rebased to current master, waiting on a merge or feedback to my comment above. Updated, untested windows build: https://www.dropbox.com/s/fovgy4bv347g2c5/cdda-sdlgpu-win-922a5cfa.zip?dl=0 |
There's no feedback to give. "These things must happen for this to be merged"; "I won't do those things". The obvious result is no merge. |
Please do me the courtesy of actually reading and responding to my comment above; I made several suggestions about how to make progress on this issue. |
I had no intention of being rude. I took this:
Unfortunately, I am unlikely to do any of that.
to mean your post was suggestions for other people to take up, not yourself, and that no detailed response was needed. Sorry for misunderstanding.
Importing the SDL_gpu source partially addresses the dependency issue, but is a poor practice and also adds a hard build dependency on cmake, which is unacceptable. On the off chance you mean including an SDL_gpu binary, that's simply not possible due to our variety of build targets and shifting compiler options.
A native tile batching solution would be independent of this PR, so it is irrelevant to it staying open.
I have no interest in running a campaign to determine whether this change does what I expect it to by failing to load in the face of poor hardware support. If I were to start working on this change, it would be to fragment it into manageable chunks that don't disable the existing functionality. I have no intention of doing this in the near future.
As for PR organization: I want you to break it up and comment it more before I review it; you want me to review it and tell you what needs improvement. It's an impasse.
|
Well, I tried. I just don't see how multiple PRs would work, TBH: each PR would include all changes of the dependent PRs and it would become a rebase nightmare. I have some time to spare to work on this, but not that much time! Like you, I have no interest in running a testing campaign. I'll just rebase this against master periodically and put up new builds and hope it gets some organic testing. |
I can do a naive decomposition into half a dozen PRs if you think that's really a useful thing - one commit per PR. |
I'm not interested in discussing this unless there's a solution available for both of the major issues.
|
(1) I can do - see the branch I linked above. |
For (2) would you accept "include a copy of SDL_gpu"? (The cmake dependency is easy to break). I would prefer "include a copy" over "two implementations", as if we have two parallel implementations only one of which is actually built in the CI builds, then (a) the pure SDL implementation is going to rot and (b) we cannot do things that SDL_gpu makes possible but which are not easy to do in vanilla SDL (e.g. premultiplied alpha) |
Rebased, updated windows builds (untested): https://www.dropbox.com/s/scco5axx4blwkfn/cdda-sdlgpu-win-31d99bde.zip?dl=0 @kevingranade: Would you accept "include a copy of SDL_gpu"? |
|
So is importing sdlgpu and removing the cmake dependency something that you would accept or not? I still don't know! I asked the question because your comment was ambiguous; just quoting it back at me does not help me make any progress here.
(Edit: What is unclear is: did you mean "the hard build dependency on cmake is unacceptable", or "the whole concept of importing sdlgpu is unacceptable"?)
I am not being obtuse here, I just do not want to start down that path only to find out later that it was not something you would accept.
You are making it very difficult to contribute code to the project.
|
On Jan 23, 2017 3:43 AM, "Oliver Jowett" <notifications@github.com> wrote:
So is importing sdlgpu and removing the cmake dependency something that you would accept or not?
As stated, no. Importing code from another project is pretty iffy already; you're suggesting making a fork of the project, which means we would have to keep updating this new build system if we want to keep in sync with upstream. If you convince upstream to drop the cmake dependency, that'd be fine, but it's also incredibly unlikely.
You are making it very difficult to contribute code to the project.
I'm making it difficult to take a new, HARD dependency on a graphics library I've never heard of before, in a project that has been intentionally and explicitly dependency-averse for its entire history.
|
I mean you are making it organizationally hard. If I go ahead and do my own design, it gets vetoed after the fact. If I ask for input, it takes a month and quite a bit of fighting to get even a minimal response. I don't think we are going to reach agreement on this particular issue, since the changes do fundamentally require sdlgpu, but you don't want the dependency. I don't see maintaining two copies of the graphics code as practical. So this PR is effectively dead. I will maintain the changes in my own repo. |
Fixes #19071
This changes all the SDL code from using the SDL_render interface to using SDL_gpu (https://github.com/grimfang4/sdl-gpu), which is a separate library that provides a rendering abstraction on top of SDL. This obviously adds a new build dependency for the library, and it also currently requires support for OpenGL 2+ or better. That should be very common these days, anything with any sort of 3D acceleration will have OpenGL 2+ support.
SDL_gpu can be much faster if you are careful about how you arrange your blitting. In particular if you do many blits from the same texture to the same target, it will batch those together, which produces a huge speedup.
This PR first does the obvious changes to use SDL_gpu, and then implements a class that maintains an atlas texture that gathers up all the tiles in a tileset, or all the glyphs in a font, onto a single texture to take advantage of the blitting speedup.
Benchmarked here on a discrete Nvidia GTX 670M, and on the integrated graphics (Intel HD Graphics 4000). The SDL_gpu code is faster in all cases except one. The exception is when using tiles, maximum zoomed in, minimap on; the minimap code is currently somewhat slower in the SDL_gpu version and that cost dominates that case. The difference there is pretty small, 109fps vs 102fps.
In all other cases the SDL_gpu code ranges from moderately faster with tiles (48fps vs 29fps for maximum zoom out, no minimap) to dramatically faster for "SDL ASCII" (52fps vs 5fps, a 10x speedup).
Tested on Linux (Ubuntu 16.04) only. I have done a compile test for Windows with MXE, but I can't test the result as the Windows VM I have available doesn't have suitable OpenGL acceleration. @BrettDong tested OS X in #19071.
This needs testing on Windows and on a wider variety of video cards.