Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve rendering of large viewport areas #7962

Open
wants to merge 2 commits into
base: master
from

Conversation

@ldpl
Copy link
Contributor

ldpl commented Jan 25, 2020

My take on sprite sorter optimization, I estimate it to be about 10x faster than the current one. And while it's still not ideal it's decently fast and I'm tired of this shit anyway :P

It's made to perfectly replicate the current sorting algorithm (including weird reversing of moved sprites) so there shouldn't be any (new) glitches.

Also, there is definitely some room for further optimizations. At the very least map of sets can probably be replaced with some more specialized structure. For example, kd tree should be faster but its current implementation of node removal is very inefficient and needs to be optimized. It's also possible that deviating from the original algorithm can be beneficial to performance (std::sort does it instantly but I couldn't find a good ordering relation).

If someone is interested here is my testing code and over a dozen sorter implementations:
sprite-sorter.zip

@ldpl ldpl force-pushed the ldpl:sprite-sorter-optimization branch from 908e72d to 7d67d62 Jan 25, 2020
@ldpl ldpl force-pushed the ldpl:sprite-sorter-optimization branch from 7d67d62 to 7d7c0ce Jan 26, 2020
@ldpl
Copy link
Contributor Author

ldpl commented Jan 26, 2020

Apparently iterating 10 times more elements with foward_list is still faster than bothering maps. So it's about 20x faster than original now.

@ldpl ldpl force-pushed the ldpl:sprite-sorter-optimization branch from 7d7c0ce to 406958d Jan 26, 2020
@nielsmh
Copy link
Contributor

nielsmh commented Jan 29, 2020

I tried this with one case that broke in the previous attempt at improving the sprite sorter, and did not get any flickering. So far it seems good.

@JGRennison
Copy link
Contributor

JGRennison commented Jan 30, 2020

The ordering value seems to be inconsistently typed between int32 and uint32.
sprite_order is typed as indexed by int32 but next_order and ParentSpriteToDraw::order are typed as uint32 with values starting from UINT32_MAX.

@LordAro
Copy link
Member

LordAro commented Jan 30, 2020

@ldpl You mentioned on IRC that you'd found some other optimisations as well. Have you made any further progress with that, or would you consider it a separate change?

@ldpl
Copy link
Contributor Author

ldpl commented Jan 30, 2020

@LordAro well, I have a faster sorter, I just don't like it xD So I'm still looking for better solutions.

@JGRennison
Copy link
Contributor

JGRennison commented Feb 1, 2020

Have you done any measurements within OpenTTD itself, or just within the test harness in the zip?

I did some simple measurements by means of adding a line to the FPS window for the sprite sorter and got worse performance with this PR than the original algorithm (3x worse in the most demanding case).
The test vector in the zip seems to contain 27940 entries/sprites. This is nearly two orders of magnitude larger than the sprite sorter would typically be expected to do in one go.

@nielsmh
Copy link
Contributor

nielsmh commented Feb 1, 2020

If this does perform better on very large sprite counts, but worse on small (normal) counts, would it be worth to have both the original and this implementation, and select which to use based on number of sprites to sort?

@ldpl
Copy link
Contributor Author

ldpl commented Feb 1, 2020

While we indeed can use some other sprite sorter for smaller cases question is whether it's really an issue. As sprite sorter doesn't always need to be fast, even if it sorts sprites in 3 microseconds instead of 1 it doesn't matter much as long it's fast enough to compose the frame. In fact it's usually called with so few sprites to sort that it raises a question of measuring that performance as, for example, PerformanceTimer for such small times may as well measure random noise (especially on linux). And, btw, that test case I'm using is dumped from some average reddit game. But ofc, it's for extreme case of map unzoom on 4k where you don't need any measurements to spot a difference between 1s and 100ms lag xD

@JGRennison
Copy link
Contributor

JGRennison commented Feb 1, 2020

The number of sprites which will need to be sorted at a time is effectively bounded by the redraw region chunking in ViewportDrawChk. Not having to sort too many sprites at a time is one of the main reasons why the chunking is done that way.
As a result of that, the expected load is a larger number of sorts, of quantities of sprites of 1 - 3 figures.
Optimising for a single sort of a 5 figure number of sprites is not really representative, and makes the new algorithms look artificially better than the original. There isn't any need to add code to handle these sorts of sizes which will never be encountered.

@ldpl
Copy link
Contributor Author

ldpl commented Feb 1, 2020

@JGRennison there is very much need, see #6566, it takes several seconds just to sort sprites after zooming out on 4k.

@ldpl ldpl force-pushed the ldpl:sprite-sorter-optimization branch from 406958d to 71c3da2 Feb 14, 2020
ldpl added a commit to ldpl/OpenTTD that referenced this pull request Feb 14, 2020
ldpl added a commit to ldpl/OpenTTD that referenced this pull request Feb 14, 2020
@ldpl
Copy link
Contributor Author

ldpl commented Feb 14, 2020

Even though #7968 significantly reduced the need for a faster sprite sorter if we have one we can get rid of viewport subdivision completely and further improve fullscreen rendering time on 4k by about 50%.

And while that subdivision was initially added for a different reason it's no longer relevant:

<frosch123> hmm, i wondered what that weird comment for ViewportDrawChk meant... but git blame attributes it to me :/
<frosch123> oh, i just changed the formatting
<frosch123> it's a ludde comment
<frosch123> _dp_: found it. in the past the sprites were not stored in vectors
<frosch123> instead they were linked lists in a statically allocated bugger
<frosch123> *buffer
<frosch123> so the "overflow" refers to dos memory limits
<frosch123> wait what... that buffer was even on the stack
<frosch123> 32k buffer on the stack?
<frosch123> _dp_: april 2008
<frosch123> 4703eb2 removes all the custom-memory-allocator stuff for viewports

And for anyone who wants to mess with sprite sorter in future, I leave this as a source of inspiration, assistance, and a warning xD
sprite-sorter.zip

original     1264.0      search - for, order - quadratic
original_sse 1046.0      search - for, order - quadratic (sse)
opt1         1395.3      search - for, order - for
opt2         1721.5      search - for, order - MemMoveT
opt3         1492.8      search - for, order - for
kdtree1       190.8      search - kdtree, order - for
kdtree2       137.9      search - kdtree, order - vector+deque+map
kdtree3       143.8      search - kdtree, order - vector+deque+map
map1          159.7      x,y - map<set<pair>>; order - set+copy
map2          152.6      x,y - map<set<pair>>; order - set+copy
map3          191.8      x,y - map<set<pair>>; order - set+copy
map4          112.2      x,y - map<set<pair>>; order - vector+map
map4_clean    110.0      x,y - map<set<pair>>; order - vector+stack+map (sse, no debug)
map4_clean2   109.1      x,y - map<set<pair>>; order - vector+stack+map (sse, no debug)
map4_fancy    109.2      x,y - map<set<pair>>; order - vector+stack+map (sse, no debug, comments)
map5         1169.1      x,y - map<set<pair>>; order - vector+stack+map
map6          953.8      x,y - map<set<pair>>; order - vector+stack+map
map7          115.7      x,y - map<multimap>; order - vector+stack+map
map8          106.1      x,y - map<set<pair>>; order - vector+vector
map9          119.5      x,y,z - map<map<set<pair>>>; order - vector+vector
set1          304.8 FAIL x,y,z - set<>; order - vector+vector
list1          50.4      x+y+z - forward_list; order - vector+stack+map
list2          51.0      x+y+z - forward_list; order - vector+stack+map
list2_fancy    48.5      x+y+z - forward_list; order - vector+stack+map
list3          33.8      x+y+z - forward_list; order - vector+vector
list4          34.1 FAIL forward_list+vector
list5          18.4      list3 with moving 1 optimization
list6          20.6      list5 with coordinate normalization
list7          18.0      list5 extra optimized
list7_fancy    15.4      list7 with comments
list8_fancy    13.0      list7 with stack and comments (SSE)
list9_fancy    12.6      list8_fancy with preceding_prev optimization (SSE)
vector1        42.9      x,y,z - vector, order - vector+vector
vector2        34.6      x,y,z - vector, order - vector+vector
listmap1       45.5      list3+map8 (We're gonna combine!)
segtree1       55.4      x,y,z - segtree(min each), order - vector+stack+map
segtree2      359.4      some segtree
segtree3       53.2      some segtree
segtree4       96.1      some segtree
segtree5       54.3      some segtree
segtree6       42.2      some segtree
segtree7       27.1      some segtree
segtree8       25.8      some segtree
segtree9       26.7      x,y,z,order - segtree(min each)
segtree10      23.1      x,y,z - segtree(min each), order - vector+vector
segtree11       9.7 FAIL x,y,z - segtree(min each), order - vector+vector
fenwick1       59.2      x,y,z - fenwick<~list(n,p,u,i,y,s)>, order - vector+stack+map
fenwick2       32.8      x,y,z - fenwick<~list(n,p,u,i)>, order - vector+vector
fenwick3       33.2      x,y,z - fenwick<~forward_list(n,u,i)>, order - vector+vector
qsort           3.7 FAIL qsort by center mass
@ldpl ldpl changed the title Sprite sorter optimization Improve rendering of large viewport areas Feb 14, 2020
@ldpl ldpl force-pushed the ldpl:sprite-sorter-optimization branch from 71c3da2 to 4249529 Feb 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants
You can’t perform that action at this time.