Optimize the layer move processing and GFX render #33

Closed
Wohlstand opened this issue Jan 22, 2021 · 27 comments

@Wohlstand
Collaborator

Wohlstand commented Jan 22, 2021

Ever since the original game, the layer move algorithm has been VERY INEFFICIENT:
https://github.com/Wohlstand/TheXTech/blob/795b5dc6db3e977622b57844ca5b439bfada9e92/src/layers.cpp#L733-L754
To move one layer (which may contain only a few blocks), it loops through the entire array of thousands of blocks every cycle!

On modern devices this may not be noticeable. On weak devices, it causes heavy lag and FPS drops.

Instead, all per-layer processing must be done through per-layer member lists, which avoids walking the entire block array every cycle. That may require extra checks wherever the block/BGO/NPC arrays get modified, which can happen for various reasons.

EDIT: Also, the GFX rendering needs to be optimised to avoid full scans through all element arrays. That would mean using some sort of quad-tree class (there is one used in PGE Engine that can be taken for this). There is a challenge: every case where objects change their scene position or size has to be found, and the quad-tree has to be updated there.

@Wohlstand Wohlstand added the vanilla bug Something isn't working. The old bug which is reproducing on original VB6-built version of game. label Jan 22, 2021
@Wohlstand Wohlstand self-assigned this Jan 22, 2021
@ds-sloth
Collaborator

Thank you so much for your work on this project! Your source code is ported very clearly and has been wonderful to work with.

I am working on a 3DS port of the game and have fixed this issue by adding

std::set<int> blocks; std::set<int> BGOs; std::set<int> NPCs; std::set<int> warps; std::set<int> waters;

to Layer_t and syncing each of these whenever they should be changed. (Every time a block, BGO, NPC, warp, water is added, removed, or swapped.) It involved a lot of code changes but speeds up certain levels (on the 3DS) by 2x. I will have the 3DS port uploaded soon (I have made a lot of changes, but it will be nice to bring it back in sync with master eventually).
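
A minimal sketch of the per-layer index idea, assuming simplified stand-ins for the game's structures: `Block`, `Layer`, `World` and every method name below are hypothetical and are not TheXTech's actual API. Each layer keeps a `std::set<int>` of block indices, every add, layer change, or swap keeps the sets in sync, and moving a layer then touches only its own members.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the game's structures.
struct Block { double x; double y; int layer; };

struct Layer {
    std::string name;
    std::set<int> blocks; // indices into World::blocks owned by this layer
};

struct World {
    std::vector<Block> blocks;
    std::vector<Layer> layers;

    // Every add must also register the block in its layer's index.
    int addBlock(double x, double y, int layer) {
        assert(layer >= 0 && layer < static_cast<int>(layers.size()));
        blocks.push_back({x, y, layer});
        int idx = static_cast<int>(blocks.size()) - 1;
        layers[layer].blocks.insert(idx);
        return idx;
    }

    // Reassigning a block to another layer keeps both sets in sync.
    void setBlockLayer(int idx, int newLayer) {
        layers[blocks[idx].layer].blocks.erase(idx);
        blocks[idx].layer = newLayer;
        layers[newLayer].blocks.insert(idx);
    }

    // Swapping two array slots (for example during a re-sort) moves the data,
    // so the indices recorded in the layer sets have to be fixed up as well.
    void swapBlocks(int a, int b) {
        if (a == b) return;
        layers[blocks[a].layer].blocks.erase(a);
        layers[blocks[b].layer].blocks.erase(b);
        std::swap(blocks[a], blocks[b]);
        layers[blocks[a].layer].blocks.insert(a);
        layers[blocks[b].layer].blocks.insert(b);
    }

    // Move only the members of one layer; returns how many blocks were touched.
    int moveLayer(int layer, double dx, double dy) {
        int touched = 0;
        for (int idx : layers[layer].blocks) {
            blocks[idx].x += dx;
            blocks[idx].y += dy;
            ++touched;
        }
        return touched;
    }
};
```

With 1000 blocks on one layer and 2 on another, `moveLayer` on the small layer visits 2 entries instead of 1002. The cost is that every code path that adds, removes, or reorders array entries has to go through helpers like these.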

@Wohlstand
Collaborator Author

Wohlstand commented Feb 28, 2021

@ds-sloth, yeah, I was thinking about doing a similar thing, but it seems you can contribute this while I'm focused on other things. You'll need to refresh those lists every time an individual block or NPC gets added/removed/re-sorted/etc. Also, the rendering needs a major optimization: its weak spot is that it scans through all element arrays to find the visible ones, and in some cases, when layers with blocks are moving, the block sorting check gets disabled and the full scan happens on every GFX update. That's the reason why the game lags on some bigger levels when running on my Archos 70c tablet.

I'll review all your changes once you show them 😉

@Wohlstand Wohlstand changed the title Optimize the layer move processing Optimize the layer move processing and GFX render Feb 28, 2021
@ds-sloth
Collaborator

It will take me a little while to upload. I am actually re-implementing the editor GUI for 3DS so there is a lot of work I did and still have to do.

I agree that using a quad-tree will be very good but that is beyond my programming skill. :)

@Wohlstand
Collaborator Author

Wohlstand commented Mar 1, 2021

> I agree that using a quad-tree will be very good but that is beyond my programming skill. :)

As I told you, the quad-tree implementation already exists in PGE Engine, and it just needs to be backported. All known objects should be registered in it, and when they change their state, they should be updated in the quad-tree through the update() call. There is a loose-quad-tree implementation, which is a template that takes the object type and the callbacks needed to find objects. So, no need to re-invent the bicycle :)
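
To illustrate the register-then-update() pattern, here is a minimal sketch of a plain (non-loose) quad-tree. It is not PGE Engine's implementation: the `QuadTree` class, its `insert`/`remove`/`update`/`query` methods, and the `Rect` type are all assumptions made up for this example. Objects are stored by integer id in the deepest node that fully contains their box.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <memory>
#include <unordered_map>
#include <vector>

struct Rect {
    double x, y, w, h;
    bool contains(const Rect &o) const {
        return o.x >= x && o.y >= y && o.x + o.w <= x + w && o.y + o.h <= y + h;
    }
    bool intersects(const Rect &o) const {
        return o.x < x + w && o.x + o.w > x && o.y < y + h && o.y + o.h > y;
    }
};

// Hypothetical minimal quad-tree keyed by object id.
class QuadTree {
    struct Node {
        Rect area;
        std::vector<int> items;                     // ids stored at this level
        std::array<std::unique_ptr<Node>, 4> kids;  // lazily created quadrants
    };
    Node root_;
    int maxDepth_;
    std::unordered_map<int, Rect> boxes_;           // last known box of every id

    static Rect quadrant(const Rect &a, int i) {
        double hw = a.w / 2, hh = a.h / 2;
        return {a.x + (i % 2) * hw, a.y + (i / 2) * hh, hw, hh};
    }

    // Walk down to the deepest node that fully contains the box.
    Node *locate(const Rect &box) {
        Node *n = &root_;
        for (int depth = 0; depth < maxDepth_; ++depth) {
            Node *next = nullptr;
            for (int i = 0; i < 4 && !next; ++i) {
                Rect q = quadrant(n->area, i);
                if (!q.contains(box)) continue;
                if (!n->kids[i]) {
                    n->kids[i].reset(new Node());
                    n->kids[i]->area = q;
                }
                next = n->kids[i].get();
            }
            if (!next) break; // straddles a split line: keep it at this level
            n = next;
        }
        return n;
    }

    void collect(const Node &n, const Rect &zone, std::vector<int> &out) const {
        for (int id : n.items)
            if (boxes_.at(id).intersects(zone)) out.push_back(id);
        for (const auto &k : n.kids)
            if (k && k->area.intersects(zone)) collect(*k, zone, out);
    }

public:
    explicit QuadTree(const Rect &world, int maxDepth = 8) : maxDepth_(maxDepth) {
        root_.area = world;
    }

    void insert(int id, const Rect &box) {
        assert(boxes_.count(id) == 0);
        locate(box)->items.push_back(id);
        boxes_[id] = box;
    }

    void remove(int id) {
        auto it = boxes_.find(id);
        if (it == boxes_.end()) return;
        std::vector<int> &v = locate(it->second)->items;
        v.erase(std::remove(v.begin(), v.end(), id), v.end());
        boxes_.erase(it);
    }

    // Must be called whenever an object's position or size changes.
    void update(int id, const Rect &box) { remove(id); insert(id, box); }

    // Ids of all objects whose box overlaps the zone (in no particular order).
    std::vector<int> query(const Rect &zone) const {
        std::vector<int> out;
        collect(root_, zone, out);
        return out;
    }
};
```

A renderer or collision check would call `query()` with the camera or hitbox rectangle instead of scanning the full array. A loose quad-tree, as mentioned above, differs by letting node bounds overlap so that fewer objects get stuck at shallow levels.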

> It will take me a little while to upload. I am actually re-implementing the editor GUI for 3DS so there is a lot of work I did and still have to do.

I still keep in mind some sort of "Nostalgic Editor", and the first thing I would use for it is the ImGUI library over SDL2. You won't need any heavy GUI frameworks like Qt or wxWidgets, and you won't need any platform-dependent solutions like WinAPI, GTK, or Cocoa. ImGUI is also used in various embedded solutions; one of them was a... robot!

P.S. The library itself is here: https://github.com/ocornut/imgui

Keep in mind that TheXTech uses the SDL Render API to stay simple and fully platform-independent, so it should work even on a smart kettle that has no OpenGL on board, etc.

@ds-sloth
Collaborator

ds-sloth commented Mar 2, 2021

What if we made one quad tree per layer? For bricks, I don't think they ever move except when their layer moves so this could save a lot of processing power. (This would also save processing time for the move layer function.) The game uses absolute positions but I think it should be pretty possible to switch it to positions relative to layer.....

Can you link your quad tree code? I don't have much time these days but would like to take a look and learn what I can so I could include it someday.

I made a silly GUI toolset but it works. Very minimal and hard to maintain. It just uses frm main render rect to simulate buttons. But it works well on the 3DS which has a 240p screen (!). One could definitely replace it on PC but imgui seems to depend on a 3D rendering toolkit and I don't think it supports the 3DS's.

@Wohlstand
Collaborator Author

> What if we made one quad tree per layer? For bricks, I don't think they ever move except when their layer moves so this could save a lot of processing power. (This would also save processing time for the move layer function.) The game uses absolute positions but I think it should be pretty possible to switch it to positions relative to layer.....

That would require a recursive search and the conversion of coordinates from local to global. I already did this in PGE Engine, where every layer has its own quad-tree subtree. That makes the physics trickier. Here, at least, the optimization should be applied with as few changes as possible to guarantee full compatibility of the physics behavior. In the case of TheXTech, adding per-layer trees would require a lot of changes in the physics computation, which would need a two-way recursive coordinate conversion.
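
A rough sketch of the trade-off being discussed, under stated assumptions: `LayerSpace`, `Box`, and all method names are hypothetical, and a linear scan stands in for the per-layer tree to keep the example short. Members are stored in layer-local coordinates, so moving the layer is a single offset change, but every scene-space query has to convert the zone into local space and convert each hit back.

```cpp
#include <cassert>
#include <vector>

struct Box { double x, y, w, h; };

// Hypothetical layer that stores its members in layer-local coordinates.
struct LayerSpace {
    double offX, offY;         // current layer offset in the scene
    std::vector<Box> members;  // member boxes, relative to the layer origin

    Box toGlobal(const Box &b) const { return {b.x + offX, b.y + offY, b.w, b.h}; }
    Box toLocal(const Box &b) const { return {b.x - offX, b.y - offY, b.w, b.h}; }

    // Moving the layer is O(1): the members themselves are untouched.
    void move(double dx, double dy) { offX += dx; offY += dy; }

    // A scene-space query is converted to local space first, and every hit is
    // converted back to scene space for the caller (the two-way conversion).
    std::vector<Box> queryGlobal(const Box &zone) const {
        assert(zone.w >= 0 && zone.h >= 0);
        Box z = toLocal(zone);
        std::vector<Box> hits;
        for (const Box &m : members) {
            bool overlap = m.x < z.x + z.w && m.x + m.w > z.x &&
                           m.y < z.y + z.h && m.y + m.h > z.y;
            if (overlap) hits.push_back(toGlobal(m));
        }
        return hits;
    }
};
```

Every physics routine that currently reads absolute block positions would have to go through conversions like `toGlobal()`, which is the kind of invasive change the reply above wants to avoid.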

> Can you link your quad tree code? I don't have much time these days but would like to take a look and learn what I can so I could include it someday.

@Wohlstand
Collaborator Author

Wohlstand commented Apr 2, 2021

@ds-sloth, hello!
Take a look at my world map code now: I recently added the trees.cpp code, which uses my QuadTree thing. I made a working element search for the world map. That was easy. However, to optimize the levels, I will need a more complex solution:

  • Every element should remember its own Z-value and Index fields
  • Every element should be recognized as the same type (I'll probably try to turn every level structure into a class with inheritance and polymorphism to unify things). That way, I can search all level elements with the same query (with the ability to filter by type to query elements of a specific sub-type).

That should resolve the problem of the Dr. Peeper Pyramid (in the "2k15 Summer Takeover" episode), which lags on my Orange Pi PC with the Mali-400 GPU because of this damn querying of 12k elements in one frame, and the same 12k elements get queried again for every playable character and every active NPC.

That will be tricky because some queries depend on the in-array order. One example: when the player stomps the SMW Koopa, it spawns a beach Koopa at the shell. I accidentally introduced a bug where the just-spawned beach Koopa got immediately stomped by the player. I fixed that once I found the mistake in how the array was looped.
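
One possible way to keep order-dependent logic working on top of a spatial query is to put the hits back into array order before iterating, and to skip anything spawned during the current pass. This is only a sketch of that idea, not the fix described above: `inArrayOrder` and its parameters are hypothetical, and indices are 0-based here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: returns the query hits sorted by array index, without
// duplicates, and without entries that did not exist when the loop started.
std::vector<int> inArrayOrder(std::vector<int> hits, int countAtLoopStart) {
    assert(countAtLoopStart >= 0);
    std::sort(hits.begin(), hits.end());
    hits.erase(std::unique(hits.begin(), hits.end()), hits.end());
    // Indices >= countAtLoopStart belong to objects spawned during this pass.
    auto firstNew = std::lower_bound(hits.begin(), hits.end(), countAtLoopStart);
    hits.erase(firstNew, hits.end());
    return hits;
}
```

Iterating the result visits objects in the same relative order as a plain loop over the array would, which is what the order-dependent interactions rely on.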

The cases I should pay attention to:

  • Every position & size change of an element must call update() on the tree
  • A despawned NPC should be removed from the tree (at least there is only one place where that happens: the kill NPC call)
  • A spawned NPC should also be added to the tree (there are many places where NPCs get spawned)
  • Some NPCs use a mean trick to get block physics: they spawn a block that follows the NPC's movement, so in fact the player finds the block in the scene instead of the NPC, damn
  • Moving layers should be optimized too: every layer member must be listed in an array or a ring buffer.
  • I must check how much SLOWER a huge layer move becomes with quad-tree indexing, since every object move will trigger a quad-tree update call. I bet it's still much faster than in the R-Tree case: I tested that in ItemScene by moving a selection group of 30k elements, and it lagged much more when the R-Tree was used.

@ds-sloth
Collaborator

The world map code looks great. I haven't merged it into my 3DS build yet because I am trying to rebase my 3DS build on your current code and integrate the two branches. But it seems like it will speed things up a lot. Without your optimizations, the world map runs at 60fps on the New 3DS but only at 30fps on the Old 3DS.

As you make these optimizations, please keep in mind to separate the different classes of NPCs / BGOs in your quadtree so that we do not need to loop over the full list three/four times and then check whether the ID is in the right set. (For BGOs this is solved in the original code by sorting the BGOs by class, but that doesn't work in the Level Editor and we could make it smarter with the quadtree representation.)

@Wohlstand
Collaborator Author

You can do your merge gradually. Some good news for you: I don't plan any major changes in the near future except the quad-tree optimization for levels, as I really want to solve the DrPeeper problem on my damned ARM board.

> please keep in mind to separate the different classes of NPCs / BGOs in your quadtree

On the world map, I made a separate quad-tree for every type of object, and I guess I'll do the same for level objects to simplify the process, mainly because the game does per-type searches. The only case where I will need to mix all objects together is the rendering system; there I'll keep the same Z-groups as vanilla does, per type, plus the resulting Z-value.

@ds-sloth
Collaborator

Great, thank you.
Your quadtree code looks great, so I am very excited to see you implement them on the levels. I have seen a few levels run very slowly on the 3DS, and I made a layers optimization there, but this should help even more.

@Wohlstand
Collaborator Author

Wohlstand commented May 12, 2021

The source of slow levels is the "2k12 - Summer Takeover" episode; the following levels are confirmed as slow:

  • Intro tower
  • Dr. Peeper pyramid (this level gave the name to this lag problem)

I don't know about the others, as I haven't reached them yet on my ARM board. All these levels have a lot of blocks and layers moving along X.

In fact, there is another reason for lag:

  • if a level has a long vertical section filled with blocks, all blocks along the Y line will be scanned, which results in lag too 😼

@Wohlstand
Collaborator Author

Wohlstand commented May 12, 2021

Btw, "Dungeon of Pain" in "The Invasion" also lags on my Archos 70c tablet. "The Night" in "Another Princess Cliche" lags on my tablet too.

@Wohlstand
Collaborator Author

Redigit made a one-dimensional tile map that allows quickly finding a few blocks along the X-axis. However, if you have a lot of blocks along the Y-axis, you will lag even if there are no moving layers in the section, he-he 😼 Mainly because EVERY NPC will query for these blocks.
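
A small sketch of why an X-only index breaks down on vertical sections. The `ColumnIndex` type, the 32-pixel column width, and the method names are assumptions for illustration and do not reproduce the vanilla code: blocks are sorted by X, each column remembers its range of array indices, and a lookup returns everything in the matching columns regardless of Y.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

struct Blk { double x, y; };

// Hypothetical column index: blocks are sorted by X, and every 32px column
// remembers the first and last array index that falls into it.
struct ColumnIndex {
    std::vector<Blk> blocks;       // sorted by x
    std::vector<int> first, last;  // per column; last < first means "empty"

    ColumnIndex(std::vector<Blk> b, int columns)
        : blocks(std::move(b)), first(columns, 0), last(columns, -1) {
        std::sort(blocks.begin(), blocks.end(),
                  [](const Blk &l, const Blk &r) { return l.x < r.x; });
        for (int i = 0; i < static_cast<int>(blocks.size()); ++i) {
            int c = static_cast<int>(blocks[i].x / 32.0);
            assert(c >= 0 && c < columns);
            if (last[c] < first[c]) first[c] = i; // first block seen in column
            last[c] = i;
        }
    }

    // Number of blocks an object spanning [x0, x1] has to test, whatever its
    // Y position is: the index knows nothing about the vertical axis.
    int candidates(double x0, double x1) const {
        int c0 = std::max(0, static_cast<int>(x0 / 32.0));
        int c1 = std::min(static_cast<int>(first.size()) - 1,
                          static_cast<int>(x1 / 32.0));
        int n = 0;
        for (int c = c0; c <= c1; ++c)
            if (last[c] >= first[c]) n += last[c] - first[c] + 1;
        return n;
    }
};
```

In a horizontal row of 1000 blocks, an object one column wide tests a single block. In a vertical stack of the same 1000 blocks, the same object tests all 1000, and that cost is paid again by every NPC each frame.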

@Wohlstand
Collaborator Author

Wohlstand commented May 12, 2021

So, that's another argument for making the quad-tree instead of this mess: then, no matter which dimension it is, everything should work quickly, even if the section contains a million objects!

@Wohlstand
Collaborator Author

Btw, there are two features I plan to make in the near future:

  • Update the backgrounds engine (backporting the functionality from PGE Engine to allow multi-layer backgrounds in the same way PGE Engine does)
  • Backport the font engine from PGE Engine to allow printing text with non-ASCII characters (and later, implement a harmless localization-file system for episodes, to allow making episode translations without modifying any files of the episode itself)

@ds-sloth
Collaborator

Those both sound great.
If you want to be very nice to me, try to not add any more dependencies on SDL. :)

@Wohlstand
Collaborator Author

> If you want to be very nice to me, try to not add any more dependencies on SDL. :)

I would add the raw FreeType (will that harm you or not?)

I use it directly, I don't even use fontconfig: in PGE Engine I only use fonts placed in the config pack's directory (and there will be an additional "fonts" directory in the game assets to add support for new fonts). I will keep all the fallbacks:

  • FreeType will be optional, i.e. there is support for extended raster fonts, which I already use for Cyrillic and European tables. But without TTF there is no way to print letters absent from the raster maps, which will be a pain for Chinese. I took a good Chinese font at YaveYu's suggestion and it prints perfectly.
  • If no font maps in PGE format are given, the built-in vanilla font engine will still work for ASCII-only printing.

I keep full backward compatibility with old resource packs, especially because several users make a lot of mods on top of them, and it would be a pain for them to synchronize everything. But, to simplify some cases, it would be a good idea to provide some sort of resource pack patches that add new resources and allow the new features to be used with them.

@ds-sloth
Collaborator

FreeType could be difficult because I am using citro3d/citro2d (3DS specific rendering library) due to compatibility issues with SDL on 3DS. The 3DS stores textures in a non-linear memory layout so it might be non-trivial to render a FreeType font to it but I am sure I will be able to.

@Wohlstand
Collaborator Author

Wohlstand commented May 12, 2021

Don't worry, I manually convert every used glyph into a raster fragment that I pass to the renderer as the same sort of texture 😉

@Wohlstand
Collaborator Author

Wohlstand commented May 12, 2021

You can see how it already works in PGE Engine:
https://github.com/WohlSoft/PGE-Project/blob/master/Engine/fontman/ttf_font.cpp

The loadGlyph() call actually loads the glyph, converts it into an RGBA image, then passes it to the renderer. Don't be scared by the GL_RGBA enumerators being used; when I backport this into TheXTech, I will simplify this to use frmMain's calls for working with textures.
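
The glyph-to-RGBA step can be sketched independently of any renderer. This is an illustration under assumptions and not the code from ttf_font.cpp: `coverageToRGBA` is a hypothetical helper, the input is assumed to be an 8-bit coverage bitmap (one byte per pixel, rows `pitch` bytes apart, with a positive pitch), and the output is white pixels with the coverage used as alpha.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: expands an 8-bit coverage bitmap into tightly packed
// RGBA pixels (white colour, coverage as alpha). `pitch` is the number of
// bytes per source row and may be larger than `width`.
std::vector<std::uint8_t> coverageToRGBA(const std::uint8_t *src,
                                         int width, int height, int pitch) {
    assert(src != nullptr && width >= 0 && height >= 0 && pitch >= width);
    std::vector<std::uint8_t> out(static_cast<std::size_t>(width) * height * 4);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            std::uint8_t alpha = src[static_cast<std::size_t>(y) * pitch + x];
            std::size_t o = (static_cast<std::size_t>(y) * width + x) * 4;
            out[o + 0] = 255;   // R
            out[o + 1] = 255;   // G
            out[o + 2] = 255;   // B
            out[o + 3] = alpha; // A: glyph coverage
        }
    }
    return out;
}
```

The resulting buffer is plain linear RGBA, so a platform backend (SDL, or a 3DS-specific wrapper) only has to upload or re-tile it in whatever texture layout it needs.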

@Wohlstand
Collaborator Author

So, what do you think about this?

@ds-sloth
Collaborator

Yes, I think I would just have to make a wrapper to convert the glyph into the 3DS texture format instead of the GL texture format. Not sure how much work that would be but it should definitely be possible. Thanks for the link. Also, just spent some time making Background2s work at 1280x720 and I can see why you want to rewrite a lot of that code.

@Wohlstand Wohlstand pinned this issue May 18, 2021
@Wohlstand
Collaborator Author

@ds-sloth, I finally tested your branch on my ARM board, and the DrPeeper level REALLY works much better. Look at the stats:

  • Before the quad-tree (my current branch):
    [screenshot: Scr_2021-05-30_04-39-35]

  • With your quad-tree (your WIP branch):
    [screenshot: Scr_2021-05-30_05-35-21]

However, if I toggle the SNES resolution, it renders fine at 65 FPS, as fewer textures appear on screen. It's a debug build without optimizations, yeah. Later I'll make a release build and try this again. Anyway, the GPU (Mali-400) that I have on this board is weak by itself.

@Wohlstand Wohlstand added this to the Version 1.3.6 milestone Jun 5, 2021
@Wohlstand Wohlstand added the Biggie The big task that requires more time than usually label Jun 23, 2021
@ds-sloth ds-sloth self-assigned this Jul 8, 2021
@ds-sloth
Collaborator

ds-sloth commented Jul 8, 2021

Status update for anyone following this: block quadtree has been merged into master.

It makes block logic and rendering MUCH faster in large levels, vertical levels, and levels with layer movement. Makes block logic somewhat slower in "normal" horizontal levels.

Wohlstand and I will probably want to continue thinking together about optimization in order to recover this performance before closing this issue.

@Wohlstand
Collaborator Author

Okay, so I'm gonna manually backport this update into the stable 1.3.5.x branch, as this is a very important optimization that should appear in the next release.

@ds-sloth
Collaborator

Adding this to 1.3.5.x will not be trivial because some of the changes are tied to previous optimizations I had made.

I believe you will want to cherry-pick commits 606fa4c, d030913, 8a75b78, 02f927f, 18a2af7, f6b2874, and 207f4e3. Maybe more. Let me know if you have any questions and I'll be happy to help.

@Wohlstand
Collaborator Author

Okay, the rest of this work has been done and successfully merged into the mainline. So, we can close this. Anything for the future (making quad-trees for BGOs, warps, physical environment zones, etc.) should be done separately.
