Make HardLight, Saturation, Fill and FileFinder faster #855
Hard Light now uses a lookup table, and the code is almost branch-free. The table hurts the cache because of its size, but it is still faster than computing without the table, because hard light requires branching. Saturation now saves one allocation and one OVER blit.
Speed tests on my old Android system (TestGame Tone map): This PR:
Nice! People have been very interested in engine optimization since the official RPG Maker update; this is surely a good step forward!
@@ -876,6 +891,22 @@ void Bitmap::ClearRect(Rect const& dst_rect) {
RefreshCallback();
}

// Hard light lookup table mapping source color to destination color
static int hard_light_lookup[256][256];
Could this use uint8_t for a smaller memory footprint?
Ugh, very good find.
Sorry, I also added a non-blit operation to the speedups: for zegeri's old test case, it reduces the parsing time on Windows from 0.35 seconds to 0.000287 seconds (!). This needs testing on all systems, because readdir doesn't guarantee that the d_type field is populated with useful data. It doesn't work on the 3DS at least, but I already have a solution from looking at other source code.
http://stackoverflow.com/a/10376245 (together with its first comment and the last answer) is worth reading for why we did it the way we did before this PR. That said, it currently does not work under Linux.
@@ -593,8 +595,8 @@ FileFinder::Directory FileFinder::GetDirectoryMembers(const std::string& path, F
 #endif
 if (name == "." || name == "..") { continue; }

-bool is_directory = ent->d_type == S_IFDIR;
+bool is_directory = S_ISDIR(ent->d_type);
ent->d_type == DT_DIR would work for me.
I have now added a fallback mechanism, so when testing, check the log for a debug message saying that the system does not support fast directory checks.
Now the Wii build finds the game and is fast, too.
Great, I have two further, quite simple, optimisations under development that will give even more speed.
} else {
	if (!show_all) {
		return;
	}
}
Great little tweak!
…transparency of the image. Will prefer OP_SRC over _OVER when nothing is transparent.
… exists, otherwise with alpha. This saves an extra background graphic draw.
About the new commits: additionally, the background color of the system graphic is no longer rendered for map scenes, and sprites that would be rendered out of bounds are no longer rendered. All in all, this gives a huge frame rate boost on the 3DS, from 35 to 57 FPS on the TestGame world map.
…Title and is not really worth the FPS gain.
Don't merge yet. I already found a visual issue: when the chipset is missing or has an invalid tile, it doesn't render ^^ (AB blocks in TestGame).
…ecause the mask doesn't work otherwise.
This and the battle PR are now finished (finally).
Has your FileFinder change been tested with games that have nested subdirectories? Otherwise LGTM.
I didn't try running any game. For this case I only checked zegeri's subdirectory benchmark test case, and the data structure showed the expected result. Do you know of any game that uses this feature?
I have now tested it: the chipset under "aaa\basis" is found, and tile transparency in Embric works.
Just check out TestGame 2000 and replace the stuff.
I tried this PR with the first town of Aedemphia. The improvements are clearly visible, congratulations :)!
Great, this matches my basic benchmarks, which showed up to +33% on average (based on CPU usage; this doesn't scale linearly with FPS).
See the commit messages for detailed info.
I also measured hard light with and without the lookup table. The lookup table is cache-unfriendly (64 KB) but is faster than direct calculation, because hard light has a 50% chance of a branch misprediction per color...
Saturation would be trickier because the R, G and B channels are combined there, so I "only" saved an allocation and a blit.
Fill (AddBackground, called once per frame) is also faster now because it uses OP_SRC, which doesn't handle alpha correctly, but that doesn't matter in this case.
Branch prediction: https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-an-unsorted-array
Fixes #846