Make HardLight, Saturation, Fill and FileFinder faster #855

fdelapena merged 15 commits into EasyRPG:master from Ghabry:fastblit on Apr 5, 2016



Ghabry commented Apr 2, 2016

See the commit messages for detailed info.

I also measured hard light with and without a lookup table. The lookup table is cache-unfriendly (64 KB) but is faster than direct calculation, because hard light has a 50% branch misprediction chance per color...
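The lookup-table approach can be sketched as follows. This assumes the usual per-channel hard light definition (multiply when the blend value is below 128, screen otherwise, as in the W3C Compositing and Blending spec) and a table indexed as `[src][dst]`; the thread does not show the exact formula, so treat it as an illustration. The table name follows the diff below, while `HardLightChannel`, `InitHardLightLookup` and `HardLightRow` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Hard light for one 8-bit channel. `src` is the blend layer and picks the branch.
inline std::uint8_t HardLightChannel(int src, int dst) {
    if (src < 128) {
        // Multiply branch (darkens).
        return static_cast<std::uint8_t>(2 * src * dst / 255);
    }
    // Screen branch (lightens).
    return static_cast<std::uint8_t>(255 - 2 * (255 - src) * (255 - dst) / 255);
}

// 256 x 256 uint8_t entries = 64 KiB (an int table would be 256 KiB).
static std::uint8_t hard_light_lookup[256][256];
static_assert(sizeof(hard_light_lookup) == 64 * 1024, "table should be 64 KiB");

// Fill the table once, e.g. at startup; the branching happens only here.
inline void InitHardLightLookup() {
    for (int src = 0; src < 256; ++src) {
        for (int dst = 0; dst < 256; ++dst) {
            hard_light_lookup[src][dst] = HardLightChannel(src, dst);
        }
    }
}

// Per-pixel work is a single table load per channel, with no data-dependent branch.
inline void HardLightRow(const std::uint8_t* src, std::uint8_t* dst, int count) {
    assert(src != nullptr && dst != nullptr);
    for (int i = 0; i < count; ++i) {
        dst[i] = hard_light_lookup[src[i]][dst[i]];
    }
}
```

The trade-off is the one described above: the unpredictable branch moves into the one-time table setup, and the blit loop pays with cache pressure from the 64 KiB table instead.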

For saturation a lookup table would be trickier because the R, G and B channels are combined there, so I "only" saved an allocation and a blit.

Fill (AddBackground, called once per frame) is also faster now because it uses OP_SRC, which doesn't handle alpha correctly, but that doesn't matter in this case.
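To illustrate the difference, here is a per-pixel sketch of the two operators on premultiplied RGBA8 pixels. The pixel layout and the `Fill*` names are assumptions for illustration, not the Player's code. OP_SRC just overwrites the destination, while OP_OVER has to read every destination pixel and blend it. With a fully opaque fill color both produce the same result, which is one way the cheaper OP_SRC can be harmless for a background that is redrawn every frame.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One premultiplied RGBA8 pixel (layout assumed for this sketch).
struct Pixel {
    std::uint8_t r, g, b, a;
};

inline bool operator==(const Pixel& x, const Pixel& y) {
    return x.r == y.r && x.g == y.g && x.b == y.b && x.a == y.a;
}

// OP_SRC-style fill: overwrite, no per-pixel arithmetic and no reads of dst.
// Any alpha in `color` is copied as-is rather than blended.
inline void FillSrc(Pixel* dst, std::size_t count, Pixel color) {
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = color;
    }
}

// Porter-Duff OVER for premultiplied alpha: out = s + d * (255 - sa) / 255.
inline std::uint8_t OverChannel(std::uint8_t s, std::uint8_t d, std::uint8_t sa) {
    return static_cast<std::uint8_t>(s + d * (255 - sa) / 255);
}

// OP_OVER-style fill: every channel of every destination pixel is blended.
inline void FillOver(Pixel* dst, std::size_t count, Pixel color) {
    // Premultiplied invariant: no color channel may exceed alpha.
    assert(color.r <= color.a && color.g <= color.a && color.b <= color.a);
    for (std::size_t i = 0; i < count; ++i) {
        dst[i].r = OverChannel(color.r, dst[i].r, color.a);
        dst[i].g = OverChannel(color.g, dst[i].g, color.a);
        dst[i].b = OverChannel(color.b, dst[i].b, color.a);
        dst[i].a = OverChannel(color.a, dst[i].a, color.a);
    }
}
```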


Fixes #846

Ghabry added some commits Mar 29, 2016
Ghabry Fix Visual Studio project files fffef08
Ghabry Speedup hardlight and saturation blits.
Hard Light now uses a lookup table and the code is almost branch-free. The table hurts the cache because of its size, but it is faster than direct calculation because hard light requires branching.
Saturation now saves one allocation and one OVER blit.
Ghabry Speedup Fill and Clear by replacing fill_rectangles with OP_SRC/OP_CLEAR Blit.
Ghabry commented Apr 2, 2016

Speed tests on my old Android system (TestGame tone map):
Before:
Tone: 42 FPS
Sat: 49 FPS
Both: 33 FPS

This PR:
Tone: 60 FPS
Sat: 59-60 FPS
Both: 58 FPS

BlisterB commented Apr 2, 2016

Nice! People have been very interested in engine optimization since the official RPG Maker update, so this is surely a good step forward!

Zegeri and 1 other commented on an outdated diff Apr 2, 2016
@@ -876,6 +891,22 @@ void Bitmap::ClearRect(Rect const& dst_rect) {
+// Hard light lookup table mapping source color to destination color
+static int hard_light_lookup[256][256];
Zegeri Apr 2, 2016 Member

Could this be using uint8_t for less memory footprint?

Ghabry Apr 2, 2016 Member

Ugh, very good find.

Ghabry added some commits Apr 2, 2016
Ghabry Reduce size of hard_light_lookup to uint8 1f32370
Ghabry Make FileFinder super fast by getting rid of almost all "stat" calls while parsing.
Ghabry commented Apr 2, 2016

Sorry, I also added a non-blit change to the speedups:
a gigantic performance gain in the FileFinder. readdir already tells you whether an entry is a directory, so the extra stat calls are pointless.

For Zegeri's old test case, this reduces the parsing time on Windows for me from 0.35 seconds to 0.000287 seconds (!)

This needs testing on all systems, because readdir doesn't guarantee that the d_type field is populated with useful data.
Can somebody test this under Linux? I assume it works there, though.
The Wii obviously needs a test, too.

On the 3DS, at least, it doesn't work, but I already have a solution from looking at other source code.

Ghabry changed the title from Make HardLight, Saturation and Filll faster to Make HardLight, Saturation, Fill and FileFinder faster Apr 2, 2016
… and also the first comment and the last answer there are interesting, because they explain why we did it the way we did before this PR.

That said, it does not work under Linux currently.

carstene1ns commented on an outdated diff Apr 3, 2016
@@ -593,8 +595,8 @@ FileFinder::Directory FileFinder::GetDirectoryMembers(const std::string& path, F
if (name == "." || name == "..") { continue; }
- bool is_directory = ent->d_type == S_IFDIR;
+ bool is_directory = S_ISDIR(ent->d_type);
carstene1ns Apr 3, 2016 Member

ent->d_type == DT_DIR would work for me

Ghabry commented Apr 3, 2016

I added a fallback mechanism now. So when testing, check the log for a debug message saying that the system doesn't support fast directory checks.
Or, on the Wii, check whether it now takes 10 minutes or only 10 seconds :P
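Below is a minimal sketch of the fast check with a per-entry stat fallback, assuming a POSIX platform whose `struct dirent` has a `d_type` field (glibc, the BSDs and macOS do; POSIX itself does not require it). `ListDirectory` and `DirEntry` are hypothetical names, and the PR's actual fallback (which logs when fast checks are unsupported) may be organized differently. Note that the comparison is against `DT_DIR`: `d_type` holds `DT_*` constants, not an `st_mode` value, so `S_ISDIR(ent->d_type)` does not work.

```cpp
#include <dirent.h>
#include <sys/stat.h>
#include <sys/types.h>

#include <cassert>
#include <string>
#include <vector>

// Hypothetical result type for this sketch (not the FileFinder API).
struct DirEntry {
    std::string name;
    bool is_directory;
};

// Lists `path`, trusting readdir's d_type when it is meaningful and calling
// stat() only for entries whose type is unknown (d_type may legally be
// DT_UNKNOWN) or that are symlinks (stat follows the link).
// If `stat_calls` is given, it counts how often the slow path was taken.
inline std::vector<DirEntry> ListDirectory(const std::string& path, int* stat_calls = nullptr) {
    assert(!path.empty() && "path must not be empty");
    std::vector<DirEntry> result;
    DIR* dir = opendir(path.c_str());
    if (dir == nullptr) {
        return result;  // missing or unreadable directory: empty listing
    }

    while (dirent* ent = readdir(dir)) {
        std::string name = ent->d_name;
        if (name == "." || name == "..") {
            continue;
        }

        bool is_directory = false;
        if (ent->d_type != DT_UNKNOWN && ent->d_type != DT_LNK) {
            // Fast path: the type came with the directory entry, no extra syscall.
            is_directory = (ent->d_type == DT_DIR);
        } else {
            // Slow path: one stat() for this entry.
            struct stat sb;
            if (stat((path + "/" + name).c_str(), &sb) == 0) {
                is_directory = S_ISDIR(sb.st_mode);
            }
            if (stat_calls != nullptr) {
                ++*stat_calls;
            }
        }
        result.push_back({name, is_directory});
    }
    closedir(dir);
    return result;
}
```

Falling back per entry rather than per system keeps the listing correct even on file systems that only fill `d_type` for some entries.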


Now the Wii build finds the game and is fast, too.

Ghabry commented Apr 3, 2016

Great, I have two further, quite simple, optimisations under development that will give even more speed.
With them I reach 50 FPS on the old 3DS :)

carstene1ns commented on an outdated diff Apr 3, 2016
@@ -63,6 +63,10 @@ void MessageOverlay::Draw() {
dirty = true;
+ } else {
+ if (!show_all) {
+ return;
+ }
carstene1ns Apr 3, 2016 Member

great little tweak!

Ghabry added some commits Apr 3, 2016
Ghabry Add a fallback to filefinder to use slow "stat" when d_type is not po…
Ghabry Don't render MessageOverlay when it doesn't have any messages. df9a27f
Ghabry Determine blit operation for bitmap and tilemaps beforehand based on transparency of the image. Will prefer OP_SRC over OP_OVER when nothing is transparent.
Ghabry Make blitting on the Map smarter. Draw bottom opaque when no panorama exists, otherwise with alpha. This saves an extra background graphic draw.
Ghabry Only draw the system background when it is required (not for Title and Map scene)
Ghabry Simplify argument list for Sprite::BlitScreen and don't render out of bounds sprites.
Ghabry commented Apr 3, 2016

About the new commits:
When blitting, the code now checks whether the src bitmap is completely opaque and, if so, uses OP_SRC instead of OP_OVER. This also applies to single map tiles; most of them are now blitted via OP_SRC.
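A sketch of the opacity check, assuming straight RGBA8 pixels with alpha in the fourth byte. `BlitOp`, `IsRegionOpaque` and `ChooseTileOperators` are hypothetical names; the real decision lives in the Player's `GetOperator`, which (per a later commit) also has to account for masks. The scan runs once per image or per chipset tile, so every later blit can pick the cheaper copy wherever nothing is transparent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the pixman operators used by the Player.
enum class BlitOp { Src, Over };

// True if every pixel in the w x h region at (x, y) has alpha 255.
// `width` is the full image width in pixels; 4 bytes per pixel, alpha last.
inline bool IsRegionOpaque(const std::vector<std::uint8_t>& rgba, int width,
                           int x, int y, int w, int h) {
    for (int row = y; row < y + h; ++row) {
        for (int col = x; col < x + w; ++col) {
            std::size_t alpha = (static_cast<std::size_t>(row) * width + col) * 4 + 3;
            assert(alpha < rgba.size());
            if (rgba[alpha] != 255) {
                return false;
            }
        }
    }
    return true;
}

// Precompute one operator per `tile` x `tile` cell, in row-major order,
// so the per-frame tilemap blit only has to look the choice up.
inline std::vector<BlitOp> ChooseTileOperators(const std::vector<std::uint8_t>& rgba,
                                               int width, int height, int tile) {
    std::vector<BlitOp> ops;
    for (int ty = 0; ty + tile <= height; ty += tile) {
        for (int tx = 0; tx + tile <= width; tx += tile) {
            bool opaque = IsRegionOpaque(rgba, width, tx, ty, tile, tile);
            ops.push_back(opaque ? BlitOp::Src : BlitOp::Over);
        }
    }
    return ops;
}
```

For a whole bitmap the same check is simply `IsRegionOpaque(rgba, width, 0, 0, width, height)`.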

Additionally, the background color of the system graphic is no longer rendered for map scenes.

Sprites that would be rendered out of bounds are no longer rendered.
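Skipping out-of-bounds sprites comes down to a rectangle overlap test against the screen before any blit work is done. A minimal sketch, with a hypothetical `Rect` struct (the Player has its own `Rect` class) and a hypothetical `IsOnScreen` helper; how the PR computes the sprite's destination rectangle is not shown in this thread.

```cpp
#include <cassert>

// Hypothetical rectangle type for this sketch.
struct Rect {
    int x, y, width, height;
};

// True when `r` overlaps the screen (0, 0, screen_w, screen_h) by at least
// one pixel. Sprites for which this is false can be skipped entirely.
inline bool IsOnScreen(const Rect& r, int screen_w, int screen_h) {
    assert(screen_w >= 0 && screen_h >= 0);
    return r.width > 0 && r.height > 0 &&
           r.x < screen_w && r.y < screen_h &&
           r.x + r.width > 0 && r.y + r.height > 0;
}
```

For a 320x240 screen, a 16x16 sprite at (-16, 0) ends exactly at the left edge and is skipped, while one at (-8, -8) is partly visible and is still drawn.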

All in all, this gives a gigantic frame rate boost on the 3DS, from 35 to 57 FPS on the world map of TestGame.

Ghabry Draw the background color for the title scene because this broke HideTitle and is not really worth the FPS gain.
Ghabry commented Apr 4, 2016

Don't merge yet. I already found a visual issue: when the chipset is missing or has an invalid tile, it doesn't render ^^ (AB blocks in TestGame)

Ghabry added some commits Apr 4, 2016
Ghabry Prevent render glitch when no panorama is rendered and a tile is fully transparent
Ghabry GetOperator must take care of the mask. Set MaskedBlit back to OVER because the mask doesn't work otherwise.
Ghabry Work around tone/sat not applied when a sprite was on a bush. Add com…
Ghabry commented Apr 5, 2016

This and the battle PR are now finished (finally).
Expect enjoyable performance on all devices soon :D


Has your FileFinder change been tested with games that have nested subdirectories? Otherwise LGTM.

Ghabry commented Apr 5, 2016

I didn't try running any game. For this case I only checked Zegeri's subdirectory benchmark test case, and the data structure showed the expected result.

Do you know any game that uses this feature?

Ghabry commented Apr 5, 2016

I tested it now: a chipset under "aaa\basis" is found, and tile transparency in Embric works.

Ghabry commented Apr 5, 2016

Just check out TestGame 2000 and replace the stuff.

fdelapena merged commit a713ce3 into EasyRPG:master Apr 5, 2016

5 checks passed

Android Build finished.
Linux Build finished.
OSX Build finished.
Windows Build finished.
web Build finished.
Ghabry deleted the Ghabry:fastblit branch Apr 5, 2016
BlisterB commented Apr 6, 2016

I tried this PR with the first town of Aedemphia.
Before: 9-25 FPS
Now: 35-45 FPS

The improvements are clearly there, congratulations :) !

Ghabry commented Apr 6, 2016

Great, this matches my basic benchmarks, which showed up to +33% on average (based on CPU usage, which doesn't scale linearly with FPS).
Corner cases with lots of blend and saturation effects and tons of events and picture effects see a higher gain, as you showed ^^

Ghabry modified the milestone: 0.4.2 Apr 11, 2016