Make HardLight, Saturation, Fill and FileFinder faster #855

Merged
merged 15 commits into EasyRPG:master from Ghabry:fastblit on Apr 5, 2016

6 participants
Member

Ghabry commented Apr 2, 2016

See the commit messages for detailed info.

I also measured hard light with and without the lookup table. The lookup table is cache-unfriendly (64 KB) but is faster than direct calculation, because hard light has a 50% branch misprediction chance per color channel.
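A minimal sketch of the lookup-table approach, assuming the common 8-bit hard light formula (multiply when the source value is below 128, screen otherwise). Only `hard_light_lookup` appears in this PR's diff; `HardLightDirect`, `InitHardLightLookup` and `HardLightLookup` are hypothetical names, and EasyRPG's exact rounding may differ. The data-dependent branch runs only while the table is filled; the per-pixel path is a single indexed load per channel.

```cpp
#include <cassert>
#include <cstdint>

// Direct per-channel hard light (assumed standard 8-bit formula):
// multiply for src < 128, screen otherwise. The branch depends on the
// pixel data, so on mixed images it is frequently mispredicted.
inline std::uint8_t HardLightDirect(int src, int dst) {
    assert(src >= 0 && src < 256 && dst >= 0 && dst < 256);
    if (src < 128) {
        return static_cast<std::uint8_t>(2 * src * dst / 255);
    }
    return static_cast<std::uint8_t>(255 - 2 * (255 - src) * (255 - dst) / 255);
}

// Indexed [source][destination]; each entry is the blended result.
// 256 x 256 entries of uint8_t = 64 KiB.
static std::uint8_t hard_light_lookup[256][256];

// Fill the table once at startup; all branching happens here.
void InitHardLightLookup() {
    for (int src = 0; src < 256; ++src) {
        for (int dst = 0; dst < 256; ++dst) {
            hard_light_lookup[src][dst] = HardLightDirect(src, dst);
        }
    }
}

// Per-pixel hot path: one table load per color channel, no data-dependent branch.
inline std::uint8_t HardLightLookup(std::uint8_t src, std::uint8_t dst) {
    return hard_light_lookup[src][dst];
}
```

The trade-off matches the description above: the table costs cache space, but it replaces an unpredictable branch with a memory load.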

Saturation is trickier because the R, G and B channels are combined there, so I "only" saved an allocation and a blit.
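To illustrate why saturation can't use a per-channel table like hard light, here is an in-place saturation pass. This is a sketch under assumptions: the RGBA `Pixel` layout, the gray-interpolation model and the BT.601-style integer weights are illustrative choices, not EasyRPG's actual saturation code. Every output channel depends on all three inputs through the gray value, and modifying the pixels in place is one way to avoid a temporary bitmap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative pixel layout: 8-bit RGBA (assumption for this sketch).
struct Pixel {
    std::uint8_t r, g, b, a;
};

// Moves each pixel toward its gray value, in place, so no temporary bitmap
// is needed. sat = 0 gives grayscale, sat = 256 leaves the pixel unchanged.
void AdjustSaturationInPlace(Pixel* pixels, std::size_t count, int sat) {
    assert(sat >= 0 && sat <= 256);
    for (std::size_t i = 0; i < count; ++i) {
        Pixel& p = pixels[i];
        // The gray value mixes R, G and B (integer weights summing to 256),
        // so one channel's result depends on the other two.
        int gray = (p.r * 77 + p.g * 150 + p.b * 29) >> 8;
        // Interpolate between gray and the original channel; alpha is untouched.
        p.r = static_cast<std::uint8_t>(gray + (p.r - gray) * sat / 256);
        p.g = static_cast<std::uint8_t>(gray + (p.g - gray) * sat / 256);
        p.b = static_cast<std::uint8_t>(gray + (p.b - gray) * sat / 256);
    }
}
```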

Fill (AddBackground, called once per frame) is also faster now because it uses OP_SRC, which doesn't handle alpha correctly, but that doesn't matter in this case.
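A sketch of the difference between the two Porter-Duff operators on premultiplied 8-bit RGBA pixels (the `Pixel` struct and function names are hypothetical). When the source is fully opaque, OVER produces exactly what SRC produces, so SRC can skip the blend arithmetic. With a translucent source, SRC copies the alpha instead of blending it, which is presumably harmless here because the background fill is the bottom layer with nothing beneath it.

```cpp
#include <cassert>
#include <cstdint>

// Premultiplied 8-bit RGBA pixel (layout assumed for this sketch).
struct Pixel {
    std::uint8_t r, g, b, a;
};

inline bool operator==(Pixel x, Pixel y) {
    return x.r == y.r && x.g == y.g && x.b == y.b && x.a == y.a;
}

// SRC: the source replaces the destination, alpha included; no arithmetic.
inline Pixel CompositeSrc(Pixel src, Pixel /*dst*/) {
    return src;
}

// OVER (premultiplied): result = src + dst * (255 - src.a) / 255, rounded.
inline Pixel CompositeOver(Pixel src, Pixel dst) {
    int inv = 255 - src.a;
    auto mix = [inv](int s, int d) {
        return static_cast<std::uint8_t>(s + (d * inv + 127) / 255);
    };
    return Pixel{mix(src.r, dst.r), mix(src.g, dst.g), mix(src.b, dst.b), mix(src.a, dst.a)};
}
```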

Branch prediction: https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-an-unsorted-array

Fixes #846

Ghabry added some commits Mar 29, 2016

Speed up hard light and saturation blits.
Hard light now uses a lookup table and the code is almost branch-free. The table hurts the cache because of its size but is still faster than computing without the table, because hard light requires branching.
Saturation now saves one allocation and one OVER blit.
Member

Ghabry commented Apr 2, 2016

Speed tests on my old Android system (TestGame Tone map):
Tone: 42 FPS
Sat: 49 FPS
Both: 33 FPS

This PR:
Tone: 60 FPS
Sat: 59-60 FPS
Both: 58 FPS
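Converting these FPS figures to time per frame makes the gains easier to compare. The helper names are hypothetical. The results also appear to be capped at 60 FPS (an assumption suggested by the numbers), so the savings computed from the 60 FPS entries are lower bounds.

```cpp
#include <cassert>

// Milliseconds spent per frame at a given frame rate.
inline double FrameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Percentage of per-frame time saved when going from fps_before to fps_after.
inline double FrameTimeSavedPercent(double fps_before, double fps_after) {
    return 100.0 * (1.0 - FrameTimeMs(fps_after) / FrameTimeMs(fps_before));
}
```

For "Both", 33 to 58 FPS means about 30.3 ms versus 17.2 ms per frame, roughly 43% less time per frame.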

Member

BlisterB commented Apr 2, 2016

Nice! People have been very interested in engine optimization since the official RPG Maker update; this is surely a good step forward!

src/bitmap.cpp (outdated)
@@ -876,6 +891,22 @@ void Bitmap::ClearRect(Rect const& dst_rect) {
RefreshCallback();
}
// Hard light lookup table mapping source color to destination color
static int hard_light_lookup[256][256];

Zegeri Apr 2, 2016

Member

Could this be using uint8_t for less memory footprint?
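The footprint difference is easy to quantify: the table has 256 x 256 = 65,536 entries, and every hard light result fits in 0..255. A compile-time sketch, assuming the usual 4-byte `int` (which the standard does not guarantee); the constant names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One entry per (source, destination) channel pair.
constexpr std::size_t kEntries = 256 * 256;  // 65,536 entries

// Table of int: 256 KiB when int is 4 bytes.
constexpr std::size_t kIntTableBytes = kEntries * sizeof(int);

// Every hard light result fits in a byte, so uint8_t suffices: 64 KiB.
constexpr std::size_t kByteTableBytes = kEntries * sizeof(std::uint8_t);

static_assert(sizeof(std::uint8_t[256][256]) == 65536, "uint8_t table is 64 KiB");
```

With a 4-byte `int` the original table occupies 256 KiB, four times the 64 KiB of the `uint8_t` version, which matters for the cache behavior mentioned in the PR description.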

Ghabry Apr 2, 2016

Member

Ugh, very good find.

Member

Ghabry commented Apr 2, 2016

Sorry, I also sped up a non-blit operation:
A gigantic performance gain in the FileFinder. readdir already tells you whether an entry is a directory, so the extra stat calls are unnecessary.

For Zegeri's old test case this reduces the parsing time on Windows for me from 0.35 seconds to 0.000287 seconds (!)

This needs testing on all systems because readdir doesn't guarantee that this field is populated with useful data.
Can somebody test this under Linux? I assume it works there, though.
The Wii obviously needs a test too.

It doesn't work on the 3DS, at least, but I already have a solution from looking at other source code.

Ghabry changed the title from "Make HardLight, Saturation and Filll faster" to "Make HardLight, Saturation, Fill and FileFinder faster" Apr 2, 2016

Member

carstene1ns commented Apr 3, 2016

http://stackoverflow.com/a/10376245, along with its first comment and the last answer, explains why we did it the way we did before this PR.

That said, it currently does not work under Linux.

src/filefinder.cpp (outdated)
@@ -593,8 +595,8 @@ FileFinder::Directory FileFinder::GetDirectoryMembers(const std::string& path, F
#endif
if (name == "." || name == "..") { continue; }
bool is_directory = ent->d_type == S_IFDIR;
bool is_directory = S_ISDIR(ent->d_type);

carstene1ns Apr 3, 2016

Member

ent->d_type == DT_DIR would work for me
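Why the first attempt failed on Linux: `d_type` holds `DT_*` codes, while `S_IFDIR` and `S_ISDIR()` work on `st_mode` bits returned by `stat()`. A small sketch with hypothetical function names; note that `d_type` is a Linux/BSD extension and not available everywhere.

```cpp
#include <cassert>
#include <dirent.h>
#include <sys/stat.h>

// Correct: compare the DT_* code directly.
inline bool IsDirectoryEntry(const dirent* ent) {
    return ent->d_type == DT_DIR;
}

// Incorrect: S_ISDIR() masks with S_IFMT and expects st_mode bits.
// On Linux DT_DIR is 4 while S_IFDIR is 0040000, so this is false for
// real directories, and so is "d_type == S_IFDIR".
inline bool MisusedStModeCheck(unsigned char d_type) {
    return S_ISDIR(d_type);
}
```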

Member

Ghabry commented Apr 3, 2016

I added a fallback mechanism now. So when testing, check the log for a debug message saying that the system doesn't support fast directory checks.
Or, on the Wii, check whether it now takes 10 minutes or only 10 seconds :P
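A minimal sketch of such a fallback on POSIX systems: trust `d_type` when the file system fills it in, and call `stat()` only for entries reported as `DT_UNKNOWN`. `Entry` and `ListDirectory` are hypothetical names, not the actual FileFinder code. The sketch also sends `DT_LNK` through `stat()` (an addition here) so that symlinks to directories are still recognized.

```cpp
#include <cassert>
#include <dirent.h>
#include <string>
#include <sys/stat.h>
#include <vector>

struct Entry {
    std::string name;
    bool is_directory;
};

// Lists a directory, using d_type as the fast path and stat() as the fallback.
std::vector<Entry> ListDirectory(const std::string& path) {
    std::vector<Entry> result;
    DIR* dir = opendir(path.c_str());
    if (!dir) {
        return result;  // minimal error handling for the sketch
    }
    while (dirent* ent = readdir(dir)) {
        std::string name = ent->d_name;
        if (name == "." || name == "..") {
            continue;
        }
        bool is_directory;
        if (ent->d_type != DT_UNKNOWN && ent->d_type != DT_LNK) {
            // Fast path: no extra system call per entry.
            is_directory = (ent->d_type == DT_DIR);
        } else {
            // Slow path: file system did not report a type (or it is a link).
            struct stat st;
            if (stat((path + "/" + name).c_str(), &st) != 0) {
                continue;
            }
            is_directory = S_ISDIR(st.st_mode);
        }
        result.push_back({name, is_directory});
    }
    closedir(dir);
    return result;
}
```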

Member

Tondorian commented Apr 3, 2016

Now the Wii build finds the game and is fast, too.

Member

Ghabry commented Apr 3, 2016

Great, I have two further, quite simple, optimisations under development that will give even more speed.
With them I reach 50 FPS on the old 3DS :)

src/message_overlay.cpp (outdated)
} else {
if (!show_all) {
return;
}
}

Member

great little tweak!

Member

Ghabry commented Apr 3, 2016

About the new commits:
When blitting, the code now checks whether the source bitmap is completely opaque and, if so, uses OP_SRC instead of OP_OVER. This also applies to single map tiles; most of them are now blitted via OP_SRC.
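The opaque check could look roughly like this. `Pixel`, `BlitOp` and both functions are hypothetical; the global `opacity` parameter is an assumption added to show that a blit with reduced opacity still needs OVER even for an opaque bitmap. In practice the scan would run once per bitmap (or per tile rectangle) and its result would be cached rather than repeated on every blit.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Mirrors the OP_SRC / OP_OVER choice (hypothetical enum).
enum class BlitOp { Src, Over };

struct Pixel {
    std::uint8_t r, g, b, a;
};

// True if no pixel is even partially transparent.
bool IsCompletelyOpaque(const std::vector<Pixel>& pixels) {
    for (const Pixel& p : pixels) {
        if (p.a != 255) {
            return false;
        }
    }
    return true;
}

// Without transparent pixels (and at full opacity) OVER reduces to SRC,
// so the cheaper copy gives the same result.
BlitOp ChooseBlitOp(bool src_opaque, int opacity) {
    return (src_opaque && opacity == 255) ? BlitOp::Src : BlitOp::Over;
}
```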

Additionally, the background color of the system graphic is no longer rendered for map scenes.

Sprites that would be rendered out of bounds are no longer rendered.
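Out-of-bounds culling boils down to a rectangle intersection test performed before any pixel work. A sketch with a hypothetical `Rect` type and helper names, using a 320x240 screen (RPG Maker's native resolution) in the example:

```cpp
#include <cassert>

struct Rect {
    int x, y, width, height;
};

// True if the two rectangles share at least one pixel.
constexpr bool Intersects(const Rect& a, const Rect& b) {
    return a.x < b.x + b.width && b.x < a.x + a.width &&
           a.y < b.y + b.height && b.y < a.y + a.height;
}

// Skip the whole draw call when a sprite is empty or entirely off screen.
constexpr bool ShouldDraw(const Rect& sprite, const Rect& screen) {
    return sprite.width > 0 && sprite.height > 0 && Intersects(sprite, screen);
}
```

Partially visible sprites still pass the test and are clipped during the blit; only sprites with no visible pixel are dropped.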

All in all, this gives a huge frame rate boost on the 3DS, from 35 to 57 FPS on the TestGame world map.

Draw the background color for the title scene because this broke HideTitle and is not really worth the FPS gain.
Member

Ghabry commented Apr 4, 2016

Don't merge yet. I already found a visual issue when the chipset is missing or has an invalid tile that doesn't render ^^ (AB blocks in testgame)

Member

Ghabry commented Apr 5, 2016

This and the battle PR are now finished (finally).
Expect enjoyable performance on all devices soon :D
LGTM?

Member

carstene1ns commented Apr 5, 2016

Is your filefinder change tested with games that have nested subdirectories? Otherwise LGTM.

Member

Ghabry commented Apr 5, 2016

I didn't try running any game. For this case I only checked Zegeri's subdirectory benchmark test case, and the data structure showed the expected result.

Do you know any game that uses this feature?

Member

Ghabry commented Apr 5, 2016

I tested it now: the chipset under "aaa\basis" is found, and tile transparency in Embric works.

Member

Ghabry commented Apr 5, 2016

RPG_RT.zip

Just check out TestGame 2000 and replace the files with the ones from the archive.

fdelapena merged commit a713ce3 into EasyRPG:master Apr 5, 2016

5 checks passed

Android Build finished.
Linux Build finished.
OSX Build finished.
Windows Build finished.
web Build finished.

Ghabry deleted the Ghabry:fastblit branch Apr 5, 2016

Member

BlisterB commented Apr 6, 2016

I tried this PR with the first town of Aedemphia.
Before: 9-25 FPS
Now: 35-45 FPS

The improvements are clearly there, congratulations :)!

Member

Ghabry commented Apr 6, 2016

Great, this matches my basic benchmarks, which showed up to +33% on average (based on CPU usage, which doesn't scale linearly with the FPS).
Corner cases with lots of blend and saturation effects and tons of events and picture effects get an even higher gain, as you showed ^^

Ghabry modified the milestone: 0.4.2 Apr 11, 2016
