Make HardLight, Saturation, Fill and FileFinder faster #855

Merged
merged 15 commits into from
Apr 5, 2016

Conversation

Ghabry
Member

@Ghabry Ghabry commented Apr 2, 2016

See the commit messages for detailed info.

I also measured hard light with and without a lookup table. The lookup table is cache-unfriendly (64kb) but is still faster than direct calculation because hard light has a 50% chance of a branch misprediction per color...

Saturation is trickier because the RGB channels are combined there, so I "only" saved an alloc and a blit.

Fill (AddBackground, called once per frame) is also faster now by using OP_SRC which doesn't handle alpha correctly but that doesn't matter in this case.

Branch Pred: https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-an-unsorted-array

Fixes #846

Hard Light now uses a lookup table and the code is almost branch-free. The table hurts the cache because of its size but is faster than calculating without the table because hard light requires branching.
Saturation now saves one allocation and one OVER blit.
@Ghabry
Member Author

Ghabry commented Apr 2, 2016

Speed tests on my old Android system (TestGame Tone map):
Tone: 42 FPS
Sat: 49 FPS
Both: 33 FPS

This PR:
Tone: 60 FPS
Sat: 59-60 FPS
Both: 58 FPS

@BlisterB
Member

BlisterB commented Apr 2, 2016

Nice! People are very interested in the engine optimization since the official RPG Maker update, this surely is a good step forward!

@@ -876,6 +891,22 @@ void Bitmap::ClearRect(Rect const& dst_rect) {
RefreshCallback();
}

// Hard light lookup table mapping source color to destination color
static int hard_light_lookup[256][256];
Member

Could this be using uint8_t for less memory footprint?

Member Author

Ugh, very good find.
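
To make the table idea from this thread concrete, here is a minimal C++ sketch of a 256x256 hard-light lookup table with `uint8_t` entries. It is not the PR's code: the names `HardLightDirect`, `GetHardLightTable` and `HardLightBlend` are hypothetical, and the formula is the common textbook hard-light definition, assumed here rather than taken from the diff.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumption: the usual hard-light formula on 8-bit channels.
// This is the branchy per-channel calculation the table replaces.
constexpr std::uint8_t HardLightDirect(int src, int dst) {
    return src < 128
        ? static_cast<std::uint8_t>(2 * src * dst / 255)
        : static_cast<std::uint8_t>(255 - 2 * (255 - src) * (255 - dst) / 255);
}

// 256 * 256 one-byte entries, indexed as table[src][dst].
using HardLightTable = std::array<std::array<std::uint8_t, 256>, 256>;

HardLightTable BuildHardLightTable() {
    HardLightTable table{};
    for (int src = 0; src < 256; ++src) {
        for (int dst = 0; dst < 256; ++dst) {
            table[src][dst] = HardLightDirect(src, dst);
        }
    }
    return table;
}

// Built once on first use and then shared by every blend call.
const HardLightTable& GetHardLightTable() {
    static const HardLightTable table = BuildHardLightTable();
    return table;
}

// Blends `count` channel values from src into dst.
// The inner loop is a plain table read with no data-dependent branch.
void HardLightBlend(std::uint8_t* dst, const std::uint8_t* src, std::size_t count) {
    const HardLightTable& table = GetHardLightTable();
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = table[src[i]][dst[i]];
    }
}
```

With one-byte entries the table takes a quarter of the memory of an `int` table on platforms where `int` is four bytes wide.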

@Ghabry
Member Author

Ghabry commented Apr 2, 2016

Sorry, I also added a non-blit change to the speedups:
A gigantic performance gain in the FileFinder. readdir already tells you whether an entry is a directory, so the extra stat calls are unnecessary.

For zegeri's old test case this reduces the parsing time on Windows for me from 0.35 seconds to 0.000287 seconds (!)

This needs a test on all systems because readdir doesn't guarantee that the field is populated with useful data.
Can somebody test this under Linux? I assume it works there.
Also, the Wii obviously needs a test.

At least on the 3DS it doesn't work, but I already have a solution from looking at other source code.

@Ghabry Ghabry changed the title Make HardLight, Saturation and Filll faster Make HardLight, Saturation, Fill and FileFinder faster Apr 2, 2016
@carstene1ns
Member

http://stackoverflow.com/a/10376245, together with its first comment and the last answer in that thread, is interesting: it explains why we did it the way we did before this PR.


That said, it does not work under Linux currently.

@@ -593,8 +595,8 @@ FileFinder::Directory FileFinder::GetDirectoryMembers(const std::string& path, F
#endif
if (name == "." || name == "..") { continue; }

- bool is_directory = ent->d_type == S_IFDIR;
+ bool is_directory = S_ISDIR(ent->d_type);
Member

ent->d_type == DT_DIR would work for me

@Ghabry
Member Author

Ghabry commented Apr 3, 2016

I have now added a fallback mechanism. When testing, check the log for a debug message saying that the system does not support fast directory checks.
Or, on the Wii, check whether it now takes 10 minutes or only 10 seconds :P
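
The fast check with a fallback described in this thread could look roughly like the following C++ sketch. It is an illustration, not the code merged in this PR: `Entry` and `ListDirectory` are hypothetical names, and the `_DIRENT_HAVE_D_TYPE` guard is the glibc feature macro, so on platforms that do not define it the sketch always takes the `stat` path.

```cpp
#include <dirent.h>
#include <sys/stat.h>

#include <string>
#include <vector>

// Hypothetical result type for this sketch.
struct Entry {
    std::string name;
    bool is_directory;
};

std::vector<Entry> ListDirectory(const std::string& path) {
    std::vector<Entry> result;
    DIR* dir = opendir(path.c_str());
    if (!dir) {
        return result;  // unreadable or missing directory: empty listing
    }
    while (dirent* ent = readdir(dir)) {
        const std::string name = ent->d_name;
        if (name == "." || name == "..") { continue; }

        bool is_directory = false;
#ifdef _DIRENT_HAVE_D_TYPE
        // Fast path: trust d_type only when the file system filled it in.
        // Symlinks are sent to stat so that a link to a directory counts.
        if (ent->d_type != DT_UNKNOWN && ent->d_type != DT_LNK) {
            is_directory = ent->d_type == DT_DIR;
        } else
#endif
        {
            // Fallback: one stat call per entry, as before the speedup.
            struct stat sb;
            is_directory = stat((path + "/" + name).c_str(), &sb) == 0
                && S_ISDIR(sb.st_mode);
        }
        result.push_back({name, is_directory});
    }
    closedir(dir);
    return result;
}
```

Comparing against `DT_DIR` matters because `d_type` holds `DT_*` constants, while `S_ISDIR` and `S_IFDIR` apply to the `st_mode` field returned by `stat`.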

@Tondorian
Member

The Wii build now finds the game and is fast, too.

@Ghabry
Member Author

Ghabry commented Apr 3, 2016

Great, I have two further, quite simple, optimisations under development that will give even more speed.
With them I reach 50 FPS on the old 3DS :)

} else {
if (!show_all) {
return;
}
}
Member

great little tweak!

@Ghabry
Member Author

Ghabry commented Apr 3, 2016

About the new commits:
When blitting, the code now checks whether the src bitmap is completely opaque and, if so, does an OP_SRC instead of an OP_OVER. This also applies to single map tiles; most of them are now blitted via OP_SRC.

Additionally the background color of the system graphic is not rendered anymore for map scenes.

Sprites that would be rendered out of bounds are not rendered anymore.

All in all this gives a gigantic frame boost on the 3DS from 35 to 57 FPS on the world map of testgame.
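
The opaque-source shortcut described above can be sketched as follows. This is a simplified illustration rather than the engine's blit code: `Image`, `IsFullyOpaque`, `BlendOver` and `Blit` are hypothetical names, pixels are assumed to be non-premultiplied `0xAARRGGBB` values, and both images are treated as flat arrays without clipping.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Pixel = std::uint32_t;  // assumed layout: 0xAARRGGBB

bool IsFullyOpaque(const std::vector<Pixel>& pixels) {
    for (Pixel p : pixels) {
        if ((p >> 24) != 0xFF) { return false; }
    }
    return true;
}

struct Image {
    std::vector<Pixel> pixels;
    bool opaque;  // computed once here so that blits never rescan

    explicit Image(std::vector<Pixel> p)
        : pixels(std::move(p)), opaque(IsFullyOpaque(pixels)) {}
};

// Classic "over" compositing of one source pixel onto one destination pixel.
Pixel BlendOver(Pixel s, Pixel d) {
    const std::uint32_t a = s >> 24;
    const std::uint32_t inv = 255 - a;
    Pixel out = 0;
    for (int shift = 0; shift < 24; shift += 8) {
        const std::uint32_t sc = (s >> shift) & 0xFF;
        const std::uint32_t dc = (d >> shift) & 0xFF;
        out |= ((sc * a + dc * inv + 127) / 255) << shift;
    }
    out |= (a + ((d >> 24) * inv + 127) / 255) << 24;
    return out;
}

// Returns true when the cheap copy path was taken.
// dst.opaque is left unchanged in this sketch.
bool Blit(Image& dst, const Image& src) {
    const std::size_t n = std::min(dst.pixels.size(), src.pixels.size());
    if (src.opaque) {
        // Every source pixel fully replaces the destination, so copying
        // gives the same result as blending without the per-pixel math.
        std::copy(src.pixels.begin(), src.pixels.begin() + n, dst.pixels.begin());
        return true;
    }
    for (std::size_t i = 0; i < n; ++i) {
        dst.pixels[i] = BlendOver(src.pixels[i], dst.pixels[i]);
    }
    return false;
}
```

Caching the flag is the important design choice in this sketch: scanning the source on every blit would cost about as much as the blend it is meant to avoid.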

@Ghabry
Member Author

Ghabry commented Apr 4, 2016

Don't merge yet. I already found a visual issue when the chipset is missing or has an invalid tile and doesn't render ^^ (AB blocks in testgame)

@Ghabry
Member Author

Ghabry commented Apr 5, 2016

This and the battle PR are now finished (finally).
Expect enjoyable performance on all devices soon :D
lgtm?

@carstene1ns
Member

Is your filefinder change tested with games that have nested subdirectories? Otherwise LGTM.

@Ghabry
Member Author

Ghabry commented Apr 5, 2016

I didn't try running any game. For this case I only checked zegeri's subdirectory benchmark test case, and the data structure showed the expected result.

Do you know any game that uses this feature?

@Ghabry
Member Author

Ghabry commented Apr 5, 2016

I tested now: Chipset under "aaa\basis" is found and transparency of tiles in Embric: Works

@Ghabry
Member Author

Ghabry commented Apr 5, 2016

RPG_RT.zip

Just check out testgame 2000 and replace the stuff.

@fdelapena fdelapena merged commit a713ce3 into EasyRPG:master Apr 5, 2016
@Ghabry Ghabry deleted the fastblit branch April 5, 2016 17:57
@BlisterB
Member

BlisterB commented Apr 6, 2016

I tried this PR with the first town of Aedemphia.
Before : 9-25 fps
Now : 35-45 fps.

The improvements are clearly there, congratulations :) !

@Ghabry
Member Author

Ghabry commented Apr 6, 2016

Great, this reflects my basic benchmarks, which showed up to +33% on average (based on CPU usage, which doesn't scale linearly with the FPS).
Corner cases with lots of blend and saturation effects and tons of events and picture effects have a higher gain, as you showed ^^

@Ghabry Ghabry modified the milestone: 0.4.2 Apr 11, 2016
7 participants