Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add: optimise switches #78

Merged
merged 3 commits into from Oct 17, 2020
Merged

Add: optimise switches #78

merged 3 commits into from Oct 17, 2020

Conversation

@glx22
Copy link
Contributor

@glx22 glx22 commented Dec 21, 2019

I had the idea after seeing this pastebin from Andy.

With this PR the following code

switch (FEAT_TRAINS, SELF, open_car_pony_gen_6C_switch_loading_speed_by_cargo_0, cargo_classes & bitmask(CC_PASSENGERS, CC_MAIL, CC_ARMOURED)) {
	bitmask(CC_PASSENGERS): return 6;
	bitmask(CC_MAIL): return 6;
	bitmask(CC_ARMOURED): return 6;
	return 6;
}
switch (FEAT_TRAINS, SELF, open_car_pony_gen_6C_switch_loading_speed_by_cargo_1, cargo_classes & bitmask(CC_PASSENGERS, CC_MAIL, CC_ARMOURED)) {
	bitmask(CC_PASSENGERS): return 8;
	bitmask(CC_MAIL): return 8;
	bitmask(CC_ARMOURED): return 8;
	return 8;
}
switch (FEAT_TRAINS, SELF, open_car_pony_gen_6C_switch_loading_speed_by_cargo_2, cargo_classes & bitmask(CC_PASSENGERS, CC_MAIL, CC_ARMOURED)) {
	bitmask(CC_PASSENGERS): return 11;
	bitmask(CC_MAIL): return 11;
	bitmask(CC_ARMOURED): return 11;
	return 11;
}
switch (FEAT_TRAINS, SELF, open_car_pony_gen_6C_switch_loading_speed, param[0]) {
    0: open_car_pony_gen_6C_switch_loading_speed_by_cargo_0;
    1: open_car_pony_gen_6C_switch_loading_speed_by_cargo_1;
    2: open_car_pony_gen_6C_switch_loading_speed_by_cargo_2;
}

will be replaced at compile time with

switch (FEAT_TRAINS, SELF, open_car_pony_gen_6C_switch_loading_speed, param[0]) {
    0: return 6;
    1: return 8;
    2: return 11;
}

Each replacement generates a warning and the replacement is done only if the switch expression doesn't have side effects (no STORE_TEMP() nor STORE_PERM()) and if there's a default return.

@glx22 glx22 force-pushed the glx22:reduce_refs branch from d9dd0b7 to 98935d9 Dec 22, 2019
@glx22 glx22 changed the title Add: optimise switch returning constants Add: optimise switches Dec 22, 2019
@glx22
Copy link
Contributor Author

@glx22 glx22 commented Dec 22, 2019

I made some changes. Now the check for all results being the same is done earlier, and if all ranges are the same as default they are removed. Then the replacement happens if there's only default, read only expression and not a return computed value

@glx22
Copy link
Contributor Author

@glx22 glx22 commented Dec 22, 2019

@andythenorth suggested more possible optimisations. I'm waiting for test cases.

nml/ast/switch.py Outdated Show resolved Hide resolved
nml/expression/array.py Outdated Show resolved Hide resolved
nml/expression/bitmask.py Outdated Show resolved Hide resolved
nml/expression/storage_op.py Outdated Show resolved Hide resolved
nml/expression/spritegroup_ref.py Show resolved Hide resolved
nml/ast/switch.py Show resolved Hide resolved
@glx22 glx22 force-pushed the glx22:reduce_refs branch from 98935d9 to 19c16a1 Dec 28, 2019
@glx22
Copy link
Contributor Author

@glx22 glx22 commented Dec 28, 2019

Stats compiling iron-horse

nml master:

Reading lang ... 0.2 s
Reading ... 0.1 s
Init parser ... 0.0 s
Parsing ... 106.7 s
Preprocessing ... 90.5 s
Generating actions ... 17.2 s
Assigning Action2 registers ... 0.4 s
Generating strings ... 0.1 s
Collecting real sprites ... 3.6 s
Checking palette of source images ... 6.1 s
Encoding ... 38.7 s
 nmlc info: 97102 sprites, 19664 cached, 312 orphaned, 64464 duplicates, 12974 newly encoded (native)
 nmlc info: Train items: 512/65420
 nmlc info: Concurrent spritesets: 2/255 (".\generated\iron-horse.nml", line 9747)
 nmlc info: Concurrent spritegroups: 176/256 (".\generated\iron-horse.nml", line 18445)
 nmlc info: Concurrent Action2 registers: 1/127 (".\generated\iron-horse.nml", line 188355)
 nmlc info: Concurrent ActionD registers: 4/64 (".\generated\iron-horse.nml", line 10438)
 nmlc info: GRF parameter registers: 3/64
 nmlc info: Cargo translation table: 93/254
 nmlc info: Railtype translation table: 6/256
 nmlc info: D0xx strings: 16/1024
Linking actions ... 1.3 s
Writing output ... 40.2 s

NFO size: 20269741
GRF size: 18943154

this PR:

Reading lang ... 0.2 s
Reading ... 0.1 s
Init parser ... 0.0 s
Parsing ... 119.7 s
Preprocessing ... 723.0 s
Generating actions ... 19.1 s
Assigning Action2 registers ... 0.4 s
Generating strings ... 0.2 s
Collecting real sprites ... 0.8 s
Checking palette of source images ... 6.7 s
Encoding ... 12.4 s
 nmlc info: 91985 sprites, 32638 cached, 312 orphaned, 59347 duplicates, 0 newly encoded (native)
 nmlc info: Train items: 512/65420
 nmlc info: Concurrent spritesets: 2/255 (".\generated\iron-horse.nml", line 9747)
 nmlc info: Concurrent spritegroups: 176/256 (".\generated\iron-horse.nml", line 18445)
 nmlc info: Concurrent Action2 registers: 1/127 (".\generated\iron-horse.nml", line 188355)
 nmlc info: Concurrent ActionD registers: 4/64 (".\generated\iron-horse.nml", line 10438)
 nmlc info: GRF parameter registers: 3/64
 nmlc info: Cargo translation table: 93/254
 nmlc info: Railtype translation table: 6/256
 nmlc info: D0xx strings: 16/1024
Linking actions ... 1.1 s
Writing output ... 35.2 s

NFO size: 18352185
GRF size: 17702514

So compilation is a lot slower, for a very small improvement.

Edit: and then I discovered -c nmlc option, with it GRF size is the same with both nmlc and grfcodec :)

@LordAro
Copy link
Member

@LordAro LordAro commented Dec 28, 2019

A whole order of magnitude is a ridiculous slow down. Is there something specific causing it?

@glx22
Copy link
Contributor Author

@glx22 glx22 commented Dec 28, 2019

Slow down is caused by repeating loops when searching for duplicates (one for each new template, one for each new spriteset)

@Yexo
Copy link
Contributor

@Yexo Yexo commented Jun 1, 2020

It looks like the first 3 commits can be a big win already, and it's only the last 2 commits that are causing the slowdown.
Is there anything preventing the merge of the first 3 commits, leaving the last 2 for later / behind a flag / after more optimization?

@glx22 glx22 force-pushed the glx22:reduce_refs branch 2 times, most recently from b9cdffa to 0b6b5a2 Jun 2, 2020
Copy link
Contributor

@Yexo Yexo left a comment

Non-blocking: this could use a test case.

nml/expression/spritegroup_ref.py Show resolved Hide resolved
nml/expression/storage_op.py Outdated Show resolved Hide resolved
@glx22 glx22 force-pushed the glx22:reduce_refs branch from 0b6b5a2 to b10a779 Jun 9, 2020
@glx22 glx22 force-pushed the glx22:reduce_refs branch 2 times, most recently from 4c848fc to a456eb9 Oct 3, 2020
@glx22 glx22 force-pushed the glx22:reduce_refs branch from a456eb9 to ce1d2ef Oct 17, 2020
@glx22 glx22 merged commit aede745 into OpenTTD:master Oct 17, 2020
19 checks passed
19 checks passed
Commit checker
Details
Python 3.5 on ubuntu-latest
Details
Security and Quality Security and Quality
Details
Python 3.6 on ubuntu-latest
Details
Python 3.7 on ubuntu-latest
Details
Python 3.8 on ubuntu-latest
Details
Python pypy3 on ubuntu-latest
Details
Python 3.5 on macOS-latest
Details
Python 3.6 on macOS-latest
Details
Python 3.7 on macOS-latest
Details
Python 3.8 on macOS-latest
Details
Python 3.5 on windows-2016
Details
Python 3.6 on windows-2016
Details
Python 3.7 on windows-2016
Details
Python 3.8 on windows-2016
Details
Python 3.x on ubuntu-latest
Details
Python 3.x on macOS-latest
Details
Python 3.x on windows-2016
Details
CodeQL No new alerts
Details
@glx22 glx22 deleted the glx22:reduce_refs branch Oct 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants
You can’t perform that action at this time.