Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add: optimise switches #78

Open
wants to merge 3 commits into
base: master
from
Open

Add: optimise switches #78

wants to merge 3 commits into from

Conversation

@glx22
Copy link
Contributor

glx22 commented Dec 21, 2019

I had the idea after seeing this pastebin from Andy.

With this PR the following code

switch (FEAT_TRAINS, SELF, open_car_pony_gen_6C_switch_loading_speed_by_cargo_0, cargo_classes & bitmask(CC_PASSENGERS, CC_MAIL, CC_ARMOURED)) {
	bitmask(CC_PASSENGERS): return 6;
	bitmask(CC_MAIL): return 6;
	bitmask(CC_ARMOURED): return 6;
	return 6;
}
switch (FEAT_TRAINS, SELF, open_car_pony_gen_6C_switch_loading_speed_by_cargo_1, cargo_classes & bitmask(CC_PASSENGERS, CC_MAIL, CC_ARMOURED)) {
	bitmask(CC_PASSENGERS): return 8;
	bitmask(CC_MAIL): return 8;
	bitmask(CC_ARMOURED): return 8;
	return 8;
}
switch (FEAT_TRAINS, SELF, open_car_pony_gen_6C_switch_loading_speed_by_cargo_2, cargo_classes & bitmask(CC_PASSENGERS, CC_MAIL, CC_ARMOURED)) {
	bitmask(CC_PASSENGERS): return 11;
	bitmask(CC_MAIL): return 11;
	bitmask(CC_ARMOURED): return 11;
	return 11;
}
switch (FEAT_TRAINS, SELF, open_car_pony_gen_6C_switch_loading_speed, param[0]) {
    0: open_car_pony_gen_6C_switch_loading_speed_by_cargo_0;
    1: open_car_pony_gen_6C_switch_loading_speed_by_cargo_1;
    2: open_car_pony_gen_6C_switch_loading_speed_by_cargo_2;
}

will be replaced at compile time with

switch (FEAT_TRAINS, SELF, open_car_pony_gen_6C_switch_loading_speed, param[0]) {
    0: return 6;
    1: return 8;
    2: return 11;
}

Each replacement generates a warning and the replacement is done only if the switch expression doesn't have side effects (no STORE_TEMP() nor STORE_PERM()) and if there's a default return.

@glx22 glx22 force-pushed the glx22:reduce_refs branch from d9dd0b7 to 98935d9 Dec 22, 2019
@glx22 glx22 changed the title Add: optimise switch returning constants Add: optimise switches Dec 22, 2019
@glx22
Copy link
Contributor Author

glx22 commented Dec 22, 2019

I made some changes. Now the check for all results being the same is done earlier, and if all ranges are the same as default they are removed. Then the replacement happens if there's only default, read only expression and not a return computed value

@glx22
Copy link
Contributor Author

glx22 commented Dec 22, 2019

@andythenorth suggested more possible optimisations. I'm waiting for test cases.

nml/ast/switch.py Outdated Show resolved Hide resolved
nml/expression/array.py Outdated Show resolved Hide resolved
nml/expression/bitmask.py Outdated Show resolved Hide resolved
nml/expression/storage_op.py Outdated Show resolved Hide resolved
nml/expression/spritegroup_ref.py Show resolved Hide resolved
nml/ast/switch.py Show resolved Hide resolved
@glx22 glx22 force-pushed the glx22:reduce_refs branch from 98935d9 to 19c16a1 Dec 28, 2019
@glx22
Copy link
Contributor Author

glx22 commented Dec 28, 2019

Stats compiling iron-horse

nml master:

Reading lang ... 0.2 s
Reading ... 0.1 s
Init parser ... 0.0 s
Parsing ... 106.7 s
Preprocessing ... 90.5 s
Generating actions ... 17.2 s
Assigning Action2 registers ... 0.4 s
Generating strings ... 0.1 s
Collecting real sprites ... 3.6 s
Checking palette of source images ... 6.1 s
Encoding ... 38.7 s
 nmlc info: 97102 sprites, 19664 cached, 312 orphaned, 64464 duplicates, 12974 newly encoded (native)
 nmlc info: Train items: 512/65420
 nmlc info: Concurrent spritesets: 2/255 (".\generated\iron-horse.nml", line 9747)
 nmlc info: Concurrent spritegroups: 176/256 (".\generated\iron-horse.nml", line 18445)
 nmlc info: Concurrent Action2 registers: 1/127 (".\generated\iron-horse.nml", line 188355)
 nmlc info: Concurrent ActionD registers: 4/64 (".\generated\iron-horse.nml", line 10438)
 nmlc info: GRF parameter registers: 3/64
 nmlc info: Cargo translation table: 93/254
 nmlc info: Railtype translation table: 6/256
 nmlc info: D0xx strings: 16/1024
Linking actions ... 1.3 s
Writing output ... 40.2 s

NFO size: 20269741
GRF size: 18943154

this PR:

Reading lang ... 0.2 s
Reading ... 0.1 s
Init parser ... 0.0 s
Parsing ... 119.7 s
Preprocessing ... 723.0 s
Generating actions ... 19.1 s
Assigning Action2 registers ... 0.4 s
Generating strings ... 0.2 s
Collecting real sprites ... 0.8 s
Checking palette of source images ... 6.7 s
Encoding ... 12.4 s
 nmlc info: 91985 sprites, 32638 cached, 312 orphaned, 59347 duplicates, 0 newly encoded (native)
 nmlc info: Train items: 512/65420
 nmlc info: Concurrent spritesets: 2/255 (".\generated\iron-horse.nml", line 9747)
 nmlc info: Concurrent spritegroups: 176/256 (".\generated\iron-horse.nml", line 18445)
 nmlc info: Concurrent Action2 registers: 1/127 (".\generated\iron-horse.nml", line 188355)
 nmlc info: Concurrent ActionD registers: 4/64 (".\generated\iron-horse.nml", line 10438)
 nmlc info: GRF parameter registers: 3/64
 nmlc info: Cargo translation table: 93/254
 nmlc info: Railtype translation table: 6/256
 nmlc info: D0xx strings: 16/1024
Linking actions ... 1.1 s
Writing output ... 35.2 s

NFO size: 18352185
GRF size: 17702514

So compilation is a lot slower, for a very small improvement.

Edit: and then I discovered -c nmlc option, with it GRF size is the same with both nmlc and grfcodec :)

@LordAro
Copy link
Member

LordAro commented Dec 28, 2019

A whole order of magnitude is a ridiculous slow down. Is there something specific causing it?

@glx22
Copy link
Contributor Author

glx22 commented Dec 28, 2019

Slow down is caused by repeating loops when searching for duplicates (one for each new template, one for each new spriteset)

@Yexo
Copy link
Contributor

Yexo commented Jun 1, 2020

It looks like the first 3 commits can be a big win already, and it's only the last 2 commits that are causing the slowdown.
Is there anything preventing the merge of the first 3 commits, leaving the last 2 for later / behind a flag / after more optimization?

@glx22 glx22 force-pushed the glx22:reduce_refs branch 2 times, most recently from b9cdffa to 0b6b5a2 Jun 2, 2020
Copy link
Contributor

Yexo left a comment

Non-blocking: this could use a test case.

nml/expression/spritegroup_ref.py Show resolved Hide resolved
nml/expression/storage_op.py Outdated Show resolved Hide resolved
@glx22 glx22 force-pushed the glx22:reduce_refs branch from 0b6b5a2 to b10a779 Jun 9, 2020
@Yexo
Yexo approved these changes Jun 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants
You can’t perform that action at this time.