Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimization for range-based switch #22997

Open
ufcpp opened this issue Nov 3, 2017 · 4 comments
Open

Optimization for range-based switch #22997

ufcpp opened this issue Nov 3, 2017 · 4 comments
Labels
Area-Compilers Code Gen Quality Room for improvement in the quality of the compiler's generated code Perf-Epic
Milestone

Comments

@ufcpp
Copy link
Contributor

ufcpp commented Nov 3, 2017

Ported from dotnet/csharplang#198 (comment)

TL; DR

I want the following switch to be optimized to O(log N).

switch (codePoint)
{
	case uint i0 when 1536 <= i0 && i0 <= 1541:
	case 1757:
	case 1807:
	case 2274:
	case 3406:
	case 69821:
	case uint i1 when 70082 <= i1 && i1 <= 70083:

Background

I recently implemented a Unicode grapheme breaking algorithm. In this library, a switch statement generated from GraphemeBreakProperty table has about 20 thousand lines.
Cascading if statement and case-when versions of that is much shorter but 25 times slower than the switch.

// expanded cases (fast, but ugly and too long to compile)
            switch (codePoint)
            {
                case 1536:
                case 1537:
                case 1538:
                case 1539:
                case 1540:
                case 1541:
                // ...

// with case-when clause (simple, but much slower)
            switch (codePoint)
            {
                case uint i0 when 1536 <= i0 && i0 <= 1541:
                // ...

// with range (I want, proposed as "slicing"?)
            switch (codePoint)
            {
                case 1536..1541:
                // ...

Measurement

I measured performance of:

(My PC is Core i7-4790 3.60GHz.)

A result:

Method Mean Error StdDev
ByBinarySearch 4.642 ms 0.0468 ms 0.0415 ms
BySwitch 3.825 ms 0.0473 ms 0.0442 ms
BySwitchWhen 115.310 ms 2.7970 ms 2.9928 ms
ByIf 142.658 ms 1.3744 ms 1.1477 ms
ByBinaryIf 2.502 ms 0.0147 ms 0.0137 ms

FYR

limit number

Measurement results with limiting number:

20 lines:

Method Mean Error StdDev
ByBinarySearch 2.363 ms 0.0266 ms 0.0222 ms
BySwitch 1.262 ms 0.0105 ms 0.0098 ms
BySwitchWhen 1.809 ms 0.0106 ms 0.0099 ms
ByIf 1.807 ms 0.0070 ms 0.0062 ms
ByBinaryIf 1.398 ms 0.0075 ms 0.0070 ms

50 lines:

Method Mean Error StdDev
ByBinarySearch 3.030 ms 0.0224 ms 0.0210 ms
BySwitch 1.333 ms 0.0151 ms 0.0141 ms
BySwitchWhen 3.417 ms 0.0283 ms 0.0265 ms
ByIf 3.763 ms 0.0204 ms 0.0191 ms
ByBinaryIf 1.916 ms 0.0109 ms 0.0091 ms

100 lines:

Method Mean Error StdDev
ByBinarySearch 3.216 ms 0.0135 ms 0.0120 ms
BySwitch 1.416 ms 0.0089 ms 0.0083 ms
BySwitchWhen 5.030 ms 0.0454 ms 0.0425 ms
ByIf 6.065 ms 0.0530 ms 0.0496 ms
ByBinaryIf 1.960 ms 0.0158 ms 0.0148 ms

200 lines:

Method Mean Error StdDev
ByBinarySearch 3.446 ms 0.0186 ms 0.0174 ms
BySwitch 1.483 ms 0.0228 ms 0.0214 ms
BySwitchWhen 10.176 ms 0.0878 ms 0.0733 ms
ByIf 12.209 ms 0.1345 ms 0.1258 ms
ByBinaryIf 2.029 ms 0.0228 ms 0.0202 ms

500 lines:

Method Mean Error StdDev
ByBinarySearch 3.636 ms 0.0256 ms 0.0240 ms
BySwitch 1.518 ms 0.0080 ms 0.0075 ms
BySwitchWhen 30.827 ms 0.1879 ms 0.1757 ms
ByIf 33.477 ms 0.2768 ms 0.2590 ms
ByBinaryIf 2.088 ms 0.0136 ms 0.0127 ms

1000 lines:

Method Mean Error StdDev
ByBinarySearch 3.944 ms 0.0433 ms 0.0405 ms
BySwitch 3.535 ms 0.0393 ms 0.0368 ms
BySwitchWhen 36.672 ms 0.2641 ms 0.2205 ms
ByIf 69.189 ms 0.2750 ms 0.2572 ms
ByBinaryIf 2.316 ms 0.0217 ms 0.0203 ms

executable size

I also measured executable sizes.

build script

Debug (/o-) build:

Name Size (bytes)
BinarySearchIf.dll 74240
LinearIf.dll 54272
Switch.dll 140800
SwitchWhen.dll 48128

Release (/o+) build:

Name Size (bytes)
BinarySearchIf.dll 31232
LinearIf.dll 24064
Switch.dll 140800
SwitchWhen.dll 38912

compilation time

In addition, it requires 50 seconds to compile the 20 thousand-line switch-case.

@mikedn
Copy link

mikedn commented Nov 3, 2017

While there's no question that switch needs improvement, further and improved analysis of your specific case is likely needed.

For a start, you should ensure that the benchmarks are representative for your scenario. I doubt that random numbers reflect actual Unicode text.

Also, the binary search with array version is pretty poor, there are numerous improvements you could make there. Just a few examples:

  • lower + (upper - lower) / 2 can be (lower + upper) / 2 since there's no chance that addition will overflow in this particular case
  • even better, change / 2 into >> 1. Or use uint instead of int. Signed division by 2 is more expensive than unsigned division.
  • var r = ranges[middle]; should use ref to avoid a value type copy
  • maybe use pointers to avoid an array bounds check
  • maybe use a better "middle" for the start

I don't know if doing that is enough to catch up BinaryIf. BinarySearch would be preferable to BinaryIf as it requires much less code and branches.

And it's rather interesting that BinaryIf is faster than Switch. The classic switch does use binary search but it also uses switch tables here and there. The fact that BinaryIf is faster would imply that the classic switch choice of using tables isn't that great. Though it all depends on the value distribution...

@VSadov VSadov self-assigned this Nov 3, 2017
@VSadov
Copy link
Member

VSadov commented Nov 3, 2017

Interesting.

I am not sure that a lot can be done to help case uint i0 when 1536 <= i0 && i0 <= 1541: - to morph that into binary search with range checks compiler would need to prove that it is indeed a range check and that comparisons are not producing or observing sideeffects and that ranges in different cases are not overlapping (if they are, then lexical order of cases matters), etc..

It would be much easier to emit efficient case 1536..1541: since such syntax would be more precise about the intent and overlapping ranges could be just disallowed.

BTW, compiler should be able to detect "contiguous range of values" cases and emit them as range checks, so at least in theory the flat switch, even though unwieldy in source, should result in IL that employs range checks. I wonder if that really happens in this scenario...

@jcouv
Copy link
Member

jcouv commented Nov 3, 2017

FYI @gafter @khyperia

@jcouv jcouv added this to the 16.0 milestone Nov 3, 2017
@ufcpp
Copy link
Contributor Author

ufcpp commented Nov 5, 2017

@mikedn

I've tried the suggested improvements: ufcpp/GraphemeSplitter@4159c73

Method Mean Error StdDev
ByBinarySearch 3.349 ms 0.0152 ms 0.0135 ms
BySwitch 3.597 ms 0.0136 ms 0.0114 ms
ByBinaryIf 2.386 ms 0.0124 ms 0.0116 ms

GetByBinarySearch gets 20% faster than before and BySwitch, but even then ByBinaryIf is the fastest. The binary-search is faster than the classic switch because of range-based branch.

The benchmark data is not ideal but I don't want so much precision.

@jaredpar jaredpar modified the milestones: 16.0, Unknown Aug 31, 2018
@gafter gafter added the Code Gen Quality Room for improvement in the quality of the compiler's generated code label Sep 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers Code Gen Quality Room for improvement in the quality of the compiler's generated code Perf-Epic
Projects
None yet
Development

No branches or pull requests

6 participants