Performance: faster switch over string objects #56374

EgorBo · 2021-09-14T11:53:04Z

Background and Motivation

Currently, when we use switch over string objects, Roslyn emits a hash-based chain of if-else comparisons, e.g.:

public bool IsKnownContentType(string contentTypeValue)
{
    switch (contentTypeValue)
    {
        case "text/xml":
        case "text/css":
        case "text/csv":
        case "image/gif":
        case "image/png":
        case "text/html":
        case "text/plain":
        case "image/jpeg":
        case "application/pdf":
        case "application/xml":
        case "application/zip":
        case "application/grpc":
        case "application/json":
        case "multipart/form-data":
        case "application/javascript":
        case "application/octet-stream":
        case "text/html; charset=utf-8":
        case "text/plain; charset=utf-8":
        case "application/json; charset=utf-8":
        case "application/x-www-form-urlencoded":
            return true;
    }
    return false;
}

Here we'll run an O(N) hash algorithm (it's not SIMDified for large strings) and do a chain of comparisons with the precomputed ones (with binary search):

int hash = ComputeStringHash(contentTypeValue);
if (hash == 0x34343443) // 0x34343443 is a hash of "text/xml"
   return hash == "text/xml";
else if (hash == 0x54545452) // 0x54545452 is a hash of "text/css"
   return hash == "text/css";
// etc

For sure it's better than doing string.Equals case by case, but it could be way faster.

Proposed approach

Roslyn should emit a jump-table based on input's Length and then compare chars at some precomputed positions (sort of Trie), e.g. for our IsKnownContentType it could be:

public bool IsKnownContentType_opt(string contentTypeValue)
{
    string? candidate = null;
    switch (contentTypeValue.Length)
    {
        case 8:
            switch (contentTypeValue[7]) // at 7th position all chars are unique:
            {
                case 'l': candidate = "text/xml"; break;
                case 's': candidate = "text/css"; break;
                case 'v': candidate = "text/csv"; break;
            }
            break;
        case 9:
            switch (contentTypeValue[6]) // at 6th position all chars are unique:
            {
                case 'g': candidate = "image/gif"; break; 
                case 'p': candidate = "image/png"; break;
                case 't': candidate = "text/html"; break;
            }
            break;
        case 10:
            switch (contentTypeValue[0])
            {
                case 't': candidate = "text/plain"; break; 
                case 'i': candidate = "image/jpeg"; break;
            }
            break;
        case 15:
            switch (contentTypeValue[12])
            {
                case 'p': candidate = "application/pdf"; break;
                case 'x': candidate = "application/xml"; break;
                case 'z': candidate = "application/zip"; break;
            }
            break;
        case 16:
            switch (contentTypeValue[12])
            {
                case 'g': candidate = "application/grpc"; break; 
                case 'j': candidate = "application/json"; break;
            }
            break;
        case 19: candidate = "multipart/form-data"; break;
        case 22: candidate = "application/javascript"; break;
        case 24:
            switch (contentTypeValue[0])
            {
                case 'a': candidate = "application/octet-stream"; break;
                case 't': candidate = "text/html; charset=utf-8"; break;
            }
            break;
        case 25: candidate = "text/plain; charset=utf-8"; break;
        case 31: candidate = "application/json; charset=utf-8"; break;
        case 33: candidate = "application/x-www-form-urlencoded"; break;
    }
    return candidate == contentTypeValue;
}

So, ideally we can find a matching candidate in just 1-2 jump instructions here.
I wrote a simple benchmark, here are the results:

|                 Method |     contentTypeValue |      Mean |     Error |    StdDev |
|----------------------- |--------------------- |----------:|----------:|----------:|
|     IsKnownContentType |     application/grpc |  8.607 ns | 0.0384 ns | 0.0359 ns |
| IsKnownContentType_opt |     application/grpc |  1.475 ns | 0.0029 ns | 0.0026 ns |

|     IsKnownContentType | appli(...)cript [22] | 12.829 ns | 0.0362 ns | 0.0302 ns |
| IsKnownContentType_opt | appli(...)cript [22] |  1.289 ns | 0.0016 ns | 0.0015 ns |

|     IsKnownContentType |     application/json |  8.152 ns | 0.0321 ns | 0.0285 ns |
| IsKnownContentType_opt |     application/json |  1.484 ns | 0.0035 ns | 0.0033 ns |

|     IsKnownContentType | appli(...)utf-8 [31] | 18.141 ns | 0.0161 ns | 0.0126 ns |
| IsKnownContentType_opt | appli(...)utf-8 [31] |  1.269 ns | 0.0020 ns | 0.0018 ns |

|     IsKnownContentType | appli(...)tream [24] | 14.009 ns | 0.0933 ns | 0.0779 ns |
| IsKnownContentType_opt | appli(...)tream [24] |  1.715 ns | 0.0031 ns | 0.0026 ns |

|     IsKnownContentType |      application/pdf |  7.774 ns | 0.0044 ns | 0.0035 ns |
| IsKnownContentType_opt |      application/pdf |  1.701 ns | 0.0174 ns | 0.0155 ns |

|     IsKnownContentType | appli(...)coded [33] | 19.900 ns | 0.0227 ns | 0.0212 ns |
| IsKnownContentType_opt | appli(...)coded [33] |  1.092 ns | 0.0028 ns | 0.0023 ns |

|     IsKnownContentType |      application/xml |  7.756 ns | 0.0235 ns | 0.0208 ns |
| IsKnownContentType_opt |      application/xml |  1.536 ns | 0.0028 ns | 0.0024 ns |

|     IsKnownContentType |      application/zip |  7.593 ns | 0.0107 ns | 0.0100 ns |
| IsKnownContentType_opt |      application/zip |  1.317 ns | 0.0077 ns | 0.0072 ns |

|     IsKnownContentType |            image/gif |  5.118 ns | 0.0048 ns | 0.0040 ns |
| IsKnownContentType_opt |            image/gif |  1.694 ns | 0.0036 ns | 0.0032 ns |

|     IsKnownContentType |           image/jpeg |  5.068 ns | 0.0086 ns | 0.0072 ns |
| IsKnownContentType_opt |           image/jpeg |  1.695 ns | 0.0047 ns | 0.0041 ns |

|     IsKnownContentType |            image/png |  4.478 ns | 0.0531 ns | 0.0497 ns |
| IsKnownContentType_opt |            image/png |  1.944 ns | 0.0087 ns | 0.0068 ns |

|     IsKnownContentType |  multipart/form-data | 10.519 ns | 0.0211 ns | 0.0187 ns |
| IsKnownContentType_opt |  multipart/form-data |  1.305 ns | 0.0114 ns | 0.0107 ns |

|     IsKnownContentType |             text/css |  4.295 ns | 0.0062 ns | 0.0052 ns |
| IsKnownContentType_opt |             text/css |  1.537 ns | 0.0039 ns | 0.0030 ns |

|     IsKnownContentType |             text/csv |  4.409 ns | 0.0434 ns | 0.0362 ns |
| IsKnownContentType_opt |             text/csv |  1.484 ns | 0.0035 ns | 0.0031 ns |

|     IsKnownContentType |            text/html |  4.402 ns | 0.0041 ns | 0.0038 ns |
| IsKnownContentType_opt |            text/html |  1.526 ns | 0.0030 ns | 0.0026 ns |

|     IsKnownContentType | text/(...)utf-8 [24] | 13.811 ns | 0.0159 ns | 0.0132 ns |
| IsKnownContentType_opt | text/(...)utf-8 [24] |  1.537 ns | 0.0036 ns | 0.0032 ns |

|     IsKnownContentType |           text/plain |  5.161 ns | 0.0207 ns | 0.0184 ns |
| IsKnownContentType_opt |           text/plain |  1.320 ns | 0.0068 ns | 0.0057 ns |

|     IsKnownContentType | text/(...)utf-8 [25] | 14.368 ns | 0.0210 ns | 0.0186 ns |
| IsKnownContentType_opt | text/(...)utf-8 [25] |  1.289 ns | 0.0050 ns | 0.0042 ns |

|     IsKnownContentType |             text/xml |  4.015 ns | 0.0069 ns | 0.0057 ns |
| IsKnownContentType_opt |             text/xml |  1.473 ns | 0.0022 ns | 0.0018 ns |

Optional optimizations

Compare 2/4 chars at once

~~In IsKnownContentType_opt I only compare single chars, but can check for free up to 4 chars at once (on 64bit) or 2 chars for AnyCPU.~~ unaligned reads

Extend Roslyn's limitations on Jump-tables

The benchmark results ^ could be even more impressive if that switch over input's Length was emitted as single/solid jump-table by Roslyn and covered all the "missing" cases. It can be explained via this sharplab link, but it's a trade-off between IL size and perf.

Risks

The new implementation should always be faster but might affect IL size in some cases. However, in the case of my sample IsKnownContentType_opt is actually 25% smaller than the default one.

Related discussion

There were attempts to do it in the past, e.g. dotnet/csharplang#1881 (comment) and #43869
but I think if we won't do a full Trie and only try to do Length + single char (or 4 at once) and fallback to the default implementation if there are cases where we can't do single-char checks - it will be simple and always faster.

/cc @333fred @CyrusNajmabadi

The text was updated successfully, but these errors were encountered:

dotnet-issue-labeler · 2021-09-14T11:53:07Z

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

george-polevoy · 2021-09-15T10:58:11Z

Some benchmark using 4 chars as a discriminator.
Strings "aaaaab, aaaaba, aaabaa, aabaaa" are chosen because they need a long discriminator among a relatively short string list.
This set of strings makes a good worst case scenario.

        [Benchmark(Baseline = true)]
        public bool SwitchOverString()
        {
            bool result;
            switch (_s)
            {
                // aabaaa 0x61006100610062 27303489359118434
                case "aabaaa":
                    result = true;
                    break;
                // aaabaa 0x61006100620061 27303489359183969
                case "aaabaa":
                    result = true;
                    break;
                // aaaaba 0x61006200610061 27303493654085729
                case "aaaaba":
                    result = true;
                    break;
                // aaaaab 0x62006100610061 27584964335829089
                case "aaaaab":
                    result = true;
                    break;
                default:
                    result = false;
                    break;
            }

            return result;
        }

        [Benchmark]
        public bool LongDiscriminator_MemoryMarshal()
        {
            string candidate = null;

            if (_s.Length != 6)
            {
                return false;
            }

            // Selects a contiguous region of chars as a long. The range [2..] is chosen because it contains significant
            // characters in all possible strings
            var discriminator = MemoryMarshal.Read<long>(MemoryMarshal.Cast<char, byte>(_s.AsSpan(2, 4)));

            switch (discriminator)
            {
                // aabaaa 0x61006100610062 27303489359118434
                case 0x61006100610062:
                    candidate = "aabaaa";
                    break;
                // aaabaa 0x61006100620061 27303489359183969
                case 0x61006100620061:
                    candidate = "aaabaa";
                    break;
                // aaaaba 0x61006200610061 27303493654085729
                case 0x61006200610061:
                    candidate = "aaaaba";
                    break;
                // aaaaab 0x62006100610061 27584964335829089
                case 0x62006100610061:
                    candidate = "aaaaab";
                    break;
            }

            return candidate == _s;
        }

|                          Method |      Mean |     Error |    StdDev | Ratio |
|-------------------------------- |----------:|----------:|----------:|------:|
|                SwitchOverString | 16.307 ns | 1.6814 ns | 0.0922 ns |  1.00 |
| LongDiscriminator_MemoryMarshal |  2.681 ns | 0.2040 ns | 0.0112 ns |  0.16 |

GGG-KILLER · 2021-10-07T00:40:46Z

I think it would be worthwhile implementing this for the pattern

return varName is "aaaaab" or "aaaaba" or "aaabaa" or "aabaaa";

as well

GGG-KILLER · 2021-10-09T16:20:59Z

Ok, so, I don't know if it'll be of any help (since I'm not really sure where this would go, I think it's in the lowering stage but I'm not sure) but I wrote a rewriter poc based on EgorBo's snippet.

This is the result of running it on IsKnownContentType:

bool IsKnownContentType(string contentTypeValue)
{
    {
        var __switchExpr__0 = contentTypeValue;
        string? __candidate__1 = null;
        int __destination__2 = -1;
        switch (__switchExpr__0.Length)
        {
            case 8:
                switch (__switchExpr__0[7])
                {
                    case 'l':
                        __candidate__1 = "text/xml";
                        __destination__2 = 0;
                        break;
                    case 's':
                        __candidate__1 = "text/css";
                        __destination__2 = 0;
                        break;
                    case 'v':
                        __candidate__1 = "text/csv";
                        __destination__2 = 0;
                        break;
                }

                break;
            case 9:
                switch (__switchExpr__0[6])
                {
                    case 'g':
                        __candidate__1 = "image/gif";
                        __destination__2 = 0;
                        break;
                    case 'p':
                        __candidate__1 = "image/png";
                        __destination__2 = 0;
                        break;
                    case 't':
                        __candidate__1 = "text/html";
                        __destination__2 = 0;
                        break;
                }

                break;
            case 10:
                switch (__switchExpr__0[0])
                {
                    case 't':
                        __candidate__1 = "text/plain";
                        __destination__2 = 0;
                        break;
                    case 'i':
                        __candidate__1 = "image/jpeg";
                        __destination__2 = 0;
                        break;
                }

                break;
            case 15:
                switch (__switchExpr__0[12])
                {
                    case 'p':
                        __candidate__1 = "application/pdf";
                        __destination__2 = 0;
                        break;
                    case 'x':
                        __candidate__1 = "application/xml";
                        __destination__2 = 0;
                        break;
                    case 'z':
                        __candidate__1 = "application/zip";
                        __destination__2 = 0;
                        break;
                }

                break;
            case 16:
                switch (__switchExpr__0[12])
                {
                    case 'g':
                        __candidate__1 = "application/grpc";
                        __destination__2 = 0;
                        break;
                    case 'j':
                        __candidate__1 = "application/json";
                        __destination__2 = 0;
                        break;
                }

                break;
            case 19:
                __candidate__1 = "multipart/form-data";
                __destination__2 = 0;
                break;
            case 22:
                __candidate__1 = "application/javascript";
                __destination__2 = 0;
                break;
            case 24:
                switch (__switchExpr__0[0])
                {
                    case 'a':
                        __candidate__1 = "application/octet-stream";
                        __destination__2 = 0;
                        break;
                    case 't':
                        __candidate__1 = "text/html; charset=utf-8";
                        __destination__2 = 0;
                        break;
                }

                break;
            case 25:
                __candidate__1 = "text/plain; charset=utf-8";
                __destination__2 = 0;
                break;
            case 31:
                __candidate__1 = "application/json; charset=utf-8";
                __destination__2 = 0;
                break;
            case 33:
                __candidate__1 = "application/x-www-form-urlencoded";
                __destination__2 = 0;
                break;
        }

        if (__destination__2 != -1 && (object.ReferenceEquals(__switchExpr__0, __candidate__1) || __switchExpr__0?.Equals(__candidate__1) is true))
        {
            switch (__destination__2)
            {
                case 0:
                    return true;
            }
        }
    }

    return false;
}

grendello · 2022-05-19T17:14:21Z

Hi,

I was just pointed to this issue by @akoeplinger as we were looking at the code (both IL and native asm) generated for the switch on string. I was dealing with a similar issue in Xamarin.Android recently (in native code), and I think the runtime
could benefit from the approach I took there.

Firstly, I use xxHash to hash the strings I need to match (64-bit and 32-bit for the respective word width) as it's both fast and has a very, very low collision ratio.

Secondly, I generate two tables - first contains the sorted hashes and the other contains the data I need to access for the given hash. To find a match, I do a binary search over the first table to get the index of that hash, and then I use that index to access the other table. This has the potential of being quite friendly to CPU caches, which is a good win for mobile platforms. The 2nd table might be problematic for Roslyn (but perhaps not?), however I think even an if-then-else performed on integers should be fast enough.

I tried several different approaches to binary search optimization, including loop unrolling and branchless versions, with mixed results. I eventually settled on the "naive" binary search because it delivered the best results across
the different architectures we support. It will (IIRC) perform log2(n)-1 comparisons and jumps, which is not too bad.

CyrusNajmabadi · 2022-05-19T19:39:40Z

It will (IIRC) perform log2(n)-1 comparisons and jumps, which is not too bad.

i think this is what roslyn does as well once the number of cases is high enough. I see us emitting https://docs.microsoft.com/en-us/dotnet/api/system.reflection.emit.opcodes.bgt_un_s?view=net-6.0 and whatnot in our IL.

You can see the logic we have for this here:

roslyn/src/Compilers/Core/Portable/CodeGen/SwitchIntegralJumpTableEmitter.cs

Line 67 in 2b12d49

internal void EmitJumpTable()

            //  c)	After bucketing, generate code to perform a binary search on these buckets array, 
            //      emitting conditional jumps if current bucket sub-array has more than one bucket and
            //      emitting the switch instruction when we are down to a single bucket in the sub-array.

grendello · 2022-05-19T20:21:31Z

@CyrusNajmabadi we were looking at the generated code and it looks like unrolled binary search, but what's puzzling is this part:

if (num == 822911587 && s == "4")
{
    return true;
}

and also

else if (s == "5")
{
   return true;
}

Why always call the Equality operator if the hash matches? Is it because of the high collision ratio of the hash for similar strings?

CyrusNajmabadi · 2022-05-19T20:22:45Z

Why always call the Equality operator if the hash matches?

Because hash collisions are possible.

Is it because of the high collision ratio of the hash for similar strings?

No. It's not about a high ratio. It's about a collision always being possible, so you have to validate the string is actually a match. A matching hash is never sufficient (same with hashtables, or any other system).

THe only times hashes are ok to use is only when you know all inputs (not all values those inputs are being checked against) and you can then design a perfect hash mapping those fixed input values to your hashing domain. That's not what we have here, so the equality check is always necessary.

grendello · 2022-05-19T20:31:04Z

I understand the need for validation, even though it does make the code slower than it needs to be, but how about using xxHash? It's got a really good ratio for small strings.
Given that we know the size of input strings, could the code be modified (assuming we use xxHash) to verify the string when collisions are possible? As you can see from the tables above, xxHash has 0 collisions for strings with lengths between 8 and 1024 bytes. String validation could also be performed only if strings are equal in length.

EgorBo · 2022-05-19T20:52:16Z

@grendello btw, CoreCLR optimizes (unrolls) x == "str" for small strings (up to 32 chars afair) so it's almost free - e.g.
using SWAR or SIMD (on both x86 and arm)

EgorBo · 2022-05-19T20:53:36Z

it shouldn't be difficult to do the same with Mono-LLVM, it's just x == "str" should be lowered down to memcmp LLVM intrinsic and then LLVM will unroll it too

CyrusNajmabadi · 2022-05-19T20:54:05Z

I understand the need for validation, even though it does make the code slower than it needs to be,

I'm not sure what you mean. Validation must be done. There is no way to avoid that. So the validation doesn't make the code slower than necessary. It's mandatory to ensure that the results are correct.

CyrusNajmabadi · 2022-05-19T20:54:50Z

String validation could also be performed only if strings are equal in length.

This is already done. == already will validate string length as the very first thing it does to avoid any O(n) work.

CyrusNajmabadi · 2022-05-19T20:56:02Z

object.ReferenceEquals(__switchExpr__0, __candidate__1)

This is not necessary. String equality already does this check.

EgorBo · 2022-05-19T20:58:26Z

For reference:

bool Test2(string s) => s == "Access-Control-Allow-Credentials";

emits:

; Method Tests:Test2(System.String):bool
       vzeroupper 
       test     rcx, rcx  ;;  <-- is null?
       je       SHORT G_M48330_IG06
       cmp      dword ptr [rcx+8], 32  ;;  <-- Length == 32
       jne      SHORT G_M48330_IG05
       vmovdqu  ymm0, ymmword ptr[rcx+12]
       vpxor    ymm0, ymm0, ymmword ptr[reloc @RWD00]
       vmovdqu  ymm1, ymmword ptr[rcx+44]
       vpxor    ymm1, ymm1, ymmword ptr[reloc @RWD32]
       vpor     ymm0, ymm0, ymm1
       vptest   ymm0, ymm0
       sete     al
       movzx    rax, al
       jmp      SHORT G_M48330_IG07
G_M48330_IG05:
       xor      eax, eax
       jmp      SHORT G_M48330_IG07
G_M48330_IG06:
       xor      eax, eax
G_M48330_IG07:
       movzx    rax, al
       vzeroupper 
       ret

For smaller string the codegen is much smaller (e.g. for 4 chars string we can use a single 64bit long constant)

dotnet/runtime#65288, dotnet/runtime#66095, dotnet/runtime#65288

grendello · 2022-05-19T21:16:39Z

I understand the need for validation, even though it does make the code slower than it needs to be,

I'm not sure what you mean. Validation must be done. There is no way to avoid that. So the validation doesn't make the code slower than necessary. It's mandatory to ensure that the results are correct.

If the chosen hash function has collision ratio of 0 collisions for a given domain (length), then I think it could be done without risking?

grendello · 2022-05-19T21:18:25Z

@EgorBo it would be nice if Mono generated similar code :)

CyrusNajmabadi · 2022-05-19T21:22:30Z

If the chosen hash function has collision ratio of 0 collisions for a given domain (length), then I think it could be done without risking?

I'm highly skeptical this will matter in practice. It's only for 8 bytes or less, or 4 characters. How often are you hashing those such that this matters. If the strings are that small, then the equality check is going to be blazingly fast anyways.

canton7 · 2022-05-19T21:22:55Z

If the chosen hash function has collision ratio of 0 collisions for a given domain (length), then I think it could be done without risking?

Maybe I'm missing something, but surely that's fundamentally impossible for any scenario where the string has more bits than the hash? You necessarily have collisions in that case, as there are only 2^64 possible hashes, but there are 2^64 possible strings of length 4 (give or take, not all bytes sequences are valid UTF-16, etc). Therefore you must get collisions when you're dealing with any strings > 4 chars.

CyrusNajmabadi · 2022-05-19T21:23:54Z

Maybe I'm missing something, but surely that's fundamentally impossible for any scenario where the string has more bits than the hash?

@canton7 That's why tehy're specifying for "a given domain (length)" . xx is bijective on <=8 bytes of data:

canton7 · 2022-05-19T21:24:25Z

Indeed, I thought that was clear to OP, but it seems perhaps not.

CyrusNajmabadi · 2022-05-19T21:25:08Z

@canton7 in this case, it's producing a 64bit (System.Int64 in .net) hash. So it's saying that it perfectly maps any 64bit input to a distinct 64bit output.

stephentoub · 2022-05-19T21:39:47Z

@stephentoub might also have some interest here.

I do :) We should experiment with an approach like @EgorBo proposes... we manually do such things in a variety of hot paths, and it'd be nice to a) not have to do so and b) extend the wins to other places as well. e.g.
https://github.com/dotnet/runtime/blob/main/src/libraries/System.Net.Http/src/System/Net/Http/Headers/KnownHeaders.cs#L161-L414
Another approach that can be considered is to first switch over the length and then proceed with Roslyn's current scheme within each length bucket. ASP.NET has a code generator that essentially does that:
https://github.com/dotnet/aspnetcore/blob/8b30d862de6c9146f466061d51aa3f1414ee2337/src/Shared/HttpSys/RequestProcessing/RequestHeaders.Generated.cs#L1170-L1704

grendello · 2022-05-19T21:40:23Z

That's not an exhaustive search. They only tested a few billion inputs (necessarily, as the input space is so large). So to take the last row, they sampled 10^11 inputs from a search space of 10^308. As we said, you must have collisions, mathematically, when the number of bits in the input is larger than the number of bits in the hash.

And I will repeat myself - apply the optimization if the domain is right, within the limits of mathematics.

canton7 · 2022-05-19T21:41:19Z

And I'll repeat myself -- you can only guarantee that there aren't collisions where the number of bits in the input is <= the number of bits in the hash (so 8 chars for a 64 bit hash, 16 chars for a 128 bit hash), and at that point the string equality check is so trivially cheap that it's not really worth eliding anyway.

CyrusNajmabadi · 2022-05-19T21:43:25Z

At htis point, i'm basically backing out the discussion. I think if this can be done safely and sanely, with a reasonable dependency on some battle tested piece of code, and it itself is tested extensively, it's fine. THis is all an internal impl detail, so as long as it's faster and safe, i'm ok with it.

grendello · 2022-05-19T21:45:49Z

Monstrosities like this could take advantage of the small string optimizations. Also, I think you'd be surprised how much "trivial" optimizations like this can improve performance on mobile platforms (especially the underpowered ones). However, I am too backing out of the discussion, I think all points have been made. Thanks!

GGG-KILLER · 2022-05-19T22:12:17Z

@stephentoub might also have some interest here.

I do :) We should experiment with an approach like @EgorBo proposes... we manually do such things in a variety of hot paths, and it'd be nice to a) not have to do so and b) extend the wins to other places as well. e.g. https://github.com/dotnet/runtime/blob/main/src/libraries/System.Net.Http/src/System/Net/Http/Headers/KnownHeaders.cs#L161-L414 Another approach that can be considered is to first switch over the length and then proceed with Roslyn's current scheme within each length bucket. ASP.NET has a code generator that essentially does that: https://github.com/dotnet/aspnetcore/blob/8b30d862de6c9146f466061d51aa3f1414ee2337/src/Shared/HttpSys/RequestProcessing/RequestHeaders.Generated.cs#L1170-L1704

If it interests anyone I have also made a source generator helper for doing what was suggested in the original issue (although obviously not production-ready as it wasn't really meant for anyone other than me to use/modify it, but it works): https://github.com/LorettaDevs/Loretta/blob/2f213c2afc49bcba1de6cba677d8c372bd188079/src/Tools/Loretta.Generators.Shared/OptimizedSwitch.cs

I'd like to help to implement this into Roslyn but no idea where to start.

stephentoub · 2022-09-16T16:03:38Z

Out of sheer curiosity, I checked what the C# compiler will output if the switch is rewritten as cases of list pattern matching:
SharpLab
e.g. instead of the original example:

    public bool IsKnownContentType1(string contentTypeValue)
    {
        switch (contentTypeValue)
        {
            case "text/xml":
            case "text/css":
            case "text/csv":
            case "image/gif":
            case "image/png":
            case "text/html":
            case "text/plain":
            case "image/jpeg":
            case "application/pdf":
            case "application/xml":
            case "application/zip":
            case "application/grpc":
            case "application/json":
            case "multipart/form-data":
            case "application/javascript":
            case "application/octet-stream":
            case "text/html; charset=utf-8":
            case "text/plain; charset=utf-8":
            case "application/json; charset=utf-8":
            case "application/x-www-form-urlencoded":
                return true;
        }
        return false;
    }

doing:

    public bool IsKnownContentType2(string contentTypeValue)
    {
        switch (contentTypeValue)
        {
            case ['t', 'e', 'x', 't', '/', 'x', 'm', 'l']:
            case ['t', 'e', 'x', 't', '/', 'c', 's', 's']:
            case ['t', 'e', 'x', 't', '/', 'c', 's', 'v']:
            case ['i', 'm', 'a', 'g', 'e', '/', 'g', 'i', 'f']:
            case ['i', 'm', 'a', 'g', 'e', '/', 'p', 'n', 'g']:
            case ['t', 'e', 'x', 't', '/', 'h', 't', 'm', 'l']:
            case ['t', 'e', 'x', 't', '/', 'p', 'l', 'a', 'i', 'n']:
            case ['i', 'm', 'a', 'g', 'e', '/', 'j', 'p', 'e', 'g']:
            case ['a', 'p', 'p', 'l', 'i', 'c', 'a', 't', 'i', 'o', 'n', '/', 'p', 'd', 'f']:
            case ['a', 'p', 'p', 'l', 'i', 'c', 'a', 't', 'i', 'o', 'n', '/', 'x', 'm', 'l']:
            case ['a', 'p', 'p', 'l', 'i', 'c', 'a', 't', 'i', 'o', 'n', '/', 'z', 'i', 'p']:
            case ['a', 'p', 'p', 'l', 'i', 'c', 'a', 't', 'i', 'o', 'n', '/', 'g', 'r', 'p', 'c']:
            case ['a', 'p', 'p', 'l', 'i', 'c', 'a', 't', 'i', 'o', 'n', '/', 'j', 's', 'o', 'n']:
            case ['m', 'u', 'l', 't', 'i', 'p', 'a', 'r', 't', '/', 'f', 'o', 'r', 'm', '-', 'd', 'a', 't', 'a']:
            case ['a', 'p', 'p', 'l', 'i', 'c', 'a', 't', 'i', 'o', 'n', '/', 'j', 'a', 'v', 'a', 's', 'c', 'r', 'i', 'p', 't']:
            case ['a', 'p', 'p', 'l', 'i', 'c', 'a', 't', 'i', 'o', 'n', '/', 'o', 'c', 't', 'e', 't', '-', 's', 't', 'r', 'e', 'a', 'm']:
            case ['t', 'e', 'x', 't', '/', 'h', 't', 'm', 'l', ';', ' ', 'c', 'h', 'a', 'r', 's', 'e', 't', '=', 'u', 't', 'f', '-', '8']:
            case ['t', 'e', 'x', 't', '/', 'p', 'l', 'a', 'i', 'n', ';', ' ', 'c', 'h', 'a', 'r', 's', 'e', 't', '=', 'u', 't', 'f', '-', '8']:
            case ['a', 'p', 'p', 'l', 'i', 'c', 'a', 't', 'i', 'o', 'n', '/', 'j', 's', 'o', 'n', ';', ' ', 'c', 'h', 'a', 'r', 's', 'e', 't', '=', 'u', 't', 'f', '-', '8']:
            case ['a', 'p', 'p', 'l', 'i', 'c', 'a', 't', 'i', 'o', 'n', '/', 'x', '-', 'w', 'w', 'w', '-', 'f', 'o', 'r', 'm', '-', 'u', 'r', 'l', 'e', 'n', 'c', 'o', 'd', 'e', 'd']:
                return true;
        }
        return false;
    }

The result isn't ideal, with multiple opportunities for improvement (e.g. emitting code in a way that the JIT's optimizations around unrolling will kick in, selecting better elements for comparison, etc.), but it's interesting that it shares at least some features in common with the proposed approach.

alrz · 2022-09-16T16:31:52Z

I think that'll emit better code having #29033 in place.

emitting code in a way that the JIT's optimizations around unrolling will kick in

What does that refer to?

stephentoub · 2022-09-16T17:59:39Z

What does that refer to?

Just that the JIT today will generate better code for e.g. value.StartsWith("abcdefgh", StringComparison.Ordinal) than it would for value.Length >= 8 && value[0] == 'a' && value[1] == 'b' && value[2] == 'c' && value[3] == 'd' && value[4] == 'e' && value[5] == 'f' && value[6] == 'g' && value[7] == 'h'.

jcouv · 2022-12-21T21:55:53Z

I have a PR out for this.
From my experimentation so far, the optimization seems beneficial (ie. significant speed improvement to normal case, little to no degradation in worst case, comparable IL size) when there are more than 7 cases (otherwise linear sequence of string comparisons is still faster) and the buckets that result from Length+char checks have 5 or fewer cases. We'll likely refine this criteria after testing on more platforms/environments.
From the examples collected so far (thanks @stephentoub) many switches in the BCL will benefit. But the ones that do case-insensitive comparison will still need to be hand-optimized.

iSazonov · 2022-12-22T07:49:29Z

I don't think this has been discussed. We are now doing optimization based on known static information from the switch. The compiler knows nothing about the peculiarities of the incoming data itself. If we could tell the compiler more information about the input data in some way, it could make the optimization better.
For example, we could specify a hot path. (If we know GET is 80% of HTTP requests.)

EgorBo · 2022-12-22T09:07:21Z

I don't think this has been discussed. We are now doing optimization based on known static information from the switch. The compiler knows nothing about the peculiarities of the incoming data itself. If we could tell the compiler more information about the input data in some way, it could make the optimization better. For example, we could specify a hot path. (If we know GET is 80% of HTTP requests.)

That's what PGO in JIT is supposed to handle

alrz · 2022-12-22T11:03:38Z

Why not using another character dispatch if the final switch is going to fallback to hashcode?

Alternatively, since the length is known could read N bytes as an IntM and switch on that? though I suppose endianness is going to be an issue.

jcouv · 2022-12-23T09:52:08Z

Why not using another character dispatch if the final switch is going to fallback to hashcode?

Experiments with trie design yielded larger IL and worse performance. The performance could probably be improved (avoid checking length multiple times for example), but the IL size probably not.

could read N bytes as an IntM

Yes, this was mentioned in the bug, but I have not experimented with that.

Closes #56374

EgorBo added Concept-API This issue involves adding, removing, clarification, or modification of an API. Feature Request labels Sep 14, 2021

dotnet-issue-labeler bot added the untriaged Issues and PRs which have not yet been triaged by a lead label Sep 14, 2021

jaredpar added Area-Compilers and removed untriaged Issues and PRs which have not yet been triaged by a lead labels Sep 14, 2021

jaredpar added this to the Compiler.Next milestone Sep 14, 2021

GGG-KILLER mentioned this issue Jan 22, 2022

Optimize the GetKeywordKind switch and add ReadOnlySpan<T> overload LorettaDevs/Loretta#51

Merged

EgorBo mentioned this issue Mar 13, 2022

Pattern matching ReadOnlySpan<char> is worse than String dotnet/runtime#66529

Closed

stephentoub mentioned this issue Sep 21, 2022

Use perfect-hash in HttpUriScheme.GetKnownMethod(string) instead of trie dotnet/aspnetcore#44096

Merged

stephentoub mentioned this issue Nov 1, 2022

[API Proposal]: Provide optimized read-only collections dotnet/runtime#67209

Closed

jaredpar assigned jcouv Nov 4, 2022

jcouv modified the milestones: Compiler.Next, 17.5 Nov 4, 2022

jcouv added this to Language in Compiler: Julien's umbrellas Nov 4, 2022

jcouv moved this from Language/design to Active/Investigating in Compiler: Julien's umbrellas Nov 23, 2022

This was referenced Dec 2, 2022

Suggest a possible optimization for static Color GetNamedColor(ReadOnlySpan<char> value). dotnet/Microsoft.Maui.Graphics#495

Open

Suggest a possible optimization for static Color GetNamedColor(ReadOnlySpan<char> value). dotnet/maui#11846

Open

jcouv mentioned this issue Dec 20, 2022

Length-based switch dispatch #66081

Merged

jcouv added the 4 - In Review A fix for the issue is submitted for review. label Dec 21, 2022

jaredpar modified the milestones: 17.5, 17.6 Jan 5, 2023

jcouv closed this as completed in #66081 Jan 28, 2023

jcouv added a commit that referenced this issue Jan 28, 2023

Length-based switch dispatch (#66081)

8aefb7c

Closes #56374

tats-u mentioned this issue Mar 3, 2024

[VB] String comparison with string constants can be optimized to either of String.Equals (String.op_Equality, op_Inequality), String.IsNullOrEmpty, or String.CompareOrdinal #71835

Closed

Performance: faster switch over string objects #56374

Performance: faster switch over string objects #56374

Comments

EgorBo commented Sep 14, 2021 • edited

Background and Motivation

Proposed approach

Optional optimizations

Compare 2/4 chars at once

Extend Roslyn's limitations on Jump-tables

Risks

Related discussion

dotnet-issue-labeler bot commented Sep 14, 2021

george-polevoy commented Sep 15, 2021 • edited

GGG-KILLER commented Oct 7, 2021

GGG-KILLER commented Oct 9, 2021 • edited

grendello commented May 19, 2022

CyrusNajmabadi commented May 19, 2022

grendello commented May 19, 2022

CyrusNajmabadi commented May 19, 2022 • edited

grendello commented May 19, 2022 • edited

EgorBo commented May 19, 2022

EgorBo commented May 19, 2022

CyrusNajmabadi commented May 19, 2022

CyrusNajmabadi commented May 19, 2022

CyrusNajmabadi commented May 19, 2022

EgorBo commented May 19, 2022 • edited

grendello commented May 19, 2022

grendello commented May 19, 2022

CyrusNajmabadi commented May 19, 2022

canton7 commented May 19, 2022

CyrusNajmabadi commented May 19, 2022

canton7 commented May 19, 2022 • edited

CyrusNajmabadi commented May 19, 2022 • edited

stephentoub commented May 19, 2022

grendello commented May 19, 2022

canton7 commented May 19, 2022 • edited

CyrusNajmabadi commented May 19, 2022

grendello commented May 19, 2022

GGG-KILLER commented May 19, 2022 • edited

stephentoub commented Sep 16, 2022

alrz commented Sep 16, 2022

stephentoub commented Sep 16, 2022

jcouv commented Dec 21, 2022

iSazonov commented Dec 22, 2022

EgorBo commented Dec 22, 2022

alrz commented Dec 22, 2022

jcouv commented Dec 23, 2022 • edited

EgorBo commented Sep 14, 2021 •

edited

george-polevoy commented Sep 15, 2021 •

edited

GGG-KILLER commented Oct 9, 2021 •

edited

CyrusNajmabadi commented May 19, 2022 •

edited

grendello commented May 19, 2022 •

edited

EgorBo commented May 19, 2022 •

edited

canton7 commented May 19, 2022 •

edited

CyrusNajmabadi commented May 19, 2022 •

edited

canton7 commented May 19, 2022 •

edited

GGG-KILLER commented May 19, 2022 •

edited

jcouv commented Dec 23, 2022 •

edited