Skip to content

Performance: faster switch over string objects #56374

@EgorBo

Description

@EgorBo

Background and Motivation

Currently, when we use switch over string objects, Roslyn emits a hash-based chain of if-else comparisons, e.g.:

public bool IsKnownContentType(string contentTypeValue)
{
    switch (contentTypeValue)
    {
        case "text/xml":
        case "text/css":
        case "text/csv":
        case "image/gif":
        case "image/png":
        case "text/html":
        case "text/plain":
        case "image/jpeg":
        case "application/pdf":
        case "application/xml":
        case "application/zip":
        case "application/grpc":
        case "application/json":
        case "multipart/form-data":
        case "application/javascript":
        case "application/octet-stream":
        case "text/html; charset=utf-8":
        case "text/plain; charset=utf-8":
        case "application/json; charset=utf-8":
        case "application/x-www-form-urlencoded":
            return true;
    }
    return false;
}

Here we'll run an O(N) hash algorithm (it's not SIMDified for large strings) and do a chain of comparisons with the precomputed ones (with binary search):

int hash = ComputeStringHash(contentTypeValue);
if (hash == 0x34343443) // 0x34343443 is a hash of "text/xml"
   return hash == "text/xml";
else if (hash == 0x54545452) // 0x54545452 is a hash of "text/css"
   return hash == "text/css";
// etc

For sure it's better than doing string.Equals case by case, but it could be way faster.

Proposed approach

Roslyn should emit a jump-table based on input's Length and then compare chars at some precomputed positions (sort of Trie), e.g. for our IsKnownContentType it could be:

public bool IsKnownContentType_opt(string contentTypeValue)
{
    string? candidate = null;
    switch (contentTypeValue.Length)
    {
        case 8:
            switch (contentTypeValue[7]) // at 7th position all chars are unique:
            {
                case 'l': candidate = "text/xml"; break;
                case 's': candidate = "text/css"; break;
                case 'v': candidate = "text/csv"; break;
            }
            break;
        case 9:
            switch (contentTypeValue[6]) // at 6th position all chars are unique:
            {
                case 'g': candidate = "image/gif"; break; 
                case 'p': candidate = "image/png"; break;
                case 't': candidate = "text/html"; break;
            }
            break;
        case 10:
            switch (contentTypeValue[0])
            {
                case 't': candidate = "text/plain"; break; 
                case 'i': candidate = "image/jpeg"; break;
            }
            break;
        case 15:
            switch (contentTypeValue[12])
            {
                case 'p': candidate = "application/pdf"; break;
                case 'x': candidate = "application/xml"; break;
                case 'z': candidate = "application/zip"; break;
            }
            break;
        case 16:
            switch (contentTypeValue[12])
            {
                case 'g': candidate = "application/grpc"; break; 
                case 'j': candidate = "application/json"; break;
            }
            break;
        case 19: candidate = "multipart/form-data"; break;
        case 22: candidate = "application/javascript"; break;
        case 24:
            switch (contentTypeValue[0])
            {
                case 'a': candidate = "application/octet-stream"; break;
                case 't': candidate = "text/html; charset=utf-8"; break;
            }
            break;
        case 25: candidate = "text/plain; charset=utf-8"; break;
        case 31: candidate = "application/json; charset=utf-8"; break;
        case 33: candidate = "application/x-www-form-urlencoded"; break;
    }
    return candidate == contentTypeValue;
}

So, ideally we can find a matching candidate in just 1-2 jump instructions here.
I wrote a simple benchmark, here are the results:

|                 Method |     contentTypeValue |      Mean |     Error |    StdDev |
|----------------------- |--------------------- |----------:|----------:|----------:|
|     IsKnownContentType |     application/grpc |  8.607 ns | 0.0384 ns | 0.0359 ns |
| IsKnownContentType_opt |     application/grpc |  1.475 ns | 0.0029 ns | 0.0026 ns |

|     IsKnownContentType | appli(...)cript [22] | 12.829 ns | 0.0362 ns | 0.0302 ns |
| IsKnownContentType_opt | appli(...)cript [22] |  1.289 ns | 0.0016 ns | 0.0015 ns |

|     IsKnownContentType |     application/json |  8.152 ns | 0.0321 ns | 0.0285 ns |
| IsKnownContentType_opt |     application/json |  1.484 ns | 0.0035 ns | 0.0033 ns |

|     IsKnownContentType | appli(...)utf-8 [31] | 18.141 ns | 0.0161 ns | 0.0126 ns |
| IsKnownContentType_opt | appli(...)utf-8 [31] |  1.269 ns | 0.0020 ns | 0.0018 ns |

|     IsKnownContentType | appli(...)tream [24] | 14.009 ns | 0.0933 ns | 0.0779 ns |
| IsKnownContentType_opt | appli(...)tream [24] |  1.715 ns | 0.0031 ns | 0.0026 ns |

|     IsKnownContentType |      application/pdf |  7.774 ns | 0.0044 ns | 0.0035 ns |
| IsKnownContentType_opt |      application/pdf |  1.701 ns | 0.0174 ns | 0.0155 ns |

|     IsKnownContentType | appli(...)coded [33] | 19.900 ns | 0.0227 ns | 0.0212 ns |
| IsKnownContentType_opt | appli(...)coded [33] |  1.092 ns | 0.0028 ns | 0.0023 ns |

|     IsKnownContentType |      application/xml |  7.756 ns | 0.0235 ns | 0.0208 ns |
| IsKnownContentType_opt |      application/xml |  1.536 ns | 0.0028 ns | 0.0024 ns |

|     IsKnownContentType |      application/zip |  7.593 ns | 0.0107 ns | 0.0100 ns |
| IsKnownContentType_opt |      application/zip |  1.317 ns | 0.0077 ns | 0.0072 ns |

|     IsKnownContentType |            image/gif |  5.118 ns | 0.0048 ns | 0.0040 ns |
| IsKnownContentType_opt |            image/gif |  1.694 ns | 0.0036 ns | 0.0032 ns |

|     IsKnownContentType |           image/jpeg |  5.068 ns | 0.0086 ns | 0.0072 ns |
| IsKnownContentType_opt |           image/jpeg |  1.695 ns | 0.0047 ns | 0.0041 ns |

|     IsKnownContentType |            image/png |  4.478 ns | 0.0531 ns | 0.0497 ns |
| IsKnownContentType_opt |            image/png |  1.944 ns | 0.0087 ns | 0.0068 ns |

|     IsKnownContentType |  multipart/form-data | 10.519 ns | 0.0211 ns | 0.0187 ns |
| IsKnownContentType_opt |  multipart/form-data |  1.305 ns | 0.0114 ns | 0.0107 ns |

|     IsKnownContentType |             text/css |  4.295 ns | 0.0062 ns | 0.0052 ns |
| IsKnownContentType_opt |             text/css |  1.537 ns | 0.0039 ns | 0.0030 ns |

|     IsKnownContentType |             text/csv |  4.409 ns | 0.0434 ns | 0.0362 ns |
| IsKnownContentType_opt |             text/csv |  1.484 ns | 0.0035 ns | 0.0031 ns |

|     IsKnownContentType |            text/html |  4.402 ns | 0.0041 ns | 0.0038 ns |
| IsKnownContentType_opt |            text/html |  1.526 ns | 0.0030 ns | 0.0026 ns |

|     IsKnownContentType | text/(...)utf-8 [24] | 13.811 ns | 0.0159 ns | 0.0132 ns |
| IsKnownContentType_opt | text/(...)utf-8 [24] |  1.537 ns | 0.0036 ns | 0.0032 ns |

|     IsKnownContentType |           text/plain |  5.161 ns | 0.0207 ns | 0.0184 ns |
| IsKnownContentType_opt |           text/plain |  1.320 ns | 0.0068 ns | 0.0057 ns |

|     IsKnownContentType | text/(...)utf-8 [25] | 14.368 ns | 0.0210 ns | 0.0186 ns |
| IsKnownContentType_opt | text/(...)utf-8 [25] |  1.289 ns | 0.0050 ns | 0.0042 ns |

|     IsKnownContentType |             text/xml |  4.015 ns | 0.0069 ns | 0.0057 ns |
| IsKnownContentType_opt |             text/xml |  1.473 ns | 0.0022 ns | 0.0018 ns |

Optional optimizations

Compare 2/4 chars at once

In IsKnownContentType_opt I only compare single chars, but can check for free up to 4 chars at once (on 64bit) or 2 chars for AnyCPU. unaligned reads

Extend Roslyn's limitations on Jump-tables

The benchmark results ^ could be even more impressive if that switch over input's Length was emitted as single/solid jump-table by Roslyn and covered all the "missing" cases. It can be explained via this sharplab link, but it's a trade-off between IL size and perf.

Risks

The new implementation should always be faster but might affect IL size in some cases. However, in the case of my sample IsKnownContentType_opt is actually 25% smaller than the default one.

Related discussion

There were attempts to do it in the past, e.g. dotnet/csharplang#1881 (comment) and #43869
but I think if we won't do a full Trie and only try to do Length + single char (or 4 at once) and fallback to the default implementation if there are cases where we can't do single-char checks - it will be simple and always faster.

/cc @333fred @CyrusNajmabadi

Metadata

Metadata

Assignees

Labels

4 - In ReviewA fix for the issue is submitted for review.Area-CompilersConcept-APIThis issue involves adding, removing, clarification, or modification of an API.Feature Request

Type

No type

Projects

Status

Active/Investigating

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions