-
Notifications
You must be signed in to change notification settings - Fork 4.2k
Description
Background and Motivation
Currently, when we use switch over string objects, Roslyn emits a hash-based chain of if-else comparisons, e.g.:
public bool IsKnownContentType(string contentTypeValue)
{
switch (contentTypeValue)
{
case "text/xml":
case "text/css":
case "text/csv":
case "image/gif":
case "image/png":
case "text/html":
case "text/plain":
case "image/jpeg":
case "application/pdf":
case "application/xml":
case "application/zip":
case "application/grpc":
case "application/json":
case "multipart/form-data":
case "application/javascript":
case "application/octet-stream":
case "text/html; charset=utf-8":
case "text/plain; charset=utf-8":
case "application/json; charset=utf-8":
case "application/x-www-form-urlencoded":
return true;
}
return false;
}Here we'll run an O(N) hash algorithm (it's not SIMDified for large strings) and do a chain of comparisons with the precomputed ones (with binary search):
int hash = ComputeStringHash(contentTypeValue);
if (hash == 0x34343443) // 0x34343443 is a hash of "text/xml"
return hash == "text/xml";
else if (hash == 0x54545452) // 0x54545452 is a hash of "text/css"
return hash == "text/css";
// etcFor sure it's better than doing string.Equals case by case, but it could be way faster.
Proposed approach
Roslyn should emit a jump-table based on input's Length and then compare chars at some precomputed positions (sort of Trie), e.g. for our IsKnownContentType it could be:
public bool IsKnownContentType_opt(string contentTypeValue)
{
string? candidate = null;
switch (contentTypeValue.Length)
{
case 8:
switch (contentTypeValue[7]) // at 7th position all chars are unique:
{
case 'l': candidate = "text/xml"; break;
case 's': candidate = "text/css"; break;
case 'v': candidate = "text/csv"; break;
}
break;
case 9:
switch (contentTypeValue[6]) // at 6th position all chars are unique:
{
case 'g': candidate = "image/gif"; break;
case 'p': candidate = "image/png"; break;
case 't': candidate = "text/html"; break;
}
break;
case 10:
switch (contentTypeValue[0])
{
case 't': candidate = "text/plain"; break;
case 'i': candidate = "image/jpeg"; break;
}
break;
case 15:
switch (contentTypeValue[12])
{
case 'p': candidate = "application/pdf"; break;
case 'x': candidate = "application/xml"; break;
case 'z': candidate = "application/zip"; break;
}
break;
case 16:
switch (contentTypeValue[12])
{
case 'g': candidate = "application/grpc"; break;
case 'j': candidate = "application/json"; break;
}
break;
case 19: candidate = "multipart/form-data"; break;
case 22: candidate = "application/javascript"; break;
case 24:
switch (contentTypeValue[0])
{
case 'a': candidate = "application/octet-stream"; break;
case 't': candidate = "text/html; charset=utf-8"; break;
}
break;
case 25: candidate = "text/plain; charset=utf-8"; break;
case 31: candidate = "application/json; charset=utf-8"; break;
case 33: candidate = "application/x-www-form-urlencoded"; break;
}
return candidate == contentTypeValue;
}So, ideally we can find a matching candidate in just 1-2 jump instructions here.
I wrote a simple benchmark, here are the results:
| Method | contentTypeValue | Mean | Error | StdDev |
|----------------------- |--------------------- |----------:|----------:|----------:|
| IsKnownContentType | application/grpc | 8.607 ns | 0.0384 ns | 0.0359 ns |
| IsKnownContentType_opt | application/grpc | 1.475 ns | 0.0029 ns | 0.0026 ns |
| IsKnownContentType | appli(...)cript [22] | 12.829 ns | 0.0362 ns | 0.0302 ns |
| IsKnownContentType_opt | appli(...)cript [22] | 1.289 ns | 0.0016 ns | 0.0015 ns |
| IsKnownContentType | application/json | 8.152 ns | 0.0321 ns | 0.0285 ns |
| IsKnownContentType_opt | application/json | 1.484 ns | 0.0035 ns | 0.0033 ns |
| IsKnownContentType | appli(...)utf-8 [31] | 18.141 ns | 0.0161 ns | 0.0126 ns |
| IsKnownContentType_opt | appli(...)utf-8 [31] | 1.269 ns | 0.0020 ns | 0.0018 ns |
| IsKnownContentType | appli(...)tream [24] | 14.009 ns | 0.0933 ns | 0.0779 ns |
| IsKnownContentType_opt | appli(...)tream [24] | 1.715 ns | 0.0031 ns | 0.0026 ns |
| IsKnownContentType | application/pdf | 7.774 ns | 0.0044 ns | 0.0035 ns |
| IsKnownContentType_opt | application/pdf | 1.701 ns | 0.0174 ns | 0.0155 ns |
| IsKnownContentType | appli(...)coded [33] | 19.900 ns | 0.0227 ns | 0.0212 ns |
| IsKnownContentType_opt | appli(...)coded [33] | 1.092 ns | 0.0028 ns | 0.0023 ns |
| IsKnownContentType | application/xml | 7.756 ns | 0.0235 ns | 0.0208 ns |
| IsKnownContentType_opt | application/xml | 1.536 ns | 0.0028 ns | 0.0024 ns |
| IsKnownContentType | application/zip | 7.593 ns | 0.0107 ns | 0.0100 ns |
| IsKnownContentType_opt | application/zip | 1.317 ns | 0.0077 ns | 0.0072 ns |
| IsKnownContentType | image/gif | 5.118 ns | 0.0048 ns | 0.0040 ns |
| IsKnownContentType_opt | image/gif | 1.694 ns | 0.0036 ns | 0.0032 ns |
| IsKnownContentType | image/jpeg | 5.068 ns | 0.0086 ns | 0.0072 ns |
| IsKnownContentType_opt | image/jpeg | 1.695 ns | 0.0047 ns | 0.0041 ns |
| IsKnownContentType | image/png | 4.478 ns | 0.0531 ns | 0.0497 ns |
| IsKnownContentType_opt | image/png | 1.944 ns | 0.0087 ns | 0.0068 ns |
| IsKnownContentType | multipart/form-data | 10.519 ns | 0.0211 ns | 0.0187 ns |
| IsKnownContentType_opt | multipart/form-data | 1.305 ns | 0.0114 ns | 0.0107 ns |
| IsKnownContentType | text/css | 4.295 ns | 0.0062 ns | 0.0052 ns |
| IsKnownContentType_opt | text/css | 1.537 ns | 0.0039 ns | 0.0030 ns |
| IsKnownContentType | text/csv | 4.409 ns | 0.0434 ns | 0.0362 ns |
| IsKnownContentType_opt | text/csv | 1.484 ns | 0.0035 ns | 0.0031 ns |
| IsKnownContentType | text/html | 4.402 ns | 0.0041 ns | 0.0038 ns |
| IsKnownContentType_opt | text/html | 1.526 ns | 0.0030 ns | 0.0026 ns |
| IsKnownContentType | text/(...)utf-8 [24] | 13.811 ns | 0.0159 ns | 0.0132 ns |
| IsKnownContentType_opt | text/(...)utf-8 [24] | 1.537 ns | 0.0036 ns | 0.0032 ns |
| IsKnownContentType | text/plain | 5.161 ns | 0.0207 ns | 0.0184 ns |
| IsKnownContentType_opt | text/plain | 1.320 ns | 0.0068 ns | 0.0057 ns |
| IsKnownContentType | text/(...)utf-8 [25] | 14.368 ns | 0.0210 ns | 0.0186 ns |
| IsKnownContentType_opt | text/(...)utf-8 [25] | 1.289 ns | 0.0050 ns | 0.0042 ns |
| IsKnownContentType | text/xml | 4.015 ns | 0.0069 ns | 0.0057 ns |
| IsKnownContentType_opt | text/xml | 1.473 ns | 0.0022 ns | 0.0018 ns |
Optional optimizations
Compare 2/4 chars at once
In unaligned readsIsKnownContentType_opt I only compare single chars, but can check for free up to 4 chars at once (on 64bit) or 2 chars for AnyCPU.
Extend Roslyn's limitations on Jump-tables
The benchmark results ^ could be even more impressive if that switch over input's Length was emitted as single/solid jump-table by Roslyn and covered all the "missing" cases. It can be explained via this sharplab link, but it's a trade-off between IL size and perf.
Risks
The new implementation should always be faster but might affect IL size in some cases. However, in the case of my sample IsKnownContentType_opt is actually 25% smaller than the default one.
Related discussion
There were attempts to do it in the past, e.g. dotnet/csharplang#1881 (comment) and #43869
but I think if we won't do a full Trie and only try to do Length + single char (or 4 at once) and fallback to the default implementation if there are cases where we can't do single-char checks - it will be simple and always faster.
Metadata
Metadata
Assignees
Labels
Type
Projects
Status