Add a SearchValues ProbabilisticMap implementation that uses an ASCII fast path #89155

Merged: 3 commits, Jul 19, 2023

@MihaZupan (Member) commented Jul 19, 2023

Related: #89140

This is an implementation that we would use when the set of values contains a mix of ASCII and non-ASCII characters.
It's similar to the existing ProbabilisticMap implementation, but it first runs a fast path that searches only for the ASCII characters, using a faster implementation.
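
As a rough illustration of that two-stage idea, here is a minimal scalar model in Python. It is not the .NET implementation: the function name is hypothetical, and ending the fast path at the first non-ASCII character is an assumption about how such a design could work. The real code relies on vectorized searches rather than per-character loops.

```python
def index_of_any(text: str, values: str) -> int:
    """Model of a two-stage IndexOfAny: ASCII fast path, then a general fallback.

    Hypothetical helper for illustration only; not the SearchValues code.
    """
    ascii_values = frozenset(c for c in values if ord(c) < 128)
    all_values = frozenset(values)

    # Stage 1 (fast path): while the input stays ASCII, only the ASCII
    # subset of the values can possibly match.
    i = 0
    while i < len(text) and ord(text[i]) < 128:
        if text[i] in ascii_values:
            return i
        i += 1

    # Stage 2 (fallback): the first non-ASCII character ends the fast path
    # (an assumption of this sketch); continue with the full value set.
    while i < len(text):
        if text[i] in all_values:
            return i
        i += 1
    return -1
```

In this model, input that starts with a non-ASCII character leaves stage 1 immediately, so the fast path is pure overhead on every call, while all-ASCII input never reaches stage 2.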

For IndexOfAny, the ReplaceLineEndings benchmark shows large improvements for ASCII inputs. As NewLineFrequency increases, it also highlights the worst-case per-call overhead that the new fast path introduces for non-ASCII text.

For the cases where the probabilistic map is not vectorized (IndexOfAnyExcept, LastIndexOfAny, LastIndexOfAnyExcept), this brings massive improvements, with relatively minor regressions in the worst case.
The -Except variants benefit even more, as they otherwise fall back to O(n * m) implementations.
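
To make the O(n * m) point concrete, the sketch below contrasts a naive IndexOfAnyExcept with one that answers ASCII membership through a 128-entry table. Both functions are hypothetical Python models: the table stands in for the ASCII searcher, which is an assumption of this sketch rather than a description of the actual code.

```python
def index_of_any_except_naive(text: str, values: str) -> int:
    """O(n * m): for each of the n characters, linearly scan the m values."""
    for i, c in enumerate(text):
        found = False
        for v in values:  # inner loop over all m values
            if c == v:
                found = True
                break
        if not found:
            return i
    return -1


def index_of_any_except_table(text: str, values: str) -> int:
    """O(n) on ASCII input: one table lookup per character."""
    table = [False] * 128
    for v in values:
        if ord(v) < 128:
            table[ord(v)] = True
    non_ascii_values = frozenset(v for v in values if ord(v) >= 128)

    for i, c in enumerate(text):
        code = ord(c)
        # ASCII characters use the table; others use the general set.
        in_set = table[code] if code < 128 else c in non_ascii_values
        if not in_set:
            return i
    return -1
```

Both return the same index for any input; only the cost per character differs, which is why the gap grows with the number of values.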

The throughput difference between the ASCII fast path and the vectorized probabilistic map is large, but it might shrink somewhat once we add AVX512 support to these types, as AVX512 can use a more efficient ProbMap approach (similar to ARM64).

ReplaceLineEndings benchmarks

This was done with a CR => CR replace, such that there is no allocation overhead, to highlight the SearchValues part of the cost.

| Method | Toolchain | Length | NewLineFrequency | AsciiText | Mean | Error | Ratio |
|---|---|---|---|---|---|---|---|
| ReplaceLineEndings | main | 10000 | 0 | False | 1,216.2 ns | 4.36 ns | 1.00 |
| ReplaceLineEndings | pr | 10000 | 0 | False | 1,205.7 ns | 3.34 ns | 0.99 |
| ReplaceLineEndings | main | 10000 | 0 | True | 1,995.5 ns | 7.23 ns | 1.00 |
| ReplaceLineEndings | pr | 10000 | 0 | True | 338.4 ns | 0.74 ns | 0.17 |
| ReplaceLineEndings | main | 10000 | 0.02 | False | 3,362.1 ns | 19.01 ns | 1.00 |
| ReplaceLineEndings | pr | 10000 | 0.02 | False | 5,148.4 ns | 27.47 ns | 1.53 |
| ReplaceLineEndings | main | 10000 | 0.02 | True | 4,655.6 ns | 25.22 ns | 1.00 |
| ReplaceLineEndings | pr | 10000 | 0.02 | True | 1,939.8 ns | 16.54 ns | 0.42 |
| ReplaceLineEndings | main | 10000 | 0.05 | False | 7,623.3 ns | 29.90 ns | 1.00 |
| ReplaceLineEndings | pr | 10000 | 0.05 | False | 11,956.5 ns | 63.51 ns | 1.57 |
| ReplaceLineEndings | main | 10000 | 0.05 | True | 8,863.8 ns | 81.85 ns | 1.00 |
| ReplaceLineEndings | pr | 10000 | 0.05 | True | 5,025.0 ns | 24.73 ns | 0.57 |
| ReplaceLineEndings | main | 10000 | 0.1 | False | 13,317.3 ns | 86.18 ns | 1.00 |
| ReplaceLineEndings | pr | 10000 | 0.1 | False | 21,882.1 ns | 288.06 ns | 1.64 |
| ReplaceLineEndings | main | 10000 | 0.1 | True | 16,764.9 ns | 254.59 ns | 1.00 |
| ReplaceLineEndings | pr | 10000 | 0.1 | True | 9,018.0 ns | 42.44 ns | 0.54 |

IndexOfAnyExcept and ASCII early-match overhead benchmarks

This is for a set of only 6 values. The more values you use, the larger the factor would be for IndexOfAnyExcept.

| Method | Toolchain | Length | AsciiText | Mean | Error | Ratio | Code Size |
|---|---|---|---|---|---|---|---|
| IndexOfAnyExcept | main | 10000 | True | 17,968.8 ns | 129.08 ns | 1.00 | 430 B |
| IndexOfAnyExcept | pr | 10000 | True | 294.5 ns | 1.88 ns | 0.02 | 659 B |
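
For readers who want to relate the Ratio column to the Mean column, the short Python check below recomputes a few ratios from the means reported in this description. It assumes Ratio is the pr mean divided by the main (baseline) mean, which matches the reported values; the helper names are made up for this example.

```python
# Mean times in nanoseconds, copied from the benchmark tables: (main, pr).
MEANS_NS = {
    "IndexOfAnyExcept, ASCII text": (17_968.8, 294.5),
    "ReplaceLineEndings, frequency 0, ASCII text": (1_995.5, 338.4),
    "ReplaceLineEndings, frequency 0.1, non-ASCII text": (13_317.3, 21_882.1),
}


def ratio(main_ns: float, pr_ns: float) -> float:
    """Ratio as shown in the tables (assumed to be pr mean / main mean)."""
    return round(pr_ns / main_ns, 2)


def speedup(main_ns: float, pr_ns: float) -> float:
    """How many times faster pr is than main (values below 1 mean slower)."""
    return main_ns / pr_ns


for name, (main_ns, pr_ns) in MEANS_NS.items():
    print(f"{name}: ratio {ratio(main_ns, pr_ns):.2f}, "
          f"speedup {speedup(main_ns, pr_ns):.2f}x")
```

Read this way, the 0.02 ratio for IndexOfAnyExcept corresponds to roughly a 61x speedup, and the 1.64 ratio is the largest regression among the ReplaceLineEndings rows.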

To highlight the difference when finding early matches with ASCII text:

| Method | Toolchain | NewLineFrequency | Source | Replacement | AsciiText | Mean | Ratio |
|---|---|---|---|---|---|---|---|
| ReplaceLineEndings | main | 1 | CR | CR | True | 126.38 us | 1.00 |
| ReplaceLineEndings | pr | 1 | CR | CR | True | 84.12 us | 0.67 |

With all that in mind, do we still want to do this?

@ghost commented Jul 19, 2023

Tagging subscribers to this area: @dotnet/area-system-buffers
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: MihaZupan
Assignees: MihaZupan
Labels: area-System.Buffers

Milestone: -

@gfoidl (Member) commented Jul 19, 2023

> do we still want to do this?

I'd like to see this change merged, as not all target machines have AVX-512 (yet).

@MihaZupan MihaZupan closed this Jul 19, 2023
@MihaZupan MihaZupan reopened this Jul 19, 2023
@stephentoub (Member)

What is the AsciiText=False case in the benchmark? Is that with the input being entirely non-ASCII so that every call for the ASCII search immediately fails and is pure overhead?

@MihaZupan (Member, Author)

Yes, AsciiText=False is pure random data, so ~99% non-ASCII.
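
A quick back-of-the-envelope check of that figure, as a Python sketch. It assumes "pure random data" means every 16-bit char value is equally likely, which the thread does not state; a different generator would give slightly different numbers.

```python
ASCII_CODE_UNITS = 128    # U+0000..U+007F
ALL_CODE_UNITS = 1 << 16  # every 16-bit char value (assumed uniformly random)

ascii_fraction = ASCII_CODE_UNITS / ALL_CODE_UNITS
non_ascii_fraction = 1 - ascii_fraction


def expected_ascii_prefix_length(p_ascii: float) -> float:
    """Expected count of leading ASCII chars before the first non-ASCII one.

    Mean of a geometric distribution; assumes independent characters and
    ignores the finite input length.
    """
    return p_ascii / (1 - p_ascii)


print(f"non-ASCII fraction: {non_ascii_fraction:.4%}")
print(f"expected ASCII prefix: {expected_ascii_prefix_length(ascii_fraction):.4f} chars")
```

Under that assumption, nearly every search starts on a non-ASCII character, which fits the reading that the ASCII attempt fails immediately and counts as pure overhead in the AsciiText=False rows.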


3 participants