Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid bloom filter checks for IndexOfAnyExcept in ProbabilisticMap #85203

Merged
merged 1 commit into from Jun 5, 2023

Conversation

MihaZupan
Copy link
Member

The point of ProbabilisticMap is to avoid doing expensive values.Contains(char) checks for every input character by applying a bloom filter first. This works well for IndexOfAny, but with IndexOfAnyExcept we still have to check each character since we don't have an "inverse bloom filter".

For an input value that isn't in the set, the bloom filter will save us from doing the verification step, but that will by definition happen only once per IndexOfAnyExcept call. The overhead of checking it for all the characters that are in the set is wasted.

This PR avoids doing those bloom filter checks for IndexOfAnyExcept paths.

Method Toolchain Length Mean Error Ratio
IndexOfAnyExcept8 main 10000 31.24 us 0.138 us 1.00
IndexOfAnyExcept8 pr 10000 24.04 us 0.076 us 0.77

In practice IndexOfAnyExcept in IndexOfAnyCharValuesProbabilistic now isn't doing anything interesting with the ProbabilisticMap, but that's just a convenient place to share some of the logic that ProbabilisticMap needs anyway.

@MihaZupan MihaZupan added this to the 8.0.0 milestone Apr 22, 2023
@MihaZupan MihaZupan self-assigned this Apr 22, 2023
@ghost
Copy link

ghost commented Apr 22, 2023

Tagging subscribers to this area: @dotnet/area-system-buffers
See info in area-owners.md if you want to be subscribed.

Issue Details

The point of ProbabilisticMap is to avoid doing expensive values.Contains(char) checks for every input character by applying a bloom filter first. This works well for IndexOfAny, but with IndexOfAnyExcept we still have to check each character since we don't have an "inverse bloom filter".

For an input value that isn't in the set, the bloom filter will save us from doing the verification step, but that will by definition happen only once per IndexOfAnyExcept call. The overhead of checking it for all the characters that are in the set is wasted.

This PR avoids doing those bloom filter checks for IndexOfAnyExcept paths.

Method Toolchain Length Mean Error Ratio
IndexOfAnyExcept8 main 10000 31.24 us 0.138 us 1.00
IndexOfAnyExcept8 pr 10000 24.04 us 0.076 us 0.77

In practice IndexOfAnyExcept in IndexOfAnyCharValuesProbabilistic now isn't doing anything interesting with the ProbabilisticMap, but that's just a convenient place to share some of the logic that ProbabilisticMap needs anyway.

Author: MihaZupan
Assignees: MihaZupan
Labels:

area-System.Buffers

Milestone: 8.0.0

@MihaZupan MihaZupan force-pushed the indexofanyvalues-prob-except branch from 3d1f79d to d590107 Compare May 3, 2023 23:47
@stephentoub stephentoub merged commit be16af7 into dotnet:main Jun 5, 2023
169 checks passed
@dotnet dotnet locked as resolved and limited conversation to collaborators Jul 5, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants