NonBacktracking locking fixes and cleanup #71234
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
Force-pushed from 58adf2e to d947ae4.
Removed the builder reference from SymbolicRegexNode instances; the builder now has to be passed in. Since the builder is not thread safe, this clarifies the locking required in the matcher when using it. Moved matching-specific state, including the state and transition arrays, from the builder to the matcher. Simplified the character kind code by eliminating duplicated logic.
Force-pushed from d947ae4 to 9fa528f.
The code should be in a good state now. I've checked the performance and there should be no significant regressions (if anything, I'm seeing minor improvements). The char kind logic changes required some performance work, which motivated the […]
DfaMatchingState is now just MatchingState
Looks great. Thanks!
Fixes an issue (#71808) where deeply nested structures caused the start set computation to overflow the stack. Reintroduces the bottom-up computation of start sets that #71234 had reworked. As an optimization, the Singleton node's set field is reused as the start set field, since for Singleton nodes the two concepts coincide and other node types do not need the set.
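To illustrate what "bottom-up" means here, the following is a minimal C# sketch, not the actual SymbolicRegexNode code. The `Node` type, its factory methods, and the use of a `ulong` bitmask to represent a character set are all assumptions made for the example. Each node receives its start set at construction time from its already-built children, so no recursive traversal of a deep tree is needed, and a Singleton stores its own set in the same field.

```csharp
using System;

// Hypothetical, simplified node type for illustration only.
public enum NodeKind { Singleton, Concat, Or, Loop }

public sealed class Node
{
    public NodeKind Kind { get; }

    // True if the node can match the empty string.
    public bool IsNullable { get; }

    // For Singleton this is the node's own character set (a bitmask here).
    // For every other kind it is the start set, computed once at construction.
    public ulong Set { get; }

    private Node(NodeKind kind, bool isNullable, ulong set)
    {
        Kind = kind;
        IsNullable = isNullable;
        Set = set;
    }

    public static Node Singleton(ulong set) =>
        new Node(NodeKind.Singleton, false, set);

    // A concatenation can start with anything the left side starts with,
    // plus anything the right side starts with if the left side may be empty.
    public static Node Concat(Node left, Node right) =>
        new Node(NodeKind.Concat,
                 left.IsNullable && right.IsNullable,
                 left.IsNullable ? left.Set | right.Set : left.Set);

    public static Node Or(Node left, Node right) =>
        new Node(NodeKind.Or,
                 left.IsNullable || right.IsNullable,
                 left.Set | right.Set);

    // A loop with lower bound 0 (like *): always nullable, starts like its body.
    public static Node Star(Node body) =>
        new Node(NodeKind.Loop, true, body.Set);
}
```

Because every start set is derived in constant time from the children, wrapping a node in a very large number of `Star` layers inside a loop involves no recursion at all.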
This PR addresses #70753.

`SymbolicRegexBuilder` has mutable state that has to be protected by locks when used concurrently from `SymbolicRegexMatcher`. The builder has some locking, but a bug had crept in. This PR introduces the following to make the code more clearly correct:

- […] `SymbolicRegexNode` instances and memoization caches for functions in `SymbolicRegexNode`.
- Removed the `_builder` variable from `SymbolicRegexNode` to avoid accidental concurrent use through concurrent calls to functions in the nodes. The builder now has to be passed into all functions that need it, which makes it obvious which functions are thread safe.
- […] `_builder` in the matcher as well as state caches and transition arrays (as necessary: transition arrays are still read without locking or `Volatile.Read` for performance, see "Document the memory model guaranteed by dotnet/runtime runtimes" (#63474) and "Unnecessary LdelemaRef call not elided with Volatile.Read" (#65789)).
- […] `ConcurrentArrayResize` function to replace `Array.Resize` usage, with the difference that the array is copied to a larger one first and then published with `Volatile.Write`. This addresses a concern that other threads might see an inconsistent copy when reading from the arrays being resized without holding a lock. I think the code was correct without this, as (1) all of these arrays were ones with atomic reads/writes and (2) any read that saw a zero value would acquire the lock and re-read the array afterwards. However, the code could easily break if any of the arrays change, e.g., to a struct element type, and generally it seems better not to cause other threads to spuriously have to acquire a lock.

Then the actual issue in #70753 is addressed by asserting that the lock is held in `GetOrCreateState` and calling a `GetOrCreateStateUnsafe` variant in the constructor, where locking is not required yet.

In addition to these changes, the following cleanup is done (mostly because I just happened to be looking at the code again):
- […] `\n` at the very end of input for the \Z anchor, and 3. giving the `BeginningEnd` kind to indices outside the `input` span. All of the logic is in the main `SymbolicRegexMatcher` code file. The logic for mapping minterms back to character kinds in `DfaMatchingState` is now gone.
- […] `List<T>` does. The reason we're not using `List<T>` itself is concerns with concurrent usage and the ability to resize cheaply.

One potentially performance-impactful change (among a few others) is that the "ASCII characters to character kinds" array is now gone, in favor of a "minterm IDs to character kinds" array. I'll be evaluating/debugging any performance regressions before merging.
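For readers following the locking discussion in this description, here is a rough C# sketch of the two patterns it mentions: copying into a larger array and publishing it with `Volatile.Write`, and asserting that the lock is held in a get-or-create method while the constructor calls an unlocked variant. The `TransitionTable` class, its fields, and the method names `GetTarget`, `GetOrCreateTarget`, `GetOrCreateTargetUnsafe`, and `ResizeAndPublish` are hypothetical and are not the code in this PR; only `Volatile.Write`, `Array.Copy`, and `Monitor.IsEntered` are real .NET APIs. The sketch assumes an `int` element type with 0 meaning "not computed yet".

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical sketch for illustration; not the matcher's actual code.
public sealed class TransitionTable
{
    private readonly object _lock = new object();
    private int[] _targets = new int[4]; // 0 means "not computed yet"
    private int _nextStateId = 1;

    public TransitionTable()
    {
        // No other thread can observe this instance yet, so the unlocked
        // variant is safe to call here.
        GetOrCreateTargetUnsafe(0);
    }

    // Lock-free fast path: a reader that sees 0 (or an index past the end of
    // its snapshot) takes the lock and re-reads.
    public int GetTarget(int index)
    {
        int[] snapshot = _targets;
        if (index < snapshot.Length)
        {
            int target = snapshot[index];
            if (target != 0)
                return target;
        }
        lock (_lock)
        {
            return GetOrCreateTarget(index);
        }
    }

    private int GetOrCreateTarget(int index)
    {
        // Make the locking requirement explicit and checked in debug builds.
        Debug.Assert(Monitor.IsEntered(_lock));
        return GetOrCreateTargetUnsafe(index);
    }

    private int GetOrCreateTargetUnsafe(int index)
    {
        if (index >= _targets.Length)
            ResizeAndPublish(ref _targets, Math.Max(index + 1, _targets.Length * 2));
        if (_targets[index] == 0)
            _targets[index] = _nextStateId++;
        return _targets[index];
    }

    // Fill the larger array completely before any other thread can see it,
    // then publish the reference with a volatile write.
    private static void ResizeAndPublish<T>(ref T[] array, int newSize)
    {
        Debug.Assert(newSize >= array.Length);
        T[] larger = new T[newSize];
        Array.Copy(array, larger, array.Length);
        Volatile.Write(ref array, larger);
    }
}
```

With this shape, a lock-free reader only ever sees either the old array or a fully copied new one, so it never needs to take the lock merely because a resize happened to be in progress.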