Break out of quantification loop if there is no forward progress #560

rctcwyvrn · 2022-07-07T20:34:39Z

If the inner node of a quantification doesn't advance our position in the input, we simply loop forever (#558).

The cases where this can happen range from nefarious (purposefully adding groups around instructions that cannot normally be quantified) to cases that are fairly easy to accidentally hit by making a typo (like the original case in the issue where there was an empty alternation case).

The fix is to add a check to quantification to ensure that our position has progressed. However, this comes at a performance cost, so this PR also adds an optimization that skips these checks when we can ensure that the inner node will make forward progress.
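To make the idea concrete, here is a minimal sketch of a forward-progress check in a zero-or-more loop. It is a toy model, not the engine's bytecode: `ToyNode`, `matchZeroOrMore`, and `aOrEmpty` are hypothetical names invented for illustration.

```swift
// Toy model (an assumption, not the real engine): a node tries to match at a
// position and returns the new position, or nil on failure.
typealias ToyNode = (_ input: [Character], _ position: Int) -> Int?

/// Runs `node` zero or more times (like `*`), stopping as soon as an
/// iteration succeeds without consuming any input.
func matchZeroOrMore(_ node: ToyNode, in input: [Character], from start: Int) -> Int {
  var position = start
  while let next = node(input, position) {
    // Forward-progress check: without it, an empty match repeats forever.
    if next == position { break }
    position = next
  }
  return position
}

// Models `(?:a|)`: consume an "a" if present, otherwise match the empty string.
let aOrEmpty: ToyNode = { input, position in
  if position < input.count && input[position] == "a" { return position + 1 }
  return position
}

let end = matchZeroOrMore(aOrEmpty, in: Array("aab"), from: 0)
```

Without the `next == position` comparison, the loop over `aOrEmpty` would never terminate once it reaches the `b`, which mirrors the empty-alternation case from the issue.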

Resolves #558 and resolves #542
rdar://96461197

rctcwyvrn · 2022-07-07T20:42:05Z

Additional edge cases to consider

(?:\d|(?i)){3}a: a non-progressing adjustment to matching options. The adjusted matching options stay within that scope, so it shouldn't affect the outside, and we don't need to match it a specific number of times.
A non-progressing custom consumer/matcher that contains state and expects to be called a certain number of times. Is this part of the API that we provide? Should we only break out of the quantification if it's both unlimited and not progressing? Actually, I don't see a reason why we shouldn't do that, so I'm gonna add it.
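The rule proposed here (only break out when the quantification is both unlimited and possibly non-progressing) can be sketched as a small predicate. This is an illustrative assumption, not the compiler's actual code; `needsForwardProgressCheck` and its parameters are hypothetical names.

```swift
/// Hypothetical helper: decides whether a quantification needs the
/// position check. `maxTrips == nil` models an unbounded quantifier
/// such as `*` or `+`.
func needsForwardProgressCheck(maxTrips: Int?, innerGuaranteesProgress: Bool) -> Bool {
  // A bounded quantifier always terminates, so a stateful matcher that
  // expects to be called exactly N times still gets its N calls.
  maxTrips == nil && !innerGuaranteesProgress
}

// `(?:\d|(?i)){3}`: bounded, so no check even though the body may not progress.
let boundedCase = needsForwardProgressCheck(maxTrips: 3, innerGuaranteesProgress: false)
// `(?:a|)*`: unbounded with a possibly empty body, so the check is needed.
let unboundedCase = needsForwardProgressCheck(maxTrips: nil, innerGuaranteesProgress: false)
```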

hamishknight · 2022-07-07T22:22:50Z

Nice! Does this also resolve #542?

rctcwyvrn · 2022-07-07T22:25:51Z

Hmmmm, it doesn't right now, because we assume a quantification will result in forward progress if its inner node has forward progress. I guess that isn't true if the quantification has a min-trips of 0, so I'll add that case.
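The min-trips point can be illustrated with a forward-progress query over a toy AST. This is a sketch under stated assumptions: `ToyRegex` and its cases are hypothetical and much simpler than the real regex tree, and the property here only borrows the name of the `guaranteesForwardProgress` query discussed in this PR.

```swift
// Hypothetical, simplified regex tree for illustration only.
indirect enum ToyRegex {
  case character(Character)
  case empty
  case concatenation([ToyRegex])
  case alternation([ToyRegex])
  case quantification(minTrips: Int, ToyRegex)

  /// True only if every successful match of this node consumes input.
  var guaranteesForwardProgress: Bool {
    switch self {
    case .character:
      return true
    case .empty:
      return false
    case .concatenation(let children):
      // One progressing child is enough for the whole sequence.
      return children.contains { $0.guaranteesForwardProgress }
    case .alternation(let children):
      // Every branch must progress; an empty branch breaks the guarantee.
      return !children.isEmpty && children.allSatisfy { $0.guaranteesForwardProgress }
    case .quantification(let minTrips, let child):
      // With min-trips of 0 the body may be skipped entirely.
      return minTrips > 0 && child.guaranteesForwardProgress
    }
  }
}

let star = ToyRegex.quantification(minTrips: 0, .character("a"))   // a*
let plus = ToyRegex.quantification(minTrips: 1, .character("a"))   // a+
let aOrNothing = ToyRegex.alternation([.character("a"), .empty])   // (?:a|)
```

Under this model, `star.guaranteesForwardProgress` is false even though its inner node always progresses, which is the case the comment above describes.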

Sources/_StringProcessing/ByteCodeGen.swift

milseman · 2022-07-11T14:19:17Z

Sources/_StringProcessing/Engine/Registers.swift

@@ -120,11 +130,15 @@ extension Processor.Registers {

    self.values = Array(
      repeating: SentinelValue(), count: info.values)
+    self.positions = Array(
+      repeating: Processor.Registers.sentinelIndex,
+      count: info.positions)


Future: Why the start index instead of the provided sentinel? Should or could we construct an invalid index and at least assert that we're never loading that?

SentinelValue only works on the value registers since they are [Any]. For now we only emit position registers in this one case; if/when we have more complicated cases in the future, we'll want some kind of validation like that.

milseman · 2022-07-11T14:21:49Z

Tests/RegexTests/CompileTests.swift

@@ -208,4 +208,40 @@ extension RegexTests {
    expectProgram(for: "[abc]", semanticLevel: .unicodeScalar, doesNotContain: [.matchBitset])
    expectProgram(for: "[abc]", semanticLevel: .unicodeScalar, contains: [.consumeBy])
  }
+
+  func testQuantificationForwardProgressCompile() {


I'm ok with these for now, but do note that these kinds of tests can become extremely fragile and are likely to get dropped if control flow is ever overhauled. Make sure anything important is represented by both functional tests and more precise unit testing.

E.g. you can test the guaranteesForwardProgress query on regexes, and that wouldn't be fragile.

rctcwyvrn · 2022-07-11T20:08:43Z

@swift-ci test

…le#560) This fixes infinite loops when we loop over an internal node that does not have any forward progress. Also included is an optimization to only emit the check/break instructions if we have a case that might result in an infinite loop (possibly non-progressing inner node + unlimited quantification)

Break out of quantification loop if there is no forward progress

8584cf4

rctcwyvrn requested a review from milseman July 7, 2022 20:34

Only emit position checking if we have an unbounded quantification

799ee98

Quantification does not guarantee forward progress

df8f2a6

milseman approved these changes Jul 11, 2022

Cleanup emitting position checking

84e30b5

rctcwyvrn merged commit 33acdeb into apple:main Jul 11, 2022

rctcwyvrn mentioned this pull request Jul 12, 2022

[5.7] Fix infinite loop and scalar matching in grapheme mode + scalar matching optimizations #569

Merged
