Fix lazy loop IndexOf optimization #68400
This optimization wasn't kicking in for the most desirable cases due to a flaw in the logic. It is meant to use IndexOf{Any} for a pattern like `<[^>]*?>`, where a lazy loop consumes input until it sees some character. In this example we'd want to `IndexOf('>')`, but the optimization wasn't kicking in because it saw that the negated character matched the subsequent literal and gave up. In that case we don't actually want to give up; we just can't short-circuit the whole match when we find the loop character. (If the two don't overlap, finding the loop character means the loop ends before reaching anything that could match the rest of the pattern, so the match can fail immediately.)
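To make the two cases concrete, here is a minimal Python sketch of the idea. It is not the code from this PR or from the .NET regex engine. The names `index_of_any` and `match_lazy_until` are hypothetical, and the sketch assumes the rest of the pattern after the lazy loop is a single literal character.

```python
def index_of_any(text: str, chars: str, start: int) -> int:
    """Hypothetical stand-in for IndexOfAny: first index >= start holding any of chars, else -1."""
    hits = [i for i in (text.find(c, start) for c in set(chars)) if i != -1]
    return min(hits) if hits else -1


def match_lazy_until(text: str, start: int, loop_char: str, next_literal: str) -> int:
    """Match `[^loop_char]*?next_literal` at `start`.

    Returns the index just past the match, or -1 if there is no match.
    Assumption: the remainder of the pattern is exactly one literal character.
    """
    # Jump straight to the first interesting character instead of
    # expanding the lazy loop one character at a time.
    idx = index_of_any(text, loop_char + next_literal, start)
    if idx == -1:
        return -1  # neither character occurs, so the literal can never match
    if text[idx] == next_literal:
        # Covers the overlapping case (loop_char == next_literal, as in `<[^>]*?>`):
        # finding the character is a success, not a reason to give up.
        return idx + 1
    # Non-overlapping case: the loop cannot consume loop_char and the
    # literal does not match here, so the match fails immediately.
    return -1


# `<[^>]*?>` after the leading '<' has been consumed at index 0
print(match_lazy_until("<a href>", 1, ">", ">"))    # -> 8
# `<[^\n]*?>`: the newline is found before '>', so the match fails at once
print(match_lazy_until("<ab\ncd>", 1, "\n", ">"))   # -> -1
```

With a longer remainder than a single literal, a hit on `next_literal` would only be a candidate position: the rest of the pattern would still need to be tried there, with the search resuming if it fails.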
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
I know we still have #63300 open and I plan to work on it soon, but would it be worth adding some tests to this PR to ensure that some example patterns use IndexOf as expected? If you don't want to do that now, let's just add a note on that issue so that I remember to add a test case for this PR along with the rest of the tests.
Will do. Thanks.
Possible improvements from this change: