[fix](function) prevent count_substrings tail overmatch #63215

Open
officialasishkumar wants to merge 1 commit into apache:master from officialasishkumar:fix/count-substrings-boundary

@officialasishkumar
Contributor

What problem does this PR solve?

Issue Number: close #62768

Related PR: None

Problem Summary:

count_substrings scanned candidate positions even when the remaining suffix was shorter than the search pattern. Because the implementation uses memcmp_small_allow_overflow15, that could count a tail position where the full pattern does not fit, for example count_substrings("ccc", "cc").

This PR limits comparisons to positions where the full pattern fits and keeps the existing not-found distance contract used by the caller.
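The boundary rule described above can be sketched as follows. This is a minimal illustration, not the Doris implementation: `count_non_overlapping` is a hypothetical helper, it uses plain `std::memcmp` rather than `memcmp_small_allow_overflow15`, and returning 0 for an empty pattern is an assumption made only for this sketch.

```cpp
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical helper for illustration only; not the code changed in this PR.
// Counts non-overlapping occurrences of `pattern` in `text`, comparing only at
// start positions where the full pattern still fits in the remaining suffix.
std::size_t count_non_overlapping(std::string_view text, std::string_view pattern) {
    // Assumption for this sketch: an empty or too-long pattern yields 0.
    if (pattern.empty() || pattern.size() > text.size()) {
        return 0;
    }

    // Last start position at which the whole pattern fits inside `text`.
    const std::size_t last_start = text.size() - pattern.size();

    std::size_t count = 0;
    std::size_t pos = 0;
    while (pos <= last_start) {
        if (std::memcmp(text.data() + pos, pattern.data(), pattern.size()) == 0) {
            ++count;
            pos += pattern.size();  // skip past the match: no overlapping counts
        } else {
            ++pos;
        }
    }
    return count;
}
```

With this bound, `count_non_overlapping("ccc", "cc")` compares at position 0 only. After that match the next candidate is position 2, where a single `c` remains, so the loop stops and the tail is never compared.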

Release note

Fix count_substrings tail-boundary matching for non-overlapping substring counts.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes. count_substrings no longer counts a match when the full pattern does not fit in the remaining suffix.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Development

Successfully merging this pull request may close these issues.

[Bug](function) count_substrings returns wrong result for overlapping pattern boundary

2 participants