branch-4.1: [fix](search) Fix slash character in search query_string terms #61599 #61619

Open
github-actions[bot] wants to merge 1 commit into branch-4.1 from auto-pick-61599-branch-4.1

@github-actions

Cherry-picked from #61599

@github-actions github-actions bot requested a review from yiguolei as a code owner March 23, 2026 06:54
@hello-stephen

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring reopened this Mar 23, 2026
@hello-stephen

run buildall

@hello-stephen

FE UT Coverage Report

Increment line coverage `` 🎉
Increment coverage report
Complete coverage report

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

The ANTLR lexer in the search() DSL parser excluded `/` from
`TERM_CHAR`, causing terms like `AC/DC` to be incorrectly tokenized. The
slash was silently skipped by ANTLR's default error recovery, splitting
`AC/DC` into two separate terms `AC` and `DC` instead of treating it as
a single term.

This caused inconsistent behavior compared to Elasticsearch's
query_string parsing, where `AC\/DC` (escaped slash) is handled as a
single analyzed term.

**Fix**: Add `/` to the `TERM_CHAR` fragment in `SearchLexer.g4`. This
allows `/` to appear within terms (e.g., `AC/DC` -> single term) while
regex patterns like `/[a-z]+/` still work correctly since `/` remains
excluded from `TERM_START_CHAR`.
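
The effect of the change can be sketched with a small regex-based tokenizer. This is only an illustration, not the actual `SearchLexer.g4` grammar: the character classes, the `REGEX_LITERAL` pattern, and the `tokenize` helper are all assumptions made for this example, and the real fragments likely accept more characters. Skipping unmatched characters stands in for the silent error recovery described above.

```python
import re

# Hypothetical approximations of the lexer fragments (assumption: the real
# SearchLexer.g4 fragments are broader than these character classes).
TERM_START = r"[A-Za-z0-9_]"            # '/' can never start a term
TERM_CHAR_OLD = r"[A-Za-z0-9_\-]"       # before the fix: '/' excluded
TERM_CHAR_NEW = r"[A-Za-z0-9_\-/]"      # after the fix: '/' allowed inside a term
REGEX_LITERAL = r"/(?:[^/\\]|\\.)+/"    # hypothetical /.../ regex token


def tokenize(query, term_char):
    """Split a query into (token_type, text) pairs using the given TERM_CHAR class."""
    term = f"{TERM_START}{term_char}*"
    pattern = re.compile(rf"(?P<REGEX>{REGEX_LITERAL})|(?P<TERM>{term})")
    tokens = []
    pos = 0
    while pos < len(query):
        match = pattern.match(query, pos)
        if match is None:
            # No rule matches here: drop the character and move on,
            # mimicking the lexer silently skipping unrecognized input.
            pos += 1
            continue
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("AC/DC", TERM_CHAR_OLD))     # slash dropped, two terms
print(tokenize("AC/DC", TERM_CHAR_NEW))     # one term containing the slash
print(tokenize("/[a-z]+/", TERM_CHAR_NEW))  # still recognized as a regex token
```

Because a term cannot begin with `/` in this sketch, a leading slash can only open a regex token, which is why the two rules do not conflict after `/` is added to the in-term class.
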
@yiguolei yiguolei force-pushed the auto-pick-61599-branch-4.1 branch from 6085f93 to 12810ef Compare March 24, 2026 01:22
@yiguolei

run buildall
