
branch-4.1: [feature](score) support BM25 scoring in inverted index query_v2 #59847 #61472

Merged
yiguolei merged 4 commits into apache:branch-4.1 from airborne12:pick/branch-4.1/59847
Mar 19, 2026

Conversation

@airborne12
Member

Proposed changes

Cherry-pick #59847 to branch-4.1.

Original PR: #59847

Further comments

Resolved cherry-pick conflicts (minor differences from the branch-4.0 pick):

  • Added an IndexQueryContextPtr member and set/get_index_query_context accessors to IndexExecContext
  • Extended evaluate_inverted_index_with_search_param with an index_query_context parameter
  • Added enable_inverted_index_wand_query to thrift and SessionVariable (field 203)
  • Kept the existing branch-4.1 field single_backend_query (field 202) intact
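To make the first bullets concrete, here is a minimal C++ sketch of an execution context that carries a shared query context which scoring code can consult. Only IndexExecContext, IndexQueryContextPtr, set/get_index_query_context and enable_inverted_index_wand_query come from this PR; the enable_wand field, the ScoringPath enum and the choose_scoring_path helper are hypothetical, and the real Doris classes carry far more state.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for IndexQueryContext; the PR does not list its fields.
struct IndexQueryContext {
    bool enable_wand = false;  // assumed to be populated from enable_inverted_index_wand_query
};
using IndexQueryContextPtr = std::shared_ptr<IndexQueryContext>;

// Simplified IndexExecContext: only the member and accessors added by this pick are modeled.
class IndexExecContext {
public:
    void set_index_query_context(IndexQueryContextPtr ctx) { _index_query_context = std::move(ctx); }
    const IndexQueryContextPtr& get_index_query_context() const { return _index_query_context; }

private:
    IndexQueryContextPtr _index_query_context;  // null until a caller attaches one
};

enum class ScoringPath { kExhaustive, kBlockMaxWand };

// Hypothetical helper (not a Doris function): how code reached through
// evaluate_inverted_index_with_search_param might read the threaded-through context.
ScoringPath choose_scoring_path(const IndexExecContext& exec_ctx) {
    const IndexQueryContextPtr& ctx = exec_ctx.get_index_query_context();
    // Fall back to exhaustive scoring when no context is attached or WAND is disabled.
    if (ctx != nullptr && ctx->enable_wand) {
        return ScoringPath::kBlockMaxWand;
    }
    return ScoringPath::kExhaustive;
}
```

In this sketch the context is attached once and shared by pointer, so every consumer sees the same settings without copying them; a missing context simply selects the exhaustive path.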

airborne12 requested a review from yiguolei as a code owner on March 18, 2026 08:03
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally including the specific error message), and how it was fixed.
  2. Which behaviors were modified: what the previous behavior was, what it is now, why it was changed, and what impact it might have.
  3. What features were added, and why.
  4. Which code was refactored, and why.
  5. Which functions were optimized, and how they differ before and after the optimization.

@airborne12
Member Author

run buildall

airborne12 force-pushed the pick/branch-4.1/59847 branch from 4a5d6d0 to 094e120 on March 18, 2026 08:11
@airborne12
Member Author

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.15% (1788/2259)
Line Coverage 64.42% (31942/49583)
Region Coverage 65.22% (15978/24499)
Branch Coverage 55.80% (8502/15236)

zzzxl1993 and others added 2 commits March 19, 2026 14:51
…M25 scoring

1. Update contrib/clucene from 8b57674 to c51b5cc to include:
   - ac9475a: block max WAND algorithm with BM25 similarity
   - aef5c9c: Fix GCC -Werror compilation errors
   - c51b5cc: Fix GCC -Werror=overloaded-virtual in FieldForMerge
   The update is required for the readBlock, getMaxBlockFreq, getMaxBlockNorm,
   and getLastDocInBlock APIs used by segment_postings.h.

2. Fix doc_set_collector: when context.readers is empty (e.g., AllQuery
   for MATCH_ALL_DOCS), for_each_index_segment returns immediately
   without iterating. Added a fallback that creates the scorer directly
   from the weight, so AllScorer can work without an IndexReader.
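The block max WAND commit relies on one property of BM25: a term's score rises with term frequency and falls with document length, so a block's highest frequency paired with its shortest document bounds every score in that block. The sketch below assumes the classic BM25 form with the (k1 + 1) factor and the common defaults k1 = 1.2, b = 0.75; CLucene's exact normalization may differ, and BlockSummary is a hypothetical stand-in for whatever per-block metadata getMaxBlockFreq and getMaxBlockNorm expose (their exact semantics are assumed here).

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Common BM25 defaults; assumed here, not taken from the PR.
constexpr double kK1 = 1.2;
constexpr double kB = 0.75;

// Lucene-style IDF: ln(1 + (N - df + 0.5) / (df + 0.5)).
double bm25_idf(long long num_docs, long long doc_freq) {
    assert(doc_freq > 0 && doc_freq <= num_docs);
    return std::log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5));
}

// Per-term BM25 contribution for a single document.
double bm25_term_score(double idf, double tf, double doc_len, double avg_doc_len) {
    assert(avg_doc_len > 0.0);
    double norm = kK1 * (1.0 - kB + kB * doc_len / avg_doc_len);
    return idf * tf * (kK1 + 1.0) / (tf + norm);
}

// Hypothetical per-block summary: highest tf and shortest document in the block.
struct BlockSummary {
    double max_tf;
    double min_doc_len;
};

// The score increases with tf and decreases with doc length, so scoring the
// block's max tf against its shortest document bounds every document in it.
double block_upper_bound(double idf, const BlockSummary& block, double avg_doc_len) {
    return bm25_term_score(idf, block.max_tf, block.min_doc_len, avg_doc_len);
}

// Block-max WAND pruning test: if the summed per-term block bounds cannot beat
// the current top-k threshold, no document in these blocks can enter the top k.
bool can_skip_blocks(const std::vector<double>& term_block_bounds, double threshold) {
    double total = 0.0;
    for (double bound : term_block_bounds) {
        total += bound;
    }
    return total <= threshold;
}
```

When can_skip_blocks returns true, a scorer in this style could jump past the current blocks (for example to just after the document reported by getLastDocInBlock) without decoding their postings, which is where a speed-up over exhaustive scoring would come from. Recent Lucene versions drop the constant (k1 + 1) factor, which does not change ranking.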
airborne12 force-pushed the pick/branch-4.1/59847 branch from 46d977d to 6cb69d6 on March 19, 2026 06:51
@airborne12
Copy link
Member Author

run buildall

@hello-stephen
Copy link
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.06% (1786/2259)
Line Coverage 64.39% (31927/49583)
Region Coverage 65.24% (15982/24499)
Branch Coverage 55.83% (8506/15236)

@airborne12
Copy link
Member Author

run buildall

@hello-stephen
Copy link
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 78.54% (1786/2274)
Line Coverage 64.31% (31924/49641)
Region Coverage 65.14% (15979/24532)
Branch Coverage 55.72% (8502/15258)

The branch-4.0 stub (static, returns false) conflicts with the real
implementation in the anonymous namespace added by the BM25 scoring PR.
On branch-4.1 the real implementation is correct, so remove the stub
to fix the ambiguous call compilation error.
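The ambiguous-call error described in this commit can be reproduced in a few lines of C++. The name supports_bm25_scoring is hypothetical (the commit does not name the function); the mechanism is standard C++: members of an anonymous namespace are visible in the enclosing namespace, so a same-signature static function declared there leaves unqualified lookup with two equally good candidates.

```cpp
#include <cassert>

namespace doris {

#ifdef REPRODUCE_AMBIGUOUS_CALL
// Hypothetical branch-4.0 stub: namespace-scope static function that always returns false.
static bool supports_bm25_scoring(int /*index_id*/) { return false; }
#endif

namespace {
// Hypothetical real implementation in an anonymous namespace; its members are
// visible in doris:: through the implicit using-directive.
bool supports_bm25_scoring(int index_id) { return index_id >= 0; }
}  // namespace

// With the stub present, unqualified lookup finds both functions with identical
// signatures, neither is a better match, and the call fails to compile.
bool query_can_score(int index_id) { return supports_bm25_scoring(index_id); }

}  // namespace doris
```

Compiling with -DREPRODUCE_AMBIGUOUS_CALL restores the stub and should trigger an ambiguous-call error from GCC or Clang; without it the sketch builds, which mirrors the fix of deleting the stub.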
@airborne12
Copy link
Member Author

run buildall

@doris-robot
Copy link

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 78.54% (1786/2274)
Line Coverage 64.31% (31924/49641)
Region Coverage 65.11% (15972/24532)
Branch Coverage 55.72% (8502/15258)

@hello-stephen
Copy link
Contributor

FE UT Coverage Report

Increment line coverage 33.33% (2/6) 🎉
Increment coverage report
Complete coverage report

@doris-robot
Copy link

BE UT Coverage Report

Increment line coverage 24.63% (183/743) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.77% (19412/36787)
Line Coverage 36.20% (181479/501347)
Region Coverage 32.65% (140163/429298)
Branch Coverage 33.65% (61035/181393)

@hello-stephen
Copy link
Contributor

BE Regression && UT Coverage Report

Increment line coverage 33.56% (251/748) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.35% (25690/36007)
Line Coverage 54.24% (271423/500406)
Region Coverage 51.48% (223169/433535)
Branch Coverage 53.07% (96594/182019)

yiguolei merged commit 75bb9bd into apache:branch-4.1 on Mar 19, 2026
24 of 28 checks passed