Add SEMANTIC_MATCH filter predicate for hidden semantic search layer #18191

Open
xiangfu0 wants to merge 1 commit into apache:master from xiangfu0:feature/semantic-match-filter-kind

@xiangfu0
Contributor

Summary

  • Registers SEMANTIC_MATCH(column, 'query text'[, topK]) as a Calcite SQL function in PinotOperatorTable
  • Adds SEMANTIC_MATCH to FilterKind enum so it flows through the standard predicate pipeline
  • Adds argument validation in CalciteSqlParser.validateFilter(): requires 2–3 args, column identifier as first arg, non-null string literal as second
  • Adds a pass-through case in PredicateComparisonRewriter so the predicate is preserved for downstream rewriters
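
The validation rules in the third bullet can be sketched in isolation. This is a minimal illustration, not the code in this PR: `Arg` is a hypothetical stand-in for Pinot's parsed expression type, and the messages are invented. It only mirrors the stated rules (2 or 3 arguments, column identifier first, non-null string literal second) and assumes failures surface as `IllegalStateException`, as the test plan describes. The optional `topK` argument is left unchecked because the summary states no rule for it.

```java
import java.util.List;

/** Illustrative sketch only; the types here are hypothetical, not Pinot classes. */
final class SemanticMatchValidation {
  private SemanticMatchValidation() { }

  /** Hypothetical, simplified stand-in for one parsed function argument. */
  static final class Arg {
    enum Type { IDENTIFIER, STRING_LITERAL, INT_LITERAL, NULL_LITERAL }

    final Type type;
    final String value;

    private Arg(Type type, String value) {
      this.type = type;
      this.value = value;
    }

    static Arg identifier(String name) { return new Arg(Type.IDENTIFIER, name); }
    static Arg string(String text) { return new Arg(Type.STRING_LITERAL, text); }
    static Arg integer(int n) { return new Arg(Type.INT_LITERAL, Integer.toString(n)); }
    static Arg nullLiteral() { return new Arg(Type.NULL_LITERAL, null); }
  }

  /** Throws IllegalStateException when the operands break the rules from the summary. */
  static void validate(List<Arg> operands) {
    // The count check runs first, so the get(0)/get(1) calls below are always in range.
    check(operands.size() == 2 || operands.size() == 3,
        "SEMANTIC_MATCH expects 2 or 3 arguments, got " + operands.size());
    check(operands.get(0).type == Arg.Type.IDENTIFIER,
        "The first argument of SEMANTIC_MATCH must be a column identifier");
    check(operands.get(1).type == Arg.Type.STRING_LITERAL && operands.get(1).value != null,
        "The second argument of SEMANTIC_MATCH must be a non-null string literal");
  }

  private static void check(boolean condition, String message) {
    if (!condition) {
      throw new IllegalStateException(message);
    }
  }
}
```

Under these assumptions, `validate` accepts the operands of `SEMANTIC_MATCH(embedding_col, 'some query', 10)` and rejects a call whose first argument is a literal or whose argument count is outside 2 to 3.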

The intent is to support a hidden semantic search layer: a SemanticSearchQueryRewriter (in the StarTree extension) intercepts SEMANTIC_MATCH at query time, computes the embedding vector for the query text, then rewrites the predicate to VECTOR_SIMILARITY before execution. This PR provides the OSS hook that makes that rewrite chain possible.
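
To make the rewrite chain concrete, here is a rough sketch of the SEMANTIC_MATCH to VECTOR_SIMILARITY translation described above. It is not the SemanticSearchQueryRewriter, which lives outside this repository and is not shown in this PR. The class name, the string-based rendering, and the `embedder` hook are all assumptions made for illustration; a real rewriter would presumably operate on the parsed query rather than on SQL text.

```java
import java.util.StringJoiner;
import java.util.function.Function;

/** Illustrative sketch only; not the StarTree rewriter and not part of this PR. */
final class SemanticMatchRewriteSketch {
  private SemanticMatchRewriteSketch() { }

  /**
   * Renders the VECTOR_SIMILARITY predicate that would replace
   * SEMANTIC_MATCH(column, queryText[, topK]).
   *
   * @param embedder hypothetical hook that maps query text to an embedding vector
   * @param topK optional; omitted from the output when null
   */
  static String rewrite(String column, String queryText, Integer topK,
      Function<String, float[]> embedder) {
    float[] vector = embedder.apply(queryText);

    // Render the vector as an ARRAY[...] literal.
    StringJoiner array = new StringJoiner(", ", "ARRAY[", "]");
    for (float component : vector) {
      array.add(Float.toString(component));
    }

    StringBuilder predicate = new StringBuilder("VECTOR_SIMILARITY(")
        .append(column).append(", ").append(array.toString());
    if (topK != null) {
      predicate.append(", ").append(topK);
    }
    return predicate.append(')').toString();
  }
}
```

With a stub embedder that always returns `{0.5f, 0.25f, 1.0f}`, calling `rewrite("embedding_col", "some query", 10, stub)` yields `VECTOR_SIMILARITY(embedding_col, ARRAY[0.5, 0.25, 1.0], 10)`. Passing the embedder as a function keeps the sketch independent of any particular embedding model, which the PR description does not specify.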

Test plan

  • Existing unit tests pass: CalciteSqlParser, PredicateComparisonRewriter, FilterKind usages
  • Manual: verify SELECT ... FROM t WHERE SEMANTIC_MATCH(embedding_col, 'some query', 10) parses without error and round-trips through the rewriter chain unchanged
  • Verify that an invalid call (wrong arg count, non-column first arg) throws IllegalStateException at parse time

🤖 Generated with Claude Code

Registers SEMANTIC_MATCH(column, queryText, topK) as a Calcite SQL function
and FilterKind so it can be parsed and passed through the query rewriter chain.
The SemanticSearchQueryRewriter (in startree-pinot) translates it to
VECTOR_SIMILARITY at query time after computing the embedding vector.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codecov-commenter

codecov-commenter commented Apr 14, 2026

Codecov Report

❌ Patch coverage is 18.75000% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 34.97%. Comparing base (397704e) to head (f818df2).
⚠️ Report is 12 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...org/apache/pinot/sql/parsers/CalciteSqlParser.java | 0.00% | 9 Missing and 1 partial ⚠️ |
| .../parsers/rewriter/PredicateComparisonRewriter.java | 0.00% | 2 Missing ⚠️ |
| ...ache/pinot/calcite/sql/fun/PinotOperatorTable.java | 50.00% | 1 Missing ⚠️ |

❗ There is a different number of reports uploaded between BASE (397704e) and HEAD (f818df2).

HEAD has 8 uploads less than BASE

| Flag | BASE (397704e) | HEAD (f818df2) |
|---|---|---|
| java-21 | 5 | 4 |
| unittests1 | 2 | 0 |
| unittests | 4 | 2 |
| temurin | 10 | 8 |
| java-11 | 5 | 4 |

Additional details and impacted files
@@              Coverage Diff              @@
##             master   #18191       +/-   ##
=============================================
- Coverage     63.13%   34.97%   -28.16%     
+ Complexity     1616      789      -827     
=============================================
  Files          3213     3229       +16     
  Lines        195730   196720      +990     
  Branches      30240    30414      +174     
=============================================
- Hits         123569    68800    -54769     
- Misses        62288   121855    +59567     
+ Partials       9873     6065     -3808     

| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 34.95% <18.75%> (-28.16%) ⬇️ |
| java-21 | 34.96% <18.75%> (-28.13%) ⬇️ |
| temurin | 34.97% <18.75%> (-28.16%) ⬇️ |
| unittests | 34.97% <18.75%> (-28.16%) ⬇️ |
| unittests1 | ? |
| unittests2 | 34.97% <18.75%> (+0.18%) ⬆️ |

