feat(operator): Add case sensitivity to keyword search Operator #3510

Open
SarahAsad23 wants to merge 4 commits into main from sarah-keyword-search-case-sensitive

SarahAsad23 commented Jun 28, 2025

This PR adds a case-sensitivity option to the keyword search operator. A checkbox lets users choose whether a search is case sensitive or case insensitive.

Case-sensitive searches use a new CaseSensitiveAnalyzer, which extends Lucene's base Analyzer class; case-insensitive searches continue to use the original StandardAnalyzer.
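
The difference between the two modes can be illustrated without Lucene. The sketch below is a simplified model, not the operator's code: `KeywordMatchSketch`, `tokenize`, `matches`, and the regex-based split are hypothetical stand-ins for the analyzer pipeline, assuming that the only difference between the modes is whether tokens and keyword are lowercased before comparison.

```scala
import java.util.Locale

object KeywordMatchSketch {
  // Rough stand-in for a tokenizer: split on anything that is not a letter or digit.
  def tokenize(text: String): Seq[String] =
    text.split("[^\\p{L}\\p{N}]+").filter(_.nonEmpty).toSeq

  // Case-insensitive mode lowercases both the tokens and the keyword;
  // case-sensitive mode compares them exactly as written.
  def matches(text: String, keyword: String, caseSensitive: Boolean): Boolean = {
    val normalize: String => String =
      if (caseSensitive) (s: String) => s
      else (s: String) => s.toLowerCase(Locale.ROOT)
    tokenize(text).map(normalize).contains(normalize(keyword))
  }

  def main(args: Array[String]): Unit = {
    val row = "Apple releases new phone"
    println(matches(row, "apple", caseSensitive = false)) // true
    println(matches(row, "apple", caseSensitive = true))  // false
    println(matches(row, "Apple", caseSensitive = true))  // true
  }
}
```

With the checkbox unchecked, `apple` matches the row containing `Apple`; with it checked, only the exact casing matches.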

Examples (screenshots): caseSensitive, CaseInsensitive1, CaseInsensitive2

Dataset for Testing:
keyword_search_test_dataset.csv

bobbai00 left a comment

Left some comments

import org.apache.lucene.analysis.Analyzer.TokenStreamComponents

class CaseSensitiveAnalyzer extends Analyzer {
override protected def createComponents(fieldName: String): TokenStreamComponents = {

Can you add some comments explaining the purpose of this class and how it is configured to be case sensitive?

The comment in KeywordSearchOpDesc mentions StandardAnalyzer's balanced performance and wide range of supported tokens. How does CaseSensitiveAnalyzer compare to StandardAnalyzer in those respects?

SarahAsad23 changed the title from "Add case sensitivity to keyword search Operator" to "Feat(Operator): Add case sensitivity to keyword search Operator" Jul 8, 2025
SarahAsad23 changed the title from "Feat(Operator): Add case sensitivity to keyword search Operator" to "feat(operator): Add case sensitivity to keyword search Operator" Jul 8, 2025
github-actions bot added the backend label (Anything related to backend services) and removed the feature and fix labels Oct 11, 2025
aglinxinyuan commented

What's the plan for this PR?

chenlica commented May 2, 2026

@KyleKDang Are you interested in taking a look?

chenlica commented May 2, 2026

@SarahAsad23 You created this PR before. Do you want to finish it?

KyleKDang left a comment

LGTM. Minor suggestion: for future readability, it might be helpful to add a short comment in CaseSensitiveAnalyzer explaining that case sensitivity is achieved by avoiding the lowercasing and normalization pipeline used in StandardAnalyzer.
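
A minimal sketch of what such a commented analyzer might look like. It assumes the analyzer is built from Lucene's `StandardTokenizer` with no `LowerCaseFilter` after it; the PR's actual `createComponents` body is not shown in this thread, so `CaseSensitiveAnalyzerSketch`, `LuceneCaseDemo`, and the `tokens` helper are hypothetical names used only for illustration. It needs `lucene-core` on the classpath.

```scala
import org.apache.lucene.analysis.Analyzer
import org.apache.lucene.analysis.Analyzer.TokenStreamComponents
import org.apache.lucene.analysis.standard.{StandardAnalyzer, StandardTokenizer}
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute
import scala.collection.mutable.ListBuffer

/**
  * Hypothetical case-sensitive analyzer (an assumption, not the PR's code).
  *
  * StandardAnalyzer runs StandardTokenizer and then a LowerCaseFilter (plus a
  * StopFilter). This class keeps only the tokenizer, so tokens retain their
  * original casing and "Apple" and "apple" are indexed as different terms.
  */
class CaseSensitiveAnalyzerSketch extends Analyzer {
  override protected def createComponents(fieldName: String): TokenStreamComponents = {
    val tokenizer = new StandardTokenizer()
    // No filters are chained after the tokenizer, so nothing is lowercased.
    new TokenStreamComponents(tokenizer)
  }
}

object LuceneCaseDemo {
  // Collect the terms an analyzer emits for a piece of text.
  def tokens(analyzer: Analyzer, text: String): List[String] = {
    val stream = analyzer.tokenStream("content", text)
    val term = stream.addAttribute(classOf[CharTermAttribute])
    val out = ListBuffer.empty[String]
    try {
      stream.reset()
      while (stream.incrementToken()) out += term.toString
      stream.end()
    } finally stream.close()
    out.toList
  }

  def main(args: Array[String]): Unit = {
    println(tokens(new CaseSensitiveAnalyzerSketch, "Apple apple"))
    println(tokens(new StandardAnalyzer(), "Apple apple"))
  }
}
```

Printing the tokens from both analyzers side by side makes the effect of dropping the lowercasing step visible, which may also help answer the earlier review question about how the two analyzers differ.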

chenlica commented May 2, 2026

@KyleKDang Are you willing to finish this PR? Please resolve conflicts first.

SarahAsad23 commented

I will finish this PR.

Labels

backend: Anything related to backend services

5 participants