feat(operator): Add case sensitivity to keyword search Operator #3510
SarahAsad23 wants to merge 4 commits into
```scala
import org.apache.lucene.analysis.Analyzer.TokenStreamComponents

class CaseSensitiveAnalyzer extends Analyzer {
  override protected def createComponents(fieldName: String): TokenStreamComponents = {
```
Can you add some comments explaining the purpose of this class and how you configured it to be case-sensitive?
In KeywordSearchOpDesc, the comment mentions the balanced performance and wide range of supported tokens of StandardAnalyzer. How does CaseSensitiveAnalyzer compare to StandardAnalyzer in those respects?
What's the plan for this PR?

@KyleKDang Are you interested in taking a look?

@SarahAsad23 You created this PR before. Do you want to finish it?
KyleKDang left a comment
LGTM. Minor suggestion: for future readability, it might be helpful to add a short comment in CaseSensitiveAnalyzer explaining that case sensitivity is achieved by avoiding the lowercasing and normalization pipeline used in StandardAnalyzer.
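A minimal sketch of the approach described above, assuming Lucene 8 or later (where `StandardTokenizer` and `StandardAnalyzer` ship in `lucene-core`); it is not necessarily the code in this PR. The analyzer reuses `StandardTokenizer`, so it should split text on the same Unicode word boundaries as `StandardAnalyzer` and support the same range of tokens. It chains no `LowerCaseFilter`, which is what keeps `Apple` and `apple` distinct, and since it only drops a filter its cost should be no higher. The `AnalyzerDemo.tokens` helper is hypothetical and exists only to make the difference visible.

```scala
import org.apache.lucene.analysis.Analyzer
import org.apache.lucene.analysis.Analyzer.TokenStreamComponents
import org.apache.lucene.analysis.standard.{StandardAnalyzer, StandardTokenizer}
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute

import scala.collection.mutable.ListBuffer

/**
 * Case-sensitive analyzer (sketch).
 *
 * Uses the same StandardTokenizer as StandardAnalyzer, so word boundaries
 * follow the same rules, but no LowerCaseFilter is chained after it,
 * so tokens keep their original case.
 */
class CaseSensitiveAnalyzer extends Analyzer {
  override protected def createComponents(fieldName: String): TokenStreamComponents = {
    val tokenizer = new StandardTokenizer()
    // No filters after the tokenizer: "Apple" and "apple" remain different terms.
    new TokenStreamComponents(tokenizer)
  }
}

object AnalyzerDemo {
  /** Hypothetical helper: run `text` through `analyzer` and collect the emitted terms. */
  def tokens(analyzer: Analyzer, text: String): List[String] = {
    val stream = analyzer.tokenStream("content", text)
    val term = stream.addAttribute(classOf[CharTermAttribute])
    val out = ListBuffer.empty[String]
    try {
      stream.reset()
      while (stream.incrementToken()) out += term.toString
      stream.end()
    } finally stream.close()
    out.toList
  }
}
```

For example, `AnalyzerDemo.tokens(new CaseSensitiveAnalyzer(), "Apache Lucene")` yields `List(Apache, Lucene)`, while the same call with `new StandardAnalyzer()` yields `List(apache, lucene)`. Because the sketch does not override `Analyzer.normalize` (which `StandardAnalyzer` overrides to lowercase), wildcard and prefix query terms should also keep their case.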
@KyleKDang Are you willing to finish this PR? Please resolve conflicts first.

I will finish this PR.
This PR adds a case-sensitivity option to the keyword search operator. Users can now tick a checkbox to choose whether their search is case-sensitive or case-insensitive.
Case-sensitive searches use a new CaseSensitiveAnalyzer, which extends Lucene's base Analyzer class; case-insensitive searches continue to use the original StandardAnalyzer.
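The checkbox only changes which analyzer is used, but that choice has to apply to both sides of the match: the text being searched and the parsed keyword. Below is a hedged sketch of that wiring, assuming Lucene 8 or later with the `lucene-memory` and `lucene-queryparser` modules on the classpath. `KeywordMatchSketch`, `analyzerFor`, and `matches` are hypothetical names, not the operator's actual API, and the minimal analyzer is repeated so the block compiles on its own.

```scala
import org.apache.lucene.analysis.Analyzer
import org.apache.lucene.analysis.Analyzer.TokenStreamComponents
import org.apache.lucene.analysis.standard.{StandardAnalyzer, StandardTokenizer}
import org.apache.lucene.index.memory.MemoryIndex
import org.apache.lucene.queryparser.classic.QueryParser

object KeywordMatchSketch {
  // Minimal case-sensitive analyzer: StandardTokenizer with no LowerCaseFilter.
  private final class CaseSensitiveAnalyzer extends Analyzer {
    override protected def createComponents(fieldName: String): TokenStreamComponents =
      new TokenStreamComponents(new StandardTokenizer())
  }

  /** Hypothetical mapping from the operator's checkbox to an analyzer. */
  def analyzerFor(isCaseSensitive: Boolean): Analyzer =
    if (isCaseSensitive) new CaseSensitiveAnalyzer() else new StandardAnalyzer()

  /**
   * Returns true if `text` matches `keyword`. The same analyzer indexes the
   * text and parses the query; lowercasing only one side would make
   * case-insensitive matches fail.
   */
  def matches(text: String, keyword: String, isCaseSensitive: Boolean): Boolean = {
    val analyzer = analyzerFor(isCaseSensitive)
    val field = "content"
    val index = new MemoryIndex()
    index.addField(field, text, analyzer)
    val query = new QueryParser(field, analyzer).parse(keyword)
    // MemoryIndex.search returns 0.0f when the query does not match.
    index.search(query) > 0.0f
  }
}
```

With this sketch, `matches("Apache Lucene", "lucene", isCaseSensitive = false)` is true, while the same call with `isCaseSensitive = true` is false; searching for `Lucene` matches in both modes. Scoring a single text field with `MemoryIndex` is an assumption made for brevity; the operator may evaluate matches differently.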
Examples:



Dataset for Testing:
keyword_search_test_dataset.csv