feature: enhance Text.contains("word") #1652
Force-pushed from bbbdc01 to 5270ae2
Codecov Report
@@              Coverage Diff              @@
##             master    #1652      +/-   ##
============================================
+ Coverage     64.40%   66.92%    +2.51%
- Complexity     6897     7065      +168
============================================
  Files           421      421
  Lines         34675    34682        +7
  Branches       4803     4804        +1
============================================
+ Hits          22334    23211      +877
+ Misses        10058     9124      -934
- Partials       2283     2347       +64
Continue to review full report at Codecov.
Force-pushed from 5270ae2 to 6a71e0f
@@ -76,6 +76,10 @@
TEXT_CONTAINS("textcontains", String.class, String.class, (v1, v2) -> {
    return v1 != null && ((String) v1).contains((String) v2);
}),
TEXT_CONTAINS_ENTIRE("textcontainsentire", String.class, String.class,
My suggestion is to enhance GraphIndexTransaction.segmentWords instead of adding a textcontainsentire operator: check whether the text parameter contains special symbols, and if so, split it into multiple search terms.
We could then support forms like Text.contains("(word)") and Text.contains("word1|word2|word3").
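A minimal sketch of that suggestion, assuming `START_SYMBOL` is `(`, `END_SYMBOL` is `)` and `WORD_DELIMITER` is `|` (the review names these constants but not their values). The class `SegmentWordsSketch` and the `Function`-based analyzer stand-in are hypothetical, not HugeGraph's API; the sketch only shows routing the query forms before falling back to the analyzer.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Pattern;

// Hypothetical sketch of an enhanced segmentWords(); constant values are assumptions.
public class SegmentWordsSketch {
    static final String START_SYMBOL = "(";
    static final String END_SYMBOL = ")";
    static final String WORD_DELIMITER = "|";

    // Stand-in for the configured text analyzer (assumption)
    private final Function<String, Set<String>> analyzer;

    public SegmentWordsSketch(Function<String, Set<String>> analyzer) {
        this.analyzer = analyzer;
    }

    public Set<String> segmentWords(String text) {
        // Text.contains("(word)"): use the user-specified word verbatim
        if (text.length() > 2 && text.startsWith(START_SYMBOL)
                && text.endsWith(END_SYMBOL)) {
            return Set.of(text.substring(1, text.length() - 1));
        }
        // Text.contains("word1|word2|word3"): split into user-specified words
        if (text.contains(WORD_DELIMITER)) {
            Set<String> words = new LinkedHashSet<>();
            for (String word : text.split(Pattern.quote(WORD_DELIMITER))) {
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
            return words;
        }
        // Text.contains("words"): fall back to the analyzer
        return this.analyzer.apply(text);
    }

    public static void main(String[] args) {
        // Toy whitespace analyzer, a placeholder for a real one
        SegmentWordsSketch s = new SegmentWordsSketch(
                t -> new LinkedHashSet<>(Arrays.asList(t.split("\\s+"))));
        System.out.println(s.segmentWords("(hello world)")); // [hello world]
        System.out.println(s.segmentWords("a|b|c"));         // [a, b, c]
        System.out.println(s.segmentWords("hello world"));   // [hello, world]
    }
}
```

Only a text that matches none of the special forms reaches the analyzer, so user-specified words are never re-segmented.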
Assert.assertEquals(5, vertices.size());
Assert.assertTrue(vertices.contains(vertex2));

vertices = g.V().has("name", Text.contains("(秦始皇)")).toList();
Add test cases for "秦始皇" (Qin Shi Huang) and "秦朝" (Qin dynasty), and query by "秦始皇" or "秦" (Qin).
Which text analyzer can produce the word "秦" (Qin)?
You could try ik max_word.
With ik max_word, the text analyzer only produces [秦始皇, 始皇].
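To illustrate why a query for "秦" misses, here is a toy model (not HugeGraph code) in which the search index holds only the words the analyzer produced at insertion time; the segments [秦始皇, 始皇] are the ones reported in the comment above.

```java
import java.util.List;
import java.util.Set;

// Toy model: the index contains only the analyzer's segments for "秦始皇"
public class AnalyzerCoverageSketch {
    // Segments reported for "秦始皇" with ik max_word (taken from the discussion)
    static final Set<String> INDEXED_WORDS = Set.of("秦始皇", "始皇");

    // A query word can only hit the index if it was indexed verbatim
    public static boolean hits(String queryWord) {
        return INDEXED_WORDS.contains(queryWord);
    }

    public static void main(String[] args) {
        for (String q : List.of("秦始皇", "始皇", "秦")) {
            System.out.println(q + " -> " + hits(q));
        }
    }
}
```

Queries for 秦始皇 and 始皇 hit, while 秦 does not, because no analyzer segment equals it.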
Title changed from Text.contains("(word)") to Text.contains(text)
Force-pushed from 6a71e0f to d06a9e5
Title changed from Text.contains(text) to Text.contains("word")
String[] split = StringUtils.split(text, WORD_DELIMITER);
return Sets.newHashSet(split);
}
// Add original text, retain word == propValue
improve comment
support Text.contains("(word)") and Text.contains("word1|word2|word3")
*/
if (text.startsWith(START_SYMBOL) && text.endsWith(END_SYMBOL)) {
    return Sets.newHashSet(text.substring(1, text.length() - 1));
keep ImmutableSet.of() style?
/*
Enhance segmentWords.
support Text.contains("(word)") and Text.contains("word1|word2|word3")
*/
/*
 * Support 3 kinds of query:
 * - Text.contains("(word)"): query by user-specified word;
 * - Text.contains("word1|word2|word3"): query by user-specified words;
 * - Text.contains("words"): query by words split by the analyzer;
 */
}
/*
Add original text to segment set.
Let Text.contains("word") also contain `(word == propValue)`
This doesn't match what the comment says.
String[] texts = StringUtils.split(text, WORD_DELIMITER);
return ImmutableSet.copyOf(texts);
}
Set<String> segment = this.textAnalyzer.segment(text);
rename to segments
return ImmutableSet.copyOf(texts);
}
Set<String> segment = this.textAnalyzer.segment(text);
segment.add(text);
don't add text to segment
If we don't add the text to segment, exact word matches won't work.
textAnalyzer.segment() would ensure the words are the same between insertion and query.
Force-pushed from 926e128 to 0593893
@@ -903,7 +903,10 @@ private boolean matchSearchIndexWords(String propValue, String fieldValue) {
return ImmutableSet.copyOf(texts);
}
Set<String> segments = this.textAnalyzer.segment(text);

// Add original words to segments, in order to can be match fully words
Be more specific, e.g.: "Add the original text to segments at the insertion stage, so that full words can be matched at the query stage."
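A toy inverted index (hypothetical, not HugeGraph's implementation) showing the point of that comment: with a placeholder whitespace analyzer, the value "Qin Shi Huang" is indexed as [Qin, Shi, Huang], so looking up the whole value verbatim, as a user-specified Text.contains("(word)") query would, only succeeds when the original text is also added at insertion time.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical toy search index: word -> ids of elements indexed under it
public class OriginalTextIndexSketch {
    private final Map<String, Set<Integer>> index = new HashMap<>();
    // Placeholder for a real text analyzer (assumption)
    private final Function<String, Set<String>> analyzer;
    private final boolean addOriginalText;

    public OriginalTextIndexSketch(Function<String, Set<String>> analyzer,
                                   boolean addOriginalText) {
        this.analyzer = analyzer;
        this.addOriginalText = addOriginalText;
    }

    // Insertion stage: index every segment produced for the property value
    public void insert(int id, String propValue) {
        Set<String> segments = new HashSet<>(this.analyzer.apply(propValue));
        if (this.addOriginalText) {
            // Also index the whole value as a single word
            segments.add(propValue);
        }
        for (String word : segments) {
            this.index.computeIfAbsent(word, k -> new HashSet<>()).add(id);
        }
    }

    // Query stage for a user-specified word: look it up verbatim
    public Set<Integer> queryExact(String word) {
        return this.index.getOrDefault(word, Set.of());
    }

    public static void main(String[] args) {
        Function<String, Set<String>> whitespace =
                t -> new HashSet<>(Arrays.asList(t.split("\\s+")));
        OriginalTextIndexSketch analyzerOnly =
                new OriginalTextIndexSketch(whitespace, false);
        OriginalTextIndexSketch withOriginal =
                new OriginalTextIndexSketch(whitespace, true);
        analyzerOnly.insert(1, "Qin Shi Huang");
        withOriginal.insert(1, "Qin Shi Huang");
        System.out.println(analyzerOnly.queryExact("Qin Shi Huang")); // []
        System.out.println(withOriginal.queryExact("Qin Shi Huang")); // [1]
    }
}
```

Analyzer-produced words such as "Shi" match in both variants; only the whole-value lookup depends on adding the original text.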
Force-pushed from 0593893 to 6bd09ad
Force-pushed from 6bd09ad to 35da5ec
TODO: update the search index docs at https://hugegraph.github.io/hugegraph-doc/clients/hugegraph-client.html
Implements feature #1649:
- Text.contains("(word)")
- Text.contains("(word1|word2|word3)")
- Text.contains("word"), which also matches when word == propValue