feature: enhance `Text.contains("word")` #1652

coderzc · 2021-11-12T15:32:04Z

Implements feature #1649:

Support entire-word match: Text.contains("(word)")
Support custom word-list match: Text.contains("(word1|word2|word3)")
Support tokenizer-based word match: Text.contains("word"), which also matches when word == propValue

codecov · 2021-11-12T15:39:10Z

Codecov Report

Merging #1652 (9152ecb) into master (83180fb) will increase coverage by 2.51%.
The diff coverage is 90.00%.

@@             Coverage Diff              @@
##             master    #1652      +/-   ##
============================================
+ Coverage     64.40%   66.92%   +2.51%     
- Complexity     6897     7065     +168     
============================================
  Files           421      421              
  Lines         34675    34682       +7     
  Branches       4803     4804       +1     
============================================
+ Hits          22334    23211     +877     
+ Misses        10058     9124     -934     
- Partials       2283     2347      +64

Impacted Files	Coverage Δ
...du/hugegraph/backend/tx/GraphIndexTransaction.java	`83.72% <90.00%> (-0.10%)`	⬇️
...va/com/baidu/hugegraph/auth/HugeAuthenticator.java	`34.78% <0.00%> (-5.44%)`	⬇️
...va/com/baidu/hugegraph/task/ServerInfoManager.java	`71.34% <0.00%> (-2.81%)`	⬇️
...ain/java/com/baidu/hugegraph/task/TaskManager.java	`69.06% <0.00%> (-1.44%)`	⬇️
...in/java/com/baidu/hugegraph/auth/HugeResource.java	`77.92% <0.00%> (-1.30%)`	⬇️
.../java/com/baidu/hugegraph/auth/RolePermission.java	`90.58% <0.00%> (-1.18%)`	⬇️
...egraph/backend/store/cassandra/CassandraStore.java	`71.79% <0.00%> (-0.86%)`	⬇️
...om/baidu/hugegraph/task/StandardTaskScheduler.java	`75.85% <0.00%> (+0.24%)`	⬆️
.../baidu/hugegraph/backend/query/ConditionQuery.java	`85.94% <0.00%> (+0.26%)`	⬆️
... and 20 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

javeme · 2021-11-13T07:05:28Z

hugegraph-core/src/main/java/com/baidu/hugegraph/backend/query/Condition.java

@@ -76,6 +76,10 @@
        TEXT_CONTAINS("textcontains", String.class, String.class, (v1, v2) -> {
            return v1 != null && ((String) v1).contains((String) v2);
        }),
+        TEXT_CONTAINS_ENTIRE("textcontainsentire", String.class, String.class,


My suggestion is to just enhance GraphIndexTransaction.segmentWords instead of adding a textcontainsentire operator: check whether the text param contains the special symbols, and if so split it into multiple search terms.
That way we can support forms like Text.contains("(word)") and Text.contains("word1|word2|word3").
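
A rough sketch of that splitting step, to make the idea concrete. `SearchTermSplitter` and its `split` method are hypothetical names, not the existing `segmentWords` code, and the analyzer is passed in as a plain function because the real analyzer wiring is not shown in this thread. The assumption is that a parenthesized or `|`-separated text bypasses the analyzer, while anything else is tokenized as before.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: illustrates the suggestion only, it is not the
// actual GraphIndexTransaction.segmentWords implementation.
final class SearchTermSplitter {

    private SearchTermSplitter() {
    }

    /**
     * Turn the text param of Text.contains() into search terms.
     * "(word)"            -> [word]               (entire-word match)
     * "word1|word2|word3" -> [word1, word2, word3] (custom word list)
     * anything else       -> whatever the analyzer returns
     */
    static List<String> split(String text,
                              Function<String, List<String>> analyzer) {
        boolean wrapped = text.length() >= 2 &&
                          text.startsWith("(") && text.endsWith(")");
        // Strip the surrounding parentheses, if any
        String body = wrapped ? text.substring(1, text.length() - 1) : text;

        if (wrapped || body.contains("|")) {
            // Explicit terms: use them as-is and skip the analyzer
            List<String> terms = new ArrayList<>();
            for (String term : body.split("\\|")) {
                if (!term.isEmpty()) {
                    terms.add(term);
                }
            }
            return terms;
        }
        // Plain text: keep the existing tokenizer behaviour
        return analyzer.apply(text);
    }
}
```

With this shape, `Text.contains("(word)")` and `Text.contains("(word1|word2|word3)")` need no new operator: the existing textcontains condition stays, and only the term list handed to the index lookup changes. How `|` or parentheses inside real property values should be escaped is left open here.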

javeme · 2021-11-13T07:08:59Z

hugegraph-test/src/main/java/com/baidu/hugegraph/core/VertexCoreTest.java

+        Assert.assertEquals(5, vertices.size());
+        Assert.assertTrue(vertices.contains(vertex2));
+
+        vertices = g.V().has("name", Text.contains("(秦始皇)")).toList();


Please add a case with "秦始皇" and "秦朝", and query by "秦始皇" or "秦".

Which text analyzer can produce the word "秦"?

You can try ik max_word.

The ik max_word text analyzer can only produce [秦始皇, 始皇].
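
To illustrate why the analyzer output matters for this test case, here is a minimal sketch. It assumes the search index stores exactly the tokens the analyzer emits and that a query term hits only when it equals a stored token; `TokenIndexSketch` and `matches` are hypothetical names, and the any-term (OR) semantics is also an assumption.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model of a token-based lookup, for illustration only.
final class TokenIndexSketch {

    private TokenIndexSketch() {
    }

    /**
     * Return true if any query term equals one of the indexed tokens.
     * Unlike String.contains(), a term that is merely a substring of a
     * token does not match.
     */
    static boolean matches(Set<String> indexedTokens,
                           List<String> queryTerms) {
        for (String term : queryTerms) {
            if (indexedTokens.contains(term)) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions, with the indexed tokens [秦始皇, 始皇] quoted above, a query for "秦始皇" hits, while a query for "秦" does not, even though "秦" is a substring of the stored value.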

javeme · 2021-11-16T13:18:15Z