Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MOD-2972: Initial support for multi-value text #2819

Merged
merged 41 commits into from
Jul 27, 2022
Merged

Conversation

oshadmi
Copy link
Collaborator

@oshadmi oshadmi commented Jun 11, 2022

Indexing and Searching TEXT with single or multi value

Marked with label pr:break since invalid JSONPath will now fail index creation (when a RedisJSON with JSON API V2 is loaded)

Limitations:

  • Indexing arrays of String values, or multiple String values
    • No concatenation of multiple values is done
  • When JSONPath leads to multi values and not to an array, SLOP and INORDER cannot be used (since the order of the values is not well defined)
  • When JSONPath leads to multiple values: String values are indexed, null values are skipped, and any other value type is causing an indexing failure
  • SORTBY is only sorting by the first value
  • No HIGHLIGHT support
  • RETURN of a Schema attribute, whose JSONPath leads to multiple values, returns only the first value (as a JSON String)
    • If a JSONPath is specified by the RETURN instead of a Schema attribute, all values are returned (as a JSON String)

More details:

  • When indexing, a predefined delta is used to increase positional offsets between array slots for multi text values. It controls the level of separation between phrases in different array slots (related to the SLOP parameter of ft.search command)
    • This predefined value is set by RediSearch configuration parameter MULTI_TEXT_SLOP (at module load-time)
  • Currently when searching, any value is found among the multiple indexed values

Followup PR's:

  • Return multiple values / Sort by multiple values - Need to define what is returned and how (support returning multiple values, with potentially different multiple SortBy values)
  • Documentation of this new feature

src/forward_index.c Outdated Show resolved Hide resolved
'''


def expect_undef_order(query : Query):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unbound name: Name Query is used but not defined in the current scope.

Reply with "@sonatype-lift help" for info about LiftBot commands.
Reply with "@sonatype-lift ignore" to tell LiftBot to leave out the above finding from this PR.
Reply with "@sonatype-lift ignoreall" to tell LiftBot to leave out all the findings from this PR and from the status bar in Github.

When talking to LiftBot, you need to refresh the page to see its response. Click here to get to know more about LiftBot commands.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

@codecov
Copy link

codecov bot commented Jul 27, 2022

Codecov Report

Merging #2819 (c09ada3) into master (a1d597f) will increase coverage by 0.03%.
The diff coverage is 83.50%.

❗ Current head c09ada3 differs from pull request most recent head d0ada79. Consider uploading reports for the commit d0ada79 to get more accurate results

@@            Coverage Diff             @@
##           master    #2819      +/-   ##
==========================================
+ Coverage   81.92%   81.96%   +0.03%     
==========================================
  Files         180      180              
  Lines       29264    29425     +161     
==========================================
+ Hits        23974    24117     +143     
- Misses       5290     5308      +18     
Impacted Files Coverage Δ
src/aggregate/aggregate_exec.c 95.31% <ø> (ø)
src/config.h 100.00% <ø> (ø)
src/doc_types.h 100.00% <ø> (ø)
src/query_error.h 100.00% <ø> (ø)
src/document.c 73.78% <73.21%> (-0.85%) ⬇️
src/json.c 88.03% <82.50%> (-4.92%) ⬇️
src/query.c 90.35% <86.66%> (-0.13%) ⬇️
src/aggregate/aggregate_request.c 96.16% <100.00%> (+<0.01%) ⬆️
src/config.c 89.90% <100.00%> (+0.90%) ⬆️
src/document_basic.c 84.89% <100.00%> (+0.05%) ⬆️
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4b475a2...d0ada79. Read the comment docs.

@rafie rafie self-requested a review July 27, 2022 16:14
@oshadmi oshadmi merged commit 602ae4b into master Jul 27, 2022
@oshadmi oshadmi deleted the omer_multi_text branch July 27, 2022 17:39
@@ -727,6 +726,7 @@ def testconfigMultiTextOffsetDeltaSlopNeg(env):
err_msg = 'module init should fail due to invalid module configuration'

env.assertIsNotNone(err_msg)
env = Env()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you creating a new env here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was getting a

ConnectionRefusedError: [Errno 111] Connection refused

And then

redis.exceptions.RedisClusterException: Redis Cluster cannot be connected. Please provide at least one reachable node. 
got error on cluster info, will try again, Error 99 connecting to localhost:6379. Cannot assign requested address.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you are hiding the test failure?

oshadmi added a commit that referenced this pull request Aug 8, 2022
* WIP initial support for multi-value text

* Add test for non-text values

* WIP sorting

* WIP Add JSON API V2 and avoid SORTABLE with multi value

* add tests and avoid failing SORTABLE

* check undefined ordering

* Add FT config option MULTI_TEXT_OFFSET_DELTA

* cleanups

* code review with Ariel (1)

* More fixes and code review with Ariel

* Fix from review: rename config

* [skip ci] Fix from review: refactor querying jsonpath info (1)

* Fix from review: refactor querying jsonpath info (2)

* Rename test file and fix log message typo

* Add test for Boolean TAG

* Fail index creation on an invalid jsonpath

* Fix order in testMultiTagBool

* Remove duplicated entries from JSON API V2

* Make JSON API versions incremental

* Make JSON API versions incremental (2)

* API cleanup

* Rename API pathIsStatic to pathIsSingle

* update readies

* Add alternative expected depending on ReJSON version

* Fixes from Meir's review (1)

* update readies and increase no_output_timeout

* Add score to see ordering in failed test

* Fix test for coordinator

* Fix test for coordinator

* Fix test for coordinator (2)

* Fix tests (order and moduleArgs config param)

* Iterate a json array without parsing a jsonpath

* Skip score test in cluster

* Fixes from Meir's review (2) - single Env per test, run on coord

* Fixes from Meir's review (2) - revert timeout

* Try skip MULTI_TEXT_SLOP config tests on cluster

* Fix MULTI_TEXT_SLOP config tests on cluster

* Fix MULTI_TEXT_SLOP config tests on cluster (2)

* update readies
oshadmi added a commit that referenced this pull request Sep 21, 2022
* WIP initial support for multi-value text

* Add test for non-text values

* WIP sorting

* WIP Add JSON API V2 and avoid SORTABLE with multi value

* add tests and avoid failing SORTABLE

* check undefined ordering

* Add FT config option MULTI_TEXT_OFFSET_DELTA

* cleanups

* code review with Ariel (1)

* More fixes and code review with Ariel

* Fix from review: rename config

* [skip ci] Fix from review: refactor querying jsonpath info (1)

* Fix from review: refactor querying jsonpath info (2)

* Rename test file and fix log message typo

* Add test for Boolean TAG

* Fail index creation on an invalid jsonpath

* Fix order in testMultiTagBool

* Remove duplicated entries from JSON API V2

* Make JSON API versions incremental

* Make JSON API versions incremental (2)

* API cleanup

* Rename API pathIsStatic to pathIsSingle

* update readies

* Add alternative expected depending on ReJSON version

* Fixes from Meir's review (1)

* update readies and increase no_output_timeout

* Add score to see ordering in failed test

* Fix test for coordinator

* Fix test for coordinator

* Fix test for coordinator (2)

* Fix tests (order and moduleArgs config param)

* Iterate a json array without parsing a jsonpath

* Skip score test in cluster

* Fixes from Meir's review (2) - single Env per test, run on coord

* Fixes from Meir's review (2) - revert timeout

* Try skip MULTI_TEXT_SLOP config tests on cluster

* Fix MULTI_TEXT_SLOP config tests on cluster

* Fix MULTI_TEXT_SLOP config tests on cluster (2)

* update readies

(cherry picked from commit 602ae4b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants