MOD-3283: Initial support for multi-value numeric #2985
Codecov Report
@@ Coverage Diff @@
## master #2985 +/- ##
==========================================
- Coverage 82.08% 81.91% -0.17%
==========================================
Files 180 181 +1
Lines 29713 29960 +247
==========================================
+ Hits 24389 24543 +154
- Misses 5324 5417 +93
Nice work!
Some small comments.
LGTM 👍
@@ -351,11 +351,16 @@ void Document_Clear(Document *d) {
        rm_free(field->strval);
        break;
      case FLD_VAR_T_ARRAY:
        for (int i = 0; i < field->arrayLen; ++i) {
          rm_free(field->multiVal[i]);
Previously we always freed multiVal; now there is a case where we do not free it at all. Is that OK?
We free it for all currently supported array types: TEXT, TAG, and now NUMERIC (in the else clause).
TEXT and TAG use the field's multiVal, and NUMERIC uses the field's arrNumval.
Can we reach this code and not enter one of those conditions?
No. If we reach this code, we enter one of those conditions.
We use unionType FLD_VAR_T_ARRAY only with TEXT, TAG, and NUMERIC (VECTOR uses FLD_VAR_T_CSTR, so it does not reach here).
We can add an assert.
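The cleanup logic discussed in this thread could be sketched roughly as follows. This is a standalone illustration, not the code in the PR: the `Sketch*` types and `SketchArrayField_Clear` are hypothetical stand-ins, and plain `free` replaces `rm_free`. Only the member names `multiVal` and `arrNumval` and the idea of asserting the remaining case come from the discussion.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the real field types (illustration only). */
typedef enum { FIELD_TEXT, FIELD_TAG, FIELD_NUMERIC, FIELD_VECTOR } SketchFieldType;

typedef struct {
  SketchFieldType type;
  size_t arrayLen;
  char **multiVal;   /* payload for TEXT and TAG arrays */
  double *arrNumval; /* payload for NUMERIC arrays */
} SketchArrayField;

/* Releases the array payload and returns the number of heap blocks freed. */
static size_t SketchArrayField_Clear(SketchArrayField *f) {
  size_t freed = 0;
  if (f->type == FIELD_TEXT || f->type == FIELD_TAG) {
    for (size_t i = 0; i < f->arrayLen; ++i) {
      free(f->multiVal[i]);
      ++freed;
    }
    free(f->multiVal);
    ++freed;
    f->multiVal = NULL;
  } else {
    /* Only NUMERIC is expected here; any other type is a logic error. */
    assert(f->type == FIELD_NUMERIC);
    free(f->arrNumval);
    ++freed;
    f->arrNumval = NULL;
  }
  f->arrayLen = 0;
  return freed;
}
```

With the assert in place, a new array-backed field type that forgets to add its own cleanup branch fails loudly in debug builds instead of silently leaking.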
(split NumericRange by InvertedIndex numEntries instead of numDocs)
a few last comments
Added one question, but not critical.
* WIP [skip ci] allow zero as doc id delta
* WIP [skip ci] fix encoder, prepare FieldIndexerData
* WIP [skip ci] ingest multi values and avoid returning same doc
* add missing init + adjust testMultiValueErrors
* avoid test failure due to slow background indexing on CI
* add basic tests
* Per review by Ariel
* add range tests
* Fix duplicated results
* skip tests with FT.DEBUG on cluster
* remove old TODO
* Skip writing/reading zero delta to/from buffer
* add debug dump for NumericRangeTree
* Add testInvertedIndexMultipleBlocks
* Add missing free in InvertedIndex_Dump
* add cpp test testNumericEncodingMulti and fix cpp tests with inverted index with zero delta
* add cpp test testRangeIteratorMulti
* Skip test with DEBUG command on cluster
* Update TotalIIBlocks in GC
* Fix GC with multi and add test
* add test for SORTBY
* Cleanup commented out printf/debug code
* Fix FT.INFO num_records and adjust GC
* fix test for coord
* split NumericRange by InvertedIndex numEntries instead of numDocs
* improve coverage of NumericRangeNode_Balance
* Avoid calling DocTable_Exists for the same doc
* Add subcommand FT.DEBUG DUMP_NUMIDXTREE
* Add test for num_records
* Remove unused member `card` of NumericRangeTree
* Fix testRangeIteratorMulti for change in commit a982d18 (split NumericRange by InvertedIndex numEntries instead of numDocs)
* fix per code review
* Add numEntries to IndexBlock and split by it
* Cosmetic fixes per Ariel's review
(cherry picked from commit e81502d)
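Several of the commits above ("allow zero as doc id delta", "avoid returning same doc", "Fix duplicated results") revolve around one idea: a document with several numeric values produces several index entries with the same doc ID, so the delta between consecutive entries can be zero and a reader should still return the document only once. A minimal sketch of that idea, assuming doc IDs start at 1 and using hypothetical names (`SketchEntry`, `Sketch_EncodeDeltas`, `Sketch_MatchRange`) that are not taken from the PR:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical index entry: one (docId, value) pair per numeric value. */
typedef struct {
  uint64_t docId;
  double value;
} SketchEntry;

/* Encodes doc IDs (sorted ascending) as deltas from the previous entry.
 * A zero delta means "same document as the previous entry". */
static void Sketch_EncodeDeltas(const SketchEntry *entries, size_t n,
                                uint64_t *deltas) {
  uint64_t prev = 0;
  for (size_t i = 0; i < n; ++i) {
    assert(entries[i].docId >= prev); /* input must be sorted by docId */
    deltas[i] = entries[i].docId - prev;
    prev = entries[i].docId;
  }
}

/* Decodes the deltas and collects each doc ID that has at least one value
 * in [min, max]. A doc ID is reported once even if several values match. */
static size_t Sketch_MatchRange(const uint64_t *deltas, const double *values,
                                size_t n, double min, double max,
                                uint64_t *outDocs) {
  size_t found = 0;
  uint64_t docId = 0;
  uint64_t lastReturned = 0; /* assumption: 0 is never a valid doc ID */
  for (size_t i = 0; i < n; ++i) {
    docId += deltas[i];
    if (values[i] < min || values[i] > max) continue;
    if (docId == lastReturned) continue; /* already reported this doc */
    outDocs[found++] = docId;
    lastReturned = docId;
  }
  return found;
}
```

For entries (1, 5), (1, 15), (1, 17), (2, 30), (3, 12), the deltas are 1, 0, 0, 1, 1, and a query for the range [10, 20] reports documents 1 and 3, each once.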
Adding support for indexing and searching multiple numeric values.
Multiple values could be encountered with JSONPath leading to an array, or using JSONPath operators such as wildcard, recursive descent, slices, etc.
Multiple numeric values mixed with non-numeric or non-scalar values cause an indexing failure, except for null values, which are skipped.
Sort - By first value
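The ingestion rule described above (numbers are kept, nulls are skipped, anything else fails indexing) could look roughly like this. The value representation and the function `Sketch_CollectNumerics` are hypothetical and only illustrate the rule; the PR works on JSON values through its own API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tagged value standing in for one JSON value under the path. */
typedef enum { VAL_NUMBER, VAL_NULL, VAL_STRING, VAL_OBJECT } SketchValueKind;

typedef struct {
  SketchValueKind kind;
  double num; /* meaningful only when kind == VAL_NUMBER */
} SketchValue;

/* Copies the numeric values into out (capacity must be at least n).
 * Returns the number of values kept, or -1 if indexing should fail. */
static int Sketch_CollectNumerics(const SketchValue *vals, size_t n,
                                  double *out) {
  assert(vals != NULL || n == 0);
  int count = 0;
  for (size_t i = 0; i < n; ++i) {
    switch (vals[i].kind) {
      case VAL_NUMBER:
        out[count++] = vals[i].num;
        break;
      case VAL_NULL:
        break; /* nulls are skipped, not an error */
      default:
        return -1; /* non-numeric or non-scalar value: fail indexing */
    }
  }
  return count;
}
```

Under the "sort by first value" rule, the sort key in this sketch would be `out[0]` whenever the returned count is greater than zero.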
Other considerations:
encodeNumeric and decoder readNumeric
EncodingHeader and NumEncodingCommon for details
InvertedIndex_WriteEntryGeneric
IndexReader_SkipToBlock (and maybe more)
at_least(2) with range [10,20] should match value as [15, 15] or just something like [15, 17]
Any predicate
khash_t)
Related fixes:
FT.INFO return value total_inverted_index_blocks which was not updated in GC, also for HASH and single value JSON (see python test checkInfoAndGC)
Followup PRs:
FT.SEARCH idx:all '-@val:[-inf (-20] -@val:[(100 +inf]'
NumericIndexType_RdbSave and NumericIndexType_RdbLoad