
FT.SEARCH output incorrect (FT.ADD not creating tokens correctly) #168

Closed
arora-kushal opened this issue Sep 20, 2017 · 14 comments

@arora-kushal

arora-kushal commented Sep 20, 2017

Hi,

We are facing an issue with RediSearch 0.21.0: the FT.SEARCH command returns incorrect output.

Could anyone please help with this? The reproduction steps are below.

Thanks!

## Commands to set up:

"FT.CREATE" "VVV" "SCHEMA" "VVId_s" "TEXT" "SORTABLE" "PrimaryBIN_s" "TEXT" "SORTABLE" "StreetAddress_s" "TEXT" "SORTABLE" "PrimaryAddress_s" "TEXT" "SORTABLE" "CreationDate_l" "NUMERIC" "SORTABLE" "NormalizedSeverity_s" "TEXT" "SORTABLE" "SourceSeverity_s" "TEXT" "SORTABLE" "ClosedDate_l" "NUMERIC" "SORTABLE" "Description_s" "TEXT" "SORTABLE" "RemediationAgency_s" "TEXT" "SORTABLE" "Group_s" "TEXT" "SORTABLE" "SiteCompliId_l" "NUMERIC" "SORTABLE" "Bin_s" "TEXT" "SORTABLE" "Conditions_s" "TEXT" "SORTABLE" "Owned_s" "TEXT" "SORTABLE" "IsCondition_s" "TEXT" "SORTABLE" "VVCost_s" "TEXT" "SORTABLE" "CAPTracker_s" "TEXT" "SORTABLE" "Fixer_s" "TEXT" "SORTABLE" "OverallStatus_s" "TEXT" "SORTABLE" "CombinedStatus_s" "TEXT" "SORTABLE" "IsExcluded_s" "TEXT" "SORTABLE" "UnitNumber_s" "TEXT" "SORTABLE" "Borough_s" "TEXT" "SORTABLE" "DateAddedtoBCS_l" "NUMERIC" "SORTABLE" "UpdatedBy_s" "TEXT" "SORTABLE" "UpdatedAt_l" "NUMERIC" "SORTABLE" "Created_At_l" "NUMERIC" "SORTABLE" "Created_At_Date_l" "NUMERIC" "SORTABLE" "RemediationOwner_s" "TEXT" "SORTABLE" "VVType_s" "TEXT" "SORTABLE" "IsActive_l" "NUMERIC" "SORTABLE" "IsActiveText_s" "TEXT" "SORTABLE"

"FT.ADD" "VVV" "ss:d:{#ClientList}50" "1" "FIELDS" "VVId_s" "123456789K" "PrimaryBIN_s" "2" "StreetAddress_s" "1800" "PrimaryAddress_s" "1800" "CreationDate_l" "6" "NormalizedSeverity_s" "days" "SourceSeverity_s" "NO" "ClosedDate_l" "222" "Description_s" "Work" "RemediationAgency_s" "U" "Group_s" "2" "SiteCompliId_l" "0" "Bin_s" "2" "Conditions_s" "2" "Owned_s" "Private" "IsCondition_s" "No" "VVCost_s" "2" "CAPTracker_s" "2" "Fixer_s" "2" "OverallStatus_s" "O" "CombinedStatus_s" "Open" "IsExcluded_s" "No" "UnitNumber_s" "2" "Borough_s" "BBBB" "DateAddedtoBCS_l" "232" "UpdatedBy_s" "2" "UpdatedAt_l" "32" "Created_At_l" "6" "Created_At_Date_l" "6" "RemediationOwner_s" "U" "VVType_s" "C" "IsActive_l" "2" "IsActiveText_s" "Yes"

Search Command : "FT.SEARCH" "VVV" "@IsActiveText_s:(Yes)"
Expected output : "ss:d:{#ClientList}50"
Actual Output (INCORRECT): 0

However, if the above commands use fewer fields, the search gives the correct output. For example:

"FT.CREATE" "VVV" "SCHEMA" "VVId_s" "TEXT" "SORTABLE" "IsActiveText_s" "TEXT" "SORTABLE"

"FT.ADD" "VVV" "ss:d:{#ClientList}50" "1" "FIELDS" "VVId_s" "123456789K" "IsActiveText_s" "Yes"

Search Command : "FT.SEARCH" "VVV" "@IsActiveText_s:(Yes)"
Expected output : "ss:d:{#ClientList}50"
Actual Output (CORRECT): "ss:d:{#ClientList}50"

@dvirsky
Contributor

dvirsky commented Sep 20, 2017

The index is limited to 32 text fields and unlimited numeric fields. However, if some of them are used only for sorting, I can release a fix that makes them not count toward those 32. Will that work for you?

@dvirsky
Contributor

dvirsky commented Sep 20, 2017

You can maybe solve it by indexing the yes/no fields as numeric rather than text, for example.
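As a sketch of that workaround (assuming a hypothetical `IsActive_l` field holding 0/1 in place of `IsActiveText_s` holding "No"/"Yes"), the schema and query might look like:

```
FT.CREATE VVV SCHEMA "VVId_s" TEXT SORTABLE "IsActive_l" NUMERIC SORTABLE
FT.ADD VVV "ss:d:{#ClientList}50" 1 FIELDS "VVId_s" "123456789K" "IsActive_l" 1
FT.SEARCH VVV "@IsActive_l:[1 1]"
```

Numeric fields don't count against the text-field limit, and the range filter `[1 1]` matches documents where the value is exactly 1.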

@arora-kushal
Author

Thanks, Dvir, for your prompt response. Yes, it would be great if you could release a fix that makes them not count toward those 32.

Also, in the above example there are 25 text fields and 8 numeric fields, so ideally it should work (since numeric fields would not be counted)?

@dvirsky
Contributor

dvirsky commented Sep 20, 2017

The numeric fields might be mistakenly counted in those 32 as well; let me check. In any case, it should not fail silently. I'll try to release a fix for this ASAP.

@dvirsky
Contributor

dvirsky commented Sep 20, 2017

Yeah, there was a bug and they were counted together. I've fixed it and am now writing a test for a huge schema. It should be pushed soon so you can try it.

@dvirsky
Contributor

dvirsky commented Sep 20, 2017

OK, it should work now. Please pull master and try.

@arora-kushal
Author

Yes, it's working fine now. Thanks!

@dvirsky
Contributor

dvirsky commented Sep 21, 2017

@arora-kushal nice! There are tests for a big schema now (64 fields), and the actual limit is 1024 fields (I should document it somewhere). Also, it no longer fails silently: it tells you if there are more than 32 text fields.

I would have allowed more, but I'm marking the field ids for each term in each document with a bitmask that is either 8, 16, 24, or 32 bits, to allow filtering on many fields at constant speed. The encoding scheme doesn't allow more than 32 bits for the field mask at the moment, and most people will probably not use more than 8.
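The field-mask idea above can be illustrated with a small Python sketch. This is only an illustration of the principle, not RediSearch's actual encoding: each text field gets one bit, a term's entry stores the OR of the bits of the fields it appeared in, and a field filter is a single AND.

```python
# Illustration only: one bit per text field, so a 32-bit mask
# can distinguish at most 32 text fields.
FIELDS = ["Borough_s", "CombinedStatus_s", "IsActiveText_s"]
field_bit = {name: 1 << i for i, name in enumerate(FIELDS)}

# Suppose the term "Open" appears only in CombinedStatus_s for some doc:
term_mask = field_bit["CombinedStatus_s"]

def term_in_field(mask, field):
    """Constant-time check: is this field's bit set in the term's mask?"""
    return bool(mask & field_bit[field])

print(term_in_field(term_mask, "CombinedStatus_s"))  # True
print(term_in_field(term_mask, "Borough_s"))         # False
```

With this scheme, widening the mask past 32 bits would change the on-disk encoding of every term entry, which is why the limit sits where it does.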

I do, however, want to add a new type of field, which I call a "tag field", much like indexing a VARCHAR field in SQL. It will be just like a text field, but not tokenized, and it cannot be referred to without the field name, so it doesn't need a bit in the mask. This will allow unlimited fields for things like (in your schema) Borough_s, RemediationAgency_s, etc., where you don't just search for the value but filter on it.

@arora-kushal
Author

Thanks for explaining it in detail. Looking forward to this solution, as it will be very useful for us.

@arora-kushal
Author

Hi Dvir,

Do you have any update on this?

Actually, we are building a solution with a schema of more than 32 text fields that requires filtering, so the solution you mentioned above would be very useful for us.

@dvirsky
Contributor

dvirsky commented Oct 16, 2017

@arora-kushal Hi, it's not ready yet; I've started implementing it but have been tied up with other stuff. I don't want to promise anything, but it's coming soon.

@gauravgoel151

We are building a similar solution with around 60 text fields on which searching/filtering can be applied.
Until the above solution is ready, a possible alternative could be the following:

We will create two indexes for the schema:
1) Index 1: containing 32 text fields, of which 31 are data fields and 1 acts as a pointer to the key in Index 2.
2) Index 2: for the remaining fields.
3) Collating the search results inside a higher-level module and returning them to the client.

Any suggestions will be highly appreciated.

Any suggestions will be highly appreciated.

@gauravgoel151

@dvirsky Could you please tell us whether we are thinking in the right direction, or whether there is a better workaround?

@dvirsky
Contributor

dvirsky commented Oct 25, 2017

It's a hack, indeed, but it will work.
I'd do the following:

  1. In one of the indexes, use NOSAVE so it only saves ids, to avoid duplicate documents etc.
  2. When searching, first query the index with no documents in it, using NOCONTENT and a relatively large pagination limit. You can only do that from the client, because it is a concurrent command and won't work from Lua or another module. This will give you a long list of ids.
  3. Then take the ids you received from the first query and use INKEYS when querying the second index.

As long as there aren't too many results to load on the first query (a few thousand should be fine), it will work, and it can be a good temporary solution.
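Concretely, the steps above might look something like this sketch (hypothetical index, document, and field names; `NOSAVE` is an option of FT.ADD in this RediSearch version):

```
# Index 2: NOSAVE indexes the fields but stores no document content
FT.ADD idx2 "doc:1" 1 NOSAVE FIELDS "ExtraField_s" "foo"

# Step 1: ids only, with a relatively large pagination limit
FT.SEARCH idx2 "@ExtraField_s:(foo)" NOCONTENT LIMIT 0 10000

# Step 2: restrict the query on the first index to those ids
FT.SEARCH idx1 "@CombinedStatus_s:(Open)" INKEYS 2 "doc:1" "doc:7"
```

The `INKEYS {num} {key} ...` clause limits the second query to the id list returned by the first, which is what makes the two indexes behave roughly like one.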
