Implement subset of RediSearch #431

Open
romange opened this issue Oct 24, 2022 · 14 comments
@romange
Collaborator

romange commented Oct 24, 2022

Quoting a nodejs user, they want to create an index like this:

import { SchemaFieldTypes } from 'redis';
import { redis } from '..';

redis.ft
    .CREATE(
        'index:channels',
        {
            '$.id': {
                type: SchemaFieldTypes.TEXT,
                AS: 'id',
                SORTABLE: true,
            },
            '$.guildId': {
                type: SchemaFieldTypes.TEXT,
                AS: 'guildId',
                SORTABLE: true,
            },
            '$.position': {
                type: SchemaFieldTypes.NUMERIC as any,
                AS: 'position',
                SORTABLE: true,
            },
        },
        {
            ON: 'JSON',
            PREFIX: 'channels',
        }
    )
    .catch(() => null);

and then be able to query it like this:

await utils.redisSearch('index:channels', `@guildId:${id}`);

Note: we do not need full-text search, stemming, query rewriting, or other language-related features.
Instead, this task is about formal, semi-structured querying that will provide a lot of value for folks who use RedisJSON.

The task is a super task that should be broken down into smaller sub-projects:

  1. Auto indexing (FT.CREATE)
  2. Building a query AST with all the operators we support.
  3. Executing a query without query plan optimizations.
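
For illustration only, sub-projects 2 and 3 could be sketched roughly as follows. This is a minimal TypeScript sketch under stated assumptions, not Dragonfly's actual design: the `Query` node shapes and the `matches` and `search` helpers are hypothetical names, only exact-match and numeric-range operators are modeled, and the executor scans every document with no index and no query planning.

```typescript
// Hypothetical AST for a small structured-query subset (names are illustrative).
type Query =
  | { kind: 'term'; field: string; values: string[] }          // @field:a|b|c
  | { kind: 'range'; field: string; min: number; max: number } // @field:[min max]
  | { kind: 'and'; children: Query[] };                        // clauses separated by spaces

type Doc = Record<string, string | number>;
type Entry = { key: string; doc: Doc };

// Evaluate one AST node against one document.
function matches(q: Query, doc: Doc): boolean {
  switch (q.kind) {
    case 'term': {
      const v = doc[q.field];
      return typeof v === 'string' && q.values.indexOf(v) !== -1;
    }
    case 'range': {
      const v = doc[q.field];
      return typeof v === 'number' && v >= q.min && v <= q.max;
    }
    case 'and':
      return q.children.every((child) => matches(child, doc));
  }
}

// Naive execution: a full scan that returns the keys of matching documents.
function search(q: Query, entries: Entry[]): string[] {
  return entries.filter((e) => matches(q, e.doc)).map((e) => e.key);
}
```

A query such as `@guildId:g1` would then be represented as `{ kind: 'term', field: 'guildId', values: ['g1'] }`, and an index would later replace the full scan inside `search` without changing the AST.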
@iko1
Contributor

iko1 commented Oct 26, 2022

I think you meant to write RediSearch in the title, not JsonSearch.

@romange
Collaborator Author

romange commented Oct 27, 2022

You are right, but since the subset of functionality I want to focus on is within JSON, this mistake makes sense 😄

romange changed the title from "Implement subset of JsonSearch" to "Implement subset of RediSearch" on Apr 29, 2023
@romange
Collaborator Author

romange commented Apr 29, 2023

Could be a great MVP for the query part

@sirfz

sirfz commented May 6, 2023

I'm not a RediSearch user (yet) but have been very interested in it recently as it seems to be exactly what I need for my use case.

In particular, the vector similarity search can be a killer feature to have in dragonfly.

In general, RediSearch seems to be an all around great feature and having it in dragonfly, in my opinion, would bring lots of adoption. Just my 2 cents

@totorofly

My project heavily utilizes the combination of RediSearch and RedisJSON, requiring roughly 100-300 FT.SEARCH commands per second across 2-3 million records to obtain results that meet various conditions. Additionally, the TTL of my 200-300 million records is only around 180-480 seconds, meaning the load on both writing and reading (FT.SEARCH, with the requirement that the average query result returns within 300ms) on the Redis cluster is quite high. As a result, I had to build a Redis cluster to meet these demands, which makes the overall maintenance cost relatively high. Therefore, I'm looking for an architecture that can achieve the same result at a lower cost. If DragonFlyDB can provide full-text search capabilities similar to RediSearch and RedisJSON, I would be willing to give it a try.

@romange
Collaborator Author

romange commented May 9, 2023

Can you provide an example for a typical query that you send? Do you need word stemming, multiple languages support in full text search?

@totorofly

totorofly commented May 9, 2023

Can you provide an example for a typical query that you send? Do you need word stemming, multiple languages support in full text search?

I currently do not need to use stemming because my project is mainly to help users match mobile phone numbers related to their favorite numbers. Since the matching is all about numeric strings, even if it is a Chinese project, I don't need to use Chinese, just numbers and English letters. Here is my search example:
FT.SEARCH tm '@hitMassRuleId:lastABABAB|anyABCDABCD|lastAAABBB|lastAABBCC|lastABCABC|lastABCDDBCAXXX|anyAABBCC|anyAAABBB|lastAAAAB|lastAAAAA|anyABCDEF|anyAAAAA|lastAABBB|lastABCDABDCXXX|lastABCDBACD|lastABCDBACDXXX|lastABCDDCBA|lastAAAA|lastABCDACBDXXX|lastABBA|lastABBCBB|lastABCDABDC|anyAAAA|lastAABB|anyABABAB|anyABCABC|anyAAAAB|midAAAA|anyAAABB|anyABBCBB|lastABABtu368|anyAABBB|lastABBB|lastABABtu613|lastABAB|lastABABtu850|midBAAA|lastABCD|midAAAB|lastAABCC|midAABB|lastBrithYear758799|lastAAAB|midABCD|midABAB|any888|any666|lastAABAAXXX|lastABCCBAXX|anyABAB|anyABABtu368|anyBrithYear758799|anyABBA|anyAABB|anyABABtu850|anyABABtu613|lastABB|anyABCD|lastXAXAXAXA|lastXAXAXA|lastABAC|lastAXAXAX|anyAAA|head1889|lastABACAD|anyABBCDD|lastAXAXAXAX @ttlInSecond:[1683364311 +inf] @providerCode:jyxf @status:1 @preOrderTime:[-inf (1683364009] @touchCode:P0000008 @province:beijing @city:beijing @last4:7777' LIMIT 0 1
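
For readers unfamiliar with the range clauses in the query above: in RediSearch syntax, `[min max]` is inclusive by default, a leading `(` makes a bound exclusive, and `-inf` / `+inf` leave a side open. The following is a minimal TypeScript sketch of parsing and evaluating one such clause. The `parseBound`, `parseNumericRange`, and `inRange` helpers are hypothetical names introduced for illustration, not part of any library.

```typescript
interface Bound { value: number; exclusive: boolean }
interface NumericRange { field: string; min: Bound; max: Bound }

// Parse one bound token such as "1683364311", "(1683364009", "-inf" or "+inf".
function parseBound(token: string): Bound {
  const exclusive = token.charAt(0) === '(';
  const raw = exclusive ? token.slice(1) : token;
  let value: number;
  if (raw === '+inf' || raw === 'inf') value = Infinity;
  else if (raw === '-inf') value = -Infinity;
  else value = raw === '' ? NaN : Number(raw);
  if (isNaN(value)) throw new Error(`bad range bound: ${token}`);
  return { value, exclusive };
}

// Parse a single clause of the form "@field:[min max]".
function parseNumericRange(clause: string): NumericRange {
  const m = /^@(\w+):\[(\S+)\s+(\S+)\]$/.exec(clause.trim());
  if (!m) throw new Error(`not a numeric range clause: ${clause}`);
  return { field: m[1], min: parseBound(m[2]), max: parseBound(m[3]) };
}

// Check whether a number satisfies the parsed range.
function inRange(x: number, r: NumericRange): boolean {
  const aboveMin = r.min.exclusive ? x > r.min.value : x >= r.min.value;
  const belowMax = r.max.exclusive ? x < r.max.value : x <= r.max.value;
  return aboveMin && belowMax;
}
```

With this sketch, `@preOrderTime:[-inf (1683364009]` accepts any value strictly below 1683364009, while `@ttlInSecond:[1683364311 +inf]` accepts 1683364311 and anything above it.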

@romange
Collaborator Author

romange commented May 9, 2023 via email

Looks like a structured search. I do not see any full-text search requirements here, but maybe I am missing something.

@totorofly

I use full-text search by decomposing all possible combinations of a phone number into individual keys in a JSON object and then setting these JSON keys as index values in RediSearch. For example, in order to match phone numbers similar to a user's license plate number, I specifically decompose various possible combinations of 5 consecutive digits of the target phone number, as follows:

{
    "phone": "13085669245",
    "status": "1",
    "owner": "",
    "ttlInSecond": 0,
    "preOrderTime": 0,
    "providerCode": "beijing",
    "rule_car": {
        "any5": "69245",
        "any4": "X9245,6X245,69X45,692X5,6924X",
        "any3": "XX245,X9X45,X92X5,X924X,6XX45,6X2X5,6X24X,69XX5,69X4X,692XX",
        "tail5": "69245",
        "tail4": "9245",
        "tail3": "245",
        "continuous5": "66924,56692,85669,08566,30856,13085",
        "continuous4": "6924,6692,5669,8566,0856,3085,1308",
        "continuous3": "924,692,669,566,856,085,308,130"
    }
}
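
The decomposition itself is not shown in the thread, but the `rule_car` values above can be reproduced with a short TypeScript sketch. The rules here are inferred from this single example and are an assumption: `anyN` masks the last five digits with `X` wildcards, `tailN` takes the last N digits, and `continuousN` lists every N-digit window except the tail, from right to left. The function names are hypothetical.

```typescript
// All N-digit windows except the trailing one, listed from right to left.
function windowsExcludingTail(digits: string, n: number): string[] {
  const out: string[] = [];
  for (let start = digits.length - n - 1; start >= 0; start--) {
    out.push(digits.slice(start, start + n));
  }
  return out;
}

// Every way to replace exactly `wildcards` positions of `s` with 'X'.
function maskCombinations(s: string, wildcards: number): string[] {
  const out: string[] = [];
  function pick(start: number, chosen: number[]): void {
    if (chosen.length === wildcards) {
      out.push(s.split('').map((ch, i) => (chosen.indexOf(i) !== -1 ? 'X' : ch)).join(''));
      return;
    }
    for (let i = start; i < s.length; i++) pick(i + 1, chosen.concat(i));
  }
  pick(0, []);
  return out;
}

// Build the rule_car object for one phone number (inferred rules, see above).
function buildRuleCar(phone: string): Record<string, string> {
  const tail5 = phone.slice(-5);
  return {
    any5: tail5,
    any4: maskCombinations(tail5, 1).join(','),
    any3: maskCombinations(tail5, 2).join(','),
    tail5: tail5,
    tail4: phone.slice(-4),
    tail3: phone.slice(-3),
    continuous5: windowsExcludingTail(phone, 5).join(','),
    continuous4: windowsExcludingTail(phone, 4).join(','),
    continuous3: windowsExcludingTail(phone, 3).join(','),
  };
}
```

Each generated value can then be indexed as its own field, so a pattern match becomes an exact-match lookup rather than a substring scan.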

@sirfz

sirfz commented May 9, 2023

In my case, fulltext search is the least interesting feature of RediSearch. I'm more interested in running queries like:

FT.SEARCH items-index "(@brand:xxx @model:xxx)=>[KNN 10 @vector $vector as score]" ...

which translates to something like: for all items that match brand xxx and model xxx, get me the top 10 closest items to the given $vector. Stemming/normalization could be useful for the attribute filters, I guess, but the power is more about searching on multiple attributes and returning other attributes/columns (and of course the vector similarity search is great).
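
To make the semantics of that hybrid query concrete, here is a minimal brute-force TypeScript sketch: filter on the attributes first, then rank the survivors by distance to the query vector and keep the top k. It assumes Euclidean (L2) distance, whereas the real metric depends on how the vector field is declared. The `Item` shape and the `knnWithFilter` helper are hypothetical, and a real engine would use a vector index rather than a full scan.

```typescript
interface Item { id: string; brand: string; model: string; vector: number[] }
interface Scored { id: string; score: number }

// Euclidean distance between two vectors of equal length.
function l2Distance(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Pre-filter by attributes, then return the k nearest items (smallest score first).
function knnWithFilter(
  items: Item[],
  brand: string,
  model: string,
  query: number[],
  k: number
): Scored[] {
  return items
    .filter((it) => it.brand === brand && it.model === model)
    .map((it) => ({ id: it.id, score: l2Distance(it.vector, query) }))
    .sort((x, y) => x.score - y.score)
    .slice(0, k);
}
```

Items that fail the attribute filter never reach the distance computation, which is what the `(filter)=>[KNN ...]` form expresses.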

@totorofly

In Redis Cluster mode, I am unable to simultaneously call FT.SEARCH and JSON.SET operations within a single Lua script, because doing so involves different hosts and different slots, and cross-slot combination operations are not supported. This is one of the areas where I think Redis Cluster mode is not as perfect as it could be.

@romange
Collaborator Author

romange commented Jun 4, 2023

@sirfz hey, can you DM me on Discord? I am curious to hear more about your use case.

@romange
Collaborator Author

romange commented Sep 7, 2023

@sirfz @dwzkit we will have an experimental version of FT.SEARCH in v1.10 (next release).
Would you like to try it out?

@totorofly

@sirfz @dwzkit we will have an experimental version of FT.SEARCH in v1.10 (next release). Would you like to try it out?

I'm sorry, I've been busy with other projects recently and may not have time to experiment for the next two months.
