feat: Support unicode strings in search #1698

Merged: 2 commits, merged Aug 18, 2023


dranikpg (Contributor):

This PR adds support for Unicode strings in search. Text is now split at word boundaries using the ICU library, so Unicode sentences should be tokenized correctly. Added tests with simple sentences in four languages (Russian, German, Hebrew, and Greek).

Adds a dependency on libicu-dev.

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
royjacobson (Contributor):

Looks good!

royjacobson previously approved these changes Aug 14, 2023
kostasrim previously approved these changes Aug 15, 2023

kostasrim (Contributor) left a comment:


looks good 👨‍🍳

Two resolved review threads on src/core/search/indices.cc
Run({"hset", "d:1", "title", "Веселая СТРЕКОЗА Иван", "visits", "400"});
Run({"hset", "d:2", "title", "Die fröhliche Libelle Günther", "visits", "300"});
Run({"hset", "d:3", "title", "השפירית המהירה יעקב", "visits", "200"});
Run({"hset", "d:4", "title", "πανίσχυρη ΛΙΒΕΛΛΟΎΛΗ Δίας", "visits", "100"});
Contributor comment on the test lines above:

Greek! It means "strong DRAGONFLY Zeus" 🤣 Did you Google Translate this? (Loved ALL the easter eggs 👀)

dranikpg (Contributor, Author) replied:

Yes, I was just combining an adjective + dragonfly + a stereotypical name 🙂 It sounds like a children's story.

EXPECT_EQ(Run({"ft.create", "i1", "schema", "title", "text", "visits", "numeric"}), "OK");

// Explicitly using SCREAMING uppercase to check UTF-8 lowercase conversion
Run({"hset", "d:1", "title", "Веселая СТРЕКОЗА Иван", "visits", "400"});
Contributor comment:

"Merry DRAGONFLY Ivan" 🤣

Signed-off-by: Vladislav <vlad@dragonflydb.io>
dranikpg dismissed stale reviews from kostasrim and royjacobson via 538a0f8 August 18, 2023 06:46
dranikpg merged commit 5198622 into dragonflydb:main Aug 18, 2023
7 checks passed
dranikpg deleted the search-unicode branch August 18, 2023 19:01

3 participants