Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(search): Basic sorting #1941

Merged
merged 4 commits into from Oct 1, 2023
Merged

Conversation

dranikpg
Copy link
Contributor

@dranikpg dranikpg commented Sep 26, 2023

Adds SORTBY clause to sort search results

@@ -58,4 +65,10 @@ struct BaseIndex {
virtual void Remove(DocId id, DocumentAccessor* doc, std::string_view field) = 0;
};

// Base class for type-specific sorting indices.
struct BaseSortIndex : BaseIndex {
virtual ~BaseSortIndex() = default;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, you don't have to add this if the parent class has a virtual destructor already

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
@dranikpg dranikpg marked this pull request as ready for review September 27, 2023 19:08
src/core/search/base.cc Outdated Show resolved Hide resolved
src/core/search/base.cc Outdated Show resolved Hide resolved
src/core/search/base.h Show resolved Hide resolved
src/core/search/search.cc Outdated Show resolved Hide resolved
src/core/search/search.cc Outdated Show resolved Hide resolved
src/server/search/search_family.cc Show resolved Hide resolved
src/server/search/search_family.cc Outdated Show resolved Hide resolved

// Sort ranges of 27 elements in reverse
for (size_t i = 0; i < 73; i++)
EXPECT_THAT(Run({"ft.search", "i1", "*", "SORTBY", "ord", "DESC", "LIMIT", to_string(i),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's working well, just the comment line then :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't understand 🙂

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
Comment on lines 440 to 442

max_memory_limit = INT_MAX;

Copy link
Contributor Author

@dranikpg dranikpg Sep 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This freaked me out, I get out of memory on the last loop. The test already starts on 150mb fresh 😵 Will look into out

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us know then!

Copy link
Contributor

@kostasrim kostasrim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work 👨‍🍳, I will go over it one more time over the next days but looks good 🚀

src/core/search/base.h Outdated Show resolved Hide resolved
Comment on lines 440 to 442

max_memory_limit = INT_MAX;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us know then!

src/core/search/base.cc Show resolved Hide resolved

namespace dfly::search {

std::string_view QueryParams::operator[](std::string_view name) const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the std::string_view returned here might dangle, if for example we call the later overloaded operator [] on a non existing element (or if we insert any elements) in params (since the container might resize and the string itself is small enough to be an SSO). Is this something that can happen?

I learnt this from you 👨‍🍳

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree I dislike the interface a little, but this can't happen. The params are first parsed and they they're passed as a const ref to the query

src/server/search/search_family_test.cc Outdated Show resolved Hide resolved
if (auto it = schema_.field_names.find(field); it != schema_.field_names.end())
field = it->second;

auto it = sort_indices_.find(field);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small fooooox appeared 🦊 Small nit, you can use contains

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes but I still have to read it's value one line below 🤷🏻‍♂️

src/core/search/sort_indices.cc Outdated Show resolved Hide resolved

template <typename T>
void SimpleValueSortIndex<T>::Remove(DocId id, DocumentAccessor* doc, std::string_view field) {
values_[id] = T{};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we actually Remove the element here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't 😄 It's just a vector, what should we put in its place

template <typename T>
void SimpleValueSortIndex<T>::Add(DocId id, DocumentAccessor* doc, std::string_view field) {
if (id >= values_.size())
values_.resize(id + 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you know that id == id + 1.

Why not just push_back ?

Copy link
Contributor Author

@dranikpg dranikpg Sep 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add a DCHECK. Basically, search ids only ever grow at most by one. This allows creating efficient compact collections.

Resizing is more explicit in showing this

out->clear();
out->reserve(entries->size());
for (auto id : *entries)
out->push_back(values_[id]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out is used for scores_, is it ok, that scores will be partially sorted as well? Based on my understanding it's fine, I am just double checking

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, actually on a second thought we don't need to serialize all the scores, only the partial sort prefix 🤓 Will fix it

@dranikpg
Copy link
Contributor Author

Fixed everything 🔧

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
@dranikpg dranikpg merged commit ee8a661 into dragonflydb:main Oct 1, 2023
7 checks passed
@dranikpg dranikpg deleted the search-sort branch November 9, 2023 09:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants