feat(search): Basic sorting #1941
16ebb1a
to
102495d
Compare
src/core/search/base.h
Outdated
@@ -58,4 +65,10 @@ struct BaseIndex {
  virtual void Remove(DocId id, DocumentAccessor* doc, std::string_view field) = 0;
};

// Base class for type-specific sorting indices.
struct BaseSortIndex : BaseIndex {
  virtual ~BaseSortIndex() = default;
btw, you don't have to add this if the parent class has a virtual destructor already
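A minimal sketch of the point above, assuming simplified stand-ins for the types in the diff: once a base class declares a virtual destructor, destructors of derived classes are implicitly virtual. `TrackedSortIndex` and `DestroyedThroughBasePointer` are hypothetical names used only for illustration.

```cpp
#include <memory>
#include <type_traits>

// Simplified stand-ins for the types in the diff; the real classes have
// more members.
struct BaseIndex {
  virtual ~BaseIndex() = default;
};

// No destructor is declared here: it is implicitly virtual because
// BaseIndex's destructor is virtual.
struct BaseSortIndex : BaseIndex {};

// Hypothetical leaf type that records its own destruction.
struct TrackedSortIndex : BaseSortIndex {
  explicit TrackedSortIndex(bool* destroyed) : destroyed_(destroyed) {}
  ~TrackedSortIndex() override { *destroyed_ = true; }
  bool* destroyed_;
};

// Deleting through a BaseSortIndex pointer still runs the leaf destructor.
inline bool DestroyedThroughBasePointer() {
  bool destroyed = false;
  {
    std::unique_ptr<BaseSortIndex> index = std::make_unique<TrackedSortIndex>(&destroyed);
  }
  return destroyed;
}

static_assert(std::has_virtual_destructor_v<BaseSortIndex>);
```

Redeclaring the destructor in `BaseSortIndex` is harmless, so this is a style nit rather than a correctness issue.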
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
102495d
to
f37c35e
Compare
// Sort ranges of 27 elements in reverse
for (size_t i = 0; i < 73; i++)
  EXPECT_THAT(Run({"ft.search", "i1", "*", "SORTBY", "ord", "DESC", "LIMIT", to_string(i),
I guess it's working well, just the comment line then :)
Didn't understand 🙂
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
decbf3c
to
8b304d6
Compare
max_memory_limit = INT_MAX;
This freaked me out, I get out of memory on the last loop. The test already starts at 150mb fresh 😵 Will look into it
Let us know then!
Nice work 👨‍🍳, I will go over it one more time over the next days but looks good 🚀
namespace dfly::search {

std::string_view QueryParams::operator[](std::string_view name) const {
Also, the std::string_view returned here might dangle if, for example, we call the later overloaded operator[] on a non-existing element (or if we insert any elements) in params, since the container might resize and the string itself is small enough to use SSO. Is this something that can happen?
I learnt this from you 👨‍🍳
I agree I dislike the interface a little, but this can't happen. The params are first parsed and then they're passed as a const ref to the query
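To make the exchange above concrete, here is a rough sketch under stated assumptions: `QueryParams` is reduced to a hypothetical flat list of name/value pairs, and `Describe` is an invented consumer. It is not the actual class, only an illustration of why a const reference rules out the inserting overload.

```cpp
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical simplification of QueryParams: a flat list of name/value pairs.
class QueryParams {
 public:
  // Read-only lookup: returns a view into a stored value, or an empty view.
  std::string_view operator[](std::string_view name) const {
    for (const auto& [key, value] : params_)
      if (key == name)
        return value;
    return {};
  }

  // Mutable lookup: inserts a missing entry, which may reallocate params_
  // and invalidate views returned earlier (short strings are stored inside
  // the string object itself because of the small string optimization).
  std::string& operator[](std::string_view name) {
    for (auto& [key, value] : params_)
      if (key == name)
        return value;
    return params_.emplace_back(std::string{name}, std::string{}).second;
  }

 private:
  std::vector<std::pair<std::string, std::string>> params_;
};

// The consumer only sees a const reference, so only the const overload is
// callable and no insertion can happen while views are alive.
inline std::string Describe(const QueryParams& params) {
  std::string_view k = params["k"];
  return std::string{k};
}
```

The safety here depends on the calling convention (parse first, then pass by const reference), not on the type itself, which matches the reservation about the interface.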
if (auto it = schema_.field_names.find(field); it != schema_.field_names.end())
  field = it->second;

auto it = sort_indices_.find(field);
A small fooooox appeared 🦊 Small nit, you can use contains
yes but I still have to read its value one line below 🤷🏻‍♂️
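A small sketch of the trade-off in this thread, assuming a hypothetical `SortIndexMap` with `int` values in place of the real container: `find` gives both the existence check and the value in one lookup, whereas `contains` followed by a read would look the key up twice.

```cpp
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical stand-in for the field-name to sort-index mapping; the real
// container and value types may differ.
using SortIndexMap = std::unordered_map<std::string, int>;

// find() performs a single lookup and yields the value; contains() followed
// by at() or operator[] would hash the key twice.
inline std::optional<int> LookupSortIndex(const SortIndexMap& indices, std::string_view field) {
  auto it = indices.find(std::string{field});
  if (it == indices.end())
    return std::nullopt;
  return it->second;
}
```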
template <typename T>
void SimpleValueSortIndex<T>::Remove(DocId id, DocumentAccessor* doc, std::string_view field) {
  values_[id] = T{};
Shouldn't we actually Remove the element here?
We can't 😄 It's just a vector, what should we put in its place?
template <typename T>
void SimpleValueSortIndex<T>::Add(DocId id, DocumentAccessor* doc, std::string_view field) {
  if (id >= values_.size())
    values_.resize(id + 1);
How do you know that id == id + 1. Why not just push_back?
Will add a DCHECK. Basically, search ids only ever grow at most by one. This allows creating efficient compact collections.
Resizing is more explicit in showing this
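The Add and Remove behaviour discussed in the two threads above can be sketched roughly as follows. This is a hypothetical simplification (`DenseSortValues` is an invented name, values are passed in directly instead of being read through a `DocumentAccessor`, and plain `assert` stands in for the promised DCHECK).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using DocId = std::uint32_t;

// Hypothetical simplification of SimpleValueSortIndex.
template <typename T> class DenseSortValues {
 public:
  void Add(DocId id, T value) {
    // Assumption taken from the discussion: ids grow by at most one, so the
    // vector never needs to grow by more than one slot at a time.
    assert(id <= values_.size());
    if (id >= values_.size())
      values_.resize(id + 1);
    values_[id] = value;
  }

  // A slot cannot be erased without shifting the ids after it, so removal
  // resets the slot to a default-constructed value instead.
  void Remove(DocId id) {
    values_[id] = T{};
  }

  const T& Get(DocId id) const {
    return values_[id];
  }

  std::size_t Size() const {
    return values_.size();
  }

 private:
  std::vector<T> values_;
};
```

Because ids index directly into the vector, a removed id can later be reused by `Add` without the vector growing.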
src/core/search/sort_indices.cc
Outdated
out->clear();
out->reserve(entries->size());
for (auto id : *entries)
  out->push_back(values_[id]);
out is used for scores_, is it ok that scores will be partially sorted as well? Based on my understanding it's fine, I am just double checking
Yes, actually on second thought we don't need to serialize all the scores, only the partial sort prefix 🤓 Will fix it
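One way to read "only the partial sort prefix" is sketched below. The function name, signature, and the use of `double` values are assumptions made for illustration and are not taken from the PR. Only the first `limit` ids are ordered, and only their values are copied out.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using DocId = std::uint32_t;

// Orders only the first `limit` ids by their sort value and returns the
// values for that prefix. Ids beyond the prefix are left in unspecified order.
inline std::vector<double> SortPrefix(const std::vector<double>& values, std::vector<DocId>* ids,
                                      std::size_t limit, bool desc) {
  limit = std::min(limit, ids->size());
  std::partial_sort(ids->begin(), ids->begin() + limit, ids->end(), [&](DocId l, DocId r) {
    return desc ? values[l] > values[r] : values[l] < values[r];
  });

  // Copy out values for the sorted prefix only, not for every matched id.
  std::vector<double> out;
  out.reserve(limit);
  for (std::size_t i = 0; i < limit; i++)
    out.push_back(values[(*ids)[i]]);
  return out;
}
```

With a small LIMIT and a large result set this avoids both fully sorting and fully copying the scores.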
Fixed everything 🔧
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
cc18351
to
663e23e
Compare
Adds SORTBY clause to sort search results