-
Notifications
You must be signed in to change notification settings - Fork 883
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: json paths in search #1695
Conversation
src/server/search/doc_accessors.cc
Outdated
error_code ec; | ||
auto path = jsoncons::jsonpath::make_expression<JsonType>(active_field, ec); | ||
DCHECK(!ec); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Creating a json path for every read is wasteful, it should be created once upon validation and stored inside the field info (?)
b335f10
to
cc4f7f0
Compare
thread_local absl::flat_hash_map<std::string, std::unique_ptr<JsonAccessor::JsonPathContainer>> | ||
JsonAccessor::path_cache_; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've chosen to use this tl cache instead of polluting the core json parts with json stuff. Alternatively we can choose to either wire json to the core or use just a generic member like std::any access_helper_
to get by without a type
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
cc4f7f0
to
dab9c4a
Compare
@@ -107,6 +137,9 @@ SearchDocData JsonAccessor::Serialize(search::Schema schema) const { | |||
return out; | |||
} | |||
|
|||
thread_local absl::flat_hash_map<std::string, std::unique_ptr<JsonAccessor::JsonPathContainer>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need unique_ptr
for the value_type
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to pull in json dependencies into the header, so I forward declared JsonPathContainer. Actually pulling in json deps is not crucial as this header is used only in search internal files 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having global caches with no eviction and no limit seems dangerous, memory wise.
Could we turn this into an LRU? Or bind this caching to the lifetime of the query?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are field names defined by developers. Each schema has usually a few fields, and there are usually no more than a few indices per database. Having them occupy kilobytes of memory would require developers to type out kilobytes of field names which is really unlikely
We can clear this cache once indices are deleted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, they can't query for arbitrary fields outside of established indices? Somehow I thought it's possible but just slower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, its only used for accessing an index - and that index must have been explicitly described by the user
@@ -107,6 +137,9 @@ SearchDocData JsonAccessor::Serialize(search::Schema schema) const { | |||
return out; | |||
} | |||
|
|||
thread_local absl::flat_hash_map<std::string, std::unique_ptr<JsonAccessor::JsonPathContainer>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having global caches with no eviction and no limit seems dangerous, memory wise.
Could we turn this into an LRU? Or bind this caching to the lifetime of the query?
Co-authored-by: Roy Jacobson <roi.jacobson1@gmail.com> Signed-off-by: Vladislav <vladislav.oleshko@gmail.com>
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
This PR makes it possible to reference nested json fields in search documents and use short name alises (see tests)