feat: json paths in search #1695

Merged: @dranikpg merged 5 commits into dragonflydb:main from search-json-paths on Aug 18, 2023

@dranikpg (Contributor) commented Aug 13, 2023

This PR makes it possible to reference nested JSON fields in search documents and to use short name aliases (see tests).

Comment on lines 92 to 94
error_code ec;
auto path = jsoncons::jsonpath::make_expression<JsonType>(active_field, ec);
DCHECK(!ec);
Contributor Author

Creating a JSON path for every read is wasteful; it should be created once upon validation and stored inside the field info (?)
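
A minimal sketch of the compile-once idea described above, under stated assumptions: `CompiledPath`, `CompilePath`, `FieldInfo`, `ValidateField`, and `ReadDepth` are hypothetical names, and a naive dotted-path splitter stands in for `jsoncons::jsonpath::make_expression` so the example needs only the standard library. It is not the code from this PR.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for a compiled JSON path expression.
struct CompiledPath {
  std::vector<std::string> segments;
};

// Counts compilations so the saving is observable.
inline int g_compile_calls = 0;

// Naive compiler for paths of the form "$.a.b"; returns nullopt on bad input.
inline std::optional<CompiledPath> CompilePath(std::string_view path) {
  ++g_compile_calls;
  if (path.size() < 3 || path.substr(0, 2) != "$.")
    return std::nullopt;
  CompiledPath out;
  path.remove_prefix(2);
  while (true) {
    std::size_t dot = path.find('.');
    std::string_view seg = path.substr(0, dot);
    if (seg.empty())
      return std::nullopt;
    out.segments.emplace_back(seg);
    if (dot == std::string_view::npos)
      break;
    path.remove_prefix(dot + 1);
  }
  return out;
}

// Hypothetical field info: the path is compiled once, at validation time.
struct FieldInfo {
  std::string identifier;  // e.g. "$.user.name"
  CompiledPath path;       // reused by every read
};

inline std::optional<FieldInfo> ValidateField(std::string identifier) {
  auto compiled = CompilePath(identifier);
  if (!compiled)
    return std::nullopt;
  return FieldInfo{std::move(identifier), std::move(*compiled)};
}

// A read only touches the stored path; nothing is recompiled.
inline std::size_t ReadDepth(const FieldInfo& field) {
  return field.path.segments.size();
}
```

With this layout, an invalid path is rejected when the field is validated, and repeated reads reuse `FieldInfo::path` without calling `CompilePath` again.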

Comment on lines +140 to +142
thread_local absl::flat_hash_map<std::string, std::unique_ptr<JsonAccessor::JsonPathContainer>>
JsonAccessor::path_cache_;

Contributor Author

I've chosen to use this thread-local cache instead of polluting the core search parts with JSON stuff. Alternatively, we can either wire JSON into the core or use a generic member like std::any access_helper_ to get by without a concrete type.
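
For comparison, the `std::any` alternative mentioned above could look roughly like the sketch below. Only the member name `access_helper_` comes from the comment; `CoreFieldInfo`, `JsonPathHelper`, `AttachJsonHelper`, and `GetJsonHelper` are hypothetical names chosen for illustration.

```cpp
#include <any>
#include <string>
#include <utility>
#include <vector>

// "Core" side: knows nothing about JSON, only carries an opaque helper.
struct CoreFieldInfo {
  std::string identifier;
  std::any access_helper_;  // filled in by whichever accessor owns the field
};

// "JSON" side: hypothetical compiled-path type, private to the JSON accessor.
struct JsonPathHelper {
  std::vector<std::string> segments;
};

inline void AttachJsonHelper(CoreFieldInfo& field, JsonPathHelper helper) {
  field.access_helper_ = std::move(helper);
}

// Returns nullptr when no helper, or a helper of another type, is attached.
inline const JsonPathHelper* GetJsonHelper(const CoreFieldInfo& field) {
  return std::any_cast<JsonPathHelper>(&field.access_helper_);
}
```

One trade-off of this design: `std::any` requires the stored type to be copy-constructible, so a move-only path type would have to be wrapped (for example in a `std::shared_ptr`).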

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
@dranikpg dranikpg marked this pull request as ready for review August 15, 2023 07:53
src/core/search/base.h (resolved)
src/server/search/doc_accessors.cc (outdated, resolved)
@@ -107,6 +137,9 @@ SearchDocData JsonAccessor::Serialize(search::Schema schema) const {
return out;
}

thread_local absl::flat_hash_map<std::string, std::unique_ptr<JsonAccessor::JsonPathContainer>>
Contributor

Why do we need unique_ptr for the value_type here?

Contributor Author

I don't want to pull JSON dependencies into the header, so I forward-declared JsonPathContainer. Actually, pulling in the JSON deps would not be a big problem, as this header is used only in search-internal files 🤔
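
The header/source split described above can be sketched in a single file as follows. This is an illustration under assumptions, not the PR's code: `JsonAccessorSketch`, `PathContainer`, and `GetPath` are hypothetical names, `std::unordered_map` is swapped in for `absl::flat_hash_map`, and a plain string stands in for the compiled expression.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// --- what the header would contain ---------------------------------------
class JsonAccessorSketch {
 public:
  struct PathContainer;  // forward declaration only; no JSON headers needed

  // Returns the cached entry for `field`, creating it on first use.
  static const PathContainer& GetPath(const std::string& field);

 private:
  // std::unique_ptr<PathContainer> is itself a complete type, so the map's
  // value type is complete even where PathContainer is only forward declared.
  static thread_local std::unordered_map<std::string, std::unique_ptr<PathContainer>>
      path_cache_;
};

// --- what the .cc file would contain -------------------------------------
struct JsonAccessorSketch::PathContainer {
  std::string source;  // placeholder for the real compiled expression
};

thread_local std::unordered_map<std::string,
                                std::unique_ptr<JsonAccessorSketch::PathContainer>>
    JsonAccessorSketch::path_cache_;

const JsonAccessorSketch::PathContainer& JsonAccessorSketch::GetPath(
    const std::string& field) {
  auto it = path_cache_.find(field);
  if (it == path_cache_.end()) {
    auto entry = std::make_unique<PathContainer>(PathContainer{field});
    it = path_cache_.emplace(field, std::move(entry)).first;
  }
  return *it->second;
}
```

Because each entry lives behind a pointer, references returned by `GetPath` stay valid when the map rehashes. Each thread gets its own independent cache.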

Contributor

Having global caches with no eviction and no size limit seems dangerous, memory-wise.
Could we turn this into an LRU cache, or bind the caching to the lifetime of the query?
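
To make the LRU suggestion concrete, here is a minimal, generic sketch of a bounded cache with least-recently-used eviction. `LruCache` and its methods are hypothetical names for illustration; nothing like this is part of the PR.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Minimal LRU cache sketch: most recently used entries sit at the front.
template <typename V>
class LruCache {
 public:
  explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

  // Returns a pointer to the cached value, or nullptr on a miss.
  V* Get(const std::string& key) {
    auto it = index_.find(key);
    if (it == index_.end())
      return nullptr;
    items_.splice(items_.begin(), items_, it->second);  // mark as most recent
    return &it->second->second;
  }

  // Inserts or overwrites; evicts the least recently used entry when full.
  void Put(const std::string& key, V value) {
    if (auto it = index_.find(key); it != index_.end()) {
      it->second->second = std::move(value);
      items_.splice(items_.begin(), items_, it->second);
      return;
    }
    if (capacity_ == 0)
      return;
    if (items_.size() == capacity_) {
      index_.erase(items_.back().first);
      items_.pop_back();
    }
    items_.emplace_front(key, std::move(value));
    index_[key] = items_.begin();
  }

  std::size_t Size() const { return items_.size(); }

 private:
  using Item = std::pair<std::string, V>;
  std::size_t capacity_;
  std::list<Item> items_;
  std::unordered_map<std::string, typename std::list<Item>::iterator> index_;
};
```

Unlike the unbounded map, callers of such a cache must be prepared for a miss and rebuild the entry, since any value can be evicted between two lookups.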

Contributor Author

Those are field names defined by developers. Each schema usually has a few fields, and there are usually no more than a few indices per database. For the cache to occupy kilobytes of memory, developers would have to type out kilobytes of field names, which is really unlikely.

We can clear this cache once indices are deleted.

Contributor

Ah, they can't query for arbitrary fields outside of established indices? Somehow I thought it was possible, just slower.

Contributor Author

No, it's only used for accessing an index, and that index must have been explicitly described by the user.

src/core/search/search.h (resolved)
src/server/search/doc_accessors.cc (outdated, resolved)
src/server/search/doc_accessors.h (resolved)

dranikpg and others added 4 commits August 17, 2023 15:42
Co-authored-by: Roy Jacobson <roi.jacobson1@gmail.com>
Signed-off-by: Vladislav <vladislav.oleshko@gmail.com>
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
@dranikpg dranikpg merged commit e0f3684 into dragonflydb:main Aug 18, 2023
7 checks passed
@dranikpg dranikpg deleted the search-json-paths branch August 18, 2023 19:01