feat: json paths in search #1695

dranikpg · 2023-08-13T10:29:26Z

This PR makes it possible to reference nested JSON fields in search documents and to use short name aliases for them (see tests).
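One way to picture the alias part: each schema entry pairs a nested JSON path with a short name, and a query that uses the short name is resolved to the full path before the document is read. This is a minimal sketch that assumes a plain std::map as the schema. SchemaField, Schema and ResolveField are hypothetical names, not the PR's actual types.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical schema entry: the JSON path that is read from each document.
struct SchemaField {
  std::string json_path;  // e.g. "$.address.city"
};

// Hypothetical schema: short alias (or the full path if no alias was given) -> field.
using Schema = std::map<std::string, SchemaField>;

// Resolve the name used in a query to the JSON path that should be read.
// Returns std::nullopt if the name is not part of the schema.
std::optional<std::string> ResolveField(const Schema& schema, const std::string& name) {
  auto it = schema.find(name);
  if (it == schema.end())
    return std::nullopt;
  return it->second.json_path;
}
```

With an alias "city" for "$.address.city", a query on "city" reads the nested field, while names that are not in the schema resolve to nothing.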

dranikpg · 2023-08-13T10:31:22Z

src/server/search/doc_accessors.cc

+  error_code ec;
+  auto path = jsoncons::jsonpath::make_expression<JsonType>(active_field, ec);
+  DCHECK(!ec);


Creating a JSON path on every read is wasteful; it should be created once during validation and stored inside the field info (?)
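The suggestion amounts to moving the compile step from the read path to schema validation. A minimal sketch of that idea, assuming a field-info struct that can hold the compiled result directly. CompiledPath, CompilePath, FieldInfo and PathForRead are hypothetical; the real code would call jsoncons::jsonpath::make_expression, which this sketch does not use.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for a compiled JSON path expression.
struct CompiledPath {
  std::string source;
};

int g_compile_calls = 0;  // how many times a path was compiled

CompiledPath CompilePath(const std::string& path) {
  ++g_compile_calls;  // stands in for the expensive parse
  return CompiledPath{path};
}

// Hypothetical field info: the path is compiled once, during schema validation...
struct FieldInfo {
  std::string path;
  CompiledPath compiled;  // declared after `path`, so `path` is initialized first
  explicit FieldInfo(std::string p) : path(std::move(p)), compiled(CompilePath(path)) {}
};

// ...and every document read reuses it instead of recompiling.
const CompiledPath& PathForRead(const FieldInfo& field) { return field.compiled; }
```

However many documents are read, the compile counter stays at one per field.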

dranikpg · 2023-08-14T07:56:15Z

src/server/search/doc_accessors.cc

+thread_local absl::flat_hash_map<std::string, std::unique_ptr<JsonAccessor::JsonPathContainer>>
+    JsonAccessor::path_cache_;
+


I've chosen to use this thread-local cache instead of polluting the core with JSON-specific code. Alternatively, we could either wire JSON into the core or use a generic member like std::any access_helper_ to get by without a concrete type.
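The std::any alternative mentioned above could look roughly like this: the core field struct carries an opaque slot, and only the JSON layer knows what it puts there. A minimal sketch under that assumption. SchemaFieldSketch, CompiledJsonPath, AttachJsonPath and GetJsonPath are hypothetical, and the "compilation" is a placeholder split on '.'.

```cpp
#include <any>
#include <string>
#include <utility>
#include <vector>

// ---- core search (sketch): knows nothing about JSON ----
struct SchemaFieldSketch {
  std::string identifier;  // e.g. "$.address.city"
  std::any access_helper;  // opaque per-field data owned by the document layer
};

// ---- JSON layer (sketch): decides what goes into the helper ----
struct CompiledJsonPath {  // hypothetical stand-in for a compiled jsonpath expression
  std::vector<std::string> segments;
};

void AttachJsonPath(SchemaFieldSketch& field) {
  CompiledJsonPath compiled;
  std::string seg;
  // Placeholder "compilation": split on '.' and drop '$' characters.
  for (char c : field.identifier) {
    if (c == '.') {
      if (!seg.empty()) compiled.segments.push_back(seg);
      seg.clear();
    } else if (c != '$') {
      seg += c;
    }
  }
  if (!seg.empty()) compiled.segments.push_back(seg);
  field.access_helper = std::move(compiled);
}

// On every read the JSON layer recovers its own type. any_cast on a pointer
// returns nullptr instead of throwing if the helper holds something else.
const CompiledJsonPath* GetJsonPath(const SchemaFieldSketch& field) {
  return std::any_cast<CompiledJsonPath>(&field.access_helper);
}
```

The trade-off is a runtime type check on each read in exchange for keeping JSON types out of the core headers.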

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>

src/core/search/base.h

src/server/search/doc_accessors.cc

kostasrim · 2023-08-16T09:34:20Z

src/server/search/doc_accessors.cc

@@ -107,6 +137,9 @@ SearchDocData JsonAccessor::Serialize(search::Schema schema) const {
  return out;
 }

+thread_local absl::flat_hash_map<std::string, std::unique_ptr<JsonAccessor::JsonPathContainer>>


Why do we need unique_ptr for the value_type here?

I didn't want to pull JSON dependencies into the header, so I forward-declared JsonPathContainer. Actually, pulling in JSON deps isn't crucial, since this header is only used in internal search files 🤔
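The header/source split being described can be sketched in a single file: the "header" part only forward-declares the container and exposes it through pointers, and the complete type appears only in the "source" part where the thread-local cache is defined. JsonAccessorSketch and the contents of JsonPathContainer are hypothetical; std::unordered_map stands in for absl::flat_hash_map to keep the sketch dependency-free.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// ---- header part (sketch): no JSON includes, only a forward declaration ----
struct JsonPathContainer;  // full definition lives in the .cc part below

class JsonAccessorSketch {
 public:
  // Callers only see a pointer, so they never need the complete type.
  static const JsonPathContainer* GetPath(const std::string& field);

 private:
  static thread_local std::unordered_map<std::string, std::unique_ptr<JsonPathContainer>>
      path_cache_;
};

// ---- .cc part (sketch): here the type is complete ----
struct JsonPathContainer {
  std::string path;  // a real implementation would hold a compiled jsonpath expression
};

thread_local std::unordered_map<std::string, std::unique_ptr<JsonPathContainer>>
    JsonAccessorSketch::path_cache_;

// Compiles (here: stores) the path on first use on this thread and reuses it afterwards.
const JsonPathContainer* JsonAccessorSketch::GetPath(const std::string& field) {
  auto& slot = path_cache_[field];
  if (!slot)
    slot = std::make_unique<JsonPathContainer>(JsonPathContainer{field});
  return slot.get();
}
```

A side effect of the unique_ptr values worth noting: the pointed-to container keeps its address even if an open-addressing map such as absl::flat_hash_map moves its slots on rehash.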

Having global caches with no eviction and no limit seems dangerous, memory wise.
Could we turn this into an LRU? Or bind this caching to the lifetime of the query?

Those are field names defined by developers. Each schema usually has only a few fields, and there are usually no more than a few indices per database. For the cache to occupy even kilobytes of memory, developers would have to type out kilobytes of field names, which is very unlikely.

We can clear this cache once indices are deleted.

Ah, so they can't query arbitrary fields outside of established indices? Somehow I thought that was possible, just slower.

No, it's only used for accessing an index, and that index must have been explicitly defined by the user.
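Since paths are only compiled for fields declared in an index schema, the cache is bounded by the total number of indexed fields, and clearing it on index deletion keeps it from outliving those schemas. A minimal sketch of that clearing step, assuming a hypothetical OnIndexDropped hook that runs on every thread (each thread owns its own thread-local copy). CachedPath and GetCachedPath are also hypothetical.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

struct CachedPath {  // hypothetical stand-in for a compiled JSON path
  std::string path;
};

thread_local std::unordered_map<std::string, std::unique_ptr<CachedPath>> path_cache;

// Lazily compiles (here: stores) the path for an indexed field on this thread.
const CachedPath* GetCachedPath(const std::string& field) {
  auto& slot = path_cache[field];
  if (!slot)
    slot = std::make_unique<CachedPath>(CachedPath{field});
  return slot.get();
}

// Hypothetical hook, called on each thread when an index is dropped. Clearing
// everything is the simplest correct choice: entries still needed by other
// indices are recompiled on their next read. Pointers returned earlier by
// GetCachedPath dangle after this call and must not be used.
void OnIndexDropped() { path_cache.clear(); }
```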

src/core/search/search.h

src/server/search/doc_accessors.cc

src/server/search/doc_accessors.h

royjacobson · 2023-08-17T10:55:39Z

src/server/search/doc_accessors.cc


Co-authored-by: Roy Jacobson <roi.jacobson1@gmail.com> Signed-off-by: Vladislav <vladislav.oleshko@gmail.com>

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>

dranikpg commented Aug 13, 2023


dranikpg force-pushed the search-json-paths branch from b335f10 to cc4f7f0 Compare August 14, 2023 07:54

dranikpg commented Aug 14, 2023


feat: json paths in search

dab9c4a

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>

dranikpg force-pushed the search-json-paths branch from cc4f7f0 to dab9c4a Compare August 15, 2023 07:53

dranikpg marked this pull request as ready for review August 15, 2023 07:53

dranikpg requested review from royjacobson and kostasrim August 15, 2023 07:54

kostasrim reviewed Aug 16, 2023


royjacobson reviewed Aug 17, 2023


dranikpg and others added 4 commits August 17, 2023 15:42

Update src/server/search/doc_accessors.h

3e3210d

Co-authored-by: Roy Jacobson <roi.jacobson1@gmail.com> Signed-off-by: Vladislav <vladislav.oleshko@gmail.com>

fix: foxes

fc9ed7e

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>

Merge branch 'main' into search-json-paths

888d647

fixes: v2

9c9742d

dranikpg requested review from royjacobson and kostasrim August 17, 2023 15:28

royjacobson approved these changes Aug 18, 2023


kostasrim approved these changes Aug 18, 2023


dranikpg merged commit e0f3684 into dragonflydb:main Aug 18, 2023
7 checks passed

dranikpg deleted the search-json-paths branch August 18, 2023 19:01
