New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support fields in search mvp #1184
Conversation
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
src/core/search/base.h
Outdated
// Represents input to a search tree. | ||
struct SearchInput { | ||
// Callback that's supplied with field values. | ||
using FieldConsumer = std::function<bool(std::string_view)>; | ||
|
||
virtual ~SearchInput() = default; | ||
|
||
// Check if the given callback returns true on any of the active fields. | ||
virtual bool Check(FieldConsumer) = 0; | ||
|
||
// Sets a single active field. | ||
virtual void SelectField(std::string_view field) = 0; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about the interface yet.
Virtual so its fully mockable. Otherwise search needs to include core structures like StringMap and redis skiplists (or should it?)
I also don't like the FieldConsumer function glue here, maybe have a separate virtual base for nodes that work directly with strings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of the fact that input
is now a mutable argument to all expression checkers, but I don't have a better idea (nor do I fully understand the challenges here, tbh).
Since this is supposed to be a quick-N-dirty test ground, it's great to let it evolve as you go. If you do want to brainstorm about this - lmk, I'd be happy to!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Yes, this part won't outlive the basic MVP. Actually speaking returning a vector<string>
is also an option because we do bruteforce search either way and performance is already not our goal. Once we do indexed, all out interfaces will change.
But lets still brainstorm quickly 🙂
- Mutable state
The only state we have is the field (and I'm almost sure there won't be more). So if your query has @field:(subtree)
, then the whole subtree needs to be applied only to a specific field of the document, whereas by default they apply to all fields. I do this field shift with a single node that sets the active field.
The better idea for now would be most likely smth like this:
// Interface for accessing hset values with different data structures underneath.
struct HSetAccessor {
using FieldConsumer = std::function<bool(std::string_view)>;
virtual bool Check(FieldConsumer f, std::string_view active_field = {}) const;
};
struct SearchInput {
const HSetAccessor* hset;
std::string_view active_field;
};
So now its all const and state is decoupled from the interface
- FieldConsumer
Some context about the future: Why not fill an container like vector<string_view>*
The data structure can't always provide string_views to all its entries. With listpack some might be stored as real integers and need to be decoded to a temporary buffer (so we also cant return a single string_view, only if a buffer is passed). Keeping a separate buffer is cumbersome.
What I don't like with the FieldConsumer is that:
- The iteration logic is handled by the data structure accessor
- The node has to wrap its logic into one more layer
So I'd make the HSetAccessor return just some kind of virtual iterator. C++ ones are a pain really. Go style isn't bad.
struct HSetIterator {
virtual string_view Next();
virtual bool HasNext():
}
So iteration logic can be performed by the nodes themselves... But it'll be the same for all of them. Keeping it in a middle-part like SearchInput as a helper requires one more virtual base.
field_cond_expr: | ||
LPAREN field_cond_expr RPAREN { $$ = $2; } | ||
| field_cond_expr field_cond_expr %prec AND_OP { $$ = MakeExpr<AstLogicalNode>($1, $2, AstLogicalNode::kAnd); } | ||
| field_cond_expr OR_OP field_cond_expr { $$ = MakeExpr<AstLogicalNode>($1, $3, AstLogicalNode::kOr); } | ||
| NOT_OP field_cond_expr { $$ = MakeExpr<AstNegateNode>($2); }; | ||
| TERM { $$ = MakeExpr<AstTermNode>($1); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very similar to the tree above - all except the last case. Those need to be different symbols as its context free and there is no way to re-use it directly. But maybe there are macros and such in bison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea if there are macros, but thanks for the explanation!
I have never seen bison code(?), so this helped me :)
EXPECT_TRUE(Check(&input)); | ||
|
||
input = {{"f1", "foo"}, {"f2", "bar"}, {"f3", "last is wrong"}}; | ||
EXPECT_FALSE(Check(&input)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add a test case where f1
and f2
have the right values, but f3
is missing (instead of mismatching)?
(I realize it's similar to the last one, but not identical)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its line 263, though I added other words to f1, but it should still match. I can remove them for clarity
MockedSearchInput(std::string test_field) : hset_{{"field", test_field}} { | ||
} | ||
|
||
MockedSearchInput& operator=(Map hset) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to dislike implicit-looking conversions such as this. It's also against the Google style guide.
Can you please use a traditional method, like SetFrom()
or whatever?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll do.
I just always thought the rules in the style guide are there for a reason. This one mainly to not have ambiguous or unexpected conversions/behaviour/etc with re-useable core components, as its easy to miss for others. Its clear where the danger is for large code bases.
I don't see any way to misuse it here inside tests 😅 I just want it to be readable and contain as few visual clutter (brackets) as possible.
field_cond_expr: | ||
LPAREN field_cond_expr RPAREN { $$ = $2; } | ||
| field_cond_expr field_cond_expr %prec AND_OP { $$ = MakeExpr<AstLogicalNode>($1, $2, AstLogicalNode::kAnd); } | ||
| field_cond_expr OR_OP field_cond_expr { $$ = MakeExpr<AstLogicalNode>($1, $3, AstLogicalNode::kOr); } | ||
| NOT_OP field_cond_expr { $$ = MakeExpr<AstNegateNode>($2); }; | ||
| TERM { $$ = MakeExpr<AstTermNode>($1); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea if there are macros, but thanks for the explanation!
I have never seen bison code(?), so this helped me :)
src/core/search/base.h
Outdated
// Represents input to a search tree. | ||
struct SearchInput { | ||
// Callback that's supplied with field values. | ||
using FieldConsumer = std::function<bool(std::string_view)>; | ||
|
||
virtual ~SearchInput() = default; | ||
|
||
// Check if the given callback returns true on any of the active fields. | ||
virtual bool Check(FieldConsumer) = 0; | ||
|
||
// Sets a single active field. | ||
virtual void SelectField(std::string_view field) = 0; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of the fact that input
is now a mutable argument to all expression checkers, but I don't have a better idea (nor do I fully understand the challenges here, tbh).
Since this is supposed to be a quick-N-dirty test ground, it's great to let it evolve as you go. If you do want to brainstorm about this - lmk, I'd be happy to!
Fyi, i am not sure that listpack is going to meet search any time soon. Not
that i think you are wrong in general
…On Sat, May 6, 2023, 09:42 Vladislav ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In src/core/search/base.h
<#1184 (comment)>
:
> +// Represents input to a search tree.
+struct SearchInput {
+ // Callback that's supplied with field values.
+ using FieldConsumer = std::function<bool(std::string_view)>;
+
+ virtual ~SearchInput() = default;
+
+ // Check if the given callback returns true on any of the active fields.
+ virtual bool Check(FieldConsumer) = 0;
+
+ // Sets a single active field.
+ virtual void SelectField(std::string_view field) = 0;
+
Agree. Yes, this part won't outlive the MVP, but lets still brainstorm
quickly 🙂
1. Mutable state
The only state we have is the field (and I'm almost sure there won't be
more). So if your query has @field:(subtree), then the whole subtree
needs to be applied only to a specific field of the document, whereas by
default they apply to all fields. I do this field shift with a single node
that sets the active field.
The better idea for now would be most likely smth like this:
// Interface for accessing hset values with different data structures underneath.struct HSetAccessor {
using FieldConsumer = std::function<bool(std::string_view)>;
virtual bool Check(FieldConsumer f, std::string_view active_field = {}) const;
};
struct SearchInput {
const HSetAccessor* hset;
std::string_view active_field;
};
So now its all const and state is decoupled from the interface. The
downside I see in the future is that with explicit SelectField/ClearField
the real implementation can store an iterator/pointer and look up this
value again much faster, as it could be used in a large subtree. But it can
also do it as an internal optimization, comparing last accessed values.
1. FieldConsumer
Some context about the future: Why not fill an container like
vector<string_view>*
- The data structure can't always provide string_views to all its
entries. With listpack some might be stored as real integers and need to be
decoded to a temporary buffer (so we also cant return a single string_view,
only if a buffer is passed)
- We probably don't want to iterate over all fields if we match on the
first
What I don't like with the FieldConsumer is that:
- The iteration logic is handled by the data structure accessor
- The node has to wrap its logic into one more layer
So I'd make the HSetAccessor return just some kind of virtual iterator.
C++ ones are a pain really. Go style isn't bad.
struct HSetIterator {
virtual string_view Next();
virtual bool HasNext():
}
So iteration logic can be performed by the nodes themselves... But it'll
be the same for all of them. Keeping it in a middle-part like SearchInput
as a helper requires one more virtual base.
—
Reply to this email directly, view it on GitHub
<#1184 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AA4BFCFAJMRBPZLTW4VAH3TXEXXFPANCNFSM6AAAAAAXXRJALA>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
Styleguide are also for managing larger teams, not only for large
codebases. A junior that joins will be confused if he sees that the line is
blurry.
…On Sat, May 6, 2023, 09:49 Roman Gershman ***@***.***> wrote:
Fyi, i am not sure that listpack is going to meet search any time soon.
Not that i think you are wrong in general
On Sat, May 6, 2023, 09:42 Vladislav ***@***.***> wrote:
> ***@***.**** commented on this pull request.
> ------------------------------
>
> In src/core/search/base.h
> <#1184 (comment)>
> :
>
> > +// Represents input to a search tree.
> +struct SearchInput {
> + // Callback that's supplied with field values.
> + using FieldConsumer = std::function<bool(std::string_view)>;
> +
> + virtual ~SearchInput() = default;
> +
> + // Check if the given callback returns true on any of the active fields.
> + virtual bool Check(FieldConsumer) = 0;
> +
> + // Sets a single active field.
> + virtual void SelectField(std::string_view field) = 0;
> +
>
> Agree. Yes, this part won't outlive the MVP, but lets still brainstorm
> quickly 🙂
>
> 1. Mutable state
>
> The only state we have is the field (and I'm almost sure there won't be
> more). So if your query has @field:(subtree), then the whole subtree
> needs to be applied only to a specific field of the document, whereas by
> default they apply to all fields. I do this field shift with a single node
> that sets the active field.
>
> The better idea for now would be most likely smth like this:
>
> // Interface for accessing hset values with different data structures underneath.struct HSetAccessor {
> using FieldConsumer = std::function<bool(std::string_view)>;
> virtual bool Check(FieldConsumer f, std::string_view active_field = {}) const;
> };
> struct SearchInput {
> const HSetAccessor* hset;
> std::string_view active_field;
> };
>
> So now its all const and state is decoupled from the interface. The
> downside I see in the future is that with explicit SelectField/ClearField
> the real implementation can store an iterator/pointer and look up this
> value again much faster, as it could be used in a large subtree. But it can
> also do it as an internal optimization, comparing last accessed values.
>
> 1. FieldConsumer
>
> Some context about the future: Why not fill an container like
> vector<string_view>*
>
> - The data structure can't always provide string_views to all its
> entries. With listpack some might be stored as real integers and need to be
> decoded to a temporary buffer (so we also cant return a single string_view,
> only if a buffer is passed)
> - We probably don't want to iterate over all fields if we match on
> the first
>
> What I don't like with the FieldConsumer is that:
>
> - The iteration logic is handled by the data structure accessor
> - The node has to wrap its logic into one more layer
>
> So I'd make the HSetAccessor return just some kind of virtual iterator.
> C++ ones are a pain really. Go style isn't bad.
>
> struct HSetIterator {
> virtual string_view Next();
> virtual bool HasNext():
> }
>
> So iteration logic can be performed by the nodes themselves... But it'll
> be the same for all of them. Keeping it in a middle-part like SearchInput
> as a helper requires one more virtual base.
>
> —
> Reply to this email directly, view it on GitHub
> <#1184 (comment)>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AA4BFCFAJMRBPZLTW4VAH3TXEXXFPANCNFSM6AAAAAAXXRJALA>
> .
> You are receiving this because you are subscribed to this thread.Message
> ID: ***@***.***>
>
|
Aren't hashes stored with listpack as well? |
oh, I forgot about this small map optimization. You are right. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed comments
// Interface for accessing hashset values with different data structures underneath. | ||
struct HSetAccessor { | ||
// Callback that's supplied with field values. | ||
using FieldConsumer = std::function<bool(std::string_view)>; | ||
|
||
virtual bool Check(FieldConsumer f, std::string_view active_field) const = 0; | ||
}; | ||
|
||
// Wrapper around hashset accessor and optional active field. | ||
struct SearchInput { | ||
SearchInput(const HSetAccessor* hset, std::string_view active_field = {}) | ||
: hset_{hset}, active_field_{active_field} { | ||
} | ||
|
||
SearchInput(const SearchInput& base, std::string_view active_field) | ||
: hset_{base.hset_}, active_field_{active_field} { | ||
} | ||
|
||
bool Check(HSetAccessor::FieldConsumer f) { | ||
return hset_->Check(move(f), active_field_); | ||
} | ||
|
||
private: | ||
const HSetAccessor* hset_; | ||
std::string_view active_field_; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets keep the structure like this (open to comments about the code). Anyways indexed search requires another interface, so its all temporary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SG
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2859d92
to
6f7f082
Compare
LGTM! It won't let me merge (never actually done it in GitHub so I might miss something). |
My changes dismissed your review + it was still a draft |
class AstFieldNode : public AstNode { | ||
public: | ||
AstFieldNode(std::string field, NodePtr node) : field_{field.substr(1)}, node_{node} { | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding a CHECK()
that field[0] == '@'
No description provided.