Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support fields in search mvp #1184

Merged
merged 2 commits into from May 6, 2023

Conversation

dranikpg
Copy link
Contributor

@dranikpg dranikpg commented May 5, 2023

No description provided.

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
Comment on lines 16 to 28
// Represents input to a search tree.
struct SearchInput {
// Callback that's supplied with field values.
using FieldConsumer = std::function<bool(std::string_view)>;

virtual ~SearchInput() = default;

// Check if the given callback returns true on any of the active fields.
virtual bool Check(FieldConsumer) = 0;

// Sets a single active field.
virtual void SelectField(std::string_view field) = 0;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the interface yet.

Virtual so its fully mockable. Otherwise search needs to include core structures like StringMap and redis skiplists (or should it?)

I also don't like the FieldConsumer function glue here, maybe have a separate virtual base for nodes that work directly with strings

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of the fact that input is now a mutable argument to all expression checkers, but I don't have a better idea (nor do I fully understand the challenges here, tbh).
Since this is supposed to be a quick-N-dirty test ground, it's great to let it evolve as you go. If you do want to brainstorm about this - lmk, I'd be happy to!

Copy link
Contributor Author

@dranikpg dranikpg May 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Yes, this part won't outlive the basic MVP. Actually speaking returning a vector<string> is also an option because we do bruteforce search either way and performance is already not our goal. Once we do indexed, all out interfaces will change.

But lets still brainstorm quickly 🙂

  1. Mutable state

The only state we have is the field (and I'm almost sure there won't be more). So if your query has @field:(subtree), then the whole subtree needs to be applied only to a specific field of the document, whereas by default they apply to all fields. I do this field shift with a single node that sets the active field.

The better idea for now would be most likely smth like this:

// Interface for accessing hset values with different data structures underneath.
struct HSetAccessor {
  using FieldConsumer = std::function<bool(std::string_view)>;
  virtual bool Check(FieldConsumer f, std::string_view active_field = {}) const;
};

struct SearchInput {
  const HSetAccessor* hset;
  std::string_view active_field;
};

So now its all const and state is decoupled from the interface

  1. FieldConsumer

Some context about the future: Why not fill an container like vector<string_view>*

The data structure can't always provide string_views to all its entries. With listpack some might be stored as real integers and need to be decoded to a temporary buffer (so we also cant return a single string_view, only if a buffer is passed). Keeping a separate buffer is cumbersome.

What I don't like with the FieldConsumer is that:

  • The iteration logic is handled by the data structure accessor
  • The node has to wrap its logic into one more layer

So I'd make the HSetAccessor return just some kind of virtual iterator. C++ ones are a pain really. Go style isn't bad.

struct HSetIterator {
  virtual string_view Next();
  virtual bool HasNext():
}

So iteration logic can be performed by the nodes themselves... But it'll be the same for all of them. Keeping it in a middle-part like SearchInput as a helper requires one more virtual base.

Comment on lines +92 to +97
field_cond_expr:
LPAREN field_cond_expr RPAREN { $$ = $2; }
| field_cond_expr field_cond_expr %prec AND_OP { $$ = MakeExpr<AstLogicalNode>($1, $2, AstLogicalNode::kAnd); }
| field_cond_expr OR_OP field_cond_expr { $$ = MakeExpr<AstLogicalNode>($1, $3, AstLogicalNode::kOr); }
| NOT_OP field_cond_expr { $$ = MakeExpr<AstNegateNode>($2); };
| TERM { $$ = MakeExpr<AstTermNode>($1); }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very similar to the tree above - all except the last case. Those need to be different symbols as its context free and there is no way to re-use it directly. But maybe there are macros and such in bison.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea if there are macros, but thanks for the explanation!
I have never seen bison code(?), so this helped me :)

@dranikpg dranikpg requested a review from chakaz May 5, 2023 20:28
chakaz
chakaz previously approved these changes May 5, 2023
EXPECT_TRUE(Check(&input));

input = {{"f1", "foo"}, {"f2", "bar"}, {"f3", "last is wrong"}};
EXPECT_FALSE(Check(&input));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a test case where f1 and f2 have the right values, but f3 is missing (instead of mismatching)?
(I realize it's similar to the last one, but not identical)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its line 263, though I added other words to f1, but it should still match. I can remove them for clarity

MockedSearchInput(std::string test_field) : hset_{{"field", test_field}} {
}

MockedSearchInput& operator=(Map hset) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to dislike implicit-looking conversions such as this. It's also against the Google style guide.
Can you please use a traditional method, like SetFrom() or whatever?

Copy link
Contributor Author

@dranikpg dranikpg May 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll do.

I just always thought the rules in the style guide are there for a reason. This one mainly to not have ambiguous or unexpected conversions/behaviour/etc with re-useable core components, as its easy to miss for others. Its clear where the danger is for large code bases.

I don't see any way to misuse it here inside tests 😅 I just want it to be readable and contain as few visual clutter (brackets) as possible.

Comment on lines +92 to +97
field_cond_expr:
LPAREN field_cond_expr RPAREN { $$ = $2; }
| field_cond_expr field_cond_expr %prec AND_OP { $$ = MakeExpr<AstLogicalNode>($1, $2, AstLogicalNode::kAnd); }
| field_cond_expr OR_OP field_cond_expr { $$ = MakeExpr<AstLogicalNode>($1, $3, AstLogicalNode::kOr); }
| NOT_OP field_cond_expr { $$ = MakeExpr<AstNegateNode>($2); };
| TERM { $$ = MakeExpr<AstTermNode>($1); }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea if there are macros, but thanks for the explanation!
I have never seen bison code(?), so this helped me :)

Comment on lines 16 to 28
// Represents input to a search tree.
struct SearchInput {
// Callback that's supplied with field values.
using FieldConsumer = std::function<bool(std::string_view)>;

virtual ~SearchInput() = default;

// Check if the given callback returns true on any of the active fields.
virtual bool Check(FieldConsumer) = 0;

// Sets a single active field.
virtual void SelectField(std::string_view field) = 0;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of the fact that input is now a mutable argument to all expression checkers, but I don't have a better idea (nor do I fully understand the challenges here, tbh).
Since this is supposed to be a quick-N-dirty test ground, it's great to let it evolve as you go. If you do want to brainstorm about this - lmk, I'd be happy to!

@romange
Copy link
Collaborator

romange commented May 6, 2023 via email

@romange
Copy link
Collaborator

romange commented May 6, 2023 via email

@dranikpg
Copy link
Contributor Author

dranikpg commented May 6, 2023

Fyi, i am not sure that listpack is going to meet search any time soon. Not
that i think you are wrong in general

Aren't hashes stored with listpack as well?

@romange
Copy link
Collaborator

romange commented May 6, 2023

oh, I forgot about this small map optimization. You are right.

Copy link
Contributor Author

@dranikpg dranikpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed comments

Comment on lines +16 to +41
// Interface for accessing hashset values with different data structures underneath.
struct HSetAccessor {
// Callback that's supplied with field values.
using FieldConsumer = std::function<bool(std::string_view)>;

virtual bool Check(FieldConsumer f, std::string_view active_field) const = 0;
};

// Wrapper around hashset accessor and optional active field.
struct SearchInput {
SearchInput(const HSetAccessor* hset, std::string_view active_field = {})
: hset_{hset}, active_field_{active_field} {
}

SearchInput(const SearchInput& base, std::string_view active_field)
: hset_{base.hset_}, active_field_{active_field} {
}

bool Check(HSetAccessor::FieldConsumer f) {
return hset_->Check(move(f), active_field_);
}

private:
const HSetAccessor* hset_;
std::string_view active_field_;
};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets keep the structure like this (open to comments about the code). Anyways indexed search requires another interface, so its all temporary

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
@chakaz
Copy link
Collaborator

chakaz commented May 6, 2023

LGTM! It won't let me merge (never actually done it in GitHub so I might miss something).

@dranikpg dranikpg marked this pull request as ready for review May 6, 2023 18:22
@dranikpg
Copy link
Contributor Author

dranikpg commented May 6, 2023

My changes dismissed your review + it was still a draft

@dranikpg dranikpg requested a review from chakaz May 6, 2023 18:23
class AstFieldNode : public AstNode {
public:
AstFieldNode(std::string field, NodePtr node) : field_{field.substr(1)}, node_{node} {
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a CHECK() that field[0] == '@'

@dranikpg dranikpg merged commit dc853fe into dragonflydb:main May 6, 2023
6 checks passed
@dranikpg dranikpg deleted the basic-rsearch-fields branch May 28, 2023 09:17
romange pushed a commit that referenced this pull request Jun 1, 2023
Field support for search mvp
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants