Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement IF and COALESCE #1070

Merged
merged 10 commits into from Aug 28, 2023
Merged

Conversation

joka921
Copy link
Member

@joka921 joka921 commented Aug 24, 2023

This is a rough draft so far.

@codecov
Copy link

codecov bot commented Aug 24, 2023

Codecov Report

Patch coverage: 94.26% and project coverage change: +0.11% 🎉

Comparison is base (3a628ef) 78.47% compared to head (153a2d9) 78.59%.
Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1070      +/-   ##
==========================================
+ Coverage   78.47%   78.59%   +0.11%     
==========================================
  Files         275      277       +2     
  Lines       25917    26088     +171     
  Branches     3174     3190      +16     
==========================================
+ Hits        20338    20503     +165     
+ Misses       4372     4369       -3     
- Partials     1207     1216       +9     
Files Changed Coverage Δ
src/parser/sparqlParser/SparqlQleverVisitor.h 100.00% <ø> (ø)
...ngine/sparqlExpressions/ConditionalExpressions.cpp 92.85% <92.85%> (ø)
src/engine/sparqlExpressions/NaryExpression.h 100.00% <100.00%> (ø)
...ine/sparqlExpressions/SparqlExpressionGenerators.h 70.58% <100.00%> (-1.18%) ⬇️
src/parser/sparqlParser/SparqlQleverVisitor.cpp 92.02% <100.00%> (+0.29%) ⬆️

... and 4 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@hannahbast hannahbast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First round of comments and questions. It already runs on Wikidata and works like a charm :-)

PS: Having this functions will make our AC queries much simpler, which is great!

namespace detail::conditional_expressions {
using namespace sparqlExpression::detail;
[[maybe_unused]] auto ifImpl = [](EffectiveBooleanValueGetter::Result condition,
auto&& i, auto&& e) -> IdOrString {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need auto&& here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check.

: children_{std::move(children)} {};
ExpressionResult evaluate(EvaluationContext* context) const override {
std::vector<ExpressionResult> childResults;
std::ranges::for_each(children_, [&](const auto& child) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't the algorithm in the following simply as follows: evaluate one child after the other until the result is undefined, then return that as a result, UNDEF otherwise.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, some of the children return vectors of elements. So your algorithm would have to work basically on rows of (one entry per vector). And this evaluation would be inefficient.

What we can do and probably should (also for the other expressions) is to not evaluate the complete children, but for examples always chunks of N. Then we have both the efficient evaluation of the single children as well as bounded space consumption. Then we can do something like "If in the current block everything is already defined, then we can skip it entirely".

return results;
}

std::string getCacheKey(const VariableToColumnMap& varColMap) const override {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand it correctly that the main reason why the implementation of COALESCE is so much more involved than that of IF is that COALESCE has a variable number of arguments?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. And as soon as that is the case, we have to choose a column based implementation for efficiency reasons.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once I also implement CONCAT (will look somewhat similar concerning the boilerplate structure) I will reason on whether it is beneficial to factor out some common code for such variadic expressions.

// This rule is only used by the `RelationExpression` and `BuiltInCall` rules
// which also are not supported and should already have thrown an exception.
AD_FAIL();
std::vector<ExpressionPtr> Visitor::visit(Parser::ExpressionListContext* ctx) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this not needed so far? Because we didn't have any functions yet with a variable number of arguments?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this grammar rule is only used by a few expressions, all of which were unsupported so far (IN, NOT IN, COALESCE, CONCAT).

return createTernary(&makeIfExpression);
} else if (functionName == "coalesce") {
AD_CORRECTNESS_CHECK(ctx->expressionList());
return makeCoalesceExpression(visit(ctx->expressionList()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not a more generic createVariadic(&makeCoalesceExpression)? This connects to the earlier question about why the CoalesceExpression implementation is so (relatively) complex.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be, maybe I'll do it like this. But there is only one other variadic expression in this function (CONCAT), because ([NOT] IN) is part of a different grammar rule.

@hannahbast hannahbast marked this pull request as ready for review August 25, 2023 09:54
Copy link
Member

@hannahbast hannahbast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks a lot!

Maybe consider adding BOUND to this PR.

@sonarcloud
Copy link

sonarcloud bot commented Aug 28, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 3 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@hannahbast hannahbast changed the title IF and COALESCE functions Implement IF and COALESCE Aug 28, 2023
@hannahbast hannahbast merged commit a8c4de4 into ad-freiburg:master Aug 28, 2023
18 checks passed
@joka921 joka921 deleted the if-and-coalesce branch December 7, 2023 09:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants