SPARQL MAX aggregate should result in type error on empty set #1978
Reported by @manuelfiorelli on the rdf4j-user list.
Suppose we evaluate the following queries against an empty repository.
The query below returns no results as expected:
However, this query returns one result, which doesn't contain any binding:
This is incorrect: the result should be empty, not one result with an empty binding.
The cause appears to be that the evaluation of MAX (as well as MIN and SAMPLE) hides type errors and replaces them with an empty binding. This does not conform to the spec: it should result in a type error (which should then be converted into an empty result by the processing of the extension).
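A minimal Python sketch of the aggregate-level behaviour the report asks for, assuming numeric values and a hypothetical `SparqlTypeError` exception standing in for a SPARQL type error: MAX, MIN and SAMPLE raise on an empty multiset rather than returning an empty binding. The function names are hypothetical, and this models only the aggregate functions, not RDF4J's evaluation code.

```python
class SparqlTypeError(Exception):
    """Hypothetical stand-in for a SPARQL expression type error."""


def _require_nonempty(name, values):
    # MAX, MIN and SAMPLE have no value to return for an empty multiset.
    if not values:
        raise SparqlTypeError(f"{name} of an empty multiset")


def agg_max(values):
    _require_nonempty("MAX", values)
    return max(values)


def agg_min(values):
    _require_nonempty("MIN", values)
    return min(values)


def agg_sample(values):
    # SAMPLE may return any member; this sketch simply takes the first one.
    _require_nonempty("SAMPLE", values)
    return values[0]
```

For example, `agg_max([3, 7, 5])` returns `7`, while `agg_max([])` raises `SparqlTypeError` instead of silently producing no value.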
I came across this W3C compliance test, which indicates that the current behavior is in fact correct: https://www.w3.org/2009/sparql/docs/tests/summary.html#aggregates-agg-empty-group. The expected result is a single, empty solution.
I need to dive a bit deeper into the algebra to see what I missed.
While it is true that a MAX/MIN/SAMPLE aggregate over an empty solution set should produce a type error, this is hidden by how groups and extensions are evaluated: the type error is converted into an empty binding, which eventually results in a query with a single, empty solution. This conforms to the spec and is not a bug.
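A simplified Python sketch of the mechanism described here, assuming solutions are plain dicts, groups are a dict from group key to member solutions, and Python's `ValueError` (which the built-in `max()` raises on an empty sequence) stands in for the SPARQL type error. `extend` follows the SPARQL 1.1 algebra's Extend (an expression error leaves the solution unchanged); the helper names are hypothetical, and folding the aggregation into the extension is a simplification, not RDF4J's code.

```python
def extend(solution, var, expr):
    """Sketch of Extend(mu, var, expr): bind var to expr(mu);
    if evaluating expr raises an error, return mu unchanged."""
    try:
        value = expr(solution)
    except ValueError:  # stands in for a SPARQL expression error
        return dict(solution)
    return {**solution, var: value}


def aggregate_and_extend(groups, var, agg, source_var):
    """One output solution per group, with var bound to agg(values)."""
    results = []
    for members in groups.values():
        values = [m[source_var] for m in members if source_var in m]
        results.append(extend({}, var, lambda _mu: agg(values)))
    return results


# A single default group with no members: MAX fails, the variable stays
# unbound, and the query still yields one empty solution.
print(aggregate_and_extend({(): []}, "max", max, "x"))                    # [{}]
print(aggregate_and_extend({(): [{"x": 2}, {"x": 9}]}, "max", max, "x"))  # [{'max': 9}]
```

The error never escapes: it is absorbed by the extension, so the output contains exactly as many solutions as there are groups.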
There is another step that plays a part in the final outcome.
When there are zero rows from the query pattern, the grouping is empty and the
With the example query, where there are no values for the grouping.
then the aggregation becomes:
But I found a problem in the spec when it is aggregation without a
I've tried to write out the details, having got it wrong before, in
The intuition is
Thanks for analyzing this, @afs. I must admit I had a hard time working out where the empty row came from by looking purely at the algebra, so I relied on the test case and hand-waved it a bit as "it's the grouping and extension conversions".
The crux of the argument seems to be that, in the spec as currently defined, there is a difference between an aggregate with an explicit grouping clause and an aggregate without one. That in itself seems odd to me, as conceptually an aggregate without an explicit grouping still groups; it just groups on the entire solution. From what I gather from a quick read of your writeup, your proposed fix makes exactly that distinction disappear.
I need to take a bit of time to puzzle over it and see where it leads us. Do you reckon this is something we should write up for the SPARQL 1.2 group as well?
FWIW, I am not yet convinced that your proposed fix goes "in the right direction". It seems counter-intuitive to me that, for example, a SUM aggregate:
should result in an empty result, rather than a single "0" (with or without explicit grouping). Especially given that https://www.w3.org/TR/sparql11-query/#defn_aggSum explicitly defines the "empty input" case:
The intent there seems to be that (irrespective of grouping) there's always a result. The fact that this is (apparently) hidden by how grouping, and the application of aggregation to it, works seems like an unintended side effect to me.
Yes, and raise errata on the SPARQL 1.1 spec for points where we agree the spec
I think it is only one current test that is wrong but the coverage of cases isn't
Let's work on tests to capture what we want SPARQL to do.
Users' expectations come from SQL. We don't have to follow that but it's good to
MySQL with an empty table.
In other words, with an empty table, an aggregate without GROUP BY returns one row, while an aggregate with GROUP BY returns zero rows.
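The same two query shapes can be reproduced with SQLite from Python's standard library, used here as a stand-in for the MySQL session (the table `t` and its columns `g` and `x` are hypothetical):

```python
import sqlite3


def empty_table_aggregates():
    """Run an ungrouped and a grouped aggregate against an empty table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (g TEXT, x INTEGER)")
        no_group = conn.execute(
            "SELECT MAX(x), SUM(x), COUNT(*) FROM t"
        ).fetchall()
        with_group = conn.execute(
            "SELECT g, MAX(x), SUM(x), COUNT(*) FROM t GROUP BY g"
        ).fetchall()
    finally:
        conn.close()
    return no_group, with_group


no_group, with_group = empty_table_aggregates()
print(no_group)    # [(None, None, 0)]  one row
print(with_group)  # []                 zero rows
```

Note that SQL's SUM over no rows is NULL, whereas SPARQL 1.1 defines SUM of an empty input as 0, as pointed out earlier in the thread.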
Suggested new test:
Here's a GitHub repo to collect tests in:
Makes sense. And thinking about it a bit more, there's some symmetry in the solution. I guess the intuitive way to explain it is something along these lines: "If you use explicit grouping and there are no solutions, there are no groups, so the result is empty. However, if you don't use grouping, there is one fixed 'default' group (which may be empty), so the result is non-empty."
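The rule described here can be sketched in Python, assuming solutions are plain dicts and using hypothetical `group` and `sum_per_group` helpers. With explicit grouping, groups are created only from existing solutions, so empty input gives no groups and no rows; without grouping there is always one default group, so SUM still produces a single 0, in line with the SQL behaviour described above.

```python
from collections import defaultdict


def group(solutions, key_vars=None):
    """Partition solutions into groups keyed by the values of key_vars.

    key_vars=None models an aggregate query without GROUP BY: exactly one
    default group, which exists even when there are no solutions at all.
    """
    if key_vars is None:
        return {(): list(solutions)}
    groups = defaultdict(list)
    for s in solutions:
        # An unbound grouping variable is keyed as None in this sketch.
        groups[tuple(s.get(v) for v in key_vars)].append(s)
    return dict(groups)


def sum_per_group(solutions, var, key_vars=None):
    """SUM(var) for each group; an empty group sums to 0."""
    return [
        sum(s[var] for s in members if var in s)
        for members in group(solutions, key_vars).values()
    ]


print(sum_per_group([], "x"))         # [0]  one row, no GROUP BY
print(sum_per_group([], "x", ["g"]))  # []   no groups, no rows
```

The only difference between the two cases is whether a group can exist without any member solutions.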
I'll try and write up some test cases later this week and contribute them to your repo. Thanks for setting it up!
Hi @afs and @jeenbroekstra, thank you for the detailed analysis, and sorry for the late reply. Jeen's intuitive explanation makes sense to me. If I understand correctly, this is precisely the main point of the report written by @afs.
When I filed this issue, I didn't think about the deep implications of using an empty graph. In fact, I discovered the problem using some test data and a graph pattern that didn't match anything. However, I tried to "simplify" the issue by avoiding the failing graph pattern.
If I understand the discussion correctly, when I have a GROUP BY and a failing graph pattern, under the "revised" specification we should still get zero solutions (instead of a solution with unbound variables). Am I right?