SPARQL_Bottom_Up_Semantics

Understanding SPARQL’s Bottom-up Semantics

Preface: In the 1.5.2 release we’ve implemented a couple of fixes regarding issues related to SPARQL’s bottom-up evaluation approach and associated variable scoping problems. If you encounter regressions with some of your queries after upgrading to 1.5.2, this Wiki page may help you identify ill-designed queries that are not in line with the official SPARQL 1.1 semantics.

One of the most tricky parts in SPARQL semantics is its bottom-up semantics: informally speaking, bottom-up semantics means that subqueries and nested groups are logically evaluated first. This semantics has some interesting (and sometimes unintuitive) implications for the visibility of variables in subgroups and subqueries. This help page explains the basic concepts and sketches situations in which the bottom-up semantics may lead to unexpected query results.

A Simple Example - the Basic Idea of SPARQL Evaluation

Let’s start out with a very simple dataset that we will use throughout the upcoming examples:

 <http://example.com/Alice>   a <http://example.com/Person> .
 <http://example.com/Flipper> a <http://example.com/Animal> .

This is, we have two triples, one stating the Alice is a person and the other one stating that Flipper is an Animal. To illustrate the basic idea of SPARQL evaluation, let’s start out with a simple query:

 SELECT ?s ?personType WHERE {
   BIND(<http://example.com/Person> AS ?personType)
   ?s a ?o
   FILTER(?o=?personType)
 }

Q1 binds the URI <http://example.com/Person> to variable ?personType and extracts all instances of this type. Naively following the W3C’s semantics, evaluation of this triple pattern proceeds as follows:

1. Evaluate the BIND expression, which gives us a single, so-called mapping:

{ ?personType -> <http://example.com/Person> }

2. Evaluate the triple pattern ?s a ?o against the data, which gives us two mappings, namely

 { ?s -> <http://example.com/Alice>,   ?o -> <http://example.com/Person> },
 { ?s -> <http://example.com/Flipper>, ?o -> <http://example.com/Animal> }

The interesting point here is that, when strictly following the semantics as defined by the W3C, this pattern is evaluated independently from the previous expression, the join is performed in a subsequent step (see 3. below). This is, of course, not a very efficient approach, so for the sake of performance query engines such as Blazegraph typically choose to “inject” the binding obtained from 1. into the triple pattern, evaluating the pattern ?s a <http://example.com/Person> instead.

3. The two partial results are then “joined” together, combining those mappings that share the same bindings for variable ?o. In our case, the mapping from 1. joins with both mappings from 2.. The join gives us the following two mappings:

 { ?s -> <http://example.com/Alice>,   ?o -> <http://example.com/Person>, ?personType -> <http://example.com/Person> },
 { ?s -> <http://example.com/Flipper>, ?o -> <http://example.com/Animal>, ?personType -> <http://example.com/Person> }

4. In the next step, we apply the FILTER expression, retaining those mappings for which variable ?o and ?personType coincide, namely

 { ?s -> <http://example.com/Alice>, ?o -> <http://example.com/Person>, ?personType ->  <http://example.com/Person> }

Note that, as defined in the SPARQL standard, FILTER expressions are applied after evaluating a join group, so we could safely move the FILTER expression right in the beginning without changing the evaluation process (and hence, without changing the result)

5. Finally, as per SELECT clause, we project on variables ?s and ?personType to get the final query result. Written in tabular form, here’s the outcome:

 ?s                         | ?personType
 --------------------------------------------------------
 <http://example.com/Alice> | <http://example.com/Person>

A Problematic Case: FILTER expressions

Nested Subgroups

As you may know, SPARQL allows to nest expressions into subgroups. Let’s now modify our example and look at the following query:

SELECT ?s ?personType WHERE {
   BIND(<http://example.com/Person> AS ?personType)
   {
     ?s a ?o
     FILTER(?o=?personType)
   }
 }

The only change here are the { } brackets - they put the triple pattern and the FILTER expression into a separate, nested group. Now the bottom-up evaluation of SPARQL becomes relevant, since the inner group is evaluated first:

1. Evaluate the BIND expression, which gives us the mapping (as before):

{ ?type -> <http://example.com/Person> }

2. We next turn towards an evaluation of the subgroup that is enclosed by the inner { } brackets - following a bottom-up paradigm, this evaluation is done independently from the result computed in 1. 2a. As before in step 2., we evaluate the triple pattern “?s a ?o” against the data, which gives us two mappings, namely

 { ?s -> <http://example.com/Alice>,   ?o -> <http://example.com/Person> },
 { ?s -> <http://example.com/Flipper>, ?o -> <http://example.com/Animal> }

2b. Now, the FILTER (?o=?type) is applied on the two mappings from 2a. And here’s the problem: these two mappings do not contain a binding for variable ?type (because it is not in scope). In such cases, the SPARQL semantic defines that the FILTER condition evaluates to an “error”, in which case the FILTER rejects all mappings. After step 2, we thus end up with no mapping.

3. Finally, the single mapping from 1. is joined with the empty result from 2., so we end up with the empty result.

UNIONs

While the query from before may look unnecessarily complex and somewhat constructed, the following example is probably much closer to situations as they might show up in practical applications:

SELECT ?person ?nonPerson ?type WHERE {
   BIND(<http://example.com/Person> AS ?type)
   {
     ?person a ?o
     FILTER(?o=?type)
   }
   UNION
   {
     ?nonPerson a ?o
     FILTER(?o!=?type)
   }
 }

In this UNION query, the BIND expression is used to introduce a variable binding that is intended to be reused in the two parts of the UNION : the idea is to extract all instances of type <http://example.com/Person> in the first part of the UNION, and all the others in the second part of the UNION.

You may be surprised that this query does not work as expected: the two blocks of the UNION open up new scopes, in which the ?type variable is not known. For the same reasons as in the example before, both FILTER expression evaluate to error and we end up with the empty result. One way to fix this is by (redundantly) pushing the BIND into the two parts of the UNION:

 SELECT ?person ?nonPerson ?type WHERE {
    {
      BIND(<http://example.com/Person> AS ?type)
      ?person a ?o
      FILTER(?o=?type)
    }
   UNION
   {
     BIND(<http://example.com/Person> AS ?type)
     ?nonPerson a ?o
     FILTER(?o!=?type)
   }
 }

This query will give us results as expected, namely:

 ?person                    | ?nonPerson                      | ?type
 ------------------------------------------------------------------------------------------
 <http://example.com/Alice> |                                 | <http://example.com/Person> 
                            |   <http://example.com/Flipper>  | <http://example.com/Animal>

Optimizations in Blazegraph

As shortly discussed before, strictly following the bottom-up semantics is typically not a good choice when it comes to evaluation performance. In fact, for many patterns it is well-known that top-down evaluation, where we inject mappings from outside into inner subgroups, can lead to significant speedups. The good news is that, for a broad range of SPARQL queries, bottom-up and top-down evaluation yield the same results. For instance, this holds for the class of SPARQL queries built only from triple patterns connected through “.” (so-called conjunctive queries).

Blazegraph exploits such situations to rewrite queries in such a way that they can be evaluated top-down where possible. There are various techniques and tricks that are implemented for this purpose: in many cases, subgroups can just be flattened out without changing semantics, FILTERs variables in subgroups that are known to be unbound can be renamed to avoid clashes, etc. With the fixes in 1.5.2 we’ve ruled out various inconsistencies between Blazegraph and the official W3C spec.

Summary

Although the examples above were somewhat artifically designed to illustrate the issues arising in the context of SPARQL’s bottom-up semantics, we’ve repeatedly observed ill designed queries of this style. We hope that the thoughts in this post help our readers to avoid the common pitfalls and write better, standard-compliance SPARQL in the future.

Introduction