Skip to content

Latest commit

 

History

History
141 lines (84 loc) · 6.05 KB

CIP2017-04-20-query-combinators-for-set-operations.adoc

File metadata and controls

141 lines (84 loc) · 6.05 KB

CIP2017-04-20 Query combinators for set operations

1. Motivation

Query combinators for set operations are a common feature in other query languages. Adding more query combinators to Cypher will increase language expressivity and provide functionality that has been requested — and expected to exist — in the language by users.

2. Background

The vast majority of Cypher clauses are underpinned by sequential composition; i.e. the records produced by the first clause act as an input to the next clause and so on. However, some operations require multiple streams of records as inputs. These are called query combinators (CIP2016-06-22 Nested, updating, and chained subqueries). The most notable example of query combinators are set operations.

3. Proposal

This CIP proposes the specification of pre-existing and the introduction of several new query combinators for set operations:

  • UNION

  • UNION ALL

  • UNION MAX

  • INTERSECT

  • INTERSECT ALL

  • EXCEPT

  • EXCEPT ALL

  • EXCLUSIVE UNION

  • EXCLUSIVE UNION MAX

  • OTHERWISE

  • CROSS

Query combinators are used to construct a (compound) top-level query from two input query components.

A query component is a sequence of clauses that either describes an updating query or a read-only that ends in a RETURN clause but does not contain any top-level query combinator clauses. Query components may however contain nested subqueries whose inner queries contain query combinator clauses.

Query combinators are left-associative; that is, their operations are grouped from the left.

For all proposed query combinators — except for CROSS — the variables returned are subject to the following standard rules:

  • Both input query components must return precisely the same set of variables

  • If both input query components specify the order of returned variables explicitly, they must both return those variables in exactly the same order

  • If one of the input query components does not specify the order of returned variables explicitly (e.g. by using RETURN *), then the other input query must specify the order of returned variables explicitly. This order will then be the order in which variables are returned by the query combinator

  • If both input query components do not specify the order of returned variables explicitly (e.g. by using RETURN *), variables are returned in the same order as map keys (i.e. sorted according to their UNICODE name)

3.1. UNION

UNION computes the logical set union between two sets of input records (i.e. any duplicates are discarded).

UNION ALL computes the logical multiset sum between two bags of input records (i.e. all duplicates from both arms are retained).

UNION MAX computes the logical max-bounded multiset union between two bags of input records (i.e. retains the largest number of duplicates from either arm).

3.2. INTERSECT

INTERSECT computes the logical set intersection between two sets of input records (i.e. any duplicates are discarded).

INTERSECT ALL computes the logical multiset intersection between two bags of input records (i.e. shared duplicates are retained).

3.3. EXCEPT

EXCEPT computes the logical set difference between two sets of input records (i.e. any duplicates are discarded).

EXCEPT ALL computes the logical multiset difference between two bags of input records (i.e. excess duplicates on the left-hand side are retained).

3.4. EXCLUSIVE UNION

EXCLUSIVE UNION computes the exclusive logical set union between two sets of input records (i.e. any duplicates in the final outcome are discarded).

EXCLUSIVE UNION MAX computes the exclusive logical multiset union between two bags of input records (i.e. the largest remaining excess multiplicity of each record in any argument bag is returned).

3.5. OTHERWISE

OTHERWISE computes the logical choice between two bags of input records. It evaluates to all records from the left-hand side argument provided the bag of input records is non-empty; otherwise it evaluates to all records from the right-hand side argument.

3.6. CROSS

CROSS computes the cartesian product between the records produced by both input query components. Duplicates are preserved (i.e. CROSS does not imply DISTINCT).

In contrast to the other query combinators, the standard rules regarding returned variables do not apply to CROSS. Instead, the set of variables returned from both input query components of a CROSS must be non-overlapping. The returned variables of a CROSS operation consist of all the variables returned by the left-hand side input query component (appearing in the order specified), followed by all the variables returned by the right-hand side input query component (appearing in the order specified).

3.7. Handling of NULL values

All query combinators perform record-level comparisons under equivalence (i.e. null is equivalent to null).

3.8. Interaction with existing features

This CIP codifies the pre-existing UNION and UNION ALL constructs.

The suggested changes are expected to integrate well with the parallel CIP for nested subqueries.

This CIP adds INTERSECT, EXCLUSIVE, and OTHERWISE as new keywords.

3.9. Alternatives

SQL does not provide UNION MAX (it has been suggested in the literature though).

EXCLUSIVE UNION and EXCLUSIVE UNION MAX are not provided by SQL and could be omitted.

OTHERWISE is not provided by SQL and could be omitted.

SQL allows MINUS as an alias for EXCEPT.

4. What others do

This proposal mainly follows SQL.

5. Benefits to this proposal

Set operations are added to the language.

6. Caveats to this proposal

Increase in language complexity; adopting controversial null handling issues from SQL.

This does not consider aliasing of subqueries; henceforth set operations over the same argument queries need to repeat the argument subqueries. This could be addressed in a future CIP.