Author: Stefan Plantikow <stefan.plantikow@neotechnology.com>
This CIP codifies the pre-existing
UNION
andUNION ALL
clauses, and proposes additional query combinators for set operations.
Query combinators for set operations are a common feature in other query languages. Adding more query combinators to Cypher will increase language expressivity and provide functionality that has been requested — and expected to exist — in the language by users.
The vast majority of Cypher clauses are underpinned by sequential composition; i.e. the records produced by the first clause act as an input to the next clause and so on. However, some operations require multiple streams of records as inputs. These are called query combinators (CIP2016-06-22 Nested, updating, and chained subqueries). The most notable example of query combinators are set operations.
This CIP proposes the specification of pre-existing and the introduction of several new query combinators for set operations:
-
UNION
-
UNION ALL
-
UNION MAX
-
INTERSECT
-
INTERSECT ALL
-
EXCEPT
-
EXCEPT ALL
-
EXCLUSIVE UNION
-
EXCLUSIVE UNION MAX
-
OTHERWISE
-
CROSS
Query combinators are used to construct a (compound) top-level query from two input query components.
A query component is a sequence of clauses that either describes an updating query or a read-only that ends in a RETURN
clause but does not contain any top-level query combinator clauses.
Query components may however contain nested subqueries whose inner queries contain query combinator clauses.
Query combinators are left-associative; that is, their operations are grouped from the left.
For all proposed query combinators — except for CROSS
— the variables returned are subject to the following standard rules:
-
Both input query components must return precisely the same set of variables
-
If both input query components specify the order of returned variables explicitly, they must both return those variables in exactly the same order
-
If one of the input query components does not specify the order of returned variables explicitly (e.g. by using
RETURN *
), then the other input query must specify the order of returned variables explicitly. This order will then be the order in which variables are returned by the query combinator -
If both input query components do not specify the order of returned variables explicitly (e.g. by using
RETURN *
), variables are returned in the same order as map keys (i.e. sorted according to their UNICODE name)
UNION
computes the logical set union between two sets of input records (i.e. any duplicates are discarded).
UNION ALL
computes the logical multiset sum between two bags of input records (i.e. all duplicates from both arms are retained).
UNION MAX
computes the logical max-bounded multiset union between two bags of input records (i.e. retains the largest number of duplicates from either arm).
INTERSECT
computes the logical set intersection between two sets of input records (i.e. any duplicates are discarded).
INTERSECT ALL
computes the logical multiset intersection between two bags of input records (i.e. shared duplicates are retained).
EXCEPT
computes the logical set difference between two sets of input records (i.e. any duplicates are discarded).
EXCEPT ALL
computes the logical multiset difference between two bags of input records (i.e. excess duplicates on the left-hand side are retained).
EXCLUSIVE UNION
computes the exclusive logical set union between two sets of input records (i.e. any duplicates in the final outcome are discarded).
EXCLUSIVE UNION MAX
computes the exclusive logical multiset union between two bags of input records (i.e. the largest remaining excess multiplicity of each record in any argument bag is returned).
OTHERWISE
computes the logical choice between two bags of input records.
It evaluates to all records from the left-hand side argument provided the bag of input records is non-empty; otherwise it evaluates to all records from the right-hand side argument.
CROSS
computes the cartesian product between the records produced by both input query components.
Duplicates are preserved (i.e. CROSS
does not imply DISTINCT
).
In contrast to the other query combinators, the standard rules regarding returned variables do not apply to CROSS
.
Instead, the set of variables returned from both input query components of a CROSS
must be non-overlapping.
The returned variables of a CROSS
operation consist of all the variables returned by the left-hand side input query component (appearing in the order specified), followed by all the variables returned by the right-hand side input query component (appearing in the order specified).
All query combinators perform record-level comparisons under equivalence (i.e. null
is equivalent to null
).
This CIP codifies the pre-existing UNION
and UNION ALL
constructs.
The suggested changes are expected to integrate well with the parallel CIP for nested subqueries.
This CIP adds INTERSECT
, EXCLUSIVE
, and OTHERWISE
as new keywords.