Feature 3.3/aql optimizations #5140

jsteemann · 2018-04-18T14:54:01Z

Adds the following optimizations:

remove post-sort from GatherNode in cluster AQL queries that do use indexes
for filtering but that do not require a sorted result

This optimization can speed up gathering data from multiple shards, because
it removes the need for a merge sort of the individual shards' results.
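The following Python sketch shows roughly where the saving comes from. It is not ArangoDB's implementation: each shard's result is modelled as a plain list, and the function names are hypothetical. If the query needs sorted output, the coordinator has to merge the per-shard streams. If it does not, the rows can simply be passed through.

```python
import heapq
from itertools import chain


def gather_merge_sorted(shard_results, key=None):
    """Combine per-shard results that are each already sorted by `key`.

    A k-way merge keeps the global order, but it has to compare the heads
    of the shard streams for every row it emits.
    """
    return list(heapq.merge(*shard_results, key=key))


def gather_unsorted(shard_results):
    """Combine per-shard results when no particular order is required.

    Rows are passed through shard by shard and no comparisons happen.
    """
    return list(chain.from_iterable(shard_results))


if __name__ == "__main__":
    shards = [[1, 4, 7], [2, 5], [3, 6]]
    print(gather_merge_sorted(shards))  # [1, 2, 3, 4, 5, 6, 7]
    print(gather_unsorted(shards))      # [1, 4, 7, 2, 5, 3, 6]
```

With k shards, the merge costs on the order of log k comparisons per row, while the pass-through costs none. That difference is roughly what removing the post-sort saves.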

extend the already existing "reduce-extraction-to-projection" AQL optimizer
rule for RocksDB to provide projections of up to 5 document attributes. The
previous implementation only supported a projection of a single document
attribute. The new implementation extracts up to 5 document attributes from
a document while scanning a collection via an EnumerateCollectionNode.
Additionally, the new version of the optimizer rule can also produce projections
when scanning an index via an IndexNode.

The optimization is especially beneficial for huge documents, because it copies
out only the projected attributes instead of copying the entire document data
from the storage engine.

When applied, the explainer will show the projected attributes in a projections
remark for an EnumerateCollectionNode or IndexNode. The optimization is limited
to the RocksDB storage engine.
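The Python sketch below models the idea behind the extended rule under simplifying assumptions. Documents are plain dicts, only top-level attribute names are considered, and the helper names and the `MAX_PROJECTIONS` constant are hypothetical. The limit of 5 comes from the description above. This is not the RocksDB engine's code.

```python
MAX_PROJECTIONS = 5  # attribute limit stated in the PR description


def plan_projections(used_attributes):
    """Return the attributes to project, or None if the rule would not apply.

    Hypothetical helper: the rule is modelled as applying only when the query
    uses between 1 and MAX_PROJECTIONS top-level document attributes.
    """
    attributes = sorted(set(used_attributes))
    if not attributes or len(attributes) > MAX_PROJECTIONS:
        return None
    return attributes


def apply_projections(document, projections):
    """Copy out only the projected attributes instead of the whole document."""
    return {name: document[name] for name in projections if name in document}


if __name__ == "__main__":
    doc = {"_key": "1", "name": "Ann", "age": 42, "bio": "x" * 10_000}
    projections = plan_projections(["name", "age", "name"])
    print(projections)                          # ['age', 'name']
    print(apply_projections(doc, projections))  # {'age': 42, 'name': 'Ann'}
    print(plan_projections(list("abcdef")))     # None (more than 5 attributes)
```

In this model, the large `bio` attribute is never copied, which is the effect the description attributes to the optimization for huge documents.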

added index-only optimization for AQL queries that can satisfy the retrieval of
all required document attributes directly from an index.

This optimization is triggered for the RocksDB engine if the query uses an index
that covers all document attributes that are required later on in the query.
If applied, it saves retrieving the actual document data (which would require
an extra lookup in RocksDB) and instead builds the document data solely
from the index values found. It is only applied when using up to 5 attributes
from the document, and only if the rest of the document data is not used later
on in the query.

The optimization is currently available for the RocksDB engine for the index types
primary, edge, hash, skiplist and persistent.

If the optimization is applied, it will show up as "index only" in an AQL
query's execution plan for an IndexNode.
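As an illustration only, this Python sketch models the covering check and shows how a document can be rebuilt from an index entry. The index is represented as a list of field names plus the values stored in one entry. `is_covering`, `document_from_index_entry` and the limit constant are hypothetical names, not ArangoDB APIs.

```python
MAX_COVERED_ATTRIBUTES = 5  # limit stated in the PR description


def is_covering(index_fields, required_attributes):
    """True if every required attribute can be read from the index entry."""
    required = set(required_attributes)
    return 0 < len(required) <= MAX_COVERED_ATTRIBUTES and required <= set(index_fields)


def document_from_index_entry(index_fields, index_values, required_attributes):
    """Build the (partial) document solely from the values stored in the index,
    so no second lookup of the full document is needed."""
    entry = dict(zip(index_fields, index_values))
    return {name: entry[name] for name in required_attributes}


if __name__ == "__main__":
    fields = ["name", "age"]  # e.g. a persistent index on [name, age]
    print(is_covering(fields, ["age"]))           # True
    print(is_covering(fields, ["age", "email"]))  # False: email is not in the index
    print(document_from_index_entry(fields, ["Ann", 42], ["age"]))  # {'age': 42}
```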

added scan-only optimization for AQL queries that iterate over collections or
indexes and that do not need to return the actual document values.

Not fetching the document values from the storage engine provides a
considerable speedup when using the RocksDB engine, and may also help somewhat
with the MMFiles engine. The optimization is only applied when
full-scanning or index-scanning a collection without referring to any of its
documents later on in the query, and, for an IndexNode, only if all filter
conditions for the documents of the collection are covered by the index.

If the optimization is applied, it will show up as "scan only" in an AQL
query's execution plan for an EnumerateCollectionNode or an IndexNode.
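Here is a toy Python model of the difference: a collection that counts how often a full document is read. A query that only counts the documents in a collection never references the loop variable, so a scan-only iteration can skip those reads entirely. `ToyCollection` and both helper functions are hypothetical stand-ins for the storage engine.

```python
class ToyCollection:
    """Hypothetical in-memory store that counts how often a full document is read."""

    def __init__(self, documents):
        self._documents = dict(documents)
        self.document_reads = 0

    def scan_keys(self):
        # Iterates keys only; document bodies are not touched.
        return iter(self._documents)

    def read_document(self, key):
        self.document_reads += 1
        return self._documents[key]


def count_with_document_reads(collection):
    # Regular scan: every document is materialized even though it is unused.
    return sum(1 for key in collection.scan_keys() if collection.read_document(key) is not None)


def count_scan_only(collection):
    # Scan-only: the loop variable is never referenced, so no document is read.
    return sum(1 for _ in collection.scan_keys())


if __name__ == "__main__":
    coll = ToyCollection({"a": {"v": 1}, "b": {"v": 2}, "c": {"v": 3}})
    print(count_scan_only(coll), coll.document_reads)            # 3 0
    print(count_with_document_reads(coll), coll.document_reads)  # 3 3
```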

extend the existing "collect-in-cluster" optimizer rule to run grouping, counting
and deduplication on the DB servers in several cases, so that the coordinator
only needs to combine (e.g. sum up) the potentially smaller results from the
individual shards.

The following types of COLLECT queries are covered now:
- RETURN DISTINCT expr
- COLLECT WITH COUNT INTO ...
- COLLECT var1 = expr1, ..., varn = exprn (WITH COUNT INTO ...), without INTO or KEEP
- COLLECT var1 = expr1, ..., varn = exprn AGGREGATE ..., without INTO or KEEP, for
  aggregate functions COUNT/LENGTH, SUM, MIN and MAX.
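The split between DB servers and the coordinator can be sketched as two-phase aggregation. This Python sketch is a simplified model, not the optimizer's code: each shard is a list of values, and the helper names are hypothetical. Each DB server produces a small partial result, and the coordinator only combines the partials.

```python
from collections import Counter


def shard_count_by_group(rows):
    """DB-server side: COLLECT key = ... WITH COUNT INTO n on one shard."""
    return Counter(rows)


def coordinator_merge_counts(partials):
    """Coordinator side: add up per-shard counts for the same group key."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


def shard_min_max(values):
    """DB-server side: MIN/MAX over one shard (assumed to be non-empty)."""
    return min(values), max(values)


def coordinator_merge_min_max(partials):
    """Coordinator side: MIN of the per-shard minimums, MAX of the maximums."""
    return min(p[0] for p in partials), max(p[1] for p in partials)


def distinct_in_cluster(shards):
    """RETURN DISTINCT: deduplicate per shard, then once more on the coordinator."""
    per_shard = [set(rows) for rows in shards]
    return set().union(*per_shard)


if __name__ == "__main__":
    shards = [["a", "b", "a"], ["b", "c"]]
    print(coordinator_merge_counts([shard_count_by_group(s) for s in shards]))
    # {'a': 2, 'b': 2, 'c': 1}
    print(coordinator_merge_min_max([shard_min_max([3, 9, 1]), shard_min_max([7, 2])]))
    # (1, 9)
    print(sorted(distinct_in_cluster(shards)))  # ['a', 'b', 'c']
```

This approach only works for aggregates whose partial results can be combined without seeing the raw rows. That property holds for the cases listed above: COUNT/LENGTH, SUM, MIN, MAX and DISTINCT.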

honor specified COLLECT method in AQL COLLECT options

For example, when the user explicitly asks for the sorted COLLECT method,
the optimizer will no longer produce an alternative version of the plan
using the hash method.

Additionally, if the user explicitly asks for the hash COLLECT method,
the optimizer will now change the existing plan to use the hash method
if possible, instead of just creating an alternative plan.

- `COLLECT ... OPTIONS { method: 'sorted' }` => always use the sorted method
- `COLLECT ... OPTIONS { method: 'hash' }` => use the hash method if this is technically possible
- `COLLECT ...` (no options) => create one plan using the sorted method and another plan using the hash method
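The three cases can be summarized as a small decision function. This Python sketch is hypothetical. `candidate_collect_methods` and its `hash_possible` flag stand in for the optimizer's internal checks. The fallback to the sorted method when hash is not possible, including in the no-options case, is an assumption based on the wording above.

```python
def candidate_collect_methods(requested_method=None, hash_possible=True):
    """Return the COLLECT methods for which a plan would be produced.

    `hash_possible` is a hypothetical stand-in for whatever technical checks
    decide whether the hash method can be used for a given COLLECT.
    """
    if requested_method == "sorted":
        # Explicit sorted: never add a hash alternative.
        return ["sorted"]
    if requested_method == "hash":
        # Explicit hash: switch the existing plan to hash if possible (assumed fallback: sorted).
        return ["hash"] if hash_possible else ["sorted"]
    # No method requested: keep the sorted plan and add a hash alternative.
    return ["sorted", "hash"] if hash_possible else ["sorted"]
```

For example, `candidate_collect_methods()` returns `['sorted', 'hash']`, mirroring the two alternative plans created when no method is given.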

Note that there is also a corresponding enterprise branch.

jsteemann · 2018-04-18T14:54:54Z

http://jenkins01.arangodb.biz:8080/view/PR/job/arangodb-matrix-pr-linux/298/

jsteemann · 2018-04-19T09:43:14Z

http://jenkins01.arangodb.biz:8080/view/PR/job/arangodb-matrix-pr-linux/301/

mchacki

LGTM

jsteemann added 13 commits April 13, 2018 17:23

multiple optimizations

ebab8e1

remove sorts from GatherNode if not required

c89da6f

do not refer to the documents when iterating over index entries if the documents are not needed

b905c2e

added tests

73356f2

projections for multiple attributes

4c63336

fix covering index queries

611a874

add index-only scans for primary and edge index

d302cc1

some cleanup

5cea570

Merge branch '3.3' of https://github.com/arangodb/arangodb into feature-3.3/aql-optimizations

5dbe225

remove debug code

9506c9e

Merge branch '3.3' of https://github.com/arangodb/arangodb into feature-3.3/aql-optimizations

682fded

added tests

7163d5c

added tests

cbb52b1

jsteemann added this to the 3.3 milestone Apr 18, 2018

jsteemann requested a review from mchacki April 18, 2018 14:54

jsteemann requested a review from graetzer April 18, 2018 14:55

jsteemann added 3 commits April 18, 2018 17:40

remove superfluous letter from CHANGELOG entry

94e0f45

add covering index support for IN operations

0714f36

use covering indexes for queries that use the same index multiple times (OR on conditions that use the same index)

4296dde

jsteemann added the 9 WIP label Apr 19, 2018

This was referenced Apr 19, 2018

remove sorts from GatherNode if not required #5018

Closed

do not refer to the documents when iterating over index entries if th… #5003

Closed

graetzer approved these changes May 1, 2018

jsteemann added 4 commits May 1, 2018 10:59

Merge branch '3.3' of https://github.com/arangodb/arangodb into feature-3.3/aql-optimizations

1b0c667

Merge branch '3.3' of https://github.com/arangodb/arangodb into feature-3.3/aql-optimizations

d494ebb

honor comments from @mchacki

065383f

fix encryption test

65a3e1c

mchacki approved these changes May 4, 2018

jsteemann removed the 9 WIP label May 5, 2018

jsteemann added 2 commits May 13, 2018 21:54

Merge branch '3.3' of https://github.com/arangodb/arangodb into feature-3.3/aql-optimizations

fa257d6

remove calls to /_api/initialize

227af8f

jsteemann mentioned this pull request May 13, 2018

COLLECT query runs slower when using an index #5320

Closed

jsteemann merged commit 7319250 into 3.3 May 15, 2018

fceller deleted the feature-3.3/aql-optimizations branch June 5, 2018 14:03
