Feature 3.3/aql optimizations #5140

Merged
merged 22 commits into from
May 15, 2018

jsteemann
Contributor

Adds the following optimizations:

  • remove post-sort from GatherNode in cluster AQL queries that use indexes
    for filtering but do not require a sorted result

    This optimization can speed up gathering data from multiple shards, because
    it removes the merge sort of the individual shards' results.

  • extend the already existing "reduce-extraction-to-projection" AQL optimizer
    rule for RocksDB to provide projections of up to 5 document attributes. The
    previous implementation only supported a projection for a single document
    attribute. The new implementation will extract up to 5 document attributes from
    a document while scanning a collection via an EnumerateCollectionNode.
    Additionally the new version of the optimizer rule can also produce projections
    when scanning an index via an IndexNode.
    The optimization is especially beneficial for huge documents because it copies
    only the projected attributes out of the document instead of copying the entire
    document data from the storage engine.

    When applied, the explainer will show the projected attributes in a projections
    remark for an EnumerateCollectionNode or IndexNode. The optimization is limited
    to the RocksDB storage engine.

  • added index-only optimization for AQL queries that can satisfy the retrieval of
    all required document attributes directly from an index.

    This optimization will be triggered for the RocksDB engine if an index is used
    that covers all required attributes of the document used later on in the query.
    If applied, it avoids retrieving the actual document data (which would require
    an extra lookup in RocksDB) and instead builds the document data solely
    from the index values found. It will only be applied when using up to 5 attributes
    from the document, and only if the rest of the document data is not used later
    on in the query.

    The optimization is currently available for the RocksDB engine for the index types
    primary, edge, hash, skiplist and persistent.

    If the optimization is applied, it will show up as "index only" in an AQL
    query's execution plan for an IndexNode.

  • added scan-only optimization for AQL queries that iterate over collections or
    indexes and that do not need to return the actual document values.

    Not fetching the document values from the storage engine will provide a
    considerable speedup when using the RocksDB engine, but may also help a bit
    with the MMFiles engine. The optimization will only be applied when
    full-scanning or index-scanning a collection without referring to any of its
    documents later on, and, for an IndexNode, only if all filter conditions for the
    documents of the collection are covered by the index.

    If the optimization is applied, it will show up as "scan only" in an AQL
    query's execution plan for an EnumerateCollectionNode or an IndexNode.

  • extend existing "collect-in-cluster" optimizer rule to run grouping, counting
    and deduplication on the DB servers in several cases, so that the coordinator
    will only need to sum up the potentially smaller results from the individual shards.

    The following types of COLLECT queries are covered now:

    • RETURN DISTINCT expr
    • COLLECT WITH COUNT INTO ...
    • COLLECT var1 = expr1, ..., varn = exprn (WITH COUNT INTO ...), without INTO or KEEP
    • COLLECT var1 = expr1, ..., varn = exprn AGGREGATE ..., without INTO or KEEP, for
      aggregate functions COUNT/LENGTH, SUM, MIN and MAX.

  • honor specified COLLECT method in AQL COLLECT options

    For example, when the user explicitly asks for the COLLECT method
    to be sorted, the optimizer will no longer produce an alternative
    version of the plan using the hash method.

    Additionally, if the user explicitly asks for the COLLECT method to
    be hash, the optimizer will now change the existing plan to use
    the hash method if possible instead of just creating an alternative
    plan.

    COLLECT ... OPTIONS { method: 'sorted' } => always use sorted method
    COLLECT ... OPTIONS { method: 'hash' } => use hash if this is technically possible
    COLLECT ... (no options) => create a plan using sorted, and another plan using hash method
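
The three cases above can be modelled as a small decision function. This is only a conceptual sketch in Python, not the optimizer's actual code: the function name `candidate_collect_methods` and the `hash_possible` flag are hypothetical, and falling back to the sorted method when hash is not possible is an assumption of the sketch.

```python
from typing import List, Optional


def candidate_collect_methods(method: Optional[str], hash_possible: bool = True) -> List[str]:
    """Return the COLLECT methods for which plans would be produced.

    method: value of OPTIONS { method: ... }, or None if no option was given.
    hash_possible: hypothetical stand-in for "hash is technically possible".
    """
    if method == "sorted":
        # Explicit request: no alternative hash plan is produced.
        return ["sorted"]
    if method == "hash":
        # Explicit request: switch the existing plan to hash when possible.
        # Falling back to sorted otherwise is an assumption of this sketch.
        return ["hash"] if hash_possible else ["sorted"]
    if method is None:
        # No preference: one sorted plan, plus a hash alternative if possible.
        return ["sorted", "hash"] if hash_possible else ["sorted"]
    raise ValueError(f"unknown COLLECT method: {method!r}")
```

Only the case without options yields two competing plans; an explicit option always narrows the result to a single method.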

Note that there is also a corresponding enterprise branch.
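
To illustrate the idea behind the extended "collect-in-cluster" rule described above, the following Python sketch models a two-phase aggregation: each DB server pre-aggregates its own shard, and the coordinator only merges the much smaller partial results. It is a conceptual model, not ArangoDB code; the function names, the `city`/`amount` attributes and the sample data are made up for illustration.

```python
from typing import Dict, Iterable, List

Partial = Dict[str, Dict[str, float]]


def aggregate_on_shard(rows: Iterable[dict], group_key: str, value_key: str) -> Partial:
    """Phase 1 (DB server): group one shard's rows and compute partial aggregates."""
    partial: Partial = {}
    for row in rows:
        group, value = row[group_key], row[value_key]
        agg = partial.get(group)
        if agg is None:
            partial[group] = {"count": 1, "sum": value, "min": value, "max": value}
        else:
            agg["count"] += 1
            agg["sum"] += value
            agg["min"] = min(agg["min"], value)
            agg["max"] = max(agg["max"], value)
    return partial


def combine_on_coordinator(partials: List[Partial]) -> Partial:
    """Phase 2 (coordinator): merge per-shard partial aggregates group by group."""
    total: Partial = {}
    for partial in partials:
        for group, agg in partial.items():
            merged = total.get(group)
            if merged is None:
                total[group] = dict(agg)  # copy so shard results stay untouched
            else:
                merged["count"] += agg["count"]  # counts are summed up
                merged["sum"] += agg["sum"]
                merged["min"] = min(merged["min"], agg["min"])
                merged["max"] = max(merged["max"], agg["max"])
    return total


# Hypothetical sample data spread over two shards.
shard_a = [{"city": "Cologne", "amount": 5}, {"city": "Berlin", "amount": 2}]
shard_b = [{"city": "Cologne", "amount": 7}]
result = combine_on_coordinator(
    [aggregate_on_shard(shard, "city", "amount") for shard in (shard_a, shard_b)]
)
```

COUNT, SUM, MIN and MAX can be split this way because partial results combine into the final result without access to the individual documents, which matches the set of aggregate functions listed for the rule.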

@jsteemann jsteemann added this to the 3.3 milestone Apr 18, 2018
@jsteemann jsteemann requested a review from mchacki April 18, 2018 14:54
@jsteemann jsteemann requested a review from graetzer April 18, 2018 14:55

@mchacki mchacki left a comment

LGTM

@jsteemann jsteemann removed the 9 WIP label May 5, 2018
@jsteemann jsteemann merged commit 7319250 into 3.3 May 15, 2018
@fceller fceller deleted the feature-3.3/aql-optimizations branch June 5, 2018 14:03