Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Different behaviour using prefix and suffix FILTER #9032

Closed
rackom opened this issue May 17, 2019 · 7 comments
Closed

Different behaviour using prefix and suffix FILTER #9032

rackom opened this issue May 17, 2019 · 7 comments

Comments

@rackom
Copy link

rackom commented May 17, 2019

My Environment

  • ArangoDB Version: 3.4.5
  • Storage Engine: RocksDB
  • Deployment Mode: Single Server
  • Deployment Strategy: Manual Start in Docker
  • Configuration:
  • Infrastructure: own
  • Operating System:
  • Total RAM in your machine: 32gb
  • Disks in use: HDD
  • Used Package:

Component, Query & Data

Affected feature:
AQL

AQL query (if applicable):

LET z = (
    FOR i IN phone_number
        FILTER i._key LIKE "1%"
        //FILTER i._key LIKE "%1"
        FOR v,e IN 1..1 INBOUND i phone_call
            FILTER v.height > 160
            FILTER e.duration > 300
            RETURN e._id
)

RETURN z

AQL explain (prefix):

Query String:
 LET z = (
     FOR i IN phone_number
         FILTER i._key LIKE "1%"
         FOR v,e IN 1..1 INBOUND i phone_call
             FILTER v.height > 160
             FILTER e.duration > 300
             RETURN e._id
 )
 RETURN z

Execution plan:
 Id   NodeType                  Calls     Items   Runtime [s]   Comment
  1   SingletonNode                 1         1       0.00001   * ROOT
 13   SubqueryNode                  1         1       3.02448     - LET z = ...   /* const subquery */
  2   SingletonNode                 1         1       0.00001       * ROOT
  3   EnumerateCollectionNode    1000   1000000       0.73762         - FOR i IN phone_number   /* full collection scan */
  4   CalculationNode            1000   1000000       0.89147           - LET #6 = LIKE(i.`_key`, "1%")   /* simple expression */   /* collections used: i : phone_number */
  5   FilterNode                  112    111111       0.33341           - FILTER #6
  6   TraversalNode                 1         0       1.06189           - FOR v  /* vertex */, e  /* edge */ IN 1..1  /* min..maxPathDepth */ INBOUND i /* startnode */  phone_call
  7   CalculationNode               1         0       0.00001             - LET #10 = ((v.`height` > 160) && (e.`duration` > 300))   /* simple expression */
  8   FilterNode                    1         0       0.00001             - FILTER #10
 11   CalculationNode               1         0       0.00001             - LET #14 = e.`_id`   /* attribute expression */
 12   ReturnNode                    1         0       0.00001             - RETURN #14
 14   ReturnNode                    1         1       0.00001     - RETURN z

Indexes used:
 By   Type   Collection   Unique   Sparse   Selectivity   Fields      Ranges
  6   edge   phone_call   false    false            n/a   [ `_to` ]   base INBOUND

Traversals on graphs:
 Id  Depth  Vertex collections  Edge collections  Options                                  Filter / Prune Conditions
 6   1..1                       phone_call        uniqueVertices: none, uniqueEdges: path                           

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   move-filters-up
  3   move-calculations-up-2
  4   move-filters-up-2
  5   fuse-filters

Query Statistics:
 Writes Exec   Writes Ign   Scan Full   Scan Index   Filtered   Exec Time [s]
           0            0     1000000            0     888889         3.04093

Query Profile:
 Query Stage           Duration [s]
 initializing               0.00003
 parsing                    0.00019
 optimizing ast             0.00004
 loading collections        0.00004
 instantiating plan         0.00013
 optimizing plan            0.00072
 executing                  3.02468
 finalizing                 0.01498

AQL explain (prefix):

Query String:
 LET z = (
     FOR i IN phone_number
         FILTER i._key LIKE "%1"
         FOR v,e IN 1..1 INBOUND i phone_call
             FILTER v.height > 160
             FILTER e.duration > 300
             RETURN e._id
 )
 RETURN z

Execution plan:
 Id   NodeType                  Calls     Items   Runtime [s]   Comment
  1   SingletonNode                 1         1       0.00001   * ROOT
 13   SubqueryNode                  1         1      58.33171     - LET z = ...   /* const subquery */
  2   SingletonNode                 1         1       0.00001       * ROOT
  3   EnumerateCollectionNode    1000   1000000       0.64415         - FOR i IN phone_number   /* full collection scan */
  4   CalculationNode            1000   1000000       1.52598           - LET #6 = LIKE(i.`_key`, "%1")   /* simple expression */   /* collections used: i : phone_number */
  5   FilterNode                  100    100000       0.32020           - FILTER #6
  6   TraversalNode              1999   1998806      51.85046           - FOR v  /* vertex */, e  /* edge */ IN 1..1  /* min..maxPathDepth */ INBOUND i /* startnode */  phone_call
  7   CalculationNode            1999   1998806       1.60681             - LET #10 = ((v.`height` > 160) && (e.`duration` > 300))   /* simple expression */
  8   FilterNode                  966    965937       1.23444             - FILTER #10
 11   CalculationNode             966    965937       0.90494             - LET #14 = e.`_id`   /* attribute expression */
 12   ReturnNode                  966    965937       0.23455             - RETURN #14
 14   ReturnNode                    1         1       0.00002     - RETURN z

Indexes used:
 By   Type   Collection   Unique   Sparse   Selectivity   Fields      Ranges
  6   edge   phone_call   false    false            n/a   [ `_to` ]   base INBOUND

Traversals on graphs:
 Id  Depth  Vertex collections  Edge collections  Options                                  Filter / Prune Conditions
 6   1..1                       phone_call        uniqueVertices: none, uniqueEdges: path                           

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   move-filters-up
  3   move-calculations-up-2
  4   move-filters-up-2
  5   fuse-filters

Query Statistics:
 Writes Exec   Writes Ign   Scan Full   Scan Index   Filtered   Exec Time [s]
           0            0     1000000      1998806    1932869        58.57918

Query Profile:
 Query Stage           Duration [s]
 initializing               0.00003
 parsing                    0.00017
 optimizing ast             0.00004
 loading collections        0.00004
 instantiating plan         0.00014
 optimizing plan            0.00065
 executing                 58.53492
 finalizing                 0.04297

Dataset:
Simple model 2 vertices, 1 edge collection. V(person) -> E(phone_call) -> V(phone_number)

Problem:
When doing first FILTER in FOR loop, doing both prefix and suffix behave differently. Prefix query will not reach traversal part, suffix one does.

Expected result:
FILTER should not influence behaviour of following blocks when both filters return some number of results.

@graetzer
Copy link
Contributor

not sure if I understand your problem. AQL works much like a pipeline, if FILTER i._key LIKE "1%" does not evaluate to true for any documents in phone_number then the traversal will not have any documents to start from.

Should the below AQL query return any results ?

FOR i IN phone_number
    FILTER i._key LIKE "1%"
    RETURN i

@rackom
Copy link
Author

rackom commented May 17, 2019

Yes, that is correct. In one case filter returns 100k documents, in other something more than 111k. But in one case it will follow with FOR and in other it will not.

@graetzer
Copy link
Contributor

this does still happen if you remove the subquery? Can you send us some testdata to reproduce it?

@rackom
Copy link
Author

rackom commented May 20, 2019

If we remove subquery it will return properly both prefix and suffix results. As keys are sequential numbers we have roughly 10% of all documents having prefix / suffix of digit '1'.

However, as soon as we do subquery, prefix will not return any results and suffix will. Here are simplified queries.

Prefix

LET z = (
    FOR i IN phone_number
        FILTER i._key LIKE "1%"
        FOR v,e IN 1..1 INBOUND i phone_call
            FILTER v.height > 160
            FILTER e.duration > 300
            RETURN e._id
)
RETURN z

Suffix

LET z = (
    FOR i IN phone_number
        FILTER i._key LIKE "%1"
        FOR v,e IN 1..1 INBOUND i phone_call
            FILTER v.height > 160
            FILTER e.duration > 300
            RETURN e._id
)
RETURN z

Dataset we have is pretty much 100k phone numbers randomly doing 20milion calls - vertex on the other side is person (vertex collection of 1milion records)

@graetzer
Copy link
Contributor

I can't reproduce it unfortunately, could you send over some test data ?

@maxkernbach
Copy link
Contributor

Hi @rackom,

Since we were not able to reproduce this issue, could you please share a data set?

You can send us a message to hackers@arangodb.com (this ML is not public) and attach the dump in that email. This way we can try to reproduce and find the root cause. Please reference the number of this issue in your email.

@maxkernbach
Copy link
Contributor

@rackom

I am closing this issue for now since we have not been able to reproduce it.
In case this problem persists, please send us some test data set as described above and we would be glad to investigate it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants