AQL memoize subquery or pre-load collection to coordinator before executing query #17244

dan1els · 2022-09-29T20:42:27Z

My Environment

ArangoDB Version: 3.9.0
Deployment Mode: Cluster
Deployment Strategy: Kubernetes
Configuration: 6 coordinators, 6 primaries, 3 agency
Infrastructure: own
Operating System: -
Total RAM in your machine: 32Gb
Disks in use: SSD
Used Package: official Docker library

Component, Query & Data

Affected feature:
AQL

Size of your Dataset on disk:
~70Gb

Problem:

I have a question regarding caching some AQL subqueries on coordinator.
I'm traversing a tree structure recursively, like

 for step1Doc,step1Edge,step1Path in 1..1 INBOUND root edges
      let step1Children = (  
       // subquery call
        for step2Doc,step2Edge,step2Path in 1..1 INBOUND step1Doc edges       
           // subquery call
          let step2Children = (
              // subquery call
            for step3Doc,step3Edge,step3Path in 1..1 INBOUND step2Doc edges
                // subquery call
              let step3Children = (
                 // subquery call
                for step4Doc,step4Edge,step4Path in 1..1 INBOUND step3Doc edges    
                  return distinct step4Doc
              )      
              return step3Children
          )
          ...
  )

And also I have a separate collection with additional metadata (static) related to the each vertex, which is a really small one (about 200 elements). I wish to query this collection for each node on each level to add some data to the result, but doing it drastically increase execution time (from 9 sec. to a minute)

The problem is that the subquery is being evaluated each time for each vertex and too many connections is spawned between coordinator and dbs. It's clearly seen in the profiler report:

 398   SubqueryStartNode   COOR   5252   10390       0.02294               - LET doc4 = ( /* subquery begin */
 399   ScatterNode         COOR   5252   10390       0.04006                     - SCATTER
 400   RemoteNode          DBS   10244   10390       2.31177                      - REMOTE
 329   IndexNode           DBS   10244    6338       0.07971                          - FOR doc IN metadata   /* primary index scan, 1 shard(s) */    
 368   RemoteNode          COOR  10244    6338       2.42944                       - REMOTE
 369   GatherNode          COOR   5129    6338       0.03051                          - GATHER   /* unsorted */
 401   SubqueryEndNode     COOR   5128    5269       0.00300                    - RETURN  doc ) /* subquery end */

Results without subquery:

Query Profile:
 Query Stage           Duration [s]
 initializing               0.00000
 parsing                    0.00028
 optimizing ast             0.00004
 loading collections        0.00000
 instantiating plan         0.00034
 optimizing plan            0.05803
 executing                  9.05962
 finalizing                 0.00117

Results with subquery:

Query Profile:
 Query Stage           Duration [s]
 initializing               0.00001
 parsing                    0.00071
 optimizing ast             0.00009
 loading collections        0.00000
 instantiating plan         0.00175
 optimizing plan            0.05680
 executing                 40.63792
 finalizing                 0.00836

Another bad side effect is that network load is too high.

Expected result:

So, I'm looking for some kind of memoize operation or any other way to explicitly evaluate and the result of subquery, or for some way to preload these kind of collections to the coordinator before the query starts.

Actually, I can execute 3 different queries and then aggregate the results somewhere on the backend if there's no way to get job done inside arangodb.

UPD:
Actually I figured out the next trick. I added next line on the top of the query:

let collectionResults = true ? (subquery) : []

This forced execution of the query on startup and same results are reused.

The text was updated successfully, but these errors were encountered:

dan1els changed the title ~~AQL memoize subquery or preloading collection on coordinator before executing query~~ AQL memoize subquery or pre-load collection to coordinator before executing query Sep 29, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AQL memoize subquery or pre-load collection to coordinator before executing query #17244

AQL memoize subquery or pre-load collection to coordinator before executing query #17244

dan1els commented Sep 29, 2022 •

edited

AQL memoize subquery or pre-load collection to coordinator before executing query #17244

AQL memoize subquery or pre-load collection to coordinator before executing query #17244

Comments

dan1els commented Sep 29, 2022 • edited

My Environment

Component, Query & Data

dan1els commented Sep 29, 2022 •

edited