Index not used if attribute has space in the name #20712

DougGarno55 · 2024-03-10T09:04:34Z

My Environment

ArangoDB Version: 3.11.6
Deployment Mode: Single Server
Deployment Strategy: ArangoDB Starter
Configuration: Running locally as localhost:8529
Infrastructure: own
Operating System: Ubuntu 20.04.6 LTS
Total RAM in your machine: 64Gb. (19Gb used, 45Gb available)
Disks in use: SSD (200Gb with 47,318,348 used and 155,729,508 free
Used Package: Ubuntu .deb

Component, Query & Data

Data is simple two collections with two attributes with spaces in the names.
See steps to reproduce below.

Affected feature:
AQL query execution time

AQL query (if applicable):

   FOR docA IN A
   FOR docB IN B
   FILTER docB["Project ID"] == docA["Project ID"]
   RETURN {
      "Project ID": docA["Project ID"],
      "Str Version": docB["Str Version"]
   }

AQL explain and/or profile (if applicable):

Query String (156 chars, cacheable: true):
 FOR docA IN A
 FOR docB IN B
 FILTER docB["Project ID"] == docA["Project ID"]
 RETURN {
 "Project ID": docA["Project ID"],
 "Str Version": docB["Str Version"]
 }
 

Execution plan:
 Id   NodeType                  Est.   Comment
  1   SingletonNode                1   * ROOT
  3   EnumerateCollectionNode      1     - FOR docB IN B   /* full collection scan, projections: `Str Version`, `Project ID` */
  2   EnumerateCollectionNode      1       - FOR docA IN A   /* full collection scan, projections: `Project ID` */
  4   CalculationNode              1         - LET #2 = (docB.`Project ID` == docA.`Project ID`)   /* simple expression */   /* collections used: docB : B, docA : A */
  5   FilterNode                   1         - FILTER #2
  6   CalculationNode              1         - LET #4 = { "Project ID" : docA.`Project ID`, "Str Version" : docB.`Str Version` }   /* simple expression */   /* collections used: docA : A, docB : B */
  7   ReturnNode                   1         - RETURN #4

Indexes used:
 none

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   move-filters-up
  3   interchange-adjacent-enumerations
  4   move-calculations-up-2
  5   move-filters-up-2
  6   reduce-extraction-to-projection

Steps to reproduce

Create two collections as listed below:

   FOR d IN 1..50000
   INSERT {
      "Project ID": d,
      "Str Version": TO_STRING(d)
   } INTO A

   FOR d IN 1..50000
   INSERT {
      "Project ID": d,
      "Str Version": TO_STRING(d)
   } INTO B

Create an index on Collection A using the Project ID field
Create an index on Collection B using the Project ID field
Enter this query in the web interface:

   FOR docA IN A
   FOR docB IN B
   FILTER docB["Project ID"] == docA["Project ID"]
   RETURN {
      "Project ID": docA["Project ID"],
      "Str Version": docB["Str Version"]
   }

Using the web interface, click the Explain button. See the "Explain" section above.

Problem:
Indexes are not used

Expected result:
Use of index

If the attribute name is Project_ID, then the index are used.

The text was updated successfully, but these errors were encountered:

jsteemann · 2024-03-10T17:32:44Z

@DougGarno55 : I just tried exactly what you wrote above above using 3.11, and for me the indexes are indeed used.
Here is what I did:
Create collections:

arangod> db._create("A");
[ArangoCollection 3157380, "A" (type document, status loaded)]
arangod> db._create("B");
[ArangoCollection 3157383, "B" (type document, status loaded)]

Insert data:

arangod> db._query(`   FOR d IN 1..50000
...>    INSERT {
...>       "Project ID": d,
...>       "Str Version": TO_STRING(d)
...>    } INTO A`);
[object GeneralArrayCursor, count: 0, cached: false]
arangod> db._query(`
...>    FOR d IN 1..50000
...>    INSERT {
...>       "Project ID": d,
...>       "Str Version": TO_STRING(d)
...>    } INTO B`);
[object GeneralArrayCursor, count: 0, cached: false]

Create indexes:

arangod> db.A.ensureIndex({ fields: ["Project ID"], type: "persistent" });
{ 
  "cacheEnabled" : false, 
  "deduplicate" : true, 
  "estimates" : true, 
  "fields" : [ 
    "Project ID" 
  ], 
  "id" : "A/3257402", 
  "name" : "idx_1793161154257747968", 
  "selectivityEstimate" : 0.9998779445868424, 
  "sparse" : false, 
  "type" : "persistent", 
  "unique" : false, 
  "isNewlyCreated" : true 
}
arangod> db.B.ensureIndex({ fields: ["Project ID"], type: "persistent" });
{ 
  "cacheEnabled" : false, 
  "deduplicate" : true, 
  "estimates" : true, 
  "fields" : [ 
    "Project ID" 
  ], 
  "id" : "B/3257410", 
  "name" : "idx_1793161159029817344", 
  "selectivityEstimate" : 0.9998779445868424, 
  "sparse" : false, 
  "type" : "persistent", 
  "unique" : false, 
  "isNewlyCreated" : true 
}

Explain query:

arangod> db._explain(`   FOR docA IN A
...>    FOR docB IN B
...>    FILTER docB["Project ID"] == docA["Project ID"]
...>    RETURN {
...>       "Project ID": docA["Project ID"],
...>       "Str Version": docB["Str Version"]
...>    }`);
Query String (182 chars, cacheable: true):
    FOR docA IN A
    FOR docB IN B
    FILTER docB["Project ID"] == docA["Project ID"]
    RETURN {
       "Project ID": docA["Project ID"],
       "Str Version": docB["Str Version"]
    }

Execution plan:
 Id   NodeType           Est.   Comment
  1   SingletonNode         1   * ROOT
  9   IndexNode         50000     - FOR docA IN A   /* persistent index scan, index only (projections: `Project ID`) */    
  8   IndexNode         50000       - FOR docB IN B   /* persistent index scan, index scan + document lookup (projections: `Str Version`) */    
  6   CalculationNode   50000         - LET #4 = { "Project ID" : docA.`Project ID`, "Str Version" : docB.`Str Version` }   /* simple expression */   /* collections used: docA : A, docB : B */
  7   ReturnNode        50000         - RETURN #4

Indexes used:
 By   Name                      Type         Collection   Unique   Sparse   Cache   Selectivity   Fields             Stored values   Ranges
  9   idx_1793161154257747968   persistent   A            false    false    false       99.99 %   [ `Project ID` ]   [  ]            *
  8   idx_1793161159029817344   persistent   B            false    false    false       99.99 %   [ `Project ID` ]   [  ]            (docB.`Project ID` == docA.`Project ID`)

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   move-filters-up
  3   move-calculations-up-2
  4   move-filters-up-2
  5   use-indexes
  6   remove-filter-covered-by-index
  7   remove-unnecessary-calculations-2
  8   reduce-extraction-to-projection

79 rule(s) executed, 2 plan(s) created, peak mem [b]: 0, exec time [s]: 0.00032

As you can see, the explain plan shows that the indexes are indeed used.

So I guess you will have used a different way to create indexes on A and B.
How the indexes were created in your case is not shown in your example, but I guess you have created sparse indexes. If your indexes on A and B are sparse, they will indeed not be used for the query in question.

DougGarno55 · 2024-03-11T04:31:57Z

In case it makes any difference, I did everything within the web interface, not the command line shell.
The indexes I used were persistent indexes. Here is a screen shot of the index creation.

jsteemann · 2024-03-11T08:36:16Z

@DougGarno55 : what immediately comes to my mind is that in your screenshot example, you have enclosed the attribute name ("Project ID") in extra double quotes.
I think this is what's causing the problems. Can you try creating the index without enclosing the attribute name in extra double quotes?

DougGarno55 · 2024-03-11T09:10:19Z

AH, that now works, here is the explain output:

Query String (156 chars, cacheable: true):
 FOR docA IN A
 FOR docB IN B
 FILTER docB["Project ID"] == docA["Project ID"]
 RETURN {
 "Project ID": docA["Project ID"],
 "Str Version": docB["Str Version"]
 }
 

Execution plan:
 Id   NodeType           Est.   Comment
  1   SingletonNode         1   * ROOT
  9   IndexNode         50000     - FOR docA IN A   /* persistent index scan, index only, projections: `Project ID` */    
  8   IndexNode         50000       - FOR docB IN B   /* persistent index scan, projections: `Str Version` */    
  6   CalculationNode   50000         - LET #4 = { "Project ID" : docA.`Project ID`, "Str Version" : docB.`Str Version` }   /* simple expression */   /* collections used: docA : A, docB : B */
  7   ReturnNode        50000         - RETURN #4

Indexes used:
 By   Name     Type         Collection   Unique   Sparse   Selectivity   Fields             Ranges
  9   IDX_aa   persistent   A            false    false        99.99 %   [ `Project ID` ]   *
  8   IDX_bb   persistent   B            false    false        99.65 %   [ `Project ID` ]   (docB.`Project ID` == docA.`Project ID`)

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   move-filters-up
  3   move-calculations-up-2
  4   move-filters-up-2
  5   use-indexes
  6   remove-filter-covered-by-index
  7   remove-unnecessary-calculations-2
  8   reduce-extraction-to-projection

I was under the impression that we have to quote the attribute name if it has spaces.
This is not a problem, user error.
Thanks for the clarification.

dothebart · 2024-03-11T11:07:07Z

Closing as solved.

dothebart added 1 Question 3 Index labels Mar 11, 2024

dothebart added the 2 Solved Resolution label Mar 11, 2024

dothebart closed this as completed Mar 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Index not used if attribute has space in the name #20712

Index not used if attribute has space in the name #20712

DougGarno55 commented Mar 10, 2024

jsteemann commented Mar 10, 2024

DougGarno55 commented Mar 11, 2024

jsteemann commented Mar 11, 2024

DougGarno55 commented Mar 11, 2024

dothebart commented Mar 11, 2024

Index not used if attribute has space in the name #20712

Index not used if attribute has space in the name #20712

Comments

DougGarno55 commented Mar 10, 2024

My Environment

Component, Query & Data

Steps to reproduce

jsteemann commented Mar 10, 2024

DougGarno55 commented Mar 11, 2024

jsteemann commented Mar 11, 2024

DougGarno55 commented Mar 11, 2024

dothebart commented Mar 11, 2024