Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Index not used if attribute has space in the name #20712

Closed
DougGarno55 opened this issue Mar 10, 2024 · 5 comments
Closed

Index not used if attribute has space in the name #20712

DougGarno55 opened this issue Mar 10, 2024 · 5 comments

Comments

@DougGarno55
Copy link

My Environment

  • ArangoDB Version: 3.11.6
  • Deployment Mode: Single Server
  • Deployment Strategy: ArangoDB Starter
  • Configuration: Running locally as localhost:8529
  • Infrastructure: own
  • Operating System: Ubuntu 20.04.6 LTS
  • Total RAM in your machine: 64Gb. (19Gb used, 45Gb available)
  • Disks in use: SSD (200Gb with 47,318,348 used and 155,729,508 free
  • Used Package: Ubuntu .deb

Component, Query & Data

Data is simple two collections with two attributes with spaces in the names.
See steps to reproduce below.

Affected feature:
AQL query execution time

AQL query (if applicable):

   FOR docA IN A
   FOR docB IN B
   FILTER docB["Project ID"] == docA["Project ID"]
   RETURN {
      "Project ID": docA["Project ID"],
      "Str Version": docB["Str Version"]
   }

AQL explain and/or profile (if applicable):

Query String (156 chars, cacheable: true):
 FOR docA IN A
 FOR docB IN B
 FILTER docB["Project ID"] == docA["Project ID"]
 RETURN {
 "Project ID": docA["Project ID"],
 "Str Version": docB["Str Version"]
 }
 

Execution plan:
 Id   NodeType                  Est.   Comment
  1   SingletonNode                1   * ROOT
  3   EnumerateCollectionNode      1     - FOR docB IN B   /* full collection scan, projections: `Str Version`, `Project ID` */
  2   EnumerateCollectionNode      1       - FOR docA IN A   /* full collection scan, projections: `Project ID` */
  4   CalculationNode              1         - LET #2 = (docB.`Project ID` == docA.`Project ID`)   /* simple expression */   /* collections used: docB : B, docA : A */
  5   FilterNode                   1         - FILTER #2
  6   CalculationNode              1         - LET #4 = { "Project ID" : docA.`Project ID`, "Str Version" : docB.`Str Version` }   /* simple expression */   /* collections used: docA : A, docB : B */
  7   ReturnNode                   1         - RETURN #4

Indexes used:
 none

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   move-filters-up
  3   interchange-adjacent-enumerations
  4   move-calculations-up-2
  5   move-filters-up-2
  6   reduce-extraction-to-projection

Steps to reproduce

  1. Create two collections as listed below:
   FOR d IN 1..50000
   INSERT {
      "Project ID": d,
      "Str Version": TO_STRING(d)
   } INTO A

   FOR d IN 1..50000
   INSERT {
      "Project ID": d,
      "Str Version": TO_STRING(d)
   } INTO B
  1. Create an index on Collection A using the Project ID field
  2. Create an index on Collection B using the Project ID field
  3. Enter this query in the web interface:
   FOR docA IN A
   FOR docB IN B
   FILTER docB["Project ID"] == docA["Project ID"]
   RETURN {
      "Project ID": docA["Project ID"],
      "Str Version": docB["Str Version"]
   }
  1. Using the web interface, click the Explain button. See the "Explain" section above.

Problem:
Indexes are not used

Expected result:
Use of index

If the attribute name is Project_ID, then the index are used.

@jsteemann
Copy link
Contributor

@DougGarno55 : I just tried exactly what you wrote above above using 3.11, and for me the indexes are indeed used.
Here is what I did:
Create collections:

arangod> db._create("A");
[ArangoCollection 3157380, "A" (type document, status loaded)]
arangod> db._create("B");
[ArangoCollection 3157383, "B" (type document, status loaded)]

Insert data:

arangod> db._query(`   FOR d IN 1..50000
...>    INSERT {
...>       "Project ID": d,
...>       "Str Version": TO_STRING(d)
...>    } INTO A`);
[object GeneralArrayCursor, count: 0, cached: false]
arangod> db._query(`
...>    FOR d IN 1..50000
...>    INSERT {
...>       "Project ID": d,
...>       "Str Version": TO_STRING(d)
...>    } INTO B`);
[object GeneralArrayCursor, count: 0, cached: false]

Create indexes:

arangod> db.A.ensureIndex({ fields: ["Project ID"], type: "persistent" });
{ 
  "cacheEnabled" : false, 
  "deduplicate" : true, 
  "estimates" : true, 
  "fields" : [ 
    "Project ID" 
  ], 
  "id" : "A/3257402", 
  "name" : "idx_1793161154257747968", 
  "selectivityEstimate" : 0.9998779445868424, 
  "sparse" : false, 
  "type" : "persistent", 
  "unique" : false, 
  "isNewlyCreated" : true 
}
arangod> db.B.ensureIndex({ fields: ["Project ID"], type: "persistent" });
{ 
  "cacheEnabled" : false, 
  "deduplicate" : true, 
  "estimates" : true, 
  "fields" : [ 
    "Project ID" 
  ], 
  "id" : "B/3257410", 
  "name" : "idx_1793161159029817344", 
  "selectivityEstimate" : 0.9998779445868424, 
  "sparse" : false, 
  "type" : "persistent", 
  "unique" : false, 
  "isNewlyCreated" : true 
}

Explain query:

arangod> db._explain(`   FOR docA IN A
...>    FOR docB IN B
...>    FILTER docB["Project ID"] == docA["Project ID"]
...>    RETURN {
...>       "Project ID": docA["Project ID"],
...>       "Str Version": docB["Str Version"]
...>    }`);
Query String (182 chars, cacheable: true):
    FOR docA IN A
    FOR docB IN B
    FILTER docB["Project ID"] == docA["Project ID"]
    RETURN {
       "Project ID": docA["Project ID"],
       "Str Version": docB["Str Version"]
    }

Execution plan:
 Id   NodeType           Est.   Comment
  1   SingletonNode         1   * ROOT
  9   IndexNode         50000     - FOR docA IN A   /* persistent index scan, index only (projections: `Project ID`) */    
  8   IndexNode         50000       - FOR docB IN B   /* persistent index scan, index scan + document lookup (projections: `Str Version`) */    
  6   CalculationNode   50000         - LET #4 = { "Project ID" : docA.`Project ID`, "Str Version" : docB.`Str Version` }   /* simple expression */   /* collections used: docA : A, docB : B */
  7   ReturnNode        50000         - RETURN #4

Indexes used:
 By   Name                      Type         Collection   Unique   Sparse   Cache   Selectivity   Fields             Stored values   Ranges
  9   idx_1793161154257747968   persistent   A            false    false    false       99.99 %   [ `Project ID` ]   [  ]            *
  8   idx_1793161159029817344   persistent   B            false    false    false       99.99 %   [ `Project ID` ]   [  ]            (docB.`Project ID` == docA.`Project ID`)

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   move-filters-up
  3   move-calculations-up-2
  4   move-filters-up-2
  5   use-indexes
  6   remove-filter-covered-by-index
  7   remove-unnecessary-calculations-2
  8   reduce-extraction-to-projection

79 rule(s) executed, 2 plan(s) created, peak mem [b]: 0, exec time [s]: 0.00032

As you can see, the explain plan shows that the indexes are indeed used.

So I guess you will have used a different way to create indexes on A and B.
How the indexes were created in your case is not shown in your example, but I guess you have created sparse indexes. If your indexes on A and B are sparse, they will indeed not be used for the query in question.

@DougGarno55
Copy link
Author

In case it makes any difference, I did everything within the web interface, not the command line shell.
The indexes I used were persistent indexes. Here is a screen shot of the index creation.

image

@jsteemann
Copy link
Contributor

@DougGarno55 : what immediately comes to my mind is that in your screenshot example, you have enclosed the attribute name ("Project ID") in extra double quotes.
I think this is what's causing the problems. Can you try creating the index without enclosing the attribute name in extra double quotes?

@DougGarno55
Copy link
Author

AH, that now works, here is the explain output:

Query String (156 chars, cacheable: true):
 FOR docA IN A
 FOR docB IN B
 FILTER docB["Project ID"] == docA["Project ID"]
 RETURN {
 "Project ID": docA["Project ID"],
 "Str Version": docB["Str Version"]
 }
 

Execution plan:
 Id   NodeType           Est.   Comment
  1   SingletonNode         1   * ROOT
  9   IndexNode         50000     - FOR docA IN A   /* persistent index scan, index only, projections: `Project ID` */    
  8   IndexNode         50000       - FOR docB IN B   /* persistent index scan, projections: `Str Version` */    
  6   CalculationNode   50000         - LET #4 = { "Project ID" : docA.`Project ID`, "Str Version" : docB.`Str Version` }   /* simple expression */   /* collections used: docA : A, docB : B */
  7   ReturnNode        50000         - RETURN #4

Indexes used:
 By   Name     Type         Collection   Unique   Sparse   Selectivity   Fields             Ranges
  9   IDX_aa   persistent   A            false    false        99.99 %   [ `Project ID` ]   *
  8   IDX_bb   persistent   B            false    false        99.65 %   [ `Project ID` ]   (docB.`Project ID` == docA.`Project ID`)

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   move-filters-up
  3   move-calculations-up-2
  4   move-filters-up-2
  5   use-indexes
  6   remove-filter-covered-by-index
  7   remove-unnecessary-calculations-2
  8   reduce-extraction-to-projection

I was under the impression that we have to quote the attribute name if it has spaces.
This is not a problem, user error.
Thanks for the clarification.

@dothebart dothebart added the 2 Solved Resolution label Mar 11, 2024
@dothebart
Copy link
Contributor

Closing as solved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants