Wrong skiplist index usage in edge collection #2876

ThomasWeiss · 2017-07-26T07:22:41Z

my environment running ArangoDB

I'm using the latest ArangoDB of the respective release series:

3.1

Mode:

Single-Server

Storage-Engine:

mmfiles

On this operating system:

Windows, version: 10

this is an AQL-related issue:

I'm issuing AQL via:

web interface with this browser: running on this OS: Windows
In an edge collection named hasPublished, I create a skiplist index on the following fields: ['_from', 'publishedAt']
I explain the following query: for p in hasPublished filter p._from == 'from' sort p.publishedAt desc return p and get:

Execution plan:
 Id   NodeType          Est.   Comment
  1   SingletonNode        1   * ROOT
  8   IndexNode            1     - FOR p IN hasPublished   /* edge index scan */
  5   CalculationNode      1       - LET #3 = p.`publishedAt`   /* attribute expression */   /* collections used: p : hasPublished */
  6   SortNode             1       - SORT #3 DESC
  7   ReturnNode           1       - RETURN p

Indexes used:
 By   Type   Collection     Unique   Sparse   Selectivity   Fields               Ranges
  8   edge   hasPublished   false    false        76.09 %   [ `_from`, `_to` ]   (p.`_from` == "from")

I would expect to get the following:

Execution plan:
 Id   NodeType        Est.   Comment
  1   SingletonNode      1   * ROOT
  8   IndexNode          2     - FOR p IN hasPublished   /* reverse skiplist index scan */
  7   ReturnNode         2       - RETURN p

Indexes used:
 By   Type       Collection   Unique   Sparse   Selectivity  Fields                      Ranges
  8   skiplist   hasPublished false    false            n/a  [ `_from`, `publishedAt` ]  (s.`_from` == "user")

The text was updated successfully, but these errors were encountered:

mchacki · 2017-07-26T07:48:35Z

I would expect that too, thanks for reporting, managed to reproduce.
Need some time to investigate.

As a quick fix you could do the following:

FOR p IN hasPublished
  FILTER p._from == 'from'
  AND p.publishedAt >= null // << NOTE HERE, an always true condition
  SORT p.publishedAt DESC
  RETURN p

ThomasWeiss · 2017-07-26T07:52:25Z

Your quick fix returns the same execution plan and doesn't use the right index either (3.1.21 on Windows)

mchacki · 2017-07-26T07:59:28Z

Wierd...

arangod> db._version()
3.1.21
arangod> db._explain('FOR p IN col FILTER p._from == "v/1" && p.publishedAt >= null SORT p.publishedAt DESC RETURN p')
Query string:
 FOR p IN col FILTER p._from == "v/1" && p.publishedAt >= null SORT p.publishedAt DESC RETURN p

Execution plan:
 Id   NodeType        Est.   Comment
  1   SingletonNode      1   * ROOT
  8   IndexNode          2     - FOR p IN col   /* reverse skiplist index scan */
  7   ReturnNode         2       - RETURN p

Indexes used:
 By   Type       Collection   Unique   Sparse   Selectivity   Fields                       Ranges
  8   skiplist   col       false    false            n/a   [ `_from`, `publishedAt` ]   ((p.`_from` == "v/1") && (p.`publishedAt` >= null))

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   move-filters-up
  3   move-calculations-up-2
  4   move-filters-up-2
  5   use-indexes
  6   remove-filter-covered-by-index
  7   use-index-for-sort
  8   remove-unnecessary-calculations-2

Seems to work on my end...
Can you try and and modify the SORT statement:

SORT p._from DESC, p.publishedAt DESC

mchacki · 2017-07-26T08:00:56Z

Ah is your Skiplist sparse?
If so than replace p.publishedAt >= null with p.publishedAt > null

ThomasWeiss · 2017-07-26T08:05:52Z

Tried all your recommendations but for some reasons, the query is analyzed differently on my machine :)

Query string:
 FOR p IN hasPublished FILTER p._from == "v/1" && p.publishedAt >= null SORT p.publishedAt DESC 
 RETURN p

Execution plan:
 Id   NodeType          Est.   Comment
  1   SingletonNode        1   * ROOT
  8   IndexNode            1     - FOR p IN hasPublished   /* edge index scan */
  9   CalculationNode      1       - LET #1 = (p.`publishedAt` >= null)   /* simple expression */   /* collections used: p : hasPublished */
  4   FilterNode           1       - FILTER #1
  5   CalculationNode      1       - LET #3 = p.`publishedAt`   /* attribute expression */   /* collections used: p : hasPublished */
  6   SortNode             1       - SORT #3 DESC
  7   ReturnNode           1       - RETURN p

Indexes used:
 By   Type   Collection     Unique   Sparse   Selectivity   Fields               Ranges
  8   edge   hasPublished   false    false        76.09 %   [ `_from`, `_to` ]   (p.`_from` == "v/1")

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   move-filters-up
  3   move-calculations-up-2
  4   move-filters-up-2
  5   use-indexes
  6   remove-filter-covered-by-index

mchacki · 2017-07-26T08:09:15Z

OK thanks for checking, then i need time to investigate, no more quick fix ideas in my head sorry.

ThomasWeiss · 2017-07-26T08:11:33Z

No worries, thanks a lot for your prompt reply and trying to find a workaround. I don't know if the query parser takes index selectivity into account? (which may explain the difference between your results and mine)

mchacki · 2017-07-26T08:18:11Z

Yes it takes the selectivity into account, but afaik the SORTING should always overrule FILTER only.
Especially in the case where FILTER and SORT both use _from and publishedAt attributes.
Indexes always get an "upvote" for every attribute they cover.

mchacki · 2017-07-26T08:22:11Z

So i get exactly your plan if and only if my skiplist is sparse
In this case it decides that it cannot be used for sorting.
Is this the case on your side?

ThomasWeiss · 2017-07-26T08:24:30Z

Nope my skiplist isn't parse:

jsteemann · 2017-07-26T12:28:06Z

I think which index is picked here may depend on the amount of documents in the collection and the selectivity of the indexes.
If these values are different in your setups, then different indexes may be picked. What is the actual number of documents in the collection, and how "unique" are "_from" and the combination of "_from" and "publishedAt" in this collection?

ThomasWeiss · 2017-07-27T04:47:19Z

True that I was trying the query on my dev setup with a nearly empty DB, so:

I tried on the staging DB, where the hasPublished collection contains 2,600+ documents and:

the [_from, _to] edge index selectivity is around 46%
most _from fields are the same
most publishedAt fields are different (they are timestamps)

-> so the combination of the 2 would be rather unique. Still, it uses the [_from, _to] edge index and not the skiplist index.

I went back to my dev DB and purged the collection, which brought the [_from, _to] edge index selectivity to 100% and tried the query again -> this time, the skiplist index got used!

So it would appear that the skiplist index would only be used when the selectivity of the [_from, _to] edge index is very high?

jsteemann · 2017-07-28T09:02:52Z

Should be better with version 3.2.1.
However, the optimizer may still "guess wrong" when multiple indexes are candidates for the query, and at least one of them has no selectivity estimate, such as the skiplist index.
But it seems that skiplist indexes have been overly penalized before.

…ture/js_to_cpp_transaction_handler * 'devel' of https://github.com/arangodb/arangodb: fixed some issues detected by coverity scan (#2915) remove dependency on MMFiles features from non-MMFiles files (#2925) Converted rest handler for explain from JS to C++. (#2907) added startsBefore() for ApplicationFeature (#2913) Converted a portion of the admin routing API from JS to C++ (#2919) Bug fix/predictable results data modifcation multiple fors (#2921) Feature/cpp aql char length (#2883) slightly move responsibility for recovery (#2922) Feature/issue 387 cluster index estimates (#2866) Converted endpoint handler from JS to C++ (#2905) switch to trusty fixed issue #2876 (#2896) fix a race on shutdown (#2897) fixed issue #2868: cname missing from logger-follow results in rocksdb (#2901) make the V8 feature depend on the authentication feature (#2902) change "parameters" to "bindVars" (#2772) Do not allow replication to create/drop collections (#2898) don't mask errors with fake OOM messages (#2872)

jsteemann · 2017-08-11T09:29:49Z

3.2.1 has been released and is available for download.

mchacki self-assigned this Jul 26, 2017

mchacki added 3 Optimizer AQL query optimizer is involved performance labels Jul 26, 2017

jsteemann added a commit that referenced this issue Jul 28, 2017

fixed issue #2876

40240ee

jsteemann added the 3 AQL Query language related label Jul 28, 2017

jsteemann assigned jsteemann and unassigned mchacki Jul 28, 2017

jsteemann added this to the 3.2.1 milestone Jul 28, 2017

jsteemann added the 2 Fixed Resolution label Jul 28, 2017

fceller pushed a commit that referenced this issue Jul 30, 2017

fixed issue #2876 (#2896)

87567c3

jsteemann closed this as completed Aug 11, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wrong skiplist index usage in edge collection #2876

Wrong skiplist index usage in edge collection #2876

ThomasWeiss commented Jul 26, 2017

mchacki commented Jul 26, 2017

ThomasWeiss commented Jul 26, 2017

mchacki commented Jul 26, 2017

mchacki commented Jul 26, 2017

ThomasWeiss commented Jul 26, 2017

mchacki commented Jul 26, 2017

ThomasWeiss commented Jul 26, 2017

mchacki commented Jul 26, 2017

mchacki commented Jul 26, 2017

ThomasWeiss commented Jul 26, 2017

jsteemann commented Jul 26, 2017

ThomasWeiss commented Jul 27, 2017

jsteemann commented Jul 28, 2017

jsteemann commented Aug 11, 2017

Wrong skiplist index usage in edge collection #2876

Wrong skiplist index usage in edge collection #2876

Comments

ThomasWeiss commented Jul 26, 2017

my environment running ArangoDB

this is an AQL-related issue:

mchacki commented Jul 26, 2017

ThomasWeiss commented Jul 26, 2017

mchacki commented Jul 26, 2017

mchacki commented Jul 26, 2017

ThomasWeiss commented Jul 26, 2017

mchacki commented Jul 26, 2017

ThomasWeiss commented Jul 26, 2017

mchacki commented Jul 26, 2017

mchacki commented Jul 26, 2017

ThomasWeiss commented Jul 26, 2017

jsteemann commented Jul 26, 2017

ThomasWeiss commented Jul 27, 2017

jsteemann commented Jul 28, 2017

jsteemann commented Aug 11, 2017