Improve cost estimation in SortedSetDocValuesRangeQuery when using DocValuesSkipper#16061
Conversation
|
Might be me not grokking things properly but I don't see where this is changing the cost() implementation? |
This supplier will use the cost of the iterator which will be maxDocId - minDocId. |
| score(), scoreMode, context.reader().maxDoc()); | ||
| } | ||
| final DocIdSetIterator psIterator = | ||
| getDocIdSetIteratorForDensePrimarySort( |
There was a problem hiding this comment.
This is the bit that is called out as expensive below so I'm not sure if we should be doing it in the scorerSupplier call directly? A couple of weeks back we were discussing ways of improving cost estimates using skippers, so I wonder if that's a better way to go?
There was a problem hiding this comment.
I think that comment predates skippers and it is because the resolution of the ordinals. But sure, if we don't wan to to do any IO during the construction of the supplier we shouldn't do this.
My question is, should we then modify SortedNumericDocValuesRangeQuery?
There was a problem hiding this comment.
I have been thinking how to estimate the cost inside the ScorerSupplier without having to create the full iterator. The idea is that in cost we will only visit the skipper and estimate the cost, similar to what we do in PointValues. When building the iterator we will visit the doc values if necessary.
We need to keep a bit of state around but it is not too bad, let me know what you think.
There was a problem hiding this comment.
I like this a lot! I wonder if its possible to use SkipBlockRangeIterator to simplify the code that finds bounds a bit?
There was a problem hiding this comment.
I think we want to use the skipper directly here, I noticed we are being a bit inefficient here. For example:
skipper.advance(minOrd, Long.MAX_VALUE);
skipperMinDocId = skipper.minDocID(0);
skipperMinDocIdExact = false;
we are visiting always the doc values but it can be change by:
skipper.advance(minOrd, Long.MAX_VALUE);
skipperMinDocId = skipper.minDocID(0);
skipperMinDocIdExact = skipper.minValue(0) == minOrd;
So we only visit the doc values if the block does not start from the minOrd. This should be true very often because of the way we are building the blocks. There are other tricks from the maxOrd too!
There was a problem hiding this comment.
I pushed the optimizations I was thinking about in 2d89064
I noticed that the current implementation always return values.cost() because we are delaying the construction of the iterator until the last moment. This is because we need to do some lookups for resolving the ordinals of the query.
Still when there is a skipper and the index is dense and used as primary sort, we have a very fast way to compute the iterator and the exact cost so let's move that conditions when building the score supplier. This is inline to what we are doing in SortedNumericDocValuesRangeQuery.