Bulk operations for LongValues and Sorted[Set]DocValues [LUCENE-8178]

Iterating DocValues one document at a time with `advanceExact` and `nextOrd`/`ordValue` is slow for bulk operations such as faceting. Reading and unpacking integers in blocks is substantially faster, but DocValues can currently be queried only for a single document at a time.

To apply document-based bulk processing, `DocIdSetIterator` matches have to be split into runs of sequential docIDs and remapped to positions in the underlying `LongValues`. After this transformation, relatively large linear scans can be performed over the packed integers.
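The run-splitting step can be sketched as follows. This is only an illustration: a sorted `int[]` stands in for the `DocIdSetIterator`, and the names `DocIdRuns`, `RunConsumer` and `forEachRun` are hypothetical and do not come from the patch. Each run is reported as a half-open range `[begin, end)`.

```java
/** Hypothetical helper: groups sorted doc IDs into runs of consecutive IDs. */
final class DocIdRuns {

  /** Receives one half-open run [begin, end) of consecutive doc IDs. */
  interface RunConsumer {
    void accept(int begin, int end);
  }

  /**
   * Walks the sorted, duplicate-free doc IDs once and emits a run
   * every time the sequence stops being consecutive.
   */
  static void forEachRun(int[] sortedDocIds, RunConsumer consumer) {
    if (sortedDocIds.length == 0) {
      return; // nothing matched, so there are no runs to report
    }
    int begin = sortedDocIds[0];
    int prev = begin;
    for (int i = 1; i < sortedDocIds.length; i++) {
      int doc = sortedDocIds[i];
      if (doc != prev + 1) {
        // Gap found: close the current run and start a new one.
        consumer.accept(begin, prev + 1);
        begin = doc;
      }
      prev = doc;
    }
    // Close the final run.
    consumer.accept(begin, prev + 1);
  }
}
```

Each emitted range could then be handed to a bulk call such as `forRange`, after mapping docIDs to value positions where the two differ (for example in the sparse case).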

 

To do this, two new interfaces

1. `LongValuesCollector` (`collectValue(long index, long value)`).
2. `OrdStatsCollector` (`collectOrd(long ord)`, `collectMissing(int count)`).

and three new functions are introduced

1. `LongValues.forRange(long begin, long end, LongValuesCollector collector)`
2. `SortedDocValues.forEach(DocIdSetIterator disi, OrdStatsConsumer collector)`
3. `SortedSetDocValues.forEach(DocIdSetIterator disi, OrdStatsConsumer collector)`

with reference implementations.
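A minimal sketch of how the two callbacks and a naive `forRange` fallback might look, using the method signatures listed above. The stand-in type `SketchLongValues`, the counting collector, and the assumption that `end` is exclusive are illustrative guesses rather than code taken from the patch.

```java
/** Callback from the issue: receives each (index, value) pair of a range. */
interface LongValuesCollector {
  void collectValue(long index, long value);
}

/** Callback from the issue: receives ordinals and counts of documents without a value. */
interface OrdStatsCollector {
  void collectOrd(long ord);

  void collectMissing(int count);
}

/** Simplified stand-in for Lucene's LongValues (random access by index). */
interface SketchLongValues {
  long get(long index);

  /** Naive fallback: one get() per index. Assumes 'end' is exclusive. */
  default void forRange(long begin, long end, LongValuesCollector collector) {
    for (long i = begin; i < end; i++) {
      collector.collectValue(i, get(i));
    }
  }
}

/** Hypothetical facet-style collector: one bucket per ordinal plus a missing count. */
final class CountingOrdCollector implements OrdStatsCollector {
  final int[] counts;
  int missing;

  CountingOrdCollector(int ordCount) {
    counts = new int[ordCount];
  }

  @Override
  public void collectOrd(long ord) {
    counts[(int) ord]++;
  }

  @Override
  public void collectMissing(int count) {
    missing += count;
  }
}
```

An optimized implementation would override `forRange` to decode a whole block at once instead of calling `get` per index. The collector above needs no document ID, which is all a facet counter requires.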

Optimized versions of these functions are provided for:

1. `DirectReader` for non-32/64 bits per value cases (using `PackedInts.Decoder`).
2. `Lucene70DocValuesProducer` `getSorted` and `getSortedSet` (both sparse and dense).
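To illustrate why decoding a range in one pass is cheaper than per-value lookups, here is a toy block decoder. It is a sketch under simplifying assumptions and does not reproduce Lucene's `PackedInts` layout or the `PackedInts.Decoder` API: values are packed into `long` blocks lowest bits first, and `bitsPerValue` is assumed to divide 64 and be smaller than 64, so no value straddles two blocks. The class and method names are hypothetical.

```java
/** Toy packed-integer layout for illustration only (not Lucene's format). */
final class PackedBlockSketch {

  /** Packs values into 64-bit blocks; bitsPerValue must divide 64 and be below 64. */
  static long[] pack(long[] values, int bitsPerValue) {
    int perBlock = 64 / bitsPerValue;
    long[] blocks = new long[(values.length + perBlock - 1) / perBlock];
    for (int i = 0; i < values.length; i++) {
      blocks[i / perBlock] |= values[i] << ((i % perBlock) * bitsPerValue);
    }
    return blocks;
  }

  /** Decodes 'count' values starting at value index 'from' into 'out'. */
  static void decodeRange(long[] blocks, int bitsPerValue, int from, int count, long[] out) {
    int perBlock = 64 / bitsPerValue;
    long mask = (1L << bitsPerValue) - 1;
    int block = from / perBlock;
    int offset = from % perBlock; // value offset inside the first block
    int written = 0;
    while (written < count) {
      // Load each block once, then peel values off with shift and mask.
      long bits = blocks[block++] >>> (offset * bitsPerValue);
      int n = Math.min(perBlock - offset, count - written);
      for (int j = 0; j < n; j++) {
        out[written++] = bits & mask;
        bits >>>= bitsPerValue;
      }
      offset = 0; // later blocks are read from their first value
    }
  }
}
```

The index arithmetic (division and modulo) is paid once per range rather than once per value, which is the kind of saving a bulk `forRange` can exploit.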

 

The measured Solr faceting speedup is up to 2-2.5x on a real index. A patch for Solr `DocValuesFacets` is also provided as a separate file.

 

Implementation notes:
- `OrdStatsCollector` does not accept a document ID, because passing one would hurt performance for `SortedSetDocValues` due to excessive position lookups.
- This patch is fully compatible with Lucene 7.0 DocValues format.

![graph.png](https://apache.github.io/lucene-jira-archive/attachments/LUCENE-8178/graph.png)



---
Migrated from [LUCENE-8178](https://issues.apache.org/jira/browse/LUCENE-8178) by Nikolay Khitrin (@khitrin), 2 votes, updated Feb 20 2018
Attachments: [graph.png](https://apache.github.io/lucene-jira-archive/attachments/LUCENE-8178/graph.png), [LUCENE-8178.patch](https://apache.github.io/lucene-jira-archive/attachments/LUCENE-8178/LUCENE-8178.patch), [LUCENE-8178-for-solr.patch](https://apache.github.io/lucene-jira-archive/attachments/LUCENE-8178/LUCENE-8178-for-solr.patch)

