Iterating DocValues one document at a time with advanceExact and nextOrd/ordValue is slow for bulk operations such as faceting. Reading and unpacking integers in blocks is substantially faster, but DocValues can currently be queried only one document at a time.
To apply bulk processing to a set of documents, DocIdSetIterator matches have to be split into runs of sequential docIDs and remapped to positions in the underlying LongValues.
After this transformation, relatively large linear scans can be performed over the packed integers.
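The run-splitting step can be sketched in plain Java. This is a minimal illustration, not code from the patch: a sorted int[] stands in for the DocIdSetIterator, and the DocIdRuns and Run names are hypothetical. Each run is a half-open range of consecutive docIDs that could then be scanned linearly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene or the attached patch.
class DocIdRuns {
    /** Half-open run [begin, end) of consecutive doc IDs. */
    static final class Run {
        final int begin;
        final int end;

        Run(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /** Splits strictly increasing doc IDs into maximal runs of consecutive IDs. */
    static List<Run> split(int[] sortedDocIds) {
        List<Run> runs = new ArrayList<>();
        if (sortedDocIds.length == 0) {
            return runs;
        }
        int begin = sortedDocIds[0];
        int prev = begin;
        for (int i = 1; i < sortedDocIds.length; i++) {
            int doc = sortedDocIds[i];
            if (doc != prev + 1) {
                // A gap closes the current run; a new run starts at this doc.
                runs.add(new Run(begin, prev + 1));
                begin = doc;
            }
            prev = doc;
        }
        // Close the final run.
        runs.add(new Run(begin, prev + 1));
        return runs;
    }
}
```

In the dense case a run of docIDs could map directly onto a range of value positions; the sparse case would need an additional docID-to-position remapping, which this sketch does not attempt.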
To do this, two new interfaces
- LongValuesCollector (collectValue(long index, long value))
- OrdStatsCollector (collectOrd(long ord), collectMissing(int count))
and three new functions are introduced
- LongValues.forRange(long begin, long end, LongValuesCollector collector)
- SortedDocValues.forEach(DocIdSetIterator disi, OrdStatsConsumer collector)
- SortedSetDocValues.forEach(DocIdSetIterator disi, OrdStatsConsumer collector)
with reference implementations.
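A reference implementation of forRange could plausibly look like the sketch below. It is an assumption, not the code from the patch: SketchLongValues is a hypothetical stand-in for Lucene's LongValues (only get(long) is modelled), and the range is assumed to be half-open, with end exclusive. An optimized version would decode whole blocks instead of calling get once per index.

```java
// Collector shape as described in this issue.
interface LongValuesCollector {
    void collectValue(long index, long value);
}

// Hypothetical stand-in for LongValues; only the pieces needed here.
abstract class SketchLongValues {
    abstract long get(long index);

    /** Naive reference: one get() per index in [begin, end). */
    void forRange(long begin, long end, LongValuesCollector collector) {
        for (long i = begin; i < end; i++) {
            collector.collectValue(i, get(i));
        }
    }
}

class ForRangeSketch {
    /** Wraps a plain array so the sketch can be exercised without an index. */
    static SketchLongValues of(long[] values) {
        return new SketchLongValues() {
            @Override
            long get(long index) {
                return values[(int) index];
            }
        };
    }

    /** Sums the values in [begin, end) through the collector callback. */
    static long sum(SketchLongValues values, long begin, long end) {
        long[] total = {0};
        values.forRange(begin, end, (index, value) -> total[0] += value);
        return total[0];
    }
}
```

Because callers only see the collector callback, a subclass can override forRange with a bulk decoder without changing any calling code.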
Optimized versions of these functions are provided for:
- DirectReader for cases other than 32 or 64 bits per value (using PackedInts.Decoder).
- Lucene70DocValuesProducer getSorted and getSortedSet (both sparse and dense).
The measured Solr faceting speedup is up to 2-2.5x on a real index.
A patch for Solr DocValuesFacets is also provided as a separate file.
Implementation notes:
- OrdStatsCollector does not accept a document ID, because passing one would ruin performance for SortedSetDocValues due to excessive position lookups.
- This patch is fully compatible with the Lucene 7.0 DocValues format.
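To illustrate why facet counting does not need the document ID, here is a minimal sketch of a counting collector. The OrdStatsCollector shape follows the description above; FacetCountSketch and its count helper are hypothetical, and a long[][] of per-document ordinals stands in for SortedSetDocValues.

```java
// Collector shape as described in this issue.
interface OrdStatsCollector {
    void collectOrd(long ord);

    void collectMissing(int count);
}

// Hypothetical facet counter; not the Solr DocValuesFacets code.
class FacetCountSketch implements OrdStatsCollector {
    final int[] counts;
    int missing;

    FacetCountSketch(int ordCount) {
        counts = new int[ordCount];
    }

    @Override
    public void collectOrd(long ord) {
        // Only the ordinal matters for counting; no doc ID is required.
        counts[(int) ord]++;
    }

    @Override
    public void collectMissing(int count) {
        missing += count;
    }

    /** Feeds per-document ordinals (an empty array means no value) to a new counter. */
    static FacetCountSketch count(long[][] ordsPerDoc, int ordCount) {
        FacetCountSketch collector = new FacetCountSketch(ordCount);
        for (long[] ords : ordsPerDoc) {
            if (ords.length == 0) {
                collector.collectMissing(1);
            } else {
                for (long ord : ords) {
                    collector.collectOrd(ord);
                }
            }
        }
        return collector;
    }
}
```

Since collectMissing takes a count, a producer could report a whole run of value-less documents in one call rather than once per document.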

Migrated from LUCENE-8178 by Nikolay Khitrin (@khitrin), 2 votes, updated Feb 20 2018
Attachments: graph.png, LUCENE-8178.patch, LUCENE-8178-for-solr.patch