LUCENE-9539: Use more compact datastructures for sorting doc-values #1908

s1monw · 2020-09-22T08:52:16Z

This change cuts over from object-based data structures to primitive / compressed data structures.

jimczi

Sparse sorting, nice! The change looks great to me.
I was wondering whether we could also compact the offsets array, but that would require an extra sort per field. That's probably not needed, though, since we can also remove the caching of the doc values to ensure that we load only one field at a time during a merge.

dweiss · 2020-09-22T11:05:26Z

lucene/core/src/java/org/apache/lucene/index/SortedSetDocValuesWriter.java

+        }
+        if (startOffset != ordOffset) { // do we have any values?
+          offsets[newDocID] = startOffset;
+          builder.add(0); // 0 ord marks next value


Technically you could also use a sign switch as the boundary (~value marks the first value), but I don't think it'd result in any savings, and it would only complicate the value stream.

Agreed. I didn't do it because it would have hurt readability too much, IMO.
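
For readers following the thread, here is a minimal sketch of the zero-delimited layout discussed above. It is not the code from this PR. The class name `FlatDocOrds`, the method `ordsFor`, and the plain `long[]` arrays are hypothetical stand-ins. The sketch assumes each ord is stored as `ord + 1` so that 0 is free to act as the terminator, and that offsets are 1-based so that 0 can mean "no values".

```java
/** Hypothetical sketch of a flat, zero-delimited ord stream; not Lucene's actual class. */
final class FlatDocOrds {
  // Per new doc ID: 1-based start position in the ord stream, 0 means the doc has no values.
  final long[] offsets;
  // All ords in one stream. Each ord is stored as ord + 1; a 0 entry ends a document's run.
  final long[] ords;

  /**
   * @param ordsByOldDoc ords of each document, indexed by the old (unsorted) doc ID
   * @param oldToNew mapping from old doc ID to new (sorted) doc ID, assumed to be a permutation
   */
  FlatDocOrds(long[][] ordsByOldDoc, int[] oldToNew) {
    int total = 0;
    for (long[] docOrds : ordsByOldDoc) {
      if (docOrds.length > 0) {
        total += docOrds.length + 1; // the values plus one terminator
      }
    }
    offsets = new long[ordsByOldDoc.length];
    ords = new long[total];
    int upto = 0;
    for (int oldDoc = 0; oldDoc < ordsByOldDoc.length; oldDoc++) {
      long[] docOrds = ordsByOldDoc[oldDoc];
      if (docOrds.length == 0) {
        continue; // offset stays 0: no values for this doc
      }
      offsets[oldToNew[oldDoc]] = upto + 1; // 1-based start position
      for (long ord : docOrds) {
        ords[upto++] = ord + 1; // shift by one so 0 stays reserved
      }
      ords[upto++] = 0; // terminator for this document's run
    }
  }

  /** Returns the ords of the given new doc ID, or an empty array if it has none. */
  long[] ordsFor(int newDoc) {
    long start = offsets[newDoc];
    if (start == 0) {
      return new long[0];
    }
    int from = (int) (start - 1);
    int to = from;
    while (ords[to] != 0) {
      to++; // scan forward to the terminator
    }
    long[] result = new long[to - from];
    for (int i = 0; i < result.length; i++) {
      result[i] = ords[from + i] - 1; // undo the +1 shift
    }
    return result;
  }
}
```

The sign-switch alternative mentioned above would drop the terminator entry and instead store the first value of each run as `~value`. The sketch keeps the terminator because, as the thread concludes, it is easier to read.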

dweiss · 2020-09-22T11:06:29Z

Nice. +1.

…1908) This change cuts over from object based data-structures to primitive / compressed data-structures.

LUCENE-9539: Use more compact datastructures for sorting doc-values

a1a40c8

jimczi approved these changes Sep 22, 2020

s1monw added 4 commits September 22, 2020 11:41

fix imports

9480aca

fix some style issues'

2f57159

improve loop readablility

adb99b5

remove imports

dab2137

dweiss reviewed Sep 22, 2020

s1monw added 2 commits September 22, 2020 15:08

add changes

bb93c12

Merge branch 'master' into LUCENE-9539

890dc22

s1monw merged commit c82b994 into apache:master Sep 22, 2020

s1monw deleted the LUCENE-9539 branch September 22, 2020 13:10

s1monw added a commit that referenced this pull request Sep 22, 2020

LUCENE-9539: Use more compact datastructures for sorting doc-values (#…

705faa3

…1908) This change cuts over from object based data-structures to primitive / compressed data-structures.
