Use group-varint encoding for the tail of postings #12782
Conversation
Thanks for looking into it! I left some questions.
```java
  readVInts(docs, 0, limit);
  return;
}
int groupValues = limit / 4 * 4;
```
We can do this with a single instruction I believe?
```diff
- int groupValues = limit / 4 * 4;
+ int groupValues = limit & 0xFFFFFFFC;
```
nice idea :)
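To make the suggestion concrete, here is a small sketch (not code from the patch; the helper names are hypothetical) showing that the mask form computes the same rounding as the divide-and-multiply form for non-negative counts. Note that the two forms differ for negative inputs, which cannot happen for `limit` here.

```java
// Sketch: rounding a non-negative count down to a multiple of 4.
// All three forms are equivalent when limit >= 0.
class RoundDownSketch {
  static int viaDivide(int limit) {
    return limit / 4 * 4; // integer division truncates, then multiply back
  }

  static int viaMask(int limit) {
    return limit & 0xFFFFFFFC; // clear the two low bits with a single AND
  }

  static int viaComplement(int limit) {
    return limit & ~3; // same mask, arguably more readable
  }
}
```

`limit & ~3` is yet another spelling of the same single-instruction AND.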
```java
for (int i = 0; i < groupValues; i++) {
  cur = i % 4;
  if (cur == 0) {
    groupLengths = flagToLengths[Byte.toUnsignedInt(bytes[offset++])];
```
I wonder if the `flagToLengths` table approach is best on Java because of bounds checks vs. recomputing the length from the flag using shifts/masks.
It also looks scarily big, around 4KB (256 flags × 4 lengths × 4 bytes)? It could have a negative impact on the CPU cache that may not show up in a JMH benchmark.
```java
  if (cur == 0) {
    groupLengths = flagToLengths[Byte.toUnsignedInt(bytes[offset++])];
  }
  docs[i] = (int) BitUtil.VH_LE_INT.get(bytes, offset) & MASKS[groupLengths[cur]];
```
And likewise for masks, I wonder if the table lookup is actually better than recomputing the mask.
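As a sketch of the alternative (hypothetical helper names, not the patch's code): the mask for a value of 1–4 bytes can be recomputed from the byte length with one shift and one subtraction, avoiding the table load and its bounds check.

```java
// Sketch: MASKS table lookup vs. recomputing the mask from the byte length.
class MaskSketch {
  static final int[] MASKS = {0, 0xFF, 0xFFFF, 0xFFFFFF, 0xFFFFFFFF};

  static int viaTable(int raw, int numBytes) {
    return raw & MASKS[numBytes]; // one load, plus an array bounds check
  }

  static int viaCompute(int raw, int numBytes) {
    // (1L << (numBytes * 8)) - 1 yields 0xFF, 0xFFFF, 0xFFFFFF, 0xFFFFFFFF
    return raw & (int) ((1L << (numBytes << 3)) - 1);
  }
}
```

The `long` intermediate avoids the int-shift wraparound at `numBytes == 4` (a 32-bit shift of an `int` would be a no-op in Java).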
```java
/**
 * Encode integers using group-varint. It uses VInt to encode tail values that
 * are not enough for a group.
 */
class GroupVintWriter {
```
Use an uppercase I for consistency with `DataInput#readVInt`?
```diff
- class GroupVintWriter {
+ class GroupVIntWriter {
```
+1, sorry for my mistake ;)
@jpountz You are right, recomputing the length is faster than the table lookup. Here is the benchmark when reading the ints, where each value takes 4 bytes:
But I found that group-varint encoding is faster than vint only when the values take up 4 bytes. Here is the benchmark for reading 64 int values.
Group-varint decoding takes constant time, while vint decoding is faster when the values take up fewer bytes, so the actual payoff depends on factors like the distribution of value sizes.
Could you check in your benchmark?
At least in theory, group-varint could be made faster than vints even with single-byte integers, because a single check on the flag byte can replace the per-byte continuation checks.

Your change keeps mixing doc IDs and frequencies. I wonder if we should write them in separate varint blocks?
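For readers unfamiliar with the format, here is a sketch of the classic group-varint flag-byte layout (this is the textbook scheme, not necessarily byte-for-byte what the patch does): one flag byte holds four 2-bit fields, each storing `length - 1` for one of the four values, so the lengths can be recovered with shifts and masks.

```java
// Sketch: classic group-varint flag byte. Four values per group; each 2-bit
// field in the flag stores (byteLength - 1), so lengths range over 1..4.
class FlagSketch {
  // Recover the byte length of value i (0..3) from the flag byte.
  static int lengthOf(int flag, int i) {
    return ((flag >>> (i << 1)) & 0x3) + 1;
  }

  // Build the flag byte from four byte lengths in 1..4.
  static int flagFor(int len0, int len1, int len2, int len3) {
    return (len0 - 1) | ((len1 - 1) << 2) | ((len2 - 1) << 4) | ((len3 - 1) << 6);
  }
}
```

The "single check" idea falls out of this layout: if the whole flag byte is zero, all four values are single-byte and can be copied without any per-value branching.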
Thank you @jpountz, I pushed the benchmark code and added a new comparison between
+1
It's a good idea; it can use less memory when decoding.
Thanks @easyice. I took some time to look into the benchmark and improve a few things, hopefully you don't mind:

- Bulk decoding rather than one by one.
- Write docs and freqs separately instead of interleaved.
- Write freqs as regular vints, as the benchmark suggests single-byte vints are fast and freqs are often small.
- Remove `len`/`numGroup` to save space.
- Read directly from the directory instead of using an intermediate buffer. This helps save memory copies.

Here is the output of the benchmark on my machine now:
The benchmark used to read directly from the in-memory byte[] by calling
```java
  buffer[bufferOffset++] = v;
}

public void reset(int numValues) {
```
Let's remove the `numValues` parameter since it looks like we can't actually rely on it?
+1. It's simpler now.
```java
byteBufferVIntIn.seek(0);
for (int i = 0; i < size; i++) {
  values[i] = byteBufferVIntIn.readVInt();
}
```
Do we need to pass the `values` array to a `Blackhole` object to make sure that the JVM doesn't optimize away some of the decoding logic?
Thank you very much for your guidance! ...
```java
}

/** only readValues or nextInt can be called after reset */
public void readValues(long[] docs, int limit) throws IOException {
```
I'm not sure we need both a reset() and readValues(); maybe readValues() could take a `DataInput` directly?
+1
```java
    return;
  }
  encodeValues(buffer, bufferOffset);
}
```
What about making the API look more like the reader API, i.e. replace reset/add/flush with a single `writeValues(DataOutput, long[] values, int limit)` API?
+1.
```java
// encode each group
while ((limit - off) >= 4) {
  // the maximum size of one group is 4 ints + 1 byte flag.
  bytes = ArrayUtil.grow(bytes, byteOffset + 17);
```
Could we write the data to `out` here instead of growing the buffer?
+1, but a 17-byte array for a single group is still required, because `DataOutput` cannot write an integer using a specified number of bytes. Is that okay?
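To show why 17 bytes of scratch suffice: a group is at most 4 values × 4 bytes plus one flag byte. Below is a hedged sketch of encoding one group into such a scratch buffer (hypothetical helper, not the patch's actual code); the caller would then copy `scratch[0..returned)` to the `DataOutput` in one call.

```java
// Sketch: encode one group of 4 ints into a scratch buffer.
// Layout assumed: flag byte first, then each value little-endian using only
// as many bytes as needed; the flag packs (length - 1) in 2 bits per value.
class GroupEncodeSketch {
  /** Encodes 4 ints starting at values[off] into scratch (length >= 17);
   *  returns the number of bytes written, at most 1 + 4 * 4 = 17. */
  static int encodeGroup(int[] values, int off, byte[] scratch) {
    int pos = 1; // reserve scratch[0] for the flag byte
    int flag = 0;
    for (int i = 0; i < 4; i++) {
      int v = values[off + i];
      int len = 1;
      scratch[pos++] = (byte) v; // always emit at least one byte
      while ((v >>>= 8) != 0) {  // emit remaining bytes, LSB first
        scratch[pos++] = (byte) v;
        len++;
      }
      flag |= (len - 1) << (i << 1); // 2 bits of (length - 1) per value
    }
    scratch[0] = (byte) flag;
    return pos;
  }
}
```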
Wow, what an incredible speedup! I would not have expected bulk decoding with direct reads to be so much faster than reading from an array. Thank you for your time @jpountz, and I'm sorry I didn't try this approach.
@jpountz Thanks a lot for the great suggestions. :)
I ran some rounds of wikimediumall (sometimes there is noise); it looks like a bit of a speedup:
Round 1
Round 2
Round 3
Thanks for running these macro benchmarks, it's good to see that this change is translating into noticeable speedups. I see that fuzzy, wildcard and prefix queries get speedups with very low p values, which is what I'd have expected given that these queries need to visit many low-doc-frequency terms. Overall, the bigger size on disk looks worth the query speedup to me. We made a similar trade-off when switching from PFOR back to FOR for doc blocks.
Thanks for adding a unit test too. The change looks good to me, I just left a minor suggestion. Can you add a CHANGES entry?
```java
public void testEncodeDecode() throws IOException {
  long[] values = new long[ForUtil.BLOCK_SIZE];
  long[] restored = new long[ForUtil.BLOCK_SIZE];
  final int iterations = RandomNumbers.randomIntBetween(random(), 50, 1000);
```
We have an `atLeast` helper for this kind of thing; it automatically gives more iterations to nightly runs.
```diff
- final int iterations = RandomNumbers.randomIntBetween(random(), 50, 1000);
+ final int iterations = atLeast(100);
```
Wow, that's very nice. Thank you @jpountz!
```java
  bytes[byteOffset++] = (byte) (v & 0xFF);
  v >>>= 8;
}
bytes[byteOffset++] = (byte) v;
```
I wonder if we can simplify the above loop by using a do...while loop instead of a regular while loop, something like:

```java
do {
  bytes[byteOffset++] = (byte) (v & 0xFF);
  v >>>= 8;
} while (v != 0);
```

Then we don't need the extra write to `bytes` after the loop?
It's simpler, thank you :)
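As a standalone illustration of the suggestion (hypothetical helper, not the committed code): the do...while guarantees at least one byte is emitted, so the trailing write after the loop disappears, and a zero value still encodes to one byte.

```java
// Sketch: emit a value's bytes LSB-first until no bits remain.
// The do...while emits at least one byte, so v == 0 yields one byte.
class ByteLoopSketch {
  static int writeDoWhile(long v, byte[] dest) {
    int n = 0;
    do {
      dest[n++] = (byte) (v & 0xFF);
      v >>>= 8;
    } while (v != 0);
    return n; // number of bytes written
  }
}
```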
Co-authored-by: Adrien Grand <jpountz@gmail.com>
There seems to be a speedup on prefix queries in nightly benchmarks; I'll add an annotation.
Also, the size increase is hardly noticeable.
For reference, I computed the most frequent
Now broken down by number of bytes per int:
It's very important as a reference, thanks a lot!
I opened a PR to feed some of this data into the micro benchmark to make it more realistic: #12833.
@easyice Hi, I suspect that the data produced by group-varint encoding is different from the old encoding, so is this compatible with the old index format?
This change was released in Lucene 9.9.0, which uses a new version of the postings format. Have you got specific errors? Could you give some more details? Thanks! See https://lucene.apache.org/core/9_9_0/changes/Changes.html#v9.9.0.optimizations
I have no errors; I didn't realize the new format was used. Thanks.
As discussed in issue #12717, the read performance of group-varint is 14-30% faster than vint, at the cost of some extra size on disk. 16-248 is the number of ints that will be read. Feel free to close the PR if the performance improvement is not enough :)