
Avoid redundant loop for computing min value in DirectMonotonicWriter #12377

Merged 2 commits on Jun 20, 2023

Conversation

easyice
Contributor

easyice commented Jun 20, 2023

Description

This small change removes an unnecessary loop in DirectMonotonicWriter#flush.
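A minimal sketch of the idea (illustrative code, not the actual Lucene diff): rather than scanning the buffer once to compute the deltas and a second time to find their minimum, the minimum can be tracked inside the loop that already visits every value.

```java
// Illustrative sketch of the change, with hypothetical helper names --
// not the actual DirectMonotonicWriter#flush code.
public class FusedMinSketch {

    // Before: one pass to compute deltas, a second pass to find the min.
    static long minTwoPass(long[] buffer, float avgInc) {
        for (int i = 0; i < buffer.length; i++) {
            long expected = (long) (avgInc * i);
            buffer[i] -= expected; // delta from the linear estimate
        }
        long min = buffer[0];
        for (int i = 1; i < buffer.length; i++) {
            min = Math.min(min, buffer[i]); // redundant extra scan
        }
        return min;
    }

    // After: track the min while computing the deltas, saving a full pass.
    static long minOnePass(long[] buffer, float avgInc) {
        long min = Long.MAX_VALUE;
        for (int i = 0; i < buffer.length; i++) {
            long expected = (long) (avgInc * i);
            buffer[i] -= expected;
            min = Math.min(min, buffer[i]);
        }
        return min;
    }
}
```

Both variants produce the same deltas and the same minimum; the fused version simply touches the array once instead of twice.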

@gf2121
Contributor

gf2121 commented Jun 20, 2023

LGTM, Thanks @easyice !

Could you please add a CHANGES entry under 9.8.0?

@easyice
Contributor Author

easyice commented Jun 20, 2023

@gf2121 Thanks for your quick reply, CHANGES.txt has been updated.

@mikemccand
Member

I wonder if this will move the needle in Lucene's nightly benchmarks? Let's watch the charts after this is merged ...

mikemccand merged commit 37b92ad into apache:main on Jun 20, 2023
4 checks passed
asfgit pushed a commit that referenced this pull request Jun 20, 2023
…12377)

* Avoid redundant loop for get min value

* update CHANGES.txt
@mikemccand
Member

Thank you @easyice -- I merged this and backported to branch_9x as well.

@jpountz
Contributor

jpountz commented Jun 20, 2023

Have we checked if this actually made things faster? I remember getting surprised in the past because folding too much into a single loop would prevent C2 from recognizing that some bits can be auto-vectorized (like computing the min value across the entire array).

@mikemccand
Member

Have we checked if this actually made things faster? I remember getting surprised in the past because folding too much into a single loop would prevent C2 from recognizing that some bits can be auto-vectorized (like computing the min value across the entire array).

Hmm, tricky. I don't think we've tested if it's actually faster. We could wait for nightlies to see any impact? Or, revert now and benchmark before pushing again? Darned fragile auto-vectorization... if these loops are an example of that, let's at least add a comment explaining so.

@jpountz
Contributor

jpountz commented Jun 20, 2023

I'm fine either way, hopefully this bit of code is not a bottleneck anyway.

@easyice
Contributor Author

easyice commented Jun 21, 2023

@mikemccand @jpountz Thank you for the suggestions and fresh perspectives on this change. I wrote a simple benchmark for DirectMonotonicWriter: each iteration writes 500 blocks, and I observe the minimum time taken. The result appears slightly faster, from 492 ms to 425 ms.

Here is the benchmark:


import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.store.MMapDirectory;
import org.apache.lucene.util.packed.DirectMonotonicWriter;

public class IndexBenchMarks {

    public static void main(final String[] args) throws Exception {
        doWriteMonotonic();
    }

    static void doWriteMonotonic() throws IOException {
        // 500 blocks of 2^16 values per measured iteration
        BenchMark benchMark = new BenchMark(50, 50, (1 << BenchMark.DIRECT_MONOTONIC_BLOCK_SHIFT) * 500);
        benchMark.run();
    }

    static class BenchMark {
        static final int DIRECT_MONOTONIC_BLOCK_SHIFT = 16;
        final int warmup;
        final int numValues;
        final int loopCount;
        Directory dir;
        DirectMonotonicWriter writer;
        IndexOutput metaOut;
        IndexOutput dataOut;

        BenchMark(int warmup, int loopCount, int numValues) {
            this.warmup = warmup;
            this.numValues = numValues;
            this.loopCount = loopCount;
        }

        private void init() throws IOException {
            // Temp files on a RAM disk to minimize I/O noise (path is machine-specific)
            Path tempDir = Files.createTempDirectory(Paths.get("/Volumes/RamDisk"), "tmp");
            dir = MMapDirectory.open(tempDir);
            metaOut = dir.createOutput("meta", IOContext.DEFAULT);
            dataOut = dir.createOutput("data", IOContext.DEFAULT);
        }

        private void close() throws IOException {
            metaOut.close();
            dataOut.close();
            dir.close();
        }

        private void doWrite() throws IOException {
            // Monotonically increasing values with alternating increments
            long v = 100;
            for (int i = 0; i < numValues; i++) {
                if (i % 2 == 0) {
                    v += 5;
                } else {
                    v += 10;
                }
                writer.add(v);
            }
        }

        void run() throws IOException {
            init();
            for (int i = 0; i < warmup; i++) {
                writer = DirectMonotonicWriter.getInstance(metaOut, dataOut, numValues, DIRECT_MONOTONIC_BLOCK_SHIFT);
                doWrite();
                writer.finish();
            }
            System.gc();
            List<Double> times = new ArrayList<>();
            for (int i = 0; i < loopCount; i++) {
                writer = DirectMonotonicWriter.getInstance(metaOut, dataOut, numValues, DIRECT_MONOTONIC_BLOCK_SHIFT);
                long t0 = System.nanoTime();
                doWrite();
                writer.finish();
                times.add((System.nanoTime() - t0) / 1000000D);
            }
            // Report the minimum of all measured iterations
            double min = times.stream().mapToDouble(Number::doubleValue).min().getAsDouble();
            System.out.println("took(ms):" + String.format(Locale.ROOT, "%.2f", min));
            close();
        }
    }
}

@jpountz
Contributor

jpountz commented Jun 21, 2023

Great, thanks for checking!

@mikemccand
Member

@jpountz observed that OrdinalMap uses a similar idea but different code path:

long min = values[0];
for (int i = 1; i < numValues; ++i) {
  min = Math.min(min, values[i]);
}
for (int i = 0; i < numValues; ++i) {
  values[i] -= min;
}

Maybe we should merge these two loops too? (Separately: we could somehow consolidate these two code paths.)

@mikemccand
Member

Duh, I think we cannot actually merge those two loops? It seems we must first make a pass to find the min across all values before subtracting it from each value.
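For illustration, here is why that subtract pass genuinely needs a separate prior pass: the amount subtracted from values[0] is the global minimum, which is only known once the whole array has been scanned, so a single fused loop cannot perform the subtraction as it goes. (Sketch only; it mirrors the OrdinalMap-style snippet quoted above.)

```java
// Sketch of the two-pass rebase: pass 1 must complete before pass 2
// can start, because pass 2 depends on the global min of the array.
public class SubtractMinSketch {
    static void subtractMin(long[] values) {
        long min = values[0];
        for (int i = 1; i < values.length; ++i) {
            min = Math.min(min, values[i]); // pass 1: find the global min
        }
        for (int i = 0; i < values.length; ++i) {
            values[i] -= min;               // pass 2: rebase onto that min
        }
    }
}
```

This differs from the flush() case above, where the min is a byproduct of a loop that already exists, so tracking it there costs nothing extra.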

@mikemccand
Member

Though, separately, I wonder why this code does not also use the "best/simple linear fit" compression too ...

@easyice
Contributor Author

easyice commented Jun 22, 2023

Though, separately, I wonder why this code does not also use the "best/simple linear fit" compression too ...

Yes, the loops in pack really cannot be merged.

zhaih added this to the 9.8.0 milestone on Sep 20, 2023