
Avoid redundant loop for computing min value in DirectMonotonicWriter #12377

Merged 2 commits on Jun 20, 2023

Conversation

easyice
Contributor

easyice commented Jun 20, 2023

Description

This small change removes an unnecessary loop in DirectMonotonicWriter#flush.
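A minimal sketch of the idea (illustrative code, not the actual Lucene diff): rather than scanning the buffer once to compute the deltas and a second time to find their minimum, the minimum can be tracked inside the loop that already visits every value.

```java
// Illustrative sketch of the change, with hypothetical helper names --
// not the actual DirectMonotonicWriter#flush code.
public class FusedMinSketch {

    // Before: one pass to compute deltas, a second pass to find the min.
    static long minTwoPass(long[] buffer, float avgInc) {
        for (int i = 0; i < buffer.length; i++) {
            long expected = (long) (avgInc * i);
            buffer[i] -= expected; // delta from the linear estimate
        }
        long min = buffer[0];
        for (int i = 1; i < buffer.length; i++) {
            min = Math.min(min, buffer[i]); // redundant extra scan
        }
        return min;
    }

    // After: track the min while computing the deltas, saving a full pass.
    static long minOnePass(long[] buffer, float avgInc) {
        long min = Long.MAX_VALUE;
        for (int i = 0; i < buffer.length; i++) {
            long expected = (long) (avgInc * i);
            buffer[i] -= expected;
            min = Math.min(min, buffer[i]);
        }
        return min;
    }
}
```

Both variants produce the same deltas and the same minimum; the fused version simply touches the array once instead of twice.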

@gf2121
Contributor

gf2121 commented Jun 20, 2023

LGTM, Thanks @easyice !

Could you please add a CHANGES entry under 9.8.0?

@easyice
Contributor Author

easyice commented Jun 20, 2023

@gf2121 Thanks for your quick reply, CHANGES.txt has been updated.

@mikemccand
Member

I wonder if this will move the needle in Lucene's nightly benchmarks? Let's watch the charts after this is merged ...

mikemccand merged commit 37b92ad into apache:main on Jun 20, 2023
4 checks passed
asfgit pushed a commit that referenced this pull request Jun 20, 2023
…12377)

* Avoid redundant loop for get min value

* update CHANGES.txt
@mikemccand
Member

Thank you @easyice -- I merged this and backported to branch_9x as well.

@jpountz
Contributor

jpountz commented Jun 20, 2023

Have we checked if this actually made things faster? I remember getting surprised in the past because folding too much into a single loop would prevent C2 from recognizing that some bits can be auto-vectorized (like computing the min value across the entire array).

@mikemccand
Member

Have we checked if this actually made things faster? I remember getting surprised in the past because folding too much into a single loop would prevent C2 from recognizing that some bits can be auto-vectorized (like computing the min value across the entire array).

Hmm, tricky. I don't think we've tested if it's actually faster. We could wait for nightlies to see any impact? Or, revert now and benchmark before pushing again? Darned fragile auto-vectorization... if these loops are an example of that, let's at least add a comment explaining so.

@jpountz
Contributor

jpountz commented Jun 20, 2023

I'm fine either way, hopefully this bit of code is not a bottleneck anyway.

@easyice
Contributor Author

easyice commented Jun 21, 2023

@mikemccand @jpountz Thank you for the suggestions and fresh perspectives on this change. I wrote a simple benchmark for DirectMonotonicWriter: each iteration writes 500 blocks, and I observe the minimum time taken. The result appears slightly faster, from 492 ms to 425 ms.

Here is the benchmark:


import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.store.MMapDirectory;
import org.apache.lucene.util.packed.DirectMonotonicWriter;

public class IndexBenchMarks {

    public static void main(final String[] args) throws Exception {
        doWriteMonotonic();
    }

    static void doWriteMonotonic() throws IOException {
        // 500 blocks of 2^16 values per measured iteration
        BenchMark benchMark = new BenchMark(50, 50, (1 << BenchMark.DIRECT_MONOTONIC_BLOCK_SHIFT) * 500);
        benchMark.run();
    }

    static class BenchMark {
        static final int DIRECT_MONOTONIC_BLOCK_SHIFT = 16;
        final int warmup;
        final int numValues;
        final int loopCount;
        Directory dir;
        DirectMonotonicWriter writer;
        IndexOutput metaOut;
        IndexOutput dataOut;

        BenchMark(int warmup, int loopCount, int numValues) {
            this.warmup = warmup;
            this.numValues = numValues;
            this.loopCount = loopCount;
        }

        private void init() throws IOException {
            // Temp files on a RAM disk to minimize I/O noise (path is machine-specific)
            Path tempDir = Files.createTempDirectory(Paths.get("/Volumes/RamDisk"), "tmp");
            dir = MMapDirectory.open(tempDir);
            metaOut = dir.createOutput("meta", IOContext.DEFAULT);
            dataOut = dir.createOutput("data", IOContext.DEFAULT);
        }

        private void close() throws IOException {
            metaOut.close();
            dataOut.close();
            dir.close();
        }

        private void doWrite() throws IOException {
            // Monotonically increasing values with alternating increments
            long v = 100;
            for (int i = 0; i < numValues; i++) {
                if (i % 2 == 0) {
                    v += 5;
                } else {
                    v += 10;
                }
                writer.add(v);
            }
        }

        void run() throws IOException {
            init();
            for (int i = 0; i < warmup; i++) {
                writer = DirectMonotonicWriter.getInstance(metaOut, dataOut, numValues, DIRECT_MONOTONIC_BLOCK_SHIFT);
                doWrite();
                writer.finish();
            }
            System.gc();
            List<Double> times = new ArrayList<>();
            for (int i = 0; i < loopCount; i++) {
                writer = DirectMonotonicWriter.getInstance(metaOut, dataOut, numValues, DIRECT_MONOTONIC_BLOCK_SHIFT);
                long t0 = System.nanoTime();
                doWrite();
                writer.finish();
                times.add((System.nanoTime() - t0) / 1000000D);
            }
            // Report the minimum of all measured iterations
            double min = times.stream().mapToDouble(Number::doubleValue).min().getAsDouble();
            System.out.println("took(ms):" + String.format(Locale.ROOT, "%.2f", min));
            close();
        }
    }
}

@jpountz
Contributor

jpountz commented Jun 21, 2023

Great, thanks for checking!

@mikemccand
Member

@jpountz observed that OrdinalMap uses a similar idea but different code path:

long min = values[0];
for (int i = 1; i < numValues; ++i) {
  min = Math.min(min, values[i]);
}
for (int i = 0; i < numValues; ++i) {
  values[i] -= min;
}

Maybe we should merge these two loops too? (Separately: we could somehow consolidate these two code paths.)

@mikemccand
Member

Duh, I think we cannot actually merge those two loops? It seems we must first make a pass to find the min across all values before subtracting it from each value.
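For illustration, here is why that subtract pass genuinely needs a separate prior pass: the amount subtracted from values[0] is the global minimum, which is only known once the whole array has been scanned, so a single fused loop cannot perform the subtraction as it goes. (Sketch only; it mirrors the OrdinalMap-style snippet quoted above.)

```java
// Sketch of the two-pass rebase: pass 1 must complete before pass 2
// can start, because pass 2 depends on the global min of the array.
public class SubtractMinSketch {
    static void subtractMin(long[] values) {
        long min = values[0];
        for (int i = 1; i < values.length; ++i) {
            min = Math.min(min, values[i]); // pass 1: find the global min
        }
        for (int i = 0; i < values.length; ++i) {
            values[i] -= min;               // pass 2: rebase onto that min
        }
    }
}
```

This differs from the flush() case above, where the min is a byproduct of a loop that already exists, so tracking it there costs nothing extra.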

@mikemccand
Member

Though, separately, I wonder why this code does not also use the "best/simple linear fit" compression too ...

@easyice
Contributor Author

easyice commented Jun 22, 2023

Though, separately, I wonder why this code does not also use the "best/simple linear fit" compression too ...

Yes, the loops in pack really cannot be merged.

zhaih added this to the 9.8.0 milestone on Sep 20, 2023