Allow FST builder to use different writer (#12543) #12624

dungba88 · 2023-10-04T22:11:56Z

Description

Split the scratch byte operations inside BytesStore out of FSTWriter and allow the DataOutput to be configurable from FSTCompiler.Builder. Those operations are for building the frontier node. Only the final bytes will be written to the DataOutput. The Builder will still maintain bytesPageBits(int) for backward-compatibility, in which it will use a ByteBuffersDataOutput.

Also extract the saving of metadata as a standalone method, as there can be use case where some already saved the FST to a DataOutput when building and only need to save the metadata here.

Issue: #12543

The FST produced by this DataOutput-backed FST is not readable directly, and users need to construct a corresponding DataInput and pass it to the FST public constructor. In short the FSTCompiler only compile the FST but not read it.

The write-then-read-immediately use case is still supported with ByteBuffersDataOutput.

lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java

dungba88 · 2023-11-10T06:46:00Z

@mikemccand I put out another revision. Basically the idea is to write everything to a DataOutput (BytesStore is also a DataOutput). To support write-then-read-immediately use case that we are able to do today, if the DataOutput implements the FSTReader then we will use it to read (which only BytesStore has currently). Otherwise we are following a general use case, which is to use the FST public constructor to read from previously compiled one. Let me know what you think.

I think a benefit is that it's also easy to drop support for BytesStore or write-then-read-immediately use case if needed.

mikemccand

Thanks @dungba88 -- I left a bunch of head scratching comments :) I feel like this is moving in the wrong-ish direction (adding yet more IO abstractions to FST) when we should be able to eliminate most of FST's bizarre IO classes?

lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java

lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java

lucene/core/src/java/org/apache/lucene/util/fst/Freezeable.java

lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java

mikemccand · 2023-11-13T12:53:26Z

lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java

+  final FSTReader fstReader;
+
+  // buffer to store the scratch bytes before writing to the fstWriter
+  final BytesStore bytes = new BytesStore(DEFAULT_SCRATCH_PAGE_BITS);


Rename to scratchBytes, and maybe say in the comment // buffer to store bytes for the one node we are currently writing?

Also, instead of BytesStore, could we use Lucene's existing oal.store ByteBuffersDataOutput.newResettableInstance()? Let's try to get away from FST implementing so many of its own storage classes/interfaces...

lucene/core/src/java/org/apache/lucene/util/fst/FSTDataOutputWriter.java

mikemccand · 2023-11-13T13:13:27Z

lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java

@@ -337,11 +349,23 @@ public long size() {
    return getPosition();
  }

+  /** Similar to {@link #truncate(long)} with newLen=0 but keep the first block to reduce GC. */
+  public void reset() {


If we can switch to Lucene's existing ByteBuffersDataOutput for our scratch single-node bytes then we can remove this?

The public API of ByteBuffersDataOutput class seems to only allow appending only, but for scratch area we still need to some backward writing right?

Can't we turn it into a byte[] (.toArrayCopy()) and do our own reversal? These byte[] are small, just a single node right?

Also, do you think this change would be simpler if we go and do #12355 first? I suppose that won't make FST building any simpler since we'd likely write the whole thing in reverse (like we do now), but then make 2nd pass to re-reverse the bytes so everything becomes forwards.

Yeah it seems the main complexity is due to the fact that we need to override the previously written bytes, sometimes when we want to remove the labels, sometimes when we need to update the header after writing everything. So I guess making it forward would not eliminate those backward writing operation.

I simplified the writing operation a bit in the direct addressing mode. Seems like we can convert it to append-only mode. But the binary search one seems to be difficult, as when we fix the arc we need to do in backward (I assume to avoid overriding?). Anyhow, the remaining backward operations are:

writeBytes: As mentioned, this is used in the binary search mode. If we first copy the whole buffer to a new one, we can start appending back to back. But this isn't as efficient as the current in-place copy? The whole method seems to be expanding each arc from variable arc length to fixed arc lengths. Anyhow there's only a single place this method is used, and the method is pretty much 1-liner, so I embed it to FSTCompiler.

reverse: This is only done at the end before writing to the DataOutput and the NodeHash.ByteBlockPool

mikemccand · 2023-11-13T13:16:03Z

lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java

@@ -26,7 +26,8 @@
 // TODO: merge with PagedBytes, except PagedBytes doesn't
 // let you read while writing which FST needs

-class BytesStore extends DataOutput implements FSTReader {
+// TODO: Separate the scratch writer and reader functionality
+class BytesStore extends DataOutput implements FSTReader, Freezeable {


Could we remove this class entirely? And callers that want to write FST and immediately use it in RAM should just use ByteBuffersDataOutput for their scratch area?

Do we really need to publish a Freezable interface? Can't it just be a private boolean frozen in this class? Is anything needing freezing besides this BytesStore?

And then FSTCompiler should only get a DataOutput? Hmm this may require implementing a ByteBuffersRandomAccessInput or so, so the reverse (positional) byte reads work.

Could we remove this class entirely? And callers that want to write FST and immediately use it in RAM should just use ByteBuffersDataOutput for their scratch area?

This BytesStore class currently serves as 3 purposes:

Acts as a scratch area writer. Their operations (copying random bytes from one place to other, truncating, skipping the bytes) seems to be complicated to model with other DataOutput

Acts as a FST writer. Any other DataOutput can be used.

Acts as a FST reader. This is tricky as DataOutput does not support read operation, and a separate DataInput needs to be provided.

To replace the third one, I think we should keep both the second and third as a single class, because they have to match (a ByteBufferDataOutput needs to go with the ByteBufferRandomAccessInput). However the third purpose is only useful for write-then-read-immediately, which I assume is not the most common use case. Here we can either use the existing BytesStore, or create something based on ByteBufferDataOutput as you suggested. Both are easily replaceable in the future.

Do we really need to publish a Freezable interface

Now thinking again, I think even the ByteBuffersDataOutput would need freezing as well, otherwise its last block will always be a full block, and might be a waste of RAM. Freezable provides a chance to optimize those datastructure as they are not to be modified again.

Anyhow I removed the interface and the method for now. It's easy to add back.

lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java

mikemccand · 2023-11-28T11:15:10Z

Thanks @dungba88 -- I will catch up with the latest iterations soon. I tested just how much slower the ByteBuffer based store is than the FST's BytesStore:

9.x:

  1> TEST: now verify [fst size=5368709408; nodeCount=2252341463; arcCount=2264078568]
  1> 144.4s 0...
  1> 150.7s 1000000...
  1> 157.1s 2000000...
  1> 163.5s 3000000...
  1> 169.9s 4000000...
  1> 176.2s 5000000...
  1> 182.6s 6000000...
  1> 188.9s 7000000...
  1> 195.3s 8000000...
  1> 201.6s 9000000...
  1> 208.0s 10000000...
  1> 214.3s 11000000...

This PR:

  1> TEST: now verify [fst size=4622366544; nodeCount=2252341629; arcCount=2264078736]
  1> 0.0s 0...
  1> 599.5s 100000...
  1> 1201.7s 200000...
  1> 1797.6s 300000...
  1> 2392.0s 400000...
  1> 2967.8s 500000...
  1> 3532.1s 600000...
  1> 4110.5s 700000...
  1> 4678.4s 800000...
  1> 5247.9s 900000...
  1> 5815.1s 1000000...
  1> 6382.3s 1100000...
  1> 6950.0s 1200000...
  1> 7514.7s 1300000...
  1> 8087.2s 1400000...
  1> 8657.6s 1500000...
  1> 9227.2s 1600000...
  1> 9795.8s 1700000...
  1> 10358.1s 1800000...
  1> 10923.6s 1900000...
  1> 11491.5s 2000000...
  1> 12057.4s 2100000...
  1> 12649.5s 2200000...
  1> 13231.7s 2300000...
  1> 13797.8s 2400000...
  1> 14382.8s 2500000...
  1> 15017.1s 2600000...
  1> 15633.8s 2700000...
  1> 16200.9s 2800000...
  1> 16762.8s 2900000...
  1> 17329.6s 3000000...
  1> 17891.7s 3100000...
  1> 18462.2s 3200000...
  1> 19025.3s 3300000...
  1> 19609.4s 3400000...
  1> 20171.3s 3500000...
  1> 20738.6s 3600000...
  1> 21306.2s 3700000...
  1> 21872.2s 3800000...
  1> 22439.8s 3900000...
  1> 22998.3s 4000000...

More than two orders-of-magnitude (base 10) slower!

+1 to freeze and do the output -> input conversion only once. Thanks @dungba88.

mikemccand · 2023-11-28T11:26:36Z

More than two orders-of-magnitude (base 10) slower!

I wonder: are there other places in Lucene that might fall prey to this performance trap (calling toDataInput frequently while continuing to append bytes to the DataOutput)? FreqProxTermsWriter seems to use this for each term to re-read the postings (maybe for the static sort use case).

dungba88 · 2023-11-29T14:54:35Z

I re-ran the Test2BFST with the new change, it looks much better

2.2B nodes

  1> TEST: now verify [fst size=4621076364; nodeCount=2252341486; arcCount=2264078585]
  1> 0...: took 0 seconds
  1> 1000000...: took 27 seconds
  1> 2000000...: took 54 seconds
  1> 3000000...: took 82 seconds
  1> 4000000...: took 109 seconds
  1> 5000000...: took 137 seconds
  1> 6000000...: took 165 seconds
  1> 7000000...: took 192 seconds
  1> 8000000...: took 219 seconds
  1> 9000000...: took 247 seconds
  1> 10000000...: took 275 seconds
  1> 11000000...: took 302 seconds

3GB

  1> 0...: took 0 seconds
  1> 1000000...: took 3 seconds
  1> 2000000...: took 6 seconds
  1> 3000000...: took 9 seconds
  ...
  1> 140000000...: took 495 seconds

dungba88 · 2023-12-05T04:43:31Z

I tested just how much slower the ByteBuffer based store is than the FST's BytesStore:

I assume this is before the last iteration that does the freeze, is that right? What do you think about the last results?

mikemccand · 2023-12-05T11:53:51Z

I tested just how much slower the ByteBuffer based store is than the FST's BytesStore:

I assume this is before the last iteration that does the freeze, is that right? What do you think about the last results?

Sorry yes -- I'll try to come back to this soon, and re-test. Thanks @dungba88

mikemccand

This is getting close! Thank you for persisting @dungba88 and sorry for the slow turnaround here.

mikemccand · 2023-12-05T12:10:19Z

lucene/core/src/java/org/apache/lucene/util/fst/FST.java

+   *
+   * @param metaOut the DataOutput to write the metadata to
+   * @param out the DataOutput to write the FST bytes to
+   */
  public void save(DataOutput metaOut, DataOutput out) throws IOException {


One need not call this if they passed their own IndexOutput when creating the compiler?

Yeah. It's also not supported (throwing exception). I'll add a comment.

lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java

mikemccand · 2023-12-05T12:26:20Z

lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java

+  private long numBytesWritten;
+
+  /**
+   * Get an on-heap DataOutput that allows the FST to be read immediately after writing.


Add , and also optionally saved to an external DataOutput?

I.e. with these changes we support these possible FST workflows:

Build FST and use it immediately entirely in RAM and then discard it

Build FST and use it immediately entirely in RAM and also save it to disk (any other DataOutput), and load it later and use it

Build FST but stream it immediately to disk (except the FSTMetaData, saved at the end) and you cannot use it when done unless you go and open your own DataInput on the backing store and make a new FST from that

Could we include this enumeration in `FSTCompiler''s class javadocs?

Also: could you update Test2BFSTs to also test the 3rd bullet above? Right now it only tests the first 2 bullets. Actually, maybe create a new test, Test2BFSTsToDisk or so? That way we can limit heap size of that new test and confirm RAM is truly capped. That last bullet fully caps the RAM usage during compilation, yay!

I added a new Test2BFSTOffHeap and running it:

100000: 424 RAM bytes used; 39257811 FST bytes; 19189176 nodes; took 23 seconds 200000: 424 RAM bytes used; 78522623 FST bytes; 38378071 nodes; took 49 seconds 300000: 424 RAM bytes used; 117788163 FST bytes; 57567190 nodes; took 80 seconds 400000: 424 RAM bytes used; 157053095 FST bytes; 76756389 nodes; took 107 seconds 500000: 424 RAM bytes used; 196318494 FST bytes; 95945639 nodes; took 133 seconds 600000: 424 RAM bytes used; 235583412 FST bytes; 115134691 nodes; took 170 seconds 700000: 480 RAM bytes used; 274866378 FST bytes; 134324199 nodes; took 198 seconds 800000: 480 RAM bytes used; 314246540 FST bytes; 153513668 nodes; took 222 seconds 900000: 480 RAM bytes used; 353626848 FST bytes; 172703151 nodes; took 245 seconds 1000000: 480 RAM bytes used; 393006717 FST bytes; 191892620 nodes; took 277 seconds 1100000: 480 RAM bytes used; 432387052 FST bytes; 211082115 nodes; took 311 seconds 1200000: 480 RAM bytes used; 471766692 FST bytes; 230271461 nodes; took 334 seconds 1300000: 480 RAM bytes used; 511147081 FST bytes; 249461034 nodes; took 357 seconds

Yay, very exciting we can now build massive FSTs with "trivial" amounts of heap!! Thank you Tantivy FST implementation :)

mikemccand · 2023-12-05T12:30:26Z

lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java

-     * How many bits wide to make each byte[] block in the BytesStore; if you know the FST will be
-     * large then make this larger. For example 15 bits = 32768 byte pages.
+     * Set the {@link DataOutput} which is used for low-level writing of FST. If you want the FST to
+     * be immediately readable, you need to use a DataOutput that also implements {@link FSTReader},


Does FSTReader need to be public and known to users? Could we make it package private and change this to say "you need to use FSTCompiler#getOnHeapReaderWriter"?

I think when we change the compile() to only return metadata, users need to create the FST with the on-heap FSTReader, and thus it needs to be public.

lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java

mikemccand · 2023-12-05T12:44:01Z

lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java

+  public void freeze() {
+    // these operations are costly, so we want to compute it once and cache
+    List<ByteBuffer> byteBuffers = dataOutput.toWriteableBufferList();
+    if (byteBuffers.size() == 1) {


Hmm can we just call .toDataInput()? I think it's already doing this "is it a single buffer" opto? Or was there even poor performance in that case?

I think it didn't handle the single buffer use case, and ByteBuffersDataInput would fall into the same performance regression problem as BytesStore with multiple blocks (which due to the fact that the method is size is larger and won't be inlined by JVM).

I don't think it has anything to do with method sizes. I think it's got more to do with ByteBuffersDataInput wrapping input buffers with asReadOnlyBuffer:

public ByteBuffersDataInput(List<ByteBuffer> buffers) { ensureAssumptions(buffers); this.blocks = buffers.toArray(ByteBuffer[]::new); for (int i = 0; i < blocks.length; ++i) { blocks[i] = blocks[i].asReadOnlyBuffer().order(ByteOrder.LITTLE_ENDIAN); }

It also eagerly preallocates arrays for float and long buffers - something that was added later (I wasn't aware), which may also contribute to performance if you call this method millions of times. But this is only my gut feeling, it'd have to be verified with a profile run.

Sorry for the missing context, I was referring to a regression I found at #10520 instead.

Maybe it was different for ByteBuffersDataInput, or there would be no regression if the block size is just 1. I'll think we need to run some test to verify. Let me know if there's a process for this.

Even for ByteBuffersDataInput in multi-block cases, there's a slight regression over the BytesStore implementation (note that we already called freeze()). The diff is https://github.com/dungba88/lucene/pull/13/files.

ReverseRandomBytesReader with ByteBuffersDataInput

1> TEST: now verify [fst size=4621076364; nodeCount=2252341486; arcCount=2264078585] 1> 0...: took 0 seconds 1> 1000000...: took 27 seconds 1> 2000000...: took 54 seconds 1> 3000000...: took 82 seconds 1> 4000000...: took 109 seconds 1> 5000000...: took 137 seconds 1> 6000000...: took 165 seconds 1> 7000000...: took 192 seconds 1> 8000000...: took 219 seconds 1> 9000000...: took 247 seconds 1> 10000000...: took 275 seconds 1> 11000000...: took 300 seconds

BytesStore-like BytesReader

1> TEST: now verify [fst size=4621076364; nodeCount=2252341486; arcCount=2264078585] 1> 0...: took 0 seconds 1> 1000000...: took 22 seconds 1> 2000000...: took 44 seconds 1> 3000000...: took 66 seconds 1> 4000000...: took 89 seconds 1> 5000000...: took 111 seconds 1> 6000000...: took 133 seconds 1> 7000000...: took 155 seconds 1> 8000000...: took 178 seconds 1> 9000000...: took 200 seconds 1> 10000000...: took 222 seconds 1> 11000000...: took 245 seconds

As this PR is already rather long, I can make the first implementation simple by solely using ByteBuffersDataInput, then open an issue to benchmark different alternatives thoroughly. WDYT?

For this PR, I'll use the more simplified version by solely using ByteBuffersDataInput. I opened another one for the alternative way: #12879. I did see improvement as shown in the log above.

Thanks @dweiss . I think that makes sense. I left this PR to use the simple implementation (i.e just delegate everything to ByteBuffersDataInput). I'll mark the other one as draft, as I still want to play around with the benchmark just for curiosity.

I did see improvement as shown in the log above.

I compared 9.7.x and main + this PR (an earlier rev than the current one, the rev where we re-added .freeze()), running Test2BFSTs. This PR is a bit faster at building (643 seconds vs 677 for the NO_OUTPUTS case), but slower at get (89 seconds vs 60 seconds), but there were other changes since 9.7.x so this is not a pure test. Yet perhaps this asReadOnly wrapping (vs FST's dedicated byte[]-backed BytesStore) is indeed part of that slowdown.

+1 to sticking to the "clean but slower" impl for this PR, and seeing if we can / it's worth clawing back the performance cost/overhead of ByteBuffers in a separate follow-on PR. Maybe @uschindler has some ideas on recovering performance here too ...

Yes, please stick with ByteBuffer. I have a similar suggestion with the stupid groupVInts. Don't splatter our code with highly specialized implementations everywhere. Create a default ByteBuffer impl. There's no overhead as the JVM has optimized ByteBuffer access to make those invisible.

What you see in benchmarks like this is the overhead of C2 jumping in. To make a "correct" benchmark use JMH. In addition: Don't trust profilers, they are useless in those areas as they measure bullshit.

So:

create a JMH benchmark

use ByteBuffer only and use ByteBuffer.wrap() to temporarily create wrappers around other implementations like byte[] or MemorySegment

P.S.: I have not looked into this PR, this is just my general suggestion regarding ByteBuffer and escape analysis, see also #12841.

lucene/core/src/test/org/apache/lucene/util/fst/TestFSTDataOutputWriter.java

mikemccand · 2023-12-05T12:52:36Z

lucene/test-framework/src/java/org/apache/lucene/tests/util/fst/FSTTester.java

+  }
+
+  protected FSTCompiler.Builder<T> getFSTBuilder() {
+    return new FSTCompiler.Builder<>(


Could we randomize whether the "on disk" vs "on heap" DataOutput is used, so that TestFSTs randomly picks? We should randomize all three posibilities:

Create & use on-heap FST

Create on heap, save to disk, load FST from disk

Stream to disk, then load FST from disk

I added this test along with the 2B nodes off-heap

lucene/core/src/test/org/apache/lucene/util/fst/Test2BFSTOffHeap.java

mikemccand · 2023-12-07T10:41:49Z

Since we are struggling to best measure FST performance impact of these changes, I opened a spinoff issue to create a dedicated FST microbenchmark. Having such a compass that's quick to check would really help us make more informed decisions on complex "code complexity vs performance" types of FST changes.

mikemccand

This is super close! I left minor comments.

lucene/core/src/java/org/apache/lucene/util/fst/FST.java

lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java

lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java

lucene/core/src/test/org/apache/lucene/util/fst/Test2BFSTOffHeap.java

lucene/test-framework/src/java/org/apache/lucene/tests/util/fst/FSTTester.java

dungba88 · 2023-12-07T12:22:37Z

Thanks everyone! I addressed comments, putting a simpler implementation.

+1 to the FST micro benchmarking

mikemccand · 2023-12-07T12:30:02Z

lucene/core/src/java/org/apache/lucene/util/fst/FST.java

@@ -419,7 +418,8 @@ public FST(FSTMetadata<T> metadata, DataInput in, FSTStore fstStore) throws IOEx

  /** Create the FST with a metadata object and a FSTReader. */
  FST(FSTMetadata<T> metadata, FSTReader fstReader) {
-    this.metadata = metadata;
+    assert fstReader != null;


Hmm also this one?

fstReader is not directly passed by users. If the users call the constructor with FSTStore, that cannot be null and it will throw an NPE due to the call fstStore.init()

Aha, super!

mikemccand · 2023-12-07T14:25:30Z

Thanks for persisting @dungba88 -- this was a crazy long and tricky exercise. I'm so excited Lucene can finally build arbitrarily large FSTs with bounded heap usage.

I'll merge this soon and backport, then let's watch builds for any exciting FST related failures...

* Move size() to FSTStore * Remove size() completely * Allow FST builder to use different DataOutput * access BytesStore byte[] directly for copying * Rename BytesStore * Change class to final * Reorder methods * Remove unused methods * Rename truncate to setPosition() and remove skipBytes() * Simplify the writing operations * Update comment * remove unused parameter * Simplify BytesStore operation * tidy code * Rename copyBytes to writeTo * Simplify BytesStore operations * Embed writeBytes() to FSTCompiler * Fix the write bytes method * Remove the default block bits constant * add assertion * Rename method parameter names * Move reverse to FSTCompiler * Revert setPosition call * Address comments * Return immediately when writing 0 bytes * Add comment & * Rename variables * Fix the compile error * Remove isReadable() * Remove isReadable() * Optimize ReadWriteDataOutput * tidy code * Freeze the DataOutput once finished() * Refactor * freeze the DataOutput before use * Address comments and add off-heap FST tests * Remove the hardcoded random * Ignore the Test2BFSTOffHeap test * Simplify ReadWriteDataOutput * Update Javadoc * Address comments

dungba88 mentioned this pull request Oct 5, 2023

FSTCompiler.Builder should have an option to stream the FST bytes directly to Directory #12543

Closed

mikemccand reviewed Oct 9, 2023

View reviewed changes

lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java Outdated Show resolved Hide resolved

dungba88 force-pushed the pr-12543 branch 2 times, most recently from 882f5a5 to 90534ee Compare October 16, 2023 15:41

dungba88 marked this pull request as draft October 24, 2023 05:37

dungba88 force-pushed the pr-12543 branch 2 times, most recently from e93ff38 to 9ef101f Compare October 28, 2023 22:48

dungba88 mentioned this pull request Nov 1, 2023

Use value-based LRU cache in NodeHash #12738

Merged

dungba88 force-pushed the pr-12543 branch 5 times, most recently from bc14a46 to a619079 Compare November 6, 2023 13:12

dungba88 marked this pull request as ready for review November 7, 2023 08:01

dungba88 force-pushed the pr-12543 branch 4 times, most recently from 714b2db to f807a32 Compare November 10, 2023 06:41

dungba88 force-pushed the pr-12543 branch 5 times, most recently from 63bb2fd to 5b05fe6 Compare November 13, 2023 12:08

mikemccand reviewed Nov 13, 2023

View reviewed changes

dungba88 force-pushed the pr-12543 branch 2 times, most recently from a1f2ee2 to cea2933 Compare November 13, 2023 16:23

Move size() to FSTStore

a6259fc

dungba88 force-pushed the pr-12543 branch from 957a6f6 to da44cb4 Compare November 14, 2023 00:42

dungba88 marked this pull request as draft November 14, 2023 05:08

dungba88 added 3 commits November 28, 2023 21:13

Refactor

4823bc1

freeze the DataOutput before use

817ae08

Merge branch 'main' into pr-12543

e0d56f5

dungba88 mentioned this pull request Nov 29, 2023

Do we suboptimally call ByteBuffersDataOutput.toDataInput()? #12852

Open

mikemccand reviewed Dec 5, 2023

View reviewed changes

dungba88 added 2 commits December 6, 2023 00:56

Address comments and add off-heap FST tests

6f7a8c5

Remove the hardcoded random

dba57ff

dungba88 commented Dec 5, 2023

View reviewed changes

lucene/core/src/test/org/apache/lucene/util/fst/Test2BFSTOffHeap.java Outdated Show resolved Hide resolved

Ignore the Test2BFSTOffHeap test

5552271

dungba88 mentioned this pull request Dec 6, 2023

Optimize FST on-heap BytesReader #12879

Merged

dungba88 added 2 commits December 6, 2023 10:56

Simplify ReadWriteDataOutput

6cc31c9

Update Javadoc

4344dce

mikemccand reviewed Dec 7, 2023

View reviewed changes

Address comments

10c9898

mikemccand reviewed Dec 7, 2023

View reviewed changes

mikemccand approved these changes Dec 7, 2023

View reviewed changes

mikemccand merged commit b699981 into apache:main Dec 7, 2023
4 checks passed

mikemccand mentioned this pull request Dec 7, 2023

Disable suffix sharing for block tree index #12722

Merged

This was referenced Dec 10, 2023

Where should we stream FST to disk directly? #12902

Open

Introduce growInRange to reduce array overallocation #12844

Merged

Try out a tantivy's term dictionary format #12513

Open

Allow FST builder to use different writer (#12543) #12624

Allow FST builder to use different writer (#12543) #12624

Conversation

dungba88 commented Oct 4, 2023 • edited

Description

dungba88 commented Nov 10, 2023 • edited

mikemccand left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dungba88 Nov 13, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dungba88 Nov 16, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dungba88 Nov 14, 2023 • edited

Choose a reason for hiding this comment

mikemccand commented Nov 28, 2023

mikemccand commented Nov 28, 2023

dungba88 commented Nov 29, 2023 • edited

dungba88 commented Dec 5, 2023

mikemccand commented Dec 5, 2023

mikemccand left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dungba88 Dec 5, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dungba88 Dec 5, 2023 • edited

Choose a reason for hiding this comment

ReverseRandomBytesReader with ByteBuffersDataInput

BytesStore-like BytesReader

Choose a reason for hiding this comment

mikemccand Dec 7, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mikemccand commented Dec 7, 2023

mikemccand left a comment

Choose a reason for hiding this comment

dungba88 commented Dec 7, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mikemccand commented Dec 7, 2023

dungba88 commented Oct 4, 2023 •

edited

dungba88 commented Nov 10, 2023 •

edited

dungba88 Nov 13, 2023 •

edited

dungba88 Nov 16, 2023 •

edited

dungba88 Nov 14, 2023 •

edited

dungba88 commented Nov 29, 2023 •

edited

dungba88 Dec 5, 2023 •

edited

dungba88 Dec 5, 2023 •

edited

mikemccand Dec 7, 2023 •

edited