
Allow FST builder to use different writer (#12543) #12624

Merged
merged 52 commits into apache:main from dungba88:pr-12543 on Dec 7, 2023

Conversation


@dungba88 dungba88 commented Oct 4, 2023

Description

Split the scratch byte operations inside BytesStore out of FSTWriter, and allow the DataOutput to be configurable from FSTCompiler.Builder. Those operations are used for building the frontier nodes; only the final bytes will be written to the DataOutput. The Builder will still maintain bytesPageBits(int) for backward compatibility, in which case it will use a ByteBuffersDataOutput.

Also extract the saving of metadata as a standalone method, as there can be use cases where the caller has already saved the FST to a DataOutput while building and only needs to save the metadata here.

Issue: #12543

The FST produced by this DataOutput-backed compilation is not readable directly; users need to construct a corresponding DataInput and pass it to the FST public constructor. In short, the FSTCompiler only compiles the FST, it does not read it.

The write-then-read-immediately use case is still supported with ByteBuffersDataOutput.

@dungba88 dungba88 force-pushed the pr-12543 branch 2 times, most recently from 882f5a5 to 90534ee Compare October 16, 2023 15:41
@dungba88 dungba88 marked this pull request as draft October 24, 2023 05:37
@dungba88 dungba88 force-pushed the pr-12543 branch 2 times, most recently from e93ff38 to 9ef101f Compare October 28, 2023 22:48
@dungba88 dungba88 force-pushed the pr-12543 branch 5 times, most recently from bc14a46 to a619079 Compare November 6, 2023 13:12
@dungba88 dungba88 marked this pull request as ready for review November 7, 2023 08:01
@dungba88 dungba88 force-pushed the pr-12543 branch 4 times, most recently from 714b2db to f807a32 Compare November 10, 2023 06:41

dungba88 commented Nov 10, 2023

@mikemccand I put out another revision. Basically the idea is to write everything to a DataOutput (BytesStore is also a DataOutput). To support the write-then-read-immediately use case that we are able to do today, if the DataOutput implements FSTReader then we will use it to read (currently only BytesStore does). Otherwise we follow the general use case, which is to use the FST public constructor to read from the previously compiled FST. Let me know what you think.

I think a benefit is that this also makes it easy to drop support for BytesStore or the write-then-read-immediately use case if needed.
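The dispatch described above (use the DataOutput itself as the reader when it happens to support reads) can be sketched with hypothetical stand-in types; the real Lucene types are DataOutput and FSTReader, and `FstReaderLike`, `OnHeapOutput`, and `readerFor` below are illustrative names, not the PR's actual code:

```java
import java.io.ByteArrayOutputStream;

public class DispatchSketch {
    // Hypothetical stand-in for Lucene's FSTReader interface.
    interface FstReaderLike {
        byte readAt(long pos);
    }

    // An on-heap output that also supports reads, playing the role BytesStore
    // plays in the PR: it is both the output and the reader.
    static class OnHeapOutput extends ByteArrayOutputStream implements FstReaderLike {
        @Override
        public byte readAt(long pos) {
            return buf[(int) pos]; // buf is ByteArrayOutputStream's backing array
        }
    }

    // After compiling: if the output can read itself, use it directly;
    // otherwise the caller must open its own input on the backing store.
    static FstReaderLike readerFor(ByteArrayOutputStream out) {
        return (out instanceof FstReaderLike) ? (FstReaderLike) out : null;
    }

    public static void main(String[] args) {
        OnHeapOutput onHeap = new OnHeapOutput();
        onHeap.write(42);
        if (readerFor(onHeap).readAt(0) != 42) throw new AssertionError();
        if (readerFor(new ByteArrayOutputStream()) != null) throw new AssertionError();
        System.out.println("ok");
    }
}
```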

@dungba88 dungba88 force-pushed the pr-12543 branch 5 times, most recently from 63bb2fd to 5b05fe6 Compare November 13, 2023 12:08
@mikemccand mikemccand left a comment

Thanks @dungba88 -- I left a bunch of head-scratching comments :) I feel like this is moving in the wrong-ish direction (adding yet more IO abstractions to FST) when we should be able to eliminate most of FST's bizarre IO classes?

final FSTReader fstReader;

// buffer to store the scratch bytes before writing to the fstWriter
final BytesStore bytes = new BytesStore(DEFAULT_SCRATCH_PAGE_BITS);
Member

Rename to scratchBytes, and maybe say in the comment // buffer to store bytes for the one node we are currently writing?

Also, instead of BytesStore, could we use Lucene's existing oal.store ByteBuffersDataOutput.newResettableInstance()? Let's try to get away from FST implementing so many of its own storage classes/interfaces...

@@ -337,11 +349,23 @@ public long size() {
return getPosition();
}

/** Similar to {@link #truncate(long)} with newLen=0 but keep the first block to reduce GC. */
public void reset() {
Member

If we can switch to Lucene's existing ByteBuffersDataOutput for our scratch single-node bytes then we can remove this?

@dungba88 dungba88 Nov 13, 2023

The public API of the ByteBuffersDataOutput class seems to allow appending only, but for the scratch area we still need to do some backward writing, right?

Member

Can't we turn it into a byte[] (.toArrayCopy()) and do our own reversal? These byte[] are small, just a single node right?

Member

Also, do you think this change would be simpler if we go and do #12355 first? I suppose that won't make FST building any simpler since we'd likely write the whole thing in reverse (like we do now), but then make a 2nd pass to re-reverse the bytes so everything becomes forwards.

@dungba88 dungba88 Nov 16, 2023

Yeah, it seems the main complexity is due to the fact that we need to override previously written bytes: sometimes when we want to remove the labels, and sometimes when we need to update the header after writing everything. So I guess making it forward would not eliminate those backward writing operations.

I simplified the writing operation a bit in the direct addressing mode. It seems we can convert it to append-only mode. But the binary search one seems difficult, as when we fix the arcs we need to do it backward (I assume to avoid overriding?). Anyhow, the remaining backward operations are:

  • writeBytes: As mentioned, this is used in the binary search mode. If we first copy the whole buffer to a new one, we can start appending back to back, but this isn't as efficient as the current in-place copy. The whole method essentially expands each arc from a variable arc length to a fixed arc length. Anyhow, there's only a single place this method is used, and the method is pretty much a one-liner, so I embedded it in FSTCompiler.
  • reverse: This is only done at the end, before writing to the DataOutput and the NodeHash.ByteBlockPool
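The remaining `reverse` step is a plain in-place byte reversal over a completed range. A minimal sketch (illustrative names, not the PR's actual code):

```java
public class ReverseSketch {
    // Reverse bytes in place over the inclusive range [from, to]. Node byte
    // ranges are small, so this extra pass is cheap compared to maintaining
    // backward-writing bookkeeping throughout compilation.
    static void reverse(byte[] b, int from, int to) {
        for (int i = from, j = to; i < j; i++, j--) {
            byte tmp = b[i];
            b[i] = b[j];
            b[j] = tmp;
        }
    }

    public static void main(String[] args) {
        byte[] node = {1, 2, 3, 4, 5};
        reverse(node, 0, node.length - 1);
        if (node[0] != 5 || node[2] != 3 || node[4] != 1) throw new AssertionError();
        System.out.println("ok");
    }
}
```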

@@ -26,7 +26,8 @@
// TODO: merge with PagedBytes, except PagedBytes doesn't
// let you read while writing which FST needs

class BytesStore extends DataOutput implements FSTReader {
// TODO: Separate the scratch writer and reader functionality
class BytesStore extends DataOutput implements FSTReader, Freezeable {
Member

Could we remove this class entirely? And callers that want to write FST and immediately use it in RAM should just use ByteBuffersDataOutput for their scratch area?

Do we really need to publish a Freezable interface? Can't it just be a private boolean frozen in this class? Is anything needing freezing besides this BytesStore?

And then FSTCompiler should only get a DataOutput? Hmm this may require implementing a ByteBuffersRandomAccessInput or so, so the reverse (positional) byte reads work.
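The reverse (positional) reads mentioned here follow a simple pattern: set a position, then read bytes moving toward zero. A hedged sketch with a hypothetical `ReverseReader` over a ByteBuffer (the real thing would sit on Lucene's random-access input abstractions):

```java
import java.nio.ByteBuffer;

public class ReverseReaderSketch {
    // FST arcs are traversed backwards, so the reader decrements its
    // position after each read instead of advancing it.
    static class ReverseReader {
        private final ByteBuffer in; // any positional (absolute-get) input works
        private long pos;

        ReverseReader(ByteBuffer in) { this.in = in; }

        void setPosition(long pos) { this.pos = pos; }

        byte readByte() { return in.get((int) pos--); }
    }

    public static void main(String[] args) {
        ByteBuffer data = ByteBuffer.wrap(new byte[] {10, 20, 30});
        ReverseReader r = new ReverseReader(data);
        r.setPosition(2);
        if (r.readByte() != 30 || r.readByte() != 20 || r.readByte() != 10) {
            throw new AssertionError();
        }
        System.out.println("ok");
    }
}
```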

Contributor Author

Could we remove this class entirely? And callers that want to write FST and immediately use it in RAM should just use ByteBuffersDataOutput for their scratch area?

This BytesStore class currently serves as 3 purposes:

  • Acts as a scratch area writer. Its operations (copying random bytes from one place to another, truncating, skipping bytes) seem complicated to model with other DataOutput implementations
  • Acts as a FST writer. Any other DataOutput can be used.
  • Acts as a FST reader. This is tricky, as DataOutput does not support read operations, and a separate DataInput needs to be provided.

To replace the third one, I think we should keep both the second and third as a single class, because they have to match (a ByteBuffersDataOutput needs to go with a ByteBuffersRandomAccessInput). However, the third purpose is only useful for write-then-read-immediately, which I assume is not the most common use case. Here we can either use the existing BytesStore, or create something based on ByteBuffersDataOutput as you suggested. Both are easily replaceable in the future.
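For reference, the block-addressing scheme that lets a BytesStore-style class serve as both an append writer and a random-access reader can be sketched as a simplified toy (`PagedBytes` here is a hypothetical mini version, not the actual BytesStore code): position is split into a block index (`pos >> blockBits`) and an offset within the block (`pos & blockMask`).

```java
public class PagedStoreSketch {
    // Minimal paged byte store: append via writeByte, random-access read via
    // readByte(pos), no copying when the store grows (new blocks are added).
    static class PagedBytes {
        final int blockBits, blockSize, blockMask;
        byte[][] blocks = new byte[1][];
        int numBlocks = 0, upto;

        PagedBytes(int blockBits) {
            this.blockBits = blockBits;
            this.blockSize = 1 << blockBits;
            this.blockMask = blockSize - 1;
            this.upto = blockSize; // forces block allocation on first write
        }

        void writeByte(byte b) {
            if (upto == blockSize) { // current block full: add a new one
                if (numBlocks == blocks.length) {
                    byte[][] grown = new byte[2 * blocks.length][];
                    System.arraycopy(blocks, 0, grown, 0, blocks.length);
                    blocks = grown;
                }
                blocks[numBlocks++] = new byte[blockSize];
                upto = 0;
            }
            blocks[numBlocks - 1][upto++] = b;
        }

        byte readByte(long pos) { // works at any position, even mid-write
            return blocks[(int) (pos >> blockBits)][(int) (pos & blockMask)];
        }
    }

    public static void main(String[] args) {
        PagedBytes store = new PagedBytes(2); // tiny 4-byte blocks for the demo
        for (int i = 0; i < 10; i++) store.writeByte((byte) i);
        if (store.readByte(0) != 0 || store.readByte(9) != 9) throw new AssertionError();
        System.out.println("ok");
    }
}
```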

@dungba88 dungba88 Nov 14, 2023

Do we really need to publish a Freezable interface

Thinking about it again, I think even the ByteBuffersDataOutput would need freezing as well; otherwise its last block will always be a full block, which might be a waste of RAM. Freezable provides a chance to optimize those data structures once they are no longer going to be modified.

Anyhow I removed the interface and the method for now. It's easy to add back.

@dungba88 dungba88 force-pushed the pr-12543 branch 2 times, most recently from a1f2ee2 to cea2933 Compare November 13, 2023 16:23
@mikemccand

Thanks @dungba88 -- I will catch up with the latest iterations soon. I tested just how much slower the ByteBuffer based store is than the FST's BytesStore:

9.x:

  1> TEST: now verify [fst size=5368709408; nodeCount=2252341463; arcCount=2264078568]
  1> 144.4s 0...
  1> 150.7s 1000000...
  1> 157.1s 2000000...
  1> 163.5s 3000000...
  1> 169.9s 4000000...
  1> 176.2s 5000000...
  1> 182.6s 6000000...
  1> 188.9s 7000000...
  1> 195.3s 8000000...
  1> 201.6s 9000000...
  1> 208.0s 10000000...
  1> 214.3s 11000000...

This PR:

  1> TEST: now verify [fst size=4622366544; nodeCount=2252341629; arcCount=2264078736]
  1> 0.0s 0...
  1> 599.5s 100000...
  1> 1201.7s 200000...
  1> 1797.6s 300000...
  1> 2392.0s 400000...
  1> 2967.8s 500000...
  1> 3532.1s 600000...
  1> 4110.5s 700000...
  1> 4678.4s 800000...
  1> 5247.9s 900000...
  1> 5815.1s 1000000...
  1> 6382.3s 1100000...
  1> 6950.0s 1200000...
  1> 7514.7s 1300000...
  1> 8087.2s 1400000...
  1> 8657.6s 1500000...
  1> 9227.2s 1600000...
  1> 9795.8s 1700000...
  1> 10358.1s 1800000...
  1> 10923.6s 1900000...
  1> 11491.5s 2000000...
  1> 12057.4s 2100000...
  1> 12649.5s 2200000...
  1> 13231.7s 2300000...
  1> 13797.8s 2400000...
  1> 14382.8s 2500000...
  1> 15017.1s 2600000...
  1> 15633.8s 2700000...
  1> 16200.9s 2800000...
  1> 16762.8s 2900000...
  1> 17329.6s 3000000...
  1> 17891.7s 3100000...
  1> 18462.2s 3200000...
  1> 19025.3s 3300000...
  1> 19609.4s 3400000...
  1> 20171.3s 3500000...
  1> 20738.6s 3600000...
  1> 21306.2s 3700000...
  1> 21872.2s 3800000...
  1> 22439.8s 3900000...
  1> 22998.3s 4000000...

More than two orders-of-magnitude (base 10) slower!

+1 to freeze and do the output -> input conversion only once. Thanks @dungba88.

@mikemccand

More than two orders-of-magnitude (base 10) slower!

I wonder: are there other places in Lucene that might fall prey to this performance trap (calling toDataInput frequently while continuing to append bytes to the DataOutput)? FreqProxTermsWriter seems to use this for each term to re-read the postings (maybe for the static sort use case).
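The trap and its fix can be illustrated with a toy output class (hypothetical names; in Lucene the expensive conversion is the ByteBuffersDataOutput-to-input step): convert once after the last write, cache the view, and reuse it on every read.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class FreezeOnceSketch {
    // The trap: running a toDataInput()-style conversion on every read while
    // still appending. The fix: freeze once, cache the read view, reuse it.
    static class GrowingOutput {
        final List<byte[]> chunks = new ArrayList<>();
        int conversions = 0;   // counts how often the expensive path runs
        ByteBuffer frozen;     // cached read view, built at most once

        void writeChunk(byte[] b) { chunks.add(b); }

        ByteBuffer toInput() { // expensive: copies every chunk into one buffer
            conversions++;
            int len = 0;
            for (byte[] c : chunks) len += c.length;
            ByteBuffer buf = ByteBuffer.allocate(len);
            for (byte[] c : chunks) buf.put(c);
            buf.flip();
            return buf;
        }

        ByteBuffer freeze() { // convert once, then hand out the cached view
            if (frozen == null) frozen = toInput();
            return frozen;
        }
    }

    public static void main(String[] args) {
        GrowingOutput out = new GrowingOutput();
        out.writeChunk(new byte[] {1, 2});
        out.writeChunk(new byte[] {3});
        for (int i = 0; i < 1000; i++) out.freeze().get(0); // reads reuse the view
        if (out.conversions != 1) throw new AssertionError();
        System.out.println("ok");
    }
}
```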


dungba88 commented Nov 29, 2023

I re-ran Test2BFST with the new change; it looks much better:

2.2B nodes

  1> TEST: now verify [fst size=4621076364; nodeCount=2252341486; arcCount=2264078585]
  1> 0...: took 0 seconds
  1> 1000000...: took 27 seconds
  1> 2000000...: took 54 seconds
  1> 3000000...: took 82 seconds
  1> 4000000...: took 109 seconds
  1> 5000000...: took 137 seconds
  1> 6000000...: took 165 seconds
  1> 7000000...: took 192 seconds
  1> 8000000...: took 219 seconds
  1> 9000000...: took 247 seconds
  1> 10000000...: took 275 seconds
  1> 11000000...: took 302 seconds

3GB

  1> 0...: took 0 seconds
  1> 1000000...: took 3 seconds
  1> 2000000...: took 6 seconds
  1> 3000000...: took 9 seconds
  ...
  1> 140000000...: took 495 seconds


dungba88 commented Dec 5, 2023

I tested just how much slower the ByteBuffer based store is than the FST's BytesStore:

I assume this is before the last iteration that does the freeze, is that right? What do you think about the last results?

@mikemccand

I tested just how much slower the ByteBuffer based store is than the FST's BytesStore:

I assume this is before the last iteration that does the freeze, is that right? What do you think about the last results?

Sorry yes -- I'll try to come back to this soon, and re-test. Thanks @dungba88

@mikemccand mikemccand left a comment

This is getting close! Thank you for persisting @dungba88 and sorry for the slow turnaround here.

*
* @param metaOut the DataOutput to write the metadata to
* @param out the DataOutput to write the FST bytes to
*/
public void save(DataOutput metaOut, DataOutput out) throws IOException {
Member

One need not call this if they passed their own IndexOutput when creating the compiler?

Contributor Author

Yeah. It's also not supported (it throws an exception). I'll add a comment.

private long numBytesWritten;

/**
* Get an on-heap DataOutput that allows the FST to be read immediately after writing.
Member

Add , and also optionally saved to an external DataOutput?

I.e. with these changes we support these possible FST workflows:

  • Build FST and use it immediately entirely in RAM and then discard it
  • Build FST and use it immediately entirely in RAM and also save it to disk (any other DataOutput), and load it later and use it
  • Build FST but stream it immediately to disk (except the FSTMetaData, saved at the end) and you cannot use it when done unless you go and open your own DataInput on the backing store and make a new FST from that

Could we include this enumeration in `FSTCompiler`'s class javadocs?

Also: could you update Test2BFSTs to also test the 3rd bullet above? Right now it only tests the first 2 bullets. Actually, maybe create a new test, Test2BFSTsToDisk or so? That way we can limit heap size of that new test and confirm RAM is truly capped. That last bullet fully caps the RAM usage during compilation, yay!

Contributor Author

I added a new Test2BFSTOffHeap; running it:

100000: 424 RAM bytes used; 39257811 FST bytes; 19189176 nodes; took 23 seconds
200000: 424 RAM bytes used; 78522623 FST bytes; 38378071 nodes; took 49 seconds
300000: 424 RAM bytes used; 117788163 FST bytes; 57567190 nodes; took 80 seconds
400000: 424 RAM bytes used; 157053095 FST bytes; 76756389 nodes; took 107 seconds
500000: 424 RAM bytes used; 196318494 FST bytes; 95945639 nodes; took 133 seconds
600000: 424 RAM bytes used; 235583412 FST bytes; 115134691 nodes; took 170 seconds
700000: 480 RAM bytes used; 274866378 FST bytes; 134324199 nodes; took 198 seconds
800000: 480 RAM bytes used; 314246540 FST bytes; 153513668 nodes; took 222 seconds
900000: 480 RAM bytes used; 353626848 FST bytes; 172703151 nodes; took 245 seconds
1000000: 480 RAM bytes used; 393006717 FST bytes; 191892620 nodes; took 277 seconds
1100000: 480 RAM bytes used; 432387052 FST bytes; 211082115 nodes; took 311 seconds
1200000: 480 RAM bytes used; 471766692 FST bytes; 230271461 nodes; took 334 seconds
1300000: 480 RAM bytes used; 511147081 FST bytes; 249461034 nodes; took 357 seconds

Member

Yay, very exciting we can now build massive FSTs with "trivial" amounts of heap!! Thank you Tantivy FST implementation :)

* How many bits wide to make each byte[] block in the BytesStore; if you know the FST will be
* large then make this larger. For example 15 bits = 32768 byte pages.
* Set the {@link DataOutput} which is used for low-level writing of FST. If you want the FST to
* be immediately readable, you need to use a DataOutput that also implements {@link FSTReader},
Member

Does FSTReader need to be public and known to users? Could we make it package private and change this to say "you need to use FSTCompiler#getOnHeapReaderWriter"?

Contributor Author

I think when we change the compile() to only return metadata, users need to create the FST with the on-heap FSTReader, and thus it needs to be public.

Member

OK.

public void freeze() {
// these operations are costly, so we want to compute it once and cache
List<ByteBuffer> byteBuffers = dataOutput.toWriteableBufferList();
if (byteBuffers.size() == 1) {
Member

Hmm can we just call .toDataInput()? I think it's already doing this "is it a single buffer" opto? Or was there even poor performance in that case?

@dungba88 dungba88 Dec 5, 2023

I think it didn't handle the single-buffer use case, and ByteBuffersDataInput would fall into the same performance regression as BytesStore with multiple blocks (due to the fact that the method size is larger and it won't be inlined by the JVM).

Contributor

I don't think it has anything to do with method sizes. I think it's got more to do with ByteBuffersDataInput wrapping input buffers with asReadOnlyBuffer:

  public ByteBuffersDataInput(List<ByteBuffer> buffers) {
    ensureAssumptions(buffers);

    this.blocks = buffers.toArray(ByteBuffer[]::new);
    for (int i = 0; i < blocks.length; ++i) {
      blocks[i] = blocks[i].asReadOnlyBuffer().order(ByteOrder.LITTLE_ENDIAN);
    }

It also eagerly preallocates arrays for float and long buffers, something that was added later (I wasn't aware), which may also hurt performance if you call this method millions of times. But this is only my gut feeling; it'd have to be verified with a profile run.
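The wrapping effect is easy to see in isolation: every `asReadOnlyBuffer()` call allocates a fresh view object (and ByteBuffersDataInput's constructor does more work on top of that), even though the backing bytes are shared. A minimal standalone demonstration:

```java
import java.nio.ByteBuffer;

public class ReadOnlyViewSketch {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {1, 2, 3});
        // Each call returns a new view object over the same bytes, so calling
        // this per-read (millions of times) allocates millions of views.
        ByteBuffer v1 = buf.asReadOnlyBuffer();
        ByteBuffer v2 = buf.asReadOnlyBuffer();
        if (v1 == v2) throw new AssertionError();            // distinct objects per call
        if (v1.get(0) != buf.get(0)) throw new AssertionError(); // same backing bytes
        if (!v1.isReadOnly()) throw new AssertionError();
        System.out.println("ok");
    }
}
```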

Contributor Author

Sorry for the missing context, I was referring to a regression I found at #10520 instead.

Maybe it was different for ByteBuffersDataInput, or there would be no regression if the block count is just 1. I think we need to run some tests to verify. Let me know if there's a process for this.

@dungba88 dungba88 Dec 5, 2023

Even for ByteBuffersDataInput in multi-block cases, there's a slight regression over the BytesStore implementation (note that we already called freeze()). The diff is https://github.com/dungba88/lucene/pull/13/files.

ReverseRandomBytesReader with ByteBuffersDataInput

  1> TEST: now verify [fst size=4621076364; nodeCount=2252341486; arcCount=2264078585]
  1> 0...: took 0 seconds
  1> 1000000...: took 27 seconds
  1> 2000000...: took 54 seconds
  1> 3000000...: took 82 seconds
  1> 4000000...: took 109 seconds
  1> 5000000...: took 137 seconds
  1> 6000000...: took 165 seconds
  1> 7000000...: took 192 seconds
  1> 8000000...: took 219 seconds
  1> 9000000...: took 247 seconds
  1> 10000000...: took 275 seconds
  1> 11000000...: took 300 seconds

BytesStore-like BytesReader

  1> TEST: now verify [fst size=4621076364; nodeCount=2252341486; arcCount=2264078585]
  1> 0...: took 0 seconds
  1> 1000000...: took 22 seconds
  1> 2000000...: took 44 seconds
  1> 3000000...: took 66 seconds
  1> 4000000...: took 89 seconds
  1> 5000000...: took 111 seconds
  1> 6000000...: took 133 seconds
  1> 7000000...: took 155 seconds
  1> 8000000...: took 178 seconds
  1> 9000000...: took 200 seconds
  1> 10000000...: took 222 seconds
  1> 11000000...: took 245 seconds

As this PR is already rather long, I can keep the first implementation simple by solely using ByteBuffersDataInput, then open an issue to benchmark the different alternatives thoroughly. WDYT?

For this PR, I'll use the simpler version that solely uses ByteBuffersDataInput. I opened another PR for the alternative approach: #12879. I did see an improvement, as shown in the log above.

Contributor Author

Thanks @dweiss. I think that makes sense. I left this PR using the simple implementation (i.e. just delegating everything to ByteBuffersDataInput). I'll mark the other one as draft, as I still want to play around with the benchmark out of curiosity.

@mikemccand mikemccand Dec 7, 2023

I did see improvement as shown in the log above.

I compared 9.7.x and main + this PR (an earlier rev than the current one, the rev where we re-added .freeze()), running Test2BFSTs. This PR is a bit faster at building (643 seconds vs 677 for the NO_OUTPUTS case), but slower at get (89 seconds vs 60 seconds), but there were other changes since 9.7.x so this is not a pure test. Yet perhaps this asReadOnly wrapping (vs FST's dedicated byte[]-backed BytesStore) is indeed part of that slowdown.

Member

+1 to sticking to the "clean but slower" impl for this PR, and seeing if we can / it's worth clawing back the performance cost/overhead of ByteBuffers in a separate follow-on PR. Maybe @uschindler has some ideas on recovering performance here too ...

Contributor

Yes, please stick with ByteBuffer. I have a similar suggestion with the stupid groupVInts. Don't splatter our code with highly specialized implementations everywhere. Create a default ByteBuffer impl. There's no overhead, as the JVM has optimized ByteBuffer access to make those wrappers invisible.

What you see in benchmarks like this is the overhead of C2 jumping in. To make a "correct" benchmark use JMH. In addition: Don't trust profilers, they are useless in those areas as they measure bullshit.

So:

  • create a JMH benchmark
  • use ByteBuffer only and use ByteBuffer.wrap() to temporarily create wrappers around other implementations like byte[] or MemorySegment
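The zero-copy property behind the `ByteBuffer.wrap()` suggestion, shown in isolation: the wrapper is a view, not a copy, so writes through the view are visible in the original byte[], which is what makes a ByteBuffer-only API cheap to layer over byte[]-backed implementations:

```java
import java.nio.ByteBuffer;

public class WrapSketch {
    public static void main(String[] args) {
        byte[] backing = new byte[] {1, 2, 3, 4};
        // wrap() shares the array; no bytes are copied.
        ByteBuffer view = ByteBuffer.wrap(backing);
        view.put(0, (byte) 42);                  // absolute write through the view...
        if (backing[0] != 42) throw new AssertionError(); // ...is visible in the array
        // Multi-byte reads work too (default order is big-endian):
        if (view.getInt(0) != ((42 << 24) | (2 << 16) | (3 << 8) | 4)) {
            throw new AssertionError();
        }
        System.out.println("ok");
    }
}
```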

Contributor

P.S.: I have not looked into this PR, this is just my general suggestion regarding ByteBuffer and escape analysis, see also #12841.

}

protected FSTCompiler.Builder<T> getFSTBuilder() {
return new FSTCompiler.Builder<>(
Member

Could we randomize whether the "on disk" vs "on heap" DataOutput is used, so that TestFSTs randomly picks? We should randomize all three possibilities:

  • Create & use on-heap FST
  • Create on heap, save to disk, load FST from disk
  • Stream to disk, then load FST from disk

Contributor Author

I added this test along with the 2B-node off-heap one

@mikemccand

Since we are struggling to best measure FST performance impact of these changes, I opened a spinoff issue to create a dedicated FST microbenchmark. Having such a compass that's quick to check would really help us make more informed decisions on complex "code complexity vs performance" types of FST changes.

@mikemccand mikemccand left a comment

This is super close! I left minor comments.


dungba88 commented Dec 7, 2023

Thanks everyone! I addressed the comments and went with the simpler implementation.

+1 to the FST micro benchmarking

@@ -419,7 +418,8 @@ public FST(FSTMetadata<T> metadata, DataInput in, FSTStore fstStore) throws IOEx

/** Create the FST with a metadata object and a FSTReader. */
FST(FSTMetadata<T> metadata, FSTReader fstReader) {
this.metadata = metadata;
assert fstReader != null;
Member

Hmm also this one?

Contributor Author

fstReader is not directly passed by users. If users call the constructor with an FSTStore, that cannot be null, and it will throw an NPE due to the fstStore.init() call

Member

Aha, super!

@mikemccand

Thanks for persisting @dungba88 -- this was a crazy long and tricky exercise. I'm so excited Lucene can finally build arbitrarily large FSTs with bounded heap usage.

I'll merge this soon and backport, then let's watch builds for any exciting FST related failures...

@mikemccand mikemccand merged commit b699981 into apache:main Dec 7, 2023
4 checks passed
mikemccand pushed a commit that referenced this pull request Dec 8, 2023
* Move size() to FSTStore

* Remove size() completely

* Allow FST builder to use different DataOutput

* access BytesStore byte[] directly for copying

* Rename BytesStore

* Change class to final

* Reorder methods

* Remove unused methods

* Rename truncate to setPosition() and remove skipBytes()

* Simplify the writing operations

* Update comment

* remove unused parameter

* Simplify BytesStore operation

* tidy code

* Rename copyBytes to writeTo

* Simplify BytesStore operations

* Embed writeBytes() to FSTCompiler

* Fix the write bytes method

* Remove the default block bits constant

* add assertion

* Rename method parameter names

* Move reverse to FSTCompiler

* Revert setPosition call

* Address comments

* Return immediately when writing 0 bytes

* Add comment &

* Rename variables

* Fix the compile error

* Remove isReadable()

* Remove isReadable()

* Optimize ReadWriteDataOutput

* tidy code

* Freeze the DataOutput once finished()

* Refactor

* freeze the DataOutput before use

* Address comments and add off-heap FST tests

* Remove the hardcoded random

* Ignore the Test2BFSTOffHeap test

* Simplify ReadWriteDataOutput

* Update Javadoc

* Address comments