Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate FSTStore and BytesStore in FST #12709

Merged
merged 20 commits into from
Oct 26, 2023

Conversation

dungba88
Copy link
Contributor

@dungba88 dungba88 commented Oct 23, 2023

Description

Consolidate the FSTStore and BytesStore in FST. The two are similar, except that FSTStore has an init() method, which is not needed for BytesStore. Thus I extracted the common methods to FSTReader (maybe there is better name). FST no longer needs to have if-else conditional logics to choose between the two.

Also fix the numBytes() method which would throw NullPointerException if the FST is FSTStore-backed.

I'm not sure if this needs a new entry in CHANGES.txt, but it can be backported to 9.x

Copy link
Member

@mikemccand mikemccand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great -- I think it's ready -- I left just minor comments.

I think we should land this only on main for now, and then backport it eventually to 9.x along with the other FST changes?

if (startNode != -1) {
throw new IllegalStateException("already finished");
}
if (newStartNode == FINAL_END_NODE && emptyOutput != null) {
newStartNode = 0;
}
startNode = newStartNode;
bytes.finish();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm was/is this a no-op?

Edit: nevermind -- I see you moved it up in the call stack (FSTCompiler.compile).

metaOut.writeVLong(numBytes);
fstStore.writeTo(out);
}
metaOut.writeVLong(numBytes());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So much cleaner, I love it! No more scattered ifs depending on which store is backing the FST...

@@ -859,6 +859,7 @@ public FST<T> compile() throws IOException {
// if (DEBUG) System.out.println(" builder.finish root.isFinal=" + root.isFinal + "
// root.output=" + root.output);
fst.finish(compileNode(root, lastInput.length()).node);
bytes.finish();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, you just moved the .finish() to here, OK.

bytes = new BytesStore(bytesPageBits);
// pad: ensure no node gets address 0 which is reserved to mean
// the stop state w/ no arcs
bytes.writeByte((byte) 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent to move this out of FST to here, making it more consistent that it's the FSTCompiler that does the writing, and FST that does the reading.

@@ -317,8 +319,6 @@ private CompiledNode compileNode(UnCompiledNode<T> nodeIn, int tailLength) throw
// serializes new node by appending its bytes to the end
// of the current byte[]
long addNode(FSTCompiler.UnCompiledNode<T> nodeIn) throws IOException {
T NO_OUTPUT = fst.outputs.getNoOutput();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was dead? I wonder why nothing in our build statically finds our dead code...

Copy link
Contributor Author

@dungba88 dungba88 Oct 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically it was not dead code but we already kept the NO_OUTPUT as property of FSTCompiler, and since it is an immutable property of Outputs there is no need to reassign it again.

@dungba88
Copy link
Contributor Author

I think we should land this only on main for now, and then backport it eventually to 9.x along with the other FST changes?

I think this makes sense. Let hold off the backporting.

@dungba88
Copy link
Contributor Author

I added an entry in the CHANGES.txt under Lucene 10.0 (as we are not backporting)

@mikemccand mikemccand merged commit 12fc7bf into apache:main Oct 26, 2023
4 checks passed
@mikemccand
Copy link
Member

Thanks @dungba88 -- I just merged. We can open a new PR when it's time to backport ...

@dungba88 dungba88 deleted the fst-consolidate-bytesstore branch November 17, 2023 04:31
mikemccand pushed a commit that referenced this pull request Nov 20, 2023
* Remove direct dependency of NodeHash to FST

* Fix index out of bounds when writing FST to different metaOut (#12697)

* Tidify code

* Update CHANGES.txt

* Re-add assertion

* Remove direct dependency of NodeHash to FST

* Hold off the FSTTraversal changes

* Rename variable

* Add Javadoc

* Add @OverRide

* tidy

* tidy

* Change to FSTReader

* Update CHANGES.txt
mikemccand added a commit that referenced this pull request Nov 20, 2023
…0.0 -> 9.9.0 on bulk backport of recent FST improvements
@mikemccand mikemccand added this to the 9.9.0 milestone Nov 20, 2023
slow-J pushed a commit to slow-J/lucene that referenced this pull request Nov 20, 2023
… move CHANGES.txt entry from 10.0 -> 9.9.0 on bulk backport of recent FST improvements
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants