Consolidate FSTStore and BytesStore in FST #12709
# Conflicts:
#   lucene/core/src/java/org/apache/lucene/util/fst/FST.java
#   lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java
#   lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java
This looks great -- I think it's ready -- I left just minor comments.
I think we should land this only on main for now, and then backport it eventually to 9.x along with the other FST changes?
if (startNode != -1) {
  throw new IllegalStateException("already finished");
}
if (newStartNode == FINAL_END_NODE && emptyOutput != null) {
  newStartNode = 0;
}
startNode = newStartNode;
bytes.finish();
Hmm was/is this a no-op?
Edit: never mind -- I see you moved it up in the call stack (FSTCompiler.compile).
  metaOut.writeVLong(numBytes);
  fstStore.writeTo(out);
}
metaOut.writeVLong(numBytes());
So much cleaner, I love it! No more scattered ifs depending on which store is backing the FST...
@@ -859,6 +859,7 @@ public FST<T> compile() throws IOException {
// if (DEBUG) System.out.println(" builder.finish root.isFinal=" + root.isFinal + "
// root.output=" + root.output);
fst.finish(compileNode(root, lastInput.length()).node);
bytes.finish();
Oh I see, you just moved the .finish() to here, OK.
bytes = new BytesStore(bytesPageBits);
// pad: ensure no node gets address 0 which is reserved to mean
// the stop state w/ no arcs
bytes.writeByte((byte) 0);
Excellent to move this out of FST to here, making it more consistent that it's the FSTCompiler that does the writing, and FST that does the reading.
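For readers less familiar with why the pad byte exists, here is a minimal sketch of the idea, not the actual Lucene code. The class `PaddedWriterSketch`, the constant `STOP_ADDRESS`, and the simplified `addNode` (which returns the start offset of the appended bytes) are all hypothetical names introduced for illustration.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch: the writer owns the buffer and reserves address 0.
final class PaddedWriterSketch {
  // Assumed sentinel: address 0 means "stop state with no arcs".
  static final long STOP_ADDRESS = 0;

  private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();

  PaddedWriterSketch() {
    // Pad with one byte so that no real node can ever start at address 0.
    bytes.write(0);
  }

  // Appends the node bytes and returns the offset where they start.
  long addNode(byte[] nodeBytes) {
    long address = bytes.size();
    bytes.write(nodeBytes, 0, nodeBytes.length);
    return address;
  }

  long size() {
    return bytes.size();
  }
}
```

Because the pad byte is written in the constructor of the writing side, a reader only ever needs to treat address 0 as the sentinel and never has to write anything itself.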
@@ -317,8 +319,6 @@ private CompiledNode compileNode(UnCompiledNode<T> nodeIn, int tailLength) throw
// serializes new node by appending its bytes to the end
// of the current byte[]
long addNode(FSTCompiler.UnCompiledNode<T> nodeIn) throws IOException {
  T NO_OUTPUT = fst.outputs.getNoOutput();
This was dead? I wonder why nothing in our build statically finds our dead code...
Technically it was not dead code, but we already keep NO_OUTPUT as a property of FSTCompiler, and since it is an immutable property of Outputs there is no need to assign it again.
I think this makes sense. Let's hold off on the backporting.
I added an entry in the CHANGES.txt under Lucene 10.0 (as we are not backporting)
Thanks @dungba88 -- I just merged. We can open a new PR when it's time to backport ...
* Remove direct dependency of NodeHash to FST
* Fix index out of bounds when writing FST to different metaOut (#12697)
* Tidify code
* Update CHANGES.txt
* Re-add assertion
* Remove direct dependency of NodeHash to FST
* Hold off the FSTTraversal changes
* Rename variable
* Add Javadoc
* Add @Override
* tidy
* tidy
* Change to FSTReader
* Update CHANGES.txt
… move CHANGES.txt entry from 10.0 -> 9.9.0 on bulk backport of recent FST improvements
Description
Consolidate the FSTStore and BytesStore in FST. The two are similar, except that FSTStore has an init() method, which is not needed for BytesStore. Thus I extracted the common methods to FSTReader (maybe there is a better name). FST no longer needs if-else conditional logic to choose between the two.

Also fix the numBytes() method, which would throw a NullPointerException if the FST is FSTStore-backed.

I'm not sure if this needs a new entry in CHANGES.txt, but it can be backported to 9.x.
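To illustrate the shape of the consolidation, here is a rough sketch under stated assumptions rather than the code in this PR. All names (`ReaderSketch`, `OnHeapSketch`, `LoadedSketch`, `FstSketch`) and method signatures are hypothetical stand-ins and do not mirror the real FSTReader, BytesStore, or FSTStore APIs. The point is only that once both stores share one interface, the FST-like class delegates unconditionally, so there is no nullable second field that could cause a NullPointerException in numBytes().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Objects;

// Hypothetical common read-side abstraction shared by both stores.
interface ReaderSketch {
  long size();

  void writeTo(OutputStream out) throws IOException;
}

// Stand-in for a store that is filled incrementally while compiling.
final class OnHeapSketch implements ReaderSketch {
  private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();

  void writeByte(byte b) {
    bytes.write(b);
  }

  @Override
  public long size() {
    return bytes.size();
  }

  @Override
  public void writeTo(OutputStream out) throws IOException {
    bytes.writeTo(out);
  }
}

// Stand-in for a store that needs an extra init step before it can be read.
final class LoadedSketch implements ReaderSketch {
  private byte[] data = new byte[0];

  // The init step is specific to this store, so it stays off the shared interface.
  void init(byte[] source) {
    data = source.clone();
  }

  @Override
  public long size() {
    return data.length;
  }

  @Override
  public void writeTo(OutputStream out) throws IOException {
    out.write(data);
  }
}

// The FST-like class holds exactly one reader and never branches on its type.
final class FstSketch {
  private final ReaderSketch reader;

  FstSketch(ReaderSketch reader) {
    this.reader = Objects.requireNonNull(reader, "reader");
  }

  long numBytes() {
    return reader.size();
  }

  void save(OutputStream out) throws IOException {
    reader.writeTo(out);
  }
}
```

In this sketch the store-specific step (`init`) is called by whoever constructs the store, before it is handed to `FstSketch`, which keeps the shared interface limited to what both stores actually have in common.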