-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LUCENE-9353: Move terms metadata to its own file. #1473
Conversation
BytesRef minTerm = readBytesRef(termsMetaIn); | ||
BytesRef maxTerm = readBytesRef(termsMetaIn); | ||
if (docCount < 0 || docCount > state.segmentInfo.maxDoc()) { // #docs with field must be <= #docs | ||
throw new CorruptIndexException("invalid docCount: " + docCount + " maxDoc: " + state.segmentInfo.maxDoc(), termsMetaIn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have a test that tickles these cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not directly, and these things are hard to test, though I agree we could do better. I opened https://issues.apache.org/jira/browse/LUCENE-9356 to try to improve the coverage of these code paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Adrien
priorE = exception; | ||
} finally { | ||
if (metaIn != null) { | ||
CodecUtil.checkFooter(metaIn, priorE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, I don't know if we have test coverage for the corrputed metadata?
|
||
termsOut.writeVInt(fields.size()); | ||
metaOut.writeVInt(fields.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jpountz here I see the same lack of serializer write/read code, could it be possible to have such thing, It would improve readability and unit testing by only mocking fieldMetadatas and check serialization is correctly applied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your proposal sounds orthogonal to this pull request to me?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes but maybe is the opportunity to move the code and add more testability in BlockTreeTermsReader.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer decoupling changes when possible.
|
||
final long dirStart = termsOut.getFilePointer(); | ||
final long indexDirStart = indexOut.getFilePointer(); | ||
try (IndexOutput metaOut = state.directory.createOutput(metaName, state.context)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this file is not created at the same time with the indexOut, termOut?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would work too. I like keeping the index output open for as little time as possible when it doesn't make things worse otherwise.
* upstream/master: (218 commits) LUCENE-9412 Do not validate jenkins HTTPS cert LUCENE-8962: add ability to selectively merge on commit (apache#1552) Replace DWPT.DocState with simple method parameters (apache#1594) LUCENE-9402: Let MultiCollector handle minCompetitiveScore (apache#1567) SOLR-14574: Fix or suppress warnings in solr/core/src/test (part 2) SOLR-14561 CoreAdminAPI's parameters instanceDir and dataDir are now validated (apache#1572) SOLR-14532: Add *.iml files to gitignore SOLR-14577: Return BAD REQUEST when field is missing in terms QP (apache#1588) SOLR-14574: Fix or suppress warnings in solr/core/src/test (part 1) remove debug code LUCENE-9408: roll back only called once enforcement LUCENE-8962: Allow waiting for all merges in a merge spec (apache#1585) SOLR-14572 document missing SearchComponents (apache#1581) LUCENE-9359: Avoid test failures when the extra file is a dir. SOLR-14573: Fix or suppress warnings in solrj/src/test LUCENE-9353: Move terms metadata to its own file. (apache#1473) Cleanup TermsHashPerField (apache#1573) SOLR-14558: Record all log lines in SolrLogPostTool (apache#1570) LUCENE-9404: simplify checksum calculation of ByteBuffersIndexOutput LUCENE-9403: tune BufferedChecksum.DEFAULT_BUFFERSIZE ...
See https://issues.apache.org/jira/browse/LUCENE-9353.