Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LUCENE-9353: Move terms metadata to its own file. #1473

Merged
merged 10 commits into from
Jun 16, 2020

Conversation

jpountz
Copy link
Contributor

@jpountz jpountz commented May 1, 2020

BytesRef minTerm = readBytesRef(termsMetaIn);
BytesRef maxTerm = readBytesRef(termsMetaIn);
if (docCount < 0 || docCount > state.segmentInfo.maxDoc()) { // #docs with field must be <= #docs
throw new CorruptIndexException("invalid docCount: " + docCount + " maxDoc: " + state.segmentInfo.maxDoc(), termsMetaIn);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have a test that tickles these cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not directly, and these things are hard to test, though I agree we could do better. I opened https://issues.apache.org/jira/browse/LUCENE-9356 to try to improve the coverage of these code paths.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Adrien

priorE = exception;
} finally {
if (metaIn != null) {
CodecUtil.checkFooter(metaIn, priorE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, I don't know if we have test coverage for the corrputed metadata?


termsOut.writeVInt(fields.size());
metaOut.writeVInt(fields.size());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpountz here I see the same lack of serializer write/read code, could it be possible to have such thing, It would improve readability and unit testing by only mocking fieldMetadatas and check serialization is correctly applied.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your proposal sounds orthogonal to this pull request to me?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes but maybe is the opportunity to move the code and add more testability in BlockTreeTermsReader.java

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer decoupling changes when possible.


final long dirStart = termsOut.getFilePointer();
final long indexDirStart = indexOut.getFilePointer();
try (IndexOutput metaOut = state.directory.createOutput(metaName, state.context)) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this file is not created at the same time with the indexOut, termOut?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would work too. I like keeping the index output open for as little time as possible when it doesn't make things worse otherwise.

@jpountz jpountz merged commit 87a3bef into apache:master Jun 16, 2020
@jpountz jpountz deleted the separate_terms_meta_file branch June 16, 2020 13:05
MarcusSorealheis added a commit to MarcusSorealheis/lucene-solr that referenced this pull request Jun 20, 2020
* upstream/master: (218 commits)
  LUCENE-9412 Do not validate jenkins HTTPS cert
  LUCENE-8962: add ability to selectively merge on commit (apache#1552)
  Replace DWPT.DocState with simple method parameters (apache#1594)
  LUCENE-9402: Let MultiCollector handle minCompetitiveScore (apache#1567)
  SOLR-14574: Fix or suppress warnings in solr/core/src/test (part 2)
  SOLR-14561 CoreAdminAPI's parameters instanceDir and dataDir are now validated (apache#1572)
  SOLR-14532: Add *.iml files to gitignore
  SOLR-14577: Return BAD REQUEST when field is missing in terms QP (apache#1588)
  SOLR-14574: Fix or suppress warnings in solr/core/src/test (part 1)
  remove debug code
  LUCENE-9408: roll back only called once enforcement
  LUCENE-8962: Allow waiting for all merges in a merge spec (apache#1585)
  SOLR-14572 document missing SearchComponents (apache#1581)
  LUCENE-9359: Avoid test failures when the extra file is a dir.
  SOLR-14573: Fix or suppress warnings in solrj/src/test
  LUCENE-9353: Move terms metadata to its own file. (apache#1473)
  Cleanup TermsHashPerField (apache#1573)
  SOLR-14558: Record all log lines in SolrLogPostTool (apache#1570)
  LUCENE-9404: simplify checksum calculation of ByteBuffersIndexOutput
  LUCENE-9403: tune BufferedChecksum.DEFAULT_BUFFERSIZE
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants