-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleanup TermsHashPerField #1573
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a great cleanup -- thanks @s1monw
I left some small comments.
I think we should run indexing throughput benchmark to make sure there's no real impact here. I'll do that.
lucene/core/src/java/org/apache/lucene/index/FreqProxTermsWriterPerField.java
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/index/ParallelPostingsArray.java
Outdated
Show resolved
Hide resolved
@@ -56,12 +56,6 @@ public FreqProxTermsWriterPerField(FieldInvertState invertState, TermsHash terms | |||
@Override | |||
void finish() throws IOException { | |||
super.finish(); | |||
sumDocFreq += fieldState.uniqueTermCount; | |||
sumTotalTermFreq += fieldState.length; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, did these aggregations move somewhere else? Oh, they look entirely removed? Were they redundant (computed elsewhere) and these ones were unused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sumDocFreq
and sumTotalTermFreq
are unused. They were used in FreqProxFields
in the past but not anymore for a while now. I removed their commented out usage so you can see it in a followup commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -222,7 +234,7 @@ void writeProx(TermVectorsPostingsArray postings, int termID) { | |||
} | |||
|
|||
@Override | |||
void newTerm(final int termID) { | |||
void newTerm(final int termID, final int docID) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm docID
is unused in this method? But I guess the other impl (normal postings) needs it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's correct.
lucene/core/src/java/org/apache/lucene/index/TermsHashPerField.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/index/TermsHashPerField.java
Outdated
Show resolved
Hide resolved
+1 thanks @mikemccand |
lucene/core/src/java/org/apache/lucene/index/TermsHashPerField.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the changes you made, Simon.
lucene/core/src/java/org/apache/lucene/index/TermsHashPerField.java
Outdated
Show resolved
Hide resolved
lucene/core/src/test/org/apache/lucene/index/TestTermsHashPerField.java
Outdated
Show resolved
Hide resolved
I tested indexing throughput on
Net/net this change might be a bit faster, or just noise, so all clear to push! Thanks @s1monw. |
thanks @mikemccand - I will run tests again and push. |
++ |
Several classes within the IndexWriter indexing chain haven't been touched for several years. Most of these classes expose their internals through public members and are difficult to construct in tests since they depend on many other classes. This change tries to clean up TermsHashPerField and adds a dedicated standalone test for it to make it more accessible for other developers since it's simpler to understand. There are also attempts to make documentation better as a result of this refactoring.
* upstream/master: (218 commits) LUCENE-9412 Do not validate jenkins HTTPS cert LUCENE-8962: add ability to selectively merge on commit (apache#1552) Replace DWPT.DocState with simple method parameters (apache#1594) LUCENE-9402: Let MultiCollector handle minCompetitiveScore (apache#1567) SOLR-14574: Fix or suppress warnings in solr/core/src/test (part 2) SOLR-14561 CoreAdminAPI's parameters instanceDir and dataDir are now validated (apache#1572) SOLR-14532: Add *.iml files to gitignore SOLR-14577: Return BAD REQUEST when field is missing in terms QP (apache#1588) SOLR-14574: Fix or suppress warnings in solr/core/src/test (part 1) remove debug code LUCENE-9408: roll back only called once enforcement LUCENE-8962: Allow waiting for all merges in a merge spec (apache#1585) SOLR-14572 document missing SearchComponents (apache#1581) LUCENE-9359: Avoid test failures when the extra file is a dir. SOLR-14573: Fix or suppress warnings in solrj/src/test LUCENE-9353: Move terms metadata to its own file. (apache#1473) Cleanup TermsHashPerField (apache#1573) SOLR-14558: Record all log lines in SolrLogPostTool (apache#1570) LUCENE-9404: simplify checksum calculation of ByteBuffersIndexOutput LUCENE-9403: tune BufferedChecksum.DEFAULT_BUFFERSIZE ...
Several classes within the IndexWriter indexing chain haven't been touched for several years. Most of these classes expose their internals through public members and are difficult to construct in tests since they depend on many other classes. This change tries to clean up TermsHashPerField and adds a dedicated standalone test for it to make it more accessible for other developers since it's simpler to understand. There are also attempts to make documentation better as a result of this refactoring.