LUCENE-5309: Optimize facet counting for single-valued SSDV / StringValueFacetCounts #255

Merged on Aug 23, 2021 (5 commits)

@gsmiller gsmiller commented Aug 20, 2021

Description

This change introduces special-case logic for facet counting when the field is effectively single-valued, i.e., when it can be exposed as SortedDocValues. Instead of using the more general logic that supports multi-valued fields (i.e., SortedSetDocValues), it uses a separate implementation for the single-valued case.

Solution

Unwrap the SSDV (SortedSetDocValues) into an SDV (SortedDocValues) if possible and provide a separate counting implementation for the single-valued case.
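
As a rough illustration of the idea above, the following Lucene-free sketch models the two counting paths with plain arrays. Every name in it (`SingleValuedCountSketch`, `unwrapSingleValued`, and so on) is hypothetical and not taken from the patch. In Lucene itself, `DocValues.unwrapSingleton` is the existing helper for getting a `SortedDocValues` out of a singleton `SortedSetDocValues`; this excerpt does not show whether the patch calls it, and unlike this sketch it does not scan the documents.

```java
import java.util.Arrays;

/** Hypothetical, Lucene-free model of the two counting paths; all names are illustrative. */
final class SingleValuedCountSketch {

  /** General path: each doc may carry several ordinals (models SortedSetDocValues). */
  static int[] countMultiValued(int[][] ordsPerDoc, int ordCount) {
    int[] counts = new int[ordCount];
    for (int[] ords : ordsPerDoc) {
      for (int ord : ords) { // inner loop runs once per value of the doc
        counts[ord]++;
      }
    }
    return counts;
  }

  /** Specialized path: at most one ordinal per doc, -1 meaning "no value" (models SortedDocValues). */
  static int[] countSingleValued(int[] ordPerDoc, int ordCount) {
    int[] counts = new int[ordCount];
    for (int ord : ordPerDoc) {
      if (ord >= 0) {
        counts[ord]++;
      }
    }
    return counts;
  }

  /** Returns a flat one-ordinal-per-doc view, or null if any doc has more than one value. */
  static int[] unwrapSingleValued(int[][] ordsPerDoc) {
    int[] flat = new int[ordsPerDoc.length];
    for (int doc = 0; doc < ordsPerDoc.length; doc++) {
      if (ordsPerDoc[doc].length > 1) {
        return null;
      }
      flat[doc] = ordsPerDoc[doc].length == 0 ? -1 : ordsPerDoc[doc][0];
    }
    return flat;
  }

  /** Uses the specialized path when unwrapping succeeds, else falls back to the general one. */
  static int[] count(int[][] ordsPerDoc, int ordCount) {
    int[] flat = unwrapSingleValued(ordsPerDoc);
    return flat != null
        ? countSingleValued(flat, ordCount)
        : countMultiValued(ordsPerDoc, ordCount);
  }

  public static void main(String[] args) {
    int[][] docs = {{0}, {2}, {}, {2}}; // every doc has zero or one value
    System.out.println(Arrays.toString(count(docs, 3))); // prints [1, 0, 2]
  }
}
```

Both paths produce the same counts for single-valued input; the point of the specialization is only that the single-valued loop has no per-document inner iteration.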

Tests

Introduced some new basic unit tests that specifically test single-valued cases, and also modified the randomized testing to occasionally produce all single-valued docs.
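
To make the randomized part concrete, here is a minimal sketch of a generator that occasionally emits only single-valued documents. It is not the test code from this patch: the class name, the 1-in-4 odds, and the cap of three values per document are all assumptions chosen for illustration.

```java
import java.util.Random;

/** Hypothetical generator of per-document ordinals for a randomized counting test. */
final class RandomDocsSketch {

  /**
   * Generates ordinals for numDocs documents. When singleValued is true, every
   * document gets zero or one value; otherwise it gets up to three (an arbitrary cap).
   * Unlike a real SortedSetDocValues, a document here may repeat an ordinal.
   */
  static int[][] randomDocs(Random random, int numDocs, int ordCount, boolean singleValued) {
    int maxValues = singleValued ? 1 : 3;
    int[][] docs = new int[numDocs][];
    for (int doc = 0; doc < numDocs; doc++) {
      int numValues = random.nextInt(maxValues + 1); // 0..maxValues inclusive
      docs[doc] = new int[numValues];
      for (int i = 0; i < numValues; i++) {
        docs[doc][i] = random.nextInt(ordCount);
      }
    }
    return docs;
  }

  public static void main(String[] args) {
    Random random = new Random();
    // Assumed odds: roughly one run in four exercises the all-single-valued case.
    boolean singleValued = random.nextInt(4) == 0;
    int[][] docs = randomDocs(random, 100, 10, singleValued);
    System.out.println("singleValued=" + singleValued + ", docs=" + docs.length);
  }
}
```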

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Lucene maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.

@@ -82,7 +82,6 @@ public int nextDoc() throws IOException {

if (newDocID == NO_MORE_DOCS) {
currentValues = null;
- continue;
Contributor Author

These were just unnecessary continue statements so I removed them.

@@ -452,7 +449,6 @@ public static SortedNumericDocValues getSortedNumericValues(

boolean anyReal = false;
final SortedNumericDocValues[] values = new SortedNumericDocValues[size];
- final int[] starts = new int[size + 1];
Contributor Author

starts is never referenced, so I pulled it; unrelated cleanup

@@ -680,9 +673,9 @@ public static SortedSetDocValues getSortedSetValues(final IndexReader r, final S
*/
public static class MultiSortedDocValues extends SortedDocValues {
/** docbase for each leaf: parallel with {@link #values} */
- public final int docStarts[];
+ public final int[] docStarts;
Contributor Author

cleaned up some array declaration styling where I was in this file

Member

I love it.

@@ -415,6 +415,8 @@ Improvements
This prevents caching a query clause when it is much more expensive than
running the top-level query. (Julie Tibshirani)

* LUCENE-5309: Optimize facet counting for single-valued SSDV / StringValueFacetCounts. (Greg Miller)
Contributor Author

Would backport this to 8x

@gsmiller gsmiller changed the title from "Lucene-5309: Optimize facet counting for single-valued SSDV / StringValueFacetCounts" to "LUCENE-5309: Optimize facet counting for single-valued SSDV / StringValueFacetCounts" on Aug 20, 2021
Member

@mikemccand mikemccand left a comment

Looks great, thanks @gsmiller! What an awesome performance pop!

@@ -103,7 +105,7 @@ public FacetResult getTopChildren(int topN, String dim, String... path) throws I
return getDim(dim, ordRange, topN);
}

- private final FacetResult getDim(String dim, OrdRange ordRange, int topN) throws IOException {
+ private FacetResult getDim(String dim, OrdRange ordRange, int topN) throws IOException {
Member

Hmm why did we remove this final? I wonder if the whole class should be final?

Contributor Author

@mikemccand no need to mark private methods as final since they're not visible for sub-classes to override

@@ -286,7 +310,7 @@ private final void count(List<MatchingDocs> matchingDocs)
}

/** Does all the "real work" of tallying up the counts. */
private final void countAll() throws IOException, InterruptedException {
Member

More final attrition.

Contributor Author

Yeah, same comment as above. Since private methods aren't visible to sub-classes, it's not necessary to mark them final. I don't feel strongly about this though, so happy to add it back if the community prefers to keep it.
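
For anyone following the `final` discussion, here is a small self-contained sketch of the language rule being referred to. The class names are made up for illustration and have nothing to do with the facet code: a same-named method in a subclass does not override a private method, so calls inside the base class keep binding to the base implementation with or without `final`.

```java
/** Hypothetical base class with a private helper (no `final` modifier). */
class BaseCounter {
  private String describe() {
    return "base";
  }

  /** Always calls BaseCounter.describe(), since private methods are not dispatched virtually. */
  String report() {
    return describe();
  }
}

/** Hypothetical subclass that declares a method with the same name. */
class SubCounter extends BaseCounter {
  // Not an override: BaseCounter.describe() is not visible here,
  // so this is simply a new, unrelated method.
  String describe() {
    return "sub";
  }
}

final class PrivateFinalSketch {
  public static void main(String[] args) {
    SubCounter sub = new SubCounter();
    System.out.println(sub.report());   // prints "base"
    System.out.println(sub.describe()); // prints "sub"
  }
}
```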
