Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Facets#getTopDims across Facets implementations [LUCENE-10488] #11524

Closed
asfimport opened this issue Mar 28, 2022 · 11 comments
Closed

Comments

@asfimport
Copy link

asfimport commented Mar 28, 2022

#11361 added a new getTopDims API, allowing users to specify the number of "top" dimensions they want. The default implementation just delegates to getAllDims and returns the number of top dims requested, but some Facets sub-classes can do this more optimally. #11361 demonstrated this in SortedSetDocValueFacetCounts, but we can take it further. There's at least some opportunity to do better in:

  • ConcurrentSortedSetDocValuesFacetCounts
  • FastTaxonomyFacetCounts
  • TaxonomyFacetSumFloatAssociations
  • TaxonomyFacetSumIntAssociations

Migrated from LUCENE-10488 by Greg Miller (@gsmiller), resolved May 13 2022
Pull requests: #777, #779, #806

@asfimport
Copy link
Author

Greg Miller (@gsmiller) (migrated from JIRA)

Note that I have an open PR that proposes some significant changes to association facets, so might be worth trying to avoid large merge collisions with that if someone jumps on this.

@asfimport
Copy link
Author

Yuting Gan (@Yuti-G) (migrated from JIRA)

Hi @gsmiller ,

I have brought the optimization changes of getTopDims from SortedSetDocValuesFacetCounts to ``ConcurrentSortedSetDocValuesFacetCounts since these two subclasses are very similar, please see #777. I also created a new PR #779 for IntTaxonomyFacets, which is extended by FastTaxonomyFacetCounts and TaxonomyFacetSumIntAssociations.

The optimization logic is very similar to what I did in SortedSetDocValuesFacetCounts.}I will really appreciate any feedback from you or anyone who have time to review my PRs and will spend time on optimizing getTopDims in TaxonomyFacetSumFloatAssociations after reviewing feedback from #777 and #779.

Thanks,

Yuting

 

 

 

 

 

@asfimport
Copy link
Author

asfimport commented Apr 7, 2022

Greg Miller (@gsmiller) (migrated from JIRA)

Very exciting. Thanks @Yuti-G! Also, please note that the refactoring change I mentioned above for association facets is now merged (#11480), so it should be easy now to move forward with optimizations there as well if you're interested (or if anyone else is interested). Thanks again!

@asfimport
Copy link
Author

ASF subversion and git services (migrated from JIRA)

Commit ef43242 in lucene's branch refs/heads/main from Yuting Gan
https://gitbox.apache.org/repos/asf?p=lucene.git;h=ef43242d77a

LUCENE-10488: Optimized getTopDims in ConcurrentSSDVFacetCounts (#777)

@asfimport
Copy link
Author

ASF subversion and git services (migrated from JIRA)

Commit 57f8cb2 in lucene's branch refs/heads/main from Yuting Gan
https://gitbox.apache.org/repos/asf?p=lucene.git;h=57f8cb2fd6b

LUCENE-10488: Optimize Facets#getTopDims in IntTaxonomyFacets (#779)

@asfimport
Copy link
Author

ASF subversion and git services (migrated from JIRA)

Commit f0ec226 in lucene's branch refs/heads/main from Yuting Gan
https://gitbox.apache.org/repos/asf?p=lucene.git;h=f0ec2261672

LUCENE-10488: Optimize Facets#getTopDims in FloatTaxonomyFacets (#806)

@asfimport
Copy link
Author

ASF subversion and git services (migrated from JIRA)

Commit e01b65d in lucene's branch refs/heads/main from Greg Miller
https://gitbox.apache.org/repos/asf?p=lucene.git;h=e01b65d2841

CHANGES entry for LUCENE-10488

@asfimport
Copy link
Author

ASF subversion and git services (migrated from JIRA)

Commit 87655fd in lucene's branch refs/heads/branch_9x from Greg Miller
https://gitbox.apache.org/repos/asf?p=lucene.git;h=87655fd015c

LUCENE-10488: Optimized Facets#getTopDims for taxonomy faceting and ConcurrentSSDVFacetCounts (#889)

Co-authored-by: Yuting Gan <44444710+Yuti-G@users.noreply.github.com>

@asfimport
Copy link
Author

Greg Miller (@gsmiller) (migrated from JIRA)

Merged to main and branch_9x. Resolving. Thanks again @Yuti-G!

@asfimport
Copy link
Author

Yuting Gan (@Yuti-G) (migrated from JIRA)

Thank you so much for reviewing and merging my PRs! I will work on adding getTopDim to benchmarks soon.

@asfimport
Copy link
Author

Alan Woodward (@romseygeek) (migrated from JIRA)

Bulk close for 9.2.0 release

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant