-
Notifications
You must be signed in to change notification settings - Fork 975
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initialize facet counting data structures lazily #12408
Conversation
Note that luceneutil benchmarks (wikimedium10m) show no obvious change:
|
In some internal benchmarks though, for situations where it's semi-common to get no results or sparse results, we see a 9.6% average latency reduction. In cases where results tend to be pretty dense, this still offers a 1.9% average latency reduction. So it's definitely helpful in situations where sparse/no-results are semi-common. |
+1, this makes sense to me. |
b6d77fa
to
e98d3b3
Compare
Thanks @mikemccand. Just removed the errant "nocommit" comment I left hanging in the initial PR (doh!) and added a CHANGES entry, so this should be a clean change now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @gsmiller!
This change covers: * Taxonomy faceting * FastTaxonomyFacetCounts * TaxonomyFacetIntAssociations * TaxonomyFacetFloatAssociations * SSDV faceting * SortedSetDocValuesFacetCounts * ConcurrentSortedSetDocValuesFacetCounts * StringValueFacetCounts * Range faceting: * LongRangeFacetCounts * DoubleRangeFacetCounts * Long faceting: * LongValueFacetCounts Left for a future iteration: * RangeOnRange faceting * FacetSet faceting
e98d3b3
to
0e0b5b4
Compare
Is this change the reason why we are seeing a major slowdown on AndHighMedDayTaxoFacets and speedup on OrHighMedDayTaxoFacets and OrHighMedDayTaxoFacets? I wouldn't expect these queries to have sparse hits. Maybe the introduction of counting tasks is also related to this change by making the JVM compile things differently? (facets were the only tasks to use non-scoring boolean queries before, not anymore) |
This change covers: * Taxonomy faceting * FastTaxonomyFacetCounts * TaxonomyFacetIntAssociations * TaxonomyFacetFloatAssociations * SSDV faceting * SortedSetDocValuesFacetCounts * ConcurrentSortedSetDocValuesFacetCounts * StringValueFacetCounts * Range faceting: * LongRangeFacetCounts * DoubleRangeFacetCounts * Long faceting: * LongValueFacetCounts Left for a future iteration: * RangeOnRange faceting * FacetSet faceting
This change covers: * Taxonomy faceting * FastTaxonomyFacetCounts * TaxonomyFacetIntAssociations * TaxonomyFacetFloatAssociations * SSDV faceting * SortedSetDocValuesFacetCounts * ConcurrentSortedSetDocValuesFacetCounts * StringValueFacetCounts * Range faceting: * LongRangeFacetCounts * DoubleRangeFacetCounts * Long faceting: * LongValueFacetCounts Left for a future iteration: * RangeOnRange faceting * FacetSet faceting Co-authored-by: Greg Miller <gsmiller@gmail.com>
This change proposes some faceting optimizations for situations where there are no hits to facet over. While this seems like an odd scenario, at Amazon product search we actually have some situations where this can become common (sparse queries that don't have results in some segments, or have no results altogether).
You could argue that the calling code should handle this optimization and avoid faceting altogether if there are no hits, but that only solves for the case of no results, not no results in some segments. I think it's also nice to move this optimization into the faceting module so that every user doesn't have to do this themselves.
Curious what people think of this idea. Happy to hear feedback, counterarguments, etc. :)
This change covers:
Left for a future iteration (I'll open a follow up issue if we move forward with this one):