Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix tie-break bug in various Facets implementations #11768

Merged
merged 3 commits into from
Sep 26, 2022

Conversation

gsmiller
Copy link
Contributor

Description

There are a number of places in Facets implementations where getTopChildren is incorrectly handling count/value ties. The behavior should prefer smaller ordinals when counts/values are equal, but it's not always doing that. Some tests were also incorrect and needed to be updated.

@Yuti-G
Copy link
Contributor

Yuti-G commented Sep 22, 2022

Thanks @gsmiller for discovering this issue! The changes look good to me.

I am curious if the index in LongIntCursor works similarly to ordinals in other faceting implementation? If so, do you think we should also return a.count < b.count || (a.count == b.count && a.value > b.value) || (a.count == b.count && a.value == b.value && a.index < b.index) in the lessThan() function of the PQ in getTopChildrenSortByCount in the LongValueFacetCounts class? Please let me know if I misunderstand the index here. Thank you so much!

@gsmiller
Copy link
Contributor Author

@Yuti-G could you help me understand what faceting implementation or part of the code you're referring to? Thanks!

@Yuti-G
Copy link
Contributor

Yuti-G commented Sep 22, 2022

Sure, I just updated the previous comment with links. Thanks!

@gsmiller
Copy link
Contributor Author

@Yuti-G thanks for the links. In this case, the contract is that we break ties by the value (of the long) itself (low-to-high), which the PQ is already doing. So this appears to be correct to me, but let me know if I'm overlooking something. Also, it's not possible to have identical values between two results since the counting structures guarantee unique indexes/keys right?

@Yuti-G
Copy link
Contributor

Yuti-G commented Sep 22, 2022

I see.. Thanks for the explanation of indexes!

@@ -189,10 +190,11 @@ private TopChildrenForPath getTopChildrenForPath(DimConfig dimConfig, int pathOr

TopOrdAndFloatQueue.OrdAndValue reuse = null;
while (ord != TaxonomyReader.INVALID_ORDINAL) {
if (values[ord] > 0) {
float value = values[ord];
if (value > 0) {
aggregatedValue = aggregationFunction.aggregate(aggregatedValue, values[ord]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: might as well use value here too (and check if we you can replace values[ord] with value elsewhere

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch. I intended to do this everyone but looks like I missed a spot. I'll go back through and re-check.

@@ -626,7 +626,7 @@ public void testBasicWithCollectorManager() throws Exception {
List<FacetResult> topNDimsResult = r.facets.getTopDims(1, 2);
assertEquals(1, topNDimsResult.size());
assertEquals(
"dim=Author path=[] value=5 childCount=4\n Lisa (2)\n Susan (1)\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's disturbing that these tests were "wrong" and we just let them be like that. I'm glad that you fixed them, but makes me wonder if it was possible to catch this bug earlier by scrutinizing these tests better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, hindsight is 20/20 I suppose. We'll eventually converge on something that's correct :)

@gsmiller gsmiller merged commit 971ae01 into apache:main Sep 26, 2022
@gsmiller gsmiller deleted the GH/facet-tie-break-bugfix branch September 26, 2022 22:06
@rmuir rmuir added this to the 9.5.0 milestone Jan 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants