LUCENE-10678: Fix potential overflow when computing the partition point on the BKD tree #1065

Merged 2 commits on Aug 11, 2022

@iverase iverase commented Aug 10, 2022

We currently compute the partition point for a set of points by multiplying the number of nodes that need to be on the left of the BKD tree by maxPointsInLeafNode. This multiplication is done in int arithmetic, so if the partition point is bigger than Integer.MAX_VALUE it overflows.

This may happen for high dimension cases (numDims > 1) and when documents are multivalued.

This PR moves the multiplication to the long space so it doesn't overflow.
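
To illustrate the arithmetic, here is a minimal standalone sketch, not the actual Lucene code. The class name, method names, and input values are hypothetical and chosen only so that the product lands exactly on 2^31, one past Integer.MAX_VALUE.

```java
public class PartitionPointSketch {

    // Hypothetical "before" variant: both operands are int, so the product is
    // computed in 32-bit arithmetic and wraps around before it is widened to long.
    static long partitionPointInt(int numLeftLeafNodes, int maxPointsInLeafNode) {
        return numLeftLeafNodes * maxPointsInLeafNode;
    }

    // Hypothetical "after" variant: casting one operand to long first makes the
    // whole multiplication happen in 64-bit arithmetic.
    static long partitionPointLong(int numLeftLeafNodes, int maxPointsInLeafNode) {
        return (long) numLeftLeafNodes * maxPointsInLeafNode;
    }

    public static void main(String[] args) {
        int numLeftLeafNodes = 4_194_304; // 2^22, illustrative value
        int maxPointsInLeafNode = 512;    // 2^9, illustrative value

        // 2^22 * 2^9 = 2^31, which does not fit in an int.
        System.out.println(partitionPointInt(numLeftLeafNodes, maxPointsInLeafNode));  // -2147483648
        System.out.println(partitionPointLong(numLeftLeafNodes, maxPointsInLeafNode)); // 2147483648
    }
}
```

The cast has to be applied to an operand, not to the result: `(long) (a * b)` would still multiply in int and widen the already wrapped value.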

To test this, I modified Test2BBKDPoints to index 4 billion points instead and renamed the test accordingly. That should be fine: the test was written before we improved the efficiency of the tree, so CI should be able to run it in Monster runs.

@iverase iverase requested a review from dnhatn August 10, 2022 15:50

@dnhatn dnhatn left a comment

Looks great. Thanks @iverase for the quick fix.

@iverase iverase changed the title LUCENE-10678: Fix possible overflow when computing the partition point on the BKD tree LUCENE-10678: Fix potential overflow when computing the partition point on the BKD tree Aug 11, 2022
@iverase iverase merged commit fe8d112 into apache:main Aug 11, 2022
iverase added a commit that referenced this pull request Aug 11, 2022
…nt on the BKD tree (#1065)

We currently compute the partition point for a set of points by multiplying the number of nodes that needs to be on the left of the BKD tree by the maxPointsInLeafNode. This multiplication is done on the integer space so if the partition point is bigger than Integer.MAX_VALUE it will overflow. This commit moves the multiplication to the long space so it doesn't overflow.
iverase added a commit that referenced this pull request Aug 11, 2022
…nt on the BKD tree (#1065)

We currently compute the partition point for a set of points by multiplying the number of nodes that needs to be on the left of the BKD tree by the maxPointsInLeafNode. This multiplication is done on the integer space so if the partition point is bigger than Integer.MAX_VALUE it will overflow. This commit moves the multiplication to the long space so it doesn't overflow.
# Conflicts:
#	lucene/CHANGES.txt
@msokolov msokolov added this to the 9.4.0 milestone Sep 1, 2022