Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LUCENE-8687: Optimise radix partitioning for points on heap #569

Closed
wants to merge 10 commits into from

Conversation

Projects
None yet
2 participants
@iverase
Copy link
Contributor

commented Feb 8, 2019

This PR revives the idea of path slices so we do not have to hold many copies of the same data on memory.

@jpountz

jpountz approved these changes Feb 8, 2019

Copy link
Contributor

left a comment

I need to spend a bit more time understanding the accounting logic but the in-place partitioning looks good to me.

HeapPointWriter heapSource = (HeapPointWriter) data;

int from = (int) points.start;
int to = (int) (points.start + points.count);

This comment has been minimized.

Copy link
@jpountz

jpountz Feb 8, 2019

Contributor

can you use Math#toIntExact instead?

@@ -82,30 +82,40 @@ public BKDRadixSelector(int numDim, int bytesPerDim, int maxPointsSortInHeap, Di
* the split happens. The method destroys the original writer.
*
*/
public byte[] select(PointWriter points, PointWriter left, PointWriter right, long from, long to, long partitionPoint, int dim) throws IOException {
public byte[] select(PathSlice points, PathSlice[] slices, long from, long to, long partitionPoint, int dim, int dimCommonPrefix) throws IOException {

This comment has been minimized.

Copy link
@jpountz

jpountz Feb 8, 2019

Contributor

can you document in javadocs that slices should have a length of 2 and how it gets filled?

This comment has been minimized.

Copy link
@iverase

iverase Feb 8, 2019

Author Contributor

I have relaxed the condition and it should be bigger than 1. Javadocs added

@@ -176,7 +188,7 @@ private int findCommonPrefix(OfflinePointWriter points, long from, long to, int
histogram[commonPrefix][bucket]++;
}
}
//Count left points and record the partition point
//Count left points and record the offlinePartition point

This comment has been minimized.

Copy link
@jpountz

jpountz Feb 8, 2019

Contributor

bad search/replace?

PointWriter right = getPointWriter(to - partitionPoint, "right" + dim)) {
slices[0] = new PathSlice(left, 0, partitionPoint - from);
slices[1] = new PathSlice(right, 0, to - partitionPoint);
//if all equals we just offlinePartition the data

This comment has been minimized.

Copy link
@jpountz

jpountz Feb 8, 2019

Contributor

bad search/replace?

pointsUsed += right.count();
}
assert maxPointsSortInHeap >= pointsUsed;
return maxPointsSortInHeap - (int) pointsUsed;

This comment has been minimized.

Copy link
@jpountz

jpountz Feb 8, 2019

Contributor

can we check this cast?

This comment has been minimized.

Copy link
@iverase

iverase Feb 8, 2019

Author Contributor

I have changed the logic a bit and I am using the maxSize on the HeapPointwriter to calculate the offset to move into heap the selection.

byte[] partition = new byte[bytesPerDim];
points.getPackedValueSlice(partitionPoint, bytesRef1);
System.arraycopy(bytesRef1.bytes, bytesRef1.offset + dim * bytesPerDim, partition, 0, bytesPerDim);
return partition;
}

PointWriter getPointWriter(long count, String desc) throws IOException {
if (count <= maxPointsSortInHeap / 2) {

This comment has been minimized.

Copy link
@jpountz

jpountz Feb 8, 2019

Contributor

why do we divide by 2? can you add a comment?

This comment has been minimized.

Copy link
@iverase

iverase Feb 8, 2019

Author Contributor

We divided by 2 because as we recurse, we have two heapPointWriter as maximum, so they should not hold more than half of the maxPointsSortInHeap.

@iverase

This comment has been minimized.

Copy link
Contributor Author

commented Feb 11, 2019

Pushed on master: 9db39ab

Pushed on branch_8x: 7934414

@iverase iverase closed this Feb 11, 2019

@iverase iverase deleted the iverase:optimizeHeap branch Feb 11, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.