Explore partially decoding blocks (within-block skipping) #12749

slow-J · 2023-11-02T12:27:45Z

Description

Idea from @mikemccand's comment in #12696:

> Another exciting optimization such a "patch-less" encoding could implement is within-block skipping (I believe Tantivy does this).
>
> Today, our skipper is forced to align to block boundaries, so when we skip to a given docid, we go to the block that may contain this docid, decode all 128 int[], then linearly scan within those 128 ints. This is quite a bit of overhead for each skip request!
>
> If we could lower that linear scan cost to maybe 16 or 8 or something, the conjunctive queries should get even faster. But perhaps it becomes trickier to take advantage of SIMD optimizations if we are decoding a subset of ints, not sure.

After the change in #12741, we will no longer use patching when encoding doc blocks.
This may allow us to partially decode blocks, which would let the skipper jump to the middle of a block instead of having to align to block boundaries as it does today.
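
For reference, the block-aligned behavior described in the quote can be sketched roughly as follows. This is a simplified illustration, not Lucene's actual code: the class `BlockAlignedSkipSketch`, its method names, and the plain `int[]` of deltas (standing in for a bit-packed block) are all assumptions made for the example.

```java
/** Hypothetical sketch: skipping lands on a block boundary, then the whole block is decoded. */
final class BlockAlignedSkipSketch {

  /** Turns the block's deltas into absolute doc IDs with a running sum. */
  static int[] decodeBlock(int[] deltas, int prevBlockLastDoc) {
    int[] docs = new int[deltas.length];
    int doc = prevBlockLastDoc;
    for (int i = 0; i < deltas.length; i++) {
      doc += deltas[i];
      docs[i] = doc;
    }
    return docs;
  }

  /** Returns the first doc ID >= target in this block, or -1 if the block has none. */
  static int advanceWithinBlock(int[] deltas, int prevBlockLastDoc, int target) {
    // Every value in the block is decoded, even if the target sits near the end.
    int[] docs = decodeBlock(deltas, prevBlockLastDoc);
    // Linear scan over the decoded values.
    for (int doc : docs) {
      if (doc >= target) {
        return doc;
      }
    }
    return -1;
  }
}
```

With 128 values per block, each skip request pays for a full decode plus a scan that may touch up to 128 entries, which is the overhead that within-block skipping would aim to reduce.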


jpountz · 2023-11-03T14:55:46Z

How would it work? Since blocks are delta-coded, you can't know the value at a given index without decoding all previous values and computing their sum? Or you need to store some checkpoints separately, but then it may be easier/better to simply go with smaller blocks (e.g. 64 doc IDs instead of 128)?
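
To make the checkpoint option concrete, here is a minimal sketch, assuming a checkpoint is stored every 16 entries. The stride, the class `CheckpointedBlockSketch`, and the plain `int[]` layout are hypothetical choices for illustration and are not part of Lucene.

```java
/** Hypothetical sketch: per-block checkpoints let a skip start the running sum mid-block. */
final class CheckpointedBlockSketch {
  static final int STRIDE = 16; // assumed checkpoint spacing

  /** checkpoints[k] holds the absolute doc ID that precedes entry k * STRIDE. */
  static int[] buildCheckpoints(int[] deltas, int prevBlockLastDoc) {
    int[] checkpoints = new int[(deltas.length + STRIDE - 1) / STRIDE];
    int doc = prevBlockLastDoc;
    for (int i = 0; i < deltas.length; i++) {
      if (i % STRIDE == 0) {
        checkpoints[i / STRIDE] = doc;
      }
      doc += deltas[i];
    }
    return checkpoints;
  }

  /** Returns the first doc ID >= target in this block, or -1 if the block has none. */
  static int advance(int[] deltas, int[] checkpoints, int target) {
    // checkpoints[k + 1] is the last doc ID of sub-block k, so if it is still
    // below target, nothing in sub-block k can match and it can be skipped.
    int k = 0;
    while (k + 1 < checkpoints.length && checkpoints[k + 1] < target) {
      k++;
    }
    // Only sum deltas starting from the chosen checkpoint.
    int doc = checkpoints[k];
    for (int i = k * STRIDE; i < deltas.length; i++) {
      doc += deltas[i];
      if (doc >= target) {
        return doc;
      }
    }
    return -1;
  }
}
```

In this sketch a 128-entry block carries 8 extra absolute values, and a skip sums at most 16 deltas. That is roughly the information that skip data for 16-doc blocks would hold anyway, which is why smaller blocks may be the simpler route.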

Tony-X · 2023-11-07T18:28:30Z

> How would it work? Since blocks are delta-coded, you can't know the value at a given index without decoding all previous values and computing their sum? Or you need to store some checkpoints separately, but then it may be easier/better to simply go with smaller blocks (e.g. 64 doc IDs instead of 128)?

+1. Delta-encoding here is the blocker, unless we change the encoding scheme.
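
As one possible reading of "change the encoding scheme" (an assumption for illustration, not something proposed in this thread), each value could be stored as an offset from a per-block base instead of as a delta from its predecessor. Any entry can then be read on its own, so a binary search inside the block becomes possible. The class `BaseOffsetBlockSketch` and its methods are hypothetical.

```java
/** Hypothetical sketch: offsets from a block base allow random access within a block. */
final class BaseOffsetBlockSketch {

  /** Stores each doc ID as its distance from the block's base value. */
  static int[] encode(int[] docs, int base) {
    int[] offsets = new int[docs.length];
    for (int i = 0; i < docs.length; i++) {
      offsets[i] = docs[i] - base;
    }
    return offsets;
  }

  /** Returns the first doc ID >= target in this block, or -1 if the block has none. */
  static int advance(int[] offsets, int base, int target) {
    // Offsets are sorted because doc IDs are, so binary search works
    // without decoding any preceding entries.
    int lo = 0;
    int hi = offsets.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (base + offsets[mid] < target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo == offsets.length ? -1 : base + offsets[lo];
  }
}
```

The trade-off is space: an offset from the base is never smaller than the corresponding delta, so such a block generally needs more bits per value than a delta-coded one.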

slow-J added the type:enhancement label Nov 2, 2023

slow-J mentioned this issue Nov 2, 2023

Adding option to codec to disable patching in Lucene's PFOR encoding #12696

Closed
