[common] Introduce PrefixFileIndex for prefix query optimization by xuzifu666 · Pull Request #7750 · apache/paimon

xuzifu666 · 2026-04-30T09:13:40Z

Purpose

In real-world analytics scenarios, prefix queries on high-cardinality string columns are very common. For example:

WHERE url LIKE '/api/v1/%'
WHERE order_id LIKE 'ORD2024%'

Existing file indexes in Paimon, such as BloomFilter and Bitmap, excel at equality lookups but cannot efficiently handle prefix matching. BloomFilter only checks exact value existence; Bitmap Index maps each distinct value to a bitmap, making it impossible to determine which values share a common prefix without scanning all entries.

When no suitable index exists, the query engine must perform a full file scan, reading the entire data file (often tens of MBs) just to discover that no rows match the prefix predicate. This becomes prohibitively expensive at scale.

This PR introduces PrefixFileIndex, a new pluggable file-level index that accelerates prefix queries through a lightweight inverted index structure.

Prefix File Index is an inverted index that maps prefix strings to row number bitmaps. Unlike Bitmap Index, which indexes exact values, it extracts the first N characters from each string value and groups rows by their prefix.
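
To make the structure concrete, here is a minimal in-memory sketch of the idea described above. It is not the code in this PR: the class name `PrefixIndexSketch` and its methods are hypothetical, `java.util.BitSet` stands in for the Roaring bitmaps, and null handling and serialization are left out.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: BitSet stands in for the Roaring bitmaps of the real index. */
final class PrefixIndexSketch {
    private final int prefixLength;
    private final Map<String, BitSet> rowsByPrefix = new HashMap<>();
    private int rowCount = 0;

    PrefixIndexSketch(int prefixLength) {
        this.prefixLength = prefixLength;
    }

    /** Keeps at most the first prefixLength characters of a value. */
    private String truncate(String value) {
        return value.length() <= prefixLength ? value : value.substring(0, prefixLength);
    }

    /** Records the next row; null values only advance the row counter in this sketch. */
    void add(String value) {
        int row = rowCount++;
        if (value != null) {
            rowsByPrefix.computeIfAbsent(truncate(value), k -> new BitSet()).set(row);
        }
    }

    /** Returns false only when no indexed value can start with queryPrefix (file can be skipped). */
    boolean mayContainPrefix(String queryPrefix) {
        if (queryPrefix.length() >= prefixLength) {
            // Long query: one lookup on the truncated literal.
            return rowsByPrefix.containsKey(truncate(queryPrefix));
        }
        // Short query: any stored prefix that starts with it may match.
        for (String key : rowsByPrefix.keySet()) {
            if (key.startsWith(queryPrefix)) {
                return true;
            }
        }
        return false;
    }
}
```

A `true` result only means the file may contain matching rows, because truncation discards everything after the first N characters; a `false` result is what allows the file to be skipped.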

Benchmark results:

Test Environment

CPU: Apple M4
JVM: Java HotSpot 17.0.12
Data Volume: 1 million string rows ({category}_{id} format, 5 categories)
Test Module: paimon-benchmark/paimon-micro-benchmarks

1. Index Size Comparison

Cardinality	PrefixLen=2	PrefixLen=3	PrefixLen=4	BitmapIndex	Raw Data	Prefix Space Saving
100	649 KB	649 KB	649 KB	2.0 MB	13.1 MB	20x vs data
1000	649 KB	649 KB	649 KB	2.7 MB	14.7 MB	23x vs data
10000	649 KB	649 KB	649 KB	7.7 MB	15.7 MB	24x vs data

Key Finding: Prefix Index size is independent of data cardinality and depends only on the number of distinct prefixes. Even at cardinality 10000, the index remains at ~649 KB.

2. Index Build Time Comparison

Cardinality	PrefixLen=2	PrefixLen=3	PrefixLen=4	BitmapIndex	Prefix Build Speedup
100	126 ms	114 ms	110 ms	163 ms	1.3-1.5x
1000	112 ms	112 ms	110 ms	246 ms	2.2x
10000	111 ms	110 ms	113 ms	633 ms	5.6x

3. Query Performance — Skip Scenario (Core Value)

Querying a prefix that does not exist in the file. Without an index, the scan must check all 1 million rows to confirm there is no match:

Cardinality	PrefixIndex	BitmapIndex	No-Index-Full-Scan	Prefix Index Speedup
100	~1.3 μs	~2.1 μs	12.558 μs	~9.8x
1000	~1.1 μs	~2.1 μs	13.147 μs	~11.9x
10000	~1.2 μs	~1.6 μs	13.262 μs	~11.3x

4. Production Scenario Inference

The above tests were conducted in memory, without accounting for disk I/O. In production:

Scenario	No Index	Prefix Index	Inferred Speedup
Data file size	~15 MB (1M rows)	~649 KB	-
Disk read time	~50-200 ms	~0.1 ms (cache hit)	500-2000x
Skip decision time	Must read all data	Returns SKIP in 1 μs	Tens of thousands x

Conclusion

Dimension	Prefix Index Advantage
Index Size	Only 1/20 of raw data, 3-12x smaller than Bitmap Index
Build Speed	Up to 5.6x faster in high-cardinality scenarios
Skip Performance	10-12x faster in memory, hundreds to thousands x in real disk I/O

Tests

PrefixFileIndexTest
PrefixIndexBenchmark

JingsongLi

Review: PrefixFileIndex for prefix query optimization

Thanks for this contribution. The idea of a lightweight prefix-based inverted index for accelerating LIKE 'prefix%' and STARTS_WITH queries is sound, and the benchmarks clearly demonstrate the value. Below are some issues and suggestions.

Correctness Issues

1. Null bitmap offset semantics are ambiguous in the reader

In the Writer, when there is exactly one null row, you encode it as nullOffset = -1 - nullBitmap.first(). However, in the Reader, the code that calls hasPrefix() never actually uses this offset encoding for the null case. The visitIsNull only checks hasNull, while the visitEqual(fieldRef, null) path also only checks hasNull. The nullOffset field is read but never actually used to reconstruct the null bitmap. If a future reader needs to return row-level results (e.g., for row-group filtering), this compact encoding will require documentation so it can be properly decoded.
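
For documentation purposes, the single-row encoding quoted above (`nullOffset = -1 - nullBitmap.first()`) and its inverse could be written down roughly as follows. This is a sketch only: the class and method names are hypothetical, and how the PR represents "no nulls" or a multi-row null bitmap is not covered here.

```java
/** Hypothetical helper that documents the compact single-row offset encoding. */
final class NullOffsetCodec {
    private NullOffsetCodec() {}

    /** Encodes one row position as a negative offset: row 0 -> -1, row 5 -> -6. */
    static int encodeSingleRow(int row) {
        if (row < 0) {
            throw new IllegalArgumentException("row must be non-negative: " + row);
        }
        return -1 - row;
    }

    /** A negative offset means "exactly one row, stored inline" rather than a body offset. */
    static boolean isSingleRow(int offset) {
        return offset < 0;
    }

    /** Inverse of encodeSingleRow: -1 -> row 0, -6 -> row 5. */
    static int decodeSingleRow(int offset) {
        if (offset >= 0) {
            throw new IllegalArgumentException("not a single-row offset: " + offset);
        }
        return -1 - offset;
    }
}
```

Because the mapping is its own inverse (`-1 - (-1 - row) == row`), a reader that needs row-level results can recover the null row without deserializing a bitmap.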

2. hasPrefix() contains a dead code path for negative offset

if (offset < 0) {
    // single value shortcut
    return true;
}

This branch can never be reached because the offsets stored in prefixOffsets are always >= 0 (they are computed via bodyOffset which starts at 0 and accumulates). The negative-offset optimization is only used for the null bitmap, which is not stored in prefixOffsets. This is dead code and might confuse future maintainers.

3. Query prefix shorter than the index prefix length falls back to a linear scan

When a query literal (e.g., "hello_world") is longer than prefixLength, both the writer and reader truncate it to prefixLength chars, which is correct. But when the query prefix is shorter than prefixLength, the fallback iteration in hasPrefix() does a linear scan of all entries. This is O(n), where n is the number of distinct prefixes, so it could regress for high-cardinality prefix spaces. Consider building a sorted structure (TreeMap) or at minimum documenting this trade-off.
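
As a rough illustration of the sorted-structure suggestion, the short-prefix check can be answered with a single `ceilingKey` lookup on a `TreeMap`. The class and method names below are hypothetical, and the sketch assumes the prefixes are held in memory as `TreeMap` keys with natural `String` ordering; it says nothing about how the PR lays them out on disk.

```java
import java.util.TreeMap;

/** Hypothetical helper: O(log n) "does any stored prefix start with this query?" check. */
final class SortedPrefixLookup {
    private SortedPrefixLookup() {}

    static boolean anyKeyStartsWith(TreeMap<String, ?> prefixes, String queryPrefix) {
        // All keys starting with queryPrefix form one contiguous range in sorted order,
        // and that range begins at the smallest key >= queryPrefix.
        String candidate = prefixes.ceilingKey(queryPrefix);
        return candidate != null && candidate.startsWith(queryPrefix);
    }
}
```

This works because any key that is greater than or equal to the query but does not start with it must also sort after every key that does start with it, so checking the ceiling key alone is sufficient.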

Design Suggestions

4. dataType is accepted but never validated

The constructor accepts any DataType but the index only works with string types. If a user misconfigures a prefix index on an INT column, they will get a confusing ClassCastException at write time. Consider adding a type check in the constructor or factory (similar to how other indexes validate supported types).
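
A fail-fast check along these lines might look like the sketch below. To keep it self-contained, the `TypeRoot` enum is a hypothetical stand-in for the engine's own type model and is not Paimon's API; the real check would inspect the `DataType` passed to the constructor or factory.

```java
/** Hypothetical sketch of rejecting non-string columns when the index is created. */
final class PrefixIndexTypeCheck {
    private PrefixIndexTypeCheck() {}

    /** Stand-in for the engine's column type roots (assumption, not Paimon's enum). */
    enum TypeRoot { CHAR, VARCHAR, INTEGER, BIGINT, DOUBLE }

    static void validate(TypeRoot root) {
        switch (root) {
            case CHAR:
            case VARCHAR:
                return; // string types are supported
            default:
                throw new UnsupportedOperationException(
                        "Prefix file index only supports string columns, but got: " + root);
        }
    }
}
```

Failing at construction time replaces the late ClassCastException with an error message that names the misconfigured column type.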

5. The Reader does not store bitmap lengths — deserialization relies on internal format

The body section stores bitmaps back-to-back, but there is no stored length per bitmap. The readBitmap(offset) method passes data.length - bodyStart - offset as the available bytes, relying on RoaringBitmap32.deserialize() to only read what it needs. While this works for the current RoaringBitmap implementation, it is fragile — if the serialization format changes or if there are trailing bytes, it could break. Consider storing the byte length of each bitmap in the header.
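
One way to make each bitmap self-delimiting is to store its byte length explicitly. The sketch below uses an inline 4-byte length prefix in front of each bitmap, which is a variation on the header-based suggestion above; the class and method names are hypothetical, and the bitmaps are treated as opaque byte arrays rather than real RoaringBitmap32 payloads.

```java
import java.nio.ByteBuffer;
import java.util.List;

/** Hypothetical body layout: [int length][bitmap bytes] repeated back-to-back. */
final class LengthPrefixedBody {
    private LengthPrefixedBody() {}

    /** Serializes already-encoded bitmaps, each preceded by its byte length. */
    static byte[] write(List<byte[]> bitmaps) {
        int size = 0;
        for (byte[] b : bitmaps) {
            size += Integer.BYTES + b.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (byte[] b : bitmaps) {
            buf.putInt(b.length);
            buf.put(b);
        }
        return buf.array();
    }

    /** Reads exactly one bitmap starting at offset, validating the stored length. */
    static byte[] readAt(byte[] body, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(body);
        buf.position(offset);
        int length = buf.getInt();
        if (length < 0 || length > buf.remaining()) {
            throw new IllegalStateException(
                    "Corrupt bitmap length " + length + " at offset " + offset);
        }
        byte[] out = new byte[length];
        buf.get(out);
        return out;
    }
}
```

With an explicit length, the reader hands the deserializer exactly the bytes that belong to one bitmap, so trailing data or a format change cannot cause it to read into a neighbor. The cost is 4 extra bytes per distinct prefix.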

6. No integration with the predicate pushdown framework

This PR only adds the index implementation but does not wire it into the query planning / file-pruning logic. For example, there is no evidence that StartsWith predicates or LIKE predicates will actually consult this index during scan. This might be intentional (staged PRs), but it would be good to clarify in the PR description whether a follow-up is planned.

7. Missing close() / resource cleanup in the Reader

The Reader class extends FileIndexReader and reads the full byte array eagerly, so there is no resource leak per se. However, the benchmark's queryPrefix method creates a LocalSeekableInputStream on every call and never closes it — this is a resource leak in the benchmark (though not in production code).

Minor / Style

The Writer class uses java.util.List and java.util.ArrayList with fully-qualified names inside method bodies (lines in sortedPrefixes()). These should be proper imports at the top of the file for consistency with the rest of the codebase.
The benchmark mixes JUnit 4 (@Rule, TemporaryFolder) with JUnit 5 (@Test from jupiter). JUnit 5 does not process @Rule fields unless rule migration support (@EnableRuleMigrationSupport) is enabled, which means folder.create() must be called manually (which it is), but it is an unusual pattern. Consider switching to JUnit 5's @TempDir.
The VERSION and PREFIX_LENGTH string constants in PrefixFileIndex shadow similar constants in BitmapFileIndex. If these are user-facing option keys, consider namespacing them (e.g., "prefix.prefix-length").

Summary

The core algorithm is correct and the index design is reasonable for its intended use case. The main concerns are: (1) dead code in the reader's offset handling, (2) missing type validation, (3) no stored bitmap lengths making the format fragile, and (4) the linear fallback in hasPrefix() for short query prefixes. The benchmarks are convincing but would benefit from JUnit 5 alignment. Looking forward to seeing the integration with the scan/pushdown layer.

xuzifu666 added 3 commits April 30, 2026 15:51

560eb1e [core] Introduce PrefixFileIndex for prefix query optimization
7ce0e7d added
c346975 added

JingsongLi reviewed May 23, 2026

xuzifu666 wants to merge 3 commits into apache:master from xuzifu666:prefix_file_index_support
