
Columnar map 2 query #18263

Closed
tarun11Mavani wants to merge 2 commits into apache:master from tarun11Mavani:columnar-map-2-query

@tarun11Mavani
Contributor

Summary

This is PR 2 of 3 introducing the columnar MAP query layer on top of the storage layer landed in #17896. This PR makes COLUMNAR_MAP columns actually queryable — adding the per-key DataSource, mutable consuming-segment index, filter-operator delegation strategies, and null-aware item() evaluation.

Stack

Motivation

PR 1 added the binary format and immutable read path for COLUMNAR_MAP, but no query path was wired up — MapFilterOperator would fall through to the slow ExpressionFilterOperator (per-doc Map<String, Object> materialization), ItemTransformFunction had no per-key null bitmap, and consuming (real-time) segments would fail at segment creation because createMutableIndex threw UnsupportedOperationException. This PR closes those gaps and delivers the actual query speedups the columnar storage was designed to enable.

What's in this PR

Query data source (pinot-segment-local)

ColumnarMapDataSource implements MapDataSource and routes getKeyDataSource(key) to one of four paths:

  • Dense immutable — wraps ColumnarMapKeyForwardIndexReader + per-key dictionary + per-key inverted index
  • Sparse immutable — wraps a per-key reader backed by the JSON sidecar
  • Mutable — wraps the per-key FixedByteSVMutableForwardIndex directly (O(1) lock-free)
  • Unknown key — returns NullDataSource(key) to match BaseMapDataSource (callers like ProjectionBlock and ItemTransformFunction never see null)
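The routing above can be pictured with a minimal sketch. Every type here (`KeyRoutingSketch`, `Path`) is a hypothetical stand-in rather than a Pinot class; the only behavior taken from the description is that a key resolves to one of four paths and that an unknown key yields a non-null fallback instead of `null`.

```java
import java.util.Map;

/** Hypothetical sketch of per-key routing; these are not Pinot classes. */
final class KeyRoutingSketch {
  /** The four paths named in the PR description. */
  enum Path { DENSE_IMMUTABLE, SPARSE_IMMUTABLE, MUTABLE, UNKNOWN_KEY }

  // Assumption: the data source knows up front which path serves each key.
  private final Map<String, Path> _knownKeys;

  KeyRoutingSketch(Map<String, Path> knownKeys) {
    _knownKeys = Map.copyOf(knownKeys);
  }

  /** Never returns null: unknown keys map to the UNKNOWN_KEY fallback. */
  Path route(String key) {
    return _knownKeys.getOrDefault(key, Path.UNKNOWN_KEY);
  }

  public static void main(String[] args) {
    KeyRoutingSketch sketch = new KeyRoutingSketch(
        Map.of("country", Path.DENSE_IMMUTABLE, "rare_flag", Path.SPARSE_IMMUTABLE));
    System.out.println(sketch.route("country"));
    System.out.println(sketch.route("no_such_key"));
  }
}
```

Returning a sentinel rather than `null` keeps callers free of null checks, which is the contract the description attributes to BaseMapDataSource.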

Supporting classes: ColumnarMapForwardIndexReader (column-level wrapper), ColumnarMapRealtimeInvertedIndex (mutable per-dictId inverted index, thread-safe via ThreadSafeMutableRoaringBitmap), MutableColumnarMapIndexImpl (consuming-segment columnar map implementing ColumnarMapIndexReader).

Filter operator delegation (pinot-core)

MapFilterOperator selects between four strategies:

  1. JSON index match (existing) — when a JsonIndexReader exists on the column
  2. Per-key inverted index (new) — when a MapDataSource exposes a per-key inverted index for the requested key
  3. Presence-bitmap fast path (new) — for IS NULL / IS NOT NULL on COLUMNAR_MAP, via BitmapBasedFilterOperator over the per-key presence bitmap
  4. Expression filter (existing fallback)

explainAttributes now emits delegateTo:per_key_inverted_index and delegateTo:presence_bitmap (was previously misreported as expression_filter).
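A minimal sketch of the delegate selection, under stated assumptions: the strategies are tried in the order listed, null checks are served by the presence bitmap rather than the inverted index, and the enum and parameter names are hypothetical. This is not the actual MapFilterOperator code.

```java
/** Hypothetical sketch of delegate selection; not the actual MapFilterOperator code. */
final class MapFilterStrategySketch {
  enum Strategy { JSON_INDEX, PER_KEY_INVERTED_INDEX, PRESENCE_BITMAP, EXPRESSION_FILTER }

  enum PredicateKind { EQ, IS_NULL, IS_NOT_NULL }

  static Strategy choose(boolean hasJsonIndex, boolean hasPerKeyInvertedIndex, boolean isColumnarMap,
      PredicateKind predicate) {
    // 1. An existing JSON index on the column wins (pre-existing behavior).
    if (hasJsonIndex) {
      return Strategy.JSON_INDEX;
    }
    boolean nullCheck = predicate == PredicateKind.IS_NULL || predicate == PredicateKind.IS_NOT_NULL;
    // 2. Value predicates use the per-key inverted index when one exists for the key.
    if (!nullCheck && hasPerKeyInvertedIndex) {
      return Strategy.PER_KEY_INVERTED_INDEX;
    }
    // 3. Null checks on COLUMNAR_MAP columns use the per-key presence bitmap.
    if (nullCheck && isColumnarMap) {
      return Strategy.PRESENCE_BITMAP;
    }
    // 4. Everything else falls back to per-document expression evaluation.
    return Strategy.EXPRESSION_FILTER;
  }

  public static void main(String[] args) {
    System.out.println(choose(false, true, true, PredicateKind.EQ));
    System.out.println(choose(false, true, true, PredicateKind.IS_NOT_NULL));
  }
}
```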

ItemTransformFunction null-aware reads (pinot-core)

Captures the per-key null bitmap from keyDS.getNullValueVector() and projects it to block-local docIds in getNullBitmap. Gated on the query's nullHandlingEnabled flag; short-circuits when the segment-level null bitmap doesn't intersect the block's docId range. Adds getKeyPath() for direct key resolution by TransformBlock/ProjectionBlock.
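The projection from segment-level docIds to block-local positions might look roughly like the sketch below. It uses `java.util.BitSet` as a stand-in for the Roaring bitmaps Pinot uses, assumes block docIds are sorted ascending, and returns `null` to mean "no nulls in this block"; the class and method names are hypothetical.

```java
import java.util.BitSet;

/** Hypothetical sketch; BitSet stands in for the per-key null bitmap. */
final class BlockNullBitmapSketch {
  /**
   * Maps segment-level null docIds onto positions within a block.
   * Assumes blockDocIds[0..length) is sorted ascending.
   */
  static BitSet project(BitSet segmentNulls, int[] blockDocIds, int length, boolean nullHandlingEnabled) {
    // Gate on the query flag: with null handling off, no bitmap is produced.
    if (!nullHandlingEnabled || length == 0) {
      return null;
    }
    int first = blockDocIds[0];
    int last = blockDocIds[length - 1];
    // Short-circuit: no null docId falls inside the block's docId range.
    int nextNull = segmentNulls.nextSetBit(first);
    if (nextNull < 0 || nextNull > last) {
      return null;
    }
    BitSet blockNulls = new BitSet(length);
    for (int i = 0; i < length; i++) {
      if (segmentNulls.get(blockDocIds[i])) {
        blockNulls.set(i); // position i within the block, not the segment docId
      }
    }
    return blockNulls.isEmpty() ? null : blockNulls;
  }

  public static void main(String[] args) {
    BitSet segmentNulls = new BitSet();
    segmentNulls.set(3);
    segmentNulls.set(10);
    System.out.println(project(segmentNulls, new int[]{2, 3, 5}, 3, true));
  }
}
```

The range check avoids allocating a block-local bitmap for the common case where a block contains no nulls.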

Storage format addition

OnHeapColumnarMapIndexCreator now writes a per-sparse-key presence bitmap (run-optimized) into the SPMX layout, reusing the dense-tier nullBitmapOffset/Len slot per tierFlag. Without this, IS NULL / IS NOT NULL on sparse keys returned wrong results. Format version unchanged (still SPMX v3); legacy v3 segments without the new bytes (nullBitmapLen == 0) preserve the prior empty-bitmap behavior.
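To illustrate why the presence bitmap matters for null checks, here is a small sketch using `java.util.BitSet` as a stand-in for the run-optimized bitmap. The class and method names are hypothetical, and the sketch assumes no bit is set at or beyond `numDocs`.

```java
import java.util.BitSet;

/** Hypothetical sketch; BitSet stands in for the per-key presence bitmap. */
final class PresenceBitmapSketch {
  /** IS NOT NULL matches exactly the docs where the key is present. */
  static BitSet isNotNull(BitSet presence) {
    return (BitSet) presence.clone();
  }

  /** IS NULL matches the complement of the presence bitmap over [0, numDocs). */
  static BitSet isNull(BitSet presence, int numDocs) {
    BitSet result = (BitSet) presence.clone();
    result.flip(0, numDocs);
    return result;
  }

  public static void main(String[] args) {
    BitSet presence = new BitSet();
    presence.set(1);
    presence.set(3);
    System.out.println(isNull(presence, 5));
  }
}
```

In this sketch an empty presence bitmap makes IS NOT NULL match nothing and IS NULL match every document, which shows how a missing bitmap can skew results for keys that are in fact present.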

Wire-up + SPI hardening

  • ColumnarMapIndexType.createMutableIndex returns new MutableColumnarMapIndexImpl(...) instead of throwing
  • ColumnarMapIndexReader.getKeyDataSource(String) throws UnsupportedOperationException on both reader implementations — callers must go through ColumnarMapDataSource so the unknown-key fall-back stays consistent
  • Sparse-tier reader now throws RuntimeException with column/key/docId/value context on JSON parse and numeric parse failures (was previously silent)
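The fail-loudly behavior in the last bullet could be sketched as follows. The method name, message format, and choice of `long` are illustrative assumptions; the description only states that the exception carries column, key, docId, and value context.

```java
/** Hypothetical sketch of contextual parse failures; not the actual reader code. */
final class SparseValueParseSketch {
  static long parseLong(String column, String key, int docId, String rawValue) {
    try {
      return Long.parseLong(rawValue);
    } catch (NumberFormatException e) {
      // Surface the failure with enough context to locate the bad record,
      // instead of silently substituting a default value.
      throw new RuntimeException(String.format(
          "Failed to parse numeric value: column=%s key=%s docId=%d value=%s",
          column, key, docId, rawValue), e);
    }
  }

  public static void main(String[] args) {
    System.out.println(parseLong("metrics", "clicks", 7, "42"));
  }
}
```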

How to use

Schema and table-config setup is unchanged from PR 1 (#17896). Once both PRs have landed, queries against COLUMNAR_MAP columns work transparently:

-- EQ filter on a dense key — uses per-key inverted index
SELECT count(*) FROM events WHERE metrics['country'] = 'US'
 
-- IS NOT NULL on a sparse key — uses presence bitmap (new)
SELECT count(*) FROM events WHERE metrics['rare_flag'] IS NOT NULL
 
-- Projection of multiple keys
SELECT user_id, metrics['clicks'], metrics['spend'] FROM events WHERE metrics['country'] = 'US'
 
-- GROUP BY on a MAP key
SELECT metrics['country'], count(*) FROM events GROUP BY metrics['country'] ORDER BY count(*) DESC

EXPLAIN PLAN VERBOSE will show delegateTo:per_key_inverted_index for fast-path EQ predicates and delegateTo:presence_bitmap for IS NULL / IS NOT NULL on sparse keys.

Test plan

  • ColumnarMapIndexEndToEndTest — 4 tests (mutable→immutable round-trip, undeclared-key default type, user-metrics scenario, cross-segment merge)
  • ColumnarMapFilterOperatorTest — 7 tests (IS_NULL / IS_NOT_NULL / NOT_EQ / NOT_IN against absent and partially-present keys)
  • ColumnarMapSegmentCreationTest — 1 test (segment build pipeline)
  • ColumnarMapBenchmarkTest — disabled (@Test(enabled = false)); manual perf benchmark, run with -DenableBenchmark=true and the annotation flipped
  • All 12 in-PR tests pass
  • 46 additional tests in a follow-up test PR cover storage, query, and concurrency in depth
  • Deployed this code to a test cluster to measure query performance and storage improvements.

tarun11Mavani and others added 2 commits April 20, 2026 08:42
…immutable read path

Introduces the COLUMNAR_MAP index type for MAP columns with per-key columnar
storage. Includes ComplexFieldSpec enhancements, SPMX v3 binary format with
dense/sparse two-tier storage, dictionary encoding, forward index reader with
co-iterator, per-key inverted index, and index plugin/type/handler wiring.

Format details:
- 56-byte header (magic + version + numKeys + numDocs + numDenseKeys +
  numSparseKeys + 4 section offsets)
- 70-byte key metadata (tier flag + storedType + numDocs + 4 offset/length
  pairs for nullBitmap/forward/inverted/dictIdForward)
- Dense tier: full forward index per key with run-optimized null bitmap
- Sparse tier: JSON sidecar file with per-key SPMX entries reduced to type
  metadata (per-key presence bitmap added in PR-2 query layer)
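
As a sanity check on the sizes above, one field-width assignment that reproduces both totals is 4-byte ints for the six scalar header fields, 8-byte section offsets, 1-byte tier flag and stored type, and 8-byte offsets and lengths in the key metadata. These widths are an assumption; the commit message states only the field list and the totals.

```java
/** Arithmetic sketch only; the field widths are assumed, not taken from the source. */
final class SpmxLayoutArithmeticSketch {
  /** magic, version, numKeys, numDocs, numDenseKeys, numSparseKeys as ints; 4 offsets as longs. */
  static int headerBytes() {
    return 6 * Integer.BYTES + 4 * Long.BYTES;
  }

  /** tier flag and storedType as bytes, numDocs as int, 4 (offset, length) pairs as longs. */
  static int keyMetadataBytes() {
    return Byte.BYTES + Byte.BYTES + Integer.BYTES + 4 * (Long.BYTES + Long.BYTES);
  }

  public static void main(String[] args) {
    System.out.println(headerBytes());      // 24 + 32 = 56
    System.out.println(keyMetadataBytes()); // 1 + 1 + 4 + 64 = 70
  }
}
```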

Quality fixes (from self-review):
- sortValues() uses type-aware comparator matching ColumnarMapKeyDictionary,
  preventing wrong range query results and GROUP BY ordering for numeric keys
- Sparse sidecar JSON serialization uses Jackson ObjectMapper to handle
  control characters per RFC 8259
- Class-level Javadoc accurately documents the 56-byte header and 70-byte
  key metadata layout
- StandardIndexes.columnarMap() returns parameterized IndexType<> matching
  other accessor methods
- Preconditions.checkState guards bufferSize long-to-int cast
- ColumnarMapIndexHandler.updateIndices declares throws Exception
- DataOutputStream wrapped in try-with-resources
- WARN log when sparse sidecar missing but SPMX has sparse keys
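
The first fix above concerns ordering: sorting numeric values by their string form gives a different order than sorting by numeric value, which breaks range lookups over a sorted dictionary. A minimal illustration with hypothetical helper names (not the actual sortValues() code):

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical illustration of lexicographic versus type-aware ordering. */
final class TypeAwareSortSketch {
  /** Natural String ordering: compares character by character. */
  static String[] sortAsStrings(String[] values) {
    String[] copy = values.clone();
    Arrays.sort(copy);
    return copy;
  }

  /** Type-aware ordering for a LONG-typed key: compares parsed numeric values. */
  static String[] sortAsLongs(String[] values) {
    String[] copy = values.clone();
    Arrays.sort(copy, Comparator.comparingLong((String s) -> Long.parseLong(s)));
    return copy;
  }

  public static void main(String[] args) {
    String[] values = {"9", "10", "100"};
    System.out.println(Arrays.toString(sortAsStrings(values))); // [10, 100, 9]
    System.out.println(Arrays.toString(sortAsLongs(values)));   // [9, 10, 100]
  }
}
```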

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ter delegation

Builds on the COLUMNAR_MAP storage layer (PR apache#18167) to enable query
execution against the index.

New classes:
- ColumnarMapDataSource: query-side data source backed by ColumnarMapIndexReader.
  Returns a NullDataSource for unknown keys (matches BaseMapDataSource contract;
  callers never see null). Sparse-key forward index throws on JSON parse / numeric
  parse failures with column/key/docId context (was previously silent).
- ColumnarMapForwardIndexReader: per-key forward index reader (immutable)
- ColumnarMapRealtimeInvertedIndex: per-key inverted index (mutable segment).
  Wraps each per-dictId bitmap in ThreadSafeMutableRoaringBitmap and returns a
  synchronized clone from getDocIds() — readers can iterate concurrently with
  the writer's add() calls (mirrors RealtimeInvertedIndex pattern).
- MutableColumnarMapIndexImpl: mutable index for consuming segments
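
The synchronized-clone pattern described for ColumnarMapRealtimeInvertedIndex can be sketched with `java.util.BitSet` standing in for the Roaring bitmap. The class below is hypothetical and is not Pinot's ThreadSafeMutableRoaringBitmap; it only shows why readers can iterate a snapshot while the writer keeps adding.

```java
import java.util.BitSet;

/** Hypothetical sketch of a synchronized bitmap that hands out snapshots. */
final class SynchronizedBitmapSketch {
  private final BitSet _bits = new BitSet();

  /** Called by the single writer as documents are indexed. */
  synchronized void add(int docId) {
    _bits.set(docId);
  }

  /** Returns a private copy, so the caller can iterate without holding the lock. */
  synchronized BitSet snapshot() {
    return (BitSet) _bits.clone();
  }

  public static void main(String[] args) {
    SynchronizedBitmapSketch bitmap = new SynchronizedBitmapSketch();
    bitmap.add(1);
    BitSet snapshot = bitmap.snapshot();
    bitmap.add(2); // later writes do not affect the earlier snapshot
    System.out.println(snapshot);
    System.out.println(bitmap.snapshot());
  }
}
```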

Wire-up:
- ColumnarMapIndexType.createMutableIndex returns MutableColumnarMapIndexImpl
  instead of throwing UnsupportedOperationException.

Storage format:
- OnHeapColumnarMapIndexCreator now writes a per-sparse-key presence bitmap
  (run-optimized) into the SPMX layout for sparse-tier keys, reusing the
  nullBitmap slot. ImmutableColumnarMapIndexReader loads it directly. Without
  this, IS NULL / IS NOT NULL on sparse keys returned wrong results.
- Layout javadoc updated to clarify the slot semantic per tier.

Query operators:
- MapFilterOperator: adds per-key inverted index path and presence bitmap path
  alongside the existing JSON-index and full-scan paths. explainAttributes now
  emits the per_key_inverted_index branch.
- ItemTransformFunction: captures per-key null bitmap for null-aware item()
  evaluation; exposes getKeyPath() for direct MAP key resolution. getNullBitmap
  is gated on the query's nullHandlingEnabled flag and skips allocation when
  the segment-level null bitmap doesn't intersect the block's docId range.

SPI hardening:
- ImmutableColumnarMapIndexReader/MutableColumnarMapIndexImpl getKeyDataSource
  throw UnsupportedOperationException — callers must go through
  ColumnarMapDataSource so the unknown-key fall-back is consistent.

Tests:
- ColumnarMapIndexTest: testSparseKeyPresenceBitmapMatchesPresentDocs verifies
  IS NULL / IS NOT NULL correctness on sparse-tier keys;
  testRealtimeInvertedIndexConcurrentReaderCorrectness exercises 1-writer +
  4-reader stress with strict membership assertions on observed docIds;
  testSparseKeyDataSourceReturnsNullDataSourceForUnknownKey verifies the
  NullDataSource fall-back for unknown keys and the SPI-level UOE.
- ColumnarMapIndexEndToEndTest, ColumnarMapBenchmarkTest

Sourced from columnar-map-split-wip via 3-way merge against the storage
branch's current state. Storage-side conflicts resolved in favor of the
storage branch's deep-review fixes (RLE bitmaps, two-tier flag, JsonUtils,
try-with-resources, type-aware sort comparator).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov-commenter

codecov-commenter commented Apr 20, 2026

Codecov Report

❌ Patch coverage is 35.03493% with 1209 lines in your changes missing coverage. Please review.
✅ Project coverage is 34.97%. Comparing base (aa483d3) to head (4d61026).
⚠️ Report is 40 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...dex/columnarmap/OnHeapColumnarMapIndexCreator.java | 54.50% | 190 Missing and 32 partials ⚠️ |
| ...gment/index/columnarmap/ColumnarMapDataSource.java | 0.00% | 221 Missing ⚠️ |
| ...x/columnarmap/ImmutableColumnarMapIndexReader.java | 46.61% | 156 Missing and 41 partials ⚠️ |
| ...index/columnarmap/MutableColumnarMapIndexImpl.java | 28.94% | 170 Missing and 19 partials ⚠️ |
| ...nt/index/columnarmap/ColumnarMapKeyDictionary.java | 34.37% | 62 Missing and 1 partial ⚠️ |
| .../columnarmap/ColumnarMapKeyForwardIndexReader.java | 0.00% | 62 Missing ⚠️ |
| .../pinot/core/operator/filter/MapFilterOperator.java | 0.00% | 57 Missing ⚠️ |
| ...va/org/apache/pinot/spi/data/ComplexFieldSpec.java | 24.07% | 33 Missing and 8 partials ⚠️ |
| ...pinot/spi/config/table/ColumnarMapIndexConfig.java | 26.92% | 33 Missing and 5 partials ⚠️ |
| ...ent/index/columnarmap/ColumnarMapIndexHandler.java | 44.44% | 17 Missing and 3 partials ⚠️ |
| ... and 14 more | | |

❗ There is a different number of reports uploaded between BASE (aa483d3) and HEAD (4d61026). Click for more details.

HEAD has 16 uploads less than BASE
| Flag | BASE (aa483d3) | HEAD (4d61026) |
| --- | --- | --- |
| java-21 | 5 | 3 |
| unittests1 | 2 | 0 |
| unittests | 4 | 2 |
| temurin | 10 | 6 |
| java-11 | 5 | 3 |
| integration | 6 | 4 |
| custom-integration1 | 2 | 0 |

Additional details and impacted files
@@              Coverage Diff              @@
##             master   #18263       +/-   ##
=============================================
- Coverage     63.31%   34.97%   -28.34%     
+ Complexity     1627      789      -838     
=============================================
  Files          3229     3259       +30     
  Lines        196705   199210     +2505     
  Branches      30408    30879      +471     
=============================================
- Hits         124544    69675    -54869     
- Misses        62183   123341    +61158     
+ Partials       9978     6194     -3784     

| Flag | Coverage Δ |
| --- | --- |
| custom-integration1 | ? |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 34.95% <35.03%> (-28.33%) ⬇️ |
| java-21 | 34.97% <35.03%> (-28.31%) ⬇️ |
| temurin | 34.97% <35.03%> (-28.34%) ⬇️ |
| unittests | 34.97% <35.03%> (-28.34%) ⬇️ |
| unittests1 | ? |
| unittests2 | 34.97% <35.03%> (-0.01%) ⬇️ |

