Invert eligible BinaryColumns in the column pass#16116

Draft
Tim-Brooks wants to merge 5 commits into apache:main from Tim-Brooks:columnar_inverted_indices

@Tim-Brooks
Contributor

BinaryColumns that are indexed with DOCS or DOCS_AND_FREQS, set omitNorms,
have no term vectors, and are not stored are now inverted directly in the
column-oriented pass via processBinaryColumnInvert, which walks each
column's cursor sparsely. When any row-mode column is present in the
batch, eligible columns are demoted to the row pass so that all inverted
fields share a single termsHash frame per doc.
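
A minimal sketch of the eligibility rule and the demotion rule described above, not the PR's actual code. `ColumnSpec`, `isEligible`, and `columnPassFields` are hypothetical names, the local `IndexOptions` enum only mirrors the constant names of Lucene's enum, and the "batch has a row-mode column" decision is passed in as a flag because this description does not show how it is computed.

```java
import java.util.ArrayList;
import java.util.List;

final class ColumnEligibilitySketch {
  // Local stand-in that mirrors the constant names of Lucene's IndexOptions.
  enum IndexOptions {
    NONE, DOCS, DOCS_AND_FREQS, DOCS_AND_FREQS_AND_POSITIONS,
    DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
  }

  // Hypothetical per-column description; the real field-type accessors may differ.
  record ColumnSpec(String name, IndexOptions indexOptions, boolean omitNorms,
                    boolean termVectors, boolean stored) {}

  // Eligible: DOCS or DOCS_AND_FREQS, omitNorms, no term vectors, not stored.
  static boolean isEligible(ColumnSpec c) {
    boolean docsOrFreqs = c.indexOptions() == IndexOptions.DOCS
        || c.indexOptions() == IndexOptions.DOCS_AND_FREQS;
    return docsOrFreqs && c.omitNorms() && !c.termVectors() && !c.stored();
  }

  // Names of the columns inverted in the column pass. If the batch contains any
  // row-mode column, every eligible column is demoted and the result is empty.
  static List<String> columnPassFields(List<ColumnSpec> batch, boolean batchHasRowModeColumn) {
    List<String> out = new ArrayList<>();
    if (batchHasRowModeColumn) {
      return out;
    }
    for (ColumnSpec c : batch) {
      if (isEligible(c)) {
        out.add(c.name());
      }
    }
    return out;
  }
}
```

The all-or-nothing demotion keeps the sketch consistent with the stated goal: once one field needs the row pass, every inverted field goes through the same per-doc frame.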

processBinaryColumnInvert skips termsHash.startDocument/finishDocument:
those drive TermVectorsConsumer's segment-scoped lastDocID, and framing
the same batch doc from multiple eligible columns would break its
monotonicity invariant. Eligibility guarantees doVectors=false, so TV
state is never touched; unframed docs are reconciled by
TermVectorsConsumer.fill(). Exception handling mirrors processDocument:
pf.finish runs only if the first pf.invert returned normally.
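
The exception-handling rule can be illustrated with a small stand-alone sketch. It assumes a simplified, hypothetical `PerFieldLike` interface and one value per doc, so the "first invert" is the only invert. The real `PerField` methods take more arguments, and the loop below is not the PR's `processBinaryColumnInvert`. Note that it makes no per-doc start/finish calls on a terms hash, matching the unframed behaviour described above.

```java
import java.util.ArrayList;
import java.util.List;

final class InvertFinishSketch {
  // Hypothetical, simplified stand-in for the per-field inverter.
  interface PerFieldLike {
    void invert(int docId);
    void finish(int docId);
  }

  // Walks only the docs that have a value in this column (a sparse cursor is
  // modelled as an int[] of doc ids). finish(docId) runs only when invert(docId)
  // returned normally; if invert throws, the exception propagates and finish
  // is skipped for that doc.
  static void invertColumn(PerFieldLike pf, int[] docsWithValue) {
    for (int docId : docsWithValue) {
      boolean inverted = false;
      try {
        pf.invert(docId);
        inverted = true;
      } finally {
        if (inverted) {
          pf.finish(docId);
        }
      }
    }
  }

  // Small demo helper: records the call sequence for a column in which
  // inverting failingDoc throws.
  static List<String> trace(int[] docs, int failingDoc) {
    List<String> log = new ArrayList<>();
    PerFieldLike pf = new PerFieldLike() {
      @Override public void invert(int docId) {
        log.add("invert:" + docId);
        if (docId == failingDoc) {
          throw new IllegalStateException("simulated failure");
        }
      }
      @Override public void finish(int docId) {
        log.add("finish:" + docId);
      }
    };
    try {
      invertColumn(pf, docs);
    } catch (IllegalStateException e) {
      log.add("aborted");
    }
    return log;
  }
}
```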

The validation pass caches each column's PerField in docFields[] by
original column position; the row pass tracks original indices in
rowPfIndices[] instead of overwriting docFields[], so both passes can
reuse the cache without a second hash lookup.
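
The caching scheme can be sketched as follows, under stated assumptions: `PerField` here is a bare stand-in, `lookup` and `validate` are hypothetical helpers, and the `hashLookups` counter exists only to make the single-lookup property visible. Only the names `docFields` and `rowPfIndices` come from the description.

```java
import java.util.HashMap;
import java.util.Map;

final class PerFieldCacheSketch {
  // Stand-in for the per-field indexing state; only the name matters here.
  static final class PerField {
    final String name;
    PerField(String name) { this.name = name; }
  }

  private final Map<String, PerField> fieldHash = new HashMap<>();
  int hashLookups = 0; // instrumentation for the example only

  PerField lookup(String name) {
    hashLookups++;
    return fieldHash.computeIfAbsent(name, PerField::new);
  }

  // Validation pass: one hash lookup per column, cached at the column's
  // original position in the batch.
  PerField[] validate(String[] columnNames) {
    PerField[] docFields = new PerField[columnNames.length];
    for (int i = 0; i < columnNames.length; i++) {
      docFields[i] = lookup(columnNames[i]);
    }
    return docFields;
  }

  // Row pass bookkeeping: record the ORIGINAL positions of row-pass columns
  // instead of compacting docFields in place, so docFields[i] stays valid
  // for the column pass as well.
  static int[] rowPfIndices(boolean[] goesToRowPass) {
    int count = 0;
    for (boolean b : goesToRowPass) {
      if (b) count++;
    }
    int[] out = new int[count];
    int j = 0;
    for (int i = 0; i < goesToRowPass.length; i++) {
      if (goesToRowPass[i]) out[j++] = i;
    }
    return out;
  }
}
```

A row-pass consumer would then read `docFields[rowPfIndices[k]]`, while the column pass reads `docFields[i]` directly; neither needs to go back to the hash.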

@github-actions github-actions Bot added this to the 10.5.0 milestone May 24, 2026
@Tim-Brooks Tim-Brooks marked this pull request as draft May 25, 2026 02:30
