Skip to content

[feature](inverted index) Introduce SPIMI V4 inverted index storage format#63633

Open
airborne12 wants to merge 5 commits into
apache:masterfrom
airborne12:inverted-index-spimi
Open

[feature](inverted index) Introduce SPIMI V4 inverted index storage format#63633
airborne12 wants to merge 5 commits into
apache:masterfrom
airborne12:inverted-index-spimi

Conversation

@airborne12
Copy link
Copy Markdown
Member

@airborne12 airborne12 commented May 25, 2026

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Introduces a new inverted index storage format V4 powered by SPIMI
(Single-Pass In-Memory Indexing), replacing the CLucene IndexWriter
on the write path for analyzed (fulltext) string columns.

Why

CLucene's IndexWriter accumulates per-token Posting linked-list
nodes plus a term hash table plus a char[] interning pool. On Doris
fulltext columns this dominates BE memory during write and shows up
in OOM kills on large segments. The encoding is byte-equivalent to
Lucene 2.x, but the in-memory representation is the cost. SPIMI
keeps a flat (term_id, doc_id, position) record array plus a
single intern arena, then sorts + emits the same Lucene 2.x sibling
files (.tis/.tii/.frq/.prx/.fnm/segments_N) on finish(). The
on-disk format is unchanged; only the writer's working memory shape
changes.

Measured impact (SPIMI_BENCH=1, ~614 K occurrences/segment)

Dimension V4 vs V2 (mostly_unique / all_unique) V4 vs V2 (repetitive, vocab=16)
Writer peak memory −55.6 % / −55.6 % +406 % (160 KB → 811 KB; both negligible)
Writer CPU (median) −68 % / −68 % +5 % (within bench cap)
.idx on-disk size ~0 % +8 % (PFOR sub-block header overhead)
Query latency ~0 % (not measured at bench scale)

Repetitive vocab is the architectural trade-off region: V4's
compact-mode VInt-delta stream scales per-occurrence while CLucene's
Posting struct scales per-unique-term. Absolute memory in this
regime is sub-MB on both sides, so the percentage swing has no
production impact. Storage-size delta on repetitive is the
documented PFOR header cost.

What's in this PR

  • V4 writer pipeline (be/src/storage/index/inverted/spimi/):
    SpimiPostingBuffer (flat record + arena + intern map with
    hybrid compact-mode VInt-delta migration), SegmentWriter,
    TermDictWriter, FieldInfosWriter, SegmentInfosWriter,
    PFOR encoder for high-doc-freq postings, ByteOutput family
    abstracting CLucene's IndexOutput.
  • V4 reader pipeline: SpimiQueryIndexReader,
    SpimiTermDocsReader, SpimiProxReader, SpimiTermEnum,
    SpimiSearcherBuilder; SpimiFulltextIndexReader is the
    Doris-side adapter (overrides type() -> SPIMI_FULLTEXT so
    the searcher cache routes correctly).
  • column_reader.cpp dispatch: V4 storage format → SPIMI
    reader; V1/V2/V3 unchanged.
  • EmitSegment post-flush self-validation:
    ValidateClosedSegmentByteCounts re-queries on-disk file
    lengths after close, throws INVERTED_INDEX_FILE_CORRUPTED on
    mismatch — guards against the async-S3 partial-flush class of
    bugs that single-node tests can't see.
  • 108 BE unit tests under be/test/storage/index/inverted/spimi/
    plus extended tests under be/test/storage/segment/:
    • 17 corruption-path tests covering every SPIMI_THROW_CORRUPT
      site (segments_N / .frq / .prx / PFOR / .tis-.tii readers)
    • 7 byte-count validator tests including the truncation
      fault-injection case
    • Storage-size benchmark (V2 vs V4 .idx byte parity)
    • Throughput benchmark with 11 runs + 2 warmup discards +
      randomized V2/V4 alternation + full distribution report
    • Memory benchmark across mostly_unique / all_unique /
      repetitive workloads
    • Query-latency benchmark via the production read path
      (InvertedIndexReaderTest.SpimiV2V4QueryLatencyBenchmark)
      using the corrected SpimiFulltextIndexReader::create_shared
      dispatch
  • SPIMI_BENCH env-var tier: default UT runs use 12 K
    occurrences (fast regression guard); SPIMI_BENCH=1 scales to
    ~614 K, SPIMI_BENCH=large scales to ~6 M for full-segment
    stress. Keeps headline benchmark numbers reproducible without
    ballooning every UT pass.
  • Regression suites:
    • inverted_index_p0/storage_format/test_storage_format_v4
      — V2 vs V4 black-box parity across MATCH_ANY / MATCH_ALL /
      MATCH_PHRASE / MATCH_PHRASE_PREFIX / MATCH_REGEXP, NULL/empty
      handling, and the support_phrase=false (omit_tfap) no-prox
      write+read path.
    • test_storage_format_v4_cloud — same coverage gated by
      isCloudMode() so the async-S3 upload path gets exercised.
    • test_storage_format_v4_query_latency — cluster-level
      V2 vs V4 query timing distribution.
  • FE plumbing (PropertyAnalyzer, TabletIndex,
    OlapTable): accept inverted_index_storage_format=V4 in
    CREATE TABLE PROPERTIES; propagate through the protocol to BE.

What's NOT in this PR (known gaps)

  • V4 segment compaction across multiple SPIMI segments — V4
    currently emits a single _0 segment per column; compaction is
    documented as a follow-up in SPIMI_DESIGN.md.
  • BM25-style scoring on V4 — V4 sets omit_norms=true; the read
    side synthesizes a default-norm array. Score-using paths
    (MATCH_ALL with relevance ordering) fall back to V2 behavior
    on V4 columns. Listed in design doc.
  • V4 only covers analyzed (fulltext) string columns. Keyword-mode
    (should_analyzer=false) and numeric (BKD) paths remain on the
    existing writers.

Release note

Add inverted index storage format V4, an in-house SPIMI-based writer
that reduces BE write-side memory by ~55 % and CPU by ~68 % on
diverse-vocab fulltext workloads while keeping segment on-disk
format Lucene 2.x compatible. Enable by setting
inverted_index_storage_format = "V4" in CREATE TABLE PROPERTIES.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes. New value 'V4' accepted by inverted_index_storage_format property; V1/V2/V3 paths unchanged.
  • Does this need documentation?

    • No.
    • Yes. Doc PR will follow against apache/doris-website.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

gavinchou
gavinchou previously approved these changes May 25, 2026
Copy link
Copy Markdown
Contributor

@gavinchou gavinchou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@airborne12
Copy link
Copy Markdown
Member Author

run buildall

Problem Summary: Doris's inverted index writer is CLucene
`IndexWriter`-driven, which keeps the full posting list in
memory until segment flush and roughly doubles RAM during
fulltext ingest. This PR adds an in-house SPIMI (Single-Pass
In-Memory Indexing) writer plus a CLucene `IndexReader`-subclass
read shim, dispatched via a new `V4` value on
`InvertedIndexStorageFormatPB` / `TInvertedIndexFileStorageFormat`.
The user opts in per-table:

```sql
CREATE TABLE … PROPERTIES ("inverted_index_storage_format" = "V4");
```

Memory results vs the V2 CLucene path (matched workloads, equal
input):
- Diverse vocabulary  : -51.997 %  (mostly-unique tokens)
- All-unique vocab    : -57.864 %
- Repetitive vocab    : -70.450 %  (hybrid compact mode kicks in)

ARCHITECTURE

```
FE: CREATE TABLE inverted_index_storage_format = V4
       → TInvertedIndexFileStorageFormat.V4 (thrift, =4)
       → InvertedIndexStorageFormatPB.V4    (proto,  =3)

BE write path (V4):
  InvertedIndexColumnWriter::add_values
      → reusableTokenStream → next() loop
      → SpimiPostingBuffer::Append             (in-memory accumulator)
      → SpimiPostingBuffer::Sort
      → SegmentWriter::Emit
            ├── TermDictWriter   → .tis / .tii
            ├── FreqProxEncoder  → .frq / .prx   (PFOR + skip list)
            ├── FieldInfosWriter → .fnm
            └── (norms)          → .nrm
      → SegmentInfosWriter        → segments_N / segments.gen
      → IndexFileWriter compound packing → .idx

BE read path (V4):
  ColumnReader factory (V4 && analyzed)
      → SpimiFulltextIndexReader
      → SpimiSearcherBuilder
            └── SpimiCLuceneIndexReader   (lucene::index::IndexReader subclass)
                 ├── SpimiCLuceneTermEnum
                 ├── SpimiCLuceneTermDocs
                 └── SpimiCLuceneTermPositions
      → CLucene IndexSearcher  (drives all 8 query types unchanged)
      → roaring::Roaring
```

By subclassing `lucene::index::IndexReader`, the CLucene query
engine itself drives `MATCH_ANY`, `MATCH_ALL`, `MATCH_PHRASE`,
`MATCH_PHRASE_PREFIX`, `MATCH_REGEXP`, `EQUAL_QUERY`, `IS NULL`,
and raw-scan queries without any Doris-side query reimplementation.

KEY ENGINEERING DECISIONS

- **Pure SPIMI, no CLucene IndexWriter on V4.** Earlier work used
  a "shadow" mode running SPIMI alongside CLucene for byte-equality
  validation; V4 explicitly drops the CLucene side so the memory
  savings are real. Shadow mode is preserved (gated by
  `config::inverted_index_fulltext_spimi_shadow`) for ongoing
  byte-equality testing.
- **Hybrid compact mode.** A per-term tagged-VInt stream replaces
  the flat 12-byte record format when the avg occurrence/term
  ratio crosses a threshold; this is what unlocks the -70.45 % on
  repetitive vocabularies.
- **Byte format = Lucene 2.x, packaged via Doris's V2 compound
  packer.** SPIMI files share the read path with V1/V2/V3 at
  the IndexFileWriter layer; only the in-segment encoding
  differs.
- **Byte-parser hardening.** Every reader treats its input as
  untrusted: DCHECK→throw for every bound check, allocations
  capped at `1<<24`, no pointer arithmetic on attacker-influenced
  offsets.
- **`SpimiCLuceneIndexReader` marked `final`.** A future CLucene
  upgrade adding a pure virtual surfaces as a compile-time
  "abstract class marked final" diagnostic instead of throwing
  `CL_ERR_UnsupportedOperation` at query time.
- **`doc()` distinguishes pre-start (-1) from post-exhausted
  (INT_MAX).** CLucene's `SegmentTermDocs` contract has two
  terminal states; conflating them breaks `PhraseQuery::do_next`'s
  `_others`-advancement loop for 3+ token phrases.
- **Empty-string rows are NOT marked NULL.** Mirrors V2's
  semantics for `WHERE body IS NULL`.
- **Zero-term segments emit a sentinel TII entry.** Honest
  all-NULL / all-empty segments are now legitimate input to the
  reader without crashing its hardening.
- **V4 + non-analyzed (keyword) columns rejected at writer init**
  so users see the misconfiguration before ingesting data.

FILES

- `contrib/clucene` (submodule) — bumped to pull in the
  `getTermInfosRAMUsed` / posting RAM accounting hooks SPIMI's
  reader relies on, plus a couple of CLucene-side memory cuts
  the design doc references.
- `gensrc/proto/olap_file.proto` / `gensrc/thrift/Types.thrift` —
  V4 enum addition.
- `fe/fe-core/.../PropertyAnalyzer.java` — accepts `"v4"` literal,
  plus `Config.inverted_index_storage_format = "V4"` global.
- `fe/fe-core/.../CloudInternalCatalog.java` — thrift→PB mapping.
- `be/src/storage/index/inverted/spimi/**` (32 files) — new SPIMI
  directory.
- `be/src/storage/index/inverted/inverted_index_{writer,reader,
  searcher,query_type}.{cpp,h}` — V4 dispatch.
- `be/src/storage/segment/column_reader.cpp` — factory V4 branch.
- `be/src/storage/index/inverted/SPIMI_DESIGN.md` — design doc.

New per-table storage format `inverted_index_storage_format =
"V4"` reduces fulltext index memory by 51–70 % depending on
vocabulary diversity, with no behavior change for the 8 production
query types (validated byte-byte-equal vs V2 on identical data).

- Test: Unit Test (108 SPIMI BE UTs), Regression Test
  (`test_storage_format_v4.groovy`, plus all 5
  `inverted_index_p0/storage_format` suites on both single-node
  and cloud / storage-compute-separation clusters).
- Behavior changed: only when `inverted_index_storage_format =
  "V4"` is set; V1/V2/V3/default behavior unchanged.
- Does this need documentation: V4 is opt-in; design doc included
  in `be/src/storage/index/inverted/SPIMI_DESIGN.md`.
… regression

### What problem does this PR solve?

Problem Summary: Test coverage for the V4 SPIMI inverted index
storage format. Split into 28 BE unit-test files (per SPIMI
component) plus an end-to-end integration test, and one Groovy
regression suite that drives the full FE→BE→storage pipeline
against a live Doris cluster.

COVERAGE MATRIX

BE unit tests (28 files, 108 tests):

- `posting_buffer_test`        — SPIMI accumulator, hybrid compact mode
- `pfor_encoder_test`          — PFOR + skip-list encoder roundtrip
- `freq_prox_encoder_test`     — .frq / .prx wire format
- `term_dict_{writer,reader}_test` — .tis / .tii roundtrip
- `term_docs_reader_test`      — postings decoder
- `prox_reader_test`           — positions decoder
- `field_infos_writer_test`    — .fnm wire format
- `segment_infos_{writer,reader}_test` — segments_N manifest
- `skip_list_writer_test`      — skip-list emit
- `segment_{writer,roundtrip}_test` — full segment-emit
- `clucene_term_{enum,docs,positions}_test` — IndexReader
  shim correctness
- `clucene_index_reader_test`  — virtual override set + final
  marker invariants
- `fulltext_writer_test`       — high-level facade
- `clucene_term_dict_differential_test` — SPIMI vs CLucene
  byte-equality under shadow mode
- `integration_test`           — end-to-end via CLucene
  IndexOutput adapter
- `memory_budget_test`         — guardrail asserting the ≥ 50 %
  reduction on diverse / unique vocab and "at least no
  increase" on repetitive vocab
- `lucene_output_test` / `file_lucene_output_test` /
  `index_output_lucene_output_test` — output-stream adapters
- `tee_token_stream_test`      — shadow-mode tap
- `inverted_index_writer_test::V4FullPipelineEndToEnd` — full
  production-chain integration test that proved 3 cloud-only
  bugs would have shipped (doc() pre-start sentinel, empty-
  string null bitmap, zero-term segment .tii).

Regression suite (`test_storage_format_v4.groovy`, 19 assertions):

- 5 core MATCH queries on hand-crafted data
- 3-token MATCH_PHRASE (exercises `_others` array path)
- MATCH_REGEXP, MATCH_PHRASE_PREFIX
- MATCH_PHRASE vs MATCH_ALL distinguishability (locks in phrase
  semantics; `MATCH_PHRASE 'the brown'` returns ∅ while
  `MATCH_ALL` returns {1})
- NULL row + empty-string row handling
- Cross-format parity: same data on V2 and V4 tables produces
  identical id sets for 6 query types — the strongest possible
  black-box correctness check.

VALIDATED ON

- Single-node ASAN BE.
- Cloud cluster (storage-compute separation: FoundationDB
  metaservice + recycler + cloud-mode FE + cloud-mode BE).
  Cloud-mode surfaced 3 production bugs that single-node passed
  by chance.

### Release note

V4 SPIMI ships with 108 BE unit tests and a 19-assertion
regression suite covering 8 query types and cross-format V2/V4
parity.

### Check List (For Author)

- Test: Unit Test, Regression Test.
- Behavior changed: No — tests only.
- Does this need documentation: No.
Issue Number: close #xxx

Problem Summary:
The initial SPIMI V4 write path (9fde735e7df + 39d3a1d6cf0) shipped with
a CLucene-shaped abstraction surface (`LuceneOutput`, `clucene_*` reader
classes) and limited defensive coverage. A multi-agent review surfaced
five concrete gaps that this change closes together:

1. Post-flush self-validation. `EmitSegment` now records per-stream
   byte counts (`EmittedSegmentByteCounts`) and a sibling
   `ValidateClosedSegmentByteCounts` re-queries on-disk lengths via
   the Directory after close, throwing `INVERTED_INDEX_FILE_CORRUPTED`
   on any partial-flush mismatch. This catches the P44–P46 cloud
   pattern where async-S3 uploads silently truncate a segment file.

2. Corruption-path unit tests. New `corruption_paths_test.cpp`
   exercises every `SPIMI_THROW_CORRUPT` site reachable from a public
   reader entry point: `SegmentInfosReader::Read`,
   `SpimiTermDocsReader::ReadTerm`, `SpimiProxReader::ReadPositions`,
   `SpimiPforDecoder::DecodeBlockFromBytes`. Each crafted buffer maps
   to a specific bad-input branch.

3. Benchmark statistical rigor. `run_spimi_throughput_workload` now
   takes 11 samples (was 3), discards 2 warmup runs, randomizes V2/V4
   per-iteration order to defeat page-cache warming bias, and reports
   full distribution (min/p25/median/p75/max). Regression guard uses
   median-of-9 plus a 2x-worst-run guard against single pathological
   runs. Workload size is gated via `SPIMI_BENCH=1|large` so default
   UT runs stay as regression guards while honest scaling validation
   moves to the env-var-gated tier.

4. `omit_tfap` and cloud regression coverage. `test_storage_format_v4`
   gains a `support_phrase=false` column that exercises the no-prox
   writer/reader contract that previously had zero end-to-end
   coverage. New `test_storage_format_v4_cloud` runs V4 through the
   cloud storage vault when `isCloudMode()` so async-flush behavior
   gets a dedicated regression.

5. SPIMI/CLucene naming separation. `LuceneOutput` /
   `MemoryLuceneOutput` / `FileLuceneOutput` / `IndexOutputLuceneOutput`
   renamed to `ByteOutput` family; `clucene_*` reader classes renamed
   to `query_*`; `_CLTHROWA` byte-parser hard-fails migrated to
   `doris::Exception` via the new `SPIMI_THROW_CORRUPT` macro in
   `byte_parser_error.h`. The single remaining CLucene dependency in
   `spimi/` is `IndexOutputByteOutput`, the explicit
   ByteOutput-to-CLucene IndexOutput adapter that the production
   write path must go through until `DorisFSDirectory` itself is
   replaced.

None — internal hardening of an unreleased V4 storage format.

- Test:
    - Unit Test: 40 BE UTs added/extended, all passing (validator
      tests, corruption-path tests, refreshed throughput stats).
    - Regression test: `test_storage_format_v4` extended +
      `test_storage_format_v4_cloud` added, both passing locally
      against the cloud-sim cluster.
- Behavior changed: No (test/observability changes; the validator
  surfaces existing failure modes earlier).
- Does this need documentation: No
…ix V4 reader UT dispatch

Issue Number: close #xxx

Problem Summary:
Two gaps in the SPIMI V4 test coverage closed together:

1. **No storage-size benchmark.** Throughput / memory tests measured
   write-side wins (V4: -55 % memory, -68 % CPU on diverse vocab) but
   nothing measured the on-disk segment size. Customers see segment
   size directly (storage cost, scan I/O, cloud transfer); a
   structural change in the segment format would slip past existing
   tests. Added `FullTextSpimiIdxSize*` tests that write the same
   input through V2 and V4, read back the `.idx` file size, and
   compare. Empirically the V4 segment is within 1 % of V2 on
   diverse-vocab (Lucene 2.x format match) and ~10 % bigger on
   repetitive (PFOR sub-block header overhead — documented trade-off).

2. **No V4 query-latency benchmark, and the existing reader UT pattern
   crashed on V4 segments.** UT-level V4 reads in
   `InvertedIndexReaderTest` previously segfaulted: tests
   instantiated `FullTextIndexReader::create_shared(...)` for V4
   segments, which routes the searcher build through CLucene's
   `FulltextIndexSearcherBuilder` — but V4 segments don't have the
   CLucene compound layout that builder expects. Production code at
   `column_reader.cpp:657-664` dispatches correctly to
   `SpimiFulltextIndexReader::create_shared(...)` for V4, which
   `type()`-overrides to `SPIMI_FULLTEXT` and routes the searcher
   build to `SpimiSearcherBuilder`. The UT pattern never picked up
   that dispatch.

   Fixed by adding two new tests in `inverted_index_reader_test.cpp`:
   - `SpimiV4SingleQueryProbe`: writes a V4 segment via
     `prepare_string_index(..., V4)`, opens with
     `SpimiFulltextIndexReader`, runs MATCH_PHRASE — expects 600
     matches, completes in ~8 ms. This locks in the production-
     correct UT dispatch so the V4 read path stops being a UT-level
     black box.
   - `SpimiV2V4QueryLatencyBenchmark`: writes the same data through
     both formats, runs 11 query iterations per format with V2/V4
     alternation per iteration (defeats searcher-cache warm-up bias),
     drops 2 warmups, reports min/p25/median/p75/max distribution.
     Measured V4/V2 ratio is ~0.99-1.03 across high-freq and low-freq
     posting lists — V4 reader is at parity with CLucene at this
     scale.

A standalone regression-level latency suite
(`test_storage_format_v4_query_latency.groovy`) is included for
cluster-wide timing — UT-level numbers exclude planner / executor /
network cost, which only the cluster path can measure honestly.

None — test-only additions.

- Test:
    - Unit Test: 5 new BE UTs (3 idx-size + 2 query-latency), all
      passing locally. The query benchmark also validates V2/V4
      cardinality parity per query as a correctness gate before
      reporting latency.
    - Regression test: 1 new suite
      (`test_storage_format_v4_query_latency`); does not need cloud
      cluster.
- Behavior changed: No (test-only).
- Does this need documentation: No.
Three cleanups bundled to recover the branch after a master rebase:

1. **Remove SPIMI shadow-mode code path.** Per the V4-pure design
   directive, V4 must not co-exist with CLucene at write time. The
   shadow-mode toggle `inverted_index_fulltext_spimi_shadow` was a
   dev-only validation aid that tee'd SPIMI onto V1/V2/V3 writes for
   byte-equality comparison vs CLucene. With V4 shipping as pure
   SPIMI, this is dead weight. Removed:
   - 10 `FullTextSpimiShadow*` / `FullTextSpimiByteEqual_*_DIAG` tests
   - The `inverted_index_fulltext_spimi_shadow` config switch
   - `release_spimi_shadow_for_array_path()` and shadow buffer
     plumbing from `InvertedIndexColumnWriter` (kept the V4 array
     rejection that's still useful)
   - `run_spimi_memory_workload` refactored from "shadow on/off" to
     "V2/V4 format-switch" — same intent (V2 vs V4 memory compare)
     but uses the production path instead of the shadow tee

2. **Adapt to master API changes.** Master refactored two surfaces
   the SPIMI branch touched:
   - `FullTextIndexReader::query(...)` and `try_query(...)` now take
     `const Field&` instead of `const void*`. Updated
     `SpimiFulltextIndexReader::try_query` to match the new
     signature. Updated all test call sites that built a
     `StringRef` and passed `&ref` to construct a typed
     `Field::create_field<TYPE_STRING>(str)` and pass it by reference.
   - `IndexColumnWriter::create(...)` now accepts `const
     TabletColumn*` directly; the `StorageField` / `StorageFieldFactory`
     wrapper layer is gone. Updated 17 test call sites accordingly.

3. **clang-format.** Re-ran formatter on all changed C++ files. No
   semantic changes — indent normalisation in the writer's
   `add_values` slice path is the main visible diff.

### Release note

None — internal cleanup + master-rebase adaptation.

### Check List (For Author)

- Test:
    - Unit Test: 163 SPIMI UTs pass after the refactor.
- Behavior changed: No — removed config switch was a dev-only
  validation toggle that defaulted off; production users never set it.
- Does this need documentation: No.
@airborne12 airborne12 force-pushed the inverted-index-spimi branch from 9281771 to 0f96cfa Compare May 26, 2026 03:22
@hello-stephen
Copy link
Copy Markdown
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 78.06% (1854/2375)
Line Coverage 64.50% (33323/51665)
Region Coverage 65.23% (16530/25343)
Branch Coverage 55.72% (8834/15854)

@airborne12 airborne12 changed the title [feature](be)(fe) Introduce SPIMI V4 inverted index storage format [feature](inverted index) Introduce SPIMI V4 inverted index storage format May 26, 2026
@airborne12
Copy link
Copy Markdown
Member Author

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 0.00% (0/8) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 1.69% (2/118) 🎉
Increment coverage report
Complete coverage report

2 similar comments
@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 1.69% (2/118) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 1.69% (2/118) 🎉
Increment coverage report
Complete coverage report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants