[refactor](inverted-index) replace void* query_value with typed param interface #62960

Merged
airborne12 merged 10 commits into apache:master from csun5285:fix/inverted-index-query-void-ptr
May 11, 2026

@csun5285
Contributor

@csun5285 csun5285 commented Apr 30, 2026

What problem does this PR solve?

A typed query-param interface replaces the void*:

  • InvertedIndexQueryParam (root) → StringQueryParam / NumericQueryParam (intermediates) → TypedInvertedIndexQueryParam (concrete; selects its base class via PT).
  • Readers dynamic_cast to the appropriate intermediate at entry; a mismatch fails fast with Status::InternalError.
  • Sentinels move into TypedInvertedIndexQueryParam::encode_min/max_ascending, which use type_limit<storage_val> directly: no scratch buffer, no TypeInfo callback.
  • TypeInfo::set_to_min/max and all of its overrides are removed.
  • DATETIME signedness fix: storage_val = std::conditional_t<PT == TYPE_DATETIME, int64_t, ...> aligns the typed param with KeyCoder's signed view, so type_limit<int64_t>::max() = INT64_MAX
    produces the correct +inf byte string.
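
The cast-at-entry pattern in the list above can be sketched as follows. This is a minimal illustration, not the PR's code: `Status`, `QueryParam`, `StringParam`, `NumericParam`, and `string_reader_query` are hypothetical stand-ins for the Doris classes named in the description.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for Doris's Status; only ok/error is modeled.
struct Status {
    bool ok_flag;
    std::string msg;
    static Status OK() { return {true, ""}; }
    static Status InternalError(std::string m) { return {false, std::move(m)}; }
    bool ok() const { return ok_flag; }
};

// Root of the hierarchy: readers receive this instead of a void*.
struct QueryParam {
    virtual ~QueryParam() = default;
};

// Intermediate for string-like values.
struct StringParam : QueryParam {
    explicit StringParam(std::string v) : value(std::move(v)) {}
    std::string value;
};

// Intermediate for fixed-width numeric values.
struct NumericParam : QueryParam {
    explicit NumericParam(int64_t v) : value(v) {}
    int64_t value;
};

// A string reader casts at entry and fails fast on the wrong param kind.
Status string_reader_query(const QueryParam& param, std::string* out) {
    const auto* sp = dynamic_cast<const StringParam*>(&param);
    if (sp == nullptr) {
        return Status::InternalError("expected a string query param");
    }
    *out = sp->value;
    return Status::OK();
}
```

Compared with reinterpreting a void*, a wrong param kind becomes a reportable error rather than undefined behavior.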

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@csun5285 csun5285 requested a review from airborne12 as a code owner April 30, 2026 02:51
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@csun5285 csun5285 force-pushed the fix/inverted-index-query-void-ptr branch 2 times, most recently from a842dba to 5fee30d Compare April 30, 2026 03:11
@csun5285
Contributor Author

run buildall

Comment thread be/src/core/type_limit.h Outdated
Comment thread be/src/storage/index/inverted/inverted_index_query_param.h Outdated
@csun5285 csun5285 force-pushed the fix/inverted-index-query-void-ptr branch from 5fee30d to 636dac4 Compare May 6, 2026 08:56
@airborne12
Member

run buildall

@airborne12
Member

/review

Contributor

@github-actions github-actions Bot left a comment

Automated review completed.

Critical checkpoint conclusions:

  • Goal/test proof: The PR replaces inverted-index query values from raw pointers/typed wrapper objects with Field, removes the obsolete TypeInfo::set_to_min/max path, and adds expanded BE unit coverage for BKD range queries and type-limit sentinels. The changed tests are aligned with the new reader interface.
  • Scope/focus: The implementation is mostly focused on the query-value refactor and shared key-encoding helper reuse; no unrelated GitHub PR files were present in the actual PR diff.
  • Concurrency/lifecycle: No new concurrent mutable state or lock ordering changes were introduced. InvertedIndexParam now owns the Field, which removes the previous raw pointer lifetime risk.
  • Config/compatibility: No new config, persisted format, edit log, storage format, or FE/BE protocol compatibility changes were introduced.
  • Parallel paths: The main inverted-index call sites in expression pushdown, storage predicates, match/multi-match, IP, arrays, and tests were updated to pass Field; I did not find a remaining live raw query_value path for inverted index readers.
  • Error handling: New status-returning BKD encoding dispatch reports unsupported field types instead of silently continuing. Existing CLucene boundaries remain unchanged.
  • Data correctness: BKD min/max sentinel handling now uses type_limit on storage/key types, with added specializations for decimal12_t and uint24_t; I did not find an additional correctness regression not already covered by existing review threads.
  • Memory/observability/performance: No large new allocations, ownership cycles, or observability gaps stood out. Moving from raw pointers to owned Field values is a small per-query copy but removes unsafe lifetime coupling.
  • Test coverage: BE unit tests were expanded for BKD-supported primitive types and sentinel regressions. No regression-test result files were added, which is acceptable for this internal BE refactor.
  • User focus: No additional user-provided review focus was supplied.

Existing review context was checked first; I did not duplicate the existing comments about type_limit storage-type specialization or the old KeyCoder explanation.

airborne12
airborne12 previously approved these changes May 7, 2026
Member

@airborne12 airborne12 left a comment

LGTM

@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label May 7, 2026
@github-actions
Contributor

github-actions Bot commented May 7, 2026

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

github-actions Bot commented May 7, 2026

PR approved by anyone and no changes requested.

@github-actions github-actions Bot removed the approved Indicates a PR has been approved by one committer. label May 8, 2026
@airborne12
Member

run buildall

@hello-stephen
Contributor

TPC-H: Total hot run time: 29387 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 12f6e4164e4e4cbdb67b8f2a5489fe0c0b47bf0a, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	18488	3869	3806	3806
q2	q3	10716	902	596	596
q4	4687	458	344	344
q5	7450	1331	1130	1130
q6	192	169	136	136
q7	953	957	757	757
q8	9763	1400	1296	1296
q9	5660	5458	5355	5355
q10	6302	2083	1798	1798
q11	485	270	258	258
q12	688	425	290	290
q13	18210	3318	2776	2776
q14	290	286	260	260
q15	q16	895	863	795	795
q17	951	1045	769	769
q18	6361	5619	5548	5548
q19	1238	1307	1033	1033
q20	524	413	260	260
q21	4600	2288	1878	1878
q22	417	354	302	302
Total cold run time: 98870 ms
Total hot run time: 29387 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4145	4063	4049	4049
q2	q3	4661	4773	4156	4156
q4	2085	2169	1376	1376
q5	4958	4963	5269	4963
q6	184	167	131	131
q7	2031	1928	1753	1753
q8	3520	3205	3170	3170
q9	8434	8422	8404	8404
q10	4545	4437	4217	4217
q11	598	414	394	394
q12	692	744	530	530
q13	3168	3545	2899	2899
q14	317	309	277	277
q15	q16	779	813	696	696
q17	1303	1313	1246	1246
q18	8205	7062	7026	7026
q19	1125	1173	1174	1173
q20	2261	2263	1980	1980
q21	6114	5338	4826	4826
q22	564	501	447	447
Total cold run time: 59689 ms
Total hot run time: 53713 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 169793 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 12f6e4164e4e4cbdb67b8f2a5489fe0c0b47bf0a, data reload: false

query5	4343	654	499	499
query6	333	212	214	212
query7	4217	573	300	300
query8	316	228	231	228
query9	8814	4028	3995	3995
query10	444	338	302	302
query11	5772	2416	2178	2178
query12	177	130	126	126
query13	1305	583	425	425
query14	6162	5382	5035	5035
query14_1	4324	4361	4305	4305
query15	209	201	184	184
query16	1068	451	418	418
query17	1129	750	608	608
query18	2728	473	342	342
query19	218	205	161	161
query20	137	140	132	132
query21	212	136	114	114
query22	13616	13538	13426	13426
query23	17105	16247	16639	16247
query23_1	16340	16237	16362	16237
query24	7925	1820	1367	1367
query24_1	1389	1377	1423	1377
query25	606	547	513	513
query26	1460	335	196	196
query27	2880	624	368	368
query28	4473	1987	1961	1961
query29	1033	650	540	540
query30	303	240	198	198
query31	1102	1045	950	950
query32	83	76	69	69
query33	540	350	301	301
query34	1166	1110	644	644
query35	748	782	670	670
query36	1343	1377	1164	1164
query37	151	100	93	93
query38	3195	3183	3061	3061
query39	919	913	876	876
query39_1	887	861	865	861
query40	235	168	135	135
query41	69	65	62	62
query42	112	109	110	109
query43	331	323	287	287
query44	
query45	206	204	195	195
query46	1115	1198	734	734
query47	2359	2323	2238	2238
query48	358	406	295	295
query49	680	510	423	423
query50	686	278	219	219
query51	4317	4230	4181	4181
query52	105	105	92	92
query53	253	275	201	201
query54	300	263	254	254
query55	97	86	81	81
query56	289	295	288	288
query57	1442	1382	1298	1298
query58	290	264	265	264
query59	1525	1591	1411	1411
query60	346	326	325	325
query61	152	151	151	151
query62	669	620	543	543
query63	238	195	200	195
query64	2381	817	676	676
query65	
query66	1673	501	391	391
query67	30029	29222	29882	29222
query68	
query69	468	383	300	300
query70	1050	964	969	964
query71	291	277	266	266
query72	2895	2699	2444	2444
query73	809	804	403	403
query74	5040	4865	4721	4721
query75	2770	2676	2322	2322
query76	2322	1151	775	775
query77	403	418	362	362
query78	12895	12964	12399	12399
query79	1490	968	747	747
query80	1403	571	477	477
query81	517	279	250	250
query82	1317	160	125	125
query83	374	272	241	241
query84	258	141	109	109
query85	934	509	441	441
query86	434	333	327	327
query87	3410	3328	3203	3203
query88	3525	2649	2635	2635
query89	435	389	333	333
query90	1843	175	174	174
query91	179	169	141	141
query92	78	77	70	70
query93	950	951	568	568
query94	691	331	288	288
query95	690	453	335	335
query96	964	782	331	331
query97	2684	2674	2541	2541
query98	243	230	229	229
query99	1106	1133	979	979
Total cold run time: 255022 ms
Total hot run time: 169793 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 78.79% (104/132) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.48% (20538/38406)
Line Coverage 37.10% (193972/522891)
Region Coverage 33.42% (151325/452730)
Branch Coverage 34.51% (66165/191714)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 86.36% (114/132) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.81% (27761/37609)
Line Coverage 57.67% (300781/521517)
Region Coverage 54.83% (250668/457151)
Branch Coverage 56.42% (108573/192442)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 86.36% (114/132) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.79% (27752/37609)
Line Coverage 57.64% (300618/521517)
Region Coverage 54.82% (250590/457151)
Branch Coverage 56.39% (108513/192442)

Comment thread be/src/core/type_limit.h Outdated
Comment thread be/src/core/type_limit.h Outdated

// DECIMALV2 storage. Largest representable DecimalV2 value (18 digits . 9 digits).
template <>
struct type_limit<decimal12_t> {
Contributor

I think this definition may be problematic. The definitions above are all tied to compute-layer values, but the two you added appear to be storage-layer values.

Contributor Author

Removed.

return doris::Status::InternalError("unsupported BKD field type {}", static_cast<int>(ft));
}

static doris::Status encode_bkd_max_ascending(doris::FieldType ft, const doris::KeyCoder* coder,
Contributor

Types such as decimal and datetime carry a scale. How are min and max expressed in that case?

Contributor Author

This encoding does not need to consider scale, because it encodes the underlying int64 or int128 value. Scale is only a scaling factor, and it is the same for every value in a given column, so it can be ignored.

Contributor Author

@csun5285 csun5285 May 10, 2026

Datetime is not affected either. Its scale only expresses fractional-second precision; the min and max fractional parts are 000000 and 999999 at every precision, so ordering comparisons are unaffected.
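
The point about scale in the replies above can be illustrated with a small sketch. It assumes the common order-preserving scheme for signed integers (flip the sign bit, then write the bytes big-endian); whether KeyCoder does exactly this is an assumption here, and `encode_i64_ascending` and `raw_decimal` are hypothetical helpers, not Doris functions.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

// Order-preserving encoding of a signed 64-bit integer: flip the sign bit,
// then emit the bytes big-endian so byte-wise order equals numeric order.
std::string encode_i64_ascending(int64_t v) {
    uint64_t u = static_cast<uint64_t>(v) ^ (uint64_t{1} << 63);
    std::string out(8, '\0');
    for (int i = 7; i >= 0; --i) {
        out[i] = static_cast<char>(u & 0xFF);
        u >>= 8;
    }
    return out;
}

// Hypothetical helper: the raw integer stored for `whole.frac` when the
// column's scale corresponds to the multiplier pow10_scale.
int64_t raw_decimal(int64_t whole, int64_t frac, int64_t pow10_scale) {
    return whole * pow10_scale + frac;
}
```

Because every value in a column shares one scale, comparing raw integers gives the same order as comparing the scaled decimals, so the encoder never needs to see the scale. Under this scheme INT64_MIN encodes to eight 0x00 bytes and INT64_MAX to eight 0xFF bytes, which is why a signed view of the storage value matters for the +inf sentinel.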

SCOPED_RAW_TIMER(&context->stats->inverted_index_query_timer);

- std::string search_str = *reinterpret_cast<const std::string*>(query_value);
+ std::string search_str = query_value.get<PrimitiveType::TYPE_STRING>();
Contributor

Please change the definition of typename PrimitiveTypeTraits::CppType& Field::get() in Field: add a check inside it that throws an exception when t != type.

Contributor Author

ok
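
A checked accessor of the kind requested above might look like the sketch below. `MiniField`, `Tag`, and `CppTypeOf` are hypothetical names, and `std::runtime_error` stands in for whatever exception type the real Field uses.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>

// Hypothetical type tags; the real code uses PrimitiveType.
enum class Tag { Int64, String };

// Maps a tag to the C++ type stored for it.
template <Tag PT> struct CppTypeOf;
template <> struct CppTypeOf<Tag::Int64> { using type = int64_t; };
template <> struct CppTypeOf<Tag::String> { using type = std::string; };

// Hypothetical miniature Field: a tagged value with a checked accessor.
class MiniField {
public:
    explicit MiniField(int64_t v) : tag_(Tag::Int64), value_(v) {}
    explicit MiniField(std::string v) : tag_(Tag::String), value_(std::move(v)) {}

    // The tag check runs in every build type, unlike a debug-only assert.
    template <Tag PT>
    const typename CppTypeOf<PT>::type& get() const {
        if (PT != tag_) {
            throw std::runtime_error("Field::get type mismatch");
        }
        return std::get<typename CppTypeOf<PT>::type>(value_);
    }

private:
    Tag tag_;
    std::variant<int64_t, std::string> value_;
};
```

The design choice is to pay one comparison per access so that a mismatched request surfaces as an exception instead of reading unrelated bytes.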

const T* value = (const T*)(iter->get_value());
- RETURN_IF_ERROR(InvertedIndexQueryParamFactory::create_query_value<Type>(
- value, query_param));
+ field_value = Field::create_field<Type>(*value);
Contributor

In the in-list predicate, is string the only type that is not a compute-layer type? Are the others, such as date, all compute-layer types?
More broadly, do predicates always compute on compute-layer values? For example the date type, string padding, and so on.

Contributor Author

Predicates always compute on compute-layer values; a column that is read lands in a columnxxx, which is a compute-layer column.

csun5285 and others added 8 commits May 11, 2026 11:35
…encode_field

inverted_index_query_param.h has no remaining includers; the matching
test file is a stub. Remove both. Also drop the FieldType template
parameter from bkd_encode_field — the storage value's bytes are already
correct for KeyCoder's compile-time CppType, so the explicit key_t
conversion was unnecessary. bkd_encode_min/max still need the
CppTypeTraits<FT>::CppType for the right type_limit sentinel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both RowCursor::_encode_field and BKD's encode_bkd_field_ascending did
the same Field -> storage value -> KeyCoder dispatch with their own
copy of the (FieldType, PrimitiveType) table. Extract the conversion
helper and the dispatch X-macro into storage/field_key_encoder.h so
both call sites share one source of truth.

- field.h: expose StorageField::key_coder() for callers that already
  have a KeyCoder-shaped helper.
- field_key_encoder.h: new header with full_encode_field_as_key /
  encode_field_as_key templates plus
  DORIS_APPLY_FOR_KEY_ENCODABLE_NON_STRING_TYPES X-macro.
- row_cursor.cpp: 19 hand-written cases collapse into one macro
  expansion; encode_non_string_field<PT> wrapper removed.
- inverted_index_reader.cpp: drops local bkd_encode_field<PT> and
  BKD_TYPE_CASES; the three encode_bkd_*_ascending functions reuse
  the shared macro.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
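
The shared X-macro dispatch described in the commit message above follows a general pattern that can be sketched as below. The names `Tag`, `APPLY_FOR_ENCODABLE_TYPES`, `encoded_width`, and `dispatch_width` are illustrative assumptions; the real macro is DORIS_APPLY_FOR_KEY_ENCODABLE_NON_STRING_TYPES and its entries differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical type tags standing in for PrimitiveType.
enum class Tag { Int32, Int64, String };

// X-macro: the single list of key-encodable non-string types.
// Each entry pairs a tag with the C++ type used for encoding.
#define APPLY_FOR_ENCODABLE_TYPES(M) \
    M(Tag::Int32, int32_t)           \
    M(Tag::Int64, int64_t)

// Toy "encoder": reports the encoded width in bytes for the type.
template <typename T>
std::size_t encoded_width() {
    return sizeof(T);
}

// One switch generated from the list; unsupported tags return false.
bool dispatch_width(Tag tag, std::size_t* width) {
    switch (tag) {
#define CASE(TAG, CPP_TYPE)                 \
    case TAG:                               \
        *width = encoded_width<CPP_TYPE>(); \
        return true;
        APPLY_FOR_ENCODABLE_TYPES(CASE)
#undef CASE
    default:
        return false;
    }
}
```

Every call site that expands the same list stays in sync automatically, which is the "one source of truth" benefit the commit describes.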
The Field-to-key encoding helpers and the dispatch X-macro fit
naturally next to KeyCoder rather than in a stand-alone header,
since they are thin wrappers around KeyCoder calls. Inline them
into storage/key_coder.h and remove storage/field_key_encoder.h.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…match indexed type

`Field::get<PT>()` DCHECKs that the Field's primitive type tag equals `PT`,
but predicates like `arr_col = []` reach `encode_bkd_field_ascending` via
`FunctionComparison<EqualsOp>` with the entire const ARRAY literal as the
query Field, so `actual = TYPE_ARRAY` while the BKD index records the inner
scalar (e.g. IPV4). Under ASAN the assert aborts the BE with
"requested IPV4, actual ARRAY". Before the void*->Field refactor the old
factory rejected non-scalar types via NotSupported and the engine fell back;
this defense was lost when the typed dispatch moved into BKD.

Validate the Field type before dispatching to `full_encode_field_as_key<PT>`
and return INVERTED_INDEX_EVALUATE_SKIPPED on mismatch so
`SegmentIterator::_apply_index_expr` downgrades to scalar evaluation
instead of crashing on the assert. Scalar predicates (`int_col = 1`,
`array_contains(int_arr, 2)`) keep matching as before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
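
The gate-then-fall-back flow from the commit message above can be sketched as follows. All names here (`Tag`, `IndexStatus`, `QueryField`, `try_index_lookup`, `evaluate`) are hypothetical, and the "index" is a toy that contains a single value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type tags and status codes for illustration only.
enum class Tag { IPv4, Array };
enum class IndexStatus { Ok, EvaluateSkipped };

// Hypothetical query value: a type tag plus a scalar payload.
struct QueryField {
    Tag tag;
    uint32_t scalar;
};

// Gate: only consult the index when the query value's type matches
// the type the index was built on; otherwise report "skipped".
IndexStatus try_index_lookup(Tag indexed, const QueryField& q, bool* hit) {
    if (q.tag != indexed) {
        return IndexStatus::EvaluateSkipped;
    }
    *hit = (q.scalar == 0x7F000001u);  // toy index holding 127.0.0.1 only
    return IndexStatus::Ok;
}

// Caller: use the index result when available, else fall back to the
// result of ordinary scalar evaluation (passed in here for simplicity).
bool evaluate(Tag indexed, const QueryField& q, bool scalar_result) {
    bool hit = false;
    if (try_index_lookup(indexed, q, &hit) == IndexStatus::Ok) {
        return hit;
    }
    return scalar_result;
}
```

Returning a distinct "skipped" status lets the caller tell a real failure from a predicate the index cannot answer.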
…pe_limit

bkd_encode_min/max are now templated by PrimitiveType instead of FieldType.
The +/- infinity sentinel is taken from type_limit<CppType> in the compute
layer (e.g. DecimalV2Value::get_min_decimal, VecDateTimeValue::datetime_min_value)
and projected onto the storage POD via PrimitiveTypeConvertor<PT>::to_storage_field_type.
This single-sources every limit constant: DecimalV2 bounds live only on
DecimalV2Value, DATE bounds only on VecDateTimeValue.

The two storage-layer type_limit<> specialisations added for decimal12_t and
uint24_t in the previous PR are no longer required and are removed along with
their includes and the half-bounded-BKD comment block. core/type_limit.h is
now exclusively a compute-layer header.

Tests: the two sanity-probe tests that asserted on the deleted specialisations
are removed; verify_bkd_range_queries (one TEST_F per BKD-supported PT) still
exercises the same +/- infinity codepath end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The DCHECK in both Field::get<T>() overloads was debug-only, so a release
build silently returned a reinterpret_cast of unrelated storage bytes when
T disagreed with the stored type. Replace it with a runtime check that
throws Exception(FatalError(...)), matching the rest of field.cpp's error
style and surfacing the bug in production.

The existing INVERTED_INDEX_EVALUATE_SKIPPED gate in encode_bkd_field_ascending
still fires first for the legitimate ARRAY-query-vs-scalar-BKD-index case so
the predicate falls back to scalar evaluation; the throw inside Field::get
is defense-in-depth for any future caller that forgets to gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… non-string keys

`encode_field_as_key<PT>` was a near-duplicate of `full_encode_field_as_key`.
After `DORIS_APPLY_FOR_KEY_ENCODABLE_NON_STRING_TYPES` excluded strings from
its dispatch, the `index_size` argument became dead: every non-string
`KeyCoderTraits<FT>::encode_ascending` just delegates to
`full_encode_ascending` and ignores `index_size`. Drop the helper.

Also tighten `full_encode_field_as_key` with a compile-time assert via a
new `is_key_encodable_non_string_type(PrimitiveType)` constexpr derived
from the existing macro, so future callers can't accidentally feed it a
string or nested/aggregate type.

`RowCursor::_encode_field` simplifies accordingly: its non-string branch
no longer needs to distinguish `full_encode`, since both paths produce
byte-identical output for fixed-width keys. The flag still matters for
the string branch and is preserved there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
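
Deriving a compile-time predicate from an existing type list, as the commit message above describes, might look like this sketch. `Tag`, `APPLY_FOR_KEY_TYPES`, `is_encodable`, and `encode_as_key` are assumed names, not the Doris identifiers, and the function body is a placeholder.

```cpp
#include <cassert>

// Hypothetical type tags standing in for PrimitiveType.
enum class Tag { Int32, Int64, String, Array };

// The same kind of list that drives runtime dispatch.
#define APPLY_FOR_KEY_TYPES(M) \
    M(Tag::Int32)              \
    M(Tag::Int64)

// constexpr membership test generated from the list (needs C++14).
constexpr bool is_encodable(Tag tag) {
    switch (tag) {
#define CASE(TAG) \
    case TAG:     \
        return true;
        APPLY_FOR_KEY_TYPES(CASE)
#undef CASE
    default:
        return false;
    }
}

// Instantiating this with a string or nested type fails to compile.
template <Tag PT>
int encode_as_key() {
    static_assert(is_encodable(PT), "type is not key-encodable");
    return static_cast<int>(PT);  // placeholder for the real encoding
}
```

For example, `encode_as_key<Tag::String>()` is rejected at compile time, so a misuse cannot reach a running build.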
…ld_as_key

Adds a TEST_F that walks an ascending sequence of compute-layer values
for every (PrimitiveType, FieldType) pair that goes through
`PrimitiveTypeConvertor<PT>` + `KeyCoder` and asserts the encoded byte
order matches.

Coverage (21 key-encodable non-string types):
  - BOOLEAN, TINYINT/SMALLINT/INT/BIGINT/LARGEINT (incl. numeric_limits)
  - FLOAT, DOUBLE (finite values + +/-infinity, skipping NaN/-0 covered
    separately by FloatComprehensiveOrdering)
  - DECIMAL32/64/128I/256: 2-3 scale interpretations each (scale=0
    boundary + a representative mid scale + max scale where applicable)
  - DECIMALV2: full int128 split via decimal12_t {int_value, frac_value}
  - DATE V1, DATETIME V1, DATEV2, DATETIMEV2 (scales 0/3/6 incl.
    microsecond=999999 -> next-second carry), TIMESTAMPTZ
  - IPV4, IPV6

Locks in:
  - the compute-layer < <=> encoded byte < contract for the
    `full_encode_field_as_key` path used by both `RowCursor::encode_key`
    (short-key index, MOW PK) and `encode_bkd_field_ascending` (BKD
    inverted index)
  - the implicit "scale does not enter the encode path" property,
    documented per scale-labelled subgroup -- scale/frac live on
    TabletColumn metadata above this layer and never reach
    PrimitiveTypeConvertor or KeyCoder

Raw decimal values are constructed as `whole * pow10_iN(scale) + frac`
so they read like `whole.frac` at the labelled scale; IPv4 values use
an `ip4(a, b, c, d)` lambda; date/time use existing `pack_d` / `pack_dt`
or `create_from_olap_date/datetime` helpers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@csun5285 csun5285 force-pushed the fix/inverted-index-query-void-ptr branch from 3ee2b4a to 56f5489 Compare May 11, 2026 03:35
@csun5285
Contributor Author

run buildall

@yiguolei
Contributor

/review

@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label May 11, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

Contributor

@github-actions github-actions Bot left a comment

Automated review completed for PR 62960.

Findings: no new blocking issues found beyond the existing inline review threads already present in .opencode-review.f6cqRi/pr_review_threads.md.

Critical checkpoint conclusions:

  • Goal and coverage: the PR replaces raw inverted-index query values with typed Field values, centralizes BKD encoding through KeyCoder, removes obsolete min/max TypeInfo APIs, and adds broad BKD key/range tests. The changed implementation appears to accomplish that goal.
  • Scope and focus: changes are mostly focused on inverted-index query parameter ownership and key encoding; the larger test churn is directly related to the API change.
  • Concurrency: no new shared mutable query state, lock ordering, or background-thread behavior was introduced. Existing reader/cache lifecycles are preserved.
  • Lifecycle/static initialization: no new cross-translation-unit static dependency or unusual ownership cycle was found in the changed code.
  • Configuration/compatibility: no new config item, persisted format, thrift protocol, or rolling-upgrade compatibility concern was introduced.
  • Parallel paths: string, fulltext, BKD, predicate, expression-function, and array/IP callers were updated to pass Field query values; I did not find an omitted production caller.
  • Conditional checks/error handling: BKD type mismatch now returns INVERTED_INDEX_EVALUATE_SKIPPED, which is consistent with fallback evaluation. Existing CLucene error boundaries remain in place.
  • Testing: BE unit tests were updated and expanded for key encoding and BKD range behavior across many scalar types. I did not run the test binaries in this review environment.
  • Observability: no new observability need was identified for this refactor; existing query/cache stats and logs remain applicable.
  • Transaction/persistence/data-write correctness: not applicable; no transaction, delete-bitmap, storage-format, or committed-data visibility path was changed.
  • Performance/memory: no obvious hot-path regression was found. Passing Field by value introduces copies for string query constants, but these are per predicate/query value and replace previous heap-backed param wrappers.

User focus: no additional user-provided review focus was specified.

@hello-stephen
Contributor

TPC-H: Total hot run time: 29964 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 56f5489c3e285c48e470a78059a6604e38f2dab0, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17752	4015	3947	3947
q2	q3	10712	894	621	621
q4	4656	467	350	350
q5	7460	1332	1143	1143
q6	185	170	140	140
q7	923	964	743	743
q8	9334	1358	1319	1319
q9	5629	5446	5379	5379
q10	6254	2085	1818	1818
q11	475	275	263	263
q12	628	419	298	298
q13	18106	3330	2746	2746
q14	288	286	270	270
q15	q16	907	868	800	800
q17	1035	1057	827	827
q18	6507	5766	5638	5638
q19	1256	1261	1038	1038
q20	526	387	275	275
q21	4552	2362	2005	2005
q22	485	425	344	344
Total cold run time: 97670 ms
Total hot run time: 29964 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4744	4848	4771	4771
q2	q3	4638	4805	4177	4177
q4	2147	2180	1400	1400
q5	4985	5022	5291	5022
q6	189	169	135	135
q7	2095	1826	1616	1616
q8	3479	3085	3134	3085
q9	8517	8400	8527	8400
q10	4480	4529	4229	4229
q11	582	407	383	383
q12	682	742	523	523
q13	3257	3563	3085	3085
q14	294	303	270	270
q15	q16	752	776	691	691
q17	1333	1320	1258	1258
q18	7985	7165	7081	7081
q19	1176	1125	1127	1125
q20	2211	2235	1929	1929
q21	6065	5443	4865	4865
q22	540	520	431	431
Total cold run time: 60151 ms
Total hot run time: 54476 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 170742 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 56f5489c3e285c48e470a78059a6604e38f2dab0, data reload: false

query5	4319	659	521	521
query6	328	232	214	214
query7	4303	549	303	303
query8	323	232	214	214
query9	8810	4037	4043	4037
query10	481	353	294	294
query11	5811	2372	2212	2212
query12	187	130	131	130
query13	1306	607	434	434
query14	6036	5359	5053	5053
query14_1	4403	4363	4376	4363
query15	217	204	182	182
query16	1024	486	466	466
query17	1164	790	614	614
query18	2444	469	345	345
query19	233	199	159	159
query20	144	130	135	130
query21	220	132	125	125
query22	13712	13449	13510	13449
query23	17189	16309	15954	15954
query23_1	16034	16062	16212	16062
query24	7415	1748	1327	1327
query24_1	1352	1361	1358	1358
query25	579	484	444	444
query26	1294	315	171	171
query27	2723	602	339	339
query28	4430	1958	1985	1958
query29	1020	623	511	511
query30	305	234	197	197
query31	1113	1074	940	940
query32	82	73	75	73
query33	537	356	286	286
query34	1174	1145	655	655
query35	749	779	665	665
query36	1329	1334	1185	1185
query37	148	103	96	96
query38	3203	3133	3030	3030
query39	949	914	911	911
query39_1	877	866	887	866
query40	240	157	139	139
query41	65	63	64	63
query42	113	112	112	112
query43	328	326	293	293
query44	
query45	211	198	198	198
query46	1099	1200	749	749
query47	2285	2303	2209	2209
query48	385	412	299	299
query49	659	545	453	453
query50	709	287	219	219
query51	4233	4228	4245	4228
query52	108	108	97	97
query53	256	279	218	218
query54	330	290	270	270
query55	94	93	86	86
query56	312	327	322	322
query57	1437	1426	1326	1326
query58	304	282	284	282
query59	1503	1643	1433	1433
query60	357	343	347	343
query61	185	181	179	179
query62	664	619	569	569
query63	248	207	211	207
query64	2541	902	743	743
query65	
query66	1787	533	442	442
query67	29956	29975	29866	29866
query68	
query69	474	335	307	307
query70	964	1024	967	967
query71	305	279	270	270
query72	2971	2720	2180	2180
query73	859	732	421	421
query74	5110	4897	4739	4739
query75	2782	2662	2315	2315
query76	2275	1131	753	753
query77	431	425	347	347
query78	12965	12893	12433	12433
query79	1476	1011	717	717
query80	1338	562	491	491
query81	533	281	239	239
query82	1202	163	116	116
query83	350	280	255	255
query84	257	145	113	113
query85	901	504	435	435
query86	438	340	339	339
query87	3419	3316	3216	3216
query88	3513	2675	2633	2633
query89	448	380	336	336
query90	1924	177	174	174
query91	180	168	139	139
query92	79	83	70	70
query93	1001	959	555	555
query94	732	347	301	301
query95	658	382	352	352
query96	1079	782	314	314
query97	2697	2662	2579	2579
query98	254	237	235	235
query99	1120	1112	972	972
Total cold run time: 254068 ms
Total hot run time: 170742 ms

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 78.62% (114/145) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.78% (27826/37714)
Line Coverage 57.64% (301239/522618)
Region Coverage 54.90% (251499/458080)
Branch Coverage 56.42% (108757/192753)

Member

@airborne12 airborne12 left a comment

LGTM

@airborne12 airborne12 merged commit 789ec33 into apache:master May 11, 2026
32 of 33 checks passed
yiguolei pushed a commit that referenced this pull request May 20, 2026

Labels

approved Indicates a PR has been approved by one committer. dev/4.1.2-merged reviewed

4 participants