[fix](be) Bound BitmapValue::deserialize to prevent heap over-read on tampered BITMAP payloads #63702

Open
jacktengg wants to merge 2 commits into apache:master from jacktengg:260526-bitmap-improve

Conversation

@jacktengg
Contributor

Problem Summary:
`BitmapValue::deserialize(const char* src)` took no buffer length and called upstream `Roaring64Map::read(src)`, whose documentation explicitly warns it is unsafe: "if you provide bad data, many bytes could be read, possibly causing a buffer overflow". The function trusts an inline `map_size` varint (unbounded) and the per-bucket `roaring::Roaring::read` (also unsafe, no `maxbytes`). A crafted/corrupt BITMAP payload, e.g. one fed through `bitmap_from_base64('BP////8P')` (BITMAP64 + varint UINT32_MAX), would make the per-bucket loop read far past the end of the input buffer.

The pre-existing `try/catch(std::runtime_error)` only catches roaring's own exceptions; a silent over-read does not necessarily throw, and ASAN flags it as heap-buffer-overflow.
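
To make the failure mode concrete, the sketch below decodes the six bytes behind `BP////8P` (0x04 followed by the varint FF FF FF FF 0F) and compares the declared bucket count with the bytes actually left in the buffer. The names `decode_varint64`, `HeaderInfo`, and `inspect_bitmap64_header` are illustrative only and are not Doris or CRoaring APIs.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical helper: decode a LEB128-style varint without reading past `end`.
// Returns std::nullopt if the input is truncated or the varint exceeds 10 bytes.
inline std::optional<uint64_t> decode_varint64(const uint8_t*& p, const uint8_t* end) {
    uint64_t value = 0;
    for (int shift = 0; shift < 64 && p < end; shift += 7) {
        const uint8_t byte = *p++;
        value |= static_cast<uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return value;  // high bit clear: last byte
    }
    return std::nullopt;
}

struct HeaderInfo {
    uint8_t type_code;          // first byte of the payload
    uint64_t declared_buckets;  // map_size as claimed by the payload
    size_t bytes_remaining;     // bytes actually left after the header
};

// Hypothetical inspector: parses only the type byte and the map_size varint.
inline std::optional<HeaderInfo> inspect_bitmap64_header(const uint8_t* buf, size_t len) {
    if (len < 1) return std::nullopt;
    const uint8_t* p = buf + 1;
    const uint8_t* end = buf + len;
    const auto map_size = decode_varint64(p, end);
    if (!map_size) return std::nullopt;
    return HeaderInfo{buf[0], *map_size, static_cast<size_t>(end - p)};
}
```

For the payload `{0x04, 0xFF, 0xFF, 0xFF, 0xFF, 0x0F}` this reports type code 4, a declared count of 4294967295 buckets, and zero bytes remaining, so a loop that trusts the count has nothing valid to read from its first iteration onward.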

Fix:

  • Add `Roaring64Map::readSafe(buf, maxbytes)` built on the CRoaring `roaring_bitmap_*deserialize_safe` primitives, with an explicit upper bound on the outer `map_size` varint for BITMAP64.
  • Add a bounded `BitmapValue::deserialize(const char* src, size_t maxbytes)` that validates every per-branch size (SINGLE32/SINGLE64, BITMAP32/64, v2 portable, SET, SET_V2) before reading and catches both `std::runtime_error` and `doris::Exception`.
  • Replace the unsafe `BitmapValue(const char*)` constructor with `BitmapValue(const char*, size_t maxbytes)` that throws on failure.
  • Migrate all untrusted callers (`data_type_bitmap_serde.cpp`, `function_bitmap.cpp` / `bitmap_from_base64`, `data_type_bitmap.cpp`, `column_complex.h`) to pass the actual buffer length.
  • Harden `BitmapIntersect<T>::deserialize` similarly: `Helper::read_from` now bounds-checks every read (POD, datetime, decimal, string), and `BitmapIntersect::deserialize` and its constructor take an explicit `maxbytes`. Update the single caller in `aggregate_function_orthogonal_bitmap.h`.
  • Drop the unused `BitmapExprCalculation(const char*)` constructor.
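
The bounded-read idea running through the list above can be sketched with a small cursor that refuses any read larger than what remains. This is a minimal illustration, not the code in this PR: `BoundedReader`, `read_pod`, and `plausible_bucket_count` are hypothetical names, and the 4-byte-per-bucket lower bound is an assumption used only to show how an outer count can be capped by the buffer length.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical cursor over an untrusted buffer of known length.
class BoundedReader {
public:
    BoundedReader(const char* src, size_t maxbytes) : _pos(src), _left(maxbytes) {}

    // Copies sizeof(T) bytes into *out, or returns false and leaves the
    // cursor untouched when fewer bytes remain.
    template <typename T>
    bool read_pod(T* out) {
        static_assert(std::is_trivially_copyable_v<T>, "trivially copyable types only");
        if (_left < sizeof(T)) return false;
        std::memcpy(out, _pos, sizeof(T));  // memcpy avoids unaligned loads
        _pos += sizeof(T);
        _left -= sizeof(T);
        return true;
    }

    size_t remaining() const { return _left; }

private:
    const char* _pos;
    size_t _left;
};

// Assumption for illustration: every bucket needs at least a 4-byte key, so a
// declared count above remaining / 4 cannot be satisfied by the buffer.
inline bool plausible_bucket_count(uint64_t map_size, size_t remaining) {
    constexpr uint64_t kMinBucketBytes = sizeof(uint32_t);
    return map_size <= remaining / kMinBucketBytes;
}
```

Checking the count before entering the loop rejects the `UINT32_MAX` payload immediately, and the per-read check catches buffers that are truncated partway through a bucket.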

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@jacktengg
Contributor Author

/review

@jacktengg
Contributor Author

run buildall

Contributor

@github-actions github-actions Bot left a comment

I found blocking issues in the error propagation around the new bounded bitmap deserialization. The core bound checks are directionally correct, and the added unit/regression cases cover the main malicious payload shapes, but several migrated call sites still ignore false from BitmapValue::deserialize, so malformed bitmap payloads can be accepted as default or partially initialized values instead of failing.

Critical checkpoint conclusions:

  • Goal/test coverage: The PR aims to bound bitmap deserialization and adds relevant BE/regression tests, but the goal is not fully achieved because not all migrated call sites fail on decode errors.
  • Scope/clarity: The change is focused on bitmap deserialization hardening.
  • Concurrency/lifecycle: No new concurrency or special lifecycle concerns found.
  • Configuration/compatibility: No new config or wire-format/storage-format change; existing format is read with extra bounds.
  • Parallel paths: Some parallel deserialization paths were updated but still miss return-value handling, as noted inline.
  • Error handling: Blocking issue: decode failures are silently ignored in some paths, violating Error Means Failure.
  • Data correctness: Blocking issue: malformed serialized bitmap data can become an empty/partial bitmap and produce incorrect results.
  • Memory/performance/observability: No additional blocking concerns found beyond potential noisy ERROR logs on rejected user input.
  • User focus: No additional user-provided review focus was present.

- data_ptr += unaligned_load<size_t>(&meta_ptr[i]);
+ size_t one_size = unaligned_load<size_t>(&meta_ptr[i]);
+ data[i].deserialize(data_ptr, one_size);
+ data_ptr += one_size;
Contributor

BitmapValue::deserialize now returns false for truncated or malformed input, but this block deserialization path still ignores that result. If a serialized block carries a bad bitmap payload, data[i] remains default/partially initialized and the reader advances by the declared size, so execution can continue with wrong bitmap values instead of failing. Please check the return value here and raise/propagate an error; the same applies to deserialize_as_stream below, which currently ignores value.deserialize(ref.data, ref.size) as well.
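
One way to address this, sketched under assumptions, is to check each per-row result and stop on the first failure. `FakeBitmap` is a stand-in for `BitmapValue` with the same `bool deserialize(const char*, size_t)` shape described in this PR; `deserialize_rows` and the use of `std::runtime_error` are hypothetical, and a real fix would raise whatever error type the surrounding Doris code expects. The extra check of each declared size against the bytes left in the data buffer is likewise an addition for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for BitmapValue: accepts only non-empty payloads whose first byte is 0.
struct FakeBitmap {
    bool ok = false;
    bool deserialize(const char* src, size_t maxbytes) {
        ok = (maxbytes >= 1 && src[0] == 0);
        return ok;
    }
};

// Decodes one value per entry in `sizes`, failing loudly instead of leaving
// default-initialized values behind.
inline void deserialize_rows(const char* data, size_t data_len,
                             const std::vector<size_t>& sizes,
                             std::vector<FakeBitmap>& out) {
    out.assign(sizes.size(), FakeBitmap{});
    size_t offset = 0;  // invariant: offset <= data_len
    for (size_t i = 0; i < sizes.size(); ++i) {
        const size_t one_size = sizes[i];
        if (one_size > data_len - offset) {
            throw std::runtime_error("row " + std::to_string(i) +
                                     ": declared size exceeds buffer");
        }
        if (!out[i].deserialize(data + offset, one_size)) {
            throw std::runtime_error("row " + std::to_string(i) +
                                     ": malformed bitmap payload");
        }
        offset += one_size;
    }
}
```

The cursor advances only after a successful decode, so a bad row can never be skipped over and silently replaced by an empty value.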

}

if constexpr (T == TYPE_BITMAP) {
pvalue->deserialize(pos);
Contributor

This newly bounded decode can fail, but insert_binary_data has already inserted a default value and then ignores the false return. A malformed bitmap binary cell will therefore stay in the column as an empty/default bitmap rather than making the load/read fail, which violates the hardening goal and can silently corrupt query results. Please check the return value and throw/propagate an error before leaving the inserted value in place.
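
A sketch of the pattern requested here, assuming a simplified column: decode into a temporary first and append only on success, so a failed decode never leaves a default value in the column. `StubColumn` and `StubBitmap` are hypothetical stand-ins, not the types in `column_complex.h`, and `std::runtime_error` stands in for the project's own error type.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Stand-in value type: treats empty input as malformed.
struct StubBitmap {
    int first_byte = -1;
    bool deserialize(const char* src, size_t maxbytes) {
        if (maxbytes == 0) return false;
        first_byte = static_cast<unsigned char>(src[0]);
        return true;
    }
};

class StubColumn {
public:
    // Decode first, insert second: the column is modified only on success.
    void insert_binary_data(const char* pos, size_t length) {
        StubBitmap value;
        if (!value.deserialize(pos, length)) {
            throw std::runtime_error("malformed bitmap binary cell");
        }
        _data.push_back(std::move(value));
    }

    size_t size() const { return _data.size(); }

private:
    std::vector<StubBitmap> _data;
};
```

If the existing code must insert the default value before decoding, the equivalent is to remove that value again before throwing; either way the column size should be unchanged after a failed insert.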

@hello-stephen
Contributor

TPC-H: Total hot run time: 31507 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 0425da0f720781fc4f48fde9d1044e66f2be6714, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17632	4097	4001	4001
q2	q3	10847	1373	821	821
q4	4716	474	344	344
q5	8190	2357	2131	2131
q6	356	180	141	141
q7	966	803	641	641
q8	9623	1705	1559	1559
q9	7011	4979	4970	4970
q10	6478	2221	1890	1890
q11	446	268	243	243
q12	693	441	286	286
q13	18202	3374	2776	2776
q14	267	254	233	233
q15	q16	810	764	709	709
q17	1006	939	1008	939
q18	7140	5749	5495	5495
q19	1309	1365	1104	1104
q20	559	413	285	285
q21	6144	2644	2621	2621
q22	438	367	318	318
Total cold run time: 102833 ms
Total hot run time: 31507 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4879	4939	4713	4713
q2	q3	4853	5263	4680	4680
q4	2130	2261	1427	1427
q5	4949	4713	4735	4713
q6	231	179	128	128
q7	1889	1761	1429	1429
q8	2253	1962	1956	1956
q9	7407	7435	7393	7393
q10	4737	4682	4221	4221
q11	539	392	360	360
q12	751	751	541	541
q13	3013	3376	2790	2790
q14	276	294	246	246
q15	q16	730	711	606	606
q17	1312	1280	1269	1269
q18	7349	6963	6997	6963
q19	1109	1120	1110	1110
q20	2219	2219	1937	1937
q21	5336	4634	4485	4485
q22	533	464	430	430
Total cold run time: 56495 ms
Total hot run time: 51397 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 171898 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 0425da0f720781fc4f48fde9d1044e66f2be6714, data reload: false

query5	4332	671	525	525
query6	329	225	196	196
query7	4220	567	310	310
query8	329	236	214	214
query9	8857	4064	4061	4061
query10	465	342	309	309
query11	5790	2621	2238	2238
query12	181	131	136	131
query13	1301	585	430	430
query14	6202	5501	5194	5194
query14_1	4520	4502	4534	4502
query15	225	213	186	186
query16	1080	481	460	460
query17	1174	759	631	631
query18	2721	482	350	350
query19	218	205	159	159
query20	134	132	127	127
query21	217	138	115	115
query22	13686	13598	13392	13392
query23	17395	16585	16283	16283
query23_1	16370	16325	16375	16325
query24	7548	1785	1290	1290
query24_1	1316	1300	1298	1298
query25	553	477	404	404
query26	1321	325	187	187
query27	2651	547	350	350
query28	4420	1982	1970	1970
query29	994	628	484	484
query30	307	236	197	197
query31	1125	1078	955	955
query32	88	82	74	74
query33	564	352	290	290
query34	1180	1123	679	679
query35	779	790	696	696
query36	1397	1456	1253	1253
query37	156	109	91	91
query38	3206	3161	3059	3059
query39	929	923	919	919
query39_1	878	889	880	880
query40	228	149	126	126
query41	65	65	64	64
query42	110	138	105	105
query43	333	331	295	295
query44	
query45	216	201	196	196
query46	1092	1182	743	743
query47	2305	2362	2278	2278
query48	378	416	302	302
query49	643	496	388	388
query50	993	371	258	258
query51	4353	4322	4375	4322
query52	102	104	92	92
query53	262	287	205	205
query54	330	277	252	252
query55	94	94	84	84
query56	303	302	310	302
query57	1445	1421	1325	1325
query58	303	291	258	258
query59	1569	1635	1412	1412
query60	319	319	309	309
query61	168	164	157	157
query62	704	652	591	591
query63	246	206	208	206
query64	2325	824	653	653
query65	
query66	1634	475	360	360
query67	29656	29557	29547	29547
query68	
query69	518	337	304	304
query70	978	959	994	959
query71	306	281	266	266
query72	3035	2729	2402	2402
query73	860	779	398	398
query74	5121	4909	4780	4780
query75	2727	2613	2270	2270
query76	2320	1150	786	786
query77	405	414	330	330
query78	12391	12375	11880	11880
query79	1269	1063	709	709
query80	562	549	498	498
query81	453	278	248	248
query82	236	161	123	123
query83	280	285	255	255
query84	259	150	116	116
query85	967	626	546	546
query86	394	332	319	319
query87	3458	3393	3269	3269
query88	3622	2786	2729	2729
query89	431	401	358	358
query90	2145	186	186	186
query91	193	185	159	159
query92	84	79	72	72
query93	1510	1582	878	878
query94	551	346	337	337
query95	683	491	371	371
query96	1008	797	348	348
query97	2764	2745	2607	2607
query98	244	232	249	232
query99	1187	1161	1043	1043
Total cold run time: 253331 ms
Total hot run time: 171898 ms

jacktengg added 2 commits May 27, 2026 10:04
… tampered BITMAP payloads

Problem Summary:
`BitmapValue::deserialize(const char* src)` lacked a buffer length and
called upstream `Roaring64Map::read(src)` whose documentation explicitly
warns it is unsafe: "if you provide bad data, many bytes could be read,
possibly causing a buffer overflow". The function trusts an inline
`map_size` varint (unbounded) and the per-bucket `roaring::Roaring::read`
(also unsafe, no `maxbytes`). A crafted/corrupt BITMAP payload, e.g. one
fed through `bitmap_from_base64('BP////8P')` (BITMAP64 + varint
UINT32_MAX), would loop deep past the end of the input buffer.

The pre-existing `try/catch(std::runtime_error)` only catches roaring's
own exceptions; a silent over-read does not necessarily throw and ASAN
flags it as heap-buffer-overflow.

Fix:
- Add `Roaring64Map::readSafe(buf, maxbytes)` built on the CRoaring
  `roaring_bitmap_*deserialize_safe` primitives, with an explicit upper
  bound on the outer `map_size` varint for BITMAP64.
- Add a bounded `BitmapValue::deserialize(const char* src, size_t maxbytes)`
  that validates every per-branch size (SINGLE32/SINGLE64, BITMAP32/64,
  v2 portable, SET, SET_V2) before reading and catches both
  `std::runtime_error` and `doris::Exception`.
- Replace the unsafe `BitmapValue(const char*)` constructor with
  `BitmapValue(const char*, size_t maxbytes)` that throws on failure.
- Migrate all untrusted callers (`data_type_bitmap_serde.cpp`,
  `function_bitmap.cpp` / `bitmap_from_base64`, `data_type_bitmap.cpp`,
  `column_complex.h`) to pass the actual buffer length.
- Harden `BitmapIntersect<T>::deserialize` similarly: `Helper::read_from`
  now bounds-checks every read (POD, datetime, decimal, string), and
  `BitmapIntersect::deserialize` and its constructor take an explicit
  `maxbytes`. Update the single caller in
  `aggregate_function_orthogonal_bitmap.h`.
- Drop the unused `BitmapExprCalculation(const char*)` constructor.

- Test:
    - Regression test: added 6 `bitmap_from_base64` cases in
      `test_bitmap_function.groovy` covering BITMAP64 / BITMAP64_V2 / SET_V2
      oversized-size payloads and three invalid type-code payloads. All
      expect the new decode-failure exception.
    - Unit Test: added 5 BE cases in `bitmap_value_test.cpp`
      (`deserialize_malicious_bitmap64_map_size`,
      `deserialize_malicious_bitmap64v2_map_size`,
      `deserialize_truncated_single`,
      `deserialize_malicious_set_v2`,
      `deserialize_bounded_roundtrip`). Ran
      `./run-be-ut.sh --run --filter='BitmapValueTest.*'` — 34/34 pass under
      ASAN.
@jacktengg jacktengg force-pushed the 260526-bitmap-improve branch from 0425da0 to e776293 Compare May 27, 2026 02:37
2 participants