[fix](variant) Fix variant column space usage showing as 0 #61331

Closed
hoshinojyunn wants to merge 1 commit into apache:master from hoshinojyunn:fix_variant_data_size

Conversation

@hoshinojyunn
Contributor

hoshinojyunn commented Mar 14, 2026

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:
The variant column in the table contains data, but the following query for per-column space usage always reports 0 for the variant column:

SELECT
  COLUMN_NAME,
  SUM(COMPRESSED_DATA_BYTES) AS compressed_data_bytes,
  SUM(UNCOMPRESSED_DATA_BYTES) AS uncompressed_data_bytes,
  SUM(RAW_DATA_BYTES) AS raw_data_bytes
FROM information_schema.column_data_sizes
WHERE table_id = {table_id}
GROUP BY COLUMN_NAME, COLUMN_TYPE
ORDER BY compressed_data_bytes DESC;

Fix:
Implement the following three methods in VariantDocWriter, UnifiedSparseColumnWriter, VariantDocCompactWriter, and VariantColumnWriterImpl so that they match the ColumnWriter interface:

uint64_t get_raw_data_bytes() const;
uint64_t get_total_uncompressed_data_pages_bytes() const;
uint64_t get_total_compressed_data_pages_bytes() const;
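
For illustration only, here is a minimal sketch of how a writer that wraps several sub-writers could report these three sizes by summing over its children. Only the three method signatures come from this PR; the class names SizeReporter, LeafWriter, and CompositeWriter are hypothetical and are not the actual Doris classes, and the real implementation may compute the sizes differently.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for the size-reporting part of a writer interface.
// Only the three method signatures are taken from this PR.
class SizeReporter {
public:
    virtual ~SizeReporter() = default;
    virtual uint64_t get_raw_data_bytes() const = 0;
    virtual uint64_t get_total_uncompressed_data_pages_bytes() const = 0;
    virtual uint64_t get_total_compressed_data_pages_bytes() const = 0;
};

// Illustrative leaf writer that simply stores its byte counters.
class LeafWriter : public SizeReporter {
public:
    LeafWriter(uint64_t raw, uint64_t uncompressed, uint64_t compressed)
            : _raw(raw), _uncompressed(uncompressed), _compressed(compressed) {}

    uint64_t get_raw_data_bytes() const override { return _raw; }
    uint64_t get_total_uncompressed_data_pages_bytes() const override { return _uncompressed; }
    uint64_t get_total_compressed_data_pages_bytes() const override { return _compressed; }

private:
    uint64_t _raw;
    uint64_t _uncompressed;
    uint64_t _compressed;
};

// Illustrative composite writer: each size is the sum over all sub-writers.
class CompositeWriter : public SizeReporter {
public:
    void add_child(std::unique_ptr<SizeReporter> child) {
        assert(child != nullptr);
        _children.push_back(std::move(child));
    }

    uint64_t get_raw_data_bytes() const override {
        uint64_t total = 0;
        for (const auto& child : _children) total += child->get_raw_data_bytes();
        return total;
    }

    uint64_t get_total_uncompressed_data_pages_bytes() const override {
        uint64_t total = 0;
        for (const auto& child : _children) {
            total += child->get_total_uncompressed_data_pages_bytes();
        }
        return total;
    }

    uint64_t get_total_compressed_data_pages_bytes() const override {
        uint64_t total = 0;
        for (const auto& child : _children) {
            total += child->get_total_compressed_data_pages_bytes();
        }
        return total;
    }

private:
    std::vector<std::unique_ptr<SizeReporter>> _children;
};
```

In this sketch, a CompositeWriter holding two LeafWriter children constructed as (100, 80, 40) and (50, 40, 10) reports 150 raw bytes, 120 uncompressed bytes, and 50 compressed bytes.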

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Mar 14, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@hoshinojyunn
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 26817 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f6504d0b70cf30c712b4e882f9861af10e23693b, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17694	4396	4288	4288
q2
q3	10643	781	505	505
q4	4673	370	255	255
q5	7563	1204	1026	1026
q6	178	178	147	147
q7	794	858	692	692
q8	9300	1468	1344	1344
q9	4872	4788	4768	4768
q10	6339	1902	1665	1665
q11	441	270	257	257
q12	755	572	468	468
q13	18055	2929	2172	2172
q14	228	233	225	225
q15
q16	747	743	661	661
q17	726	834	431	431
q18	5988	5398	5198	5198
q19	1132	1015	620	620
q20	532	489	388	388
q21	4521	1851	1412	1412
q22	418	492	295	295
Total cold run time: 95599 ms
Total hot run time: 26817 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4842	4648	4626	4626
q2
q3	3870	4360	3833	3833
q4	916	1234	819	819
q5	4148	4411	4368	4368
q6	202	189	151	151
q7	1843	1683	1524	1524
q8	2515	2766	2617	2617
q9	7476	7480	7396	7396
q10	3760	4002	3583	3583
q11	512	436	421	421
q12	549	653	477	477
q13	2855	3382	2430	2430
q14	289	333	296	296
q15
q16	715	783	720	720
q17	1161	1410	1342	1342
q18	7109	6860	6659	6659
q19	1010	925	899	899
q20	2083	2161	1955	1955
q21	3976	3478	3359	3359
q22	459	429	389	389
Total cold run time: 50290 ms
Total hot run time: 47864 ms

@doris-robot

TPC-DS: Total hot run time: 168181 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f6504d0b70cf30c712b4e882f9861af10e23693b, data reload: false

query5	4317	647	526	526
query6	328	237	215	215
query7	4209	470	277	277
query8	339	253	242	242
query9	8695	2712	2726	2712
query10	494	371	350	350
query11	7029	5106	4893	4893
query12	183	122	123	122
query13	1277	456	348	348
query14	5585	3654	3461	3461
query14_1	2820	2803	2819	2803
query15	215	195	180	180
query16	989	461	454	454
query17	872	727	626	626
query18	2440	451	356	356
query19	211	206	185	185
query20	132	130	125	125
query21	215	131	114	114
query22	13418	14044	14338	14044
query23	16324	15834	15528	15528
query23_1	15865	16055	15532	15532
query24	7310	1617	1228	1228
query24_1	1225	1227	1236	1227
query25	564	482	429	429
query26	1243	305	142	142
query27	2747	483	313	313
query28	4491	1856	1826	1826
query29	856	552	464	464
query30	298	231	188	188
query31	1006	936	867	867
query32	87	71	70	70
query33	519	329	274	274
query34	914	865	530	530
query35	634	673	597	597
query36	1057	1141	960	960
query37	139	90	85	85
query38	2947	2958	2883	2883
query39	861	829	821	821
query39_1	797	793	789	789
query40	233	150	136	136
query41	75	61	60	60
query42	256	255	256	255
query43	247	253	220	220
query44	
query45	200	192	183	183
query46	894	982	611	611
query47	2076	2145	2018	2018
query48	321	316	245	245
query49	624	500	387	387
query50	687	273	213	213
query51	4102	4088	4023	4023
query52	266	266	262	262
query53	290	335	283	283
query54	304	270	276	270
query55	89	88	83	83
query56	310	328	310	310
query57	1958	1708	1718	1708
query58	285	288	274	274
query59	2767	2947	2753	2753
query60	333	334	327	327
query61	144	148	149	148
query62	637	595	566	566
query63	317	280	271	271
query64	5116	1255	983	983
query65	
query66	1471	460	362	362
query67	24303	24381	24209	24209
query68	
query69	392	318	280	280
query70	943	995	960	960
query71	337	315	299	299
query72	2705	2615	2413	2413
query73	529	557	318	318
query74	9605	9589	9359	9359
query75	2862	2739	2502	2502
query76	2287	1018	659	659
query77	368	367	307	307
query78	10923	11113	10406	10406
query79	2380	779	590	590
query80	1748	634	545	545
query81	549	253	223	223
query82	996	149	123	123
query83	334	264	244	244
query84	300	113	103	103
query85	918	483	442	442
query86	407	303	276	276
query87	3194	3096	3063	3063
query88	3575	2641	2663	2641
query89	428	371	342	342
query90	2031	183	179	179
query91	171	156	134	134
query92	75	73	68	68
query93	1026	836	491	491
query94	630	328	287	287
query95	589	341	384	341
query96	658	511	227	227
query97	2465	2479	2427	2427
query98	250	225	218	218
query99	1008	1007	910	910
Total cold run time: 250767 ms
Total hot run time: 168181 ms

@doris-robot

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.71% (19749/37467)
Line Coverage 36.27% (184472/508548)
Region Coverage 32.41% (142498/439616)
Branch Coverage 33.60% (62266/185329)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.69% (26290/36672)
Line Coverage 54.47% (276079/506826)
Region Coverage 51.72% (229479/443664)
Branch Coverage 53.12% (98698/185811)

hoshinojyunn force-pushed the fix_variant_data_size branch from f6504d0 to 7294990 on March 15, 2026 08:14
@hoshinojyunn
Contributor Author

run buildall

@doris-robot

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.63% (19715/37457)
Line Coverage 36.22% (184158/508437)
Region Coverage 32.36% (142159/439239)
Branch Coverage 33.55% (62147/185238)

@doris-robot

TPC-H: Total hot run time: 27135 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 7294990e8c3b796d6e7585ae84042e7897e8697f, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17609	4463	4287	4287
q2
q3	10650	801	527	527
q4	4698	371	249	249
q5	7777	1211	1034	1034
q6	186	176	146	146
q7	823	855	669	669
q8	10307	1484	1349	1349
q9	5433	4762	4728	4728
q10	6321	1948	1646	1646
q11	491	259	241	241
q12	766	565	468	468
q13	18075	2962	2198	2198
q14	235	229	215	215
q15
q16	749	748	678	678
q17	743	824	470	470
q18	6312	5440	5364	5364
q19	1197	983	627	627
q20	535	484	381	381
q21	4436	2129	1556	1556
q22	381	328	302	302
Total cold run time: 97724 ms
Total hot run time: 27135 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4681	4600	4683	4600
q2
q3	3891	4377	3818	3818
q4	846	1195	765	765
q5	4047	4429	4527	4429
q6	193	172	141	141
q7	1779	1727	1601	1601
q8	2445	2715	2609	2609
q9	7716	7366	7467	7366
q10	3785	3944	3539	3539
q11	527	449	420	420
q12	500	610	548	548
q13	2853	3235	2312	2312
q14	305	309	285	285
q15
q16	738	753	739	739
q17	1220	1391	1367	1367
q18	7488	7099	6787	6787
q19	958	966	938	938
q20	2051	2193	1984	1984
q21	3997	3454	3362	3362
q22	486	427	380	380
Total cold run time: 50506 ms
Total hot run time: 47990 ms

@doris-robot

TPC-DS: Total hot run time: 168579 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 7294990e8c3b796d6e7585ae84042e7897e8697f, data reload: false

query5	4327	658	538	538
query6	324	232	216	216
query7	4207	488	266	266
query8	338	240	230	230
query9	8748	2763	2723	2723
query10	501	369	346	346
query11	7034	5089	4871	4871
query12	196	132	128	128
query13	1270	469	358	358
query14	5769	3737	3542	3542
query14_1	2860	2816	2831	2816
query15	205	197	176	176
query16	972	467	475	467
query17	883	721	639	639
query18	2449	491	341	341
query19	215	215	179	179
query20	131	129	126	126
query21	221	126	113	113
query22	13155	14176	14595	14176
query23	16049	15895	15587	15587
query23_1	15677	15664	15318	15318
query24	7185	1580	1189	1189
query24_1	1225	1232	1227	1227
query25	553	457	404	404
query26	1254	269	159	159
query27	2755	498	296	296
query28	4477	1855	1841	1841
query29	858	600	462	462
query30	298	233	191	191
query31	1007	953	890	890
query32	78	74	71	71
query33	499	340	284	284
query34	891	873	522	522
query35	640	681	599	599
query36	1074	1096	960	960
query37	144	99	83	83
query38	2909	2942	2934	2934
query39	859	832	818	818
query39_1	788	790	783	783
query40	225	149	136	136
query41	63	60	61	60
query42	255	254	257	254
query43	239	262	218	218
query44	
query45	199	190	183	183
query46	891	988	605	605
query47	2642	2098	2042	2042
query48	332	336	234	234
query49	632	461	389	389
query50	698	283	212	212
query51	4063	4012	4074	4012
query52	261	265	255	255
query53	328	333	290	290
query54	298	272	269	269
query55	97	90	83	83
query56	329	370	312	312
query57	1933	1780	1740	1740
query58	288	281	289	281
query59	2793	2940	2724	2724
query60	358	339	333	333
query61	153	141	150	141
query62	639	594	548	548
query63	316	280	273	273
query64	5149	1234	999	999
query65	
query66	1481	459	366	366
query67	24249	24306	24150	24150
query68	
query69	395	311	296	296
query70	977	981	953	953
query71	348	323	310	310
query72	2781	2798	2574	2574
query73	542	557	328	328
query74	9631	9556	9363	9363
query75	2838	2744	2504	2504
query76	2295	1036	700	700
query77	362	398	325	325
query78	10846	11075	10471	10471
query79	1115	827	602	602
query80	1356	690	564	564
query81	555	270	228	228
query82	988	157	121	121
query83	345	305	259	259
query84	304	129	102	102
query85	967	499	449	449
query86	408	304	318	304
query87	3253	3189	3015	3015
query88	3584	2662	2653	2653
query89	424	370	345	345
query90	2028	189	186	186
query91	167	167	140	140
query92	76	77	75	75
query93	969	897	510	510
query94	632	342	296	296
query95	578	349	314	314
query96	642	529	229	229
query97	2473	2468	2390	2390
query98	234	225	221	221
query99	957	977	923	923
Total cold run time: 249492 ms
Total hot run time: 168579 ms

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.59% (26246/36663)
Line Coverage 54.38% (275555/506724)
Region Coverage 51.56% (228581/443288)
Branch Coverage 52.99% (98410/185720)
