[Fix](variance) Fix sample variance/stddev null res for single value #63605

Open
linrrzqqq wants to merge 2 commits into apache:master from linrrzqqq:fix-sample-single-val

@linrrzqqq
Collaborator

@linrrzqqq linrrzqqq commented May 25, 2026

Problem Summary:

Fix VAR_SAMP, VARIANCE_SAMP, and STDDEV_SAMP to return NULL when the number of valid input values is less than or equal to 1. Sample variance/stddev are undefined for n <= 1, so returning 0.0 is misleading.
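
For reference, a minimal Python sketch of the semantics described above. It is not the Doris implementation: the helper names `var_samp` and `stddev_samp` are hypothetical stand-ins, and `None` plays the role of SQL NULL, both as a skipped input and as the result when fewer than two valid values remain.

```python
import math
from typing import Iterable, Optional


def var_samp(values: Iterable[Optional[float]]) -> Optional[float]:
    """Sample variance with an (n - 1) denominator; None stands in for SQL NULL."""
    # NULL inputs are ignored, as in SQL aggregates.
    xs = [v for v in values if v is not None]
    n = len(xs)
    if n <= 1:
        # Undefined for n <= 1: the (n - 1) denominator would be zero or negative.
        return None
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def stddev_samp(values: Iterable[Optional[float]]) -> Optional[float]:
    """Sample standard deviation; None whenever the sample variance is None."""
    var = var_samp(values)
    return None if var is None else math.sqrt(var)
```

Under these assumptions, a single row such as `[5.0]` yields `None` from both helpers, while `[1.0, 3.0]` yields a sample variance of 2.0.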

before:

CREATE TABLE t (id INT, v DOUBLE) DUPLICATE KEY(id) DISTRIBUTED BY HASH(id) BUCKETS 1 PROPERTIES('replication_num'='1');
INSERT INTO t VALUES (1, 5.0);  -- single row

SELECT VAR_SAMP(v), STDDEV_SAMP(v) FROM t;
+-------------+----------------+
| VAR_SAMP(v) | STDDEV_SAMP(v) |
+-------------+----------------+
|           0 |              0 |
+-------------+----------------+

now:

SELECT VAR_SAMP(v), STDDEV_SAMP(v) FROM t;
+-------------+----------------+
| VAR_SAMP(v) | STDDEV_SAMP(v) |
+-------------+----------------+
|        NULL |           NULL |
+-------------+----------------+

doc: apache/doris-website#3765

@linrrzqqq
Collaborator Author

/review

@linrrzqqq
Collaborator Author

run buildall

Contributor

@github-actions github-actions Bot left a comment

Review result: no blocking issues found.

Critical checkpoints:

  • Goal: clear and implemented. Sample variance/stddev now return NULL for zero or one non-null input, including grouped results; tests cover existing nullable aggregate suites plus a single-row aggregate case.
  • Scope: small and focused on the two sample aggregate functions plus expected output changes.
  • Concurrency/lifecycle: no new shared state, threads, locks, or lifecycle-sensitive objects.
  • Configuration/compatibility: no configs added; BE function compatibility restriction remains in place and aggregate state serialization is unchanged.
  • Parallel paths: variance_samp, var_samp alias, and stddev_samp paths are covered; population variance/stddev behavior is intentionally unchanged.
  • Error handling/memory: no ignored Status paths or new owned allocations.
  • Data correctness: nullable inputs are skipped and result nullability now matches SQL sample aggregate semantics for count <= 1.
  • Tests: regression expectations were updated; I did not run the regression suite in this review runner.

User focus: no additional user-provided review focus.

@hello-stephen
Contributor

TPC-H: Total hot run time: 31648 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 0d3684c8240240a8d33abfaea6a896b884d8301a, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17638	4105	4110	4105
q2	q3	10815	1419	824	824
q4	4686	471	354	354
q5	7642	2256	2110	2110
q6	248	180	140	140
q7	1011	760	658	658
q8	9365	1768	1612	1612
q9	6568	4998	4966	4966
q10	6471	2219	1954	1954
q11	458	274	241	241
q12	703	428	298	298
q13	18184	3344	2725	2725
q14	262	255	233	233
q15	q16	815	787	710	710
q17	971	990	929	929
q18	6917	5878	5759	5759
q19	1240	1281	1054	1054
q20	515	383	265	265
q21	5745	2613	2402	2402
q22	428	368	309	309
Total cold run time: 100682 ms
Total hot run time: 31648 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4513	4452	4450	4450
q2	q3	4581	5005	4329	4329
q4	2272	2324	1477	1477
q5	4512	4397	5014	4397
q6	262	216	156	156
q7	2096	1976	1681	1681
q8	2630	2325	2291	2291
q9	8268	8084	8160	8084
q10	4829	4970	4369	4369
q11	612	476	412	412
q12	820	792	585	585
q13	3280	3706	2969	2969
q14	289	323	274	274
q15	q16	765	777	676	676
q17	1421	1414	1394	1394
q18	8457	7551	6822	6822
q19	1086	1108	1105	1105
q20	2251	2237	1970	1970
q21	5438	4806	4649	4649
q22	553	457	420	420
Total cold run time: 58935 ms
Total hot run time: 52510 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 172613 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 0d3684c8240240a8d33abfaea6a896b884d8301a, data reload: false

query5	4315	671	516	516
query6	347	222	198	198
query7	4257	545	297	297
query8	327	230	219	219
query9	8836	4162	4162	4162
query10	453	358	306	306
query11	5816	2535	2264	2264
query12	181	133	127	127
query13	1337	610	444	444
query14	6301	5609	5345	5345
query14_1	4641	4643	4632	4632
query15	219	210	192	192
query16	1015	463	446	446
query17	1160	751	626	626
query18	2674	503	368	368
query19	219	206	168	168
query20	142	145	133	133
query21	216	142	121	121
query22	13665	13555	13395	13395
query23	17410	16562	16294	16294
query23_1	16397	16353	16417	16353
query24	7794	1788	1331	1331
query24_1	1308	1322	1363	1322
query25	595	495	446	446
query26	1312	313	177	177
query27	2707	598	361	361
query28	4447	2034	2027	2027
query29	1052	655	529	529
query30	310	243	202	202
query31	1143	1095	970	970
query32	87	75	81	75
query33	560	366	308	308
query34	1177	1186	683	683
query35	778	792	694	694
query36	1379	1420	1274	1274
query37	153	101	97	97
query38	3272	3195	3078	3078
query39	906	902	863	863
query39_1	870	859	872	859
query40	226	147	131	131
query41	66	64	61	61
query42	110	110	112	110
query43	328	331	299	299
query44	
query45	217	204	196	196
query46	1080	1229	769	769
query47	2404	2324	2234	2234
query48	409	418	305	305
query49	639	499	410	410
query50	1095	362	266	266
query51	4432	4372	4338	4338
query52	117	107	99	99
query53	259	304	205	205
query54	324	292	258	258
query55	95	98	87	87
query56	306	309	313	309
query57	1438	1382	1323	1323
query58	298	266	296	266
query59	1575	1675	1447	1447
query60	320	324	311	311
query61	157	155	153	153
query62	710	655	571	571
query63	239	205	206	205
query64	2375	781	630	630
query65	
query66	1694	489	357	357
query67	29440	29998	29830	29830
query68	
query69	469	354	311	311
query70	1040	1030	970	970
query71	302	283	274	274
query72	2987	2796	2451	2451
query73	834	742	460	460
query74	4861	4682	4559	4559
query75	2689	2599	2275	2275
query76	2294	1186	791	791
query77	416	438	342	342
query78	12489	12467	11833	11833
query79	1473	1009	734	734
query80	1361	547	450	450
query81	518	284	237	237
query82	1219	161	121	121
query83	368	300	255	255
query84	267	140	110	110
query85	947	551	450	450
query86	458	358	323	323
query87	3470	3422	3279	3279
query88	3668	2774	2751	2751
query89	451	387	343	343
query90	1896	187	185	185
query91	177	167	146	146
query92	82	78	74	74
query93	1529	1375	905	905
query94	724	339	313	313
query95	697	483	351	351
query96	1013	778	355	355
query97	2759	2708	2597	2597
query98	239	226	234	226
query99	1158	1165	1030	1030
Total cold run time: 255960 ms
Total hot run time: 172613 ms

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (6/6) 🎉
Increment coverage report
Complete coverage report

@linrrzqqq
Collaborator Author

run buildall

@linrrzqqq
Collaborator Author

/review

Contributor

@github-actions github-actions Bot left a comment

Review opinion: no blocking issues found in the PR diff.

Critical checkpoint conclusions:

  • Goal/test proof: The PR changes VAR_SAMP/VARIANCE_SAMP/STDDEV_SAMP to return NULL for zero or one valid input and updates aggregate/window/foreach regression outputs covering the changed behavior.
  • Scope: The actual GitHub PR diff is focused on sample variance/stddev return nullability plus expected results.
  • Concurrency/lifecycle: No new concurrency, locking, thread lifecycle, or static initialization concerns were introduced in the reviewed diff.
  • Configuration/compatibility: No new configuration items. BE already registers variance_samp/stddev_samp as restricted function compatibility paths; aggregate state serialization is unchanged.
  • Parallel paths: BE execution, Nereids FE nullability, aliases through existing registration, window outputs, and foreach outputs were considered. I did not find a missed parallel path in the PR diff.
  • Conditional checks: The count <= 1 result-null condition matches the sample variance/stddev definition and null inputs continue to be ignored.
  • Tests/results: Regression expected outputs were updated for normal aggregate, nullable aggregate, window, MV, and foreach cases. I did not run the tests locally in this review runner.
  • Observability: No new observability needed for this scalar aggregate behavior change.
  • Transaction/persistence/data writes: Not applicable.
  • FE-BE variable passing/protocol: No new transmitted variables or thrift/protocol changes.
  • Performance: The added nullable-column check is local to aggregate add and avoids wrapper overhead for these functions; no obvious performance regression found.
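
The count <= 1 condition and the null-skipping behavior noted in the checkpoints above can be sketched as a streaming aggregate state. This is a hedged Python illustration using Welford's update and the standard pairwise merge, not the BE code; the class `SampleVarState` and its method names are hypothetical, and `None` stands in for SQL NULL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SampleVarState:
    """Hypothetical aggregate state: count, running mean, sum of squared deviations."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def add(self, value: Optional[float]) -> None:
        if value is None:
            # NULL inputs leave the state untouched.
            return
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    def merge(self, other: "SampleVarState") -> None:
        # Combine two partial states, e.g. computed on different partitions.
        if other.count == 0:
            return
        total = self.count + other.count
        delta = other.mean - self.mean
        self.m2 += other.m2 + delta * delta * self.count * other.count / total
        self.mean += delta * other.count / total
        self.count = total

    def result(self) -> Optional[float]:
        # Sample variance is NULL (None) when fewer than two values were seen.
        if self.count <= 1:
            return None
        return self.m2 / (self.count - 1)
```

In this sketch a group holding a single non-null value ends with `count == 1`, so `result()` returns `None`; the stored state fields themselves are the same whether or not the result is nullable.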

User focus: No additional user-provided review focus was present.

@hello-stephen
Contributor

TPC-H: Total hot run time: 31169 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 68e902d80bad105073ea4681bb069a7a192b3259, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17613	3957	3931	3931
q2	q3	10759	1367	802	802
q4	4692	475	348	348
q5	7593	2280	2117	2117
q6	250	178	137	137
q7	957	768	643	643
q8	9363	1791	1498	1498
q9	6027	5015	4916	4916
q10	6499	2229	1868	1868
q11	444	274	249	249
q12	689	428	296	296
q13	18257	3327	2800	2800
q14	269	258	240	240
q15	q16	831	782	712	712
q17	880	925	991	925
q18	6970	5670	5644	5644
q19	1242	1267	1095	1095
q20	539	406	261	261
q21	5660	2587	2374	2374
q22	436	362	313	313
Total cold run time: 99970 ms
Total hot run time: 31169 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4330	4261	4259	4259
q2	q3	4542	4966	4392	4392
q4	2105	2231	1426	1426
q5	4450	4333	4374	4333
q6	229	175	163	163
q7	2153	1846	1710	1710
q8	2515	2157	2117	2117
q9	8044	8041	8005	8005
q10	4791	4768	4369	4369
q11	570	597	389	389
q12	749	753	571	571
q13	3323	3574	3021	3021
q14	289	289	293	289
q15	q16	731	745	669	669
q17	1402	1355	1388	1355
q18	8003	7529	6876	6876
q19	1090	1064	1096	1064
q20	2232	2243	1959	1959
q21	5321	4646	4537	4537
q22	527	469	408	408
Total cold run time: 57396 ms
Total hot run time: 51912 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 171137 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 68e902d80bad105073ea4681bb069a7a192b3259, data reload: false

query5	4326	653	523	523
query6	335	225	211	211
query7	4216	540	316	316
query8	323	232	221	221
query9	8816	4128	4115	4115
query10	451	344	300	300
query11	5830	2567	2246	2246
query12	183	123	123	123
query13	1257	595	433	433
query14	6156	5500	5155	5155
query14_1	4504	4488	4472	4472
query15	212	204	186	186
query16	997	454	411	411
query17	1120	718	582	582
query18	2478	474	350	350
query19	221	212	164	164
query20	140	134	134	134
query21	214	140	122	122
query22	13691	13598	13453	13453
query23	17366	16639	16284	16284
query23_1	16293	16453	16307	16307
query24	7497	1792	1321	1321
query24_1	1353	1359	1339	1339
query25	572	497	454	454
query26	1335	342	177	177
query27	2692	552	353	353
query28	4447	2013	2034	2013
query29	1003	657	520	520
query30	307	249	200	200
query31	1141	1091	955	955
query32	86	79	74	74
query33	563	355	312	312
query34	1201	1143	663	663
query35	784	811	690	690
query36	1461	1442	1255	1255
query37	152	105	93	93
query38	3263	3218	3098	3098
query39	912	897	870	870
query39_1	882	858	863	858
query40	233	152	130	130
query41	71	68	67	67
query42	112	111	116	111
query43	331	342	293	293
query44	
query45	220	201	199	199
query46	1084	1177	754	754
query47	2369	2454	2300	2300
query48	421	402	306	306
query49	655	509	399	399
query50	984	375	261	261
query51	4369	4347	4311	4311
query52	111	108	96	96
query53	257	289	206	206
query54	334	308	270	270
query55	97	93	98	93
query56	358	335	327	327
query57	1457	1469	1367	1367
query58	329	284	283	283
query59	1723	1755	1466	1466
query60	362	325	318	318
query61	180	151	145	145
query62	707	646	584	584
query63	245	198	199	198
query64	2423	809	647	647
query65	
query66	1717	484	371	371
query67	29061	29693	28918	28918
query68	
query69	459	360	308	308
query70	1074	951	998	951
query71	307	269	267	267
query72	2996	2659	2030	2030
query73	839	756	438	438
query74	4868	4685	4505	4505
query75	2704	2618	2269	2269
query76	2273	1183	787	787
query77	406	412	321	321
query78	12406	12541	11831	11831
query79	1458	1045	757	757
query80	1300	539	463	463
query81	518	279	240	240
query82	1160	157	127	127
query83	319	282	245	245
query84	266	141	110	110
query85	923	534	469	469
query86	472	353	328	328
query87	3458	3386	3279	3279
query88	3648	2718	2724	2718
query89	457	385	350	350
query90	1897	178	191	178
query91	178	167	139	139
query92	80	77	73	73
query93	1512	1482	876	876
query94	736	330	299	299
query95	676	472	344	344
query96	988	785	363	363
query97	2759	2757	2644	2644
query98	238	229	228	228
query99	1184	1164	1044	1044
Total cold run time: 254488 ms
Total hot run time: 171137 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 0.00% (0/19) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.85% (20891/38797)
Line Coverage 37.43% (198010/528947)
Region Coverage 33.73% (155124/459922)
Branch Coverage 34.71% (67507/194480)
