[Fix](variance) Fix sample variance/stddev null res for single value#63605
[Fix](variance) Fix sample variance/stddev null res for single value#63605linrrzqqq wants to merge 2 commits into
Conversation
|
/review |
|
run buildall |
There was a problem hiding this comment.
Review result: no blocking issues found. Critical checkpoints: goal is clear and implemented: sample variance/stddev now return NULL for zero or one non-null input, including grouped results; tests cover existing nullable aggregate suites plus a single-row aggregate case. Scope is small and focused on the two sample aggregate functions plus expected output changes. Concurrency/lifecycle: no new shared state, threads, locks, or lifecycle-sensitive objects. Configuration/compatibility: no configs added; BE function compatibility restriction remains in place and aggregate state serialization is unchanged. Parallel paths: variance_samp, var_samp alias, and stddev_samp paths are covered; population variance/stddev behavior is intentionally unchanged. Error handling/memory: no ignored Status paths or new owned allocations. Data correctness: nullable inputs are skipped and result nullability now matches SQL sample aggregate semantics for count <= 1. Tests: regression expectations were updated; I did not run the regression suite in this review runner. User focus: no additional user-provided review focus.
TPC-H: Total hot run time: 31648 ms |
TPC-DS: Total hot run time: 172613 ms |
FE Regression Coverage ReportIncrement line coverage |
|
run buildall |
|
/review |
There was a problem hiding this comment.
Review opinion: no blocking issues found in the PR diff.
Critical checkpoint conclusions:
- Goal/test proof: The PR changes
VAR_SAMP/VARIANCE_SAMP/STDDEV_SAMPto return NULL for zero or one valid input and updates aggregate/window/foreach regression outputs covering the changed behavior. - Scope: The actual GitHub PR diff is focused on sample variance/stddev return nullability plus expected results.
- Concurrency/lifecycle: No new concurrency, locking, thread lifecycle, or static initialization concerns were introduced in the reviewed diff.
- Configuration/compatibility: No new configuration items. BE already registers
variance_samp/stddev_sampas restricted function compatibility paths; aggregate state serialization is unchanged. - Parallel paths: BE execution, Nereids FE nullability, aliases through existing registration, window outputs, and foreach outputs were considered. I did not find a missed parallel path in the PR diff.
- Conditional checks: The
count <= 1result-null condition matches the sample variance/stddev definition and null inputs continue to be ignored. - Tests/results: Regression expected outputs were updated for normal aggregate, nullable aggregate, window, MV, and foreach cases. I did not run the tests locally in this review runner.
- Observability: No new observability needed for this scalar aggregate behavior change.
- Transaction/persistence/data writes: Not applicable.
- FE-BE variable passing/protocol: No new transmitted variables or thrift/protocol changes.
- Performance: The added nullable-column check is local to aggregate add and avoids wrapper overhead for these functions; no obvious performance regression found.
User focus: No additional user-provided review focus was present.
TPC-H: Total hot run time: 31169 ms |
TPC-DS: Total hot run time: 171137 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
Problem Summary:
Fix
VAR_SAMP,VARIANCE_SAMP, andSTDDEV_SAMPto returnNULLwhen the number of valid input values is less than or equal to 1. Sample variance/stddev are undefined forn <= 1, so returning0.0is misleading.before:
now:
doc: apache/doris-website#3765