
[VL] Flushable distinct agg caused correctness issue #4421

Closed · Yohahaha opened this issue Jan 16, 2024 · 5 comments · Fixed by #4443
Labels: bug, triage

Comments

@Yohahaha (Contributor)

Backend

VL (Velox)

Bug description

TPC-DS q38 contains a distinct aggregation. When the partial distinct aggregation is flushed, the final result is incorrect.

[Attached screenshots: query plan with flush triggered vs. not triggered]

expect: [26056]
actual: [26131]
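For context, a minimal sketch of the failing query shape (a hypothetical Scala repro, not the literal q38 text; the table and column names are made up, and whether a flush actually triggers depends on the partial-aggregation memory settings):

```scala
// Hypothetical minimal repro of the q38 shape: a pure count(DISTINCT ...)
// whose dedup aggregation could be made flushable. For this data the
// correct answer is exactly 26056.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[2]")
  .appName("flushable-distinct-repro")
  .getOrCreate()

spark.range(0, 10000000L)
  .selectExpr("id % 26056 AS c_customer_sk")
  .createOrReplaceTempView("t")

// Spark expands count(DISTINCT x) into roughly:
//   Aggregate(keys = x) [partial] -> Exchange(x) -> Aggregate(keys = x) [dedup]
//   -> Aggregate [partial count] -> Exchange [single] -> Aggregate [final count]
// The dedup step feeds partial count directly. If it is converted to a
// flushable aggregation, a key flushed early can be emitted again later,
// and partial count counts it more than once.
spark.sql("SELECT count(DISTINCT c_customer_sk) FROM t").show()
```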

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response

Yohahaha added the bug and triage labels on Jan 16, 2024
@rui-mo (Contributor) commented Jan 16, 2024

It seems the aggregation before partial_count should not be flushable. Can we rely fully on whether the data is sent directly to an exchange to decide whether a flushable aggregation can be generated? cc @zhztheplayer

@Yohahaha (Contributor, Author)

When the aggregate function list is empty, we must not use a flushable agg, as it may produce duplicate rows.
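A minimal sketch of the guard this implies, written against vanilla Spark's HashAggregateExec for illustration (the actual FlushableAggregateRule operates on Gluten's transformer nodes):

```scala
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.aggregate.HashAggregateExec

// When the aggregate-expression list is empty, the node is a pure
// distinct aggregation: its only contract is that each group key is
// emitted once. A flush can re-emit a key, so such nodes must stay
// non-flushable unless a downstream re-aggregation de-duplicates them.
def mayConvertToFlushable(plan: SparkPlan): Boolean = plan match {
  case agg: HashAggregateExec if agg.aggregateExpressions.isEmpty => false
  case _: HashAggregateExec                                       => true
  case _                                                          => false
}
```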

@zhztheplayer (Member) commented Jan 17, 2024

Thank you for reporting, @Yohahaha. I think this is very likely similar to the possible corner case I mentioned in #4312 (comment):

“if partial aggregation is already handling data that is partitioned by the exchange (aggregate) keys”

We should enhance FlushableAggregateRule to handle this case.

Our CI tests with result comparisons probably don't process enough data to trigger a flush. I'll look into that as well.

@zhztheplayer (Member) commented Jan 17, 2024

“When the aggregate function list is empty, we must not use a flushable agg, as it may produce duplicate rows.”

Flushing on a distinct aggregation can be useful, e.g., flushing on the scan side and then doing the final distinct on the reducers. The problem here is that Spark has specialized logic for count(distinct); thought of generally, though, it is about the distribution of the aggregate keys. If there is an exchange on the exit side of the aggregation, and that exchange is enforced to redistribute the rows populated for the same aggregate group on different mappers to the same reducer, then the partial aggregation can be considered safe to flush.
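A hedged sketch of that safety condition, again written against vanilla Spark operators rather than the actual FlushableAggregateRule code:

```scala
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.plans.physical.HashPartitioning
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.exchange.ShuffleExchangeExec

// Flushing a partial aggregation is safe only if its output is hashed
// on the aggregate keys by a shuffle on the exit side: then all
// fragments flushed for one group, on whichever mappers, meet on the
// same reducer, where the final aggregation merges them back together.
def flushedFragmentsAreMerged(groupingKeys: Seq[Expression],
                              exitSide: SparkPlan): Boolean =
  exitSide match {
    case s: ShuffleExchangeExec =>
      s.outputPartitioning match {
        case HashPartitioning(exprs, _) =>
          // Every shuffle key must be one of the aggregate keys.
          exprs.forall(e => groupingKeys.exists(_.semanticEquals(e)))
        case _ => false
      }
    case _ => false
  }
```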

@zhztheplayer (Member)

Update: the issue should impact Spark versions >= 3.3.
