fix: suppress nondeterministic metrics in agg_dyn_e2e sqllogictest #21657

Merged: mbutrovich merged 1 commit into apache:main from mbutrovich:fix_extended_tests on Apr 15, 2026
@mbutrovich (Contributor)

Which issue does this PR close?

  • Closes #.

Rationale for this change

PR #21620 (commit 5c653be) ported test_aggregate_dynamic_filter_parquet_e2e from Rust to sqllogictest using analyze_categories = 'rows', which includes exact pushdown metrics. These metrics are nondeterministic under parallel execution — the order in which Partial aggregates publish dynamic filter updates races against when the scan reads each partition — so the expected output is flaky.

Noticed on #21629 (CI run) and confirmed on main (CI run).

What changes are included in this PR?

Switch the agg_dyn_e2e test to analyze_level = summary and analyze_categories = 'none', which suppresses the nondeterministic metrics. This matches the approach already used by the other aggregate dynamic filter tests in the same file. The original Rust test only asserted matched < 4 (i.e. some pruning happened). The important invariant, that the DynamicFilter [ column1@0 > 4 ] text and pruning predicate are correct, is still verified.
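For readers unfamiliar with these options, the settings change described above could look roughly like the following. This is only a sketch assembled from the option names and values quoted in this PR. The exact placement in the test file and the surrounding EXPLAIN ANALYZE query are not shown in this conversation and are left out rather than guessed.

```sql
-- Before (flaky): row-level metrics, including exact pushdown counts,
-- are part of the expected EXPLAIN ANALYZE output.
-- set datafusion.explain.analyze_categories = 'rows';

-- After: summary-level output with no metric categories, so the expected
-- plan text no longer depends on scan/filter-update timing.
set datafusion.explain.analyze_level = summary;
set datafusion.explain.analyze_categories = 'none';
```

In a `.slt` file each of these statements would normally sit under its own `statement ok` directive.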

Are these changes tested?

Yes — the test itself is what is being fixed.

Are there any user-facing changes?

No.

@mbutrovich mbutrovich requested a review from adriangb April 15, 2026 21:26
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Apr 15, 2026
@comphead (Contributor) left a comment

Thanks @mbutrovich

02)--CoalescePartitionsExec, metrics=[output_rows=2, output_batches=2]
03)----AggregateExec: mode=Partial, gby=[], aggr=[max(agg_dyn_e2e.column1)], metrics=[output_rows=2, output_batches=2]
04)------DataSourceExec: file_groups={2 groups: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/push_down_filter_regression/agg_dyn/file_0.parquet, WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/push_down_filter_regression/agg_dyn/file_1.parquet], [WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/push_down_filter_regression/agg_dyn/file_2.parquet, WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/push_down_filter_regression/agg_dyn/file_3.parquet]]}, projection=[column1], file_type=parquet, predicate=column1@0 > 1 AND DynamicFilter [ column1@0 > 4 ], pruning_predicate=column1_null_count@1 != row_count@2 AND column1_max@0 > 1 AND column1_null_count@1 != row_count@2 AND column1_max@0 > 4, required_guarantees=[], metrics=[output_rows=2, output_batches=2, files_ranges_pruned_statistics=4 total → 4 matched, row_groups_pruned_statistics=4 total → 2 matched -> 2 fully matched, row_groups_pruned_bloom_filter=2 total → 2 matched, page_index_pages_pruned=2 total → 2 matched, page_index_rows_pruned=2 total → 2 matched, limit_pruned_row_groups=0 total → 0 matched, batches_split=0, file_open_errors=0, file_scan_errors=0, files_opened=4, files_processed=4, num_predicate_creation_errors=0, predicate_evaluation_errors=0, pushdown_rows_matched=2, pushdown_rows_pruned=0, predicate_cache_inner_records=2, predicate_cache_records=4, scan_efficiency_ratio=25.15% (130/517)]
01)AggregateExec: mode=Final, gby=[], aggr=[max(agg_dyn_e2e.column1)], metrics=[]
@comphead (Contributor)

The metrics are missing here; is that expected?

@mbutrovich (Contributor, Author)

Yeah, that's the root problem. We can't run this slt test with set datafusion.explain.analyze_categories = 'rows'; because the metrics are non-deterministic in the scan. We have to use set datafusion.explain.analyze_level = summary;.

@mbutrovich mbutrovich added this pull request to the merge queue Apr 15, 2026
Merged via the queue into apache:main with commit 240fbdb Apr 15, 2026
31 checks passed
@mbutrovich mbutrovich deleted the fix_extended_tests branch April 15, 2026 21:45
@adriangb (Contributor)

Thank you @mbutrovich, and sorry for breaking CI. I noticed right before I had a dentist appointment 🙏🏻
