Skip to content

[MINOR]: Extract aggregate topk function to aggregate_topk.slt#8948

Merged
alamb merged 4 commits intoapache:mainfrom
synnada-ai:minor/aggregate_test
Jan 23, 2024
Merged

[MINOR]: Extract aggregate topk function to aggregate_topk.slt#8948
alamb merged 4 commits intoapache:mainfrom
synnada-ai:minor/aggregate_test

Conversation

@mustafasrepo
Copy link
Copy Markdown
Contributor

@mustafasrepo mustafasrepo commented Jan 22, 2024

Update: this PR just moves a intermittently failing test into its own file. It still fails on different platforms

Which issue does this PR close?

Closes #.

Rationale for this change

Sometimes in my locale, the tests in this PR fails. However, in CI it seems that there is no problem. However, with current configuration, there are more than 1 possible valid result for these queries.

With this PR I changed these tests such that there is only one possible answer.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions Bot added the sqllogictest SQL Logic Tests (.slt) label Jan 22, 2024
Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THANK YOU -- I hit the same thing as well

@alamb alamb changed the title [MINOR]: (Test change) Add new orders to make result unique [MINOR]: (Test change) Add new orders to make result unique in aggregate.slt Jan 22, 2024
@alamb
Copy link
Copy Markdown
Contributor

alamb commented Jan 22, 2024

cc @avantgardnerio

@avantgardnerio
Copy link
Copy Markdown
Contributor

@avantgardnerio
Copy link
Copy Markdown
Contributor

That being said, I think this code could / should be expanded to include multiple order bys.

@avantgardnerio
Copy link
Copy Markdown
Contributor

IIRC, it conceptually supports anything that implements Ord, so we can implement support by adding support for the OwnedRow type.

@mustafasrepo
Copy link
Copy Markdown
Contributor Author

I think this skips the rule it is intended to test if there are multiple order bys:

https://github.com/apache/arrow-datafusion/blob/795c71f3e8249847593a58dafe578ffcbcd3e012/datafusion/core/src/physical_optimizer/topk_aggregation.rs#L90

What if we change top order by to ORDER BY trace_id. This would also produce single correct result.
What I mean is that changing test from

select trace_id, other, MIN(timestamp) from traces group by trace_id, other order by MIN(timestamp) asc limit 4;

to

select trace_id, other, MIN(timestamp) from traces group by trace_id, other order by trace_id asc limit 4;

instead of

select trace_id, other, MIN(timestamp) from traces group by trace_id, other order by MIN(timestamp) asc, trace_id DESC limit 4;

Will it test desired property?

@alamb
Copy link
Copy Markdown
Contributor

alamb commented Jan 23, 2024

Will it test desired property?

I don't think so -- this test is pretty specific to ordering by MIN(..) as it invokes a special code path. However, given this is not reflected in the tests I think that is a gap.

Here is my suggestion:

  1. Move these tests into a new .slt file (aggregtes_topk.slt maybe?) in one PR
  2. Figure out how to improve the tests to be both determisitic as a second PR (I think we will have to adjust the data and/or the N in the limit), also adding an explain test to ensure the desired property is covered

My rationale for moving to a new file is that then this failure won't mask other failures in the (very large) aggregates.slt file and it will help to make that file more manageable

@mustafasrepo
Copy link
Copy Markdown
Contributor Author

Here is my suggestion:

  1. Move these tests into a new .slt file (aggregtes_topk.slt maybe?) in one PR
  2. Figure out how to improve the tests to be both determisitic as a second PR (I think we will have to adjust the data and/or the N in the limit), also adding an explain test to ensure the desired property is covered

My rationale for moving to a new file is that then this failure won't mask other failures in the (very large) aggregates.slt file and it will help to make that file more manageable

I moved the topk aggregate tests to their own file as your suggestion in 1 (Please note that I retracted the changes in the tests, this PR is just a restructure PR now.).

@alamb alamb changed the title [MINOR]: (Test change) Add new orders to make result unique in aggregate.slt [MINOR]: Extract aggregate topk function to aggregate_topk.slt Jan 23, 2024
Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @mustafasrepo

I'll merge this PR and we can keep working to make the test deterministic as a follow on PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants