
Update Clickbench results with DataFusion 28.0.0 #7108

Closed
alamb opened this issue Jul 27, 2023 · 6 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@alamb
Contributor

alamb commented Jul 27, 2023

Is your feature request related to a problem or challenge?

With the release of DataFusion 28, it would be great to update the DataFusion performance results reported on https://benchmark.clickhouse.com/

Notably, #6904 and related PRs that speed up grouping show 2-3x performance improvements.

Describe the solution you'd like

Update the results. I am not 100% sure how to do so, but I think the discussions on #5276 contain the answer.

The DataFusion scripts are in https://github.com/ClickHouse/ClickBench/tree/main/datafusion

Maybe @jychen7 knows how, or can provide some hints?

Describe alternatives you've considered

No response

Additional context

No response

alamb added the enhancement and help wanted labels Jul 27, 2023
@jychen7
Contributor

jychen7 commented Jul 29, 2023

The README at https://github.com/ClickHouse/ClickBench/blob/main/datafusion/README.md was updated in April this year and is likely still up to date.

@jychen7
Contributor

jychen7 commented Jul 29, 2023

I could set it up for v28 over the weekend.

@jychen7
Contributor

jychen7 commented Jul 29, 2023

The results are nice! In particular, query 32, which previously timed out, now finishes in 8s:
https://github.com/ClickHouse/ClickBench/pull/127/files

@jychen7
Contributor

jychen7 commented Jul 29, 2023

Also, shall we create an issue to add DataFusion (Parquet, partitioned) to ClickBench? The current entry uses a single Parquet file.

@alamb
Contributor Author

alamb commented Jul 30, 2023

Also shall we create an issue to add DataFusion (Parquet, partitioned) to Clickbench? Current one is single parquet

Yes, please do, @jychen7; that would be amazing.

Note that at least one query will fail on the partitioned dataset in DataFusion 28.0.0 due to #7039. @jonahgao fixed the problem on main, but I don't think the fix is in the 28.0.0 release.

@alamb
Contributor Author

alamb commented Oct 17, 2023

We just released DataFusion 33, so I think this ticket is somewhat stale. If someone wants to update ClickBench again with the latest release, that would be great; perhaps we can open a new ticket.

alamb closed this as completed Oct 17, 2023