Benchmark results #163

Open
phofl opened this issue Jun 20, 2023 · 5 comments

Comments

@phofl
Collaborator

phofl commented Jun 20, 2023

I ran the benchmarks yesterday. Many of them are still failing because of #159

Link to the results: https://github.com/coiled/benchmarks/actions/runs/5313713615

@mrocklin
Member

I'm not sure exactly how to interpret these results, but in general it seems like modest improvements at best, is that correct?

@phofl
Collaborator Author

phofl commented Jun 20, 2023

Yep, that's my interpretation as well, but since this surprised me (kind of), I looked a bit deeper:

  • test_csv: calls persist before grouping and selecting single columns, so nothing can be pushed past the persisted data -> won't see anything here
  • test_filter -> odd, have to investigate
  • test_q*: already restrict the DataFrame to the columns that are needed, so the only advantage we can expect is pushing the column selection up into read_parquet
  • test_join: Probably related to P2P not implemented yet?
  • setitem not pushing optimizations through yet looks like a bottleneck in some benchmarks.

So

  1. I think the benchmarks don't exercise the optimizations we already have very well
  2. We should probably add more _simplify_up steps in Blockwise ops
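
To make the two points above concrete, here is a minimal, self-contained sketch of a column projection being pushed through a blockwise op into the read, and of persist acting as a barrier. It is a toy model only: `ReadParquet`, `AddConstant`, `Persist`, `Projection`, `simplify` and `columns_read` are hypothetical names invented for illustration, and `data.parquet` is a made-up path. None of this is the actual dask-expr `_simplify_up` implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ReadParquet:
    """Leaf node: reads `columns` from a (hypothetical) Parquet dataset."""
    path: str
    columns: tuple[str, ...]


@dataclass(frozen=True)
class AddConstant:
    """Toy blockwise op: adds `value` to every column, row by row."""
    child: object
    value: int


@dataclass(frozen=True)
class Persist:
    """Materialized data: nothing can be pushed through this node."""
    child: object


@dataclass(frozen=True)
class Projection:
    """Column selection, i.e. df[columns]."""
    child: object
    columns: tuple[str, ...]


def simplify(expr):
    """Push Projection nodes toward the leaves where it is safe to do so."""
    if isinstance(expr, Projection):
        child = simplify(expr.child)
        if isinstance(child, ReadParquet):
            # Fold the selection into the read so unused columns are never loaded.
            kept = tuple(c for c in child.columns if c in expr.columns)
            return ReadParquet(child.path, kept)
        if isinstance(child, AddConstant):
            # Elementwise ops commute with column selection.
            pushed = simplify(Projection(child.child, expr.columns))
            return AddConstant(pushed, child.value)
        # Persist (or any node without a rule) acts as a barrier.
        return Projection(child, expr.columns)
    if isinstance(expr, AddConstant):
        return AddConstant(simplify(expr.child), expr.value)
    if isinstance(expr, Persist):
        return Persist(simplify(expr.child))
    return expr


def columns_read(expr):
    """Return the columns that the ReadParquet leaf of `expr` would load."""
    while not isinstance(expr, ReadParquet):
        expr = expr.child
    return expr.columns


source = ReadParquet("data.parquet", ("a", "b", "c"))
lazy = Projection(AddConstant(source, 1), ("a",))
eager = Projection(AddConstant(Persist(source), 1), ("a",))

print(columns_read(simplify(lazy)))   # ('a',)
print(columns_read(simplify(eager)))  # ('a', 'b', 'c')
```

In this toy, the lazy pipeline ends up reading only column `a`, while the pipeline that persists first still reads all three columns. An op with no push-through rule behaves like the barrier case, which is roughly why adding more rules to blockwise ops could help.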

@rjzamora
Member

Probably related to P2P not implemented yet?

Just a note that I'll probably push on this today.

@mrocklin
Member

Woo!

@mrocklin
Member

mrocklin commented Jul 4, 2023

It would be interesting to see a branch of the benchmarks that was simpler / dumber. This might include things like the following:

  1. Not specifying columns at the outset
  2. Not persisting data in memory

3 participants