Benchmark failures - bugs/missing operations #159

Closed
9 of 13 tasks
phofl opened this issue Jun 19, 2023 · 3 comments

Comments

@phofl
Collaborator

phofl commented Jun 19, 2023

  • GroupBy.value_counts
  • iloc
  • select_dtypes
  • repartition is missing the partition_size keyword argument
  • GroupBy.apply
  • set_index
  • grouping by a Series object
  • grouping by an Index object
  • assign with callable
  • GroupBy.agg shuffle keyword
  • missing align step in add/sub, ...
  • missing keywords in read_csv
  • "p2p" shuffle

I'm not planning on fixing all of them immediately. For now I'm collecting failures and fixing the ones that were already on my agenda or are very small fixes.

Reference build: https://github.com/coiled/benchmarks/actions/runs/5310108176/jobs/9611644148?pr=837

@phofl
Collaborator Author

phofl commented Jul 26, 2023

@mrocklin, thoughts on partition_size in repartition? In dask/dask this triggers an immediate computation, but I think we could reasonably defer it to compute time.

@mrocklin
Member

I think that anything we can defer we should defer.

I think that for this value in particular we'll likely have to compute it if anything asks us for divisions.
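The deferral idea discussed above can be illustrated with a small, self-contained sketch. It does not use dask; the class `LazyRepartition`, its `sizeof` hook, and the greedy row-packing rule are hypothetical simplifications. Rows are modeled as sorted index values with an assumed fixed size. Nothing is measured when the object is built. Sizes are computed only on the first access to `divisions` or `compute()` and then cached, which mirrors the point that asking for divisions forces the computation.

```python
from functools import cached_property
from typing import Callable, List, Sequence, Tuple


class LazyRepartition:
    """Hypothetical sketch: defer a partition_size-based repartition.

    Rows are modeled as sorted index values. No sizes are measured at
    construction time; the work happens once, on first access to
    `divisions` or `compute()`, and the result is cached.
    """

    def __init__(
        self,
        partitions: Sequence[Sequence[int]],
        partition_size: int,
        sizeof: Callable[[int], int] = lambda row: 8,  # assumed fixed row size
    ) -> None:
        self._partitions = partitions
        self.partition_size = partition_size
        self._sizeof = sizeof
        self.materialized = False  # becomes True once sizes are measured

    @cached_property
    def _new_partitions(self) -> List[List[int]]:
        # Greedily pack rows so each new partition's total size stays within
        # partition_size; a single oversized row still gets its own partition.
        self.materialized = True
        out: List[List[int]] = []
        current: List[int] = []
        current_size = 0
        for part in self._partitions:
            for row in part:
                size = self._sizeof(row)
                if current and current_size + size > self.partition_size:
                    out.append(current)
                    current, current_size = [], 0
                current.append(row)
                current_size += size
        if current:
            out.append(current)
        return out

    @property
    def divisions(self) -> Tuple[int, ...]:
        # Requesting divisions forces the size computation: the first index of
        # each new partition, followed by the last index of the final one.
        parts = self._new_partitions
        if not parts:
            return ()
        return tuple(p[0] for p in parts) + (parts[-1][-1],)

    def compute(self) -> List[List[int]]:
        return self._new_partitions


lazy = LazyRepartition([[0, 1, 2], [3, 4], [5]], partition_size=16)
print(lazy.materialized)  # False: nothing has been measured yet
print(lazy.divisions)     # (0, 2, 4, 5): sizes were computed on demand
print(lazy.materialized)  # True
```

In this sketch, code that never asks for `divisions` or results pays no sizing cost. Any call that needs divisions pays it exactly once, because `cached_property` stores the packed partitions after the first access.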

@phofl
Copy link
Collaborator Author

phofl commented Feb 20, 2024

Closing, those all work.

@phofl phofl closed this as completed Feb 20, 2024