Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimise linear layer chains #210

Merged
merged 14 commits into from
May 3, 2023
Merged

Conversation

martindurant
Copy link
Collaborator

No description provided.

@martindurant
Copy link
Collaborator Author

I suspect this fails when reducing in any layer (i.e,. number of indices decreases), but for the purposes of dask-awkward we might be fine, where there is always one index of iteration: the partition.

@codecov-commenter
Copy link

codecov-commenter commented Apr 4, 2023

Codecov Report

Merging #210 (c546cda) into main (4db98e1) will increase coverage by 0.14%.
The diff coverage is 98.75%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff             @@
##             main     #210      +/-   ##
==========================================
+ Coverage   94.74%   94.88%   +0.14%     
==========================================
  Files          19       19              
  Lines        2111     2190      +79     
==========================================
+ Hits         2000     2078      +78     
- Misses        111      112       +1     
Impacted Files Coverage Δ
src/dask_awkward/lib/optimize.py 96.55% <98.75%> (+1.12%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

(because will never meet one-dep requirement)
@douglasdavis
Copy link
Collaborator

related: dask/dask#9795

@martindurant
Copy link
Collaborator Author

@rjzamora , you might be interested to look at this. It implements a linear-time "fuse" operation on chains of Blockwise layers which is much faster that standard dask's layer fuse for very large numbers of layers. As it stands, the algorithm passes the specific two tests written for it, but causes many other dask-awkward tests to fail. I believe that there are two most likely causes:

  • we should be excluding reductive layers from consideration, but aren't. It would be even better if we could cope with these, but the number of indices changes and that would need special attention.
  • something is not quite right with the keys included in the subgraph versus the keys mentioned in the various attributes of a fused layer.

Any thoughts you might have on this would be appreciated!

@martindurant martindurant marked this pull request as ready for review May 3, 2023 16:58
@martindurant
Copy link
Collaborator Author

Good to go?

@lgray
Copy link
Collaborator

lgray commented May 3, 2023

As far as I can tell from torture tests, yes!

@martindurant martindurant merged commit 0b74497 into dask-contrib:main May 3, 2023
@martindurant martindurant deleted the chains branch May 3, 2023 18:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants