Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more ak verbs #146

Merged
merged 14 commits into from
Jan 23, 2023
Merged

Add more ak verbs #146

merged 14 commits into from
Jan 23, 2023

Conversation

lgray
Copy link
Collaborator

@lgray lgray commented Jan 14, 2023

Fixes #124
Fixes #128

add:

  • argcartesian
  • argcombinations
  • isclose
  • singletons
  • argsort (for axis > 0, from Add dak.argsort for axis != 0 #130)
  • sort (for axis > 0)
  • appropriate error for copy
  • full_like (needs issue on awkward opened for meta calculation)
  • unflatten (currently simplistic implementation, needs work to be general since it requires repartitioning, but very important/necessary)
  • to_packed
  • ravel (with warnings about recordarrays)
  • run_lengths (with warning about axis=0)
  • from_regular
  • to_regular
  • broadcast_arrays (dak.broadcast_arrays #128)

remove:

  • concatenate impl in structure.py

@codecov-commenter
Copy link

codecov-commenter commented Jan 14, 2023

Codecov Report

Merging #146 (ce84efe) into main (daacc03) will decrease coverage by 0.11%.
The diff coverage is 93.75%.

@@            Coverage Diff             @@
##             main     #146      +/-   ##
==========================================
- Coverage   95.55%   95.44%   -0.11%     
==========================================
  Files          19       19              
  Lines        1798     1889      +91     
==========================================
+ Hits         1718     1803      +85     
- Misses         80       86       +6     
Impacted Files Coverage Δ
src/dask_awkward/lib/structure.py 95.31% <93.75%> (-0.76%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Collaborator

@douglasdavis douglasdavis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good to me. Something I noticed is a warning coming from the ak.full_like test:

/Users/ddavis/.pyenv/versions/3.11.1/envs/dev/lib/python3.11/site-packages/awkward/operations/ak_full_like.py:160: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
    if fill_value == 0 or fill_value is _ZEROS:

Getting triggered in upstream awkward, but for some reason I'm not able to reproduce it outside of the tests. Do you have a handle on what's going on here?

@borrow_docstring(ak.copy)
def copy(array):
raise DaskAwkwardNotImplemented("TODO")
raise DaskAwkwardNotImplemented(
"This function is not necessary in the context of dask-awkward."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this, a much better message!

@lgray
Copy link
Collaborator Author

lgray commented Jan 17, 2023

@douglasdavis Yeah - I ran into the same weird irreproducibility problem, and I haven't had time to dig into to it really. Just implementing so far. I can take a look by the end of the week.

For dak.unflatten I think we can get away with even this simplistic implementation so long as we can put some better checks on array compatibility. For instance, if we know that counts came from an earlier dak.num (this is actually a common pattern: flatten -> do stuff -> unflatten), which will have counts appropriate to the partitioning of the input array, we can easily allow that array through. I'm not sure if we can do that lazily with what we have here and now, but perhaps with a little analysis of the task graph we could verify things like that?

Otherwise the complete implementation requires the ability to repartition and shuffle dask-awkward arrays, which I think right now is step N+1 (much much lower priority than optimized reading and faster graph optimization).

I'd like to add yet a few more verbs so we can chew on this. I'd like to check with @nsmith- to make sure we've got all the mostly common ones. I think with the PR we have so far we're pretty good but would like to get regular-use coverage mostly done.

@lgray
Copy link
Collaborator Author

lgray commented Jan 18, 2023

@agoose77 just FYI there are a number of TODO's in this PR that come from needing to .length_zero_array() a form to create an output form and then a typetracer from that. Instead of the operation on the typetracer completing successfully. I'm not sure if it's worth opening an awkward ticket for each one since it's pretty common.

@lgray
Copy link
Collaborator Author

lgray commented Jan 20, 2023

@masonproffitt could you give this branch a try?

@lgray
Copy link
Collaborator Author

lgray commented Jan 20, 2023

@douglasdavis pending a bit of user testing this is good to go now.

Copy link
Collaborator

@douglasdavis douglasdavis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added one inline question, but other than that if you have nothing else to add I think this is good to go!

src/dask_awkward/lib/structure.py Show resolved Hide resolved
@douglasdavis
Copy link
Collaborator

@lgray just want to confirm you have nothing else you'd like to add to this PR and I'll go ahead and merge!

@lgray
Copy link
Collaborator Author

lgray commented Jan 23, 2023

Nope, not for now! More will come later but I think this is enough from typical use patterns.

@masonproffitt
Copy link
Contributor

@masonproffitt could you give this branch a try?

I tested this, and the only new problem that came up was #151

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dak.broadcast_arrays dak.argsort
4 participants