Add dak.unzip #136
Could a maintainer please run this :-)
@douglasdavis all the reducer tests appear to be failing, but my test passes.
Ah, that's from the latest PR on reducers. Let me fix that.
Codecov Report
@@            Coverage Diff             @@
##             main     #136      +/-   ##
==========================================
+ Coverage   95.72%   95.73%   +0.01%
==========================================
  Files          18       18
  Lines        1754     1759       +5
==========================================
+ Hits         1679     1684       +5
  Misses         75       75
7f4cbdd to 03ff918
So I'm really surprised this new implementation works, but it does so long as I provide an empty typetracer to the actual
Could someone kick this PR for tests? Thanks!
@jpivarski @martindurant your thoughts on the present implementation would be useful. As noted above I really get the feeling it's delicate, despite being quite effective.
Just trying to wrap my head around the implications of this. Some thoughts:
It seems to me that a cleaner description would be to_delayed(), which makes it explicit that the intermediate thing is not a dask-awkward collection, and that we don't expect to be able to optimize through it. Or, perhaps better, there should only be a single layer, which fuses unzip/getitem into a single function for each branch. This would return a tuple of dak objects as now. Yes, you would effectively repeat the ak.unzip call for every branch, but that's a cheap metadata-only operation, so it's fine.
As I understand it, the issue is that
Thinking about it now, ak.unzip(some_ak_array) is entirely equivalent to this:

    tuple(some_ak_array[field_name] for field_name in ak.fields(some_ak_array))

It must be the case that

    def fields(dak_array):  # i.e. dak.fields
        return ak.fields(dak_array._meta)

The getitem-field step,
If it's done the above way, then a
Wouldn't this be a better way to implement it? Looking at the Dask graph, it wouldn't be possible to determine that the source code contained
@jpivarski, I think your description and mine are identical in effect, but I didn't explain it as well :) Indeed, simply not having
Got it - thanks! Addressed this, please run tests :-)
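The approach both comments converge on, one getitem task per field with the field names read from metadata only, can be sketched with a toy task graph. This is a minimal illustration under stated assumptions: the graph is a plain dict in the style of a Dask graph, and `get_field`, `unzip_graph`, and `compute` are hypothetical names, not dask-awkward functions.

```python
# Toy model of a task graph: a dict mapping keys to either plain data or
# (callable, *arguments) tuples. All names here are hypothetical stand-ins;
# this is not dask-awkward code.


def get_field(records, name):
    """Pick one field out of every record (the 'getitem' step)."""
    return [rec[name] for rec in records]


def unzip_graph(graph, source_key, meta_fields):
    """Add one getitem task per field; no 'unzip' task is ever created."""
    new_graph = dict(graph)
    out_keys = []
    for name in meta_fields:  # metadata only, no data is touched here
        key = f"getitem-{name}"
        new_graph[key] = (get_field, source_key, name)
        out_keys.append(key)
    return new_graph, tuple(out_keys)


def compute(graph, key):
    """Tiny recursive executor: string arguments that are graph keys are resolved first."""
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *args = task
        resolved = [
            compute(graph, a) if isinstance(a, str) and a in graph else a
            for a in args
        ]
        return func(*resolved)
    return task


graph = {"source": [{"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}]}
graph, keys = unzip_graph(graph, "source", ["x", "y"])
x, y = (compute(graph, k) for k in keys)
print(sorted(graph))  # ['getitem-x', 'getitem-y', 'source']
print(x, y)           # [1, 2] [1.1, 2.2]
```

Nothing in the resulting graph records that an unzip was requested; it looks the same as if each field had been selected by hand.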
Does this fail if the input is not a record type? ak.unzip also supports a fixed-length top level (which becomes N elements of the tuple) and other types (which become a one-element tuple). I think in this special case, we should add to the docstring, specifying that this function produces a tuple of dak objects.
I actually didn't know what it was going to do. What it does in Awkward looks like reasonable behavior:

    >>> ak.fields(ak.Array([1, 2, 3]))   # this much I knew; it's intentional
    []
    >>> ak.unzip(ak.Array([1, 2, 3]))   # this is an interesting surprise, not unwelcome
    (<Array [1, 2, 3] type='3 * int64'>,)

In the case of no fields, there's no attempt to get
Sorry that I didn't catch that. (We were writing our comments at the same time, without seeing each other's.)
    fields = ak.fields(array._meta)
    if len(fields) == 0:
        return tuple([array])
    else:
        return tuple(array[field] for field in ak.fields(array._meta))

? could also do an unzip on the meta of the input array.
What you have looks right to me (although you can reuse the already-computed I don't think you want to run Although I guess you could run
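The proposal above, with the field list computed once and reused, might look like the following sketch. `LazyArray` is a hypothetical stand-in for a lazy collection, and its `meta_fields` attribute plays the role of `ak.fields(array._meta)`; neither is the real dask-awkward class or attribute.

```python
# Hypothetical stand-in for a lazy collection. `meta_fields` plays the role
# of ak.fields(array._meta); this is not the dask-awkward Array class.
class LazyArray:
    def __init__(self, name, meta_fields=()):
        self.name = name
        self.meta_fields = list(meta_fields)

    def __getitem__(self, field):
        if field not in self.meta_fields:
            raise KeyError(field)
        # Selecting a field yields a new lazy collection (no subfields here).
        return LazyArray(f"{self.name}[{field!r}]")


def unzip(array):
    fields = array.meta_fields  # looked up once, from metadata only
    if len(fields) == 0:
        # No fields: return a one-element tuple, mirroring ak.unzip.
        return (array,)
    # Reuse the already-computed `fields` rather than looking them up again.
    return tuple(array[field] for field in fields)


records = LazyArray("events", ["x", "y"])
print([part.name for part in unzip(records)])  # ["events['x']", "events['y']"]

flat = LazyArray("numbers")
print(unzip(flat)[0] is flat)  # True
```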
Yes, I think that. There should also be a test for the fixed-top-level ("tuple-like") case - @jpivarski, what is the right condition for that? It should give
where
The

    >>> array = ak.Array([(1, 1.1, "one"), (2, 2.2, "two")])
    >>> array
    <Array [(1, 1.1, 'one'), (2, 2.2, ...)] type='2 * (int64, float64, string)'>
    >>> ak.unzip(array)
    (<Array [1, 2] type='2 * int64'>,
     <Array [1.1, 2.2] type='2 * float64'>,
     <Array ['one', 'two'] type='2 * string'>)
    >>> ak.fields(array)
    ['0', '1', '2']
    >>> array["0"]
    <Array [1, 2] type='2 * int64'>
    >>> array["1"]
    <Array [1.1, 2.2] type='2 * float64'>
    >>> array["2"]
    <Array ['one', 'two'] type='2 * string'>

(If you were to use
Oh ok, thanks for clarifying. Might be worth a test just like your example code.
please rerun tests (maybe talking to the bot works if you're a new contributor?)
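A test along those lines might look like the sketch below. It checks only the plain Awkward behavior that a field-by-field unzip would rely on (tuple-like arrays report fields "0", "1", "2", and selecting those fields matches ak.unzip), not the dask-awkward function itself. The function name `check_tuple_like_unzip` is hypothetical, and the check skips itself when awkward is not installed.

```python
def check_tuple_like_unzip():
    """Return True if the check ran and passed, None if Awkward is unavailable."""
    try:
        import awkward as ak
    except ImportError:
        return None  # nothing to check without the library

    array = ak.Array([(1, 1.1, "one"), (2, 2.2, "two")])

    # Tuple-like records expose their slots as string field names.
    fields = ak.fields(array)
    assert fields == ["0", "1", "2"]

    # Selecting each field by name should reproduce ak.unzip slot by slot.
    unzipped = ak.unzip(array)
    by_field = tuple(array[field] for field in fields)
    assert len(unzipped) == len(by_field) == 3
    for left, right in zip(unzipped, by_field):
        assert ak.to_list(left) == ak.to_list(right)
    return True


result = check_tuple_like_unzip()
print("skipped" if result is None else "passed")
```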
Fixes #121