-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Struct dtype compat for NumPy 1.14 #2964
Conversation
pandas and numpy were getting reinstalled by conda after the force uninstall here. (cherry picked from commit 3cb77af)
dask/array/numpy_compat.py
Outdated
|
||
def _make_sliced_dtype_new(dtype, index): | ||
# For https://github.com/numpy/numpy/pull/6053 | ||
# TODO: handle either positional or named indexing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this thankfully won't have to deal with positional indexing, since that's handled elsewhere in __getitem__
.
ping @ahaldane if you have a a spare moment to glance over this. If not no worries :) |
dask/array/numpy_compat.py
Outdated
'names': index, | ||
'formats': [dtype.fields[name][0] for name in index], | ||
'offsets': [dtype.fields[name][1] for name in index], | ||
'itemsize': dtype.itemsize, # is this true? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is right. The new dtype should be the just like the old dtype except with some fields removed. The new dtype has to have the same itemsize since it is used to view the old data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this code, as well as your old code, ignores field titles.
I'm not sure if its worth the effort to support them, but I know some numpy users use them. We recently got a bug report about indexing involving titles which is now fixed: numpy/numpy#9625
You'd have to add an extra titles
key here. I probably wouldn't worry about it for the old code, since because of that bug titles didn't index properly before numpy 1.14.
Probably needs to be: 'titles': [None if len(dtype.fields[name]) < 3 else dtype.fields[name][2] for name in index],
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ahaldane with that, the np.dtype
constructor complains about ValueError: title already used as a name or title.
I'm not really sure if that should work or not, but I suspect the intent is for the dtype repr to be evalable into a dtype? Should I make an issue on NumPy?
In [50]: a = np.zeros(4, dtype=[(('a', 'b'), 'i'), ('c', 'i'), ('d', 'i')])
In [51]: a[['a']].dtype
Out[51]: dtype({'names':['a'], 'formats':['<i4'], 'offsets':[0], 'titles':['a'], 'itemsize':12})
In [52]: np.dtype({'names':['a'], 'formats':['<i4'], 'offsets':[0], 'titles':['a'], 'itemsize':12})
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-52-9f350bf6f22d> in <module>()
----> 1 np.dtype({'names':['a'], 'formats':['<i4'], 'offsets':[0], 'titles':['a'], 'itemsize':12})
ValueError: title already used as a name or title.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah if it's giving you trouble I would ignore titles for now. Some devs have thought we should deprecate them or caution against them, but enough users still want them that we support them.
As to your exception.. titles cannot be the same string as a field name; you can't have both a title and a name be 'a'. They need to be different, so your example should be
np.dtype({'names':['a'], 'formats':['<i4'], 'offsets':[0], 'titles':['b'], 'itemsize':12})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait I just reread you comment. I don't get the same as you for the line a[['a']].dtype
. In numpy 1.13, for both py3/2, that returns dtype([('a', '<i4')])
here, and it raises an exception in numpy master.
Not sure if you want to try debugging this since it might not be worth the effort, but if you could double check that line and tell me your numpy version, I'd like to investigate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just reading through the docs and getting very confused why my example didn't raise :) I was on the merge commit for numpy/numpy#6053 after doing a git bisect. Trying it out on master now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good, Out[51]
(correctly) raises on NumPy master for me.
It seems fine to me. I'm not very familiar with NumPy dtype internals though. @shoyer or @jakirkham might know more |
ping @TomAugspurger @mrocklin Anything more before this is ready to go in? |
I was just about to merge it, and noticed the conflict. I'll rebase and merge on green. |
Seems fine to me. Though I haven't been following the NumPy changes to structured dtypes for 1.14. |
Thanks for doing this, @TomAugspurger. |
This is a WIP, I'm surely missing edgecases.