Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Struct dtype compat for NumPy 1.14 #2964

Merged
merged 10 commits into from Dec 8, 2017

Conversation

TomAugspurger
Copy link
Member

This is a WIP, I'm surely missing edgecases.

Jon M. Mease and others added 2 commits December 5, 2017 14:21
pandas and numpy were getting reinstalled by conda after the
force uninstall here.

(cherry picked from commit 3cb77af)

def _make_sliced_dtype_new(dtype, index):
# For https://github.com/numpy/numpy/pull/6053
# TODO: handle either positional or named indexing
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this thankfully won't have to deal with positional indexing, since that's handled elsewhere in __getitem__.

@TomAugspurger
Copy link
Member Author

ping @ahaldane if you have a a spare moment to glance over this. If not no worries :)

'names': index,
'formats': [dtype.fields[name][0] for name in index],
'offsets': [dtype.fields[name][1] for name in index],
'itemsize': dtype.itemsize, # is this true?
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is right. The new dtype should be the just like the old dtype except with some fields removed. The new dtype has to have the same itemsize since it is used to view the old data.

Copy link

@ahaldane ahaldane Dec 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this code, as well as your old code, ignores field titles.

I'm not sure if its worth the effort to support them, but I know some numpy users use them. We recently got a bug report about indexing involving titles which is now fixed: numpy/numpy#9625

You'd have to add an extra titles key here. I probably wouldn't worry about it for the old code, since because of that bug titles didn't index properly before numpy 1.14.

Probably needs to be: 'titles': [None if len(dtype.fields[name]) < 3 else dtype.fields[name][2] for name in index],

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahaldane with that, the np.dtype constructor complains about ValueError: title already used as a name or title.

I'm not really sure if that should work or not, but I suspect the intent is for the dtype repr to be evalable into a dtype? Should I make an issue on NumPy?

In [50]: a = np.zeros(4, dtype=[(('a', 'b'), 'i'), ('c', 'i'), ('d', 'i')])

In [51]: a[['a']].dtype
Out[51]: dtype({'names':['a'], 'formats':['<i4'], 'offsets':[0], 'titles':['a'], 'itemsize':12})

In [52]: np.dtype({'names':['a'], 'formats':['<i4'], 'offsets':[0], 'titles':['a'], 'itemsize':12})
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-52-9f350bf6f22d> in <module>()
----> 1 np.dtype({'names':['a'], 'formats':['<i4'], 'offsets':[0], 'titles':['a'], 'itemsize':12})

ValueError: title already used as a name or title.

Copy link

@ahaldane ahaldane Dec 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah if it's giving you trouble I would ignore titles for now. Some devs have thought we should deprecate them or caution against them, but enough users still want them that we support them.

As to your exception.. titles cannot be the same string as a field name; you can't have both a title and a name be 'a'. They need to be different, so your example should be

np.dtype({'names':['a'], 'formats':['<i4'], 'offsets':[0], 'titles':['b'], 'itemsize':12})

Copy link

@ahaldane ahaldane Dec 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait I just reread you comment. I don't get the same as you for the line a[['a']].dtype. In numpy 1.13, for both py3/2, that returns dtype([('a', '<i4')]) here, and it raises an exception in numpy master.

Not sure if you want to try debugging this since it might not be worth the effort, but if you could double check that line and tell me your numpy version, I'd like to investigate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just reading through the docs and getting very confused why my example didn't raise :) I was on the merge commit for numpy/numpy#6053 after doing a git bisect. Trying it out on master now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good, Out[51] (correctly) raises on NumPy master for me.

@TomAugspurger
Copy link
Member Author

I think we'll punt on titles for now since this is blocking #2960.

ping @mrocklin if you have a chance to review today.

@mrocklin
Copy link
Member

mrocklin commented Dec 6, 2017

It seems fine to me. I'm not very familiar with NumPy dtype internals though. @shoyer or @jakirkham might know more

@jonmmease
Copy link
Contributor

ping @TomAugspurger @mrocklin Anything more before this is ready to go in?

@TomAugspurger
Copy link
Member Author

I was just about to merge it, and noticed the conflict. I'll rebase and merge on green.

@jakirkham
Copy link
Member

Seems fine to me. Though I haven't been following the NumPy changes to structured dtypes for 1.14.

@TomAugspurger TomAugspurger merged commit 574188e into dask:master Dec 8, 2017
@TomAugspurger TomAugspurger deleted the struct-dtype-compat branch December 8, 2017 17:48
@jakirkham
Copy link
Member

Thanks for doing this, @TomAugspurger.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants