Struct dtype compat for NumPy 1.14 #2964

TomAugspurger · 2017-12-05T21:09:42Z

This is a WIP; I'm surely missing edge cases.

pandas and numpy were getting reinstalled by conda after the force uninstall here. (cherry picked from commit 3cb77af)

TomAugspurger · 2017-12-05T21:26:13Z

dask/array/numpy_compat.py

+
+def _make_sliced_dtype_new(dtype, index):
+    # For https://github.com/numpy/numpy/pull/6053
+    # TODO: handle either positional or named indexing


Oh, this thankfully won't have to deal with positional indexing, since that's handled elsewhere in __getitem__.

TomAugspurger · 2017-12-05T21:28:48Z

ping @ahaldane if you have a spare moment to glance over this. If not, no worries :)

ahaldane · 2017-12-05T21:43:49Z

dask/array/numpy_compat.py

+        'names': index,
+        'formats': [dtype.fields[name][0] for name in index],
+        'offsets': [dtype.fields[name][1] for name in index],
+        'itemsize': dtype.itemsize,  # is this true?


Yes, this is right. The new dtype should be just like the old dtype except with some fields removed. The new dtype has to have the same itemsize since it is used to view the old data.

Note that this code, as well as your old code, ignores field titles.

I'm not sure if it's worth the effort to support them, but I know some numpy users use them. We recently got a bug report about indexing involving titles, which is now fixed: numpy/numpy#9625

You'd have to add an extra titles key here. I probably wouldn't worry about it for the old code, since because of that bug titles didn't index properly before numpy 1.14.
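To make the itemsize point concrete, here is a minimal sketch of the dict-based construction from the diff above. The helper name `make_sliced_dtype` and the sample dtype are assumptions for illustration, not the code that was merged; like the diff, it ignores titles.

```python
import numpy as np


def make_sliced_dtype(dtype, index):
    # Hypothetical helper mirroring the dict in the diff: keep the selected
    # fields at their original offsets and keep the original itemsize.
    return np.dtype({
        'names': list(index),
        'formats': [dtype.fields[name][0] for name in index],
        'offsets': [dtype.fields[name][1] for name in index],
        'itemsize': dtype.itemsize,
    })


full = np.dtype([('a', 'i4'), ('b', 'f8'), ('c', 'i4')])
sliced = make_sliced_dtype(full, ['a', 'c'])

data = np.zeros(3, dtype=full)
data['c'] = [1, 2, 3]

# Because the itemsize matches, the sliced dtype can view the same buffer.
view = data.view(sliced)
```

Here `view['c']` reads the same bytes as `data['c']`, which is why the offsets and itemsize must be carried over unchanged.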

Probably needs to be: 'titles': [None if len(dtype.fields[name]) < 3 else dtype.fields[name][2] for name in index],
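For context on the `len(...) < 3` check: an entry in `dtype.fields` is a 2-tuple `(dtype, offset)` for a field without a title and a 3-tuple `(dtype, offset, title)` for a field with one. A small sketch, with a made-up dtype:

```python
import numpy as np

# In a field spec, a (title, name) tuple gives the field a title.
dt = np.dtype([(('title of a', 'a'), 'i4'), ('c', 'i4')])

lengths = {name: len(dt.fields[name]) for name in dt.names}

# The expression suggested above, applied to every named field.
titles = [None if len(dt.fields[name]) < 3 else dt.fields[name][2]
          for name in dt.names]
```

This only computes the list; it does not pass it back to `np.dtype`.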

@ahaldane with that, the np.dtype constructor complains about ValueError: title already used as a name or title.

I'm not really sure if that should work or not, but I suspect the intent is for the dtype repr to be evalable into a dtype? Should I make an issue on NumPy?

In [50]: a = np.zeros(4, dtype=[(('a', 'b'), 'i'), ('c', 'i'), ('d', 'i')])

In [51]: a[['a']].dtype
Out[51]: dtype({'names':['a'], 'formats':['<i4'], 'offsets':[0], 'titles':['a'], 'itemsize':12})

In [52]: np.dtype({'names':['a'], 'formats':['<i4'], 'offsets':[0], 'titles':['a'], 'itemsize':12})
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-52-9f350bf6f22d> in <module>()
----> 1 np.dtype({'names':['a'], 'formats':['<i4'], 'offsets':[0], 'titles':['a'], 'itemsize':12})

ValueError: title already used as a name or title.

Yeah if it's giving you trouble I would ignore titles for now. Some devs have thought we should deprecate them or caution against them, but enough users still want them that we support them.

As to your exception: a title cannot be the same string as a field name, so you can't have both a title and a name be 'a'. They need to be different, so your example should be

np.dtype({'names':['a'], 'formats':['<i4'], 'offsets':[0], 'titles':['b'], 'itemsize':12})
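The name/title rule described above can be checked with a short sketch. It assumes the constructor raises `ValueError` on a collision, as in the traceback quoted earlier; the `spec` dict is only a convenience for this example.

```python
import numpy as np

spec = {'names': ['a'], 'formats': ['<i4'], 'offsets': [0], 'itemsize': 12}

# A title equal to the field name is rejected.
try:
    np.dtype(dict(spec, titles=['a']))
    collided = False
except ValueError:
    collided = True

# A distinct title is accepted and becomes an alias for the field.
ok = np.dtype(dict(spec, titles=['b']))
```

With the distinct title, `ok.fields` has entries under both `'a'` and `'b'`, while `ok.names` still lists only `'a'`.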

Wait, I just reread your comment. I don't get the same result as you for the line a[['a']].dtype. In numpy 1.13, for both py3/2, that returns dtype([('a', '<i4')]) here, and it raises an exception in numpy master.

Not sure if you want to try debugging this since it might not be worth the effort, but if you could double check that line and tell me your numpy version, I'd like to investigate.

I was just reading through the docs and getting very confused why my example didn't raise :) I was on the merge commit for numpy/numpy#6053 after doing a git bisect. Trying it out on master now.

All good, Out[51] (correctly) raises on NumPy master for me.

TomAugspurger · 2017-12-06T15:42:16Z

I think we'll punt on titles for now since this is blocking #2960.

ping @mrocklin if you have a chance to review today.

mrocklin · 2017-12-06T17:31:08Z

It seems fine to me. I'm not very familiar with NumPy dtype internals though. @shoyer or @jakirkham might know more

jonmmease · 2017-12-08T15:41:11Z

ping @TomAugspurger @mrocklin Anything more before this is ready to go in?

TomAugspurger · 2017-12-08T15:42:19Z

I was just about to merge it, and noticed the conflict. I'll rebase and merge on green.

jakirkham · 2017-12-08T15:51:24Z

Seems fine to me. Though I haven't been following the NumPy changes to structured dtypes for 1.14.

jakirkham · 2017-12-08T18:00:20Z

Thanks for doing this, @TomAugspurger.

Jon M. Mease and others added 2 commits December 5, 2017 14:21

Move pandas/numpy upstream install to be the last pkg install operation

da6c07f

pandas and numpy were getting reinstalled by conda after the force uninstall here. (cherry picked from commit 3cb77af)

COMPAT: Fix dtype creation for newer NumPy

6ca5979

Remove incorrect comment

d727d31

ahaldane reviewed Dec 5, 2017

jonmmease mentioned this pull request Dec 6, 2017

ENH: Support merging Dask DataFrames on a combination of columns and index levels (GH2950) #2960

Merged

6 tasks

TomAugspurger added 5 commits December 6, 2017 09:35

Remove stale comment

ea9f050

Merge remote-tracking branch 'upstream/master' into struct-dtype-compat

f205e2a

Release note

d49ecd0

More dtypes

02aff96

Better names

e7256de

Merge remote-tracking branch 'upstream/master' into struct-dtype-compat

408d4ba

Conflicts

c633f2b

TomAugspurger merged commit 574188e into dask:master Dec 8, 2017

TomAugspurger deleted the struct-dtype-compat branch December 8, 2017 17:48

hmaarrfk mentioned this pull request Jan 5, 2019

Numpy 1.16.0 test failure, dtype construction seems wrong #4352

Closed
