Add index to meta after .str.split with expand #7026
TomAugspurger
left a comment
Thanks! Looks nice, just one suggestion and a question. Good to go once those are addressed.
dask/dataframe/accessor.py
    @@ -129,6 +129,7 @@ def split(self, pat=None, n=-1, expand=False):
             delimiter = " " if pat is None else pat
             meta = type(self._series._meta)([delimiter.join(["a"] * (n + 1))])
Small suggestion, to make this more consistent with the rest of dask.
    - meta = type(self._series._meta)([delimiter.join(["a"] * (n + 1))])
    + meta = self._series._constructor([delimiter.join(["a"] * (n + 1))], index=self._series._meta_nonempty[:1].index)
That may not be properly linted.
rubenvdg replied:
The self._series._meta_nonempty[:1].index is certainly nice. But if I'm not mistaken, the self._series._constructor needs to be called with dsk, name, meta, divisions (cf. new_dd_object(dsk, name, meta, divisions)). Hence, your suggestion results in TypeError: new_dd_object() got an unexpected keyword argument 'index'.
Maybe I miss the point.
TomAugspurger replied:
Whoops, sorry. Should be self._series._meta._constructor. That's similar to type(self._series._meta)(...), but a bit more correct (newer parts of dask are using the ._constructor approach).
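The corrected suggestion can be tried out with pandas alone. The following is a minimal sketch, not the code that was merged: `meta` and `one_row_index` are hand-built stand-ins for `self._series._meta` and `self._series._meta_nonempty[:1].index`, and the float index named `idx` is an arbitrary choice for illustration.

```python
import pandas as pd

# Stand-in for self._series._meta: an empty Series with a non-default index.
meta = pd.Series(
    [], dtype=object, index=pd.Index([], dtype="float64", name="idx")
)

# Stand-in for self._series._meta_nonempty[:1].index (built by hand here,
# since _meta_nonempty only exists on dask collections).
one_row_index = pd.Index([1.0], dtype="float64", name="idx")

delimiter, n = " ", 2

# ._constructor on the pandas meta object accepts data and index,
# unlike the dask collection's constructor discussed above.
sample = meta._constructor(
    [delimiter.join(["a"] * (n + 1))], index=one_row_index
)

# Splitting the one-row sample yields n + 1 columns and keeps the index.
split_meta = sample.str.split(n=n, expand=True, pat=delimiter)
print(split_meta.index)
print(list(split_meta.columns))
```

For a plain `pd.Series`, `meta._constructor` is the same class as `type(meta)`; the difference only matters for subclasses that override `_constructor`.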
dask/dataframe/accessor.py
             delimiter = " " if pat is None else pat
             meta = type(self._series._meta)([delimiter.join(["a"] * (n + 1))])
             meta = meta.str.split(n=n, expand=expand, pat=pat)
             meta = meta.iloc[:0].set_index(self._series._meta.index)
This can be removed if you apply my suggestion from above. It's not needed now that we create a meta with the proper index.
…to-meta-str-split
Should be good to go, @TomAugspurger. My first ever OS contribution 😄 (apart from docs). Huuraaaah. Thanks for the review and feedback.
TomAugspurger
left a comment
@dask/maintenance there are some failures on windows-3.8 at https://github.com/dask/dask/pull/7026/checks?check_run_id=1650637862. Are those known failures? (I've been a bit out of the loop recently 😄)
If those can be ignored then this is good to go. Thanks @rubenvdg.
Yes, we've seen the mysterious disappearance of
* Add index to meta after .str.split
* Remove old version
* iloc[:0] more expressive than .drop(0)
* beautify
This PR proposes a fix for #7021.
Without the proposed fix, dask/dataframe/tests/test_accessors.py::test_str_accessor_expand fails if the index is of any other type than int.
black dask / flake8 dask
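The index mismatch described in the PR summary can be illustrated with pandas alone. This is a rough sketch under assumptions, not the dask test itself: the string index named `key` and the sample values are made up, and `data.index[:1]` plays the role of the one-row index that dask would take from its non-empty meta.

```python
import pandas as pd

# Hypothetical data with a non-integer index (the case reported as failing).
data = pd.Series(["a b", "c d"], index=pd.Index(["x", "y"], name="key"))
n = 1
sample_text = " ".join(["a"] * (n + 1))

# Meta built without an index: pandas assigns a default integer RangeIndex.
default_meta = pd.Series([sample_text]).str.split(n=n, expand=True)

# Meta built with a one-row index borrowed from the data.
indexed_meta = pd.Series([sample_text], index=data.index[:1]).str.split(
    n=n, expand=True
)

# What the real computation produces.
actual = data.str.split(n=n, expand=True)

# The default meta disagrees with the real result on index dtype and name;
# the indexed meta agrees on both.
print(default_meta.index.dtype, actual.index.dtype, indexed_meta.index.dtype)
print(default_meta.index.name, actual.index.name, indexed_meta.index.name)
```

With an integer-indexed input the default meta happens to line up with the real result, which is consistent with the test only failing for other index types.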