TypeError: split() got an unexpected keyword argument 'expand' | string split function doesn't work [TypeError] | dask 0.20 #4179

ZiyadMoraished · 2018-11-06T09:18:35Z

Hi,

I'm trying to split a column by space as follows:
df.CUSTOMER.str.split(expand=True)

Here is the error I get:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-35-07645d325084> in <module>()
----> 1 df.CUSTOMER.str.split(expand=True).head()

TypeError: split() got an unexpected keyword argument 'expand'

When I run it on the first 5 records, it works perfectly:
df.head().CUSTOMER.str.split(expand=True)

I'm using Python 3.6 and dask 0.20.


mrocklin · 2018-11-06T12:56:42Z

Perhaps your pandas version is too old? Otherwise I don't know what might be wrong. I recommend providing a minimal failing example.


ZiyadMoraished · 2018-11-08T10:16:42Z

My pandas version is 0.23.4.

Here is a quick failing example:

import pandas as pd
import dask.dataframe as dd

dask_df = dd.from_pandas(pd.DataFrame({'name': ['Ziyad Moraished'] * 1000000}), npartitions=10000)

dask_df['name'].str.split(' ', exapnd=True)

TypeError                                 Traceback (most recent call last)
<ipython-input-22-2833952eac0a> in <module>()
----> 1 dask_df['name'].str.split(' ', exapnd=True)

TypeError: split() got an unexpected keyword argument 'exapnd'

mrocklin · 2018-11-08T13:23:10Z

Reproduced. Thank you @ZiyadMoraished.

At first it looked like we could just pass the expand= keyword through from our version to Pandas' (though we would want to verify that this also works on previous pandas versions). However, when I tried this, it looked like we weren't getting the metadata correct. Presumably we don't correctly infer that this produces a dataframe rather than a series.

If you start diving in from here:

dask/dask/dataframe/accessor.py

Lines 119 to 120 in 113457b

    
           def split(self, pat=None, n=-1): 
        
               return self._function_map('split', pat=pat, n=n)

You'll eventually get to here:

dask/dask/dataframe/accessor.py

Lines 61 to 62 in 113457b

    
           meta = self._delegate_method(self._series._meta_nonempty, 
        
                                        self._accessor_name, attr, args, kwargs)

Which should be a dataframe with a few text columns, but seems not to be.

If anyone wants to investigate this further that would be welcome.

nixphix · 2018-12-05T19:48:16Z

@mrocklin after adding the expand argument to the split function, it failed the metadata check here:

dask/dask/dataframe/core.py

Lines 3691 to 3694 in 113457b

    
           if not np.array_equal(np.nan_to_num(meta.columns), 
        
                                 np.nan_to_num(df.columns)): 
        
               raise ValueError("The columns in the computed data do not match" 
        
                                " the columns in the provided metadata")

mrocklin · 2018-12-05T19:53:09Z

What are the differences between meta.columns and df.columns? (Arguably this should also be in the exception message.) I wonder if that information would direct you to the fix.

nixphix · 2018-12-05T20:23:23Z

meta.columns is RangeIndex(start=0, stop=1, step=1), whereas df.columns is RangeIndex(start=0, stop=2, step=1).
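
The mismatch reported above can be reproduced with pandas alone. This is only a sketch: it assumes that the dummy data used for metadata inference contains no separator, and the placeholder value "foo" below is an illustrative stand-in, not taken from this thread.

```python
import pandas as pd

# Assumption: stand-in for the dummy values used to infer metadata.
# The strings contain no space, so splitting yields a single column.
placeholder = pd.Series(["foo", "foo"], name="name")

# Real data from the failing example: one space, so two columns.
real = pd.Series(["Ziyad Moraished"] * 3, name="name")

meta = placeholder.str.split(" ", expand=True)
df = real.str.split(" ", expand=True)

# The column sets differ, which is what the metadata check rejects.
print(list(meta.columns))
print(list(df.columns))
```

The number of output columns depends on the values being split, so metadata inferred from dummy values will not generally match the computed result.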

nixphix · 2018-12-05T20:28:56Z

This is just for the above quoted code sample

dask_df = dd.from_pandas(pd.DataFrame({'name' :['Ziyad Moraished']*1000000 }), npartitions= 10000)

dask_df['name'].str.split(' ', exapnd=True)

We really can't predict the number of splits ahead of time.

nixphix · 2018-12-08T17:33:21Z

@mrocklin we could make the number-of-splits parameter (n) mandatory when expansion is requested; that way we can be sure of the number of output columns. Let me know what you're thinking.
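
A rough pandas-only sketch of why a mandatory n would help: with n given, the result has at most n + 1 columns, and padding to exactly n + 1 gives every partition the same schema. The helper name split_fixed is hypothetical and is not part of dask or pandas.

```python
import pandas as pd


def split_fixed(series, pat, n):
    """Split on `pat` at most `n` times and always return n + 1 columns.

    Hypothetical helper for illustration only.
    """
    out = series.str.split(pat, n=n, expand=True)
    # Rows (or whole partitions) with fewer separators produce fewer
    # columns, so pad with missing values up to the fixed width.
    return out.reindex(columns=range(n + 1))


varied = pd.Series(["a b c d", "e f", "g"])

# Without n, the width follows the longest row in the data.
unbounded = varied.str.split(" ", expand=True)

# With n=1, the width is fixed at 2 regardless of the data.
bounded = split_fixed(varied, " ", 1)
only_short = split_fixed(pd.Series(["g"]), " ", 1)
```

Here bounded and only_short have the same two columns even though the second input contains no separator at all, which is the property a metadata check would need.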

mrocklin · 2018-12-08T20:24:46Z

> we really can't predict the number of splits ahead of time

Hrm, you're right. That is unfortunate.

> Let me know what you're thinking

I don't know of a good general solution here. I wonder if anyone else has a suggestion.

As you suggest we could ask the user for the information. We could also compute things directly (this would be safer, but more expensive). I don't have strong thoughts on what is best here.
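
The "compute things directly" option could look roughly like the following, shown with pandas only. It is a sketch under assumptions: the separator is a plain space (str.count treats its argument as a regular expression), and on a dask series the first step would require a full pass over the data, which is the extra cost mentioned above.

```python
import pandas as pd

s = pd.Series(["a b c d", "e f", "g"])

# First pass: count separators in every row and take the maximum.
max_splits = int(s.str.count(" ").max())

# Second pass: split with the measured maximum, so the result has
# max_splits + 1 columns and no row is truncated.
wide = s.str.split(" ", n=max_splits, expand=True)

print(max_splits, wide.shape)
```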

jakirkham · 2019-04-30T15:59:01Z

@TomAugspurger, do you have thoughts on this issue?

mrocklin · 2019-04-30T18:26:51Z

Fixed in #4744

jcrist added the "good first issue" and "dataframe" labels Nov 30, 2018

mrocklin closed this as completed Apr 30, 2019
