Change rows in dask.dataframe #653

mrocklin · 2015-09-02T21:21:19Z

This Stack Overflow question raises a valid point:

How do we change a few rows in a dask.dataframe?

E.g. how do we change all negative entries to NaN? How do we change a particular column in a particular index range to zero?

The equivalent column operations are handled by assign. Is there an analogous row-wise operation within Pandas that we should copy? The dask.array version of this is possibly something like where.


jreback · 2015-09-02T21:37:16Z

http://pandas.pydata.org/pandas-docs/stable/indexing.html#the-where-method-and-masking

but note that a .where is equivalent to s[mask] (i.e. boolean indexing)
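To make the relationship between .where and boolean indexing concrete, here is a minimal pandas sketch. The series `s` and the names `selected`, `same_shape` and `recovered` are made up for illustration; the only assumption is a reasonably recent pandas.

```python
import pandas as pd

s = pd.Series([1, -2, 3, -4])
mask = s > 0

# Boolean indexing drops the rows where the mask is False.
selected = s[mask]

# where keeps the original shape and fills the masked-out rows with NaN.
same_shape = s.where(mask)

# Dropping the NaN rows recovers the same index and values as s[mask]
# (the dtype becomes float because NaN was introduced along the way).
recovered = same_shape.dropna()
```

So the two select the same entries, but .where returns an object aligned with the original, which is what makes it usable as a replacement operation.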

mrocklin · 2015-09-02T21:43:30Z

Cool. Is there a similar function to change values based on loc and columns? For example the SO questioner asks about the following:

df.loc[[2, 6], 'a'] = np.pi

jreback · 2015-09-02T21:50:42Z

This should work directly, like the above. Is there an issue?

mrocklin · 2015-09-02T22:09:13Z

I'm looking for syntax that doesn't involve mutating the underlying dataframe. I like assign because it accomplishes this for columns:

df['c'] = df.a + df.b          # mutates df in place
df = df.assign(c=df.a + df.b)  # returns a new dataframe

I think that I'm looking for the same thing that operates on row ranges rather than columns.

jreback · 2015-09-02T22:20:41Z

Oh, then .where is your man, so to speak. It returns a new frame (.mask is the inverse).

In [1]: df = DataFrame(np.arange(10).reshape(5,2),columns=list('AB'))

In [2]: df
Out[2]: 
   A  B
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9

In [3]: df.where(df['A']>6,-df)
Out[3]: 
   A  B
0  0 -1
1 -2 -3
2 -4 -5
3 -6 -7
4  8  9

In [4]: df.mask(df['A']>6,-df)
Out[4]: 
   A  B
0  0  1
1  2  3
2  4  5
3  6  7
4 -8 -9

In [5]: df
Out[5]: 
   A  B
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9

jreback · 2015-09-02T22:21:37Z

The original purpose is actually masking (but returning a same-shaped frame):

In [15]: df.where(df['A']>6)
Out[15]: 
    A   B
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4   8   9
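Applied to the two cases from the top of the thread (negative entries to NaN, and a non-mutating stand-in for df.loc[[2, 6], 'a'] = np.pi), a minimal pandas sketch might look like the following. The frame contents and the names `no_negatives`, `rows` and `updated` are invented for illustration, and combining assign with mask is just one possible way to express the row-range update, not an established idiom from this thread.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1.0, -2.0, 3.0, -4.0], "b": [-1.0, 5.0, -6.0, 7.0]},
    index=[0, 2, 4, 6],
)

# All negative entries become NaN; mask replaces where the condition is True.
no_negatives = df.mask(df < 0)

# Non-mutating counterpart of df.loc[[2, 6], 'a'] = np.pi:
# build a row mask from the index, then replace only column 'a'.
rows = pd.Series(df.index.isin([2, 6]), index=df.index)
updated = df.assign(a=df["a"].mask(rows, np.pi))
```

Both calls return new frames and leave df itself untouched, which is the property asked for above.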

sinhrks mentioned this issue Sep 18, 2015

Add DataFrame.where and mask #729

Merged

mrocklin closed this as completed Apr 26, 2016
