
In [1]:
import pandas as pd
import numpy as np

In [2]:
n = 100

def create_df():
    # Fixed seed, so every call returns exactly the same dataframe
    np.random.seed(1234)
    df = pd.DataFrame({"col_0": np.random.rand(n),
                       "col_1": np.random.rand(n),
                       "col_2": np.random.randint(1, 4, n)})
    return df

df = create_df()

In [3]:
df.iloc[[0,1,2]]

Unnamed: 0,col_0,col_1,col_2
0,0.191519,0.767117,2
1,0.622109,0.708115,3
2,0.437728,0.796867,2


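We can also use a slice `i:j` to do the selection.

Notice that in this case we write `df.iloc[0:3]` rather than `df.iloc[[0:3]]`: the slice `0:3` already describes a range of positions, so it is not wrapped in a list (a slice is not valid inside a list literal).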

In [4]:
df.iloc[0:3]

Unnamed: 0,col_0,col_1,col_2
0,0.191519,0.767117,2
1,0.622109,0.708115,3
2,0.437728,0.796867,2
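
To make the equivalence concrete, here is a minimal sketch on a small hypothetical frame (`demo` and its columns are illustrative names, not part of the notebook above). It compares selecting by a list of positions with selecting by a slice.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (hypothetical data)
demo = pd.DataFrame({"a": np.arange(5), "b": np.arange(5) * 10})

by_list = demo.iloc[[0, 1, 2]]   # explicit list of row positions
by_slice = demo.iloc[0:3]        # slice of positions, end-exclusive like Python lists

# Both selections contain the same rows
same_values = np.array_equal(by_list.to_numpy(), by_slice.to_numpy())

# A list can also pick rows that are not adjacent
scattered = demo.iloc[[0, 3]]
```

A list is the more general form; a slice is shorter to write when the rows are contiguous.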


One of the most useful approaches to slicing consists of generating a vector of booleans (one per row) and using it to select the rows.

In [5]:
indices = df["col_2"]>2
indices.shape, df.shape

((100,), (100, 3))

In [6]:
df[indices].head()

Unnamed: 0,col_0,col_1,col_2
1,0.622109,0.708115,3
4,0.779976,0.965837,3
5,0.272593,0.147157,3
6,0.276464,0.029647,3
9,0.875933,0.95081,3
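
The following sketch repeats the boolean-selection idea on a tiny hypothetical frame (`demo`, `x` and `group` are illustrative names), and also shows how two conditions might be combined with `&`.

```python
import pandas as pd

# Small illustrative frame (hypothetical data)
demo = pd.DataFrame({"x": [0.1, 0.5, 0.9, 0.3], "group": [1, 3, 2, 3]})

mask = demo["group"] > 2        # boolean Series with one entry per row
selected = demo[mask]           # keeps only the rows where mask is True
n_selected = int(mask.sum())    # True counts as 1, so this is the number of selected rows

# Conditions are combined element-wise; each one needs its own parentheses
combined = demo[(demo["group"] > 2) & (demo["x"] < 0.4)]
```

Because the mask has one entry per row, its length always matches the number of rows of the dataframe, as the shapes printed above also show.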


# Updating slices


Let us consider the case where we want to copy one row of a dataframe into several other rows.

In [7]:
v = df.iloc[0] 
v

col_0    0.191519
col_1    0.767117
col_2    2.000000
Name: 0, dtype: float64

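## Updating a slice with `df.loc`
The proper way is to assign through the `df.loc` indexer, which accepts a list of row labels together with a vector of values and writes that vector into each of the selected rows. (`df.at` is documented for accessing a single cell, so it should not be relied on for multi-row assignment.) Passing `v.values` rather than `v` hands pandas a plain array without labels.

In [8]:
df = create_df()
df.loc[[0,1,4],:] = v.values
df.head()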
df.head()

Unnamed: 0,col_0,col_1,col_2
0,0.191519,0.767117,2.0
1,0.191519,0.767117,2.0
2,0.437728,0.796867,2.0
3,0.785359,0.557761,1.0
4,0.191519,0.767117,2.0
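
As a minimal sketch of the same row-broadcast, the example below uses a small hypothetical all-float frame (`demo` is an illustrative name) and the label-based `.loc` indexer, which is the indexer documented for assigning to several rows at once.

```python
import pandas as pd

# Small illustrative frame (hypothetical data, all floats to keep dtypes simple)
demo = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0],
                     "b": [10.0, 20.0, 30.0, 40.0]})

row = demo.loc[0].to_numpy()   # plain array with the values of row 0, no labels attached

# Write that array into rows 1 and 3; it is repeated for each selected row
demo.loc[[1, 3], :] = row
```

Rows 1 and 3 now hold the values of row 0, while row 2 is left untouched.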


## How not to update a slice

We cannot simply write `df.iloc[0:3] = v`, assigning the Series itself:

In [9]:
df.iloc[0:3] = v
df.head()

Unnamed: 0,col_0,col_1,col_2
0,,,
1,,,
2,,,
3,0.785359,0.557761,1.0
4,0.191519,0.767117,2.0


Notice that the NaN values appear in precisely the rows we wanted to update.

Assigning a one-row dataframe instead of a Series does not work either.

In [10]:
df = create_df()

In [11]:
v = df.iloc[[0]] 
v

Unnamed: 0,col_0,col_1,col_2
0,0.191519,0.767117,2


In [12]:
df.iloc[[0,1,2]] = v

In [13]:
df.head()

Unnamed: 0,col_0,col_1,col_2
0,0.191519,0.767117,2.0
1,,,
2,,,
3,0.785359,0.557761,1.0
4,0.779976,0.965837,3.0


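Notice, though, that in this case row 0 was updated correctly. This happens because `v` is a dataframe whose only row has index label 0: the assignment is matched on index labels, so row 0 finds its values in `v`, while rows 1 and 2 find no match and become NaN.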

In [14]:
v.index = [10]
df = create_df()
v

Unnamed: 0,col_0,col_1,col_2
10,0.191519,0.767117,2


Therefore, if we try the same strategy now that `v` carries the index label 10, none of the rows receives the values and all three become NaN.

In [15]:
df.iloc[[0,1,2]] = v
df.head()

Unnamed: 0,col_0,col_1,col_2
0,,,
1,,,
2,,,
3,0.785359,0.557761,1.0
4,0.779976,0.965837,3.0
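
The sketch below illustrates the label matching described above with `reindex`, and then one way to avoid it. It assumes that the assignment matches on labels as the outputs above suggest; some pandas versions may treat `iloc` assignment positionally instead. The frame `demo` is hypothetical.

```python
import pandas as pd

# Small illustrative frame (hypothetical data)
demo = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0],
                     "b": [10.0, 20.0, 30.0, 40.0]})

one_row = demo.iloc[[0]]   # one-row DataFrame carrying the index label 0

# Matching one_row onto the labels 0, 1, 2: only label 0 exists, the rest become NaN
aligned = one_row.reindex([0, 1, 2])
n_missing = int(aligned["a"].isna().sum())

# Stripping the labels first sidesteps the matching: a plain 1-D array
# is simply written into every selected row
demo.iloc[[0, 1, 2], :] = one_row.to_numpy()[0]
```

Converting to a NumPy array is the same idea as the `v.values` used earlier: without labels there is nothing to match, so the values are written by position.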


# Updating subsets of slices

Let us consider the case where we want to update the rows listed in `row_indices`, but the single row `v_subset` we want to write covers only some of the columns of the dataframe (it is a subset).

How can we update the columns present in `v_subset` for the rows in `row_indices`?



In [16]:
df = create_df()
v_subset = df.iloc[0][["col_0","col_1"]]
v_subset

col_0    0.191519
col_1    0.767117
Name: 0, dtype: float64

In [18]:
v_subset.axes[0]

Index(['col_0', 'col_1'], dtype='object')

In [19]:
row_indices = [0,1,4]

In [20]:
df.head()

Unnamed: 0,col_0,col_1,col_2
0,0.191519,0.767117,2
1,0.622109,0.708115,3
2,0.437728,0.796867,2
3,0.785359,0.557761,1
4,0.779976,0.965837,3


In [21]:
# .loc accepts lists of row and column labels (.at is meant for a single cell)
df.loc[row_indices, v_subset.axes[0]] = v_subset.values

Notice that rows 0, 1 and 4 were updated, but only in the first two columns; `col_2` keeps its original values.

In [22]:
df.head()

Unnamed: 0,col_0,col_1,col_2
0,0.191519,0.767117,2
1,0.191519,0.767117,3
2,0.437728,0.796867,2
3,0.785359,0.557761,1
4,0.191519,0.767117,3
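
A minimal sketch of the same partial update on a small hypothetical frame follows (`demo`, `partial` and `rows` are illustrative names). It selects both the rows and the columns by label with `.loc`, and uses `partial.index` to obtain the column names, which is equivalent to `axes[0]` for a Series.

```python
import pandas as pd

# Small illustrative frame (hypothetical data)
demo = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0],
                     "b": [10.0, 20.0, 30.0, 40.0],
                     "c": [0.5, 0.6, 0.7, 0.8]})

partial = demo.loc[0, ["a", "b"]]   # Series holding only columns "a" and "b" of row 0
rows = [1, 2, 3]

# Update only the columns named in partial.index, for the chosen rows
demo.loc[rows, partial.index] = partial.to_numpy()
```

Column `c` is not mentioned in the assignment, so it is left as it was.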


# Adding new columns with a shared value across rows of a slice of the dataframe


We are interested in adding new columns. Each new column should hold one fixed value, but only in a slice of the rows of the original dataframe.

We can do this with `df.loc[indices, newcol] = value`, where `indices` is a `pandas.Series` of boolean values with the same length as the number of rows in `df`. Notice that `indices` simply tells us which rows will be updated with `value` (the other rows are filled with NaN).

This is done one column at a time.

In [23]:
df = create_df()
df.head()

Unnamed: 0,col_0,col_1,col_2
0,0.191519,0.767117,2
1,0.622109,0.708115,3
2,0.437728,0.796867,2
3,0.785359,0.557761,1
4,0.779976,0.965837,3


In [24]:
indices = df["col_2"]==1
indices.shape, df.shape, type(df), type(indices)

((100,), (100, 3), pandas.core.frame.DataFrame, pandas.core.series.Series)

In [25]:
for i in range(10):
    namecol = "newcol"+str(i)
    df.loc[indices, namecol]= i

You can see that each new column holds its value only in the rows selected by `indices`.

The rows selected by `indices` are the ones where `col_2` takes the value 1; every other row holds NaN in the new columns.

In [26]:
df[1:10]

Unnamed: 0,col_0,col_1,col_2,newcol0,newcol1,newcol2,newcol3,newcol4,newcol5,newcol6,newcol7,newcol8,newcol9
1,0.622109,0.708115,3,,,,,,,,,,
2,0.437728,0.796867,2,,,,,,,,,,
3,0.785359,0.557761,1,0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0
4,0.779976,0.965837,3,,,,,,,,,,
5,0.272593,0.147157,3,,,,,,,,,,
6,0.276464,0.029647,3,,,,,,,,,,
7,0.801872,0.593893,1,0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0
8,0.958139,0.114066,1,0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0
9,0.875933,0.95081,3,,,,,,,,,,
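
For a single column, the pattern can be sketched as follows on a small hypothetical frame (`demo`, `group` and `flag` are illustrative names).

```python
import pandas as pd

# Small illustrative frame (hypothetical data)
demo = pd.DataFrame({"x": [0.1, 0.5, 0.9], "group": [1, 2, 1]})

mask = demo["group"] == 1    # boolean Series: True for the rows to fill

# "flag" does not exist yet: it is created, set to 7 where mask is True,
# and left as NaN in the remaining rows
demo.loc[mask, "flag"] = 7

filled_rows = demo["flag"].notna().tolist()
```

The NaN entries are also the reason the integer values written in the loop above are displayed as `0.0`, `1.0`, and so on: a column that contains NaN is stored as floating point.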
