
In [2]:
import pandas as pd
import numpy as np

In [3]:
n = 100

def create_df():
    # Fix the seed so that every call returns the same dataframe.
    np.random.seed(1234)
    df = pd.DataFrame({"col_0": np.random.rand(n),
                       "col_1": np.random.rand(n),
                       "col_2": np.random.randint(1, 4, n)})
    return df

df = create_df()

# Slice a set of rows using `df.iloc`

We can select a set of rows from a dataframe by position using the `iloc` indexer.

For example, we can have a list containing the indices of the rows we want to select.

In [4]:
df = create_df()

In [277]:
df.iloc[[0,1,2]]

Unnamed: 0,col_0,col_1,col_2
0,0.191519,0.767117,2
1,0.622109,0.708115,3
2,0.437728,0.796867,2


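We can also use a slice `i:j` to do the selection; as with Python lists, the row at position `j` is excluded.

Notice that in this case we do not wrap `0:3` in a second pair of brackets: `0:3` is slice syntax, which Python already converts into a single `slice` object, and `df.iloc[[0:3]]` would be a syntax error.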

In [170]:
df.iloc[0:3]

Unnamed: 0,col_0,col_1,col_2
0,0.191519,0.767117,2
1,0.622109,0.708115,3
2,0.437728,0.796867,2
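
As a quick check of the two selections above, the following sketch compares them on a small hypothetical frame named `demo` (not the notebook's `df`). It is only an illustration, assuming a default integer index.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, used only for this illustration.
demo = pd.DataFrame({"a": np.arange(5), "b": np.arange(5) * 10})

by_list = demo.iloc[[0, 1, 2]]   # explicit list of row positions
by_slice = demo.iloc[0:3]        # positions 0, 1 and 2; position 3 is excluded

# Both selections should contain the same rows and values.
same = by_list.equals(by_slice)
print(same, len(by_slice))
```

The list form is handy when the positions are not contiguous, while the slice form is shorter for a contiguous range.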


One of the most useful approaches to slicing consists of building a vector of booleans, with one entry per row, and using it to select the rows.

In [180]:
indices = df["col_2"]>2
indices.shape, df.shape

((100,), (100, 3))

In [181]:
df[indices].head()

Unnamed: 0,col_0,col_1,col_2
1,0.622109,0.708115,3
4,0.779976,0.965837,3
5,0.272593,0.147157,3
6,0.276464,0.029647,3
9,0.875933,0.95081,3
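
Boolean vectors can also be combined before selecting. The sketch below is an illustration on a hypothetical toy frame `demo` with made-up values; it joins two conditions with `&`.

```python
import pandas as pd

# Hypothetical toy frame with made-up values.
demo = pd.DataFrame({"col_0": [0.1, 0.9, 0.5, 0.7],
                     "col_2": [3, 1, 3, 2]})

# Each comparison yields a boolean Series. The parentheses are required
# because & binds more tightly than > in Python.
mask = (demo["col_2"] > 2) & (demo["col_0"] > 0.3)

selected = demo[mask]   # selects the same rows as demo.loc[mask]
print(selected)
```

Use `|` for "or" and `~` to negate a mask; the Python keywords `and`, `or` and `not` do not work element-wise on a Series.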


# Updating slices


Let us consider the case where we want to copy one row of a dataframe into several other rows.

In [90]:
v = df.iloc[0] 
v

col_0    0.191519
col_1    0.767117
col_2    2.000000
Name: 0, dtype: float64

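## Updating a slice with `df.at`
One way that works in this notebook is to index with `df.at`, passing a set of row indices, and to assign a vector whose values are written into each of those rows. Note that `df.at` is documented for single-value access, so this list-based call may not work in other pandas versions; `df.loc` is the label-based indexer documented for selecting several rows at once.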

In [236]:
df = create_df()
df.at[[0,1,4],:] = v.values
df.head()

Unnamed: 0,col_0,col_1,col_2
0,0.191519,0.767117,2.0
1,0.191519,0.767117,2.0
2,0.437728,0.796867,2.0
3,0.785359,0.557761,1.0
4,0.191519,0.767117,2.0
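
The same kind of update can be written with `df.loc`, the label-based indexer that pandas documents for selecting several rows (`df.at` is documented for single values). The sketch below is an illustration rather than the notebook's own run: the frame `demo`, the seed and the target rows `[1, 4]` are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch
demo = pd.DataFrame({"col_0": rng.random(5),
                     "col_1": rng.random(5),
                     "col_2": rng.integers(1, 4, 5)})

v = demo.iloc[0]        # row 0 as a Series (values upcast to float64)
before = demo.copy()    # kept only to compare afterwards

# .loc takes row labels and a column selector. v.values is a plain array
# without labels, so its three entries go to the three columns of each row.
demo.loc[[1, 4], :] = v.values
print(demo)
```

Passing `v.values` instead of `v` matters: a plain array carries no labels, so pandas has nothing to align and writes the values by position.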


## How not to update a slice

We cannot simply assign the Series with `df.iloc[0:3] = v`; in the run below, `v` is not broadcast over the selected rows.

In [92]:
df.iloc[0:3] = v
df.head()

Unnamed: 0,col_0,col_1,col_2
0,,,
1,,,
2,,,
3,0.785359,0.557761,1.0
4,0.779976,0.965837,3.0


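Notice that the NaN values appear in precisely the rows we wanted to update. In this run pandas tried to align the labels of `v` (the column names) with the row labels 0 to 2, found no match, and filled the rows with NaN. Other pandas versions may handle `iloc` assignment of pandas objects differently.

This does not work even if we use a one-row dataframe instead of a Series.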

In [152]:
df = create_df()

In [153]:
v = df.iloc[[0]] 
v

Unnamed: 0,col_0,col_1,col_2
0,0.191519,0.767117,2


In [154]:
df.iloc[[0,1,2]] = v

In [155]:
df.head()

Unnamed: 0,col_0,col_1,col_2
0,0.191519,0.767117,2.0
1,,,
2,,,
3,0.785359,0.557761,1.0
4,0.779976,0.965837,3.0


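Notice, though, that in this case the row with index 0 was updated correctly. This happens because `v` is a dataframe whose only row has index label 0, so it aligns with row 0 of `df`. We can check this by giving `v` an index label that none of the target rows has.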

In [230]:
v.index = [10]
df = create_df()
v

Unnamed: 0,col_0,col_1,col_2
10,0.191519,0.767117,2


Therefore, if we try to update with the same strategy, none of the rows receives the values of `v`; all three become NaN.

In [232]:
df.iloc[[0,1,2]] = v
df.head()

Unnamed: 0,col_0,col_1,col_2
0,,,
1,,,
2,,,
3,0.785359,0.557761,1.0
4,0.779976,0.965837,3.0
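
Since the NaN values in the runs above come from label alignment, one way to avoid them is to assign a plain NumPy array, which has no labels to align. The sketch below illustrates this on a hypothetical toy frame `demo` with made-up values; it is not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with made-up values.
demo = pd.DataFrame({"col_0": [0.1, 0.2, 0.3, 0.4],
                     "col_1": [1.0, 2.0, 3.0, 4.0],
                     "col_2": [1, 2, 3, 1]})

v = demo.iloc[[0]]   # one-row dataframe
v.index = [10]       # a label that matches none of the target rows

# Convert the single row to a 1-D array: the labels are dropped, so the
# values are written by position into each selected row.
row = v.iloc[0].to_numpy()
demo.iloc[[1, 2]] = row
print(demo)
```

Because the array has one entry per column, the index label of `v` no longer plays any role in the assignment.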


# Updating subsets of slices

Let us consider the case where we are interested in updating a set of rows given by `row_indices`. This time the single row `v_subset` does not cover every column of the dataframe (it holds only a subset of the columns).

How can we update only the columns present in `v_subset` for the rows in `row_indices`?



In [252]:
df = create_df()
v_subset = df.iloc[0][["col_0","col_1"]]
v_subset

col_0    0.191519
col_1    0.767117
Name: 0, dtype: float64

In [262]:
v_subset.axes[0]

Index(['col_0', 'col_1'], dtype='object')

In [249]:
row_indices = [0,1,4]

In [251]:
df.head()

Unnamed: 0,col_0,col_1,col_2
0,0.191519,0.767117,2
1,0.622109,0.708115,3
2,0.437728,0.796867,2
3,0.785359,0.557761,1
4,0.779976,0.965837,3


In [274]:
df.at[row_indices,v_subset.axes[0]] = v_subset.values

Notice that rows 0, 1 and 4 were updated, but only in the first two columns; `col_2` keeps its original values.

In [275]:
df.head()

Unnamed: 0,col_0,col_1,col_2
0,0.191519,0.767117,2
1,0.191519,0.767117,3
2,0.437728,0.796867,2
3,0.785359,0.557761,1
4,0.191519,0.767117,3
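
The partial update can also be expressed with `df.loc`, which is documented to accept a list of row labels together with a set of column labels. The sketch below is an illustration on a hypothetical toy frame `demo` with made-up values, and it uses `v_subset.index`, which should return the same labels as `v_subset.axes[0]` for a Series.

```python
import pandas as pd

# Hypothetical toy frame with made-up values.
demo = pd.DataFrame({"col_0": [0.1, 0.2, 0.3, 0.4, 0.5],
                     "col_1": [1.0, 2.0, 3.0, 4.0, 5.0],
                     "col_2": [2, 3, 2, 1, 3]})

row_indices = [0, 1, 4]
v_subset = demo.loc[0, ["col_0", "col_1"]]   # Series indexed by column name

# Rows come from row_indices and columns from the labels of v_subset;
# the plain array of values is written into each selected row.
demo.loc[row_indices, v_subset.index] = v_subset.values
print(demo)
```

Columns that are not listed in `v_subset.index`, here `col_2`, are left untouched.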
