<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Pandas-loc-and-iloc-for-selecting-data" data-toc-modified-id="Pandas-loc-and-iloc-for-selecting-data-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Pandas loc and iloc for selecting data</a></span><ul class="toc-item"><li><span><a href="#Differences-between-loc-and-iloc" data-toc-modified-id="Differences-between-loc-and-iloc-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Differences between loc and iloc</a></span></li><li><span><a href="#Selecting-a-range-of-data-via-slice" data-toc-modified-id="Selecting-a-range-of-data-via-slice-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Selecting a range of data via slice</a></span></li><li><span><a href="#Selecting-via-conditions-and-callable" data-toc-modified-id="Selecting-via-conditions-and-callable-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Selecting via conditions and callable</a></span></li><li><span><a href="#Callable" data-toc-modified-id="Callable-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Callable</a></span></li><li><span><a href="#loc-and-iloc-are-interchangeable-when-labels-are-0-based-integers" data-toc-modified-id="loc-and-iloc-are-interchangeable-when-labels-are-0-based-integers-1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>loc and iloc are interchangeable when labels are 0-based integers</a></span></li></ul></li></ul></div>

# Pandas loc and iloc for selecting data

## Differences between loc and iloc
The main distinction between loc and iloc is how they address rows and columns:

- loc is label-based: you select rows and columns by their index labels and column labels.
- iloc is integer position-based: you select rows and columns by their 0-based integer positions.
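
As a rough sketch of the two bullets above, the snippet below builds a tiny stand-in frame by hand (the name `demo` and the three rows are illustrative, copied from the weather data used in this notebook) and shows that each indexer rejects the other's kind of key.

```python
import pandas as pd

# Hand-built stand-in for the notebook's CSV (illustrative subset, not the real file).
demo = pd.DataFrame(
    {"Weather": ["Sunny", "Sunny", "Sunny"], "Temperature": [12.79, 19.67, 17.51]},
    index=pd.Index(["Mon", "Tue", "Wed"], name="Day"),
)

# loc looks cells up by label...
by_label = demo.loc["Tue", "Temperature"]
# ...while iloc looks them up by 0-based position.
by_position = demo.iloc[1, 1]
print(by_label, by_position)

# 1 is not a row label in this frame, so loc cannot find it.
try:
    demo.loc[1, "Temperature"]
    loc_rejected = False
except KeyError:
    loc_rejected = True

# 'Tue' is not an integer position, so iloc refuses it as well.
try:
    demo.iloc["Tue"]
    iloc_rejected = False
except (TypeError, ValueError):
    iloc_rejected = True

print(loc_rejected, iloc_rejected)
```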

In [10]:
import pandas as pd
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all" 

df = pd.read_csv('./data/data_day.csv', index_col=['Day'])
df

Day,Weather,Temperature,Wind,Humidity
Mon,Sunny,12.79,13,30
Tue,Sunny,19.67,28,96
Wed,Sunny,17.51,16,20
Thu,Cloudy,14.44,11,22
Fri,Shower,10.51,26,79
Sat,Shower,11.07,27,62
Sun,Sunny,17.5,20,10


In [16]:
# Pass label to `loc`
df.loc['Fri', 'Temperature']

"-----------------------------------"
# To get all rows
df.loc[:, 'Temperature']

"-----------------------------------"
# To get all columns
df.loc['Fri', :]

"-----------------------------------"
# Multiple rows
df.loc[['Thu', 'Fri'], 'Temperature']


"-----------------------------------"
# Multiple columns
df.loc['Fri', ['Temperature', 'Wind']]

10.51

'-----------------------------------'

Day
Mon    12.79
Tue    19.67
Wed    17.51
Thu    14.44
Fri    10.51
Sat    11.07
Sun    17.50
Name: Temperature, dtype: float64

'-----------------------------------'

Weather        Shower
Temperature     10.51
Wind               26
Humidity           79
Name: Fri, dtype: object

'-----------------------------------'

Day
Thu    14.44
Fri    10.51
Name: Temperature, dtype: float64

'-----------------------------------'

Temperature    10.51
Wind              26
Name: Fri, dtype: object

In [18]:
# The equivalent `iloc` statement uses 0-based positions:
# row position 4 ('Fri') and column position 1 ('Temperature')
df.iloc[4, 1]

"-----------------------------------"
# All rows of the column at position 1
df.iloc[:, 1]

"-----------------------------------"
# All columns of the row at position 4
df.iloc[4, :]

"-----------------------------------"
# Multiple rows using iloc
df.iloc[[3, 4], 1]

"-----------------------------------"
# Multiple columns using iloc
df.iloc[4, [1, 2]]

10.51

'-----------------------------------'

Day
Mon    12.79
Tue    19.67
Wed    17.51
Thu    14.44
Fri    10.51
Sat    11.07
Sun    17.50
Name: Temperature, dtype: float64

'-----------------------------------'

Weather        Shower
Temperature     10.51
Wind               26
Humidity           79
Name: Fri, dtype: object

'-----------------------------------'

Day
Thu    14.44
Fri    10.51
Name: Temperature, dtype: float64

'-----------------------------------'

Temperature    10.51
Wind              26
Name: Fri, dtype: object

In [20]:
# Multiple rows and columns
rows = ['Thu', 'Fri']
cols = ['Temperature', 'Wind']

df.loc[rows, cols]

Day,Temperature,Wind
Thu,14.44,11
Fri,10.51,26


In [22]:
# the equivalent iloc statement
rows = [3, 4]
cols = [1, 2]
df.iloc[rows, cols]

Day,Temperature,Wind
Thu,14.44,11
Fri,10.51,26
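
The positions `[3, 4]` and `[1, 2]` above were worked out by hand. One possible way to derive them from the labels is `Index.get_loc` and `Index.get_indexer`; the sketch below assumes a hand-built frame named `demo` with the same row order as the notebook's data.

```python
import pandas as pd

# Hand-built stand-in for the notebook's data (three of its four columns).
demo = pd.DataFrame(
    {
        "Weather": ["Sunny", "Sunny", "Sunny", "Cloudy", "Shower", "Shower", "Sunny"],
        "Temperature": [12.79, 19.67, 17.51, 14.44, 10.51, 11.07, 17.50],
        "Wind": [13, 28, 16, 11, 26, 27, 20],
    },
    index=pd.Index(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"], name="Day"),
)

rows = ["Thu", "Fri"]
cols = ["Temperature", "Wind"]

# get_loc translates one label into its 0-based position.
fri_pos = demo.index.get_loc("Fri")

# get_indexer translates a list of labels (it returns -1 for labels it cannot find).
row_pos = demo.index.get_indexer(rows)
col_pos = demo.columns.get_indexer(cols)

# The label-based and position-based selections address the same cells.
via_loc = demo.loc[rows, cols]
via_iloc = demo.iloc[row_pos, col_pos]
print(fri_pos, list(row_pos), list(col_pos), via_loc.equals(via_iloc))
```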


## Selecting a range of data via slice

In [24]:
# Slicing column labels (with `loc`, both ends of the slice are included)
rows = ['Thu', 'Fri']
df.loc[rows, 'Temperature':'Humidity']

Day,Temperature,Wind,Humidity
Thu,14.44,11,22
Fri,10.51,26,79


In [26]:
# Slicing row labels
cols = ['Temperature', 'Wind']
df.loc['Mon':'Thu', cols]

Day,Temperature,Wind
Mon,12.79,13
Tue,19.67,28
Wed,17.51,16
Thu,14.44,11


In [28]:
# Slicing with a step: every second row from 'Mon' through 'Fri'
df.loc['Mon':'Fri':2, :]

Day,Weather,Temperature,Wind,Humidity
Mon,Sunny,12.79,13,30
Wed,Sunny,17.51,16,20
Fri,Shower,10.51,26,79


In [29]:
# Slicing column positions with `iloc`: positions 0 to 2 (the stop position 3 is excluded)
df.iloc[[1, 2], 0:3]

Day,Weather,Temperature,Wind
Tue,Sunny,19.67,28
Wed,Sunny,17.51,16


In [31]:
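# Slicing row positions with a step: every second row from position 0 up to, but not including, position 4
df.iloc[0:4:2, :]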

Day,Weather,Temperature,Wind,Humidity
Mon,Sunny,12.79,13,30
Wed,Sunny,17.51,16,20
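
The slices in this section behave differently at the stop value: a loc slice includes its stop label, while an iloc slice excludes its stop position, as ordinary Python slicing does. A minimal sketch, using a hand-built frame named `demo` rather than the notebook's CSV:

```python
import pandas as pd

# Hand-built stand-in with the first five days of the notebook's data.
demo = pd.DataFrame(
    {"Temperature": [12.79, 19.67, 17.51, 14.44, 10.51]},
    index=pd.Index(["Mon", "Tue", "Wed", "Thu", "Fri"], name="Day"),
)

# Label slice: the stop label 'Thu' is part of the result.
label_slice = demo.loc["Mon":"Thu"]

# Position slice: the stop position 3 (which is 'Thu') is left out.
position_slice = demo.iloc[0:3]

print(list(label_slice.index))
print(list(position_slice.index))
```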


## Selecting via conditions and callable

In [32]:
# One condition
df.loc[df.Humidity > 50, :]

Day,Weather,Temperature,Wind,Humidity
Tue,Sunny,19.67,28,96
Fri,Shower,10.51,26,79
Sat,Shower,11.07,27,62


In [34]:
# Multiple conditions: wrap each condition in parentheses and combine them with &
df.loc[
    (df.Humidity > 50) & (df.Weather == 'Shower'),
    ['Temperature', 'Wind'],
]

Day,Temperature,Wind
Fri,10.51,26
Sat,11.07,27


In [36]:
# Single condition: `iloc` does not accept a boolean Series,
# so the mask is converted to a plain list first
df.iloc[list(df.Humidity > 50)]

Day,Weather,Temperature,Wind,Humidity
Tue,Sunny,19.67,28,96
Fri,Shower,10.51,26,79
Sat,Shower,11.07,27,62


In [38]:
# Multiple conditions (again converted to a list for `iloc`)
df.iloc[
    list((df.Humidity > 50) & (df.Weather == 'Shower')),
    :,
]

Day,Weather,Temperature,Wind,Humidity
Fri,Shower,10.51,26,79
Sat,Shower,11.07,27,62
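
The `list(...)` wrapper in the iloc cells is there because iloc is purely positional and rejects a boolean Series, which carries an index. The sketch below, on a hand-built frame named `demo`, shows the rejection and uses `Series.to_numpy()` as one alternative to `list(...)`; the exact exception type may vary with the index type, so both likely candidates are caught.

```python
import pandas as pd

# Hand-built stand-in with two of the notebook's columns.
demo = pd.DataFrame(
    {
        "Weather": ["Sunny", "Sunny", "Sunny", "Cloudy", "Shower", "Shower", "Sunny"],
        "Humidity": [30, 96, 20, 22, 79, 62, 10],
    },
    index=pd.Index(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"], name="Day"),
)

# Boolean Series: one True/False per row, labelled by Day.
mask = (demo["Humidity"] > 50) & (demo["Weather"] == "Shower")

# loc accepts the boolean Series directly.
via_loc = demo.loc[mask]

# iloc refuses a mask that carries an index.
try:
    demo.iloc[mask]
    rejected = False
except (ValueError, NotImplementedError):
    rejected = True

# A plain boolean array has no index, so iloc accepts it.
via_iloc = demo.iloc[mask.to_numpy()]

print(rejected, via_loc.equals(via_iloc))
```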


## Callable

In [40]:
# Selecting columns: the callable receives the DataFrame and returns a valid indexer
# (here, a list of column labels)
df.loc[:, lambda df: ['Humidity', 'Wind']]

Day,Humidity,Wind
Mon,30,13
Tue,96,28
Wed,20,16
Thu,22,11
Fri,79,26
Sat,62,27
Sun,10,20


In [41]:
# With condition
df.loc[lambda df: df.Humidity > 50, :]

Day,Weather,Temperature,Wind,Humidity
Tue,Sunny,19.67,28,96
Fri,Shower,10.51,26,79
Sat,Shower,11.07,27,62


In [43]:
# With `iloc`, the callable must return integer positions
df.iloc[lambda df: [0, 1], :]

Day,Weather,Temperature,Wind,Humidity
Mon,Sunny,12.79,13,30
Tue,Sunny,19.67,28,96


In [45]:
# With a condition: the callable returns the boolean mask as a list
df.iloc[lambda df: list(df.Humidity > 50), :]

Day,Weather,Temperature,Wind,Humidity
Tue,Sunny,19.67,28,96
Fri,Shower,10.51,26,79
Sat,Shower,11.07,27,62
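
In the cells above the callable only repeats what a direct expression on `df` would do. It is arguably more useful in a method chain, where the frame being indexed has no name. A small sketch on a hand-built frame named `demo`:

```python
import pandas as pd

# Hand-built stand-in with two of the notebook's columns.
demo = pd.DataFrame(
    {
        "Wind": [13, 28, 16, 11, 26, 27, 20],
        "Humidity": [30, 96, 20, 22, 79, 62, 10],
    },
    index=pd.Index(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"], name="Day"),
)

result = (
    demo
    .loc[demo["Wind"] > 15]             # intermediate frame that is never given a name
    .loc[lambda d: d["Humidity"] > 50]  # `d` is that intermediate frame, not `demo`
)
print(result)
```

Without the callable, the intermediate result would have to be stored in a variable before the second condition could be evaluated against it.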


## loc and iloc are interchangeable when labels are 0-based integers

In [48]:

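# Reload the data without a header row or an index column,
# so rows and columns both get default 0-based integer labels
df = pd.read_csv(
    'data/data_day.csv',
    header=None,
    skiprows=[0],
)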
df

,0,1,2,3,4
0,Mon,Sunny,12.79,13,30
1,Tue,Sunny,19.67,28,96
2,Wed,Sunny,17.51,16,20
3,Thu,Cloudy,14.44,11,22
4,Fri,Shower,10.51,26,79
5,Sat,Shower,11.07,27,62
6,Sun,Sunny,17.5,20,10


In [49]:
df.loc[1, 2]


19.67

In [51]:
df.loc[1, [1, 2]]

1    Sunny
2    19.67
Name: 1, dtype: object

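When the labels are the default 0-based integers, loc and iloc return the same result for a single value or a list of values. Slices are the exception: loc includes the stop label, while iloc excludes the stop position.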

In [53]:
df.loc[1, 2] == df.iloc[1, 2]

True

In [55]:
df.loc[1, [1, 2]] == df.iloc[1, [1, 2]]

1    True
2    True
Name: 1, dtype: bool
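
The equivalence holds only while every integer label equals its row's position. The sketch below, on a hand-built frame named `demo` with a default integer index, shows two cases where loc and iloc diverge: slicing, and a frame whose rows have been reordered.

```python
import pandas as pd

# Default RangeIndex: labels 0..3 coincide with positions 0..3.
demo = pd.DataFrame({"Temperature": [12.79, 19.67, 17.51, 14.44]})

# Slices: loc includes label 3, iloc stops before position 3.
n_loc = len(demo.loc[1:3])
n_iloc = len(demo.iloc[1:3])
print(n_loc, n_iloc)

# Sorting moves the rows but keeps each label attached to its row,
# so labels and positions no longer line up.
by_temp = demo.sort_values("Temperature", ascending=False)
labelled_zero = by_temp.loc[0, "Temperature"]  # the row whose label is 0
first_row = by_temp.iloc[0, 0]                 # the first row after sorting
print(labelled_zero, first_row)
```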

https://github.com/BindiChen/machine-learning/blob/master/data-analysis/031-pandas-multiIndex/multiindex-selection.ipynb