Link to Medium blog post: https://towardsdatascience.com/5-use-cases-of-pandas-loc-and-iloc-methods-a94796b1f734

# 5 Use Cases of Pandas loc and iloc Methods

Pandas is a highly flexible and powerful library for data analysis and manipulation. It provides lots of functions and methods to perform efficient operations at each step of the data analysis process.

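loc and iloc are essential Pandas methods for filtering, selecting, and manipulating data. Strictly speaking, they are indexers used with square brackets, as in df.loc[...], rather than methods called with parentheses. They allow us to access a particular cell or multiple cells within a DataFrame.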
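To make the single-cell versus multi-cell distinction concrete, here is a minimal sketch on a small hypothetical DataFrame. The name toy and its values are made up for illustration and do not come from the housing data. The same cell is reached once by labels with loc and once by positions with iloc, and passing lists returns several cells at once.

```python
import pandas as pd

# Hypothetical two-row frame, used only to show single-cell vs multi-cell access.
toy = pd.DataFrame({"Address": ["A St", "B St"], "Price": [900_000.0, 1_200_000.0]})

one_cell = toy.loc[1, "Price"]        # a single value, selected by row label 1 and column label "Price"
same_cell = toy.iloc[1, 1]            # the same value, selected by row position 1 and column position 1
several = toy.loc[[0, 1], ["Price"]]  # a sub-DataFrame containing several cells

print(one_cell, same_cell)  # 1200000.0 1200000.0
print(several.shape)        # (2, 1)
```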

In this article, we will go over 5 use cases of loc and iloc that I think are very helpful in a typical data analysis process.

We will use the Melbourne housing dataset, available on Kaggle, for the examples. We first read the CSV file using the read_csv function.

```python
import numpy as np
import pandas as pd

df = pd.read_csv("melb_data.csv")
print(df.shape)

df.columns
```

```
(13580, 21)

Index(['Suburb', 'Address', 'Rooms', 'Type', 'Price', 'Method', 'SellerG',
       'Date', 'Distance', 'Postcode', 'Bedroom2', 'Bathroom', 'Car',
       'Landsize', 'BuildingArea', 'YearBuilt', 'CouncilArea', 'Lattitude',
       'Longtitude', 'Regionname', 'Propertycount'],
      dtype='object')
```

The dataset contains 21 features about 13580 houses in Melbourne.

### Example 1

The main difference between loc and iloc is the way they access rows and columns:

- loc selects by row and column labels
- iloc selects by row and column indices (integer positions)

Let’s use both methods to select the first few rows of the Address column.

```python
df.loc[:5, 'Address']  # df.loc[0:5, 'Address'] works as well
```

```
0        85 Turner St
1     25 Bloomburg St
2        5 Charles St
3    40 Federation La
4         55a Park St
5      129 Charles St
Name: Address, dtype: object
```

```python
df.iloc[:5, 1]
```

```
0        85 Turner St
1     25 Bloomburg St
2        5 Charles St
3    40 Federation La
4         55a Park St
Name: Address, dtype: object
```

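You may have noticed that we use the same slice expression to select the rows in both cases. This works because Pandas assigns integer row labels (0, 1, 2, ...) by default, so unless we specify other row labels, the label and the index of each row are the same. The difference is in how the upper limit is treated: loc includes it, so :5 returns the six rows labeled 0 through 5, while iloc follows Python's slicing convention and excludes it, returning the five rows at positions 0 through 4.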

Column indices also start from 0, so the index of the Address column is 1.
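The two selections above agree only because the default row labels coincide with positions. A minimal sketch with a hypothetical frame whose row labels are 10, 20, 30, and 40 (made-up values, not the housing data) shows where they part ways: loc slices by label and keeps the end label, iloc slices by position and drops the end position, and an integer that is a valid position for iloc raises a KeyError with loc when no row carries that label.

```python
import pandas as pd

# Hypothetical frame whose row labels (10, 20, 30, 40) differ from positions (0 to 3).
toy = pd.DataFrame(
    {"Address": ["A St", "B St", "C St", "D St"], "Rooms": [2, 3, 4, 1]},
    index=[10, 20, 30, 40],
)

# loc: label-based slice, the end label 30 is included.
print(toy.loc[20:30, "Address"].tolist())  # ['B St', 'C St']

# iloc: position-based slice, the end position 3 is excluded.
print(toy.iloc[1:3, 0].tolist())  # ['B St', 'C St']

# The integer 1 is a valid position, but no row is labeled 1.
print(toy.iloc[1]["Address"])  # B St
try:
    toy.loc[1]
except KeyError:
    print("no row labeled 1")
```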

### Example 2

We do not have to specify a range to select multiple rows or columns. We can pass them in a list as well.

```python
df.loc[[5, 7, 9], ['Address', 'Type']]
```

|   | Address | Type |
|---|---|---|
| 5 | 129 Charles St | h |
| 7 | 98 Charles St | h |
| 9 | 10 Valiant St | h |


```python
df.iloc[[5, 7, 9], [1, 3]]  # Address is at position 1, Type at position 3
```

|   | Address | Type |
|---|---|---|
| 5 | 129 Charles St | h |
| 7 | 98 Charles St | h |
| 9 | 10 Valiant St | h |


We have selected the rows with labels (or indices) 5, 7, and 9 from the Address and Type columns.
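If we prefer to think in column names but still want to use iloc, the column Index can translate labels into positions. The sketch below assumes a small hypothetical frame with the same first four column names as the housing data (the values are invented) and uses Index.get_indexer to look up the positions of Address and Type before passing them to iloc.

```python
import pandas as pd

# Hypothetical frame: same first four column names as the housing data, invented values.
houses = pd.DataFrame({
    "Suburb": ["S1", "S2", "S3"],
    "Address": ["A St", "B St", "C St"],
    "Rooms": [2, 3, 4],
    "Type": ["h", "u", "h"],
})

# Look up the integer positions of the wanted column labels.
positions = houses.columns.get_indexer(["Address", "Type"])
print(positions)  # [1 3]

# The position-based and label-based selections return the same frame.
by_position = houses.iloc[[0, 2], positions]
by_label = houses.loc[[0, 2], ["Address", "Type"]]
print(by_position.equals(by_label))  # True
```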

### Example 3

We can use the loc method to create a new column. Let’s create a column that takes the value 1 for houses that are more expensive than 1 million. Since each data point (i.e., row) represents a house, we apply the condition to the Price column.

```python
df.loc[df.Price > 1000000, 'IsExpensive'] = 1
```

The “IsExpensive” column is 1 for rows that meet the condition and NaN for the other rows.

```python
df.loc[:4, ['Price', 'IsExpensive']]
```

|   | Price | IsExpensive |
|---|---|---|
| 0 | 1480000.0 | 1.0 |
| 1 | 1035000.0 | 1.0 |
| 2 | 1465000.0 | 1.0 |
| 3 | 850000.0 | NaN |
| 4 | 1600000.0 | 1.0 |
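The values appear as 1.0 rather than 1 because NaN is a floating-point value, so the new column is stored with a float dtype. The sketch below reproduces that on a few hypothetical prices and, for comparison, shows a different technique (not loc) that builds an integer 0/1 flag for every row by converting the boolean condition with astype(int).

```python
import pandas as pd

# Hypothetical prices, not taken from the dataset.
prices = pd.DataFrame({"Price": [1_480_000.0, 850_000.0, 1_600_000.0]})

# loc assignment: rows that fail the condition get NaN, so the column becomes float64.
prices.loc[prices.Price > 1_000_000, "IsExpensive"] = 1
print(prices["IsExpensive"].tolist())  # [1.0, nan, 1.0]
print(prices["IsExpensive"].dtype)     # float64

# Alternative technique: compute the flag for every row, giving integers with no NaN.
prices["IsExpensiveInt"] = (prices.Price > 1_000_000).astype(int)
print(prices["IsExpensiveInt"].tolist())  # [1, 0, 1]
```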


### Example 4

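The loc method also accepts multiple conditions. Let’s create a new column called Category that takes the value “Expensive House” for the rows with a price higher than 1.4 million and a type of “h”. Each condition is wrapped in parentheses, and the conditions are combined with the & (and) operator.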

```python
df.loc[(df.Price > 1400000) & (df.Type == 'h'), 'Category'] = 'Expensive House'
df.loc[:4, ['Price', 'Category']]
```

|   | Price | Category |
|---|---|---|
| 0 | 1480000.0 | Expensive House |
| 1 | 1035000.0 | NaN |
| 2 | 1465000.0 | Expensive House |
| 3 | 850000.0 | NaN |
| 4 | 1600000.0 | Expensive House |
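Conditions can be combined with & (and), | (or), and ~ (not), and each comparison needs its own parentheses because these operators bind more tightly than comparison operators such as > and ==. A minimal sketch on hypothetical rows (the prices are made up, and "u" is used only as a second Type code for illustration):

```python
import pandas as pd

# Hypothetical rows with made-up prices and Type codes.
homes = pd.DataFrame({
    "Price": [1_480_000.0, 1_035_000.0, 850_000.0, 1_600_000.0],
    "Type": ["h", "u", "h", "u"],
})

# Each comparison sits in its own parentheses before being combined.
both = (homes.Price > 1_400_000) & (homes.Type == "h")    # and
either = (homes.Price > 1_400_000) | (homes.Type == "h")  # or
not_house = ~(homes.Type == "h")                          # not

print(homes.loc[both].index.tolist())       # [0]
print(homes.loc[either].index.tolist())     # [0, 2, 3]
print(homes.loc[not_house].index.tolist())  # [1, 3]
```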


We can handle the NaN values later on. For instance, the fillna function of Pandas provides flexible ways of handling the missing values. We can also fill the missing values based on other conditions with the loc method.
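As a sketch of that two-step approach, assuming a small hypothetical frame plus an illustrative "Mid-range" label and 1 million threshold (neither comes from the article), loc first fills the missing categories that satisfy a second condition, and fillna then fills whatever is still missing with a default.

```python
import pandas as pd

# Hypothetical rows mirroring the Category column above.
homes = pd.DataFrame({
    "Price": [1_480_000.0, 1_035_000.0, 850_000.0],
    "Category": ["Expensive House", None, None],
})

# Step 1: fill only the missing categories that meet another (illustrative) condition.
mask = homes.Category.isna() & (homes.Price > 1_000_000)
homes.loc[mask, "Category"] = "Mid-range"

# Step 2: fill any remaining missing values with a default label.
homes["Category"] = homes["Category"].fillna("Other")
print(homes["Category"].tolist())  # ['Expensive House', 'Mid-range', 'Other']
```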

### Example 5

We can use the loc method to update the values in an existing column based on a condition. For instance, the following code applies a 5% discount to prices higher than 1.4 million.

```python
df.loc[df.Price > 1400000, 'Price'] = df.Price * 0.95
df.loc[:4, ['Price', 'IsExpensive']]
```

|   | Price | IsExpensive |
|---|---|---|
| 0 | 1406000.0 | 1.0 |
| 1 | 1035000.0 | 1.0 |
| 2 | 1391750.0 | 1.0 |
| 3 | 850000.0 | NaN |
| 4 | 1520000.0 | 1.0 |
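The right-hand side df.Price * 0.95 is computed for every row, but Pandas aligns it with the selected rows on the row index, so only the rows meeting the condition change. A minimal sketch on hypothetical prices checks that this matches a more compact form that stores the condition in a variable and multiplies in place with *=.

```python
import pandas as pd

# Hypothetical prices; the discount rule matches the example above.
a = pd.DataFrame({"Price": [1_480_000.0, 850_000.0, 1_600_000.0]})
b = a.copy()

# Form used above: full-column right-hand side, aligned on the row index.
a.loc[a.Price > 1_400_000, "Price"] = a.Price * 0.95

# Compact alternative: compute the condition once and multiply in place.
mask = b.Price > 1_400_000
b.loc[mask, "Price"] *= 0.95

print(a.Price.tolist())  # [1406000.0, 850000.0, 1520000.0]
print(a.equals(b))       # True
```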


We can also use the iloc method for this task, but we need to provide the index of the Price column, which is 4. Since it is more convenient to use column labels than indices, the loc method is preferred over iloc for such tasks.
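For comparison, here is a sketch of the iloc version on a hypothetical frame with made-up values. Two extra steps are needed: the column label has to be converted to its integer position with get_loc, and the condition has to be passed as a plain NumPy array, since iloc raises an error when given a boolean Series as a mask.

```python
import pandas as pd

# Hypothetical frame with made-up values.
homes = pd.DataFrame({
    "Address": ["A St", "B St", "C St"],
    "Price": [1_480_000.0, 850_000.0, 1_600_000.0],
})

price_pos = homes.columns.get_loc("Price")   # integer position of the Price column (1 here)
mask = (homes.Price > 1_400_000).to_numpy()  # boolean NumPy array, which iloc accepts

# Apply the 5% discount by position.
homes.iloc[mask, price_pos] *= 0.95
print(homes.Price.tolist())  # [1406000.0, 850000.0, 1520000.0]
```

The extra conversions are exactly the bookkeeping that loc avoids, which is why the label-based version reads more naturally for this kind of update.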