# Python Pandas - Indexing and Selecting Data

## 1. iloc
- Integer-based (positional) indexing

## 2. loc
- Label-based indexing



## .loc

- Pandas provides .loc for purely label-based indexing. When slicing with labels, both the start and the stop bound are included. Integers are valid labels, but they refer to the label and not the position.

- .loc accepts several kinds of input:

  - A single scalar label
  - A list of labels
  - A slice object
  - A Boolean array

- .loc takes two indexers separated by a comma, each of which can be a single label, a list, or a slice. The first selects rows and the second selects columns.
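As a small sketch of these access methods, the following uses a made-up frame (the name demo and the integer labels 10, 20, 30 are chosen purely for illustration). It shows that integers passed to .loc are treated as labels and that a label slice keeps both endpoints.

```python
import pandas as pd

# Hypothetical frame: the integer index values are labels, not positions.
demo = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=[10, 20, 30])

single = demo.loc[10, "A"]             # scalar label for the row and the column
subset = demo.loc[[10, 30], ["B"]]     # lists of labels
sliced = demo.loc[10:20, "A":"B"]      # label slice: both endpoints are included
masked = demo.loc[demo["A"] > 1, "B"]  # Boolean array keeps the rows where it is True

print(single)
print(subset)
print(sliced)
print(masked)
```

Note that demo.loc[0] would raise a KeyError here, because 0 is not one of the index labels even though it is a valid position.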

In [2]:
import numpy as np
import pandas as pd
dataframe=pd.DataFrame(np.random.randn(5,3),index=['a','b','c','d','e'],columns=['A','B','C'])

In [3]:
dataframe

Unnamed: 0,A,B,C
a,-0.7633,1.079672,-1.730194
b,-0.587342,-0.371306,0.261575
c,0.062957,-2.206176,-1.464644
d,0.066225,-0.102496,-1.221381
e,1.156787,-0.738294,-0.351238


In [4]:
dataframe.loc[:,'A']

a   -0.763300
b   -0.587342
c    0.062957
d    0.066225
e    1.156787
Name: A, dtype: float64

In [6]:
dataframe.loc[:,['A','C']]

Unnamed: 0,A,C
a,-0.7633,-1.730194
b,-0.587342,0.261575
c,0.062957,-1.464644
d,0.066225,-1.221381
e,1.156787,-0.351238


In [7]:
dataframe.loc[:,:]

Unnamed: 0,A,B,C
a,-0.7633,1.079672,-1.730194
b,-0.587342,-0.371306,0.261575
c,0.062957,-2.206176,-1.464644
d,0.066225,-0.102496,-1.221381
e,1.156787,-0.738294,-0.351238


In [8]:
dataframe.loc['a','C']

-1.7301940065428558

In [11]:
dataframe.loc['a']>1

A    False
B     True
C    False
Name: a, dtype: bool

## .iloc

- Pandas provides .iloc for purely integer-based (positional) indexing. As in Python and NumPy, positions are 0-based, and the stop bound of a slice is excluded.

- .iloc accepts several kinds of input:

  - An integer
  - A list of integers
  - A slice of integer positions
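A minimal sketch of positional access, assuming a made-up 3x3 frame (demo is an illustrative name). String labels are used on both axes so that positions and labels cannot be confused.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x3 frame holding the values 0..8.
demo = pd.DataFrame(np.arange(9).reshape(3, 3),
                    index=["x", "y", "z"], columns=["a", "b", "c"])

cell = demo.iloc[1, 2]          # row position 1, column position 2
rows = demo.iloc[[0, 2]]        # list of row positions
block = demo.iloc[0:2, 0:2]     # slice: the stop position is excluded
clipped = demo.iloc[0:2, 0:10]  # slice bounds past the end are clipped, not an error

print(cell)
print(rows)
print(block)
print(clipped)
```

The last line shows why a slice such as 0:5 works on a frame with fewer than five columns: out-of-range slice bounds are tolerated, whereas a single out-of-range integer raises an IndexError.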

In [12]:
dataframe=pd.DataFrame(np.random.randn(4,4),columns=['a','b','c','d'])

In [13]:
dataframe

Unnamed: 0,a,b,c,d
0,-0.859255,-1.833199,0.786266,-2.098256
1,-1.069508,1.615591,1.980145,0.073876
2,-0.830852,-0.689694,-0.775173,0.589731
3,1.102534,0.864459,-1.111076,0.11451


In [15]:
dataframe.iloc[1,3]

0.07387580844332307

In [16]:
dataframe.iloc[1:3,0:5]

Unnamed: 0,a,b,c,d
1,-1.069508,1.615591,1.980145,0.073876
2,-0.830852,-0.689694,-0.775173,0.589731


# Copying Objects vs Referencing Objects in Python

# Using the 'copy() method'
true_copy_surveys_df = surveys_df.copy()

# Using the '=' operator
ref_surveys_df = surveys_df

You might think that ref_surveys_df = surveys_df creates a fresh, distinct copy of the surveys_df DataFrame. However, the = operator in a simple statement such as y = x does not copy anything. Instead, y = x creates a new variable y that refers to the same object as x. In other words, there is only one object (the DataFrame), and both x and y refer to it.

In contrast, the copy() method for a DataFrame creates a true copy of the DataFrame.
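The difference can be checked with a short sketch. The data in surveys_df is not shown in this section, so the one-column frame below is a hypothetical stand-in.

```python
import pandas as pd

# Hypothetical stand-in for surveys_df; the real data is not shown here.
surveys_df = pd.DataFrame({"weight": [10.0, 20.0, 30.0]})

ref_surveys_df = surveys_df               # a second name for the same object
true_copy_surveys_df = surveys_df.copy()  # an independent copy (deep by default)

# Modify the original frame in place.
surveys_df.loc[0, "weight"] = 99.0

print(ref_surveys_df is surveys_df)            # same object under two names
print(ref_surveys_df.loc[0, "weight"])         # the change is visible here
print(true_copy_surveys_df.loc[0, "weight"])   # the copy keeps the old value
```

Because ref_surveys_df and surveys_df are the same object, the assignment shows up under both names, while the copy still holds 10.0.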

In [2]:
import pandas as pd

In [3]:
nba=pd.read_csv('nba-2.csv')

In [4]:
nba

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
3,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0
...,...,...,...,...,...,...,...,...,...
453,Shelvin Mack,Utah Jazz,8.0,PG,26.0,6-3,203.0,Butler,2433333.0
454,Raul Neto,Utah Jazz,25.0,PG,24.0,6-1,179.0,,900000.0
455,Tibor Pleiss,Utah Jazz,21.0,C,26.0,7-3,256.0,,2900000.0
456,Jeff Withey,Utah Jazz,24.0,C,26.0,7-0,231.0,Kansas,947276.0


## 1. Suppose we want to select the columns Age, College and Salary for only the rows labelled Avery Bradley and Terry Rozier

In [18]:
nba.set_index('Name',inplace=True)

In [20]:
nba.loc[['Avery Bradley','Terry Rozier'],['Age','College','Salary']]

Name,Age,College,Salary
Avery Bradley,25.0,Texas,7730337.0
Terry Rozier,22.0,Louisville,1824360.0


## 2. Let's say we want to select the rows Avery Bradley, Terry Rozier and John Holland with all columns of the dataframe.

In [21]:
nba.loc[['Avery Bradley','Terry Rozier','John Holland'],:]

Name,Team,Number,Position,Age,Height,Weight,College,Salary
Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
Terry Rozier,Boston Celtics,12.0,PG,22.0,6-2,190.0,Louisville,1824360.0
John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,


In [27]:
import pandas as pd

data = {"Product_Name":["Keyboard","Mouse", "Monitor", "CPU","CPU", "Speakers","Headset"],
        "Unit_Price":[500,200, 5000.235, 10000.550, 10000.550, 250.50,None],
        "No_Of_Units":[5,5, 10, 20, 20, 8,pd.NaT],
        "Available_Quantity":[5,6,10,"Not Available","Not Available", pd.NaT,pd.NaT],
        "Available_Since_Date":['11/5/2021', '4/23/2021', '08/21/2021','09/18/2021','09/18/2021','01/05/2021',pd.NaT],
        "Remarks":[pd.NaT,pd.NaT,pd.NaT,pd.NaT,pd.NaT,pd.NaT,pd.NaT]
       }

df = pd.DataFrame(data)

df = df.astype({"Unit_Price": float})

df

In [30]:
df

Unnamed: 0,Product_Name,Unit_Price,No_Of_Units,Available_Quantity,Available_Since_Date,Remarks
0,Keyboard,500.0,5,5,11/5/2021,NaT
1,Mouse,200.0,5,6,4/23/2021,NaT
2,Monitor,5000.235,10,10,08/21/2021,NaT
3,CPU,10000.55,20,Not Available,09/18/2021,NaT
4,CPU,10000.55,20,Not Available,09/18/2021,NaT
5,Speakers,250.5,8,NaT,01/05/2021,NaT
6,Headset,,NaT,NaT,NaT,NaT


In [29]:
df.dtypes

Product_Name                    object
Unit_Price                     float64
No_Of_Units                     object
Available_Quantity              object
Available_Since_Date            object
Remarks                 datetime64[ns]
dtype: object

In [31]:
df.at[7,'Product_Name']='Test Product'

In [32]:
df

Unnamed: 0,Product_Name,Unit_Price,No_Of_Units,Available_Quantity,Available_Since_Date,Remarks
0,Keyboard,500.0,5,5,11/5/2021,NaT
1,Mouse,200.0,5,6,4/23/2021,NaT
2,Monitor,5000.235,10,10,08/21/2021,NaT
3,CPU,10000.55,20,Not Available,09/18/2021,NaT
4,CPU,10000.55,20,Not Available,09/18/2021,NaT
5,Speakers,250.5,8,NaT,01/05/2021,NaT
6,Headset,,NaT,NaT,NaT,NaT
7,Test Product,,,,,NaT


In [34]:
df.loc[7,'Product_Name']='Headset'

In [35]:
df

Unnamed: 0,Product_Name,Unit_Price,No_Of_Units,Available_Quantity,Available_Since_Date,Remarks
0,Keyboard,500.0,5,5,11/5/2021,NaT
1,Mouse,200.0,5,6,4/23/2021,NaT
2,Monitor,5000.235,10,10,08/21/2021,NaT
3,CPU,10000.55,20,Not Available,09/18/2021,NaT
4,CPU,10000.55,20,Not Available,09/18/2021,NaT
5,Speakers,250.5,8,NaT,01/05/2021,NaT
6,Headset,,NaT,NaT,NaT,NaT
7,Headset,,,,,NaT


## Set the cell value at the row position 3 and the column Remarks to the value No stock available. Will be available in 5 days.

In [37]:
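# Caution: .loc is label-based, so the integer returned by get_loc() (5) is treated
# as a column label. No column is labelled 5, so a new column named 5 is created
# instead of the Remarks column being updated. The label-based form is df.loc[3,'Remarks'].
df.loc[3,df.columns.get_loc('Remarks')]='No stock available. will be avialble in 5 days'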

In [38]:
df

Unnamed: 0,Product_Name,Unit_Price,No_Of_Units,Available_Quantity,Available_Since_Date,Remarks,5
0,Keyboard,500.0,5,5,11/5/2021,NaT,
1,Mouse,200.0,5,6,4/23/2021,NaT,
2,Monitor,5000.235,10,10,08/21/2021,NaT,
3,CPU,10000.55,20,Not Available,09/18/2021,NaT,No stock available. will be avialble in 5 days
4,CPU,10000.55,20,Not Available,09/18/2021,NaT,
5,Speakers,250.5,8,NaT,01/05/2021,NaT,
6,Headset,,NaT,NaT,NaT,NaT,
7,Headset,,,,,NaT,


In [39]:
df.iat[3,df.columns.get_loc('Remarks')]='No stock available. will be avialble in 5 days'

In [40]:
df

Unnamed: 0,Product_Name,Unit_Price,No_Of_Units,Available_Quantity,Available_Since_Date,Remarks,5
0,Keyboard,500.0,5,5,11/5/2021,NaT,
1,Mouse,200.0,5,6,4/23/2021,NaT,
2,Monitor,5000.235,10,10,08/21/2021,NaT,
3,CPU,10000.55,20,Not Available,09/18/2021,No stock available. will be avialble in 5 days,No stock available. will be avialble in 5 days
4,CPU,10000.55,20,Not Available,09/18/2021,NaT,
5,Speakers,250.5,8,NaT,01/05/2021,NaT,
6,Headset,,NaT,NaT,NaT,NaT,
7,Headset,,,,,NaT,


3 – Row position of the cell whose value is being set.

df.columns.get_loc('Remarks') – Returns the integer position of the Remarks column.

iat always needs both a row position and a column position. With iloc, by contrast, if you specify only the row position and omit the column position, every cell in that row is set to the new value.
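To make the label versus position distinction concrete, here is a minimal sketch that sets single cells with iat (positions) and at (labels). The two-row frame demo is hypothetical and only stands in for the larger df.

```python
import pandas as pd

# Hypothetical two-row frame for illustration.
demo = pd.DataFrame({"Product_Name": ["Keyboard", "Mouse"],
                     "Remarks": ["", ""]})

# get_loc() turns a column label into its integer position.
col_pos = demo.columns.get_loc("Remarks")

demo.iat[0, col_pos] = "set by position"   # iat: row position, column position
demo.at[1, "Remarks"] = "set by label"     # at: row label, column label

print(col_pos)
print(demo)
```

Pairing get_loc() with iat (or iloc) keeps both indexers positional. Passing the same integer to at or loc would instead be read as a column label.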

## Set Value Using iloc
You can also set the value of a cell using the iloc attribute of the dataframe. iloc lets you access a cell using integer row and column positions rather than labels.

It accepts two parameters.

row_index – Integer position of the row

column_index – Integer position of the column

You can get the position of a row or column label with the get_loc() method, which is available on both df.index and df.columns. For example, to get the column position, you can use df.columns.get_loc('Column_Name').
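As a rough sketch of looking up both positions, the example below assumes a small frame with a labelled index (demo and its values are made up), so that row labels and row positions are visibly different.

```python
import pandas as pd

# Hypothetical frame whose row labels are product names.
demo = pd.DataFrame({"Price": [500.0, 200.0], "Remarks": ["", ""]},
                    index=["Keyboard", "Mouse"])

row_pos = demo.index.get_loc("Mouse")       # position of the row labelled "Mouse"
col_pos = demo.columns.get_loc("Remarks")   # position of the "Remarks" column

# iloc takes the row position first, then the column position.
demo.iloc[row_pos, col_pos] = "Test Remarks"

print(row_pos, col_pos)
print(demo)
```

With unique labels, get_loc() returns a plain integer. For an index with duplicate labels it can return a slice or a Boolean mask instead, so this pattern assumes unique labels.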

The snippet below sets the cell at row position 0 and column Remarks to the value Test Remarks.

In [41]:
df.iloc[0, df.columns.get_loc('Remarks')] = 'Test Remarks'


df

Unnamed: 0,Product_Name,Unit_Price,No_Of_Units,Available_Quantity,Available_Since_Date,Remarks,5
0,Keyboard,500.0,5,5,11/5/2021,Test Remarks,
1,Mouse,200.0,5,6,4/23/2021,NaT,
2,Monitor,5000.235,10,10,08/21/2021,NaT,
3,CPU,10000.55,20,Not Available,09/18/2021,No stock available. will be avialble in 5 days,No stock available. will be avialble in 5 days
4,CPU,10000.55,20,Not Available,09/18/2021,NaT,
5,Speakers,250.5,8,NaT,01/05/2021,NaT,
6,Headset,,NaT,NaT,NaT,NaT,
7,Headset,,,,,NaT,
