# Extracting rows using Pandas .loc[]

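Pandas provides the DataFrame.loc[] indexer for retrieving rows from a DataFrame by index label. Given a label, it returns a Series when exactly one row matches and a DataFrame when several rows match; a label that is not present in the index raises a KeyError. Besides single labels, .loc[] also accepts lists of labels, label slices, and boolean arrays.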
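As a minimal sketch of that behaviour, the snippet below uses a small made-up frame (the names and values are illustrative and are not taken from nba.csv) to show one lookup that succeeds and one that fails.

```python
import pandas as pd

# Hypothetical toy data standing in for a real dataset
players = pd.DataFrame(
    {"team": ["A", "B", "A"], "age": [25, 22, 29]},
    index=["Ann", "Bob", "Cy"],
)

# An existing label returns the matching row as a Series
row = players.loc["Bob"]
print(row["team"], row["age"])

# A label that is not in the index raises KeyError
try:
    players.loc["Dee"]
    missing = False
except KeyError:
    missing = True
print(missing)
```

When it is unclear whether a label exists, checking `label in players.index` before the lookup avoids the exception.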

#### Extracting single Row

In this example, the Name column is set as the index, and two rows are then extracted one at a time by their index labels. Each result is returned as a Series.


In [1]:
# importing pandas package
import pandas as pd
  
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col ="Name")
  
# retrieving row by loc method
first = data.loc["Avery Bradley"]
second = data.loc["R.J. Hunter"]
  
  
print(first, "\n\n\n", second)

Team        Boston Celtics
Number                 0.0
Position                PG
Age                   25.0
Height                 6-2
Weight               180.0
College              Texas
Salary           7730337.0
Name: Avery Bradley, dtype: object 


 Team        Boston Celtics
Number                28.0
Position                SG
Age                   22.0
Height                 6-5
Weight               185.0
College      Georgia State
Salary           1148640.0
Name: R.J. Hunter, dtype: object


#### Extracting multiple rows with a list of labels

In this example, the Name column is set as the index, and two rows are extracted in a single call by passing a list of labels. The result is a DataFrame.

In [2]:
# importing pandas package
import pandas as pd
  
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col ="Name")
  
# retrieving rows by loc method
rows = data.loc[["Avery Bradley", "R.J. Hunter"]]
  
# checking data type of rows
print(type(rows))
  
# display
rows

<class 'pandas.core.frame.DataFrame'>


Name,Team,Number,Position,Age,Height,Weight,College,Salary
Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
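The difference between passing a single label and passing a list can be sketched on a small made-up frame (the data below is hypothetical). A scalar label gives a Series, while a list gives a DataFrame, even when the list holds only one label.

```python
import pandas as pd

# Hypothetical toy data
frame = pd.DataFrame(
    {"team": ["A", "B", "A"], "age": [25, 22, 29]},
    index=["Ann", "Bob", "Cy"],
)

as_series = frame.loc["Ann"]          # scalar label -> Series
as_frame = frame.loc[["Ann"]]         # one-element list -> one-row DataFrame
reordered = frame.loc[["Cy", "Ann"]]  # rows come back in the order requested

print(type(as_series).__name__, type(as_frame).__name__)
print(list(reordered.index))
```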


#### Extracting multiple rows with same index

In this example, the Team column is set as the index, so many rows share the same label. One team name is passed to .loc[] to check that every row with that label is returned. Because several rows match, the result is a DataFrame rather than a Series.

In [3]:
# importing pandas package
import pandas as pd
  
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col ="Team")
  
# retrieving rows by loc method
rows = data.loc["Utah Jazz"]
  
# checking data type of rows
print(type(rows))
  
# display
rows

<class 'pandas.core.frame.DataFrame'>


Team,Name,Number,Position,Age,Height,Weight,College,Salary
Utah Jazz,Trevor Booker,33.0,PF,28.0,6-8,228.0,Clemson,4775000.0
Utah Jazz,Trey Burke,3.0,PG,23.0,6-1,191.0,Michigan,2658240.0
Utah Jazz,Alec Burks,10.0,SG,24.0,6-6,214.0,Colorado,9463484.0
Utah Jazz,Dante Exum,11.0,PG,20.0,6-6,190.0,,3777720.0
Utah Jazz,Derrick Favors,15.0,PF,24.0,6-10,265.0,Georgia Tech,12000000.0
Utah Jazz,Rudy Gobert,27.0,C,23.0,7-1,245.0,,1175880.0
Utah Jazz,Gordon Hayward,20.0,SF,26.0,6-8,226.0,Butler,15409570.0
Utah Jazz,Rodney Hood,5.0,SG,23.0,6-8,206.0,Duke,1348440.0
Utah Jazz,Joe Ingles,2.0,SF,28.0,6-8,226.0,,2050000.0
Utah Jazz,Chris Johnson,23.0,SF,26.0,6-6,206.0,Dayton,981348.0
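The return type therefore depends on how often the label occurs. The sketch below, on hypothetical data, shows one way to check for repeated labels and to force a DataFrame result regardless.

```python
import pandas as pd

# Hypothetical toy data with a repeated index label
roster = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "age": [25, 22, 29]},
    index=["Jazz", "Jazz", "Heat"],
)

print(roster.index.is_unique)        # False, because "Jazz" appears twice

repeated = roster.loc["Jazz"]        # two matching rows -> DataFrame
single = roster.loc["Heat"]          # one matching row -> Series
always_frame = roster.loc[["Heat"]]  # list form returns a DataFrame either way

print(type(repeated).__name__, type(single).__name__, type(always_frame).__name__)
```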


#### Extracting rows between two index labels

In this example, two index labels are passed as a slice, and all rows from the first label through the second are returned. Unlike ordinary Python slices, a label slice includes both endpoints.

In [4]:
# importing pandas package
import pandas as pd
  
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col ="Name")
  
# retrieving rows by loc method
rows = data.loc["Avery Bradley":"Isaiah Thomas"]
  
# checking data type of rows
print(type(rows))
  
# display
rows 

<class 'pandas.core.frame.DataFrame'>


Name,Team,Number,Position,Age,Height,Weight,College,Salary
Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0
Amir Johnson,Boston Celtics,90.0,PF,29.0,6-9,240.0,,12000000.0
Jordan Mickey,Boston Celtics,55.0,PF,21.0,6-8,235.0,LSU,1170960.0
Kelly Olynyk,Boston Celtics,41.0,C,25.0,7-0,238.0,Gonzaga,2165160.0
Terry Rozier,Boston Celtics,12.0,PG,22.0,6-2,190.0,Louisville,1824360.0
Marcus Smart,Boston Celtics,36.0,PG,22.0,6-4,220.0,Oklahoma State,3431040.0
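The inclusive end point is specific to label slices. A minimal sketch on made-up data contrasts it with a positional slice, which excludes the stop position.

```python
import pandas as pd

# Hypothetical toy data
letters = pd.DataFrame({"value": [10, 20, 30, 40]}, index=["a", "b", "c", "d"])

by_label = letters.loc["b":"c"]   # label slice: includes both "b" and "c"
by_position = letters.iloc[1:2]   # position slice: stop position 2 is excluded

print(list(by_label.index))       # ['b', 'c']
print(list(by_position.index))    # ['b']
```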


# Extracting rows using Pandas .iloc[]
Pandas also provides the DataFrame.iloc[] indexer, which selects rows by integer position rather than by label. It is useful when the index labels are something other than the default sequence 0, 1, 2, ..., n-1, or when the labels are not known in advance. Positions are counted from 0 and are independent of the labels displayed in the DataFrame.

#### Extracting single row and comparing with .loc[]

In this example, the same row is extracted with both .iloc[] and .loc[], and the results are compared. Because no index column is specified, the index is the default integer range, so label 3 and position 3 refer to the same row.
 

In [5]:
# importing pandas package
import pandas as pd
 
# making data frame from csv file
data = pd.read_csv("nba.csv")
 
# retrieving rows by loc method
row1 = data.loc[3]
 
# retrieving rows by iloc method
row2 = data.iloc[3]
 
# checking if values are equal
row1 == row2

Name        True
Team        True
Number      True
Position    True
Age         True
Height      True
Weight      True
College     True
Salary      True
Name: 3, dtype: bool
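The two indexers agree here only because labels and positions coincide. The following sketch, on hypothetical data, shows them diverging once rows have been filtered out and the original labels are kept.

```python
import pandas as pd

# Hypothetical toy data with the default integer index 0..3
scores = pd.DataFrame({"score": [90, 40, 80, 98]})

# Filtering keeps the original labels, so labels and positions drift apart
high = scores[scores["score"] > 50]   # remaining labels: 0, 2, 3

by_label = high.loc[2]       # row whose label is 2
by_position = high.iloc[2]   # third remaining row, whose label is 3

print(by_label["score"], by_position["score"])   # 80 98
```

Calling `reset_index(drop=True)` on the filtered frame would make labels and positions line up again.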

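#### Extracting multiple rows with index

In this example, the same rows are extracted twice with .iloc[], first by passing a list of positions and then by passing a slice, whose stop position (8) is excluded. The two results are then compared element by element. The False entries under College for rows 4 and 5 do not indicate a difference between the results: those cells are missing values, and NaN never compares equal to NaN. Calling row1.equals(row2) treats missing values in the same position as equal.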

In [6]:
# importing pandas package
import pandas as pd
 
# making data frame from csv file
data = pd.read_csv("nba.csv")
 
# retrieving rows by iloc method, using a list of positions
row1 = data.iloc[[4, 5, 6, 7]]
 
# retrieving rows by iloc method, using a slice (stop position excluded)
row2 = data.iloc[4:8]
 
# comparing values
row1 == row2

,Name,Team,Number,Position,Age,Height,Weight,College,Salary
4,True,True,True,True,True,True,True,False,True
5,True,True,True,True,True,True,True,False,True
6,True,True,True,True,True,True,True,True,True
7,True,True,True,True,True,True,True,True,True
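The effect of missing values on an element-wise comparison can be reproduced on a small made-up frame. This is a sketch under the assumption that the frames being compared contain NaN in matching positions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value
left = pd.DataFrame({"age": [25.0, 29.0], "salary": [1000.0, np.nan]})
right = left.copy()

elementwise = left == right   # NaN == NaN evaluates to False
same = left.equals(right)     # NaNs in the same position count as equal

print(elementwise["salary"].tolist())   # [True, False]
print(same)                             # True
```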


# Boolean Indexing
## Accessing a DataFrame with a boolean index
To access a DataFrame by a boolean index, we first create a DataFrame whose index labels are the boolean values True and False.

In [1]:
# importing pandas as pd
import pandas as pd
  
# dictionary of lists
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}
  
df = pd.DataFrame(data, index=[True, False, True, False])
  
print(df)

         name  degree  score
True   aparna     MBA     90
False  pankaj     BCA     40
True   sudhir  M.Tech     80
False   Geeku     MBA     98


## Accessing a Dataframe with a boolean index using .loc[]
To select rows from a DataFrame with a boolean index, pass the boolean label (True or False) to .loc[]. Every row carrying that label is returned.

In [2]:
# importing pandas as pd
import pandas as pd
  
# dictionary of lists
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}
 
# creating a dataframe with boolean index
df = pd.DataFrame(data, index=[True, False, True, False])
 
# selecting every row labelled True with the .loc[] indexer
print(df.loc[True])


        name  degree  score
True  aparna     MBA     90
True  sudhir  M.Tech     80


## Accessing a Dataframe with a boolean index using .iloc[]
Because .iloc[] selects by integer position, it cannot look up the boolean labels themselves: passing a scalar True or False raises a TypeError. Rows of a DataFrame with a boolean index are therefore accessed through .iloc[] by passing an integer position.

In [3]:
# importing pandas as pd
import pandas as pd
  
# dictionary of lists
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}
 
# creating a dataframe with boolean index
df = pd.DataFrame(data, index=[True, False, True, False])
  
 
# selecting the row at integer position 1 with the .iloc[] indexer
print(df.iloc[1])

name      pankaj
degree       BCA
score         40
dtype: object
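The sketch below illustrates both sides of this on the same kind of frame: a scalar boolean is rejected, whereas a boolean list with one entry per row is accepted by .iloc[] as a positional mask. The variable names are chosen for illustration only.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["aparna", "pankaj", "sudhir", "Geeku"], "score": [90, 40, 80, 98]},
    index=[True, False, True, False],
)

# A scalar boolean is not a valid position, so .iloc[] raises TypeError
try:
    df.iloc[True]
    raised = False
except TypeError:
    raised = True
print(raised)

# A boolean list with one entry per row works as a positional mask
masked = df.iloc[[True, False, True, False]]
print(masked["name"].tolist())
```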


## Applying a boolean mask to a DataFrame
We can apply a boolean mask to a DataFrame with the [] accessor, which calls `__getitem__` under the hood. The mask is a list of True and False values with one entry per row of the DataFrame; only the rows whose entry is True are kept.

In [4]:
# importing pandas as pd
import pandas as pd
  
# dictionary of lists
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}
  
df = pd.DataFrame(data, index=[0, 1, 2, 3])
 
print(df[[True, False, True, False]])

     name  degree  score
0  aparna     MBA     90
2  sudhir  M.Tech     80
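As a rough illustration of the length requirement, the snippet below applies a mask of the right length with both [] and .loc[], and then shows that a mask that is too short is rejected with a ValueError.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["aparna", "pankaj", "sudhir", "Geeku"], "score": [90, 40, 80, 98]}
)

mask = [True, False, True, False]
selected = df[mask]           # [] accessor with a boolean list
same_with_loc = df.loc[mask]  # .loc[] accepts the same mask

# A mask must have exactly one entry per row
try:
    df[[True, False]]
    length_error = False
except ValueError:
    length_error = True

print(selected["name"].tolist(), length_error)
```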


### Masking data based on column value
We can filter a DataFrame based on the values in a column. Comparing a column with a value using an operator such as ==, !=, >, <, <=, or >= produces a boolean Series, which can then be used as a mask. Prefixing the mask with ~ inverts it.

In [7]:
# importing pandas as pd
import pandas as pd
  
# dictionary of lists
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["BCA", "BCA", "M.Tech", "BCA"],
        'score': [90, 40, 80, 98]}
 
# creating a dataframe
df = pd.DataFrame(data)
  
# using a comparison operator for filtering of data
print(df['degree'] == 'BCA') 

0     True
1     True
2    False
3     True
Name: degree, dtype: bool
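The boolean Series printed above becomes a filter once it is passed back to the DataFrame. A minimal sketch, rebuilding the same data so that it runs on its own (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["aparna", "pankaj", "sudhir", "Geeku"],
    "degree": ["BCA", "BCA", "M.Tech", "BCA"],
    "score": [90, 40, 80, 98],
})

is_bca = df["degree"] == "BCA"       # boolean Series, one entry per row
bca_rows = df[is_bca]                # keeps rows where the mask is True
high_scores = df[df["score"] >= 90]  # the comparison can also be written inline

print(bca_rows["name"].tolist())
print(high_scores["name"].tolist())
```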


In [10]:
# inverting the mask with ~ to keep the rows whose degree is not BCA
print(df[~(df['degree'] == 'BCA')])

     name  degree  score
2  sudhir  M.Tech     80
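Several conditions can be combined into one mask. The sketch below assumes the same data as above and uses & (and) and | (or); each condition needs its own parentheses because these operators bind more tightly than the comparisons.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["aparna", "pankaj", "sudhir", "Geeku"],
    "degree": ["BCA", "BCA", "M.Tech", "BCA"],
    "score": [90, 40, 80, 98],
})

# Rows that satisfy both conditions
bca_and_high = df[(df["degree"] == "BCA") & (df["score"] > 50)]

# Rows that satisfy at least one condition
mtech_or_low = df[(df["degree"] == "M.Tech") | (df["score"] < 50)]

# .loc[] takes a mask for the rows and a label for the column
names_only = df.loc[df["score"] > 85, "name"]

print(bca_and_high["name"].tolist())
print(mtech_or_low["name"].tolist())
print(names_only.tolist())
```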
