# DataFrame Selection using `loc` and `iloc`

### Import dependencies

```python
import pandas as pd
```

### Load the data file

```python
# Identify the path to the file
file = "Resources/sampleData.csv"

# Read the CSV and display the first 10 rows
df_original = pd.read_csv(file)
df_original.head(10)
```

|   | id | first_name | last_name | Phone Number | Time zone |
|---|---|---|---|---|---|
| 0 | 1 | Peter | Richardson | 7-(789)867-9023 | Europe/Moscow |
| 1 | 2 | Janice | Berry | 86-(614)973-1727 | Asia/Harbin |
| 2 | 3 | Andrea | Hudson | 86-(918)527-6371 | Asia/Shanghai |
| 3 | 4 | Arthur | Mcdonald | 420-(553)779-7783 | Europe/Prague |
| 4 | 5 | Kathy | Morales | 351-(720)541-2124 | Europe/Lisbon |
| 5 | 6 | Juan | Reyes | 507-(957)942-8540 | America/Panama |
| 6 | 7 | Joseph | Kim | 62-(764)534-1192 | Asia/Jakarta |
| 7 | 8 | Frances | Hudson | 57-(752)864-4744 | America/Bogota |
| 8 | 9 | Judy | Day | 7-(863)797-2311 | Europe/Moscow |
| 9 | 10 | Robert | Ford | 92-(784)853-3450 | Asia/Karachi |
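If `Resources/sampleData.csv` is not at hand, the loading step can be tried with an in-memory CSV. This is a minimal sketch, assuming a few rows copied from the output above; `io.StringIO` stands in for the file path and is not part of the original notebook.

```python
import io

import pandas as pd

# A few rows copied from the output above; the real file has more rows.
csv_text = """id,first_name,last_name,Phone Number,Time zone
1,Peter,Richardson,7-(789)867-9023,Europe/Moscow
2,Janice,Berry,86-(614)973-1727,Asia/Harbin
3,Andrea,Hudson,86-(918)527-6371,Asia/Shanghai
"""

# pd.read_csv accepts any file-like object, so StringIO can replace a path
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)          # (3, 5): three rows, five columns
print(list(sample.columns))  # the header row becomes the column labels
```

Because no index column is specified, pandas adds a default integer index (0, 1, 2), which is the leftmost column in the table above.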


### Set the index to *last_name*

```python
df = df_original.set_index("last_name")
df.head()
```

| last_name | id | first_name | Phone Number | Time zone |
|---|---|---|---|---|
| Richardson | 1 | Peter | 7-(789)867-9023 | Europe/Moscow |
| Berry | 2 | Janice | 86-(614)973-1727 | Asia/Harbin |
| Hudson | 3 | Andrea | 86-(918)527-6371 | Asia/Shanghai |
| Mcdonald | 4 | Arthur | 420-(553)779-7783 | Europe/Prague |
| Morales | 5 | Kathy | 351-(720)541-2124 | Europe/Lisbon |
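`set_index` returns a new DataFrame rather than changing the original, and the new index is not required to be unique. The sketch below checks both points on a hypothetical four-row `people` frame built from rows shown in this notebook.

```python
import pandas as pd

people = pd.DataFrame({
    "id": [1, 2, 3, 8],
    "first_name": ["Peter", "Janice", "Andrea", "Frances"],
    "last_name": ["Richardson", "Berry", "Hudson", "Hudson"],
})

# set_index returns a new DataFrame; `people` itself is left unchanged
by_last = people.set_index("last_name")

print("last_name" in people.columns)   # True: the original still has the column
print("last_name" in by_last.columns)  # False: it became the index
print(by_last.index.name)              # last_name
print(by_last.index.is_unique)         # False: "Hudson" appears twice
```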


### Use `.loc` and `.iloc` to select data from DataFrame

- [.loc](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.loc.html) selects data by **label**: the values in the index and the column names
- [.iloc](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iloc.html) selects data by **integer position**: 0 for the first row or column, 1 for the second, and so on, regardless of the labels
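The difference is easiest to see when the index itself holds integers. In this minimal sketch, a hypothetical `scores` DataFrame has integer labels running in the opposite order to their positions, so the same number selects a different row depending on the accessor.

```python
import pandas as pd

# Labels 2, 1, 0 sit at positions 0, 1, 2
scores = pd.DataFrame({"score": [88, 92, 75]}, index=[2, 1, 0])

# .loc looks up the row whose *label* is 0, which is the last row
print(scores.loc[0, "score"])  # 75

# .iloc looks up the row at *position* 0, which is the first row
print(scores.iloc[0, 0])       # 88
```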

```python
# Grab the data contained within the "Berry" row and the "Phone Number" column
berry_phone = df.loc["Berry", "Phone Number"]
print("Using Loc: " + berry_phone)

# "Berry" is the second row (position 1); "Phone Number" is the third column (position 2)
also_berry_phone = df.iloc[1, 2]
print("Using Iloc: " + also_berry_phone)
```

```
Using Loc: 86-(614)973-1727
Using Iloc: 86-(614)973-1727
```
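The positions `1, 2` used with `.iloc` can be derived from the labels instead of counted by hand. This sketch uses `Index.get_loc` on a hypothetical two-row copy of the data; note that it returns a plain integer only when the label is unique (for a repeated label such as "Hudson" it returns a slice or boolean mask instead).

```python
import pandas as pd

small = pd.DataFrame(
    {
        "id": [1, 2],
        "first_name": ["Peter", "Janice"],
        "Phone Number": ["7-(789)867-9023", "86-(614)973-1727"],
        "Time zone": ["Europe/Moscow", "Asia/Harbin"],
    },
    index=pd.Index(["Richardson", "Berry"], name="last_name"),
)

# Translate row and column labels into integer positions
row_pos = small.index.get_loc("Berry")           # 1
col_pos = small.columns.get_loc("Phone Number")  # 2

# The label lookup and the position lookup reach the same cell
assert small.loc["Berry", "Phone Number"] == small.iloc[row_pos, col_pos]
print(row_pos, col_pos)  # 1 2
```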


### Use `.loc` and `.iloc` to select specific columns and rows

```python
# Grab the first five rows of data and the columns from "id" to "Phone Number".
# Because "last_name" values are not unique, .loc returns EVERY row whose label
# matches, so "Richardson" and "Hudson" each bring back more than one row.
richardson_to_morales = df.loc[
    ["Richardson", "Berry", "Hudson", "Mcdonald", "Morales"],
    ["id", "first_name", "Phone Number"],
]
print(richardson_to_morales)

print()

# .iloc selects by position, and each position holds exactly one row, so no
# duplicates appear. The slice end is exclusive: 0:5 means positions 0 to 4.
also_richardson_to_morales = df.iloc[0:5, 0:3]
print(also_richardson_to_morales)
```

```
            id first_name       Phone Number
last_name                                   
Richardson   1      Peter    7-(789)867-9023
Richardson  25     Donald   62-(259)282-5871
Berry        2     Janice   86-(614)973-1727
Hudson       3     Andrea   86-(918)527-6371
Hudson       8    Frances   57-(752)864-4744
Hudson      90      Norma  351-(551)598-1822
Mcdonald     4     Arthur  420-(553)779-7783
Morales      5      Kathy  351-(720)541-2124

            id first_name       Phone Number
last_name                                   
Richardson   1      Peter    7-(789)867-9023
Berry        2     Janice   86-(614)973-1727
Hudson       3     Andrea   86-(918)527-6371
Mcdonald     4     Arthur  420-(553)779-7783
Morales      5      Kathy  351-(720)541-2124
```
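Duplicate labels can be detected before they cause surprises. This sketch, on a hypothetical four-row `people` frame using rows from the output above, shows that a repeated label returns a DataFrame rather than a single row, lists the repeated labels with `Index.duplicated`, and switches to the unique `id` column as the index.

```python
import pandas as pd

people = pd.DataFrame(
    {
        "id": [1, 25, 2, 3],
        "first_name": ["Peter", "Donald", "Janice", "Andrea"],
        "last_name": ["Richardson", "Richardson", "Berry", "Hudson"],
    }
)

by_last = people.set_index("last_name")

# A label that appears twice returns a DataFrame with both rows
print(type(by_last.loc["Richardson"]).__name__)  # DataFrame
print(len(by_last.loc["Richardson"]))            # 2

# keep=False marks every occurrence of a repeated label
dupes = by_last.index[by_last.index.duplicated(keep=False)]
print(list(dupes))  # ['Richardson', 'Richardson']

# With a unique index, each label identifies exactly one row
by_id = people.set_index("id")
print(by_id.index.is_unique)         # True
print(by_id.loc[25, "first_name"])   # Donald
```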


### Use `.loc` to select **all** rows for specified columns

```python
# The following will select all rows for columns `first_name` and `Phone Number`
df.loc[:, ["first_name", "Phone Number"]].head()
```

| last_name | first_name | Phone Number |
|---|---|---|
| Richardson | Peter | 7-(789)867-9023 |
| Berry | Janice | 86-(614)973-1727 |
| Hudson | Andrea | 86-(918)527-6371 |
| Mcdonald | Arthur | 420-(553)779-7783 |
| Morales | Kathy | 351-(720)541-2124 |
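A range of columns can also be given as a slice instead of a list. The sketch below, on a hypothetical three-row copy of the data, shows the key rule: a `.loc` label slice includes its end label, while an `.iloc` position slice excludes its end position, just like Python lists.

```python
import pandas as pd

small = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "first_name": ["Peter", "Janice", "Andrea"],
        "Phone Number": ["7-(789)867-9023", "86-(614)973-1727", "86-(918)527-6371"],
        "Time zone": ["Europe/Moscow", "Asia/Harbin", "Asia/Shanghai"],
    },
    index=pd.Index(["Richardson", "Berry", "Hudson"], name="last_name"),
)

# .loc label slice: "Phone Number" is INCLUDED
by_label = small.loc[:, "id":"Phone Number"]
# .iloc position slice: position 3 ("Time zone") is EXCLUDED
by_position = small.iloc[:, 0:3]

print(list(by_label.columns))        # ['id', 'first_name', 'Phone Number']
print(by_label.equals(by_position))  # True

# The same rule applies to rows: both of these select two rows
print(len(small.loc["Richardson":"Berry"]), len(small.iloc[0:2]))  # 2 2
```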


### Use a conditional statement to show which rows satisfy your selection

```python
named_arthur = df["first_name"] == "Arthur"
named_arthur.head()
```

```
last_name
Richardson    False
Berry         False
Hudson        False
Mcdonald       True
Morales       False
Name: first_name, dtype: bool
```
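A boolean Series like `named_arthur` is useful on its own, before any rows are selected. In this sketch, built on a hypothetical five-name Series taken from rows in this notebook, `True` counts as 1 and `False` as 0, so `sum()` gives the number of matches.

```python
import pandas as pd

first_names = pd.Series(
    ["Peter", "Janice", "Arthur", "Kathy", "Arthur"],
    index=["Richardson", "Berry", "Mcdonald", "Morales", "Franklin"],
    name="first_name",
)

named_arthur = first_names == "Arthur"

print(named_arthur.sum())  # 2: number of True values
print(named_arthur.any())  # True: at least one match

# Index labels of the rows where the mask is True
print(list(named_arthur[named_arthur].index))  # ['Mcdonald', 'Franklin']
```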

### Select rows based on a conditional

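- `.loc` also accepts a boolean Series, such as the result of the comparison above, to filter rows of data; `.iloc` does not accept a boolean Series, only a plain boolean array such as `named_arthur.to_numpy()`
- Passing the comparison above to `.loc` returns only the rows where the result is `True`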

```python
df.loc[df["first_name"] == "Arthur"]
```

| last_name | id | first_name | Phone Number | Time zone |
|---|---|---|---|---|
| Mcdonald | 4 | Arthur | 420-(553)779-7783 | Europe/Prague |
| Franklin | 82 | Arthur | 86-(599)522-0287 | Asia/Chongqing |
| Henderson | 96 | Arthur | 81-(353)751-4060 | Asia/Tokyo |
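To filter with `.iloc`, the mask has to be converted to a plain array first. This sketch uses a hypothetical three-row frame; the exact exception pandas raises for a boolean Series can vary, so it catches both `ValueError` and `NotImplementedError`.

```python
import pandas as pd

small = pd.DataFrame(
    {"id": [4, 5, 82], "first_name": ["Arthur", "Kathy", "Arthur"]},
    index=pd.Index(["Mcdonald", "Morales", "Franklin"], name="last_name"),
)
mask = small["first_name"] == "Arthur"

# .loc accepts the boolean Series directly (it is aligned on the index)
via_loc = small.loc[mask]

# .iloc rejects a boolean Series ...
try:
    small.iloc[mask]
except (ValueError, NotImplementedError) as err:
    print("iloc with a Series failed:", err)

# ... but accepts the same values as a plain NumPy boolean array
via_iloc = small.iloc[mask.to_numpy()]
print(via_loc.equals(via_iloc))  # True
```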


### Other ways to perform the same task

- In the cell below, we're using `.loc` and providing a conditional test for the row selection. Additionally, we're passing `:` as the column selection, which means we want **all** columns to be returned.

```python
df.loc[df["first_name"] == "Arthur", :]
```

| last_name | id | first_name | Phone Number | Time zone |
|---|---|---|---|---|
| Mcdonald | 4 | Arthur | 420-(553)779-7783 | Europe/Prague |
| Franklin | 82 | Arthur | 86-(599)522-0287 | Asia/Chongqing |
| Henderson | 96 | Arthur | 81-(353)751-4060 | Asia/Tokyo |


### Selecting without `.loc`

- In the cell below, we aren't using `.loc` at all. This performs exactly the same task as the cells above, and you'll see a lot of code use this format to select rows from a DataFrame. Using `.loc` lets you add a column selection in the same call, and it is the safer form when assigning values, because chained indexing such as `df[condition]["Time zone"] = value` modifies a temporary copy instead of `df`. For simply reading rows, this shorter format is fine.

```python
df[df["first_name"] == "Arthur"].head()
```

| last_name | id | first_name | Phone Number | Time zone |
|---|---|---|---|---|
| Mcdonald | 4 | Arthur | 420-(553)779-7783 | Europe/Prague |
| Franklin | 82 | Arthur | 86-(599)522-0287 | Asia/Chongqing |
| Henderson | 96 | Arthur | 81-(353)751-4060 | Asia/Tokyo |
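For reading, both forms give the same rows; for writing, `.loc` with the row condition and the column in one call is the form that changes the DataFrame itself. This is a minimal sketch on a hypothetical two-row frame, and the new value `"UTC"` is an arbitrary placeholder.

```python
import pandas as pd

small = pd.DataFrame(
    {"first_name": ["Arthur", "Kathy"], "Time zone": ["Europe/Prague", "Europe/Lisbon"]},
    index=pd.Index(["Mcdonald", "Morales"], name="last_name"),
)
mask = small["first_name"] == "Arthur"

# Reading: plain [] and .loc return the same rows
print(small[mask].equals(small.loc[mask]))  # True

# Writing: one .loc call selects rows AND column, so `small` is updated.
# A chained form like small[mask]["Time zone"] = ... would only change a copy.
small.loc[mask, "Time zone"] = "UTC"  # placeholder value for illustration
print(small.loc["Mcdonald", "Time zone"])  # UTC
print(small.loc["Morales", "Time zone"])   # Europe/Lisbon (unchanged)
```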


### Selecting data using `.loc` with multiple conditions

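Boolean operators (wrap each condition in its own parentheses, because `&` and `|` bind more tightly than `==`):
- AND: &
- OR: |
- NOT: ~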
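The operators combine masks element by element. This sketch uses a hypothetical four-row frame built from rows in this notebook to show `&` with parenthesized conditions and `~` for negation.

```python
import pandas as pd

small = pd.DataFrame(
    {
        "id": [4, 5, 82, 96],
        "first_name": ["Arthur", "Kathy", "Arthur", "Arthur"],
    },
    index=pd.Index(["Mcdonald", "Morales", "Franklin", "Henderson"], name="last_name"),
)

# AND: both conditions must be True. Without the parentheses, Python would
# evaluate "Arthur" & small["id"] first and raise an error.
arthur_high_id = (small["first_name"] == "Arthur") & (small["id"] > 50)
print(list(small.loc[arthur_high_id].index))  # ['Franklin', 'Henderson']

# NOT: ~ flips every True/False in the mask
not_arthur = ~(small["first_name"] == "Arthur")
print(list(small.loc[not_arthur].index))      # ['Morales']
```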

```python
df.loc[(df["first_name"] == "Arthur") | (df["first_name"] == "Billy"), ["Phone Number", "Time zone"]]
```

| last_name | Phone Number | Time zone |
|---|---|---|
| Mcdonald | 420-(553)779-7783 | Europe/Prague |
| Clark | 62-(213)345-2549 | Asia/Makassar |
| Andrews | 86-(859)746-5367 | Asia/Chongqing |
| Price | 86-(878)547-7739 | Asia/Shanghai |
| Franklin | 86-(599)522-0287 | Asia/Chongqing |
| Henderson | 81-(353)751-4060 | Asia/Tokyo |
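When every OR condition tests the same column for equality, `Series.isin` builds the same mask from a list of values. This is a sketch on a hypothetical three-row frame; the first name for "Clark" is inferred from the output above, which lists Clark among the Arthur-or-Billy rows.

```python
import pandas as pd

small = pd.DataFrame(
    {"first_name": ["Arthur", "Kathy", "Billy"]},
    index=pd.Index(["Mcdonald", "Morales", "Clark"], name="last_name"),
)

# Two == tests on one column joined with |
with_or = (small["first_name"] == "Arthur") | (small["first_name"] == "Billy")
# The same mask built from a list of accepted values
with_isin = small["first_name"].isin(["Arthur", "Billy"])

print(with_or.equals(with_isin))         # True
print(list(small.loc[with_isin].index))  # ['Mcdonald', 'Clark']
```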
