# Lesson 6 - Session 2 - Pandas example 2

## Indexing in DataFrames

1. Indexing by location
2. Indexing by label
3. Indexing by Boolean masks
4. Indexing using slices
5. Column-based and row-based indexing
6. Hierarchical indexing (needs a multi-indexed DataFrame)

In [None]:
# read data from the text variable csv and keep only the first 5 rows
import io

import pandas as pd

# sample data in CSV format

csv = '''
breed,type,longevity,size,weight
German Shepherd,herding,9.73,large,
Beagle,hound,12.3,small,
Yorkshire Terrier,toy,12.6,small,5.5
Golden Retriever,sporting,12.04,medium,60.0
Bulldog,non-sporting,6.29,medium,45.0
Labrador Retriever,sporting,12.04,medium,67.5
Boxer,working,8.81,medium,
Poodle,non-sporting,11.95,medium,
Dachshund,hound,12.63,small,24.0
Rottweiler,working,9.11,large,
Boston Terrier,non-sporting,10.92,medium,
Shih Tzu,toy,13.2,small,12.5
Miniature Schnauzer,terrier,11.81,small,15.5
Doberman Pinscher,working,10.33,large,
Chihuahua,toy,16.5,small,5.5
Siberian Husky,working,12.58,medium,47.5
Pomeranian,toy,9.67,small,5.0
French Bulldog,non-sporting,9.0,medium,27.0
Great Dane,working,6.96,large,
Shetland Sheepdog,herding,12.53,small,22.0
Cavalier King Charles Spaniel,toy,11.29,small,15.5
German Shorthaired Pointer,sporting,11.46,large,62.5
Maltese,toy,12.25,small,5.0
'''

df = pd.read_csv(io.StringIO(csv))
df = df.head(5)
print(df)
print("#" * 40)
# use the breed column as the row index so rows can be selected by label
df.set_index('breed', inplace=True)
print(df)

## Indexing by location (1/6)

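Location-based indexing, also known as position-based or integer-based indexing, selects rows and columns by their integer position (starting at 0) using `.iloc`.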

In [None]:
# select the first row of the DataFrame
df.iloc[0]

In [None]:
# select the first column of the DataFrame
df.iloc[:,0]

In [None]:
# select a single cell by row position and column position (row 0, column 2)
df.iloc[0,2]
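
As a small supplement to the cells above, the sketch below shows two further `.iloc` behaviours: negative positions count from the end, and a list of positions returns a DataFrame rather than a Series. The frame `dogs` is a hypothetical three-row copy of the lesson data, built inline so the example runs on its own.

```python
import pandas as pd

# Hypothetical frame mirroring the first three rows of the lesson data.
dogs = pd.DataFrame(
    {
        "type": ["herding", "hound", "toy"],
        "longevity": [9.73, 12.3, 12.6],
        "size": ["large", "small", "small"],
    },
    index=["German Shepherd", "Beagle", "Yorkshire Terrier"],
)

last_row = dogs.iloc[-1]          # negative positions count from the end
last_cell = dogs.iloc[-1, -1]     # last row, last column
first_as_frame = dogs.iloc[[0]]   # a list of positions returns a DataFrame

print(last_row.name)
print(last_cell)
print(type(first_as_frame).__name__)
```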

## Indexing by labels (2/6)

Label-based indexing selects rows and columns by their index labels and column names using `.loc`.

In [None]:
# select the row with index 'Beagle'
df.loc['Beagle']

In [None]:
# select the column longevity (a list of labels returns a DataFrame)
df.loc[:,['longevity']]
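
The shape of the result from `.loc` depends on whether a single label or a list of labels is passed. The sketch below illustrates this on a hypothetical two-row frame named `dogs`, built inline so the example is self-contained.

```python
import pandas as pd

# Hypothetical frame with breed names as the row index.
dogs = pd.DataFrame(
    {"type": ["herding", "hound"], "longevity": [9.73, 12.3]},
    index=["German Shepherd", "Beagle"],
)

as_series = dogs.loc[:, "longevity"]     # single column label -> Series
as_frame = dogs.loc[:, ["longevity"]]    # list of column labels -> DataFrame
cell = dogs.loc["Beagle", "longevity"]   # row label and column label -> scalar

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(cell)
```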

## Indexing by Boolean masks (3/6)

In [None]:
# A boolean mask
df["size"]=="small" 

In [None]:
# select dogs with small size
df.loc[df["size"]=="small"]

In [None]:
# select dogs with small size and hound type 
df.loc[(df["size"]=="small") & (df["type"]=="hound")]
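
Boolean masks can also be combined with `|` (or), inverted with `~` (not), and built with `Series.isin`. Each comparison needs its own parentheses because `&` and `|` bind more tightly than `==`. The sketch below uses a hypothetical four-row frame named `dogs`.

```python
import pandas as pd

# Hypothetical frame with breed names as the row index.
dogs = pd.DataFrame(
    {
        "type": ["herding", "hound", "toy", "sporting"],
        "size": ["large", "small", "small", "medium"],
    },
    index=["German Shepherd", "Beagle", "Yorkshire Terrier", "Golden Retriever"],
)

# rows that are small OR of type herding
small_or_herding = dogs.loc[(dogs["size"] == "small") | (dogs["type"] == "herding")]

# rows that are NOT small
not_small = dogs.loc[~(dogs["size"] == "small")]

# rows whose type is one of several values
hound_or_toy = dogs.loc[dogs["type"].isin(["hound", "toy"])]

print(small_or_herding)
print(not_small)
print(hound_or_toy)
```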

## Indexing using slices (4/6)

In [None]:
# select rows 1 and 2 and columns 0 and 1 (the stop position is excluded)
df.iloc[1:3,0:2]

In [None]:
# select 3 specific rows and 2 specific columns by passing lists of positions
df.iloc[[1,3,0],[0,2]]
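
Slices behave differently in `.iloc` and `.loc`: a position slice excludes its stop value, while a label slice includes it. The sketch below contrasts the two on a hypothetical four-row frame named `dogs`.

```python
import pandas as pd

# Hypothetical frame with breed names as the row index.
dogs = pd.DataFrame(
    {"longevity": [9.73, 12.3, 12.6, 12.04]},
    index=["German Shepherd", "Beagle", "Yorkshire Terrier", "Golden Retriever"],
)

# position slice: rows at positions 1 and 2, the stop position 3 is excluded
by_position = dogs.iloc[1:3]

# label slice: from Beagle through Golden Retriever, the stop label is included
by_label = dogs.loc["Beagle":"Golden Retriever"]

print(by_position)
print(by_label)
```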

## Column-based and row-based indexing (5/6)

In [None]:
# select the column longevity
df['longevity']

In [None]:
# select the column longevity, using dot notation
df.longevity

In [None]:
# select columns longevity and size
df[['longevity', 'size']]

In [None]:
# select the longevity values for rows Beagle and Bulldog (column first, then rows)
df['longevity'][['Beagle', 'Bulldog']]
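
The column-then-row selection above is chained indexing: two separate lookups. For reading values it gives the same result as a single `.loc` call, which is generally the safer form when assigning. The sketch below compares both on a hypothetical three-row frame named `dogs`; the updated longevity value is made up for illustration.

```python
import pandas as pd

# Hypothetical frame with breed names as the row index.
dogs = pd.DataFrame(
    {"longevity": [9.73, 12.3, 6.29], "size": ["large", "small", "medium"]},
    index=["German Shepherd", "Beagle", "Bulldog"],
)

# chained indexing: select the column, then select rows from the resulting Series
chained = dogs["longevity"][["Beagle", "Bulldog"]]

# the same selection in a single .loc call: rows first, then the column
single = dogs.loc[["Beagle", "Bulldog"], "longevity"]

print(chained.equals(single))

# assignment through one .loc call modifies dogs itself (illustrative value)
dogs.loc["Beagle", "longevity"] = 12.5
print(dogs.loc["Beagle", "longevity"])
```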

## Hierarchical indexing (6/6)

In [None]:
# read all data again, make DataFrame multi-indexed using columns size and type
df2 = pd.read_csv(io.StringIO(csv))
df2.set_index(['size', 'type'], inplace=True)

# sort the index to speed up lookups and avoid pandas' PerformanceWarning
df2.sort_index(inplace=True)
df2

In [None]:
# select rows by the hierarchical index key (size, type)
df2.loc[('small', 'hound')]
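
A multi-indexed DataFrame can also be queried on one level at a time. The sketch below, on a hypothetical four-row frame named `dogs`, selects by the outer level with `.loc` and by the inner level with `DataFrame.xs`.

```python
import pandas as pd

# Hypothetical frame indexed by (size, type), sorted as in the lesson.
dogs = (
    pd.DataFrame(
        {
            "breed": ["Beagle", "Dachshund", "Chihuahua", "Boxer"],
            "size": ["small", "small", "small", "medium"],
            "type": ["hound", "hound", "toy", "working"],
        }
    )
    .set_index(["size", "type"])
    .sort_index()
)

# outer level only: all small dogs, result is indexed by type
all_small = dogs.loc["small"]

# inner level only: all hounds regardless of size, result is indexed by size
all_hounds = dogs.xs("hound", level="type")

# both levels: small hounds
small_hounds = dogs.loc[("small", "hound")]

print(all_small)
print(all_hounds)
print(small_hounds)
```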