Title: DataFrame Indexes
Slug: pandas/dataframe-index
Category: Pandas
Tags: DataFrame, set_index, loc, drop, xs
Date: 2017-10-04
Modified: 2017-10-04

#### Import libraries

In [1]:
import pandas as pd

#### Generate data

In [2]:
data = {
    'country': ['UK', 'Canada', 'UK', 'USA', 'France', 'USA', 'Canada'],
    'city': ['London', 'London', 'Birmingham', 'Birmingham', 'Paris', 'Paris', 'Paris'],
    'population': [8788000, 389000, 1101000, 212000, 2244000, 25000, 12000]
}

df0 = pd.DataFrame(data)
df0

,city,country,population
0,London,UK,8788000
1,London,Canada,389000
2,Birmingham,UK,1101000
3,Birmingham,USA,212000
4,Paris,France,2244000
5,Paris,USA,25000
6,Paris,Canada,12000


#### Set a new index
If no index is specified, pandas labels each row with an integer starting from 0. We can set `city` as the index, but index labels should ideally be unique, and several cities in this dataset share a name.

In [3]:
df1 = df0.set_index('city')
df1

city,country,population
London,UK,8788000
London,Canada,389000
Birmingham,UK,1101000
Birmingham,USA,212000
Paris,France,2244000
Paris,USA,25000
Paris,Canada,12000
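
As a quick check of the behaviour described in this section, the sketch below rebuilds the same data and inspects the index before and after `set_index`. It assumes a reasonably recent pandas version, in which the default index is a `RangeIndex`.

```python
import pandas as pd

data = {
    'country': ['UK', 'Canada', 'UK', 'USA', 'France', 'USA', 'Canada'],
    'city': ['London', 'London', 'Birmingham', 'Birmingham', 'Paris', 'Paris', 'Paris'],
    'population': [8788000, 389000, 1101000, 212000, 2244000, 25000, 12000]
}
df0 = pd.DataFrame(data)

# With no index specified, rows are labelled with consecutive integers from 0
print(df0.index)

# set_index returns a new DataFrame; df0 itself keeps its integer labels
df1 = df0.set_index('city')
print(df1.index.name)
print(list(df0.columns))
```

Because `set_index` does not modify `df0` by default, later cells can keep building new indexes from the original frame.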


In [4]:
# Returns two results - not ideal!
df1.loc['London']

city,country,population
London,UK,8788000
London,Canada,389000
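
Duplicate labels like these can be detected up front rather than discovered during a lookup. The following is a minimal sketch using `Index.is_unique` and the `verify_integrity` option of `set_index`; the `raised` flag is only an illustrative helper and is not part of the original notebook.

```python
import pandas as pd

data = {
    'country': ['UK', 'Canada', 'UK', 'USA', 'France', 'USA', 'Canada'],
    'city': ['London', 'London', 'Birmingham', 'Birmingham', 'Paris', 'Paris', 'Paris'],
    'population': [8788000, 389000, 1101000, 212000, 2244000, 25000, 12000]
}
df0 = pd.DataFrame(data)
df1 = df0.set_index('city')

# False here, because London, Birmingham and Paris each appear more than once
print(df1.index.is_unique)

# verify_integrity=True makes set_index raise instead of silently
# accepting duplicate labels
raised = False  # illustrative flag
try:
    df0.set_index('city', verify_integrity=True)
except ValueError as err:
    raised = True
    print('Duplicate labels rejected:', err)
```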


In [5]:
# Combine city and country into one unique label, then drop the now-redundant columns
df2 = df0.set_index(df0['city'] + ', ' + df0['country']).drop(['city', 'country'], axis=1)
df2

,population
"London, UK",8788000
"London, Canada",389000
"Birmingham, UK",1101000
"Birmingham, USA",212000
"Paris, France",2244000
"Paris, USA",25000
"Paris, Canada",12000
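
With a unique combined label, a lookup returns exactly one row. The sketch below repeats the construction and reads a single value; the intermediate name `labels` is introduced here only for readability.

```python
import pandas as pd

data = {
    'country': ['UK', 'Canada', 'UK', 'USA', 'France', 'USA', 'Canada'],
    'city': ['London', 'London', 'Birmingham', 'Birmingham', 'Paris', 'Paris', 'Paris'],
    'population': [8788000, 389000, 1101000, 212000, 2244000, 25000, 12000]
}
df0 = pd.DataFrame(data)

# Build "city, country" strings and use them as the index
labels = df0['city'] + ', ' + df0['country']
df2 = df0.set_index(labels).drop(['city', 'country'], axis=1)

# Every label now identifies a single row
print(df2.index.is_unique)

# A row label plus a column name returns one scalar value
print(df2.loc['London, UK', 'population'])
```

The trade-off of this approach is that city and country are fused into one string, so filtering by country alone requires string matching.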


#### Multilevel indexes
Since each country-city combination is unique in our dataset, this pairing makes a good multilevel index. Starting again from the original `df0`, we pass both column names to `set_index`.

In [6]:
df3 = df0.set_index(['country', 'city'])
df3

country,city,population
UK,London,8788000
Canada,London,389000
UK,Birmingham,1101000
USA,Birmingham,212000
France,Paris,2244000
USA,Paris,25000
Canada,Paris,12000
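
To see what the multilevel index looks like internally, and how to undo it, here is a short sketch. It inspects the level names and uniqueness of the resulting `MultiIndex`, then uses `reset_index` to turn the levels back into ordinary columns. The name `restored` is illustrative only.

```python
import pandas as pd

data = {
    'country': ['UK', 'Canada', 'UK', 'USA', 'France', 'USA', 'Canada'],
    'city': ['London', 'London', 'Birmingham', 'Birmingham', 'Paris', 'Paris', 'Paris'],
    'population': [8788000, 389000, 1101000, 212000, 2244000, 25000, 12000]
}
df0 = pd.DataFrame(data)
df3 = df0.set_index(['country', 'city'])

# The index has one level per column passed to set_index
print(list(df3.index.names))
print(df3.index.nlevels)

# Each (country, city) pair occurs once, so the index is unique
print(df3.index.is_unique)

# reset_index moves the levels back into columns and restores integer labels
restored = df3.reset_index()
print(list(restored.columns))
```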


In [7]:
# Slicing at the top level of the index
df3.loc['UK']

city,population
London,8788000
Birmingham,1101000


In [8]:
# Slicing at both levels of the index
df3.loc[[('USA', 'Birmingham')]]

country,city,population
USA,Birmingham,212000
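
The double brackets in the cell above matter: a list of tuples returns a DataFrame, while a bare tuple returns the row as a Series. The sketch below contrasts the two forms and also reads a single value; the variable names `frame`, `row` and `value` are illustrative.

```python
import pandas as pd

data = {
    'country': ['UK', 'Canada', 'UK', 'USA', 'France', 'USA', 'Canada'],
    'city': ['London', 'London', 'Birmingham', 'Birmingham', 'Paris', 'Paris', 'Paris'],
    'population': [8788000, 389000, 1101000, 212000, 2244000, 25000, 12000]
}
df0 = pd.DataFrame(data)
df3 = df0.set_index(['country', 'city'])

# A list containing one tuple keeps the result as a one-row DataFrame
frame = df3.loc[[('USA', 'Birmingham')]]
print(type(frame).__name__, frame.shape)

# A bare tuple returns that row as a Series indexed by column name
row = df3.loc[('USA', 'Birmingham')]
print(type(row).__name__)

# Adding a column label returns the single value
value = df3.loc[('USA', 'Birmingham'), 'population']
print(value)
```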


In [9]:
# Slicing at a lower index level
df3.xs('Paris', level=1)

country,population
France,2244000
USA,25000
Canada,12000
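
A few variations on selecting by a lower level may be useful. This sketch refers to the level by name instead of position, keeps the selected level in the result with `drop_level=False`, and shows a `pd.IndexSlice` alternative. The `IndexSlice` form is applied to a sorted copy of the frame, since slicing a `MultiIndex` with `.loc` generally expects a sorted index.

```python
import pandas as pd

data = {
    'country': ['UK', 'Canada', 'UK', 'USA', 'France', 'USA', 'Canada'],
    'city': ['London', 'London', 'Birmingham', 'Birmingham', 'Paris', 'Paris', 'Paris'],
    'population': [8788000, 389000, 1101000, 212000, 2244000, 25000, 12000]
}
df0 = pd.DataFrame(data)
df3 = df0.set_index(['country', 'city'])

# Refer to the level by name rather than by position
by_name = df3.xs('Paris', level='city')
print(by_name)

# drop_level=False keeps the 'city' level in the result
kept = df3.xs('Paris', level='city', drop_level=False)
print(kept.index.nlevels)

# Alternative: slice every country, then pick 'Paris' on the second level
idx = pd.IndexSlice
sliced = df3.sort_index().loc[idx[:, 'Paris'], :]
print(sliced)
```

Naming the level tends to be more robust than `level=1` if the order of the index levels changes later.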
