# Setting & Resetting the index

## Setup

In [1]:
import pandas as pd

## Creation

Create an example DataFrame from a dictionary of dictionaries (the outer keys become the columns, the inner keys become the index):

In [2]:
data = {
    "Capital": {
        "Spain": "Madrid",
        "Belgium": "Brussels",
        "France": "Paris",
        "Italy": "Roma",
        "Germany": "Berlin",
        "Portugal": "Lisbon",
        "Norway": "Oslo",
        "Greece": "Athens",
    },
    "Population": {
        "Spain": 46733038,
        "Belgium": 11449656,
        "France": 67076000,
        "Italy": 60390560,
        "Germany": 83122889,
        "Portugal": 10295909,
        "Norway": 5391369,
        "Greece": 10718565,
    },
    "Monarch": {
        "Spain": "Felipe VI",
        "Belgium": "Philippe",
        "Norway": "Harald V",
    },
    "Area": {
        "Spain": 505990,
        "Belgium": 30688,
        "France": 640679,
        "Italy": 301340,
        "Germany": 357022,
        "Portugal": 92212,
        "Norway": 385207,
        "Greece": 131957,
    },
}

In [3]:
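# Preparation steps (the details are not important for now):
df = pd.DataFrame(data)
df["Capital"] = df["Capital"].astype("string")  # use the dedicated string dtype
df["Monarch"] = df["Monarch"].astype("string")
df.index.name = "Country"  # name the index so that reset_index() creates a "Country" column
df = df.reset_index()  # move the country names into a regular column
df["Country"] = df["Country"].astype("string")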
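As a minimal sketch of what the dictionary-of-dictionaries constructor does, the cell below builds a two-country miniature of `data`. The names `small` and `small_df` are made up for this illustration and are not used elsewhere in the notebook.

```python
import pandas as pd

# Hypothetical miniature of the `data` dictionary (illustration only).
small = {
    "Capital": {"Spain": "Madrid", "France": "Paris"},
    "Monarch": {"Spain": "Felipe VI"},  # France has no entry here
}

small_df = pd.DataFrame(small)

# Outer keys become the column labels.
print(list(small_df.columns))

# Inner keys become the row labels (the index).
print(list(small_df.index))

# An inner key that is absent from one column yields a missing value there.
print(small_df["Monarch"].isna())
```

This is why the `Monarch` column of `df` is empty for every country except Spain, Belgium and Norway.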

Apple stock data, taken from the [`matplotlib` sample datasets](https://github.com/matplotlib/sample_data/blob/master/aapl.csv):

In [4]:
# Preparation steps (the details are not important for now):
apple = pd.read_csv("AAPL.csv")
apple["Date"] = apple["Date"].astype("datetime64[ns]")  # convert the date strings to timestamps
# apple = apple.set_index("Date")
# apple = apple.sort_index()

## Demo 1: Set a column as the index

In [9]:
df

Country,Capital,Population,Monarch,Area
Spain,Madrid,46733038,Felipe VI,505990
Belgium,Brussels,11449656,Philippe,30688
France,Paris,67076000,,640679
Italy,Roma,60390560,,301340
Germany,Berlin,83122889,,357022
Portugal,Lisbon,10295909,,92212
Norway,Oslo,5391369,Harald V,385207
Greece,Athens,10718565,,131957


In [10]:
df.index

Index(['Spain', 'Belgium', 'France', 'Italy', 'Germany', 'Portugal', 'Norway',
       'Greece'],
      dtype='object', name='Country')

Set a column as the index:

In [11]:
df = df.set_index("Country")

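KeyError: "None of ['Country'] are in the columns"

The call fails here because `Country` is already the index (see the output of `df.index` above), so it is no longer among the columns. On a DataFrame where `Country` is still a regular column, as created in the Creation section, the same call succeeds.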

In [12]:
df

Country,Capital,Population,Monarch,Area
Spain,Madrid,46733038,Felipe VI,505990
Belgium,Brussels,11449656,Philippe,30688
France,Paris,67076000,,640679
Italy,Roma,60390560,,301340
Germany,Berlin,83122889,,357022
Portugal,Lisbon,10295909,,92212
Norway,Oslo,5391369,Harald V,385207
Greece,Athens,10718565,,131957
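The following sketch reproduces the behaviour on a tiny DataFrame: the first `set_index` call moves the column into the index, and a second call raises a `KeyError` because the column no longer exists. The names `demo`, `indexed` and `second_call_failed` are hypothetical, and the final guard is just one possible way to make a cell safe to re-run.

```python
import pandas as pd

# Hypothetical two-row DataFrame with "Country" as a regular column.
demo = pd.DataFrame(
    {"Country": ["Spain", "Norway"], "Capital": ["Madrid", "Oslo"]}
)

# First call: "Country" moves from the columns into the index.
indexed = demo.set_index("Country")
print(indexed.index.name)            # Country
print("Country" in indexed.columns)  # False

# Second call: the column is gone, so pandas raises a KeyError.
second_call_failed = False
try:
    indexed.set_index("Country")
except KeyError as err:
    second_call_failed = True
    print("KeyError:", err)

# One possible guard: only set the index while the column still exists.
if "Country" in indexed.columns:
    indexed = indexed.set_index("Country")
```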


## Exercise 1

In [13]:
apple.head()

,Date,Open,High,Low,Close,Volume,Adj Close
0,2008-10-14,116.26,116.4,103.14,104.08,70749800,104.08
1,2008-10-13,104.55,110.53,101.02,110.26,54967000,110.26
2,2008-10-10,85.7,100.0,85.0,96.8,79260700,96.8
3,2008-10-09,93.35,95.8,86.6,88.74,57763700,88.74
4,2008-10-08,85.91,96.33,85.68,89.79,78847900,89.79


Set the "Date" column as the index:

In [14]:
apple.set_index("Date")

Date,Open,High,Low,Close,Volume,Adj Close
2008-10-14,116.26,116.40,103.14,104.08,70749800,104.08
2008-10-13,104.55,110.53,101.02,110.26,54967000,110.26
2008-10-10,85.70,100.00,85.00,96.80,79260700,96.80
2008-10-09,93.35,95.80,86.60,88.74,57763700,88.74
2008-10-08,85.91,96.33,85.68,89.79,78847900,89.79
...,...,...,...,...,...,...
1984-09-13,27.50,27.62,27.50,27.50,7429600,3.14
1984-09-12,26.87,27.00,26.12,26.12,4773600,2.98
1984-09-11,26.62,27.37,26.62,26.87,5444000,3.07
1984-09-10,26.50,26.62,25.87,26.37,2346400,3.01


In [15]:
apple

,Date,Open,High,Low,Close,Volume,Adj Close
0,2008-10-14,116.26,116.40,103.14,104.08,70749800,104.08
1,2008-10-13,104.55,110.53,101.02,110.26,54967000,110.26
2,2008-10-10,85.70,100.00,85.00,96.80,79260700,96.80
3,2008-10-09,93.35,95.80,86.60,88.74,57763700,88.74
4,2008-10-08,85.91,96.33,85.68,89.79,78847900,89.79
...,...,...,...,...,...,...,...
6076,1984-09-13,27.50,27.62,27.50,27.50,7429600,3.14
6077,1984-09-12,26.87,27.00,26.12,26.12,4773600,2.98
6078,1984-09-11,26.62,27.37,26.62,26.87,5444000,3.07
6079,1984-09-10,26.50,26.62,25.87,26.37,2346400,3.01
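The two outputs above suggest that `set_index` returns a new DataFrame and leaves `apple` itself unchanged. A minimal sketch on three rows copied from the table illustrates this. The names `prices` and `prices_by_date` are made up for the example, and the `sort_index` step mirrors the commented-out line in the setup cell.

```python
import pandas as pd

# Hypothetical three-row excerpt of the Apple data (Date and Close only).
prices = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2008-10-14", "2008-10-13", "2008-10-10"]),
        "Close": [104.08, 110.26, 96.80],
    }
)

# The result is displayed or discarded, but `prices` is not modified.
prices.set_index("Date")
print("Date" in prices.columns)  # True

# To keep the new index, assign the result to a name.
prices_by_date = prices.set_index("Date").sort_index()
print(prices_by_date.index.is_monotonic_increasing)  # True
```

Assigning the result back to the same name (`apple = apple.set_index("Date")`) would be the equivalent of what Demo 1 does with `df`.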


## Demo 2: Reset the index to the default

In [16]:
df

Country,Capital,Population,Monarch,Area
Spain,Madrid,46733038,Felipe VI,505990
Belgium,Brussels,11449656,Philippe,30688
France,Paris,67076000,,640679
Italy,Roma,60390560,,301340
Germany,Berlin,83122889,,357022
Portugal,Lisbon,10295909,,92212
Norway,Oslo,5391369,Harald V,385207
Greece,Athens,10718565,,131957


In [17]:
df.index

Index(['Spain', 'Belgium', 'France', 'Italy', 'Germany', 'Portugal', 'Norway',
       'Greece'],
      dtype='object', name='Country')

Reset the index to the default integer index, which moves `Country` back into the columns:

In [18]:
df = df.reset_index()

In [19]:
df

,Country,Capital,Population,Monarch,Area
0,Spain,Madrid,46733038,Felipe VI,505990
1,Belgium,Brussels,11449656,Philippe,30688
2,France,Paris,67076000,,640679
3,Italy,Roma,60390560,,301340
4,Germany,Berlin,83122889,,357022
5,Portugal,Lisbon,10295909,,92212
6,Norway,Oslo,5391369,Harald V,385207
7,Greece,Athens,10718565,,131957


Note that the data type of the restored column often has to be fixed after resetting the index; here `Country` comes back as `object` rather than `string`:

In [20]:
df.dtypes

Country       object
Capital       string
Population     int64
Monarch       string
Area           int64
dtype: object

In [21]:
df["Country"] = df["Country"].astype("string")
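The round trip can be sketched on a small DataFrame as follows. The names `demo` and `restored` are hypothetical. Whether the dtype survives the round trip may depend on the pandas version, so the sketch prints the dtypes instead of assuming a result and then reapplies the intended dtype, which should be harmless if the column is already `string`.

```python
import pandas as pd

# Hypothetical two-row DataFrame with a string-typed "Country" column.
demo = pd.DataFrame(
    {"Country": ["Spain", "Norway"], "Population": [46733038, 5391369]}
)
demo["Country"] = demo["Country"].astype("string")

# Round trip: column -> index -> column.
restored = demo.set_index("Country").reset_index()

# Inspect the dtypes after the round trip (the result may vary by version).
print(restored.dtypes)

# Reapply the intended dtype to the restored column.
restored["Country"] = restored["Country"].astype("string")
print(restored.dtypes)
```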

## Exercise 2

In [22]:
apple.head()

,Date,Open,High,Low,Close,Volume,Adj Close
0,2008-10-14,116.26,116.4,103.14,104.08,70749800,104.08
1,2008-10-13,104.55,110.53,101.02,110.26,54967000,110.26
2,2008-10-10,85.7,100.0,85.0,96.8,79260700,96.8
3,2008-10-09,93.35,95.8,86.6,88.74,57763700,88.74
4,2008-10-08,85.91,96.33,85.68,89.79,78847900,89.79


Reset the index to the default:

Check the data types:

If necessary, correct the data type of the column that was used for the index: