# Sorting DataFrames

## Setup

In [1]:
import pandas as pd

## Creation

Create an example DataFrame from a dictionary of dictionaries:

In [2]:
data = {
    "Capital": {
        "Spain": "Madrid",
        "Belgium": "Brussels",
        "France": "Paris",
        "Italy": "Roma",
        "Germany": "Berlin",
        "Portugal": "Lisbon",
        "Norway": "Oslo",
        "Greece": "Athens",
    },
    "Population": {
        "Spain": 46733038,
        "Belgium": 11449656,
        "France": 67076000,
        "Italy": 60390560,
        "Germany": 83122889,
        "Portugal": 10295909,
        "Norway": 5391369,
        "Greece": 10718565,
    },
    "Monarch": {
        "Spain": "Felipe VI",
        "Belgium": "Philippe",
        "Norway": "Harald V",
    },
    "Area": {
        "Spain": 505990,
        "Belgium": 30688,
        "France": 640679,
        "Italy": 301340,
        "Germany": 357022,
        "Portugal": 92212,
        "Norway": 385207,
        "Greece": 131957,
    },
}

In [3]:
# For now, let's forget about these steps:
df = pd.DataFrame(data)
df["Capital"] = df["Capital"].astype("string")
df["Monarch"] = df["Monarch"].astype("string")
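
As a minimal sketch of what the dictionary-of-dictionaries constructor does, the snippet below builds a two-country frame. The names `nested` and `example` are hypothetical and only used here. The outer keys become columns, the inner keys become row labels, and a country missing from an inner dictionary gets a missing value.

```python
import pandas as pd

# Hypothetical two-country subset of the data above.
nested = {
    "Capital": {"Spain": "Madrid", "France": "Paris"},
    "Monarch": {"Spain": "Felipe VI"},  # France has no entry here
}

example = pd.DataFrame(nested)

print(example.columns.tolist())                 # outer keys
print(sorted(example.index))                    # inner keys
print(pd.isna(example.loc["France", "Monarch"]))  # missing entry becomes NaN
```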

The second dataset is Apple stock data, taken from the [`matplotlib` sample datasets](https://github.com/matplotlib/sample_data/blob/master/aapl.csv):

In [4]:
# For now, let's forget about these steps:
apple = pd.read_csv("AAPL.csv")
apple["Date"] = apple["Date"].astype("datetime64[ns]")
apple = apple.set_index("Date")
apple = apple.sort_index()

## Demo 1: Sort the index

In [5]:
df

,Capital,Population,Monarch,Area
Spain,Madrid,46733038,Felipe VI,505990
Belgium,Brussels,11449656,Philippe,30688
France,Paris,67076000,,640679
Italy,Roma,60390560,,301340
Germany,Berlin,83122889,,357022
Portugal,Lisbon,10295909,,92212
Norway,Oslo,5391369,Harald V,385207
Greece,Athens,10718565,,131957


Sort the index (in alphabetical order):

In [6]:
df = df.sort_index()

In [7]:
df

,Capital,Population,Monarch,Area
Belgium,Brussels,11449656,Philippe,30688
France,Paris,67076000,,640679
Germany,Berlin,83122889,,357022
Greece,Athens,10718565,,131957
Italy,Roma,60390560,,301340
Norway,Oslo,5391369,Harald V,385207
Portugal,Lisbon,10295909,,92212
Spain,Madrid,46733038,Felipe VI,505990
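
The demo reassigns the result (`df = df.sort_index()`) because `sort_index` returns a new DataFrame by default and leaves the original untouched. A small sketch of that behavior, using a hypothetical three-row frame named `small`:

```python
import pandas as pd

# Hypothetical three-row frame, used only for this illustration.
small = pd.DataFrame(
    {"Population": [46733038, 11449656, 67076000]},
    index=["Spain", "Belgium", "France"],
)

sorted_small = small.sort_index()

print(list(small.index))         # the original keeps its insertion order
print(list(sorted_small.index))  # the returned copy is in alphabetical order
```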


Sort the index (in reverse alphabetical order):

In [8]:
df = df.sort_index(ascending=False)

In [9]:
df

,Capital,Population,Monarch,Area
Spain,Madrid,46733038,Felipe VI,505990
Portugal,Lisbon,10295909,,92212
Norway,Oslo,5391369,Harald V,385207
Italy,Roma,60390560,,301340
Greece,Athens,10718565,,131957
Germany,Berlin,83122889,,357022
France,Paris,67076000,,640679
Belgium,Brussels,11449656,Philippe,30688
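
One way to check the result of a descending sort, sketched on a hypothetical frame named `capitals`, is the index attribute `is_monotonic_decreasing`:

```python
import pandas as pd

# Hypothetical four-row subset of the countries above.
capitals = pd.DataFrame(
    {"Capital": ["Madrid", "Brussels", "Paris", "Oslo"]},
    index=["Spain", "Belgium", "France", "Norway"],
)

descending = capitals.sort_index(ascending=False)

print(descending.index.tolist())
print(descending.index.is_monotonic_decreasing)  # True once the labels are in reverse order
```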


## Exercise 1

In [None]:
apple.head()

Sort the index of the `apple` DataFrame (by **decreasing dates**, i.e. starting from the most recent dates):

In [10]:
apple.sort_index(ascending=False)

Date,Open,High,Low,Close,Volume,Adj Close
2008-10-14,116.26,116.40,103.14,104.08,70749800,104.08
2008-10-13,104.55,110.53,101.02,110.26,54967000,110.26
2008-10-10,85.70,100.00,85.00,96.80,79260700,96.80
2008-10-09,93.35,95.80,86.60,88.74,57763700,88.74
2008-10-08,85.91,96.33,85.68,89.79,78847900,89.79
...,...,...,...,...,...,...
1984-09-13,27.50,27.62,27.50,27.50,7429600,3.14
1984-09-12,26.87,27.00,26.12,26.12,4773600,2.98
1984-09-11,26.62,27.37,26.62,26.87,5444000,3.07
1984-09-10,26.50,26.62,25.87,26.37,2346400,3.01


Sort the index of the `apple` DataFrame (by **increasing dates**, i.e. starting from the oldest dates):

In [11]:
apple.sort_index()

Date,Open,High,Low,Close,Volume,Adj Close
1984-09-07,26.50,26.87,26.25,26.50,2981600,3.02
1984-09-10,26.50,26.62,25.87,26.37,2346400,3.01
1984-09-11,26.62,27.37,26.62,26.87,5444000,3.07
1984-09-12,26.87,27.00,26.12,26.12,4773600,2.98
1984-09-13,27.50,27.62,27.50,27.50,7429600,3.14
...,...,...,...,...,...,...
2008-10-08,85.91,96.33,85.68,89.79,78847900,89.79
2008-10-09,93.35,95.80,86.60,88.74,57763700,88.74
2008-10-10,85.70,100.00,85.00,96.80,79260700,96.80
2008-10-13,104.55,110.53,101.02,110.26,54967000,110.26
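
The same calls work on a date index. The sketch below uses three closing prices copied from the output above. The name `prices` is hypothetical, and `pd.to_datetime` is used here instead of the `astype` conversion from the setup cell.

```python
import pandas as pd

# Three rows from the apple data, deliberately out of order.
prices = pd.DataFrame(
    {"Close": [110.26, 26.50, 96.80]},
    index=pd.to_datetime(["2008-10-13", "1984-09-07", "2008-10-10"]),
)

newest_first = prices.sort_index(ascending=False)
oldest_first = prices.sort_index()

print(newest_first.index[0])  # most recent date comes first
print(oldest_first.index[0])  # oldest date comes first
```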


## Demo 2: Sort the columns

In [12]:
df

,Capital,Population,Monarch,Area
Spain,Madrid,46733038,Felipe VI,505990
Portugal,Lisbon,10295909,,92212
Norway,Oslo,5391369,Harald V,385207
Italy,Roma,60390560,,301340
Greece,Athens,10718565,,131957
Germany,Berlin,83122889,,357022
France,Paris,67076000,,640679
Belgium,Brussels,11449656,Philippe,30688


Sort the columns (in alphabetical order):

In [13]:
df = df.sort_index(axis=1)

In [14]:
df

,Area,Capital,Monarch,Population
Spain,505990,Madrid,Felipe VI,46733038
Portugal,92212,Lisbon,,10295909
Norway,385207,Oslo,Harald V,5391369
Italy,301340,Roma,,60390560
Greece,131957,Athens,,10718565
Germany,357022,Berlin,,83122889
France,640679,Paris,,67076000
Belgium,30688,Brussels,Philippe,11449656
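
As a small sketch on a hypothetical frame named `frame`, `axis=1` can also be written `axis="columns"`. Either way only the column order changes and the rows stay where they are.

```python
import pandas as pd

# Hypothetical two-row frame with columns out of alphabetical order.
frame = pd.DataFrame(
    {
        "Capital": ["Madrid", "Oslo"],
        "Population": [46733038, 5391369],
        "Area": [505990, 385207],
    },
    index=["Spain", "Norway"],
)

by_number = frame.sort_index(axis=1)
by_name = frame.sort_index(axis="columns")

print(by_number.columns.tolist())  # columns in alphabetical order
print(by_number.equals(by_name))   # both spellings give the same result
print(by_number.index.tolist())    # row order is unchanged
```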


## Exercise 2

In [16]:
apple.head()

Date,Open,High,Low,Close,Volume,Adj Close
1984-09-07,26.5,26.87,26.25,26.5,2981600,3.02
1984-09-10,26.5,26.62,25.87,26.37,2346400,3.01
1984-09-11,26.62,27.37,26.62,26.87,5444000,3.07
1984-09-12,26.87,27.0,26.12,26.12,4773600,2.98
1984-09-13,27.5,27.62,27.5,27.5,7429600,3.14


Sort the columns of the `apple` DataFrame (in **alphabetical order**):

In [17]:
apple.sort_index(axis=1)

Date,Adj Close,Close,High,Low,Open,Volume
1984-09-07,3.02,26.50,26.87,26.25,26.50,2981600
1984-09-10,3.01,26.37,26.62,25.87,26.50,2346400
1984-09-11,3.07,26.87,27.37,26.62,26.62,5444000
1984-09-12,2.98,26.12,27.00,26.12,26.87,4773600
1984-09-13,3.14,27.50,27.62,27.50,27.50,7429600
...,...,...,...,...,...,...
2008-10-08,89.79,89.79,96.33,85.68,85.91,78847900
2008-10-09,88.74,88.74,95.80,86.60,93.35,57763700
2008-10-10,96.80,96.80,100.00,85.00,85.70,79260700
2008-10-13,110.26,110.26,110.53,101.02,104.55,54967000


## Demo 3: Sort by the values in a column

In [18]:
df

,Area,Capital,Monarch,Population
Spain,505990,Madrid,Felipe VI,46733038
Portugal,92212,Lisbon,,10295909
Norway,385207,Oslo,Harald V,5391369
Italy,301340,Roma,,60390560
Greece,131957,Athens,,10718565
Germany,357022,Berlin,,83122889
France,640679,Paris,,67076000
Belgium,30688,Brussels,Philippe,11449656


Sort based on the values of the "Population" column (in increasing order):

In [19]:
df = df.sort_values("Population")

In [20]:
df

,Area,Capital,Monarch,Population
Norway,385207,Oslo,Harald V,5391369
Portugal,92212,Lisbon,,10295909
Greece,131957,Athens,,10718565
Belgium,30688,Brussels,Philippe,11449656
Spain,505990,Madrid,Felipe VI,46733038
Italy,301340,Roma,,60390560
France,640679,Paris,,67076000
Germany,357022,Berlin,,83122889


Sort based on the values of the "Area" column (in decreasing order):

In [21]:
df = df.sort_values("Area", ascending=False)

In [22]:
df

,Area,Capital,Monarch,Population
France,640679,Paris,,67076000
Spain,505990,Madrid,Felipe VI,46733038
Norway,385207,Oslo,Harald V,5391369
Germany,357022,Berlin,,83122889
Italy,301340,Roma,,60390560
Greece,131957,Athens,,10718565
Portugal,92212,Lisbon,,10295909
Belgium,30688,Brussels,Philippe,11449656
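
`sort_values` reorders whole rows, so each index label stays attached to its own values. A minimal sketch on a hypothetical frame named `areas`:

```python
import pandas as pd

# Hypothetical three-row subset of the countries above.
areas = pd.DataFrame(
    {
        "Area": [505990, 30688, 640679],
        "Population": [46733038, 11449656, 67076000],
    },
    index=["Spain", "Belgium", "France"],
)

largest_first = areas.sort_values("Area", ascending=False)

print(largest_first.index.tolist())                # labels follow their rows
print(largest_first.loc["France", "Population"])   # other columns move with the row
```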


Sort based on the values of the "Monarch" column (missing values are placed last by default):

In [23]:
df = df.sort_values("Monarch")

In [24]:
df

,Area,Capital,Monarch,Population
Spain,505990,Madrid,Felipe VI,46733038
Norway,385207,Oslo,Harald V,5391369
Belgium,30688,Brussels,Philippe,11449656
France,640679,Paris,,67076000
Germany,357022,Berlin,,83122889
Italy,301340,Roma,,60390560
Greece,131957,Athens,,10718565
Portugal,92212,Lisbon,,10295909


Sort based on the values of the "Monarch" column, with the missing values placed first:

In [25]:
df = df.sort_values("Monarch", na_position="first")

In [26]:
df

,Area,Capital,Monarch,Population
France,640679,Paris,,67076000
Germany,357022,Berlin,,83122889
Italy,301340,Roma,,60390560
Greece,131957,Athens,,10718565
Portugal,92212,Lisbon,,10295909
Spain,505990,Madrid,Felipe VI,46733038
Norway,385207,Oslo,Harald V,5391369
Belgium,30688,Brussels,Philippe,11449656
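
The two `na_position` settings can be compared side by side. The sketch below uses a hypothetical four-row frame named `monarchs`. Only the placement of the missing values differs, while the non-missing names stay in alphabetical order.

```python
import pandas as pd

# Hypothetical subset: two countries with a monarch, two without.
monarchs = pd.DataFrame(
    {"Monarch": ["Harald V", None, "Felipe VI", None]},
    index=["Norway", "France", "Spain", "Italy"],
)
monarchs["Monarch"] = monarchs["Monarch"].astype("string")

na_last = monarchs.sort_values("Monarch")                        # default: na_position="last"
na_first = monarchs.sort_values("Monarch", na_position="first")

print(na_last["Monarch"].isna().tolist())   # missing values at the end
print(na_first["Monarch"].isna().tolist())  # missing values at the beginning
```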


## Exercise 3

In [27]:
apple.head()

Date,Open,High,Low,Close,Volume,Adj Close
1984-09-07,26.5,26.87,26.25,26.5,2981600,3.02
1984-09-10,26.5,26.62,25.87,26.37,2346400,3.01
1984-09-11,26.62,27.37,26.62,26.87,5444000,3.07
1984-09-12,26.87,27.0,26.12,26.12,4773600,2.98
1984-09-13,27.5,27.62,27.5,27.5,7429600,3.14


Sort the `apple` DataFrame based on the values of the "Open" column (in increasing order):

In [28]:
apple.sort_values("Open")

Date,Open,High,Low,Close,Volume,Adj Close
1997-07-10,12.88,13.38,12.75,13.25,17606400,3.31
2003-04-16,12.99,13.67,12.92,13.24,36292000,6.62
1997-12-24,13.00,13.25,13.00,13.13,3502000,3.28
1997-12-30,13.00,13.44,12.75,13.19,12250800,3.30
1997-12-26,13.06,13.38,13.00,13.31,3860000,3.33
...,...,...,...,...,...,...
2007-12-27,198.95,202.96,197.80,198.57,28411700,198.57
2007-12-26,199.01,200.96,196.82,198.95,25133300,198.95
2008-01-02,199.27,200.26,192.55,194.84,38542100,194.84
2007-12-31,199.50,200.50,197.75,198.08,19261900,198.08


Sort the `apple` DataFrame based on the values of the "Volume" column (in decreasing order):

In [29]:
apple.sort_values("Volume", ascending=False)

Date,Open,High,Low,Close,Volume,Adj Close
2000-09-29,28.19,29.00,25.37,25.75,265069000,12.88
1997-08-06,25.25,27.75,25.00,26.31,149671200,6.58
1997-08-07,28.75,29.56,28.37,29.19,134124400,7.30
2008-01-23,136.19,140.00,126.14,139.07,120463200,139.07
1999-09-21,73.19,73.25,69.00,69.25,119931200,17.31
...,...,...,...,...,...,...
1984-11-02,25.00,25.12,24.75,24.87,1004800,2.84
1985-10-02,15.75,15.88,15.63,15.63,795200,1.78
1991-10-02,51.75,51.75,49.50,49.75,643600,11.85
1985-09-27,15.88,16.00,15.88,15.88,250400,1.81
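
A sorted frame combines naturally with `head()`. As a sketch, the snippet below keeps the two highest-volume days from three rows copied out of the output above. The names `volumes` and `busiest` are hypothetical.

```python
import pandas as pd

# Three rows from the apple data, deliberately out of order.
volumes = pd.DataFrame(
    {"Volume": [250400, 265069000, 149671200]},
    index=pd.to_datetime(["1985-09-27", "2000-09-29", "1997-08-06"]),
)

# Sort in decreasing order, then keep the first two rows.
busiest = volumes.sort_values("Volume", ascending=False).head(2)

print(busiest)
```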
