# Pandas DataFrame Sorting

## Introduction
Sorting a DataFrame is straightforward thanks to two main methods: <code>.sort_index()</code> and <code>.sort_values()</code>. The main thing to keep in mind is the difference between immutable operations and in-place modifications, which we discussed in previous labs.

By default, any sort operation returns a new dataframe with the sort applied and leaves the original untouched. As we've said several times, this is the preferred approach: immutability is good.

Now it's your turn: turn on the lab and move to the next section!

In [1]:
import pandas as pd 

In [2]:
# Lists of data
data = {'Revenue': [274515,200734,182527,181945,143015,129184,92224,85965,84893,
                    82345,77867,73620,69864,63191],
        'Employees': [147000,267937,135301,878429,163000,197000,158000,58604,
                      109700,350864,110600,364800,85858,243540],
        'Sector': ['Consumer Electronics','Consumer Electronics','Software Services',
                   'Chip Manufacturing','Software Services','Consumer Electronics',
                   'Consumer Electronics','Software Services','Consumer Electronics',
                   'Consumer Electronics','Chip Manufacturing','Software Services',
                   'Software Services','Consumer Electronics'],
        'Founding Date':['01-04-1976','13-01-1969','04-09-1998','20-02-1974',
                         '04-04-1975','15-09-1987','01-02-1984','04-02-2004',
                         '07-04-1946','01-01-1910','18-07-1968','16-06-1911',
                         '11-11-1998','07-03-1918'],
        'Country':['USA','South Korea','USA','Taiwan','USA','China','USA','USA',
                   'Japan','Japan','USA','USA','China','Japan']} 
index = ['Apple','Samsung','Alphabet','Foxconn','Microsoft','Huawei',
         'Dell Technologies','Meta','Sony','Hitachi','Intel','IBM',
         'Tencent','Panasonic']

In [3]:
df = pd.DataFrame(data, index=index)
df["Revenue per Employee"] = df["Revenue"] / df["Employees"]

In [4]:
df

Unnamed: 0,Revenue,Employees,Sector,Founding Date,Country,Revenue per Employee
Apple,274515,147000,Consumer Electronics,01-04-1976,USA,1.867449
Samsung,200734,267937,Consumer Electronics,13-01-1969,South Korea,0.749184
Alphabet,182527,135301,Software Services,04-09-1998,USA,1.349044
Foxconn,181945,878429,Chip Manufacturing,20-02-1974,Taiwan,0.207125
Microsoft,143015,163000,Software Services,04-04-1975,USA,0.877393
Huawei,129184,197000,Consumer Electronics,15-09-1987,China,0.655756
Dell Technologies,92224,158000,Consumer Electronics,01-02-1984,USA,0.583696
Meta,85965,58604,Software Services,04-02-2004,USA,1.466879
Sony,84893,109700,Consumer Electronics,07-04-1946,Japan,0.773865
Hitachi,82345,350864,Consumer Electronics,01-01-1910,Japan,0.234692


## Sorting Values
To sort dataframes by a given column's values, we use the <code>.sort_values()</code> method. Let's explore the default behavior:

In [5]:
df.sort_values('Employees')

Unnamed: 0,Revenue,Employees,Sector,Founding Date,Country,Revenue per Employee
Meta,85965,58604,Software Services,04-02-2004,USA,1.466879
Tencent,69864,85858,Software Services,11-11-1998,China,0.813716
Sony,84893,109700,Consumer Electronics,07-04-1946,Japan,0.773865
Intel,77867,110600,Chip Manufacturing,18-07-1968,USA,0.704042
Alphabet,182527,135301,Software Services,04-09-1998,USA,1.349044
Apple,274515,147000,Consumer Electronics,01-04-1976,USA,1.867449
Dell Technologies,92224,158000,Consumer Electronics,01-02-1984,USA,0.583696
Microsoft,143015,163000,Software Services,04-04-1975,USA,0.877393
Huawei,129184,197000,Consumer Electronics,15-09-1987,China,0.655756
Panasonic,63191,243540,Consumer Electronics,07-03-1918,Japan,0.259469



<code>df.sort_values('Employees')</code>
In its default form, <code>.sort_values()</code> takes the name of the column to sort by and returns a NEW dataframe sorted in ascending order by that column.
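As a quick check of that default behavior, here is a minimal sketch on a three-row subset of the lab's data. The names `small` and `sorted_small` are hypothetical and used only for illustration.

```python
import pandas as pd

# Three rows taken from the lab's data; `small` is an illustrative name
small = pd.DataFrame(
    {"Employees": [163000, 58604, 109700]},
    index=["Microsoft", "Meta", "Sony"],
)

# Default call: ascending order, result returned as a new dataframe
sorted_small = small.sort_values("Employees")

sorted_order = sorted_small.index.tolist()
original_order = small.index.tolist()

print(sorted_order)    # ['Meta', 'Sony', 'Microsoft']
print(original_order)  # ['Microsoft', 'Meta', 'Sony'] (unchanged)
```

The original keeps its row order, so the sorted result has to be bound to a name if we want to use it later.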

To sort in descending order instead, we pass <code>ascending=False</code>:

<code>df.sort_values('Employees', ascending=False)</code>

In [6]:
df.sort_values('Employees', ascending=False)

Unnamed: 0,Revenue,Employees,Sector,Founding Date,Country,Revenue per Employee
Foxconn,181945,878429,Chip Manufacturing,20-02-1974,Taiwan,0.207125
IBM,73620,364800,Software Services,16-06-1911,USA,0.201809
Hitachi,82345,350864,Consumer Electronics,01-01-1910,Japan,0.234692
Samsung,200734,267937,Consumer Electronics,13-01-1969,South Korea,0.749184
Panasonic,63191,243540,Consumer Electronics,07-03-1918,Japan,0.259469
Huawei,129184,197000,Consumer Electronics,15-09-1987,China,0.655756
Microsoft,143015,163000,Software Services,04-04-1975,USA,0.877393
Dell Technologies,92224,158000,Consumer Electronics,01-02-1984,USA,0.583696
Apple,274515,147000,Consumer Electronics,01-04-1976,USA,1.867449
Alphabet,182527,135301,Software Services,04-09-1998,USA,1.349044


### Sorting by Multiple Columns
We can pass multiple columns to sort by, and any "tie" in the first column is broken by the second. This is the more explicit form of the <code>.sort_values()</code> call, with both <code>by</code> and <code>ascending</code> given as lists:

<code>df.sort_values(by=['Country', 'Employees'], ascending=[False, True])</code>


In [7]:
df.sort_values(by=['Country', 'Employees'], ascending=[False, True])

Unnamed: 0,Revenue,Employees,Sector,Founding Date,Country,Revenue per Employee
Meta,85965,58604,Software Services,04-02-2004,USA,1.466879
Intel,77867,110600,Chip Manufacturing,18-07-1968,USA,0.704042
Alphabet,182527,135301,Software Services,04-09-1998,USA,1.349044
Apple,274515,147000,Consumer Electronics,01-04-1976,USA,1.867449
Dell Technologies,92224,158000,Consumer Electronics,01-02-1984,USA,0.583696
Microsoft,143015,163000,Software Services,04-04-1975,USA,0.877393
IBM,73620,364800,Software Services,16-06-1911,USA,0.201809
Foxconn,181945,878429,Chip Manufacturing,20-02-1974,Taiwan,0.207125
Samsung,200734,267937,Consumer Electronics,13-01-1969,South Korea,0.749184
Sony,84893,109700,Consumer Electronics,07-04-1946,Japan,0.773865


The <code>by</code> parameter specifies the columns, and <code>ascending</code> takes a list that defines the sorting direction for each column. In this case, we're sorting by Country name in descending (reverse lexicographical) order first, and by number of Employees in ascending order second.
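To make the tie-breaking easier to follow, here is a small sketch using four rows from the lab's data. The name `mini` is hypothetical.

```python
import pandas as pd

# Four rows copied from the lab's data; `mini` is an illustrative name
mini = pd.DataFrame(
    {
        "Country": ["USA", "Japan", "USA", "Japan"],
        "Employees": [147000, 109700, 58604, 350864],
    },
    index=["Apple", "Sony", "Meta", "Hitachi"],
)

# Country descending first; ties within a country broken by Employees ascending
result = mini.sort_values(by=["Country", "Employees"], ascending=[False, True])

result_order = result.index.tolist()
print(result_order)  # ['Meta', 'Apple', 'Sony', 'Hitachi']
```

Both USA rows come before both Japan rows, and within each country the company with fewer employees is listed first.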

### Immutability and the inplace parameter
As mentioned, these operations leave the original dataframe unchanged by default: the method returns a new dataframe with the changes applied. If we want to apply the sorting directly to the underlying dataframe, we must pass the <code>inplace=True</code> parameter.

<code>df.sort_values(by=['Employees'], inplace=True)</code>


In [8]:
df.sort_values(by=['Employees'], inplace=True)

In [9]:
df.head()

Unnamed: 0,Revenue,Employees,Sector,Founding Date,Country,Revenue per Employee
Meta,85965,58604,Software Services,04-02-2004,USA,1.466879
Tencent,69864,85858,Software Services,11-11-1998,China,0.813716
Sony,84893,109700,Consumer Electronics,07-04-1946,Japan,0.773865
Intel,77867,110600,Chip Manufacturing,18-07-1968,USA,0.704042
Alphabet,182527,135301,Software Services,04-09-1998,USA,1.349044


As we've repeated several times, mutating the dataframe is not recommended. It's better to keep separate copies for the results of different operations.
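The two styles can be compared side by side in a short sketch. The names `frame`, `by_employees` and `returned` are hypothetical; the rows come from the lab's data.

```python
import pandas as pd

frame = pd.DataFrame(
    {"Employees": [163000, 58604, 109700]},
    index=["Microsoft", "Meta", "Sony"],
)

# Preferred: keep the original and bind the sorted copy to a new name
by_employees = frame.sort_values(by=["Employees"])

# In-place variant: `frame` itself is reordered and the call returns None
returned = frame.sort_values(by=["Employees"], inplace=True)

print(returned)              # None
print(frame.index.tolist())  # ['Meta', 'Sony', 'Microsoft']
```

Because the in-place call returns `None`, writing `frame = frame.sort_values(..., inplace=True)` would replace the dataframe with `None`, which is one more reason to favor the copy-based style.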

## Activities

### 1. Sort the dataframe by Revenue in descending order

Sort the <code>df</code> dataframe by Revenue in descending order. This should be an immutable operation and you should store your results in the variable <code>df_rev_desc</code>.

In [10]:
df_rev_desc = df.sort_values(by=['Revenue'], ascending=False)

### 2. Sort by Sector (asc) and Revenue (desc)

Sort the <code>df</code> dataframe by Sector in ascending order and by Revenue in descending order. This should be an immutable operation and you should store your results in the variable <code>df_sect_rev</code>.

In [None]:
df_sect_rev = df.sort_values(['Sector', 'Revenue'], ascending=[True, False])

## Sorting by the index
Sorting a DataFrame by its index is simple: we use the <code>.sort_index()</code> method, which takes the same <code>ascending</code> and <code>inplace</code> parameters as <code>.sort_values()</code>.

Example:
<code>df.sort_index(ascending=False)</code>


In [11]:
df.sort_index(ascending=False)

Unnamed: 0,Revenue,Employees,Sector,Founding Date,Country,Revenue per Employee
Tencent,69864,85858,Software Services,11-11-1998,China,0.813716
Sony,84893,109700,Consumer Electronics,07-04-1946,Japan,0.773865
Samsung,200734,267937,Consumer Electronics,13-01-1969,South Korea,0.749184
Panasonic,63191,243540,Consumer Electronics,07-03-1918,Japan,0.259469
Microsoft,143015,163000,Software Services,04-04-1975,USA,0.877393
Meta,85965,58604,Software Services,04-02-2004,USA,1.466879
Intel,77867,110600,Chip Manufacturing,18-07-1968,USA,0.704042
IBM,73620,364800,Software Services,16-06-1911,USA,0.201809
Huawei,129184,197000,Consumer Electronics,15-09-1987,China,0.655756
Hitachi,82345,350864,Consumer Electronics,01-01-1910,Japan,0.234692


In [12]:
df.index

Index(['Meta', 'Tencent', 'Sony', 'Intel', 'Alphabet', 'Apple',
       'Dell Technologies', 'Microsoft', 'Huawei', 'Panasonic', 'Samsung',
       'Hitachi', 'IBM', 'Foxconn'],
      dtype='object')

The key is understanding what type of index you're dealing with. A string-based index, like the one in the examples, will be sorted in lexicographical order, while a datetime index, for example, will be sorted chronologically.
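The difference can be seen in a small sketch that uses three founding dates from the lab's data as the index, first as plain strings and then parsed as datetimes. The names `founded` and `dated` are hypothetical, and the day-month-year format string is an assumption based on how the dates are written in this lab.

```python
import pandas as pd

# Founding dates as plain strings in day-month-year form
founded = pd.DataFrame(
    {"Company": ["Apple", "Meta", "Sony"]},
    index=["01-04-1976", "04-02-2004", "07-04-1946"],
)

# String index: compared character by character, so the day decides the order
string_order = founded.sort_index()["Company"].tolist()

# Datetime index: parse the strings on a copy, then sort chronologically
dated = founded.copy()
dated.index = pd.to_datetime(dated.index, format="%d-%m-%Y")
datetime_order = dated.sort_index()["Company"].tolist()

print(string_order)    # ['Apple', 'Meta', 'Sony']
print(datetime_order)  # ['Sony', 'Apple', 'Meta']
```

With string labels, Sony (founded in 1946) ends up last because its label starts with "07". Once the labels are parsed as dates, it correctly comes first.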