# **Lecture 9B**
# **Sorting a DataFrame**



In this part, we are going to see how we can sort the rows in a DataFrame by the values in one or more columns.

Before you start, run the two cells below to connect to Google Drive and import the pandas module.

In [1]:
# Run the code below to access files in your Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
# We also need the pandas module in this lecture
# Import the pandas module
import pandas as pd

---
**Example 1:** Sorting a DataFrame by one column. Sorting a DataFrame is a very common task. We will start with the simplest case: sorting the data by a single column.<br>

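The syntax for sorting a DataFrame df is **df.sort_values(by=*column_name*, ascending=*True/False*, ignore_index=*True/False*)**. The function returns a new, sorted DataFrame and leaves df itself unchanged, so assign the result to a variable if you want to keep it.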
* **df** is the DataFrame that you want to sort.
* The option **by=*column_name*** specifies the name of the column for determining the sort order.
* The option **ascending=*True/False*** allows us to choose if the sort is in ascending order (***True***) or descending order (***False***). Default is True.
* The option **ignore_index=*True/False*** allows us to choose if the row index will be reset after the sorting. If True, the row indices will be reset to start from zero after sorting. If False, the row indices will not be reset, i.e. you can use the same index to access the same record before and after sorting. Default is False.
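The effect of **ignore_index** is easiest to see on a tiny table. The sketch below uses a small made-up DataFrame (the names `df`, `kept` and `reset` are only for illustration and are not part of the lecture data) and compares the row index after sorting with and without resetting it.

```python
import pandas as pd

# Hypothetical sample data, not the lecture's inventory.xlsx file
df = pd.DataFrame({
    "product_name": ["Soup", "Cola", "Tuna"],
    "unit_price": [12, 4, 14],
})

# ignore_index=False (the default): each row keeps its original index label
kept = df.sort_values(by="unit_price", ascending=False)
print(kept.index.tolist())   # [2, 0, 1]

# ignore_index=True: the rows are relabelled 0, 1, 2 in the new order
reset = df.sort_values(by="unit_price", ascending=False, ignore_index=True)
print(reset.index.tolist())  # [0, 1, 2]
```

With the original labels kept, `kept.loc[1]` still refers to the "Cola" row, just as `df.loc[1]` does. After a reset, `reset.loc[0]` is simply whichever row comes first in the sorted order.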


In [None]:
# Read inventory.xlsx data file
inventory = pd.read_excel("/content/drive/MyDrive/Data/inventory.xlsx",sheet_name="data")

# Display all records before sorting
print("Before sort:")
display(inventory)

# Sort the data by unit_price in ascending order and reset the index after sorting
# Try modifying the 3 arguments and see how they affect the sorting!
inventory = inventory.sort_values(by="unit_price",ascending=True,ignore_index=True)
print("After sort:")
display(inventory)

Before sort:


Unnamed: 0,product_code,product_name,origin,unit_price,quantity
0,A111,ABC Tomato Soup,Japan,12,52
1,B223,Tasty Lucheon Meat,China,25,60
2,A112,ABC Mushroom Soup,Japan,13,34
3,B201,Tasty Corn Beef,China,16,50
4,C204,Star Chocolate,USA,20,100
5,A342,ABC Chicken Soup,Japan,13,61
6,D871,Jacky Cola,Thailand,4,4
7,B201,Tasty Tuna,China,14,86
8,C491,Star Jello,USA,18,67
9,D481,Jacky Ginger Beer,Thailand,15,13


After sort:


Unnamed: 0,product_code,product_name,origin,unit_price,quantity
6,D871,Jacky Cola,Thailand,4,4
0,A111,ABC Tomato Soup,Japan,12,52
2,A112,ABC Mushroom Soup,Japan,13,34
5,A342,ABC Chicken Soup,Japan,13,61
7,B201,Tasty Tuna,China,14,86
9,D481,Jacky Ginger Beer,Thailand,15,13
3,B201,Tasty Corn Beef,China,16,50
8,C491,Star Jello,USA,18,67
4,C204,Star Chocolate,USA,20,100
1,B223,Tasty Lucheon Meat,China,25,60


---
**Example 2:** Sorting by multiple columns. In some situations, sorting will involve more than one column. E.g. you want to sort the data by ***origin*** and then by ***unit_price*** within each category of ***origin***. The syntax becomes **df.sort_values(by=*list_of_columns*, ascending=*list_of_True/False*, ignore_index=*True/False*)**.
* The **by=** option takes a list of column names. The function first sorts the DataFrame by the first column in the list, then sorts by the second column within each group of equal values in the first column. Any further columns are applied in the same way.
* The **ascending=** option will be a list of **True/False** values. These True/False values indicate the sort order for each column involved. This list should have the same length as the list in **by=** option.
* The use of **ignore_index=** option is the same as before.
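As a minimal sketch of the two-column case described above, the cell below sorts by ***origin*** in ascending order and then by ***unit_price*** in descending order within each origin. It uses five rows copied by hand from the inventory table rather than the Excel file, and the names `sample` and `by_origin_price` are only for illustration.

```python
import pandas as pd

# Five rows copied from the inventory table (not the full inventory.xlsx file)
sample = pd.DataFrame({
    "product_name": ["ABC Tomato Soup", "Tasty Lucheon Meat", "ABC Mushroom Soup",
                     "Tasty Corn Beef", "Star Chocolate"],
    "origin": ["Japan", "China", "Japan", "China", "USA"],
    "unit_price": [12, 25, 13, 16, 20],
})

# origin ascending (A to Z), then unit_price descending within each origin
by_origin_price = sample.sort_values(
    by=["origin", "unit_price"],
    ascending=[True, False],
    ignore_index=True,
)
print(by_origin_price)
```

The two lists are matched by position: the first entry of **ascending=** applies to the first column in **by=**, the second entry to the second column.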


In [None]:
# Read inventory.xlsx data file
inventory = pd.read_excel("/content/drive/MyDrive/Data/inventory.xlsx",sheet_name="data")

# Display all records before sorting
print("Before sort:")
display(inventory)

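# Sort by product_code in ascending order and keep the original row index
# To sort by origin (ascending) and then unit_price (descending) within each origin,
# change the arguments to by=["origin","unit_price"] and ascending=[True,False]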
inventory = inventory.sort_values(by=["product_code"],ascending=[True],ignore_index=False)
print("After sort:")
display(inventory)

Before sort:


Unnamed: 0,product_code,product_name,origin,unit_price,quantity
0,A111,ABC Tomato Soup,Japan,12,52
1,B223,Tasty Lucheon Meat,China,25,60
2,A112,ABC Mushroom Soup,Japan,13,34
3,B201,Tasty Corn Beef,China,16,50
4,C204,Star Chocolate,USA,20,100
5,A342,ABC Chicken Soup,Japan,13,61
6,D871,Jacky Cola,Thailand,4,4
7,B201,Tasty Tuna,China,14,86
8,C491,Star Jello,USA,18,67
9,D481,Jacky Ginger Beer,Thailand,15,13


After sort:


Unnamed: 0,product_code,product_name,origin,unit_price,quantity
0,A111,ABC Tomato Soup,Japan,12,52
2,A112,ABC Mushroom Soup,Japan,13,34
5,A342,ABC Chicken Soup,Japan,13,61
3,B201,Tasty Corn Beef,China,16,50
7,B201,Tasty Tuna,China,14,86
1,B223,Tasty Lucheon Meat,China,25,60
4,C204,Star Chocolate,USA,20,100
8,C491,Star Jello,USA,18,67
9,D481,Jacky Ginger Beer,Thailand,15,13
6,D871,Jacky Cola,Thailand,4,4
