![Data Dunkers Banner](https://github.com/Data-Dunkers/lessons/blob/main/images/top-banner.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fdata-dunkers%2Flessons&branch=main&subPath=data-sorting.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/Data-Dunkers/lessons/main/images/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>
<a href="https://colab.research.google.com/github/data-dunkers/lessons/blob/main/data-sorting.ipynb" target="_parent"><img src="https://raw.githubusercontent.com/Data-Dunkers/lessons/main/images/open-in-colab-button.svg?sanitize=true" width="123" height="24" alt="Open in Colab"/></a>

# Data Dunkers Lesson: Sorting Data

The corresponding Activity Notebook for this Lesson Notebook can be found [here](https://github.com/Data-Dunkers/activities/blob/main/data-sorting.ipynb).

## Objectives

By the end of this lesson, students will be able to:
- Sort data in a DataFrame based on specific columns.
    - Example: Sort Pascal Siakam's statistics by the number of points scored to identify his highest-scoring seasons.
- Understand the impact of sorting data in ascending vs. descending order.
    - Example: Sort the field goal percentages in descending order to highlight the best shooting performances first.
- Apply multiple sorting criteria to organize data more effectively.
    - Example: Sort the dataset by blocks (BLK) first, then by steals (STL) to order seasons that have the same number of blocks.

## Import / Input

In [11]:
import pandas as pd

# URL of the CSV file containing data for Pascal Siakam
url = 'https://raw.githubusercontent.com/Data-Dunkers/data/main/NBA/Pascal_Siakam.csv'

# Read the CSV file into a pandas DataFrame
df = pd.read_csv(url)

# Display the DataFrame
display(df)

Unnamed: 0,PLAYER_ID,SEASON_ID,LEAGUE_ID,TEAM_ID,TEAM_ABBREVIATION,PLAYER_AGE,GP,GS,MIN,FGM,...,REB,AST,STL,BLK,TOV,PF,PTS,FG2M,FG2A,FG2_PCT
0,1627783,2016-17,0,1610612761,TOR,23.0,55,38,859.0,103,...,185,17,26,45,33,109,229,102,198,0.515
1,1627783,2017-18,0,1610612761,TOR,24.0,81,5,1679.0,253,...,364,159,62,42,67,166,589,224,366,0.612
2,1627783,2018-19,0,1610612761,TOR,25.0,80,79,2548.0,519,...,549,248,73,52,154,241,1354,440,731,0.602
3,1627783,2019-20,0,1610612761,TOR,26.0,60,60,2110.0,500,...,439,207,61,53,148,170,1371,369,739,0.499
4,1627783,2020-21,0,1610612761,TOR,27.0,56,56,2006.0,437,...,405,250,64,37,130,174,1196,364,715,0.509
5,1627783,2021-22,0,1610612761,TOR,28.0,68,68,2578.0,596,...,580,360,85,42,181,225,1551,521,989,0.527
6,1627783,2022-23,0,1610612761,TOR,29.0,71,71,2652.0,630,...,556,415,65,36,169,228,1720,537,1026,0.523
7,1627783,2023-24,0,1610612761,TOR,29.0,39,39,1354.0,325,...,246,190,32,10,83,87,865,279,478,0.584
8,1627783,2023-24,0,1610612754,IND,29.0,24,24,786.0,203,...,168,102,17,8,34,62,494,175,297,0.589
9,1627783,2023-24,0,0,TOT,29.0,63,63,2140.0,528,...,414,292,49,18,117,149,1359,454,775,0.586


You can view the raw CSV file [here](https://raw.githubusercontent.com/Data-Dunkers/data/main/NBA/Pascal_Siakam.csv).

As a reminder, here are our columns:

| Field Name | Definition | Field Name | Definition |
|---|---|---|---|
| AST | The total number of assists a player has made. | FTM | The total number of free throws the player has made. |
| BLK | The total number of opponent shots a player has deflected or prevented. | GP | The number of games in which the player has appeared. |
| DREB | The total number of rebounds a player has grabbed on the defensive end. | GS | The number of games in which the player was in the starting lineup. |
| FG_PCT | The percentage of field goal attempts that are successful. | MIN | The total number of minutes the player has played. |
| FG2_PCT | The percentage of two-point field goal attempts that are successful. | OREB | The total number of rebounds a player has grabbed on the offensive end. |
| FG2A | The total number of two-point field goal attempts by the player. | PF | The total number of personal fouls committed by the player. |
| FG2M | The total number of two-point field goals a player has made. | PLAYER_AGE | The age of the player. |
| FG3_PCT | The percentage of three-point field goal attempts that are successful. | PTS | The total number of points a player has scored. |
| FG3A | The total number of three-point field goal attempts by the player. | REB | The total number of rebounds (offensive + defensive) a player has collected. |
| FG3M | The total number of three-point field goals a player has made. | SEASON_ID | The identifier for the basketball season. |
| FGA | The total number of field goal attempts by the player. | STL | The total number of times a player has successfully taken the ball away from an opponent. |
| FGM | The total number of field goals a player has made. | TEAM_ABBREVIATION | The abbreviated name of the team. |
| FT_PCT | The percentage of free throw attempts that are successful. | TEAM_ID | A unique identifier for the team. |
| FTA | The total number of free throw attempts by the player. | TOV | The total number of times a player loses possession of the ball. | 

## Dropping a Line by Index Number

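Notice that we still have that last row, the one with "TOT" as the Team Abbreviation? It holds the combined totals for the 2023-24 season, which is already covered by the TOR and IND rows above it. Let's drop that row (row 9).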

In [12]:
display(df.drop(9))

Unnamed: 0,PLAYER_ID,SEASON_ID,LEAGUE_ID,TEAM_ID,TEAM_ABBREVIATION,PLAYER_AGE,GP,GS,MIN,FGM,...,REB,AST,STL,BLK,TOV,PF,PTS,FG2M,FG2A,FG2_PCT
0,1627783,2016-17,0,1610612761,TOR,23.0,55,38,859.0,103,...,185,17,26,45,33,109,229,102,198,0.515
1,1627783,2017-18,0,1610612761,TOR,24.0,81,5,1679.0,253,...,364,159,62,42,67,166,589,224,366,0.612
2,1627783,2018-19,0,1610612761,TOR,25.0,80,79,2548.0,519,...,549,248,73,52,154,241,1354,440,731,0.602
3,1627783,2019-20,0,1610612761,TOR,26.0,60,60,2110.0,500,...,439,207,61,53,148,170,1371,369,739,0.499
4,1627783,2020-21,0,1610612761,TOR,27.0,56,56,2006.0,437,...,405,250,64,37,130,174,1196,364,715,0.509
5,1627783,2021-22,0,1610612761,TOR,28.0,68,68,2578.0,596,...,580,360,85,42,181,225,1551,521,989,0.527
6,1627783,2022-23,0,1610612761,TOR,29.0,71,71,2652.0,630,...,556,415,65,36,169,228,1720,537,1026,0.523
7,1627783,2023-24,0,1610612761,TOR,29.0,39,39,1354.0,325,...,246,190,32,10,83,87,865,279,478,0.584
8,1627783,2023-24,0,1610612754,IND,29.0,24,24,786.0,203,...,168,102,17,8,34,62,494,175,297,0.589


What happens if we change the "9" to another number? Try it!

Let's display the dataframe again, to see if the change is permanent.

In [13]:
display(df)

Unnamed: 0,PLAYER_ID,SEASON_ID,LEAGUE_ID,TEAM_ID,TEAM_ABBREVIATION,PLAYER_AGE,GP,GS,MIN,FGM,...,REB,AST,STL,BLK,TOV,PF,PTS,FG2M,FG2A,FG2_PCT
0,1627783,2016-17,0,1610612761,TOR,23.0,55,38,859.0,103,...,185,17,26,45,33,109,229,102,198,0.515
1,1627783,2017-18,0,1610612761,TOR,24.0,81,5,1679.0,253,...,364,159,62,42,67,166,589,224,366,0.612
2,1627783,2018-19,0,1610612761,TOR,25.0,80,79,2548.0,519,...,549,248,73,52,154,241,1354,440,731,0.602
3,1627783,2019-20,0,1610612761,TOR,26.0,60,60,2110.0,500,...,439,207,61,53,148,170,1371,369,739,0.499
4,1627783,2020-21,0,1610612761,TOR,27.0,56,56,2006.0,437,...,405,250,64,37,130,174,1196,364,715,0.509
5,1627783,2021-22,0,1610612761,TOR,28.0,68,68,2578.0,596,...,580,360,85,42,181,225,1551,521,989,0.527
6,1627783,2022-23,0,1610612761,TOR,29.0,71,71,2652.0,630,...,556,415,65,36,169,228,1720,537,1026,0.523
7,1627783,2023-24,0,1610612761,TOR,29.0,39,39,1354.0,325,...,246,190,32,10,83,87,865,279,478,0.584
8,1627783,2023-24,0,1610612754,IND,29.0,24,24,786.0,203,...,168,102,17,8,34,62,494,175,297,0.589
9,1627783,2023-24,0,0,TOT,29.0,63,63,2140.0,528,...,414,292,49,18,117,149,1359,454,775,0.586


It's not. The "TOT" row is still there.

We can make the change permanent by assigning the result back to the DataFrame variable.

In [14]:
df = df.drop(9)

display(df)

Unnamed: 0,PLAYER_ID,SEASON_ID,LEAGUE_ID,TEAM_ID,TEAM_ABBREVIATION,PLAYER_AGE,GP,GS,MIN,FGM,...,REB,AST,STL,BLK,TOV,PF,PTS,FG2M,FG2A,FG2_PCT
0,1627783,2016-17,0,1610612761,TOR,23.0,55,38,859.0,103,...,185,17,26,45,33,109,229,102,198,0.515
1,1627783,2017-18,0,1610612761,TOR,24.0,81,5,1679.0,253,...,364,159,62,42,67,166,589,224,366,0.612
2,1627783,2018-19,0,1610612761,TOR,25.0,80,79,2548.0,519,...,549,248,73,52,154,241,1354,440,731,0.602
3,1627783,2019-20,0,1610612761,TOR,26.0,60,60,2110.0,500,...,439,207,61,53,148,170,1371,369,739,0.499
4,1627783,2020-21,0,1610612761,TOR,27.0,56,56,2006.0,437,...,405,250,64,37,130,174,1196,364,715,0.509
5,1627783,2021-22,0,1610612761,TOR,28.0,68,68,2578.0,596,...,580,360,85,42,181,225,1551,521,989,0.527
6,1627783,2022-23,0,1610612761,TOR,29.0,71,71,2652.0,630,...,556,415,65,36,169,228,1720,537,1026,0.523
7,1627783,2023-24,0,1610612761,TOR,29.0,39,39,1354.0,325,...,246,190,32,10,83,87,865,279,478,0.584
8,1627783,2023-24,0,1610612754,IND,29.0,24,24,786.0,203,...,168,102,17,8,34,62,494,175,297,0.589
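
To see why the assignment is needed, here is a minimal sketch using a small hand-typed DataFrame (three rows copied from the table above, named `sample` just for this illustration). It shows that `drop()` returns a new DataFrame and leaves the original alone. It also shows one possible alternative, filtering on the team abbreviation, which does not depend on knowing the row number.

```python
import pandas as pd

# A small hand-typed sample: three rows copied from the table above
sample = pd.DataFrame({
    'SEASON_ID': ['2023-24', '2023-24', '2023-24'],
    'TEAM_ABBREVIATION': ['TOR', 'IND', 'TOT'],
    'PTS': [865, 494, 1359],
}, index=[7, 8, 9])

# drop() returns a new DataFrame; `sample` itself is not changed
dropped = sample.drop(9)
rows_before = len(sample)   # the original still has every row
rows_after = len(dropped)   # the new DataFrame has one row fewer

# Alternative: keep every row whose team is not 'TOT'
filtered = sample[sample['TEAM_ABBREVIATION'] != 'TOT']

print(rows_before, rows_after)
print(filtered)
```

Filtering by value can be safer than dropping by number when you are not sure which row holds the totals.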


## Sorting

What if we want to sort the data by steals (STL)?

In [15]:
display(df.sort_values('STL'))

Unnamed: 0,PLAYER_ID,SEASON_ID,LEAGUE_ID,TEAM_ID,TEAM_ABBREVIATION,PLAYER_AGE,GP,GS,MIN,FGM,...,REB,AST,STL,BLK,TOV,PF,PTS,FG2M,FG2A,FG2_PCT
8,1627783,2023-24,0,1610612754,IND,29.0,24,24,786.0,203,...,168,102,17,8,34,62,494,175,297,0.589
0,1627783,2016-17,0,1610612761,TOR,23.0,55,38,859.0,103,...,185,17,26,45,33,109,229,102,198,0.515
7,1627783,2023-24,0,1610612761,TOR,29.0,39,39,1354.0,325,...,246,190,32,10,83,87,865,279,478,0.584
3,1627783,2019-20,0,1610612761,TOR,26.0,60,60,2110.0,500,...,439,207,61,53,148,170,1371,369,739,0.499
1,1627783,2017-18,0,1610612761,TOR,24.0,81,5,1679.0,253,...,364,159,62,42,67,166,589,224,366,0.612
4,1627783,2020-21,0,1610612761,TOR,27.0,56,56,2006.0,437,...,405,250,64,37,130,174,1196,364,715,0.509
6,1627783,2022-23,0,1610612761,TOR,29.0,71,71,2652.0,630,...,556,415,65,36,169,228,1720,537,1026,0.523
2,1627783,2018-19,0,1610612761,TOR,25.0,80,79,2548.0,519,...,549,248,73,52,154,241,1354,440,731,0.602
5,1627783,2021-22,0,1610612761,TOR,28.0,68,68,2578.0,596,...,580,360,85,42,181,225,1551,521,989,0.527
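
Notice that the row numbers on the left moved along with their rows. The sketch below uses a small hand-typed sample (three seasons from the table above; the variable names are only for illustration) to show this, and how `reset_index(drop=True)` can renumber the rows if you would like them to start at 0 again.

```python
import pandas as pd

# Three seasons copied from the table above
sample = pd.DataFrame({
    'SEASON_ID': ['2017-18', '2018-19', '2019-20'],
    'STL': [62, 73, 61],
})

# Sorting moves whole rows, so each row keeps its original index label
by_steals = sample.sort_values('STL')
labels_after_sort = list(by_steals.index)

# reset_index(drop=True) replaces the old labels with 0, 1, 2, ...
renumbered = by_steals.reset_index(drop=True)
labels_after_reset = list(renumbered.index)

print(labels_after_sort)
print(labels_after_reset)
```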


What if we want them descending instead of ascending? Simply add `ascending=False` to the arguments in `sort_values()`. (The default, if you don't tell it otherwise, is `ascending=True`.)

In [16]:
display(df.sort_values('STL', ascending=False))

Unnamed: 0,PLAYER_ID,SEASON_ID,LEAGUE_ID,TEAM_ID,TEAM_ABBREVIATION,PLAYER_AGE,GP,GS,MIN,FGM,...,REB,AST,STL,BLK,TOV,PF,PTS,FG2M,FG2A,FG2_PCT
5,1627783,2021-22,0,1610612761,TOR,28.0,68,68,2578.0,596,...,580,360,85,42,181,225,1551,521,989,0.527
2,1627783,2018-19,0,1610612761,TOR,25.0,80,79,2548.0,519,...,549,248,73,52,154,241,1354,440,731,0.602
6,1627783,2022-23,0,1610612761,TOR,29.0,71,71,2652.0,630,...,556,415,65,36,169,228,1720,537,1026,0.523
4,1627783,2020-21,0,1610612761,TOR,27.0,56,56,2006.0,437,...,405,250,64,37,130,174,1196,364,715,0.509
1,1627783,2017-18,0,1610612761,TOR,24.0,81,5,1679.0,253,...,364,159,62,42,67,166,589,224,366,0.612
3,1627783,2019-20,0,1610612761,TOR,26.0,60,60,2110.0,500,...,439,207,61,53,148,170,1371,369,739,0.499
7,1627783,2023-24,0,1610612761,TOR,29.0,39,39,1354.0,325,...,246,190,32,10,83,87,865,279,478,0.584
0,1627783,2016-17,0,1610612761,TOR,23.0,55,38,859.0,103,...,185,17,26,45,33,109,229,102,198,0.515
8,1627783,2023-24,0,1610612754,IND,29.0,24,24,786.0,203,...,168,102,17,8,34,62,494,175,297,0.589
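
A descending sort is handy when we only care about the top few rows. As a small sketch (using a hand-typed sample of five seasons from the table above, with illustrative variable names), we can chain `head(3)` after the sort to keep just the three seasons with the most steals.

```python
import pandas as pd

# Five seasons copied from the table above
sample = pd.DataFrame({
    'SEASON_ID': ['2018-19', '2019-20', '2020-21', '2021-22', '2022-23'],
    'STL': [73, 61, 64, 85, 65],
})

# Largest values first, then keep only the first three rows
top3 = sample.sort_values('STL', ascending=False).head(3)
top3_seasons = list(top3['SEASON_ID'])

print(top3_seasons)
```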


Let's reduce the columns we're looking at and save the result in a new DataFrame named `df_2`.

In [17]:
columns = ['SEASON_ID', 'TEAM_ABBREVIATION', 'GP', 'GS', 'BLK', 'STL', 'MIN', 'FGM', 'FGA']
df_2 = df[columns]
display(df_2)

Unnamed: 0,SEASON_ID,TEAM_ABBREVIATION,GP,GS,BLK,STL,MIN,FGM,FGA
0,2016-17,TOR,55,38,45,26,859.0,103,205
1,2017-18,TOR,81,5,42,62,1679.0,253,498
2,2018-19,TOR,80,79,52,73,2548.0,519,945
3,2019-20,TOR,60,60,53,61,2110.0,500,1104
4,2020-21,TOR,56,56,37,64,2006.0,437,961
5,2021-22,TOR,68,68,42,85,2578.0,596,1207
6,2022-23,TOR,71,71,36,65,2652.0,630,1313
7,2023-24,TOR,39,39,10,32,1354.0,325,623
8,2023-24,IND,24,24,8,17,786.0,203,370
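
Here is a short sketch of how column selection behaves, using a tiny hand-typed sample (two seasons from the table above; the variable names are only for illustration). A list of column names gives back a DataFrame with the columns in the order we listed them, while a single name without a list gives back just that one column as a Series.

```python
import pandas as pd

# Two seasons copied from the table above
sample = pd.DataFrame({
    'SEASON_ID': ['2016-17', '2017-18'],
    'GP': [55, 81],
    'BLK': [45, 42],
    'STL': [26, 62],
})

# A list of names returns a DataFrame, in the order we asked for
subset = sample[['SEASON_ID', 'STL', 'BLK']]
subset_columns = list(subset.columns)

# A single name (no list) returns one column as a Series
steals = sample['STL']

print(subset_columns)
print(type(subset).__name__, type(steals).__name__)
```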


We'll work with `df_2` from now on.

Let's sort on two columns now: first by blocks (BLK) and then by steals (STL). Notice that we must put the column names in a list (`[ ]`).

In [18]:
display(df_2.sort_values(['BLK', 'STL']))

Unnamed: 0,SEASON_ID,TEAM_ABBREVIATION,GP,GS,BLK,STL,MIN,FGM,FGA
8,2023-24,IND,24,24,8,17,786.0,203,370
7,2023-24,TOR,39,39,10,32,1354.0,325,623
6,2022-23,TOR,71,71,36,65,2652.0,630,1313
4,2020-21,TOR,56,56,37,64,2006.0,437,961
1,2017-18,TOR,81,5,42,62,1679.0,253,498
5,2021-22,TOR,68,68,42,85,2578.0,596,1207
0,2016-17,TOR,55,38,45,26,859.0,103,205
2,2018-19,TOR,80,79,52,73,2548.0,519,945
3,2019-20,TOR,60,60,53,61,2110.0,500,1104


Can you confirm that it sorted correctly? 
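
One way to check is to look at the two seasons that both have 42 blocks: the second column only decides the order of rows that tie on the first. The sketch below uses a hand-typed sample of four seasons from the table above (variable names are illustrative). It also shows that `ascending` can take a list, one value per column, if we want a different direction for each.

```python
import pandas as pd

# Four seasons copied from the table above; two of them tie on BLK
sample = pd.DataFrame({
    'SEASON_ID': ['2016-17', '2017-18', '2020-21', '2021-22'],
    'BLK': [45, 42, 37, 42],
    'STL': [26, 62, 64, 85],
})

# Sort by BLK first; STL only orders the rows with the same BLK
both_ascending = sample.sort_values(['BLK', 'STL'])
order_both_ascending = list(both_ascending['SEASON_ID'])

# One direction per column: BLK ascending, STL descending
mixed = sample.sort_values(['BLK', 'STL'], ascending=[True, False])
order_mixed = list(mixed['SEASON_ID'])

print(order_both_ascending)
print(order_mixed)
```

Only the two tied seasons (2017-18 and 2021-22) swap places between the two results.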

## Exercise

Add code to the cell below to select the columns 'SEASON_ID', 'FG_PCT', 'FG2_PCT', and 'FG3_PCT'. Display the data sorted by 'FG_PCT'.

In [19]:
import pandas as pd

url = 'https://raw.githubusercontent.com/Data-Dunkers/data/main/NBA/Pascal_Siakam.csv'

df = pd.read_csv(url)

# Add your code here


## Extra Challenge

Create this graph using the code cell below. You can view the Bar Graph lesson [here](https://github.com/Data-Dunkers/lessons/blob/main/graphing-bar-graphs.ipynb).

![raptors-2023-top-5-points.png](https://raw.githubusercontent.com/Data-Dunkers/activities/main/images/raptors-2023-top-5-points.png)

In [21]:
import pandas as pd
import plotly.express as px

url = 'https://raw.githubusercontent.com/Data-Dunkers/data/main/NBA/raptors-2023.csv'

# put the rest of the code here!

---
Back to [Lessons](https://github.com/Data-Dunkers/lessons/blob/main/lessons.ipynb)

---