# Data Dunkers Activity Notebook: Sorting Our Data

The corresponding lesson notebook for this activity notebook can be found [here](https://github.com/pbeens/Data-Dunkers/blob/main/Demos/sorting-data.ipynb).

# Objectives

Students will be able to:

- Sort data in a DataFrame based on specific columns.
- Explain the impact of sorting data in ascending vs. descending order.
- Apply multiple sorting criteria to organize data more effectively.


# Setup / Import / Input

In [None]:
import pandas as pd

# URL of the CSV file containing data for Pascal Siakam
url = 'https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/Pascal_Siakam.csv'

# Read the CSV file into a pandas DataFrame
# The pandas read_csv function is used to load the CSV data from the provided URL into a DataFrame.
df = pd.read_csv(url)

# Select specific columns from the DataFrame
# We are interested in the following columns: 'SEASON_ID' (Season identifier),
# 'TEAM_ABBREVIATION' (Team name abbreviation), 'GP' (Games played),
# 'GS' (Games started), 'BLK' (Blocks), 'STL' (Steals), 'MIN' (Minutes played),
# 'FGM' (Field goals made), and 'FGA' (Field goals attempted).
df = df[['SEASON_ID', 'TEAM_ABBREVIATION', 'GP', 'GS', 'BLK', 'STL', 'MIN', 'FGM', 'FGA']]

# Display the DataFrame
# This will show the filtered DataFrame in a tabular format, including only the selected columns.
display(df)


Unnamed: 0,SEASON_ID,TEAM_ABBREVIATION,GP,GS,BLK,STL,MIN,FGM,FGA
0,2016-17,TOR,55,38,45,26,859.0,103,205
1,2017-18,TOR,81,5,42,62,1679.0,253,498
2,2018-19,TOR,80,79,52,73,2548.0,519,945
3,2019-20,TOR,60,60,53,61,2110.0,500,1104
4,2020-21,TOR,56,56,37,64,2006.0,437,961
5,2021-22,TOR,68,68,42,85,2578.0,596,1207
6,2022-23,TOR,71,71,36,65,2652.0,630,1313
7,2023-24,TOR,39,39,10,32,1354.0,325,623
8,2023-24,IND,24,24,8,17,786.0,203,370
9,2023-24,TOT,63,63,18,49,2140.0,528,993


# Process

In [None]:
# Drop the row labelled 9 from the DataFrame
# drop() removes rows by index label, not by position. Here the labels run from 0 to 9,
# so label 9 is the 10th row: the 'TOT' row that totals the 2023-24 TOR and IND rows.
df = df.drop(9)

# Keep only the Raptors seasons up to and including 2022-23
# Season IDs are strings in 'YYYY-YY' form, so comparing them as text keeps every season
# up to '2022-23' and removes the two remaining 2023-24 rows (TOR and IND).
season_filter = df['SEASON_ID'] <= '2022-23'
df = df[season_filter]

# Display the filtered DataFrame
# This shows the DataFrame after removing the specified row and applying the filter for season data.
display(df)



Unnamed: 0,SEASON_ID,TEAM_ABBREVIATION,GP,GS,BLK,STL,MIN,FGM,FGA
0,2016-17,TOR,55,38,45,26,859.0,103,205
1,2017-18,TOR,81,5,42,62,1679.0,253,498
2,2018-19,TOR,80,79,52,73,2548.0,519,945
3,2019-20,TOR,60,60,53,61,2110.0,500,1104
4,2020-21,TOR,56,56,37,64,2006.0,437,961
5,2021-22,TOR,68,68,42,85,2578.0,596,1207
6,2022-23,TOR,71,71,36,65,2652.0,630,1313
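
The season filter above relies on comparing strings. The sketch below uses a small hand-typed subset of the table (the name `df_demo` is illustrative, not part of the lesson) to show how that comparison behaves, and why filtering on the team abbreviation alone would give a different result. It uses `print` rather than `display` so it also runs outside a notebook.

```python
import pandas as pd

# Hand-typed subset of the table above (illustrative only)
df_demo = pd.DataFrame({
    'SEASON_ID': ['2021-22', '2022-23', '2023-24', '2023-24'],
    'TEAM_ABBREVIATION': ['TOR', 'TOR', 'TOR', 'IND'],
})

# Strings are compared character by character, so 'YYYY-YY' labels
# compare in the same order as the seasons they describe
season_mask = df_demo['SEASON_ID'] <= '2022-23'
print(season_mask.tolist())

# Filtering on the team instead would also keep the partial 2023-24 Raptors row
team_mask = df_demo['TEAM_ABBREVIATION'] == 'TOR'
print(team_mask.tolist())
```

The two masks differ only on the 2023-24 Raptors row, which the season filter drops and the team filter keeps.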


## Example Sort - One Column

In [None]:
# Sort the DataFrame by the 'STL' (Steals) column
# This will reorder the DataFrame rows based on the values in the 'STL' column in ascending order.
# If you wanted descending order, you could add the argument ascending=False.
df_sorted = df.sort_values('STL')

# Display the sorted DataFrame
# This will show the DataFrame with rows sorted by the number of steals.
display(df_sorted)

Unnamed: 0,SEASON_ID,TEAM_ABBREVIATION,GP,GS,BLK,STL,MIN,FGM,FGA
0,2016-17,TOR,55,38,45,26,859.0,103,205
3,2019-20,TOR,60,60,53,61,2110.0,500,1104
1,2017-18,TOR,81,5,42,62,1679.0,253,498
4,2020-21,TOR,56,56,37,64,2006.0,437,961
6,2022-23,TOR,71,71,36,65,2652.0,630,1313
2,2018-19,TOR,80,79,52,73,2548.0,519,945
5,2021-22,TOR,68,68,42,85,2578.0,596,1207
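
To compare ascending and descending order, here is a minimal sketch of the same sort with `ascending=False`. It assumes a small hand-typed copy of the 'SEASON_ID' and 'STL' columns from the table above; the names `df_demo` and `df_desc` are illustrative only.

```python
import pandas as pd

# Hand-typed copy of two columns from the table above (illustrative only)
df_demo = pd.DataFrame({
    'SEASON_ID': ['2016-17', '2017-18', '2018-19', '2019-20',
                  '2020-21', '2021-22', '2022-23'],
    'STL': [26, 62, 73, 61, 64, 85, 65],
})

# ascending=False puts the largest values first
df_desc = df_demo.sort_values('STL', ascending=False)

# print is used instead of display so this also runs outside a notebook
print(df_desc)
```

The season with the most steals (2021-22, with 85) now appears in the first row instead of the last. The original index labels travel with their rows, just as in the ascending sort.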


## Example Sort - Multiple Columns

In [None]:
# Sort the DataFrame by the 'BLK' (Blocks) and 'STL' (Steals) columns
# The DataFrame is first sorted by 'BLK' in ascending order, and within each 'BLK' value,
# it is further sorted by 'STL' in ascending order. This creates a multi-level sort.
df_sorted = df.sort_values(['BLK', 'STL'])

# Display the sorted DataFrame
# This shows the DataFrame with rows sorted first by blocks and then by steals.
display(df_sorted)


Unnamed: 0,SEASON_ID,TEAM_ABBREVIATION,GP,GS,BLK,STL,MIN,FGM,FGA
6,2022-23,TOR,71,71,36,65,2652.0,630,1313
4,2020-21,TOR,56,56,37,64,2006.0,437,961
1,2017-18,TOR,81,5,42,62,1679.0,253,498
5,2021-22,TOR,68,68,42,85,2578.0,596,1207
0,2016-17,TOR,55,38,45,26,859.0,103,205
2,2018-19,TOR,80,79,52,73,2548.0,519,945
3,2019-20,TOR,60,60,53,61,2110.0,500,1104
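
Each sort column can also have its own direction. The sketch below passes a list to `ascending`, one entry per column, so blocks are sorted from low to high while ties are broken by steals from high to low. It uses a hand-typed copy of the relevant columns from the table above; `df_demo` and `df_mixed` are illustrative names.

```python
import pandas as pd

# Hand-typed copy of three columns from the table above (illustrative only)
df_demo = pd.DataFrame({
    'SEASON_ID': ['2016-17', '2017-18', '2018-19', '2019-20',
                  '2020-21', '2021-22', '2022-23'],
    'BLK': [45, 42, 52, 53, 37, 42, 36],
    'STL': [26, 62, 73, 61, 64, 85, 65],
})

# One True/False per column: 'BLK' ascending, 'STL' descending
df_mixed = df_demo.sort_values(['BLK', 'STL'], ascending=[True, False])

# print is used instead of display so this also runs outside a notebook
print(df_mixed)
```

Only the two seasons tied at 42 blocks change places compared with the all-ascending sort: 2021-22 (85 steals) now comes before 2017-18 (62 steals).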


# Exercise

Modify the program below to display only the columns 'SEASON_ID', 'FG_PCT', 'FG2_PCT', and 'FG3_PCT', sorted by 'FG_PCT'.

In [None]:
import pandas as pd

url = 'https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/Pascal_Siakam.csv'

df = pd.read_csv(url)

# Add your code here

display(df)

Unnamed: 0,PLAYER_ID,SEASON_ID,LEAGUE_ID,TEAM_ID,TEAM_ABBREVIATION,PLAYER_AGE,GP,GS,MIN,FGM,...,REB,AST,STL,BLK,TOV,PF,PTS,FG2M,FG2A,FG2_PCT
0,1627783,2016-17,0,1610612761,TOR,23.0,55,38,859.0,103,...,185,17,26,45,33,109,229,102,198,0.515
1,1627783,2017-18,0,1610612761,TOR,24.0,81,5,1679.0,253,...,364,159,62,42,67,166,589,224,366,0.612
2,1627783,2018-19,0,1610612761,TOR,25.0,80,79,2548.0,519,...,549,248,73,52,154,241,1354,440,731,0.602
3,1627783,2019-20,0,1610612761,TOR,26.0,60,60,2110.0,500,...,439,207,61,53,148,170,1371,369,739,0.499
4,1627783,2020-21,0,1610612761,TOR,27.0,56,56,2006.0,437,...,405,250,64,37,130,174,1196,364,715,0.509
5,1627783,2021-22,0,1610612761,TOR,28.0,68,68,2578.0,596,...,580,360,85,42,181,225,1551,521,989,0.527
6,1627783,2022-23,0,1610612761,TOR,29.0,71,71,2652.0,630,...,556,415,65,36,169,228,1720,537,1026,0.523
7,1627783,2023-24,0,1610612761,TOR,29.0,39,39,1354.0,325,...,246,190,32,10,83,87,865,279,478,0.584
8,1627783,2023-24,0,1610612754,IND,29.0,24,24,786.0,203,...,168,102,17,8,34,62,494,175,297,0.589
9,1627783,2023-24,0,0,TOT,29.0,63,63,2140.0,528,...,414,292,49,18,117,149,1359,454,775,0.586


# Extra Challenge

Produce the graph shown below, starting from the code stub that follows it. You can view the raw data [here](https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/raptors-2023.csv). You can view the Bar Graph lesson [here](https://github.com/pbeens/Data-Dunkers/blob/main/Demos/bar-graphs.ipynb).

![raptors-2023-top-5-points.png](https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Images/raptors-2023-top-5-points.png)

In [None]:
import pandas as pd
import plotly.express as px

url = 'https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/raptors-2023.csv'

# put the rest of the code here!