![Data Dunkers Banner](https://github.com/Data-Dunkers/lessons/blob/main/images/top-banner.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2FData-Dunkers%2Factivities&branch=main&subPath=data-sorting.ipynb&depth=1" target="_blank"><img src="https://raw.githubusercontent.com/Data-Dunkers/lessons/main/images/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"></a>
<a href="https://colab.research.google.com/github/Data-Dunkers/activities/blob/main/data-sorting.ipynb" target="_blank"><img src="https://raw.githubusercontent.com/Data-Dunkers/lessons/main/images/open-in-colab-button.svg?sanitize=true" width="123" height="24" alt="Open in Colab"></a>

# Data Dunkers Activity Notebook: Sorting Our Data

The corresponding lesson notebook for this activity notebook can be found [here](https://github.com/pbeens/Data-Dunkers/blob/main/Demos/sorting-data.ipynb).

# Objectives

Students will be able to:

- Sort data in a DataFrame based on specific columns.
- Understand the impact of sorting data in ascending vs. descending order.
- Apply multiple sorting criteria to organize data more effectively.


# Import / Input

The following code imports the [pandas](https://pandas.pydata.org/) library and loads season-by-season statistics for Pascal Siakam from a CSV file.

Then we will select specific columns from the DataFrame. We are interested in:
* 'SEASON_ID' (Season identifier)
* 'TEAM_ABBREVIATION' (Team name abbreviation)
* 'GP' (Games played)
* 'GS' (Games started)
* 'BLK' (Blocks)
* 'STL' (Steals)
* 'MIN' (Minutes played)
* 'FGM' (Field goals made)
* 'FGA' (Field goals attempted)

In [None]:
import pandas as pd

url = 'https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/Pascal_Siakam.csv'
df = pd.read_csv(url)

df = df[['SEASON_ID', 'TEAM_ABBREVIATION', 'GP', 'GS', 'BLK', 'STL', 'MIN', 'FGM', 'FGA']]
df

# Process

We will first drop the row with index label 9 from the DataFrame. The default index in pandas starts from zero, so this removes the 10th row.

In [None]:
df = df.drop(9)
df
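
Note that `drop` removes rows by index label, not by position. The two match here only because the DataFrame still has its default 0, 1, 2, ... index. The sketch below shows this on a small made-up DataFrame (the name `demo` and its values are illustrative and are not taken from the CSV file):

```python
import pandas as pd

# Small made-up table (hypothetical values, not from the dataset)
demo = pd.DataFrame({'SEASON_ID': ['2020-21', '2021-22', '2022-23', '2023-24'],
                     'GP': [10, 20, 30, 40]})

# drop(3) removes the row whose index label is 3
demo_dropped = demo.drop(3)

# The remaining rows keep their original labels 0, 1 and 2
print(demo_dropped)

# reset_index(drop=True) renumbers the rows from zero, if that is needed later
demo_renumbered = demo_dropped.reset_index(drop=True)
```

If the rows had been sorted or filtered first, label 9 might no longer be the 10th row, so it is worth displaying the DataFrame before dropping.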

Next we will apply a filter to select only rows where 'SEASON_ID' is less than or equal to '2022-23'.

In [None]:
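# Boolean mask: True for rows whose season is '2022-23' or earlier
# (named `mask` so we do not overwrite Python's built-in `filter`)
mask = df['SEASON_ID'] <= '2022-23'
df = df[mask]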
df
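
The comparison above works on text: the season IDs are strings, so `<=` compares them alphabetically. Assuming every ID follows the same `YYYY-YY` pattern, alphabetical order is also chronological order. Here is a minimal sketch on made-up data (the `demo` values are illustrative only):

```python
import pandas as pd

# Small made-up table (hypothetical values, not from the dataset)
demo = pd.DataFrame({'SEASON_ID': ['2021-22', '2022-23', '2023-24'],
                     'STL': [5, 7, 6]})

# Comparing a column to a value gives a Series of True/False values
mask = demo['SEASON_ID'] <= '2022-23'
print(mask)

# Indexing with the mask keeps only the rows marked True
demo_filtered = demo[mask]
print(demo_filtered)
```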

## Example Sort - One Column

`.sort_values('STL')` returns a new DataFrame with the rows reordered by the values in the 'STL' column. The original `df` is left unchanged.

The default is ascending order. If we want descending order, we add `, ascending=False`.

In [None]:
df_sorted = df.sort_values('STL')
df_sorted
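
To see the descending option in action, here is a short sketch on a small made-up DataFrame (the `demo` values are illustrative, not real statistics):

```python
import pandas as pd

# Small made-up table (hypothetical values, not from the dataset)
demo = pd.DataFrame({'SEASON_ID': ['2020-21', '2021-22', '2022-23'],
                     'STL': [5, 9, 7]})

# ascending=False puts the largest 'STL' value first
demo_desc = demo.sort_values('STL', ascending=False)
print(demo_desc)
```

The same argument works on the real data, for example `df.sort_values('STL', ascending=False)`.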

## Example Sort - Multiple Columns

We can sort by multiple columns by passing a list of column names. For example, the code below sorts by 'BLK' in ascending order, and rows that share the same 'BLK' value are then sorted by 'STL' in ascending order.

In [None]:
df_sorted = df.sort_values(['BLK', 'STL'])
df_sorted
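
Each column can also have its own sort direction if we pass a list to `ascending`, with one True/False value per column. A minimal sketch on made-up data (the `demo` values are illustrative only):

```python
import pandas as pd

# Small made-up table with repeated 'BLK' values (hypothetical, not from the dataset)
demo = pd.DataFrame({'SEASON_ID': ['2019-20', '2020-21', '2021-22', '2022-23'],
                     'BLK': [2, 1, 2, 1],
                     'STL': [5, 3, 8, 6]})

# 'BLK' ascending, then 'STL' descending within each 'BLK' value
demo_multi = demo.sort_values(['BLK', 'STL'], ascending=[True, False])
print(demo_multi)
```

Here the two rows with a 'BLK' of 1 come first, and within that pair the row with the higher 'STL' is listed first.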

# Exercise

Add code to the cell below to select the columns 'SEASON_ID', 'FG_PCT', 'FG2_PCT', and 'FG3_PCT'. Then display the data sorted by 'FG_PCT'.

In [None]:
import pandas as pd
url = 'https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/Pascal_Siakam.csv'
df = pd.read_csv(url)



# Extra Challenge

Create the graph shown below using the code cell that follows it. You can view the Bar Graph lesson [here](https://github.com/Data-Dunkers/lessons/blob/main/graphing-bar-graphs.ipynb).

![raptors-2023-top-5-points.png](https://raw.githubusercontent.com/Data-Dunkers/activities/main/images/raptors-2023-top-5-points.png)

In [None]:
import pandas as pd
import plotly.express as px

url = 'https://raw.githubusercontent.com/Data-Dunkers/data/main/NBA/raptors-2023.csv'

