<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2FData-Dunkers%2Factivities&branch=main&subPath=data-sorting.ipynb&depth=1" target="_blank"><img src="https://raw.githubusercontent.com/Data-Dunkers/lessons/main/images/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"></a>
<a href="https://colab.research.google.com/github/Data-Dunkers/activities/blob/main/data-sorting.ipynb" target="_blank"><img src="https://raw.githubusercontent.com/Data-Dunkers/lessons/main/images/open-in-colab-button.svg?sanitize=true" width="123" height="24" alt="Open in Colab"></a>

# Data Dunkers Activity Notebook: Sorting Our Data

The corresponding lesson notebook for this activity notebook can be found [here](https://github.com/pbeens/Data-Dunkers/blob/main/Demos/sorting-data.ipynb).

# Objectives

Students will be able to:

- Sort the rows of a DataFrame by a specific column.
- Explain how sorting in ascending vs. descending order changes the result.
- Apply multiple sorting criteria to organize data more effectively.


# Setup / Import / Input

In [None]:
import pandas as pd

# URL of the CSV file containing data for Pascal Siakam
url = 'https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/Pascal_Siakam.csv'

# Read the CSV file into a pandas DataFrame
# The pandas read_csv function is used to load the CSV data from the provided URL into a DataFrame.
df = pd.read_csv(url)

# Select specific columns from the DataFrame
# We are interested in the following columns: 'SEASON_ID' (Season identifier),
# 'TEAM_ABBREVIATION' (Team name abbreviation), 'GP' (Games played),
# 'GS' (Games started), 'BLK' (Blocks), 'STL' (Steals), 'MIN' (Minutes played),
# 'FGM' (Field goals made), and 'FGA' (Field goals attempted).
df = df[['SEASON_ID', 'TEAM_ABBREVIATION', 'GP', 'GS', 'BLK', 'STL', 'MIN', 'FGM', 'FGA']]

# Display the DataFrame
# This will show the filtered DataFrame in a tabular format, including only the selected columns.
display(df)


# Process

In [None]:
# Drop the row with index label 9 from the DataFrame
# drop() removes rows by index label, not by position. read_csv gives the
# DataFrame a default index of 0, 1, 2, ..., so label 9 is the 10th row here.
df = df.drop(9)

# Keep only seasons up to and including 2022-23 (all played with the Raptors)
# SEASON_ID values are strings such as '2016-17'. Because they all share the
# same 'YYYY-YY' format, comparing them as strings follows chronological order.
raptors_seasons = df['SEASON_ID'] <= '2022-23'
df = df[raptors_seasons]

# Display the filtered DataFrame
# This shows the DataFrame after removing the specified row and applying the filter for season data.
display(df)
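
The filter above compares strings, not numbers, and `drop(9)` works on index labels. A minimal sketch on made-up season labels (not values from Pascal_Siakam.csv) shows both ideas: fixed-width 'YYYY-YY' strings sort in chronological order, and filtering keeps the original index labels instead of renumbering the rows.

```python
import pandas as pd

# Made-up rows for illustration only; these are not values from Pascal_Siakam.csv
demo = pd.DataFrame({
    'SEASON_ID': ['2016-17', '2019-20', '2022-23', '2023-24'],
    'STL': [10, 20, 30, 40],
})

# Fixed-width 'YYYY-YY' strings compare in the same order as the years they name,
# so a plain string comparison acts as a "season on or before" test
early = demo[demo['SEASON_ID'] <= '2022-23']
print(early['SEASON_ID'].tolist())  # ['2016-17', '2019-20', '2022-23']

# Filtering keeps the original index labels instead of renumbering rows
late = demo[demo['SEASON_ID'] >= '2019-20']
print(late.index.tolist())  # [1, 2, 3]

# drop() removes by label: label 1 is the FIRST row of `late`, not its second
print(late.drop(1)['SEASON_ID'].tolist())  # ['2022-23', '2023-24']

# To remove by position instead, look the label up with .index first
print(late.drop(late.index[1])['SEASON_ID'].tolist())  # ['2019-20', '2023-24']
```

The string trick relies on every SEASON_ID using the same format; if the labels were inconsistent, converting them to numbers first would be safer.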

## Example Sort - One Column

In [None]:
# Sort the DataFrame by the 'STL' (Steals) column
# This will reorder the DataFrame rows based on the values in the 'STL' column in ascending order.
# If you wanted descending order, you could add the argument ascending=False.
df_sorted = df.sort_values('STL')

# Display the sorted DataFrame
# This will show the DataFrame with rows sorted by the number of steals.
display(df_sorted)
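
The cell above sorts in ascending order only. A minimal sketch with made-up steal totals (not the real CSV) compares ascending and descending order, and shows how `head()` after a descending sort gives a "top N" list. `nlargest()` is a built-in pandas shortcut for the same result.

```python
import pandas as pd

# Made-up steal totals for illustration only
demo = pd.DataFrame({
    'SEASON_ID': ['2016-17', '2017-18', '2018-19', '2019-20'],
    'STL': [12, 48, 73, 55],
})

# Ascending (the default): smallest STL first
print(demo.sort_values('STL')['SEASON_ID'].tolist())
# ['2016-17', '2017-18', '2019-20', '2018-19']

# Descending: largest STL first
desc = demo.sort_values('STL', ascending=False)
print(desc['SEASON_ID'].tolist())
# ['2018-19', '2019-20', '2017-18', '2016-17']

# Top 2 seasons by steals: sort descending, then keep the first rows
top2 = desc.head(2)
print(top2['STL'].tolist())  # [73, 55]

# nlargest() returns the same top rows in one call
print(demo.nlargest(2, 'STL')['STL'].tolist())  # [73, 55]
```

The sort-then-`head()` pattern works for any "top N" question, such as the best seasons by minutes or field goals made.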

## Example Sort - Multiple Columns

In [None]:
# Sort the DataFrame by the 'BLK' (Blocks) and 'STL' (Steals) columns
# The DataFrame is first sorted by 'BLK' in ascending order, and within each 'BLK' value,
# it is further sorted by 'STL' in ascending order. This creates a multi-level sort.
df_sorted = df.sort_values(['BLK', 'STL'])

# Display the sorted DataFrame
# This shows the DataFrame with rows sorted first by blocks and then by steals.
display(df_sorted)
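
`sort_values` also accepts a list for `ascending`, one entry per sort column, so each level can go in its own direction. The sketch below uses made-up numbers: it sorts by blocks from most to fewest and breaks ties by steals from fewest to most. It also shows that rows keep their original index labels after sorting, and that `reset_index(drop=True)` renumbers them to match the new order.

```python
import pandas as pd

# Made-up block and steal totals for illustration only
demo = pd.DataFrame({
    'SEASON_ID': ['2016-17', '2017-18', '2018-19', '2019-20'],
    'BLK': [20, 40, 40, 30],
    'STL': [10, 60, 50, 40],
})

# BLK descending first; rows with the same BLK are ordered by STL ascending
multi = demo.sort_values(['BLK', 'STL'], ascending=[False, True])
print(multi['SEASON_ID'].tolist())
# ['2018-19', '2017-18', '2019-20', '2016-17']

# The original index labels travel with their rows...
print(multi.index.tolist())  # [2, 1, 3, 0]

# ...so renumber them if you want labels that follow the sorted order
renumbered = multi.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```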

# Exercise

Modify the program below so that it displays only the columns 'SEASON_ID', 'FG_PCT', 'FG2_PCT', and 'FG3_PCT', sorted by 'FG_PCT'.

In [None]:
import pandas as pd

url = 'https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/Pascal_Siakam.csv'

df = pd.read_csv(url)

# Add your code here

display(df)

# Extra Challenge

Use the code stub below to produce this graph. The raw data is available [here](https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/raptors-2023.csv), and the Bar Graph lesson is [here](https://github.com/pbeens/Data-Dunkers/blob/main/Demos/bar-graphs.ipynb).

![raptors-2023-top-5-points.png](https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Images/raptors-2023-top-5-points.png)

In [None]:
import pandas as pd
import plotly.express as px

url = 'https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/raptors-2023.csv'

# put the rest of the code here!