
# 🏀 Week 3 Tutorial: Python for Basketball Data Manipulation with Pandas
Welcome to **Week 3** of the *Chicago Bulls Sports Science Python Course*!  
In this tutorial, you'll learn how to:
- Create and inspect basketball performance datasets  
- Clean, filter, and group data using **pandas**
- Visualize player workload and session trends  


In [None]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Set random seed for reproducibility
np.random.seed(23)

# Create synthetic Chicago Bulls player data
players = ["Zach LaVine", "DeMar DeRozan", "Nikola Vucevic", "Coby White", "Patrick Williams"]
positions = ["Guard", "Forward", "Center", "Guard", "Forward"]
dates = pd.date_range(start="2025-01-01", periods=14, freq="3D")

data = []
for player, pos in zip(players, positions):
    for date in dates:
        session_type = np.random.choice(["Practice", "Game", "Recovery"], p=[0.5, 0.3, 0.2])
        workload = np.random.randint(300, 700)
        heart_rate = np.random.randint(110, 185)
        jump_count = np.random.randint(40, 120)
        sprint_distance = np.random.randint(800, 2500)
        rpe = np.random.randint(4, 10)
        data.append([player, pos, date, session_type, workload, heart_rate, jump_count, sprint_distance, rpe])

df = pd.DataFrame(data, columns=[
    "Player", "Position", "Date", "Session_Type", "Workload_AU",
    "Avg_HeartRate_bpm", "Jump_Count", "Sprint_Distance_m", "RPE"
])

df.head()
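The nested loop above builds the table one row at a time. A minimal vectorized sketch builds each column in one call instead. It assumes NumPy's newer `np.random.default_rng` Generator interface (not used in the original cell), so the random values will **not** match the `np.random.seed(23)` version, even with the same seed. The value ranges and column names follow the loop.

```python
import numpy as np
import pandas as pd

# Generator-based RNG; a different interface from np.random.seed, so values differ from the loop version
rng = np.random.default_rng(23)

players = ["Zach LaVine", "DeMar DeRozan", "Nikola Vucevic", "Coby White", "Patrick Williams"]
positions = ["Guard", "Forward", "Center", "Guard", "Forward"]
dates = pd.date_range(start="2025-01-01", periods=14, freq="3D")

n_dates = len(dates)
n_rows = len(players) * n_dates

# np.repeat gives each player one row per date; np.tile cycles the dates once per player,
# reproducing the player-by-date row order of the nested loop
df_vec = pd.DataFrame({
    "Player": np.repeat(players, n_dates),
    "Position": np.repeat(positions, n_dates),
    "Date": np.tile(dates, len(players)),
    "Session_Type": rng.choice(["Practice", "Game", "Recovery"], size=n_rows, p=[0.5, 0.3, 0.2]),
    # Generator.integers excludes the upper bound by default, just like randint
    "Workload_AU": rng.integers(300, 700, size=n_rows),
    "Avg_HeartRate_bpm": rng.integers(110, 185, size=n_rows),
    "Jump_Count": rng.integers(40, 120, size=n_rows),
    "Sprint_Distance_m": rng.integers(800, 2500, size=n_rows),
    "RPE": rng.integers(4, 10, size=n_rows),
})

df_vec.head()
```

For a dataset this small, the loop is perfectly fine. The column-at-a-time style mainly pays off as the number of players or sessions grows.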



### Step 1: Inspect the dataset
Let's look at the dataset's structure, summary statistics, and missing values.


In [None]:

print("Data info:")
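df.info()  # info() prints its report directly and returns None, so don't wrap it in print()
print("\nMissing values per column:")
print(df.isna().sum())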
print("\nSummary statistics:")
print(df.describe())
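The synthetic data has no gaps, but real monitoring data often does (for example, a heart-rate strap that fails to record). Here is a minimal cleaning sketch on a small, made-up sample. It shows two common options: dropping incomplete rows, or filling each gap with that player's own median. Per-player median imputation is an illustrative choice here, not a rule from this course.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with two missing heart-rate readings (values invented for illustration)
sample = pd.DataFrame({
    "Player": ["Coby White", "Coby White", "Coby White", "Nikola Vucevic", "Nikola Vucevic"],
    "Avg_HeartRate_bpm": [150.0, np.nan, 160.0, 140.0, np.nan],
})

# Count missing values per column
print(sample.isna().sum())

# Option 1: drop any row that contains a missing value
dropped = sample.dropna()

# Option 2: fill each gap with that player's median, so one player's
# numbers don't leak into another player's record
filled = sample.copy()
player_median = filled.groupby("Player")["Avg_HeartRate_bpm"].transform("median")
filled["Avg_HeartRate_bpm"] = filled["Avg_HeartRate_bpm"].fillna(player_median)
print(filled)
```

`transform("median")` returns a Series aligned to the original rows, which is why it can be passed straight to `fillna`.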



### Step 2: Filtering and selection
Now let's filter for **games only** and compute each player's average game workload.


In [None]:

games_df = df[df["Session_Type"] == "Game"]
avg_workload_games = games_df.groupby("Player")["Workload_AU"].mean().sort_values(ascending=False)
print(avg_workload_games)
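The filter above uses a single condition. As a sketch, here are a few common ways to combine conditions in pandas, run on a small hypothetical dataset that reuses the tutorial's column names (the rows themselves are invented):

```python
import pandas as pd

# Hypothetical mini-dataset with the same column names as df
sessions = pd.DataFrame({
    "Player": ["Coby White", "Coby White", "Zach LaVine", "Nikola Vucevic", "Patrick Williams"],
    "Position": ["Guard", "Guard", "Guard", "Center", "Forward"],
    "Session_Type": ["Game", "Practice", "Game", "Game", "Recovery"],
    "Workload_AU": [620, 410, 540, 680, 320],
})

# Combine boolean masks with & (and) or | (or); each condition needs its own parentheses
guard_games = sessions[(sessions["Session_Type"] == "Game") & (sessions["Position"] == "Guard")]

# isin() matches a column against several allowed values at once
non_recovery = sessions[sessions["Session_Type"].isin(["Game", "Practice"])]

# query() expresses the same kind of filter as a string
heavy_games = sessions.query("Session_Type == 'Game' and Workload_AU > 600")

print(guard_games)
print(heavy_games)
```

Plain `and`/`or` do not work on boolean Series (pandas raises an error about ambiguous truth values), which is why masks use `&` and `|`.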



### Step 3: Visualizing workloads
We'll create a simple bar plot to compare game workloads.


In [None]:

plt.figure(figsize=(8,5))
avg_workload_games.plot(kind='bar', color='crimson')
plt.title("Average Game Workload by Player - Chicago Bulls")
plt.ylabel("Workload (AU)")
plt.xlabel("Player")
plt.tight_layout()  # keep the rotated player names from being clipped
plt.show()
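The bar plot compares averages. To look at session *trends* over time, one option is to reshape the data so each player becomes a column, then draw one line per player. Here is a minimal sketch on invented values for two players:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical long-format data: one row per player per session date
trend = pd.DataFrame({
    "Player": ["Coby White"] * 4 + ["Zach LaVine"] * 4,
    "Date": list(pd.date_range("2025-01-01", periods=4, freq="3D")) * 2,
    "Workload_AU": [450, 520, 610, 480, 500, 430, 560, 590],
})

# Reshape to wide format: one row per date, one column per player
wide = trend.pivot(index="Date", columns="Player", values="Workload_AU")

# DataFrame.plot draws one line per column, so each player gets a line
fig, ax = plt.subplots(figsize=(8, 5))
wide.plot(ax=ax, marker="o")
ax.set_title("Workload by Session Date")
ax.set_ylabel("Workload (AU)")
ax.set_xlabel("Date")
plt.tight_layout()
plt.show()
```

With the tutorial's `df`, `df.pivot(index="Date", columns="Player", values="Workload_AU")` should work the same way, because the generator creates exactly one row per player per date. `pivot` raises an error if any (Date, Player) pair is duplicated.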



### Step 4: Grouping and aggregation example
Now let's calculate the **average workload and RPE by session type** for the entire team.


In [None]:

avg_by_session = df.groupby("Session_Type")[["Workload_AU", "RPE"]].mean().round(1)
avg_by_session
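Grouping works on several keys at once. Below is a sketch that breaks the team summary down by player *and* session type, on a small hypothetical dataset. It uses named aggregation in `groupby().agg()` and then `pivot_table` to lay one metric out as a Player × Session_Type grid.

```python
import pandas as pd

# Hypothetical sessions using the tutorial's column names (values invented)
sessions = pd.DataFrame({
    "Player": ["Coby White", "Coby White", "Coby White", "Zach LaVine", "Zach LaVine"],
    "Session_Type": ["Game", "Game", "Practice", "Game", "Recovery"],
    "Workload_AU": [600, 500, 400, 650, 300],
    "RPE": [8, 7, 5, 9, 4],
})

# Named aggregation: output_name=(input_column, aggregation)
summary = (
    sessions.groupby(["Player", "Session_Type"])
            .agg(n_sessions=("Workload_AU", "count"),
                 mean_workload=("Workload_AU", "mean"),
                 mean_rpe=("RPE", "mean"))
            .round(1)
)

# Same mean workload as a grid; player/session combinations that never occur become NaN
grid = sessions.pivot_table(index="Player", columns="Session_Type",
                            values="Workload_AU", aggfunc="mean")

print(summary)
print(grid)
```

The grouped version keeps several metrics in a tidy long table. The pivot table is easier to scan when you only care about one metric.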



✅ **Summary**
- You learned to generate and inspect basketball data with pandas  
- You filtered data for specific contexts (e.g., Games)  
- You created grouped summaries and a simple bar plot  
