# 🏀 How Did NBA Player Playstyle Change Over the Years?
**Date:** 2025-09-14

**Authors:** <YOUR NAME(S)>

In this notebook we analyze how NBA player playstyle evolved, using advanced per-season player statistics. We focus on usage rate (USG%), shooting efficiency (TS%, eFG%), playmaking and rebounding shares (AST%, TRB%), and impact metrics (BPM, VORP). We combine EDA (time trends and position differences), unsupervised clustering of playstyles, and a small supervised task that predicts star status from playstyle features.

## 1. Load Data
We start from a precompiled table of **advanced per-season player stats** (one row per player-season).

In [None]:
import pandas as pd, numpy as np
import matplotlib.pyplot as plt

PATH = 'Advanced.csv'  # update if needed
df = pd.read_csv(PATH)
df.head()
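
Before going further, it can help to confirm that the loaded table has the columns the later cells rely on. The sketch below is one way to do that; `EXPECTED_COLS` and `missing_columns` are hypothetical names introduced here, and the small `demo` frame is made up purely to show the check firing.

```python
import pandas as pd

# Columns used by later cells (copied from the feature lists in this notebook).
EXPECTED_COLS = [
    'season', 'player', 'player_id', 'age', 'team', 'pos', 'g', 'gs', 'mp',
    'ts_percent', 'efg_percent', 'x3p_ar', 'f_tr', 'orb_percent', 'drb_percent',
    'trb_percent', 'ast_percent', 'stl_percent', 'blk_percent', 'tov_percent',
    'usg_percent', 'obpm', 'dbpm', 'bpm', 'vorp',
]

def missing_columns(frame, expected=EXPECTED_COLS):
    """Return the expected column names that are absent from `frame`."""
    return [c for c in expected if c not in frame.columns]

# Made-up frame with only three of the expected columns.
demo = pd.DataFrame({'season': [2001], 'player': ['A'], 'mp': [1200]})
print(missing_columns(demo)[:3])
```

On the real data, an empty list from `missing_columns(df)` would indicate that the column names match what the rest of the notebook expects.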

In [None]:
# Fallback for when 'Advanced.csv' is unavailable: build a small sample
# DataFrame with the required columns so the remaining cells can run.
# Skip this cell if the real data loaded above, because it overwrites `df`.
# The values are made up and far too few for the clustering and modeling
# sections to be meaningful.
import pandas as pd

# Sample DataFrame with the required columns
df = pd.DataFrame({
    'season': [1985, 1990, 1995, 2000, 2005],
    'player': ['Player1', 'Player2', 'Player3', 'Player4', 'Player5'],
    'player_id': [1, 2, 3, 4, 5],
    'age': [25, 27, 29, 24, 26],
    'team': ['TeamA', 'TeamB', 'TeamC', 'TeamD', 'TeamE'],
    'pos': ['G', 'F', 'C', 'G', 'F'],
    'g': [80, 75, 82, 70, 78],
    'gs': [70, 65, 80, 60, 75],
    'mp': [2500, 2300, 2800, 1800, 2600],
    'ts_percent': [0.55, 0.58, 0.52, 0.56, 0.59],
    'efg_percent': [0.51, 0.53, 0.48, 0.52, 0.54],
    'x3p_ar': [0.3, 0.35, 0.25, 0.4, 0.38],
    'f_tr': [0.2, 0.25, 0.3, 0.22, 0.28],
    'orb_percent': [5, 4, 8, 3, 6],
    'drb_percent': [12, 15, 20, 10, 14],
    'trb_percent': [8, 10, 15, 7, 9],
    'ast_percent': [15, 12, 8, 20, 10],
    'stl_percent': [2, 1.5, 1, 2.5, 1.8],
    'blk_percent': [1, 2, 4, 0.5, 1.5],
    'tov_percent': [10, 8, 12, 9, 11],
    'usg_percent': [25, 22, 20, 28, 24],
    'obpm': [2.5, 1.8, 0.5, 3.0, 2.0],
    'dbpm': [1.0, 1.5, 3.0, 0.0, 1.2],
    'bpm': [3.5, 3.3, 3.5, 3.0, 3.2],
    'vorp': [4.0, 3.8, 3.5, 3.2, 3.7]
})

### Quick data check

In [None]:
df.shape, df.isnull().sum().sort_values(ascending=False).head(10)

## 2. Cleaning & Preparation
- Keep seasons in a reasonable window (e.g., from 1990 onward) to match the modern era.
- Remove player-seasons with fewer than 300 minutes (`mp`) to reduce noise.
- Select the core features for analysis and drop rows with missing values in them.

In [None]:
# Basic filtering
df = df.copy()
df = df[df['season'] >= 1990]
df = df[df['mp'].fillna(0) >= 300]  # keep players with >=300 minutes in a season

core_feats = ['ts_percent','efg_percent','x3p_ar','f_tr','orb_percent','drb_percent','trb_percent',
              'ast_percent','stl_percent','blk_percent','tov_percent','usg_percent','obpm','dbpm','bpm','vorp']

basic_cols = ['season','player','player_id','age','team','pos','g','gs','mp']

df_core = df[basic_cols + core_feats].dropna()
df_core.head()
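
To see how much each cleaning step removes, the row counts can be tracked one filter at a time. This is a minimal sketch on a made-up five-row frame (`demo` and `counts` are hypothetical names); the thresholds mirror the ones used in the cell above.

```python
import numpy as np
import pandas as pd

# Made-up rows chosen so that each filter removes something.
demo = pd.DataFrame({
    'season':     [1985, 1992, 2003, 2015, 2021],
    'mp':         [2500, 150, 1800, np.nan, 900],
    'ts_percent': [0.55, 0.50, np.nan, 0.57, 0.60],
})

counts = {'raw': len(demo)}

kept = demo[demo['season'] >= 1990]           # season window
counts['season >= 1990'] = len(kept)

kept = kept[kept['mp'].fillna(0) >= 300]      # minutes filter; missing mp counts as 0
counts['mp >= 300'] = len(kept)

kept = kept.dropna()                          # rows with any missing feature
counts['no missing values'] = len(kept)

print(counts)
```

Running the same accounting on the real `df` would show whether the minutes cutoff or the missing-value drop is responsible for most of the lost rows.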

## 3. EDA: Time Trends
We examine how key indicators evolved across seasons.
- **USG%** (usage) – offensive load trend.
- **TS% / eFG%** – shooting efficiency.
- **AST% / TRB%** – playmaking / rebounding shares.

In [None]:
season_agg = df_core.groupby('season').agg({
    'usg_percent':'mean', 'ts_percent':'mean', 'efg_percent':'mean',
    'ast_percent':'mean', 'trb_percent':'mean'
}).reset_index()
season_agg.head()

In [None]:
plt.figure(figsize=(10,5))
plt.plot(season_agg['season'], season_agg['usg_percent'])
plt.xlabel('Season'); plt.ylabel('USG% (mean)')
plt.title('Usage Rate (USG%) by Season')
plt.show()

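> **Interpretation:** A rising mean USG% would suggest that rotation players (those above the 300-minute cutoff) carry a heavier offensive load on average; a flat or declining mean suggests touches are spread more evenly. Because usage is a share of team possessions, the league-wide mean tends to move little, so the spread of USG% within a season is often the more informative signal of how concentrated the load is on primary creators.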
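
One way to look at concentration rather than the average is to track the within-season spread of USG%. The sketch below uses a made-up `demo` frame in which both seasons have the same mean but very different spreads; the choice of standard deviation and 90th percentile is an assumption, not something this notebook computes.

```python
import pandas as pd

# Made-up data: both seasons average 20% usage, but 2020 is far more top-heavy.
demo = pd.DataFrame({
    'season':      [2000] * 4 + [2020] * 4,
    'usg_percent': [18, 20, 22, 20, 12, 16, 20, 32],
})

spread = demo.groupby('season')['usg_percent'].agg(
    mean='mean',
    std='std',                          # sample standard deviation
    p90=lambda s: s.quantile(0.90),     # usage of the high-load players
)
print(spread)
```

Applied to `df_core`, a rising `std` or `p90` with a flat `mean` would point toward load concentrating on fewer players.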

In [None]:
plt.figure(figsize=(10,5))
plt.plot(season_agg['season'], season_agg['ts_percent'])
plt.xlabel('Season'); plt.ylabel('TS% (mean)')
plt.title('True Shooting% by Season')
plt.show()

plt.figure(figsize=(10,5))
plt.plot(season_agg['season'], season_agg['efg_percent'])
plt.xlabel('Season'); plt.ylabel('eFG% (mean)')
plt.title('eFG% by Season')
plt.show()

> **Interpretation:** Increases in TS%/eFG% align with modern spacing and 3-point emphasis.
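
For reference, the two efficiency metrics can be written out from box-score totals using their standard definitions. The sketch below is only an illustration with made-up shot lines; the function names are hypothetical, and the dataset already ships these values precomputed.

```python
def true_shooting(pts, fga, fta):
    """TS% = PTS / (2 * (FGA + 0.44 * FTA))."""
    return pts / (2 * (fga + 0.44 * fta))

def effective_fg(fgm, fg3m, fga):
    """eFG% = (FGM + 0.5 * 3PM) / FGA."""
    return (fgm + 0.5 * fg3m) / fga

# Two made-up shooters, 10 attempts each, no free throws.
# Shooter A makes 5 two-pointers (10 points).
ts_a, efg_a = true_shooting(10, 10, 0), effective_fg(5, 0, 10)
# Shooter B makes only 4 shots, all three-pointers (12 points).
ts_b, efg_b = true_shooting(12, 10, 0), effective_fg(4, 4, 10)

print(ts_a, efg_a, ts_b, efg_b)
```

Shooter B makes fewer shots yet grades out as more efficient on both metrics, which is the mechanism by which a shift toward three-point attempts can lift league-wide TS% and eFG%.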

In [None]:
plt.figure(figsize=(10,5))
plt.plot(season_agg['season'], season_agg['ast_percent'])
plt.xlabel('Season'); plt.ylabel('AST% (mean)')
plt.title('Assist% by Season')
plt.show()

plt.figure(figsize=(10,5))
plt.plot(season_agg['season'], season_agg['trb_percent'])
plt.xlabel('Season'); plt.ylabel('TRB% (mean)')
plt.title('Rebound% by Season')
plt.show()

> **Interpretation:** Changes in AST% may reflect more ball movement/pick-and-roll reliance; TRB% can shift with pace and shot profile.
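
To put a number on any of these trends, a simple option is the least-squares slope of the seasonal mean, expressed per decade. This is a rough sketch that assumes a linear trend; `slope_per_decade` is a hypothetical helper and `season_demo` is made-up data.

```python
import numpy as np
import pandas as pd

def slope_per_decade(seasons, values):
    """Least-squares linear slope of `values` over `seasons`, scaled to 10 years."""
    x = np.asarray(seasons, dtype=float)
    y = np.asarray(values, dtype=float)
    # Centering the seasons keeps the fit well conditioned.
    slope, _intercept = np.polyfit(x - x.mean(), y, deg=1)
    return 10.0 * slope

# Made-up seasonal means rising by one TS% point per decade.
season_demo = pd.DataFrame({
    'season':     [1990, 2000, 2010, 2020],
    'ts_percent': [0.53, 0.54, 0.55, 0.56],
})
ts_slope = slope_per_decade(season_demo['season'], season_demo['ts_percent'])
print(round(ts_slope, 4))
```

The same helper could be applied to each column of `season_agg` to compare how quickly the different metrics moved.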

## 4. EDA: Position Differences
We compare positions (PG/SG/SF/PF/C) across eras. For compactness, we collapse detailed labels if needed.

In [None]:
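def canonical_pos(p):
    """Collapse a raw position label to PG/SG/SF/PF/C when possible."""
    if pd.isna(p):
        return np.nan
    # For multi-position labels such as 'C-PF', use the first-listed position
    # (assumed here to be the primary one). Matching on the whole string would
    # return 'PF' for 'C-PF' because of the order of the loop below.
    primary = str(p).upper().split('-')[0].strip()
    for k in ['PG','SG','SF','PF','C']:
        if k in primary:
            return k
    return primary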

df_core['pos_simple'] = df_core['pos'].apply(canonical_pos)
era = pd.cut(df_core['season'], bins=[1989,1999,2009,2019,2100], labels=['1990s','2000s','2010s','2020s'])
era_pos = df_core.assign(era=era)

# observed=True keeps only era/position pairs that occur in the data
pos_agg = era_pos.groupby(['era','pos_simple'], observed=True).agg({
    'usg_percent':'mean','ts_percent':'mean','ast_percent':'mean','trb_percent':'mean'
}).reset_index()
pos_agg.head()
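
The era bins rely on `pd.cut` treating each interval as right-inclusive by default, so `1999` falls in the 1990s and `2000` in the 2000s. A quick check on boundary seasons (made-up input, same bins and labels as above) makes that explicit.

```python
import pandas as pd

# Boundary seasons for each era.
seasons = pd.Series([1990, 1999, 2000, 2009, 2010, 2019, 2020])

era_demo = pd.cut(
    seasons,
    bins=[1989, 1999, 2009, 2019, 2100],   # intervals are (left, right]
    labels=['1990s', '2000s', '2010s', '2020s'],
)
print(era_demo.astype(str).tolist())
```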

In [None]:
pos_order = ['PG','SG','SF','PF','C']

for m in ['usg_percent','ts_percent','ast_percent','trb_percent']:
    pivot = pos_agg.pivot(index='pos_simple', columns='era', values=m)

    # Keep only the positions present in the data; selecting a missing label
    # with .loc would raise a KeyError.
    available_positions = [pos for pos in pos_order if pos in pivot.index]

    if available_positions:
        pivot = pivot.loc[available_positions].dropna(how='all')
        pivot.plot(kind='bar', figsize=(10,5))
        plt.title(f'{m} by Position and Era')
        plt.ylabel(m)
        plt.show()
    else:
        print(f"No matching positions found for {m}. Available positions: {pivot.index.tolist()}")

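> **Interpretation:** Expect TS% to rise across positions in the more recent eras and PGs to post the highest AST%. Wings taking a larger share of their shots from three and bigs spacing the floor more would show up in `x3p_ar`, which is not among the metrics plotted above.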
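
Since the spacing claim concerns `x3p_ar`, the same position-by-era table could be built for that column. The sketch below uses a made-up `demo` frame with `era` and `pos_simple` columns like those created earlier; the numbers are illustrative only.

```python
import pandas as pd

# Made-up player-seasons: guards and centers in two eras.
demo = pd.DataFrame({
    'era':        ['1990s', '1990s', '2010s', '2010s'],
    'pos_simple': ['PG', 'C', 'PG', 'C'],
    'x3p_ar':     [0.20, 0.01, 0.40, 0.15],
})

x3p_table = (
    demo.pivot_table(index='pos_simple', columns='era', values='x3p_ar', aggfunc='mean')
        .reindex(['PG', 'SG', 'SF', 'PF', 'C'])   # fixed display order
        .dropna(how='all')                         # drop positions with no data
)
print(x3p_table)
```

Using `reindex` followed by `dropna(how='all')` keeps the canonical position order without failing when a position is absent.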

## 5. Unsupervised: Clustering Player-Seasons by Playstyle
We cluster player-seasons using a subset of features, then visualize with PCA.

In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

features = ['ts_percent','efg_percent','x3p_ar','f_tr','ast_percent','trb_percent','tov_percent','usg_percent']
X = df_core[features].astype(float)
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# n_init is set explicitly because its default differs across scikit-learn versions
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)

plt.figure(figsize=(8,6))
plt.scatter(X_pca[:,0], X_pca[:,1], c=labels)
plt.xlabel('PC1'); plt.ylabel('PC2'); plt.title('PCA of Playstyle (colored by KMeans clusters)')
plt.show()

df_core['cluster'] = labels
df_core[['season','player','pos','pos_simple','cluster']].head()

> **Interpretation:** Clusters often align with guards/wings/bigs and shooting profiles (3P oriented vs rim/FT oriented).
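
To check what the clusters actually represent, two common follow-ups are comparing silhouette scores across values of `k` and profiling each cluster by its mean features. Neither is part of the analysis above; the sketch below shows both on synthetic two-feature data (`demo`, `scores`, `best_k`, and `profile` are hypothetical names).

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two synthetic playstyles: perimeter creators and interior players.
creators = rng.normal(loc=[0.45, 30.0], scale=[0.02, 2.0], size=(50, 2))
interior = rng.normal(loc=[0.05, 8.0], scale=[0.02, 2.0], size=(50, 2))
demo = pd.DataFrame(np.vstack([creators, interior]), columns=['x3p_ar', 'ast_percent'])

X_demo = StandardScaler().fit_transform(demo)

# Silhouette score for a few candidate cluster counts (higher is better).
scores = {}
for k in range(2, 6):
    labels_k = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_demo)
    scores[k] = silhouette_score(X_demo, labels_k)
best_k = max(scores, key=scores.get)

# Mean of each feature per cluster, in original units, for interpretation.
best_labels = KMeans(n_clusters=best_k, n_init=10, random_state=42).fit_predict(X_demo)
profile = demo.assign(cluster=best_labels).groupby('cluster').mean()
print(best_k)
print(profile)
```

On the real data, a table like `profile` built from `df_core[features]` and the `cluster` column is what would support or contradict the guards/wings/bigs reading.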

## 6. Supervised: Predict "Star" Status from Playstyle
We define **Star = top 20% VORP** across the dataset and train simple models to predict this class from playstyle features.

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, ConfusionMatrixDisplay
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

thr = df_core['vorp'].quantile(0.80)
df_core['star'] = (df_core['vorp'] >= thr).astype(int)

X = df_core[features]
y = df_core['star']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Logistic Regression baseline
logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)
y_pred = logreg.predict(X_test)
print('Logistic Regression:')
print(classification_report(y_test, y_pred))

# Random Forest
rf = RandomForestClassifier(n_estimators=300, random_state=42)
rf.fit(X_train, y_train)
y_pred_rf = rf.predict(X_test)
print('Random Forest:')
print(classification_report(y_test, y_pred_rf))

# Permutation importance (RF)
r = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=42)
imp = pd.Series(r.importances_mean, index=X.columns).sort_values(ascending=False)
imp.plot(kind='bar', figsize=(10,5))
plt.title('Permutation Importance (Random Forest)')
plt.show()

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, ConfusionMatrixDisplay
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Variant of the previous cell for small samples (e.g., the demonstration
# DataFrame), where a stratified split fails because a class has fewer than two rows.
thr = df_core['vorp'].quantile(0.80)
df_core['star'] = (df_core['vorp'] >= thr).astype(int)

# Check class distribution before splitting
print("Class distribution:", df_core['star'].value_counts())

X = df_core[features]
y = df_core['star']

# Option 1: split without stratification
# (class proportions may then differ between the train and test sets)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Option 2 (alternative): lower the threshold so the positive class is larger, then stratify
# thr = df_core['vorp'].quantile(0.75)
# df_core['star'] = (df_core['vorp'] >= thr).astype(int)
# X = df_core[features]
# y = df_core['star']
# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Logistic Regression baseline
logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)
y_pred = logreg.predict(X_test)
print('Logistic Regression:')
print(classification_report(y_test, y_pred))

# Random Forest
rf = RandomForestClassifier(n_estimators=300, random_state=42)
rf.fit(X_train, y_train)
y_pred_rf = rf.predict(X_test)
print('Random Forest:')
print(classification_report(y_test, y_pred_rf))

# Permutation importance (RF)
r = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=42)
imp = pd.Series(r.importances_mean, index=X.columns).sort_values(ascending=False)
imp.plot(kind='bar', figsize=(10,5))
plt.title('Permutation Importance (Random Forest)')
plt.show()

> **Interpretation:** Feature importance helps explain which aspects of playstyle (usage, efficiency, playmaking, etc.) best distinguish star-level impact.
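
As a complement to permutation importance, standardized logistic-regression coefficients give a signed view of each feature. This is not what the cells above do: the sketch wraps the model in a scaling pipeline and runs on synthetic data in which the label depends only on usage and efficiency, so every name and number below is an assumption for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 400
demo = pd.DataFrame({
    'usg_percent': rng.normal(20, 5, n),
    'ts_percent':  rng.normal(0.55, 0.04, n),
    'tov_percent': rng.normal(12, 3, n),     # unrelated to the label by design
})

# Synthetic "star" label: top 20% of a score built from usage and efficiency.
score = (demo['usg_percent'] - 20) / 5 + (demo['ts_percent'] - 0.55) / 0.04
demo_star = (score > score.quantile(0.80)).astype(int)

# Scaling puts the coefficients on a comparable footing across features.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(demo, demo_star)

coefs = pd.Series(
    pipe.named_steps['logisticregression'].coef_[0], index=demo.columns
).sort_values(key=abs, ascending=False)
print(coefs)
```

Here the two driving features receive large positive coefficients while the unrelated one stays near zero; on real data the signs help show whether, for example, higher turnover rates push toward or away from star status.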

## 7. Conclusion & Next Steps
- **Trends**: summarize observed shifts (e.g., rising TS%/eFG%, changes in USG%).
- **Positions**: how roles evolved across eras.
- **Clustering**: distinct playstyle groups emerged.
- **Prediction**: which features best predict star status.

**Next**: add team-level context, era-aware targets (percentiles per season), or more sophisticated explainability (SHAP).
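
The era-aware target mentioned above could be sketched as a within-season percentile rank of VORP, so that "star" means top 20% of one's own season rather than of the whole dataset. The data below is made up, and `vorp_pct`, `star_era`, and `star_global` are hypothetical column names.

```python
import pandas as pd

# Made-up VORP values; the later season has higher values overall.
demo = pd.DataFrame({
    'season': [1995] * 5 + [2020] * 5,
    'vorp':   [0.1, 0.5, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 4.0, 6.0],
})

# Era-aware label: percentile rank computed within each season.
demo['vorp_pct'] = demo.groupby('season')['vorp'].rank(pct=True)
demo['star_era'] = (demo['vorp_pct'] > 0.80).astype(int)

# Global label, as used in Section 6, for comparison.
global_thr = demo['vorp'].quantile(0.80)
demo['star_global'] = (demo['vorp'] >= global_thr).astype(int)

print(demo.groupby('season')[['star_era', 'star_global']].sum())
```

In this toy example the global threshold labels two stars in 2020 and none in 1995, while the per-season version labels one in each, which is the kind of era bias the per-season target is meant to remove.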