# Correlation Heatmaps

## Trevor Rowland, Scott Campbell :: 2-4-2025

This notebook collects functions for creating correlation heatmaps, which help identify relationships between features in our datasets. The examples use the `players` dataset, which contains aggregated player stats for all NBA players from 2004 to 2024.

## 1. Importing Packages and Data

In [9]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_pickle('/Users/dB/Documents/repos/github/bint-capstone/data-sources/nba/nba_player_stats_2004_2024.pkl')
df.head()


| Unnamed: 0 | player_id | player_name | nickname | team_id | team_abbreviation | age | gp | w | l | w_pct | ... | blka_rank | pf_rank | pfd_rank | pts_rank | plus_minus_rank | nba_fantasy_pts_rank | dd2_rank | td3_rank | wnba_fantasy_pts_rank | season |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 243 | Aaron McKie | Aaron | 1610612755 | PHI | 32.0 | 68 | 35 | 33 | 0.515 | ... | 146 | 215 | 81 | 332 | 157 | 274 | 223 | 14 | 285 | 2004-05 |
| 1 | 1425 | Aaron Williams | Aaron | 1610612761 | TOR | 33.0 | 42 | 13 | 29 | 0.31 | ... | 105 | 142 | 81 | 376 | 400 | 377 | 223 | 14 | 377 | 2004-05 |
| 2 | 1502 | Adonal Foyle | Adonal | 1610612744 | GSW | 30.0 | 78 | 31 | 47 | 0.397 | ... | 232 | 386 | 10 | 247 | 256 | 141 | 129 | 14 | 168 | 2004-05 |
| 3 | 1559 | Adrian Griffin | Adrian | 1610612741 | CHI | 30.0 | 69 | 38 | 31 | 0.551 | ... | 129 | 153 | 81 | 334 | 119 | 308 | 223 | 14 | 313 | 2004-05 |
| 4 | 1733 | Al Harrington | Al | 1610612737 | ATL | 25.0 | 66 | 10 | 56 | 0.152 | ... | 446 | 445 | 81 | 53 | 463 | 65 | 59 | 14 | 64 | 2004-05 |
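
The preview shows several text columns (`player_name`, `nickname`, `team_abbreviation`, `season`) alongside the numeric stats. Before one-hot encoding, it may be worth checking how many distinct values each of them holds, since every distinct value becomes its own dummy column. A minimal sketch, assuming a hypothetical helper `categorical_cardinality` and a small made-up frame standing in for `df`:

```python
import pandas as pd

def categorical_cardinality(frame):
    """Count distinct values in each object/category column, largest first.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    cat_cols = frame.select_dtypes(include=['object', 'category']).columns
    return frame[cat_cols].nunique().sort_values(ascending=False)

# Made-up stand-in for `df`; the values are illustrative only
toy = pd.DataFrame({
    'player_name': pd.Series(['A', 'B', 'C', 'D'], dtype='object'),
    'team_abbreviation': pd.Series(['PHI', 'PHI', 'TOR', 'TOR'], dtype='object'),
    'pts': [10.0, 12.5, 8.0, 20.1],
})

cardinality = categorical_cardinality(toy)
print(cardinality)
```

On the real data, the same call would be `categorical_cardinality(df)`. Columns with very many levels are candidates for dropping or grouping before encoding.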


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

def create_correlation_heatmap(df):
    """
    Creates dummy variables for categorical columns and generates a correlation heatmap.
    
    Parameters:
        df (pd.DataFrame): Input DataFrame containing both numerical and categorical variables
        
    Returns:
        tuple: (correlation_matrix, fig, ax) - The correlation matrix and plot objects
    """
    # Create a copy to avoid modifying original DataFrame
    df_copy = df.copy()
    
    # Get categorical columns
    categorical_cols = df_copy.select_dtypes(include=['object', 'category']).columns
    
    # Create dummy variables for categorical columns
    df_dummy = pd.get_dummies(df_copy, columns=categorical_cols, drop_first=True)
    
    # Calculate correlation matrix
    corr_matrix = df_dummy.corr()
    
    # Set up the figure and axes explicitly so they can be returned
    fig, ax = plt.subplots(figsize=(15, 12))

    # Create mask for the upper triangle (this also hides the diagonal)
    mask = np.triu(np.ones_like(corr_matrix, dtype=bool))

    # Create heatmap
    sns.heatmap(
        corr_matrix,
        annot=True,           # Show correlation values
        fmt='.2f',            # Format to 2 decimal places
        cmap='coolwarm',      # Blue-red diverging colormap
        square=True,          # Make cells square
        mask=mask,            # Hide upper triangle
        center=0,             # Center the colormap at 0
        ax=ax                 # Draw on the axes created above
    )

    ax.set_title('Correlation Heatmap', pad=20, fontsize=14)
    plt.setp(ax.get_xticklabels(), rotation=45, ha='right')
    fig.tight_layout()

    return corr_matrix, fig, ax

corr_matrix, fig, ax = create_correlation_heatmap(df)

KeyboardInterrupt: 
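
The call above was interrupted before it finished. One plausible cause (an assumption, not something the truncated traceback confirms) is that `pd.get_dummies` expands high-cardinality text columns such as `player_name` into one column per distinct value, which makes both the correlation matrix and the annotated heatmap very large. A minimal sketch of one way to bound the size, assuming a hypothetical helper `bounded_corr_matrix` and an arbitrary `max_levels` cutoff. It only builds the matrix, which could then be plotted with `sns.heatmap` as in the function above:

```python
import pandas as pd

def bounded_corr_matrix(frame, max_levels=10):
    """Correlation matrix over numeric columns plus dummies for
    low-cardinality categorical columns.

    Hypothetical helper for illustration. Categorical columns with more
    than `max_levels` distinct values are dropped rather than encoded.
    """
    cat_cols = frame.select_dtypes(include=['object', 'category']).columns
    keep = [c for c in cat_cols if frame[c].nunique() <= max_levels]
    drop = [c for c in cat_cols if c not in keep]

    # Remove high-cardinality columns, then one-hot encode the rest
    reduced = frame.drop(columns=drop)
    encoded = pd.get_dummies(reduced, columns=keep, drop_first=True, dtype=float)
    return encoded.corr()

# Made-up stand-in for `df`; the values are illustrative only
toy = pd.DataFrame({
    'player_name': pd.Series(['A', 'B', 'C', 'D', 'E', 'F'], dtype='object'),
    'team_abbreviation': pd.Series(['PHI', 'TOR', 'PHI', 'TOR', 'PHI', 'TOR'], dtype='object'),
    'pts': [10.0, 12.5, 8.0, 20.1, 15.2, 9.9],
    'gp': [68, 42, 78, 69, 66, 50],
})

# `player_name` has 6 levels and is dropped; `team_abbreviation` has 2 and is encoded
corr = bounded_corr_matrix(toy, max_levels=3)
print(corr.columns.tolist())
```

The cutoff is a judgment call: a lower `max_levels` keeps the heatmap readable, while a higher one retains more categorical detail at the cost of a larger matrix.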