# NBA Player Performance Dynamics: Network Construction and Analysis

This notebook applies network analysis to understand teammate interactions in the NBA. We'll build teammate networks, calculate network metrics, and identify synergy pairs and player clusters.

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
import sys
from datetime import datetime
import networkx as nx
from sklearn.preprocessing import StandardScaler

# Add the project root to the path so we can import our modules
sys.path.append('..')

# Import our modules
from src.network_analysis import (
    build_teammate_network,
    calculate_network_metrics,
    identify_synergy_pairs
)
from src.visualization import create_network_visualization
from src.utils import setup_plotting_style

# Set up plotting style
setup_plotting_style()

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)

## Load Processed Data

Let's load the processed data from previous notebooks.

In [None]:
# Load the processed data
try:
    player_dynamics = pd.read_csv('../data/processed/player_dynamics.csv')
    player_team_fit = pd.read_csv('../data/processed/player_team_fit.csv')
    player_temporal_df = pd.read_csv('../data/processed/player_temporal.csv')
    games_processed = pd.read_csv('../data/processed/games_processed.csv')
    
    # Convert date strings to datetime objects
    player_temporal_df['GAME_DATE'] = pd.to_datetime(player_temporal_df['GAME_DATE'])
    games_processed['GAME_DATE'] = pd.to_datetime(games_processed['GAME_DATE'])
    
    print(f"Loaded player dynamics data with {len(player_dynamics)} players")
    print(f"Loaded player team fit data with {len(player_team_fit)} players")
    print(f"Loaded player temporal data with {len(player_temporal_df)} records")
    print(f"Loaded processed games data with {len(games_processed)} records")
except FileNotFoundError as err:
    # Stop here: every later cell depends on these dataframes
    raise FileNotFoundError(
        "Processed data not found. Please run the previous notebooks first."
    ) from err

In [None]:
# Examine the player dynamics data
player_dynamics.head()

## Teammate Network Construction

Let's build a network of teammate interactions based on game data. We'll create two types of networks:

1. **Teammate Frequency Network**: Captures how often players have played together
2. **Influence Network**: Captures how players' performances correlate when they play together
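
As a minimal sketch of the two definitions above, the toy example below builds both quantities for a hypothetical three-game table. The column names follow the ones used in this notebook, while the players, teams, and plus/minus values are invented for illustration. Aligning the pair on game and team before correlating is an assumption of this sketch: it keeps the two series in the same game order and restricts them to games played as teammates.

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Hypothetical toy box scores: one row per player per game.
toy = pd.DataFrame({
    "Game_ID":    [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Team_ID":    [10, 10, 20, 10, 10, 20, 10, 10, 20],
    "Player_ID":  ["A", "B", "C", "A", "B", "C", "A", "B", "C"],
    "PLUS_MINUS": [5, 7, -6, -3, -1, 2, 8, 9, -8],
})

# Frequency network: count shared (game, team) appearances for each pair.
pair_counts = {}
for _, roster in toy.groupby(["Game_ID", "Team_ID"])["Player_ID"]:
    for p1, p2 in combinations(sorted(roster), 2):
        pair_counts[(p1, p2)] = pair_counts.get((p1, p2), 0) + 1

# Influence network: align the pair on game AND team, then correlate.
a = toy[toy["Player_ID"] == "A"]
b = toy[toy["Player_ID"] == "B"]
shared = a.merge(b, on=["Game_ID", "Team_ID"], suffixes=("_a", "_b"))
corr_ab = np.corrcoef(shared["PLUS_MINUS_a"], shared["PLUS_MINUS_b"])[0, 1]

print(pair_counts)
print(round(corr_ab, 3))
```

Player C never shares a team with A or B, so the only pair with a count is (A, B).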

In [None]:
# Identify teammates for each game
# Group player game data by game and team
teammates_by_game = {}

for game_id in player_temporal_df['Game_ID'].unique():
    game_data = player_temporal_df[player_temporal_df['Game_ID'] == game_id]
    
    # Group by team
    for team_id in game_data['Team_ID'].unique():
        team_players = game_data[game_data['Team_ID'] == team_id]
        
        # Get player IDs
        player_ids = team_players['Player_ID'].tolist()
        
        # Add to teammates dictionary
        if game_id not in teammates_by_game:
            teammates_by_game[game_id] = {}
        
        teammates_by_game[game_id][team_id] = player_ids

# Count number of games with teammate data
print(f"Identified teammates for {len(teammates_by_game)} games")

In [None]:
# Build teammate frequency matrix
player_ids = player_dynamics['player_id'].unique()
n_players = len(player_ids)

# Create a dictionary to map player IDs to indices
player_to_idx = {player_id: i for i, player_id in enumerate(player_ids)}
idx_to_player = {i: player_id for i, player_id in enumerate(player_ids)}

# Initialize teammate frequency matrix
teammate_freq = np.zeros((n_players, n_players))

# Fill the matrix
for game_id, teams in teammates_by_game.items():
    for team_id, players in teams.items():
        # For each pair of teammates
        for i, player1 in enumerate(players):
            if player1 in player_to_idx:  # Check if player is in our player list
                idx1 = player_to_idx[player1]
                for player2 in players[i+1:]:
                    if player2 in player_to_idx:  # Check if player is in our player list
                        idx2 = player_to_idx[player2]
                        # Increment frequency for both directions
                        teammate_freq[idx1, idx2] += 1
                        teammate_freq[idx2, idx1] += 1

# Map each player ID to a name for easier interpretation (first name listed per ID)
player_names = (
    player_dynamics.drop_duplicates('player_id')
    .set_index('player_id')['player_name']
    .to_dict()
)

# Print some statistics about the teammate frequency matrix
print(f"Teammate frequency matrix shape: {teammate_freq.shape}")
print(f"Maximum teammate frequency: {np.max(teammate_freq)}")
print(f"Average teammate frequency: {np.mean(teammate_freq)}")
print(f"Number of non-zero entries: {np.count_nonzero(teammate_freq)}")

In [None]:
# Build correlation-based influence network
# For each pair of teammates, calculate the correlation between their plus/minus
influence_matrix = np.zeros((n_players, n_players))

# For each player pair
for i in range(n_players):
    player1_id = idx_to_player[i]
    player1_games = player_temporal_df[player_temporal_df['Player_ID'] == player1_id]
    
    for j in range(i+1, n_players):
        player2_id = idx_to_player[j]
        player2_games = player_temporal_df[player_temporal_df['Player_ID'] == player2_id]
        
        # Pair the two players' rows on game AND team so that only games played as
        # teammates count and the two plus/minus series stay aligned row by row
        shared = player1_games.merge(
            player2_games, on=['Game_ID', 'Team_ID'], suffixes=('_1', '_2')
        )
        
        # If they have enough shared games
        if len(shared) >= 10:  # Minimum number of shared games
            # Calculate correlation between the aligned plus/minus series
            corr = np.corrcoef(shared['PLUS_MINUS_1'], shared['PLUS_MINUS_2'])[0, 1]
            
            # Store correlation in influence matrix
            influence_matrix[i, j] = corr
            influence_matrix[j, i] = corr

# Replace NaN values with 0
influence_matrix = np.nan_to_num(influence_matrix)

# Print some statistics about the influence matrix
print(f"Influence matrix shape: {influence_matrix.shape}")
print(f"Maximum influence: {np.max(influence_matrix)}")
print(f"Minimum influence: {np.min(influence_matrix)}")
print(f"Average influence: {np.mean(influence_matrix)}")
print(f"Number of non-zero entries: {np.count_nonzero(influence_matrix)}")

## Network Visualization

Let's visualize the teammate networks to understand the structure of player interactions.

In [None]:
# Create network visualizations
# Convert matrices to NetworkX graphs
# Teammate frequency network
G_freq = nx.Graph()

# Add nodes
for i in range(n_players):
    player_id = idx_to_player[i]
    player_name = player_names[player_id]
    G_freq.add_node(i, name=player_name, id=player_id)

# Add edges with weight based on frequency
for i in range(n_players):
    for j in range(i+1, n_players):
        if teammate_freq[i, j] > 0:  # Only add edges for players who have been teammates
            G_freq.add_edge(i, j, weight=teammate_freq[i, j])

# Influence network
G_influence = nx.Graph()

# Add nodes
for i in range(n_players):
    player_id = idx_to_player[i]
    player_name = player_names[player_id]
    G_influence.add_node(i, name=player_name, id=player_id)

# Add edges with weight based on influence
for i in range(n_players):
    for j in range(i+1, n_players):
        if influence_matrix[i, j] > 0.3:  # Only add edges for significant positive influence
            G_influence.add_edge(i, j, weight=influence_matrix[i, j])

# Print network statistics
print(f"Teammate frequency network: {G_freq.number_of_nodes()} nodes, {G_freq.number_of_edges()} edges")
print(f"Influence network: {G_influence.number_of_nodes()} nodes, {G_influence.number_of_edges()} edges")

In [None]:
# Visualize the influence network
# Extract the largest connected component for better visualization
largest_cc = max(nx.connected_components(G_influence), key=len)
G_influence_cc = G_influence.subgraph(largest_cc).copy()

# Calculate node sizes based on degree centrality
degree_centrality = nx.degree_centrality(G_influence_cc)
node_sizes = [5000 * degree_centrality[node] + 100 for node in G_influence_cc.nodes()]

# Calculate edge widths based on weight
edge_widths = [2 * G_influence_cc[u][v]['weight'] for u, v in G_influence_cc.edges()]

# Create a spring layout
pos = nx.spring_layout(G_influence_cc, seed=42)

# Create the visualization
plt.figure(figsize=(12, 12))
nx.draw_networkx_nodes(G_influence_cc, pos, node_size=node_sizes, alpha=0.7, node_color='skyblue')
nx.draw_networkx_edges(G_influence_cc, pos, width=edge_widths, alpha=0.5, edge_color='gray')
nx.draw_networkx_labels(G_influence_cc, pos, labels={node: G_influence_cc.nodes[node]['name'] for node in G_influence_cc.nodes()})

plt.title('Player Influence Network (Largest Connected Component)', fontsize=14)
plt.axis('off')
plt.tight_layout()
plt.show()

### Interpreting the Influence Network

The influence network visualization can be read along four dimensions:

1. **Network Structure**: Clusters are groups of players whose plus/minus values move together across shared games, which typically reflects team cores and common lineups.

2. **Central Players**: Node size scales with degree centrality, so larger nodes are players linked to many others by a correlation above the 0.3 edge threshold. This often points to players who appear in many lineup combinations or who have played for several teams.

3. **Player Clusters**: Distinct clusters typically correspond to team units, since only pairs with at least 10 common games can share an edge.

4. **Influence Relationships**: Edge thickness is proportional to the plus/minus correlation between two players, with thicker edges indicating a stronger positive correlation.

These patterns describe correlations in plus/minus rather than causal effects; they point to players who may act as connectors or performance multipliers within the league.
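
To make the role of the edge threshold concrete, here is a small sketch that turns a hypothetical correlation matrix into a graph at two different cutoffs. The matrix values, the player labels, and the helper `thresholded_graph` are invented for illustration; `nx.from_numpy_array` creates an edge for every non-zero entry, so entries at or below the cutoff are zeroed first.

```python
import networkx as nx
import numpy as np

# Hypothetical symmetric correlation matrix for four players.
names = ["P0", "P1", "P2", "P3"]
corr = np.array([
    [0.0, 0.6, 0.2, 0.0],
    [0.6, 0.0, 0.4, 0.1],
    [0.2, 0.4, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.0],
])


def thresholded_graph(matrix, threshold):
    """Keep only pairs whose correlation exceeds the threshold."""
    kept = np.where(matrix > threshold, matrix, 0.0)
    graph = nx.from_numpy_array(kept)  # zero entries produce no edge
    return nx.relabel_nodes(graph, dict(enumerate(names)))


loose = thresholded_graph(corr, 0.05)
strict = thresholded_graph(corr, 0.3)

print(sorted(loose.edges()), sorted(strict.edges()))
print(nx.degree_centrality(strict))
```

With the stricter cutoff, P3 loses its only edge and drops to a degree centrality of zero, so the choice of threshold directly shapes which players look central.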

## Network Metrics Calculation

Let's calculate network metrics to quantify player influence and centrality.

In [None]:
# Calculate network metrics for each player
network_metrics = []

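# Calculate centrality metrics for the influence network
degree_centrality = nx.degree_centrality(G_influence)
betweenness_centrality = nx.betweenness_centrality(G_influence)
# Eigenvector centrality is not well defined on a disconnected graph (newer NetworkX
# releases may raise an error), so compute it on the largest connected component;
# players outside that component default to 0 when the metrics are combined below
largest_component = G_influence.subgraph(max(nx.connected_components(G_influence), key=len))
eigenvector_centrality = nx.eigenvector_centrality_numpy(largest_component, weight='weight')
pagerank = nx.pagerank(G_influence, weight='weight')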

# Calculate average influence
avg_influence = {}
for i in range(n_players):
    # Get non-zero influences
    influences = influence_matrix[i, :]
    non_zero = influences[influences != 0]
    if len(non_zero) > 0:
        avg_influence[i] = np.mean(non_zero)
    else:
        avg_influence[i] = 0

# Calculate positive influence count
positive_influence_count = {}
for i in range(n_players):
    positive_influence_count[i] = np.sum(influence_matrix[i, :] > 0.3)  # Count significant positive influences

# Combine metrics for each player
for i in range(n_players):
    player_id = idx_to_player[i]
    player_name = player_names[player_id]
    
    network_metrics.append({
        'player_id': player_id,
        'player_name': player_name,
        'degree_centrality': degree_centrality.get(i, 0),
        'betweenness_centrality': betweenness_centrality.get(i, 0),
        'eigenvector_centrality': eigenvector_centrality.get(i, 0),
        'pagerank': pagerank.get(i, 0),
        'avg_influence': avg_influence.get(i, 0),
        'positive_influence_count': positive_influence_count.get(i, 0)
    })

# Convert to dataframe
network_df = pd.DataFrame(network_metrics)

# Sort by eigenvector centrality
network_df = network_df.sort_values('eigenvector_centrality', ascending=False)

# Display top players by network metrics
network_df.head(10)

In [None]:
# Visualize network metrics distribution
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Degree centrality
axes[0, 0].hist(network_df['degree_centrality'], bins=20, alpha=0.7, color='skyblue')
axes[0, 0].set_xlabel('Degree Centrality', fontsize=12)
axes[0, 0].set_ylabel('Number of Players', fontsize=12)
axes[0, 0].set_title('Degree Centrality Distribution', fontsize=14)
axes[0, 0].grid(True, alpha=0.3)

# Eigenvector centrality
axes[0, 1].hist(network_df['eigenvector_centrality'], bins=20, alpha=0.7, color='salmon')
axes[0, 1].set_xlabel('Eigenvector Centrality', fontsize=12)
axes[0, 1].set_ylabel('Number of Players', fontsize=12)
axes[0, 1].set_title('Eigenvector Centrality Distribution', fontsize=14)
axes[0, 1].grid(True, alpha=0.3)

# Average influence
axes[1, 0].hist(network_df['avg_influence'], bins=20, alpha=0.7, color='lightgreen')
axes[1, 0].set_xlabel('Average Influence', fontsize=12)
axes[1, 0].set_ylabel('Number of Players', fontsize=12)
axes[1, 0].set_title('Average Influence Distribution', fontsize=14)
axes[1, 0].grid(True, alpha=0.3)

# Positive influence count
axes[1, 1].hist(network_df['positive_influence_count'], bins=20, alpha=0.7, color='purple')
axes[1, 1].set_xlabel('Positive Influence Count', fontsize=12)
axes[1, 1].set_ylabel('Number of Players', fontsize=12)
axes[1, 1].set_title('Positive Influence Count Distribution', fontsize=14)
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Identify top players by different network metrics
top_degree = network_df.nlargest(5, 'degree_centrality')
top_eigenvector = network_df.nlargest(5, 'eigenvector_centrality')
top_betweenness = network_df.nlargest(5, 'betweenness_centrality')
top_influence = network_df.nlargest(5, 'avg_influence')

print("Top Players by Degree Centrality (Connectivity):")
print(top_degree[['player_name', 'degree_centrality']])

print("\nTop Players by Eigenvector Centrality (Influence):")
print(top_eigenvector[['player_name', 'eigenvector_centrality']])

print("\nTop Players by Betweenness Centrality (Bridge Players):")
print(top_betweenness[['player_name', 'betweenness_centrality']])

print("\nTop Players by Average Influence:")
print(top_influence[['player_name', 'avg_influence']])

### Interpreting Network Metrics

The network metrics provide insights into different aspects of player influence and connectivity:

1. **Degree Centrality**: Measures the number of direct connections a player has. In the influence network, a connection is a pairing whose plus/minus correlation exceeds 0.3, so players with high degree centrality correlate positively with many different players.
   - Top players: [List top players from your data]

2. **Eigenvector Centrality**: Measures influence by considering both the quantity and quality of connections. Players with high eigenvector centrality are connected to other influential players.
   - Top players: [List top players from your data]

3. **Betweenness Centrality**: Measures how often a player serves as a bridge between other players. Players with high betweenness centrality connect different groups or clusters of players.
   - Top players: [List top players from your data]

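4. **Average Influence**: Measures the average plus/minus correlation between a player and the players they share at least 10 games with. A high value means the player's plus/minus tends to rise and fall together with theirs; it is a correlation, not evidence of a causal impact on teammates' performance.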
   - Top players: [List top players from your data]

These metrics identify different types of influential players in the NBA, from connectors to performance multipliers.
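
The difference between these metrics is easiest to see on a small graph. The sketch below uses a hypothetical seven-node network, two triangles joined through a single bridge node `x`, and is only meant to illustrate how the rankings can disagree.

```python
import networkx as nx

# Hypothetical graph: triangle a-b-c and triangle d-e-f, joined through x.
G = nx.Graph()
G.add_edges_from([
    ("a", "b"), ("a", "c"), ("b", "c"),   # first triangle
    ("d", "e"), ("d", "f"), ("e", "f"),   # second triangle
    ("c", "x"), ("x", "d"),               # bridge node x
])

degree = nx.degree_centrality(G)
betweenness = nx.betweenness_centrality(G)
# Power-iteration eigenvector centrality, computed here only for comparison.
eigenvector = nx.eigenvector_centrality(G, max_iter=1000)

for node in sorted(G.nodes()):
    print(
        f"{node}: degree={degree[node]:.2f} "
        f"betweenness={betweenness[node]:.2f} "
        f"eigenvector={eigenvector[node]:.2f}"
    )
```

Node `x` has fewer connections than `c` or `d`, yet every shortest path between the two triangles runs through it, so it ranks first on betweenness. This is the "bridge player" pattern described in item 3.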

## Identify Synergy Pairs and Clusters

Let's identify player pairs with strong synergy and clusters of players who work well together.
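
As a sketch of the pair-selection step, the example below extracts pairs above a threshold from a hypothetical symmetric matrix. Reading only the upper triangle avoids listing each pair twice. The matrix values and player labels are invented, and the 0.5 cutoff is used purely as an illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical symmetric influence matrix for four players.
names = np.array(["P0", "P1", "P2", "P3"])
influence = np.array([
    [0.0, 0.7, 0.2, 0.55],
    [0.7, 0.0, 0.4, 0.1],
    [0.2, 0.4, 0.0, 0.0],
    [0.55, 0.1, 0.0, 0.0],
])
threshold = 0.5

# k=1 selects entries strictly above the diagonal, so each pair appears once.
rows, cols = np.triu_indices_from(influence, k=1)
scores = influence[rows, cols]
mask = scores > threshold

pairs = pd.DataFrame({
    "player1_name": names[rows[mask]],
    "player2_name": names[cols[mask]],
    "synergy_score": scores[mask],
}).sort_values("synergy_score", ascending=False, ignore_index=True)

print(pairs)
```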

In [None]:
# Identify synergy pairs
synergy_pairs = []

# Threshold for significant synergy
synergy_threshold = 0.5

for i in range(n_players):
    for j in range(i+1, n_players):
        if influence_matrix[i, j] > synergy_threshold:  # Only consider strong positive influence
            player1_id = idx_to_player[i]
            player2_id = idx_to_player[j]
            player1_name = player_names[player1_id]
            player2_name = player_names[player2_id]
            
            synergy_pairs.append({
                'player1_id': player1_id,
                'player2_id': player2_id,
                'player1_name': player1_name,
                'player2_name': player2_name,
                'synergy_score': influence_matrix[i, j]
            })

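# Convert to dataframe (explicit columns keep the sort valid when no pair qualifies)
synergy_df = pd.DataFrame(
    synergy_pairs,
    columns=['player1_id', 'player2_id', 'player1_name', 'player2_name', 'synergy_score']
)

# Sort by synergy score
synergy_df = synergy_df.sort_values('synergy_score', ascending=False)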

# Display top synergy pairs
print(f"Identified {len(synergy_df)} synergy pairs with score > {synergy_threshold}")
synergy_df.head(10)

In [None]:
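# Identify player clusters using community detection
# Use the Louvain method for community detection
# (the `community` module is provided by the python-louvain package)
import community as community_louvain

# Apply community detection to the influence network
# A fixed random_state keeps the partition reproducible between runs
partition = community_louvain.best_partition(G_influence, weight='weight', random_state=42)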

# Count the number of communities
n_communities = len(set(partition.values()))
print(f"Identified {n_communities} player communities")

# Group players by community
communities = {}
for node, community_id in partition.items():
    if community_id not in communities:
        communities[community_id] = []
    
    player_id = idx_to_player[node]
    player_name = player_names[player_id]
    communities[community_id].append(player_name)

# Display communities
for community_id, players in communities.items():
    if len(players) >= 3:  # Only show communities with at least 3 players
        print(f"\nCommunity {community_id} ({len(players)} players):")
        print(", ".join(players))

In [None]:
# Visualize the influence network with communities
# Extract the largest connected component for better visualization
largest_cc = max(nx.connected_components(G_influence), key=len)
G_influence_cc = G_influence.subgraph(largest_cc).copy()

# Calculate node sizes based on degree centrality
degree_centrality = nx.degree_centrality(G_influence_cc)
node_sizes = [5000 * degree_centrality[node] + 100 for node in G_influence_cc.nodes()]

# Calculate edge widths based on weight
edge_widths = [2 * G_influence_cc[u][v]['weight'] for u, v in G_influence_cc.edges()]

# Create a spring layout
pos = nx.spring_layout(G_influence_cc, seed=42)

# Create the visualization with communities
plt.figure(figsize=(14, 14))

# Map each node to its size so lookups do not depend on list position
size_by_node = dict(zip(G_influence_cc.nodes(), node_sizes))

# Get community colors (only for communities present in this component)
community_colors = {}
for node in G_influence_cc.nodes():
    community_id = partition.get(node, 0)
    if community_id not in community_colors:
        community_colors[community_id] = plt.cm.tab20(community_id % 20)

# Draw nodes colored by community
# Iterate over the colored communities only: communities outside the component have no color entry
for community_id, color in community_colors.items():
    nodes = [node for node in G_influence_cc.nodes() if partition.get(node, 0) == community_id]
    nx.draw_networkx_nodes(G_influence_cc, pos, nodelist=nodes, node_size=[size_by_node[node] for node in nodes],
                          node_color=[color] * len(nodes), alpha=0.8)

# Draw edges
nx.draw_networkx_edges(G_influence_cc, pos, width=edge_widths, alpha=0.5, edge_color='gray')

# Draw labels
nx.draw_networkx_labels(G_influence_cc, pos, labels={node: G_influence_cc.nodes[node]['name'] for node in G_influence_cc.nodes()})

plt.title('Player Influence Network with Communities', fontsize=16)
plt.axis('off')
plt.tight_layout()
plt.show()

### Interpreting Synergy Pairs and Clusters

Our analysis of synergy pairs and player clusters reveals several interesting patterns:

1. **Top Synergy Pairs**: The pairs with the highest synergy scores are players whose plus/minus values rise and fall together most consistently across shared games. These pairs may share complementary skills or strong on-court chemistry, although heavy overlap in minutes produces the same signal.

2. **Player Communities**: The community detection algorithm has identified natural groupings of players who have positive performance correlations. These communities often represent:
   - Players from the same team who frequently play together
   - Players with compatible playing styles
   - Players who have moved between teams but maintained connections

3. **Community Structure**: The overall community structure provides insights into the social and performance networks within the NBA, highlighting how player performance is interconnected across teams and playing styles.

These insights can inform lineup construction and player acquisition decisions by identifying players who are likely to work well together and create positive synergies.
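
To illustrate what community detection recovers, the sketch below builds a hypothetical network of two tightly connected groups joined by one weak link. It uses NetworkX's greedy modularity maximization rather than the Louvain method applied above, because it is deterministic and needs no extra package; the player labels and edge weights are invented.

```python
from itertools import combinations

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical players: two tightly correlated groups joined by one weak link.
group_a = ["A1", "A2", "A3", "A4"]
group_b = ["B1", "B2", "B3", "B4"]

G = nx.Graph()
G.add_edges_from(combinations(group_a, 2), weight=0.6)
G.add_edges_from(combinations(group_b, 2), weight=0.6)
G.add_edge("A4", "B1", weight=0.35)

# Each community is returned as a frozenset of node labels.
communities = greedy_modularity_communities(G, weight="weight")
for idx, members in enumerate(communities):
    print(idx, sorted(members))
```

The single cross-group edge is not enough to merge the groups, which mirrors how a traded player can link two team cores without collapsing them into one community.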

## Save Network Analysis Results

Let's save our network analysis results for use in the next notebook.

In [None]:
# Save network metrics
network_df.to_csv('../data/processed/network_metrics.csv', index=False)
print("Saved network metrics to ../data/processed/network_metrics.csv")

# Save synergy pairs
synergy_df.to_csv('../data/processed/synergy_pairs.csv', index=False)
print("Saved synergy pairs to ../data/processed/synergy_pairs.csv")

## Conclusion

In this notebook, we've applied network analysis to understand teammate interactions in the NBA. We've built teammate networks, calculated network metrics, and identified synergy pairs and player clusters.

Key accomplishments:
1. Built teammate frequency and influence networks based on game data
2. Visualized the network structure to identify key patterns and relationships
3. Calculated network metrics including centrality and influence measures
4. Identified synergy pairs with strong positive performance correlations
5. Detected player communities using community detection algorithms

These network insights complement traditional box-score statistics by describing relationships between players rather than individual production. In the next notebook (05b_player_impact.ipynb), we'll build on these network insights to develop a comprehensive player impact framework that integrates production, stability, adaptability, and network metrics.