## Objective

In this section, we load the raw data scraped from basketball-reference.com, then clean and transform it. We save the processed data and use R to perform the data analysis.

## Preparation

### Import libraries

In [None]:
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import re
import warnings
warnings.filterwarnings(action='ignore')

### Google Drive Connection

In [None]:
#connect with google drive to import raw data 
from google.colab import drive
import sys, os
# Mount google drive
drive.mount('/content/drive', force_remount=False)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Part B: Data Preparation and Description

### Dataset description
The data is taken from the website basketball-reference.com.
It contains information on all NBA players who were active from the 2009/2010 season through the 2020/2021 season. We define *the most current season* as the 2020/2021 season.



### Data Dictionary for raw data
The raw data consists of two files: seasons_data.csv and team_data.csv.

1. The `seasons_data.csv` dataset contains NBA player statistics. It includes 7490 observations and 38 attributes. The columns are defined as follows:

-   `Player` : Name of the player
-   `Year` : Year in which the season ends (e.g. 2010 for the 2009-10 season)
-   `Season` : Season played
-   `Draft` : Team that drafted the player. The NBA draft is held every year in June; it is where teams in the National Basketball Association (NBA) choose players who have never played in the NBA before. A drafted player cannot sign a contract with any team other than the one that drafted him.
-   `Rank` :  
-   `Experience` : Total years of experience 
-   `Height` : Height of the player
-   `Weight` : Weight of the player
-   `Pos`: Position  
      - PG Point Guard
      - SG Shooting Guard
      - SF Small Forward
      - PF Power Forward
      - C Center
-   `Age` : Age of Player
-   `Tm` : Team
-   `G` : Games
-   `GS` : Games Started
-   `MP` : Minutes Played (season total)
-   `FG` : Field Goals (season total)
-   `FGA` : Field Goal Attempts
-   `FG%` : Field Goal Percentage
-   `3P`: 3-Point Field Goals 
-   `3PA`: 3-Point Field Goal Attempts
-   `3P%`: 3-Point Field Goal Percentage
-   `2P` : 2-Point Field Goals 
-   `2PA` : 2-point Field Goal Attempts
-   `2P%` : 2-Point Field Goal Percentage
-   `eFG%` : Effective Field Goal Percentage (This statistic adjusts for the fact that a 3-point field goal is worth one more point than a 2-point field goal.)
-   `FT` : Free Throws 
-   `FTA` : Free Throw Attempts 
-   `FT%` : Free Throw Percentage
-   `ORB`: Offensive Rebounds 
-   `DRB` : Defensive Rebounds 
-   `TRB` : Total Rebounds 
-   `AST` : Assists 
-   `STL` : Steals 
-   `BLK` : Blocks 
-   `TOV`: Turnovers 
-   `PF` : Personal Fouls 
-   `PTS` : Points (season total)
-   `PER` : Player Efficiency Rating (A measure of per-minute production standardized such that the league average is 15)
-   `TS%`: True Shooting Percentage (A measure of shooting efficiency that takes into account 2-point field goals, 3-point field goals, and free throws)
-   `BPM`: Box Plus/Minus (A box score estimate of the points per 100 possessions a player contributed above a league-average player, translated to an average team)
-   `WS`: Win Shares (An estimate of the number of wins contributed by a player)
-   `Player_Career_Salary` : Player's career salary
-   `Player_Season_Salary` : Player's salary per season 

2. The `team_data.csv` dataset contains team statistics. It includes 360 observations and 44 attributes. The columns are defined as follows:

-   `team_id` : Unique Team ID 
-   `name` : Name of the team 
-   `location` : Location of the team
-   `other_names`: Other names of the team
-   `seasons` : Total number of seasons played by the team
-   `seasons_years` : First and last seasons played by the team (e.g. 1961-62 to 2021-22)
-   `tot_record` : Overall (all-time) win-loss record of the team
-   `tot_record_pct` : Win percentage of the overall record
-   `playoffs` : Total number of playoff appearances by the team
-   `championships` : Total championships won
-   `season_year` : Season (e.g. 2009-10)
-   `g` : Games
-   `mp` : Minutes Played (season total)
-   `fg` : Field Goals (season total)
-   `fga` : Field Goal Attempts (season total)
-   `fg_pct` : Field Goal Percentage
-   `fg3` : Team's 3-Point Field Goals
-   `fg3a` : Team's 3-Point Field Goal Attempts
-   `fg3_pct` : Team's 3-Point Field Goal Percentage
-   `fg2` : Team's 2-Point Field Goals
-   `fg2a` : Team's 2-Point Field Goal Attempts
-   `fg2_pct` : Team's 2-Point Field Goal Percentage
-   `ft` : Team's Free Throws
-   `fta` : Free Throw Attempts
-   `ft_pct` : Free Throw Percentage
-   `orb` : Offensive Rebounds
-   `drb` : Defensive Rebounds
-   `trb` : Total Rebounds
-   `ast` : Team's Assists
-   `stl` : Team's Steals
-   `blk` : Team's Blocks
-   `tov` : Turnovers
-   `pf` : Personal Fouls
-   `pts` : Points (season total)
-   `wins` : Total games won by team 
-   `losses` : Total games lost by team 
-   `off_rtg` : Offensive Rating
-   `def_rtg` : Defensive Rating
-   `arena_name` : Arena's name
-   `attendance` : The total attendance per season
-   `total_salary` : Total salary of team 
-   `avg_player_salary` : Average salary per team per season
-   `avg_team_age` : Average player's age per team per season 
-   `avg_team_exp` : Average player's experience per team per season 

### 1. Put all the relevant data variables into one dataframe.
Explain how you clean your dataset and transform your data variables, if any, and provide a data dictionary for all your variables.

In [None]:
# Load season statistics and team statistics into dataframes
season_stats = pd.read_csv('/content/drive/MyDrive/IS5126/Raw_Data/season_data.csv',encoding='latin')
team_stats = pd.read_csv('/content/drive/MyDrive/IS5126/Raw_Data/team_data.csv')

In [None]:
#check the number of observation and attributes
print("season_stats shape:", season_stats.shape)
print("team_stats shape:", team_stats.shape)

season_stats shape: (7490, 38)
team_stats shape: (360, 44)


In [None]:
#view season information 
season_stats.head(2)

Unnamed: 0,Player,Year,Season,Draft,Rank,Experience,Height,Weight,Pos,Age,...,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,Player_Career_Salary,Player_Season_Salary
0,Arron Afflalo,2010,2009-10,Detroit Pistons,26.0,11,6-5,210lb,SG,24,...,193,252,138,46,30,74,225,724,"$58,852,159","$1,086,240"
1,Alexis AjinÃ§a,2010,2009-10,Charlotte Bobcats,,7,7-2,248lb,C,21,...,3,4,0,1,1,2,5,10,"$24,244,516","$1,372,080"
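The name `Alexis AjinÃ§a` in the output above looks like UTF-8 text that was decoded as Latin-1 (mojibake), which is consistent with reading the file with `encoding='latin'`. Assuming the file really is UTF-8 (we have not checked every row), reading it with `encoding='utf-8'` would avoid the problem altogether; the sketch below instead repairs strings that are already loaded. `fix_mojibake` is a hypothetical helper, not a pandas function.

```python
import pandas as pd

def fix_mojibake(text):
    """Re-decode a string that was UTF-8 but was read as Latin-1.

    Hypothetical helper: returns the input unchanged if it is not a string
    or if the round trip fails (i.e. the text was not mojibake to begin with).
    """
    if not isinstance(text, str):
        return text
    try:
        return text.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

names = pd.Series(['Alexis AjinÃ§a', 'Arron Afflalo'])
print(names.map(fix_mojibake).tolist())
```

On the real data this would be applied as `season_stats['Player'] = season_stats['Player'].map(fix_mojibake)`. Correctly encoded names pass through unchanged because their Latin-1 bytes are not valid UTF-8.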


In [None]:
#view team information per season
team_stats.head(2)

Unnamed: 0,team_id,name,location,other_names,seasons,seasons_years,tot_record,tot_record_pct,playoffs,championships,...,wins,losses,off_rtg,def_rtg,arena_name,attendance,total_salary,avg_player_salary,avg_team_age,avg_team_exp
0,WAS,Washington Wizards,"Washington, District of Columbia","Washington Wizards, Washington Bullets, Capita...",61,1961-62 to 2021-22,2214-2686,0.452,30,1,...,26,56,104.2,109.4,Verizon Center,664398,73440274,4590017.12,27.21,4.62
1,UTA,Utah Jazz,"Salt Lake City, Utah","Utah Jazz, New Orleans Jazz",48,1974-75 to 2021-22,2097-1748,0.545,30,0,...,53,29,110.7,105.0,EnergySolutions Arena,794512,71905244,5992103.67,24.67,3.27


### Data Manipulation/Cleaning

#### Step 1 : Convert column data types


In [None]:
#check data type of column
season_stats.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7490 entries, 0 to 7489
Data columns (total 38 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Player                7490 non-null   object 
 1   Year                  7490 non-null   int64  
 2   Season                7490 non-null   object 
 3   Draft                 5952 non-null   object 
 4   Rank                  3900 non-null   float64
 5   Experience            7490 non-null   int64  
 6   Height                7490 non-null   object 
 7   Weight                7490 non-null   object 
 8   Pos                   7490 non-null   object 
 9   Age                   7490 non-null   int64  
 10  Tm                    7490 non-null   object 
 11  G                     7490 non-null   int64  
 12  GS                    7490 non-null   int64  
 13  MP                    7490 non-null   int64  
 14  FG                    7490 non-null   int64  
 15  FGA                  

Player salaries are stored as strings (e.g. "$58,852,159"), so we convert them to numeric values.

In [None]:
#Strip '$' and ',' and convert player career salary and player season salary to numbers (missing values stay NaN)
season_stats['Player_Career_Salary']=[int(re.sub(r'[^\d.]+', '', s)) if isinstance(s, str) else s for s in season_stats['Player_Career_Salary'].values]
season_stats['Player_Season_Salary']=[int(re.sub(r'[^\d.]+', '', s)) if isinstance(s, str) else s for s in season_stats['Player_Season_Salary'].values]
print(season_stats['Player_Career_Salary'].head(1))
print(season_stats['Player_Season_Salary'].head(1))

0    58852159
Name: Player_Career_Salary, dtype: int64
0    1086240.0
Name: Player_Season_Salary, dtype: float64
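The list comprehension above converts the salaries one value at a time. A vectorized sketch of the same conversion uses pandas string methods and `pd.to_numeric`; the helper name `parse_salary` and the sample values are illustrative only. Missing salaries stay `NaN`, which is why a column with gaps comes back as `float64`, matching the `dtype` printed above.

```python
import pandas as pd

def parse_salary(series):
    """Strip '$' and thousands separators, then convert to numbers.

    Hypothetical helper. Missing values stay NaN, so a column with gaps
    comes back as float64; a column without gaps comes back as int64.
    """
    cleaned = series.str.replace(r'[^\d.]', '', regex=True)
    return pd.to_numeric(cleaned)

salaries = pd.Series(['$58,852,159', '$1,086,240', None])
parsed = parse_salary(salaries)
print(parsed.tolist())
```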


Height (in feet-inches, e.g. "6-5") and weight (e.g. "210lb") are stored as strings, so we convert both to floats.

In [None]:
#Convert height from "feet-inches" (e.g. "6-5") to inches (float)
def convert_height_to_inches(height):
    feet, inches = height.split("-")
    height_in_inches = 12 * float(feet) + float(inches)
    return height_in_inches

In [None]:
season_stats['Height'] = season_stats['Height'].apply(convert_height_to_inches)
season_stats['Height'].head(1)

0    77.0
Name: Height, dtype: float64

In [None]:
#Convert weight (e.g. "210lb") to float pounds
def convert_weight(weight):
    weight_updated = weight.replace('lb', '')
    return float(weight_updated)

In [None]:
season_stats['Weight'] = season_stats['Weight'].apply(convert_weight)
season_stats['Weight'].head(1)

0    210.0
Name: Weight, dtype: float64
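The two helpers above are applied one value at a time with `apply`. A vectorized sketch, assuming every height is a `feet-inches` string and every weight ends in `lb` (as in the rows shown), converts whole columns at once; `height_to_inches` and `weight_to_pounds` are hypothetical names and the sample values are taken from the two rows displayed earlier.

```python
import pandas as pd

def height_to_inches(height):
    """Convert 'feet-inches' strings such as '6-5' to total inches."""
    parts = height.str.split('-', expand=True).astype(float)  # columns 0 = feet, 1 = inches
    return 12 * parts[0] + parts[1]

def weight_to_pounds(weight):
    """Convert strings such as '210lb' to float pounds."""
    return weight.str.replace('lb', '', regex=False).astype(float)

df = pd.DataFrame({'Height': ['6-5', '7-2'], 'Weight': ['210lb', '248lb']})
df['Height'] = height_to_inches(df['Height'])
df['Weight'] = weight_to_pounds(df['Weight'])
print(df)
```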

In [None]:
team_stats.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 360 entries, 0 to 359
Data columns (total 44 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   team_id            360 non-null    object 
 1   name               360 non-null    object 
 2   location           360 non-null    object 
 3   other_names        360 non-null    object 
 4   seasons            360 non-null    object 
 5   seasons_years      360 non-null    object 
 6   tot_record         360 non-null    object 
 7   tot_record_pct     360 non-null    float64
 8   playoffs           360 non-null    int64  
 9   championships      360 non-null    int64  
 10  season_year        360 non-null    object 
 11  g                  360 non-null    int64  
 12  mp                 360 non-null    int64  
 13  fg                 360 non-null    int64  
 14  fga                360 non-null    int64  
 15  fg_pct             360 non-null    float64
 16  fg3                360 non

#### Step 2 : Handle Missing observations

In [None]:
#check the null values count for each column 
print(season_stats.isnull().sum())

Player                     0
Year                       0
Season                     0
Draft                   1538
Rank                    3590
Experience                 0
Height                     0
Weight                     0
Pos                        0
Age                        0
Tm                         0
G                          0
GS                         0
MP                         0
FG                         0
FGA                        0
FG%                       38
3P                         0
3PA                        0
3P%                      859
2P                         0
2PA                        0
2P%                       84
eFG%                      38
FT                         0
FTA                        0
FT%                      365
ORB                        0
DRB                        0
TRB                        0
AST                        0
STL                        0
BLK                        0
TOV                        0
PF            

The following columns in the season_stats dataframe contain null values:

1) Draft

2) Rank

3) FG%, 2P%, 3P%, eFG% and FT%

4) Player_Season_Salary

Rank is only assigned on a scale of one to one hundred, so many players have no rank (3590 missing values). Rather than dropping these rows and losing numerous observations, we keep them and replace the missing ranks with 0.

There are 432 observations for which the player's season salary is missing. We found that basketball-reference.com does not store the salary of players earning less than the minimum salary. We therefore discard the observations in which the player's season salary is missing.

We consider the other variables with missing values (Draft and the shooting percentages) less important for our analysis. After the rows with a missing season salary are dropped, their remaining null values are replaced with 0 rather than deleted.


In [None]:
#drop observation where the player season salary is missing
season_stats.dropna(subset=['Player_Season_Salary'],inplace=True)

In [None]:
#verify for null value
season_stats[season_stats['Player_Season_Salary'].isna()]

Unnamed: 0,Player,Year,Season,Draft,Rank,Experience,Height,Weight,Pos,Age,...,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,Player_Career_Salary,Player_Season_Salary


In [None]:
#substitute missing value(null) with 0 
season_stats.fillna(0,inplace=True)
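The cleaning rule in this step (drop rows with a missing season salary, then fill the remaining gaps) can be checked on a small toy frame. This is an illustrative sketch with made-up values. It uses a per-column `fillna` dictionary, with an assumed `'Undrafted'` label for `Draft`, instead of a single `fillna(0)`, so that the text column does not end up mixing team names and zeros.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Player': ['A', 'B', 'C'],
    'Draft': ['Detroit Pistons', np.nan, np.nan],
    'Rank': [26.0, np.nan, np.nan],
    '3P%': [0.35, np.nan, 0.40],
    'Player_Season_Salary': [1086240.0, 1372080.0, np.nan],
})

# Step 1: drop observations whose season salary is missing (player C).
toy = toy.dropna(subset=['Player_Season_Salary'])

# Step 2: fill the remaining gaps column by column.
# 'Undrafted' is an assumed label; the notebook fills every column with 0.
toy = toy.fillna({'Draft': 'Undrafted', 'Rank': 0, '3P%': 0.0})
print(toy)
```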

In [None]:
print(team_stats.isnull().sum())

team_id              0
name                 0
location             0
other_names          0
seasons              0
seasons_years        0
tot_record           0
tot_record_pct       0
playoffs             0
championships        0
season_year          0
g                    0
mp                   0
fg                   0
fga                  0
fg_pct               0
fg3                  0
fg3a                 0
fg3_pct              0
fg2                  0
fg2a                 0
fg2_pct              0
ft                   0
fta                  0
ft_pct               0
orb                  0
drb                  0
trb                  0
ast                  0
stl                  0
blk                  0
tov                  0
pf                   0
pts                  0
wins                 0
losses               0
off_rtg              0
def_rtg              0
arena_name           0
attendance           0
total_salary         0
avg_player_salary    0
avg_team_age         0
avg_team_ex

There are no missing values in the team_stats dataframe.

#### Step 3 : Renaming column names

In [None]:
#rename season_stats column name 
season_stats.rename(columns = {'Season':'Season_Year', 'Pos':'Position','Tm':'Team_ID','G':'Games',
                               'MP': 'Mins_Played','FG':'Field_Goal','FGA':'FG_attempts','FG%':'FG_percent',
                               '3P':'3_Point_FG','3PA':'3P_attempts','3P%':'3P_percent', 
                               '2P':'2_Point_FG','2PA':'2P_attempts','2P%':'2P_percent', 
                               'FT':'Free_Throw','FTA':'FT_attempts','FT%':'FT_percent', 
                               'PTS':'Points', 'ORB':'Offensive_Rebounds','DRB':'Defensive_Rebounds','TRB':'Total_Rebounds',
                               'AST':'Assists','STL':'Steals','BLK':'Blocks',
                               'TOV':'Turnovers','PF':'Personal_Fouls'
                               }, inplace = True)
                              
season_stats.head(2)

Unnamed: 0,Player,Year,Season_Year,Draft,Rank,Experience,Height,Weight,Position,Age,...,Defensive_Rebounds,Total_Rebounds,Assists,Steals,Blocks,Turnovers,Personal_Fouls,Points,Player_Career_Salary,Player_Season_Salary
0,Arron Afflalo,2010,2009-10,Detroit Pistons,26.0,11,77.0,210.0,SG,24,...,193,252,138,46,30,74,225,724,58852159,1086240.0
1,Alexis AjinÃ§a,2010,2009-10,Charlotte Bobcats,0.0,7,86.0,248.0,C,21,...,3,4,0,1,1,2,5,10,24244516,1372080.0


#### Step 4 : Handle duplicates

In any given season, many players play for more than one team. For such players, the data contains one observation per team plus a combined observation for all the teams he played for.

Aaron Gordon, for example, played for two teams (ORL and DEN) in the 2020-21 season. He therefore has three observations: one for each team and one combined observation with `Team_ID` = TOT.

As a result, we create two dataframes from season_stats:
1. The first dataframe excludes the rows with Team_ID "TOT".
2. The second dataframe keeps only the "TOT" row for players who played for multiple teams in one season and drops their per-team observations. It is used in the panel analysis and the linear regression analysis, because panel analysis needs a unique key for each entity-time pair.


In [None]:
player_year_2021 = season_stats[season_stats['Year']==2021]
player_year_2021[player_year_2021['Player']=='Aaron Gordon']

Unnamed: 0,Player,Year,Season_Year,Draft,Rank,Experience,Height,Weight,Position,Age,...,Defensive_Rebounds,Total_Rebounds,Assists,Steals,Blocks,Turnovers,Personal_Fouls,Points,Player_Career_Salary,Player_Season_Salary
7016,Aaron Gordon,2021,2020-21,Orlando Magic,4.0,7,80.0,235.0,PF,25,...,207,284,161,33,34,97,89,618,77610369,18136364.0
7017,Aaron Gordon,2021,2020-21,Orlando Magic,4.0,7,80.0,235.0,PF,25,...,127,166,105,16,20,67,49,364,77610369,18136364.0
7018,Aaron Gordon,2021,2020-21,Orlando Magic,4.0,7,80.0,235.0,PF,25,...,80,118,56,17,14,30,40,254,77610369,18136364.0


In [None]:
# create DF retaining only Team_ID=TOT for each player who played for multiple teams during one season
# (relies on the TOT row coming first for each player-season, as it does for Aaron Gordon above)
season_unbalanced_data = season_stats.copy()
season_unbalanced_data.drop_duplicates(['Player','Season_Year'],keep= 'first',inplace=True)

In [None]:
# create DF without Team_ID=TOT
index_names = season_stats[ season_stats['Team_ID'] == 'TOT' ].index
season_stats.drop(index_names, inplace = True)
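`drop_duplicates(keep='first')` keeps the TOT row only if it comes first for every player-season. A sketch that does not depend on row order, using toy rows and a hypothetical helper `keep_combined_rows`: it flags the player-seasons that have a TOT row and keeps only that row for them, while players with a single team keep their only row.

```python
import pandas as pd

def keep_combined_rows(df):
    """Keep one row per player-season, preferring the combined 'TOT' row.

    Hypothetical helper. Unlike drop_duplicates(keep='first'), the result
    does not depend on the order of the rows.
    """
    is_tot = df['Team_ID'].eq('TOT')
    # True for every row of a player-season that has a TOT row somewhere
    has_tot = is_tot.groupby([df['Player'], df['Season_Year']]).transform('any')
    return df[is_tot | ~has_tot]

toy = pd.DataFrame({
    'Player': ['Player A', 'Player A', 'Player A', 'Player B'],
    'Season_Year': ['2020-21'] * 4,
    'Team_ID': ['ORL', 'TOT', 'DEN', 'GSW'],
})
result = keep_combined_rows(toy)
print(result)
```

The complementary dataframe without the TOT rows is simply `df[df['Team_ID'] != 'TOT']`.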

#### Step 5 : Combine dataframe for analysis

In this step, we combine key variables from the season_stats and team_stats dataframes for the analyses that need data from both.

In [None]:
#select columns from team_stats to merge with season_stats (copy to avoid modifying a slice)
team_stats_DA = team_stats[["team_id", "name","season_year","championships","pts","wins","losses"]].copy()

team_stats_DA.rename(columns = {'team_id': 'Team_ID','season_year':'Season_Year','championships':'Championships_by_team', 'pts':'Points_by_team',
                              'wins':'Wins','losses':'Losses'}, inplace = True)
team_stats_DA.head(2)

Unnamed: 0,Team_ID,name,Season_Year,Championships_by_team,Points_by_team,Wins,Losses
0,WAS,Washington Wizards,2009-10,1,7892,26,56
1,UTA,Utah Jazz,2009-10,0,8547,53,29


In [None]:
#combined seasons and teams
player_combined_stats= season_stats.merge(team_stats_DA, on=["Team_ID","Season_Year"])
print(player_combined_stats.shape)

(5878, 43)
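`merge` defaults to an inner join, so player rows whose (`Team_ID`, `Season_Year`) pair has no match in the team data are silently dropped, for example if a team abbreviation differed between the two sources (we have not checked whether this happens here). A sketch for auditing the join on toy data: a left join with `indicator=True` adds a `_merge` column, and `validate='many_to_one'` raises an error if a team-season appears more than once in the team table. The abbreviation `XYZ` is made up to produce an unmatched row.

```python
import pandas as pd

players = pd.DataFrame({
    'Player': ['P1', 'P2', 'P3'],
    'Team_ID': ['WAS', 'UTA', 'XYZ'],
    'Season_Year': ['2009-10'] * 3,
})
teams = pd.DataFrame({
    'Team_ID': ['WAS', 'UTA'],
    'Season_Year': ['2009-10', '2009-10'],
    'Wins': [26, 53],
})

# Left join keeps every player row; _merge is 'both' or 'left_only'.
check = players.merge(teams, on=['Team_ID', 'Season_Year'], how='left',
                      indicator=True, validate='many_to_one')
unmatched = check[check['_merge'] == 'left_only']
print(unmatched[['Player', 'Team_ID', 'Season_Year']])
```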


#### Step 6 : Calculate per game statistics for each player in the unbalanced dataset

To avoid scraping the per-game tables separately, we derive per-game statistics from the existing season totals for each player.

For the attributes used in further analysis, we compute the per-game value as season total / games played, rounded to one decimal place.


In [None]:
season_unbalanced_data_per_game = season_unbalanced_data.copy()

# Mins_Played
season_unbalanced_data_per_game['Mins_Played_per_game'] = round(season_unbalanced_data_per_game['Mins_Played']/season_unbalanced_data_per_game['Games'],1)
# Total_Rebounds
season_unbalanced_data_per_game['Total_Rebounds_per_game'] = round(season_unbalanced_data_per_game['Total_Rebounds']/season_unbalanced_data_per_game['Games'],1)
# Assists
season_unbalanced_data_per_game['Assists_per_game'] = round(season_unbalanced_data_per_game['Assists']/season_unbalanced_data_per_game['Games'],1)
# Steals
season_unbalanced_data_per_game['Steals_per_game'] = round(season_unbalanced_data_per_game['Steals']/season_unbalanced_data_per_game['Games'],1)
# Blocks
season_unbalanced_data_per_game['Blocks_per_game'] = round(season_unbalanced_data_per_game['Blocks']/season_unbalanced_data_per_game['Games'],1)
# Turnovers
season_unbalanced_data_per_game['Turnovers_per_game'] = round(season_unbalanced_data_per_game['Turnovers']/season_unbalanced_data_per_game['Games'],1)
# Personal_Fouls
season_unbalanced_data_per_game['Personal_Fouls_per_game'] = round(season_unbalanced_data_per_game['Personal_Fouls']/season_unbalanced_data_per_game['Games'],1)
# Points
season_unbalanced_data_per_game['Points_per_game'] = round(season_unbalanced_data_per_game['Points']/season_unbalanced_data_per_game['Games'],1)
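The eight per-game columns above follow the same formula, season total / games, rounded to one decimal place. A compact sketch of the same computation on toy numbers divides several columns by `Games` at once with `DataFrame.div(..., axis=0)`. A player with 0 games would produce `inf` or `NaN`; we assume every row has at least one game.

```python
import pandas as pd

toy = pd.DataFrame({
    'Games': [80, 10],
    'Points': [730, 10],
    'Assists': [150, 0],
})

totals = ['Points', 'Assists']  # season-total columns to convert
# Divide each row by its own game count, round, and name the new columns *_per_game
per_game = toy[totals].div(toy['Games'], axis=0).round(1).add_suffix('_per_game')
toy = toy.join(per_game)
print(toy)
```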

#### Step 7 : Create balanced panel dataframe

Both team_data and season_data are panel datasets. Panel data can be balanced or unbalanced. In a balanced panel, every panel member (cross-sectional unit) has measurements in all periods, i.e., each panel member is observed every year. If a balanced panel contains N panel members and T periods, the number of observations (n) in the dataset is necessarily n = N×T.

In an unbalanced panel, panel members have different numbers of observations, i.e., at least one panel member is not observed in every period. If an unbalanced panel contains N panel members and T periods, then the following strict inequality holds for the number of observations (n) in the dataset: n < N×T.
To create a balanced panel, we keep only the players who played in all 12 seasons (2009-10 to 2020-21).

In [None]:
#select players who played in all 12 seasons
season_balanced_panel = season_unbalanced_data_per_game.copy()
season_balanced_panel = season_balanced_panel.groupby('Player').filter(lambda x: len(x) == 12)
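The definitions of balanced and unbalanced panels can be checked directly with n = N×T. A sketch on toy data; as design choices of this sketch, it counts the periods with `nunique` instead of hard-coding 12, and counts distinct seasons per player rather than rows, which guards against a leftover duplicate player-season row.

```python
import pandas as pd

toy = pd.DataFrame({
    'Player': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
    'Season_Year': ['2018-19', '2019-20', '2020-21',
                    '2019-20', '2020-21',
                    '2018-19', '2019-20', '2020-21'],
})

T = toy['Season_Year'].nunique()  # number of periods
# Keep players observed in every period (player B is missing 2018-19)
balanced = toy.groupby('Player').filter(lambda g: g['Season_Year'].nunique() == T)

N = balanced['Player'].nunique()  # number of panel members
n = len(balanced)                 # number of observations
print(N, T, n)
```

Here the balanced subset satisfies n = N×T (6 = 2×3), while the full toy frame is unbalanced (8 < 3×3).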

#### Step 8 : Save all dataframes as CSV

We use R to perform the data analysis, so all dataframes are saved as CSV files.

In [None]:
#contains key variables of season_stats and team_stats
player_combined_stats.to_csv('/content/drive/MyDrive/IS5126/processed_data/player_combined_stats_updated.csv')

#transformed season_stats DF
season_stats.to_csv('/content/drive/MyDrive/IS5126/processed_data/seasons_stats_updated.csv')

#transformed team_stats DF
team_stats.to_csv('/content/drive/MyDrive/IS5126/processed_data/team_stats_updated.csv')

#balanced panel dataframe
season_balanced_panel.to_csv('/content/drive/MyDrive/IS5126/processed_data/season_balanced_panel_updated.csv')

#unbalanced dataframe
season_unbalanced_data.to_csv('/content/drive/MyDrive/IS5126/processed_data/season_unbalanced_data_updated.csv')

#unbalanced dataframe with per game stats
season_unbalanced_data_per_game.to_csv('/content/drive/MyDrive/IS5126/processed_data/season_unbalanced_per_game_data_updated.csv')

### 2. For the most current season in the dataset: (4 points)

  2a. How many active players are there?

 




In [None]:
# Get players for season 2020-21
player_year_2021 = season_stats[season_stats['Year']==2021]
# A player can appear for multiple teams in the same season, so count unique names to avoid duplicates
print('Number of active players in the current season:',player_year_2021['Player'].nunique())

# Cross-check using Season_Year instead of Year
player_season_2020_21 = season_stats[season_stats['Season_Year']=='2020-21']
print('Number of active players in the current season:',player_season_2020_21['Player'].nunique())

Number of active players in the current season: 481
Number of active players in the current season: 481



- 2a) There are 481 active players in the 2020-21 season.

 2b. How many players in each position?

In [None]:
#2b Number of players for each position 
print(player_year_2021.groupby('Position')[['Player']].nunique())

          Player
Position        
C             94
PF           113
PG            87
SF            88
SG           115


- 2b) The table above shows the number of players in each position in 2020-21, where:
  - PG Point Guard
  - SG Shooting Guard
  - SF Small Forward
  - PF Power Forward
  - C Center.
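The per-position counts above add up to 497 (94 + 113 + 87 + 88 + 115), while there are 481 distinct active players, so some players are counted under more than one position. One likely reason is that each row of `player_year_2021` belongs to a single team, and a traded player can be listed at a different position for each team. A sketch for finding such players, on toy rows:

```python
import pandas as pd

toy = pd.DataFrame({
    'Player': ['P1', 'P1', 'P2', 'P3'],
    'Team_ID': ['ORL', 'DEN', 'GSW', 'LAL'],
    'Position': ['PF', 'SF', 'PG', 'C'],
})

# Each player is counted once per position he is listed at
per_position = toy.groupby('Position')['Player'].nunique()
# Players listed at more than one position in the season
positions_per_player = toy.groupby('Player')['Position'].nunique()
multi_position = positions_per_player[positions_per_player > 1]

print(per_position.sum(), toy['Player'].nunique())
print(multi_position)
```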





2c. what is the average age, weight, experience, salary in the season?




In any given season, many players play for more than one team, so such players have one observation per team (the combined TOT rows were removed from season_stats in Step 4).

Aaron Gordon, for example, played for two teams in the 2020-21 season: ORL and DEN.

As a result, we drop duplicate players (in a temporary dataframe) before performing aggregate operations.

In [None]:
player_year_2021[player_year_2021['Player']=='Aaron Gordon']

Unnamed: 0,Player,Year,Season_Year,Draft,Rank,Experience,Height,Weight,Position,Age,...,Defensive_Rebounds,Total_Rebounds,Assists,Steals,Blocks,Turnovers,Personal_Fouls,Points,Player_Career_Salary,Player_Season_Salary
7017,Aaron Gordon,2021,2020-21,Orlando Magic,4.0,7,80.0,235.0,PF,25,...,127,166,105,16,20,67,49,364,77610369,18136364.0
7018,Aaron Gordon,2021,2020-21,Orlando Magic,4.0,7,80.0,235.0,PF,25,...,80,118,56,17,14,30,40,254,77610369,18136364.0


In [None]:
player_avg_2021 = player_year_2021.copy()
player_avg_2021 = player_avg_2021.drop_duplicates('Player')

# Average age in season 2020-21
avg_age = player_avg_2021['Age'].mean()
print('Average age of players in the active season is {:.2f} years '.format(avg_age))

#Average weight in season 2020-21
avg_weight = player_avg_2021['Weight'].mean()
print('Average weight of players in the active season is {:.2f} lb'.format(avg_weight))

#Average experience in season 2020-21
avg_exp = player_avg_2021['Experience'].mean()
print('Average experience of players in the active season is {:.2f} years'.format(avg_exp))

# Average salary in season 2020-21
avg_sal = player_avg_2021['Player_Season_Salary'].mean()
print('Average salary of players in the active season is ${:.2f}'.format(avg_sal))

Average age of players in the active season is 25.86 years 
Average weight of players in the active season is 217.84 lb
Average experience of players in the active season is 5.42 years
Average salary of players in the active season is $7461400.62


Average age of players in the active season is 25.86 years

Average weight of players in the active season is 217.84 lb

Average experience of players in the active season is 5.42 years

Average salary of players in the active season is $7461400.62

2d. What is the average career salary?

In [None]:
avg_c_sal = player_avg_2021['Player_Career_Salary'].mean()
print('Average career salary of players active in the current season is ${:.2f}'.format(avg_c_sal))

Average career salary of players active in the current season is $38525320.21


Average career salary of players active in the current season is $38525320.21

### 3. More descriptive statistics on salaries: (5 points)

  3a(i). how many players were active in each season? 

  

In [None]:
# Number of players active per season
print(season_stats.groupby('Season_Year')[['Player']].nunique())

             Player
Season_Year        
2009-10         434
2010-11         445
2011-12         447
2012-13         449
2013-14         385
2014-15         485
2015-16         476
2016-17         484
2017-18         483
2018-19         482
2019-20         480
2020-21         481


In [None]:
# Cross-check: grouping by Year should give the same counts
print("Number of players per season")
print(season_stats.groupby('Year')[['Player']].nunique())

Number of players per season
      Player
Year        
2010     434
2011     445
2012     447
2013     449
2014     385
2015     485
2016     476
2017     484
2018     483
2019     482
2020     480
2021     481


  3a(ii). What is the average salary by season?
How about the variance of salary by season?

In [None]:
# Average Salary by season
season_stats_avg = season_stats.copy()
# Keep one row per player per season: a traded player has one row per team, each with the same season salary
season_stats_avg = season_stats_avg.drop_duplicates(['Player', 'Season_Year'])
print("Average Salary per season")
print(season_stats_avg.groupby('Season_Year')[['Player_Season_Salary']].mean())
print("Variance of Salary per season")
print(season_stats_avg.groupby('Season_Year')[['Player_Season_Salary']].var())

Average Salary per season
             Player_Season_Salary
Season_Year                      
2009-10              4.542692e+06
2010-11              1.598152e+06
2011-12              1.082372e+06
2012-13              1.279599e+06
2013-14              2.121178e+06
2014-15              1.079574e+06
2015-16              1.363413e+06
2016-17              1.281229e+06
2017-18              1.681998e+06
2018-19              1.609515e+06
2019-20              1.835243e+06
2020-21              2.206113e+06
Variance of Salary per season
             Player_Season_Salary
Season_Year                      
2009-10              2.235539e+13
2010-11              5.318920e+12
2011-12              1.204142e+12
2012-13              1.297810e+12
2013-14              1.378882e+12
2014-15              1.393054e+12
2015-16              1.579053e+12
2016-17              1.707684e+12
2017-18              3.475538e+12
2018-19              2.828833e+12
2019-20              4.068773e+12
2020-21              4.384
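When a traded player has one row per team in a season, his salary should be counted once per player and season before averaging. Deduplicating on the player name alone keeps only each player's first row in the whole dataset, which drops all his later seasons. The toy example below (made-up values) contrasts the two keys.

```python
import pandas as pd

toy = pd.DataFrame({
    'Player': ['A', 'A', 'A', 'B'],
    'Season_Year': ['2019-20', '2020-21', '2020-21', '2020-21'],
    'Team_ID': ['ORL', 'ORL', 'DEN', 'GSW'],
    'Player_Season_Salary': [10.0, 20.0, 20.0, 40.0],
})

# Dedup on the name alone keeps only each player's first row overall.
by_player = toy.drop_duplicates('Player')
# Dedup on (Player, Season_Year) keeps one row per player per season.
by_player_season = toy.drop_duplicates(['Player', 'Season_Year'])

mean_by_player = by_player.groupby('Season_Year')['Player_Season_Salary'].mean()
mean_by_player_season = by_player_season.groupby('Season_Year')['Player_Season_Salary'].mean()
print(mean_by_player)
print(mean_by_player_season)
```

Player A's 2020-21 salary is lost with the first key, so the 2020-21 average is 40 instead of 30.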

 3b. who are the top 10% best paid players in the most current season? Which teams did
these players play for?


There are 481 players in the 2020-21 season, so the top 10% corresponds to the 48 highest-paid players (10% of 481 ≈ 48).
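A sketch of the same selection that derives the count from the data rather than hard-coding 48, on toy salaries. `nlargest`/`nsmallest` return the n rows with the highest/lowest salary. With the default `keep='first'`, ties at the cutoff are broken by row order; `keep='all'` would include every tied player.

```python
import pandas as pd

salaries = pd.DataFrame({
    'Player': [f'P{i}' for i in range(1, 21)],
    'Player_Season_Salary': [float(s) for s in range(1_000_000, 21_000_000, 1_000_000)],
})

n_top = int(round(0.10 * len(salaries)))  # 481 players would give 48
top = salaries.nlargest(n_top, 'Player_Season_Salary')
bottom = salaries.nsmallest(n_top, 'Player_Season_Salary')
print(top)
print(bottom)
```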

In [None]:
player_sal_2021=player_year_2021[['Player','Team_ID','Player_Season_Salary']].copy()
# Keep one row per player; a traded player is listed only under the team of his first row
player_sal_2021 = player_sal_2021.drop_duplicates('Player')
top_48_player = player_sal_2021.sort_values(['Player_Season_Salary'], ascending=False).head(48)
print(top_48_player)
print(top_48_player.groupby('Team_ID')[['Player']].nunique())


                     Player Team_ID  Player_Season_Salary
6936          Stephen Curry     GSW            43006362.0
7295             Chris Paul     PHO            41358814.0
7460      Russell Westbrook     WAS            41358814.0
7044           James Harden     HOU            41254920.0
7450              John Wall     HOU            41254920.0
6968           Kevin Durant     BRK            40108950.0
7114           LeBron James     LAL            39219566.0
7008            Paul George     LAC            35450412.0
6918            Mike Conley     UTA            34504132.0
7448           Kemba Walker     BOS            34379100.0
7175          Kawhi Leonard     LAC            34379100.0
6891           Jimmy Butler     MIA            34379100.0
7056          Tobias Harris     PHI            34358850.0
7102           Kyrie Irving     BRK            33460350.0
7233        Khris Middleton     MIL            33051724.0
6938          Anthony Davis     LAL            32742000.0
7181         D


  3c. who are the bottom 10% best paid players in the most current season? Which teams
did these players play for?


There are 481 players in the 2020-21 season, so the bottom 10% by salary corresponds to the 48 lowest-paid players.

In [None]:
bottom_48_player = player_sal_2021.sort_values(['Player_Season_Salary'], ascending=True).head(48)
print(bottom_48_player)
print(bottom_48_player.groupby('Team_ID')[['Player']].nunique())

                       Player Team_ID  Player_Season_Salary
6886            Elijah Bryant     MIL               24611.0
7337         Cameron Reynolds     SAS               33299.0
7280           Cameron Oliver     HOU               43070.0
6987              Malik Fitts     LAC               61528.0
7136              Mason Jones     HOU               61528.0
6897            Devin Cannady     ORL               61528.0
7037               Donta Hall     ORL               99020.0
7300             Norvel Pelle     BRK               99020.0
7049             Jared Harper     NYK               99020.0
6924               Tyler Cook     BRK               99020.0
6984             Yogi Ferrell     CLE              118983.0
7199            Naji Marshall     NOP              120000.0
7186             Didi Louzada     NOP              123056.0
6867         Ignas Brazdeikis     NYK              148530.0
7407            Isaiah Thomas     NOP              158907.0
6947           Mamadi Diakite     MIL   


  3d. who are the middle 50% by salary? Which teams did they play for?


Players whose salaries lie between the 25th and 75th percentiles (the first and third quartiles) make up the middle 50% by salary.
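A toy check of the quartile cutoffs: `Series.quantile` uses linear interpolation by default, and `between` is inclusive on both ends, so the middle 50% is the set of values from the 0.25 quantile to the 0.75 quantile.

```python
import pandas as pd

salaries = pd.Series([float(s) for s in range(1, 9)])  # toy values 1..8
q1 = salaries.quantile(0.25)  # first quartile
q3 = salaries.quantile(0.75)  # third quartile
middle = salaries[salaries.between(q1, q3)]
print(q1, q3, middle.tolist())
```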

In [None]:
player_sal_25 = player_sal_2021['Player_Season_Salary'].quantile(0.25)
player_sal_75 = player_sal_2021['Player_Season_Salary'].quantile(0.75)
middle_50_pct_player = player_sal_2021[player_sal_2021['Player_Season_Salary'].between(player_sal_25, player_sal_75)]
print(middle_50_pct_player.sort_values(['Player_Season_Salary'], ascending=True))

                  Player Team_ID  Player_Season_Salary
7252      Svi Mykhailiuk     DET             1663861.0
7434   Jarred Vanderbilt     MIN             1663861.0
7020     Devonte' Graham     CHO             1663861.0
7425      Gary Trent Jr.     POR             1663861.0
6949      Hamidou Diallo     OKC             1663861.0
...                  ...     ...                   ...
7213      T.J. McConnell     IND             3500000.0
7462       Derrick White     SAS             3516284.0
6863        Tony Bradley     PHI             3542060.0
7312  Michael Porter Jr.     DEN             3550800.0
7162          Kyle Kuzma     LAL             3562178.0

[125 rows x 3 columns]
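A small check of the quartile filter on made-up numbers: with eight evenly spaced salaries, the values between the first and third quartiles are exactly the middle four. `Series.quantile` interpolates linearly by default, and `Series.between` includes both endpoints.

```python
import pandas as pd

# Made-up salaries for 8 players
sal = pd.DataFrame({
    "Player": list("ABCDEFGH"),
    "Player_Season_Salary": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})

q25 = sal["Player_Season_Salary"].quantile(0.25)  # first quartile
q75 = sal["Player_Season_Salary"].quantile(0.75)  # third quartile

# Rows with q25 <= salary <= q75
middle = sal[sal["Player_Season_Salary"].between(q25, q75)]
print(q25, q75)
print(middle)
```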



  3e. Over the careers of the players active in the most current season, how much money were they paid, by season?

There are 481 players in the current season; the total paid to them in 2020-21 was 3,588,933,700 dollars.

In [None]:
print(player_sal_2021['Player_Season_Salary'].sum())

3588933700.0
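The cell above adds up 2020-21 salaries only. To answer the question per player and per season across each active player's career, one option is to keep every season for the players who appear in 2020-21 and pivot. A minimal sketch with made-up values, assuming one row per player per season in a frame shaped like `season_stats` (the `Player`, `Season_Year` and `Player_Season_Salary` names come from `season_stats.columns`):

```python
import pandas as pd

# Made-up long-format data shaped like season_stats
season_stats = pd.DataFrame({
    "Player": ["A", "A", "A", "B", "B", "C"],
    "Season_Year": ["2018-19", "2019-20", "2020-21", "2019-20", "2020-21", "2018-19"],
    "Player_Season_Salary": [1.0e6, 1.5e6, 2.0e6, 3.0e6, 3.5e6, 9.0e6],
})

# Players who appear in the most current season
active = season_stats.loc[season_stats["Season_Year"] == "2020-21", "Player"].unique()
career = season_stats[season_stats["Player"].isin(active)]

# One row per player, one column per season, salary as the cell value
by_season = career.pivot_table(index="Player", columns="Season_Year",
                               values="Player_Season_Salary", aggfunc="sum")
# Career total across all seasons for each active player
career_total = career.groupby("Player")["Player_Season_Salary"].sum()
print(by_season)
print(career_total)
```

Seasons a player did not play show up as NaN in `by_season`. If the real data contained several rows per player-season (for example after a mid-season trade), `aggfunc="sum"` would add them, so the one-row assumption should be checked first.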


###4. Team-player statistics: (3 points)

  4a. What is the average salary of each team by season?


In [None]:
print('Average salary of each team by season')
# print(team_stats.groupby('season_year')[['total_salary']].mean())
team_stats[['team_id','season_year','avg_player_salary']]

Average salary of each team by season


| | team_id | season_year | avg_player_salary |
|---|---|---|---|
| 0 | WAS | 2009-10 | 4590017.12 |
| 1 | UTA | 2009-10 | 5992103.67 |
| 2 | TOR | 2009-10 | 4238902.06 |
| 3 | SAS | 2009-10 | 5667022.07 |
| 4 | SAC | 2009-10 | 3553458.84 |
| ... | ... | ... | ... |
| 355 | LAC | 2016-17 | 6457641.17 |
| 356 | LAC | 2017-18 | 5417192.09 |
| 357 | LAC | 2018-19 | 4734828.08 |
| 358 | LAC | 2019-20 | 7441216.44 |
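
The `avg_player_salary` column is read from `team_stats`, built earlier in the notebook. As a cross-check it can be recomputed from player-level rows. A sketch with made-up numbers, assuming the average is the plain mean of player salaries per team-season (the notebook's own definition may differ, for example if payroll is divided by a different roster count):

```python
import pandas as pd

# Made-up player-level rows
season_stats = pd.DataFrame({
    "Team_ID": ["WAS", "WAS", "UTA", "UTA", "WAS"],
    "Season_Year": ["2009-10", "2009-10", "2009-10", "2009-10", "2010-11"],
    "Player_Season_Salary": [4.0e6, 6.0e6, 5.0e6, 7.0e6, 3.0e6],
})

# Mean salary per team-season; selecting the column first aggregates only that column
avg_salary = (season_stats
              .groupby(["Team_ID", "Season_Year"])["Player_Season_Salary"]
              .mean()
              .rename("avg_player_salary")
              .reset_index())
print(avg_salary)
```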



  4b. What is the average age of the players by season? What are the average and variance of experience by season for each team?

In [None]:
season_stats.columns

Index(['Player', 'Year', 'Season_Year', 'Draft', 'Rank', 'Experience',
       'Height', 'Weight', 'Position', 'Age', 'Team_ID', 'Games', 'GS',
       'Mins_Played', 'Field_Goal', 'FG_attempts', 'FG_percent', '3_Point_FG',
       '3P_attempts', '3P_percent', '2_Point_FG', '2P_attempts', '2P_percent',
       'eFG%', 'Free_Throw', 'FT_attempts', 'FT_percent', 'Offensive_Rebounds',
       'Defensive_Rebounds', 'Total_Rebounds', 'Assists', 'Steals', 'Blocks',
       'Turnovers', 'Personal_Fouls', 'Points', 'Player_Career_Salary',
       'Player_Season_Salary'],
      dtype='object')

In [None]:
season_stats.groupby(['Team_ID', 'Season_Year'])['Experience'].mean()

Team_ID  Season_Year
ATL      2009-10        12.071429
         2010-11        11.470588
         2011-12        12.812500
         2012-13        11.000000
         2013-14        10.769231
                          ...    
WAS      2016-17         7.222222
         2017-18         8.785714
         2018-19         7.708333
         2019-20         5.190476
         2020-21         5.052632
Name: Experience, Length: 360, dtype: float64

In [None]:
season_stats.groupby(['Team_ID', 'Season_Year'])['Experience'].var()

Team_ID  Season_Year
ATL      2009-10        28.994505
         2010-11        22.514706
         2011-12        17.629167
         2012-13        19.058824
         2013-14        23.525641
                          ...    
WAS      2016-17        11.477124
         2017-18         7.258242
         2018-19        16.998188
         2019-20        15.261905
         2020-21        15.052632
Name: Experience, Length: 360, dtype: float64

In [None]:
season_stats.groupby(['Team_ID', 'Season_Year'])['Age'].mean()

Team_ID  Season_Year
ATL      2009-10        26.571429
         2010-11        27.470588
         2011-12        29.312500
         2012-13        26.888889
         2013-14        26.153846
                          ...    
WAS      2016-17        25.666667
         2017-18        27.500000
         2018-19        26.750000
         2019-20        25.428571
         2020-21        25.473684
Name: Age, Length: 360, dtype: float64
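Selecting the column before aggregating, as in `groupby(...)['Experience'].mean()`, avoids a `TypeError` on non-numeric columns such as `Player` in pandas 2.x. The three statistics above can also be computed in one pass with named aggregation. A sketch on made-up data; `var` is the sample variance (ddof=1), the same default as in the cells above:

```python
import pandas as pd

# Made-up rows for two teams in one season
season_stats = pd.DataFrame({
    "Team_ID": ["ATL", "ATL", "ATL", "BOS", "BOS"],
    "Season_Year": ["2009-10"] * 5,
    "Player": ["a", "b", "c", "d", "e"],  # non-numeric column, not aggregated
    "Age": [25, 27, 29, 30, 24],
    "Experience": [3, 5, 10, 8, 2],
})

# One row per team-season with average age, average experience and its variance
summary = (season_stats
           .groupby(["Team_ID", "Season_Year"])
           .agg(avg_age=("Age", "mean"),
                avg_exp=("Experience", "mean"),
                var_exp=("Experience", "var")))
print(summary)
```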

In [None]:
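# Grouping team_stats by team name averages each team's values over all seasons (one value per team)
print('Average age of each team by season')
print(team_stats.groupby('name')[['avg_team_age']].mean())
print('\n')
print('Average experience of each team by season')
print(team_stats.groupby('name')[['avg_team_exp']].mean())
print('\n')
# Variance across seasons of each team's season-average experience (not the within-season variance)
print('Variance of experience of team by season')
print(team_stats.groupby('name')[['avg_team_exp']].var())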

Average age of each team by season
                        avg_team_age
name                                
Atlanta Hawks              26.757500
Boston Celtics             26.150000
Brooklyn Nets              26.489167
Charlotte Hornets          25.915000
Chicago Bulls              26.497500
Cleveland Cavaliers        26.676667
Dallas Mavericks           27.987500
Denver Nuggets             25.766667
Detroit Pistons            26.160000
Golden State Warriors      26.400833
Houston Rockets            26.425833
Indiana Pacers             26.202500
Los Angeles Clippers       27.717500
Los Angeles Lakers         27.265000
Memphis Grizzlies          26.145833
Miami Heat                 28.087500
Milwaukee Bucks            26.400000
Minnesota Timberwolves     25.585000
New Orleans Pelicans       26.045000
New York Knicks            26.628333
Oklahoma City Thunder      25.698333
Orlando Magic              25.782500
Philadelphia 76ers         25.318333
Phoenix Suns               26.045833
Por


  4c. Provide the information in 4b in a "cross-tabulation" format, i.e. teams are on rows, seasons are on columns, and statistics are cell values.

In [None]:
pd.crosstab(season_stats.Team_ID, season_stats.Season_Year,values = season_stats.Experience, aggfunc = 'mean')

| Team_ID / Season_Year | 2009-10 | 2010-11 | 2011-12 | 2012-13 | 2013-14 | 2014-15 | 2015-16 | 2016-17 | 2017-18 | 2018-19 | 2019-20 | 2020-21 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ATL | 12.071429 | 11.470588 | 12.8125 | 11.0 | 10.769231 | 10.0 | 9.823529 | 9.35 | 5.65 | 5.333333 | 6.05 | 5.9375 |
| BOS | 11.166667 | 11.238095 | 12.0625 | 10.263158 | 10.428571 | 9.136364 | 7.5625 | 8.2 | 6.111111 | 6.733333 | 4.733333 | 4.842105 |
| BRK | | | | 9.25 | 11.0 | 8.611111 | 7.411765 | 6.190476 | 6.473684 | 5.888889 | 7.0 | 6.84 |
| CHA | 11.315789 | 10.473684 | 8.933333 | 8.5 | 9.466667 | | | | | | | |
| CHI | 9.944444 | 12.066667 | 11.642857 | 11.642857 | 10.1875 | 10.928571 | 9.6875 | 7.666667 | 5.631579 | 5.368421 | 4.8125 | 5.4 |
| CHO | | | | | | 8.647059 | 8.294118 | 7.578947 | 7.933333 | 8.266667 | 5.5625 | 3.933333 |
| CLE | 9.222222 | 7.421053 | 7.157895 | 8.666667 | 9.428571 | 10.7 | 10.722222 | 11.095238 | 11.25 | 7.521739 | 5.1 | 5.272727 |
| DAL | 12.944444 | 12.944444 | 13.0 | 11.105263 | 12.153846 | 11.684211 | 11.125 | 6.958333 | 6.95 | 7.894737 | 6.352941 | 5.647059 |
| DEN | 12.384615 | 11.947368 | 10.875 | 9.8 | 8.875 | 8.55 | 7.736842 | 8.052632 | 8.375 | 6.5 | 5.45 | 5.25 |
| DET | 10.5 | 10.769231 | 8.857143 | 9.2 | 9.333333 | 9.05 | 8.882353 | 7.8 | 7.6 | 7.888889 | 6.15 | 5.1 |


In [None]:
pd.crosstab(season_stats.Team_ID, season_stats.Season_Year,values = season_stats.Experience, aggfunc = 'var')

| Team_ID / Season_Year | 2009-10 | 2010-11 | 2011-12 | 2012-13 | 2013-14 | 2014-15 | 2015-16 | 2016-17 | 2017-18 | 2018-19 | 2019-20 | 2020-21 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ATL | 28.994505 | 22.514706 | 17.629167 | 19.058824 | 23.525641 | 20.4 | 17.904412 | 25.818421 | 12.871053 | 21.733333 | 22.997368 | 22.329167 |
| BOS | 34.617647 | 32.290476 | 27.929167 | 37.204678 | 10.571429 | 13.361472 | 12.529167 | 13.028571 | 12.810458 | 12.780952 | 10.352381 | 11.251462 |
| BRK | | | | 29.533333 | 35.5 | 31.898693 | 22.007353 | 8.261905 | 5.374269 | 12.575163 | 20.2 | 17.306667 |
| CHA | 20.22807 | 18.040936 | 14.495238 | 9.676471 | 8.838095 | | | | | | | |
| CHI | 18.761438 | 9.066667 | 7.324176 | 13.631868 | 23.895833 | 20.686813 | 16.7625 | 16.235294 | 6.356725 | 6.912281 | 8.9625 | 13.094737 |
| CHO | | | | | | 12.367647 | 14.720588 | 14.923977 | 18.066667 | 19.780952 | 17.195833 | 10.780952 |
| CLE | 30.535948 | 19.701754 | 20.02924 | 15.176471 | 14.417582 | 23.168421 | 22.212418 | 24.990476 | 18.197368 | 18.715415 | 11.463158 | 15.255411 |
| DAL | 20.996732 | 24.408497 | 45.272727 | 33.877193 | 32.807692 | 25.894737 | 20.783333 | 27.519928 | 27.944737 | 28.210526 | 9.742647 | 13.617647 |
| DEN | 18.423077 | 19.385965 | 18.516667 | 18.6 | 13.183333 | 11.102632 | 17.538012 | 13.052632 | 18.383333 | 12.533333 | 9.313158 | 15.460526 |
| DET | 9.961538 | 13.025641 | 17.208791 | 11.742857 | 11.878788 | 10.997368 | 8.735294 | 9.885714 | 12.147368 | 14.928105 | 10.555263 | 14.2 |


In [None]:
pd.crosstab(season_stats.Team_ID, season_stats.Season_Year,values = season_stats.Age, aggfunc = 'mean')

| Team_ID / Season_Year | 2009-10 | 2010-11 | 2011-12 | 2012-13 | 2013-14 | 2014-15 | 2015-16 | 2016-17 | 2017-18 | 2018-19 | 2019-20 | 2020-21 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ATL | 26.571429 | 27.470588 | 29.3125 | 26.888889 | 26.153846 | 27.0 | 27.470588 | 28.2 | 25.45 | 25.142857 | 25.95 | 26.0 |
| BOS | 28.166667 | 27.904762 | 28.4375 | 27.368421 | 26.714286 | 26.181818 | 24.5 | 25.266667 | 25.555556 | 25.866667 | 24.8 | 25.105263 |
| BRK | | | | 27.5 | 28.705882 | 27.777778 | 26.0 | 25.809524 | 25.473684 | 25.277778 | 26.571429 | 27.36 |
| CHA | 27.263158 | 26.842105 | 25.866667 | 26.222222 | 26.466667 | | | | | | | |
| CHI | 26.944444 | 27.466667 | 28.0 | 28.285714 | 27.125 | 28.285714 | 27.5625 | 25.888889 | 24.894737 | 24.105263 | 24.375 | 25.9 |
| CHO | | | | | | 26.117647 | 25.941176 | 25.789474 | 25.6 | 26.133333 | 24.9375 | 24.533333 |
| CLE | 27.055556 | 26.210526 | 25.263158 | 24.722222 | 24.785714 | 27.95 | 28.833333 | 30.047619 | 29.3 | 26.391304 | 24.75 | 25.818182 |
| DAL | 29.333333 | 28.444444 | 30.75 | 29.315789 | 29.0 | 28.578947 | 29.1875 | 26.75 | 27.15 | 27.736842 | 27.294118 | 27.176471 |
| DEN | 27.615385 | 27.210526 | 25.875 | 24.733333 | 26.0 | 25.85 | 25.526316 | 25.473684 | 26.0 | 25.125 | 25.0 | 25.65 |
| DET | 26.857143 | 27.230769 | 27.071429 | 25.933333 | 26.166667 | 27.0 | 25.882353 | 25.466667 | 26.45 | 26.333333 | 25.9 | 24.9 |
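
`pd.crosstab` with `values` and `aggfunc` builds one statistic at a time. `pd.pivot_table` accepts lists for both `values` and `aggfunc`, so the mean and variance tables for age and experience can come from a single call whose columns are a (statistic, variable, season) MultiIndex. A sketch on made-up data:

```python
import pandas as pd

# Made-up rows: two teams, two seasons
season_stats = pd.DataFrame({
    "Team_ID": ["ATL", "ATL", "ATL", "BOS", "BOS", "BOS"],
    "Season_Year": ["2009-10", "2009-10", "2010-11", "2009-10", "2010-11", "2010-11"],
    "Experience": [3, 5, 7, 8, 2, 4],
    "Age": [25, 27, 29, 30, 24, 26],
})

# Teams on rows; (statistic, variable, season) on columns
xtab = pd.pivot_table(season_stats, index="Team_ID", columns="Season_Year",
                      values=["Experience", "Age"], aggfunc=["mean", "var"])
print(xtab)
```

A team-season with a single player gets NaN variance, because pandas uses the sample variance (ddof=1).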


###Explanation

###5. What other data from the website can you use to explain salary? 
Produce a table of summary statistics for key variables you shall use in your analysis in the following Part C.
Summary statistics should include at least sample average, standard deviation, min/max.
(5 points)

####Determinants of NBA Player Salaries
According to our preliminary data analysis, players in the National Basketball Association (NBA) are paid handsomely. A player's pay in the NBA can be influenced by a variety of factors. Some variables may have a positive impact on salary, while others may have a negative impact or have no impact at all.
In this project, we will look at how well the independent variables listed below explain our dependent variable (a player's salary per season).

- `Experience`: *Number of years the player has played in the NBA*
- `Height` : *Height of the player (in inches) at the start of the season*
- `Weight`: *Weight of the player (in pounds) at the start of the season*
- `Position`: *Players in a basketball game have assigned basketball positions: center, power forward, small forward, point guard, and shooting guard*
- `Age`: *Age of the player at the start of the season*
- `Games`: *Total number of games played in a season*
- `Points` : *In the game of basketball, points are scored any time a player puts the ball through the basket. This can be done with a variety of shots: free throws, 2-pointers, 3-pointers and more.*
- `field_goal` : *A field goal is a basket scored on any shot or tap other than a free throw, worth two or three points depending on the distance of the attempt from the basket*
- `three_point_field_goals` : *A field goal made from beyond the three-point line, the arc painted on the court; a successful shot from there earns three points*
- `free_throw`: *An unguarded shot taken from the free throw line as the result of a foul, worth one point*
- `rebounds` :  *A rebound is when a player retrieves the basketball directly after a missed shot. If a player on offense grabs the ball after a missed shot, it is an offensive rebound; if a defensive player retrieves the ball, it is a defensive rebound.*
- `assists` : *A pass to a teammate that leads directly to a field goal*
- `steals` : *A steal occurs when a defensive player legally causes a turnover by the offense, for example by taking the ball away from an opponent*
- `blocks`: *A blocked shot: a defender legally deflects an opponent's field goal attempt*
- `turnovers`: *A player loses possession of the ball to the other team before attempting a shot*
- `personal_fouls` : *A foul that involves illegal physical contact such as blocking, charging, elbowing or holding*




In [None]:
# Summary statistics of the key independent variables
season_stats_summary = season_unbalanced_data_per_game[['Experience', 'Height', 'Weight', 'Age', 'Games', 'Field_Goal', '3_Point_FG', '2_Point_FG', 'Mins_Played_per_game', 'Total_Rebounds_per_game', 'Assists_per_game', 'Steals_per_game', 'Blocks_per_game', 'Turnovers_per_game', 'Personal_Fouls_per_game', 'Points_per_game']]
season_stats_summary.describe()

| | Experience | Height | Weight | Age | Games | Field_Goal | 3_Point_FG | 2_Point_FG | Mins_Played_per_game | Total_Rebounds_per_game | Assists_per_game | Steals_per_game | Blocks_per_game | Turnovers_per_game | Personal_Fouls_per_game | Points_per_game |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 5531.0 | 5531.0 | 5531.0 | 5531.0 | 5531.0 | 5531.0 | 5531.0 | 5531.0 | 5531.0 | 5531.0 | 5531.0 | 5531.0 | 5531.0 | 5531.0 | 5531.0 | 5531.0 |
| mean | 8.557404 | 78.799675 | 221.35979 | 26.400651 | 52.724101 | 196.559031 | 45.09058 | 151.468451 | 20.717845 | 3.696764 | 1.91186 | 0.650154 | 0.41911 | 1.176442 | 1.785464 | 8.700506 |
| std | 4.600152 | 3.46332 | 26.385527 | 4.261187 | 23.30002 | 165.293546 | 52.939328 | 140.060555 | 9.190563 | 2.459086 | 1.811226 | 0.420107 | 0.436215 | 0.790648 | 0.739299 | 5.915214 |
| min | 1.0 | 65.0 | 135.0 | 19.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 5.0 | 76.0 | 200.0 | 23.0 | 36.0 | 59.0 | 1.0 | 40.0 | 13.4 | 1.9 | 0.7 | 0.3 | 0.1 | 0.6 | 1.3 | 4.2 |
| 50% | 8.0 | 79.0 | 220.0 | 26.0 | 59.0 | 161.0 | 25.0 | 112.0 | 20.5 | 3.1 | 1.3 | 0.6 | 0.3 | 1.0 | 1.8 | 7.3 |
| 75% | 12.0 | 81.0 | 240.0 | 29.0 | 72.0 | 293.5 | 74.0 | 219.0 | 28.35 | 4.8 | 2.5 | 0.9 | 0.5 | 1.6 | 2.3 | 12.0 |
| max | 22.0 | 90.0 | 360.0 | 43.0 | 83.0 | 857.0 | 402.0 | 734.0 | 42.0 | 16.0 | 11.7 | 2.5 | 3.7 | 5.7 | 6.0 | 36.1 |
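
The assignment asks for the mean, standard deviation and min/max; `describe()` already includes them next to the count and quartiles. If a compact table with exactly those statistics is preferred, `DataFrame.agg` with a list of function names produces it. A sketch on made-up data; in practice the same call would run on `season_stats_summary`:

```python
import pandas as pd

# Made-up values for two of the variables
df = pd.DataFrame({
    "Experience": [1, 5, 9],
    "Points_per_game": [2.0, 8.0, 14.0],
})

# One row per variable, one column per statistic (std is the sample standard deviation)
summary_table = df.agg(["mean", "std", "min", "max"]).T
print(summary_table)
```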


In [None]:
# Correlation matrix of the dependent variable Player_Season_Salary and the independent variables
season_stats_corr = season_unbalanced_data_per_game[['Player_Season_Salary', 'Experience', 'Height', 'Weight', 'Age', 'Games', 'Field_Goal', '3_Point_FG', '2_Point_FG', 'Mins_Played_per_game', 'Total_Rebounds_per_game', 'Assists_per_game', 'Steals_per_game', 'Blocks_per_game', 'Turnovers_per_game', 'Personal_Fouls_per_game', 'Points_per_game']]
season_stats_corr.corr()

| | Player_Season_Salary | Experience | Height | Weight | Age | Games | Field_Goal | 3_Point_FG | 2_Point_FG | Mins_Played_per_game | Total_Rebounds_per_game | Assists_per_game | Steals_per_game | Blocks_per_game | Turnovers_per_game | Personal_Fouls_per_game | Points_per_game |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Player_Season_Salary | 1.0 | 0.401215 | 0.05261 | 0.110903 | 0.271388 | 0.260295 | 0.555668 | 0.361998 | 0.518949 | 0.555893 | 0.482605 | 0.472918 | 0.413928 | 0.277398 | 0.542375 | 0.343084 | 0.633309 |
| Experience | 0.401215 | 1.0 | 0.023627 | 0.09325 | 0.666218 | 0.36225 | 0.375891 | 0.207852 | 0.365048 | 0.427548 | 0.294458 | 0.283575 | 0.299192 | 0.17002 | 0.314805 | 0.28865 | 0.342847 |
| Height | 0.05261 | 0.023627 | 1.0 | 0.806781 | 0.005562 | -0.033266 | -0.045531 | -0.315778 | 0.065623 | -0.111712 | 0.420889 | -0.456448 | -0.275335 | 0.493793 | -0.18895 | 0.199609 | -0.099587 |
| Weight | 0.110903 | 0.09325 | 0.806781 | 1.0 | 0.035729 | 0.021868 | 0.037031 | -0.290008 | 0.153318 | -0.02962 | 0.478735 | -0.35066 | -0.200719 | 0.47409 | -0.083203 | 0.268443 | -0.014574 |
| Age | 0.271388 | 0.666218 | 0.005562 | 0.035729 | 1.0 | 0.030184 | 0.002429 | 0.07704 | -0.026253 | 0.066514 | 0.018765 | 0.084055 | 0.010714 | -0.035426 | 0.001401 | 0.014954 | 0.013517 |
| Games | 0.260295 | 0.36225 | -0.033266 | 0.021868 | 0.030184 | 1.0 | 0.716253 | 0.49498 | 0.658202 | 0.627009 | 0.427868 | 0.336682 | 0.448797 | 0.266633 | 0.408475 | 0.486223 | 0.499219 |
| Field_Goal | 0.555668 | 0.375891 | -0.045531 | 0.037031 | 0.002429 | 0.716253 | 1.0 | 0.600396 | 0.953223 | 0.84865 | 0.614053 | 0.592284 | 0.628897 | 0.346878 | 0.752233 | 0.558441 | 0.91208 |
| 3_Point_FG | 0.361998 | 0.207852 | -0.315778 | -0.290008 | 0.07704 | 0.49498 | 0.600396 | 1.0 | 0.330588 | 0.584347 | 0.078086 | 0.469081 | 0.465849 | -0.098198 | 0.43011 | 0.2338 | 0.612315 |
| 2_Point_FG | 0.518949 | 0.365048 | 0.065623 | 0.153318 | -0.026253 | 0.658202 | 0.953223 | 0.330588 | 1.0 | 0.780673 | 0.695165 | 0.521687 | 0.566119 | 0.446487 | 0.725183 | 0.570678 | 0.844959 |


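We see that Points_per_game (r ≈ 0.63), Mins_Played_per_game (0.56), Field_Goal (0.56), Turnovers_per_game (0.54), 2_Point_FG (0.52), Total_Rebounds_per_game (0.48) and Assists_per_game (0.47) have the strongest positive correlations with Player_Season_Salary, while Height (0.05) and Weight (0.11) are almost uncorrelated with it.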
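
Instead of scanning the full matrix, the salary column can be pulled out of `corr()` and sorted, which gives the ranking above directly. A sketch on made-up data; `DataFrame.corr` defaults to Pearson correlation, which only measures linear association:

```python
import pandas as pd

# Made-up data: points rise linearly with salary, height barely relates to it
df = pd.DataFrame({
    "Player_Season_Salary": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Points_per_game": [2.0, 4.0, 6.0, 8.0, 10.0],
    "Height": [80.0, 78.0, 81.0, 79.0, 80.0],
})

# Correlation of every variable with salary, strongest first, salary itself removed
corr_with_salary = (df.corr()["Player_Season_Salary"]
                    .drop("Player_Season_Salary")
                    .sort_values(ascending=False))
print(corr_with_salary)
```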