## **Analyzing "awards_players"**

We can analyze this dataset by correlating player and coach performance statistics with award outcomes, to identify the factors most associated with winning. Using a **horizontal bar plot** of the correlation coefficients, we can quickly see which variables are most strongly associated with each award.

#### Performance Correlation to Winning an Award

- **All-Star Game Most Valuable Player:**  
  Analyze all players' statistics from the All-Star Game to determine performance factors contributing to MVP selection.

- **Coach of the Year:**  
  Evaluate all coaching statistics to identify performance indicators correlated with earning this award.

- **Defensive Player of the Year:**  
  Analyze players' defensive statistics (e.g., steals, blocks, defensive rebounds, and defensive efficiency) to assess their impact on winning this award.

- **Kim Perrot Sportsmanship Award:**  
  Examine players' foul statistics and conduct-related data to evaluate correlations with sportsmanship recognition.

- **Most Improved Player:**  
  Compare a player's performance metrics from the previous season to the current season to measure improvement trends.

- **Most Valuable Player:**  
  Analyze comprehensive regular-season player statistics to identify key factors influencing MVP selection.

- **Rookie of the Year:**  
  Evaluate rookie player statistics to determine which performance metrics are most predictive of winning this award.

- **Sixth Woman of the Year:**  
  Analyze players' performance statistics along with the percentage of games played off the bench (i.e., not started) to assess their eligibility and impact.

- **WNBA Finals Most Valuable Player:**  
  Analyze player performance statistics from the postseason and Finals series to determine the strongest correlates of Finals MVP selection.

- **WNBA All-Decade Team:**  
  Examine team and player performance data over the past ten years to identify consistent excellence leading to All-Decade recognition.

- **WNBA All-Decade Team Honorable Mention:**  
  Criteria and analysis parameters are yet to be defined.
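The correlations discussed in this notebook relate a numeric statistic to a yes/no outcome (won the award or not). The implementation inside `apd` is not shown here, so the following is only a minimal sketch of one common way to do it: a Pearson correlation against a 0/1 winner indicator (the point-biserial correlation). The toy values and the names `ppg` and `won_award` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: one entry per player-season.
ppg = [18.2, 9.5, 12.1, 21.4, 7.3, 15.0]
won_award = [0, 0, 0, 1, 0, 0]  # 1 = won the award that season

# Positive when winners tend to have higher values of the statistic.
r = pearson(ppg, won_award)
print(f"PPG vs. award: {r:+.3f}")
```

Because only one player per season wins each award, the indicator is almost entirely zeros, which tends to keep these coefficients small even for statistics that matter.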


### **Introduction to the Dataset**

This section provides a brief analysis of the dataset, highlighting its key metrics and characteristics. Additionally, it explores interesting variables to uncover patterns and insights.

In [187]:
import importlib
import os
import sys
from pathlib import Path

# Make the project root importable from the notebook directory.
sys.path.append('..')

from data_scripts import _store_data as sd
import data_scripts.awards_players_data as apd

# Pick up edits to the module without restarting the kernel.
importlib.reload(apd)

sd.load_data(Path("../data"))
apd.load_dataset()

# Per-column summary: null counts, dtypes, and unique values.
display(sd.df_info_table(sd.awards_players_df))

# Ten most-awarded players and coaches.
top_players = sd.awards_players_df['playerID'].value_counts().head(10).reset_index()
top_players.columns = ['playerID', 'award_count']
display(top_players)

# Number of times each award appears in the dataset.
unique_awards = sd.awards_players_df['award'].value_counts().reset_index()
unique_awards.columns = ['award', 'award_count']
display(unique_awards)


| Column | Non-Null Count | Null Count | Missing % | Dtype | Unique Values |
|---|---|---|---|---|---|
| playerID | 95 | 0 | 0.0 | object | 58 |
| award | 95 | 0 | 0.0 | object | 12 |
| year | 95 | 0 | 0.0 | int64 | 10 |
| lgID | 95 | 0 | 0.0 | object | 1 |


| | playerID | award_count |
|---|---|---|
| 0 | leslili01w | 10 |
| 1 | swoopsh01w | 8 |
| 2 | catchta01w | 5 |
| 3 | jacksla01w | 4 |
| 4 | tauradi01w | 4 |
| 5 | thompti01w | 2 |
| 6 | thibami99w | 2 |
| 7 | mcconsu01w | 2 |
| 8 | hugheda99w | 2 |
| 9 | fordch01w | 2 |


| | award | award_count |
|---|---|---|
| 0 | Coach of the Year | 10 |
| 1 | Defensive Player of the Year | 10 |
| 2 | Most Valuable Player | 10 |
| 3 | Rookie of the Year | 10 |
| 4 | WNBA All-Decade Team | 10 |
| 5 | WNBA Finals Most Valuable Player | 10 |
| 6 | Kim Perrot Sportsmanship Award | 9 |
| 7 | Most Improved Player | 9 |
| 8 | All-Star Game Most Valuable Player | 8 |
| 9 | WNBA All Decade Team Honorable Mention | 5 |


This dataset contains **$95$ awards** for **$58$ unique players and coaches** across **$10$ different years** and **$12$ award types**. The data is complete, with no missing values in the **playerID**, **award**, **year** or **lgID** columns.

The results table shows a highly skewed distribution of awards. The top player, **leslili01w**, leads with **$10$ awards**, while the second-highest player, **swoopsh01w**, has **$8$ awards**. This indicates that a small number of players account for a large share of the total awards given.
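To put a number on that concentration, the counts from the top-10 table can be turned into a share of all 95 awards. This is a small sketch using only the values printed above; the helper name `top_k_share` is made up for illustration.

```python
# Award counts copied from the top-10 table above.
top_counts = {
    "leslili01w": 10, "swoopsh01w": 8, "catchta01w": 5,
    "jacksla01w": 4, "tauradi01w": 4, "thompti01w": 2,
    "thibami99w": 2, "mcconsu01w": 2, "hugheda99w": 2, "fordch01w": 2,
}
TOTAL_AWARDS = 95      # rows in the dataset
TOTAL_RECIPIENTS = 58  # unique playerID values


def top_k_share(counts, k, total):
    """Fraction of all awards held by the k most-awarded recipients."""
    top = sorted(counts.values(), reverse=True)[:k]
    return sum(top) / total


share_top2 = top_k_share(top_counts, 2, TOTAL_AWARDS)
share_top5 = top_k_share(top_counts, 5, TOTAL_AWARDS)
print(f"Top 2 recipients hold {share_top2:.1%} of awards")
print(f"Top 5 recipients ({5 / TOTAL_RECIPIENTS:.1%} of recipients) hold {share_top5:.1%}")
```

With these numbers, the five most-awarded recipients hold 31 of the 95 awards, roughly a third of the total.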

In the last table, it is possible to see that one award name is very similar to another: **Kim Perrot Sportsmanship Award** and **Kim Perrot Sportsmanship**. Since these refer to the same award, the name **Kim Perrot Sportsmanship** will be standardized to **Kim Perrot Sportsmanship Award** before performing the analysis. Several awards also have fewer than 10 entries (for example, **All-Star Game Most Valuable Player** has 8 and **Most Improved Player** has 9), which suggests that they are missing for some of the 10 years.
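Near-duplicate labels like these can also be flagged automatically rather than spotted by eye. The sketch below uses the standard library's `difflib.SequenceMatcher`; the `0.85` threshold and the helper name `similar_pairs` are assumptions chosen for illustration, and the label list is a small sample rather than the full column.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_pairs(labels, threshold=0.85):
    """Return (a, b, ratio) for every pair of distinct labels that look alike."""
    pairs = []
    for a, b in combinations(sorted(set(labels)), 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 3)))
    return pairs


# Sample of award labels; the two Kim Perrot variants are the ones named above.
awards = [
    "Kim Perrot Sportsmanship Award",
    "Kim Perrot Sportsmanship",
    "Most Valuable Player",
    "Rookie of the Year",
    "Coach of the Year",
]

pairs = similar_pairs(awards)
for a, b, ratio in pairs:
    print(f"{a!r} ~ {b!r} (similarity {ratio})")
```

A check like this only proposes candidates. Whether two labels really denote the same award still needs a manual decision, since legitimately different awards can share most of their wording.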



### **Cleaning**

#### Fixing Awards Error

In [188]:
sd.awards_players_df['award'] = sd.awards_players_df['award'].replace(
    "Kim Perrot Sportsmanship", "Kim Perrot Sportsmanship Award"
)

With this adjustment, the name will also be updated in the stored dataset to ensure consistency before conducting the analysis.

#### Dropping Columns with a Single Unique Value

In [189]:
del sd.awards_players_df["lgID"]

Since the `lgID` column contains only one unique value, it can be removed as it will not affect the analysis of the dataset.
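The same rule can be applied generically, so that any constant column is found without naming it by hand. This is a minimal sketch on a hypothetical three-row frame; the `"WNBA"` value for `lgID` is an assumption, since the notebook only reports that the column has one unique value.

```python
import pandas as pd

# Hypothetical miniature of the awards table.
df = pd.DataFrame({
    "playerID": ["leslili01w", "swoopsh01w", "catchta01w"],
    "award": ["Most Valuable Player", "Defensive Player of the Year", "Rookie of the Year"],
    "year": [2, 3, 3],
    "lgID": ["WNBA", "WNBA", "WNBA"],  # assumed value; constant in every row
})

# A column with at most one distinct value carries no information.
constant_cols = [col for col in df.columns if df[col].nunique(dropna=False) <= 1]

# drop() returns a new frame, leaving the original untouched.
cleaned = df.drop(columns=constant_cols)
print(constant_cols)
print(list(cleaned.columns))
```

Unlike `del`, which mutates the stored frame in place, `drop` returns a copy, which can be convenient when the original needs to stay available for comparison.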

### **EDA**

#### Correlation between Performance and Awards

In [190]:
apd.asgmvp_analyze()


📊 Correlation Analysis: All-Star Game MVP - Correlation with Winning Award
  PPG                 : +0.143  |  Points per game
  RPG                 : +0.143  |  Rebounds per game
  MPG                 : +0.116  |  Minutes per game
  APG                 : +0.093  |  Assists per game
  TS%                 : +0.060  |  True shooting percentage
  3P%                 : +0.046  |  Three-point percentage
  FG%                 : +0.044  |  Field goal percentage
  FT%                 : +0.034  |  Free throw percentage
--------------------------------------------------------------------------------


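This plot shows that the stats most correlated with winning the **All-Star Game MVP** are **points** and **rebounds per game** (both +0.143), followed by **minutes** (+0.116) and **assists per game** (+0.093). All of these correlations are weak, so they indicate a tendency rather than a decisive factor.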

In [191]:
apd.coy_analyze()


📊 Correlation Analysis: Coach of the Year - Correlation with Winning Award
  won                 : +0.252  |  Games won
  Win%                : +0.183  |  Winning percentage
  PostWin%            : +0.076  |  Postseason win percentage
  post_wins           : +0.028  |  Playoff games won
  lost                : -0.096  |  Games lost
  post_losses         : -0.156  |  Playoff games lost
--------------------------------------------------------------------------------


This plot shows that **more wins** (+0.252) and a **higher win percentage** (+0.183) are the factors most positively associated with winning **Coach of the Year**. **Regular-season losses** are negatively correlated (-0.096), and **postseason losses** even more so (-0.156).


In [192]:
apd.dpoy_analyze()


📊 Correlation Analysis: Defensive Player of the Year - Correlation with Winning Award
  SPG                 : +0.218  |  Steals per game
  DRPG                : +0.167  |  Defensive rebounds per game
  BPG                 : +0.158  |  Blocks per game
  TOV/G               : +0.112  |  Turnovers per game
--------------------------------------------------------------------------------


Here, we can see that the defensive stat most strongly correlated with winning **Defensive Player of the Year** is **steals per game** (+0.218), noticeably ahead of **defensive rebounds per game** (+0.167) and **blocks per game** (+0.158).


In [193]:
apd.fmvp_analyze()


📊 Correlation Analysis: Finals MVP - Correlation with Winning Award
  PostPPG             : +0.235  |  Postseason points per game
  PostRPG             : +0.123  |  Postseason rebounds per game
  PostAPG             : +0.113  |  Postseason assists per game
  PostFG%             : +0.037  |  Postseason field goal percentage
--------------------------------------------------------------------------------


As shown in this plot, **postseason points per game** (+0.235) is the dominant factor associated with winning the **Finals MVP**: its correlation is roughly equal to the sum of the second and third strongest (+0.123 and +0.113).


In [194]:
apd.kpsw_analyze()


📊 Correlation Analysis: Kim Perrot Sportsmanship Award - Correlation with Winning Award
  PF/G                : +0.006  |  Personal fouls per game
  DQ/G                : -0.009  |  Disqualifications per game
--------------------------------------------------------------------------------


For the **Kim Perrot Sportsmanship Award**, **personal fouls** and **disqualifications per game** show essentially no correlation (+0.006 and -0.009) with winning the award.


In [195]:
apd.mip_analyze()


📊 Correlation Analysis: Most Improved Player - Correlation with Winning Award
  PPG_Improvement     : +0.063  |  Increase in points per game from previous season
  RPG_Improvement     : +0.063  |  Increase in rebounds per game from previous season
  APG_Improvement     : +0.036  |  Increase in assists per game from previous season
--------------------------------------------------------------------------------


As can be seen, for the **Most Improved Player (MIP)** award the correlations are small, but larger year-over-year gains in **points**, **rebounds**, and **assists per game** are associated with a slightly higher chance of winning the award.
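The improvement features are described as the increase over the previous season. How `apd.mip_analyze()` derives them is not shown, so the following is only a sketch of one way to compute such a column with pandas, using hypothetical player IDs and values.

```python
import pandas as pd

# Hypothetical per-season averages for two players.
stats = pd.DataFrame({
    "playerID": ["a01", "a01", "a01", "b01", "b01"],
    "year": [1, 2, 3, 1, 2],
    "PPG": [8.0, 12.5, 12.0, 15.0, 14.0],
})

# Order seasons chronologically within each player, then take the
# difference from the previous row of the same player.
stats = stats.sort_values(["playerID", "year"]).reset_index(drop=True)
stats["PPG_Improvement"] = stats.groupby("playerID")["PPG"].diff()

print(stats)
```

A player's first recorded season has no previous row, so its improvement is `NaN`. Note also that `diff()` compares adjacent rows, so a player who skipped a season would be compared against the last season played rather than the previous calendar year.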


In [196]:
apd.mvp_analyze()


📊 Correlation Analysis: Most Valuable Player - Correlation with Winning Award
  PPG                 : +0.206  |  Points per game
  RPG                 : +0.177  |  Rebounds per game
  APG                 : +0.094  |  Assists per game
  TS%                 : +0.059  |  True shooting percentage
--------------------------------------------------------------------------------


Similar to the **Finals MVP (FMVP)**, a player with better **points per game (PPG)** and **rebounds per game (RPG)** has a higher chance of winning the **regular season MVP** award.


In [197]:
apd.roty_analyze()


📊 Correlation Analysis: Rookie of the Year - Correlation with Winning Award
  PPG                 : +0.302  |  Points per game
  RPG                 : +0.252  |  Rebounds per game
  SPG                 : +0.215  |  Steals per game
  APG                 : +0.198  |  Assists per game
  BPG                 : +0.140  |  Blocks per game
--------------------------------------------------------------------------------


As expected, the **Rookie of the Year** is a player who exceeds expectations as a newcomer, posting better stats than other rookies in **points per game (PPG)**, **rebounds per game (RPG)**, **steals per game (SPG)**, **assists per game (APG)**, and **blocks per game (BPG)**.

Since the WNBA holds a draft to select rookies, where the first picks are generally assigned to the teams with the worst records in the previous season, we can analyze whether a team's previous-year rank correlates with producing the Rookie of the Year. In some cases, the worst teams may trade their top draft picks to other teams, allowing a strong team to acquire a high draft pick. However, we do not have information about such trades in our dataset.

In [198]:
apd.roty_rank_of_team()

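| | playerID | year | tmID | prev_year_team_rank |
|---|---|---|---|---|
| 7 | stileja01w | 2 | POR | 7 |
| 1 | catchta01w | 3 | IND | 6 |
| 2 | fordch01w | 4 | DET | 8 |
| 8 | tauradi01w | 5 | PHO | 7 |
| 3 | johnste01w | 6 | WAS | 4 |
| 0 | augusse01w | 7 | MIN | 6 |
| 6 | pricear01w | 8 | CHI | 7 |
| 5 | parkeca01w | 9 | LAS | 7 |
| 4 | mccouan01w | 10 | ATL | 7 |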


As we can see, only one Rookie of the Year (playerID johnste01w, year 6) came from a team that reached the playoffs the previous season (rank 4), which supports the idea that ROTY winners are more likely to come from teams that ranked poorly in the previous season. This suggests that a team's poor performance in the prior year is a useful predictor of which team will have the Rookie of the Year, likely because such teams receive higher draft picks. Additionally, when a team enters the WNBA, it is considered to have finished in last place the previous year, which also gives it access to the top draft picks.
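A previous-year rank column like the one in the table can be built by shifting the team table forward one season and joining it to the winners. The internals of `apd.roty_rank_of_team()` are not shown, so this is only a sketch: the `teams` frame and its `rank` column are assumed shapes, and the third row (`rookie_x`, `NEW`) is a hypothetical expansion-team case.

```python
import pandas as pd

# Two winners taken from the table above, plus one hypothetical rookie
# on a hypothetical expansion team with no prior season.
roty = pd.DataFrame({
    "playerID": ["catchta01w", "johnste01w", "rookie_x"],
    "year": [3, 6, 7],
    "tmID": ["IND", "WAS", "NEW"],
})

# Assumed shape of a team-season table: one rank per team and year.
teams = pd.DataFrame({
    "tmID": ["IND", "WAS"],
    "year": [2, 5],
    "rank": [6, 4],
})

# Relabel each team-season with the following year, so that joining on
# (tmID, year) attaches the rank from the season before.
prev = teams.assign(year=teams["year"] + 1).rename(
    columns={"rank": "prev_year_team_rank"}
)
merged = roty.merge(prev, on=["tmID", "year"], how="left")
print(merged)
```

The left join keeps every winner, and teams without a prior season come out as `NaN`. Those rows would then need the last-place convention described above applied explicitly.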

For both the **WNBA All-Decade Team** and the **WNBA All-Decade Team Honorable Mention**, since a standard basketball lineup is typically composed of **1 center**, **2 guards**, and **2 forwards**, we will use the only year available in the data to examine whether the teams selected for these awards appear to follow this positional structure.

In [199]:
apd.analyse_all_decade_team_positions()

All-Decade Team Members:


| | playerID | pos |
|---|---|---|
| 0 | birdsu01w | G |
| 1 | catchta01w | F |
| 2 | coopecy01w | G |
| 3 | griffyo01w | C-F |
| 4 | jacksla01w | F-C |
| 5 | leslili01w | C |
| 6 | smithka01w | G-F |
| 7 | staleda01w | G |
| 8 | swoopsh01w | F-G |
| 9 | thompti01w | F |



Honorable Players:


| | playerID | pos |
|---|---|---|
| 0 | boltoru01w | G |
| 1 | holdsch01w | F |
| 2 | penicti01w | G |
| 3 | tauradi01w | F-G |
| 4 | weathte01w | G |


As we can see, if we adjust players with multiple positions to fit a base lineup, the **WNBA All-Decade Team** ends up with **2 centers**, **4 forwards**, and **4 guards**, which roughly maintains a standard positional balance. That said, we cannot be 100% certain that this structure is always strictly followed.

On the other hand, the **WNBA All-Decade Team Honorable Mention** includes **no centers**, and while it does not concentrate on a single position, it also does not follow an ideal base lineup. Under the same assumptions, this group would consist of **2 forwards** and **3 guards**. Since these selections are honorable mentions, it is likely that positional balance was not a strict criterion.

These are only hypotheses, and with the available information we cannot determine with certainty whether fixed positional bases were consistently used when selecting these teams.
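The counts quoted above can be reproduced from the two position tables if each multi-position player is assigned to the first position listed (for example, `C-F` counts as a center). That rule is an assumption: it matches the totals in the text, but the notebook does not state how the adjustment was made.

```python
from collections import Counter

# Positions copied from the two tables above, in row order.
all_decade = ["G", "F", "G", "C-F", "F-C", "C", "G-F", "G", "F-G", "F"]
honorable = ["G", "F", "G", "F-G", "G"]


def primary_position_counts(positions):
    """Count players by the first position listed in each entry."""
    return Counter(pos.split("-")[0] for pos in positions)


all_decade_counts = primary_position_counts(all_decade)
honorable_counts = primary_position_counts(honorable)

print(dict(all_decade_counts))
print(dict(honorable_counts))
```

Under this rule the All-Decade Team splits into two full 1-center, 2-forward, 2-guard lineups, while the honorable mentions have no center at all.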

Now we will see if there are players who won two or more awards in a single year and whether we can take that into consideration for predicting awards.

In [200]:
apd.players_multiple_awards_filtered()

| | playerID | year | award | num_awards | multiple_awards |
|---|---|---|---|---|---|
| 0 | swoopsh01w | 1 | [Defensive Player of the Year, Most Valuable Player] | 2 | True |
| 1 | leslili01w | 2 | [All-Star Game Most Valuable Player, Most Valuable Player, WNBA Finals Most Valuable Player] | 3 | True |
| 2 | leslili01w | 3 | [All-Star Game Most Valuable Player, WNBA Finals Most Valuable Player] | 2 | True |
| 3 | swoopsh01w | 3 | [Defensive Player of the Year, Most Valuable Player] | 2 | True |
| 4 | leslili01w | 5 | [Defensive Player of the Year, Most Valuable Player] | 2 | True |
| 5 | swoopsh01w | 6 | [All-Star Game Most Valuable Player, Most Valuable Player] | 2 | True |
| 6 | jacksla01w | 8 | [Defensive Player of the Year, Most Valuable Player] | 2 | True |
| 7 | parkeca01w | 9 | [Most Valuable Player, Rookie of the Year] | 2 | True |
| 8 | tauradi01w | 10 | [Most Valuable Player, WNBA Finals Most Valuable Player] | 2 | True |


With this table, we can notice some interesting patterns regarding the awards themselves. In eight of the nine cases where a player won two or more awards in a single year, one of them was the Most Valuable Player (MVP); the only exception is leslili01w in year 3, who won the All-Star Game MVP and the WNBA Finals MVP. This suggests that the MVP award is strongly associated with receiving other accolades. With this in mind, we can use the MVP columns to infer the likelihood of a player winning other awards, even when there is limited information for those awards, such as the Finals MVP or All-Star Game MVP. In year 9, there is also a player who is both Rookie of the Year and MVP, which could present a challenge for our prediction: rookies have no previous-year metrics that could help predict MVP, which may reduce prediction accuracy.
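A table of this shape, together with a check of whether each multi-award season includes the MVP, can be produced by grouping on player and year. The internals of `apd.players_multiple_awards_filtered()` are not shown, so this is only a sketch on a handful of rows taken from the tables above; the `includes_mvp` column is a hypothetical addition.

```python
import pandas as pd

MVP = "Most Valuable Player"

# A few award rows taken from the tables above.
awards = pd.DataFrame({
    "playerID": ["leslili01w"] * 5 + ["parkeca01w"] * 2 + ["catchta01w"],
    "year": [2, 2, 2, 3, 3, 9, 9, 3],
    "award": [
        "All-Star Game Most Valuable Player", MVP, "WNBA Finals Most Valuable Player",
        "All-Star Game Most Valuable Player", "WNBA Finals Most Valuable Player",
        MVP, "Rookie of the Year",
        "Rookie of the Year",
    ],
})

# Collect all awards won by the same player in the same year.
per_season = awards.groupby(["playerID", "year"])["award"].agg(list).reset_index()
per_season["num_awards"] = per_season["award"].apply(len)

# Keep only seasons with two or more awards and flag those that include MVP.
multi = per_season[per_season["num_awards"] >= 2].copy()
multi["includes_mvp"] = multi["award"].apply(lambda won: MVP in won)
print(multi[["playerID", "year", "num_awards", "includes_mvp"]])
```

In this sample, the year-3 season of leslili01w is the one flagged `False`, since it combines the All-Star Game MVP with the Finals MVP.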

### **EDA Conclusions**

A **positive correlation** was observed between **on-court performance** in a specific area and the **likelihood of receiving the corresponding performance-based award**, although most coefficients are modest (the largest, points per game for Rookie of the Year, is +0.302). This relationship was **not uniform** across categories: **minimal correlation** was found between the **objective performance metrics** examined and selection for the **Most Improved Player (MIP)** and **Kim Perrot Sportsmanship Award**. In essence, excelling statistically increases the odds of recognition, but not all awards are purely performance-driven.


In [201]:
sd.save_data(Path("../data"))