# WNBA Player Statistics

We are going to use player scoring data from the website [stats.wnba.com](https://stats.wnba.com) that has been saved to a CSV file.

In [None]:
import pandas as pd
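# Prefer the local copy of the data; fall back to the copy hosted on GitHub.
try:
    df = pd.read_csv('../data/wnba-player-scoring-1997-2023.csv')
except OSError:  # e.g. the local file does not exist
    df = pd.read_csv('https://raw.githubusercontent.com/callysto/basketball-and-data-science/main/content/data/wnba-player-scoring-1997-2023.csv')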
df

Let's try a quick visualization of points versus minutes played per game.

In [None]:
import plotly.express as px
px.scatter(df, x='MIN', y='PTS', title='Points vs. Minutes Played Per Game', hover_data=['PLAYER', 'TEAM', 'SEASON'])

The column titles contain a lot of abbreviations. We can use the glossary on [one of the stats pages](https://stats.wnba.com/team/1611661319/players-traditional) to create a dictionary that translates each abbreviation into what it means.

In [None]:
column_titles = {
    'GP':'Games Played',
    'MIN':'Minutes Played',
    'PTS':'Points',
    'FGM':'Field Goals Made',
    'FGA':'Field Goals Attempted',
    'FG%':'Field Goal Percentage',
    '3PM':'3 Point Field Goals Made',
    '3PA':'3 Point Field Goals Attempted',
    '3P%':'3 Point Field Goal Percentage',
    'FTM':'Free Throws Made',
    'FTA':'Free Throws Attempted',
    'FT%':'Free Throw Percentage',
    'OREB':'Offensive Rebounds',
    'DREB':'Defensive Rebounds',
    'REB':'Rebounds',
    'AST':'Assists',
    'TOV':'Turnovers',
    'STL':'Steals',
    'BLK':'Blocks',
    'PF':'Personal Fouls',
    '+/-':'Plus Minus'}
print(f'For example, the column "FG%" means: {column_titles["FG%"]}')

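Columns such as `PLAYER` or `TEAM` are not in the glossary, so looking them up directly would raise a `KeyError`. As a minimal sketch, the hypothetical helper `get_title` below (not part of the original notebook) falls back to the column name itself when there is no translation. It uses a shortened dictionary called `sample_titles` so that it runs on its own without overwriting `column_titles`.

```python
# Shortened stand-in for the glossary, used only for this illustration.
sample_titles = {
    'MIN': 'Minutes Played',
    'PTS': 'Points',
    'FG%': 'Field Goal Percentage',
}

def get_title(column, titles):
    """Return the full title for a column, or the column name if it has no entry."""
    return titles.get(column, column)

print(get_title('FG%', sample_titles))     # abbreviation found in the glossary
print(get_title('PLAYER', sample_titles))  # no entry, so the name is returned unchanged
```
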
Now we can use that dictionary to set the axis titles.

In [None]:
x = 'MIN'
y = 'PTS'
x_title = column_titles[x]
y_title = column_titles[y]
title = f'{y_title} vs. {x_title}'
fig = px.scatter(df, x=x, y=y, title=title, hover_data=['PLAYER', 'TEAM', 'SEASON'])
fig.update_xaxes(title_text=x_title)
fig.update_yaxes(title_text=y_title)
fig

It might be interesting to see who scored the most per minute played. We'll create a new column for "Points per Minute" and also add it to our `column_titles` dictionary.

In [None]:
df['PPM'] = df['PTS'] / df['MIN']
column_titles['PPM'] = 'Points Per Minute'
df.sort_values(by='PPM', ascending=False).head(10)

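One thing to watch for with a ratio like this is a row with zero minutes. pandas does not raise an error in that case. It returns `inf` (or `NaN` for 0 divided by 0), and `inf` would sort to the top of the list. We have not checked whether the WNBA file contains such rows, so treat this as a defensive sketch on made-up data in `toy`.

```python
import numpy as np
import pandas as pd

# Made-up rows: Player B and Player C have zero minutes.
toy = pd.DataFrame({
    'PLAYER': ['Player A', 'Player B', 'Player C'],
    'PTS': [10.0, 2.0, 0.0],
    'MIN': [20.0, 0.0, 0.0],
})

ppm = toy['PTS'] / toy['MIN']  # gives 0.5, inf, and NaN
# Treat infinite ratios as missing so they cannot top the ranking.
toy['PPM'] = ppm.replace([np.inf, -np.inf], np.nan)

ranked = toy.dropna(subset=['PPM']).sort_values(by='PPM', ascending=False)
print(ranked)
```
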
Some of the players in the top ten by points per minute played only a few games. Let's filter the list to include only those who played more than 20 games.

In [None]:
df[df['GP'] > 20].sort_values(by='PPM', ascending=False).head(10)

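We notice that some players have high PPM values in multiple seasons. Let's find each player's career averages. Note that `mean()` gives every row for a player equal weight, regardless of how many games were played in that season.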

In [None]:
career_averages = df.groupby('PLAYER').mean(numeric_only=True)
career_averages

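If we wanted seasons with more games to count for more, one alternative is to rebuild season totals and divide career points by career minutes. The sketch below assumes that `PTS` and `MIN` are per-game values, so multiplying by `GP` approximates season totals. The `seasons` data is made up for illustration.

```python
import pandas as pd

# Made-up rows: Player A has one long season and one short season.
seasons = pd.DataFrame({
    'PLAYER': ['Player A', 'Player A', 'Player B'],
    'GP': [30, 10, 20],
    'MIN': [30.0, 10.0, 25.0],  # assumed to be minutes per game
    'PTS': [15.0, 10.0, 10.0],  # assumed to be points per game
})

# Unweighted: average the per-season ratios, as groupby().mean() does.
seasons['PPM'] = seasons['PTS'] / seasons['MIN']
unweighted = seasons.groupby('PLAYER')['PPM'].mean()

# Weighted: estimate season totals, sum them per player, then divide.
totals = seasons.assign(
    TOTAL_PTS=seasons['PTS'] * seasons['GP'],
    TOTAL_MIN=seasons['MIN'] * seasons['GP'],
).groupby('PLAYER')[['TOTAL_PTS', 'TOTAL_MIN']].sum()
weighted = totals['TOTAL_PTS'] / totals['TOTAL_MIN']

print(pd.DataFrame({'unweighted': unweighted, 'weighted': weighted}))
```
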
Now we can make a bar graph of the top 15 "points per minute" players who played an average of more than 20 games per season.

In [None]:
graph_this = career_averages[career_averages['GP'] > 20].sort_values(by='PPM', ascending=False).head(15)
px.bar(graph_this, y='PPM', title='Top 15 Points Per Minute Players').update_yaxes(title_text=column_titles['PPM'])