Import the pandas and NumPy packages under short aliases so they are easier to reference

In [None]:
import pandas as pd
import numpy as np

Read in my data file in .csv format

In [None]:
data = pd.read_csv("/Users/brycebangerter/Documents/NBA_player_of_the_week.csv")

View first 10 rows of data

In [None]:
data.head(10)

Describe the data to see if there are any odd values that cast doubt on cleanliness/formatting of data

In [None]:
data.describe()

Rename the columns to use underscores instead of spaces so they are easier to reference, and add units to the Height and Weight column names

In [None]:
data.rename(index=str, columns={'Draft Year':'Draft_Year', 'Season short': 'Season_Short', 'Seasons in league': 'Seasons_In_League', 'Height': 'Height_In', 'Weight': 'Weight_lbs'}, inplace=True)

Strip the "kg" label from the end of the weight entries that have it

In [None]:
data['Weight_lbs'] = data['Weight_lbs'].map(lambda x: x.rstrip('kg'))
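One caveat: `rstrip('kg')` removes any trailing `k` or `g` characters rather than the literal suffix "kg". That is harmless here because the values end in digits, but a stricter version is easy to write. A minimal sketch, using a made-up `weights` Series rather than the real file:

```python
import pandas as pd

# Hypothetical sample values: some carry a "kg" suffix, some do not.
weights = pd.Series(["210", "95kg", "250", "102kg"])

# Record which rows were in kilograms before the label is removed.
was_kg = weights.str.endswith("kg")

# Remove only a literal trailing "kg"; other values are left unchanged.
stripped = weights.str.replace(r"kg$", "", regex=True)

print(stripped.tolist())
print(was_kg.tolist())
```

Keeping a flag such as `was_kg` (a hypothetical name) would let the later unit conversion key off the label itself instead of the season.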

I drop the Conference and real value columns because they are not needed for this analysis.

In [None]:
data.drop(data.columns[[1,4,12]], axis=1, inplace=True)
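Dropping by position depends on the column order of the file, and the call above passes three positions. Dropping by name is usually easier to audit. A small sketch on a toy frame; the column names here are placeholders, not the real file's header:

```python
import pandas as pd

# Toy frame with placeholder column names.
df = pd.DataFrame({
    "Player": ["A", "B"],
    "Conference": ["East", "West"],
    "Team": ["BOS", "LAL"],
    "Extra": [1, 2],
})

# errors="ignore" skips any listed name that is not present in the frame.
trimmed = df.drop(columns=["Conference", "Extra", "Missing"], errors="ignore")

print(trimmed.columns.tolist())
```

Because `inplace=True` is not used, `df` itself is left untouched and the result can be inspected before it replaces the original.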

View data table to see that all changes have been made correctly

In [None]:
data

Check the data type (string, integer, etc.) of each column

In [None]:
data.info()

Convert the weights from strings to integers so they can be used in calculations

In [None]:
data['Weight_lbs'] = data['Weight_lbs'].astype(int)
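`astype(int)` raises an error if any entry is not a clean integer string. If that happens, `pd.to_numeric` with `errors="coerce"` can help locate the offending rows. A sketch on made-up values:

```python
import pandas as pd

# Hypothetical raw values, including one entry that is not a number.
raw = pd.Series(["210", "95", "n/a", "250"])

# Unparseable entries become NaN instead of raising an error.
numeric = pd.to_numeric(raw, errors="coerce")

# Show the original text of the rows that failed to parse.
bad_rows = raw[numeric.isna()]

print(numeric.tolist())
print(bad_rows.tolist())
```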

I define a function that converts weights from kilograms to pounds for the 2017 and 2018 seasons, the only seasons recorded in kilograms; every other season was already in pounds. I then apply the function and convert the column back to integer, because the multiplication turns it into a float.

In [None]:
def kg_to_lbs(row):
    if (row["Season_Short"] >= 2017):
        return (row["Weight_lbs"] * 2.20462)
    else:
        return row["Weight_lbs"]

In [None]:
data['Weight_lbs'] = data.apply(kg_to_lbs, axis=1)

In [None]:
data['Weight_lbs'] = data['Weight_lbs'].astype(int)
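The same conversion can be written without a row-wise `apply`. Note also that `astype(int)` truncates toward zero, so 242.5 lbs becomes 242; rounding first avoids that. A minimal sketch on a toy frame with the same column names and made-up values:

```python
import pandas as pd

KG_TO_LBS = 2.20462

# Toy data: the 2017 and 2018 rows are assumed to be in kilograms.
df = pd.DataFrame({
    "Season_Short": [2016, 2017, 2018],
    "Weight_lbs": [220, 100, 110],
})

in_kg = df["Season_Short"] >= 2017

# Keep the value where the row is already in pounds, otherwise convert it.
converted = df["Weight_lbs"].where(~in_kg, df["Weight_lbs"] * KG_TO_LBS)

# Round to the nearest pound before casting back to integer.
df["Weight_lbs"] = converted.round().astype(int)

print(df["Weight_lbs"].tolist())
```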

View the bottom and top rows of the data to check that the function performed correctly

In [None]:
data.tail()

In [None]:
data.head()

With this function, I change the team names to their abbreviations. I also change outdated franchise names to their current franchise name. For example, the Washington Bullets became the Washington Wizards in 1997, so I renamed the Bullets data to be Wizards data so they would be counted and grouped together, since the franchise is the same. 

In [None]:
# Map each full team name to its abbreviation. Outdated franchise names
# point to the abbreviation of the current franchise.
TEAM_ABBREVIATIONS = {
    "Los Angeles Lakers": "LAL",
    "San Antonio Spurs": "SAS",
    "Cleveland Cavaliers": "CLE",
    "Miami Heat": "MIA",
    "Houston Rockets": "HOU",
    "Utah Jazz": "UTA",
    "Phoenix Suns": "PHX",
    "Golden State Warriors": "GSW",
    "Chicago Bulls": "CHI",
    "Orlando Magic": "ORL",
    "Boston Celtics": "BOS",
    "Oklahoma City Thunder": "OKC",
    "Denver Nuggets": "DEN",
    "Philadelphia Sixers": "PHI",
    "New York Knicks": "NYK",
    "Portland Trail Blazers": "POR",
    "Atlanta Hawks": "ATL",
    "New Jersey Nets": "BKN",
    "Toronto Raptors": "TOR",
    "Dallas Mavericks": "DAL",
    "Detroit Pistons": "DET",
    "Los Angeles Clippers": "LAC",
    "Milwaukee Bucks": "MIL",
    "Minnesota Timberwolves": "MIN",
    "Indiana Pacers": "IND",
    "Washington Wizards": "WAS",
    "Sacramento Kings": "SAC",
    "Seattle SuperSonics": "OKC",
    "Charlotte Hornets": "CHA",
    "New Orleans Hornets": "NOP",
    "Charlotte Bobcats": "CHA",
    "Memphis Grizzlies": "MEM",
    "Washington Bullets": "WAS",
    "Brooklyn Nets": "BKN",
    "New Orleans Pelicans": "NOP",
}

def abbreviation(row):
    # Fall back to the original name when a team is not in the mapping.
    return TEAM_ABBREVIATIONS.get(row["Team"], row["Team"])

In [None]:
data['Team'] = data.apply(abbreviation, axis=1)
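Because unknown team names pass through unchanged, a misspelled name would silently form its own group. A sketch of one way to catch that, using `Series.map` on a small illustrative subset of the names (the misspelling is deliberate):

```python
import pandas as pd

# Small illustrative subset of the team names.
abbreviations = {
    "Washington Bullets": "WAS",
    "Washington Wizards": "WAS",
    "Utah Jazz": "UTA",
}
teams = pd.Series(["Washington Bullets", "Utah Jazz", "Washington Wizards", "Utah Jaz"])

# Names missing from the dictionary become NaN.
mapped = teams.map(abbreviations)

# List the names that still need a mapping entry.
unmapped = teams[mapped.isna()].unique()

# Keep the original name as a fallback, as the function above does.
result = mapped.fillna(teams)

print(result.tolist())
print(list(unmapped))
```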

I do various value counts for Teams (shows each team and how often they had a player that was player of the week), Seasons In League (shows how many players of the week were in a particular year of their career when they won the award), Age (shows how many players of the week were a particular age when they won the award), and Position (shows how many players of the week were a particular position).

In [None]:
data['Team'].value_counts()

In [None]:
data['Seasons_In_League'].value_counts()

In [None]:
data['Age'].value_counts()

In [None]:
data['Position'].value_counts()
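When comparing groups, proportions are sometimes easier to read than raw counts. `value_counts` accepts `normalize=True` for that. A short sketch on made-up positions:

```python
import pandas as pd

# Hypothetical positions for six award winners.
positions = pd.Series(["G", "F", "G", "C", "G", "F"])

# Raw counts, sorted from most to least frequent.
counts = positions.value_counts()

# Same grouping, expressed as a share of all rows.
shares = positions.value_counts(normalize=True)

print(counts)
print(shares)
```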

This shows the top 10 players with the most player of the week awards and the number of times they won it

In [None]:
data['Player'].value_counts().nlargest(10)

This shows the statistics of the data, such as mean, standard deviation, minimum and maximum for each continuous variable in the data

In [None]:
data.describe()

I made box and whisker plots for various variables

In [None]:
bbox = data['Seasons_In_League'].plot(kind="box")

In [None]:
bbox = data['Age'].plot(kind="box")

In [None]:
bbox = data['Weight_lbs'].plot(kind='box')

I import the data visualization packages I need

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

Made a bar graph that shows the frequency of player of the week awards by team

In [None]:
plt.figure(figsize=(15,4))
sns.countplot(x="Team", data=data)
plt.show()

Made a bar graph and histogram to represent the frequency/distribution of age for player of the week award recipients

In [None]:
plt.figure(figsize=(10,4))
sns.countplot(x="Age", data=data)
plt.show()

In [None]:
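# distplot is deprecated in recent seaborn releases; histplot with kde=True replaces it
sns.histplot(data["Age"], kde=True, stat="density")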

Made a histogram to represent the distribution of seasons in the league for player of the week award recipients. The distribution appears roughly bell-shaped: players at the beginning or end of their career win player of the week less often than those in the "prime" middle years.

In [None]:
sns.histplot(data["Seasons_In_League"], kde=True, stat="density")
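The shape of the histogram can also be checked numerically. A skewness near zero and a mean close to the median are consistent with a symmetric distribution, while a positive skew points to a long right tail. A sketch on made-up values, not the real column:

```python
import pandas as pd

# Hypothetical seasons-in-league values with a few long careers.
seasons = pd.Series([1, 2, 3, 3, 4, 4, 4, 5, 5, 6, 7, 9, 12, 15])

skew = seasons.skew()
mean = seasons.mean()
median = seasons.median()

print(f"skew={skew:.2f}, mean={mean:.2f}, median={median:.1f}")
```

On the real data, the same three numbers from `data['Seasons_In_League']` would show whether the histogram is as symmetric as it looks.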

Made a bar graph to show the frequency of positions that win player of the week

In [None]:
plt.figure(figsize=(15,4))
sns.countplot(x="Position", data=data)
plt.show()

Lastly, I wanted to show the distribution of weight for player of the week award recipients. The distribution is not clearly normal. One possible explanation is that certain weights correspond to "overweight" guards or "underweight" forwards/centers; a player who is over- or underweight for his position is plausibly less likely to be player of the week.

In [None]:
sns.histplot(data["Weight_lbs"], kde=True, stat="density")
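One way to probe the position explanation is to summarize weight within each position instead of across all players. A sketch on a toy frame with the same column names and made-up values:

```python
import pandas as pd

# Toy data: guards, forwards, and centers with made-up weights.
df = pd.DataFrame({
    "Position": ["G", "G", "G", "F", "F", "C", "C"],
    "Weight_lbs": [185, 190, 200, 225, 235, 255, 265],
})

# Summarize weight separately for each position.
by_position = df.groupby("Position")["Weight_lbs"].agg(["count", "mean", "min", "max"])

print(by_position)
```

If the per-position groups on the real data are each tightly clustered but centered far apart, the overall weight histogram would be a mixture of those groups, which could explain its irregular shape.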