<h1 align="center"  style='font-size:30px'> 🏀 <a style="color:white;font-weight:700;background-color:black;">⛹️NBA</a><a style="color:red;font-weight:700;background-color:black;">2K</a><a style="color:white;font-weight:700;background-color:black;">20</a> : <a style="color:black;">Data Visualization</a> </h1>

This is a visualization of the NBA2K20 dataset kindly provided by <a href="https://www.kaggle.com/isaienkov/nba2k20-player-dataset">isaienkov</a>.

**NBA 2K20** is a basketball simulation video game developed by Visual Concepts and published by 2K Sports, based on the National Basketball Association (NBA). It is the 21st installment in the NBA 2K franchise, the successor to NBA 2K19, and the predecessor to NBA 2K21. Anthony Davis of the Los Angeles Lakers is the cover athlete for the regular edition of the game, while Dwyane Wade is the cover athlete for the 'Legend Edition'. NBA 2K20 was released on September 6, 2019, for Microsoft Windows, Nintendo Switch, PlayStation 4, and Xbox One, and on November 18, 2019, for Stadia.

The player mainly plays NBA games with real-life or customized players and teams; games follow the rules and objectives of NBA games. Several game modes are present and many settings can be customized. Up to six expansion teams can be created and used in both MyLeague and MyGM Modes, with the possibility of a 36-team league, and any team can be relocated and rebranded.

<img style="width:100%;height:100%" src="https://4y96.com/wp-content/uploads/2020/04/nba2k20-banner_0.jpg"/>
<h3 align="center" style='font-size:10px'>Image source : 4y96.com</h3>

### Basic Library

In [None]:
# Data wrangling and math
import numpy as np
import pandas as pd 

# Visualization tools
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('fivethirtyeight')
%matplotlib inline

# Datetime module
from datetime import datetime, date

# Data Exploration

<h3 class="list-group-item list-group-item-action active">Let's take a look at the data</h3>

In [None]:
df = pd.read_csv('/kaggle/input/nba2k20-player-dataset/nba2k20-full.csv')
df.head()

In [None]:
print(f'There are {df.shape[0]} players (rows) and {df.shape[1]} columns.\n')
print(f'Column names: {df.columns.values}')

<h3 class="list-group-item list-group-item-action active">Which player got the best and worst rating?</h3>


In [None]:
# List every player tied for a rating rather than assuming a fixed number of ties.
names_with = lambda r: ', '.join(df.loc[df.rating == r, 'full_name'])
print(f"The best rating is {df.rating.max()}, held by 👑 {names_with(df.rating.max())}\n")
print(f"The worst rating is {df.rating.min()}, held by {names_with(df.rating.min())}")

<h3 class="list-group-item list-group-item-action active">What about players' favorite jersey number?</h3>

In [None]:
jersey_number = df[['jersey','full_name']].groupby('jersey').count().sort_values(by='full_name', ascending=False)
fig, ax = plt.subplots(figsize=(10,5))
sns.barplot(x=jersey_number.index, y=jersey_number.full_name)
plt.xticks(rotation=90)
plt.xlabel('Jersey number')
plt.ylabel('Players count')
plt.title('Favorite Jersey Number')
plt.show()

Wow! There are far more players wearing #0 than any other number.

<h3 class="list-group-item list-group-item-action active">How about the players' countries?</h3>

In [None]:
country = df[['country','full_name']].groupby('country').count().sort_values(by='full_name', ascending=False)
fig, ax = plt.subplots(figsize=(10,5))
sns.barplot(x=country.index, y=country.full_name)
plt.xticks(rotation=90)
plt.xlabel('Country')
plt.ylabel('Players count')
plt.title('Players\' Country')
plt.show()

Of course, it's the USA. Now, which country has the best average rating?

<h3 class="list-group-item list-group-item-action active">Country based on rating</h3>

In [None]:
country_rating = df[['country','rating']].groupby('country').mean().sort_values(by='rating', ascending=False)
fig, ax = plt.subplots(figsize=(10,5))
sns.barplot(x=country_rating.index, y=country_rating.rating)
plt.xticks(rotation=90)
plt.xlabel('Country')
plt.ylabel('Average rating')
plt.title('Country Based on Rating')
plt.show()
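
An average over a country with a single player is just that player's rating, so a ranking by mean can be dominated by tiny groups. A minimal sketch of one way to check this, using made-up toy data and an arbitrary `MIN_PLAYERS` threshold (both are assumptions, not part of the dataset):

```python
import pandas as pd

# Hypothetical toy data standing in for the real `country` and `rating` columns.
toy = pd.DataFrame({
    'country': ['USA', 'USA', 'USA', 'Greece', 'Serbia', 'Serbia'],
    'rating':  [97, 80, 72, 96, 90, 74],
})

# Compute the mean rating and the player count per country in one pass.
country_stats = (
    toy.groupby('country')['rating']
       .agg(avg_rating='mean', players='count')
       .sort_values('avg_rating', ascending=False)
)

# MIN_PLAYERS is an illustrative cut-off; pick whatever suits the analysis.
MIN_PLAYERS = 2
reliable = country_stats[country_stats['players'] >= MIN_PLAYERS]
print(reliable)
```

Showing the `players` column next to the average makes it easier to judge how much weight each bar in the chart deserves.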

<h3 class="list-group-item list-group-item-action active">Which team is the most diverse?</h3>

In [None]:
country_nunique = df.groupby(df['team'])['country'].nunique().sort_values(ascending = False).head(10)
fig, ax = plt.subplots(figsize=(10,5))
sns.barplot(x=country_nunique.values, y=country_nunique.index)
plt.xlabel('Number of unique countries')
plt.ylabel('Team')
plt.title('Diversity in Team')
plt.show()

<h3 class="list-group-item list-group-item-action active">How old are the players?</h3>

In [None]:
def get_age(birthday): 
    today = date.today() 
    age = today.year - birthday.year - ((today.month, today.day) < (birthday.month, birthday.day))
    return age

df['b_day'] = pd.to_datetime(df['b_day'])
df['age'] = df['b_day'].apply(lambda x : get_age(x))
df
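
Because `date.today()` changes every day, ages computed this way drift each time the notebook is re-run. A minimal sketch of a reproducible alternative, assuming the game's release date (September 6, 2019) is an acceptable reference point; the toy birthdays and the helper name `age_on` are hypothetical:

```python
import pandas as pd

# Hypothetical birthdays; the real notebook reads them from `b_day`.
toy = pd.DataFrame({
    'full_name': ['Player A', 'Player B'],
    'b_day': pd.to_datetime(['1984-12-30', '2000-07-06']),
})

# Assumption: measure age on the NBA 2K20 release date instead of today.
REFERENCE_DATE = pd.Timestamp('2019-09-06')

def age_on(birthdays: pd.Series, ref: pd.Timestamp) -> pd.Series:
    """Return whole years elapsed between each birthday and `ref`."""
    # True where the birthday has not yet occurred in the reference year.
    not_yet = (birthdays.dt.month > ref.month) | (
        (birthdays.dt.month == ref.month) & (birthdays.dt.day > ref.day)
    )
    return ref.year - birthdays.dt.year - not_yet.astype(int)

toy['age'] = age_on(toy['b_day'], REFERENCE_DATE)
print(toy)
```

The comparison mirrors the tuple check in `get_age`, but works on the whole column at once instead of calling a Python function per row.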

In [None]:
# List every player tied for an age rather than assuming a fixed number of ties.
names_aged = lambda a: ', '.join(df.loc[df.age == a, 'full_name'])
print(f"Oldest ({df.age.max()} years old): {names_aged(df.age.max())}\n")
print(f"Youngest ({df.age.min()} years old): {names_aged(df.age.min())}")

<h3 class="list-group-item list-group-item-action active">Any free agents?</h3>

In [None]:
free_agents = df[df['team'].isna()]
print(f'There are {free_agents.shape[0]} free agents.')
free_agents.head(3)

<h3 class="list-group-item list-group-item-action active">Which team got the best rating?</h3>

In [None]:
team_rating = df[['team','rating']].groupby('team').sum().sort_values(by='rating', ascending=False)
fig, ax = plt.subplots(figsize=(10,5))
sns.barplot(x=team_rating.index, y=team_rating.rating)
plt.xticks(rotation=90)
plt.xlabel('Team')
plt.ylabel('Sum of rating')
plt.title('Teams Based on Rating Sum')
plt.show()

 🦌 The Milwaukee Bucks have the highest total rating! But what happened to the 2018 champions, the Golden State Warriors? This season, <a href="https://www.nytimes.com/2019/11/13/sports/basketball/golden-state-warriors-kerr.html">The Warriors Are in the Last Place.</a>

In [None]:
team_rating = df[['team','rating']].groupby('team').mean().sort_values(by='rating', ascending=False)
fig, ax = plt.subplots(figsize=(10,5))
sns.barplot(x=team_rating.index, y=team_rating.rating)
plt.xticks(rotation=90)
plt.xlabel('Team')
plt.ylabel('Average of rating')
plt.title('Teams Based on Rating Average')
plt.show()

Interesting: when sorted, the teams' average ratings fall almost on a straight line. They're quite similar.
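
How similar the team averages are can be put into numbers rather than read off the chart. A minimal sketch on made-up toy data (the team names and ratings are hypothetical), reporting the range and standard deviation of the per-team means:

```python
import pandas as pd

# Hypothetical toy data standing in for the real `team` and `rating` columns.
toy = pd.DataFrame({
    'team':   ['A', 'A', 'B', 'B', 'C', 'C'],
    'rating': [90, 70, 85, 75, 78, 76],
})

# Average rating per team, as in the chart above.
team_mean = toy.groupby('team')['rating'].mean()

# A small range and standard deviation mean the teams are closely matched.
summary = {
    'min': team_mean.min(),
    'max': team_mean.max(),
    'range': team_mean.max() - team_mean.min(),
    'std': team_mean.std(),
}
print(summary)
```

Running the same few lines on `df` would show whether "quite similar" means a spread of one rating point or several.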

<h3 class="list-group-item list-group-item-action active">Which team got the largest number of players?</h3>

In [None]:
player_count = df[['team','full_name']].groupby('team').count().sort_values(by='full_name', ascending=False)
fig, ax = plt.subplots(figsize=(10,5))
sns.barplot(x=player_count.index, y=player_count.full_name)
plt.xticks(rotation=90)
plt.xlabel('Team')
plt.ylabel('Player count')
plt.title('Teams Based on Player Count')
plt.show()

<h3 class="list-group-item list-group-item-action active">Who got the highest and lowest salary?</h3>

In [None]:
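# `salary` still holds '$'-prefixed strings here, so compare on a numeric copy.
salary_num = pd.to_numeric(df['salary'].astype(str).str.replace('$', '', regex=False))
print(f"The biggest salary is ${salary_num.max():,.0f}, held by {df.loc[salary_num.idxmax(), 'full_name']}\n")
print(f"The lowest salary is ${salary_num.min():,.0f}, held by {df.loc[salary_num.idxmin(), 'full_name']}")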

Now let's look at the salary distribution.

In [None]:
def get_salary(salary):
    salary = salary.replace('$', '')
    return float(salary)

df['salary'] = df['salary'].apply(lambda x: get_salary(x))
df
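
The same conversion can also be done on the whole column at once, which tolerates missing values. A minimal sketch, assuming salaries are stored as strings with a leading `$`; the values below are made up for illustration:

```python
import pandas as pd

# Hypothetical salary strings, including one missing entry.
raw = pd.Series(['$1000000', '$250000', None, '$75000'])

# Strip the literal '$' (regex=False) and let unparseable entries become NaN.
salary = pd.to_numeric(raw.str.replace('$', '', regex=False), errors='coerce')

# Format the largest value with thousands separators for display.
largest = f"${salary.max():,.0f}"
print(salary)
print(largest)
```

With `errors='coerce'`, a malformed or missing salary shows up as `NaN` instead of raising inside `float()`.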

In [None]:
plt.rcParams['figure.figsize'] = (10, 8)
# distplot is deprecated in recent seaborn; histplot shows counts, matching the y label.
sns.histplot(df['salary'], kde=True, color='blue')
plt.xlabel('Salary range for players')
plt.ylabel('Count of players')
plt.title('Players\' Salary Distribution')
plt.xticks()
plt.show()

<h3 class="list-group-item list-group-item-action active">What about their weight & height?</h3>

In [None]:
height = df['height'].str.split('/',expand=True)
height.columns = ['height_ft', 'height_m']
df = pd.concat([df, height], axis=1)

weight = df['weight'].str.split('/',expand=True)
weight.columns = ['weight_lbs', 'weight_kg']
df = pd.concat([df, weight], axis=1)
df = df.drop(['weight'], axis=1)
df['weight_lbs'] = df['weight_lbs'].str.replace('lbs.', '', regex=False).str.strip()
df['weight_kg'] = df['weight_kg'].str.replace('kg.', '', regex=False).str.strip()

# Convert the parsed strings to numeric types so they are lighter and can be plotted
df['height_m'] = df['height_m'].astype(np.float64)
df['weight_lbs'] = df['weight_lbs'].astype(np.int32)
df['weight_kg'] = df['weight_kg'].astype(np.float64)

df
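
Splitting and then stripping unit suffixes can also be expressed as a single pattern match per column. A minimal sketch using `str.extract` with named groups, assuming the raw values look like `'6-9 / 2.06'` and `'250 lbs. / 113.4 kg.'` (the format is inferred from the cell above and the sample rows are made up):

```python
import pandas as pd

# Hypothetical rows in the assumed "imperial / metric" format.
toy = pd.DataFrame({
    'height': ['6-9 / 2.06', '5-9 / 1.75'],
    'weight': ['250 lbs. / 113.4 kg.', '185 lbs. / 83.9 kg.'],
})

# Named groups become the column names of the extracted frame.
height = toy['height'].str.extract(
    r'(?P<height_ft>\d+-\d+)\s*/\s*(?P<height_m>\d+(?:\.\d+)?)'
)
weight = toy['weight'].str.extract(
    r'(?P<weight_lbs>\d+)\s*lbs\.\s*/\s*(?P<weight_kg>\d+(?:\.\d+)?)\s*kg\.'
)

# Cast the numeric pieces; height_ft stays a string such as '6-9'.
parsed = pd.concat([height, weight], axis=1).astype(
    {'height_m': 'float64', 'weight_lbs': 'int64', 'weight_kg': 'float64'}
)
print(parsed)
```

Rows that do not match the pattern come back as `NaN`, which makes malformed entries easy to spot before the cast.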

In [None]:
fig = plt.figure(figsize = (10, 5))
plt.hist(df.weight_kg)
plt.xlabel('Weight in kg')
plt.ylabel('Count of Players')
plt.title('Players\' Weight Distribution')
plt.show()

In [None]:
fig = plt.figure(figsize = (10, 5))
plt.hist(df.height_m)
plt.xlabel('Height in m')
plt.ylabel('Count of Players')
plt.title('Players\' Height Distribution')
plt.show()

<h3 class="list-group-item list-group-item-action active">What position do most of the players play?</h3>

In [None]:
fig = plt.figure(figsize = (10, 5))
sns.countplot(x='position', data=df, order=df['position'].value_counts().index)
plt.xlabel('Various Positions in NBA')
plt.ylabel('Count of Players')
plt.title('Positions of player')
plt.show()

In [None]:
# Take the labels from value_counts() itself so each slice matches its count.
size = df['position'].value_counts()
labels = size.index
colors = plt.cm.RdYlBu(np.linspace(0, 1, len(size)))
explode = [0.1] + [0] * (len(size) - 1)

plt.pie(size, labels = labels, colors = colors, explode = explode, shadow = True, startangle = 90)
plt.title('Distribution of Players\' Position')
plt.legend()
plt.show()

Let's observe the average players' rating for each position

In [None]:
position_rating = df[['position','rating']].groupby('position').mean().sort_values(by='rating', ascending=False)
fig, ax = plt.subplots(figsize=(10,5))
sns.barplot(x=position_rating.rating, y=position_rating.index)
plt.xlabel('Average rating')
plt.ylabel('Position')
plt.title('Position Based on Rating')
plt.show()

# The Best of The Best

<h3 class="list-group-item list-group-item-action active">The Wonderkid</h3>
<br>Top 10 Young Players Based on Rating

In [None]:
df.sort_values(['age','rating'], ascending=[True, False]).reset_index(drop=True)[['full_name', 'age', 'team', 'country', 'rating']].head(10).style.background_gradient('viridis')

<h3 class="list-group-item list-group-item-action active">The Old Gold</h3>
<br>Top 10 Old Players Based on Rating

In [None]:
df.sort_values(['age','rating'], ascending = False).reset_index()[['full_name', 'age', 'team', 'country', 'rating']].head(10).style.background_gradient('viridis')

<h3 class="list-group-item list-group-item-action active">Best Shortest Player</h3>

In [None]:
df.sort_values(['height_m','rating'], ascending=[True, False]).reset_index(drop=True)[['full_name', 'height_m', 'team', 'country', 'rating']].head(10).style.background_gradient('viridis')

<h3 class="list-group-item list-group-item-action active">Best Tallest Player</h3>

In [None]:
df.sort_values(['height_m','rating'], ascending=[False, False]).reset_index(drop=True)[['full_name', 'height_m', 'team', 'country', 'rating']].head(10).style.background_gradient('viridis')

<h3 class="list-group-item list-group-item-action active">Best Player for Each Position</h3>

In [None]:
df.loc[df.groupby('position')['rating'].idxmax()].sort_values(by='rating', ascending=False).reset_index(drop=True)[['position', 'full_name',
                    'age','team', 'country', 'rating']].style.background_gradient('Blues')

<h3 class="list-group-item list-group-item-action active">Best Player for Each Team</h3>

In [None]:
df.loc[df.groupby('team')['rating'].idxmax()].sort_values(by='rating', ascending=False).reset_index(drop=True)[['position', 'full_name',
                    'age','team', 'country', 'rating']].style.background_gradient('Blues')

<h3 class="list-group-item list-group-item-action active">Best Player for Each Country</h3>

In [None]:
df.loc[df.groupby('country')['rating'].idxmax()].sort_values(by='rating', ascending=False).reset_index(drop=True)[['position', 'full_name',
                    'age','team', 'country', 'rating']].style.background_gradient('Blues')
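
The "best player per group" pattern relies on `idxmax`, which returns index labels rather than row positions, so label-based `.loc` is the lookup that matches it. A minimal sketch on made-up data with a deliberately non-default index (names and ratings are hypothetical):

```python
import pandas as pd

# Hypothetical players; the index starts at 10 so labels differ from positions.
toy = pd.DataFrame(
    {
        'team': ['A', 'A', 'B', 'B'],
        'full_name': ['P1', 'P2', 'P3', 'P4'],
        'rating': [80, 95, 88, 70],
    },
    index=[10, 11, 12, 13],
)

# idxmax gives the index label of the top-rated row within each team.
best_idx = toy.groupby('team')['rating'].idxmax()

# Look the rows up by label, then rank the winners by rating.
best = (
    toy.loc[best_idx]
       .sort_values('rating', ascending=False)
       .reset_index(drop=True)
)
print(best)
```

On a frame with the default `0..n-1` index, labels and positions coincide, so positional lookup happens to give the same rows; once rows are filtered or re-indexed, only the label-based lookup stays correct.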

<a style="color:blue;font-size:20px;">Work in progress...</a>