# **NBA Regular Season Data Analysis**

- ***The objective of this analysis is to see whether player stats correlate with factors such as age. We will also look for some curious stats from the 2022-2023 season.***

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
import matplotlib.pyplot as plt

- ***Let's start by importing our dataset for the 2022-2023 season, which contains regular-season stats for all NBA players.***

In [None]:
df = pd.read_csv('../input/20222023-nba-player-stats-regular/2022-2023 NBA Player Stats - Regular.csv', delimiter = ';', index_col = 'Player', encoding = 'ISO-8859-1')

- ***Who played the most games this season?***

In [None]:
most_games = df['G'].sort_values(ascending=False)
most_games.head(15)

- ***Nine players played 82 games, meaning they took part in their team's entire regular-season schedule. But we can see that someone played 83! How?***
- ***Mikal Bridges played more games than any team in the NBA, since each team plays 82 games per regular season. This most likely happened because he switched teams in the middle of the season. Let's look at some of his stats and see with which team he did best.***
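
The same check can be run for every player at once. This is a minimal sketch, assuming the data is indexed by player name, has a `G` column, and lists a combined row next to the per-team rows for players who changed teams. The `sample` frame is made-up placeholder data (not real stats), and `players_over_schedule` is a hypothetical helper name.

```python
import pandas as pd

SCHEDULE_GAMES = 82  # games each team plays in a regular season


def players_over_schedule(stats, limit=SCHEDULE_GAMES):
    """Return games played for every player whose total exceeds `limit`.

    Assumes `stats` is indexed by player name and has a 'G' column.
    """
    # The largest 'G' per player is the combined total for players with
    # several rows, and the only value for everyone else.
    total_games = stats.groupby(level=0)['G'].max()
    return total_games[total_games > limit].sort_values(ascending=False)


# Placeholder rows for illustration only (not real statistics).
sample = pd.DataFrame(
    {'Tm': ['TOT', 'AAA', 'BBB', 'CCC'], 'G': [83, 56, 27, 82]},
    index=pd.Index(['Player A', 'Player A', 'Player A', 'Player B'], name='Player'),
)
over = players_over_schedule(sample)
print(over)
```

On the real data, the call would be `players_over_schedule(df)`, run before any rows are filtered out.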

In [None]:
mikal = df.loc['Mikal Bridges']
mikal

- ***He played 27 games for the Brooklyn Nets and 56 for the Phoenix Suns, and had significant stats with both teams. Let's compare them in a chart.***

In [None]:
bar_width = 0.25

r1 = np.arange(len(mikal['PTS']))
r2 = [x + bar_width for x in r1]
r3 = [x + bar_width for x in r2]

plt.bar(r1, mikal['PTS'], color='#2ec4b6', width=bar_width, edgecolor='white', label='PTS')
plt.bar(r2, mikal['AST'], color='#e71d36', width=bar_width, edgecolor='white', label='AST')
plt.bar(r3, mikal['TRB'], color='#ff9f1c', width=bar_width, edgecolor='white', label='TRB')

plt.xlabel('Tm', fontweight='bold')
plt.xticks([r + bar_width for r in range(len(mikal['G']))], mikal['Tm'])

plt.legend()
plt.show()

- ***Just by looking at those three stats, we can conclude that Mikal Bridges was a scorer in Brooklyn and more of a team player in Phoenix, where his assist average was higher and his scoring average was significantly lower. Let's dig deeper into his stats to see why that happened.***

In [None]:
plt.bar(r1, mikal['STL'], color='#2ec4b6', width=bar_width, edgecolor='white', label='STL')
plt.bar(r2, mikal['BLK'], color='#e71d36', width=bar_width, edgecolor='white', label='BLK')
plt.bar(r3, mikal['TOV'], color='#ff9f1c', width=bar_width, edgecolor='white', label='TOV')

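plt.xlabel('Tm', fontweight='bold')
# Label each group of bars with its team, as in the previous chart
plt.xticks([r + bar_width for r in range(len(mikal['G']))], mikal['Tm'])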

plt.legend()
plt.show()
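
The values behind the bars can also be read as a table with one column per team. This is a rough sketch, assuming a player's rows carry a `Tm` column as in the cells above. The `sample` frame holds made-up placeholder numbers, the team codes `AAA` and `BBB` are invented, and `team_split` is a hypothetical helper.

```python
import pandas as pd


def team_split(rows, cols):
    """Turn one player's rows into a stats-by-team table (stats as rows)."""
    return rows.set_index('Tm')[list(cols)].T


# Placeholder rows for illustration only (not real statistics).
sample = pd.DataFrame(
    {'Tm': ['TOT', 'AAA', 'BBB'],
     'PTS': [15.0, 10.0, 20.0],
     'TOV': [2.0, 1.0, 3.0]},
    index=['Player A'] * 3,
)
split = team_split(sample, ['PTS', 'TOV'])
# Positive values mean the stat was higher with team BBB than with team AAA
split['diff'] = split['BBB'] - split['AAA']
print(split)
```

With the real data, this would be something like `team_split(mikal, ['PTS', 'AST', 'TRB', 'STL', 'BLK', 'TOV'])`.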

- ***As we can confirm, although he scored fewer points in Phoenix, he was better there in almost every other stat, averaging better defensive numbers, more assists and fewer turnovers. That makes him a great all-around player who adapts well to the game. With such solid stats, we can see why Mikal Bridges played more league games than any other player (and team)!***

- ***Now let's look at some league-wide stats. In this analysis, we keep only players who played at least 41 games. To reduce bias, we also remove the rows where "TOT" is the player's team, since those rows sum up the stats of a player who played for more than one team.***

In [None]:
df = df[df['Tm'] != 'TOT']
df = df[df['G'] > 40]

- ***Now let's see if older players are performing better than younger ones***

In [None]:
df_u26 = df[df['Age'] < 26]
df_u31 = df[(df['Age'] >= 26) & (df['Age'] < 31)]
df_over = df[df['Age'] >= 31]

avg_stats_under = df_u26[['PTS', 'AST', 'TRB', 'STL', 'BLK', 'MP']].mean()
avg_stats_peak = df_u31[['PTS', 'AST', 'TRB', 'STL', 'BLK', 'MP']].mean()
avg_stats_over = df_over[['PTS', 'AST', 'TRB', 'STL', 'BLK', 'MP']].mean()

data = {'Under 26': avg_stats_under, '26-30': avg_stats_peak, 'Over 30': avg_stats_over}

# Combine the three age groups into one table, rounded to one decimal
df_avg_stats = pd.DataFrame(data).round(decimals=1)

print(df_avg_stats)
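
The three age groups can also be built in a single step, and the link between age and a stat can be checked directly with a correlation coefficient, which is what the objective of this analysis asks for. This is a minimal sketch, assuming the same `Age`, `PTS` and `AST` column names used above. The `sample` frame is made-up placeholder data (not real stats), and `age_group_means` is a hypothetical helper.

```python
import pandas as pd

AGE_BINS = [0, 26, 31, 100]  # left-closed bins: [0, 26), [26, 31), [31, 100)
AGE_LABELS = ['Under 26', '26-30', 'Over 30']


def age_group_means(stats, cols):
    """Mean of `cols` per age group, with one column per group."""
    groups = pd.cut(stats['Age'], bins=AGE_BINS, labels=AGE_LABELS, right=False)
    return stats.groupby(groups, observed=False)[list(cols)].mean().round(1).T


# Placeholder rows for illustration only (not real statistics).
sample = pd.DataFrame({
    'Age': [22, 25, 26, 30, 31, 38],
    'PTS': [10.0, 12.0, 20.0, 22.0, 15.0, 17.0],
    'AST': [2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
})
group_means = age_group_means(sample, ['PTS', 'AST'])
# Pearson correlation between age and assists (between -1 and 1)
age_ast_corr = sample['Age'].corr(sample['AST'])
print(group_means)
print(round(age_ast_corr, 2))
```

A coefficient near zero on the real data would suggest that the differences between groups are not a simple linear trend with age.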

- ***Players in their "peak years" perform better and get more minutes than players in the other age groups. With more time in the league, players tend to read the game better, which probably explains why the assist numbers rise with age.***

- ***LeBron James is known for his longevity. Last season he was 37 years old and playing his 20th season. Let's compare him with the rest of the NBA.***

In [None]:
df_LeBron = df.loc['LeBron James']
# Cast to float so that the rounding below also applies to this column
avg_stats_LeBron = df_LeBron[['PTS', 'AST', 'TRB', 'STL', 'BLK', 'MP']].astype(float)

df_LeBron = df_avg_stats.assign(Lebron = avg_stats_LeBron)
df_LeBron = df_LeBron.round(decimals = 1)
print(df_LeBron)

- ***Here we can see why he is still considered one of the best players in the NBA: his stats are way above average compared with the rest of the league.***

- ***A relevant question is: isn't it unfair to compare a solid starter with data that includes substitutes? To address that, we can keep only players who started at least 41 games (half of the season) and repeat the table.***

In [None]:
df_LeBron_start  = df[df['GS'] > 40]
avg_NBA_stats = df_LeBron_start[['PTS', 'AST', 'TRB', 'STL', 'BLK', 'MP']].mean()
data = {'NBA Starters': avg_NBA_stats, 'LeBron': avg_stats_LeBron}
# Starters' averages next to LeBron's numbers, rounded to one decimal
df_Lebron_vs_NBA = pd.DataFrame(data).round(decimals=1)

print(df_Lebron_vs_NBA)

- ***Once again, the "King" has above-average stats, showing that even among starters he remains dominant in his 20th year in the league. A legend like we have never seen before.***
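
Averages hide how spread out the starters are, so another way to read this comparison is to ask what share of starters a player beats in each stat. This is a rough sketch, assuming a `GS` column as above and exactly one row per player. The `sample` frame is made-up placeholder data (not real stats), and `share_of_starters_below` is a hypothetical helper.

```python
import pandas as pd


def share_of_starters_below(stats, player, cols, min_starts=41):
    """Fraction of starters with a strictly lower value than `player`, per stat.

    Assumes `stats` is indexed by player name with one row per player.
    """
    cols = list(cols)
    starters = stats[stats['GS'] >= min_starts]
    player_row = stats.loc[player, cols].astype(float)
    # Compare every starter with the player, column by column
    return starters[cols].lt(player_row, axis=1).mean().round(2)


# Placeholder rows for illustration only (not real statistics).
sample = pd.DataFrame(
    {'GS': [82, 60, 50, 10],
     'PTS': [30.0, 20.0, 25.0, 5.0],
     'AST': [7.0, 8.0, 3.0, 1.0]},
    index=['Player A', 'Player B', 'Player C', 'Player D'],
)
share = share_of_starters_below(sample, 'Player A', ['PTS', 'AST'])
print(share)
```

On the real data, the call would be something like `share_of_starters_below(df, 'LeBron James', ['PTS', 'AST', 'TRB', 'STL', 'BLK', 'MP'])`.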