# NBA Players Data Analysis
![](https://www.nbaanalysis.net/wp-content/uploads/2021/05/LeBron-James-explains-why-Stephen-Curry-is-his-pick-for-2021-NBA-MVP.jpeg)




This is an exploratory data analysis project on some of the greatest NBA players currently active in the league. It compares these players across multiple seasons and closes with a single statistic that lines up with most MVP seasons.

The data used in this project is from Kaggle and can be found at this link: https://www.kaggle.com/drgilermo/nba-players-stats?select=Players.csv

## Downloading the Dataset

I used a basketball stats dataset from Kaggle.

In [2]:
!pip install jovian opendatasets plotly --upgrade --quiet


Let's begin by downloading the data, and listing the files within the dataset.

In [6]:
dataset_url = 'https://www.kaggle.com/drgilermo/nba-players-stats?select=Players.csv' 

In [5]:
import opendatasets as od
od.download(dataset_url)

Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds
Your Kaggle username: siddharthkatta
Your Kaggle Key: ··········


100%|██████████| 2.13M/2.13M [00:00<00:00, 83.3MB/s]

Downloading nba-players-stats.zip to ./nba-players-stats






The dataset has been downloaded and extracted.

In [7]:
data_dir = 'nba-players-stats'

In [None]:
import os
os.listdir(data_dir)

In [9]:
project_name = "nba_data_analysis"

In [10]:
!pip install jovian --upgrade -q

In [11]:
import jovian

In [None]:
jovian.commit(project=project_name)

[jovian] Detected Colab notebook...
[jovian] Please enter your API key ( from https://jovian.ai/ ):
API KEY: 

## Data Preparation and Cleaning

This section of the project displays the basic information of the dataset and the summary statistics. 



In [None]:
import pandas as pd
import plotly.express as px

In [None]:
NBA_df = pd.read_csv(data_dir + '/Seasons_Stats.csv')

In [None]:
NBA_df.info()

In [None]:
NBA_df.describe()

In [None]:
fig = px.histogram(NBA_df, x='Age', marginal='box', color_discrete_sequence=['red'], nbins=47, title='Distribution of age')
fig.update_layout(bargap=0.1)
fig.show()

As expected, most players in this dataset are between the ages of 20 and 35. This is when most players are physically in their prime and can perform at a high level.

In [None]:
fig = px.histogram(NBA_df, x='Year', marginal='box', nbins=47, title='Distribution of Year')
fig.update_layout(bargap=0.1)
fig.show()

The number of player-season records in the dataset has been gradually increasing over the years. With the growing importance of data analytics in sports, I believe that this data collection will keep growing into the future.
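The trend in the chart above can be cross-checked without any binning. The sketch below is a minimal example that is not part of the original notebook: `rows_per_year` is a hypothetical helper, and the toy frame stands in for `NBA_df`, assuming only that the table has a `Year` column.

```python
import pandas as pd


def rows_per_year(df: pd.DataFrame) -> pd.Series:
    """Count how many rows the table holds for each year (hypothetical helper)."""
    # Skip rows with a missing Year, then count and order chronologically.
    return df["Year"].dropna().astype(int).value_counts().sort_index()


# Toy frame standing in for NBA_df; the values are made up for illustration.
toy = pd.DataFrame({"Year": [1980.0, 1980.0, 2017.0, 2017.0, 2017.0, None]})
counts = rows_per_year(toy)
print(counts)
```

With the real data, `rows_per_year(NBA_df)` would give one exact count per season instead of a binned bar.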

In [None]:
fig = px.histogram(NBA_df, x='Year', marginal='box', color_discrete_sequence=['red'], y='PF', nbins=47, title='Personal Fouls per year ')
fig.update_layout(bargap=0.1)
fig.show()

Foul drawing has become a 'skill' in the past few years. NBA refs are whistle-happy and the players are happy to oblige. When someone can't seem to hit a shot that night, they just drive to the paint, chuck up a shot, and get a foul call. This leads to easy points at the charity stripe.

The NBA knows that offense increases viewership, so they have simply made defense impossible by calling touch fouls. 

In [None]:
fig = px.histogram(NBA_df, x='Year', marginal='box', color_discrete_sequence=['red'], y='3PA', nbins=47, title='3 Point Attempts over the years')
fig.update_layout(bargap=0.1)
fig.show()

This graph clearly shows the gradual increase in 3 point attempts since the 1980s. The record for most three pointers taken in a season keeps growing as teams continue to take more threes.
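Because the histogram is limited to 47 bins, a single bar may cover more than one season. A per-season total can be sketched with a plain `groupby`. This is an illustrative example, not the notebook's own code: `yearly_total` is a hypothetical helper and the toy numbers are invented. If the real table holds several rows for the same player in one season, each of those rows is included in the sum.

```python
import pandas as pd


def yearly_total(df: pd.DataFrame, column: str) -> pd.Series:
    """Sum one stat column over all rows of each season (hypothetical helper)."""
    return df.groupby("Year")[column].sum()


# Toy frame standing in for NBA_df; the numbers are made up for illustration.
toy = pd.DataFrame({
    "Year": [1980, 1980, 2017, 2017],
    "3PA": [10, 20, 300, 500],
})
totals = yearly_total(toy, "3PA")
print(totals)
```

The same helper works for the personal fouls chart by passing `"PF"` as the column.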

In [None]:
import jovian

In [None]:
jovian.commit()

[jovian] Detected Colab notebook...
[jovian] Uploading colab notebook to Jovian...
Committed successfully! https://jovian.ai/siddharthkatta123/nba-data-analysis


'https://jovian.ai/siddharthkatta123/nba-data-analysis'

## Exploratory Analysis and Visualization

In this part of the project I will filter the dataset based on the most recent MVP winners and create comparisons between their respective MVP seasons. 

**Table of Contents:**

1. Scoring comparison. 
2. Shooting efficiency. 
3. Overall Efficiency. 
4. Conclusion.



Let's begin by importing `matplotlib.pyplot` and `seaborn`.

In [None]:
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (9, 5)
matplotlib.rcParams['figure.facecolor'] = '#00000000'

In [None]:
lebron = NBA_df.Player == 'LeBron James'
durant = NBA_df.Player == 'Kevin Durant'
curry = NBA_df.Player == 'Stephen Curry'
brook = NBA_df.Player == 'Russell Westbrook'
rose = NBA_df.Player == 'Derrick Rose'
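The boolean masks above select one player at a time. When all of the MVP winners are needed in a single table, `Series.isin` does the same filtering in one step. The sketch below is an assumption-laden illustration: `select_players` and `MVP_NAMES` are hypothetical names, and the toy frame stands in for `NBA_df`.

```python
import pandas as pd

# Hypothetical constant listing the players compared in this notebook.
MVP_NAMES = [
    "LeBron James", "Kevin Durant", "Stephen Curry",
    "Russell Westbrook", "Derrick Rose",
]


def select_players(df: pd.DataFrame, names: list) -> pd.DataFrame:
    """Keep only the rows whose Player value is in `names`, ordered by player and year."""
    mask = df["Player"].isin(names)
    return df[mask].sort_values(["Player", "Year"])


# Toy frame standing in for NBA_df; the rows are made up for illustration.
toy = pd.DataFrame({
    "Player": ["Stephen Curry", "Some Other Player", "LeBron James", "Stephen Curry"],
    "Year": [2016, 2016, 2013, 2015],
})
selected = select_players(toy, MVP_NAMES)
print(selected)
```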

# 1. Scoring Comparison. 

Scoring is the most lucrative skill in the NBA. So, naturally let's start by looking at the points per season comparison between these players. 

**Takeaways:**

1. LeBron was alone at the top for a couple of seasons until KD entered the league in 2008. Then in 2010 KD outscored LeBron and continued to do so until the pinnacle of his career, when he won the NBA MVP award in 2014.


2. In the same 2010 season in which KD passed LeBron, a small kid from Davidson entered the league. You might have heard his name: Steph Curry. After an abysmal first three seasons dealing with injuries, he decided to shoot more threes in 2013 (more on this below), which drastically increased his productivity. He shot and made threes at an incredible clip, making him the greatest shooter of all time and changing the NBA game forever.


3. Playing alongside an all-time great scorer, Russell Westbrook had to share the ball with Durant but still managed to put up impressive numbers, coming second only to Steph Curry in the 2015 season. This constraint was lifted when Kevin Durant decided to join the Warriors in 2016. Westbrook went ballistic as the highest scorer in the 2017 season and also won MVP honors that year.

In [None]:
plt.figure(figsize=(20,10))
plt.plot(NBA_df[lebron].Year,NBA_df[lebron].PTS, marker = 'o')
plt.plot(NBA_df[durant].Year,NBA_df[durant].PTS, marker = 'x')
plt.plot(NBA_df[curry].Year,NBA_df[curry].PTS, marker = 'x')
plt.plot(NBA_df[brook].Year,NBA_df[brook].PTS, marker = 'o')



plt.xlabel('Season')
plt.ylabel('Total points scored')
plt.legend(['Lebron', 'Durant', 'Curry','Westbrook'])

plt.title("Points per season comparison")
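Season totals depend on how many games a player appeared in, so an injury-shortened year looks like a scoring dip. One way to separate the two effects is to divide points by games played. The sketch below is illustrative only: `add_points_per_game` is a hypothetical helper, it assumes the table has a games-played column named `G` next to `PTS`, and the toy values are invented.

```python
import pandas as pd


def add_points_per_game(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the table with a points-per-game column (hypothetical helper)."""
    out = df.copy()
    # Assumes PTS holds season totals and G holds games played.
    out["PPG"] = out["PTS"] / out["G"]
    return out


# Toy frame standing in for NBA_df; the numbers are made up for illustration.
toy = pd.DataFrame({
    "Year": [2012, 2013],
    "PTS": [1500, 2000],
    "G": [50, 80],
})
with_ppg = add_points_per_game(toy)
print(with_ppg)
```

In this toy example the season with fewer total points has the higher per-game average, which a totals-only chart would hide.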

# 2. Shooting Efficiency 

Many modern NBA teams live or die by the three point line. In the "good old days" players didn't shoot as many three pointers as they do in the modern game. The game was mostly based on bigs who could get to the cup or shoot from the mid range. This gave a severe advantage to taller and broader players over smaller ones. That is, until a 6'3" guard from Davidson lit up the league from the three point line.

**Takeaways:**

1. Steph Curry took the league by storm in the 2013 season as he quadrupled his 3PA to roughly 600, up from about 150 the previous season. The previous high among these players was around 400, by Durant. So this type of volume shooting from Curry was mind boggling.


2. Common sense suggests that as a player shoots more threes, his shooting percentage should drop, since he is taking more shots farther away from the rim... right? Well, yes, if he were a normal player. Steph Curry is not normal. In 2013 he did have a worse shooting percentage than before, but since then his true shooting percentage has gradually increased along with his 3 point attempts, ultimately peaking in 2016.
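For readers unfamiliar with true shooting percentage, it folds two pointers, three pointers, and free throws into a single number using the standard formula TS% = PTS / (2 * (FGA + 0.44 * FTA)). The sketch below is a small worked example rather than code from the notebook: `true_shooting` is a hypothetical helper and the stat line is invented.

```python
def true_shooting(pts: float, fga: float, fta: float) -> float:
    """True shooting percentage as a fraction: PTS / (2 * (FGA + 0.44 * FTA))."""
    attempts = fga + 0.44 * fta  # shooting possessions, with free throws weighted at 0.44
    if attempts == 0:
        raise ValueError("no shot attempts, TS% is undefined")
    return pts / (2 * attempts)


# Made-up stat line: 30 points on 20 field goal attempts and 10 free throw attempts.
example_ts = true_shooting(pts=30, fga=20, fta=10)
print(round(example_ts, 3))
```

Because a made three is worth more points per attempt, a player can take harder shots from farther out and still raise this number, which is the pattern described in the takeaways above.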

In [None]:
plt.figure(figsize=(15,10))
plt.plot(NBA_df[lebron].Year,NBA_df[lebron]["3PA"], marker = 'o')
plt.plot(NBA_df[durant].Year,NBA_df[durant]["3PA"], marker = 'x')
plt.plot(NBA_df[curry].Year,NBA_df[curry]["3PA"], marker = 'x')
plt.plot(NBA_df[brook].Year,NBA_df[brook]["3PA"], marker = 'o')


plt.xlabel('Season')
plt.ylabel('3PA')
plt.legend(['Lebron', 'Durant', 'Curry', 'Westbrook'])

plt.title("Number of 3 pointers attempted per season")

In [None]:
plt.figure(figsize=(15,10))
plt.plot(NBA_df[lebron].Year,NBA_df[lebron]["TS%"], marker = 'o')
plt.plot(NBA_df[durant].Year,NBA_df[durant]["TS%"], marker = 'x')
plt.plot(NBA_df[curry].Year,NBA_df[curry]["TS%"], marker = 'x')
plt.plot(NBA_df[brook].Year,NBA_df[brook]["TS%"], marker = 'o')


plt.xlabel('Season')
plt.ylabel('TS%')
plt.legend(['Lebron', 'Durant', 'Curry', 'Westbrook'])

plt.title("True Shooting comparison")

# 3. Overall Efficiency

Since there is a lot going on in this chart I'm going to break it down into smaller pieces.

**Lebron James**

There is no question that LeBron has been dominating the league since he entered in 2003. According to the graph, 2009 was arguably LeBron's best year in the league (this dataset only includes data through the 2017 season), and fittingly he won MVP that year. His other MVPs came in 2010, 2012, and 2013. Derrick Rose won in 2011 even though LeBron had the highest PER.

**Kevin Durant**

After LeBron's domination through 2013, KD showed up in 2014. The graph shows that KD had the highest PER in 2014, the same year that he won MVP.

**Steph Curry**

This small-town guard from Davidson took the league by storm, draining threes from all over the court. In 2015 and 2016 he was at or near the top of this chart in PER and, you guessed it, he won back-to-back MVPs in 2015 and 2016.

**Russell Westbrook**

In 2017, after Durant left the Thunder, Russell Westbrook had a monster season with the highest PER, which ultimately led to him winning the MVP award that year.

**Derrick Rose**

In 2011 Rose won the MVP, which put a hole in what could have been a five-year MVP streak for LeBron James. As the chart shows, this MVP was an anomaly since LeBron had a much more efficient season. This was most likely due to voter fatigue and people being tired of seeing LeBron win. The Bulls also had a better record.

**The one common factor among all the MVPs in this chart is that whoever has the highest PER in a given year wins MVP. Or at least they did for the better part of the 2010s. So, at the end of the year it would be a good bet to predict that the player with the highest PER will win MVP that season.**
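The "highest PER wins" rule of thumb can be checked against the whole table rather than five hand-picked players. The sketch below is one possible approach, not the notebook's method: `per_leaders` is a hypothetical helper, the 1500-minute cutoff is an arbitrary assumption meant to exclude players with very little court time, and it assumes the table has `Year`, `Player`, `PER`, and `MP` (minutes played) columns. The toy rows are invented.

```python
import pandas as pd


def per_leaders(df: pd.DataFrame, min_minutes: int = 1500) -> pd.DataFrame:
    """Return the player with the highest PER in each season (hypothetical helper)."""
    # Drop incomplete rows and players below the (assumed) minutes cutoff.
    eligible = df.dropna(subset=["Year", "PER", "MP"])
    eligible = eligible[eligible["MP"] >= min_minutes]
    # idxmax gives the row label of the top PER within each season.
    leader_idx = eligible.groupby("Year")["PER"].idxmax()
    return eligible.loc[leader_idx, ["Year", "Player", "PER"]].reset_index(drop=True)


# Toy frame standing in for NBA_df; players and numbers are made up for illustration.
toy = pd.DataFrame({
    "Year": [2016, 2016, 2016, 2017, 2017],
    "Player": ["Player A", "Player B", "Player C", "Player D", "Player E"],
    "PER": [31.5, 40.0, 27.0, 30.6, 27.0],
    "MP": [2700, 50, 2800, 2800, 2700],
})
leaders = per_leaders(toy)
print(leaders)
```

Here "Player B" has the best raw PER in 2016 but is filtered out for playing only 50 minutes, which is why some minutes cutoff is needed before comparing the yearly leaders with the actual MVP winners.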



In [None]:
plt.figure(figsize=(15,10))
plt.plot(NBA_df[lebron].Year,NBA_df[lebron].PER, marker = 'o')
plt.plot(NBA_df[durant].Year,NBA_df[durant].PER, marker = 'x')
plt.plot(NBA_df[curry].Year,NBA_df[curry].PER, marker = 'x')
plt.plot(NBA_df[brook].Year,NBA_df[brook].PER, marker = 'o')
plt.plot(NBA_df[rose].Year,NBA_df[rose].PER, marker = 'o')

plt.xlabel('Season')
plt.ylabel('PER')
plt.legend(['Lebron', 'Durant', 'Curry', 'Westbrook', 'Rose'])

plt.title("PER comparison")

## Inferences and Conclusion

The NBA has evolved over time: the number of points scored per game has increased dramatically with the rise of the three point shot. Foul baiting and getting to the free throw line is also a major factor that has significantly contributed to the increase in scoring.

To win an MVP award, a player typically needs the highest PER along with a good team record.

In [None]:
import jovian

In [None]:
jovian.commit()
