# Player Performance Analysis (2023-2024 NBA season)

## <span style="color:blue">   Objective(s): </span>

### Assess player performance during the season by exploring key statistics such as:

• points

• assists

• rebounds

• efficiency ratings

### The analysis aims to identify top-performing players and to provide insight into player strengths and weaknesses.

### → <span style="color:red">This helps teams make data-driven decisions about player development, game strategy, and overall team composition.</span>


In [51]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# <span style="color:blue">   Load the dataset(s)</span>

In [64]:
nba_data = pd.read_csv('NBA Stats 202324 All Stats.csv')
nba_data.head()

Unnamed: 0,RANK,NAME,TEAM,POS,AGE,GP,MPG,USG%,TO%,FTA,...,APG,SPG,BPG,TPG,P+R,P+A,P+R+A,VI,ORtg,DRtg
0,1,Joel Embiid,Phi,C,30.2,6,41.4,35.7,15.8,78,...,5.7,1.2,1.5,4.2,43.8,38.7,49.5,12.2,117.1,108.0
1,2,Jalen Brunson,Nyk,G,27.8,13,39.8,36.4,9.3,120,...,7.5,0.8,0.2,2.7,35.7,39.8,43.2,9.3,114.8,114.7
2,3,Damian Lillard,Mil,G,33.9,4,39.1,31.4,10.0,38,...,5.0,1.0,0.0,2.3,34.5,36.3,39.5,8.2,127.6,115.7
3,4,Shai Gilgeous-Alexander,Okc,G,25.9,10,39.9,32.3,8.9,81,...,6.4,1.3,1.7,2.2,37.4,36.6,43.8,11.2,118.3,106.9
4,5,Tyrese Maxey,Phi,G,23.6,6,44.6,28.1,8.6,28,...,6.8,0.8,0.3,2.2,35.0,36.7,41.8,9.1,120.9,113.3


# <span style="color:blue">  Inspect the Dataset</span>

In [93]:
# Check data types and columns

nba_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 213 entries, 0 to 212
Data columns (total 29 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   RANK    213 non-null    int64  
 1   NAME    213 non-null    object 
 2   TEAM    213 non-null    object 
 3   POS     213 non-null    object 
 4   AGE     213 non-null    float64
 5   GP      213 non-null    int64  
 6   MPG     213 non-null    float64
 7   USG%    213 non-null    float64
 8   TO%     213 non-null    float64
 9   FTA     213 non-null    int64  
 10  FT%     213 non-null    float64
 11  2PA     213 non-null    int64  
 12  2P%     213 non-null    float64
 13  3PA     213 non-null    int64  
 14  3P%     213 non-null    float64
 15  eFG%    213 non-null    float64
 16  TS%     213 non-null    float64
 17  PPG     213 non-null    float64
 18  RPG     213 non-null    float64
 19  APG     213 non-null    float64
 20  SPG     213 non-null    float64
 21  BPG     213 non-null    float64
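 22  TPG     213 non-null    float64
 23  P+R     213 non-null    float64
 24  P+A     213 non-null    float64
 25  P+R+A   213 non-null    float64
 26  VI      213 non-null    float64
 27  ORtg    213 non-null    float64
 28  DRtg    213 non-null    float64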

## <span style="color:red">  RESULTS:</span>

### **Basic Structure:** 

• 213 entries 

• 29 columns

### **Data Types:** 

• Numeric columns are either-

a) integers (RANK, GP, FTA, 2PA, 3PA), used for ranks and counts, or

b) floats (e.g., PPG, APG, USG%), used for per-game averages, percentages, and ratings.

• Object-type columns are-

a) categorical text (NAME, TEAM, and POS).
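
The split between numeric and categorical columns can also be read off programmatically. The sketch below is a minimal illustration on a small stand-in frame (`sample` is a hypothetical name; its values are copied from the first rows shown earlier), not the full dataset.

```python
import pandas as pd

# Stand-in frame: a few columns and rows copied from the head() output above.
sample = pd.DataFrame({
    "RANK": [1, 2, 3],
    "NAME": ["Joel Embiid", "Jalen Brunson", "Damian Lillard"],
    "TEAM": ["Phi", "Nyk", "Mil"],
    "POS": ["C", "G", "G"],
    "AGE": [30.2, 27.8, 33.9],
    "GP": [6, 13, 4],
    "MPG": [41.4, 39.8, 39.1],
})

# Numeric columns (integers and floats) versus everything else (text here).
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

print("Numeric:", numeric_cols)
print("Categorical:", categorical_cols)
```

On the real data, the same two calls on `nba_data` should list the 26 numeric columns and the 3 text columns reported by `info()`.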

In [98]:
# Check for missing values
nba_data.isnull().sum()

RANK     0
NAME     0
TEAM     0
POS      0
AGE      0
GP       0
MPG      0
USG%     0
TO%      0
FTA      0
FT%      0
2PA      0
2P%      0
3PA      0
3P%      0
eFG%     0
TS%      0
PPG      0
RPG      0
APG      0
SPG      0
BPG      0
TPG      0
P+R      0
P+A      0
P+R+A    0
VI       0
ORtg     0
DRtg     0
dtype: int64

## <span style="color:red">  RESULTS:</span>

**No Missing Values**
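
With 29 columns, scanning a list of zeros by eye is error-prone. One possible shortcut, sketched here on a hypothetical stand-in frame (`sample`, with values copied from the first rows shown earlier), is to print only the columns that actually contain gaps.

```python
import pandas as pd

# Stand-in frame: values copied from the head() output above.
sample = pd.DataFrame({
    "NAME": ["Joel Embiid", "Jalen Brunson", "Damian Lillard"],
    "GP": [6, 13, 4],
    "MPG": [41.4, 39.8, 39.1],
})

# Count missing values per column, then keep only columns with at least one.
missing_counts = sample.isnull().sum()
cols_with_gaps = missing_counts[missing_counts > 0]

if cols_with_gaps.empty:
    print("No missing values")
else:
    print(cols_with_gaps.sort_values(ascending=False))
```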

In [102]:
# Check basic statistics

nba_data.describe()

Unnamed: 0,RANK,AGE,GP,MPG,USG%,TO%,FTA,FT%,2PA,2P%,...,APG,SPG,BPG,TPG,P+R,P+A,P+R+A,VI,ORtg,DRtg
count,213.0,213.0,213.0,213.0,213.0,213.0,213.0,213.0,213.0,213.0,...,213.0,213.0,213.0,213.0,213.0,213.0,213.0,213.0,213.0,213.0
mean,107.0,27.738967,7.657277,19.994366,17.933803,12.603756,15.347418,0.577742,38.403756,0.470427,...,1.81831,0.542254,0.387793,0.969014,11.950235,10.300939,13.764789,5.60892,94.569953,93.167606
std,61.631972,4.573337,5.123927,13.518864,8.838268,12.279127,23.407817,0.367595,52.976111,0.240656,...,2.005416,0.498676,0.468278,0.960433,10.677686,10.019123,12.350744,3.298308,45.789178,38.262896
min,1.0,20.3,1.0,0.9,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,54.0,24.2,4.0,6.0,12.6,6.6,2.0,0.0,4.0,0.393,...,0.4,0.1,0.0,0.3,3.0,2.2,3.5,4.2,89.1,103.0
50%,107.0,27.0,6.0,19.1,17.0,11.0,6.0,0.71,17.0,0.5,...,1.0,0.4,0.2,0.7,8.4,6.6,9.6,5.9,109.5,107.6
75%,160.0,31.1,11.0,33.6,22.2,14.6,20.0,0.864,47.0,0.594,...,2.3,0.9,0.5,1.4,18.2,15.8,21.2,7.9,119.1,111.7
max,213.0,39.5,20.0,44.6,85.0,100.0,141.0,1.0,259.0,1.0,...,8.8,2.4,2.5,4.6,43.8,39.8,50.8,15.0,300.0,122.7


## <span style="color:red">  RESULTS:</span>

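**Possible outliers** in columns:

• **ORtg** (offensive rating): the maximum (300.0) is far above the 75th percentile (119.1).

• **DRtg** (defensive rating): the maximum (122.7) is close to the 75th percentile (111.7), but the minimum (0.0) is far below the 25th percentile (103.0) and pulls the mean down to 93.2.

• **USG%** and **TO%**: the maximums (85.0 and 100.0) are far above the 75th percentiles (22.2 and 14.6).

Both rating columns have a minimum of 0.0, which likely reflects players with very little playing time (MPG goes as low as 0.9) rather than genuine ratings.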
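
Comparing extremes against the quartiles can be made systematic with the common 1.5 × IQR rule. The sketch below is only an illustration: `iqr_outliers` is a hypothetical helper, and the `ortg` series is a stand-in built from the five ORtg values in the `head()` output plus the minimum (0.0) and maximum (300.0) reported by `describe()`.

```python
import pandas as pd

def iqr_outliers(series, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    mask = (series < q1 - k * iqr) | (series > q3 + k * iqr)
    return series[mask]

# Stand-in values: five ORtg figures from head(), plus the reported min and max.
ortg = pd.Series([117.1, 114.8, 127.6, 118.3, 120.9, 0.0, 300.0], name="ORtg")

flagged = iqr_outliers(ortg)
print(flagged)
```

On this stand-in series the rule flags only 0.0 and 300.0. On the full dataset the thresholds will differ, so `iqr_outliers(nba_data["ORtg"])` may flag more rows.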

# <span style="color:blue">  Player Information Table</span>

<table border="1">
    <tr>
        <th>Column Name</th>
        <th>Description</th>
    </tr>
    <tr>
        <td>RANK</td>
        <td>The player's rank in the dataset (likely a performance ranking).</td>
    </tr>
    <tr>
        <td>NAME</td>
        <td>The name of the player.</td>
    </tr>
    <tr>
        <td>TEAM</td>
        <td>The team the player plays for (e.g., "Phi" for Philadelphia, "Nyk" for New York Knicks).</td>
    </tr>
    <tr>
        <td>POS</td>
        <td>The position the player plays (e.g., "C" for center, "G" for guard).</td>
    </tr>
    <tr>
        <td>AGE</td>
        <td>The player's age in years.</td>
    </tr>
    <tr>
        <td>GP</td>
        <td>Games played – the number of games the player has participated in during the season.</td>
    </tr>
    <tr>
        <td>MPG</td>
        <td>Minutes per game – the average number of minutes the player plays per game.</td>
    </tr>
    <tr>
        <td>USG%</td>
        <td>Usage percentage – the percentage of team plays used by the player while they are on the floor.</td>
    </tr>
    <tr>
        <td>TO%</td>
        <td>Turnover percentage – the percentage of a player's possessions that end in a turnover.</td>
    </tr>
    <tr>
        <td>FTA</td>
        <td>Free throw attempts – the total number of free throws attempted by the player.</td>
    </tr>
    <tr>
        <td>FT%</td>
        <td>Free throw percentage – the percentage of successful free throws out of all attempts.</td>
    </tr>
    <tr>
        <td>2PA</td>
        <td>Two-point field goal attempts – the total number of two-point shots attempted by the player.</td>
    </tr>
    <tr>
        <td>2P%</td>
        <td>Two-point field goal percentage – the percentage of successful two-point field goals out of all attempts.</td>
    </tr>
    <tr>
        <td>3PA</td>
        <td>Three-point field goal attempts – the total number of three-point shots attempted by the player.</td>
    </tr>
    <tr>
        <td>3P%</td>
        <td>Three-point field goal percentage – the percentage of successful three-point shots out of all attempts.</td>
    </tr>
    <tr>
        <td>eFG%</td>
        <td>Effective field goal percentage – a shooting efficiency metric that accounts for the extra value of three-point shots.</td>
    </tr>
    <tr>
        <td>TS%</td>
        <td>True shooting percentage – a measure of shooting efficiency that accounts for two-point field goals, three-point field goals, and free throws.</td>
    </tr>
    <tr>
        <td>PPG</td>
        <td>Points per game – the average number of points the player scores per game.</td>
    </tr>
    <tr>
        <td>RPG</td>
        <td>Rebounds per game – the average number of rebounds the player grabs per game.</td>
    </tr>
    <tr>
        <td>APG</td>
        <td>Assists per game – the average number of assists the player records per game.</td>
    </tr>
    <tr>
        <td>SPG</td>
        <td>Steals per game – the average number of steals the player records per game.</td>
    </tr>
    <tr>
        <td>BPG</td>
        <td>Blocks per game – the average number of blocks the player records per game.</td>
    </tr>
    <tr>
        <td>TPG</td>
        <td>Turnovers per game – the average number of turnovers committed by the player per game.</td>
    </tr>
    <tr>
        <td>P+R</td>
        <td>Points + Rebounds – the sum of points and rebounds averaged per game.</td>
    </tr>
    <tr>
        <td>P+A</td>
        <td>Points + Assists – the sum of points and assists averaged per game.</td>
    </tr>
    <tr>
        <td>P+R+A</td>
        <td>Points + Rebounds + Assists – the sum of points, rebounds, and assists averaged per game.</td>
    </tr>
    <tr>
        <td>VI</td>
        <td>Most likely the Versatility Index – an aggregate score of a player's production in points, assists, and rebounds (the exact formula depends on the dataset's calculation method).</td>
    </tr>
    <tr>
        <td>ORtg</td>
        <td>Offensive rating – an estimate of the points the player produces per 100 possessions (higher is better).</td>
    </tr>
    <tr>
        <td>DRtg</td>
        <td>Defensive rating – an estimate of the points the player allows per 100 possessions (lower is better).</td>
    </tr>
</table>
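
Because the combined columns are defined as sums, they can be cross-checked against their components. The sketch below tests one such identity, P+R+A = P+R + APG, on the five rows displayed earlier. The frame `sample` and the tolerance of 0.15 are assumptions, with the tolerance chosen to absorb rounding to one decimal place.

```python
import numpy as np
import pandas as pd

# Values copied from the head() output above (only the columns it displays).
sample = pd.DataFrame({
    "NAME": ["Joel Embiid", "Jalen Brunson", "Damian Lillard",
             "Shai Gilgeous-Alexander", "Tyrese Maxey"],
    "APG": [5.7, 7.5, 5.0, 6.4, 6.8],
    "P+R": [43.8, 35.7, 34.5, 37.4, 35.0],
    "P+R+A": [49.5, 43.2, 39.5, 43.8, 41.8],
})

# P+R+A should equal P+R plus assists, up to rounding error.
expected = sample["P+R"] + sample["APG"]
consistent = np.isclose(sample["P+R+A"], expected, atol=0.15)

# List any players whose combined column does not match its components.
mismatches = sample.loc[~consistent, "NAME"].tolist()
print("Mismatched rows:", mismatches)
```

All five displayed rows satisfy the identity, which supports the column descriptions in the table.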


## <span style="color:blue">  Cleaning your dataset </span>

1. Check for Missing Values (already done above; none found)
2. Check for Duplicates
3. Ensure Consistent Data Types

In [126]:
# Check for duplicate rows
duplicates = nba_data.duplicated().sum()

# Show duplicate rows (if any)
duplicates_rows = nba_data[nba_data.duplicated()]

# Display the number of duplicates and the duplicate rows
print(f"Number of duplicate rows: {duplicates}")
print("Duplicate rows:")
print(duplicates_rows)



Number of duplicate rows: 0
Duplicate rows:
Empty DataFrame
Columns: [RANK, NAME, TEAM, POS, AGE, GP, MPG, USG%, TO%, FTA, FT%, 2PA, 2P%, 3PA, 3P%, eFG%, TS%, PPG, RPG, APG, SPG, BPG, TPG, P+R, P+A, P+R+A, VI, ORtg, DRtg]
Index: []

[0 rows x 29 columns]
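
`duplicated()` with no arguments only flags rows that match in every column. A player entered twice with slightly different numbers would slip through, so a check on identifying columns can be a useful complement. The sketch below uses a hypothetical frame (`sample`) in which the third row is an artificial repeat added purely for illustration; no such repeat appears in the real data.

```python
import pandas as pd

# Hypothetical frame: the third row deliberately repeats the first player.
sample = pd.DataFrame({
    "NAME": ["Joel Embiid", "Jalen Brunson", "Joel Embiid"],
    "TEAM": ["Phi", "Nyk", "Phi"],
    "GP": [6, 13, 6],
})

# keep=False marks every member of a duplicate group, not just the repeats.
dup_mask = sample.duplicated(subset=["NAME", "TEAM"], keep=False)
print(sample[dup_mask])

# Keep the first occurrence of each player/team pair.
deduped = sample.drop_duplicates(subset=["NAME", "TEAM"], keep="first")
print(f"Rows before: {len(sample)}, after: {len(deduped)}")
```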


In [129]:
# Confirm that each column has the expected data type
nba_data.dtypes

RANK       int64
NAME      object
TEAM      object
POS       object
AGE      float64
GP         int64
MPG      float64
USG%     float64
TO%      float64
FTA        int64
FT%      float64
2PA        int64
2P%      float64
3PA        int64
3P%      float64
eFG%     float64
TS%      float64
PPG      float64
RPG      float64
APG      float64
SPG      float64
BPG      float64
TPG      float64
P+R      float64
P+A      float64
P+R+A    float64
VI       float64
ORtg     float64
DRtg     float64
dtype: object
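
The types above already match the column meanings, so no conversion is strictly needed. If explicit categorical types are wanted for TEAM and POS, one option is sketched below on a stand-in frame (`sample` and `typed` are hypothetical names; the values are copied from the first rows shown earlier).

```python
import pandas as pd

# Stand-in frame: values copied from the head() output above.
sample = pd.DataFrame({
    "NAME": ["Joel Embiid", "Jalen Brunson", "Damian Lillard"],
    "TEAM": ["Phi", "Nyk", "Mil"],
    "POS": ["C", "G", "G"],
    "GP": [6, 13, 4],
    "MPG": [41.4, 39.8, 39.1],
})

# Cast the low-cardinality text columns to pandas' categorical dtype.
typed = sample.astype({"TEAM": "category", "POS": "category"})

# Confirm that counts are integers and per-game averages are floats.
counts_ok = pd.api.types.is_integer_dtype(typed["GP"])
averages_ok = pd.api.types.is_float_dtype(typed["MPG"])

print(typed.dtypes)
print("GP is integer:", counts_ok, "| MPG is float:", averages_ok)
```

Categorical columns use less memory and make `groupby` on team or position explicit about the set of valid levels. NAME is left as text because it is effectively unique per row.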