# NBA Basketball Players Analysis (2000-2009)

This notebook analyzes NBA player performance data from the 2000 through 2009 seasons. The analysis has three parts:

1. **Double-Digit Stats Analysis**: Find players with ≥10 in at least 2 of 5 main categories
2. **Most Valuable Players**: Calculate each player's value using a custom formula
3. **Team Performance**: Find the team with the highest average points in each year

## Dataset Information

- **Source**: NBA2000-2009.csv
- **Rows**: 1,830 entries (one per player per season)
- **Columns**: 18 (player, team, year, games played, and 14 per-game statistics)
- **Period**: 2000-2009 seasons (10 years)
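
The figures in this list can be checked against a loaded frame. The following is a minimal sketch, assuming only the `YEAR` column name used in this notebook; `summarize_dataset` is a hypothetical helper, and the three-row frame is stand-in data, not the real file.

```python
import pandas as pd


def summarize_dataset(df: pd.DataFrame) -> dict:
    """Return the basic facts listed under Dataset Information (hypothetical helper)."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "first_year": int(df["YEAR"].min()),
        "last_year": int(df["YEAR"].max()),
        "seasons": int(df["YEAR"].nunique()),
    }


# Stand-in data with made-up players; the real notebook would pass the loaded df.
toy = pd.DataFrame(
    {
        "PLAYER": ["Player A", "Player B", "Player A"],
        "TEAM": ["XXX", "YYY", "XXX"],
        "YEAR": [2000, 2000, 2001],
        "PTS": [20.5, 11.2, 22.0],
    }
)

summary = summarize_dataset(toy)
print(summary)
```

Running the same helper on the real frame is one way to confirm that the row count, column count, and year range match what is stated here.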


In [15]:
import pandas as pd
import numpy as np

print('Loading NBA dataset...')
# Load the data (the path is relative to the notebook's location)
df = pd.read_csv('../Data/NBA2000-2009.csv')
print(f'Dataset loaded successfully! Shape: {df.shape}')
print(f'Columns: {list(df.columns)}')
print(f'Years covered: {df["YEAR"].min()} - {df["YEAR"].max()}')
print(f'Total players: {df["PLAYER"].nunique()}')
df.head()

Loading NBA dataset...
Dataset loaded successfully! Shape: (1830, 18)
Columns: ['PLAYER', 'TEAM', 'YEAR', 'GP', 'MIN', 'PTS', 'FGM', 'FGA', '3PM', '3PA', 'FTM', 'FTA', 'OREB', 'DREB', 'AST', 'STL', 'BLK', 'TOV']
Years covered: 2000 - 2009
Total players: 571


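| | PLAYER | TEAM | YEAR | GP | MIN | PTS | FGM | FGA | 3PM | 3PA | FTM | FTA | OREB | DREB | AST | STL | BLK | TOV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Allen Iverson | PHI | 2000 | 71 | 42.0 | 31.1 | 10.7 | 25.5 | 1.4 | 4.3 | 8.2 | 10.1 | 0.7 | 3.1 | 4.6 | 2.5 | 0.3 | 3.3 |
| 1 | Jerry Stackhouse | DET | 2000 | 80 | 40.2 | 29.8 | 9.7 | 24.1 | 2.1 | 5.9 | 8.3 | 10.1 | 1.2 | 2.7 | 5.1 | 1.2 | 0.7 | 4.1 |
| 2 | Shaquille O'Neal | LAL | 2000 | 74 | 39.5 | 28.7 | 11.0 | 19.2 | 0.0 | 0.0 | 6.7 | 13.1 | 3.9 | 8.8 | 3.7 | 0.6 | 2.8 | 2.9 |
| 3 | Kobe Bryant | LAL | 2000 | 68 | 40.9 | 28.5 | 10.3 | 22.2 | 0.9 | 2.9 | 7.0 | 8.2 | 1.5 | 4.3 | 5.0 | 1.7 | 0.6 | 3.2 |
| 4 | Vince Carter | TOR | 2000 | 75 | 39.7 | 27.6 | 10.2 | 22.1 | 2.2 | 5.3 | 5.1 | 6.7 | 2.3 | 3.2 | 3.9 | 1.5 | 1.1 | 2.2 |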


## Part One: Double-Digit Stats Analysis

**Objective**: Identify player-seasons in which a player averaged double digits (≥10) in at least 2 of the following 5 statistical categories:

- Points (PTS)
- Rebounds (REB) - sum of offensive and defensive rebounds
- Assists (AST)
- Steals (STL)
- Blocks (BLK)

**Output**: Qualifying player-seasons, sorted by year and then by player name. A player who qualifies in several seasons appears once per season.
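
The "at least 2 of 5" test can be illustrated on a tiny frame. This is a sketch with made-up players and numbers (not rows from the dataset): comparing the category columns against 10 yields a boolean frame, and summing across each row counts how many categories reach double digits.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame(
    {
        "PLAYER": ["Scorer", "Big", "Bench"],
        "PTS": [25.0, 12.0, 11.0],
        "REB": [4.0, 11.5, 3.0],
        "AST": [10.0, 1.0, 2.0],
        "STL": [1.5, 0.5, 0.4],
        "BLK": [0.2, 2.0, 0.1],
    }
)

categories = ["PTS", "REB", "AST", "STL", "BLK"]

# True/False per cell, then count the True values in each row.
hits = (toy[categories] >= 10).sum(axis=1)

# Keep rows with two or more double-digit categories.
qualifies = hits >= 2
print(toy.loc[qualifies, "PLAYER"].tolist())
```

Here "Scorer" qualifies through points and assists, "Big" through points and rebounds, and "Bench" is dropped because only points reaches 10.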


In [16]:
# Calculate total rebounds
df['REB'] = df['OREB'] + df['DREB']

# Define the 5 main categories
categories = ['PTS', 'REB', 'AST', 'STL', 'BLK']

# Filter players who have >=10 in at least 2 categories
double_digit_mask = (df[categories] >= 10).sum(axis=1) >= 2

# Create the result DataFrame
double_double = df.loc[double_digit_mask, ['PLAYER', 'YEAR', 'PTS', 'AST', 'REB', 'STL', 'BLK']].copy()

# Sort by year (ascending) then by player name (ascending)
double_double = double_double.sort_values(['YEAR', 'PLAYER'])

# Reset index
double_double = double_double.reset_index(drop=True)

print(f'Found {len(double_double)} players meeting double-digit criteria')
print('\nFirst 5 results:')
double_double.head()

Found 88 players meeting double-digit criteria

First 5 results:


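| | PLAYER | YEAR | PTS | AST | REB | STL | BLK |
|---|---|---|---|---|---|---|---|
| 0 | Antonio Davis | 2000 | 13.7 | 1.4 | 10.1 | 0.3 | 1.9 |
| 1 | Antonio McDyess | 2000 | 20.8 | 2.1 | 12.0 | 0.6 | 1.5 |
| 2 | Chris Webber | 2000 | 27.1 | 4.2 | 11.1 | 1.3 | 1.7 |
| 3 | Dikembe Mutombo | 2000 | 10.0 | 1.0 | 13.5 | 0.4 | 2.7 |
| 4 | Elton Brand | 2000 | 20.1 | 3.2 | 10.1 | 1.0 | 1.6 |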


In [17]:
# Save Part One results
double_double.to_csv('double_double.csv', index=False)
print('Saved double_double.csv successfully!')
print(f'File size: {double_double.shape[0]} rows × {double_double.shape[1]} columns')

Saved double_double.csv successfully!
File size: 88 rows × 7 columns


## Part Two: Most Valuable Players

**Objective**: Calculate player value using the formula:

```math
VALUE = (PTS + OREB + DREB + AST + STL + BLK) - (TOV + Missed FG + Missed FT)
```

Where:

- Missed FG = Field Goals Attempted - Field Goals Made
- Missed FT = Free Throws Attempted - Free Throws Made

**Output**: Players ranked by their average VALUE across the seasons in which they appear in the dataset (at most 10), with ties broken by player name.
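
As a worked instance of the formula, the sketch below applies it to the Allen Iverson 2000 row shown in the dataset preview. `player_value` is a hypothetical helper written for this illustration; the notebook itself computes the same quantity column-wise with pandas.

```python
def player_value(pts, oreb, dreb, ast, stl, blk, tov, fgm, fga, ftm, fta):
    """Apply the VALUE formula to one season's stat line (hypothetical helper)."""
    missed_fg = fga - fgm  # field goals attempted minus made
    missed_ft = fta - ftm  # free throws attempted minus made
    positives = pts + oreb + dreb + ast + stl + blk
    negatives = tov + missed_fg + missed_ft
    return positives - negatives


# Stat line taken from the first row of the dataset preview (Allen Iverson, 2000).
iverson_2000 = player_value(
    pts=31.1, oreb=0.7, dreb=3.1, ast=4.6, stl=2.5, blk=0.3,
    tov=3.3, fgm=10.7, fga=25.5, ftm=8.2, fta=10.1,
)
print(round(iverson_2000, 2))
```

By hand: the positive terms sum to 42.3, the negative terms to 3.3 + 14.8 + 1.9 = 20.0, so the season's VALUE is about 22.3.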


In [18]:
# Calculate missed field goals and free throws
df['Missed_FG'] = df['FGA'] - df['FGM']
df['Missed_FT'] = df['FTA'] - df['FTM']

# Calculate player value for each season
df['VALUE'] = (df['PTS'] + df['OREB'] + df['DREB'] + df['AST'] + df['STL'] + df['BLK']) - (df['TOV'] + df['Missed_FG'] + df['Missed_FT'])

# Group by player and calculate average value over all seasons
player_avg_value = df.groupby('PLAYER')['VALUE'].mean().reset_index()

# Sort by value (descending) then by player name (ascending)
best_player = player_avg_value.sort_values(['VALUE', 'PLAYER'], ascending=[False, True])

# Round to 2 decimal places
best_player['VALUE'] = best_player['VALUE'].round(2)

# Reset index
best_player = best_player.reset_index(drop=True)

print(f'Calculated values for {len(best_player)} players')
print('\nTop 5 most valuable players:')
best_player.head()

Calculated values for 571 players

Top 5 most valuable players:


| | PLAYER | VALUE |
|---|---|---|
| 0 | Kevin Garnett | 29.74 |
| 1 | LeBron James | 27.9 |
| 2 | Chris Paul | 26.57 |
| 3 | Shaquille O'Neal | 26.55 |
| 4 | Dwyane Wade | 26.12 |


In [19]:
# Save Part Two results
best_player.to_csv('best_player.csv', index=False)
print('Saved best_player.csv successfully!')
print(f'File size: {best_player.shape[0]} rows × {best_player.shape[1]} columns')

Saved best_player.csv successfully!
File size: 571 rows × 2 columns


## Part Three: Team Performance Analysis

**Objective**: For each year, find the team with the highest average points.

**Note**: A team's average score in a given year is the sum of its players' average scores (PTS) for that year.

**Output**: Top-performing team for each year, sorted by year.
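
The per-year selection can be sketched on made-up data. The team codes and point values below are invented for illustration. Summing `PTS` within each year and team gives the team's score as defined in the note, and `idxmax` within each year picks the row label of the largest total.

```python
import pandas as pd

# Made-up player rows: two teams over two years.
toy = pd.DataFrame(
    {
        "YEAR": [2000, 2000, 2000, 2001, 2001, 2001],
        "TEAM": ["AAA", "AAA", "BBB", "AAA", "BBB", "BBB"],
        "PTS": [20.0, 15.0, 30.0, 18.0, 12.0, 10.0],
    }
)

# Team score per year = sum of its players' scoring averages.
team_totals = toy.groupby(["YEAR", "TEAM"], as_index=False)["PTS"].sum()

# For each year, take the row label of the highest total and select those rows.
best_rows = team_totals.loc[team_totals.groupby("YEAR")["PTS"].idxmax()]
best = best_rows.reset_index(drop=True)
print(best)
```

One design point to keep in mind: `idxmax` returns the first occurrence of the maximum, so if two teams tie in a year only one of them is reported.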


In [20]:
# Group by year and team, sum the points
team_year_pts = df.groupby(['YEAR', 'TEAM'])['PTS'].sum().reset_index()

# Find the team with max points for each year
max_PTS_of_year = team_year_pts.loc[team_year_pts.groupby('YEAR')['PTS'].idxmax()].copy()

# Round to 2 decimal places
max_PTS_of_year['PTS'] = max_PTS_of_year['PTS'].round(2)

# Sort by year (ascending)
max_PTS_of_year = max_PTS_of_year.sort_values('YEAR')

# Reset index
max_PTS_of_year = max_PTS_of_year.reset_index(drop=True)

print(f'Found top team for each of {max_PTS_of_year["YEAR"].nunique()} years')
print('\nTeam with highest average points per year:')
max_PTS_of_year.head()

Found top team for each of 10 years

Team with highest average points per year:


| | YEAR | TEAM | PTS |
|---|---|---|---|
| 0 | 2000 | SAC | 100.2 |
| 1 | 2001 | LAL | 95.9 |
| 2 | 2002 | ORL | 97.0 |
| 3 | 2003 | DEN | 96.0 |
| 4 | 2004 | BOS | 106.2 |


In [21]:
# Save Part Three results
max_PTS_of_year.to_csv('max_PTS_of_year.csv', index=False)
print('Saved max_PTS_of_year.csv successfully!')
print(f'File size: {max_PTS_of_year.shape[0]} rows × {max_PTS_of_year.shape[1]} columns')

Saved max_PTS_of_year.csv successfully!
File size: 10 rows × 3 columns


## Analysis Summary

All three parts of the NBA analysis have been completed successfully:

1. **Double-Digit Stats**: Found player-seasons with ≥10 in at least 2 of the 5 categories
2. **Most Valuable Players**: Calculated each player's average value using the custom formula
3. **Team Performance**: Identified the team with the highest average points in each year

All results have been saved as CSV files (`double_double.csv`, `best_player.csv`, `max_PTS_of_year.csv`) in the notebook's working directory.


In [22]:
# Final validation - check all files exist
import os

files_to_check = ['double_double.csv', 'best_player.csv', 'max_PTS_of_year.csv']
print('Validation Results:')
for file in files_to_check:
    exists = os.path.exists(file)
    size = os.path.getsize(file) if exists else 0
    print(f' - {file}: {"Created" if exists else "Missing"} ({size:,} bytes)')

print('\n🎯 Analysis Complete! All tasks finished successfully.')

Validation Results:
 - double_double.csv: Created (3,756 bytes)
 - best_player.csv: Created (10,706 bytes)
 - max_PTS_of_year.csv: Created (159 bytes)

🎯 Analysis Complete! All tasks finished successfully.
