# Load Kaggle Dataset
This notebook uses the Kaggle dataset at https://www.kaggle.com/datasets/eoinamoore/historical-nba-data-and-player-box-scores.

Make sure your Kaggle API authentication key is available in your environment before running the download.

The Kaggle data will be saved to ../data/*.csv.
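
Before calling the API, it can help to confirm that credentials are discoverable. The sketch below assumes the two classic mechanisms used by the `kaggle` package: the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables, or a `kaggle.json` file in `~/.kaggle` (or in the directory named by `KAGGLE_CONFIG_DIR`). The helper name `kaggle_credentials_available` is hypothetical and not part of the Kaggle API, and newer package versions may accept other credential forms that this check does not cover.

```python
import os
from pathlib import Path


def kaggle_credentials_available(env=None):
    """Return True if Kaggle credentials appear to be configured.

    Hypothetical helper: it only checks that credentials exist,
    not that they are valid.
    """
    env = os.environ if env is None else env

    # Option 1: username and key supplied through environment variables.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return True

    # Option 2: a kaggle.json file in the config directory.
    config_dir = Path(env.get("KAGGLE_CONFIG_DIR") or Path.home() / ".kaggle")
    return (config_dir / "kaggle.json").is_file()


if not kaggle_credentials_available():
    print("Kaggle credentials not found; create an API token in your Kaggle account settings.")
```

Running a check like this first gives a clearer message than a failed download.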

In [3]:
import kaggle

# Download the dataset
kaggle.api.dataset_download_files(
    'eoinamoore/historical-nba-data-and-player-box-scores',
    path='../data',  # where to save
    unzip=True      # automatically unzip
)

Dataset URL: https://www.kaggle.com/datasets/eoinamoore/historical-nba-data-and-player-box-scores


# Explore the Dataset
Next, we load the downloaded CSV files into pandas DataFrames so that they are easy to inspect and query.

**Import libraries**

In [None]:
# Import libraries for data exploration
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

print("✅ Libraries imported successfully!")

**Load tables into pandas DataFrames**

In [7]:
# Load the main datasets
player_stats = pd.read_csv('../data/PlayerStatistics.csv')
games = pd.read_csv('../data/Games.csv')
players = pd.read_csv('../data/Players.csv')

print(f"📊 Player Statistics: {player_stats.shape}")
print(f"🏀 Games: {games.shape}")
print(f"👥 Players: {players.shape}")

📊 Player Statistics: (1633902, 35)
🏀 Games: (72097, 17)
👥 Players: (6678, 14)
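
Large CSVs with sparsely populated text columns (such as `gameLabel` here) can lead pandas to infer mixed types chunk by chunk. A minimal sketch of one way to avoid that is to declare dtypes up front and parse `gameDate` explicitly. The inline sample data is made up for illustration, and which columns need explicit dtypes in the real files is an assumption.

```python
import io

import pandas as pd

# Made-up two-row sample that mimics a few PlayerStatistics.csv columns.
sample_csv = io.StringIO(
    "personId,gameDate,gameLabel,points\n"
    "1627734,2025-11-09T21:00:00Z,,20.0\n"
    "1627734,2025-11-03T21:00:00Z,Sample Label,13.0\n"
)

df = pd.read_csv(
    sample_csv,
    dtype={"personId": "int64", "gameLabel": "string"},  # explicit types, no guessing
    low_memory=False,  # infer remaining columns from the whole file at once
)

# Parse ISO 8601 timestamps into timezone-aware datetimes.
df["gameDate"] = pd.to_datetime(df["gameDate"], utc=True)

print(df.dtypes)
```

With real datetimes, filters such as `df["gameDate"] >= "2024-10-01"` compare dates rather than strings.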



**Filter for current players**

Since this notebook supports an app you will use during your fantasy season, we are only interested in current players.

In [16]:
# Step 1: Find current active players (at least one game since the start of the 2024-25 season)
print("🔍 Identifying current active players...")
current_season_games = player_stats[player_stats['gameDate'] >= '2024-10-01']
current_players = current_season_games[['firstName', 'lastName']].drop_duplicates()
print(f"👥 Found {len(current_players)} current active players")

# Step 2: Filter ALL historical games for only these current players
print("\n📚 Filtering ALL historical games for current players...")
print(f"📊 Original dataset: {player_stats.shape[0]:,} games")

# Merge to keep only games from current players (across all years)
player_stats_current = player_stats.merge(
  current_players,
  on=['firstName', 'lastName'],
  how='inner'
)

print(f"📊 Filtered dataset: {player_stats_current.shape[0]:,} games")
print(f"📅 Date range: {player_stats_current['gameDate'].min()[:10]} to {player_stats_current['gameDate'].max()[:10]}")

# Show reduction
reduction = (1 - len(player_stats_current) / len(player_stats)) * 100
print(f"🎯 Dataset reduced by {reduction:.1f}%")

# Replace original dataframe
player_stats = player_stats_current

🔍 Identifying current active players...
👥 Found 828 current active players

📚 Filtering ALL historical games for current players...
📊 Original dataset: 1,633,902 games
📊 Filtered dataset: 240,561 games
📅 Date range: 1951-11-01 to 2025-11-09
🎯 Dataset reduced by 85.3%
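
Matching on `firstName` and `lastName` can pull in games from a different player who shares the same name. The 1951 start of the date range above would be consistent with such a collision, although that was not verified here. A possible alternative, sketched on made-up toy data, is to filter on the `personId` column instead. The names and rows below are invented for illustration.

```python
import pandas as pd

# Toy data: two different players (personId 1 and 2) share a name.
stats = pd.DataFrame({
    "personId": [1, 1, 2, 3],
    "firstName": ["Sam", "Sam", "Sam", "Alex"],
    "lastName": ["Jones", "Jones", "Jones", "Smith"],
    "gameDate": [
        "2024-11-01T00:00:00Z",
        "2019-01-05T00:00:00Z",
        "1951-11-01T00:00:00Z",
        "2010-03-02T00:00:00Z",
    ],
})
cutoff = "2024-10-01"
is_current = stats["gameDate"] >= cutoff  # ISO strings sort chronologically

# ID-based filter: keep every game of players with a game on or after the cutoff.
current_ids = stats.loc[is_current, "personId"].unique()
by_id = stats[stats["personId"].isin(current_ids)]

# Name-based filter (the approach used above), shown for comparison.
current_names = stats.loc[is_current, ["firstName", "lastName"]].drop_duplicates()
by_name = stats.merge(current_names, on=["firstName", "lastName"], how="inner")

print(len(by_id), len(by_name))
```

On this toy frame the name-based merge also keeps the 1951 game of the unrelated player, while the ID-based filter does not.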


**Explore the player statistics table**

In [20]:
# Check the structure of player statistics
print("=== PLAYER STATISTICS COLUMNS ===")
print(player_stats_current.columns.tolist())
print("\n=== FIRST 3 ROWS ===")
player_stats_current.head(3)

=== PLAYER STATISTICS COLUMNS ===
['firstName', 'lastName', 'personId', 'gameId', 'gameDate', 'playerteamCity', 'playerteamName', 'opponentteamCity', 'opponentteamName', 'gameType', 'gameLabel', 'gameSubLabel', 'seriesGameNumber', 'win', 'home', 'numMinutes', 'points', 'assists', 'blocks', 'steals', 'fieldGoalsAttempted', 'fieldGoalsMade', 'fieldGoalsPercentage', 'threePointersAttempted', 'threePointersMade', 'threePointersPercentage', 'freeThrowsAttempted', 'freeThrowsMade', 'freeThrowsPercentage', 'reboundsDefensive', 'reboundsOffensive', 'reboundsTotal', 'foulsPersonal', 'turnovers', 'plusMinusPoints', 'espn_fantasy_score']

=== FIRST 3 ROWS ===


Unnamed: 0,firstName,lastName,personId,gameId,gameDate,playerteamCity,playerteamName,opponentteamCity,opponentteamName,gameType,gameLabel,gameSubLabel,seriesGameNumber,win,home,numMinutes,points,assists,blocks,steals,fieldGoalsAttempted,fieldGoalsMade,fieldGoalsPercentage,threePointersAttempted,threePointersMade,threePointersPercentage,freeThrowsAttempted,freeThrowsMade,freeThrowsPercentage,reboundsDefensive,reboundsOffensive,reboundsTotal,foulsPersonal,turnovers,plusMinusPoints,espn_fantasy_score
0,Domantas,Sabonis,1627734,22500197,2025-11-09T21:00:00Z,Sacramento,Kings,Minnesota,Timberwolves,,,,,0,1,29.5,20.0,3.0,0.0,1.0,17.0,5.0,0.294,2.0,0.0,0.0,12.0,10.0,0.833,8.0,5.0,13.0,4.0,3.0,-19.0,28.0
1,Domantas,Sabonis,1627734,22500162,2025-11-03T21:00:00Z,Sacramento,Kings,Denver,Nuggets,,,,,0,0,36.52,13.0,5.0,0.0,1.0,10.0,5.0,0.5,0.0,0.0,0.0,4.0,3.0,0.75,13.0,4.0,17.0,3.0,2.0,-14.0,39.0
2,Domantas,Sabonis,1627734,22500142,2025-11-01T17:00:00Z,Sacramento,Kings,Milwaukee,Bucks,,,,,1,0,36.57,24.0,6.0,0.0,1.0,13.0,8.0,0.615,2.0,0.0,0.0,10.0,8.0,0.8,8.0,5.0,13.0,5.0,3.0,5.0,48.0


**Load in ESPN Fantasy Scoring Stats and check we have all the data in our tables**

In [21]:
# ESPN Fantasy Scoring Requirements:
# 3PM = 5 pts, 2PM = 3 pts, FTM = 1 pt, Missed shot = -1 pt
# REB = 1 pt, AST = 2 pts, STL = 4 pts, BLK = 4 pts, TOV = -2 pts

required_stats = {
  'fieldGoalsMade': 'For calculating made 2PT/3PT shots',
  'fieldGoalsAttempted': 'For calculating missed shots',
  'threePointersMade': 'For 3PT bonus (5 pts each)',
  'threePointersAttempted': 'For calculating missed 3PT shots',
  'freeThrowsMade': 'For FT points (1 pt each)',
  'freeThrowsAttempted': 'For calculating missed FT shots',
  'reboundsTotal': 'For rebounds (1 pt each)',
  'assists': 'For assists (2 pts each)',
  'steals': 'For steals (4 pts each)',
  'blocks': 'For blocks (4 pts each)',
  'turnovers': 'For turnovers (-2 pts each)'
}

print("📋 ESPN FANTASY SCORING REQUIREMENTS:")
all_available = True
for stat, description in required_stats.items():
    available = stat in player_stats_current.columns
    print(f"  {stat}: {'✅' if available else '❌'} - {description}")
    if not available:
        all_available = False

print(f"\n{'✅ ALL STATS AVAILABLE!' if all_available else '❌ MISSING STATS - Cannot calculate fantasy scores'}")

📋 ESPN FANTASY SCORING REQUIREMENTS:
  fieldGoalsMade: ✅ - For calculating made 2PT/3PT shots
  fieldGoalsAttempted: ✅ - For calculating missed shots
  threePointersMade: ✅ - For 3PT bonus (5 pts each)
  threePointersAttempted: ✅ - For calculating missed 3PT shots
  freeThrowsMade: ✅ - For FT points (1 pt each)
  freeThrowsAttempted: ✅ - For calculating missed FT shots
  reboundsTotal: ✅ - For rebounds (1 pt each)
  assists: ✅ - For assists (2 pts each)
  steals: ✅ - For steals (4 pts each)
  blocks: ✅ - For blocks (4 pts each)
  turnovers: ✅ - For turnovers (-2 pts each)

✅ ALL STATS AVAILABLE!
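
If the same check is needed elsewhere, for example before scoring a freshly downloaded file, it could be wrapped in a small helper that returns the missing column names. This is only a sketch: `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names, and the toy frame deliberately omits one column.

```python
import pandas as pd

# Column names taken from the scoring requirements listed above.
REQUIRED_COLUMNS = [
    "fieldGoalsMade", "fieldGoalsAttempted",
    "threePointersMade", "threePointersAttempted",
    "freeThrowsMade", "freeThrowsAttempted",
    "reboundsTotal", "assists", "steals", "blocks", "turnovers",
]


def missing_columns(df, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from df, in order."""
    return [col for col in required if col not in df.columns]


# Toy frame that deliberately lacks 'turnovers'.
toy = pd.DataFrame(columns=[c for c in REQUIRED_COLUMNS if c != "turnovers"])
missing = missing_columns(toy)
if missing:
    print(f"Missing columns: {missing}")
```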


**Calculate Fantasy Scores with current data**
This function calculates the raw fantasy score for a single game, based only on the player's box-score stats for that game.

In [22]:
def calculate_espn_fantasy_score(row):
    """
    Calculate ESPN Fantasy Basketball score for a player's game
    
    Scoring:
    - 3PM = 5 pts (includes 3PT bonus)
    - 2PM = 3 pts
    - FTM = 1 pt
    - Missed shot = -1 pt
    - REB = 1 pt, AST = 2 pts, STL = 4 pts, BLK = 4 pts, TOV = -2 pts
    """
    # Made shots
    threepointers_made = row['threePointersMade'] * 5
    twopointers_made = (row['fieldGoalsMade'] - row['threePointersMade']) * 3
    freethrows_made = row['freeThrowsMade'] * 1
    
    # Missed shots (-1 each)
    fg_missed = (row['fieldGoalsAttempted'] - row['fieldGoalsMade']) * -1
    ft_missed = (row['freeThrowsAttempted'] - row['freeThrowsMade']) * -1
    
    # Other stats
    rebounds = row['reboundsTotal'] * 1
    assists = row['assists'] * 2
    steals = row['steals'] * 4
    blocks = row['blocks'] * 4
    turnovers = row['turnovers'] * -2
    
    total_score = (threepointers_made + twopointers_made + freethrows_made +
                 fg_missed + ft_missed + rebounds + assists + steals + blocks + turnovers)
    
    return total_score

In [23]:
# Calculate fantasy scores for all games
player_stats_current['espn_fantasy_score'] = player_stats_current.apply(calculate_espn_fantasy_score, axis=1)

print("✅ ESPN Fantasy scores calculated!")
print(f"📈 Average fantasy score: {player_stats_current['espn_fantasy_score'].mean():.2f}")
print(f"📊 Score range: {player_stats_current['espn_fantasy_score'].min():.1f} to {player_stats_current['espn_fantasy_score'].max():.1f}")

✅ ESPN Fantasy scores calculated!
📈 Average fantasy score: 20.97
📊 Score range: -17.0 to 119.0
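
Row-wise `apply` calls the Python function once per game, which can be slow on a few hundred thousand rows. The same scoring rules can be written as column arithmetic; the sketch below is one possible vectorized form, and the function name `fantasy_score_vectorized` is hypothetical. The sample rows copy the relevant stats of the first two Sabonis games shown in the table preview, so the result can be compared with the `espn_fantasy_score` values there (28.0 and 39.0).

```python
import pandas as pd


def fantasy_score_vectorized(df):
    """Compute the ESPN-style fantasy score for every row using column arithmetic."""
    two_pt_made = df["fieldGoalsMade"] - df["threePointersMade"]
    fg_missed = df["fieldGoalsAttempted"] - df["fieldGoalsMade"]
    ft_missed = df["freeThrowsAttempted"] - df["freeThrowsMade"]
    return (
        5 * df["threePointersMade"]   # 3PM = 5 pts
        + 3 * two_pt_made             # 2PM = 3 pts
        + df["freeThrowsMade"]        # FTM = 1 pt
        - fg_missed                   # missed field goal = -1 pt
        - ft_missed                   # missed free throw = -1 pt
        + df["reboundsTotal"]         # REB = 1 pt
        + 2 * df["assists"]           # AST = 2 pts
        + 4 * df["steals"]            # STL = 4 pts
        + 4 * df["blocks"]            # BLK = 4 pts
        - 2 * df["turnovers"]         # TOV = -2 pts
    )


# Stats copied from the first two preview rows (2025-11-09 and 2025-11-03).
sample = pd.DataFrame({
    "fieldGoalsMade": [5.0, 5.0],
    "fieldGoalsAttempted": [17.0, 10.0],
    "threePointersMade": [0.0, 0.0],
    "freeThrowsMade": [10.0, 3.0],
    "freeThrowsAttempted": [12.0, 4.0],
    "reboundsTotal": [13.0, 17.0],
    "assists": [3.0, 5.0],
    "steals": [1.0, 1.0],
    "blocks": [0.0, 0.0],
    "turnovers": [3.0, 2.0],
})
scores = fantasy_score_vectorized(sample)
print(scores.tolist())  # [28.0, 39.0]
```

For the first row the arithmetic is 15 (two-pointers) + 10 (free throws) - 12 (missed field goals) - 2 (missed free throws) + 13 (rebounds) + 6 (assists) + 4 (steals) - 6 (turnovers) = 28.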


**Find the top 5 players by average fantasy score**

In [25]:
# Group by player and calculate stats
player_fantasy_stats = player_stats_current.groupby(['firstName', 'lastName']).agg({
    'espn_fantasy_score': ['mean', 'std', 'count', 'sum'],
    'points': 'mean',
    'reboundsTotal': 'mean',
    'assists': 'mean',
    'gameDate': ['min', 'max']  # Career span
    }).round(2)

# Flatten column names
player_fantasy_stats.columns = [
    'avg_fantasy_score', 'std_fantasy_score', 'games_played', 'total_fantasy_points',
    'avg_points', 'avg_rebounds', 'avg_assists', 'career_start', 'career_end'
    ]

# Filter for players with significant games (e.g., minimum 100 games)
qualified_players = player_fantasy_stats[player_fantasy_stats['games_played'] >= 100]

# Get top 5 by average fantasy score
top_5_fantasy = qualified_players.nlargest(5, 'avg_fantasy_score')

# Top 5 players by average fantasy score (minimum games to qualify)
print("🏆 TOP 5 FANTASY BASKETBALL PLAYERS (All-Time)")
print("=" * 60)

for i, (name, stats) in enumerate(top_5_fantasy.iterrows(), 1):
    first_name, last_name = name
    print(f"{i}. {first_name} {last_name}")
    print(f"   🎯 Avg Fantasy Score: {stats['avg_fantasy_score']:.1f}")
    print(f"   🎮 Games Played: {int(stats['games_played'])}")
    print(f"   📈 Total Fantasy Points: {stats['total_fantasy_points']:,.0f}")
    print(f"   📊 Consistency (Std Dev): {stats['std_fantasy_score']:.1f}")
    print(f"   🏀 Avg Stats: {stats['avg_points']:.1f}pts, {stats['avg_rebounds']:.1f}reb, {stats['avg_assists']:.1f}ast")
    print(f"   📅 Career: {stats['career_start'][:4]} - {stats['career_end'][:4]}")
    print("-" * 50)

🏆 TOP 5 FANTASY BASKETBALL PLAYERS (All-Time)
1. Victor Wembanyama
   🎯 Avg Fantasy Score: 51.0
   🎮 Games Played: 141
   📈 Total Fantasy Points: 7,193
   📊 Consistency (Std Dev): 20.7
   🏀 Avg Stats: 21.6pts, 10.3reb, 3.6ast
   📅 Career: 2023 - 2025
--------------------------------------------------
2. Nikola Jokic
   🎯 Avg Fantasy Score: 50.4
   🎮 Games Played: 909
   📈 Total Fantasy Points: 45,781
   📊 Consistency (Std Dev): 21.4
   🏀 Avg Stats: 21.6pts, 10.7reb, 7.0ast
   📅 Career: 2015 - 2025
--------------------------------------------------
3. Luka Doncic
   🎯 Avg Fantasy Score: 50.3
   🎮 Games Played: 548
   📈 Total Fantasy Points: 27,571
   📊 Consistency (Std Dev): 20.3
   🏀 Avg Stats: 27.7pts, 8.3reb, 7.8ast
   📅 Career: 2018 - 2025
--------------------------------------------------
4. LeBron James
   🎯 Avg Fantasy Score: 49.0
   🎮 Games Played: 2010
   📈 Total Fantasy Points: 98,496
   📊 Consistency (Std De