# The value of draft picks in the NBA

*The last code review date: April 2, 2024.*

The NBA Draft is an annual event where NBA teams select players from colleges, developmental leagues, or other professional leagues for their rosters. The selection order for the teams is determined based on the results of a lottery, with the team with the worst regular-season record typically having the best odds of getting the first pick. Consequently, the higher the pick, the better the chance for a team to acquire a talented player.

The history of the NBA Draft dates back to 1947 when the NBA was still a young organization, and the professional draft became a way to bring in new talent to the league.

The first draft took place on July 1, 1947, when the league was still known as the Basketball Association of America (BAA). The first-ever pick was Clifton McNeely, selected by the Pittsburgh Ironmen; he never played in the league.

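Over time, the draft process has evolved. In 1966, the NBA introduced a coin flip between the last-place teams of its two divisions to decide who picked first. In 1985, the coin flip was replaced by the draft lottery for teams that didn't make the playoffs, aiming to combat tanking (intentionally losing games to secure a higher draft pick).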

Traded picks can also carry protections. A protected pick stays with the original team if it lands inside an agreed range (for example, the top 10), and the obligation usually rolls over to a later draft. The lottery format and the rules around traded picks have been modified and refined over the years.

Today, the NBA Draft consists of two rounds involving 30 teams. The lottery decides the top of the order among the teams that missed the playoffs; the rest of the first round and the whole second round go in reverse order of regular-season record. Teams can also trade picks before or during the draft. Accordingly, first-round picks range from 1 to 30 and second-round picks from 31 to 60, although forfeited picks occasionally shorten the second round.

Beyond being an opportunity to acquire talented rookies, draft picks serve as assets in player movement. NBA players can switch teams through various methods, including:

1. Trades: One of the most common ways for players to move between teams. Teams can swap players with each other, involving one or multiple players, draft picks, and additional compensation. Trades must comply with league rules, including salary cap constraints and other restrictions.

2. Free Agency: After their contract expires, players can become free agents and sign with another team interested in their services. There are two main types: unrestricted free agency, where the player may sign with any team, and restricted free agency, where the player's current team can match an offer from another team.

During NBA trades, teams typically exchange the following elements:

- Players: The primary components of the deal often include one or more players, ranging from star players to reserves or young talents that clubs decide to trade.
- Draft picks: Teams may include draft picks in the deal, allowing the other team to obtain the right to choose in a specific round of the draft.
- Additional compensation: Besides players and draft picks, teams may negotiate additional compensation, such as monetary funds or player rights.
- Salary cap considerations: In some cases, teams may include players with specific contract sizes to comply with NBA salary cap limits.

Every year, draft picks are traded depending on the needs and strategies of specific teams:

- Teams aiming for rebuilding or refreshing their roster may trade their experienced players for high-round draft picks. This can enable teams to acquire young and promising players for the future.
- Teams focused on immediate success may trade young players or late-round draft picks for experienced players who can help them in the current season.

Draft decisions themselves can be just as consequential. For instance, in 1984, the Portland Trail Blazers used the second overall pick on Sam Bowie, leaving Michael Jordan available for the Chicago Bulls at the third pick. This choice is often considered one of the most unsuccessful in NBA history, as Jordan became a legend while Bowie failed to meet expectations.

In the modern NBA, there is a growing trend of trades of the form "star player for 5-8 draft picks." The fan community is split between two polarized views: either draft picks are worth more than a star or good player, because teams consistently find valuable players with those picks, or a star player is worth more, because draft picks often turn into players who never realize their potential.

The research objective is to determine the value of a pick, that is, to understand which player is typically chosen: a bust or a superstar. This will help determine how fair the compensation in the form of draft picks is in the modern NBA market.

In [1]:
import pandas as pd  # Importing pandas library for data manipulation
import numpy as np  # Importing numpy library for numerical operations

import matplotlib.pyplot as plt  # Importing matplotlib for data visualization

import requests  # Importing requests library for making HTTP requests
import requests_cache  # Importing requests_cache for caching HTTP requests
from bs4 import BeautifulSoup  # Importing BeautifulSoup for web scraping

import datetime  # Importing datetime module for handling date and time
import time  # Importing time module for time-related functions
import random  # Importing random module for generating random numbers

from sklearn.cluster import KMeans  # Importing KMeans from sklearn.cluster for clustering

from itertools import chain  # Importing chain from itertools for iterating over elements of multiple lists

from tqdm import tqdm  # Importing tqdm for progress bar visualization

import sqlite3  # Importing sqlite3 for SQLite database operations

import re  # Importing re module for regular expressions

from IPython.display import display  # Importing display from IPython.display for displaying data in Jupyter Notebook

from io import StringIO  # Importing StringIO for working with string buffers

import warnings  # Importing warnings module for suppressing warnings during code execution


In [2]:
# Set display options to show all rows and columns
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

In [3]:
# Suppressing SettingWithCopyWarning warnings
pd.options.mode.chained_assignment = None  # default='warn'

# Set the warning level to "ignore"
warnings.simplefilter(action='ignore', category=FutureWarning)

# Data web scraping for analysis.

## Scraping data from Basketball-Reference.com for all players drafted since 1950.

In [4]:
# Define current year
current_year = datetime.datetime.now().year  # Using datetime.datetime.now() to get current date and time, and .year attribute to extract the year

In [5]:
# Setting cache expiration duration to 1 month (30 days)
expire_after = datetime.timedelta(days=30)

# Installing cache for requests to prevent redundant API calls
requests_cache.install_cache('basketball_reference_com_draft_cache', expire_after=expire_after)

def scraping_data_NBA_Draft_players(date_start, date_finish, cache): 
    
    """
    scraping_data_NBA_Draft_players - Scrape NBA draft data from Basketball-Reference.com.

    Parameters:
        date_start (int): The starting year for data parsing.
        date_finish (int): The ending year for data parsing.
        cache (list): Rows from the cache database, used to skip the delay for URLs that are already cached.

    Returns:
        tuple: A tuple containing two elements:
            - dict: A dictionary of DataFrames for each year within the specified range.
            - DataFrame: Combined DataFrame with draft data from all years within the specified range.

    This function parses NBA draft data from Basketball-Reference.com for the specified range of years.
    It retrieves draft tables for each year, extracts relevant information, and combines them into a single DataFrame.
    """
    
    # Creating a dictionary to store all tables
    all_df = {}

    def parse_basketball_reference(year):
        
        # Forming URL based on the year
        url = f'https://www.basketball-reference.com/draft/NBA_{year}.html'
        
        # Checking for URL presence in the cache
        url_to_find = re.escape(url)
        url_found = False
        for row in cache:
            row_str = str(row)
            if re.search(url_to_find, row_str):
                url_found = True
                break
        
        if not url_found:
            # Random delay between 7 to 9 seconds
            delay = random.randint(7, 9)
            time.sleep(delay)

        # Checking if the season is the previous or current year
        if year in [current_year - 1, current_year]:
            # Recent seasons can still change, so bypass the cache;
            # install_cache() patches requests globally, so it must be disabled explicitly
            with requests_cache.disabled():
                response = requests.get(url)
        else:
            # Otherwise, use cached session
            with requests_cache.CachedSession('basketball_reference_com_draft_cache') as session:
                response = session.get(url)

        html_content = response.text

        # Using BeautifulSoup to extract the table
        soup = BeautifulSoup(html_content, 'html.parser')
        table = soup.find('table', {'id': 'stats'})
        headers = [th.get_text() for th in table.find('thead').find_all('th', {"scope": "col"})]

        rows = table.find('tbody').find_all('tr')

        # Parsing data
        data = []
        for row in rows:
            cols = row.find_all(['th', 'td'])
            cols = [ele.text.strip() for ele in cols]
            data.append([ele for ele in cols if ele])

        # Creating DataFrame
        df = pd.DataFrame(data, columns=headers)
        
        df['Draft'] = f'Draft_{year}'
        
        # Returning the formed DataFrame
        return df  
    
    # Using tqdm to display progress
    progress_bar = tqdm(range(date_start, date_finish + 1))
    
    # Processing data for each year in the specified range
    for year in progress_bar:
        df_name = f'df_{year}'
        all_df[df_name] = parse_basketball_reference(year)
        # Updating progress bar
        progress_bar.set_description(f'Processing data for year {year}')

    # Combining all tables into one
    all_data = pd.concat(all_df.values(), ignore_index=True) 

    # Dropping header rows that are repeated inside the table body
    all_data = all_data[all_data['Rk'] != 'Rk']
    
    # Returning a dictionary of all tables and the combined table
    return all_df, all_data

# Loading all cached responses so the function can check which URLs are already stored
conn = sqlite3.connect('basketball_reference_com_draft_cache.sqlite')
cursor = conn.cursor()
cursor.execute("SELECT * FROM responses ")
cache_basketball_reference_com_draft = cursor.fetchall()

# Closing the database connection
conn.close()

# Running the function
all_dfs_Draft, all_data_Draft_combined = scraping_data_NBA_Draft_players(1950, 2023, cache_basketball_reference_com_draft)

Processing data for year 2023: 100%|████████████| 74/74 [00:25<00:00,  2.93it/s]
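
The delay logic inside the scraping function can be isolated into a small helper. The sketch below is not the notebook's implementation: it assumes the cached URLs are available as a plain set of strings (how that set is built depends on the requests-cache version), and the name `polite_delay` and its parameters are hypothetical. A set lookup avoids running a regular expression over every stored response.

```python
import random
import time


def polite_delay(url, cached_urls, min_s=7, max_s=9, sleep=time.sleep):
    """Sleep a random number of seconds unless `url` is already cached.

    Returns the delay that was applied (0 when the URL was cached).
    `cached_urls` is assumed to be a set of URL strings.
    """
    if url in cached_urls:
        return 0
    delay = random.randint(min_s, max_s)  # inclusive on both ends
    sleep(delay)
    return delay


# Hypothetical example data; in the notebook the URLs would come from the cache database
cached_urls = {'https://www.basketball-reference.com/draft/NBA_1950.html'}

# A cached URL needs no delay
print(polite_delay('https://www.basketball-reference.com/draft/NBA_1950.html', cached_urls))

# An uncached URL triggers a delay; a no-op sleep is injected here so the demo returns immediately
print(polite_delay('https://www.basketball-reference.com/draft/NBA_1951.html',
                   cached_urls, sleep=lambda seconds: None))
```

Passing `sleep` as a parameter keeps the helper easy to test without actually waiting.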


## Scraping data from Basketball-Reference.com for all All-Star Games since 1951.

In [6]:
# Setting cache expiration duration to 1 month (30 days)
expire_after = datetime.timedelta(days=30)

# Installing cache for requests to prevent redundant API calls
requests_cache.install_cache('basketball_reference_com_allstar_game', expire_after=expire_after)

def scraping_data_NBA_allstar(date_start, date_finish, cache):

    """
    scraping_data_NBA_allstar - Scrape NBA All-Star Game box scores from Basketball-Reference.com.

    Parameters:
        date_start (int): The starting year for data parsing.
        date_finish (int): The ending year for data parsing.
        cache (list): Rows from the cache database, used to skip the delay for URLs that are already cached.

    Returns:
        tuple: A tuple containing two elements:
            - dict: A dictionary of DataFrames for each year within the specified range.
            - DataFrame: Combined DataFrame with All-Star data from all years within the specified range.

    This function scrapes All-Star Game data from Basketball-Reference.com for the specified range of years.
    It retrieves both team tables for each year, keeps their header rows, and combines them into a single DataFrame.
    Years that cannot be processed are reported and skipped.
    """
    
    # Creating a dictionary to store all tables
    all_df = {}

    def scraping_basketball_reference(year):
        try:
            # Forming URL based on the year
            url = f'https://www.basketball-reference.com/allstar/NBA_{year}.html'
            
            # Escaping the URL to use in search operation
            url_to_find = re.escape(url)
        
            url_found = False
        
            # Checking if the URL is present in the cache
            for row in cache:
                row_str = str(row)  # Converting data to string
                if re.search(url_to_find, row_str):  # Searching for URL in the string
                    url_found = True
                    break
        
            if not url_found:
                # Random delay between 7 to 9 seconds to avoid overloading the server
                delay = random.randint(7, 9)
                time.sleep(delay)

            # Checking if the season is the previous or current year
            if year in [current_year - 1, current_year]:
                # Recent seasons can still change, so bypass the cache;
                # install_cache() patches requests globally, so it must be disabled explicitly
                with requests_cache.disabled():
                    response = requests.get(url)
            else:
                # Otherwise, using cached session
                with requests_cache.CachedSession('basketball_reference_com_allstar_game') as session:
                    response = session.get(url)

            html_content = response.text

            # Using BeautifulSoup to extract the table
            soup = BeautifulSoup(html_content, 'html.parser')

            # Determining the correct table ID based on the year
            name_West = 'West' if 1951 <= year <= 2017 else \
                        'Team Stephen' if year == 2018 else \
                        'Team Giannis' if year in [2019, 2020] else \
                        'Team Durant' if year in [2021, 2022] else \
                        'Giannis' if year == 2023 else \
                        'West' if year == 2024 else None 

            name_East = 'East' if 1951 <= year <= 2017 else \
                        'Team LeBron' if 2018 <= year <= 2021 else \
                        'Team Lebron' if year == 2022 else \
                        'LeBron' if year == 2023 else \
                        'East' if year == 2024 else None

            # Finding and extracting tables for Eastern and Western conferences
            table_1 = soup.find('table', {'id': name_East})
            headers_1 = [th.get_text() for th in table_1.find('thead').find_all('th', {"scope": "col"})]
            rows_1 = table_1.find('tbody').find_all('tr')

            table_2 = soup.find('table', {'id': name_West})
            headers_2 = [th.get_text() for th in table_2.find('thead').find_all('th', {"scope": "col"})]
            rows_2 = table_2.find('tbody').find_all('tr')

            # Parsing data from both tables
            data_1 = []
            for row in rows_1:
                cols = row.find_all(['th', 'td'])
                cols = [ele.text.strip() for ele in cols]
                data_1.append([ele for ele in cols if ele])

            data_2 = []
            for row in rows_2:
                cols = row.find_all(['th', 'td'])
                cols = [ele.text.strip() for ele in cols]
                data_2.append([ele for ele in cols if ele])

            # Creating DataFrames for both tables
            headers_df_1 = pd.DataFrame([headers_1], columns=headers_1)
            df_1 = pd.DataFrame(data_1, columns=headers_1)
            df_1 = pd.concat([headers_df_1, df_1], ignore_index=True)

            headers_df_2 = pd.DataFrame([headers_2], columns=headers_2)
            df_2 = pd.DataFrame(data_2, columns=headers_2)
            df_2 = pd.concat([headers_df_2, df_2], ignore_index=True)

            # Concatenating both DataFrames into one
            df = pd.concat([df_1, df_2], ignore_index=True)

            # Adding a column indicating the all-star year
            df['allstar_year'] = f'allstar_{year}'

            # Returning the formed DataFrame
            return df  
        except Exception as e:
            # Handling exceptions and printing error message
            print(f"Error processing year {year}: {e}")
            return None

    # Processing data for each year in the specified range
    for year in tqdm(range(date_start, date_finish + 1)):
        df_name = f'df_{year}'
        all_df[df_name] = scraping_basketball_reference(year)

    # Removing any empty values from the dictionary
    all_df = {k: v for k, v in all_df.items() if v is not None}

    # Combining all tables into one
    all_data = pd.concat(all_df.values(), ignore_index=True) 

    # Returning a dictionary of all tables and the combined table
    return all_df, all_data

# Loading all cached responses so the function can check which URLs are already stored
conn = sqlite3.connect('basketball_reference_com_allstar_game.sqlite')
cursor = conn.cursor()
cursor.execute("SELECT * FROM responses ")
cache_basketball_reference_com_allstar_game = cursor.fetchall()

# Closing the database connection
conn.close()

# Example usage of the scraping_data_NBA_allstar function
all_dfs_allstar, all_data_combined_allstar = scraping_data_NBA_allstar(1951, 2024, cache_basketball_reference_com_allstar_game)

 68%|█████████████████████████████              | 50/74 [00:13<00:40,  1.69s/it]

Error processing year 1999: 'NoneType' object has no attribute 'find'


100%|███████████████████████████████████████████| 74/74 [00:19<00:00,  3.88it/s]


*There was no All-Star Game in 1999: the 1998-99 season was shortened by a lockout, which arose because the NBA and the players' union could not agree on salaries and the terms of the collective bargaining agreement.*
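
The chained conditionals that pick the table IDs could also be expressed as a lookup table. This is only a sketch of an alternative layout: the ID strings are copied from the scraping cell above, while the names `SPECIAL_TABLE_IDS` and `allstar_table_ids` are hypothetical.

```python
# Table IDs for the seasons in which the teams were named after their captains.
# Each value is (first table, second table), matching name_East and name_West
# in the scraping cell.
SPECIAL_TABLE_IDS = {
    2018: ('Team LeBron', 'Team Stephen'),
    2019: ('Team LeBron', 'Team Giannis'),
    2020: ('Team LeBron', 'Team Giannis'),
    2021: ('Team LeBron', 'Team Durant'),
    2022: ('Team Lebron', 'Team Durant'),
    2023: ('LeBron', 'Giannis'),
}


def allstar_table_ids(year):
    """Return the pair of table IDs for a season, or None if the year is not covered."""
    if year in SPECIAL_TABLE_IDS:
        return SPECIAL_TABLE_IDS[year]
    if 1951 <= year <= 2017 or year == 2024:
        return ('East', 'West')
    return None


print(allstar_table_ids(2005))
print(allstar_table_ids(2022))
print(allstar_table_ids(1950))
```

Adding a future season then means adding one dictionary entry rather than extending two conditional chains.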

**Next, I'm going to classify each NBA All-Star Game appearance as either starting or coming off the bench. To do this, I'll add columns to the DataFrame that track how many rows each entry sits below the most recent 'Starters' and 'Reserves' header rows. Then I'll categorize each player as a starter or a bench player based on that distance. Once that's done, the data will be ready for more in-depth analysis of All-Star Game participants and their team roles.**

In [7]:
# Adding a column 'str_Starters' and initializing it with zeros
all_data_combined_allstar['str_Starters'] = 0

# Iterating through the DataFrame rows
str_starters_distance = 0
for i in range(len(all_data_combined_allstar)):
    if all_data_combined_allstar.iloc[i]['Starters'] == 'Starters':
        str_starters_distance = 0
    else:
        str_starters_distance += 1
    all_data_combined_allstar.at[i, 'str_Starters'] = str_starters_distance

# Adding a column 'str_Reserves' and initializing it with zeros
all_data_combined_allstar['str_Reserves'] = 0

# Iterating through the DataFrame rows
str_reserves_distance = 0
for i in range(len(all_data_combined_allstar)):
    if all_data_combined_allstar.iloc[i]['Starters'] == 'Reserves':
        str_reserves_distance = 0
    else:
        str_reserves_distance += 1
    all_data_combined_allstar.at[i, 'str_Reserves'] = str_reserves_distance

# Resetting the index of the DataFrame
all_data_combined_allstar.reset_index(drop=True, inplace=True)

# Adding columns 'Starter' and 'Reserve' and initializing them with zeros
all_data_combined_allstar['Starter'] = 0
all_data_combined_allstar['Reserve'] = 0

# Iterating through the DataFrame rows
# Distance 0 is the 'Starters' header row, 1-5 are the five starters and
# 6 is the 'Reserves' header row; both header rows are removed below
for i in range(len(all_data_combined_allstar)):
    if all_data_combined_allstar.loc[i, 'str_Starters'] <= 6:
        all_data_combined_allstar.at[i, 'Starter'] = 1
    else:
        all_data_combined_allstar.at[i, 'Reserve'] = 1

# Removing rows where 'Starters' column value is 'Totals'
all_data_combined_allstar = all_data_combined_allstar[all_data_combined_allstar['Starters'] != 'Totals']

# Removing rows where 'MP' column value is 'MP'
all_data_combined_allstar = all_data_combined_allstar[all_data_combined_allstar['MP'] != 'MP']
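
The row-by-row loops above can also be written without explicit iteration. The following is a minimal sketch on made-up data, assuming the same layout as the scraped table (a 'Starters' header row, five starters, a 'Reserves' header row, then the bench). The helper name `rows_since_marker` and the placeholder player labels are hypothetical, and the sketch tags only distances 1 to 5 as starters, so header rows are never flagged.

```python
import pandas as pd


def rows_since_marker(labels, marker):
    """Count rows since the most recent row equal to `marker` (0 on the marker itself)."""
    is_marker = labels.eq(marker)
    # Every marker row starts a new block; rows before the first marker are block 0
    block_id = is_marker.cumsum()
    distance = labels.groupby(block_id).cumcount()
    # Rows before the first marker count from 1, as in the loop version
    return distance + block_id.eq(0).astype(int)


# Placeholder labels standing in for one team's table
demo = pd.DataFrame({'Starters': ['Starters', 'A', 'B', 'C', 'D', 'E',
                                  'Reserves', 'F', 'G', 'Totals']})

demo['str_Starters'] = rows_since_marker(demo['Starters'], 'Starters')
demo['str_Reserves'] = rows_since_marker(demo['Starters'], 'Reserves')
demo['Starter'] = demo['str_Starters'].between(1, 5).astype(int)

print(demo)
```

On a table of a few thousand rows the loop version is fast enough; the vectorized form mainly makes the block logic easier to check on a small example.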

## Scraping data on awards:
- Finals Most Valuable (MVP) Player (Bill Russell Trophy)
- MVP & ABA Most Valuable Player Award Winners
- Rookie of the Year Award Winners
- Defensive Player of the Year (Hakeem Olajuwon Trophy) Award Winners
- Twyman-Stokes Teammate of the Year Award Winners

In [8]:
# Setting cache expiration duration to 1 month (30 days)
expire_after = datetime.timedelta(days=30)
    
# Installing cache for requests to prevent redundant API calls
requests_cache.install_cache('basketball_reference_com_award_name', expire_after=expire_after)

def get_award_table(award_name, cache):
    
    """
    get_award_table - Retrieve data for a specific basketball award from Basketball-Reference.com.

    Parameters:
        award_name (str): Name of the basketball award. It can be one of the following: 'MVP_final', 'MVP', 'ROY', 'DPOY', 'TMOY'.
        cache (list): List of cached responses to prevent redundant API calls.

    Returns:
        pandas.DataFrame or None: DataFrame containing the data for the specified award if found, otherwise None.

    This function retrieves data for a specific basketball award from Basketball-Reference.com.
    It accepts the name of the award and a cache list to prevent redundant API calls.
    The function first checks if the URL is present in the cache and if not, it makes a request
    with a cached session to the corresponding award page. It then parses the HTML content,
    extracts the table data based on the award's table ID, and creates a DataFrame.
    If the table is not found, it prints a message and returns None.

    Example:
        df_MVP_final = get_award_table('MVP_final', cache_basketball_reference_com_award_name)
    """

    # Dictionary containing URLs for various awards
    urls = {
         'MVP_final': 'https://www.basketball-reference.com/awards/finals_mvp.html',
         'MVP': 'https://www.basketball-reference.com/awards/mvp.html',
         'ROY': 'https://www.basketball-reference.com/awards/roy.html',
         'DPOY': 'https://www.basketball-reference.com/awards/dpoy.html',
         'TMOY': 'https://www.basketball-reference.com/awards/tmoy.html'
    }

    # Dictionary containing table IDs for various awards
    table_ids = {
        'MVP_final': 'finals_mvp_summary',
        'MVP': 'mvp_summary',
        'ROY': 'roy_NBA',
        'DPOY': 'dpoy_summary',
        'TMOY': 'tmoy_summary'
    }

    # Dictionary containing column names for various awards
    column_names = {
        'MVP_final': ['Player', 'Lg', 'MVP_final'],
        'MVP': ['Player', 'Lg', 'MVP'],
        'ROY': ['Season', 'Lg', 'Player', 'Voting', 'Age', 'Tm', 'G', 'MP', 'PTS', 'TRB', 'AST', 'STL', 'BLK', 'FG%', '3P%', 'FT%', 'WS', 'WS/48'],
        'DPOY': ['Player', 'Lg', 'DPOY'],
        'TMOY': ['Player', 'Lg', 'Teammate of the Year']
    }
    
    # Retrieving URL, table ID, and column names based on the award_name parameter
    url = urls.get(award_name)
    table_id = table_ids.get(award_name)
    columns = column_names.get(award_name)    
    
    # Checking if the URL is present in the cache
    url_to_find = re.escape(url)
    url_found = False
    for row in cache:
        row_str = str(row)  # Converting data to string
        if re.search(url_to_find, row_str):  # Searching for URL in the string
            url_found = True
            break
        
    if not url_found:
        # Random delay between 7 to 9 seconds to avoid overloading the server
        delay = random.randint(7, 9)
        time.sleep(delay)

    # Making request with cached session
    with requests_cache.CachedSession('basketball_reference_com_award_name') as session:
        response = session.get(url)

    html_content = response.text

    soup = BeautifulSoup(html_content, 'html.parser')
    table = soup.find('table', {'id': table_id})

    if table:
        # Extracting column headers
        headers = [th.get_text() for th in table.find('thead').find_all('th', {"scope": "col"})]
        # Extracting rows from the table
        rows = table.find('tbody').find_all('tr')

        data = []
        # Extracting data from rows
        for row in rows:
            cols = row.find_all(['th', 'td'])
            cols = [ele.text.strip() for ele in cols]
            data.append([ele for ele in cols if ele])

        # Creating DataFrame from extracted data
        df = pd.DataFrame(data, columns=headers)
        # Renaming columns based on predefined column names
        df.columns = columns

        return df
    else:
        # Handling the case where table is not found
        print("Table not found. Please check the URL or table ID.")

        
# Loading all cached responses so the function can check which URLs are already stored
conn = sqlite3.connect('basketball_reference_com_award_name.sqlite')
cursor = conn.cursor()
cursor.execute("SELECT * FROM responses ")
cache_basketball_reference_com_award_name = cursor.fetchall()

# Closing the database connection
conn.close()
        
# Using the function to get tables for MVP_final, MVP, ROY, DPOY, and TMOY awards
df_MVP_final = get_award_table('MVP_final', cache_basketball_reference_com_award_name)
df_MVP = get_award_table('MVP', cache_basketball_reference_com_award_name)
df_ROY = get_award_table('ROY', cache_basketball_reference_com_award_name)
df_DPOY = get_award_table('DPOY', cache_basketball_reference_com_award_name)
df_TMOY = get_award_table('TMOY', cache_basketball_reference_com_award_name)
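
One thing to watch in the row-parsing loops used by the scraping functions: dropping empty cells with `[ele for ele in cols if ele]` can shift the remaining values to the left when a blank falls in the middle of a row. The sketch below illustrates the effect on a made-up row and shows one possible alternative that keeps blanks as `None`. The helper `pad_row`, the column names, and the sample values are illustrative assumptions, not data from the site.

```python
def pad_row(cells, n_cols, fill=None):
    """Replace empty strings with `fill` and pad the row to `n_cols` entries."""
    cleaned = [cell if cell != '' else fill for cell in cells]
    return cleaned + [fill] * (n_cols - len(cleaned))


# Illustrative headers and a hypothetical row with a blank 'College' cell
headers = ['Pk', 'Tm', 'Player', 'College', 'Yrs']
raw = ['1', 'XYZ', 'Example Player', '', '5']

dropped = [cell for cell in raw if cell]  # pattern used in the scraping loops
kept = pad_row(raw, len(headers))         # blanks preserved as None

# With the blank dropped, '5' lands under 'College' and 'Yrs' is lost
print(dict(zip(headers, dropped)))
# With the blank kept, every value stays under its own header
print(dict(zip(headers, kept)))
```

Whether this matters depends on which columns the later analysis relies on, so it is worth spot-checking a few rows with known blanks.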

## Scraping Naismith Memorial Basketball Hall of Fame Inductees.

In [9]:
def get_hof_player_names():
    
    """
    get_hof_player_names - Retrieve names of basketball players from the Basketball Hall of Fame.

    Returns:
        list: List of strings containing names of basketball players in the Hall of Fame.

    This function retrieves names of basketball players from the Basketball Hall of Fame page on Basketball-Reference.com.
    It scrapes the HTML content of the page, extracts the table containing player names, and filters out players.
    The function then processes the names to remove additional information such as affiliations or categories,
    resulting in a clean list of player names.

    Example:
        hof_player_names = get_hof_player_names()
    """
    
    # URL of the Basketball Hall of Fame page
    url_hof = 'https://www.basketball-reference.com/awards/hof.html'
    
    # Sending a GET request to fetch the HTML content
    response_hof = requests.get(url_hof)
    html_content_hof = response_hof.text
    
    # Parsing HTML content
    soup_hof = BeautifulSoup(html_content_hof, 'html.parser')
    
    # Finding the table containing player names
    table_hof = soup_hof.find('table', {'id': 'hof'})
    
    # Extracting column headers
    headers_hof = [th.get_text() for th in table_hof.find('thead').find_all('th', {"scope": "col"})]
    
    # Extracting rows from the table
    rows_hof = table_hof.find('tbody').find_all('tr')
    
    # Extracting data from the table rows
    data_hof = [[ele.text.strip() for ele in row.find_all(['th', 'td'])] for row in rows_hof]
    
    # Creating a DataFrame from the extracted data
    df_hof = pd.DataFrame(data_hof, columns=headers_hof)
    
    # Filtering only players from the DataFrame
    df_hof = df_hof[df_hof['Category'] == 'Player']
    
    # Cleaning player names by removing additional information
    for keyword in ['Player', 'WNBA', 'CBB', 'Coach', 'NBL', "Int'l"]:
        df_hof['Name'] = df_hof['Name'].str.split(keyword).str[0].str.strip()
    
    # Converting player names to a list
    hof_player_names = df_hof['Name'].tolist()
    
    return hof_player_names

# Calling the function to retrieve a list of Hall of Fame player names
hof_player_names = get_hof_player_names()

## NBA Champions Data.

*I couldn't find any nicely organized data online for all the NBA champions. The best I found was a table listing players who've won 2 or more times. Sure, I could've tried scraping the rosters from Basketball-Reference.com, but they aren't neatly organized there either, and that would've taken more time. So, in about 10 minutes, with some help from GPT, I put together this variable.*

In [10]:
data_champ = {
    'Minneapolis_Lakers_1950': 'Don Carlson, Arnie Ferrin, Normie Glick, Bud Grant, Bob Harrison, Billy Hassett, Tony Jaros, Slater Martin, George Mikan, Vern Mikkelsen, Jim Pollard, Herm Schaefer, Gene Stump, Paul Walther',
    'Rochester_Royals_1951': 'Bill Calhoun, Jack Coleman, Bob Davies, Red Holzman, Arnie Johnson, Joe McNamee, Ed Mikan, Paul Noel, Arnie Risen, Pep Saul, Bobby Wanzer',
    'Minneapolis_Lakers_1952': 'Bob Harrison, Lew Hitch, Joe Hutton, Slater Martin, George Mikan, Vern Mikkelsen, John Pilch, Jim Pollard, Pep Saul, Howie Schultz, Whitey Skoog',
    'Minneapolis_Lakers_1953': 'Bob Harrison, Lew Hitch, Jim Holstein, Slater Martin, George Mikan, Vern Mikkelsen, Jim Pollard, Pep Saul, Dick Schnittker, Howie Schultz, Whitey Skoog',
    'Minneapolis_Lakers_1954': 'Jim Fritsche, Bob Harrison, Jim Holstein, Clyde Lovellette, Slater Martin, George Mikan, Vern Mikkelsen, Jim Pollard, Pep Saul, Dick Schnittker, Whitey Skoog',
    'Syracuse_Nationals_1955': 'Dick Farley, Bill Gabor, Billy Kenville, Red Kerr, George King, Earl Lloyd, Jackie Moore, Wally Osterkorn, Red Rocha, Dolph Schayes, Paul Seymour, Connie Simmons, Jim Tucker',
    'Philadelphia_Warriors_1956': 'Paul Arizin, Ernie Beck, Walt Davis, George Dempsey, Jack George, Tom Gola, Joe Graboski, Larry Hennessy, Neil Johnston, Jackie Moore, Bob Schafer',
    'Boston_Celtics_1957': 'Bob Cousy, Tom Heinsohn, Dick Hemric, Jim Loscutoff, Jack Nichols, Togo Palazzi, Andy Phillip, Frank Ramsey, Arnie Risen, Bill Russell, Bill Sharman, Lou Tsioropoulos',
    'St_Louis_Hawks_1958': 'Jack Coleman, Walt Davis, Cliff Hagan, Ed Macauley, Slater Martin, Jack McMahon, Red Morrison, Med Park, Worthy Patterson, Bob Pettit, Frank Selvy, Chuck Share, Win Wilfong',
    'Boston_Celtics_1959': 'Bob Cousy, Tom Heinsohn, K.C. Jones, Sam Jones, Jim Loscutoff, Frank Ramsey, Bill Russell, Bill Sharman, Bennie Swain, Lou Tsioropoulos',
    'Boston_Celtics_1960': 'Bob Cousy, Gene Guarilia, Tom Heinsohn, K.C. Jones, Sam Jones, Maurice King, Jim Loscutoff, Frank Ramsey, John Richter, Bill Russell, Bill Sharman',
    'Boston_Celtics_1961': 'Bob Cousy, Gene Conley, Gene Guarilia, Tom Heinsohn, K.C. Jones, Sam Jones, Jim Loscutoff, Frank Ramsey, Bill Russell, Tom Sanders, Bill Sharman',
    'Boston_Celtics_1962': 'Carl Braun, Al Butler, Bob Cousy, Gene Guarilia, Tom Heinsohn, K.C. Jones, Sam Jones, Jim Loscutoff, Gary Phillips, Frank Ramsey, Bill Russell, Tom Sanders',
    'Boston_Celtics_1963': 'Bob Cousy, Jack Foley, Gene Guarilia, John Havlicek, Tom Heinsohn, K.C. Jones, Sam Jones, Jim Loscutoff, Clyde Lovellette, Frank Ramsey, Bill Russell, Tom Sanders, Dan Swartz',
    'Boston_Celtics_1964': 'John Havlicek, Tom Heinsohn, K.C. Jones, Sam Jones, Jim Loscutoff, Clyde Lovellette, Johnny McCarthy, Willie Naulls, Frank Ramsey, Bill Russell, Tom Sanders, Larry Siegfried',
    'Boston_Celtics_1965': 'Ron Bonham, Mel Counts, John Havlicek, Tom Heinsohn, K.C. Jones, Sam Jones, Willie Naulls, Bevo Nordmann, Bill Russell, Tom Sanders, Larry Siegfried, John Thompson, Gerry Ward',
    'Boston_Celtics_1966': 'Ron Bonham, Mel Counts, Si Green, John Havlicek, K.C. Jones, Sam Jones, Willie Naulls, Don Nelson, Bill Russell, Tom Sanders, Woody Sauldsberry, Larry Siegfried, John Thompson, Ron Watts',
    'Philadelphia_76ers_1967': 'Wilt Chamberlain, Larry Costello, Billy Cunningham, Dave Gambee, Hal Greer, Matt Guokas, Luke Jackson, Wali Jones, Bill Melchionni, Chet Walker, Bob Weiss',
    'Boston_Celtics_1968': 'Wayne Embry, Mal Graham, John Havlicek, Bailey Howell, Johnny Jones, Sam Jones, Don Nelson, Bill Russell, Tom Sanders, Larry Siegfried, Tom Thacker, Rick Weitzman',
    'Boston_Celtics_1969': 'Jim Barnes, Em Bryant, Don Chaney, Mal Graham, John Havlicek, Bailey Howell, Rich Johnson, Sam Jones, Don Nelson, Bud Olsen, Bill Russell, Tom Sanders, Larry Siegfried',
    'New_York_Knicks_1970': 'Dick Barnett, Nate Bowman, Bill Bradley, Dave DeBusschere, Walt Frazier, Bill Hosket, Don May, Willis Reed, Mike Riordan, Cazzie Russell, Dave Stallworth, John Warren',
    'Milwaukee_Bucks_1971': 'Kareem Abdul Jabbar, Lucius Allen, Bob Boozer, Dick Cunningham, Bob Dandridge, Gary Freeman, Bob Greacen, Jon McGlocklin, McCoy McLemore, Oscar Robertson, Greg Smith, Jeff Webb, Marv Winkler, Bill Zopf',
    'Los_Angeles_Lakers_1972': 'Elgin Baylor, Wilt Chamberlain, Jim Cleamons, Leroy Ellis, Keith Erickson, Gail Goodrich, Happy Hairston, Jim McMillian, Pat Riley, Flynn Robinson, John Trapp, Jerry West',
    'New_York_Knicks_1973': 'Dick Barnett, Henry Bibby, Bill Bradley, Dave DeBusschere, Walt Frazier, John Gianelli, Phil Jackson, Jerry Lucas, Dean Meminger, Earl Monroe, Luther Rackley, Willis Reed, Tom Riker, Harthorne Wingo',
    'Boston_Celtics_1974': 'Don Chaney, Dave Cowens, Steve Downing, Hank Finkel, Phil Hankinson, John Havlicek, Steve Kuberski, Don Nelson, Paul Silas, Paul Westphal, Jo Jo White, Art Williams',
    'Golden_State_Warriors_1975': 'Rick Barry, Butch Beard, Steve Bracey, Bill Bridges, Derrek Dickey, Charles Dudley, Charles Johnson, George Johnson, Frank Kendrick, Jeff Mullins, Clifford Ray, Phil Smith, Jamaal Wilkes',
    'Boston_Celtics_1976': 'Jo Jo White, Dave Cowens, John Havlicek, Charlie Scott, Paul Silas, Steve Kuberski, Don Nelson, Jim Ard, Glenn McDonald, Kevin Stacom, Jerome Anderson, Tom Boswell',
    'Portland_Trail_Blazers_1977': 'Corky Calhoun, Johnny Davis, Herm Gilliam, Bob Gross, Lionel Hollins, Robin Jones, Maurice Lucas, Clyde Mayes, Lloyd Neal, Larry Steele, Dave Twardzik, Wally Walker, Bill Walton',
    'Washington_Bullets_1978': 'Greg Ballard, Phil Chenier, Bob Dandridge, Kevin Grevey, Elvin Hayes, Tom Henderson, Charles Johnson, Mitch Kupchak, Joe Pace, Wes Unseld, Phil Walker, Larry Wright',
    'Seattle_SuperSonics_1979': 'Dennis Awtrey, Fred Brown, Lars Hansen, Joe Hassett, Dennis Johnson, John Johnson, Tom LaGarde, Jackie Robinson, Lonnie Shelton, Jack Sikma, Paul Silas, Dick Snyder, Wally Walker, Gus Williams',
    'Los_Angeles_Lakers_1980': 'Kareem Abdul Jabbar, Ron Boone, Marty Byrnes, Kenny Carr, Jim Chones, Michael Cooper, Don Ford, Spencer Haywood, Brad Holland, Magic Johnson, Mark Landsberger, Butch Lee, Ollie Mack, Norm Nixon, Jamaal Wilkes',
    'Boston_Celtics_1981': 'Tiny Archibald, Larry Bird, M.L. Carr, Terry Duerod, Eric Fernsten, Chris Ford, Gerald Henderson, Wayne Kreklow, Cedric Maxwell, Kevin McHale, Robert Parish, Rick Robey',
    'Los_Angeles_Lakers_1982': 'Kareem Abdul Jabbar, Jim Brewer, Michael Cooper, Clay Johnson, Magic Johnson, Eddie Jordan, Mitch Kupchak, Mark Landsberger, Bob McAdoo, Mike McGee, Kevin McKenna, Norm Nixon, Kurt Rambis, Jamaal Wilkes',
    'Philadelphia_76ers_1983': 'J.J. Anderson, Maurice Cheeks, Earl Cureton, Franklin Edwards, Julius Erving, Marc Iavaroni, Clemon Johnson, Reggie Johnson, Bobby Jones, Moses Malone, Mark McNamara, Clint Richardson, Russ Schoene, Andrew Toney',
    'Boston_Celtics_1984': 'Danny Ainge, Larry Bird, Quinn Buckner, M.L. Carr, Carlos Clark, Gerald Henderson, Dennis Johnson, Greg Kite, Cedric Maxwell, Kevin McHale, Robert Parish, Scott Wedman',
    'Los_Angeles_Lakers_1985': 'Kareem Abdul Jabbar, Michael Cooper, Magic Johnson, Earl Jones, Mitch Kupchak, Ronnie Lester, Bob McAdoo, Mike McGee, Chuck Nevitt, Kurt Rambis, Byron Scott, Larry Spriggs, Jamaal Wilkes, James Worthy',
    'Boston_Celtics_1986': 'Danny Ainge, Larry Bird, Rick Carlisle, Dennis Johnson, Greg Kite, Kevin McHale, Robert Parish, Jerry Sichting, David Thirdkill, Sam Vincent, Bill Walton, Scott Wedman, Sly Williams',
    'Los_Angeles_Lakers_1987': 'Kareem Abdul Jabbar, Adrian Branch, Frank Brickowski, Michael Cooper, A.C. Green, Magic Johnson, Wes Matthews, Kurt Rambis, Byron Scott, Mike Smrek, Billy Thompson, Mychal Thompson, James Worthy',
    'Los_Angeles_Lakers_1988': 'Kareem Abdul Jabbar, Tony Campbell, Michael Cooper, A.C. Green, Magic Johnson, Jeff Lamp, Wes Matthews, Kurt Rambis, Byron Scott, Mike Smrek, Billy Thompson, Mychal Thompson, Ray Tolbert, Milt Wagner, James Worthy',
    'Detroit_Pistons_1989': 'Mark Aguirre, Adrian Dantley, Darryl Dawkins, Fennis Dembo, Joe Dumars, James Edwards, Steve Harris, Vinnie Johnson, Bill Laimbeer, John Long, Rick Mahorn, Pace Mannion, Dennis Rodman, Jim Rowinski, John Salley, Isiah Thomas, Micheal Williams',
    'Detroit_Pistons_1990': 'Mark Aguirre, William Bedford, Joe Dumars, James Edwards, Dave Greenwood, Scott Hastings, Gerald Henderson, Vinnie Johnson, Stan Kimbrough, Bill Laimbeer, Ralph Lewis, Dennis Rodman, John Salley, Isiah Thomas',
    'Chicago_Bulls_1991': 'B.J. Armstrong, Bill Cartwright, Horace Grant, Craig Hodges, Dennis Hopson, Michael Jordan, Stacey King, Cliff Levingston, John Paxson, Will Perdue, Scottie Pippen, Scott Williams',
    'Chicago_Bulls_1992': 'B.J. Armstrong, Bill Cartwright, Horace Grant, Bob Hansen, Craig Hodges, Dennis Hopson, Michael Jordan, Stacey King, Cliff Levingston, Chuck Nevitt, John Paxson, Will Perdue, Scottie Pippen, Mark Randall, Rory Sparrow, Scott Williams',
    'Chicago_Bulls_1993': 'B.J. Armstrong, Ricky Blanton, Bill Cartwright, Joe Courtney, Jo Jo English, Horace Grant, Michael Jordan, Stacey King, Rodney McCray, Ed Nealy, John Paxson, Will Perdue, Scottie Pippen, Trent Tucker, Darrell Walker, Corey Williams, Scott Williams',
    'Houston_Rockets_1994': 'Scott Brooks, Matt Bullard, Sam Cassell, Earl Cureton, Mario Elie, Carl Herrera, Robert Horry, Chris Jent, Vernon Maxwell, Hakeem Olajuwon, Richard Petruska, Eric Riley, Larry Robinson, Kenny Smith, Otis Thorpe',
    'Houston_Rockets_1995': 'Tim Breaux, Scott Brooks, Chucky Brown, Adrian Caldwell, Sam Cassell, Pete Chilcutt, Clyde Drexler, Mario Elie, Carl Herrera, Robert Horry, Charles Jones, Vernon Maxwell, Tracy Murray, Hakeem Olajuwon, Kenny Smith, Žan Tabak, Otis Thorpe',
    'Chicago_Bulls_1996': 'Randy Brown, Jud Buechler, Jason Caffey, James Edwards, Jack Haley, Ron Harper, Michael Jordan, Steve Kerr, Toni Kukoc, Luc Longley, Scottie Pippen, Dennis Rodman, John Salley, Dickey Simpkins, Bill Wennington',
    'Chicago_Bulls_1997': 'Randy Brown, Jud Buechler, Jason Caffey, Bison Dele, Ron Harper, Michael Jordan, Steve Kerr, Toni Kukoč, Luc Longley, Robert Parish, Scottie Pippen, Dennis Rodman, Dickey Simpkins, Matt Steigenga, Bill Wennington',
    'Chicago_Bulls_1998': 'Keith Booth, Randy Brown, Jud Buechler, Scott Burrell, Jason Caffey, Ron Harper, Michael Jordan, Steve Kerr, Joe Kleine, Toni Kukoč, Rusty LaRue, Luc Longley, Scottie Pippen, Dennis Rodman, Dickey Simpkins, David Vaughn, Bill Wennington',
    'San_Antonio_Spurs_1999': 'Antonio Daniels, Tim Duncan, Mario Elie, Sean Elliott, Andrew Gaze, Jaren Jackson, Avery Johnson, Steve Kerr, Jerome Kersey, Gerard King, Will Perdue, David Robinson, Malik Rose, Brandon Williams',
    'Los_Angeles_Lakers_2000': "Kobe Bryant, John Celestand, Derek Fisher, Rick Fox, Devean George, A.C. Green, Ron Harper, Robert Horry, Sam Jacobson, Travis Knight, Tyronn Lue, Shaquille O'Neal, Glen Rice, John Salley, Brian Shaw",
    'Los_Angeles_Lakers_2001': "Kobe Bryant, Derek Fisher, Greg Foster, Rick Fox, Devean George, Horace Grant, Ron Harper, Robert Horry, Tyronn Lue, Mark Madsen, Stanislav Medvedenko, Shaquille O'Neal, Mike Penberthy, Isaiah Rider, Brian Shaw",
    'Los_Angeles_Lakers_2002': "Kobe Bryant, Joe Crispin, Derek Fisher, Rick Fox, Devean George, Robert Horry, Lindsey Hunter, Mark Madsen, Jelani McCoy, Stanislav Medvedenko, Shaquille O'Neal, Mike Penberthy, Mitch Richmond, Brian Shaw, Samaki Walker",
    'San_Antonio_Spurs_2003': "Mengke Bateer, Bruce Bowen, Devin Brown, Speedy Claxton, Tim Duncan, Danny Ferry, Manu Ginóbili, Anthony Goldwire, Stephen Jackson, Steve Kerr, Tony Parker, David Robinson, Malik Rose, Steve Smith, Kevin Willis",
    'Detroit_Pistons_2004': 'Chucky Atkins, Chauncey Billups, Elden Campbell, Hubert Davis, Tremaine Fowlkes, Darvin Ham, Richard Hamilton, Lindsey Hunter, Mike James, Darko Miličić, Mehmet Okur, Tayshaun Prince, Željko Rebrača, Bob Sura, Ben Wallace, Rasheed Wallace, Corliss Williamson',
    'San_Antonio_Spurs_2005': 'Brent Barry, Bruce Bowen, Devin Brown, Tim Duncan, Manu Ginobili, Dion Glover, Robert Horry, Linton Johnson, Sean Marks, Tony Massenburg, Nazr Mohammed, Rasho Nesterovic, Tony Parker, Glenn Robinson, Malik Rose, Beno Udrih, Mike Wilks',
    'Miami_Heat_2006': "Derek Anderson, Shandon Anderson, Earl Barron, Michael Doleac, Gerald Fitch, Udonis Haslem, Jason Kapono, Alonzo Mourning, Shaquille O'Neal, Gary Payton, James Posey, Wayne Simien, Dwyane Wade, Antoine Walker, Matt Walsh, Jason Williams, Dorell Wright",
    'San_Antonio_Spurs_2007': 'Brent Barry, Matt Bonner, Bruce Bowen, Jackie Butler, Tim Duncan, Francisco Elson, Melvin Ely, Michael Finley, Manu Ginóbili, Robert Horry, Fabricio Oberto, Tony Parker, Beno Udrih, Jacque Vaughn, James White, Eric Williams',
    'Boston_Celtics_2008': 'Ray Allen, Tony Allen, P.J. Brown, Sam Cassell, Glen Davis, Kevin Garnett, Eddie House, Kendrick Perkins, Paul Pierce, Scot Pollard, James Posey, Leon Powe, Gabe Pruitt, Rajon Rondo, Brian Scalabrine',
    'Los_Angeles_Lakers_2009': 'Trevor Ariza, Shannon Brown, Kobe Bryant, Andrew Bynum, Jordan Farmar, Derek Fisher, Pau Gasol, D.J. Mbenga, Chris Mihm, Adam Morrison, Lamar Odom, Josh Powell, Vladimir Radmanović, Sasha Vujačić, Luke Walton, Sun Yue',
    'Los_Angeles_Lakers_2010': 'Shannon Brown, Kobe Bryant, Andrew Bynum, Jordan Farmar, Derek Fisher, Pau Gasol, D.J. Mbenga, Adam Morrison, Lamar Odom, Josh Powell, Sasha Vujacic, Luke Walton, Metta World Peace',
    'Dallas_Mavericks_2011': 'Alexis Ajinça, J.J. Barea, Rodrigue Beaubois, Corey Brewer, Caron Butler, Brian Cardinal, Tyson Chandler, Brendan Haywood, Dominique Jones, Jason Kidd, Ian Mahinmi, Shawn Marion, Steve Novak, Dirk Nowitzki, Sasha Pavlović, DeShawn Stevenson, Peja Stojaković, Jason Terry',
    'Miami_Heat_2012': 'Joel Anthony, Shane Battier, Chris Bosh, Mario Chalmers, Norris Cole, Eddy Curry, Mickell Gladness, Terrel Harris, Udonis Haslem, Juwan Howard, LeBron James, James Jones, Mike Miller, Dexter Pittman, Ronny Turiaf, Dwyane Wade',
    'Miami_Heat_2013': 'Ray Allen, Chris Andersen, Joel Anthony, Shane Battier, Chris Bosh, Mario Chalmers, Norris Cole, Josh Harrellson, Terrel Harris, Udonis Haslem, Juwan Howard, LeBron James, James Jones, Rashard Lewis, Mike Miller, Dexter Pittman, Jarvis Varnado, Dwyane Wade',
    'San_Antonio_Spurs_2014': 'Jeff Ayres, Aron Baynes, Marco Belinelli, Matt Bonner, Shannon Brown, Austin Daye, Nando De Colo, Boris Diaw, Tim Duncan, Manu Ginóbili, Danny Green, Damion James, Othyus Jeffers, Cory Joseph, Kawhi Leonard, Patty Mills, Tony Parker, Tiago Splitter, Malcolm Thomas',
    'Golden_State_Warriors_2015': "Stephen Curry, Klay Thompson, Draymond Green, Andre Iguodala, Harrison Barnes, Andrew Bogut, David Lee, Shaun Livingston, Leandro Barbosa, Festus Ezeli, Justin Holiday, Ognjen Kuzmić, James Michael McAdoo, Brandon Rush, Marreese Speights",
    'Cleveland_Cavaliers_2016': 'Jared Cunningham, Matthew Dellavedova, Channing Frye, Joe Harris, Kyrie Irving, LeBron James, Richard Jefferson, Dahntay Jones, James Jones, Sasha Kaun, Kevin Love, Jordan McRae, Timofey Mozgov, Iman Shumpert, J.R. Smith, Tristan Thompson, Anderson Varejão, Mo Williams',
    'Golden_State_Warriors_2017': 'Matt Barnes, Ian Clark, Stephen Curry, Kevin Durant, Draymond Green, Andre Iguodala, Damian Jones, Shaun Livingston, Kevon Looney, James Michael McAdoo, Patrick McCaw, JaVale McGee, Zaza Pachulia, Klay Thompson, Anderson Varejão, Briante Weber, David West',
    'Golden_State_Warriors_2018': 'Jordan Bell, Chris Boucher, Omri Casspi, Quinn Cook, Stephen Curry, Kevin Durant, Draymond Green, Andre Iguodala, Damian Jones, Shaun Livingston, Kevon Looney, Patrick McCaw, JaVale McGee, Zaza Pachulia, Klay Thompson, David West, Nick Young',
    'Toronto_Raptors_2019': 'OG Anunoby, Chris Boucher, Lorenzo Brown, Marc Gasol, Danny Green, Serge Ibaka, Kawhi Leonard, Jeremy Lin, Kyle Lowry, Jordan Loyd, Patrick McCaw, Jodie Meeks, C.J. Miles, Malcolm Miller, Greg Monroe, Eric Moreland, Norman Powell, Malachi Richardson, Pascal Siakam, Jonas Valančiūnas, Fred VanVleet, Delon Wright',
    'Los_Angeles_Lakers_2020': 'Kostas Antetokounmpo, Avery Bradley, Devontae Cacok, Kentavious Caldwell Pope, Alex Caruso, Quinn Cook, Troy Daniels, Anthony Davis, Jared Dudley, Danny Green, Talen Horton Tucker, Dwight Howard, LeBron James, Kyle Kuzma, JaVale McGee, Markieff Morris, Zach Norvell, Rajon Rondo, J.R. Smith, Dion Waiters',
    'Milwaukee_Bucks_2021': 'Jaylen Adams, Giannis Antetokounmpo, Thanasis Antetokounmpo, D.J. Augustin, Elijah Bryant, Pat Connaughton, Torrey Craig, Mamadi Diakite, Donte DiVincenzo, Bryn Forbes, Jrue Holiday, Justin Jackson, Rodions Kurucs, Brook Lopez, Sam Merrill, Khris Middleton, Jordan Nwora, Bobby Portis, Jeff Teague, Axel Toupane, P.J. Tucker, D.J. Wilson',
    'Golden_State_Warriors_2022': 'Nemanja Bjelica, Chris Chiozza, Stephen Curry, Jeff Dowtin, Draymond Green, Andre Iguodala, Jonathan Kuminga, Damion Lee, Kevon Looney, Moses Moody, Gary Payton II, Jordan Poole, Otto Porter Jr., Klay Thompson, Juan Toscano-Anderson, Quinndary Weatherspoon, Andrew Wiggins',
    'Denver_Nuggets_2023': 'Christian Braun, Bruce Brown, Thomas Bryant, Kentavious Caldwell Pope, Vlatko Čančar, Aaron Gordon, Jeff Green, Bones Hyland, Reggie Jackson, Nikola Jokić, DeAndre Jordan, Jamal Murray, Zeke Nnaji, Michael Porter Jr., Davon Reed, Ish Smith, Peyton Watson, Jack White'
}

In [11]:
# Flattening all championship rosters into one list of player names (one entry per player per title)
players = list(chain.from_iterable([players_list.split(', ') for players_list in data_champ.values()]))

# Creating DataFrame and counting the number of mentions for each player
df_count_rings_for_player = pd.DataFrame(players, columns=['Player'])
df_count_rings_for_player['Ring_Count'] = df_count_rings_for_player['Player'].map(df_count_rings_for_player['Player'].value_counts())
df_count_rings_for_player = df_count_rings_for_player.drop_duplicates().reset_index(drop=True)
# Replacing 'Kareem Abdul Jabbar' with 'Kareem Abdul-Jabbar' for consistency
df_count_rings_for_player['Player'] = df_count_rings_for_player['Player'].replace('Kareem Abdul Jabbar', 'Kareem Abdul-Jabbar')

# Sorting the DataFrame by the 'Ring_Count' column in descending order
df_count_rings_for_player = df_count_rings_for_player.sort_values(by='Ring_Count', ascending=False)
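The rosters above spell some names in more than one way (for example `Toni Kukoc` and `Toni Kukoč`, or `Manu Ginóbili` and `Manu Ginobili`), so a plain `value_counts()` would split one player's rings across two spellings. A minimal sketch of one way to build a spelling-insensitive key before counting is shown below. The `normalize_name` helper and the `sample_rosters` dictionary are illustrative assumptions, not part of the original notebook.

```python
import unicodedata
from collections import Counter


def normalize_name(name: str) -> str:
    """Return an accent-free, hyphen-free, lowercase key for matching player names."""
    # Decompose accented characters (e.g. 'č' -> 'c' + combining mark) and drop the marks
    decomposed = unicodedata.normalize('NFKD', name)
    without_marks = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Treat hyphens as spaces and collapse repeated whitespace
    return ' '.join(without_marks.replace('-', ' ').lower().split())


# Hypothetical mini-rosters mirroring spelling variants seen in data_champ
sample_rosters = {
    'Chicago_Bulls_1996': 'Michael Jordan, Toni Kukoc',
    'Chicago_Bulls_1997': 'Michael Jordan, Toni Kukoč',
    'San_Antonio_Spurs_2003': 'Manu Ginóbili',
    'San_Antonio_Spurs_2005': 'Manu Ginobili',
}

# Count one ring per roster appearance, keyed by the normalized name
ring_counts = Counter(
    normalize_name(player)
    for roster in sample_rosters.values()
    for player in roster.split(', ')
)
print(ring_counts)
```

The same key would also have to be applied to the `Player` column of the draft table before merging, otherwise the normalized names would no longer match.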

# Merging Tables and Processing Data.

In [12]:
# Grouping the data by the 'Starters' column and aggregating the sum of 'Starter' and 'Reserve' columns
new_df = all_data_combined_allstar.groupby('Starters').agg({'Starter': 'sum', 'Reserve': 'sum'}).reset_index()

# Renaming the columns of the new DataFrame
new_df.columns = ['Player', 'Starter_sum', 'Reserve_sum']

# Merging the new DataFrame with the 'all_data_Draft_combined' DataFrame based on the 'Player' column using a left join
df_all_info_players = all_data_Draft_combined.merge(new_df, on='Player', how='left')

# Merging additional tables with player information into 'df_all_info_players'
dfs_to_merge = [df_count_rings_for_player[['Player', 'Ring_Count']],
                df_MVP_final[['Player', 'MVP_final']],
                df_MVP[['Player', 'MVP']],
                df_DPOY[['Player', 'DPOY']],
                df_TMOY[['Player', 'Teammate of the Year']]]

for df in dfs_to_merge:
    df_all_info_players = df_all_info_players.merge(df, on='Player', how='left')

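# Checking if each player is Rookie of the Year (ROY) or in the Hall of Fame
# Passing the 'Player' column as a Series: isin() on a DataFrame would compare against its column labels
df_all_info_players['ROY'] = df_all_info_players['Player'].isin(df_ROY['Player']).astype(int)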
df_all_info_players['Hall of Fame'] = df_all_info_players['Player'].isin(hof_player_names).astype(int)

# Renaming columns of the DataFrame for clarity and consistency
new_column_names = [
    'Rk_1', 'Overall Pick', 'Tm', 'Player', 'College', 'Yrs', 'G', 'MP', 'PTS', 'TRB',
    'AST', 'FG%', '3P%', 'FT%', 'MP_per_game', 'PTS_per_game', 'TRB_per_game', 'AST_per_game', 'WS', 'WS/48', 'BPM',
    'VORP', 'Draft', 'Starter_sum', 'Reserve_sum', 'Ring_Count', 'MVP_final', 'MVP', 'DPOY', 'Teammate of the Year', 'ROY', 'Hall of Fame'
]

df_all_info_players.columns = new_column_names

In [13]:
def shift_data_by_condition(df, condition, start_column, end_column):
    """
    Shifts data in a DataFrame based on specified conditions.

    Args:
        df (pandas DataFrame): Input DataFrame to perform data shifting on.
        condition (boolean Series): Condition to select rows for shifting data.
        start_column (str): Name of the column to start shifting the data.
        end_column (str): Name of the column to end shifting the data.

    Returns:
        pandas DataFrame: DataFrame with data shifted based on the given conditions.

    This function shifts the data in the specified columns of the DataFrame based on the provided condition.
    It selects rows of the DataFrame that meet the condition and shifts the data in the columns
    from the start_column to the end_column by one position to the right.
    The function fills any empty positions with None.

    Example:
        df_shifted = shift_data_by_condition(df_all_info_players,
            (df_all_info_players['Overall Pick'].notnull()) & (df_all_info_players['Tm'].str.len() != 3),
            'Rk_1', 'VORP'
        )
    """
    columns_to_shift = df.columns[df.columns.get_loc(start_column):df.columns.get_loc(end_column) + 1]
    df.loc[condition, columns_to_shift] = df.loc[condition, columns_to_shift].shift(1, axis=1, fill_value=None)
    return df

# Applying the function 4 times with different conditions
df_all_info_players = shift_data_by_condition(df_all_info_players,
    (df_all_info_players['Overall Pick'].notnull()) & (df_all_info_players['Tm'].str.len() != 3),
    'Rk_1', 'VORP'
)

df_all_info_players = shift_data_by_condition(df_all_info_players,
    (df_all_info_players['Overall Pick'].notnull()) & (df_all_info_players['College'].str.len() < 3),
    'College', 'VORP'
)

df_all_info_players = shift_data_by_condition(df_all_info_players,
    (df_all_info_players['Overall Pick'].notnull()) & (df_all_info_players['MP'].notnull()) & (df_all_info_players['3P%'].notnull()) & (df_all_info_players['WS/48'].isnull()),
    '3P%', 'WS/48'
)

df_all_info_players = shift_data_by_condition(df_all_info_players,
    (df_all_info_players['Overall Pick'].notnull()) & (df_all_info_players['MP'].notnull()) & (df_all_info_players['BPM'].notnull()) & (df_all_info_players['VORP'].isnull()),
    '3P%', 'VORP'
)
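To make the shifting step concrete, here is a small sketch on a toy table. It uses the same `loc` plus `shift(1, axis=1)` pattern as `shift_data_by_condition`, but the two-row DataFrame and its values are made up for illustration. The second row lost its first cell, so every value sits one column too far to the left.

```python
import pandas as pd

# Hypothetical two-row table: row 1 is misaligned by one column
toy = pd.DataFrame(
    {
        'Rk_1': ['1', '2'],
        'Overall Pick': ['1', 'NYK'],
        'Tm': ['BOS', 'Player B'],
        'Player': ['Player A', None],
    },
    dtype=object,
)

# A valid team code has exactly three characters; anything else marks a misaligned row
misaligned = toy['Tm'].str.len() != 3

# Move the values of the flagged rows one column to the right
cols = ['Rk_1', 'Overall Pick', 'Tm', 'Player']
toy.loc[misaligned, cols] = toy.loc[misaligned, cols].shift(1, axis=1)
print(toy)
```

After the shift, the team code and player name of the second row are back in their own columns, and the vacated first cell is left empty.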

In [14]:
# Assigning 'Overall Pick' to 'Rk_1' if 'Rk_1' is null
df_all_info_players['Rk_1'] = df_all_info_players['Rk_1'].fillna(df_all_info_players['Overall Pick'])

# Removing the row with 'Rk' value from the DataFrame
df_all_info_players = df_all_info_players[df_all_info_players['Overall Pick'] != 'Rk'].reset_index()

# Replacing all None values with 0
df_all_info_players.fillna(value=0, inplace=True)

# Defining columns with numeric integer and float types
cols_numeric_int = ['Yrs', 'G', 'MP', 'PTS', 'TRB', 'AST', 'Starter_sum', 'Reserve_sum', 'Ring_Count', 'MVP_final', 'MVP', 'DPOY', 'Teammate of the Year', 'ROY', 'Hall of Fame']
cols_numeric_float = ['MP_per_game', 'PTS_per_game', 'TRB_per_game', 'AST_per_game']

# Converting columns to numeric types and filling NaN values with 0
df_all_info_players[cols_numeric_int] = df_all_info_players[cols_numeric_int].apply(pd.to_numeric, errors='coerce').fillna(0).astype(int)
df_all_info_players[cols_numeric_float] = df_all_info_players[cols_numeric_float].apply(pd.to_numeric, errors='coerce').fillna(0).astype(float)

# Converting 'Starter_sum' and 'Reserve_sum' columns to integer type
df_all_info_players[['Starter_sum', 'Reserve_sum']] = df_all_info_players[['Starter_sum', 'Reserve_sum']].astype(int)

Let's check the data.

In [15]:
df_all_info_players.describe()

Unnamed: 0,index,Yrs,G,MP,PTS,TRB,AST,MP_per_game,PTS_per_game,TRB_per_game,AST_per_game,Starter_sum,Reserve_sum,Ring_Count,MVP_final,MVP,DPOY,Teammate of the Year,ROY,Hall of Fame
count,8533.0,8533.0,8533.0,8533.0,8533.0,8533.0,8533.0,8533.0,8533.0,8533.0,8533.0,8533.0,8533.0,8533.0,8533.0,8533.0,8533.0,8533.0,8533.0,8533.0
mean,4266.0,2.423181,138.619126,3366.6803,1475.21868,632.9257,326.498887,7.140642,2.91411,1.31902,0.645759,0.08262,0.110278,0.111215,0.006446,0.008672,0.00457,0.001289,0.0,0.016758
std,2463.409257,4.159289,269.437092,7597.542471,3699.029104,1625.982165,932.119846,10.333821,4.773412,2.180069,1.222964,0.740208,0.628592,0.535371,0.122793,0.153624,0.094889,0.044619,0.0,0.128373
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.3,-1.264,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,2133.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,4266.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,6399.0,3.0,125.0,1685.0,573.0,266.0,112.0,13.7,4.7,2.2,0.8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
max,8532.0,22.0,1611.0,57446.0,40067.0,23924.0,15806.0,45.8,30.1,22.9,11.2,20.0,12.0,11.0,6.0,6.0,4.0,3.0,0.0,1.0


In [16]:
df_all_info_players.sort_values(by='AST_per_game', ascending=False).head(100)

Unnamed: 0,index,Rk_1,Overall Pick,Tm,Player,College,Yrs,G,MP,PTS,TRB,AST,FG%,3P%,FT%,MP_per_game,PTS_per_game,TRB_per_game,AST_per_game,WS,WS/48,BPM,VORP,Draft,Starter_sum,Reserve_sum,Ring_Count,MVP_final,MVP,DPOY,Teammate of the Year,ROY,Hall of Fame
4500,4500,1,1,LAL,Magic Johnson,Michigan State,13,906,33245,17707,6559,10141,0.52,0.303,0.848,36.7,19.5,7.2,11.2,155.8,0.225,7.5,80.0,Draft_1979,10,1,5,3,3,0,0,0,1
5650,5650,16,16,UTA,John Stockton,Gonzaga,19,1504,47764,19711,4051,15806,0.515,0.384,0.826,31.8,13.1,2.7,10.5,207.7,0.209,6.8,106.5,Draft_1984,4,6,0,0,0,0,0,0,1
8171,8171,5,5,DAL,Trae Young,Oklahoma,6,404,13803,10334,1442,3838,0.436,0.354,0.873,34.2,25.6,3.6,9.5,37.3,0.13,3.0,17.4,Draft_2018,2,1,0,0,0,0,0,0,0
1120,1120,1,1,CIN,Oscar Robertson,Cincinnati,14,1040,43886,26710,7804,9887,0.485,0.0,0.838,42.2,25.7,7.5,9.5,189.2,0.207,-0.1,1.2,Draft_1960,10,2,1,0,1,0,0,0,1
7377,7377,4,4,NOH,Chris Paul,Wake Forest,19,1252,42804,22097,5599,11769,0.471,0.369,0.87,34.2,17.6,4.5,9.4,208.2,0.233,7.0,97.1,Draft_2005,4,7,0,0,0,0,0,0,0
4935,4935,2,2,DET,Isiah Thomas,Indiana,13,979,35516,18822,3478,9061,0.452,0.29,0.759,36.3,19.2,3.6,9.3,80.7,0.109,2.6,41.6,Draft_1981,10,1,2,1,0,0,0,0,1
6214,6214,7,7,CLE,Kevin Johnson,California,12,735,25061,13127,2404,6711,0.493,0.305,0.841,34.1,17.9,3.3,9.1,92.8,0.178,3.9,37.3,Draft_1987,1,2,0,0,0,0,0,0,0
7679,7679,1,1,WAS,John Wall,Kentucky,11,647,22588,12088,2704,5735,0.43,0.322,0.776,34.9,18.7,4.2,8.9,44.5,0.094,2.2,24.1,Draft_2010,1,3,0,0,0,0,0,0,0
8300,8300,12,12,SAC,Tyrese Haliburton,Iowa State,4,242,7958,4155,883,2102,0.48,0.399,0.854,32.9,17.2,3.6,8.7,25.0,0.151,4.4,12.8,Draft_2020,1,1,0,0,0,0,0,0,0
6728,6728,2,2,DAL,Jason Kidd,California,19,1391,50111,17529,8725,12091,0.4,0.349,0.785,36.0,12.6,6.3,8.7,138.6,0.133,3.8,73.5,Draft_1994,5,4,1,0,0,0,0,0,1


The maximum values line up with known NBA records. For example:

- Robert Parish holds the record for the most games played: 1,611.
- Kareem Abdul-Jabbar tops the list for most minutes played: 57,446.
- LeBron James has the record for most points scored, with over 40,000 points. This record keeps growing while he is still playing.
- Michael Jordan leads in points per game with an average of 30.1.
- John Stockton tops the charts for most assists: 15,806.

The data's solid, so we can start our analysis.
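One thing that may still be worth a quick look: the `describe()` output above shows negative minimums for `TRB_per_game` (-0.3) and `AST_per_game` (-1.264), which per-game counting stats can't really have, so a few rows are probably still misaligned. A minimal sketch of a check that flags such columns is shown below. The `negative_value_counts` helper and the toy values are assumptions for illustration only.

```python
import pandas as pd


def negative_value_counts(df: pd.DataFrame, columns) -> pd.Series:
    """Count negative entries per column and keep only the columns that have any."""
    counts = (df[columns] < 0).sum()
    return counts[counts > 0]


# Hypothetical slice with one suspicious negative value
toy = pd.DataFrame({
    'TRB_per_game': [7.2, 2.7, -0.3],
    'AST_per_game': [11.2, 10.5, 0.0],
})
print(negative_value_counts(toy, ['TRB_per_game', 'AST_per_game']))
```

Running the same helper on `df_all_info_players` with the per-game columns would show how many rows are affected.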

In [17]:
# Creating a pivot table to rearrange data based on 'Overall Pick' and 'Draft'
df_nba_draft_picks_players = df_all_info_players.pivot_table(index='Overall Pick', columns='Draft', values='Player', aggfunc='first').reset_index()

# Converting 'Overall Pick' to numeric data type and removing rows with 'Totals'
df_nba_draft_picks_players = df_nba_draft_picks_players[df_nba_draft_picks_players['Overall Pick'].astype(str).str.len() <= 3].astype({'Overall Pick': int})

# Sorting the DataFrame by 'Overall Pick'
df_nba_draft_picks_players = df_nba_draft_picks_players.sort_values('Overall Pick').reset_index(drop=True)

# Clustering.

*I'm going to use KMeans clustering. The point of KMeans is to split the data into a set number of clusters (usually called "k"). It places "k" centroids (the center of each cluster) in the data space, then assigns each data point to the nearest centroid. After that, it recalculates each centroid as the average of the points in its cluster, and repeats until the clusters settle down.
This will help me group players objectively into different categories. Then, by digging into those clusters, I'll figure out what a draft pick is really worth on average in today's NBA. That'll give us a good idea of whether draft pick compensation is fair in the current market.*
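The assign-then-recompute loop described above can be sketched in a few lines of NumPy. This is only an illustration of the idea (often called Lloyd's algorithm), not the implementation used later in this notebook, which relies on scikit-learn's `KMeans`. The `lloyd_kmeans` function and the toy points are made up for the example.

```python
import numpy as np


def lloyd_kmeans(points, k, n_iter=100, seed=0):
    """Minimal KMeans on an array of shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: label each point with the index of its nearest centroid
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving, so the clusters have settled
        centroids = new_centroids
    return labels, centroids


# Toy data: two obvious groups
points = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                   [10.0, 10.0], [10.1, 9.9], [9.9, 10.2]])
labels, centroids = lloyd_kmeans(points, k=2)
print(labels)
print(centroids)
```

Which group gets label 0 and which gets label 1 depends on the random starting points, so only the grouping itself is meaningful.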

## Getting Ready for Clustering.

*I'm getting the data ready for clustering by filtering out summary rows and malformed pick values and by removing duplicated players.*

In [18]:
# Filtering data, keeping only rows where 'Overall Pick' is not equal to 'Totals'
df_all_info_players = df_all_info_players[df_all_info_players['Overall Pick'] != 'Totals']

# Removing rows where the length of values in the 'Overall Pick' column exceeds 3
df_all_info_players = df_all_info_players[df_all_info_players['Overall Pick'].astype(str).apply(len) <= 3]

# Dropping every row whose ('Player', 'College') pair appears more than once (keep=False removes all copies)
df_all_info_players.drop_duplicates(subset=['Player', 'College'], keep=False, inplace=True)

# Keeping only the rows with the minimum 'Overall Pick' value for each unique combination of 'Player' and 'College'
df_all_info_players = df_all_info_players[df_all_info_players.groupby(['Player', 'College'])['Overall Pick'].transform('min') == df_all_info_players['Overall Pick']]

## Clustering

In [19]:
# Filtering data: keeping only the drafts from 1950 through 2019
draft_labels = [f'Draft_{year}' for year in range(1950, 2020)]
df_all_info_players_until_2020 = df_all_info_players[df_all_info_players['Draft'].isin(draft_labels)].copy()

# Resetting indexes and removing duplicates
df_all_info_players_until_2020.reset_index(drop=True, inplace=True)
df_all_info_players_until_2020.drop_duplicates(inplace=True)

In [20]:
def kmeans_clustering(features, n_clusters, df):
    
    """
    kmeans_clustering - Perform KMeans clustering on the given features.

    Parameters:
        features (pandas.DataFrame): DataFrame containing the features used for clustering.
        n_clusters (int): Number of clusters to form as well as the number of centroids to generate.
        df (pandas.DataFrame): Original DataFrame containing the data.

    Returns:
        tuple: A tuple containing three elements:
            - pandas.DataFrame: DataFrame containing the centroids of the clusters.
            - pandas.DataFrame: DataFrame with cluster labels added to the original DataFrame.
            - pandas.Series: Series containing the count of players for each cluster.

    This function performs KMeans clustering on the given features using the specified number of clusters.
    It creates a KMeans object, fits it to the features, and retrieves the centroids of the clusters.
    The function then adds cluster labels to the original DataFrame and sorts the centroids DataFrame by the first feature column (descending).
    Additionally, it counts the number of players for each cluster and returns the sorted centroids DataFrame,
    the DataFrame with cluster labels, and the count of players for each cluster.

    Example:
        centroids, clustered_df, player_counts = kmeans_clustering(features, 5, df)
    """

    # Creating a KMeans object
    kmeans = KMeans(n_clusters=n_clusters)
    
    # Applying clustering to the features
    kmeans.fit(features)
    
    # Getting the centroids of the clusters
    centroids_df_n_clusters = pd.DataFrame(kmeans.cluster_centers_, columns=features.columns)

    # Getting cluster labels for each row in the original DataFrame
    df_kmeans = df.copy()
    df_kmeans['cluster'] = kmeans.labels_

    # Sorting the centroids by their value in the first feature column (descending)
    first_column_name = features.columns[0]
    centroids_df_n_clusters = centroids_df_n_clusters.sort_values(first_column_name, ascending=False)

    # Counting the number of players for each cluster
    count_players_for_cluster = df_kmeans.groupby('cluster').size()
    
    return centroids_df_n_clusters, df_kmeans, count_players_for_cluster

In [21]:
# Selecting the desired features
features = df_all_info_players_until_2020[[
    'PTS', 'TRB', 'AST', 'MP_per_game', 
    'PTS_per_game', 'TRB_per_game', 'AST_per_game', 
    'Starter_sum', 'Reserve_sum', 'Ring_Count', 
    'MVP_final', 'MVP', 'DPOY', 'Teammate of the Year', 
    'ROY', 'Hall of Fame'
]]

In [22]:
# Calling the function kmeans_clustering to perform k-means clustering.
# The result is assigned to centroids_df_5_clusters, df_kmeans_5_clusters, and count_players_for_cluster_5 variables.
centroids_df_5_clusters, df_kmeans_5_clusters, count_players_for_cluster_5 = kmeans_clustering(
    features=features,  # Features used for clustering
    n_clusters=5,  # Number of clusters
    df=df_all_info_players_until_2020  # DataFrame containing the data
)

I experimented with clustering from n = 4 to n = 10.
Here's what I found:
- There's always a group whose stats hover around 0 (players who have barely played in the NBA, probably a single season at most). It ranges from 6156 players (n=10) to 6603 players (n=4). With 8001 players in the data as of March 10, 2024, this is by far the biggest group.
- There's always an "all-time" group. It ranges from 4 players (n=8) to 102 players (n=4).
- The rest of the players are split fairly evenly among the remaining n-2 groups.
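A comparison like this can be run as a simple loop over the number of clusters. The sketch below is an assumption about how such a scan might look, not the code that produced the counts above. The `cluster_sizes_by_k` helper, the fixed `random_state`, and the synthetic feature matrix `X` (many near-zero rows plus a few large ones) are all illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_sizes_by_k(X, k_values, random_state=42):
    """Fit KMeans once per k and return {k: cluster sizes sorted from largest to smallest}."""
    sizes = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        sizes[k] = sorted(np.bincount(labels, minlength=k).tolist(), reverse=True)
    return sizes


# Synthetic stand-in for the feature table
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0, 1, size=(300, 3)),     # players with almost no production
    rng.normal(50, 10, size=(40, 3)),    # solid careers
    rng.normal(200, 20, size=(5, 3)),    # a handful of outliers
])

sizes = cluster_sizes_by_k(X, range(4, 11))
for k, counts in sizes.items():
    print(k, counts)
```

Fixing `random_state` keeps the cluster sizes comparable between runs, which matters when the decision rests on how big the largest and smallest groups are.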
  
So, based on this, I figure clustering into 5 groups is the way to go:
- The "all-time" group should be somewhere around 50-60 players. Remember the NBA 75th Anniversary Team, announced in 2021? About 50 of those players were pretty much locked in, while the other 25 sparked some debate. So a group of about 50 players sounds right to me.
- The group of players who haven't seen NBA action, or have logged only a few minutes, comes to 6481 players in that scenario, which doesn't fall into either extreme.

One downside, though, is that this method doesn't break the clusters down in fine detail, which could be necessary for certain studies. But for now, I'm good with the n=5 clustering. Another hitch with this approach is that it tends to overemphasize cumulative stats (like total points, rebounds, assists), while important stuff like championship wins doesn't carry much weight. KMeans measures Euclidean distance on the raw values, so totals in the thousands swamp counts in the single digits. That's why iconic players like Earvin "Magic" Johnson end up getting lost in the shuffle. We'll fix that by using weighted clustering.
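To see why raw totals dominate, it helps to look at how much of the overall variance each feature contributes, since squared Euclidean distance simply adds up the per-feature squared differences. The snippet below is a rough illustration with made-up numbers. The `variance_share` helper and the toy players are assumptions, not values from the real dataset.

```python
import pandas as pd


def variance_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of the total (summed) variance contributed by each column."""
    variances = df.var()
    return (variances / variances.sum()).sort_values(ascending=False)


# Hypothetical players: career points in the thousands, rings in single digits
toy = pd.DataFrame({
    'PTS': [0, 150, 4000, 17000, 32000],
    'Ring_Count': [0, 0, 1, 5, 6],
})
shares = variance_share(toy)
print(shares)
```

On data like this, nearly all of the variance sits in `PTS`, so an unweighted KMeans effectively clusters on points alone. Multiplying the small-scale columns by large weights is one way to rebalance that.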

## Weighted clustering
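
Why weights matter: KMeans assigns points by squared Euclidean distance, so multiplying a feature by a weight w multiplies its contribution to that distance by w². Here is a small sketch with two made-up players (the numbers and the helper `squared_distance` are illustrative assumptions, not data from the notebook):

```python
def squared_distance(a, b, weights):
    """Squared Euclidean distance after scaling each feature by its weight."""
    return sum((weights[k] * (a[k] - b[k])) ** 2 for k in weights)


# Two hypothetical players: similar scoring totals, very different ring counts.
player_a = {"PTS": 17000, "Ring_Count": 5}
player_b = {"PTS": 17500, "Ring_Count": 0}

unweighted = squared_distance(player_a, player_b, {"PTS": 1, "Ring_Count": 1})
weighted = squared_distance(player_a, player_b, {"PTS": 1, "Ring_Count": 1000})

print(unweighted)  # points dominate: 500**2 + 5**2
print(weighted)    # rings dominate: 500**2 + 5000**2
```

Without weights, the five-ring gap is practically invisible next to a 500-point gap. With a weight of 1000 on rings, it becomes the main thing separating the two players.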

In [23]:
# Determining weights
weighted_feature_weights_dict = {
    'PTS': 1, 
    'TRB': 0.2, 
    'AST': 0.2, 
    'MP_per_game': 0.8, 
    'PTS_per_game': 1, 
    'TRB_per_game': 0.8, 
    'AST_per_game': 0.8, 
    'Starter_sum': 750, 
    'Reserve_sum': 550, 
    'Ring_Count': 1000, 
    'MVP_final': 2000, 
    'MVP': 1500, 
    'DPOY': 900, 
    'Teammate of the Year': 50, 
    'ROY': 250, 
    'Hall of Fame': 2500
}

In [24]:
def weighted_kmeans_clustering(features_weights_dict, n_clusters, df, random_state):
    
    """
    weighted_kmeans_clustering - Perform weighted KMeans clustering on the given features.

    Parameters:
        features_weights_dict (dict): Dictionary containing feature names as keys and their respective weights as values.
        n_clusters (int): Number of clusters to form as well as the number of centroids to generate.
        df (pandas.DataFrame): Original DataFrame containing the data.
        random_state (int): Random seed for reproducibility.

    Returns:
        tuple: A tuple containing three elements:
            - pandas.DataFrame: DataFrame containing the centroids of the clusters.
            - pandas.DataFrame: DataFrame with cluster labels added to the original DataFrame.
            - pandas.DataFrame: DataFrame containing the count of players for each cluster.

    This function performs weighted KMeans clustering on the given features using the specified number of clusters.
    It creates a DataFrame from the features weights dictionary, multiplies features by their respective weights,
    and initializes KMeans clustering with the weighted features.
    The function then adds cluster labels to the original DataFrame, retrieves the centroids of the clusters,
    and counts the number of players for each cluster.
    Finally, it returns the DataFrame with centroids, the DataFrame with cluster labels,
    and the DataFrame containing the count of players for each cluster.

    Example:
        centroids, clustered_df, player_counts = weighted_kmeans_clustering(features_weights_dict, 5, df, random_state=42)
    """

    # Create DataFrame from the features weights dictionary
    features_df = pd.DataFrame.from_dict(features_weights_dict, orient='index', columns=['weight'])
    
    # Multiply features by their respective weights
    weighted_features = df[features_df.index].mul(features_df['weight'])

    # Initialize KMeans clustering with weighted features
    weighted_kmeans = KMeans(n_clusters=n_clusters, random_state=random_state)
    weighted_kmeans.fit(weighted_features)

    # Create a copy of the DataFrame with cluster labels
    df_weighted_kmeans = df.copy()
    df_weighted_kmeans['cluster'] = weighted_kmeans.labels_

    # Get centroids of clusters
    centroids_weighted = weighted_kmeans.cluster_centers_
    centroids_df_weighted = pd.DataFrame(centroids_weighted, columns=weighted_features.columns).reset_index()
    centroids_df_weighted.columns = ['cluster', *weighted_features.columns]

    # Count the number of players in each cluster
    count_players_for_cluster = df_weighted_kmeans['cluster'].value_counts().sort_index().reset_index()
    count_players_for_cluster.columns = ['cluster', 'count_players_for_cluster']

    # Divide each centroid coordinate by its weight to return to the original feature scale
    for feature, weight in features_weights_dict.items():
        centroids_df_weighted[feature] /= weight

    return centroids_df_weighted, df_weighted_kmeans, count_players_for_cluster

In [25]:
# Calling the function weighted_kmeans_clustering to perform weighted k-means clustering.
# The result is assigned to centroids_df_weighted_5, df_all_info_players_weighted_kmeans_5_clusters,
# and count_players_for_cluster_5_weighted_5 variables.
centroids_df_weighted_5, df_all_info_players_weighted_kmeans_5_clusters, count_players_for_cluster_5_weighted_5 = weighted_kmeans_clustering(
    features_weights_dict=weighted_feature_weights_dict,  # Dictionary containing feature weights
    n_clusters=5,  # Number of clusters
    df=df_all_info_players_until_2020,  # DataFrame containing the data
    random_state=23  # Random state for reproducibility
)

In [26]:
count_players_for_cluster_5_weighted_5

Unnamed: 0,cluster,count_players_for_cluster
0,0,6028
1,1,232
2,2,752
3,3,42
4,4,408


In [27]:
# Finding the maximum value of PTS per game among the clusters.
max_pts_per_game = centroids_df_weighted_5['PTS_per_game'].max()

# Extracting PTS per game values from centroids DataFrame.
pts_per_game_values = centroids_df_weighted_5['PTS_per_game']

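# Creating a copy of centroids DataFrame to preserve original data.
centroids_df_weighted_5_copy = centroids_df_weighted_5.copy()

# Casting the 'cluster' column to object so that text labels can be written into it
# (newer pandas versions warn about, and may reject, writing a string into an integer column).
centroids_df_weighted_5['cluster'] = centroids_df_weighted_5['cluster'].astype(object)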

# Looping through each row in the centroids DataFrame to assign cluster labels based on PTS per game values.
for index, row in centroids_df_weighted_5.iterrows():
    # Assigning cluster label "All-time NBA players" to the cluster with the highest PTS per game.
    if row['PTS_per_game'] == max_pts_per_game:
        centroids_df_weighted_5.at[index, 'cluster'] = "All-time NBA players"
    # Assigning cluster label "Superstar" to the cluster with the second-highest PTS per game.
    elif row['PTS_per_game'] == pts_per_game_values.nlargest(2).iloc[-1]:
        centroids_df_weighted_5.at[index, 'cluster'] = "Superstar"
    # Assigning cluster label "Role-playing players" to the cluster with the third-highest PTS per game.
    elif row['PTS_per_game'] == pts_per_game_values.nlargest(3).iloc[-1]:
        centroids_df_weighted_5.at[index, 'cluster'] = "Role-playing players"
    # Assigning cluster label "Reserve players" to the cluster with the fourth-highest PTS per game.
    elif row['PTS_per_game'] == pts_per_game_values.nlargest(4).iloc[-1]:
        centroids_df_weighted_5.at[index, 'cluster'] = "Reserve players"
    # Assigning cluster label "Players who have not played in the NBA" to other clusters.
    else:
        centroids_df_weighted_5.at[index, 'cluster'] = "Players who have not played in the NBA"

# Creating a mapping dictionary to map original clusters to new cluster labels.
cluster_mapping = dict(zip(centroids_df_weighted_5_copy['cluster'], centroids_df_weighted_5['cluster']))

cluster_mapping

{0: 'Players who have not played in the NBA',
 1: 'Superstar',
 2: 'Reserve players',
 3: 'All-time NBA players',
 4: 'Role-playing players'}
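
The labeling rule above (rank the clusters by centroid PTS per game, then hand out names from best to worst) can also be sketched more compactly. This is a toy version with three made-up centroids; the DataFrame `centroids` and the list `labels_by_rank` are illustrative assumptions:

```python
import pandas as pd

# Toy centroids (assumption: three clusters with made-up PTS_per_game values).
centroids = pd.DataFrame({"cluster": [0, 1, 2], "PTS_per_game": [1.1, 16.4, 8.4]})

# Labels ordered from the highest-scoring cluster to the lowest (illustrative names).
labels_by_rank = ["Superstar", "Reserve players", "Players who have not played in the NBA"]

# Sort cluster ids by centroid PTS_per_game, highest first, and pair them with the labels.
ranked_ids = centroids.sort_values("PTS_per_game", ascending=False)["cluster"].tolist()
mapping = dict(zip(ranked_ids, labels_by_rank))

print(mapping)
```

A mapping built this way can be applied to a label column with `Series.map`, and it does not rely on comparing floats for equality.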

In [28]:
# Finding the key corresponding to the value 'All-time NBA players' in cluster_mapping
cluster_key_weighted = next(key for key, value in cluster_mapping.items() if value == 'All-time NBA players')

# Filtering the DataFrame to retrieve players belonging to the 'All-time NBA players' cluster
df_all_info_players_weighted_kmeans_5_clusters[df_all_info_players_weighted_kmeans_5_clusters['cluster'] == cluster_key_weighted]

Unnamed: 0,index,Rk_1,Overall Pick,Tm,Player,College,Yrs,G,MP,PTS,TRB,AST,FG%,3P%,FT%,MP_per_game,PTS_per_game,TRB_per_game,AST_per_game,WS,WS/48,BPM,VORP,Draft,Starter_sum,Reserve_sum,Ring_Count,MVP_final,MVP,DPOY,Teammate of the Year,ROY,Hall of Fame,cluster
428,499,5,2,MLH,Bob Pettit,LSU,11,792,30690,20880,12849,2369,0.436,0.0,0.761,38.8,26.4,16.2,3.0,136.0,0.213,0.0,0.0,Draft_1954,9,2,1,0,2,0,0,0,1,3
784,932,14,14,SYR,Hal Greer,Marshall,15,1122,39788,21586,5665,4540,0.452,0.0,0.801,35.5,19.2,5.0,4.0,102.7,0.124,0.0,0.0,Draft_1958,2,8,1,0,0,0,0,0,1,3
852,1022,1,3,PHW,Wilt Chamberlain,Kansas,14,1045,47859,31419,23924,4643,0.54,0.0,0.511,45.8,30.1,22.9,4.4,247.3,0.248,0.0,0.0,Draft_1959,9,4,2,1,4,0,0,0,1,3
929,1120,1,1,CIN,Oscar Robertson,Cincinnati,14,1040,43886,26710,7804,9887,0.485,0.0,0.838,42.2,25.7,7.5,9.5,189.2,0.207,-0.1,1.2,Draft_1960,10,2,1,0,1,0,0,0,1,3
930,1121,2,2,LAL,Jerry West,West Virginia,14,932,36571,25192,5366,6238,0.474,0.0,0.814,39.2,27.0,5.8,6.7,162.6,0.213,4.7,1.6,Draft_1960,11,1,1,1,0,0,0,0,1,3
1130,1370,9,9,BOS,John Havlicek,Ohio State,16,1270,46471,26395,8007,6114,0.439,0.0,0.815,36.6,20.8,6.3,4.8,131.7,0.136,1.1,11.5,Draft_1962,10,3,8,1,0,0,0,0,1,3
1743,2131,1,1,SDR,Elvin Hayes,Houston,16,1303,50000,27313,16279,2398,0.452,0.0,0.67,38.4,21.0,12.5,1.8,120.8,0.116,0.6,21.4,Draft_1968,4,8,1,0,0,0,0,0,1,3
1934,2365,1,1,MIL,Kareem Abdul-Jabbar,UCLA,20,1560,57446,38387,17440,5660,0.559,0.0,0.721,36.8,24.6,11.2,3.6,273.4,0.228,5.7,85.7,Draft_1969,13,5,6,2,6,0,0,0,1,3
3002,3601,40,40,PHO,George Gervin,Eastern Michigan,10,791,26536,20708,3607,2214,0.511,0.0,0.844,33.5,26.2,4.6,2.8,88.1,0.159,2.4,29.8,Draft_1974,7,2,0,0,0,0,0,0,1,3
3306,3935,6,6,BUF,Adrian Dantley,Notre Dame,15,955,34151,23177,5455,2830,0.54,0.0,0.818,35.8,24.3,5.7,3.0,134.2,0.189,3.1,43.8,Draft_1976,5,1,1,0,0,0,0,0,1,3


In [29]:
count_players_for_cluster_5_weighted_5

Unnamed: 0,cluster,count_players_for_cluster
0,0,6028
1,1,232
2,2,752
3,3,42
4,4,408


In [30]:
centroids_df_weighted_5.sort_values(by='PTS', ascending=False)

Unnamed: 0,cluster,PTS,TRB,AST,MP_per_game,PTS_per_game,TRB_per_game,AST_per_game,Starter_sum,Reserve_sum,Ring_Count,MVP_final,MVP,DPOY,Teammate of the Year,ROY,Hall of Fame
3,All-time NBA players,26007.738095,8875.238095,5394.595238,35.840476,22.402381,7.533333,4.728571,7.047619,3.785714,1.904762,0.880952,1.071429,0.1428571,0.07142857,0.0,0.809524
1,Superstar,15109.211207,5525.25431,3211.939655,31.680603,16.415948,5.923276,3.403879,1.103448,2.0,0.818966,0.060345,0.086207,0.06896552,0.03017241,0.0,0.271552
4,Role-playing players,8818.142157,3744.52451,2004.693627,27.472059,12.245588,5.005637,2.738971,0.17402,0.504902,0.446078,0.004902,0.004902,0.02696078,0.00245098,0.0,0.039216
2,Reserve players,3976.643617,2007.515957,913.430851,21.679122,8.398803,4.046543,1.874601,0.015957,0.043883,0.325798,0.00133,0.00133,0.002659574,-2.220446e-18,0.0,0.005319
0,Players who have not played in the NBA,141.07996,77.410584,32.316191,3.188354,1.09008,0.574336,0.249486,0.001327,0.005806,0.03152,0.000166,0.000332,-1.835569e-16,6.52256e-17,0.0,0.001991


*Conclusion:*
As a result of our research, we conducted a weighted analysis of NBA players aiming to pinpoint the key characteristics influencing player clustering. Using the KMeans clustering method with weights, we identified five player clusters, one of which groups players who saw little or no action in NBA matches.

1. The first cluster, dubbed "All-time NBA players," comprises players boasting dominant performance across all metrics. These players often bring multiple championships and a decade or more of club success. Here are the average stats:
    - Total points: ~ 26,008
    - Total rebounds: ~ 8,875
    - Total assists: ~ 5,395
    - Average minutes played per game: ~ 35.84
    - Average points per game: ~ 22.40
    - Average rebounds per game: ~ 7.53
    - Average assists per game: ~ 4.73
    - All-Star Game starts: ~ 7.05
    - All-Star Game bench appearances: ~ 3.79
    - Number of championships (championship rings): ~ 1.90

    **Total number of players in this cluster: 42**

    Additionally, these players have a near 100% probability of ending up in the Hall of Fame (the cluster's Hall of Fame share of about 0.81 doesn't tell the whole story, as some players haven't yet finished their careers, thus the figure is conservative).

2. The second cluster, named "Superstar," consists of players with high stats across various categories, though not as dominant as those in the first cluster. These players usually make multiple All-Star Game appearances over their careers and possibly clinch a championship, but they aren't the ones ensuring a team's decade-long league dominance. Here are the average stats:
    - Total points: ~ 15,109
    - Total rebounds: ~ 5,525
    - Total assists: ~ 3,212
    - Average minutes played per game: ~ 31.68
    - Average points per game: ~ 16.42
    - Average rebounds per game: ~ 5.92
    - Average assists per game: ~ 3.40
    - All-Star Game starts: ~ 1.10
    - All-Star Game bench appearances: ~ 2.00
    - Number of championships (championship rings): ~ 0.82

    **Total number of players in this cluster: 232**

3. The third cluster, labeled "Role-playing players," consists of players with more modest statistical performances. These are players who typically logged significant minutes but didn't have a major impact on the core statistics. Usually, this suggests that the player was actively involved in the team's grunt work, contributed on the court (thus logging ample playing time and helping the team secure championships), but wasn't the primary player on the court:
    - Total points: ~ 8,818
    - Total rebounds: ~ 3,745
    - Total assists: ~ 2,005
    - Average minutes played per game: ~ 27.47
    - Average points per game: ~ 12.25
    - Average rebounds per game: ~ 5.01
    - Average assists per game: ~ 2.74
    - All-Star Game starts: ~ 0.17
    - All-Star Game bench appearances: ~ 0.50
    - Number of championships (championship rings): ~ 0.45

    **Total number of players in this cluster: 408**

4. The fourth cluster, "Reserve players," includes players whose statistical performance is lower than players in the previous clusters. These are players who averaged only about 21.7 minutes on the court, which corresponds to the minutes of a player who serves as a reserve:
    - Total points: ~ 3,977
    - Total rebounds: ~ 2,008
    - Total assists: ~ 913
    - Average minutes played per game: ~ 21.68
    - Average points per game: ~ 8.40
    - Average rebounds per game: ~ 4.05
    - Average assists per game: ~ 1.87
    - All-Star Game starts: ~ 0.02
    - All-Star Game bench appearances: ~ 0.04
    - Number of championships (championship rings): ~ 0.33

    **Total number of players in this cluster: 752**

5. The fifth cluster basically consists of players who have seen little or no NBA action. Quite a few players haven't played a single game in the NBA. The rest have only played a few games, in which they couldn't demonstrate their ability to compete at the highest level of basketball:
    - Total points: ~ 141
    - Total rebounds: ~ 77
    - Total assists: ~ 32
    - Average minutes played per game: ~ 3.2
    - Average points per game: ~ 1.1
    - Average rebounds per game: ~ 0.6
    - Average assists per game: ~ 0.25
    - All-Star Game starts: ~ 0.001
    - All-Star Game bench appearances: ~ 0.006
    - Number of championships (championship rings): ~ 0.032

    **Total number of players in this cluster: 6028**

The proposed clusters align with observations of the NBA over a span of 15 years. These clusters will be further utilized to calculate the probability of selecting a player corresponding to a specific cluster under a given pick.

# Draft Pick Value Analysis

*First off, let's set up a table to check out the value of each pick individually. I'll group all the data by each pick and gather intel on the total number of players in each cluster accordingly.*

## The table shows the likelihood of picking a player from a specific cluster with each draft pick.

In [31]:
# Grouping the data by Overall Pick and cluster
grouped_data = df_all_info_players_weighted_kmeans_5_clusters.groupby(['Overall Pick', 'cluster']).size().unstack(fill_value=0)

# Calculating the total number of players for each pick
overall_counts = df_all_info_players_weighted_kmeans_5_clusters['Overall Pick'].value_counts()

# Merging data for each pick
cluster_distribution_by_pick = overall_counts.to_frame(name='Total Players').merge(grouped_data, left_index=True, right_index=True, how='left').fillna(0).reset_index()

# Filtering out rows with Overall Pick equal to 0
cluster_distribution_by_pick = cluster_distribution_by_pick[cluster_distribution_by_pick['Overall Pick'] != 0]

# Renaming columns using cluster_mapping
cluster_distribution_by_pick = cluster_distribution_by_pick.rename(columns=cluster_mapping)

# Adding probability columns to the cluster_distribution_by_pick table
prob_columns = ['All-time NBA players', 'Superstar', 'Role-playing players', 'Reserve players', 'Players who have not played in the NBA']
for column in prob_columns:
    cluster_distribution_by_pick[f'Probability_{column}'] = cluster_distribution_by_pick[column] / cluster_distribution_by_pick['Total Players'] * 100

# Converting Overall Pick to integer type and sorting by Overall Pick
cluster_distribution_by_pick['Overall Pick'] = cluster_distribution_by_pick['Overall Pick'].astype(int)
cluster_distribution_by_pick = cluster_distribution_by_pick.sort_values(by='Overall Pick')

# Reordering columns as specified
new_order_columns_for_distribution_by_pic = [
    'Overall Pick', 
    'Total Players', 
    'All-time NBA players', 
    'Probability_All-time NBA players', 
    'Superstar', 
    'Probability_Superstar', 
    'Role-playing players', 
    'Probability_Role-playing players', 
    'Reserve players', 
    'Probability_Reserve players', 
    'Players who have not played in the NBA', 
    'Probability_Players who have not played in the NBA'
]

cluster_distribution_by_pick = cluster_distribution_by_pick[new_order_columns_for_distribution_by_pic]

# Resetting all indexes and setting index again
cluster_distribution_by_pick.reset_index(drop=True, inplace=True)

cluster_distribution_by_pick

Unnamed: 0,Overall Pick,Total Players,All-time NBA players,Probability_All-time NBA players,Superstar,Probability_Superstar,Role-playing players,Probability_Role-playing players,Reserve players,Probability_Reserve players,Players who have not played in the NBA,Probability_Players who have not played in the NBA
0,1,68,10,14.705882,21,30.882353,13,19.117647,16,23.529412,8,11.764706
1,2,70,5,7.142857,18,25.714286,20,28.571429,19,27.142857,8,11.428571
2,3,69,5,7.246377,17,24.637681,22,31.884058,19,27.536232,6,8.695652
3,4,70,2,2.857143,19,27.142857,17,24.285714,19,27.142857,13,18.571429
4,5,69,5,7.246377,15,21.73913,15,21.73913,20,28.985507,14,20.289855
5,6,67,2,2.985075,11,16.41791,16,23.880597,23,34.328358,15,22.38806
6,7,70,1,1.428571,9,12.857143,23,32.857143,23,32.857143,14,20.0
7,8,66,1,1.515152,7,10.606061,21,31.818182,18,27.272727,19,28.787879
8,9,69,3,4.347826,12,17.391304,8,11.594203,23,33.333333,23,33.333333
9,10,69,1,1.449275,14,20.289855,11,15.942029,21,30.434783,22,31.884058


*Conclusion:* Looking at the table, it's clear that the first pick holds the most value. No other pick comes close to the top spot: there's nearly a 15% chance of nabbing an All-time NBA player and almost a 31% shot at snagging a "Superstar."

Moving down the line, picks 2-10 show some promise, with a decently tight range: there's about a 1.5-7% chance of landing an All-time NBA player and a 10-27% shot at getting a superstar. But here's the kicker: picks 2-10 also come with a big risk of being a bust, with the share of players who never really played in the NBA ranging from about 9% (3rd pick) to about 33% (9th pick). Basically, teams are targeting players with similar talent levels in this range, meaning their potential ceilings are alike. However, the later the pick, the riskier the choice, increasing the chances of not reaching their full potential.

Once you hit the 11th pick, the odds of getting an All-time NBA player or a superstar take a nosedive. At this point, players usually end up as Role-playing or Reserve players. And by the time you reach the 31st pick (typically the second round, meaning the second pick for each team), the chances of getting such players are slim to none: less than 10% for Role-playing and less than 15% for Reserve players.
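
The per-pick percentages discussed above are row shares of a pick-by-cluster count table. A compact way to get such shares is a row-normalized crosstab. This is a sketch on made-up rows; the DataFrame `toy` and its labels are assumptions for illustration only:

```python
import pandas as pd

# Toy draft data (assumption: made-up picks and cluster labels).
toy = pd.DataFrame({
    "Overall Pick": [1, 1, 1, 1, 2, 2, 2, 2],
    "cluster": ["Superstar", "Superstar", "Reserve", "Bust",
                "Superstar", "Reserve", "Bust", "Bust"],
})

# Row-normalized crosstab: each row sums to 100 (% of players per cluster for that pick).
share_by_pick = pd.crosstab(toy["Overall Pick"], toy["cluster"], normalize="index") * 100

print(share_by_pick.round(1))
```

Each row reads as "what happened to the players taken with this pick", which is the same reading as the probability columns in the table.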

*Since trades involve unspecified picks, like a superstar being traded for 5 first-round picks, relying solely on the table showing the "value" of each individual pick won't help understand such deals. Let's whip up another table detailing the value of picks in the 1st round/2nd round. The 1st round pick encompasses selections 1-30, while the 2nd round pick covers picks 31-60. To get a better grasp, let's break down the value of picks by decades.*
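
The two groupings mentioned above (by round and by decade) boil down to simple bucketing. This is a minimal sketch on made-up rows; the column name `Draft year` and the DataFrame `toy` are assumptions for illustration:

```python
import pandas as pd

# Toy rows (assumption: made-up draft years and picks).
toy = pd.DataFrame({
    "Draft year": [1974, 1984, 1996, 2003, 2011],
    "Overall Pick": [40, 3, 13, 1, 60],
})

# Integer division drops the last digit of the year: 1974 -> 1970.
toy["Decade"] = toy["Draft year"] // 10 * 10

# Picks 1-30 are treated as the first round, 31-60 as the second (the convention used here).
toy["Round"] = pd.cut(toy["Overall Pick"], bins=[0, 30, 60], labels=["1st round", "2nd round"])

print(toy)
```

Keep in mind that the 1-30 / 31-60 split is a convention of this analysis: in older drafts, a pick in the 31-60 range could belong to a later round.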

## The table displays the likelihood of picking a player from a specific cluster, broken down by decades.

In [32]:
# Converting the 'Draft' column to datetime format for ease of working with decades
df_all_info_players_weighted_kmeans_5_clusters['Draft'] = pd.to_datetime(df_all_info_players_weighted_kmeans_5_clusters['Draft'].str.split('_').str[1], format='%Y')
df_all_info_players_weighted_kmeans_5_clusters['Decade'] = df_all_info_players_weighted_kmeans_5_clusters['Draft'].dt.year // 10 * 10
df_all_info_players_weighted_kmeans_5_clusters['Overall Pick'] = df_all_info_players_weighted_kmeans_5_clusters['Overall Pick'].astype(int)

*First off, we'll whip up a table showing the likelihood of picking a player from a particular cluster under each first-round pick. Then, we'll dive into another table laying out the chances of snagging a player from a specific cluster under each second-round pick.*

In [33]:
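# Filtering the data: first-round picks only (1-30), excluding the value 0 that the per-pick table above also filters out
filtered_data = df_all_info_players_weighted_kmeans_5_clusters[df_all_info_players_weighted_kmeans_5_clusters['Overall Pick'].between(1, 30)]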

# Grouping the data by decades and overall picks
grouped = filtered_data.groupby(['Decade', 'Overall Pick'])

# Counting the number of players with each cluster for each pick
cluster_counts_for_first_pick = filtered_data.groupby(['Decade', 'Overall Pick', 'cluster']).size().unstack(fill_value=0)

# Adding a column with the total number of players for each (decade, pick) row
cluster_counts_for_first_pick['Total'] = cluster_counts_for_first_pick.sum(axis=1)

# Grouped data by decades and sum of cluster values for each decade
cluster_counts_first_round_pick = cluster_counts_for_first_pick.groupby(level=0).sum().reset_index()

# Renaming columns
cluster_counts_first_round_pick = cluster_counts_first_round_pick.rename(columns=cluster_mapping)

# Adding probability columns to the table
prob_columns_for_5_clusters_with_All_time = ['All-time NBA players', 'Superstar', 'Role-playing players', 'Reserve players', 'Players who have not played in the NBA']
for column in prob_columns_for_5_clusters_with_All_time:
    cluster_counts_first_round_pick[f'Probability_{column}'] = cluster_counts_first_round_pick[column] / cluster_counts_first_round_pick['Total'] * 100

# Reordering columns in the specified order
new_order_for_5_clusters_with_All_time = [
    'Decade', 'Total', 'All-time NBA players', 'Probability_All-time NBA players', 
    'Superstar', 'Probability_Superstar', 'Role-playing players', 'Probability_Role-playing players', 
    'Reserve players', 'Probability_Reserve players', 'Players who have not played in the NBA', 
    'Probability_Players who have not played in the NBA'
]
cluster_counts_first_round_pick = cluster_counts_first_round_pick[new_order_for_5_clusters_with_All_time]

cluster_counts_first_round_pick

cluster,Decade,Total,All-time NBA players,Probability_All-time NBA players,Superstar,Probability_Superstar,Role-playing players,Probability_Role-playing players,Reserve players,Probability_Reserve players,Players who have not played in the NBA,Probability_Players who have not played in the NBA
0,1950,286,3,1.048951,14,4.895105,14,4.895105,37,12.937063,218,76.223776
1,1960,277,5,1.805054,21,7.581227,30,10.830325,48,17.32852,173,62.454874
2,1970,291,5,1.718213,31,10.652921,43,14.776632,82,28.178694,130,44.67354
3,1980,298,9,3.020134,50,16.778523,59,19.798658,70,23.489933,110,36.912752
4,1990,299,10,3.344482,40,13.377926,61,20.401338,88,29.431438,100,33.444816
5,2000,298,9,3.020134,32,10.738255,69,23.154362,90,30.201342,98,32.885906
6,2010,300,0,0.0,22,7.333333,52,17.333333,117,39.0,109,36.333333


Key Takeaways:
- There are no players from the "All-time NBA players" cluster in the 2010s decade. This might be because most of these players' careers aren't over yet, so it's premature to assess this decade.
- The chances of picking players from the "Players who have not played in the NBA" cluster dropped from about 76% in the 1950s to roughly 33-36% from the 1990s onward (while the likelihood of selecting players from the "Role-playing players" and "Reserve players" clusters moderately rose). This may suggest teams are getting more strategic in their draft picks.
- The likelihood of selecting players from the "All-time NBA players" and "Superstar" clusters stays within a fairly narrow band. It hovers around 1-3% for the former and 5-17% for the latter (10-17% from the 1970s through the 2000s).

Consequently, the odds of snagging players from these two clusters in the first round of the NBA draft were roughly 12-20% from the 1970s through the 2000s. So, if one NBA team is offering another team a trade involving a "Superstar" player, a fair deal would demand at least 5 first-round picks, but ideally, compensation should be more in the ballpark of 7-8 first-round picks.
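
The link between a per-pick probability and the number of picks can be made explicit. Under the simplifying assumptions that every first-round pick has the same chance p of yielding an All-time or Superstar player and that picks are independent (both are assumptions, not results from the data), the expected number of such players from n picks is n·p, and the chance of at least one is 1 − (1 − p)^n. Simply adding up per-pick percentages overstates that chance. The helper names below are hypothetical:

```python
def expected_hits(p, n):
    """Expected number of star players from n picks with success chance p each."""
    return n * p


def prob_at_least_one(p, n):
    """Chance that at least one of n independent picks succeeds."""
    return 1 - (1 - p) ** n


p = 0.15  # assumed per-pick chance, roughly the middle of the range quoted above
for n in (1, 2, 5, 7):
    print(n, round(expected_hits(p, n), 2), round(prob_at_least_one(p, n), 3))
```

With p = 0.15, it takes about seven picks before the expected number of stars reaches one, which fits the 7-8 picks suggested above. Even then, the chance of landing at least one star is only about 68%.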

In [34]:
# Filtering data for the second round
df_second_round = df_all_info_players_weighted_kmeans_5_clusters[(df_all_info_players_weighted_kmeans_5_clusters['Overall Pick'] >= 31) & (df_all_info_players_weighted_kmeans_5_clusters['Overall Pick'] <= 60)]

# Grouping data by decades and picks in the second round
grouped_second_round_pick = df_second_round.groupby(['Decade', 'Overall Pick'])

# Counting the number of players with each cluster for each pick in the second round
cluster_counts_for_second_round_pick = df_second_round.groupby(['Decade', 'Overall Pick', 'cluster']).size().unstack(fill_value=0)

# Adding a column with the total number of players for each (decade, pick) row in the second round
cluster_counts_for_second_round_pick['Total'] = cluster_counts_for_second_round_pick.sum(axis=1)

# Summing cluster values for each decade in the second round
cluster_counts_second_round_pick = cluster_counts_for_second_round_pick.groupby(level=0).sum().reset_index()

# Getting the key for the 'All-time NBA players' cluster
key_for_all_time_players = [key for key, value in cluster_mapping.items() if value == 'All-time NBA players']

# Checking if there is a column corresponding to the 'All-time NBA players' players in the table
column_name = key_for_all_time_players[0]

if column_name in cluster_counts_second_round_pick.columns:
    # Code for the case when the 'All-time NBA players' column exists in the table
    cluster_counts_second_round_pick = cluster_counts_second_round_pick.rename(columns=cluster_mapping)
    
    # Adding probability columns to the table
    for column in prob_columns_for_5_clusters_with_All_time:
        cluster_counts_second_round_pick[f'Probability_{column}'] = cluster_counts_second_round_pick[column] / cluster_counts_second_round_pick['Total'] * 100
    
    # Reordering columns in the specified order in the second round
    cluster_counts_second_round_pick = cluster_counts_second_round_pick[new_order_for_5_clusters_with_All_time]
    
else:
    # First, get the list of columns present in the cluster_counts_second_round_pick table
    available_columns = cluster_counts_second_round_pick.columns

    # Next, create a dictionary for renaming only existing columns
    rename_mapping_for_5_clusters_without_All_time = {column: cluster_mapping[column] for column in available_columns if column in cluster_mapping}

    # Rename only the necessary columns
    cluster_counts_second_round_pick = cluster_counts_second_round_pick.rename(columns=rename_mapping_for_5_clusters_without_All_time)
    
    # Adding probability columns to the table
    prob_columns_for_5_clusters_without_All_time = ['Superstar', 'Role-playing players', 'Reserve players', 'Players who have not played in the NBA']
    
    for column in prob_columns_for_5_clusters_without_All_time:
        cluster_counts_second_round_pick[f'Probability_{column}'] = cluster_counts_second_round_pick[column] / cluster_counts_second_round_pick['Total'] * 100

    # Using a separate name so the first-round column order defined earlier is not overwritten
    new_order_for_5_clusters_without_All_time = [
        'Decade', 
        'Total', 
        'Superstar', 
        'Probability_Superstar', 
        'Role-playing players', 
        'Probability_Role-playing players', 
        'Reserve players', 
        'Probability_Reserve players', 
        'Players who have not played in the NBA', 
        'Probability_Players who have not played in the NBA'
    ]
    
    # Reordering columns in the specified order in the second round
    cluster_counts_second_round_pick = cluster_counts_second_round_pick[new_order_for_5_clusters_without_All_time]
    
cluster_counts_second_round_pick


cluster,Decade,Total,All-time NBA players,Probability_All-time NBA players,Superstar,Probability_Superstar,Role-playing players,Probability_Role-playing players,Reserve players,Probability_Reserve players,Players who have not played in the NBA,Probability_Players who have not played in the NBA
0,1950,285,0,0.0,0,0.0,1,0.350877,3,1.052632,281,98.596491
1,1960,267,0,0.0,2,0.749064,5,1.872659,6,2.247191,254,95.131086
2,1970,283,1,0.353357,3,1.060071,10,3.533569,35,12.367491,234,82.685512
3,1980,292,0,0.0,2,0.684932,16,5.479452,27,9.246575,247,84.589041
4,1990,260,0,0.0,4,1.538462,8,3.076923,30,11.538462,218,83.846154
5,2000,290,0,0.0,6,2.068966,17,5.862069,41,14.137931,226,77.931034
6,2010,300,0,0.0,2,0.666667,12,4.0,39,13.0,247,82.333333


The value of a second-round pick is extremely low. In every decade, roughly 80% or more of these players barely make it to the NBA, if at all. The likelihood of picking an All-time NBA player is virtually zero. It would be considered fortunate to select even a player from the Reserve players cluster (probability around 9-14% since the 1970s).
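
The branch that handles a missing 'All-time NBA players' column can be avoided by reindexing the count table to a fixed list of clusters and filling absent ones with zero. This is a minimal sketch on toy data (the values and the list `all_clusters` are illustrative assumptions):

```python
import pandas as pd

# Toy second-round data in which cluster 3 never occurs.
toy = pd.DataFrame({
    "Decade": [1990, 1990, 1990, 2000, 2000],
    "cluster": [0, 0, 2, 0, 1],
})

all_clusters = [0, 1, 2, 3]  # assumed full set of cluster ids

# Count players per decade and cluster, then force every cluster column to exist.
counts = (
    toy.groupby(["Decade", "cluster"]).size()
    .unstack(fill_value=0)
    .reindex(columns=all_clusters, fill_value=0)
)

# Row-wise percentages; a cluster that never occurs simply shows 0.0.
shares = counts.div(counts.sum(axis=1), axis=0) * 100

print(shares.round(1))
```

With every column guaranteed to exist, the same renaming and column order can be used for both rounds.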

*Let's check out this unique situation: a player from the 'All-time NBA players' cluster was selected with a pick in the 31-60 range.*

In [35]:
# Finding the key for the value 'All-time NBA players' in cluster_mapping
cluster_key = next(key for key, value in cluster_mapping.items() if value == 'All-time NBA players')

# Finding rows where the player is an All-time NBA player and the pick is in the range 31-60
df_all_info_players_weighted_kmeans_5_clusters[(df_all_info_players_weighted_kmeans_5_clusters['Overall Pick'] >= 31) & 
                                                (df_all_info_players_weighted_kmeans_5_clusters['Overall Pick'] <= 60) & 
                                                (df_all_info_players_weighted_kmeans_5_clusters['cluster'] == cluster_key)]

Unnamed: 0,index,Rk_1,Overall Pick,Tm,Player,College,Yrs,G,MP,PTS,TRB,AST,FG%,3P%,FT%,MP_per_game,PTS_per_game,TRB_per_game,AST_per_game,WS,WS/48,BPM,VORP,Draft,Starter_sum,Reserve_sum,Ring_Count,MVP_final,MVP,DPOY,Teammate of the Year,ROY,Hall of Fame,cluster,Decade
3002,3601,40,40,PHO,George Gervin,Eastern Michigan,10,791,26536,20708,3607,2214,0.511,0,0.844,33.5,26.2,4.6,2.8,88.1,0.159,2.4,29.8,1974-01-01,7,2,0,0,0,0,0,0,1,3,1970


George Gervin is a basketball legend, a name that rings a bell for every fan out there.

George "The Iceman" Gervin, born on April 27, 1952, in Detroit, Michigan, is an American hoops star who made his mark in both the American Basketball Association (ABA) and the National Basketball Association (NBA) playing for the Virginia Squires, San Antonio Spurs, and Chicago Bulls. He earned a spot on the ABA All-Time Team, was inducted into the Basketball Hall of Fame in 1996, and was named one of the 50 Greatest Players in NBA History. Fast forward to 2021, and Gervin still holds his own, named one of the 75 Greatest Players in NBA History. At the end of his NBA run, he stood tall as the all-time leader in blocks among guards.

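In the 1974 NBA Draft, George Gervin got the nod in the third round as the 40th overall pick by the Phoenix Suns, although he never suited up for them. Rounds were shorter back then, so pick 40 still lands in the 31-60 "second-round" bucket used in this analysis.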

Beyond his on-court prowess, Gervin was known for his swagger and style, both on and off the hardwood. They called him "The Iceman" for his cool demeanor on the court and his silky-smooth moves.

Despite racking up impressive personal accolades, George Gervin never snagged an NBA championship during his playing days. But make no mistake, his impact on the game is lasting, thanks to his skills, flair, and influence on future generations of ballers.

So the lone All-time NBA player in the 31-60 range is a genuine outlier rather than a data error, and our analysis checks out.

# General conclusions

Getting the results on player selection chances from various clusters across decades gave me a solid toolkit for making informed NBA team trades or at least deepening my understanding of them.

For instance, let's look at the 2021 trade where the Orlando Magic traded Nikola Vučević and Al-Farouq Aminu to the Chicago Bulls for Wendell Carter Jr., Otto Porter Jr., a 2021 first-round pick, and a 2023 first-round pick. At the time, Nikola Vučević was a decent player, but he definitely wasn't considered in the All-time player or Superstar cluster. For this type of player, the Magic acquired 2 talented players and two picks, giving them roughly a 30% chance of picking a player from the All-time player or Superstar cluster - a pretty sweet deal for the Magic.

Another big trade, this time in 2023, saw the Brooklyn Nets send Kevin Durant to the Phoenix Suns for Mikal Bridges, Cameron Johnson, first-round picks in 2023, ’25, ’27, and ’29, a 2028 first-round pick swap, and second-round picks in 2028 and ’29. Kevin Durant is definitely a Superstar-caliber player, and many fans already consider him an All-time player, likely to be remembered as such after his career is over. For a player of Durant's caliber, Brooklyn received six picks (four first-rounders and two second-rounders) plus a pick swap, giving them a 90% chance of selecting a player from the All-time player or Superstar cluster, and they also got two players who are already solid. Taking context into account, like the player's desire to leave, contract length, player's age, and associated risks, the trade looks like a clear win for the Phoenix Suns.

Now, let's consider a trade that seems pretty balanced. The Portland Trail Blazers traded CJ McCollum, Larry Nance Jr., and Tony Snell to the New Orleans Pelicans for Nickeil Alexander-Walker, Josh Hart, Didi Louzada, Tomáš Satoranský, a 2022 protected first-round pick, a 2026 second-round pick, and New Orleans’ 2027 second-round pick. CJ McCollum isn't an All-time player or Superstar, but he's a solid player who could be the 3rd or 4th most important player on a strong team. Getting a first-round pick, with about a 15% chance of picking a player from the All-star player or Superstar cluster, looks like a pretty intriguing deal for both sides.
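One way to read the percentages in these trade examples is as the chance that at least one of the acquired picks turns into a player from the target clusters. The snippet below is a minimal sketch of that arithmetic, not necessarily how the figures above were computed. It assumes the picks are independent, and it uses a hypothetical flat 15% per first-round pick (the rough figure quoted above) instead of the per-pick values from the cluster distribution table. The function name is made up for illustration.

```python
from math import prod


def chance_of_at_least_one_hit(pick_probabilities):
    """Chance that at least one pick yields a player from the target clusters.

    Assumption: the picks are independent of each other, so the chance that
    every pick misses is the product of the individual miss probabilities.
    """
    return 1 - prod(1 - p for p in pick_probabilities)


# Hypothetical input: two first-round picks at a flat 15% each.
two_first_round_picks = [0.15, 0.15]
print(f"{chance_of_at_least_one_hit(two_first_round_picks):.0%}")
```

With two picks at 15% each, the chance that both miss is 0.85 × 0.85 = 0.7225, so the chance of at least one hit is about 28%, in line with the "roughly 30%" quoted for the Magic. In practice, each pick's probability depends on its position in the draft, so the real inputs would differ from pick to pick.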

It's essential to understand that actions don't always lead to the statistically expected outcome, and to avoid hindsight bias (the "I knew it all along" error). Here's a prime example: the Philadelphia 76ers traded Markelle Fultz to Orlando, closing the book on a very odd relationship with the former No. 1 pick. The Magic got a promising, if complicated, young guard in exchange for a 2020 top-20 protected first-round pick originally owned by the Thunder. On paper, it seemed like a solid move for Orlando: acquiring an exceptionally talented player (a former No. 1 pick) for a pick that was almost a second-rounder. However, the 76ers used that pick, the 21st overall, to select Tyrese Maxey. Maxey has blossomed into one of the league's best guards and made the All-Star Game in 2024. While Fultz has shown improvement, he's nowhere near as good as Maxey, and Philadelphia now has the point guard they hoped they were drafting when they took Fultz No. 1 overall.

# Saving the tables (if needed; if you're using the code, specify the actual file path)

In [36]:
# Get the current date
current_date = datetime.datetime.now().strftime("%Y-%m-%d")

## Saving the table with all NBA draft picks and the players chosen under each of them (df_nba_draft_picks_players).

In [37]:
# Create a file name with the current date for Excel format
file_name_nba_draft_picks_players_excel = f"df_nba_draft_picks_players_{current_date}.xlsx"

# Create a file name with the current date for CSV format
file_name_nba_draft_picks_players_csv = f"df_nba_draft_picks_players_{current_date}.csv"

In [38]:
# Save the DataFrame to a CSV file with the specified file name and path. If you are using the code, please specify the actual file path
# df_nba_draft_picks_players.to_csv(f"/Users/andrejviflyancev/Desktop/{file_name_nba_draft_picks_players_csv}", index=False)

## Saving the table showing the value of each pick (cluster_distribution_by_pick).

In [39]:
# Create a file name with the current date for Excel format
file_name_cluster_distribution_by_pick_excel = f"df_cluster_distribution_by_pick_{current_date}.xlsx"

# Create a file name with the current date for CSV format
file_name_cluster_distribution_by_pick_csv = f"df_cluster_distribution_by_pick_{current_date}.csv"

In [40]:
# Save the DataFrame to a CSV file with the specified file name and path. If you are using the code, please specify the actual file path
# cluster_distribution_by_pick.to_csv(f"/Users/andrejviflyancev/Desktop/{file_name_cluster_distribution_by_pick_csv}", index=False)

## Saving the table with all players and cluster data (df_all_info_players_weighted_kmeans_5_clusters).

In [41]:
# Create a file name with the current date for Excel format
file_name_all_info_players_weighted_kmeans_5_clusters_excel = f"df_all_info_players_weighted_kmeans_5_clusters_{current_date}.xlsx"

# Create a file name with the current date for CSV format
file_name_all_info_players_weighted_kmeans_5_clusters_csv = f"df_all_info_players_weighted_kmeans_5_clusters_{current_date}.csv"

In [42]:
# Save the DataFrame to a CSV file with the specified file name and path. If you are using the code, please specify the actual file path
# df_all_info_players_weighted_kmeans_5_clusters.to_csv(f"/Users/andrejviflyancev/Desktop/{file_name_all_info_players_weighted_kmeans_5_clusters_csv}", index=False)
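
The three saving steps above repeat the same pattern: build a file name from the table name and the current date, then write the table to a folder. If you reuse the code, a small helper can cut the repetition and keep each file name next to the table it belongs to. This is only a sketch: `dated_file_path` and the `exports` folder are hypothetical names that do not appear in the notebook, and the `to_csv` call is left commented out, as in the cells above.

```python
import datetime
from pathlib import Path


def dated_file_path(base_name, extension, output_dir=".", date=None):
    """Build a path like <output_dir>/<base_name>_YYYY-MM-DD.<extension>.

    If no date is given, today's date is used.
    """
    date = date or datetime.date.today()
    return Path(output_dir) / f"{base_name}_{date:%Y-%m-%d}.{extension}"


# Hypothetical output folder; replace it with the actual path on your machine.
csv_path = dated_file_path("df_cluster_distribution_by_pick", "csv", output_dir="exports")
print(csv_path)

# cluster_distribution_by_pick.to_csv(csv_path, index=False)
```

The same helper covers the Excel file names by passing `"xlsx"` as the extension.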