STA 141B Project - A comparison of NBA players' on-court performance against the expectations of their contracts

1. Introduction
2. Data Acquisition
3. Data Process 
4. Approach
5. Data Analysis 
5b. General statistics 
5c. Data Analysis real 
6. Reports + visualization
7. Discussion
8. 

1. #### Introduction

The National Basketball Association (NBA) is the premier American basketball league, comprising 30 teams from the United States and Canada. Players receive larger contracts based on an assessment of their skill and experience, and as spectators we generally assume that a larger salary indicates a stronger player. However, this view focuses on the individual, while wins in basketball depend on the team as a whole, so salary may signal individual success without capturing a player's contribution to the team. This study examines the relationship between player statistics and salary for the 2023-24 season and tests that assumption by taking team contribution metrics into account. Using visualizations of our findings, we identify patterns and anomalies and assess the implications of the results for team and NBA economics. This analytical approach aligns more closely with the group-oriented objectives of professional basketball teams.

The data set, composed of 554 players, contains four metrics that measure team contribution per player: RAPTOR, LEBRON, Player Efficiency Rating, and Win Share. We selected team contribution metrics rather than individual ones to standardize indicators of performance across positions: as individual statistics such as blocks and rebounds may vary based on one's role on the court, they are not a fair overall evaluation of player accomplishment. 
* RAPTOR (**R**obust **A**lgorithm (using) **P**layer **T**racking (and) **O**n/Off **R**atings) is a measure of points that a player contributed per 100 possessions relative to a league average player. It takes Offensive and Defensive ratings. 
* LEBRON (**L**uck-adjusted player **E**stimate using a **B**ox prior **R**egularized **ON**-off) calculates the change in score when the player is on the court, stabilized to account for player position and variance. It takes Total, Offensive, and Defensive ratings. 
* Player Efficiency Rating (PER) measures a player's productivity per minute relative to a league average of 15.00, adjusted for pace. 
* Win Share assigns a number to a player based on their contributions to team wins during the season. 

As the NBA season progresses, player statistics are updated and available to the public via the web:
* Using [NBA API](https://rapidapi.com/api-sports/api/api-nba) we accessed annual player salaries.
* Courtesy of [FiveThirtyEight](https://neilpaine.substack.com/p/nba-estimated-raptor-player-ratings) Substack, we accessed a spreadsheet of estimated RAPTOR metrics - note that RAPTOR was discontinued by official reporting sources so we pulled from an independent editor within the same organization. 
* Basketball Index publishes seasonal LEBRON metrics on their [website](https://www.bball-index.com/lebron-database/), from which we also accessed a spreadsheet. 
* Lastly, we scraped PER and Win Share from Basketball Reference's Advanced Statistics [webpage](https://www.basketball-reference.com/leagues/NBA_2024_advanced.html).
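
The salary and cap figures scraped from these pages arrive as display strings with a dollar sign and thousands separators rather than as numbers. The following is a minimal sketch of that cleaning step using only the standard library. The helper name `parse_dollars` is hypothetical (it is not part of this notebook), and the sample string is illustrative.

```python
import re
from typing import Optional


def parse_dollars(text: Optional[str]) -> Optional[int]:
    """Convert a string such as '$1,234,567' to an int; return None for blanks."""
    if text is None:
        return None
    digits = re.sub(r"[^\d]", "", text)  # drop '$', ',' and any whitespace
    return int(digits) if digits else None


# Illustrative inputs (assumed format, not scraped values)
print(parse_dollars("$131,557,411"))
print(parse_dollars(""))
```

Returning `None` for an empty cell, rather than raising, makes it possible to drop players without a listed salary explicitly before casting the column to a numeric type.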

In [84]:
from io import StringIO

import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
from bokeh.io import output_notebook, show
from bokeh.models import ColumnDataSource, FixedTicker, HoverTool
from bokeh.palettes import Spectral6
from bokeh.plotting import figure

# Find each NBA team's total cap for the 2023-24 season
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)

# URL of the page
url = 'https://www.spotrac.com/nba/cap/'

# Send a GET request to the page
response = requests.get(url)

# Parse the HTML content of the page
soup = BeautifulSoup(response.content, 'html.parser')

# Grab Table
table = soup.find('table', {'class': 'datatable'})

# Pandas dataframe conversion (wrap the HTML in StringIO, since passing a
# literal HTML string to read_html is deprecated in recent pandas versions)
df = pd.read_html(StringIO(str(table)))[0]

# Convert 'Total Cap' to integers (raw string avoids an invalid-escape warning)
df['Total Cap'] = df['Total Cap'].replace(r'[\$,]', '', regex=True).astype(int)

# Bin the totals into $10M-wide histogram bins starting at $125M
hist, edges = np.histogram(df['Total Cap'], bins=range(125000000, df['Total Cap'].max() + 10000000, 10000000))

# Wrap the bin counts and edges in a data source for plotting
source = ColumnDataSource(data=dict(
    top=hist,
    left=edges[:-1],
    right=edges[1:]
))

# Create a new figure
p = figure(height=350, title="NBA Team Salary Caps 2023-2024",
           toolbar_location=None, tools="")

# Add a hover tool
hover = HoverTool(tooltips=[
    ("Count", "@top"),
    ("Bin Start", "@left"),
    ("Bin End", "@right")
])
p.add_tools(hover)

# Add a bar glyph to the figure
p.quad(top='top', bottom=0, left='left', right='right', source=source,
       line_color='white', fill_color=Spectral6[0])

p.xaxis.axis_label = "Team Salary" 
p.yaxis.axis_label = "Count"  

p.xaxis.ticker = FixedTicker(ticks=edges)  
p.xaxis.major_label_overrides = {tick: f"{int(tick):,}" for tick in edges} 
p.xaxis.major_label_orientation = np.pi / 4  

output_notebook()
show(p)

In [85]:
index = range(1, len(df) + 1)
df.index = index
newcolumns = [1, 2, 5]
df = df.iloc[:,newcolumns]
df = df.rename(columns={'Total Cap (USD)':'Total Cap'})
df

| | Team | Win% | Total Cap |
|---|---|---|---|
| 1 | Utah Jazz | 0.426 | 131557411 |
| 2 | Detroit Pistons | 0.176 | 136062671 |
| 3 | Orlando Magic | 0.594 | 139355784 |
| 4 | Charlotte Hornets | 0.246 | 141036011 |
| 5 | San Antonio Spurs | 0.217 | 142928288 |
| 6 | Houston Rockets | 0.485 | 146402009 |
| 7 | Indiana Pacers | 0.551 | 146429278 |
| 8 | Sacramento Kings | 0.582 | 149993550 |
| 9 | Atlanta Hawks | 0.441 | 156139798 |
| 10 | Oklahoma City Thunder | 0.701 | 160502705 |


In [91]:
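import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
from bokeh.io import output_notebook, show
from bokeh.models import ColumnDataSource, FixedTicker, HoverTool
from bokeh.palettes import Spectral6
from bokeh.plotting import figure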

# Fetch the webpage
url = "https://www.basketball-reference.com/contracts/players.html"
response = requests.get(url)

# Parse the HTML content
soup = BeautifulSoup(response.content, 'html.parser')

# Find the contracts table by its element id
table = soup.find('table', {'id': 'player-contracts'})

# Extract the cell text of every row and convert it to a DataFrame
data = []
rows = table.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])

df2 = pd.DataFrame(data)
df2.replace('None', None, inplace=True)
df2.dropna(how='all', inplace=True)
headers = ["Player", "Team", "2023-24", "2024-25", "2025-26", "2026-27", "2027-28", "2028-29", "2029-30"]

df2.columns = headers

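# Convert the '2023-24' salaries to integers (raw string avoids an invalid-escape warning)
df2['2023-24'] = df2['2023-24'].replace(r'[\$,]', '', regex=True).astype(int)

# Bin the salaries into $4M-wide histogram bins starting at $1M
hist, edges = np.histogram(df2['2023-24'], bins=range(1000000, df2['2023-24'].max() + 4000000, 4000000))

# Wrap the bin counts and edges in a data source for plotting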
source = ColumnDataSource(data=dict(
    top=hist,
    left=edges[:-1],
    right=edges[1:]
))

# Create a new figure
p = figure(height=350, title="NBA Player Salaries 2023-2024",
           toolbar_location=None, tools="")

# Add a hover tool
hover = HoverTool(tooltips=[
    ("Count", "@top"),
    ("Bin Start", "@left"),
    ("Bin End", "@right")
])
p.add_tools(hover)


# Add a bar glyph to the figure
p.quad(top='top', bottom=0, left='left', right='right', source=source,
       line_color='white', fill_color=Spectral6[0])

p.xaxis.axis_label = "Player Salary"  # Set x-axis label
p.yaxis.axis_label = "Count"  # Set y-axis label

p.xaxis.ticker = FixedTicker(ticks=edges)  # Set custom ticks
p.xaxis.major_label_overrides = {tick: f"{int(tick):,}" for tick in edges}  # Format tick labels as integers with comma separators
p.xaxis.major_label_orientation = np.pi / 4  # Rotate labels for better visibility

output_notebook()
show(p)


In [46]:
index = range(1, len(df2) + 1)
df2.index = index
newcolumns = [0, 1, 2]
df2 = df2.iloc[:,newcolumns]
df2

5. Data Analysis
One general approach to checking whether team spending affects how often a team wins is to compare each team's yearly total cap with its win percentage. Logically, one would expect more financial freedom to produce a better regular season performance: a team could allocate more resources towards its players and coaching staff, attract more skilled players, hire better staff, and invest in better training facilities and support. These investments could also improve team synergy and player performance, which would result in a higher winning percentage. The chart below plots each team's regular season winning percentage so far this season against its total salary cap, with an ordinary least squares regression line.

In [89]:
from bokeh.models import HoverTool
from bokeh.plotting import figure, show, output_notebook
from bokeh.models import ColumnDataSource

# Prepare the data: total cap, win percentage, and team name for each point
source = ColumnDataSource(data=dict(
    x=df['Total Cap'],
    y=df['Win%'],
    team=df['Team']
))

# Create a new figure
p = figure(height=350, title="Total Cap vs Win%",
           toolbar_location=None, tools="")

# Add a dot glyph to the figure
p.circle('x', 'y', size=10, source=source)

# Add a hover tool
hover = HoverTool(tooltips=[
    ("Team", "@team"),
    ("Total Cap", "@x"),
    ("Win%", "@y")
])
p.add_tools(hover)

# Set the labels for the x and y axes
p.xaxis.axis_label = 'Total Cap'
p.yaxis.axis_label = 'Win%'

import numpy as np

# Calculate the regression line
slope, intercept = np.polyfit(df['Total Cap'], df['Win%'], 1)
x = np.linspace(df['Total Cap'].min(), df['Total Cap'].max(), 100)
y = slope * x + intercept

# Add the regression line to the figure
p.line(x, y, line_color="red", line_width=2)

# Show the plot
output_notebook()
show(p)



We find that a team's total cap does not correlate with how often it wins regular season games, which is quite fascinating. This suggests that simply spending more money on an NBA team does not guarantee success in the regular season. Several confounding factors may be at play, such as how the coaching staff uses its players, team chemistry, coaching strategy, salaries that do not match player performance, and front office strategy. The result highlights the complex nature of the economics of basketball. The rest of this report dives deeper into how on-court performance may or may not impact player salary.
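
One way to put a number on the strength of this relationship is the Pearson correlation coefficient r between total cap and win percentage. For a least-squares fit with a single predictor, R² equals r². The sketch below is an illustration under stated assumptions: `pearson_r` is a hypothetical helper, and the five data points are made-up placeholders, not the scraped values.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Placeholder values (NOT the scraped data): cap in millions of dollars, win fraction
cap_millions = [130.0, 140.0, 150.0, 160.0, 170.0]
win_pct = [0.45, 0.60, 0.30, 0.55, 0.50]

r = pearson_r(cap_millions, win_pct)
r_squared = r ** 2  # share of variance in win% explained by a one-predictor linear fit
print(f"r = {r:.3f}, R^2 = {r_squared:.3f}")
```

On the notebook's own data, the same quantity should be available through `df['Total Cap'].corr(df['Win%'])`, which computes the Pearson coefficient by default.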

One would think that a player's on-court statistics have a significant impact on how much they earn. Key performance indicators such as points per game, assists per game, and rebounds per game are often used to evaluate a player's contribution to their team and their overall value to the league. Players who consistently perform at a higher level are usually seen as valuable assets, which should translate into a larger contract. This expectation rests on the assumption that on-court statistics are a quantifiable measure of a player's skill and effectiveness, and that they therefore influence market value and salary. Teams would pay more for top-performing players in the hope that their contributions translate into more victories. Other studies and reports show that teams known to perform better become more valuable (consider the Golden State Warriors' value before and after they won championships). The charts below provide context on whether these assumptions are accurate.
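
The cells that follow repeat the same fetch-with-retry loop for each statistic. As a sketch of how that loop could be factored out, the hypothetical helper `fetch_with_retry` below retries any zero-argument callable. Its name, its parameters, and the use of the built-in `TimeoutError` (rather than `requests.exceptions.Timeout`) are assumptions, chosen so that the example runs without network access.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_retry(fetch: Callable[[], T], attempts: int = 5,
                     delay: float = 2.0) -> Optional[T]:
    """Call fetch() up to `attempts` times; return None if every attempt times out."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except TimeoutError:
            print(f"Timeout on attempt {attempt} of {attempts}.")
            if attempt < attempts:
                time.sleep(delay)  # wait before the next attempt
    return None


# Stand-in for a network call: times out twice, then succeeds.
calls = {"count": 0}


def flaky_fetch() -> float:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError
    return 25.4  # placeholder points-per-game value, not real data


ppg = fetch_with_retry(flaky_fetch, delay=0.0)
print(ppg, calls["count"])
```

In the notebook, the callable would wrap the `playergamelog.PlayerGameLog(...)` request and the `except` clause would catch `requests.exceptions.Timeout`, as the cells below do inline.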

In [76]:
import time

import numpy as np
import pandas as pd
import requests
import statsmodels.api as sm
from bs4 import BeautifulSoup
from nba_api.stats.static import players
from nba_api.stats.endpoints import playergamelog
from bokeh.models import HoverTool, ColumnDataSource
from bokeh.plotting import figure, show

print("Starting script...")

# Fetch the webpage
url = "https://www.basketball-reference.com/contracts/players.html"
response = requests.get(url)
print("Fetched webpage.")

# Parse the HTML content
soup = BeautifulSoup(response.content, 'html.parser')
print("Parsed HTML content.")

# Find the data table
table = soup.find('table', {'id': 'player-contracts'})
print("Found data table.")

# Extract data and convert it to a DataFrame
data = []
rows = table.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])
print("Extracted data.")

df2 = pd.DataFrame(data)
df2.replace('None', None, inplace=True)
df2.dropna(how='all', inplace=True)
headers = ["Name", "Tm", "2023-24", "2024-25", "2025-26", "2026-27", "2027-28", "2028-29", "2029-30"]
df2.columns = headers
print("Created DataFrame.")

# Convert contract value to numeric (regex=False treats '$' as a literal character)
df2['2023-24'] = df2['2023-24'].str.replace('$', '', regex=False).str.replace(',', '', regex=False).astype(float)
print("Converted contract value to numeric.")

# Fetch player's points per game data
player_stats = []
all_players = players.get_players()
for player in all_players:
    if player['full_name'] in df2['Name'].values:  # Only include players whose names match
        for i in range(5):  # Retry 5 times
            try:
                gamelog = playergamelog.PlayerGameLog(player_id=player['id']).get_data_frames()[0]
                ppg = gamelog['PTS'].mean()  # Use 'PTS' for points
                player_stats.append({'Name': player['full_name'], 'PPG': ppg, 'id': player['id']})  # Include player's ID
                print(f"Fetched data for player {player['full_name']}.")
                break
            except requests.exceptions.Timeout:
                print(f"Timeout occurred for player {player['full_name']}. Retrying...")
                time.sleep(2)  # Wait for 2 seconds before retrying
        else:
            print(f"Failed to fetch data for player {player['full_name']} after 5 attempts.")

df1 = pd.DataFrame(player_stats)
df1 = df1.sort_values(by='PPG', ascending=False).head(250)  # Keep the 250 players with the highest points per game
print("Created DataFrame with player's points per game data.")

# Merge the contract data and points per game data
df = pd.merge(df1, df2, on='Name')
print("Merged data.")

# Perform the regression analysis
X = df[['PPG']]  # Use 'PPG' for points per game
y = df['2023-24']
X = sm.add_constant(X)
model = sm.OLS(y, X)
results = model.fit()
print("Performed regression analysis.")

# Calculate the regression line
slope, intercept = np.polyfit(df['PPG'], df['2023-24'], 1)
regression_line_x = np.array([df['PPG'].min(), df['PPG'].max()])
regression_line_y = slope * regression_line_x + intercept

# Add player images to the DataFrame
df['Image_URL'] = df['id'].apply(lambda pid: f"https://ak-static.cms.nba.com/wp-content/uploads/headshots/nba/latest/260x190/{pid}.png")

# Create the plot
source = ColumnDataSource(df)

p = figure(title="Points Per Game vs Contract Value with Player Images", 
           x_axis_label='Points Per Game', 
           y_axis_label='Contract Value for 2023-24')

# Plot the data as dots
p.circle('PPG', '2023-24', source=source, size=10)

# Create a HoverTool
hover = HoverTool()

# Set the tooltips to display the player's name, PPG, contract value, and image
# (column names with special characters, like 2023-24, must be wrapped in braces)
hover.tooltips = """
    <div>
        <div><strong>Player:</strong> @Name</div>
        <div><strong>PPG:</strong> @PPG</div>
        <div><strong>Contract:</strong> @{2023-24}</div>
        <div><img src="@Image_URL" alt="" width="200" /></div>
    </div>
"""

# Add the HoverTool to the plot
p.add_tools(hover)

# Add the regression line to the plot
p.line(regression_line_x, regression_line_y, line_width=2, color='red')

# Display the plot
show(p)

Starting script...
Fetched webpage.
Parsed HTML content.
Found data table.
Extracted data.
Created DataFrame.
Converted contract value to numeric.
Fetched data for player Precious Achiuwa.
Fetched data for player Steven Adams.
Fetched data for player Bam Adebayo.
Fetched data for player Ochai Agbaji.
Fetched data for player Santi Aldama.
Fetched data for player Nickeil Alexander-Walker.
Fetched data for player Grayson Allen.
Fetched data for player Jarrett Allen.
Fetched data for player Jose Alvarado.
Fetched data for player Kyle Anderson.
Fetched data for player Giannis Antetokounmpo.
Fetched data for player Thanasis Antetokounmpo.
Fetched data for player Cole Anthony.
Fetched data for player OG Anunoby.
Fetched data for player Ryan Arcidiacono.
Fetched data for player Deni Avdija.
Fetched data for player Deandre Ayton.
Fetched data for player Marvin Bagley III.
Fetched data for player Patrick Baldwin Jr..
Fetched data for player LaMelo Ball.
Fetched data for player Lonzo Ball.



In [77]:
import time

import numpy as np
import pandas as pd
import requests
import statsmodels.api as sm
from bs4 import BeautifulSoup
from nba_api.stats.static import players
from nba_api.stats.endpoints import playergamelog
from bokeh.models import HoverTool, ColumnDataSource
from bokeh.plotting import figure, show

print("Starting script...")

# Fetch the webpage
url = "https://www.basketball-reference.com/contracts/players.html"
response = requests.get(url)
print("Fetched webpage.")

# Parse the HTML content
soup = BeautifulSoup(response.content, 'html.parser')
print("Parsed HTML content.")

# Find the data table
table = soup.find('table', {'id': 'player-contracts'})
print("Found data table.")

# Extract data and convert it to a DataFrame
data = []
rows = table.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])
print("Extracted data.")

df2 = pd.DataFrame(data)
df2.replace('None', None, inplace=True)
df2.dropna(how='all', inplace=True)
headers = ["Name", "Tm", "2023-24", "2024-25", "2025-26", "2026-27", "2027-28", "2028-29", "2029-30"]
df2.columns = headers
print("Created DataFrame.")

# Convert contract value to numeric (regex=False treats '$' as a literal character)
df2['2023-24'] = df2['2023-24'].str.replace('$', '', regex=False).str.replace(',', '', regex=False).astype(float)
print("Converted contract value to numeric.")

# Fetch player's rebounds per game data
player_stats = []
all_players = players.get_players()
for player in all_players:
    if player['full_name'] in df2['Name'].values:  # Only include players whose names match
        for i in range(5):  # Retry 5 times
            try:
                gamelog = playergamelog.PlayerGameLog(player_id=player['id']).get_data_frames()[0]
                rpg = gamelog['REB'].mean()  # Use 'REB' for rebounds
                player_stats.append({'Name': player['full_name'], 'RPG': rpg, 'id': player['id']})  # Include player's ID
                print(f"Fetched data for player {player['full_name']}.")
                break
            except requests.exceptions.Timeout:
                print(f"Timeout occurred for player {player['full_name']}. Retrying...")
                time.sleep(2)  # Wait for 2 seconds before retrying
        else:
            print(f"Failed to fetch data for player {player['full_name']} after 5 attempts.")

df1 = pd.DataFrame(player_stats)
df1 = df1.sort_values(by='RPG', ascending=False).head(250)  # Keep the 250 players with the most rebounds per game
print("Created DataFrame with player's rebounds per game data.")

# Merge the contract data and rebounds per game data
df = pd.merge(df1, df2, on='Name')
print("Merged data.")

# Perform the regression analysis
X = df[['RPG']]  # Use 'RPG' for rebounds per game
y = df['2023-24']
X = sm.add_constant(X)
model = sm.OLS(y, X)
results = model.fit()
print("Performed regression analysis.")

# Calculate the regression line
slope, intercept = np.polyfit(df['RPG'], df['2023-24'], 1)
regression_line_x = np.array([df['RPG'].min(), df['RPG'].max()])
regression_line_y = slope * regression_line_x + intercept

# Add player images to the DataFrame
df['Image_URL'] = df['id'].apply(lambda pid: f"https://ak-static.cms.nba.com/wp-content/uploads/headshots/nba/latest/260x190/{pid}.png")

# Create the plot
source = ColumnDataSource(df)

p = figure(title="Rebounds Per Game vs Contract Value with Player Images", 
           x_axis_label='Rebounds Per Game', 
           y_axis_label='Contract Value for 2023-24')

# Plot the data as dots
p.circle('RPG', '2023-24', source=source, size=10)

# Create a HoverTool
hover = HoverTool()

# Set the tooltips to display the player's name, RPG, contract value, and image
# (column names with special characters, like 2023-24, must be wrapped in braces)
hover.tooltips = """
    <div>
        <div><strong>Player:</strong> @Name</div>
        <div><strong>RPG:</strong> @RPG</div>
        <div><strong>Contract:</strong> @{2023-24}</div>
        <div><img src="@Image_URL" alt="" width="200" /></div>
    </div>
"""

# Add the HoverTool to the plot
p.add_tools(hover)

# Add the regression line to the plot
p.line(regression_line_x, regression_line_y, line_width=2, color='red')

# Display the plot
show(p)

print("Script finished.")

Starting script...
Fetched webpage.
Parsed HTML content.
Found data table.
Extracted data.
Created DataFrame.
Converted contract value to numeric.
Fetched data for player Precious Achiuwa.
Fetched data for player Steven Adams.
Fetched data for player Bam Adebayo.
Fetched data for player Ochai Agbaji.
Fetched data for player Santi Aldama.
Fetched data for player Nickeil Alexander-Walker.
Fetched data for player Grayson Allen.
Fetched data for player Jarrett Allen.
Fetched data for player Jose Alvarado.
Fetched data for player Kyle Anderson.
Fetched data for player Giannis Antetokounmpo.
Fetched data for player Thanasis Antetokounmpo.
Fetched data for player Cole Anthony.
Fetched data for player OG Anunoby.
Fetched data for player Ryan Arcidiacono.
Fetched data for player Deni Avdija.
Fetched data for player Deandre Ayton.
Fetched data for player Marvin Bagley III.
Fetched data for player Patrick Baldwin Jr..
Fetched data for player LaMelo Ball.
Fetched data for player Lonzo Ball.



Script finished.


In [4]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from nba_api.stats.static import players
from nba_api.stats.endpoints import playergamelog
import statsmodels.api as sm
import time
from bokeh.models import HoverTool, ColumnDataSource
from bokeh.plotting import figure, show
import numpy as np
from bokeh.io import output_notebook

print("Starting script...")
output_notebook()


# Fetch the webpage
url = "https://www.basketball-reference.com/contracts/players.html"
response = requests.get(url)
print("Fetched webpage.")

# Parse the HTML content
soup = BeautifulSoup(response.content, 'html.parser')
print("Parsed HTML content.")

# Find the data table
table = soup.find('table', {'id': 'player-contracts'})
print("Found data table.")

# Extract data and convert it to a DataFrame
data = []
rows = table.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])
print("Extracted data.")

df2 = pd.DataFrame(data)
df2.replace('None', None, inplace=True)
df2.dropna(how='all', inplace=True)
headers = ["Name", "Tm", "2023-24", "2024-25", "2025-26", "2026-27", "2027-28", "2028-29", "2029-30"]
df2.columns = headers
print("Created DataFrame.")

# Convert contract value to numeric (regex=False treats '$' as a literal character)
df2['2023-24'] = df2['2023-24'].str.replace('$', '', regex=False).str.replace(',', '', regex=False).astype(float)
print("Converted contract value to numeric.")

# Fetch player's assists per game data
player_stats = []
all_players = players.get_players()
for player in all_players:
    if player['full_name'] in df2['Name'].values:  # Only include players whose names match
        for i in range(5):  # Retry 5 times
            try:
                gamelog = playergamelog.PlayerGameLog(player_id=player['id']).get_data_frames()[0]
                apg = gamelog['AST'].mean()  # Use 'AST' for assists
                player_stats.append({'Name': player['full_name'], 'APG': apg, 'id': player['id']})  # Include player's ID
                print(f"Fetched data for player {player['full_name']}.")
                break
            except requests.exceptions.Timeout:
                print(f"Timeout occurred for player {player['full_name']}. Retrying...")
                time.sleep(2)  # Wait for 2 seconds before retrying
        else:
            print(f"Failed to fetch data for player {player['full_name']} after 5 attempts.")

df1 = pd.DataFrame(player_stats)
df1 = df1.sort_values(by='APG', ascending=False).head(250)  # Keep the 250 players with the most assists per game
print("Created DataFrame with player's assists per game data.")

# Merge the contract data and assists per game data
df = pd.merge(df1, df2, on='Name')
print("Merged data.")

# Perform the regression analysis
X = df[['APG']]  # Use 'APG' for assists per game
y = df['2023-24']
X = sm.add_constant(X)
model = sm.OLS(y, X)
results = model.fit()
print("Performed regression analysis.")

# Calculate the regression line
slope, intercept = np.polyfit(df['APG'], df['2023-24'], 1)
regression_line_x = np.array([df['APG'].min(), df['APG'].max()])
regression_line_y = slope * regression_line_x + intercept

# Add player images to the DataFrame
df['Image_URL'] = df['id'].apply(lambda pid: f"https://ak-static.cms.nba.com/wp-content/uploads/headshots/nba/latest/260x190/{pid}.png")

# Create the plot
source = ColumnDataSource(df)

p = figure(title="Assists Per Game vs Contract Value with Player Images", 
           x_axis_label='Assists Per Game', 
           y_axis_label='Contract Value for 2023-24')

# Plot the data as dots
p.circle('APG', '2023-24', source=source, size=10)

# Create a HoverTool
hover = HoverTool()

# Set the tooltips to display the player's name, APG, contract value, and image
# (column names with special characters, like 2023-24, must be wrapped in braces)
hover.tooltips = """
    <div>
        <div><strong>Player:</strong> @Name</div>
        <div><strong>APG:</strong> @APG</div>
        <div><strong>Contract:</strong> @{2023-24}</div>
        <div><img src="@Image_URL" alt="" width="200" /></div>
    </div>
"""

# Add the HoverTool to the plot
p.add_tools(hover)

# Add the regression line to the plot
p.line(regression_line_x, regression_line_y, line_width=2, color='red')

# Display the plot
show(p)

print("Script finished.")

Starting script...


Fetched webpage.
Parsed HTML content.
Found data table.
Extracted data.
Created DataFrame.
Converted contract value to numeric.




Fetched data for player Precious Achiuwa.
Fetched data for player Steven Adams.
Fetched data for player Bam Adebayo.
Fetched data for player Ochai Agbaji.
Fetched data for player Santi Aldama.
Fetched data for player Nickeil Alexander-Walker.
Fetched data for player Grayson Allen.
Fetched data for player Jarrett Allen.
Fetched data for player Jose Alvarado.
Fetched data for player Kyle Anderson.
Fetched data for player Giannis Antetokounmpo.
Fetched data for player Thanasis Antetokounmpo.
Fetched data for player Cole Anthony.
Fetched data for player OG Anunoby.
Fetched data for player Ryan Arcidiacono.
Fetched data for player Deni Avdija.
Fetched data for player Deandre Ayton.
Fetched data for player Marvin Bagley III.
Fetched data for player Patrick Baldwin Jr..
Fetched data for player LaMelo Ball.
Fetched data for player Lonzo Ball.
Fetched data for player Mo Bamba.
Fetched data for player Paolo Banchero.
Fetched data for player Desmond Bane.
Fetched data for player Dalano Banton.



Script finished.


In [62]:
import time

import numpy as np
import pandas as pd
import requests
import statsmodels.api as sm
from bs4 import BeautifulSoup
from nba_api.stats.static import players
from nba_api.stats.endpoints import playergamelog
from bokeh.models import HoverTool, ColumnDataSource, Label
from bokeh.plotting import figure, show

print("Starting script...")

# Fetch the webpage
url = "https://www.basketball-reference.com/contracts/players.html"
response = requests.get(url)
print("Fetched webpage.")

# Parse the HTML content
soup = BeautifulSoup(response.content, 'html.parser')
print("Parsed HTML content.")

# Find the data table
table = soup.find('table', {'id': 'player-contracts'})
print("Found data table.")

# Extract data and convert it to a DataFrame
data = []
rows = table.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])
print("Extracted data.")

df2 = pd.DataFrame(data)
df2.replace('None', None, inplace=True)
df2.dropna(how='all', inplace=True)
headers = ["Name", "Tm", "2023-24", "2024-25", "2025-26", "2026-27", "2027-28", "2028-29", "2029-30"]
df2.columns = headers
print("Created DataFrame.")

# Convert contract value to numeric (regex=False treats '$' as a literal character)
df2['2023-24'] = df2['2023-24'].str.replace('$', '', regex=False).str.replace(',', '', regex=False).astype(float)
print("Converted contract value to numeric.")

# Fetch player's points, rebounds, and assists per game data
player_stats = []
all_players = players.get_players()
for player in all_players:
    if player['full_name'] in df2['Name'].values:  # Only include players whose names match
        for i in range(5):  # Retry 5 times
            try:
                gamelog = playergamelog.PlayerGameLog(player_id=player['id']).get_data_frames()[0]
                ppg = gamelog['PTS'].mean()  # Use 'PTS' for points
                rpg = gamelog['REB'].mean()  # Use 'REB' for rebounds
                apg = gamelog['AST'].mean()  # Use 'AST' for assists
                total = ppg + rpg + apg  # Calculate the sum of PPG, RPG, and APG
                player_stats.append({'Name': player['full_name'], 'Total': total, 'id': player['id']})  # Include player's ID
                print(f"Fetched data for player {player['full_name']}.")
                break
            except requests.exceptions.Timeout:
                print(f"Timeout occurred for player {player['full_name']}. Retrying...")
                time.sleep(2)  # Wait for 2 seconds before retrying
        else:
            print(f"Failed to fetch data for player {player['full_name']} after 5 attempts.")

df1 = pd.DataFrame(player_stats)
df1 = df1.sort_values(by='Total', ascending=False).head(250)  # Keep the 250 players with the highest combined points, rebounds, and assists per game
print("Created DataFrame with player's total points, rebounds, and assists per game data.")

# Merge the contract data and total points, rebounds, and assists per game data
df = pd.merge(df1, df2, on='Name')
print("Merged data.")

# Perform the regression analysis
X = df[['Total']]  # Use 'Total' for the sum of points, rebounds, and assists per game
y = df['2023-24']
X = sm.add_constant(X)
model = sm.OLS(y, X)
results = model.fit()
print("Performed regression analysis.")

# Calculate the regression line
slope, intercept = np.polyfit(df['Total'], df['2023-24'], 1)
regression_line_x = np.array([df['Total'].min(), df['Total'].max()])
regression_line_y = slope * regression_line_x + intercept

# Calculate R-squared
r_squared = results.rsquared

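# Add player images to the DataFrame (the hover tooltip below references Image_URL)
df['Image_URL'] = df['id'].apply(lambda pid: f"https://ak-static.cms.nba.com/wp-content/uploads/headshots/nba/latest/260x190/{pid}.png")

# Create the plot
source = ColumnDataSource(df)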

p = figure(title="Total Points, Rebounds, and Assists Per Game vs Contract Value with Player Images", 
           x_axis_label='Total Points, Rebounds, and Assists Per Game', 
           y_axis_label='Contract Value for 2023-24')

# Plot the data as dots
p.circle('Total', '2023-24', source=source, size=10)

# Create a HoverTool
hover = HoverTool()

# Set the tooltips to display the player's name, total points, rebounds, and assists per game, contract value, and image
# (column names with special characters, like 2023-24, must be wrapped in braces)
hover.tooltips = """
    <div>
        <div><strong>Player:</strong> @Name</div>
        <div><strong>Total PPG, RPG, APG:</strong> @Total</div>
        <div><strong>Contract:</strong> @{2023-24}</div>
        <div><img src="@Image_URL" alt="" width="200" /></div>
    </div>
"""

# Add the HoverTool to the plot
p.add_tools(hover)

# Add the regression line to the plot
p.line(regression_line_x, regression_line_y, line_width=2, color='red')

# Create a label for the R-squared value and regression equation and add it to the plot
label = Label(x=70, y=500, x_units='screen', y_units='screen',
              text=f'R^2 = {r_squared:.2f}\n y = {slope:.2f}x + {intercept:.2f}',
              border_line_color='black', border_line_alpha=1.0,
              background_fill_color='white', background_fill_alpha=1.0)
p.add_layout(label)

# Display the plot
show(p)

print("Script finished.")

Starting script...
Fetched webpage.
Parsed HTML content.
Found data table.
Extracted data.
Created DataFrame.
Converted contract value to numeric.




Fetched data for player Precious Achiuwa.
Fetched data for player Steven Adams.
Fetched data for player Bam Adebayo.
Fetched data for player Ochai Agbaji.
Fetched data for player Santi Aldama.
Fetched data for player Nickeil Alexander-Walker.
Fetched data for player Grayson Allen.
Fetched data for player Jarrett Allen.
Fetched data for player Jose Alvarado.
Fetched data for player Kyle Anderson.
Fetched data for player Giannis Antetokounmpo.
Fetched data for player Thanasis Antetokounmpo.
Fetched data for player Cole Anthony.
Fetched data for player OG Anunoby.
Fetched data for player Ryan Arcidiacono.
Fetched data for player Deni Avdija.
Fetched data for player Deandre Ayton.
Fetched data for player Marvin Bagley III.
Fetched data for player Patrick Baldwin Jr..
Fetched data for player LaMelo Ball.
Fetched data for player Lonzo Ball.
Fetched data for player Mo Bamba.
Fetched data for player Paolo Banchero.
Fetched data for player Desmond Bane.
Fetched data for player Dalano Banton.
Fe



Script finished.
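
The retry loop in the script above relies on Python's `for ... else`: the `else` branch runs only when the loop ends without reaching `break`, which is how the "after 5 attempts" message is triggered. The following is a minimal sketch of that pattern on its own. The names `fetch_with_retries` and `flaky` are hypothetical, and the built-in `TimeoutError` stands in for `requests.exceptions.Timeout` so that the example runs without network access.

```python
import time


def fetch_with_retries(fetch, attempts=5, delay=2.0, sleep=time.sleep):
    """Call fetch() up to `attempts` times, retrying whenever it times out."""
    for attempt in range(1, attempts + 1):
        try:
            result = fetch()
            break  # success: leaving the loop here skips the else branch
        except TimeoutError:
            print(f"Timeout on attempt {attempt}. Retrying...")
            sleep(delay)
    else:
        # Reached only when every attempt timed out (no break happened)
        return None
    return result


# Hypothetical flaky call: times out twice, then returns a small stats record
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError
    return {"PTS": 25.0}


stats = fetch_with_retries(flaky, delay=0)
print(stats, "after", calls["n"], "calls")
```

Returning `None` on failure lets the caller skip a player rather than abort the whole run, which matches the behavior of the script above.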


In [None]:
#raptor source: https://neilpaine.substack.com/p/nba-estimated-raptor-player-ratings
import pandas as pd
raptor = pd.read_csv("dataset2.csv")
newcolumns = [0, 1, 2, 3, 8]
raptor = raptor.iloc[:,newcolumns]
raptor = raptor.drop(0)
raptor = raptor.rename(columns={'Unnamed: 0': 'Player', 'Unnamed: 1': 'Age', 'Unnamed: 2': 'Team', 'Unnamed: 3': 'Position(s)', 'Unnamed: 8': 'Total RAPTOR'})
raptor


Unnamed: 0,Player,Age,Team,Position(s),Total RAPTOR
1,Markquis Nowell,24,TOR,SG,37.74899093
2,Hamidou Diallo,25,WAS,SG,26.42653133
3,Henri Drell,23,CHI,SF,25.36624168
4,Drew Peterson,24,BOS,PF,23.67703879
5,Adama Sanogo,21,CHI,PF,19.42261036
...,...,...,...,...,...
553,Dmytro Skapintsev,25,NYK,C,-25.4
554,Alondes Williams,24,MIA,SG,-26.7508189
555,Quenton Jackson,25,IND,PG,-26.76915289
556,Joshua Primo,21,LAC,SG,-27.14852305


In [31]:
lebron = pd.read_csv('lebron.csv')
index = range(1, len(lebron) + 1)
lebron.index = index
newcolumns = [0, 7, 8, 9]
# lebron = lebron.iloc[:,newcolumns]
# lebron = lebron.drop(535)
lebron.head()


Unnamed: 0,Player,Season,Team,Offensive Archetype,Minutes,Pos,Age,LEBRON,O-LEBRON,D-LEBRON,WAR,LEBRON Contract Value,boxLEBRON,boxOLEBRON,boxDLEBRON
1,Nikola Jokic,2023-24,DEN,Post Scorer,1584,C,28,6.56,5.38,1.18,8.6,29852693.0,5.97,4.82,1.15
2,Giannis Antetokounmpo,2023-24,MIL,Shot Creator,1618,PF,29,6.16,5.1,1.06,8.5,29280300.0,5.33,4.62,0.7
3,Joel Embiid,2023-24,PHI,Shot Creator,1157,C,29,5.8,4.77,1.03,5.8,20143002.0,6.06,5.2,0.86
4,Shai Gilgeous-Alexander,2023-24,OKC,Shot Creator,1615,PG,25,5.7,4.68,1.01,8.0,27794844.0,6.36,5.29,1.07
5,Luka Doncic,2023-24,DAL,Shot Creator,1499,PG,24,5.46,6.17,-0.72,7.3,25111052.0,5.32,5.9,-0.58


In [35]:
from bokeh.models import HoverTool, Span

# Re-create the figure (if necessary)
source_simple = ColumnDataSource(lebron)
p_simple = figure(height=400, width=400, title="O-LEBRON vs D-LEBRON with Hover Tool")

# Add circles for each point in the dataset
p_simple.circle(x='O-LEBRON', y='D-LEBRON', source=source_simple, size=10)

# Create a HoverTool object
hover = HoverTool()
hover.tooltips = [
    ("Player", "@Player"), 
    ("O-LEBRON", "@{O-LEBRON}"), 
    ("D-LEBRON", "@{D-LEBRON}")
]

# Add the HoverTool to the figure
p_simple.add_tools(hover)

# Add a horizontal line at y=0
hline = Span(location=0, dimension='width', line_color='black', line_width=1)
p_simple.add_layout(hline)

# Add a vertical line at x=0
vline = Span(location=0, dimension='height', line_color='black', line_width=1)
p_simple.add_layout(vline)

# Display the plot with the hover tool
show(p_simple)
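
The two zero lines split the scatter plot into four quadrants according to the signs of O-LEBRON and D-LEBRON. A minimal sketch of the same split in code follows. The quadrant labels are our own wording (hypothetical, not official LEBRON terminology), and the two sample rows are copied from the `lebron.head()` output above.

```python
def quadrant(o_lebron, d_lebron):
    """Label a player by the signs of the offensive and defensive components."""
    if o_lebron >= 0 and d_lebron >= 0:
        return "positive on both ends"
    if o_lebron >= 0:
        return "positive on offense only"
    if d_lebron >= 0:
        return "positive on defense only"
    return "negative on both ends"


# (O-LEBRON, D-LEBRON) pairs taken from the lebron.head() output
players = {
    "Nikola Jokic": (5.38, 1.18),
    "Luka Doncic": (6.17, -0.72),
}
labels = {name: quadrant(o, d) for name, (o, d) in players.items()}
print(labels)
```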



In [130]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from bokeh.models import HoverTool, ColumnDataSource, Label
from bokeh.plotting import figure, show
import numpy as np
import statsmodels.api as sm

# Load the lebron DataFrame
lebron = pd.read_csv('lebron.csv')

# Fetch the webpage
url = "https://www.basketball-reference.com/contracts/players.html"
response = requests.get(url)

# Parse the HTML content
soup = BeautifulSoup(response.content, 'html.parser')

# Find the data table
table = soup.find('table', {'id': 'player-contracts'})

# Extract data and convert it to a DataFrame
data = []
rows = table.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    # Empty cells are dropped, so any values after a blank cell shift left into earlier columns
    data.append([ele for ele in cols if ele])

df2 = pd.DataFrame(data)
df2.replace('None', None, inplace=True)
df2.dropna(how='all', inplace=True)
headers = ["Name", "Tm", "2023-24", "2024-25", "2025-26", "2026-27", "2027-28", "2028-29", "2029-30"]
df2.columns = headers

# Convert contract value to numeric
df2['2023-24'] = df2['2023-24'].str.replace('$', '', regex=False).str.replace(',', '', regex=False).astype(float)

# Fetch player's overall LEBRON data
player_stats = []
for player in lebron['Player'].unique():
    if player in df2['Name'].values:  # Only include players whose names match
        lebron_value = lebron[lebron['Player'] == player]['LEBRON'].mean()  # Use 'LEBRON' for overall LEBRON
        player_stats.append({'Name': player, 'LEBRON': lebron_value})

df1 = pd.DataFrame(player_stats)
df1 = df1.sort_values(by='LEBRON', ascending=False).head(250)  # Keep the 250 players with the highest overall LEBRON

# Merge the contract data and overall LEBRON data
df = pd.merge(df1, df2, on='Name')

# Perform the regression analysis
X = df[['LEBRON']]  # Use 'LEBRON' for the overall LEBRON
y = df['2023-24']
X = sm.add_constant(X)
model = sm.OLS(y, X)
results = model.fit()

# Calculate the regression line
slope, intercept = np.polyfit(df['LEBRON'], df['2023-24'], 1)
regression_line_x = np.array([df['LEBRON'].min(), df['LEBRON'].max()])
regression_line_y = slope * regression_line_x + intercept

# Calculate R-squared
r_squared = results.rsquared

# Create the plot
source = ColumnDataSource(df)

p = figure(title="Overall LEBRON vs Contract Value", 
           x_axis_label='Overall LEBRON', 
           y_axis_label='Contract Value for 2023-24')

# Plot the data as dots
p.circle('LEBRON', '2023-24', source=source, size=10)

# Create a HoverTool
hover = HoverTool()

# Set the tooltips to display the player's name, overall LEBRON, and contract value
hover.tooltips = """
    <div>
        <div><strong>Player:</strong> @Name</div>
        <div><strong>Overall LEBRON:</strong> @LEBRON</div>
        <div><strong>Contract:</strong> @{2023-24}</div>
    </div>
"""

# Add the HoverTool to the plot
p.add_tools(hover)

# Add the regression line to the plot
p.line(regression_line_x, regression_line_y, line_width=2, color='red')

# Create a label for the R-squared value and regression equation and add it to the plot
label = Label(x=10, y=485, x_units='screen', y_units='screen',
              text=f'R^2 = {r_squared:.2f}\n y = {slope:.2f}x + {intercept:.2f}',
              border_line_color='black', border_line_alpha=1.0,
              background_fill_color='white', background_fill_alpha=1.0, text_font_size = '8pt')
p.add_layout(label)

# Display the plot
show(p)



                      Name  LEBRON   Tm     2023-24      2024-25      2025-26      2026-27       2027-28       2028-29       2029-30
0    Giannis Antetokounmpo    6.16  MIL  45640084.0  $48,787,676  $68,302,746  $73,766,966   $79,231,186  $236,497,472          None
1              Joel Embiid    5.80  PHI  47607350.0  $51,415,938  $55,224,526  $59,033,114  $154,247,814          None          None
2  Shai Gilgeous-Alexander    5.70  OKC  33386850.0  $35,859,950  $38,333,050  $40,806,150  $148,386,000          None          None
3         Donovan Mitchell    4.60  CLE  33162030.0  $35,410,310  $37,096,620  $68,572,340          None          None          None
4        Tyrese Haliburton    4.34  IND   5808435.0  $35,500,000  $38,340,000  $41,180,000   $44,020,000   $46,860,000  $211,708,435


In [143]:
#win share and player efficiency rating
import requests
import lxml.html as lx
url = 'https://www.basketball-reference.com/leagues/NBA_2024_advanced.html'
response = requests.get(url)
html = lx.fromstring(response.text)
tables = html.xpath('//table')

# Collect the text of every <td> cell, row by row.
# dataframe is reset for each table, so only the last table on the page is kept.
for table in tables:
    dataframe = []
    for row in table.xpath('.//tr'):
        row_data = [cell.text_content().strip() for cell in row.xpath('.//td')]
        dataframe.append(row_data)

# 'extra_1' and 'extra_2' are placeholder names for the two remaining columns
columns = ['Player', 'Position', 'Age', 'Team', 'G', 'MP', 'PER', 'TS%', '3PAr', 'FTr', 'ORB%', 'DRB%', 'TRB%', 'AST%', 'STL%', 'BLK%', 'TOV%', 'USG%', 'OWS', 'DWS', 'WS', 'WS/48', 'OBPM', 'DBPM', 'BPM', 'VORP', 'extra_1', 'extra_2']

# Create DataFrame
df = pd.DataFrame(dataframe, columns=columns)

In [144]:
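# Keep the first listed row for any player who appears more than once
df = df.drop_duplicates(subset=['Player'], keep='first')
# Select Player, Position, Age, Team, PER, and WS by column position
newcolumns = [0, 1, 2, 3, 6, 20]
df = df.iloc[:, newcolumns]
advanced = df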



In [151]:
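# Drop row 0, which comes from the header <tr> (it has no <td> cells, so it yields an empty row).
# Run this cell only once: after reset_index, each re-run removes another player from the top.
advanced = advanced.drop(0)
advanced.reset_index(drop=True, inplace=True)
df2 = advanced
df2.head()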


Unnamed: 0,Player,Position,Age,Team,PER,WS
0,Santi Aldama,PF,23,MEM,12.7,1.9
1,Nickeil Alexander-Walker,SG,25,MIN,10.7,2.5
2,Grayson Allen,SG,28,PHO,13.6,1.6
3,Jarrett Allen,C,25,CLE,21.5,3.6
4,Jose Alvarado,PG,25,NOP,12.6,1.3
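
Because `drop(0)` followed by `reset_index` removes whatever row is currently first, the cell above gives a different result each time it is re-run. A minimal sketch of an alternative that can be run repeatedly without losing data: drop rows by content (a missing player name) rather than by position. The toy frame and the helper name `clean` are hypothetical. The player names are placeholders, not scraped data.

```python
import pandas as pd

# Toy frame shaped like the scraped table: the header row has no player name
scraped = pd.DataFrame(
    [[None, None], ["Player A", "C"], ["Player B", "PF"]],
    columns=["Player", "Position"],
)


def clean(frame):
    """Remove rows without a player name and renumber the index from 0."""
    return frame.dropna(subset=["Player"]).reset_index(drop=True)


once = clean(scraped)
twice = clean(once)  # a second pass finds nothing more to remove

# By contrast, dropping by position removes one more real row on every run
shrinking = once.drop(0).reset_index(drop=True)

print(len(once), len(twice), len(shrinking))
```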


In [153]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from bokeh.models import HoverTool, ColumnDataSource, Label
from bokeh.plotting import figure, show
import numpy as np
import statsmodels.api as sm

# Fetch the webpage
url = "https://www.basketball-reference.com/contracts/players.html"
response = requests.get(url)

# Parse the HTML content
soup = BeautifulSoup(response.content, 'html.parser')

# Find the data table
table = soup.find('table', {'id': 'player-contracts'})

# Extract data and convert it to a DataFrame
data = []
rows = table.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    # Empty cells are dropped, so any values after a blank cell shift left into earlier columns
    data.append([ele for ele in cols if ele])

df2 = pd.DataFrame(data)
df2.replace('None', None, inplace=True)
df2.dropna(how='all', inplace=True)
headers = ["Name", "Tm", "2023-24", "2024-25", "2025-26", "2026-27", "2027-28", "2028-29", "2029-30"]
df2.columns = headers

# Convert contract value to numeric
df2['2023-24'] = df2['2023-24'].str.replace('$', '', regex=False).str.replace(',', '', regex=False).astype(float)

# Fetch player's overall LEBRON data
player_stats = []
for player in lebron['Player'].unique():
    if player in df2['Name'].values:  # Only include players whose names match
        lebron_value = lebron[lebron['Player'] == player]['LEBRON'].mean()  # Use 'LEBRON' for overall LEBRON
        player_stats.append({'Name': player, 'LEBRON': lebron_value})

df1 = pd.DataFrame(player_stats)
df1 = df1.sort_values(by='LEBRON', ascending=False).head(250)  # Keep the 250 players with the highest overall LEBRON

# Merge the contract data and overall LEBRON data
df = pd.merge(df1, df2, on='Name')

# Perform the regression analysis
X = df[['LEBRON']]  # Use 'LEBRON' for the overall LEBRON
y = df['2023-24']
X = sm.add_constant(X)
model = sm.OLS(y, X)
results = model.fit()

# Calculate the regression line
slope, intercept = np.polyfit(df['LEBRON'], df['2023-24'], 1)
regression_line_x = np.array([df['LEBRON'].min(), df['LEBRON'].max()])
regression_line_y = slope * regression_line_x + intercept

# Calculate R-squared
r_squared = results.rsquared

# Create the plot
source = ColumnDataSource(df)

p = figure(title="Overall LEBRON vs Contract Value", 
           x_axis_label='Overall LEBRON', 
           y_axis_label='Contract Value for 2023-24')

# Plot the data as dots
p.circle('LEBRON', '2023-24', source=source, size=10)

# Create a HoverTool
hover = HoverTool()

# Set the tooltips to display the player's name, overall LEBRON, and contract value
hover.tooltips = """
    <div>
        <div><strong>Player:</strong> @Name</div>
        <div><strong>Overall LEBRON:</strong> @LEBRON</div>
        <div><strong>Contract:</strong> @{2023-24}</div>
    </div>
"""

# Add the HoverTool to the plot
p.add_tools(hover)

# Add the regression line to the plot
p.line(regression_line_x, regression_line_y, line_width=2, color='red')

# Create a label for the R-squared value and regression equation and add it to the plot
label = Label(x=10, y=485, x_units='screen', y_units='screen',
              text=f'R^2 = {r_squared:.2f}\n y = {slope:.2f}x + {intercept:.2f}',
              border_line_color='black', border_line_alpha=1.0,
              background_fill_color='white', background_fill_alpha=1.0, text_font_size = '8pt')
p.add_layout(label)

# Display the plot
show(p)

KeyError: 'PER'
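
A `KeyError` like the one above is raised when a DataFrame is indexed with a column label it does not contain. The code shown in this cell never references `PER`, so the exact statement that failed is not visible here. One plausible reading (an assumption) is that a regression on `PER` was attempted on a frame that had only the LEBRON and contract columns. A minimal sketch of merging the advanced statistics in first and checking for the expected columns before modelling follows. The helper `require_columns` and the toy frames are hypothetical.

```python
import pandas as pd


def require_columns(frame, columns):
    """Raise a descriptive error if any expected column is absent."""
    missing = [c for c in columns if c not in frame.columns]
    if missing:
        raise KeyError(f"missing columns {missing}; available: {list(frame.columns)}")


# Toy stand-ins for the merged LEBRON/contract frame and the advanced-stats frame
merged = pd.DataFrame({"Name": ["Player A"], "LEBRON": [1.0], "2023-24": [1_000_000.0]})
advanced_toy = pd.DataFrame({"Player": ["Player A"], "PER": [15.0], "WS": [2.0]})

# Bring PER and WS in by matching Player to Name, then verify before fitting
with_per = merged.merge(advanced_toy, left_on="Name", right_on="Player", how="left")
require_columns(with_per, ["PER", "2023-24"])
print(list(with_per.columns))
```

Checking the columns up front turns a bare `KeyError: 'PER'` into a message that also lists which columns are actually available.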