## Performing data analysis with the NBA API to examine correlations and calculate statistics on team and player performance

### Correlation between Win Percentage and Player's Points:

Data Needed: Team win/loss records, player statistics (points per game).
Analysis: Calculate the correlation coefficient (e.g., Pearson's correlation) between a team's win percentage and the average points per game scored by its key players. This shows how strongly a player's scoring is associated with the team's success; on its own it does not show that the scoring causes the wins.
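
As a minimal sketch of the calculation described above, the snippet below computes Pearson's r with pandas. The table is made up purely for illustration (one row per hypothetical team); the column names `win_pct` and `star_ppg` are assumptions, not fields returned by the NBA API.

```python
import pandas as pd

# Made-up, illustrative numbers: one row per hypothetical team.
teams = pd.DataFrame({
    "team": ["A", "B", "C", "D", "E"],
    "win_pct": [0.70, 0.55, 0.48, 0.35, 0.62],    # team win percentage
    "star_ppg": [28.1, 24.3, 22.0, 19.5, 26.4],   # key player's points per game
})

# Pearson's r between win percentage and the key player's scoring.
r = teams["win_pct"].corr(teams["star_ppg"], method="pearson")
print(f"Pearson r = {r:.3f}")
```

A value near +1 or -1 indicates a strong linear relationship and a value near 0 a weak one. With only a handful of rows the estimate is very noisy, so it should be read with caution.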

### Percentage of Points Scored by a Player:

Data Needed: Game statistics (points scored by each player), team total points scored.
Analysis: For a specific player or a group of players, calculate the percentage of total points scored by that player(s) in a game or over a season. This can provide insights into the offensive contribution of individual players to their team's overall scoring.

These analyses can help understand the relationship between player performance and team success, which can be valuable for making data-driven decisions, such as optimizing player rotations, evaluating player contracts, or predicting team performance in upcoming games or seasons.
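
The percentage-of-points calculation can be sketched as follows. The helper `points_share` is hypothetical, the box score is made up, and the sketch assumes one row per player per game with `GAME_ID`, `PLAYER_NAME` and `PTS` columns, with every scorer of the team present so that the per-game sum equals the team total.

```python
import pandas as pd

def points_share(player_logs: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with each player's percentage of the team's points per game.

    Assumes all of the team's scorers for a game appear in player_logs,
    so summing PTS within a GAME_ID gives the team total.
    """
    out = player_logs.copy()
    team_pts = out.groupby("GAME_ID")["PTS"].transform("sum")
    out["PTS_SHARE_PCT"] = 100 * out["PTS"] / team_pts
    return out

# Made-up box scores for two games and three players.
logs = pd.DataFrame({
    "GAME_ID": ["G1", "G1", "G1", "G2", "G2", "G2"],
    "PLAYER_NAME": ["P1", "P2", "P3", "P1", "P2", "P3"],
    "PTS": [30, 50, 20, 25, 45, 30],
})

# Per-game share for every player.
shares = points_share(logs)
print(shares)

# Share over the whole sample (e.g. a season): player total / team total.
season_share = 100 * logs.groupby("PLAYER_NAME")["PTS"].sum() / logs["PTS"].sum()
print(season_share)
```

The season-level share is computed from totals rather than by averaging the per-game percentages, so high-scoring games are not underweighted.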

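#### Initial plan

Work with one season of games from GSW (or any other team). For each game, take the points scored by each player (most, if not all, of the roster) and the percentage of the team's points that they represent, then correlate those figures with the team's win rate. The exact method for that last step is still to be decided.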
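
One possible way to carry out this plan, offered as a sketch rather than a settled method: join player game logs to team game logs on `GAME_ID`, compute each player's share of the team's points, turn `WL` into a 0/1 win indicator, and take Pearson's r per player (with a binary variable this is the point-biserial correlation). The function `share_vs_win` and the sample frames are hypothetical; the column names (`GAME_ID`, `PLAYER_NAME`, `PTS`, `WL`) are the ones that appear in the API output in this notebook.

```python
import pandas as pd

def share_vs_win(player_logs: pd.DataFrame, team_logs: pd.DataFrame) -> pd.Series:
    """Per-player correlation between points share and winning the game."""
    # Keep only the team columns needed and avoid a PTS name clash.
    team = team_logs[["GAME_ID", "WL", "PTS"]].rename(columns={"PTS": "TEAM_PTS"})
    merged = player_logs.merge(team, on="GAME_ID", how="inner")
    merged["PTS_SHARE"] = merged["PTS"] / merged["TEAM_PTS"]
    merged["WIN"] = (merged["WL"] == "W").astype(int)
    # Correlation is NaN for a player whose games are all wins or all losses.
    result = {
        name: grp["PTS_SHARE"].corr(grp["WIN"])
        for name, grp in merged.groupby("PLAYER_NAME")
    }
    return pd.Series(result, name="corr_share_win")

# Made-up data: four games, two players.
team_logs = pd.DataFrame({
    "GAME_ID": ["G1", "G2", "G3", "G4"],
    "WL": ["W", "L", "W", "L"],
    "PTS": [110, 95, 120, 100],
})
player_logs = pd.DataFrame({
    "GAME_ID": ["G1", "G2", "G3", "G4"] * 2,
    "PLAYER_NAME": ["P1"] * 4 + ["P2"] * 4,
    "PTS": [33, 19, 42, 22, 22, 38, 24, 35],
})

correlations = share_vs_win(player_logs, team_logs)
print(correlations)
```

A positive value means the team tended to win when that player took a larger share of the scoring, and a negative value means the opposite.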

In [24]:
import pandas as pd
from nba_api.stats.endpoints import teamgamelogs

# Team of interest: Golden State Warriors (abbreviation GSW)
team_abbreviation = "GSW"

# # Define custom headers to mimic a web browser request
# headers = {
#     "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
# }

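# Create the TeamGameLogs endpoint instance (default request headers are used).
# No season is passed, so the API chooses which season to return; set season_nullable
# (e.g. season_nullable='2016-17') to request a specific season.
# https://github.com/swar/nba_api/blob/master/docs/nba_api/stats/endpoints/teamgamelogs.md
team_game_logs = teamgamelogs.TeamGameLogs(team_id_nullable='1610612744')  # GSW team ID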

# Get the data as a Pandas DataFrame
data = team_game_logs.get_data_frames()[0]

# Display the data
print(data)



   SEASON_YEAR     TEAM_ID TEAM_ABBREVIATION              TEAM_NAME  \
0      2016-17  1610612744               GSW  Golden State Warriors   
1      2016-17  1610612744               GSW  Golden State Warriors   
2      2016-17  1610612744               GSW  Golden State Warriors   
3      2016-17  1610612744               GSW  Golden State Warriors   
4      2016-17  1610612744               GSW  Golden State Warriors   
..         ...         ...               ...                    ...   
77     2016-17  1610612744               GSW  Golden State Warriors   
78     2016-17  1610612744               GSW  Golden State Warriors   
79     2016-17  1610612744               GSW  Golden State Warriors   
80     2016-17  1610612744               GSW  Golden State Warriors   
81     2016-17  1610612744               GSW  Golden State Warriors   

       GAME_ID            GAME_DATE      MATCHUP WL   MIN  FGM  ...  REB_RANK  \
0   0021601229  2017-04-12T00:00:00  GSW vs. LAL  W  48.0   41  ..

In [21]:
# Note: this might not work if the game or player is from too long ago, because records were not kept that far back.

from nba_api.stats.endpoints import playergamelogs

# Specify the player's ID (2544 is LeBron James, as the output below confirms)
player_id = '2544'

# Create the PlayerGameLogs endpoint instance for this player (no season is specified here)
player_game_logs = playergamelogs.PlayerGameLogs(player_id_nullable=player_id)

# Get the data as a Pandas DataFrame
player_data = player_game_logs.get_data_frames()[0]

# Display the data
print(player_data)



   SEASON_YEAR  PLAYER_ID   PLAYER_NAME NICKNAME     TEAM_ID  \
0      2016-17       2544  LeBron James   LeBron  1610612739   
1      2016-17       2544  LeBron James   LeBron  1610612739   
2      2016-17       2544  LeBron James   LeBron  1610612739   
3      2016-17       2544  LeBron James   LeBron  1610612739   
4      2016-17       2544  LeBron James   LeBron  1610612739   
..         ...        ...           ...      ...         ...   
69     2016-17       2544  LeBron James   LeBron  1610612739   
70     2016-17       2544  LeBron James   LeBron  1610612739   
71     2016-17       2544  LeBron James   LeBron  1610612739   
72     2016-17       2544  LeBron James   LeBron  1610612739   
73     2016-17       2544  LeBron James   LeBron  1610612739   

   TEAM_ABBREVIATION            TEAM_NAME     GAME_ID            GAME_DATE  \
0                CLE  Cleveland Cavaliers  0021601197  2017-04-09T00:00:00   
1                CLE  Cleveland Cavaliers  0021601179  2017-04-07T00:00:00 

In [25]:
# This retrieves LeBron James's 2022-23 regular-season games (compare with https://www.espn.com/nba/player/gamelog/_/id/1966/lebronjames)
import pandas as pd
from nba_api.stats.endpoints import playergamelogs

# Specify the season you are interested in
season = "2022-23"  # Replace with the desired season

# Create a list to store game-level player statistics
game_player_stats = []

# Get the player IDs for the team
# You may need to retrieve the player IDs associated with the team
# This can be done by using another endpoint like 'commonallplayers'
# and filtering for players on the specified team
# player_ids = [player_id_1, player_id_2, ...]
player_ids = ['2544']  # LeBron James

# Loop through each player and retrieve their game logs
for player_id in player_ids:
    player_game_logs = playergamelogs.PlayerGameLogs(player_id_nullable=player_id, season_nullable=season)
    player_data = player_game_logs.get_data_frames()[0]
    game_player_stats.append(player_data)

# Concatenate the data frames into a single data frame
team_game_data = pd.concat(game_player_stats, ignore_index=True)

# Group the data by 'GAME_ID' and calculate the total points scored by each player in each game
game_points = team_game_data.groupby(['GAME_ID', 'PLAYER_ID'])['PTS'].sum().reset_index()

# Display the resulting data frame
print(game_points)


       GAME_ID  PLAYER_ID  PTS
0   0022200002       2544   31
1   0022200016       2544   20
2   0022200037       2544   31
3   0022200064       2544   19
4   0022200076       2544   28
5   0022200095       2544   26
6   0022200117       2544   20
7   0022200131       2544   17
8   0022200140       2544   27
9   0022200170       2544   30
10  0022200282       2544   21
11  0022200288       2544   39
12  0022200308       2544   21
13  0022200324       2544   31
14  0022200331       2544   28
15  0022200349       2544   29
16  0022200360       2544   21
17  0022200382       2544   23
18  0022200396       2544   35
19  0022200413       2544   33
20  0022200437       2544   30
21  0022200451       2544   33
22  0022200475       2544   31
23  0022200492       2544   34
24  0022200494       2544   38
25  0022200505       2544   28
26  0022200518       2544   27
27  0022200530       2544   47
28  0022200551       2544   43
29  0022200590       2544   25
30  0022200595       2544   37
31  0022

In [29]:
from nba_api.stats.endpoints import playergamelogs
import pandas as pd

# Define the season you are interested in (e.g., '2016-17')
season = '2016-17'

# Specify the desired game ID
desired_game_id = '0021601229'  # Replace with the desired GAME_ID

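# Create the PlayerGameLogs endpoint instance for the specified season.
# Note: 'All Games' does not appear to be one of the game-segment values that nba_api
# lists ('First Half', 'Second Half', 'Overtime'), so this argument is a possible cause
# of the JSONDecodeError shown below; dropping it may be enough for the request to succeed.
player_game_logs = playergamelogs.PlayerGameLogs(
    season_nullable=season,
    game_segment_nullable='All Games'
)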

# Get the data as a Pandas DataFrame
player_game_data = player_game_logs.get_data_frames()[0]

# Filter the data for the desired game ID
desired_game_data = player_game_data[player_game_data['GAME_ID'] == desired_game_id]

# Extract player names and points scored for the desired game
player_points = desired_game_data[['PLAYER_NAME', 'PTS']]

# Display the player points for the desired game
print(player_points)



JSONDecodeError: Expecting value: line 1 column 1 (char 0)
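
A `JSONDecodeError` at character 0 means the response body was not JSON at all (for example, it was empty). If the cause is transient, such as a throttled or dropped request, retrying after a short pause may help; it will not help if a request parameter is invalid. The helper below is a hypothetical sketch, not part of nba_api: `call_with_retry` takes any zero-argument callable and relies on the fact that `json.JSONDecodeError` is a subclass of `ValueError`.

```python
import time

def call_with_retry(fetch, retries=3, delay_seconds=1.0):
    """Call fetch() and retry when the response cannot be parsed as JSON.

    fetch is any zero-argument callable. json.JSONDecodeError subclasses
    ValueError, so catching ValueError covers it.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except ValueError as err:
            last_error = err
            if attempt < retries:
                # Wait a little longer after each failed attempt.
                time.sleep(delay_seconds * attempt)
    raise RuntimeError(f"request failed after {retries} attempts") from last_error

# Possible usage with the endpoint from the cell above (needs network access):
# player_game_data = call_with_retry(
#     lambda: playergamelogs.PlayerGameLogs(season_nullable='2016-17').get_data_frames()[0]
# )
```

Wrapping both the endpoint construction and `get_data_frames()` in the callable means the retry applies wherever the parsing error is raised.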

In [27]:
# Print the team game log's column names to see which fields are available
# (there is no player-name column here; PLAYER_NAME appears only in the player game logs)
print(data.columns)


Index(['SEASON_YEAR', 'TEAM_ID', 'TEAM_ABBREVIATION', 'TEAM_NAME', 'GAME_ID',
       'GAME_DATE', 'MATCHUP', 'WL', 'MIN', 'FGM', 'FGA', 'FG_PCT', 'FG3M',
       'FG3A', 'FG3_PCT', 'FTM', 'FTA', 'FT_PCT', 'OREB', 'DREB', 'REB', 'AST',
       'TOV', 'STL', 'BLK', 'BLKA', 'PF', 'PFD', 'PTS', 'PLUS_MINUS',
       'GP_RANK', 'W_RANK', 'L_RANK', 'W_PCT_RANK', 'MIN_RANK', 'FGM_RANK',
       'FGA_RANK', 'FG_PCT_RANK', 'FG3M_RANK', 'FG3A_RANK', 'FG3_PCT_RANK',
       'FTM_RANK', 'FTA_RANK', 'FT_PCT_RANK', 'OREB_RANK', 'DREB_RANK',
       'REB_RANK', 'AST_RANK', 'TOV_RANK', 'STL_RANK', 'BLK_RANK', 'BLKA_RANK',
       'PF_RANK', 'PFD_RANK', 'PTS_RANK', 'PLUS_MINUS_RANK'],
      dtype='object')
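
Given the `WL` and `GAME_DATE` columns listed above, a team's win percentage can be derived directly from the game log. The sketch below uses a made-up four-game log; the helpers `win_percentage` and `running_win_percentage` are hypothetical names, and the code assumes `WL` holds `'W'` or `'L'` and `GAME_DATE` is an ISO-formatted string, as in the printed output.

```python
import pandas as pd

def win_percentage(team_logs: pd.DataFrame) -> float:
    """Fraction of games won, based on the WL column ('W' or 'L')."""
    return float((team_logs["WL"] == "W").mean())

def running_win_percentage(team_logs: pd.DataFrame) -> pd.Series:
    """Win percentage after each game, in chronological order."""
    # ISO date strings sort chronologically as plain text.
    ordered = team_logs.sort_values("GAME_DATE")
    wins = (ordered["WL"] == "W").astype(int)
    return wins.expanding().mean()

# Made-up game log, deliberately not in date order.
sample_logs = pd.DataFrame({
    "GAME_DATE": [
        "2020-01-07T00:00:00",
        "2020-01-03T00:00:00",
        "2020-01-05T00:00:00",
        "2020-01-01T00:00:00",
    ],
    "WL": ["W", "W", "L", "W"],
})

overall = win_percentage(sample_logs)
running = running_win_percentage(sample_logs)
print(overall)
print(running)
```

The running series keeps the original row index, so it can be assigned back to the game log as a new column if needed.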
