    The data in the boxscores and Anthropometrics files both go through an initial cleaning using MS excel, then uploaded into the mySQL server using MySQL workbench. For the Anthropometrics file, the data was based off of the NBA draft combine information for the years 2000 to 2023. For those who did not attend the combine, there wasn't information. Missing information for height and weight was collected from the internet and imputed using MS excel.

In [1]:
import numpy as np
import scipy
from scipy import stats
import sklearn
import pandas as pd
import seaborn as sns

In [2]:
box_scores = pd.read_csv('NBA-BoxScores-2023-2024.csv')
#sorting the box scores by game_ID, team_ID and minutes played by player
box_scores_sorted = box_scores.sort_values(['GAME_ID','TEAM_ID','MIN'])
box_scores_sorted = box_scores_sorted.drop(['NICKNAME'],axis=1)

In [3]:
#this cell transforms the players registerred with 2 positions to just their first position
anthropometrics = pd.read_csv('NBA_Anthropometric.csv')
anthropometrics['position'] = anthropometrics['position'].str.split('/').str[0]

Function to find the potential matchups for a given player
psuedocode:
box_scores and anthropometrics are the pandas dataframes.
-given a player in box scores 
-find their position(guard or forward classification) in anthropometrics
-in the opposition lineup find players with same position classification.
-since the players are sorted by minutes played for the respective game and team in box_scores the opponents of interest for a player are ranked as follows: the player whose rank in minutes played is equal to player of interest is ranked first opponent of interest, ranked second would be the one with one rank higher minutes and third ranked would be the one with one rank lower minutes played.
-Incase the player is the highest ranked by minutes consider the two players with ranks below equivalent opponent to be 2 and 3 in matchup ranks.
-and if lowest minutes played consider 2nd last and 3rd last for opponents

In [4]:
player_position = anthropometrics.loc[anthropometrics['player_name'] == 'Jonas Valanciunas', 'position'].values
anthropometrics[anthropometrics['player_name'] == 'Jonas Valanciunas']

Unnamed: 0,player_name,position,height,height_with_shoes,weight,wingspan,draft_year


In [5]:
def Opponents_of_interest(playerName, gameID, box_scores, anthropometrics):
    # Find the player's position in the anthropometrics DataFrame
    player_position = anthropometrics.loc[anthropometrics['player_name'] == playerName,
                                          'position'].values[0]

    # Find the game and team for the given player in the box_scores DataFrame
    team = box_scores[(box_scores['PLAYER_NAME'] == playerName) & (box_scores['GAME_ID'] == gameID)
                     ]['TEAM_ABBREVIATION'].values[0]
    #the minutes a player played in the given game
    playerMinutes = box_scores[(box_scores['PLAYER_NAME'] == playerName) & (box_scores['GAME_ID'] == gameID)
                     ]['MIN'].values[0]

    # Get the opposition lineup with the same position classification
    """If they have the same gameID in box_scores dataset and they are on oppo and their position 
    is in the list associated with the player of interest's in the PositionMatchup dictionary then 
    they qualify to be in the players matchups then they are ranked by minutes played"""
    
    #dictionary with the NBA positions and their corresponding likely matchups ranked in likelihood
    PositionMatchup = {'PG': ['PG','SG','SF'], 'SG':['SG','SF','PG'], 'SF':['SF','PF','SG'], 
                   'PF':['PF','C','SF'], 'C':['C','PF','SF']}
    #df of all the players of the opponent team
    oppositionMatchups = box_scores[(box_scores['GAME_ID'] == gameID) & (
        box_scores['TEAM_ABBREVIATION'] != team)]
    #the absolute difference between minutes played by players in the opponent team and the player of interest sorted from
    #least minute difference to most
    oppositionMatchups['minutesDifference'] = abs(oppositionMatchups['MIN'] - playerMinutes)
    oppositionMatchups.sort_values(by='minutesDifference')
    #list of opposition players who meet the position matchup criteria ranked from least minutes difference to most with
    #player of interest
    opps = []
    for opp in oppositionMatchups['PLAYER_NAME']:
        player_position_data = anthropometrics.loc[anthropometrics['player_name'
                                                                  ] == opp, 'position'].values
        if player_position_data.size > 0 and player_position_data[0] in PositionMatchup[
            player_position]:
            opps.append((opp))
    return opps

In [20]:
opps = Opponents_of_interest('Anthony Davis', 22301230, box_scores, anthropometrics)
oppsWeighted = [[string, len(opps) - i] for i, string in enumerate(opps)]
print(opps)

['Zion Williamson', 'Herbert Jones', 'Naji Marshall', 'Trey Murphy III', 'Cody Zeller', 'Jeremiah Robinson-Earl']


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  oppositionMatchups['minutesDifference'] = abs(oppositionMatchups['MIN'] - playerMinutes)


In [22]:
#function to retrieve opposition players anthropometrics and stats of interest
print(anthropometrics[anthropometrics['player_name'].isin(opps)].drop(['position','height_with_shoes','draft_year'],axis=1))

                 player_name  height  weight  wingspan
624          Trey Murphy III  201.93   93.44    213.36
633            Herbert Jones  198.12   93.62    214.00
1170             Cody Zeller  210.18  104.33    210.18
1229           Naji Marshall  197.48  105.91    215.26
1383  Jeremiah Robinson-Earl  202.56  109.95    207.64
1467         Zion Williamson  198.12  128.80    209.55
