# ESPN Quantitative analysis of NBA player performance

### Group members:
- Jay Kalathur
- Manoch Boonnatakul
- Kaelynn Lackey
- Aaron Huang

## Project introduction & Why it's interesting
Our project aims to analyze NBA player statistics from **2019 to 2024**. By examining in-game performance metrics such as field goal percentage, free throw percentage and minutes played, as well as external variables such as demographics, team and position, we aim to identify the top-performing players in the sport, determine the best player of each year and predict each team's winning percentage.
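One simple way to "find the best player of each year" is to take the player with the highest ESPN rating in each season. The sketch below assumes the seasons have already been stacked into one long table with a `SEASON` column; that layout and the helper name `best_player_per_season` are hypothetical, not the project's confirmed method.

```python
import pandas as pd


def best_player_per_season(df: pd.DataFrame) -> pd.DataFrame:
    """Return the highest-rated player (by ESPN rating) in each season.

    Assumes (hypothetically) a long-format table with SEASON, PLAYER and
    ESPN columns, i.e. several seasons of leader data stacked together.
    """
    # Fresh 0..n-1 index so each idxmax label points at exactly one row
    ratings = df.reset_index(drop=True)
    # Scraped ratings may be stored as text, so convert them to numbers
    ratings["ESPN"] = pd.to_numeric(ratings["ESPN"], errors="coerce")
    # For each season, idxmax returns the row label of the top rating
    # (the first such row if two players are tied)
    top_rows = ratings.groupby("SEASON")["ESPN"].idxmax()
    return ratings.loc[top_rows, ["SEASON", "PLAYER", "ESPN"]].reset_index(drop=True)
```

Ranking by the ESPN column alone keeps the definition of "best" transparent; a weighted combination of stats would be an equally valid, but different, choice.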

By analyzing both historical and current data, we can establish a reliable benchmark for judging whether a player is under-performing or over-performing, a factor that can inform our predictions of future winning teams.
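As one possible benchmark, a player's stat can be compared with the mean and standard deviation of the same stat across a historical pool of players (a z-score). The sketch below is an assumption about how such a benchmark could work, not the project's stated method; the helper name `flag_performance` and the default threshold of one standard deviation are hypothetical choices.

```python
import pandas as pd


def flag_performance(df: pd.DataFrame, stat: str, threshold: float = 1.0) -> pd.DataFrame:
    """Label each row as 'over', 'under' or 'typical' on one stat.

    The benchmark is the mean and (sample) standard deviation of `stat`
    over all rows passed in, e.g. several historical seasons combined.
    """
    values = pd.to_numeric(df[stat], errors="coerce")
    # z-score: how many standard deviations each value lies from the benchmark mean
    z = (values - values.mean()) / values.std()
    label = pd.Series("typical", index=df.index)
    label[z > threshold] = "over"    # well above the historical benchmark
    label[z < -threshold] = "under"  # well below the historical benchmark
    return df.assign(**{f"{stat}_z": z, f"{stat}_label": label})
```

Rows with a missing value get a NaN z-score and stay labeled "typical", since both comparisons are false for NaN.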

Furthermore, sports betting has surged in popularity through online platforms, a 2018 U.S. Supreme Court decision allowed states to legalize this 200 billion dollar industry, and the range of bet types, such as in-game wagering, straight bets and parlays, continues to evolve. This project may therefore surface useful findings from one of the most accessible online data sources, ESPN, that can enrich sports betting performance portfolios.

## Reference papers:
1. [Sports betting around the world: A systematic review (2022)](https://akjournals.com/view/journals/2006/11/3/article-p689.xml)
2. [Impact of Sports Gambling Legality on U.S. States’ Real GDP per Capita (2023)](https://ideaexchange.uakron.edu/cgi/viewcontent.cgi?article=3236&context=honors_research_projects) 
3. [Exploring Game Performance in the National Basketball Association Using Player Tracking Data (2015)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4501835/) 

## Data Source and Introduction

### Source:
https://www.espn.com/nba/seasonleaders/_/league/nba 

### Data Introduction:
We sourced our data from ESPN's NBA season leader ratings across various statistical categories. For each season leader, the data provides basic information such as name (PLAYER), ranking (RK), team (TEAM), games played (GP) and average minutes on the court per game (MPG). It includes shooting metrics such as field goal percentage (FG%), free throw percentage (FT%) and three-pointers made per game (3PM), along with per-game averages for rebounds (RPG), assists (APG), steals (STPG), blocks (BLKPG) and turnovers (TOPG). Each player's points per game (PTS) and final ESPN rating for the season (ESPN) are also included. We analyzed ten features of the dataset across five seasons, giving a total of eight hundred observations.
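Analyzing several seasons together requires stacking the per-season tables into one long table while remembering which season each row came from. The sketch below assumes the per-season DataFrames are kept in a dict keyed by season; the helper name `combine_seasons` and the `SEASON` column are hypothetical.

```python
import pandas as pd


def combine_seasons(season_frames: dict) -> pd.DataFrame:
    """Stack per-season leader tables into one table with a SEASON column.

    `season_frames` maps a season label (e.g. 2019) to that season's
    DataFrame; this dict layout is an assumption made for the sketch.
    """
    # Tag every row with the season it came from
    tagged = [frame.assign(SEASON=season) for season, frame in season_frames.items()]
    # ignore_index=True gives the combined table a fresh 0..n-1 index
    return pd.concat(tagged, ignore_index=True)
```

The total number of observations is then simply `len(combine_seasons(...))`, i.e. the sum of the rows in each season's table.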

## Group member duties:

- **Jay Kalathur**: Jay's role is *Project Coordinator and Data Analyst*. Jay is responsible for organizing meetings, tracking project timelines and facilitating effective communication within the group. Jay is also responsible for producing high-quality data analysis and ensuring the reliability and accuracy of our findings.

- **Manoch Boonnatakul**: Manoch's role is *Data Analyst and Testing Analyst*, responsible for analyzing the data and recording significant findings from data processing in Python. Manoch is also responsible for quality assurance through algorithm testing and frequent data accuracy checks.

- **Kaelynn Lackey**: Kaelynn's role is *Writer and Report Specialist*. Kaelynn is responsible for writing reports, documenting the project analysis and creating data visualizations throughout the project timeline. Kaelynn's goal is to ensure appealing data visualizations and thorough, step-by-step documentation of the project.

- **Aaron Huang**: Aaron's role is *Data Collector and Algorithm Engineer*, responsible for extracting data from the web source. Aaron is responsible for cleaning and formatting the data so that it is well prepared for efficient analysis in Python.


```python
import pandas as pd

def retrieve_year_team_data(year="", team="league"):
    """
    Arguments:
        year (int or str): Season year used in the ESPN URL, e.g. 2019. Default: "" (no year in the URL)
        team (str): Team data is desired for. Use the 3-letter abbreviation each NBA team uses (in all lowercase), or "league" for all teams.

    Result:
        result (df): Raw season-leaders table with the header row promoted to column names
    """
    #Add year and team to the website URL
    url = f"https://www.espn.com/nba/seasonleaders/_/team/{team}/year/{year}"

    #read_html returns a list of tables; take the first one
    result = pd.read_html(url)[0]

    #Drop the first row, which sits above the header row
    result = result.drop(index=0)

    #Make the (new) first row the column names
    result.columns = result.iloc[0]

    return result

#Check
data = retrieve_year_team_data(2019, 'league')
```
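Since `retrieve_year_team_data` fetches one season per call, collecting several seasons means looping over years. The sketch below takes the fetch function as a parameter so the loop can be tried without network access; the helper name `fetch_seasons` and the one-second pause between requests are hypothetical choices, not part of the original notebook.

```python
import time
from typing import Callable, Dict, Iterable

import pandas as pd


def fetch_seasons(
    fetch: Callable[[int, str], pd.DataFrame],
    years: Iterable[int],
    team: str = "league",
    pause: float = 1.0,
) -> Dict[int, pd.DataFrame]:
    """Call fetch(year, team) once per season and collect the tables by year."""
    frames = {}
    for i, year in enumerate(years):
        if i > 0:
            # Wait between consecutive requests to avoid hammering the site
            time.sleep(pause)
        frames[year] = fetch(year, team)
    return frames
```

In the notebook this could be called as, for example, `fetch_seasons(retrieve_year_team_data, [2019, 2020])`; the resulting dict can then be stacked into a single table.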

```python
#Check
print(data.columns)
data.head()
```

```
Index(['RK', 'PLAYER', 'TEAM', 'GP', 'MPG', 'FG%', 'FT%', '3PM', 'RPG', 'APG',
       'STPG', 'BLKPG', 'TOPG', 'PTS', 'ESPN'],
      dtype='object', name=1)
```


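|   | RK | PLAYER | TEAM | GP | MPG | FG% | FT% | 3PM | RPG | APG | STPG | BLKPG | TOPG | PTS | ESPN |
|---|----|--------|------|----|-----|-----|-----|-----|-----|-----|------|-------|------|-----|------|
| 1 | RK | PLAYER | TEAM | GP | MPG | FG% | FT% | 3PM | RPG | APG | STPG | BLKPG | TOPG | PTS | ESPN |
| 2 | 1 | James Harden, SG | LAC | 78 | 36.8 | .442 | .879 | 4.8 | 6.6 | 7.5 | 2.0 | 0.7 | 5.0 | 36.1 | 56.5 |
| 3 | 2 | Giannis Antetokounmpo, PF | MIL | 72 | 32.8 | .578 | .729 | 0.7 | 12.5 | 5.9 | 1.3 | 1.5 | 3.7 | 27.7 | 53.4 |
| 4 | 3 | Anthony Davis, PF | LAL | 56 | 33.0 | .517 | .794 | 0.9 | 12.0 | 3.9 | 1.6 | 2.4 | 2.0 | 25.9 | 50.0 |
| 5 | 4 | LeBron James, SF | LAL | 55 | 35.2 | .510 | .665 | 2.0 | 8.5 | 8.3 | 1.3 | 0.6 | 3.6 | 27.4 | 49.6 |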
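The output above shows two side effects of `result.columns = result.iloc[0]`: the column index is named `1` (the label of the row it came from, since pandas keeps a Series' name when it becomes an Index), and the header row itself stays in the data as row 1. The hypothetical helper below is one way to promote a header row while dropping it and clearing that name; ESPN's repeated header rows further down the table would still need filtering afterwards.

```python
import pandas as pd


def promote_header(raw: pd.DataFrame, header_pos: int = 0) -> pd.DataFrame:
    """Use the row at position `header_pos` as column names and drop it.

    Assumes the header row's position is known (position 0 after the
    title row has been dropped, as in retrieve_year_team_data).
    """
    header = raw.iloc[header_pos]
    # Keep only the rows below the header row
    table = raw.iloc[header_pos + 1:].copy()
    table.columns = header
    # The new column Index inherits the row label as its name; clear it
    table.columns.name = None
    return table
```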


```python
def data_clean(dataframe):
    """
    Arguments:
        dataframe (df): Unclean dataframe

    Return:
        dataframe (df): Clean dataframe with accurate ranking and new index
    """
    #Choose only the player data (drop the repeated header rows); copy to avoid modifying a slice
    dataframe = dataframe[dataframe['RK'] != 'RK'].copy()

    #Reset the index
    dataframe = dataframe.reset_index(drop=True)

    #Renumber the rankings 1..n in table order to eliminate NaN ranks
    dataframe['RK'] = range(1, len(dataframe) + 1)

    #Set the index to ranking
    dataframe = dataframe.set_index('RK')

    return dataframe
```

```python
#Check
clean_data = data_clean(data)
clean_data
```


| RK | PLAYER | TEAM | GP | MPG | FG% | FT% | 3PM | RPG | APG | STPG | BLKPG | TOPG | PTS | ESPN |
|----|--------|------|----|-----|-----|-----|-----|-----|-----|------|-------|------|-----|------|
| 1 | James Harden, SG | LAC | 78 | 36.8 | 0.442 | 0.879 | 4.8 | 6.6 | 7.5 | 2.0 | 0.7 | 5.0 | 36.1 | 56.5 |
| 2 | Giannis Antetokounmpo, PF | MIL | 72 | 32.8 | 0.578 | 0.729 | 0.7 | 12.5 | 5.9 | 1.3 | 1.5 | 3.7 | 27.7 | 53.4 |
| 3 | Anthony Davis, PF | LAL | 56 | 33.0 | 0.517 | 0.794 | 0.9 | 12.0 | 3.9 | 1.6 | 2.4 | 2.0 | 25.9 | 50.0 |
| 4 | LeBron James, SF | LAL | 55 | 35.2 | 0.51 | 0.665 | 2.0 | 8.5 | 8.3 | 1.3 | 0.6 | 3.6 | 27.4 | 49.6 |
| 5 | Joel Embiid, C | PHI | 64 | 33.7 | 0.484 | 0.804 | 1.2 | 13.6 | 3.7 | 0.7 | 1.9 | 3.5 | 27.5 | 49.6 |
| 6 | Russell Westbrook, PG | LAC | 73 | 36.0 | 0.428 | 0.656 | 1.6 | 11.1 | 10.7 | 1.9 | 0.5 | 4.5 | 22.9 | 48.0 |
| 7 | Karl-Anthony Towns, C | MIN | 77 | 33.1 | 0.518 | 0.836 | 1.8 | 12.4 | 3.4 | 0.9 | 1.6 | 3.1 | 24.4 | 46.1 |
| 8 | Paul George, F | LAC | 77 | 36.9 | 0.438 | 0.839 | 3.8 | 8.2 | 4.1 | 2.2 | 0.4 | 2.7 | 28.0 | 45.2 |
| 9 | Kevin Durant, PF | PHO | 78 | 34.6 | 0.521 | 0.885 | 1.8 | 6.4 | 5.9 | 0.7 | 1.1 | 2.9 | 26.0 | 45.0 |
| 10 | Stephen Curry, PG | GSW | 69 | 33.8 | 0.472 | 0.916 | 5.1 | 5.3 | 5.2 | 1.3 | 0.4 | 2.8 | 27.3 | 44.0 |
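Two follow-ups may help before analysis. First, because header rows were mixed into the scraped table, the stat columns are likely stored as text rather than numbers. Second, `data_clean` renumbers ranks 1..n in table order, so LeBron James and Joel Embiid, both rated 49.6 above, end up ranked 4 and 5. The hypothetical helpers below convert the stat columns with `pd.to_numeric` and compute a tie-aware alternative rank in which equal ESPN ratings share a rank.

```python
import pandas as pd

# Stat columns from the ESPN table (every column except PLAYER and TEAM)
STAT_COLUMNS = ["GP", "MPG", "FG%", "FT%", "3PM", "RPG", "APG",
                "STPG", "BLKPG", "TOPG", "PTS", "ESPN"]


def to_numeric_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with the stat columns converted to numbers (bad values become NaN)."""
    out = df.copy()
    present = [col for col in STAT_COLUMNS if col in out.columns]
    out[present] = out[present].apply(pd.to_numeric, errors="coerce")
    return out


def tie_aware_rank(df: pd.DataFrame) -> pd.Series:
    """Rank by ESPN rating (highest = 1); tied ratings share the best rank of their group."""
    ranks = df["ESPN"].rank(ascending=False, method="min")
    # Nullable integer dtype keeps missing ratings as <NA> instead of failing
    return ranks.astype("Int64")
```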


```python
def pos_df(dataframe):
    """
    Arguments:
        dataframe (df): Cleaned dataframe whose PLAYER column holds "Name, POS"

    Return:
        dataframe (df): Copy of the dataframe with separate PLAYER and POSITION columns
    """
    #Work on a copy so the input dataframe is left unchanged
    dataframe = dataframe.copy()

    #Split each "Name, POS" entry at the last ", " into name and position
    split = dataframe['PLAYER'].str.strip().str.rsplit(', ', n=1, expand=True)

    #Update the values of PLAYER and add the POSITION column
    dataframe['PLAYER'] = split[0]
    dataframe['POSITION'] = split[1]

    return dataframe
```

```python
#Check
final_data = pos_df(clean_data)
final_data
```

| RK | PLAYER | TEAM | GP | MPG | FG% | FT% | 3PM | RPG | APG | STPG | BLKPG | TOPG | PTS | ESPN | POSITION |
|----|--------|------|----|-----|-----|-----|-----|-----|-----|------|-------|------|-----|------|----------|
| 1 | James Harden | LAC | 78 | 36.8 | 0.442 | 0.879 | 4.8 | 6.6 | 7.5 | 2.0 | 0.7 | 5.0 | 36.1 | 56.5 | SG |
| 2 | Giannis Antetokounmpo | MIL | 72 | 32.8 | 0.578 | 0.729 | 0.7 | 12.5 | 5.9 | 1.3 | 1.5 | 3.7 | 27.7 | 53.4 | PF |
| 3 | Anthony Davis | LAL | 56 | 33.0 | 0.517 | 0.794 | 0.9 | 12.0 | 3.9 | 1.6 | 2.4 | 2.0 | 25.9 | 50.0 | PF |
| 4 | LeBron James | LAL | 55 | 35.2 | 0.51 | 0.665 | 2.0 | 8.5 | 8.3 | 1.3 | 0.6 | 3.6 | 27.4 | 49.6 | SF |
| 5 | Joel Embiid | PHI | 64 | 33.7 | 0.484 | 0.804 | 1.2 | 13.6 | 3.7 | 0.7 | 1.9 | 3.5 | 27.5 | 49.6 | C |
| 6 | Russell Westbrook | LAC | 73 | 36.0 | 0.428 | 0.656 | 1.6 | 11.1 | 10.7 | 1.9 | 0.5 | 4.5 | 22.9 | 48.0 | PG |
| 7 | Karl-Anthony Towns | MIN | 77 | 33.1 | 0.518 | 0.836 | 1.8 | 12.4 | 3.4 | 0.9 | 1.6 | 3.1 | 24.4 | 46.1 | C |
| 8 | Paul George | LAC | 77 | 36.9 | 0.438 | 0.839 | 3.8 | 8.2 | 4.1 | 2.2 | 0.4 | 2.7 | 28.0 | 45.2 | F |
| 9 | Kevin Durant | PHO | 78 | 34.6 | 0.521 | 0.885 | 1.8 | 6.4 | 5.9 | 0.7 | 1.1 | 2.9 | 26.0 | 45.0 | PF |
| 10 | Stephen Curry | GSW | 69 | 33.8 | 0.472 | 0.916 | 5.1 | 5.3 | 5.2 | 1.3 | 0.4 | 2.8 | 27.3 | 44.0 | PG |
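With position in its own column, one natural next step toward analyzing position as an external variable is to average the stats per position. The hypothetical helper below does this with `groupby`; the default choice of stats is an assumption. Note that ESPN uses a generic "F" label (Paul George above) alongside SF and PF, so it forms its own group unless mapped to one of them first.

```python
import pandas as pd


def position_summary(df: pd.DataFrame, stats=("PTS", "RPG", "APG", "ESPN")) -> pd.DataFrame:
    """Average the chosen stats for each POSITION and count the players in each."""
    stats = list(stats)
    # Convert the stat columns to numbers in case they were scraped as text
    numeric = df.assign(**{s: pd.to_numeric(df[s], errors="coerce") for s in stats})
    grouped = numeric.groupby("POSITION")
    summary = grouped[stats].mean()
    # size() counts rows per position; it aligns on the POSITION index
    summary["PLAYERS"] = grouped.size()
    return summary
```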
