# NBA-Data-2010-2024 🏀

## Schema for Box_scores

### Dimensions
- **season_year**: The year of the basketball season.
- **game_date**: The date of the game.
- **gameId**: Unique identifier for the game.
- **teamId**: Unique identifier for the team.
- **teamCity**: The city where the team is based.
- **teamName**: The name of the team.
- **teamTricode**: A three-letter code representing the team.
- **teamSlug**: A URL-friendly slug for the team name.
- **personId**: Unique identifier for the person (player).
- **personName**: The name of the person (player).
- **position**: The position of the player.
- **comment**: Any additional comments or notes.
- **jerseyNum**: The jersey number of the player.

### Metrics
- **minutes**: The number of minutes played by the player.
- **fieldGoalsMade**: The number of field goals made by the player.
- **fieldGoalsAttempted**: The number of field goals attempted by the player.
- **fieldGoalsPercentage**: The shooting percentage for field goals.
- **threePointersMade**: The number of three-pointers made by the player.
- **threePointersAttempted**: The number of three-pointers attempted by the player.
- **threePointersPercentage**: The shooting percentage for three-pointers.
- **freeThrowsMade**: The number of free throws made by the player.
- **freeThrowsAttempted**: The number of free throws attempted by the player.
- **freeThrowsPercentage**: The shooting percentage for free throws.
- **reboundsOffensive**: The number of offensive rebounds by the player.
- **reboundsDefensive**: The number of defensive rebounds by the player.
- **reboundsTotal**: The total number of rebounds by the player.
- **assists**: The number of assists by the player.
- **steals**: The number of steals by the player.
- **blocks**: The number of blocks by the player.
- **turnovers**: The number of turnovers by the player.
- **foulsPersonal**: The number of personal fouls committed by the player.
- **points**: The total number of points scored by the player.
- **plusMinusPoints**: The plus-minus statistic for the player, indicating the team's score differential when the player is on the court.

## Schema for game totals

### Dimensions
- **SEASON_YEAR**: The year of the NBA season.
- **TEAM_ID**: Unique identifier for the team.
- **TEAM_ABBREVIATION**: Abbreviated name of the team.
- **TEAM_NAME**: Full name of the team.
- **GAME_ID**: Unique identifier for the game.
- **GAME_DATE**: Date of the game.
- **MATCHUP**: Matchup details indicating the teams involved.
- **WL**: Outcome of the game (Win or Loss).

### Metrics
- **MIN**: Total minutes played in the game.
- **FGM**: Field goals made.
- **FGA**: Field goals attempted.
- **FG_PCT**: Field goal percentage.
- **FG3M**: Three-point field goals made.
- **FG3A**: Three-point field goals attempted.
- **FG3_PCT**: Three-point field goal percentage.
- **FTM**: Free throws made.
- **FTA**: Free throws attempted.
- **FT_PCT**: Free throw percentage.
- **OREB**: Offensive rebounds.
- **DREB**: Defensive rebounds.
- **REB**: Total rebounds.
- **AST**: Assists.
- **TOV**: Turnovers.
- **STL**: Steals.
- **BLK**: Blocks.
- **BLKA**: Shot attempts blocked by the opponent.
- **PF**: Personal fouls.
- **PFD**: Personal fouls drawn.
- **PTS**: Total points scored.
- **PLUS_MINUS**: Plus-minus statistic.
- **GP_RANK**: Rank based on games played.
- **W_RANK**: Rank based on wins.
- **L_RANK**: Rank based on losses.
- **W_PCT_RANK**: Rank based on win percentage.
- **MIN_RANK**: Rank based on minutes played.
- **\*_RANK**: Ranks for the other statistical categories (field goals made, rebounds, assists, etc.), indicated by the `_RANK` suffix.
- **AVAILABLE_FLAG**: Indicates if the data for this row is available.
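
The two tables use different naming conventions for what look like the same keys (`gameId`/`teamId` in box scores, `GAME_ID`/`TEAM_ID` in game totals). The sketch below assumes those keys hold the same identifiers in both files, which the schema does not confirm, and uses made-up rows rather than the real CSVs. It sums player points per team and game and compares the result with the team total:

```python
import pandas as pd

# Made-up rows for illustration only; the real files are much larger.
box_scores = pd.DataFrame({
    "gameId": [1, 1, 1, 1],
    "teamId": [10, 10, 20, 20],
    "personName": ["Player A", "Player B", "Player C", "Player D"],
    "points": [30, 20, 25, 15],
})
game_totals = pd.DataFrame({
    "GAME_ID": [1, 1],
    "TEAM_ID": [10, 20],
    "PTS": [50, 40],
})

# Sum player points per (game, team) and rename to the totals-table keys.
team_points = (
    box_scores.groupby(["gameId", "teamId"], as_index=False)["points"].sum()
    .rename(columns={"gameId": "GAME_ID", "teamId": "TEAM_ID", "points": "PLAYER_PTS"})
)

# Join on the shared keys and flag rows where the two sources disagree.
check = game_totals.merge(team_points, on=["GAME_ID", "TEAM_ID"], how="left")
check["MATCHES"] = check["PTS"] == check["PLAYER_PTS"]
print(check)
```

Rows where `MATCHES` is false would point to a key mismatch or missing player rows.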

## Authors

- [@NocturneBear](https://github.com/NocturneBear)

## License

[MIT](https://github.com/NocturneBear/NBA-Data-2010-2024/blob/main/LICENSE)

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import re

In [2]:
playoff_games_total=pd.read_csv("./datasets/NBA_DATA_2010_2024/play_off_totals_2010_2024.csv",delimiter=',',header=0)
regular_games_total=pd.read_csv("./datasets/NBA_DATA_2010_2024/regular_season_totals_2010_2024.csv",delimiter=',',header=0)


## Organize Data
The main objective now is to connect the data: we want to get all the players from a team. We want to be able to search by team tag and season and get the data associated with that team's players.
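
A minimal sketch of that lookup, using the box-score columns from the schema (`season_year`, `teamTricode`, `personId`, `personName`). The helper name `get_roster` is hypothetical, the rows are made up, and the season label format (`'2018-19'`) is assumed to match the one used in the game totals:

```python
import pandas as pd

def get_roster(box_scores, team_tag, season):
    """Return one row per player who appeared for `team_tag` in `season`."""
    mask = (box_scores["teamTricode"] == team_tag) & (box_scores["season_year"] == season)
    return (
        box_scores.loc[mask, ["personId", "personName"]]
        .drop_duplicates()          # a player has one row per game played
        .sort_values("personName")
        .reset_index(drop=True)
    )

# Made-up rows for illustration only.
box_scores = pd.DataFrame({
    "season_year": ["2018-19", "2018-19", "2018-19", "2019-20"],
    "teamTricode": ["BOS", "BOS", "GSW", "BOS"],
    "personId": [1, 1, 2, 3],
    "personName": ["Player A", "Player A", "Player B", "Player C"],
})
roster = get_roster(box_scores, "BOS", "2018-19")
print(roster)
```

Selecting more columns in the `.loc` call (for example `points` or `minutes`) and dropping the `drop_duplicates` step would return the per-game player data instead of a roster.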


In [None]:
def getWinPercentageByTeamBySeason(playoffs,team_tag,season):
    """
    Function to get the games and the win percentage of a team in a season
    :param playoffs: game totals DataFrame (playoffs or regular season)
    :param team_tag: team abbreviation, e.g. 'BOS'
    :param season: season label, e.g. '2018-19'
    :return: tuple (games of the team in the season, win fraction between 0 and 1)
    """
    teams=playoffs.filter(items=['SEASON_YEAR','TEAM_ABBREVIATION','MATCHUP','TEAM_ID','WL','FGA','FGM'])
    # Keep only the requested team and season
    teams=teams[(teams['TEAM_ABBREVIATION']==team_tag)&(teams['SEASON_YEAR']==season)]
    totalGames=teams.shape[0]
    if totalGames==0:
        # Always return a tuple so callers can unpack the result
        return teams,0
    totalWins=teams[teams['WL']=='W'].shape[0]
    winPercentage=totalWins/totalGames
    return teams,winPercentage
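
When the win percentage is needed for many teams at once, calling the function above in a loop is one option. Another is a single `groupby`, sketched below. The helper name `win_percentage_table` is hypothetical and the rows are made up; only the `SEASON_YEAR`, `TEAM_ABBREVIATION` and `WL` columns from the schema are assumed:

```python
import pandas as pd

def win_percentage_table(games):
    """Win fraction for every (season, team) pair in a game-totals frame."""
    return (
        games.assign(IS_WIN=games["WL"].eq("W"))   # True for wins, False otherwise
        .groupby(["SEASON_YEAR", "TEAM_ABBREVIATION"])["IS_WIN"]
        .mean()                                     # mean of booleans = win fraction
        .rename("WIN_PCT")
        .reset_index()
    )

# Made-up rows for illustration only.
games = pd.DataFrame({
    "SEASON_YEAR": ["2018-19"] * 4,
    "TEAM_ABBREVIATION": ["BOS", "BOS", "BOS", "GSW"],
    "WL": ["W", "L", "W", "L"],
})
table = win_percentage_table(games)
print(table)
```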

In [None]:
def getMatchupByTeamBySeason(playoffs,team_tag_home,team_tag_visitor,season=False):
    """
    Function to get the games played between two teams
    :param playoffs: game totals DataFrame (playoffs or regular season)
    :param team_tag_home: abbreviation of the first team, e.g. 'BOS'
    :param team_tag_visitor: abbreviation of the second team, e.g. 'GSW'
    :optional param season: season to filter the data by season
    :return: rows of both teams for their games against each other (empty DataFrame if none)
    """
    teams=playoffs.filter(items=['SEASON_YEAR','TEAM_ABBREVIATION','MATCHUP','TEAM_ID','WL','FGA','FGM'])
    if season is not False:
        # .copy() so the new column below is not written to a slice
        teams=teams[teams['SEASON_YEAR']==season].copy()
    matchup_tag=team_tag_home+" vs. "+team_tag_visitor
    matchup_tag_visitor=team_tag_visitor+" vs. "+team_tag_home
    # Away games are written "AAA @ BBB"; normalise them to "AAA vs. BBB"
    teams['MATCHUP_STANDARD']=teams['MATCHUP'].str.replace("@","vs.",regex=False)
    # Rows of the first team come first, then rows of the second team
    teams=pd.concat([teams[teams['MATCHUP_STANDARD']==matchup_tag],teams[teams['MATCHUP_STANDARD']==matchup_tag_visitor]],ignore_index=True)
    return teams.filter(items=['SEASON_YEAR','TEAM_ABBREVIATION','MATCHUP_STANDARD','TEAM_ID','WL','FGA','FGM'])
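
Replacing `@` with `vs.` discards which side played at home. If that information matters, it can be kept as separate columns instead. The sketch below assumes `MATCHUP` takes the forms `"AAA vs. BBB"` (home) and `"AAA @ BBB"` (away), as the replacement in the function above suggests. The helper `add_home_flag` and the `IS_HOME`/`OPPONENT` columns are hypothetical names, and the rows are made up:

```python
import pandas as pd

def add_home_flag(games):
    """Return a copy with IS_HOME and OPPONENT columns derived from MATCHUP."""
    out = games.copy()
    # "AAA vs. BBB" marks a home game for AAA; "AAA @ BBB" an away game.
    out["IS_HOME"] = out["MATCHUP"].str.contains(" vs. ", regex=False)
    # The opponent abbreviation is the last whitespace-separated token.
    out["OPPONENT"] = out["MATCHUP"].str.split().str[-1]
    return out

# Made-up rows for illustration only.
games = pd.DataFrame({
    "TEAM_ABBREVIATION": ["BOS", "BOS"],
    "MATCHUP": ["BOS vs. GSW", "BOS @ GSW"],
})
flagged = add_home_flag(games)
print(flagged)
```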

In [37]:
tp,w=getWinPercentageByTeamBySeason(regular_games_total,'BOS','2018-19')
match=getMatchupByTeamBySeason(regular_games_total,'BOS','GSW')
print("Matchup:")
print(match)

Matchup:
   SEASON_YEAR TEAM_ABBREVIATION MATCHUP_STANDARD     TEAM_ID WL  FGA  FGM
0      2023-24               BOS      BOS vs. GSW  1610612738  W   96   53
1      2023-24               BOS      BOS vs. GSW  1610612738  L  114   47
2      2015-16               BOS      BOS vs. GSW  1610612738  L  114   49
3      2018-19               BOS      BOS vs. GSW  1610612738  L   99   41
4      2017-18               BOS      BOS vs. GSW  1610612738  L   85   37
5      2019-20               BOS      BOS vs. GSW  1610612738  W   88   42
6      2018-19               BOS      BOS vs. GSW  1610612738  W   96   49
7      2021-22               BOS      BOS vs. GSW  1610612738  W   81   38
8      2021-22               BOS      BOS vs. GSW  1610612738  L   82   36
9      2014-15               BOS      BOS vs. GSW  1610612738  L   95   41
10     2020-21               BOS      BOS vs. GSW  1610612738  W   94   44
11     2016-17               BOS      BOS vs. GSW  1610612738  L   83   31
12     2019-20  