# Introduction

In this Jupyter Book we explore predictive analytics for NBA basketball. In recent years, statistical analysis has featured increasingly in commentary across the league, highlighting and quantifying what these athletes achieve game to game and over their careers. Predicting game outcomes has become a major topic with the expanding legalization and accessibility of sports betting in the United States. Bettors have gone as far as building statistical models, such as the one created in this project, in hopes of improving their odds.

The goal of this project is to build a basic predictive model that accurately predicts game wins from a set of team-level features. We hope this model serves as a foundation for more robust and complex models that predict real game outcomes more accurately by accounting for more of the variables involved.


```python
import pandas as pd

# Load the game log; each row is one team's box score for one game.
games = pd.read_csv('data/games.csv')
games
```

| Unnamed: 0.1 | Unnamed: 0 | SEASON_ID | TEAM_ID | TEAM_ABBREVIATION | TEAM_NAME | GAME_ID | GAME_DATE | MATCHUP | WL | MIN | ... | FT_PCT | OREB | DREB | REB | AST | STL | BLK | TOV | PF | PLUS_MINUS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 42022 | 1610612752 | NYK | New York Knicks | 42200204 | 2023-05-08 | NYK @ MIA | L | 240 | ... | 0.750 | 8 | 27 | 35 | 21 | 5 | 4 | 16 | 23 | -8.0 |
| 1 | 1 | 42022 | 1610612744 | GSW | Golden State Warriors | 42200234 | 2023-05-08 | GSW @ LAL | L | 239 | ... | 0.750 | 9 | 31 | 40 | 29 | 9 | 2 | 16 | 22 | -3.0 |
| 2 | 2 | 42022 | 1610612748 | MIA | Miami Heat | 42200204 | 2023-05-08 | MIA vs. NYK | W | 239 | ... | 0.727 | 13 | 31 | 44 | 28 | 7 | 4 | 12 | 21 | 8.0 |
| 3 | 3 | 42022 | 1610612747 | LAL | Los Angeles Lakers | 42200234 | 2023-05-08 | LAL vs. GSW | W | 240 | ... | 1.000 | 8 | 34 | 42 | 21 | 8 | 1 | 14 | 15 | 3.0 |
| 4 | 4 | 42022 | 1610612738 | BOS | Boston Celtics | 42200214 | 2023-05-07 | BOS @ PHI | L | 264 | ... | 0.727 | 13 | 31 | 44 | 25 | 3 | 12 | 10 | 21 | -1.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 10323 | 10323 | 22019 | 1610612752 | NYK | New York Knicks | 21900010 | 2019-10-23 | NYK @ SAS | L | 239 | ... | 0.722 | 5 | 34 | 39 | 24 | 16 | 1 | 14 | 32 | -9.0 |
| 10324 | 10324 | 22019 | 1610612761 | TOR | Toronto Raptors | 21900001 | 2019-10-22 | TOR vs. NOP | W | 265 | ... | 0.842 | 16 | 41 | 57 | 23 | 7 | 3 | 16 | 24 | 8.0 |
| 10325 | 10325 | 22019 | 1610612747 | LAL | Los Angeles Lakers | 21900002 | 2019-10-22 | LAL @ LAC | L | 238 | ... | 0.714 | 9 | 32 | 41 | 20 | 4 | 7 | 14 | 24 | -10.0 |
| 10326 | 10326 | 22019 | 1610612740 | NOP | New Orleans Pelicans | 21900001 | 2019-10-22 | NOP @ TOR | L | 265 | ... | 0.850 | 16 | 37 | 53 | 30 | 4 | 9 | 19 | 34 | -8.0 |
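
The output above carries two leftover columns, `Unnamed: 0.1` and `Unnamed: 0`, which typically appear when a DataFrame is saved with its index and read back. As a minimal sketch of a cleanup step, assuming those columns hold nothing beyond row position, the snippet below drops them and parses `GAME_DATE` so the date range can be checked. The `clean_games` helper is a hypothetical name, and the sample rows are copied from the output above using only a few of the visible columns.

```python
import pandas as pd


def clean_games(df: pd.DataFrame) -> pd.DataFrame:
    """Drop leftover index columns and parse GAME_DATE (hypothetical helper)."""
    # Assumption: columns named "Unnamed: ..." are saved row indices only.
    leftover = [c for c in df.columns if c.startswith("Unnamed")]
    out = df.drop(columns=leftover)
    # Parse the YYYY-MM-DD strings so sorting and date comparisons work.
    out["GAME_DATE"] = pd.to_datetime(out["GAME_DATE"], format="%Y-%m-%d")
    return out


# A few rows copied from the displayed output, restricted to visible columns.
sample = pd.DataFrame({
    "Unnamed: 0.1": [0, 1, 10326],
    "Unnamed: 0": [0, 1, 10326],
    "SEASON_ID": [42022, 42022, 22019],
    "TEAM_ABBREVIATION": ["NYK", "GSW", "NOP"],
    "GAME_ID": [42200204, 42200234, 21900001],
    "GAME_DATE": ["2023-05-08", "2023-05-08", "2019-10-22"],
})

cleaned = clean_games(sample)
# Earliest and latest game dates in the cleaned sample.
date_range = (cleaned["GAME_DATE"].min(), cleaned["GAME_DATE"].max())
print(cleaned.dtypes)
print(date_range)
```

On the full file the same call would be `clean_games(games)`. Passing `index_col=0` to `pd.read_csv` is another option, though it would remove only one of the two leftover columns.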


## Data Description

With the data loaded, we can see 28 feature columns across 10,328 rows. Each row is one team's record for one game, so every game appears twice, once per team (for example, GAME_ID 42200204 appears for both NYK and MIA). For this project we use games from the start of the 2019-2020 NBA season through the most recent playoff games on May 8th, 2023. Restricting the window this way should help accuracy, because team rosters and coaching staffs change less over a few seasons than over a longer history. We used League ID '00' to select only NBA games and exclude the G-League. The features and their descriptions are as follows:
- SEASON_ID: A league generated ID number for the season the game was played in.
- TEAM_ID: A league generated ID number for the team that played that game.
- TEAM_ABBREVIATION: An abbreviation for team names.
- TEAM_NAME: The full name of the NBA team.
- GAME_ID: A league generated ID number for the specific game. 
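- GAME_DATE: Date the game was played, in YYYY-MM-DD format.
- MATCHUP: String naming the two teams and the venue; `@` marks an away game and `vs.` a home game for the row's team.
- WL: Win (W) or loss (L) for the row's team.
- MIN: Total minutes played, summed across the team's players (about 240 for a regulation game, more with overtime).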
- PTS: Number of points scored by the team.
- FGM: Number of field goals made.
- FGA: Number of field goals attempted.
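- FG_PCT: Share of field goal attempts made (FGM / FGA), expressed as a fraction between 0 and 1.
- FG3M: Number of 3 point field goals made.
- FG3A: Number of 3 point field goals attempted.
- FG3_PCT: Share of 3 point field goal attempts made (FG3M / FG3A), expressed as a fraction between 0 and 1.
- FTM: Number of free throws made.
- FTA: Number of free throws attempted.
- FT_PCT: Share of free throw attempts made (FTM / FTA), expressed as a fraction between 0 and 1.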
- OREB: Number of offensive rebounds.
- DREB: Number of defensive rebounds.
- REB: Total number of rebounds. 
- AST: Total number of assists.
- STL: Total number of steals.
- BLK: Total number of blocks.
- TOV: Total number of turnovers.
- PF: Total number of personal fouls.
- PLUS_MINUS: Point differential, the team's points minus the opponent's. A positive value implies a win and a negative value a loss.
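
Several of these columns are related by definition, so it may be worth sanity-checking them before modeling. The sketch below uses rows copied from the output above to check that `REB` equals `OREB + DREB` and that `WL` agrees with the sign of `PLUS_MINUS`. It then derives a numeric target and a home-court flag from `MATCHUP`. The column names `WIN` and `HOME` are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# Rows copied from the displayed output, restricted to visible columns.
sample = pd.DataFrame({
    "TEAM_ABBREVIATION": ["NYK", "GSW", "MIA", "LAL", "BOS"],
    "MATCHUP": ["NYK @ MIA", "GSW @ LAL", "MIA vs. NYK", "LAL vs. GSW", "BOS @ PHI"],
    "WL": ["L", "L", "W", "W", "L"],
    "OREB": [8, 9, 13, 8, 13],
    "DREB": [27, 31, 31, 34, 31],
    "REB": [35, 40, 44, 42, 44],
    "PLUS_MINUS": [-8.0, -3.0, 8.0, 3.0, -1.0],
})

# Total rebounds should equal offensive plus defensive rebounds.
rebounds_ok = (sample["OREB"] + sample["DREB"] == sample["REB"]).all()

# A positive point differential should coincide with a recorded win.
outcome_ok = ((sample["PLUS_MINUS"] > 0) == (sample["WL"] == "W")).all()

# Hypothetical derived columns: 1 for a win, 1 for a home game.
sample["WIN"] = (sample["WL"] == "W").astype(int)
sample["HOME"] = sample["MATCHUP"].str.contains("vs.", regex=False).astype(int)

print(rebounds_ok, outcome_ok)
print(sample[["TEAM_ABBREVIATION", "MATCHUP", "WIN", "HOME"]])
```

Because `PLUS_MINUS` determines `WL` directly, using it as a predictor for the same game would leak the outcome into the model, so it is better suited to checks like this one than to the feature set.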