# Predicting NBA Games with Machine Learning
## Author: Kishore Annambhotla

### Introduction and Methods

As a math-minded person, I've always been interested in the use of numbers and analytics in sports. Being an avid follower of the NBA, I decided to pursue a project I've thought about for a while: **a machine learning model that could accurately predict the outcomes of NBA regular season games.** This project was my first real foray into data science, machine learning, and sports analytics, and was certainly one of the most instructive things I've ever worked on. The end result of this project is a high-accuracy ML model capable of intelligently predicting NBA games, and this notebook will step through the code and methods I used to create the model.


### Setting Up

In order to predict NBA game outcomes, we will need to analyze many past games and see what statistics help or hurt a team's chances of winning. We will also need a robust predictive model capable of making a decision based on all of this game data. For this project, I used `nba_api` to retrieve up-to-date NBA game data, `pandas` and `numpy` for data manipulation, and `scikit-learn` to build an ML model. All of the project's necessary imports will be included in the code block below.

In [53]:
import numpy as np
import pandas as pd
from nba_api.stats.endpoints import leaguegamefinder
from nba_api.stats.static import teams
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Prevents warnings on reformatted columns later
pd.options.mode.chained_assignment = None

### Importing Data

We will base our model on games played by all NBA teams. As shown below, this game data includes categorical information about the game, teams, and season, quantitative data on one team's offensive and defensive statistics, and of course the outcome of the game for that team.

First, we will use the `nba_api` to get games played by each NBA team. Note that we set `season_type_nullable` to only include regular season games, since those are the types of games we aim to predict with this model. This could easily be changed to train a model better for predicting playoff or preseason games.

In [2]:
nba_teams = teams.get_teams()
team_abbr_to_id = {team["abbreviation"]: team["id"] for team in nba_teams}
team_frames = []

for team in nba_teams:
    team_id = team["id"]
    gamefinder = leaguegamefinder.LeagueGameFinder(
        team_id_nullable=team_id, season_type_nullable="Regular Season"
    )
    team_frames.append(gamefinder.get_data_frames()[0])

# Concatenate once at the end instead of growing a dataframe inside the loop
all_games = pd.concat(team_frames, ignore_index=True)

In [3]:
print(all_games.columns)

Index(['SEASON_ID', 'TEAM_ID', 'TEAM_ABBREVIATION', 'TEAM_NAME', 'GAME_ID',
       'GAME_DATE', 'MATCHUP', 'WL', 'MIN', 'PTS', 'FGM', 'FGA', 'FG_PCT',
       'FG3M', 'FG3A', 'FG3_PCT', 'FTM', 'FTA', 'FT_PCT', 'OREB', 'DREB',
       'REB', 'AST', 'STL', 'BLK', 'TOV', 'PF', 'PLUS_MINUS'],
      dtype='object')


In [4]:
all_games.sample(n=5)

Unnamed: 0,SEASON_ID,TEAM_ID,TEAM_ABBREVIATION,TEAM_NAME,GAME_ID,GAME_DATE,MATCHUP,WL,MIN,PTS,FGM,FGA,FG_PCT,FG3M,FG3A,FG3_PCT,FTM,FTA,FT_PCT,OREB,DREB,REB,AST,STL,BLK,TOV,PF,PLUS_MINUS
72615,22021,1610612759,SAS,San Antonio Spurs,22100487,2021-12-23,SAS @ LAL,W,240,138,46,95,0.484,18,39.0,0.462,28,33,0.848,12.0,38.0,50.0,33,7.0,5,6,21,28.0
32539,21984,1610612746,LAC,Los Angeles Clippers,28400611,1985-02-17,LAC vs. ATL,L,240,90,41,83,0.494,0,2.0,0.0,8,11,0.727,17.0,24.0,41.0,23,8.0,8,17,15,
92650,22004,1610612765,DET,Detroit Pistons,20400001,2004-11-02,DET vs. HOU,W,240,87,32,72,0.444,6,18.0,0.333,17,25,0.68,13.0,29.0,42.0,20,9.0,5,12,22,8.0
49560,22016,1610612752,NYK,New York Knicks,21600724,2017-01-31,NYK @ WAS,L,241,101,34,93,0.366,5,24.0,0.208,28,35,0.8,22.0,29.0,51.0,18,8.0,2,12,17,-16.0
87494,21995,1610612763,VAN,Vancouver Grizzlies,29500219,1995-12-03,VAN vs. MIL,L,240,95,37,79,0.468,4,11.0,0.364,17,26,0.654,12.0,17.0,29.0,27,9.0,4,11,25,


While this data is good, it may be too much for our model. Over the last forty seasons, NBA basketball has greatly changed. Many rules have been added and removed, and aspects of "perfect" offense and defense have completely shifted. As a result, older data is increasingly unreliable for predicting current-day NBA games. For this model, I chose to only use data from the **2019-2020 season onward.** In my testing, this period resulted in the highest accuracy.

In [5]:
# Get games where season is in modern_seasons
# (the last four characters of SEASON_ID are the year the season started)
modern_seasons = ["2019", "2020", "2021", "2022", "2023", "2024"]
games_modern = all_games[all_games["SEASON_ID"].str[-4:].isin(modern_seasons)].copy()
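
The filter above depends on how `SEASON_ID` is laid out. Judging from the sample rows (`22021` for a December 2021 game, `21984` for a February 1985 game), the last four characters are the year the season started. A minimal sketch of the same check in plain Python, where `season_start_year` and `is_modern` are hypothetical helper names that do not appear elsewhere in this notebook:

```python
def season_start_year(season_id: str) -> int:
    """Return the season's start year from an ID such as '22021'."""
    return int(season_id[-4:])


def is_modern(season_id: str, first_year: int = 2019) -> bool:
    """True when the season started in `first_year` or later."""
    return season_start_year(season_id) >= first_year


# SEASON_ID values copied from the sample rows shown earlier
sample_ids = ["22021", "21984", "22004", "22016", "21995"]
modern_ids = [sid for sid in sample_ids if is_modern(sid)]
print(modern_ids)
```

Comparing the year numerically avoids listing every season by hand, so the filter would not need to be updated when a new season starts.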

### Cleaning Data

We have plenty of good data, but it's still unrefined and hard to work with. The next step is to clean up our columns and reformat the data. This will make training, testing, and evaluating our model much easier in the future.

First, we will reformat some of our columns to make working with the data easier.

In [6]:
# Convert GAME_DATE to pandas datetime
# Also order by date earliest to latest to make working with stats easier later
games_modern["GAME_DATE"] = pd.to_datetime(games_modern["GAME_DATE"])
games_modern.sort_values(by="GAME_DATE", inplace=True)


# Add binary "WIN" column (the categorical WL column stays in the dataframe
# but is not used as a feature)
games_modern["WIN"] = (games_modern["WL"] == "W").astype(int)


# Convert int stat columns to float type for accurate data analysis
# MIN = minutes, PTS = points, FGM/FGA = field goals made/attempted,
# FG3M/FG3A = 3s made/attempted, FTM/FTA = free throws made/attempted,
# REB/OREB/DREB = total/offensive/defensive rebounds, AST = assists,
# BLK = blocks, TOV = turnovers, PF = personal fouls
stat_columns = [
    "MIN", "PTS", "FGM", "FGA", "FG3M", "FG3A", "FTM", "FTA",
    "REB", "OREB", "DREB", "AST", "BLK", "TOV", "PF",
]
games_modern[stat_columns] = games_modern[stat_columns].astype(float)


# Add opponent id as column
def get_opponent_id(matchup, team_abbr_to_id, team_id):
    if "@" in matchup:
        opponent_abbr = matchup.split(" @ ")[-1]
    else:
        opponent_abbr = matchup.split(" vs. ")[-1]
    return team_abbr_to_id.get(opponent_abbr, team_id)


games_modern["OPP_TEAM_ID"] = games_modern.apply(
    lambda row: get_opponent_id(row["MATCHUP"], team_abbr_to_id, row["TEAM_ID"]), axis=1
)
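
To make the `MATCHUP` parsing above concrete, here is a small standalone sketch. It assumes the two string layouts seen in the sample rows (`"SAS @ LAL"` for an away game and `"LAC vs. ATL"` for a home game); `parse_matchup` is a hypothetical helper, not part of `nba_api`.

```python
def parse_matchup(matchup: str) -> tuple:
    """Split a MATCHUP string into (team, opponent, is_home)."""
    if " @ " in matchup:
        # "TEAM @ OPPONENT" means TEAM is the visitor
        team, opponent = matchup.split(" @ ")
        return team, opponent, False
    # "TEAM vs. OPPONENT" means TEAM is at home
    team, opponent = matchup.split(" vs. ")
    return team, opponent, True


away_example = parse_matchup("SAS @ LAL")
home_example = parse_matchup("LAC vs. ATL")
print(away_example)
print(home_example)
```

Note that `get_opponent_id` falls back to the team's own ID when an abbreviation is missing from `team_abbr_to_id`. Checking for rows where `OPP_TEAM_ID` equals `TEAM_ID` before encoding would show whether that fallback was ever triggered.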

Our data is now workable and could already be used to create a decent model. However, a few new statistics should improve our model's accuracy. First, we can tell the model whether a team was playing at home, since home-court advantage can noticeably affect the outcome. Additionally, we can tell the model the outcome of the team's previous game, as a prior win or loss could affect the team's energy going into the next one. We can then calculate three of the **"Four Factors"** of basketball, created by basketball statistician Dean Oliver. The Four Factors track a team's shooting, turnovers, rebounding, and free throws, and are arguably among the best metrics for predicting team success in basketball. The rebounding factor is left out here because it depends on the opposing team's rebounds, which are not part of each row. Finally, we can calculate a team's true shooting percentage in each game, another measure of shooting efficiency that also accounts for free throws.

In [7]:
# Define 'HGA' (Home Game Advantage)
games_modern["HGA"] = games_modern["MATCHUP"].apply(lambda x: 0 if "@" in x else 1)

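# Define 'LAST_GAME_OUTCOME'
# Relies on the rows already being sorted by GAME_DATE, so shift(1) picks up
# each team's previous game. A team's first game in the data has no previous
# game and is filled with 0, i.e. treated as following a loss.
games_modern["LAST_GAME_OUTCOME"] = (
    games_modern.groupby("TEAM_ID")["WIN"].shift(1).fillna(0)
)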

# Define 'EFG%' (Effective Field Goal Percentage)
games_modern["EFG%"] = (
    games_modern["FGM"] + (0.5 * games_modern["FG3M"])
) / games_modern["FGA"]

# Define 'TOV%' (Turnover Percentage)
games_modern["TOV%"] = games_modern["TOV"] / (
    games_modern["FGA"] + 0.44 * games_modern["FTA"] + games_modern["TOV"]
)

# Define 'FTR' (Free Throw Attempt Rate)
games_modern["FTR"] = games_modern["FTA"] / games_modern["FGA"]

# Define 'TS%' (True Shooting Percentage)
games_modern["TS%"] = games_modern["PTS"] / (
    2 * (games_modern["FGA"] + (0.44 * games_modern["FTA"]))
)
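
The formulas above can be checked by hand against one row of data. The sketch below recomputes the four derived stats in plain Python for the Milwaukee Bucks game on 2020-01-04 that appears in the sample output of this notebook (127 points, 43 of 87 from the field, 17 threes made, 33 free throw attempts, 9 turnovers). The function names are illustrative only.

```python
def effective_fg_pct(fgm, fg3m, fga):
    """EFG% gives made threes 1.5x the weight of made twos."""
    return (fgm + 0.5 * fg3m) / fga


def turnover_pct(tov, fga, fta):
    """TOV% is turnovers per estimated possession-ending play."""
    return tov / (fga + 0.44 * fta + tov)


def free_throw_rate(fta, fga):
    """FTR is free throw attempts per field goal attempt."""
    return fta / fga


def true_shooting_pct(pts, fga, fta):
    """TS% is points per two estimated shooting attempts."""
    return pts / (2 * (fga + 0.44 * fta))


# Box-score line for MIL vs. SAS, 2020-01-04, from the sample output
pts, fgm, fga, fg3m, fta, tov = 127.0, 43.0, 87.0, 17.0, 33.0, 9.0

efg = effective_fg_pct(fgm, fg3m, fga)
tov_pct = turnover_pct(tov, fga, fta)
ftr = free_throw_rate(fta, fga)
ts = true_shooting_pct(pts, fga, fta)

# These should agree with the EFG%, TOV%, FTR and TS% columns for that row
print(round(efg, 6), round(tov_pct, 6), round(ftr, 6), round(ts, 6))
```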

Finally, we want to use a `LabelEncoder` to transform a few of our categorical data columns into numerical values that can be understood by our model.

In [8]:
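# LabelEncode data
# Each fit_transform call refits the encoder on that column. TEAM_ID and
# OPP_TEAM_ID receive matching codes as long as both columns contain the same
# set of team ids, because LabelEncoder numbers the values in sorted order.
le = LabelEncoder()
games_modern["TEAM_ID"] = le.fit_transform(games_modern["TEAM_ID"])
games_modern["OPP_TEAM_ID"] = le.fit_transform(games_modern["OPP_TEAM_ID"])
games_modern["GAME_ID"] = le.fit_transform(games_modern["GAME_ID"])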
games_modern.sample(5)

Unnamed: 0,SEASON_ID,TEAM_ID,TEAM_ABBREVIATION,TEAM_NAME,GAME_ID,GAME_DATE,MATCHUP,WL,MIN,PTS,FGM,FGA,FG_PCT,FG3M,FG3A,FG3_PCT,FTM,FTA,FT_PCT,OREB,DREB,REB,AST,STL,BLK,TOV,PF,PLUS_MINUS,WIN,OPP_TEAM_ID,HGA,LAST_GAME_OUTCOME,EFG%,TOV%,FTR,TS%
52635,22020,16,ORL,Orlando Magic,1972,2021-04-25,ORL vs. IND,L,239.0,112.0,40.0,101.0,0.396,10.0,32.0,0.313,22.0,28.0,0.786,15.0,36.0,51.0,22.0,9.0,3.0,6.0,14.0,-19.0,0,17,1,0.0,0.445545,0.050285,0.277228,0.494176
39565,22019,12,MIL,Milwaukee Bucks,530,2020-01-04,MIL vs. SAS,W,239.0,127.0,43.0,87.0,0.494,17.0,45.0,0.378,24.0,33.0,0.727,7.0,42.0,49.0,23.0,4.0,10.0,9.0,22.0,9.0,1,22,1,1.0,0.591954,0.081433,0.37931,0.625493
45698,22022,14,BKN,Brooklyn Nets,3501,2022-11-05,BKN @ CHA,W,239.0,98.0,33.0,84.0,0.393,12.0,36.0,0.333,20.0,22.0,0.909,9.0,32.0,41.0,23.0,8.0,10.0,11.0,21.0,4.0,1,29,0,1.0,0.464286,0.105082,0.261905,0.523057
36471,22020,11,MIA,Miami Heat,1533,2021-02-22,MIA @ OKC,W,239.0,108.0,36.0,81.0,0.444,15.0,40.0,0.375,21.0,25.0,0.84,10.0,35.0,45.0,30.0,7.0,6.0,13.0,20.0,14.0,1,23,0,1.0,0.537037,0.12381,0.308642,0.586957
79316,22023,24,TOR,Toronto Raptors,4793,2023-11-15,TOR vs. MIL,L,241.0,112.0,38.0,96.0,0.396,9.0,33.0,0.273,27.0,31.0,0.871,26.0,29.0,55.0,28.0,10.0,8.0,14.0,19.0,-16.0,0,12,1,1.0,0.442708,0.113232,0.322917,0.510762


We will also now define a dictionary keying each team to their encoded ID. This may not seem useful yet, but will be helpful for data manipulation later.

In [9]:
# Will be helpful to have dictionary keying teams to encoded id values
ENCODED_TEAM_IDS = {
    "Atlanta Hawks": 0,
    "Hawks": 0,
    "Boston Celtics": 1,
    "Celtics": 1,
    "Cleveland Cavaliers": 2,
    "Cavaliers": 2,
    "New Orleans Pelicans": 3,
    "Pelicans": 3,
    "Chicago Bulls": 4,
    "Bulls": 4,
    "Dallas Mavericks": 5,
    "Mavericks": 5,
    "Denver Nuggets": 6,
    "Nuggets": 6,
    "Golden State Warriors": 7,
    "Warriors": 7,
    "Houston Rockets": 8,
    "Rockets": 8,
    "LA Clippers": 9,
    "Clippers": 9,
    "Los Angeles Lakers": 10,
    "Lakers": 10,
    "Miami Heat": 11,
    "Heat": 11,
    "Milwaukee Bucks": 12,
    "Bucks": 12,
    "Minnesota Timberwolves": 13,
    "Timberwolves": 13,
    "Brooklyn Nets": 14,
    "Nets": 14,
    "New York Knicks": 15,
    "Knicks": 15,
    "Orlando Magic": 16,
    "Magic": 16,
    "Indiana Pacers": 17,
    "Pacers": 17,
    "Philadelphia 76ers": 18,
    "76ers": 18,
    "Phoenix Suns": 19,
    "Suns": 19,
    "Portland Trail Blazers": 20,
    "Trail Blazers": 20,
    "Sacramento Kings": 21,
    "Kings": 21,
    "San Antonio Spurs": 22,
    "Spurs": 22,
    "Oklahoma City Thunder": 23,
    "Thunder": 23,
    "Toronto Raptors": 24,
    "Raptors": 24,
    "Utah Jazz": 25,
    "Jazz": 25,
    "Memphis Grizzlies": 26,
    "Grizzlies": 26,
    "Washington Wizards": 27,
    "Wizards": 27,
    "Detroit Pistons": 28,
    "Pistons": 28,
    "Charlotte Hornets": 29,
    "Hornets": 29,
}
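
The hand-written dictionary above mirrors what `LabelEncoder` does: it numbers the distinct raw team IDs in sorted order. As a rough sketch, the same mapping could be generated instead of typed out. The example below uses only four teams whose raw IDs appear in the sample rows earlier, so the resulting codes run from 0 to 3 rather than matching the full 30-team dictionary. It also assumes each team record carries `id`, `full_name`, and `nickname` keys, which is how I understand the records from `teams.get_teams()` to be laid out; `build_encoded_ids` is a hypothetical helper.

```python
# Four teams with raw ids taken from the sample rows; a real run would pass
# the full list of team records instead (assumed keys: id, full_name, nickname)
sample_teams = [
    {"id": 1610612759, "full_name": "San Antonio Spurs", "nickname": "Spurs"},
    {"id": 1610612746, "full_name": "LA Clippers", "nickname": "Clippers"},
    {"id": 1610612765, "full_name": "Detroit Pistons", "nickname": "Pistons"},
    {"id": 1610612752, "full_name": "New York Knicks", "nickname": "Knicks"},
]


def build_encoded_ids(team_records):
    """Map full names and nicknames to the rank of each sorted raw team id."""
    ordered = sorted(team_records, key=lambda team: team["id"])
    encoded = {}
    for code, team in enumerate(ordered):
        encoded[team["full_name"]] = code
        encoded[team["nickname"]] = code
    return encoded


encoded = build_encoded_ids(sample_teams)
print(encoded)
```

The relative order is the same as in `ENCODED_TEAM_IDS` (Clippers before Knicks before Spurs before Pistons), which is the property the rest of the notebook relies on.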

### Defining Features

In our model, `X` includes features of the dataset that will be used to predict `y`. Here, `y` is obviously `WIN`, since we are just predicting and classifying each game as a win (1) or loss (0). `X` includes a number of valuable statistics, and all the ones used will be included in the code segment.

We will also use `train_test_split` from `scikit-learn` to train and test our model with an 80-20 split, a common default for ML models.

In [10]:
features = [
    "TEAM_ID",
    "OPP_TEAM_ID",
    "PTS",
    "OREB",
    "DREB",
    "REB",
    "AST",
    "STL",
    "BLK",
    "TOV",
    "EFG%",
    "TOV%",
    "FTR",
    "TS%",
    "HGA",
    "LAST_GAME_OUTCOME",
]
X = games_modern[features]

y = games_modern["WIN"]  # 1-D Series, the shape scikit-learn expects for labels


# Use sklearn train_test_split with 80% training, 20% testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

### Creating Initial Model

`scikit-learn` offers many options for ML classification models. For this problem, I found the best results using a `HistGradientBoostingClassifier`, a gradient boosting implementation that bins feature values into histograms to speed up training. It also handles null values natively, which is useful because our dataset unfortunately contains several of them.

In [51]:
xg_model = HistGradientBoostingClassifier(learning_rate=0.1, random_state=42)
xg_model.fit(X_train, y_train)



In [52]:
y_pred = xg_model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

Accuracy: 0.836350470673425
              precision    recall  f1-score   support

           0       0.83      0.85      0.84      1422
           1       0.84      0.82      0.83      1340

    accuracy                           0.84      2762
   macro avg       0.84      0.84      0.84      2762
weighted avg       0.84      0.84      0.84      2762



### Predicting Games

Our model is accurate, but there's an issue: we can only predict games that have already been played, because the features are box-score stats from the game itself. That's not very useful and, more importantly, isn't a test of how accurate our model is at predicting games that have **not yet been played.** However, if we feed the model each team's **average** stats from this season, we can make individual predictions on games to be played in the future. To do this, we will need to know what games are going to be played. Once again, using `nba_api`, we can retrieve today's games with the `scoreboard` object.

In [17]:
# Note: This code comes from the nba_api repository via GitHub. 
# It is used to demonstrate the functionality of the scoreboard.
from datetime import datetime, timezone
from dateutil import parser
from nba_api.live.nba.endpoints import scoreboard

f = "{gameId}: {awayTeam} vs. {homeTeam} @ {gameTimeLTZ}"

board = scoreboard.ScoreBoard()
print("ScoreBoardDate: " + board.score_board_date)
games = board.games.get_dict()
game_teams = []
for game in games:
    gameTimeLTZ = (
        parser.parse(game["gameTimeUTC"])
        .replace(tzinfo=timezone.utc)
        .astimezone(tz=None)
    )
    print(
        f.format(
            gameId=game["gameId"],
            awayTeam=game["awayTeam"]["teamName"],
            homeTeam=game["homeTeam"]["teamName"],
            gameTimeLTZ=gameTimeLTZ,
        )
    )
    game_teams.append([game["homeTeam"]["teamName"], game["awayTeam"]["teamName"]])

ScoreBoardDate: 2025-01-21
0022400606: Knicks vs. Nets @ 2025-01-21 19:30:00-05:00
0022400607: Trail Blazers vs. Heat @ 2025-01-21 19:30:00-05:00
0022400608: Magic vs. Raptors @ 2025-01-21 19:30:00-05:00
0022400609: 76ers vs. Nuggets @ 2025-01-21 22:00:00-05:00
0022400610: Wizards vs. Lakers @ 2025-01-21 22:30:00-05:00


For our data, instead of using games that have already been played, we will get each team's per-game averages for this season, which are most relevant to the games that will be played.

In [18]:
from nba_api.stats.endpoints import leaguedashteamstats

advancedGamefinder = leaguedashteamstats.LeagueDashTeamStats(
    season="2024-25", per_mode_detailed="PerGame"
)

team_stats = advancedGamefinder.get_data_frames()[0]

# Drop all columns related to ranks (not used as features)
keyword = "RANK"
columns_to_drop = [col for col in team_stats.columns if keyword in col]
team_stats.drop(columns=columns_to_drop, inplace=True)

Next, we can clean and modify our `team_stats` dataframe. This process is quite similar to the cleaning process for `games_modern`, since we need to match the columns in each dataframe.

In [19]:
# Cleaning data in team_stats and matching to games_modern

# Define three of four factors (EFG%, TOV%, FTR)

# Define 'EFG%' (Effective Field Goal Percentage)
team_stats["EFG%"] = (team_stats["FGM"] + (0.5 * team_stats["FG3M"])) / team_stats[
    "FGA"
]

# Define 'TOV%' (Turnover Percentage)
team_stats["TOV%"] = team_stats["TOV"] / (
    team_stats["FGA"] + 0.44 * team_stats["FTA"] + team_stats["TOV"]
)

# Define 'FTR' (Free Throw Attempt Rate)
team_stats["FTR"] = team_stats["FTA"] / team_stats["FGA"]

# Define 'TS%' (True Shooting Percentage)
team_stats["TS%"] = team_stats["PTS"] / (
    2 * (team_stats["FGA"] + (0.44 * team_stats["FTA"]))
)

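# Label encode team ids
# Refitting the encoder here gives the same codes as in games_modern only
# because both dataframes contain the same set of team ids (sorted order matches)
team_stats["TEAM_ID"] = le.fit_transform(team_stats["TEAM_ID"])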

Finally, with our data cleaned, we can iterate through the games that will be played today and construct a one-row dataframe for each game with the necessary columns and data. Notice that this is also where our dictionary of team IDs comes in handy.

In [20]:
# Fetch data to predict upcoming games
games = board.games.get_dict()
# Empty list to store each prediction dataframe
prediction_dfs = []
for game in games:

    # Get home team (team) and away team (opponent) encoded ids
    team_id = ENCODED_TEAM_IDS.get(game["homeTeam"]["teamName"])
    opp_team_id = ENCODED_TEAM_IDS.get(game["awayTeam"]["teamName"])

    # Get relevant home team statistics for features
    home_row = team_stats.loc[team_stats["TEAM_ID"] == team_id].iloc[0]
    pts, oreb, dreb, reb = home_row[["PTS", "OREB", "DREB", "REB"]]
    ast, stl, blk, tov = home_row[["AST", "STL", "BLK", "TOV"]]
    efg_pct, tov_pct, ftr, ts_pct = home_row[["EFG%", "TOV%", "FTR", "TS%"]]

    # Get home game advantage (always 1 due to how we format data)
    hga = 1.0

    # Get last game outcome (most recent game for the home team in games_modern)
    team_games = games_modern.loc[games_modern["TEAM_ID"] == team_id]
    last_game_outcome = team_games.sort_values(by="GAME_DATE").iloc[-1]["WIN"]

    # With all of the data, construct a dataframe with the game's information.
    prediction_data = {
        "TEAM_ID": [team_id],
        "OPP_TEAM_ID": [opp_team_id],
        "PTS": [pts],
        "OREB": [oreb],
        "DREB": [dreb],
        "REB": [reb],
        "AST": [ast],
        "STL": [stl],
        "BLK": [blk],
        "TOV": [tov],
        "EFG%": [efg_pct],
        "TOV%": [tov_pct],
        "FTR": [ftr],
        "TS%": [ts_pct],
        "HGA": [hga],
        "LAST_GAME_OUTCOME": [last_game_outcome],
    }
    game_prediction_df = pd.DataFrame(prediction_data)
    prediction_dfs.append(game_prediction_df)

# With all game dataframes, concatenate into a single dataframe for easy simultaneous viewing
all_game_predictions = pd.concat(prediction_dfs)

In [21]:
all_game_predictions

Unnamed: 0,TEAM_ID,OPP_TEAM_ID,PTS,OREB,DREB,REB,AST,STL,BLK,TOV,EFG%,TOV%,FTR,TS%,HGA,LAST_GAME_OUTCOME
0,14,15,106.6,9.7,30.4,40.2,25.3,7.3,3.7,15.7,0.528202,0.142675,0.246769,0.564978,1.0,0
0,11,20,111.2,9.3,34.2,43.5,26.3,8.6,3.8,13.5,0.540276,0.122638,0.253165,0.575689,1.0,1
0,24,16,111.0,12.3,31.8,44.1,28.9,7.7,4.5,16.0,0.531146,0.138961,0.233593,0.559814,1.0,0
0,6,18,120.1,10.9,34.7,45.6,30.9,8.5,4.8,14.4,0.566964,0.12565,0.268973,0.599277,1.0,1
0,10,27,111.2,9.3,31.5,40.8,26.4,7.6,4.9,13.4,0.543275,0.122765,0.272515,0.580667,1.0,0


With each of our games stored in the list `prediction_dfs`, we can finally iterate through and apply our model to each one-row dataframe. In this case, a prediction of `[0]` indicates a loss for the home team and a prediction of `[1]` indicates a victory for the home team. Note that, due to the structure of the `scoreboard` object, we can only get predictions for games today after noon Eastern Time. Running the code below before noon would give predictions for the previous day's games.

In [23]:
for (home_team, away_team), game_df in zip(game_teams, prediction_dfs):
    print("Home Team:", home_team)
    print("Away Team:", away_team)
    # predict returns a one-element array for a one-row dataframe
    prediction = xg_model.predict(game_df)[0]
    # 0 = loss for home team/win for away team
    # 1 = win for home team/loss for away team
    winner = home_team if prediction == 1 else away_team
    print(winner + " predicted to win!")
    print("")


Home Team: Nets
Away Team: Knicks
Knicks predicted to win!

Home Team: Heat
Away Team: Trail Blazers
Heat predicted to win!

Home Team: Raptors
Away Team: Magic
Magic predicted to win!

Home Team: Nuggets
Away Team: 76ers
Nuggets predicted to win!

Home Team: Lakers
Away Team: Wizards
Lakers predicted to win!



### Conclusions and Future

Overall, I'm proud of how this project turned out. This project really sparked my interest in data science and machine learning, and I would definitely be happy to work in the field of AI/ML in the future. That being said, there are certainly a number of improvements that could be made. One limitation of the data from the `nba_api` was that I could not easily access game stats for the opposing team, potentially limiting the accuracy of my model. This could be fixed in the future by using a web scraper, which would theoretically allow me to retrieve any data I need related to each game played. 
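
One possible alternative to scraping, offered here only as a sketch: since the game data was collected team by team, each game should appear once per participating team with a shared `GAME_ID`, so opponent stats might be recovered by joining the dataframe to itself. The toy values below are invented for illustration and the `_OPP` suffix is an arbitrary naming choice.

```python
import pandas as pd

# Two toy games in the per-team layout: one row per team per game
toy_games = pd.DataFrame(
    {
        "GAME_ID": ["g1", "g1", "g2", "g2"],
        "TEAM_ID": [1, 2, 1, 3],
        "PTS": [110, 102, 95, 99],
        "DREB": [35, 30, 28, 33],
    }
)

# Join each game to itself on GAME_ID, then drop the rows where a team
# has been paired with its own record
paired = toy_games.merge(toy_games, on="GAME_ID", suffixes=("", "_OPP"))
paired = paired[paired["TEAM_ID"] != paired["TEAM_ID_OPP"]].reset_index(drop=True)

print(paired[["GAME_ID", "TEAM_ID", "PTS", "PTS_OPP", "DREB_OPP"]])
```

With opponent columns available, the missing rebounding factor could in principle be computed as `OREB / (OREB + DREB_OPP)`.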

Another limitation of my model is the fact that injuries are not accounted for, which can definitely make some predictions highly inaccurate. This could be fixed by making a model that instead analyzes data related to individual players rather than entire teams. That way, injured players could simply be ignored when comparing teams. In the future, I would like to recreate this project to address these limitations and make an even stronger model.