# Final Project - Anthony Tobias
This project classifies whether the home team will win an NBA game, based on team rankings and season averages for scoring, field goal percentage, assists, and other box-score statistics.

The game data will be broken down by season and team, separately for home and away games. The statistics will then be averaged and used to train my model. I want my model to take in a season, a home team, and an away team, and predict whether the home team wins (1) or not (0).
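
As a rough illustration of the averaging step, here is a minimal sketch in plain Python. It does not use the `data_table` library, and the rows, team labels, and the `season_team_averages` helper are hypothetical names made up for this example.

```python
from collections import defaultdict

# Hypothetical rows: (season, home team, points scored at home).
games = [
    (2022, "A", 110),
    (2022, "A", 120),
    (2022, "B", 100),
    (2021, "A", 90),
]

def season_team_averages(rows):
    """Average the stat for every (season, team) pair."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for season, team, value in rows:
        entry = totals[(season, team)]
        entry[0] += value
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

averages = season_team_averages(games)
print(averages[(2022, "A")])  # 115.0
```

A model input for one matchup could then be built by looking up the home team's averages and the away team's averages for the same season.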

For cleaning, I will mostly be removing rows with missing data. In this case, it seems more effective to remove incomplete rows than to fill them in. There is enough data for each team and season that losing a single row makes only a small difference. Additionally, after doing some cleaning, the size of my tables has hardly changed.
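
The idea of dropping incomplete rows and checking how much data is lost can be sketched as follows. This is only an illustration in plain Python, not the `remove_missing` function from `data_util`; the rows and the `drop_missing` helper are hypothetical.

```python
# Hypothetical rows; None marks a missing value.
rows = [
    {"Game ID": 1, "Pts Home": 126, "Pts Away": 117},
    {"Game ID": 2, "Pts Home": None, "Pts Away": 112},
    {"Game ID": 3, "Pts Home": 101, "Pts Away": 99},
]

def drop_missing(table, columns):
    """Keep only the rows that have a value in every listed column."""
    return [
        row for row in table
        if all(row.get(col) not in (None, "") for col in columns)
    ]

cleaned = drop_missing(rows, ["Pts Home", "Pts Away"])

# Fraction of rows lost to cleaning; a small value supports dropping over filling.
removed_fraction = 1 - len(cleaned) / len(rows)
print(len(rows), len(cleaned), round(removed_fraction, 3))
```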

For this project, the end goal is to get 65% or more of my predictions correct. Depending on how my project progresses, this number may go up or down. If it were possible to get a very high number, I might have to quit school to start gambling on NBA games!
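
To make the 65% target concrete, the sketch below computes accuracy as the share of correct predictions and compares it with a simple baseline that always predicts a home win. The label lists are made-up examples, not results from this project, and `accuracy` is a hypothetical helper.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

# Hypothetical labels: 1 means the home team won, 0 means it lost.
actual = [1, 0, 1, 1, 0]
predicted = [1, 0, 1, 0, 0]

model_accuracy = accuracy(predicted, actual)
# Baseline: always predict a home win.
baseline_accuracy = accuracy([1] * len(actual), actual)

print(model_accuracy)     # 0.8
print(baseline_accuracy)  # 0.6
```

Comparing against the always-home baseline is one way to judge whether a given accuracy reflects real predictive value.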

In [43]:
from data_table import *
from data_util import *
import random

Load the game data, remove rows with any missing values, drop the columns that are not needed, and remove duplicate rows

In [54]:
game_labels = ['Date', 'Game ID', 'Status', 'Home Team ID', 'Visitor Team ID', 'Season', 
               'Team ID Home', 'Pts Home', 'FG PCT Home', 'FT PCT Home', 'FG3 PCT Home', 'AST Home', 'REB Home', 
               'Team ID Away', 'PTS Away', 'FG PCT Away', 'FT PCT Away', 'FG3 PCT Away', 'AST Away', 'REB Away', 'Home Wins']
game = DataTable(game_labels)

game.load('games.csv')
game = remove_missing(game, game.columns())
game.drop(['Team ID Home', 'Team ID Away', 'Status'])
# keep the de-duplicated table (the result is assigned, as in the rankings cell)
game = remove_duplicates(game)

    
print(game)

Date          Game ID    Home Team ID    Visitor Team ID    Season    Pts Home    FG PCT Home    FT PCT Home    FG3 PCT Home    AST Home    REB Home    PTS Away    FG PCT Away    FT PCT Away    FG3 PCT Away    AST Away    REB Away    Home Wins
----------  ---------  --------------  -----------------  --------  ----------  -------------  -------------  --------------  ----------  ----------  ----------  -------------  -------------  --------------  ----------  ----------  -----------
2022-12-22   22200477      1610612740         1610612759      2022         126          0.484          0.926           0.382          25          46         117          0.478          0.815           0.321          23          44            1
2022-12-22   22200478      1610612762         1610612764      2022         120          0.488          0.952           0.457          16          40         112          0.561          0.765           0.333          20          37            1
2022-12-21   22200466   

Partition the game data by season. This is done twice so that home and away teams can be processed separately

In [55]:
# partition by season so stats can be normalized within each season (home side)
home_data = partition(game, ['Season'])

# same partition again, kept separately for the away side
away_data = partition(game, ['Season'])
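
For readers without access to `data_util`, partitioning can be thought of as grouping rows that share the same values in the chosen columns. The sketch below is a plain-Python illustration under that assumption; `partition_rows` and the sample rows are hypothetical and are not the library's implementation.

```python
def partition_rows(rows, columns):
    """Group rows that share the same values in the given columns."""
    groups = {}
    for row in rows:
        key = tuple(row[col] for col in columns)
        groups.setdefault(key, []).append(row)
    # One list of rows per distinct key, in first-seen order.
    return list(groups.values())

# Hypothetical rows for illustration.
sample_rows = [
    {"Season": 2022, "Pts Home": 126},
    {"Season": 2021, "Pts Home": 101},
    {"Season": 2022, "Pts Home": 120},
]

by_season = partition_rows(sample_rows, ["Season"])
print(len(by_season))                      # 2
print([len(group) for group in by_season])  # [2, 1]
```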

Normalize the game statistics within each season, then partition each season's table by team

In [57]:
# normalize team stats by season
normalize_game_columns = ['Pts Home', 'FG PCT Home', 'FT PCT Home', 'FG3 PCT Home', 'AST Home', 'REB Home',
                          'PTS Away', 'FG PCT Away', 'FT PCT Away', 'FG3 PCT Away', 'AST Away', 'REB Away']
for i in range(len(home_data)):
    for col in normalize_game_columns:
        home_data[i] = normalize(home_data[i], col)
        away_data[i] = normalize(away_data[i], col)
        
# split each season's table into one table per team
season_team_home = []
season_team_away = []

for table in home_data:
    temp_game_data = partition(table, ['Home Team ID'])
    for t in temp_game_data:
        season_team_home.append(t)

for table in away_data:
    temp_game_data = partition(table, ['Visitor Team ID'])
    for t in temp_game_data:
        season_team_away.append(t)
        
print(season_team_away[1])

Date          Game ID    Home Team ID    Visitor Team ID    Season    Pts Home    FG PCT Home    FT PCT Home    FG3 PCT Home    AST Home    REB Home    PTS Away    FG PCT Away    FT PCT Away    FG3 PCT Away    AST Away    REB Away    Home Wins
----------  ---------  --------------  -----------------  --------  ----------  -------------  -------------  --------------  ----------  ----------  ----------  -------------  -------------  --------------  ----------  ----------  -----------
2022-12-22   22200478      1610612762         1610612764      2022    0.576923       0.524217       0.92           0.664165    0.0714286    0.372093    0.484375       0.732523       0.664286        0.465021    0.285714    0.358974            1
2022-12-20   22200464      1610612756         1610612764      2022    0.448718       0.447293       0.648333       0.56848     0.428571     0.465116    0.5            0.471125       0.575714        0.617284    0.357143    0.461538            0
2022-12-18   22200451   
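
The normalized values printed above all fall between 0 and 1, which is consistent with min-max scaling, although the definition of `normalize` is not shown here. Assuming that is the method, a minimal sketch looks like this; `min_max_normalize` and the sample values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical home point totals within one season.
points = [90, 100, 120]
scaled = min_max_normalize(points)
print(scaled)
```

Scaling within a season rather than across all seasons keeps changes in league-wide scoring from dominating the features.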

Create two team tables, one for home teams and one for visitors

In [49]:
teams_labels_home = ['League ID', 'Home Team ID', 'Earliest Year', 'Latest Year', 
                'Home Team Abbreviation', 'Home Team Name', 'Founding Year', 'City',
                'Arena', 'Arena Capacity', 'Owner', 'GM', 'Coach',
                'G League Team']

home_teams = DataTable(teams_labels_home)
teams_labels_away = ['League ID', 'Visitor Team ID', 'Earliest Year', 'Latest Year', 
                'Visitor Team Abbreviation', 'Visitor Team Name', 'Founding Year', 'City',
                'Arena', 'Arena Capacity', 'Owner', 'GM', 'Coach',
                'G League Team']

visitor_teams = DataTable(teams_labels_away)
home_teams.load('teams.csv')
visitor_teams.load('teams.csv')
home_teams.drop(['League ID', 'Earliest Year', 'Latest Year', 'Founding Year', 'Arena', 'League', 'Arena Capacity', 'Owner', 'GM', 'Coach', 'G League Team'])
visitor_teams.drop(['League ID', 'Earliest Year', 'Latest Year', 'Founding Year', 'Arena', 'League', 'Arena Capacity', 'Owner', 'GM', 'Coach', 'G League Team'])
print(visitor_teams)

  Visitor Team ID  Visitor Team Abbreviation    Visitor Team Name    City
-----------------  ---------------------------  -------------------  -------------
       1610612737  ATL                          Hawks                Atlanta
       1610612738  BOS                          Celtics              Boston
       1610612740  NOP                          Pelicans             New Orleans
       1610612741  CHI                          Bulls                Chicago
       1610612742  DAL                          Mavericks            Dallas
       1610612743  DEN                          Nuggets              Denver
       1610612745  HOU                          Rockets              Houston
       1610612746  LAC                          Clippers             Los Angeles
       1610612747  LAL                          Lakers               Los Angeles
       1610612748  MIA                          Heat                 Miami
       1610612749  MIL                          Bucks             

Create combined tables that include the names of the home and away teams

In [50]:
temp_combined = []
teams_combined_home = []
teams_combined_away = []

for table in season_team_home:
    temp_combined.append(table.combine(table, home_teams, ['Home Team ID']))
    
for table in temp_combined:
    teams_combined_home.append(table.combine(table, visitor_teams, ['Visitor Team ID']))
    
temp_combined = []
for table in season_team_away:
    temp_combined.append(table.combine(table, visitor_teams, ['Visitor Team ID']))
    
for table in temp_combined:
    teams_combined_away.append(table.combine(table, home_teams, ['Home Team ID']))
print(teams_combined_away[1])

Date          Game ID    Home Team ID    Visitor Team ID    Season    Pts Home    FG PCT Home    FT PCT Home    FG3 PCT Home    AST Home    REB Home    PTS Away    FG PCT Away    FT PCT Away    FG3 PCT Away    AST Away    REB Away    Home Wins  Visitor Team Abbreviation    Visitor Team Name    City        Home Team Abbreviation    Home Team Name
----------  ---------  --------------  -----------------  --------  ----------  -------------  -------------  --------------  ----------  ----------  ----------  -------------  -------------  --------------  ----------  ----------  -----------  ---------------------------  -------------------  ----------  ------------------------  ----------------
2022-11-27   22200294      1610612738         1610612764      2022         130          0.55           1               0.471          25          38         121          0.537          0.788           0.25           23          29            1  WAS                          Wizards              Washing
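
The combine step behaves like a join on a shared ID column. The sketch below shows that idea in plain Python, assuming an inner join; it is not the `DataTable.combine` implementation. The `join_on` helper and the game rows are hypothetical, while the team IDs and names are taken from the teams table printed earlier.

```python
def join_on(left, right, column):
    """Inner join: merge each left row with the right row sharing `column`."""
    lookup = {row[column]: row for row in right}
    joined = []
    for row in left:
        match = lookup.get(row[column])
        if match is not None:
            # Right-hand values win if both rows share a non-key column.
            joined.append({**row, **match})
    return joined

# Hypothetical game rows.
sample_games = [
    {"Game ID": 1, "Home Team ID": 1610612738},
    {"Game ID": 2, "Home Team ID": 1610612737},
    {"Game ID": 3, "Home Team ID": 999},  # no matching team, so it is dropped
]
sample_teams = [
    {"Home Team ID": 1610612737, "Home Team Name": "Hawks"},
    {"Home Team ID": 1610612738, "Home Team Name": "Celtics"},
]

named_games = join_on(sample_games, sample_teams, "Home Team ID")
print([g["Home Team Name"] for g in named_games])  # ['Celtics', 'Hawks']
```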

In [53]:
ranking_labels = ['Home Team ID', 'League ID', 'Season ID', 'Date', 'Conference',
                  'Team Name', 'Games Played', 'Wins', 'Losses', 'Win PCT',
                  'Home Record', 'Road Record', 'Return To Play']
rankings_home = DataTable(ranking_labels)

ranking_labels[0] = 'Visitor Team ID'
rankings_away = DataTable(ranking_labels)

rankings_home.load('ranking.csv')
rankings_home.drop(['League ID', 'Conference', 'Return To Play', 'Season ID'])
rankings_home = remove_duplicates(rankings_home)
rankings_home = remove_missing(rankings_home, rankings_home.columns())

rankings_away.load('ranking.csv')
rankings_away.drop(['League ID', 'Conference', 'Return To Play', 'Season ID'])
rankings_away = remove_duplicates(rankings_away)
rankings_away = remove_missing(rankings_away, rankings_away.columns())
print(rankings_away[0])

  Visitor Team ID  Date        Team Name      Games Played    Wins    Losses    Win PCT  Home Record    Road Record
-----------------  ----------  -----------  --------------  ------  --------  ---------  -------------  -------------
       1610612743  2022-12-22  Denver                   30      19        11      0.633  10-3           9-8
