# 2020 Tournament Challenge
### This notebook will generate a 2020 bracket, it will alllow us to see what could have happened had this years tournament been played.  We based our seeds on the KenPom Adjusted EM metric. 

#### Hard to know exactly what tournament committee would have done since they have no objective strategy for placing teams in seeds and regions.

##### Install dependencies and import packages

In [1]:
pip install xgboost

Collecting xgboost
[?25l  Downloading https://files.pythonhosted.org/packages/70/91/551d37ba472bcbd70a25e667acc65a18a9d053657b13afcf0f87aa24d7bb/xgboost-1.0.2-py3-none-manylinux1_x86_64.whl (109.7MB)
[K     |████████████████████████████████| 109.8MB 29.3MB/s eta 0:00:01
Installing collected packages: xgboost
Successfully installed xgboost-1.0.2
Note: you may need to restart the kernel to use updated packages.


In [2]:
pip install -U scikit-learn

Collecting scikit-learn
[?25l  Downloading https://files.pythonhosted.org/packages/41/b6/126263db075fbcc79107749f906ec1c7639f69d2d017807c6574792e517e/scikit_learn-0.22.2.post1-cp37-cp37m-manylinux1_x86_64.whl (7.1MB)
[K     |████████████████████████████████| 7.1MB 2.4MB/s eta 0:00:01
Collecting joblib>=0.11 (from scikit-learn)
[?25l  Downloading https://files.pythonhosted.org/packages/28/5c/cf6a2b65a321c4a209efcdf64c2689efae2cb62661f8f6f4bb28547cf1bf/joblib-0.14.1-py2.py3-none-any.whl (294kB)
[K     |████████████████████████████████| 296kB 20.1MB/s eta 0:00:01
Installing collected packages: joblib, scikit-learn
  Found existing installation: scikit-learn 0.20.3
    Uninstalling scikit-learn-0.20.3:
      Successfully uninstalled scikit-learn-0.20.3
Successfully installed joblib-0.14.1 scikit-learn-0.22.2.post1
Note: you may need to restart the kernel to use updated packages.


In [3]:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings("ignore")
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from datetime import datetime, timedelta
from pandas.io.json import json_normalize
import json
import sys
import urllib.request
from bs4 import BeautifulSoup
import re
from sklearn.externals import joblib
import xgboost as xgb
from xgboost import XGBClassifier
from xgboost import XGBRegressor
from sklearn.metrics import accuracy_score
from plotnine import *
import math
from contextlib import suppress

Read in the gamedatacleanedelo csv and subset for 2020 season

In [22]:
gamedata = pd.read_csv('~/jupyter/capstone_Group10/data/gamedatacleanedelo.csv')
gamedata = gamedata[gamedata['season']>=2020]
gamedata.rename(columns = {'homeSeasonElo':'hometeam_seasonelo','awaySeasonElo':'awayteam_seasonelo','homedist': 'hometeam_dist', 'awaydist': 'awayteam_dist'}, inplace = True)
gamedata = gamedata[['hometeam','hometeam_conference','hometeam_adjem']]
gamedata.sort_values("hometeam_adjem", inplace = True, ascending=False)
gamedata.head()

Unnamed: 0,hometeam,hometeam_conference,hometeam_adjem
18267,KANSAS,big-12,30.23
21354,KANSAS,big-12,30.23
18948,KANSAS,big-12,30.23
21028,KANSAS,big-12,30.23
19227,KANSAS,big-12,30.23


Creating DataFrame with each teams conference and home team adjusted em . Adj EM is how KenPom determines the rankings for each team.

In [23]:
gamedatateam = gamedata.groupby(['hometeam','hometeam_conference']).max()
gamedatateam.sort_values("hometeam_adjem", inplace = True, ascending=False)
gamedatateam = gamedatateam.reset_index()
gamedatateam = gamedatateam[['hometeam','hometeam_adjem']]

In [24]:
gamedataconf = gamedata.groupby(['hometeam_conference']).max()
gamedataconf.sort_values("hometeam_adjem", inplace = True, ascending=False)
gamedataconf = gamedataconf.reset_index()
gamedataconf = gamedataconf[['hometeam_conference','hometeam_adjem']]

Merging gamedataconf with gamedata 

In [25]:
gamedatatourn = pd.merge(gamedataconf, gamedata, how="inner", left_on= "hometeam_adjem", right_on ='hometeam_adjem')

Dropping duplicates from the dataframe

In [26]:
gamedatatourn.drop_duplicates(subset ='hometeam', 
                     keep = 'first', inplace = True)

Dropping and renaming columns from the created dataframe 

In [27]:
gamedatatourn.drop(['hometeam_conference_y'], axis = 1, inplace = True)
gamedatatourn.rename(columns = {'hometeam_conference_x':'hometeam_conference'}, inplace = True)

In [50]:
gamedatatourney = pd.merge(gamedatatourn, gamedata, how="right", left_on= "hometeam", right_on ='hometeam')
gamedatatourney.sort_values(["hometeam_adjem_x","hometeam_adjem_y"], inplace = True, ascending=False)
gamedatatourney.drop_duplicates(subset ='hometeam', 
                     keep = 'first', inplace = True)

The NCAA tournament selects the top 64 teams for the tournament. They do have a play in game but we did not include that in our tournament simulation. 

In [51]:
gamedatatourney = gamedatatourney[0:64]

In [52]:
gamedatatourney.sort_values("hometeam_adjem_y", inplace = True, ascending=False)
gamedatatourney.drop(['hometeam_adjem_x','hometeam_conference_x'], axis = 1, inplace = True)
gamedatatourney.rename(columns = {'hometeam_adjem_y':'hometeam_adjem','hometeam_conference_y':'hometeam_conference'}, inplace = True)
gamedatatourney = gamedatatourney.reset_index()
gamedatatourney.drop(['index'], axis = 1, inplace = True)

We then created seeds for each team in the dataframe

In [53]:
seed =[1]
j = 1
i = 1
for index, row in gamedatatourney.iterrows():
    #print(index,i)
    if i <= 3:
        seed.append(j)
        i +=1
    else:
        j +=1
        if j <= 16:
            seed.append(j)
        i = 1
    
gamedatatourney['seed'] = seed        

In [54]:
gamedatatourney

Unnamed: 0,hometeam,gamedatekey,season,date,location,winner,awayteam,home_points,away_points,home_teamname_season,...,awayteam_strength_of_schedule,awayteam_three_point_attempt_rate,awayteam_two_point_field_goal_percentage,awayteam_win_percentage,awayteam_total_wins,hometeam_dist,awayteam_dist,hometeam_seasonelo,awayteam_seasonelo,seed
0,KANSAS,2019-11-08-21-kansas,2020,"November 8, 2019","Allen Fieldhouse, Lawrence, Kansas",Home,NORTH_CAROLINA_GREENSBORO,74,62,KANSAS_2020,...,-1.48,0.422,0.511,0.719,23.0,0.000000,1396.463834,1836.887933,1653.308768,1
1,GONZAGA,2019-11-05-20-gonzaga,2020,"November 5, 2019","McCarthey Athletic Center, Spokane, Washington",Home,ALABAMA_STATE,95,64,GONZAGA_2020,...,-6.53,0.390,0.441,0.250,8.0,0.000000,3116.195091,1903.090698,1290.555325,1
2,BAYLOR,2019-11-05-12-baylor,2020,"November 5, 2019","Ferrell Center, Waco, Texas",Home,CENTRAL_ARKANSAS,105,61,BAYLOR_2020,...,-2.70,0.404,0.498,0.323,10.0,0.000000,583.642863,1763.292228,1388.577035,1
3,DAYTON,2019-11-09-19-dayton,2020,"November 9, 2019","University of Dayton Arena, Dayton, Ohio",Home,INDIANA_STATE,86,81,DAYTON_2020,...,1.81,0.343,0.492,0.600,18.0,0.000000,276.208569,1757.944092,1498.138337,1
4,DUKE,2019-11-05-19-duke,2020,"November 5, 2019","Madison Square Garden (IV), New York, New York",Home,KANSAS,68,66,DUKE_2020,...,11.63,0.329,0.553,0.903,28.0,681.970288,1821.235524,1839.881047,1836.887933,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
59,SIENA,2019-11-05-19-siena,2020,"November 5, 2019","Times Union Center , Albany, New York",Home,AMERICAN,96,80,SIENA_2020,...,-6.50,0.348,0.519,0.533,16.0,0.000000,498.985211,1527.054921,1473.729613,15
60,VALPARAISO,2019-11-05-20-valparaiso,2020,"November 5, 2019","Athletics-Recreation Center, Valparaiso, Indiana",Home,TOLEDO,79,77,VALPARAISO_2020,...,-1.47,0.420,0.486,0.531,17.0,0.000000,286.901631,1530.900523,1588.051118,16
61,SAINT_FRANCIS_PA,2019-11-16-19-saint-francis-pa,2020,"November 16, 2019","DeGol Arena, Loretto, Pennsylvania",Home,AMERICAN,79,76,STATE_FRANCIS_PA_2020,...,-6.50,0.348,0.519,0.533,16.0,0.000000,218.957979,,1473.729613,16
62,PRAIRIE_VIEW,2019-11-23-19-prairie-view,2020,"November 23, 2019","CBU Events Center, Riverside, California",Home,CENTRAL_ARKANSAS,78,72,PRAIRIE_VIEW_A_M_2020,...,-2.70,0.404,0.498,0.323,10.0,2060.965357,2285.080838,,1388.577035,16


We begin seperating the teams into 2 groups of 32 teams based on seed

In [55]:
gamedatatourneya = gamedatatourney[0:32]
gamedatatourneya = gamedatatourneya.reset_index()
gamedatatourneya.drop(['index'], axis = 1, inplace = True)

In [56]:
gamedatatourneya 

Unnamed: 0,hometeam,gamedatekey,season,date,location,winner,awayteam,home_points,away_points,home_teamname_season,...,awayteam_strength_of_schedule,awayteam_three_point_attempt_rate,awayteam_two_point_field_goal_percentage,awayteam_win_percentage,awayteam_total_wins,hometeam_dist,awayteam_dist,hometeam_seasonelo,awayteam_seasonelo,seed
0,KANSAS,2019-11-08-21-kansas,2020,"November 8, 2019","Allen Fieldhouse, Lawrence, Kansas",Home,NORTH_CAROLINA_GREENSBORO,74,62,KANSAS_2020,...,-1.48,0.422,0.511,0.719,23.0,0.0,1396.463834,1836.887933,1653.308768,1
1,GONZAGA,2019-11-05-20-gonzaga,2020,"November 5, 2019","McCarthey Athletic Center, Spokane, Washington",Home,ALABAMA_STATE,95,64,GONZAGA_2020,...,-6.53,0.39,0.441,0.25,8.0,0.0,3116.195091,1903.090698,1290.555325,1
2,BAYLOR,2019-11-05-12-baylor,2020,"November 5, 2019","Ferrell Center, Waco, Texas",Home,CENTRAL_ARKANSAS,105,61,BAYLOR_2020,...,-2.7,0.404,0.498,0.323,10.0,0.0,583.642863,1763.292228,1388.577035,1
3,DAYTON,2019-11-09-19-dayton,2020,"November 9, 2019","University of Dayton Arena, Dayton, Ohio",Home,INDIANA_STATE,86,81,DAYTON_2020,...,1.81,0.343,0.492,0.6,18.0,0.0,276.208569,1757.944092,1498.138337,1
4,DUKE,2019-11-05-19-duke,2020,"November 5, 2019","Madison Square Garden (IV), New York, New York",Home,KANSAS,68,66,DUKE_2020,...,11.63,0.329,0.553,0.903,28.0,681.970288,1821.235524,1839.881047,1836.887933,2
5,SAN_DIEGO_STATE,2019-11-05-22-san-diego-state,2020,"November 5, 2019","Viejas Arena, San Diego, California",Home,TEXAS_SOUTHERN,77,42,SAN_DIEGO_STATE_2020,...,-6.05,0.286,0.464,0.5,16.0,0.0,2087.607675,1759.929361,1470.459359,2
6,MICHIGAN_STATE,2019-11-10-19-michigan-state,2020,"November 10, 2019","Breslin Events Center, East Lansing, Michigan",Home,BINGHAMTON,100,47,MICHIGAN_STATE_2020,...,-4.62,0.479,0.482,0.345,10.0,0.0,702.759717,1806.394251,1328.989112,2
7,OHIO_STATE,2019-11-06-20-ohio-state,2020,"November 6, 2019","Value City Arena, Columbus, Ohio",Home,CINCINNATI,64,56,OHIO_STATE_2020,...,6.73,0.361,0.517,0.667,20.0,0.0,160.610646,1698.662248,1759.088167,2
8,LOUISVILLE,2019-11-10-14-louisville,2020,"November 10, 2019","KFC Yum! Center, Louisville, Kentucky",Home,YOUNGSTOWN_STATE,78,55,LOUISVILLE_2020,...,-4.79,0.372,0.491,0.545,18.0,0.0,539.568616,1738.87845,1406.397903,3
9,WEST_VIRGINIA,2019-11-08-19-west-virginia,2020,"November 8, 2019","WVU Coliseum, Morgantown, West Virginia",Home,AKRON,94,84,WEST_VIRGINIA_2020,...,-0.4,0.444,0.504,0.774,24.0,0.0,204.676842,1664.263717,1615.024769,3


In [57]:
gamedatatourney.sort_values("hometeam_adjem", inplace = True, ascending=True)

In [58]:
gamedatatourneyb = gamedatatourney[0:32]
gamedatatourneyb = gamedatatourneyb.reset_index()
gamedatatourneyb.drop(['index'], axis = 1, inplace = True)

In [59]:
gamedatatourneyb.head(10)

Unnamed: 0,hometeam,gamedatekey,season,date,location,winner,awayteam,home_points,away_points,home_teamname_season,...,awayteam_strength_of_schedule,awayteam_three_point_attempt_rate,awayteam_two_point_field_goal_percentage,awayteam_win_percentage,awayteam_total_wins,hometeam_dist,awayteam_dist,hometeam_seasonelo,awayteam_seasonelo,seed
0,NORFOLK_STATE,2019-12-01-16-norfolk-state,2020,"December 1, 2019","Echols Memorial Hall, Norfolk, Virginia",Away,NIAGARA,61,65,NORFOLK_STATE_2020,...,-4.33,0.366,0.462,0.375,12.0,0.0,738.229387,1456.337192,1370.767649,16
1,PRAIRIE_VIEW,2019-11-23-19-prairie-view,2020,"November 23, 2019","CBU Events Center, Riverside, California",Home,CENTRAL_ARKANSAS,78,72,PRAIRIE_VIEW_A_M_2020,...,-2.7,0.404,0.498,0.323,10.0,2060.965357,2285.080838,,1388.577035,16
2,SAINT_FRANCIS_PA,2019-11-16-19-saint-francis-pa,2020,"November 16, 2019","DeGol Arena, Loretto, Pennsylvania",Home,AMERICAN,79,76,STATE_FRANCIS_PA_2020,...,-6.5,0.348,0.519,0.533,16.0,0.0,218.957979,,1473.729613,16
3,VALPARAISO,2019-11-05-20-valparaiso,2020,"November 5, 2019","Athletics-Recreation Center, Valparaiso, Indiana",Home,TOLEDO,79,77,VALPARAISO_2020,...,-1.47,0.42,0.486,0.531,17.0,0.0,286.901631,1530.900523,1588.051118,16
4,SIENA,2019-11-05-19-siena,2020,"November 5, 2019","Times Union Center , Albany, New York",Home,AMERICAN,96,80,SIENA_2020,...,-6.5,0.348,0.519,0.533,16.0,0.0,498.985211,1527.054921,1473.729613,15
5,WINTHROP,2019-11-21-19-winthrop,2020,"November 21, 2019","Winthrop Coliseum, Rock Hill, South Carolina",Away,TENNESSEE_TECH,58,61,WINTHROP_2020,...,-4.67,0.425,0.473,0.29,9.0,0.0,428.874865,1588.115624,1307.23561,15
6,WRIGHT_STATE,2019-11-16-19-wright-state,2020,"November 16, 2019","Ervin J. Nutter Center, Dayton, Ohio",Away,KENT_STATE,71,72,WRIGHT_STATE_2020,...,-0.72,0.405,0.532,0.625,20.0,0.0,274.83961,1594.707187,1557.371674,15
7,COLGATE,2019-11-06-19-colgate,2020,"November 6, 2019","Cotterell Court, Hamilton, New York",Home,NJIT,80,75,COLGATE_2020,...,-4.83,0.333,0.453,0.3,9.0,0.0,256.695872,1632.082892,1405.708055,15
8,NORTH_DAKOTA_STATE,2019-11-11-14-north-dakota-state,2020,"November 11, 2019","Scheels Center, Fargo, North Dakota",Home,CAL_POLY,74,67,NORTH_DAKOTA_STATE_2020,...,-0.42,0.34,0.459,0.233,7.0,0.0,2364.597375,1576.351718,1236.510099,14
9,WESTERN_KENTUCKY,2019-11-05-20-western-kentucky,2020,"November 5, 2019","E.A. Diddle Arena, Bowling Green, Kentucky",Home,TENNESSEE_TECH,76,64,WESTERN_KENTUCKY_2020,...,-4.67,0.425,0.473,0.29,9.0,0.0,123.459658,1610.724823,1307.23561,14


Merging the dataframes to get match-ups for each team in the tournmanet.

In [60]:
gamedatatourney = pd.merge(gamedatatourneya,gamedatatourneyb, how="inner",left_index=True, right_index=True)

The same as the NCAA tournment we see the highest seed play the lowest seed and so on. Our first matchup is the 16 seed vs the 1 seed. 

In [61]:
gamedatatourney.head()

Unnamed: 0,hometeam_x,gamedatekey_x,season_x,date_x,location_x,winner_x,awayteam_x,home_points_x,away_points_x,home_teamname_season_x,...,awayteam_strength_of_schedule_y,awayteam_three_point_attempt_rate_y,awayteam_two_point_field_goal_percentage_y,awayteam_win_percentage_y,awayteam_total_wins_y,hometeam_dist_y,awayteam_dist_y,hometeam_seasonelo_y,awayteam_seasonelo_y,seed_y
0,KANSAS,2019-11-08-21-kansas,2020,"November 8, 2019","Allen Fieldhouse, Lawrence, Kansas",Home,NORTH_CAROLINA_GREENSBORO,74,62,KANSAS_2020,...,-4.33,0.366,0.462,0.375,12.0,0.0,738.229387,1456.337192,1370.767649,16
1,GONZAGA,2019-11-05-20-gonzaga,2020,"November 5, 2019","McCarthey Athletic Center, Spokane, Washington",Home,ALABAMA_STATE,95,64,GONZAGA_2020,...,-2.7,0.404,0.498,0.323,10.0,2060.965357,2285.080838,,1388.577035,16
2,BAYLOR,2019-11-05-12-baylor,2020,"November 5, 2019","Ferrell Center, Waco, Texas",Home,CENTRAL_ARKANSAS,105,61,BAYLOR_2020,...,-6.5,0.348,0.519,0.533,16.0,0.0,218.957979,,1473.729613,16
3,DAYTON,2019-11-09-19-dayton,2020,"November 9, 2019","University of Dayton Arena, Dayton, Ohio",Home,INDIANA_STATE,86,81,DAYTON_2020,...,-1.47,0.42,0.486,0.531,17.0,0.0,286.901631,1530.900523,1588.051118,16
4,DUKE,2019-11-05-19-duke,2020,"November 5, 2019","Madison Square Garden (IV), New York, New York",Home,KANSAS,68,66,DUKE_2020,...,-6.5,0.348,0.519,0.533,16.0,0.0,498.985211,1527.054921,1473.729613,15


Now adding the round 4 region location to include distances teams traveled for matchups. This is based on where each region would have been played. 

In [62]:
gamedatatourney = gamedatatourney[['hometeam_x','hometeam_y','seed_x','seed_y']]
gamedatatourney.rename(columns = {'hometeam_x':'hometeam','hometeam_y':'awayteam'}, inplace = True)
gamedatatourney = gamedatatourney.reset_index()
gamedatatourney['R4Location'] ='Atlanta'
gamedatatourney.head()


Unnamed: 0,index,hometeam,awayteam,seed_x,seed_y,R4Location
0,0,KANSAS,NORFOLK_STATE,1,16,Atlanta
1,1,GONZAGA,PRAIRIE_VIEW,1,16,Atlanta
2,2,BAYLOR,SAINT_FRANCIS_PA,1,16,Atlanta
3,3,DAYTON,VALPARAISO,1,16,Atlanta
4,4,DUKE,SIENA,2,15,Atlanta


In [63]:
first = gamedatatourney[0:1]
print(first)
second = gamedatatourney[31:32]
print(second)
third = gamedatatourney[19:20]
print(third)
fourth = gamedatatourney[15:16]
print(fourth)
fifth = gamedatatourney[23:24]
print(fifth)
sixth = gamedatatourney[11:12]
print(sixth)
seventh = gamedatatourney[27:28]
print(seventh)
eigth = gamedatatourney[7:8]
print(eigth)
bracket1 = pd.concat([first,second,third,fourth,fifth,sixth,seventh,eigth])
bracket1.head(16)

   index hometeam       awayteam  seed_x  seed_y R4Location
0      0   KANSAS  NORFOLK_STATE       1      16    Atlanta
    index hometeam awayteam  seed_x  seed_y R4Location
31     31  FLORIDA   AUBURN       8       9    Atlanta
    index    hometeam awayteam  seed_x  seed_y R4Location
19     19  SETON_HALL  VERMONT       5      12    Atlanta
    index  hometeam     awayteam  seed_x  seed_y R4Location
15     15  MICHIGAN  TEXAS_STATE       4      13    Atlanta
    index hometeam              awayteam  seed_x  seed_y R4Location
23     23   PURDUE  EAST_TENNESSEE_STATE       6      11    Atlanta
    index   hometeam awayteam  seed_x  seed_y R4Location
11     11  CREIGHTON  BELMONT       3      14    Atlanta
    index hometeam         awayteam  seed_x  seed_y R4Location
27     27  RUTGERS  LOUISIANA_STATE       7      10    Atlanta
   index    hometeam awayteam  seed_x  seed_y R4Location
7      7  OHIO_STATE  COLGATE       2      15    Atlanta


Unnamed: 0,index,hometeam,awayteam,seed_x,seed_y,R4Location
0,0,KANSAS,NORFOLK_STATE,1,16,Atlanta
31,31,FLORIDA,AUBURN,8,9,Atlanta
19,19,SETON_HALL,VERMONT,5,12,Atlanta
15,15,MICHIGAN,TEXAS_STATE,4,13,Atlanta
23,23,PURDUE,EAST_TENNESSEE_STATE,6,11,Atlanta
11,11,CREIGHTON,BELMONT,3,14,Atlanta
27,27,RUTGERS,LOUISIANA_STATE,7,10,Atlanta
7,7,OHIO_STATE,COLGATE,2,15,Atlanta


In [64]:
first = gamedatatourney[1:2]
print(first)
second = gamedatatourney[30:31]
print(second)
third = gamedatatourney[18:19]
print(third)
fourth = gamedatatourney[14:15]
print(fourth)
fifth = gamedatatourney[22:23]
print(fifth)
sixth = gamedatatourney[10:11]
print(sixth)
seventh = gamedatatourney[26:27]
print(seventh)
eigth = gamedatatourney[6:7]
print(eigth)
bracket2 = pd.concat([first,second,third,fourth,fifth,sixth,seventh,eigth])
bracket2.head(16)

   index hometeam      awayteam  seed_x  seed_y R4Location
1      1  GONZAGA  PRAIRIE_VIEW       1      16    Atlanta
    index   hometeam awayteam  seed_x  seed_y R4Location
30     30  MARQUETTE  INDIANA       8       9    Atlanta
    index hometeam awayteam  seed_x  seed_y R4Location
18     18  ARIZONA  LIBERTY       5      12    Atlanta
    index       hometeam          awayteam  seed_x  seed_y R4Location
14     14  FLORIDA_STATE  STEPHEN_F_AUSTIN       4      13    Atlanta
    index hometeam awayteam  seed_x  seed_y R4Location
22     22     IOWA     YALE       6      11    Atlanta
    index  hometeam awayteam  seed_x  seed_y R4Location
10     10  MARYLAND     UTAH       3      14    Atlanta
    index   hometeam       awayteam  seed_x  seed_y R4Location
26     26  MINNESOTA  WICHITA_STATE       7      10    Atlanta
   index        hometeam      awayteam  seed_x  seed_y R4Location
6      6  MICHIGAN_STATE  WRIGHT_STATE       2      15    Atlanta


Unnamed: 0,index,hometeam,awayteam,seed_x,seed_y,R4Location
1,1,GONZAGA,PRAIRIE_VIEW,1,16,Atlanta
30,30,MARQUETTE,INDIANA,8,9,Atlanta
18,18,ARIZONA,LIBERTY,5,12,Atlanta
14,14,FLORIDA_STATE,STEPHEN_F_AUSTIN,4,13,Atlanta
22,22,IOWA,YALE,6,11,Atlanta
10,10,MARYLAND,UTAH,3,14,Atlanta
26,26,MINNESOTA,WICHITA_STATE,7,10,Atlanta
6,6,MICHIGAN_STATE,WRIGHT_STATE,2,15,Atlanta


In [65]:
first = gamedatatourney[2:3]
print(first)
second = gamedatatourney[29:30]
print(second)
third = gamedatatourney[17:18]
print(third)
fourth = gamedatatourney[13:14]
print(fourth)
fifth = gamedatatourney[21:22]
print(fifth)
sixth = gamedatatourney[9:10]
print(sixth)
seventh = gamedatatourney[25:26]
print(seventh)
eigth = gamedatatourney[5:6]
print(eigth)
bracket3 = pd.concat([first,second,third,fourth,fifth,sixth,seventh,eigth])
bracket3.head(16)

   index hometeam          awayteam  seed_x  seed_y R4Location
2      2   BAYLOR  SAINT_FRANCIS_PA       1      16    Atlanta
    index  hometeam  awayteam  seed_x  seed_y R4Location
29     29  ILLINOIS  COLORADO       8       9    Atlanta
    index   hometeam awayteam  seed_x  seed_y R4Location
17     17  VILLANOVA    AKRON       5      12    Atlanta
    index hometeam           awayteam  seed_x  seed_y R4Location
13     13  HOUSTON  CALIFORNIA_IRVINE       4      13    Atlanta
    index   hometeam        awayteam  seed_x  seed_y R4Location
21     21  WISCONSIN  LOUISIANA_TECH       6      11    Atlanta
   index       hometeam          awayteam  seed_x  seed_y R4Location
9      9  WEST_VIRGINIA  WESTERN_KENTUCKY       3      14    Atlanta
    index    hometeam        awayteam  seed_x  seed_y R4Location
25     25  PENN_STATE  SAINT_MARYS_CA       7      10    Atlanta
   index         hometeam  awayteam  seed_x  seed_y R4Location
5      5  SAN_DIEGO_STATE  WINTHROP       2      15    At

Unnamed: 0,index,hometeam,awayteam,seed_x,seed_y,R4Location
2,2,BAYLOR,SAINT_FRANCIS_PA,1,16,Atlanta
29,29,ILLINOIS,COLORADO,8,9,Atlanta
17,17,VILLANOVA,AKRON,5,12,Atlanta
13,13,HOUSTON,CALIFORNIA_IRVINE,4,13,Atlanta
21,21,WISCONSIN,LOUISIANA_TECH,6,11,Atlanta
9,9,WEST_VIRGINIA,WESTERN_KENTUCKY,3,14,Atlanta
25,25,PENN_STATE,SAINT_MARYS_CA,7,10,Atlanta
5,5,SAN_DIEGO_STATE,WINTHROP,2,15,Atlanta


In [66]:
first = gamedatatourney[3:4]
print(first)
second = gamedatatourney[28:29]
print(second)
third = gamedatatourney[16:17]
print(third)
fourth = gamedatatourney[12:13]
print(fourth)
fifth = gamedatatourney[20:21]
print(fifth)
sixth = gamedatatourney[8:9]
print(sixth)
seventh = gamedatatourney[24:25]
print(seventh)
eigth = gamedatatourney[4:5]
print(eigth)
bracket4 = pd.concat([first,second,third,fourth,fifth,sixth,seventh,eigth])
bracket4.head(16)

   index hometeam    awayteam  seed_x  seed_y R4Location
3      3   DAYTON  VALPARAISO       1      16    Atlanta
    index  hometeam  awayteam  seed_x  seed_y R4Location
28     28  KENTUCKY  OKLAHOMA       8       9    Atlanta
    index hometeam          awayteam  seed_x  seed_y R4Location
16     16   OREGON  NEW_MEXICO_STATE       5      12    Atlanta
    index       hometeam awayteam  seed_x  seed_y R4Location
12     12  BRIGHAM_YOUNG  HOFSTRA       4      13    Atlanta
    index    hometeam           awayteam  seed_x  seed_y R4Location
20     20  TEXAS_TECH  NORTHERN_COLORADO       6      11    Atlanta
   index    hometeam            awayteam  seed_x  seed_y R4Location
8      8  LOUISVILLE  NORTH_DAKOTA_STATE       3      14    Atlanta
    index hometeam       awayteam  seed_x  seed_y R4Location
24     24   BUTLER  NORTHERN_IOWA       7      10    Atlanta
   index hometeam awayteam  seed_x  seed_y R4Location
4      4     DUKE    SIENA       2      15    Atlanta


Unnamed: 0,index,hometeam,awayteam,seed_x,seed_y,R4Location
3,3,DAYTON,VALPARAISO,1,16,Atlanta
28,28,KENTUCKY,OKLAHOMA,8,9,Atlanta
16,16,OREGON,NEW_MEXICO_STATE,5,12,Atlanta
12,12,BRIGHAM_YOUNG,HOFSTRA,4,13,Atlanta
20,20,TEXAS_TECH,NORTHERN_COLORADO,6,11,Atlanta
8,8,LOUISVILLE,NORTH_DAKOTA_STATE,3,14,Atlanta
24,24,BUTLER,NORTHERN_IOWA,7,10,Atlanta
4,4,DUKE,SIENA,2,15,Atlanta


Loading gamedata for 2020 then rename homeSeasonElo column

In [67]:
gamedata = pd.read_csv('~/jupyter/capstone_Group10/data/gamedatacleanedelo.csv')
gamedata = gamedata[gamedata['season']==2020]
#gamedata.drop(columns=['Season'],inplace =True)
gamedata.rename(columns = {'homeSeasonElo':'hometeam_seasonelo','awaySeasonElo':'awayteam_seasonelo','homedist': 'hometeam_dist', 'awaydist': 'awayteam_dist'}, inplace = True)
#gamedata.drop(['home_lat', 'home_long','away_lat','away_long','loc_lat','loc_long'], axis=1, inplace=True)
gamedata.head()

Unnamed: 0,gamedatekey,season,date,location,winner,awayteam,hometeam,home_points,away_points,home_teamname_season,...,awayteam_steal_percentage,awayteam_strength_of_schedule,awayteam_three_point_attempt_rate,awayteam_two_point_field_goal_percentage,awayteam_win_percentage,awayteam_total_wins,hometeam_dist,awayteam_dist,hometeam_seasonelo,awayteam_seasonelo
16698,2019-11-05-12-baylor,2020,"November 5, 2019","Ferrell Center, Waco, Texas",Home,CENTRAL_ARKANSAS,BAYLOR,105,61,BAYLOR_2020,...,6.8,-2.7,0.404,0.498,0.323,10.0,0.0,583.642863,1763.292228,1388.577035
16699,2019-11-05-13-tulsa,2020,"November 5, 2019","Donald W. Reynolds Center, Tulsa, Oklahoma",Home,HOUSTON_BAPTIST,TULSA,80,72,TULSA_2020,...,7.2,-3.62,0.33,0.483,0.138,4.0,0.0,718.750206,1605.99366,1267.899912
16700,2019-11-05-18-duquesne,2020,"November 5, 2019","CONSOL Energy Center, Pittsburgh, Pennsylvania",Home,PRINCETON,DUQUESNE,94,67,DUQUESNE_2020,...,9.5,-1.48,0.439,0.546,0.519,14.0,0.2825118,452.369898,1562.390578,1529.849362
16701,2019-11-05-18-miami-fl,2020,"November 5, 2019","BankUnited Center, Coral Gables, Florida",Away,LOUISVILLE,MIAMI_FL,74,87,MIAMI_FL_2020,...,7.5,7.55,0.38,0.501,0.774,24.0,3.950437e-13,1486.367403,1574.042091,1738.87845
16702,2019-11-05-18-seton-hall,2020,"November 5, 2019","Walsh Gymnasium, South Orange, New Jersey",Home,WAGNER,SETON_HALL,105,71,SETON_HALL_2020,...,8.6,-5.57,0.377,0.471,0.276,8.0,6.28589,18.891145,1702.799507,1405.253199


Making lists of floats and ints 

In [68]:
#making lists of floats and ints
floats = list(filter(lambda i: gamedata[i].dtypes==np.float64, gamedata.columns))
ints = list(filter(lambda i: gamedata[i].dtypes==np.int64, gamedata.columns))
#merging the lists
columns = ints + floats
columns.remove('home_points')
columns.remove('away_points')


Filtering gamedata for floats and int types for predictions

In [69]:
gamedatacleaned = gamedata[columns]
gamedatacleaned = gamedatacleaned.apply(lambda x: x.fillna(x.mean()),axis=0)

### Getting home and away features 

In [70]:
#getting home and away features
away_cols = [col for col in gamedatacleaned.columns if 'awayteam' in col]
home_cols = [col for col in gamedatacleaned.columns if 'hometeam' in col]
home_cols.append('season')
away_cols.append('season')

### Creating home and away dataframes

In [71]:
#making the home dataframe
homegamedata = pd.DataFrame() #initializing the dataframe
homegamedata = gamedatacleaned[home_cols]
homegamedata['hometeam'] = gamedata['hometeam']
#homegamedata['homepace'] = gamedata['pace']
homegamedata.shape

(5331, 54)

In [72]:
#making the away dataframe
awaygamedata = pd.DataFrame() #initializing the dataframe
awaygamedata = gamedatacleaned[home_cols]
awaygamedata['awayteam'] = gamedata['awayteam']
awaygamedata.head()

Unnamed: 0,hometeam_adjem,hometeam_adj_o,hometeam_adj_d,hometeam_adj_t,hometeam_luck,hometeam_sos_adj_em,hometeam_sos_oppo,hometeam_sos_oppd,hometeam_ncsos_adj,hometeam_assist_percentage,...,hometeam_steal_percentage,hometeam_strength_of_schedule,hometeam_three_point_attempt_rate,hometeam_two_point_field_goal_percentage,hometeam_win_percentage,hometeam_total_wins,hometeam_dist,hometeam_seasonelo,season,awayteam
16698,25.49,113.5,88.1,66.2,0.016,10.2,106.4,96.2,1.38,54.1,...,12.2,9.13,0.369,0.475,0.867,26.0,0.0,1763.292228,2020,CENTRAL_ARKANSAS
16699,9.65,103.2,93.6,65.8,0.076,3.19,103.5,100.3,-4.68,53.7,...,11.1,3.59,0.342,0.502,0.677,21.0,0.0,1605.99366,2020,HOUSTON_BAPTIST
16700,7.58,106.6,99.0,67.8,0.049,0.52,102.8,102.3,-5.04,56.5,...,10.1,1.67,0.431,0.538,0.7,21.0,0.2825118,1562.390578,2020,PRINCETON
16701,6.39,107.7,101.3,68.3,0.034,9.57,106.3,96.8,1.07,38.9,...,8.4,8.66,0.372,0.495,0.484,15.0,3.950437e-13,1574.042091,2020,LOUISVILLE
16702,19.54,112.3,92.7,69.8,0.006,11.48,108.4,96.9,4.86,55.8,...,9.9,9.95,0.405,0.523,0.7,21.0,6.28589,1702.799507,2020,WAGNER


In [73]:
away_cols.append('awayteam')

In [74]:
awaygamedata.columns = away_cols

In [75]:
awaygamedata.head()

Unnamed: 0,awayteam_adjem,awayteam_adj_o,awayteam_adj_d,awayteam_adj_t,awayteam_luck,awayteam_sos_adj_em,awayteam_sos_oppo,awayteam_sos_oppd,awayteam_ncsos_adj,awayteam_assist_percentage,...,awayteam_steal_percentage,awayteam_strength_of_schedule,awayteam_three_point_attempt_rate,awayteam_two_point_field_goal_percentage,awayteam_win_percentage,awayteam_total_wins,awayteam_dist,awayteam_seasonelo,season,awayteam
16698,25.49,113.5,88.1,66.2,0.016,10.2,106.4,96.2,1.38,54.1,...,12.2,9.13,0.369,0.475,0.867,26.0,0.0,1763.292228,2020,CENTRAL_ARKANSAS
16699,9.65,103.2,93.6,65.8,0.076,3.19,103.5,100.3,-4.68,53.7,...,11.1,3.59,0.342,0.502,0.677,21.0,0.0,1605.99366,2020,HOUSTON_BAPTIST
16700,7.58,106.6,99.0,67.8,0.049,0.52,102.8,102.3,-5.04,56.5,...,10.1,1.67,0.431,0.538,0.7,21.0,0.2825118,1562.390578,2020,PRINCETON
16701,6.39,107.7,101.3,68.3,0.034,9.57,106.3,96.8,1.07,38.9,...,8.4,8.66,0.372,0.495,0.484,15.0,3.950437e-13,1574.042091,2020,LOUISVILLE
16702,19.54,112.3,92.7,69.8,0.006,11.48,108.4,96.9,4.86,55.8,...,9.9,9.95,0.405,0.523,0.7,21.0,6.28589,1702.799507,2020,WAGNER


### Summarizing the data by team and revisiting the shape

In [76]:
#summarizing the data by team
homegamedata = homegamedata.groupby(['hometeam']).max()
awaygamedata = awaygamedata.groupby(['awayteam']).max()

In [77]:
print(homegamedata.shape)
print(awaygamedata.shape)

(354, 53)
(353, 53)


Starting the join to the gamedata

In [78]:
#resetting the index for joining again
awaygamedata = awaygamedata.reset_index()
homegamedata = homegamedata.reset_index()

### Distance Calculation, the result is distance traveled to game in kilometers.

In [79]:
def distance(origin, destination):
    lat1, lon1 = origin
    lat2, lon2 = destination
    radius = 6371  # km

    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) * math.sin(dlat / 2) +
         math.cos(math.radians(lat1)) * math.cos(math.radians(lat2)) *
         math.sin(dlon / 2) * math.sin(dlon / 2))
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    d = radius * c

    return d


### Define predictor function to help create bracket

In [80]:
def predictor(df, roundname):
    matchupa = pd.merge(awaygamedata, df, how= 'inner', left_on='awayteam', right_on = 'awayteam')
    matchupa.drop(['hometeam','R12Location','R3Location','R4Location'], axis = 1, inplace = True)
    matchuph = pd.merge(homegamedata, df, how= 'inner', left_on='hometeam', right_on = 'hometeam')
    matchupa.drop(['awayteam'], axis = 1, inplace = True)
    if roundname == 'Round 1':
        matchup = pd.merge(matchuph, matchupa, how= 'inner', left_on='index', right_on = 'index')
    else:
        matchup = pd.merge(matchuph, matchupa, how= 'inner', left_index=True, right_index=True)
    matchup.drop(['season_x'], axis = 1, inplace = True)
    matchup.rename(columns = {'season_y':'season'}, inplace = True)
    team_locations = pd.read_csv(r'~/jupyter/capstone_Group10/data/team_home_locations.csv')
    matchup = pd.merge(matchup, team_locations, how= 'inner', left_on='hometeam', right_on = 'team_name')
    matchup.drop(['team_name', 'Unnamed: 0','home_location'], axis=1, inplace=True)
    matchup["home_lat"] = matchup['lat']
    matchup["home_long"] = matchup['long']
    matchup = pd.merge(matchup, team_locations, how= 'inner', left_on='awayteam', right_on = 'team_name')
    matchup.drop(['team_name', 'Unnamed: 0','home_location'], axis=1, inplace=True)
    matchup["away_lat"] = matchup['lat_y']
    matchup["away_long"] = matchup['long_y']
    matchup.drop(['lat_y', 'long_y','lat_x','long_x'], axis=1, inplace=True)
    site_locations = pd.read_csv(r'~/jupyter/capstone_Group10/data/2020site_locations.csv')
    if roundname == 'Sweet 16' or roundname == 'Elite 8': 
        matchup = pd.merge(matchup, site_locations, how= 'inner', left_on='R3Location', right_on = 'site_name')
    elif roundname == 'Final 4' or roundname == 'Championship':
        matchup = pd.merge(matchup, site_locations, how= 'inner', left_on='R4Location', right_on = 'site_name')
    else: 
        matchup = pd.merge(matchup, site_locations, how= 'inner', left_on='R12Location', right_on = 'site_name')
    homedist=[]
    awaydist=[]

    for index,row in matchup.iterrows():
        #origin --- Home
        hl = row['home_lat']
        hln = row['home_long']
        hloc = (hl,hln)
        #origin --- Away 
        al = row['away_lat']
        aln = row['away_long']
        aloc = (al,aln)
        #destination---- Game Locatiom
        l = row['loc_lat']
        ln = row['loc_long']
        loc = (l,ln)    
        home_result = distance(hloc, loc)
        away_result = distance(aloc, loc)  
        homedist.append(home_result)
        awaydist.append(away_result)  
    matchup['hometeam_dist']=homedist
    matchup['awayteam_dist']=awaydist
    matchup.drop(['home_lat', 'home_long','away_lat','away_long','loc_lat','loc_long'], axis=1, inplace=True)
    df["home_points_pred"] = np.nan
    df["away_points_pred"] = np.nan
    X_test3 = matchup[columns]
    X_test3['neutral'] = 1.
    homemodel = joblib.load('XGBoosthomegridmodelallpca2020.pkl')
    #saving predictions back to the current day matchup's dataframe
    df[['home_points_pred']]=homemodel.predict(X_test3).astype(int)
    X_test3 = matchup[columns]
    X_test3['neutral'] = 1.
    awaymodel = joblib.load('XGBoostawaygridmodelallpca2020.pkl')
    #saving predictions back to the current day matchup's dataframe
    df[['away_points_pred']]=awaymodel.predict(X_test3).astype(int)
    winner = []
    loser =[]
    for index, row in df.iterrows():
        if pd.to_numeric(row['away_points_pred']) > pd.to_numeric(row['home_points_pred']):
            winner.append(row['awayteam'])
            loser.append(row['hometeam'])
        elif pd.to_numeric(row['away_points_pred']) <= pd.to_numeric(row['home_points_pred']):
            winner.append(row['hometeam'])
            loser.append(row['awayteam'])
        else:
            winner.append('No Predictions')
    df['Predicted_Winner'] = winner
    df['Predicted_Loser'] = loser
    df['Round'] = roundname
    
    return df

### Define Matchup function to create matchups as tournament progresses

In [81]:
def matchups(df):
    dfa = df[['Predicted_Winner','R12Location','R3Location','R4Location','seed_x','seed_y']]
    dfa = dfa[::2]
    dfa = dfa.reset_index()
    dfa.drop(['index'], axis = 1, inplace = True)
    dfb = df['Predicted_Winner']
    dfb = dfb.reset_index()
    dfb =  dfb[~dfb["Predicted_Winner"].isin(dfa['Predicted_Winner'].values.tolist())]
    dfb = dfb.reset_index()
    dfb.drop(['index','level_0'], axis = 1, inplace = True)
    df = pd.merge(dfa,dfb, how="inner",left_index=True, right_index=True)
    df.rename(columns = {'Predicted_Winner_x':'hometeam','Predicted_Winner_y':'awayteam'}, inplace = True)
    return df

## Bracket 1 - Midwest Region

Adding Locations for each game in order to calculate distances traveled for each team. The locations are based on where the match-ups would have been played.

In [82]:
bracket1['R12Location'] = ''
bracket1['R12Location'][0:2] ='Saint Louis'
bracket1['R12Location'][2:4] ='Tampa'
bracket1['R12Location'][4:6] ='Cleveland'
bracket1['R12Location'][6:8] ='Omaha'
bracket1['R3Location'] = 'Indianapolis'
bracket1.head(16)

Unnamed: 0,index,hometeam,awayteam,seed_x,seed_y,R4Location,R12Location,R3Location
0,0,KANSAS,NORFOLK_STATE,1,16,Atlanta,Saint Louis,Indianapolis
31,31,FLORIDA,AUBURN,8,9,Atlanta,Saint Louis,Indianapolis
19,19,SETON_HALL,VERMONT,5,12,Atlanta,Tampa,Indianapolis
15,15,MICHIGAN,TEXAS_STATE,4,13,Atlanta,Tampa,Indianapolis
23,23,PURDUE,EAST_TENNESSEE_STATE,6,11,Atlanta,Cleveland,Indianapolis
11,11,CREIGHTON,BELMONT,3,14,Atlanta,Cleveland,Indianapolis
27,27,RUTGERS,LOUISIANA_STATE,7,10,Atlanta,Omaha,Indianapolis
7,7,OHIO_STATE,COLGATE,2,15,Atlanta,Omaha,Indianapolis


### Simulate round 1 for Midwest Region

In [83]:
bracket1 = predictor(bracket1,'Round 1')
bracket1.drop(['index'], axis = 1, inplace = True)
bracket1.head(16)



Unnamed: 0,hometeam,awayteam,seed_x,seed_y,R4Location,R12Location,R3Location,home_points_pred,away_points_pred,Predicted_Winner,Predicted_Loser,Round
0,KANSAS,NORFOLK_STATE,1,16,Atlanta,Saint Louis,Indianapolis,93,77,KANSAS,NORFOLK_STATE,Round 1
31,FLORIDA,AUBURN,8,9,Atlanta,Saint Louis,Indianapolis,80,74,FLORIDA,AUBURN,Round 1
19,SETON_HALL,VERMONT,5,12,Atlanta,Tampa,Indianapolis,80,76,SETON_HALL,VERMONT,Round 1
15,MICHIGAN,TEXAS_STATE,4,13,Atlanta,Tampa,Indianapolis,92,68,MICHIGAN,TEXAS_STATE,Round 1
23,PURDUE,EAST_TENNESSEE_STATE,6,11,Atlanta,Cleveland,Indianapolis,85,76,PURDUE,EAST_TENNESSEE_STATE,Round 1
11,CREIGHTON,BELMONT,3,14,Atlanta,Cleveland,Indianapolis,87,73,CREIGHTON,BELMONT,Round 1
27,RUTGERS,LOUISIANA_STATE,7,10,Atlanta,Omaha,Indianapolis,88,74,RUTGERS,LOUISIANA_STATE,Round 1
7,OHIO_STATE,COLGATE,2,15,Atlanta,Omaha,Indianapolis,78,71,OHIO_STATE,COLGATE,Round 1


In [41]:
bracket1q = matchups(bracket1)

##### Results

In [42]:
bracket1q.head()

Unnamed: 0,hometeam,R12Location,R3Location,R4Location,seed_x,seed_y,awayteam
0,KANSAS,Saint Louis,Indianapolis,Atlanta,1,16,FLORIDA
1,SETON_HALL,Tampa,Indianapolis,Atlanta,5,12,MICHIGAN
2,PURDUE,Cleveland,Indianapolis,Atlanta,6,11,CREIGHTON
3,RUTGERS,Omaha,Indianapolis,Atlanta,7,10,OHIO_STATE


### Simulate Round 2 of Midwest Region

In [43]:
bracket1q = predictor(bracket1q,'Round 2')



In [44]:
bracket1s = matchups(bracket1q)

### Simulate Sweet Sixteen (Round 3) of Midwest Region

In [45]:
bracket1s = predictor(bracket1s,'Sweet 16')



In [46]:
bracket1f = matchups(bracket1s)

### Simulate Elite Eight (Round 4) of Midwest Region

In [47]:
bracket1f = predictor(bracket1f,'Elite 8')



#### Concatenate dataframes to construct final midwest region

In [48]:
midwestregion = pd.concat([bracket1,bracket1q,bracket1s,bracket1f])
midwestregion['Region'] ='MidWest'
#midwestregion.to_csv(r'2020Tourney.csv', index = False)

In [49]:
midwestregion.head(16)

Unnamed: 0,Predicted_Loser,Predicted_Winner,R12Location,R3Location,R4Location,Round,away_points_pred,awayteam,home_points_pred,hometeam,seed_x,seed_y,Region
0,NORFOLK_STATE,KANSAS,Saint Louis,Indianapolis,Atlanta,Round 1,77,NORFOLK_STATE,93,KANSAS,1,16,MidWest
31,AUBURN,FLORIDA,Saint Louis,Indianapolis,Atlanta,Round 1,74,AUBURN,80,FLORIDA,8,9,MidWest
19,VERMONT,SETON_HALL,Tampa,Indianapolis,Atlanta,Round 1,76,VERMONT,80,SETON_HALL,5,12,MidWest
15,TEXAS_STATE,MICHIGAN,Tampa,Indianapolis,Atlanta,Round 1,68,TEXAS_STATE,92,MICHIGAN,4,13,MidWest
23,EAST_TENNESSEE_STATE,PURDUE,Cleveland,Indianapolis,Atlanta,Round 1,76,EAST_TENNESSEE_STATE,85,PURDUE,6,11,MidWest
11,BELMONT,CREIGHTON,Cleveland,Indianapolis,Atlanta,Round 1,73,BELMONT,87,CREIGHTON,3,14,MidWest
27,LOUISIANA_STATE,RUTGERS,Omaha,Indianapolis,Atlanta,Round 1,74,LOUISIANA_STATE,88,RUTGERS,7,10,MidWest
7,COLGATE,OHIO_STATE,Omaha,Indianapolis,Atlanta,Round 1,71,COLGATE,78,OHIO_STATE,2,15,MidWest
0,FLORIDA,KANSAS,Saint Louis,Indianapolis,Atlanta,Round 2,69,FLORIDA,82,KANSAS,1,16,MidWest
1,MICHIGAN,SETON_HALL,Tampa,Indianapolis,Atlanta,Round 2,69,MICHIGAN,75,SETON_HALL,5,12,MidWest


## Bracket 2 (South Region)

#### Adding locations for games played in the south region 

In [50]:
bracket2['R12Location'] = ''
bracket2['R12Location'][0:2] ='Omaha'
bracket2['R12Location'][2:4] ='Greensboro'
bracket2['R12Location'][4:6] ='Saint Louis'
bracket2['R12Location'][6:8] ='Tampa'
bracket2['R3Location'] = 'Houston'

### Simulate Round 1 of South Region

In [51]:
bracket2 = predictor(bracket2,'Round 1')
bracket2.drop(['index'], axis = 1, inplace = True)
bracket2.head(16)



Unnamed: 0,hometeam,awayteam,seed_x,seed_y,R4Location,R12Location,R3Location,home_points_pred,away_points_pred,Predicted_Winner,Predicted_Loser,Round
1,GONZAGA,PRAIRIE_VIEW,1,16,Atlanta,Omaha,Houston,81,71,GONZAGA,PRAIRIE_VIEW,Round 1
30,MARQUETTE,INDIANA,8,9,Atlanta,Omaha,Houston,86,76,MARQUETTE,INDIANA,Round 1
18,ARIZONA,LIBERTY,5,12,Atlanta,Greensboro,Houston,98,80,ARIZONA,LIBERTY,Round 1
14,FLORIDA_STATE,STEPHEN_F_AUSTIN,4,13,Atlanta,Greensboro,Houston,83,80,FLORIDA_STATE,STEPHEN_F_AUSTIN,Round 1
22,IOWA,YALE,6,11,Atlanta,Saint Louis,Houston,88,75,IOWA,YALE,Round 1
10,MARYLAND,UTAH,3,14,Atlanta,Saint Louis,Houston,79,73,MARYLAND,UTAH,Round 1
26,MINNESOTA,SAINT_MARYS_CA,7,10,Atlanta,Tampa,Houston,91,68,MINNESOTA,SAINT_MARYS_CA,Round 1
6,MICHIGAN_STATE,WRIGHT_STATE,2,15,Atlanta,Tampa,Houston,80,78,MICHIGAN_STATE,WRIGHT_STATE,Round 1


### Simulate Round 2 of South Region

In [52]:
bracket2q = matchups(bracket2)
bracket2q = predictor(bracket2q,'Round 2')
bracket2q.head()



Unnamed: 0,hometeam,R12Location,R3Location,R4Location,seed_x,seed_y,awayteam,home_points_pred,away_points_pred,Predicted_Winner,Predicted_Loser,Round
0,GONZAGA,Omaha,Houston,Atlanta,1,16,MARQUETTE,77,76,GONZAGA,MARQUETTE,Round 2
1,ARIZONA,Greensboro,Houston,Atlanta,5,12,FLORIDA_STATE,87,79,ARIZONA,FLORIDA_STATE,Round 2
2,IOWA,Saint Louis,Houston,Atlanta,6,11,MARYLAND,81,78,IOWA,MARYLAND,Round 2
3,MINNESOTA,Tampa,Houston,Atlanta,7,10,MICHIGAN_STATE,79,76,MINNESOTA,MICHIGAN_STATE,Round 2


### Simulate Sweet 16 (Round 3) of South Region

In [53]:
bracket2s = matchups(bracket2q)
bracket2s = predictor(bracket2s,'Sweet 16')
bracket2s.head()



Unnamed: 0,hometeam,R12Location,R3Location,R4Location,seed_x,seed_y,awayteam,home_points_pred,away_points_pred,Predicted_Winner,Predicted_Loser,Round
0,GONZAGA,Omaha,Houston,Atlanta,1,16,ARIZONA,87,78,GONZAGA,ARIZONA,Sweet 16
1,IOWA,Saint Louis,Houston,Atlanta,6,11,MINNESOTA,79,79,IOWA,MINNESOTA,Sweet 16


### Simulate Elite Eight(Round 4) of South Region 

In [54]:
bracket2f = matchups(bracket2s)
bracket2f = predictor(bracket2f,'Elite 8')
bracket2f.head()



Unnamed: 0,hometeam,R12Location,R3Location,R4Location,seed_x,seed_y,awayteam,home_points_pred,away_points_pred,Predicted_Winner,Predicted_Loser,Round
0,GONZAGA,Omaha,Houston,Atlanta,1,16,IOWA,87,80,GONZAGA,IOWA,Elite 8


### Create final dataframe for south region

In [55]:
southregion = pd.concat([bracket2,bracket2q,bracket2s,bracket2f])
southregion['Region'] ='South'
southregion.head(16)

Unnamed: 0,Predicted_Loser,Predicted_Winner,R12Location,R3Location,R4Location,Round,away_points_pred,awayteam,home_points_pred,hometeam,seed_x,seed_y,Region
1,PRAIRIE_VIEW,GONZAGA,Omaha,Houston,Atlanta,Round 1,71,PRAIRIE_VIEW,81,GONZAGA,1,16,South
30,INDIANA,MARQUETTE,Omaha,Houston,Atlanta,Round 1,76,INDIANA,86,MARQUETTE,8,9,South
18,LIBERTY,ARIZONA,Greensboro,Houston,Atlanta,Round 1,80,LIBERTY,98,ARIZONA,5,12,South
14,STEPHEN_F_AUSTIN,FLORIDA_STATE,Greensboro,Houston,Atlanta,Round 1,80,STEPHEN_F_AUSTIN,83,FLORIDA_STATE,4,13,South
22,YALE,IOWA,Saint Louis,Houston,Atlanta,Round 1,75,YALE,88,IOWA,6,11,South
10,UTAH,MARYLAND,Saint Louis,Houston,Atlanta,Round 1,73,UTAH,79,MARYLAND,3,14,South
26,SAINT_MARYS_CA,MINNESOTA,Tampa,Houston,Atlanta,Round 1,68,SAINT_MARYS_CA,91,MINNESOTA,7,10,South
6,WRIGHT_STATE,MICHIGAN_STATE,Tampa,Houston,Atlanta,Round 1,78,WRIGHT_STATE,80,MICHIGAN_STATE,2,15,South
0,MARQUETTE,GONZAGA,Omaha,Houston,Atlanta,Round 2,76,MARQUETTE,77,GONZAGA,1,16,South
1,FLORIDA_STATE,ARIZONA,Greensboro,Houston,Atlanta,Round 2,79,FLORIDA_STATE,87,ARIZONA,5,12,South


# Bracket 3 (West Region)

#### Adding locations for game played in the west region

In [56]:
bracket3['R12Location'] = ''
bracket3['R12Location'][0:2] ='Spokane'
bracket3['R12Location'][2:4] ='Sacramento'
bracket3['R12Location'][4:6] ='Spokane'
bracket3['R12Location'][6:8] ='Sacramento'
bracket3['R3Location'] = 'Los Angelas'

### Simulate Round 1 of  West Region

In [57]:
bracket3 = predictor(bracket3,'Round 1')
bracket3.drop(['index'], axis = 1, inplace = True)
bracket3.head(16)



Unnamed: 0,hometeam,awayteam,seed_x,seed_y,R4Location,R12Location,R3Location,home_points_pred,away_points_pred,Predicted_Winner,Predicted_Loser,Round
2,BAYLOR,SAINT_FRANCIS_PA,1,16,Atlanta,Spokane,Los Angelas,80,66,BAYLOR,SAINT_FRANCIS_PA,Round 1
29,ILLINOIS,COLORADO,8,9,Atlanta,Spokane,Los Angelas,78,75,ILLINOIS,COLORADO,Round 1
17,VILLANOVA,AKRON,5,12,Atlanta,Sacramento,Los Angelas,82,71,VILLANOVA,AKRON,Round 1
13,HOUSTON,CALIFORNIA_IRVINE,4,13,Atlanta,Sacramento,Los Angelas,79,73,HOUSTON,CALIFORNIA_IRVINE,Round 1
21,WISCONSIN,LOUISIANA_TECH,6,11,Atlanta,Spokane,Los Angelas,78,68,WISCONSIN,LOUISIANA_TECH,Round 1
9,WEST_VIRGINIA,WESTERN_KENTUCKY,3,14,Atlanta,Spokane,Los Angelas,80,74,WEST_VIRGINIA,WESTERN_KENTUCKY,Round 1
25,PENN_STATE,WICHITA_STATE,7,10,Atlanta,Sacramento,Los Angelas,80,71,PENN_STATE,WICHITA_STATE,Round 1
5,SAN_DIEGO_STATE,WINTHROP,2,15,Atlanta,Sacramento,Los Angelas,80,75,SAN_DIEGO_STATE,WINTHROP,Round 1


### Simulate Round 2 of  West Region

In [58]:
bracket3q = matchups(bracket3)
bracket3q = predictor(bracket3q,'Round 2')
bracket3q.head()



Unnamed: 0,hometeam,R12Location,R3Location,R4Location,seed_x,seed_y,awayteam,home_points_pred,away_points_pred,Predicted_Winner,Predicted_Loser,Round
0,BAYLOR,Spokane,Los Angelas,Atlanta,1,16,ILLINOIS,78,67,BAYLOR,ILLINOIS,Round 2
1,VILLANOVA,Sacramento,Los Angelas,Atlanta,5,12,HOUSTON,74,71,VILLANOVA,HOUSTON,Round 2
2,WISCONSIN,Spokane,Los Angelas,Atlanta,6,11,WEST_VIRGINIA,78,73,WISCONSIN,WEST_VIRGINIA,Round 2
3,PENN_STATE,Sacramento,Los Angelas,Atlanta,7,10,SAN_DIEGO_STATE,81,75,PENN_STATE,SAN_DIEGO_STATE,Round 2


### Simulate Sweet 16 (Round 3) of West Region

In [59]:
bracket3s = matchups(bracket3q)
bracket3s = predictor(bracket3s,'Sweet 16')
bracket3s.head()



Unnamed: 0,hometeam,R12Location,R3Location,R4Location,seed_x,seed_y,awayteam,home_points_pred,away_points_pred,Predicted_Winner,Predicted_Loser,Round
0,BAYLOR,Spokane,Los Angelas,Atlanta,1,16,VILLANOVA,77,69,BAYLOR,VILLANOVA,Sweet 16
1,WISCONSIN,Spokane,Los Angelas,Atlanta,6,11,PENN_STATE,75,73,WISCONSIN,PENN_STATE,Sweet 16


### Simulate Elite Eight (Round 4) of West Region

In [60]:
bracket3f = matchups(bracket3s)
bracket3f = predictor(bracket3f,'Elite 8')
bracket3f.head()



Unnamed: 0,hometeam,R12Location,R3Location,R4Location,seed_x,seed_y,awayteam,home_points_pred,away_points_pred,Predicted_Winner,Predicted_Loser,Round
0,BAYLOR,Spokane,Los Angelas,Atlanta,1,16,WISCONSIN,78,69,BAYLOR,WISCONSIN,Elite 8


### Concatenate results for West Region 

In [61]:
westregion = pd.concat([bracket3,bracket3q,bracket3s,bracket3f])
westregion['Region'] ='West'
westregion.head(16)

Unnamed: 0,Predicted_Loser,Predicted_Winner,R12Location,R3Location,R4Location,Round,away_points_pred,awayteam,home_points_pred,hometeam,seed_x,seed_y,Region
2,SAINT_FRANCIS_PA,BAYLOR,Spokane,Los Angelas,Atlanta,Round 1,66,SAINT_FRANCIS_PA,80,BAYLOR,1,16,West
29,COLORADO,ILLINOIS,Spokane,Los Angelas,Atlanta,Round 1,75,COLORADO,78,ILLINOIS,8,9,West
17,AKRON,VILLANOVA,Sacramento,Los Angelas,Atlanta,Round 1,71,AKRON,82,VILLANOVA,5,12,West
13,CALIFORNIA_IRVINE,HOUSTON,Sacramento,Los Angelas,Atlanta,Round 1,73,CALIFORNIA_IRVINE,79,HOUSTON,4,13,West
21,LOUISIANA_TECH,WISCONSIN,Spokane,Los Angelas,Atlanta,Round 1,68,LOUISIANA_TECH,78,WISCONSIN,6,11,West
9,WESTERN_KENTUCKY,WEST_VIRGINIA,Spokane,Los Angelas,Atlanta,Round 1,74,WESTERN_KENTUCKY,80,WEST_VIRGINIA,3,14,West
25,WICHITA_STATE,PENN_STATE,Sacramento,Los Angelas,Atlanta,Round 1,71,WICHITA_STATE,80,PENN_STATE,7,10,West
5,WINTHROP,SAN_DIEGO_STATE,Sacramento,Los Angelas,Atlanta,Round 1,75,WINTHROP,80,SAN_DIEGO_STATE,2,15,West
0,ILLINOIS,BAYLOR,Spokane,Los Angelas,Atlanta,Round 2,67,ILLINOIS,78,BAYLOR,1,16,West
1,HOUSTON,VILLANOVA,Sacramento,Los Angelas,Atlanta,Round 2,71,HOUSTON,74,VILLANOVA,5,12,West


# Bracket 4 (East Region)

#### Adding locations for game played in the east region

In [62]:
bracket4['R12Location'] = ''
bracket4['R12Location'][0:2] ='Greensboro'
bracket4['R12Location'][2:4] ='Albany'
bracket4['R12Location'][4:6] ='Cleveland'
bracket4['R12Location'][6:8] ='Albany'
bracket4['R3Location'] = 'New York'

### Simulate Round 1 of East Region

In [63]:
bracket4 = predictor(bracket4,'Round 1')
bracket4.drop(['index'], axis = 1, inplace = True)
bracket4.head(16)



Unnamed: 0,hometeam,awayteam,seed_x,seed_y,R4Location,R12Location,R3Location,home_points_pred,away_points_pred,Predicted_Winner,Predicted_Loser,Round
3,DAYTON,VALPARAISO,1,16,Atlanta,Greensboro,New York,89,74,DAYTON,VALPARAISO,Round 1
28,KENTUCKY,OKLAHOMA,8,9,Atlanta,Greensboro,New York,71,67,KENTUCKY,OKLAHOMA,Round 1
16,OREGON,NEW_MEXICO_STATE,5,12,Atlanta,Albany,New York,89,74,OREGON,NEW_MEXICO_STATE,Round 1
12,BRIGHAM_YOUNG,HOFSTRA,4,13,Atlanta,Albany,New York,84,72,BRIGHAM_YOUNG,HOFSTRA,Round 1
20,TEXAS_TECH,NORTHERN_COLORADO,6,11,Atlanta,Cleveland,New York,84,71,TEXAS_TECH,NORTHERN_COLORADO,Round 1
8,LOUISVILLE,NORTH_DAKOTA_STATE,3,14,Atlanta,Cleveland,New York,76,74,LOUISVILLE,NORTH_DAKOTA_STATE,Round 1
24,BUTLER,NORTHERN_IOWA,7,10,Atlanta,Albany,New York,83,71,BUTLER,NORTHERN_IOWA,Round 1
4,DUKE,SIENA,2,15,Atlanta,Albany,New York,81,72,DUKE,SIENA,Round 1


### Simulate Round 2 of East Region

In [64]:
bracket4q = matchups(bracket4)
bracket4q = predictor(bracket4q,'Round 2')
bracket4q.head()



Unnamed: 0,hometeam,R12Location,R3Location,R4Location,seed_x,seed_y,awayteam,home_points_pred,away_points_pred,Predicted_Winner,Predicted_Loser,Round
0,DAYTON,Greensboro,New York,Atlanta,1,16,KENTUCKY,75,75,DAYTON,KENTUCKY,Round 2
1,OREGON,Albany,New York,Atlanta,5,12,BRIGHAM_YOUNG,79,73,OREGON,BRIGHAM_YOUNG,Round 2
2,TEXAS_TECH,Cleveland,New York,Atlanta,6,11,LOUISVILLE,81,73,TEXAS_TECH,LOUISVILLE,Round 2
3,BUTLER,Albany,New York,Atlanta,7,10,DUKE,76,74,BUTLER,DUKE,Round 2


### Simulate Sweet 16 (Round 3) of East Region

In [65]:
bracket4s = matchups(bracket4q)
bracket4s = predictor(bracket4s,'Sweet 16')
bracket4s.head()



Unnamed: 0,hometeam,R12Location,R3Location,R4Location,seed_x,seed_y,awayteam,home_points_pred,away_points_pred,Predicted_Winner,Predicted_Loser,Round
0,DAYTON,Greensboro,New York,Atlanta,1,16,OREGON,83,77,DAYTON,OREGON,Sweet 16
1,TEXAS_TECH,Cleveland,New York,Atlanta,6,11,BUTLER,76,77,BUTLER,TEXAS_TECH,Sweet 16


### Simulate Elite Eight (Round 4) of East Region 

In [66]:
bracket4f = matchups(bracket4s)
bracket4f = predictor(bracket4f,'Elite 8')
bracket4f.head()



Unnamed: 0,hometeam,R12Location,R3Location,R4Location,seed_x,seed_y,awayteam,home_points_pred,away_points_pred,Predicted_Winner,Predicted_Loser,Round
0,DAYTON,Greensboro,New York,Atlanta,1,16,BUTLER,83,76,DAYTON,BUTLER,Elite 8


### Concatenate results for East Region

In [67]:
eastregion = pd.concat([bracket4,bracket4q,bracket4s,bracket4f])
eastregion['Region'] ='East'
eastregion.head(16)

Unnamed: 0,Predicted_Loser,Predicted_Winner,R12Location,R3Location,R4Location,Round,away_points_pred,awayteam,home_points_pred,hometeam,seed_x,seed_y,Region
3,VALPARAISO,DAYTON,Greensboro,New York,Atlanta,Round 1,74,VALPARAISO,89,DAYTON,1,16,East
28,OKLAHOMA,KENTUCKY,Greensboro,New York,Atlanta,Round 1,67,OKLAHOMA,71,KENTUCKY,8,9,East
16,NEW_MEXICO_STATE,OREGON,Albany,New York,Atlanta,Round 1,74,NEW_MEXICO_STATE,89,OREGON,5,12,East
12,HOFSTRA,BRIGHAM_YOUNG,Albany,New York,Atlanta,Round 1,72,HOFSTRA,84,BRIGHAM_YOUNG,4,13,East
20,NORTHERN_COLORADO,TEXAS_TECH,Cleveland,New York,Atlanta,Round 1,71,NORTHERN_COLORADO,84,TEXAS_TECH,6,11,East
8,NORTH_DAKOTA_STATE,LOUISVILLE,Cleveland,New York,Atlanta,Round 1,74,NORTH_DAKOTA_STATE,76,LOUISVILLE,3,14,East
24,NORTHERN_IOWA,BUTLER,Albany,New York,Atlanta,Round 1,71,NORTHERN_IOWA,83,BUTLER,7,10,East
4,SIENA,DUKE,Albany,New York,Atlanta,Round 1,72,SIENA,81,DUKE,2,15,East
0,KENTUCKY,DAYTON,Greensboro,New York,Atlanta,Round 2,75,KENTUCKY,75,DAYTON,1,16,East
1,BRIGHAM_YOUNG,OREGON,Albany,New York,Atlanta,Round 2,73,BRIGHAM_YOUNG,79,OREGON,5,12,East


## Final Four 

In [68]:
final4=pd.concat([bracket1f,bracket2f,bracket3f,bracket4f])
final4.head()

Unnamed: 0,hometeam,R12Location,R3Location,R4Location,seed_x,seed_y,awayteam,home_points_pred,away_points_pred,Predicted_Winner,Predicted_Loser,Round
0,KANSAS,Saint Louis,Indianapolis,Atlanta,1,16,PURDUE,83,69,KANSAS,PURDUE,Elite 8
0,GONZAGA,Omaha,Houston,Atlanta,1,16,IOWA,87,80,GONZAGA,IOWA,Elite 8
0,BAYLOR,Spokane,Los Angelas,Atlanta,1,16,WISCONSIN,78,69,BAYLOR,WISCONSIN,Elite 8
0,DAYTON,Greensboro,New York,Atlanta,1,16,BUTLER,83,76,DAYTON,BUTLER,Elite 8


In [69]:
final4 = matchups(final4)
final4 = predictor(final4,'Final 4')
final4.head()



Unnamed: 0,hometeam,R12Location,R3Location,R4Location,seed_x,seed_y,awayteam,home_points_pred,away_points_pred,Predicted_Winner,Predicted_Loser,Round
0,KANSAS,Saint Louis,Indianapolis,Atlanta,1,16,GONZAGA,78,68,KANSAS,GONZAGA,Final 4
1,BAYLOR,Spokane,Los Angelas,Atlanta,1,16,DAYTON,83,70,BAYLOR,DAYTON,Final 4


## National Championship

In [70]:
championship = matchups(final4)
championship = predictor(championship,'championship')
championship.head()



Unnamed: 0,hometeam,R12Location,R3Location,R4Location,seed_x,seed_y,awayteam,home_points_pred,away_points_pred,Predicted_Winner,Predicted_Loser,Round
0,KANSAS,Saint Louis,Indianapolis,Atlanta,1,16,BAYLOR,87,70,KANSAS,BAYLOR,championship


## Grand Finale: 2020NCAATourneyResults

In [71]:
final = pd.concat([eastregion,westregion,midwestregion,southregion,final4,championship])
final.to_csv('~/jupyter/capstone_Group10/data/2020NCAATourneyResults.csv')
final.head(100)

Unnamed: 0,Predicted_Loser,Predicted_Winner,R12Location,R3Location,R4Location,Region,Round,away_points_pred,awayteam,home_points_pred,hometeam,seed_x,seed_y
3,VALPARAISO,DAYTON,Greensboro,New York,Atlanta,East,Round 1,74,VALPARAISO,89,DAYTON,1,16
28,OKLAHOMA,KENTUCKY,Greensboro,New York,Atlanta,East,Round 1,67,OKLAHOMA,71,KENTUCKY,8,9
16,NEW_MEXICO_STATE,OREGON,Albany,New York,Atlanta,East,Round 1,74,NEW_MEXICO_STATE,89,OREGON,5,12
12,HOFSTRA,BRIGHAM_YOUNG,Albany,New York,Atlanta,East,Round 1,72,HOFSTRA,84,BRIGHAM_YOUNG,4,13
20,NORTHERN_COLORADO,TEXAS_TECH,Cleveland,New York,Atlanta,East,Round 1,71,NORTHERN_COLORADO,84,TEXAS_TECH,6,11
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1,MINNESOTA,IOWA,Saint Louis,Houston,Atlanta,South,Sweet 16,79,MINNESOTA,79,IOWA,6,11
0,IOWA,GONZAGA,Omaha,Houston,Atlanta,South,Elite 8,80,IOWA,87,GONZAGA,1,16
0,GONZAGA,KANSAS,Saint Louis,Indianapolis,Atlanta,,Final 4,68,GONZAGA,78,KANSAS,1,16
1,DAYTON,BAYLOR,Spokane,Los Angelas,Atlanta,,Final 4,70,DAYTON,83,BAYLOR,1,16
