# IDM Engineering - March Madness Machine Learning - 2020

Welcome to the first IDM Engineering March Madness Machine Learning lunch and learn! Thanks for attending!

The point of this lunch and learn is to teach the process around machine learning, not how to code in Python. The activity is presented in a Jupyter notebook, set up so that you can simply run the full notebook and get a result. Or you can follow along and customize your algorithms as you see fit.

## Table of Contents:
* Jupyter Notebooks
* Library Imports 
* Data Manipulation
* Data Analysis
* Model Exploration
    * Linear - Ordinary Least Squares
    * Linear - Logistic Regression
    * Random Forest Classifier
    * Neural Network 
        * Scaled Data
        * Grid Search CV
* Build the Final Model
* Load Submission Data
* Make Predictions
* Simulate Tournament

## Jupyter Notebooks

A quick note about Jupyter notebooks. Jupyter lets you execute individual snippets of code within one kernel. With a cell selected, hit the run button to run that cell, or press Shift-Enter.

If a cell gets stuck, hit the stop button next to the run button. If your kernel crashes, choose Kernel > Restart Kernel to get a fresh Python instance. Note that restarting discards everything held in memory (variables, loaded data, trained models), so you will need to re-run your cells.

Comment out lines of code with the `#` character, or use Ctrl-/.

## Library Imports

Python is an incredibly flexible language, partly because it is so modular. We can extend its basic functionality by importing third-party libraries.

In [2]:
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
import seaborn as sns
import pkg_resources

from binaryTree import Node
from PIL import Image, ImageDraw

from sklearn.model_selection import GridSearchCV
from sklearn import linear_model
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

In [3]:
cwd = os.getcwd()

## Data Manipulation

The goal of the data manipulation section is to build a dataframe containing our target variable (the game result) and the factors we will use to predict it.

First, let's look at the format of the data we currently have.

For this section, we are going to read in our training sets and records. If you got lost at any point, you can start the notebook right here.

### Train the Model

Remember some will count twice!

### Regular Season Data Analysis

In [4]:
# Existing code from data manipulation section. Only run if needed. 
cwd = os.getcwd()

tourney_cresults = pd.read_csv(cwd + '/data/MNCAATourneyCompactResults.csv')
seeds = pd.read_csv(cwd + '/data/MNCAATourneySeeds.csv')
season_dresults = pd.read_csv(cwd +'/data/MRegularSeasonDetailedResults.csv')

targetYear = 2003
tourney_cresults = tourney_cresults.loc[tourney_cresults['Season'] >= targetYear]

training_set = pd.read_csv("training_set.csv")
record = pd.read_csv('record.csv')

seeds['Seed'] =  pd.to_numeric(seeds['Seed'].str[1:3], downcast='integer',errors='coerce')

# Given a matchup row, return Team1's seed minus Team2's seed for that season.
def delta_seed(row):
    cond = (seeds['Season'] == row['Season'])
    return seeds[cond & (seeds['TeamID'] == row['Team1'])]['Seed'].iloc[0] - seeds[cond & (seeds['TeamID'] == row['Team2'])]['Seed'].iloc[0]

# Given a matchup row, look up each team's season record and return
# Team1's win percentage minus Team2's.
def delta_winPct(row):
    cond1 = (record['Season'] == row['Season']) & (record['WTeamID'] == row['Team1'])
    cond2 = (record['Season'] == row['Season']) & (record['WTeamID'] == row['Team2'])
    return (record[cond1]['wins']/record[cond1]['games']).mean() - (record[cond2]['wins']/record[cond2]['games']).mean()

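# Total points scored against the row's team in a season:
# opponents' scores in its wins (dfW) plus opponents' scores in its losses (dfL).
def get_points_against(row):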
    wcond = (dfW['Season'] == row['Season']) & (dfW['WTeamID'] == row['WTeamID']) 
    fld1 = 'LScore'
    lcond = (dfL['Season'] == row['Season']) & (dfL['LTeamID'] == row['WTeamID']) 
    fld2 = 'WScore'
    retVal = dfW[wcond][fld1].sum()
    if len(dfL[lcond][fld2]) > 0:
        retVal = retVal + dfL[lcond][fld2].sum() 
    return retVal

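# Total points scored by the row's team in a season:
# its own scores in its wins (dfW) plus its own scores in its losses (dfL).
def get_points_for(row):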
    wcond = (dfW['Season'] == row['Season']) & (dfW['WTeamID'] == row['WTeamID']) 
    fld1 = 'WScore'
    lcond = (dfL['Season'] == row['Season']) & (dfL['LTeamID'] == row['WTeamID']) 
    fld2 = 'LScore'
    retVal = dfW[wcond][fld1].sum()
    if len(dfL[lcond][fld2]) > 0:
        retVal = retVal + dfL[lcond][fld2].sum() 
    return retVal

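# Season total of any other box-score stat (e.g. 'FGM', 'Ast') for the row's team,
# summing the 'W'-prefixed column in its wins and the 'L'-prefixed column in its losses.
def get_remaining_stats(row, field):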
    wcond = (dfW['Season'] == row['Season']) & (dfW['WTeamID'] == row['WTeamID']) 
    fld1 = 'W' + field
    lcond = (dfL['Season'] == row['Season']) & (dfL['LTeamID'] == row['WTeamID']) 
    fld2 = 'L'+ field
    retVal = dfW[wcond][fld1].sum()
    if len(dfL[lcond][fld2]) > 0:
        retVal = retVal + dfL[lcond][fld2].sum()
    return retVal

# Given a matchup row, return Team1's per-game average of `field` minus Team2's.
def delta_stat(row, field):
    cond1 = (record['Season'] == row['Season']) & (record['WTeamID'] == row['Team1'])
    cond2 = (record['Season'] == row['Season']) & (record['WTeamID'] == row['Team2'])
    return (record[cond1][field]/record[cond1]['games']).mean() - (record[cond2][field]/record[cond2]['games']).mean()

Make sure you have loaded both the record and training set data before continuing.
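
As a quick sanity check of what `delta_stat` computes, here is a minimal sketch on a made-up two-team record table. The column names mirror the ones used above (`Season`, `WTeamID`, `games`, `PointsFor`), but the numbers are invented, and `per_game_delta` is a hypothetical, simplified stand-in that takes the table as an argument instead of reading a global.

```python
import pandas as pd

# Toy season totals for two teams; the values are invented for illustration.
toy_record = pd.DataFrame({
    "Season": [2019, 2019],
    "WTeamID": [1101, 1113],
    "games": [30, 32],
    "PointsFor": [2100, 2400],
})

def per_game_delta(record, season, team1, team2, field):
    """Return Team1's per-game average of `field` minus Team2's."""
    def per_game(team):
        rows = record[(record["Season"] == season) & (record["WTeamID"] == team)]
        return (rows[field] / rows["games"]).mean()
    return per_game(team1) - per_game(team2)

# 2100/30 = 70.0 points per game versus 2400/32 = 75.0 points per game.
delta = per_game_delta(toy_record, 2019, 1101, 1113, "PointsFor")
print(round(delta, 3))
```

A negative value means Team1 averaged less of that stat per game than Team2.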

OK, so now we have a trained model. Next, we need the submission data.

The Kaggle competition provides a sample submission file (SampleSubmissionStage2.csv) that contains a matchup ID and a default prediction value for each possible game.
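
To make the matchup ID format concrete, here is a small sketch. It assumes the convention used by the simulation code later in this notebook: the ID is `Season_Team1_Team2` with the lower team ID first, and the prediction is read as the probability that Team1 wins. The helper names `make_matchup_id` and `parse_matchup_id` are hypothetical and not part of the notebook.

```python
def make_matchup_id(season, team_a, team_b):
    """Build a 'Season_Team1_Team2' ID with the lower team ID first."""
    team1, team2 = sorted((team_a, team_b))
    return f"{season}_{team1}_{team2}"

def parse_matchup_id(matchup_id):
    """Split an ID back into integer (season, team1, team2)."""
    season, team1, team2 = (int(part) for part in matchup_id.split("_"))
    return season, team1, team2

print(make_matchup_id(2019, 1113, 1101))
print(parse_matchup_id("2019_1101_1113"))
```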

In [5]:
sub = pd.read_csv(cwd + '/data/SampleSubmissionStage2.csv')
sub

Unnamed: 0,ID,Pred
0,2019_1101_1113,0.5
1,2019_1101_1120,0.5
2,2019_1101_1124,0.5
3,2019_1101_1125,0.5
4,2019_1101_1133,0.5
...,...,...
2273,2019_1449_1459,0.5
2274,2019_1449_1463,0.5
2275,2019_1458_1459,0.5
2276,2019_1458_1463,0.5


Split the ID string into the season and the two team IDs

In [6]:
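# expand=True returns one column per piece of the ID
# (tuple-unpacking the .str accessor is not supported in recent pandas versions)
sub[['Season', 'Team1', 'Team2']] = sub['ID'].str.split('_', expand=True)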
sub[['Season', 'Team1', 'Team2']] = sub[['Season', 'Team1', 'Team2']].apply(pd.to_numeric)
sub

Unnamed: 0,ID,Pred,Season,Team1,Team2
0,2019_1101_1113,0.5,2019,1101,1113
1,2019_1101_1120,0.5,2019,1101,1120
2,2019_1101_1124,0.5,2019,1101,1124
3,2019_1101_1125,0.5,2019,1101,1125
4,2019_1101_1133,0.5,2019,1101,1133
...,...,...,...,...,...
2273,2019_1449_1459,0.5,2019,1449,1459
2274,2019_1449_1463,0.5,2019,1449,1463
2275,2019_1458_1459,0.5,2019,1458,1459
2276,2019_1458_1463,0.5,2019,1458,1463


Calculate the deltaSeed and deltaWinPct features
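
For intuition about `deltaSeed`, here is a minimal sketch of the seed arithmetic without pandas. It mirrors the `str[1:3]` slice used when the seeds file is loaded: the first character is the region letter and the next two are the seed number. The helper names are hypothetical, and `'X16a'` is only an assumed example of a seed string with a trailing letter.

```python
def seed_number(seed):
    """Extract the numeric seed from strings such as 'W01' or 'X16a'."""
    return int(seed[1:3])

def seed_delta(seed1, seed2):
    """Team1's seed number minus Team2's.

    A negative result means Team1 has the better (lower-numbered) seed.
    """
    return seed_number(seed1) - seed_number(seed2)

print(seed_number("W01"))
print(seed_number("X16a"))
print(seed_delta("W01", "Z12"))
```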

In [7]:
sub['deltaSeed'] = sub.apply(delta_seed,axis=1)
# sub['deltaMO'] = sub.apply(delta_ord,axis=1)
sub['deltaWinPct'] = sub.apply(delta_winPct,axis=1)
sub

Unnamed: 0,ID,Pred,Season,Team1,Team2,deltaSeed,deltaWinPct
0,2019_1101_1113,0.5,2019,1101,1113,4,0.105603
1,2019_1101_1120,0.5,2019,1101,1120,10,0.057809
2,2019_1101_1124,0.5,2019,1101,1124,6,0.199353
3,2019_1101_1125,0.5,2019,1101,1125,4,-0.040230
4,2019_1101_1133,0.5,2019,1101,1133,0,0.217346
...,...,...,...,...,...,...,...
2273,2019_1449_1459,0.5,2019,1449,1459,2,-0.101961
2274,2019_1449_1463,0.5,2019,1449,1463,-5,0.014706
2275,2019_1458_1459,0.5,2019,1458,1459,-2,-0.169697
2276,2019_1458_1463,0.5,2019,1458,1463,-9,-0.053030


Now, calculate the rest of our stats. This will take a while.
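
The cell that follows is slow because the row-wise `apply` re-filters `record` for every matchup and every column. If the wait becomes a problem, one possible alternative is to compute per-game averages once and attach them with two merges. The sketch below uses toy data with invented values, and it assumes `record` holds exactly one row per team per season, which may not match the real file.

```python
import pandas as pd

# Toy stand-ins for `record` and `sub`; values are invented for illustration.
record = pd.DataFrame({
    "Season": [2019, 2019, 2019],
    "WTeamID": [1101, 1113, 1120],
    "games": [30, 32, 31],
    "FGM": [750, 832, 806],
    "Ast": [420, 448, 403],
})
sub = pd.DataFrame({
    "Season": [2019, 2019],
    "Team1": [1101, 1101],
    "Team2": [1113, 1120],
})
raw_cols = ["FGM", "Ast"]

# Per-game averages, computed once per team and season.
per_game = record[["Season", "WTeamID"]].copy()
for col in raw_cols:
    per_game[col] = record[col] / record["games"]

# Attach each side's averages to the matchups with two left merges.
t1 = sub.merge(per_game, how="left",
               left_on=["Season", "Team1"], right_on=["Season", "WTeamID"])
t2 = sub.merge(per_game, how="left",
               left_on=["Season", "Team2"], right_on=["Season", "WTeamID"])

# Team1 average minus Team2 average, one new column per stat.
for col in raw_cols:
    sub["delta" + col] = t1[col].to_numpy() - t2[col].to_numpy()

print(sub)
```

The trade-off is that this version silently depends on the one-row-per-team assumption, whereas the row-wise version averages over whatever rows it finds.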

In [8]:
# cut to slides
rawCols = ['PointsFor','PointsAgainst','FGM','FGA','FGM3','FGA3','FTM','FTA','OR','DR','Ast','TO','Stl','Blk','PF']

for rawCol in rawCols:
    print("Processing",rawCol)
    sub['delta' + rawCol] = sub.apply(delta_stat,args=(rawCol,),axis=1)

Processing PointsFor
Processing PointsAgainst
Processing FGM
Processing FGA
Processing FGM3
Processing FGA3
Processing FTM
Processing FTA
Processing OR
Processing DR
Processing Ast
Processing TO
Processing Stl
Processing Blk
Processing PF


In [14]:
sub.to_csv("training_set_stage2.csv", index=False)
sub

Unnamed: 0,ID,Pred,Season,Team1,Team2,deltaSeed,deltaWinPct,deltaPointsFor,deltaPointsAgainst,deltaFGM,...,deltaFGA3,deltaFTM,deltaFTA,deltaOR,deltaDR,deltaAst,deltaTO,deltaStl,deltaBlk,deltaPF
0,2019_1101_1113,0.5,2019,1101,1113,4,0.105603,-6.088362,-8.165948,-1.248922,...,-2.353448,-3.581897,-6.837284,-3.087284,-4.915948,1.026940,-1.938578,1.781250,-0.667026,-0.768319
1,2019_1101_1120,0.5,2019,1101,1120,10,0.057809,-7.158215,-3.691684,-1.684584,...,-11.074037,0.381339,0.333671,-2.666329,0.955375,0.208925,-0.491886,-1.294118,-2.212982,0.755578
2,2019_1101_1124,0.5,2019,1101,1124,6,0.199353,0.067888,-2.290948,-0.155172,...,-4.478448,1.074353,0.193966,-4.306034,-1.697198,0.776940,-1.626078,1.875000,-2.198276,0.356681
3,2019_1101_1125,0.5,2019,1101,1125,4,-0.040230,-15.142529,-9.770115,-6.321839,...,-9.070115,0.626437,1.168966,0.168966,-6.770115,-4.979310,0.055172,1.333333,-1.248276,3.437931
4,2019_1101_1133,0.5,2019,1101,1133,0,0.217346,5.360502,-0.285266,2.314525,...,0.138976,0.399164,-0.294671,0.038662,-2.194357,2.560084,-0.890282,2.848485,-1.205852,1.531870
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2273,2019_1449_1459,0.5,2019,1449,1459,2,-0.101961,-11.376471,-3.150980,-4.741176,...,-4.545098,1.501961,2.425490,-0.829412,-2.315686,-3.023529,2.119608,2.866667,2.801961,0.911765
2274,2019_1449_1463,0.5,2019,1449,1463,-5,0.014706,-11.069328,-9.331933,-5.262605,...,0.766807,-0.228992,0.701681,0.792017,-7.703782,-5.323529,0.102941,3.250000,1.413866,1.411765
2275,2019_1458_1459,0.5,2019,1458,1459,-2,-0.169697,-12.139394,-6.109091,-3.421212,...,-6.678788,-1.551515,-0.936364,-2.148485,2.627273,-1.942424,-1.687879,-1.012121,1.278788,-2.439394
2276,2019_1458_1463,0.5,2019,1458,1463,-9,-0.053030,-11.832251,-12.290043,-3.942641,...,-1.366883,-3.282468,-2.660173,-0.527056,-2.760823,-4.242424,-3.704545,-0.628788,-0.109307,-1.939394


Once we have a model trained, we can simply make predictions the same way we have all along. 

If you are using the ordinary least squares method, use the `predict` method; otherwise, use the `predict_proba` method.
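
The difference between the two methods can be sketched on synthetic data. This is an illustration only: the data below is randomly generated rather than taken from the training set, and clipping the least squares output into [0, 1] is an assumption added here, not a step from the notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Synthetic one-feature data: a larger delta makes a Team1 win more likely.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Ordinary least squares: predict() returns an unbounded real number,
# so clip it into [0, 1] before treating it as a probability.
ols = LinearRegression().fit(X, y)
ols_pred = np.clip(ols.predict(X), 0, 1)

# Classifiers: predict_proba() returns one column per class, ordered as in
# clf.classes_; column 1 is the probability of class 1 (a Team1 win).
clf = LogisticRegression().fit(X, y)
clf_pred = clf.predict_proba(X)[:, 1]

print(ols_pred[:3], clf_pred[:3])
```

Either array could then be assigned to the `Pred` column, as long as its rows line up with the rows of the feature table it was predicted from.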

Write our submission ID and Pred columns to submission.csv

In [13]:
sub[['ID', 'Pred']].to_csv('submission.csv', index=False)

## Simulate Tournament

The submission.csv file is the submission for the official tournament, but we want to take it a step further. Now we will load submission.csv and use that prediction data to simulate the full tournament, predicting a winner for each game.

Run the following cell and check the output.png file in the binder directory. That is your simulated bracket with a percentage score of who will win each game. 
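
Before the full cell, here is a stripped-down sketch of the idea. It is not the notebook's implementation: the real cell builds the bracket from MNCAATourneySlots.csv using a binary tree, while this version simply plays adjacent pairs from a list. The four-team field and the probabilities are made up, and `pick_winner` and `simulate` are hypothetical names.

```python
def pick_winner(team_a, team_b, preds, season=2019):
    """Return (winner, confidence) for one game.

    `preds` maps 'Season_Team1_Team2' IDs (lower team ID first) to the
    predicted probability that Team1 wins.
    """
    team1, team2 = sorted((team_a, team_b))
    pred = preds[f"{season}_{team1}_{team2}"]
    return (team1, pred) if pred >= 0.5 else (team2, 1 - pred)

def simulate(teams, preds):
    """Play adjacent pairs round by round until one team remains."""
    rounds = []
    while len(teams) > 1:
        results = [pick_winner(teams[i], teams[i + 1], preds)
                   for i in range(0, len(teams), 2)]
        rounds.append(results)
        teams = [winner for winner, _ in results]
    return rounds

# Hypothetical four-team field and made-up probabilities.
toy_preds = {
    "2019_1101_1113": 0.25,
    "2019_1120_1124": 0.80,
    "2019_1113_1120": 0.60,
}
rounds = simulate([1101, 1113, 1120, 1124], toy_preds)
for number, results in enumerate(rounds, start=1):
    print(number, results)
```

Each entry pairs the predicted winner with the model's confidence in that pick, which is the same information the full cell stores in `pred_map`.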

In [147]:
__version__ = '0.2.0'
ID = 'id'
PRED = 'pred'
SEASON = 'season'
TEAM = 'teamname'

year=2019

import os
#import pkg_resources

from binaryTree import Node
# import matplotlib.pyplot as plt # for notebook usage
# import numpy as np # for notebook usage
import pandas as pd
from PIL import Image, ImageDraw

cwd = os.getcwd()

slot_coordinates = {
    2019: {1: (372, 32),# First four
         2: (372, 50),
         3: (30, 328),
         4: (30, 346),
         5: (695, 325),
         6: (695, 343),
         7: (370, 642),
         8: (370, 659),
         9:  (30, 532),# W1
         10: (30, 514),
         11: (30, 567),
         12: (30, 550),
         13: (30, 604),
         14: (30, 586),
         15: (30, 640),
         16: (30, 622),
         17: (30, 496),
         18: (30, 478),
         19: (30, 460),
         20: (30, 442),
         21: (30, 424),
         22: (30, 406),
         23: (30, 388),
         24: (30, 370),
         25: (30, 199),# X1
         26: (30, 182),
         27: (30, 236),
         28: (30, 218),
         29: (30, 272),
         30: (30, 254),
         31: (30, 308),
         32: (30, 290),
         33: (30, 164),
         34: (30, 146),
         35: (30, 128),
         36: (30, 110),
         37: (30, 92),
         38: (30, 74),
         39: (30, 55),
         40: (30, 38),
         41: (815, 532),# Y1
         42: (815, 514),
         43: (815, 567),
         44: (815, 550),
         45: (815, 604),
         46: (815, 586),
         47: (815, 640),
         48: (815, 622),
         49: (815, 496),
         50: (815, 478),
         51: (815, 460),
         52: (815, 442),
         53: (815, 424),
         54: (815, 406),
         55: (815, 388),
         56: (815, 370),
         57: (815, 199),# Z1
         58: (815, 182),
         59: (815, 236),
         60: (815, 218),
         61: (815, 272),
         62: (815, 254),
         63: (815, 308),
         64: (815, 290),
         65: (815, 164),
         66: (815, 146),
         67: (815, 128),
         68: (815, 110),
         69: (815, 92),
         70: (815, 74),
         71: (815, 55),
         72: (815, 38),
         73: (155, 523),# W2
         74: (155, 559),
         75: (155, 595),
         76: (155, 631),
         77: (155, 487),
         78: (155, 451),
         79: (155, 415),
         80: (155, 379),
         81: (155, 191),# X2
         82: (155, 227),
         83: (155, 263),
         84: (155, 299),
         85: (155, 155),
         86: (155, 119),
         87: (155, 83),
         88: (155, 47),
         89: (735, 523),# Y2
         90: (735, 559),
         91: (735, 595),
         92: (735, 631),
         93: (735, 487),
         94: (735, 451),
         95: (735, 415),
         96: (735, 379),
         97: (735, 191),# Z2
         98: (735, 227),
         99: (735, 263),
         100: (735, 299),
         101: (735, 155),
         102: (735, 119),
         103: (735, 83),
         104: (735, 47),
         105: (232, 541),# W3
         106: (232, 613),
         107: (232, 469),
         108: (232, 397),
         109: (232, 209),# X3
         110: (232, 281),
         111: (232, 137),
         112: (232, 65),
         113: (668, 541),# Y3
         114: (668, 613),
         115: (668, 469),
         116: (668, 397),
         117: (668, 209),# Z3
         118: (668, 281),
         119: (668, 137),
         120: (668, 65),
         121: (298, 576),# W4
         122: (298, 432),
         123: (298, 244),# X4
         124: (298, 100),
         125: (601, 576),# Y4
         126: (601, 432),
         127: (601, 244),# Z4
         128: (601, 100),
         129: (358, 504),# W5
         130: (358, 172),# X5
         131: (540, 504),# Y5
         132: (540, 172),# Z5
         133: (420, 457),# WX6
         134: (435, 219),# YZ6
         135: (435, 339)# CH
    }
}

class extNode(Node):
    def __init__(self, value, left=None, right=None, parent=None):
        Node.__init__(self, value, left=left, right=right)
        if parent is not None and isinstance(parent, extNode):
            self.__setattr__('parent', parent)
        else:
            self.__setattr__('parent', None)

    def __setattr__(self, name, value):
        # Magically set the parent to self when a child is created
        if (name in ['left', 'right']
                and value is not None
                and isinstance(value, extNode)):
            value.parent = self
        object.__setattr__(self, name, value)

def clean_col_names(df):
    return df.rename(columns={col: col.lower().replace('_', '') for col in df.columns})

def get_team_id(seedMap):
    return (seedMap, df[df['seed'] == seed_slot_map[seedMap]]['teamid'].values[0])

def get_team_ids_and_gid(slot1, slot2):
    team1 = get_team_id(slot1)
    team2 = get_team_id(slot2)
    if team2[1] < team1[1]:
        temp = team1
        team1 = team2
        team2 = temp
    gid = '{season}_{t1}_{t2}'.format(season=year, t1=team1[1], t2=team2[1])
    return team1, team2, gid

outputPath = os.path.join(cwd, 'output.png')
teamsPath = os.path.join(cwd, 'data', 'Teams.csv')
seedsPath = os.path.join(cwd, 'data', '2019TourneySeeds.csv')
slotsPath = os.path.join(cwd, 'data', 'MNCAATourneySlots.csv')
submissionPath = os.path.join(cwd, 'submission.csv')
resultsPath = None


submit = clean_col_names(pd.read_csv(submissionPath))

teams_df = clean_col_names(pd.read_csv(teamsPath))
seeds_df = clean_col_names(pd.read_csv(seedsPath))
slots_df = clean_col_names(pd.read_csv(slotsPath))

df = seeds_df.merge(teams_df, left_on='teamid', right_on='teamid')

df = df.drop(['firstd1season','lastd1season'], axis=1)

s = slots_df[slots_df['season'] == year]
seed_slot_map = {0: 'R6CH'}
bkt = extNode(0)

counter = 1
current_nodes = [bkt]
current_id = -1
current_index = 0

while current_nodes:
    next_nodes = []
    current_index = 0
    while current_index < len(current_nodes):
        node = current_nodes[current_index]
        if len(s[s['slot'] == seed_slot_map[node.value]].index) > 0:
            node.left = extNode(counter)
            node.right = extNode(counter + 1)
            seed_slot_map[counter] = s[s['slot'] == seed_slot_map[node.value]].values[0][2]
            seed_slot_map[counter + 1] = s[s['slot'] == seed_slot_map[node.value]].values[0][3]
            next_nodes.append(node.left)
            next_nodes.append(node.right)
            counter += 2
        current_index += 1
        current_id += 1
    current_nodes = next_nodes
    
# Solve bracket using predictions
# Also create a map with slot, seed, game_id, pred
    
results_df = pd.DataFrame({"id": [], "pred": []})
    
pred_map = {}

#Straight Winner

for level in list(reversed(bkt.levels)):
#     print(level)
    for ix, node in enumerate(level[0: len(level) // 2]):
#         print(node)
        team1, team2, gid = get_team_ids_and_gid(level[ix * 2].value, level[ix * 2 + 1].value)
#         print(gid)
        pred = submit[submit['id'] == gid]['pred'].values[0]
        if gid in list(results_df.id):
            game_outcome = results_df[results_df[ID] == gid][PRED].values[0]

            team = team1 if game_outcome == 1 else team2
            if (game_outcome == 1 and pred > 0.5):
                # outcome agrees with prediction, team1 wins
                pred_label = pred
            elif (game_outcome == 0 and pred > 0.5):
                # outcome differs from prediction, team2 wins
                pred_label = 1 - pred
            elif (game_outcome == 0 and pred <= 0.5):
                # outcome agrees with prediction, team2 wins
                pred_label = 1 - pred
            elif (game_outcome == 1 and pred <= 0.5):
                # outcome differs from prediction, team1 wins
                pred_label = pred
            else:
                raise ValueError("game_outcome must be 0 or 1, got {}".format(game_outcome))

        elif pred >= 0.5:
            team = team1
            pred_label = pred
        else:
            team = team2
            pred_label = 1 - pred

        level[ix * 2].parent.value = team[0]
        pred_map[gid] = (team[0], seed_slot_map[team[0]], pred_label)

slotdata = []
for ix, key in enumerate([b for a in bkt.levels for b in a]):
    xy = slot_coordinates[year][max(slot_coordinates[year].keys()) - ix]
    pred = ''
    gid = ''
    if key.parent is not None:
        team1, team2, gid = get_team_ids_and_gid(key.parent.left.value, key.parent.right.value)
    
    # to generate normal prediction bracket, use this one
    if gid != '' and pred_map[gid][1] == seed_slot_map[key.value]:
        pred = "{:.2f}%".format(pred_map[gid][2] * 100)
    st = '{teamid} {teamname}'.format(
        # to generate the scoring bracket, use this one
        teamid=df[df['seed'] == seed_slot_map[key.value]]['teamid'].values[0],
        teamname=df[df['seed'] == seed_slot_map[key.value]]['teamname'].values[0],
    )
    
    # to generate the scoring bracket, use this one
#     if gid != '' and pred_map[gid][1] == seed_slot_map[key.value]:
#         pred = "{:.2f}%".format(pred_map[gid][2] * 100)
#     st = '{seed} {team} {pred}'.format(
#         seed=seed_slot_map[key.value],    
#         team=df[df['seed'] == seed_slot_map[key.value]]['teamname'].values[0],
#         pred=pred
#     )
    slotdata.append((xy, st))
# print(seed_slot_map)
# print(slotdata)
        
# Create bracket image
# relevant:
# https://stackoverflow.com/questions/26649716/how-to-show-pil-image-in-ipython-notebook
#emptyBracketPath = pkg_resources.resource_filename(2017.jpg)
img = Image.open('2019.jpg')
draw = ImageDraw.Draw(img)
# font = ImageFont.truetype(<font-file>, <font-size>)
# draw.text((x, y),"Sample Text",(r,g,b))
for slot in slotdata:
    draw.text(slot[0], str(slot[1]), (0, 0, 0))

# dpi = 72
# margin = 0.05  # (5% of the width/height of the figure...)
# xpixels, ypixels = 940, 700

# Make a figure big enough to accommodate an axis of xpixels by ypixels
# as well as the ticklabels, etc...
# figsize = (1 + margin) * ypixels / dpi, (1 + margin) * xpixels / dpi
# fig = plt.figure(figsize=figsize, dpi=dpi)
# Make the axis the right size...
# ax = fig.add_axes([margin, margin, 1 - 2*margin, 1 - 2*margin])

# ax.imshow(np.asarray(img))
# plt.show() # for in notebook
img.save(outputPath)

predictionsCSV= []
for slot in slotdata:
    predictionsCSV.append([slot[0], str(slot[1])])
    
predictionsCSV

[[(435, 339), '1181 Duke'],
 [(435, 219), '1181 Duke'],
 [(420, 457), '1314 North Carolina'],
 [(540, 172), '1181 Duke'],
 [(540, 504), '1211 Gonzaga'],
 [(358, 172), '1314 North Carolina'],
 [(358, 504), '1397 Tennessee'],
 [(601, 100), '1181 Duke'],
 [(601, 244), '1261 LSU'],
 [(601, 432), '1211 Gonzaga'],
 [(601, 576), '1276 Michigan'],
 [(298, 100), '1314 North Carolina'],
 [(298, 244), '1246 Kentucky'],
 [(298, 432), '1438 Virginia'],
 [(298, 576), '1397 Tennessee'],
 [(668, 65), '1181 Duke'],
 [(668, 137), '1280 Mississippi St'],
 [(668, 281), '1277 Michigan St'],
 [(668, 209), '1261 LSU'],
 [(668, 397), '1211 Gonzaga'],
 [(668, 469), '1199 Florida St'],
 [(668, 613), '1276 Michigan'],
 [(668, 541), '1403 Texas Tech'],
 [(232, 65), '1314 North Carolina'],
 [(232, 137), '1242 Kansas'],
 [(232, 281), '1246 Kentucky'],
 [(232, 209), '1222 Houston'],
 [(232, 397), '1438 Virginia'],
 [(232, 469), '1243 Kansas St'],
 [(232, 613), '1397 Tennessee'],
 [(232, 541), '1345 Purdue'],
 [(735,

In [205]:
__version__ = '0.2.0'
ID = 'id'
PRED = 'pred'
SEASON = 'season'
TEAM = 'teamname'

year=2019

import os
#import pkg_resources

from binaryTree import Node
# import matplotlib.pyplot as plt # for notebook usage
# import numpy as np # for notebook usage
import pandas as pd
from PIL import Image, ImageDraw

cwd = os.getcwd()

slot_coordinates = {
    2019: {1: (372, 32),# First four
         2: (372, 50),
         3: (30, 328),
         4: (30, 346),
         5: (695, 325),
         6: (695, 343),
         7: (370, 642),
         8: (370, 659),
         9:  (30, 532),# W1
         10: (30, 514),
         11: (30, 567),
         12: (30, 550),
         13: (30, 604),
         14: (30, 586),
         15: (30, 640),
         16: (30, 622),
         17: (30, 496),
         18: (30, 478),
         19: (30, 460),
         20: (30, 442),
         21: (30, 424),
         22: (30, 406),
         23: (30, 388),
         24: (30, 370),
         25: (30, 199),# X1
         26: (30, 182),
         27: (30, 236),
         28: (30, 218),
         29: (30, 272),
         30: (30, 254),
         31: (30, 308),
         32: (30, 290),
         33: (30, 164),
         34: (30, 146),
         35: (30, 128),
         36: (30, 110),
         37: (30, 92),
         38: (30, 74),
         39: (30, 55),
         40: (30, 38),
         41: (815, 532),# Y1
         42: (815, 514),
         43: (815, 567),
         44: (815, 550),
         45: (815, 604),
         46: (815, 586),
         47: (815, 640),
         48: (815, 622),
         49: (815, 496),
         50: (815, 478),
         51: (815, 460),
         52: (815, 442),
         53: (815, 424),
         54: (815, 406),
         55: (815, 388),
         56: (815, 370),
         57: (815, 199),# Z1
         58: (815, 182),
         59: (815, 236),
         60: (815, 218),
         61: (815, 272),
         62: (815, 254),
         63: (815, 308),
         64: (815, 290),
         65: (815, 164),
         66: (815, 146),
         67: (815, 128),
         68: (815, 110),
         69: (815, 92),
         70: (815, 74),
         71: (815, 55),
         72: (815, 38),
         73: (155, 523),# W2
         74: (155, 559),
         75: (155, 595),
         76: (155, 631),
         77: (155, 487),
         78: (155, 451),
         79: (155, 415),
         80: (155, 379),
         81: (155, 191),# X2
         82: (155, 227),
         83: (155, 263),
         84: (155, 299),
         85: (155, 155),
         86: (155, 119),
         87: (155, 83),
         88: (155, 47),
         89: (735, 523),# Y2
         90: (735, 559),
         91: (735, 595),
         92: (735, 631),
         93: (735, 487),
         94: (735, 451),
         95: (735, 415),
         96: (735, 379),
         97: (735, 191),# Z2
         98: (735, 227),
         99: (735, 263),
         100: (735, 299),
         101: (735, 155),
         102: (735, 119),
         103: (735, 83),
         104: (735, 47),
         105: (232, 541),# W3
         106: (232, 613),
         107: (232, 469),
         108: (232, 397),
         109: (232, 209),# X3
         110: (232, 281),
         111: (232, 137),
         112: (232, 65),
         113: (668, 541),# Y3
         114: (668, 613),
         115: (668, 469),
         116: (668, 397),
         117: (668, 209),# Z3
         118: (668, 281),
         119: (668, 137),
         120: (668, 65),
         121: (298, 576),# W4
         122: (298, 432),
         123: (298, 244),# X4
         124: (298, 100),
         125: (601, 576),# Y4
         126: (601, 432),
         127: (601, 244),# Z4
         128: (601, 100),
         129: (358, 504),# W5
         130: (358, 172),# X5
         131: (540, 504),# Y5
         132: (540, 172),# Z5
         133: (420, 457),# WX6
         134: (435, 219),# YZ6
         135: (435, 339)# CH
    }
}

class extNode(Node):
    def __init__(self, value, left=None, right=None, parent=None):
        Node.__init__(self, value, left=left, right=right)
        if parent is not None and isinstance(parent, extNode):
            self.__setattr__('parent', parent)
        else:
            self.__setattr__('parent', None)

    def __setattr__(self, name, value):
        # Magically set the parent to self when a child is created
        if (name in ['left', 'right']
                and value is not None
                and isinstance(value, extNode)):
            value.parent = self
        object.__setattr__(self, name, value)

def clean_col_names(df):
    return df.rename(columns={col: col.lower().replace('_', '') for col in df.columns})

def get_team_id(seedMap):
    # print(df['seed'])
    # print(seed_slot_map[seedMap])
    return (seedMap, df[df['seed'] == seed_slot_map[seedMap]]['teamid'].values[0])

def get_team_ids_and_gid(slot1, slot2):
    team1 = get_team_id(slot1)
    team2 = get_team_id(slot2)
    if team2[1] < team1[1]:
        temp = team1
        team1 = team2
        team2 = temp
    gid = '{season}_{t1}_{t2}'.format(season=year, t1=team1[1], t2=team2[1])
    return team1, team2, gid

outputPath = os.path.join(cwd, 'output.png')
teamsPath = os.path.join(cwd, 'data', 'Teams.csv')
seedsPath = os.path.join(cwd, 'data', '2019TourneySeeds.csv')
slotsPath = os.path.join(cwd, 'data', 'MNCAATourneySlots.csv')
submissionPath = os.path.join(cwd, 'submission.csv')
resultsPath = None


submit = clean_col_names(pd.read_csv(submissionPath))

teams_df = clean_col_names(pd.read_csv(teamsPath))
seeds_df = clean_col_names(pd.read_csv(seedsPath))
slots_df = clean_col_names(pd.read_csv(slotsPath))

df = seeds_df.merge(teams_df, left_on='teamid', right_on='teamid')

df = df.drop(['firstd1season','lastd1season'], axis=1)
print(df)

s = slots_df[slots_df['season'] == year]
seed_slot_map = {0: 'R6CH'}
bkt = extNode(0)

counter = 1
current_nodes = [bkt]
current_id = -1
current_index = 0

while current_nodes:
    next_nodes = []
    current_index = 0
    while current_index < len(current_nodes):
        node = current_nodes[current_index]
        if len(s[s['slot'] == seed_slot_map[node.value]].index) > 0:
            node.left = extNode(counter)
            node.right = extNode(counter + 1)
            seed_slot_map[counter] = s[s['slot'] == seed_slot_map[node.value]].values[0][2]
            seed_slot_map[counter + 1] = s[s['slot'] == seed_slot_map[node.value]].values[0][3]
            next_nodes.append(node.left)
            next_nodes.append(node.right)
            counter += 2
        current_index += 1
        current_id += 1
#         print(node.value)
    current_nodes = next_nodes
    
# Solve bracket using predictions
# Also create a map with slot, seed, game_id, pred
    
results_df = pd.DataFrame({"id": [], "pred": []})
    
pred_map = {}

#Straight Winner

for level in list(reversed(bkt.levels)):
    for ix, node in enumerate(level[0: len(level) // 2]):
#         print(node.value)
        team1, team2, gid = get_team_ids_and_gid(level[ix * 2].value, level[ix * 2 + 1].value)
#         print(gid)
        pred = submit[submit['id'] == gid]['pred'].values[0]
        if gid in list(results_df.id):
            game_outcome = results_df[results_df[ID] == gid][PRED].values[0]
#             print(game_outcome)

            team = team1 if game_outcome == 1 else team2
            if (game_outcome == 1 and pred > 0.5):
                # outcome agrees with prediction, team1 wins
                pred_label = pred
            elif (game_outcome == 0 and pred > 0.5):
                # outcome differs from prediction, team2 wins
                pred_label = 1 - pred
            elif (game_outcome == 0 and pred <= 0.5):
                # outcome agrees with prediction, team2 wins
                pred_label = 1 - pred
            elif (game_outcome == 1 and pred <= 0.5):
                # outcome differs from prediction, team1 wins
                pred_label = pred
            else:
                raise ValueError("game_outcome must be 0 or 1, got {}".format(game_outcome))

        elif pred >= 0.5:
            team = team1
            pred_label = pred
        else:
            team = team2
            pred_label = 1 - pred

        level[ix * 2].parent.value = team[0]
        pred_map[gid] = (team[0], seed_slot_map[team[0]], pred_label)

slotdata = []
for ix, key in enumerate([b for a in bkt.levels for b in a]):
#     print(key.value)
    xy = slot_coordinates[year][max(slot_coordinates[year].keys()) - ix]
#     print(xy)
    pred = ''
    gid = ''
    if key.parent is not None:
        team1, team2, gid = get_team_ids_and_gid(key.parent.left.value, key.parent.right.value)
    
    # to generate normal prediction bracket, use this one
    if gid != '' and pred_map[gid][1] == seed_slot_map[key.value]:
        pred = "{:.2f}%".format(pred_map[gid][2] * 100)
    st = '{teamid} {teamname}'.format(
        # to generate the scoring bracket, use this one
        teamid=df[df['seed'] == seed_slot_map[key.value]]['teamid'].values[0],
        teamname=df[df['seed'] == seed_slot_map[key.value]]['teamname'].values[0],
    )
    
    # to generate the scoring bracket, use this one
#     if gid != '' and pred_map[gid][1] == seed_slot_map[key.value]:
#         pred = "{:.2f}%".format(pred_map[gid][2] * 100)
#     st = '{seed} {team} {pred}'.format(
#         seed=seed_slot_map[key.value],    
#         team=df[df['seed'] == seed_slot_map[key.value]]['teamname'].values[0],
#         pred=pred
#     )
    slotdata.append((xy, st, key.value))
#     print(pred_map)
# print(seed_slot_map)
# print(slotdata)
        
# Create bracket image
# relevant:
# https://stackoverflow.com/questions/26649716/how-to-show-pil-image-in-ipython-notebook
#emptyBracketPath = pkg_resources.resource_filename(2017.jpg)
img = Image.open('2019.jpg')
draw = ImageDraw.Draw(img)
# font = ImageFont.truetype(<font-file>, <font-size>)
# draw.text((x, y),"Sample Text",(r,g,b))
for slot in slotdata:
    draw.text(slot[0], str(slot[1]), (0, 0, 0))
#     draw.text(slot[0], str(slot[2]), (0, 0, 0))

# dpi = 72
# margin = 0.05  # (5% of the width/height of the figure...)
# xpixels, ypixels = 940, 700

# Make a figure big enough to accommodate an axis of xpixels by ypixels
# as well as the ticklabels, etc...
# figsize = (1 + margin) * ypixels / dpi, (1 + margin) * xpixels / dpi
# fig = plt.figure(figsize=figsize, dpi=dpi)
# Make the axis the right size...
# ax = fig.add_axes([margin, margin, 1 - 2*margin, 1 - 2*margin])

# ax.imshow(np.asarray(img))
# plt.show() # for in notebook
img.save(outputPath)

predictionsCSV= []
for slot in slotdata:
    predictionsCSV.append([slot[0],str(slot[1]), slot[2]])
    
# predictionsCSV

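# Use a new name so the seeds/teams dataframe `df` is not overwritten
predictions_df = pd.DataFrame(predictionsCSV)
predictions_df.columns = ['Coordinates', 'Predicted Team', 'Index']
predictions_df.to_csv('bracket_predictions_csv.csv')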

   seed  teamid        teamname
0   W01    1181            Duke
1   W02    1277     Michigan St
2   W03    1261             LSU
3   W04    1439   Virginia Tech
4   W05    1280  Mississippi St
..  ...     ...             ...
63  Z12    1332          Oregon
64  Z13    1414       UC Irvine
65  Z14    1330    Old Dominion
66  Z15    1159         Colgate
67  Z16    1205    Gardner Webb

[68 rows x 3 columns]


In [104]:
seed_slot_map

{0: 'R6CH',
 1: 'R5WX',
 2: 'R5YZ',
 3: 'R4W1',
 4: 'R4X1',
 5: 'R4Y1',
 6: 'R4Z1',
 7: 'R3W1',
 8: 'R3W2',
 9: 'R3X1',
 10: 'R3X2',
 11: 'R3Y1',
 12: 'R3Y2',
 13: 'R3Z1',
 14: 'R3Z2',
 15: 'R2W1',
 16: 'R2W4',
 17: 'R2W2',
 18: 'R2W3',
 19: 'R2X1',
 20: 'R2X4',
 21: 'R2X2',
 22: 'R2X3',
 23: 'R2Y1',
 24: 'R2Y4',
 25: 'R2Y2',
 26: 'R2Y3',
 27: 'R2Z1',
 28: 'R2Z4',
 29: 'R2Z2',
 30: 'R2Z3',
 31: 'R1W1',
 32: 'R1W8',
 33: 'R1W4',
 34: 'R1W5',
 35: 'R1W2',
 36: 'R1W7',
 37: 'R1W3',
 38: 'R1W6',
 39: 'R1X1',
 40: 'R1X8',
 41: 'R1X4',
 42: 'R1X5',
 43: 'R1X2',
 44: 'R1X7',
 45: 'R1X3',
 46: 'R1X6',
 47: 'R1Y1',
 48: 'R1Y8',
 49: 'R1Y4',
 50: 'R1Y5',
 51: 'R1Y2',
 52: 'R1Y7',
 53: 'R1Y3',
 54: 'R1Y6',
 55: 'R1Z1',
 56: 'R1Z8',
 57: 'R1Z4',
 58: 'R1Z5',
 59: 'R1Z2',
 60: 'R1Z7',
 61: 'R1Z3',
 62: 'R1Z6',
 63: 'W01',
 64: 'W16',
 65: 'W08',
 66: 'W09',
 67: 'W04',
 68: 'W13',
 69: 'W05',
 70: 'W12',
 71: 'W02',
 72: 'W15',
 73: 'W07',
 74: 'W10',
 75: 'W03',
 76: 'W14',
 77: 'W06',
 78: 'W11',

In [40]:
df

Unnamed: 0,seed,teamid,teamname
0,W01,1181,Duke
1,W02,1277,Michigan St
2,W03,1261,LSU
3,W04,1439,Virginia Tech
4,W05,1280,Mississippi St
...,...,...,...
63,Z12,1332,Oregon
64,Z13,1414,UC Irvine
65,Z14,1330,Old Dominion
66,Z15,1159,Colgate
