Each year, the following files need to be generated. Get the latest Teams.csv and put it in the data folder, then make a new folder for the year and put these files there:

- bracket.csv = Team names listed in bracket order (top left, bottom left, top right, bottom right), with Playin1-4 standing in for the playin games. Used for simulating tourneys.
- playins.csv = Team names in the order the playins appear (first two go into the top left bracket, next two into the bottom left, next two into the top right, last two into the bottom right).
- field.csv = Team names in the field (can be in any order).
- teams.csv = The 68 teams in the tourney: first the 60 regular teams in the same order as bracket.csv, then the playin teams in the same order as playins.csv. Used to match team name to index in arrays generated by tourney simulations.
- draft.csv = Order of the draft used for the Calcutta, so you can iterate through it fast enough as things happen live rather than trying to match names to IDs in real time.
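The ordering rule for teams.csv can be sketched as follows. This is a minimal illustration rather than the notebook's own code: `build_teams` is a hypothetical helper, and it assumes the placeholder slots in the bracket list are the only names starting with `Playin`.

```python
import pandas as pd

def build_teams(bracket_slots, playin_teams):
    """Hypothetical helper: regular teams in bracket order, then the playin teams."""
    # Drop the Playin1-4 placeholders; they are slots, not real teams.
    regular = [name for name in bracket_slots if not name.startswith('Playin')]
    return pd.DataFrame({'Team': regular + list(playin_teams)})

# Toy example with one placeholder slot and one playin pair.
teams = build_teams(['Kentucky', 'Playin1', 'Cincinnati'], ['Manhattan', 'Hampton'])
print(teams['Team'].tolist())  # ['Kentucky', 'Cincinnati', 'Manhattan', 'Hampton']
```

With the full 64-slot bracket and 8 playin teams this would give 68 rows, and the row position of each team is then its index in the simulation arrays.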

In [1]:
import numpy as np
import pandas as pd

def lookup(team):
    """Return the TeamID for one team name, or 0 (with a message) if there is no match."""
    try:
        return teamids.loc[teamids['TeamName'] == team, 'TeamID'].values[0]
    except IndexError:  # empty selection: the name is not in Teams.csv
        print("No match for {0}".format(team))
        return 0

def getID(row):
    """Map row['Team'] (a name or a list of names) to its TeamID(s)."""
    if isinstance(row['Team'], list):
        return [lookup(team) for team in row['Team']]
    return lookup(row['Team'])
    
teamids = pd.read_csv('data/Teams.csv')
teamids.head()

Unnamed: 0,TeamID,TeamName,FirstD1Season,LastD1Season
0,1101,Abilene Chr,2014,2019
1,1102,Air Force,1985,2019
2,1103,Akron,1985,2019
3,1104,Alabama,1985,2019
4,1105,Alabama A&M,2000,2019
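
For columns that hold only single names (no nested lists), the same name-to-ID matching can be sketched with a dictionary and `Series.map`. The frame `teamids_demo` below is a toy stand-in for Teams.csv, and the names `name_to_id` and `names` are made up for this illustration.

```python
import pandas as pd

# Toy stand-in for Teams.csv; the real file has more columns and rows.
teamids_demo = pd.DataFrame({'TeamID': [1101, 1102, 1103],
                             'TeamName': ['Abilene Chr', 'Air Force', 'Akron']})
name_to_id = dict(zip(teamids_demo['TeamName'], teamids_demo['TeamID']))

names = pd.Series(['Akron', 'Playin1', 'Air Force'])
# Unmatched names become NaN, which is replaced by 0 to mirror getID.
ids = names.map(name_to_id).fillna(0).astype(int)
print(ids.tolist())  # [1103, 0, 1102]
```

Unlike getID, this version does not print a message for unmatched names and does not handle rows that contain a list of teams.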


In [2]:
teamids[np.array(['NC' in name for name in teamids['TeamName']])]

Unnamed: 0,TeamID,TeamName,FirstD1Season,LastD1Season
198,1299,NC A&T,1985,2019
199,1300,NC Central,2008,2019
200,1301,NC State,1985,2019
320,1421,UNC Asheville,1987,2019
321,1422,UNC Greensboro,1992,2019
322,1423,UNC Wilmington,1985,2019
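
The substring search used to find the spelling of a team name in Teams.csv can also be written with the pandas string accessor. A small sketch on a toy frame (`teamids_demo` is an assumption, not the real file):

```python
import pandas as pd

teamids_demo = pd.DataFrame({'TeamID': [1300, 1301, 1421, 1104],
                             'TeamName': ['NC Central', 'NC State', 'UNC Asheville', 'Alabama']})
# regex=False treats the pattern as a plain substring, like the `in` test.
matches = teamids_demo[teamids_demo['TeamName'].str.contains('NC', regex=False)]
print(matches['TeamName'].tolist())  # ['NC Central', 'NC State', 'UNC Asheville']
```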


# bracket.csv

Consists of the 64 bracket slots (the 60 regular teams plus Playin1-4 standing in for the playin winners) in bracket order (top left, then bottom left, then top right, then bottom right), followed by the 8 playin teams in order (the first two play for the top left slot, the next two for the bottom left, the next two for the top right, the last two for the bottom right).
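
Before matching IDs, the layout described above could be checked with something like the sketch below. `check_bracket_layout` is a hypothetical helper and the team names in the demo are placeholders. It only tests the counts and the presence of the four placeholder slots, not which quadrant they sit in.

```python
def check_bracket_layout(names):
    """Hypothetical check: 64 slots containing Playin1-4, then 8 playin teams."""
    slots, playin_teams = names[:64], names[64:]
    placeholders = sorted(n for n in slots if n.startswith('Playin'))
    problems = []
    if len(names) != 72:
        problems.append('expected 72 names, got {0}'.format(len(names)))
    if placeholders != ['Playin1', 'Playin2', 'Playin3', 'Playin4']:
        problems.append('unexpected placeholders: {0}'.format(placeholders))
    if any(n.startswith('Playin') for n in playin_teams):
        problems.append('placeholder found among the playin teams')
    if len(set(names)) != len(names):
        problems.append('duplicate names')
    return problems

# Demo data: 64 made-up slots with four placeholders, then 8 made-up playin teams.
slots = ['Team{0}'.format(i) for i in range(64)]
for k, pos in enumerate([1, 25, 41, 49], start=1):
    slots[pos] = 'Playin{0}'.format(k)
playin_teams = ['Extra{0}'.format(i) for i in range(8)]
print(check_bracket_layout(slots + playin_teams))  # []
```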

In [3]:
bracket15 = ['Kentucky', 'Playin1', 'Cincinnati', 'Purdue', 'West Virginia', 'Buffalo', 'Maryland', 'Valparaiso', 'Butler', 'Texas', 'Notre Dame', 'Northeastern', 'Wichita St', 'Indiana', 'Kansas', 'New Mexico St']
bracket15 += ['Wisconsin', 'Coastal Car', 'Oregon', 'Oklahoma St', 'Arkansas', 'Wofford', 'North Carolina', 'Harvard', 'Xavier', 'Playin2', 'Baylor', 'Georgia St', 'VA Commonwealth', 'Ohio St', 'Arizona', 'TX Southern']
bracket15 += ['Villanova', 'Lafayette', 'NC State', 'LSU', 'Northern Iowa', 'Wyoming', 'Louisville', 'UC Irvine', 'Providence', 'Playin3', 'Oklahoma', 'Albany NY', 'Michigan St', 'Georgia', 'Virginia', 'Belmont']
bracket15 += ['Duke', 'Playin4', 'San Diego St', "St John's", 'Utah', 'SF Austin', 'Georgetown', 'E Washington', 'SMU', 'UCLA', 'Iowa St', 'UAB', 'Iowa', 'Davidson', 'Gonzaga', 'N Dakota St']
playins15 = ['Manhattan', 'Hampton', 'BYU', 'Mississippi', 'Dayton', 'Boise St', 'North Florida', 'Robert Morris']
bracket15 += playins15
bracket15 = pd.DataFrame(bracket15, columns=['Team'])
bracket15['TeamID'] = bracket15.apply(getID, axis=1)
print("All team names successfully matched to IDs = {0}".format((bracket15['TeamID'] == 0).sum() == 4))
bracket15.tail(10)

No match for Playin1
No match for Playin2
No match for Playin3
No match for Playin4
All team names successfully matched to IDs = True


Unnamed: 0,Team,TeamID
62,Gonzaga,1211
63,N Dakota St,1295
64,Manhattan,1264
65,Hampton,1214
66,BYU,1140
67,Mississippi,1279
68,Dayton,1173
69,Boise St,1129
70,North Florida,1316
71,Robert Morris,1352


In [4]:
draft15 = ['Villanova', 'SF Austin', 'Arizona', 'Ohio St', 'Davidson', 'Notre Dame', 'Arkansas', 'Michigan St', 'Northern Iowa']
draft15 += ['Oklahoma', 'Providence', "St John's", 'Baylor', 'Indiana']
field15 = ['Manhattan', 'Hampton', 'Valparaiso', 'Northeastern', 'New Mexico St', 'Coastal Car', 'Harvard', 'Georgia St', 'TX Southern', 'Lafayette', 'UC Irvine', 'Albany NY', 'Belmont', 'Robert Morris', 'North Florida', 'E Washington', 'UAB', 'N Dakota St']
draft15 += [field15]
draft15 += ['Virginia', ['Dayton', 'Boise St'], 'VA Commonwealth', 'Texas', 'Utah', 'Kentucky']
draft15 += ['San Diego St', 'Iowa', 'Louisville', 'Iowa St', 'Wyoming', 'Xavier', 'Purdue', 'Kansas']
draft15 += ['Georgia', ['Mississippi', 'BYU'], 'LSU', 'Butler', 'SMU', 'Maryland', 'Wisconsin']
draft15 += ['NC State', 'North Carolina', 'Georgetown', 'Buffalo', 'West Virginia', 'Wichita St']
draft15 += ['Gonzaga', 'Oregon', 'Oklahoma St', 'Cincinnati', 'Wofford', 'Duke', 'UCLA']
draft15 = pd.DataFrame(draft15, columns=['Team'])
draft15['TeamID'] = draft15.apply(getID, axis=1)
draft15

Unnamed: 0,Team,TeamID
0,Villanova,1437
1,SF Austin,1372
2,Arizona,1112
3,Ohio St,1326
4,Davidson,1172
5,Notre Dame,1323
6,Arkansas,1116
7,Michigan St,1277
8,Northern Iowa,1320
9,Oklahoma,1328


In [5]:
sub = pd.read_csv('2015/SampleSubmission.csv', index_col=0)
# drop playins from bracket
bracketnoplayins = bracket15[np.array(['Playin' not in name for name in bracket15['Team']])]
# collect every team ID that appears in either half of a submission key, sorted
ids = np.sort(pd.Series([int(code.split('_')[1]) for code in sub.index] + [int(code.split('_')[2]) for code in sub.index]).unique())
ids2 = bracketnoplayins['TeamID'].sort_values().values
# array_equal is False on a length mismatch instead of failing the elementwise ==
if np.array_equal(ids, ids2):
    print('Manually inputted playins and regular teams matches kaggle submission file')

Manually inputted playins and regular teams matches kaggle submission file
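
When the check fails, it helps to see which IDs differ. The sketch below compares the two sides as sets. It assumes the submission keys have exactly three underscore-separated parts (season, then two team IDs), which is inferred from the `split('_')` calls above. `ids_from_keys` and the demo values are hypothetical.

```python
def ids_from_keys(keys):
    """Collect the team IDs from keys shaped like 'season_teamA_teamB'."""
    ids = set()
    for key in keys:
        _, team_a, team_b = key.split('_')
        ids.update([int(team_a), int(team_b)])
    return ids

keys = ['2015_1101_1102', '2015_1101_1103', '2015_1102_1103']
bracket_ids = {1101, 1102, 1104}
submission_ids = ids_from_keys(keys)
print(sorted(submission_ids - bracket_ids))  # in the submission only: [1103]
print(sorted(bracket_ids - submission_ids))  # in the bracket only: [1104]
```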


In [7]:
bracket15.to_csv('2015/bracket.csv', encoding='ascii')
draft15.to_csv('2015/draft.csv', encoding='ascii')
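
Some rows of draft15 hold lists (the field and the playin pairs), and `to_csv` writes those as their string representation. A minimal sketch of turning them back into lists after reading the file, using an in-memory buffer instead of a real path; `parse_team` and the two-row frame are made up for the illustration.

```python
import ast
import io
import pandas as pd

def parse_team(cell):
    """Turn a cell that looks like a Python list literal back into a list."""
    if isinstance(cell, str) and cell.startswith('['):
        return ast.literal_eval(cell)
    return cell

draft = pd.DataFrame({'Team': pd.Series(['Villanova', ['Dayton', 'Boise St']], dtype=object)})
buffer = io.StringIO()
draft.to_csv(buffer)          # the list is written as "['Dayton', 'Boise St']"
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)
restored['Team'] = restored['Team'].apply(parse_team)
print(restored['Team'].tolist())  # ['Villanova', ['Dayton', 'Boise St']]
```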

Check for errors matching the IDs in the Kaggle submission file

In [92]:
class Team():
    """Ownership weights for one owner: a length-68 array indexed by row position in teams."""
    def __init__(self, playins, field, teams):
        self.array = np.zeros(68)
        self.playins = playins
        self.field = field
        self.teams = teams
        # TeamID -> row label in teams, used as the index into self.array
        self.index = dict(zip(teams['TeamID'], teams.index))
        self.names = []
    def add(self, teams, weight=1):
        """Set weight for the given team(s); 'field', 'Playin2' and 'Playin3' expand to their members."""
        if isinstance(teams, str):
            teams = [teams]
        placeholders = ('field', 'Field', 'Playin2', 'Playin3')
        expteams = []
        if 'field' in teams or 'Field' in teams:
            expteams += list(self.field['Team'].values)
        # Only Playin2 and Playin3 are expanded here; the Playin1 and Playin4 teams are listed in field15.
        if 'Playin2' in teams:
            expteams += list(self.playins.iloc[2:4]['Team'].values)
        if 'Playin3' in teams:
            expteams += list(self.playins.iloc[4:6]['Team'].values)
        expteams += [team for team in teams if team not in placeholders]
        self.names += expteams
        expteams = pd.DataFrame(expteams, columns=['Team'])
        expteams['TeamID'] = expteams.apply(getID, axis=1)
        for ID in expteams['TeamID']:
            self.array[self.index[ID]] = weight
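
The two slices in `add` follow one pattern: slot k maps to rows 2(k-1) and 2(k-1)+1 of the playins table. A sketch of that rule for any slot label, assuming labels of the form `PlayinK`. `playin_pair` is a hypothetical helper, not part of the class.

```python
import pandas as pd

def playin_pair(playins, slot):
    """Hypothetical helper: team names for a slot label such as 'Playin3'."""
    k = int(slot.replace('Playin', ''))
    return playins.iloc[2 * (k - 1):2 * k]['Team'].tolist()

playins_demo = pd.DataFrame({'Team': ['Manhattan', 'Hampton', 'BYU', 'Mississippi',
                                      'Dayton', 'Boise St', 'North Florida', 'Robert Morris']})
print(playin_pair(playins_demo, 'Playin3'))  # ['Dayton', 'Boise St']
```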

In [137]:
field = Team(playins15, field15, teams15)
field.add('field')
np.save('data/2015field.npy', field.array)

In [132]:
jeffteams = Team(playins15, field15, teams15)
teams = ['Kansas', 'Louisville', 'Maryland', 'Georgetown', 'Wichita St', 'Purdue', 'Oklahoma St', 'Davidson', 'Georgia', 'Texas', 'Buffalo']
jeffteams.add(teams)
jeffteams.add('Notre Dame', 0.5)
np.save('data/2015Jeff.npy', jeffteams.array)

In [131]:
jeffteams.array

array([0. , 0. , 1. , 0. , 1. , 1. , 0. , 0. , 1. , 0.5, 0. , 1. , 0. ,
       1. , 0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
       0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 1. , 0. , 0. ,
       0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 1. , 0. ,
       0. , 0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
       0. , 0. , 0. ])

In [133]:
robbieteams = Team(playins15, field15, teams15)
teams = ['Oklahoma', 'Iowa St', 'Northern Iowa', 'SMU', 'Michigan St', "St John's", 'Playin3']
robbieteams.add(teams)
robbieteams.add(['Kentucky', 'Wisconsin', 'Duke'], 0.5)
np.save('data/2015Robbie.npy', robbieteams.array)

In [103]:
robbieteams.names

['Dayton',
 'Boise St',
 'Oklahoma',
 'Iowa St',
 'Northern Iowa',
 'SMU',
 'Michigan St',
 "St John's",
 'Kentucky',
 'Wisconsin',
 'Duke']

In [134]:
masonteams = Team(playins15, field15, teams15)
teams = ['Villanova', 'Arizona', 'Virginia', 'Gonzaga', 'Arkansas', 'Utah', 'Ohio St', 'Playin2', 'SF Austin']
masonteams.add(teams)
np.save('data/2015Mason.npy', masonteams.array)

In [105]:
masonteams.names

['BYU',
 'Mississippi',
 'Villanova',
 'Arizona',
 'Virginia',
 'Gonzaga',
 'Arkansas',
 'Utah',
 'Ohio St',
 'SF Austin']

In [135]:
prasadteams = Team(playins15, field15, teams15)
teams = ['Iowa', 'San Diego St', 'NC State', 'LSU', 'Field']
prasadteams.add(teams)
prasadteams.add(['Kentucky', 'Wisconsin', 'Duke'], 0.5)
np.save('data/2015Prasad.npy', prasadteams.array)

In [110]:
prasadteams.names

['Manhattan',
 'Hampton',
 'Valparaiso',
 'Northeastern',
 'New Mexico St',
 'Coastal Car',
 'Harvard',
 'Georgia St',
 'TX Southern',
 'Lafayette',
 'UC Irvine',
 'Albany NY',
 'Belmont',
 'Robert Morris',
 'North Florida',
 'E Washington',
 'UAB',
 'N Dakota St',
 'Iowa',
 'San Diego St',
 'NC State',
 'LSU',
 'Kentucky',
 'Wisconsin',
 'Duke']

In [136]:
rajteams = Team(playins15, field15, teams15)
teams = ['Baylor', 'North Carolina', 'West Virginia', 'Providence', 'Xavier', 'Butler', 'VA Commonwealth', 'Oregon', 'Cincinnati', 'Indiana', 'UCLA', 'Wyoming', 'Wofford']
rajteams.add(teams)
rajteams.add(['Notre Dame'], 0.5)
np.save('data/2015Raj.npy', rajteams.array)

In [129]:
rajteams.names

['Baylor',
 'North Carolina',
 'West Virginia',
 'Providence',
 'Xavier',
 'Butler',
 'VA Commonwealth',
 'Oregon',
 'Cincinnati',
 'Indiana',
 'UCLA',
 'Wyoming',
 'Wofford',
 'Notre Dame']
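
Once every owner's array is built, one possible sanity check is that the weights for each team add up to 1 across owners (for example, two 0.5 shares of a split team). The sketch below assumes that every team is meant to be fully owned. `ownership_gaps` and the 4-team arrays are made up for the illustration.

```python
import numpy as np

def ownership_gaps(arrays, n_teams=68):
    """Return indices of teams whose weights across owners do not sum to 1."""
    total = np.zeros(n_teams)
    for array in arrays:
        total += array
    return np.flatnonzero(~np.isclose(total, 1.0))

# Toy example with 4 teams: team 2 is split 50/50, team 3 is unowned.
owner_a = np.array([1.0, 0.0, 0.5, 0.0])
owner_b = np.array([0.0, 1.0, 0.5, 0.0])
print(ownership_gaps([owner_a, owner_b], n_teams=4))  # [3]
```

With the real data, the arrays saved above could be read back with `np.load` and passed in as the list, and any returned index can be looked up in the teams table to find the team name.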

In [115]:
teamids[np.array(['Wofford' in name for name in teamids['TeamName']])]

Unnamed: 0,TeamID,TeamName,FirstD1Season,LastD1Season
358,1459,Wofford,1996,2019


In [99]:
playins15

Unnamed: 0,Team,TeamID
0,Manhattan,1264
1,Hampton,1214
2,BYU,1140
3,Mississippi,1279
4,Dayton,1173
5,Boise St,1129
6,North Florida,1316
7,Robert Morris,1352
