### Elo

Now that we have ranked hitters and pitchers, we can develop ratings for each team and begin to predict upcoming games. First, we reuse the approach from our park factor analysis to scrape each team's schedule from its DakStats page.

Before we begin, we import the necessary libraries and create a list of teams along with their corresponding team numbers on the DakStats website. The `elosports` package, which provides the `Elo` class, must be installed once by running the following line.

In [77]:
#pip install elosports

In [78]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import pickle
from elosports.elo import Elo

In [3]:
teams = ["Bethel", "Goshen", "Grace", "HU", "IWU", "Marian", "MVNU", "SAU", "SFU", "Taylor"]
t_nums = [1629, 1678, 1679, 1688, 1694, 1717, 1736, 1780, 1805, 1784]

We start by building the DakStats schedule URL for each team.

In [4]:
urls = ['http://www.dakstats.com/WebSync/Pages/Team/TeamSchedule.aspx?association=10&sg=MBA&sea=NAIMBA_2019&team=' +
        str(num) for num in t_nums]
# Download each team's schedule page (the timeout keeps a stalled request from hanging the notebook)
pages = [requests.get(url, timeout=30) for url in urls]
# Parse each page's HTML into a searchable tree with BeautifulSoup4
soups = [BeautifulSoup(page.content) for page in pages]
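
The URLs differ only in the `team` query parameter. A minimal sketch that builds them with the standard library's `urllib.parse.urlencode` instead of string concatenation, assuming the same base path and the `association`, `sg`, and `sea` values shown above. `schedule_url`, `BASE`, and `FIXED_PARAMS` are hypothetical names introduced for illustration.

```python
from urllib.parse import urlencode

# Base path and fixed query values copied from the notebook's URLs
BASE = "http://www.dakstats.com/WebSync/Pages/Team/TeamSchedule.aspx"
FIXED_PARAMS = {"association": 10, "sg": "MBA", "sea": "NAIMBA_2019"}

def schedule_url(team_num, season="NAIMBA_2019"):
    """Hypothetical helper: build one team's DakStats schedule URL."""
    # Overriding "sea" keeps its position; "team" is appended last
    params = {**FIXED_PARAMS, "sea": season, "team": team_num}
    return BASE + "?" + urlencode(params)

print(schedule_url(1679))
```

With the default season this produces exactly the strings built in the cell above; the `season` parameter assumes, without having checked, that other seasons follow the same `sea` naming.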

The code below collects the text of every cell in every HTML table on each team's DakStats page, producing one nested list per team: tables, then rows, then cells.

In [5]:
team_tables = [
  [
    [
      [td.get_text(strip=True) for td in tr.find_all('td')] 
      for tr in table.find_all('tr') 
    ]#for each row in each table
    for table in soup.find_all('table') 
  ]#for each table on each webpage
  for soup in soups 
]#for each team's webpage
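
For readers without BeautifulSoup, the same table, row, and cell nesting can be approximated with the standard library's `html.parser.HTMLParser`. This is a simplified substitute, not what the notebook uses: it assumes explicitly closed `<td>` tags, ignores `<th>` cells just as the comprehension above does, and does not reproduce BeautifulSoup's handling of nested tables. `TableExtractor` is a hypothetical class name.

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collect the stripped text of every <td>, grouped by <tr> and <table>."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table>; each is a list of rows
        self._cell = None  # text pieces of the <td> currently open, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag == "td" and self.tables and self.tables[-1]:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Strip the cell's ends, roughly like get_text(strip=True)
            self.tables[-1][-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

html = """<table><tr><td>Date</td><td>Opponent</td></tr>
<tr><td> 3/8/2019 </td><td>Bethel (Ind.) *</td></tr></table>"""
parser = TableExtractor()
parser.feed(html)
print(parser.tables)
```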

We use the loop below to locate the tables that contain the data we need. Because Python counts from zero, the column headers are in the table at index 33 and the schedule data is in the table at index 35. We assume that this layout is the same for every team.

In [6]:
for i in range(len(team_tables[2])):
  #print(i, team_tables[2][i])
  #The line above is commented out because we only needed to run it once to find the location of the data on the webpage.
  pass
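
Hard-coding index 35 depends on DakStats never adding or removing a table. A hedged alternative is to search for tables that contain a schedule-like row. This sketch assumes, based on the rows shown below, that a data row starts with an m/d/yyyy date and has `W` or `L` in its fifth cell; `schedule_table_candidates` is a hypothetical helper.

```python
import re

# Assumed format of the first cell of a schedule row, e.g. "2/27/2019"
DATE_RE = re.compile(r"^\d{1,2}/\d{1,2}/\d{4}$")

def schedule_table_candidates(tables):
    """Return indices of tables with at least one row that starts with an
    m/d/yyyy date and has 'W' or 'L' in its fifth cell."""
    hits = []
    for i, table in enumerate(tables):
        for row in table:
            if len(row) >= 5 and DATE_RE.match(row[0]) and row[4] in ("W", "L"):
                hits.append(i)
                break  # one matching row is enough for this table
    return hits

toy = [
    [["Home", "Schedule"]],                                  # navigation-style table
    [[], ["2/27/2019", "Lourdes (Ohio)", "N", "3-4", "L"]],  # schedule-style table
]
print(schedule_table_candidates(toy))
```

If the page nests its layout tables, an enclosing table may be listed alongside the real one, so the candidates are worth inspecting before picking an index.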

Next, we define the column names for our dataframe.

In [7]:
headers = [['Date', 'Opponent', 'Location', 'Score', 'Outcome'] for tables in team_tables]
headers[2]

['Date', 'Opponent', 'Location', 'Score', 'Outcome']

Here, we collect the schedule data into the list `team_rows`. The slice `[:5]` keeps only the first five columns of each row, and `[1::2]` keeps every other row starting from the second, because the table's data rows alternate with empty lists.

In [8]:
team_rows = [[r[:5] for r in tables[35][1::2]] for tables in team_tables]
team_rows[2][:9]

[['2/27/2019', 'Lourdes (Ohio)', 'N', '3-4', 'L'],
 ['2/27/2019', 'Lourdes (Ohio)', 'N', '4-8', 'L'],
 ['3/2/2019', 'Cornerstone (Mich.)', 'N', '3-4', 'L'],
 ['3/2/2019', 'Trinity Baptist', 'N', '5-1', 'W'],
 ['3/4/2019', 'Michigan-Dearborn', 'N', '13-1', 'W'],
 ['3/5/2019', 'Rochester (Mich.)', 'N', '24-4', 'W'],
 ['3/6/2019', 'Robert Morris (Ill.)', 'N', '10-9', 'W'],
 ['3/8/2019', 'Bethel (Ind.) *', 'N', '13-6', 'W'],
 ['3/9/2019', 'Bethel (Ind.) *', 'N', '14-2', 'W']]
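
To see what the two slices do, here is a toy version of one team's scraped rows; the row contents and the trailing `"extra"` cell are made up for illustration. The second comprehension filters on non-empty rows instead, an alternative that does not rely on the rows alternating strictly, under the assumption that the only rows to skip are the empty ones.

```python
# Toy stand-in for one team's scraped rows: empty lists alternate with data
# rows, and each data row has an extra trailing cell that we want to drop.
rows = [
    [],
    ["2/27/2019", "Lourdes (Ohio)", "N", "3-4", "L", "extra"],
    [],
    ["3/2/2019", "Trinity Baptist", "N", "5-1", "W", "extra"],
    [],
]

# Start at index 1 and take every second row, keeping the first five cells
by_position = [r[:5] for r in rows[1::2]]

# Alternative: keep every non-empty row, whatever its position
by_content = [r[:5] for r in rows if r]

print(by_position == by_content)  # True
print(by_position[0])
```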

Now, we put the data into a dataframe.

In [9]:
dfc = [pd.DataFrame(columns = headers[i], data = team_rows[i]) for i in range(len(headers))]
dfc[2][:5]

|   | Date | Opponent | Location | Score | Outcome |
|---|---|---|---|---|---|
| 0 | 2/27/2019 | Lourdes (Ohio) | N | 3-4 | L |
| 1 | 2/27/2019 | Lourdes (Ohio) | N | 4-8 | L |
| 2 | 3/2/2019 | Cornerstone (Mich.) | N | 3-4 | L |
| 3 | 3/2/2019 | Trinity Baptist | N | 5-1 | W |
| 4 | 3/4/2019 | Michigan-Dearborn | N | 13-1 | W |


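We keep only conference games, which DakStats marks with an asterisk after the opponent's name. Passing `regex = False` makes `str.contains` treat `*` as a literal character rather than a regular-expression quantifier.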

In [10]:
conf_df = [df[df.Opponent.str.contains("*", regex = False)] for df in dfc]
conf_df[2][:5]

|   | Date | Opponent | Location | Score | Outcome |
|---|---|---|---|---|---|
| 7 | 3/8/2019 | Bethel (Ind.) * | N | 13-6 | W |
| 8 | 3/9/2019 | Bethel (Ind.) * | N | 14-2 | W |
| 9 | 3/9/2019 | Bethel (Ind.) * | N | 3-1 | W |
| 10 | 3/14/2019 | Taylor (Ind.) * | A | 5-15 | L |
| 11 | 3/16/2019 | Taylor (Ind.) * | A | 2-10 | L |


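The code below builds a cleaned version of each conference dataframe. `conf_df.copy()` creates a new list, so the original `conf_df` is left unchanged. For each team, `str.replace(r"\(.*\)", "", regex=True)` removes the parentheses and the inning number between them that appear on extra-inning scores; we then trim any leftover spaces and split the "Score" column on the hyphen into two numeric columns, `Score` for the selected team and `Opp_score` for the opponent. Finally, `str.replace(r' \*', '', regex=True)` removes the conference asterisk from each opponent's name, and `Date` is converted to a datetime.

In [12]:
tidy_conf = conf_df.copy()  # new list, so conf_df itself is left unchanged
for i, df in enumerate(conf_df):
  # Drop any parenthesized extra-inning count, then split "13-6" into two columns
  split_scores = (df['Score']
                  .str.replace(r"\(.*\)", "", regex=True)
                  .str.strip()
                  .str.split('-', expand=True))
  tidy_conf[i] = df.assign(Score = pd.to_numeric(split_scores[0]),
                           Opp_score = pd.to_numeric(split_scores[1]),
                           Opponent = df.Opponent.str.replace(r' \*', '', regex=True),
                           Date = pd.to_datetime(df.Date)
                           )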
tidy_conf[2][:5]

|   | Date | Opponent | Location | Score | Outcome | Opp_score |
|---|---|---|---|---|---|---|
| 7 | 2019-03-08 | Bethel (Ind.) | N | 13 | W | 6 |
| 8 | 2019-03-09 | Bethel (Ind.) | N | 14 | W | 2 |
| 9 | 2019-03-09 | Bethel (Ind.) | N | 3 | W | 1 |
| 10 | 2019-03-14 | Taylor (Ind.) | A | 5 | L | 15 |
| 11 | 2019-03-16 | Taylor (Ind.) | A | 2 | L | 10 |


Next, we add a `Team` column holding each schedule's own team abbreviation and place it second, right after `Date`.

In [15]:
for i in range(len(teams)):
    # Insert the team name as the second column; the scalar fills every row
    tidy_conf[i].insert(1, "Team", teams[i])

In [25]:
tidy_conf[2][:5]

|   | Date | Team | Opponent | Location | Score | Outcome | Opp_score |
|---|---|---|---|---|---|---|---|
| 7 | 2019-03-08 | Grace | Bethel (Ind.) | N | 13 | W | 6 |
| 8 | 2019-03-09 | Grace | Bethel (Ind.) | N | 14 | W | 2 |
| 9 | 2019-03-09 | Grace | Bethel (Ind.) | N | 3 | W | 1 |
| 10 | 2019-03-14 | Grace | Taylor (Ind.) | A | 5 | L | 15 |
| 11 | 2019-03-16 | Grace | Taylor (Ind.) | A | 2 | L | 10 |


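We then replace the opponents' DakStats names with the abbreviations used in `teams`, so the `Team` and `Opponent` columns follow the same naming.

In [38]:
# DakStats opponent name -> abbreviation used in `teams`
name_map = {
    'Bethel (Ind.)' : 'Bethel',
    'Taylor (Ind.)' : 'Taylor',
    'Spring Arbor (Mich.)' : 'SAU',
    'Huntington (Ind.)' : 'HU',
    'St. Francis (Ind.)' : 'SFU',
    'Indiana Wesleyan' : 'IWU',
    'Mount Vernon Nazarene (Ohio)' : 'MVNU',
    'Marian (Ind.)' : 'Marian',
    'Goshen (Ind.)' : 'Goshen',
    'Grace (Ind.)' : 'Grace'
}
for df in tidy_conf:
    # Assign the result back instead of calling replace(inplace=True) on a
    # column, which newer pandas versions may not propagate to the dataframe
    df['Opponent'] = df['Opponent'].replace(name_map)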
tidy_conf[3]

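|   | Date | Team | Opponent | Location | Score | Outcome | Opp_score |
|---|---|---|---|---|---|---|---|
| 9 | 2019-03-08 | HU | Taylor | A | 7 | W | 6 |
| 10 | 2019-03-11 | HU | Taylor | A | 2 | W | 0 |
| 11 | 2019-03-11 | HU | Taylor | A | 1 | L | 5 |
| 12 | 2019-03-18 | HU | SAU | H | 6 | W | 5 |
| 13 | 2019-03-19 | HU | SAU | H | 8 | W | 0 |
| 14 | 2019-03-19 | HU | SAU | H | 6 | W | 4 |
| 15 | 2019-03-23 | HU | Bethel | A | 3 | W | 2 |
| 16 | 2019-03-23 | HU | Bethel | A | 1 | L | 4 |
| 17 | 2019-03-25 | HU | Bethel | A | 7 | W | 3 |
| 18 | 2019-03-28 | HU | Grace | A | 2 | L | 12 |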
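
The two cleaning steps applied to the opponent names (dropping the conference asterisk, then mapping the DakStats name to our abbreviation) can be folded into one function. This sketch reuses the mapping from the cell above; `normalize_opponent` is a hypothetical helper, and returning unmapped names unchanged is our design choice so that non-conference opponents pass through.

```python
# Same mapping as the notebook: DakStats name -> abbreviation in `teams`
NAME_MAP = {
    "Bethel (Ind.)": "Bethel",
    "Taylor (Ind.)": "Taylor",
    "Spring Arbor (Mich.)": "SAU",
    "Huntington (Ind.)": "HU",
    "St. Francis (Ind.)": "SFU",
    "Indiana Wesleyan": "IWU",
    "Mount Vernon Nazarene (Ohio)": "MVNU",
    "Marian (Ind.)": "Marian",
    "Goshen (Ind.)": "Goshen",
    "Grace (Ind.)": "Grace",
}

def normalize_opponent(raw):
    """Return (short_name, is_conference) for a raw DakStats opponent string.

    A trailing asterisk marks a conference game; names missing from NAME_MAP
    are returned unchanged apart from the stripped asterisk.
    """
    name = raw.strip()
    is_conference = name.endswith("*")
    name = name.rstrip("*").strip()
    return NAME_MAP.get(name, name), is_conference

print(normalize_opponent("Bethel (Ind.) *"))  # ('Bethel', True)
print(normalize_opponent("Lourdes (Ohio)"))   # ('Lourdes (Ohio)', False)
```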


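Each conference game appears twice in our data, once in each team's schedule. Keeping only the wins leaves every game exactly once, recorded from the winner's perspective.

In [67]: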
conf_w = [df[df.Outcome.str.contains("W", regex = False)] for df in tidy_conf]
conf_w[2][:5]

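|   | Date | Team | Opponent | Location | Score | Outcome | Opp_score |
|---|---|---|---|---|---|---|---|
| 7 | 2019-03-08 | Grace | Bethel | N | 13 | W | 6 |
| 8 | 2019-03-09 | Grace | Bethel | N | 14 | W | 2 |
| 9 | 2019-03-09 | Grace | Bethel | N | 3 | W | 1 |
| 18 | 2019-03-28 | Grace | HU | H | 12 | W | 2 |
| 20 | 2019-04-01 | Grace | HU | H | 10 | W | 9 |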


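We then stack the wins into a single dataframe, sort it by date so the ratings are updated in the order the games were played, and rename the columns to identify the winning and losing teams and their scores.

In [74]: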
all_games = pd.concat(conf_w)
all_games = all_games.sort_values('Date')
all_games.rename(columns={
    'Team': 'Win_Tm',
    'Opponent': 'Lose_Tm',
    'Score': 'W_Score',
    'Opp_score': 'L_Score'}, inplace=True)
del all_games["Outcome"]
all_games

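|   | Date | Win_Tm | Lose_Tm | Location | W_Score | L_Score |
|---|---|---|---|---|---|---|
| 11 | 2019-03-01 | MVNU | SAU | N | 5 | 3 |
| 12 | 2019-03-01 | MVNU | SAU | N | 12 | 1 |
| 14 | 2019-03-04 | SAU | MVNU | N | 11 | 1 |
| 9 | 2019-03-08 | HU | Taylor | A | 7 | 6 |
| 7 | 2019-03-08 | Grace | Bethel | N | 13 | 6 |
| ... | ... | ... | ... | ... | ... | ... |
| 50 | 2019-04-27 | MVNU | Bethel | A | 13 | 2 |
| 42 | 2019-04-27 | Marian | SAU | H | 6 | 3 |
| 44 | 2019-04-27 | Goshen | Grace | A | 12 | 2 |
| 50 | 2019-04-27 | Taylor | SFU | A | 5 | 1 |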
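
The keep-only-wins step can be sanity-checked on plain tuples. In this sketch the HU and Grace lines come from the tables above, while Taylor's side of the HU game is our inferred mirror image; `winners_view` is a hypothetical helper. Python's `sorted` is stable, so games on the same date keep their input order. pandas' `sort_values` defaults to quicksort, which is not stable, so passing `kind="stable"` would be the analogous choice if same-day order matters for the Elo updates.

```python
# Schedule lines from both teams' pages:
# (date, team, opponent, location, runs, opp_runs, outcome)
lines = [
    ("2019-03-28", "Grace", "HU", "H", 12, 2, "W"),
    ("2019-03-28", "HU", "Grace", "A", 2, 12, "L"),
    ("2019-03-08", "HU", "Taylor", "A", 7, 6, "W"),
    ("2019-03-08", "Taylor", "HU", "H", 6, 7, "L"),  # inferred mirror row
]

def winners_view(lines):
    """Keep only the winning side of each game, renamed to winner/loser
    fields, sorted by ISO date (a stable sort keeps same-day order)."""
    games = [
        {"Date": d, "Win_Tm": t, "Lose_Tm": o, "Location": loc,
         "W_Score": r, "L_Score": opp_r}
        for d, t, o, loc, r, opp_r, outcome in lines
        if outcome == "W"
    ]
    return sorted(games, key=lambda g: g["Date"])

games = winners_view(lines)
for g in games:
    print(g["Date"], g["Win_Tm"], "beat", g["Lose_Tm"], g["W_Score"], "-", g["L_Score"])
```

Since every game is listed twice, the result should contain exactly half as many rows as the input; a mismatch would point to a game missing from one team's page.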


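We collect every team that appears in the results and create an `Elo` object with a K-factor of 20, which controls how far a single result can move a rating. Taking both winners and losers ensures that a team without a conference win would still receive a rating.

In [91]:
# Use both winners and losers so a team without a conference win is not skipped
allTeams = set(all_games.Win_Tm) | set(all_games.Lose_Tm)
eloLeague = Elo(k = 20)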

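Each team starts at the library's default rating of 1500, and we process the games in date order. Because `all_games` holds only wins, `Location` describes the winner's venue, and the third argument to `gameOver` is `True` only when the winner played at home; away and neutral-site games both pass `False`.

In [100]:
for team in allTeams:
    eloLeague.addPlayer(team)  # default starting rating of 1500
# Location is from the winner's perspective, so "H" means the winner was at home
for _, game in all_games.iterrows():
    eloLeague.gameOver(game.Win_Tm, game.Lose_Tm, game.Location == "H")
for team, rating in eloLeague.ratingDict.items():
    print(team, rating)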

Goshen 1492.979106446127
Grace 1444.2086153343528
Marian 1550.0000191469642
IWU 1534.8724324186744
SFU 1395.259860799865
MVNU 1596.2945623671487
Taylor 1502.4984861827372
SAU 1483.305460104055
Bethel 1403.7620571690106
HU 1596.819400031065
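
For reference, the update behind these numbers can be written out directly. This is our own minimal re-implementation, not the elosports source: a logistic expected score on a 400-point scale, K = 20, and a 100-point home-field bonus, which reproduces the Taylor and Goshen update at the end of this section. As we understand the library, `gameOver(winner, loser, False)` gives the bonus to the loser, so a neutral-site win is scored as if the loser had been at home. The sketch instead reads the `Location` code directly and gives no bonus at `"N"`; treat that as an assumption about how neutral games should be handled. `expected_score` and `update` are hypothetical names.

```python
def expected_score(r_a, r_b):
    """Probability that a side rated r_a beats one rated r_b (400-point logistic)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update(ratings, winner, loser, location, k=20.0, home=100.0):
    """Apply one result in place and return the points exchanged.

    location is the winner's venue code: "H" (winner at home), "A" (winner
    away, so the loser is at home), or "N" (neutral site, no bonus).
    """
    w_adj = home if location == "H" else 0.0
    l_adj = home if location == "A" else 0.0
    p_win = expected_score(ratings[winner] + w_adj, ratings[loser] + l_adj)
    delta = k * (1.0 - p_win)  # an upset (low p_win) moves more points
    ratings[winner] += delta
    ratings[loser] -= delta
    return delta

# Equal ratings at a neutral site: expected score 0.5, so 10 points change hands
ratings = {"Taylor": 1500.0, "Goshen": 1500.0}
print(update(ratings, "Taylor", "Goshen", "N"))  # 10.0

# Replay the two Taylor home wins over Goshen, starting from the ratings
# printed just before them in this section
check = {"Taylor": 1501.1104383034387, "Goshen": 1496.8796028649454}
update(check, "Taylor", "Goshen", "H")
update(check, "Taylor", "Goshen", "H")
print(check)
```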


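Next, we pull every rating one third of the way back toward the league average of 1500, and we do this twice. Each team therefore keeps (2/3)^2 = 4/9 of its distance from 1500; for example, HU drops from about 1596.8 to about 1543.0.

In [101]:
# Shrink each rating one third of the way toward 1500, two times in a row
for i in range(2):
    for key in eloLeague.ratingDict.keys():
        eloLeague.ratingDict[key] = eloLeague.ratingDict[key] - ((eloLeague.ratingDict[key] - 1500) * (1/3.))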

In [102]:
for team in eloLeague.ratingDict.keys():
    print(team, eloLeague.ratingDict[team])

Goshen 1496.8796028649454
Grace 1475.20382903749
Marian 1522.2222307319842
IWU 1515.4988588527442
SFU 1453.4488270221623
MVNU 1542.7975832742884
Taylor 1501.1104383034387
SAU 1492.5802044906911
Bethel 1457.2275809640048
HU 1543.030844458251
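
The two shrinkage passes have a closed form: after n passes of a fraction f toward the mean m, a rating r becomes m + (r - m)(1 - f)^n. A small sketch checking this against the printed values; `regress_to_mean` is a hypothetical helper with the notebook's choices (1500, one third, two passes) as defaults.

```python
def regress_to_mean(rating, mean=1500.0, fraction=1/3, times=2):
    """Move `rating` the given fraction of the way toward `mean`, `times` times.

    Same arithmetic as the notebook loop; equivalent closed form:
    mean + (rating - mean) * (1 - fraction) ** times
    """
    for _ in range(times):
        rating -= (rating - mean) * fraction
    return rating

# Goshen sat about 7.02 below 1500; after two passes it is 4/9 of that below
print(regress_to_mean(1492.979106446127))
print(1500 + (1492.979106446127 - 1500) * (2 / 3) ** 2)
```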


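Finally, we record new results with `gameOver`. Here Taylor beats Goshen twice at home, so only those two teams' ratings change.

In [103]:
eloLeague.gameOver("Taylor", "Goshen", True)
eloLeague.gameOver("Taylor", "Goshen", True)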

In [104]:
for team in eloLeague.ratingDict.keys():
    print(team, eloLeague.ratingDict[team])

Goshen 1483.074609518382
Grace 1475.20382903749
Marian 1522.2222307319842
IWU 1515.4988588527442
SFU 1453.4488270221623
MVNU 1542.7975832742884
Taylor 1514.9154316500021
SAU 1492.5802044906911
Bethel 1457.2275809640048
HU 1543.030844458251
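
To turn these ratings into predictions for upcoming games, the same expected-score formula gives a win probability for any matchup. A minimal sketch; the 100-point home bonus is an assumption, though it is consistent with how the Taylor and Goshen ratings moved in the last cell. `win_probability` is a hypothetical name.

```python
def win_probability(home_rating, away_rating, home=100.0, neutral=False):
    """Chance that the home team wins under an Elo model (400-point logistic).

    `home` is the assumed home-field bonus in rating points; pass
    neutral=True to drop it for games at a neutral site.
    """
    bonus = 0.0 if neutral else home
    return 1.0 / (1.0 + 10 ** ((away_rating - (home_rating + bonus)) / 400.0))

# Ratings printed after the two Taylor wins
taylor, goshen = 1514.9154316500021, 1483.074609518382
p = win_probability(taylor, goshen)
print(f"Taylor at home beats Goshen with probability {p:.3f}")
```

Looping this over a schedule of remaining conference games would give a simple game-by-game forecast.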
