This Jupyter notebook contains the code used for my Medium article on NHL overtime. It also contains a detailed explanation of my reasoning behind each step, for those interested in the more technical side of my analysis.

In [48]:
from scipy.stats import poisson
import matplotlib
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.express as px
import time
import plotly.figure_factory as ff

# Create Simulation Functions

There are many possible ways to estimate the number of points Team A is expected to pick up when playing Team B. Probably the most straightforward is to simulate many matchups between Teams A and B and calculate the average number of points Team A earns per game. To do this, I have assumed that the number of goals an NHL team scores in a given game is a Poisson random variable, with lambda (the Poisson mean) equal to that team's expected goals per game. I then built a function that takes the expected goals per game of two generic NHL teams (Team A and Team B) and simulates "iters" games between them. This function returns a dataframe containing three numbers:
- the number of games won by Team A (the number of times in "iters" sims where Team A scored more goals than Team B)
- the number of games won by Team B (same thing but in reverse)
- the number of games in which both teams were tied at the end of regulation
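To make the Poisson assumption concrete, here is a minimal, dependency-free sketch of the same experiment: draw each team's goals, compare them, and tally regulation wins and ties. It is an illustration rather than the notebook's code. The names `poisson_draw` and `regulation_outcomes` are hypothetical, and it swaps in Knuth's multiplication method for drawing Poisson variates in place of `scipy.stats.poisson.rvs`, so its counts will not exactly match the seeded output further down.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Multiplies uniforms until the running product drops to exp(-lam) or below;
    the number of multiplications needed before that is Poisson distributed.
    Fast enough for hockey-sized means of a few goals.
    """
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def regulation_outcomes(lam_a, lam_b, iters, seed=1):
    """Count regulation wins for Team A, Team B, and ties (games that go to OT)."""
    rng = random.Random(seed)
    counts = {"A": 0, "B": 0, "OT": 0}
    for _ in range(iters):
        margin = poisson_draw(lam_a, rng) - poisson_draw(lam_b, rng)
        if margin > 0:
            counts["A"] += 1
        elif margin < 0:
            counts["B"] += 1
        else:
            counts["OT"] += 1
    return counts


counts = regulation_outcomes(4, 3, 10_000)
print(counts)
```

With means of 4 and 3, Team A should win in regulation a bit over half the time and roughly one game in seven should reach OT.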

In [2]:
# This function returns the frequency with which each team wins and loses in regulation (or goes into OT)
# based on the estimated number of goals for each team per game

def score_sims(team_a_expected_score, team_b_expected_score, iters):
    a_goals = poisson.rvs(team_a_expected_score, size = iters + 3) # Poisson sim of Team A's goals, one draw per game
    b_goals = poisson.rvs(team_b_expected_score, size = iters + 3) # The "+ 3" pads both arrays to the length of the winner list below (3 placeholders + "iters" games); the last 3 draws are never classified
    margin = a_goals - b_goals # margin of victory is the number of goals Team A scores minus the goals Team B scores
    
    winner = ['A', 'B', 'OT'] # One placeholder per outcome so every label appears even if a team never wins (subtracted off at the end). This offsets the Winner column from the score columns, which is harmless because only its counts are used
    for i in range(iters): # for each iteration (game)
        if margin[i] > 0: 
            winner.append('A') # if Team A scores more goals, they win
        elif margin[i] < 0: 
            winner.append('B') # if Team B scores more goals, they win
        else:
            winner.append('OT') # if the teams are tied, they go into overtime
    
    games_dict = {'Team A Score': a_goals, 'Team B Score': b_goals, 'Margin': margin, 'Winner': winner}
    games_df = pd.DataFrame.from_dict(games_dict) # A dataframe that keeps track of the results of every game
    
    # margin_df = games_df.groupby('Margin').count().drop(['Team A Score', 'Team B Score'], axis = 1)
    # margin_df.rename(columns = {'Winner': 'Frequency'}, inplace = True)
    
    winner_df = games_df.groupby('Winner').count().drop(['Team A Score', 'Team B Score'], axis = 1)
    winner_df.rename(columns = {'Margin': 'Frequency'}, inplace = True) # A dataframe that counts the frequency of how many games are won in regulation by either Team A or B, and how many go into OT 
    
    return(winner_df - 1) # the "-1" subtracts off the preloaded values

Example output: according to our simulation, if Team A (expected to score 4 goals a game) and Team B (expected to score 3) meet 10,000 times, Team A wins 5,738 times, Team B wins 2,861 times, and regulation ends with both teams tied 1,401 times. This means we expect Team A to win about 57% of such games in regulation, Team B to win about 29%, and the game to reach OT about 14% of the time.

In [3]:
np.random.seed(1)
ex_sim = score_sims(4, 3, 10000)
ex_sim

| Winner | Frequency |
|---|---|
| A | 5738 |
| B | 2861 |
| OT | 1401 |


Now it's time to assign point values to these simulated results. Each win is worth 2 points, while a loss in regulation is worth nothing. Based on my reasoning from the article, I will assume that once OT is reached, each team has a 50% chance of winning. Because the OT winner still receives 2 points but the loser now receives 1, both teams are expected to receive 1.5 points (on average) whenever a game makes it to OT.
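Written out, with $p_A$, $p_B$ and $p_{OT}$ denoting the probabilities that Team A wins in regulation, Team B wins in regulation, or the game reaches OT, Team A's expected points per game are as follows. Plugging in the simulated frequencies from the 4-vs-3 example (5,738 and 1,401 out of 10,000) reproduces the 1.35775 that `ex_points` returns for that simulation.

$$
\mathbb{E}[\text{points}_A] = 2\,p_A + 0\cdot p_B + p_{OT}\left(\tfrac{1}{2}\cdot 2 + \tfrac{1}{2}\cdot 1\right) = 2\,p_A + 1.5\,p_{OT}
$$

$$
2(0.5738) + 1.5(0.1401) = 1.1476 + 0.21015 = 1.35775
$$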

In [4]:
# This function takes the results of the first function, and determines the expected number of "Points" Team A
# is expected to gain in the standings when it plays Team B once

def ex_points(winner_df):
    total = winner_df.sum() # Total number of iterations (games)
    A_pct = winner_df.iloc[0] / total # groupby sorts the labels, so the rows are A, B, OT in that order
    B_pct = winner_df.iloc[1] / total
    ot_pct = winner_df.iloc[2] / total
    
    points = A_pct*2 + ot_pct*1.5 # Every time Team A wins, they get 2 points. Every time they go into OT, they get either 1 or 2. Assuming OT is 50/50, Team A gets 1.5 points per OT game
    
    return(points)

Using the same sim we ran above, we conclude that Team A is expected to earn about 1.36 points every time they play Team B.

In [5]:
ex_points(ex_sim)

Frequency    1.35775
dtype: float64
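Because both teams' goals are modeled as independent Poisson variables, the regulation probabilities can also be computed exactly rather than simulated (the goal difference follows a Skellam distribution, which `scipy.stats` provides as `skellam`). The sketch below is an alternative to the simulation, not the method used in this notebook: it sums the Poisson probability mass directly with the standard library, the function names are hypothetical, and the 30-goal cutoff is an assumption that leaves a negligible tail for means of a few goals per game.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def regulation_probs(lam_a, lam_b, max_goals=30):
    """Exact P(A wins), P(B wins), P(tie) in regulation under independent Poisson scoring.

    Scores above max_goals are ignored; for means of a few goals the
    truncated tail is vanishingly small.
    """
    pa = [poisson_pmf(k, lam_a) for k in range(max_goals + 1)]
    pb = [poisson_pmf(k, lam_b) for k in range(max_goals + 1)]
    p_a_win = p_b_win = p_tie = 0.0
    for i, x in enumerate(pa):
        for j, y in enumerate(pb):
            if i > j:
                p_a_win += x * y
            elif i < j:
                p_b_win += x * y
            else:
                p_tie += x * y
    return p_a_win, p_b_win, p_tie


def exact_points(lam_a, lam_b):
    """Team A's expected standings points, assuming OT is a coin flip (1.5 points each)."""
    p_a_win, _, p_tie = regulation_probs(lam_a, lam_b)
    return 2 * p_a_win + 1.5 * p_tie


print(round(exact_points(4, 3), 3))
```

For the 4-vs-3 example this comes out at roughly 1.36, in line with the simulated 1.35775. The trade-off is a double loop per matchup instead of 10,000 random draws, with no Monte Carlo noise in the result.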

In [6]:
# This function combines the first two functions in a succinct way. The only purpose of this is that, instead of
# having to run two functions every time you want to calculate expected points, you only have to run one

def point_sims(team_a_expected_score, team_b_expected_score, iters):
    winner_df = score_sims(team_a_expected_score, team_b_expected_score, iters)
    points = ex_points(winner_df)
    
    return(points)

In [7]:
np.random.seed(1)
point_sims(4, 3, 10000)

Frequency    1.35775
dtype: float64

# Actual Results

The dataset below was created in the "Goal Dif Pre-Process" notebook. For information on how it was compiled, please see that notebook.

In [8]:
goal_dif = pd.read_csv('/Users/calebsmith/Documents/Personal Projects/NHL OT/Goal Diffs.csv')
goal_dif

| | Team | Games Played | Goals Scored | GSPG | Goals Allowed | GAPG | Year | Conference |
|---|---|---|---|---|---|---|---|---|
| 0 | Philadelphia Flyers* | 82 | 267 | 3.3 | 259 | 3.2 | 2006 | East |
| 1 | New Jersey Devils* | 82 | 242 | 3.0 | 229 | 2.8 | 2006 | East |
| 2 | New York Rangers* | 82 | 257 | 3.1 | 215 | 2.6 | 2006 | East |
| 3 | New York Islanders | 82 | 230 | 2.8 | 278 | 3.4 | 2006 | East |
| 4 | Pittsburgh Penguins | 82 | 244 | 3.0 | 316 | 3.9 | 2006 | East |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 480 | Vegas Golden Knights | 82 | 266 | 3.2 | 248 | 3.0 | 2022 | West |
| 481 | Vancouver Canucks | 82 | 249 | 3.0 | 236 | 2.9 | 2022 | West |
| 482 | San Jose Sharks | 82 | 214 | 2.6 | 264 | 3.2 | 2022 | West |
| 483 | Anaheim Ducks | 82 | 232 | 2.8 | 271 | 3.3 | 2022 | West |


In [9]:
max_goals = goal_dif['GSPG'].max()
max_goals

4.1

In [10]:
min_goals = goal_dif['GSPG'].min()
min_goals

1.9

# Expected Points Matrix

We now come to the game theory portion of this analysis: when should two teams agree to "intentionally" take a game into OT? As discussed in my article, the first criterion is that the teams should be in opposite conferences. The second criterion is a bit more technical: any time both teams are expected to leave a game with less than 1.5 points, they should agree to take it into OT. The reasoning is simple: if we assume that OT is a 50/50 affair, then the expected outcome of an OT game for both teams is 1.5 points, which is better for both teams than any expected outcome below that.

And this is where game theory comes in: we need to assume that both teams will act rationally, because from a purely rational standpoint, agreeing to OT would be the right thing to do. Concerns about "sportsmanship" or "pride" have no effect in an economically "rational" view of the world.
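It is worth spelling out why checking one team's side of each matchup is enough. A regulation result hands out 2 points in total and an OT result hands out 3, so the two teams' expectations are tied together:

$$
\mathbb{E}[\text{points}_A] + \mathbb{E}[\text{points}_B] = (2p_A + 1.5p_{OT}) + (2p_B + 1.5p_{OT}) = 2(1 - p_{OT}) + 3p_{OT} = 2 + p_{OT} \le 3
$$

Since $p_{OT} < 1$ for any positive scoring rates, if Team A expects at least 1.5 points then Team B expects at most $0.5 + p_{OT} < 1.5$. At most one team can ever prefer to play for a regulation result, and it is the stronger one, which is why the rest of the analysis only asks when Team A's expected points reach 1.5.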

In [11]:
# We now need to run sims for many different expected scores to determine when two teams should "intentionally"
# go to overtime. To do this, I'll assume a wide range of possible expected outcomes: everything from a team being
# expected to score 1 goal per game against another team, all the way to a team being expected to score 4.9 goals.
# This is a bit wider of a range than we really need: from the results above, we know the real range to be 
# 1.9 to 4.1 goals per game.

def sims_dist(iters):
    num_list = []
    for n in range(10,50,1):
        num_list.append(n/10) # This creates a list of numbers from 1 to 4.9, with step 0.1
        
    results_array = np.zeros((40,40)) # Create a 40 by 40 array to hold expected points
    
    for r in range(len(num_list)):
        for c in range(len(num_list)):
            # Rows index Team B's expected goals and columns index Team A's, so each cell holds Team A's expected points
            # for that pairing, from 1.0-1.0 all the way to 4.9-4.9. A reminder that these are expected scores, so a game
            # with an expected outcome of 4-3 won't necessarily end that way.
            results_array[r, c] = point_sims(num_list[c], num_list[r], iters).item() # .item() pulls the scalar out of the one-element Series
            
    return(results_array)

The array below contains the projected points earned by Team A for every combination of expected scores from 1.0 to 4.9 goals per game (rows correspond to Team B's expected goals, columns to Team A's):

In [12]:
np.random.seed(1)
sims_array = sims_dist(10000)
sims_array

array([[1.165  , 1.20105, 1.2302 , ..., 1.92135, 1.92195, 1.9246 ],
       [1.09215, 1.13635, 1.1866 , ..., 1.90325, 1.9053 , 1.91775],
       [1.03875, 1.0933 , 1.13935, ..., 1.89   , 1.897  , 1.9022 ],
       ...,
       [0.13945, 0.1415 , 0.1725 , ..., 1.059  , 1.093  , 1.1096 ],
       [0.1265 , 0.14425, 0.16045, ..., 1.0493 , 1.0676 , 1.0788 ],
       [0.11305, 0.141  , 0.1514 , ..., 0.99805, 1.0471 , 1.0653 ]])
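Because the grid starts at 1.0 and moves in steps of 0.1, looking up a matchup means converting expected goals into an index (the row for Team B, the column for Team A). A small hypothetical helper for that conversion is sketched below. It rounds instead of truncating, since floating-point arithmetic can leave a value a hair below the integer index it should map to, and `int()` would then silently pick the neighboring cell.

```python
GRID_START = 1.0  # first expected-goals value on each axis (matches num_list)
GRID_STEP = 0.1   # spacing between consecutive grid values
GRID_SIZE = 40    # 1.0, 1.1, ..., 4.9


def goals_to_index(goals):
    """Map an expected-goals value on the 0.1 grid to its row/column index.

    (3.3 - 1.0) / 0.1 comes out just below 23 in double-precision arithmetic,
    so int() would truncate it to 22; round() returns the intended 23.
    """
    idx = round((goals - GRID_START) / GRID_STEP)
    if not 0 <= idx < GRID_SIZE:
        raise ValueError(f"{goals} is outside the simulated grid")
    return idx


def index_to_goals(idx):
    """Inverse mapping, rounded to one decimal place."""
    return round(GRID_START + idx * GRID_STEP, 1)


# Hypothetical lookup for Team A at 3.3 expected goals vs Team B at 1.9:
# sims_array[goals_to_index(1.9), goals_to_index(3.3)]
print(goals_to_index(3.3), goals_to_index(1.9))  # prints: 23 9
```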

# Visualizations

In [101]:
num_list = []
for n in range(10,50,1):
    num_list.append(n/10)

trace = go.Heatmap(x = num_list, y = num_list, z = sims_array, colorbar = {'title': 'Points'})
fig = go.Figure(data = [trace])
fig.update_layout(title = 'Heatmap of Team A Projected Points', 
                  xaxis_title = 'Team A Expected Goals', 
                  yaxis_title = 'Team B Expected Goals',
                  title_x = 0.5)
fig

In [14]:
gspg_table = goal_dif.groupby('GSPG').count()
gspg_table.drop(columns = ['Games Played', 'Goals Scored', 'Goals Allowed', 'GAPG', 'Year', 'Conference'], inplace = True)
gspg_table.rename(columns = {'Team': 'Frequency'}, inplace = True)
gspg_table

| GSPG | Frequency |
|---|---|
| 1.9 | 1 |
| 2.0 | 3 |
| 2.1 | 2 |
| 2.2 | 5 |
| 2.3 | 6 |
| 2.4 | 19 |
| 2.5 | 37 |
| 2.6 | 59 |
| 2.7 | 56 |
| 2.8 | 57 |


In [99]:
hist = go.Figure(data = go.Histogram(x = goal_dif['GSPG']))
hist.update_layout({'plot_bgcolor': 'rgba(0,0,0,0)'})
for y in range(10, 70, 10): # faint horizontal gridlines at 10, 20, ..., 60 teams
    hist.add_hline(y = y, line_width = 1, opacity = 0.2)
hist.update_layout(title = 'Distribution of Goals Per Game since 2005',
                   title_x = 0.5,
                   yaxis_title = 'Number of Teams',
                   xaxis_title = 'Goals per Game')

In [87]:
px.histogram(goal_dif, x = 'GSPG')

# Which teams would choose OT?

Based on the array above, I determined which pairs of expected scores leave Team A with at least 1.5 expected points per game. These are the (nonconference) games that Team A would NOT want to intentionally take into OT, as doing so would lower their expected points (or, right at 1.5, leave them unchanged).

In [16]:
coord_list = []

for l in range(len(sims_array)):
    for l2 in range(len(sims_array)):
        if sims_array[l2, l] >= 1.5:
            coord_list.append([(l+10)/10, (l2+10)/10])
            
coord_list
# Most of these are irrelevant, since no team averaged less than 1.9 goals a game, or more than 4.1 goals per game

[[1.9, 1.0],
 [2.0, 1.0],
 [2.1, 1.0],
 [2.1, 1.1],
 [2.2, 1.0],
 [2.2, 1.1],
 [2.3, 1.0],
 [2.3, 1.1],
 [2.3, 1.2],
 [2.4, 1.0],
 [2.4, 1.1],
 [2.4, 1.2],
 [2.4, 1.3],
 [2.5, 1.0],
 [2.5, 1.1],
 [2.5, 1.2],
 [2.5, 1.3],
 [2.5, 1.4],
 [2.6, 1.0],
 [2.6, 1.1],
 [2.6, 1.2],
 [2.6, 1.3],
 [2.6, 1.4],
 [2.7, 1.0],
 [2.7, 1.1],
 [2.7, 1.2],
 [2.7, 1.3],
 [2.7, 1.4],
 [2.7, 1.5],
 [2.8, 1.0],
 [2.8, 1.1],
 [2.8, 1.2],
 [2.8, 1.3],
 [2.8, 1.4],
 [2.8, 1.5],
 [2.8, 1.6],
 [2.9, 1.0],
 [2.9, 1.1],
 [2.9, 1.2],
 [2.9, 1.3],
 [2.9, 1.4],
 [2.9, 1.5],
 [2.9, 1.6],
 [3.0, 1.0],
 [3.0, 1.1],
 [3.0, 1.2],
 [3.0, 1.3],
 [3.0, 1.4],
 [3.0, 1.5],
 [3.0, 1.6],
 [3.0, 1.7],
 [3.1, 1.0],
 [3.1, 1.1],
 [3.1, 1.2],
 [3.1, 1.3],
 [3.1, 1.4],
 [3.1, 1.5],
 [3.1, 1.6],
 [3.1, 1.7],
 [3.1, 1.8],
 [3.2, 1.0],
 [3.2, 1.1],
 [3.2, 1.2],
 [3.2, 1.3],
 [3.2, 1.4],
 [3.2, 1.5],
 [3.2, 1.6],
 [3.2, 1.7],
 [3.2, 1.8],
 [3.3, 1.0],
 [3.3, 1.1],
 [3.3, 1.2],
 [3.3, 1.3],
 [3.3, 1.4],
 [3.3, 1.5],
 [3.3, 1.6],
 [3.3, 1.7],

The problem with the list above is that many of these combinations never occur with real teams. For instance, no team scored fewer than 1.9 goals per game, and no team scored more than 4.1. Therefore, I trimmed the list to only include outcomes between these two bounds.

(NOTE: while it is possible that teams in the future will score more than 4.1 or fewer than 1.9 goals per game, the point of this analysis is to look back at the number of teams that should have intentionally taken games into OT since the NHL lockout of 2005.)

In [17]:
coord_list_relevant = []

for l in range(len(sims_array)):
    for l2 in range(len(sims_array)):
        if ((sims_array[l2, l] >= 1.5) & ((l+10)/10 <= max_goals) & ((l2+10)/10 >= min_goals)):
            coord_list_relevant.append([(l+10)/10, (l2+10)/10])
            
coord_list_relevant

[[3.3, 1.9],
 [3.4, 1.9],
 [3.4, 2.0],
 [3.5, 1.9],
 [3.5, 2.0],
 [3.5, 2.1],
 [3.6, 1.9],
 [3.6, 2.0],
 [3.6, 2.1],
 [3.7, 1.9],
 [3.7, 2.0],
 [3.7, 2.1],
 [3.7, 2.2],
 [3.7, 2.3],
 [3.8, 1.9],
 [3.8, 2.0],
 [3.8, 2.1],
 [3.8, 2.2],
 [3.8, 2.3],
 [3.9, 1.9],
 [3.9, 2.0],
 [3.9, 2.1],
 [3.9, 2.2],
 [3.9, 2.3],
 [3.9, 2.4],
 [4.0, 1.9],
 [4.0, 2.0],
 [4.0, 2.1],
 [4.0, 2.2],
 [4.0, 2.3],
 [4.0, 2.4],
 [4.0, 2.5],
 [4.1, 1.9],
 [4.1, 2.0],
 [4.1, 2.1],
 [4.1, 2.2],
 [4.1, 2.3],
 [4.1, 2.4],
 [4.1, 2.5],
 [4.1, 2.6]]

NOTE: a major assumption I'm making in this analysis (and certainly the worst one I make) is that the number of goals a team is expected to score is not affected by the team they are playing. Obviously, this is not a good assumption. Some teams have good defenses and/or goalies that make it harder for a team to hit their average; others have bad defenses and/or goalies that make it likely that a team will exceed their average. 

However, this complicates the analysis dramatically, and I am skeptical that it would drastically change my conclusion. The reason is that accounting for defense and goaltending would not alter the coordinate list above at all. Rather, it would only change which real-world games fall within it. While this does affect the remainder of my analysis, it doesn't affect anything I've done up to this point. Therefore, for the sake of my Medium article, I decided the cost of this complication wasn't worth the benefit of a slight uptick in accuracy. As you will see from my conclusion, not many teams fall within this range, and I have no reason to believe that would change much with more precise real-life numbers.

With that said, I encourage anyone interested in taking this analysis further by accounting for defense and goaltending to do so. If you have any ideas on how to do this, please contact me at caleb.r.smith95@outlook.com
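For anyone picking this up, one common starting point (offered here only as a hypothetical sketch; it is not used anywhere in this notebook) is a multiplicative attack-times-defense estimate: scale a team's goals per game by how many goals its opponent allows relative to the league average. The function name and the league-average figure in the example are assumptions.

```python
def adjusted_expected_goals(gspg_team, gapg_opponent, league_avg_gpg):
    """Hypothetical opponent-adjusted scoring rate (not part of the original analysis).

    A team facing a defense that allows exactly the league-average number of goals
    keeps its own GSPG; a leakier defense scales it up, a stingier one scales it down.
    """
    if league_avg_gpg <= 0:
        raise ValueError("league average must be positive")
    return gspg_team * gapg_opponent / league_avg_gpg


# Two 2006 rows from the goal_dif table: the Flyers (GSPG 3.3) against the
# Penguins (GAPG 3.9), using an assumed league average of 3.0 goals per game.
print(round(adjusted_expected_goals(3.3, 3.9, 3.0), 2))  # prints: 4.29
```

The adjusted rates could then be fed into `point_sims` in place of raw GSPG values. As argued above, that would change which real matchups land in the coordinate list, not the list itself.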

In [18]:
len(coord_list_relevant)

40

In [19]:
coord_list_relevant[0][0]

3.3

In [20]:
# This makes two separate lists out of our relevant list of coordinates: one with all of Team A's scores, and the
# other with all of Team B's scores
l1 = []
l2 = []

for l in range(len(coord_list_relevant)):
    l1.append(coord_list_relevant[l][0])
    l2.append(coord_list_relevant[l][1])

In [21]:
# This is the fewest goals a team can be expected to score and still not want to intentionally take a nonconference game to OT
l1_min = min(l1)
l1_min

3.3

In [22]:
# This is the most goals an opposing team can be expected to score and still not convince your team to take the game to OT
l2_max = max(l2)
l2_max

2.6
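The two cutoffs above, `l1_min` and `l2_max`, are the extremes of a boundary: for each Team A scoring rate there is a highest Team B rate at which Team A still expects at least 1.5 points. A hedged sketch of tracing that boundary follows. It swaps in exact Poisson probabilities for the simulation (the helper names are hypothetical), so cells sitting right at 1.5 can land on the other side compared with the simulated grid. Because expected points never fall as Team A's rate rises and never rise as Team B's rate rises, the boundary should never move down as Team A gets stronger.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def expected_points_a(lam_a, lam_b, max_goals=30):
    """Team A's expected points: 2 per regulation win, 1.5 per tie (OT assumed 50/50).

    Scores above max_goals are ignored; the truncated tail is negligible here.
    """
    pa = [poisson_pmf(k, lam_a) for k in range(max_goals + 1)]
    pb = [poisson_pmf(k, lam_b) for k in range(max_goals + 1)]
    p_win = sum(x * y for i, x in enumerate(pa) for j, y in enumerate(pb) if i > j)
    p_tie = sum(x * y for x, y in zip(pa, pb))
    return 2 * p_win + 1.5 * p_tie


def regulation_boundary(grid):
    """For each Team A rate, the largest Team B rate on the grid where Team A still
    expects at least 1.5 points (None if there is no such rate)."""
    boundary = {}
    for a in grid:
        qualifying = [b for b in grid if expected_points_a(a, b) >= 1.5]
        boundary[a] = max(qualifying) if qualifying else None
    return boundary


grid = [n / 10 for n in range(10, 50)]  # 1.0, 1.1, ..., 4.9, the same values as num_list
boundary = regulation_boundary(grid)
for a in grid:
    print(a, boundary[a])
```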

Below is the list of teams that might not want to take a nonconference game into OT, depending on how few goals their opponent is expected to score (any team that scored fewer than 3.3 goals per game since 2006 would always want nonconference games to go into OT). 

In [23]:
winners = goal_dif[goal_dif.GSPG >= l1_min].copy()
winners.reset_index(inplace = True)
winners.drop(columns = ['index', 'Games Played', 'Goals Scored', 'Goals Allowed', 'GAPG'], inplace = True)
winners

| | Team | GSPG | Year | Conference |
|---|---|---|---|---|
| 0 | Philadelphia Flyers* | 3.3 | 2006 | East |
| 1 | Ottawa Senators* | 3.8 | 2006 | East |
| 2 | Buffalo Sabres* | 3.4 | 2006 | East |
| 3 | Carolina Hurricanes* | 3.6 | 2006 | East |
| 4 | Atlanta Thrashers | 3.4 | 2006 | East |
| 5 | Detroit Red Wings* | 3.7 | 2006 | West |
| 6 | Colorado Avalanche* | 3.5 | 2006 | West |
| 7 | Pittsburgh Penguins* | 3.4 | 2007 | East |
| 8 | Buffalo Sabres* | 3.8 | 2007 | East |
| 9 | Ottawa Senators* | 3.5 | 2007 | East |


Below is the full list of teams that the teams above might choose not to take into OT. Any opponent that scored more than 2.6 goals a game since 2006 was too dangerous to risk a regulation loss against.

In [24]:
losers = goal_dif[goal_dif.GSPG <= l2_max].copy()
losers.reset_index(inplace = True)
losers.drop(columns = ['index', 'Games Played', 'Goals Scored', 'Goals Allowed', 'GAPG'], inplace = True)
losers

| | Team | GSPG | Year | Conference |
|---|---|---|---|---|
| 0 | Chicago Blackhawks | 2.6 | 2006 | West |
| 1 | St. Louis Blues | 2.4 | 2006 | West |
| 2 | New Jersey Devils* | 2.6 | 2007 | East |
| 3 | Philadelphia Flyers | 2.6 | 2007 | East |
| 4 | St. Louis Blues | 2.6 | 2007 | West |
| ... | ... | ... | ... | ... |
| 127 | San Jose Sharks | 2.6 | 2020 | West |
| 128 | Philadelphia Flyers | 2.6 | 2022 | East |
| 129 | Arizona Coyotes | 2.5 | 2022 | West |
| 130 | San Jose Sharks | 2.6 | 2022 | West |


In [25]:
len(winners)

60

In [26]:
len(losers)

132

In [27]:
winner_teams = winners.Team
winner_years = winners.Year
winner_con = winners.Conference
winner_scores = winners.GSPG
loser_teams = losers.Team
loser_years = losers.Year
loser_con = losers.Conference
loser_scores = losers.GSPG

In [28]:
loser_years

0      2006
1      2006
2      2007
3      2007
4      2007
       ... 
127    2020
128    2022
129    2022
130    2022
131    2022
Name: Year, Length: 132, dtype: int64

I can finally create the list of nonconference matchups since 2006 that, from a rational perspective, should not have intentionally been taken into OT.

In [29]:
final_list = []

for w in range(len(winners)):
    for l in range(len(losers)):
        
        if((winner_years[w] == loser_years[l]) & (winner_con[w] != loser_con[l])): # If the potential matchup occurred in the same year and in opposite conferences, advance to the next criterion
            score_coord = [winner_scores[w], loser_scores[l]]
            
            if(score_coord in coord_list_relevant): # If the potential matchup falls within our relevant coordinate list, this game should not be intentionally taken into OT
                final_list.append([winner_years[w], winner_teams[w], winner_con[w], winner_scores[w], loser_teams[l], loser_con[l], loser_scores[l]])

In [30]:
final_list

[[2014, 'Chicago Blackhawks*', 'West', 3.3, 'Buffalo Sabres', 'East', 1.9],
 [2017,
  'Pittsburgh Penguins*',
  'East',
  3.4,
  'Colorado Avalanche',
  'West',
  2.0],
 [2019, 'Tampa Bay Lightning*', 'East', 4.0, 'Anaheim Ducks', 'West', 2.4],
 [2019, 'Tampa Bay Lightning*', 'East', 4.0, 'Los Angeles Kings', 'West', 2.5],
 [2020, 'Colorado Avalanche*', 'West', 3.4, 'Detroit Red Wings', 'East', 2.0],
 [2022, 'Florida Panthers*', 'East', 4.1, 'Arizona Coyotes', 'West', 2.5],
 [2022, 'Florida Panthers*', 'East', 4.1, 'San Jose Sharks', 'West', 2.6],
 [2022, 'Florida Panthers*', 'East', 4.1, 'Seattle Kraken', 'West', 2.6]]

Since 2006, these are the only 8 nonconference matchups in which one team should not have agreed to take the game into OT.

In [31]:
final_df = pd.DataFrame(final_list, columns = ['Year', 'Winning Team', 'Conference', 'Expected Score', 'Losing Team', 'Conference', 'Expected Score'])
final_df

| | Year | Winning Team | Conference | Expected Score | Losing Team | Conference | Expected Score |
|---|---|---|---|---|---|---|---|
| 0 | 2014 | Chicago Blackhawks* | West | 3.3 | Buffalo Sabres | East | 1.9 |
| 1 | 2017 | Pittsburgh Penguins* | East | 3.4 | Colorado Avalanche | West | 2.0 |
| 2 | 2019 | Tampa Bay Lightning* | East | 4.0 | Anaheim Ducks | West | 2.4 |
| 3 | 2019 | Tampa Bay Lightning* | East | 4.0 | Los Angeles Kings | West | 2.5 |
| 4 | 2020 | Colorado Avalanche* | West | 3.4 | Detroit Red Wings | East | 2.0 |
| 5 | 2022 | Florida Panthers* | East | 4.1 | Arizona Coyotes | West | 2.5 |
| 6 | 2022 | Florida Panthers* | East | 4.1 | San Jose Sharks | West | 2.6 |
| 7 | 2022 | Florida Panthers* | East | 4.1 | Seattle Kraken | West | 2.6 |


In [57]:
scores_df = final_df['Expected Score']
scores_df.columns = ['Ex_Score_W', 'Ex_Score_L']
scores_df

| | Ex_Score_W | Ex_Score_L |
|---|---|---|
| 0 | 3.3 | 1.9 |
| 1 | 3.4 | 2.0 |
| 2 | 4.0 | 2.4 |
| 3 | 4.0 | 2.5 |
| 4 | 3.4 | 2.0 |
| 5 | 4.1 | 2.5 |
| 6 | 4.1 | 2.6 |
| 7 | 4.1 | 2.6 |


In [79]:
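winning_scores = scores_df['Ex_Score_W']
losing_scores = scores_df['Ex_Score_L']
points_list = [] # named so it doesn't overwrite the ex_points() function defined earlier

for i in range(len(scores_df)):
    # Rows index Team B (the losing team) and columns index Team A; round() guards against float error in the index
    points_list.append(round(sims_array[int(round(losing_scores[i]*10 - 10)), int(round(winning_scores[i]*10 - 10))], 2))
    
points_list

[1.54, 1.52, 1.54, 1.51, 1.52, 1.52, 1.5, 1.5]

In [80]:
final_df['Expected Points'] = points_list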
final_df

| | Year | Winning Team | Conference | Expected Score | Losing Team | Conference | Expected Score | Expected Points |
|---|---|---|---|---|---|---|---|---|
| 0 | 2014 | Chicago Blackhawks* | West | 3.3 | Buffalo Sabres | East | 1.9 | 1.54 |
| 1 | 2017 | Pittsburgh Penguins* | East | 3.4 | Colorado Avalanche | West | 2.0 | 1.52 |
| 2 | 2019 | Tampa Bay Lightning* | East | 4.0 | Anaheim Ducks | West | 2.4 | 1.54 |
| 3 | 2019 | Tampa Bay Lightning* | East | 4.0 | Los Angeles Kings | West | 2.5 | 1.51 |
| 4 | 2020 | Colorado Avalanche* | West | 3.4 | Detroit Red Wings | East | 2.0 | 1.52 |
| 5 | 2022 | Florida Panthers* | East | 4.1 | Arizona Coyotes | West | 2.5 | 1.52 |
| 6 | 2022 | Florida Panthers* | East | 4.1 | San Jose Sharks | West | 2.6 | 1.5 |
| 7 | 2022 | Florida Panthers* | East | 4.1 | Seattle Kraken | West | 2.6 | 1.5 |


# Total number of potential non-conference matchups

In [32]:
num_teams = goal_dif.groupby(['Year', 'Conference']).count()
num_teams.drop(columns = ['Games Played', 'Goals Scored', 'GSPG', 'Goals Allowed', 'GAPG'], inplace = True)
num_teams.rename(columns = {'Team': 'Teams'}, inplace = True)
num_teams

| Year | Conference | Teams |
|---|---|---|
| 2006 | East | 15 |
| 2006 | West | 15 |
| 2007 | East | 15 |
| 2007 | West | 15 |
| 2008 | East | 15 |
| 2008 | West | 15 |
| 2009 | East | 15 |
| 2009 | West | 15 |
| 2010 | East | 15 |
| 2010 | West | 15 |


In [33]:
teams_analysis = num_teams.reset_index()
teams_analysis

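| | Year | Conference | Teams |
|---|---|---|---|
| 0 | 2006 | East | 15 |
| 1 | 2006 | West | 15 |
| 2 | 2007 | East | 15 |
| 3 | 2007 | West | 15 |
| 4 | 2008 | East | 15 |
| 5 | 2008 | West | 15 |
| 6 | 2009 | East | 15 |
| 7 | 2009 | West | 15 |
| 8 | 2010 | East | 15 |
| 9 | 2010 | West | 15 |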


In [34]:
east = teams_analysis[teams_analysis.Conference == 'East'].copy()
west = teams_analysis[teams_analysis.Conference == 'West'].copy()

In [35]:
east.reset_index(inplace = True)
east.drop(columns = 'index', inplace = True)
east

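| | Year | Conference | Teams |
|---|---|---|---|
| 0 | 2006 | East | 15 |
| 1 | 2007 | East | 15 |
| 2 | 2008 | East | 15 |
| 3 | 2009 | East | 15 |
| 4 | 2010 | East | 15 |
| 5 | 2011 | East | 15 |
| 6 | 2012 | East | 15 |
| 7 | 2013 | East | 15 |
| 8 | 2014 | East | 16 |
| 9 | 2015 | East | 16 |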


In [36]:
west.reset_index(inplace = True)
west.drop(columns = 'index', inplace = True)
west

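| | Year | Conference | Teams |
|---|---|---|---|
| 0 | 2006 | West | 15 |
| 1 | 2007 | West | 15 |
| 2 | 2008 | West | 15 |
| 3 | 2009 | West | 15 |
| 4 | 2010 | West | 15 |
| 5 | 2011 | West | 15 |
| 6 | 2012 | West | 15 |
| 7 | 2013 | West | 15 |
| 8 | 2014 | West | 14 |
| 9 | 2015 | West | 14 |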


In [37]:
final_con = pd.merge(east, west, how = 'inner', on = 'Year')
final_con['Combinations'] = final_con['Teams_x'] * final_con['Teams_y']
final_con

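| | Year | Conference_x | Teams_x | Conference_y | Teams_y | Combinations |
|---|---|---|---|---|---|---|
| 0 | 2006 | East | 15 | West | 15 | 225 |
| 1 | 2007 | East | 15 | West | 15 | 225 |
| 2 | 2008 | East | 15 | West | 15 | 225 |
| 3 | 2009 | East | 15 | West | 15 | 225 |
| 4 | 2010 | East | 15 | West | 15 | 225 |
| 5 | 2011 | East | 15 | West | 15 | 225 |
| 6 | 2012 | East | 15 | West | 15 | 225 |
| 7 | 2013 | East | 15 | West | 15 | 225 |
| 8 | 2014 | East | 16 | West | 14 | 224 |
| 9 | 2015 | East | 16 | West | 14 | 224 |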


In [38]:
total_comb = final_con.Combinations.sum()
total_comb

3672

In [102]:
1 - (len(final_df) / total_comb)

0.9978213507625272
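In symbols, if $E_y$ and $W_y$ are the numbers of Eastern and Western Conference teams in season $y$, the last two cells compute the number of possible nonconference pairings and the share of them that are not among the 8 exceptions found above. For example, a 15-15 season contributes $15 \times 15 = 225$ pairings and a 16-14 season contributes $224$.

$$
N = \sum_y E_y\,W_y = 3672, \qquad 1 - \frac{8}{N} = 1 - \frac{8}{3672} \approx 0.99782
$$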

Therefore, my final conclusion is that about 99.8% of potential NHL non-conference matchups (all but 8 of the 3,672 cross-conference team pairings in the data) should be intentionally taken into OT to maximize each team's expected number of points. This lopsided result is a direct consequence of how stupid the NHL's overtime-loss system is.