In [1]:
%matplotlib inline
import os
import sys
import pandas
import numpy
import matplotlib
import matplotlib.pyplot as plt
pandas.set_option('display.notebook_repr_html', False)
pandas.set_option('display.max_columns', 20)
pandas.set_option('display.max_rows', 25)
pandas.set_option('display.precision', 3)

from decimal import getcontext, Decimal
# Set the precision.
getcontext().prec = 4

In [2]:
dm = pandas.read_csv('nhl_ingame_goals.csv')
dm = dm.sort_values(by=['season', 'gameno', 'goalno'])
dm = dm[dm['period']==1]
dm = dm.groupby(['gameno']).last()
dm = dm.drop(['goalno', 'secstart', 'gteam'], axis=1)
# gameno remains the index after the groupby
dm.head()

          gamedate  season  period hteam ateam wteam  hgoals  agoals
gameno                                                              
20500   2014-12-22    2014       1   VAN   ARI  home       2       0
20502   2014-12-22    2014       1    LA   CGY  away       1       0
20503   2014-12-23    2014       1   BOS   NSH  home       2       1
20504   2014-12-23    2014       1    NJ   CAR  away       0       0
20505   2014-12-23    2014       1   NYI   MTL  away       0       0

The data contain the score at the end of the first period and the game outcome for NHL games.

Use Bayes' rule to calculate the (posterior) probability of a home team win given the home team is leading the game by one goal at the end of the first period.

$$ p(hwin_g| dgoals_g = +1 ) = \frac{p(dgoals_g = +1|hwin_g) p(hwin_g)}{ p(dgoals_g = +1 | hwin_g) p(hwin_g) +  p(dgoals_g = +1 | \overline{hwin_g}) p( \overline{hwin_g}) }$$
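
The formula above can be sketched as a small Python function. This is a minimal illustration, not part of the original notebook: the function name `posterior_home_win` and the input values are hypothetical and are not taken from the NHL data.

```python
def posterior_home_win(prior_win, like_win, like_loss):
    """Posterior p(hwin | dgoals = +1) from Bayes' rule.

    prior_win : p(hwin)
    like_win  : p(dgoals = +1 | hwin)
    like_loss : p(dgoals = +1 | not hwin)
    """
    prior_loss = 1.0 - prior_win
    numerator = like_win * prior_win
    evidence = numerator + like_loss * prior_loss
    return numerator / evidence

# Hypothetical inputs, chosen only for illustration (not estimated from the data)
example = posterior_home_win(prior_win=0.5, like_win=0.3, like_loss=0.2)
print(round(example, 2))
```

The denominator is the total probability of a one-goal home lead, summed over the two possible outcomes.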

# Question

List the information required to calculate the posterior probability.

# Answer

In general:

* prior and inverse conditional probabilities

Specifically:

* prior probability of a home win: $p(hwin_g)$
* prior probability of a home loss: $p( \overline{hwin_g})$
* probability that the home team leads by one goal at the end of the first period, given that the home team wins: $p(dgoals_g = +1|hwin_g)$
* probability that the home team leads by one goal at the end of the first period, given that the home team loses: $p(dgoals_g = +1 | \overline{hwin_g})$


# Question

Calculate the following inverse conditional probabilities:

$p(dgoals_g = +1|hwin_g)$

$p(dgoals_g = +1|\overline{hwin_g})$

In [17]:
# create variables in master data frame
dm['dgoals'] = dm['hgoals'] - dm['agoals']
dm['l1'] = numpy.where(dm['dgoals']==1, 1, 0)

# create inverse conditional data frame
d1 = dm[dm['wteam']=='home']
d0 = dm[dm['wteam']=='away']

# calculate inverse conditional probabilities

inv_cond_1 = round(d1['l1'].mean(), 2)
inv_cond_0 = round(d0['l1'].mean(), 2)

The inverse conditional probabilities:

$p(dgoals_g = +1|hwin_g)$: {{inv_cond_1}}

$p(dgoals_g = +1|\overline{hwin_g})$: {{inv_cond_0}}
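
As a cross-check, the same two conditional probabilities can be computed in one step with a `groupby`. The sketch below uses a hypothetical six-game toy frame (not rows from `nhl_ingame_goals.csv`) so that the result is easy to verify by hand; the column names mirror the ones used above.

```python
import pandas

# Hypothetical toy data: six games, invented for illustration only
toy = pandas.DataFrame({
    'wteam':  ['home', 'home', 'home', 'away', 'away', 'away'],
    'hgoals': [2, 1, 0, 1, 0, 0],
    'agoals': [1, 0, 0, 0, 1, 0],
})

# indicator: home team leads by exactly one goal after the first period
toy['dgoals'] = toy['hgoals'] - toy['agoals']
toy['l1'] = (toy['dgoals'] == 1).astype(int)

# mean of the indicator within each outcome = p(dgoals = +1 | outcome)
inv_cond_toy = toy.groupby('wteam')['l1'].mean()
print(inv_cond_toy)
```

In the toy frame, two of the three home wins and one of the three away wins have a one-goal home lead, so the group means are 2/3 and 1/3.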

# Question

Assume the prior probability of a home team win is 0.50. Use Bayes' rule to calculate the (posterior) probability of a home team win given the home team is leading the game by one goal at the end of the first period.

# Answer

In [19]:
prior_1 = 0.50
prior_0 = 1 - prior_1

post = (prior_1 * inv_cond_1) / (prior_1 * inv_cond_1 + prior_0 * inv_cond_0)
post = round(post, 2)

The posterior probability is {{post}}
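
With equal priors, the prior terms cancel and the posterior depends only on the two inverse conditional probabilities. A short derivation, using the same symbols as the formula above:

$$ p(hwin_g| dgoals_g = +1 ) = \frac{0.5 \, p(dgoals_g = +1|hwin_g)}{0.5 \, p(dgoals_g = +1 | hwin_g) + 0.5 \, p(dgoals_g = +1 | \overline{hwin_g})} = \frac{p(dgoals_g = +1|hwin_g)}{p(dgoals_g = +1 | hwin_g) + p(dgoals_g = +1 | \overline{hwin_g})} $$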

# Question

Assume the prior probability of a home team win is 0.40. Use Bayes' rule to calculate the (posterior) probability of a home team win given the home team is leading the game by one goal at the end of the first period.

# Answer

In [20]:
prior_1 = 0.40
prior_0 = 1 - prior_1

post = (prior_1 * inv_cond_1) / (prior_1 * inv_cond_1 + prior_0 * inv_cond_0)
post = round(post, 2)

The posterior probability is {{post}}
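
To see how the prior shifts the result, Bayes' rule can be written in odds form: posterior odds equal prior odds times the likelihood ratio. The sketch below is an illustration only; the function name `posterior_from_odds` and the likelihood values are hypothetical and are not estimated from the NHL data.

```python
def posterior_from_odds(prior_win, like_win, like_loss):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior_win / (1 - prior_win)
    likelihood_ratio = like_win / like_loss
    post_odds = prior_odds * likelihood_ratio
    # convert odds back to a probability
    return post_odds / (1 + post_odds)

# Hypothetical likelihoods, chosen only for illustration
like_win, like_loss = 0.3, 0.2

# posterior for several priors, holding the likelihoods fixed
posteriors = {p: round(posterior_from_odds(p, like_win, like_loss), 2)
              for p in (0.4, 0.5, 0.6)}
print(posteriors)
```

Because the likelihood ratio is fixed by the data, lowering the prior lowers the posterior, but a one-goal lead still raises the probability of a home win above the prior whenever the ratio exceeds 1.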

# Question

State Bayes' rule in your own words. Explain why it is a useful model to implement in sports analytics.

# Answer

Bayes' rule updates probabilities with new information. Bayes' rule is a useful model to implement in sports analytics for two primary reasons.

* sports contests are based on uncertain events/outcomes that can be represented as probabilities
* historical data are readily available to calculate prior probabilities 