# Exercise 1 (The World Series) 

The “World Series” is a tournament between the champion of the USA’s National League and American League to decide the U.S. Major League Baseball champion. At present, it is won by the first team to win four games out of a possible seven. Since baseball games do not end in ties, at most seven games are ever played.

It is often said that “baseball is a game of inches.” This means that small changes in the physical outcomes of a given play can lead to loss or victory. It also means that the outcome of a game between two teams is effectively random. Let us say that if the probability p that Team A beats Team B is strictly greater than 1/2, then Team A is better than Team B. Note that it is possible (with probability 1 − p) for the better team to lose a game. Frederick Mosteller estimated (based on data from 44 Series from the first half of the 20th century) that the probability that the better team wins any given World Series game is 0.65 and that the outcomes of the games are stochastically independent. A few years ago I redid his calculation for all 108 Series through 2012 and came up with 0.59. (You will have a chance to figure this out later in the course with data through the 2016 Series.)

Let p be the probability that Team A wins any given game. Assume that it is the same for every game, and that the outcomes of the game are stochastically independent. The probability that Team B wins is thus 1 − p.

We can describe the rule for determining the winner in two equivalent ways: the winner is either the first team to win m games, or the team that wins the majority of 2m − 1 games. In practice, the series ends as soon as one team wins m games.
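
The equivalence of the two rules can be checked with a small simulation. The sketch below is only illustrative: the helper names `first_to_m` and `majority` are made up for this example, and it uses Python's standard `random` module with an arbitrary fixed seed. It draws all $2m-1$ outcomes in advance and decides the winner under both rules.

```python
import random

def first_to_m(outcomes, m):
    """Winner and series length under the 'first team to win m games' rule.

    outcomes is a sequence of 1 (Team A wins) and 0 (Team B wins)
    holding at least 2m-1 games.
    """
    wins_a = wins_b = 0
    for games_played, outcome in enumerate(outcomes, start=1):
        if outcome == 1:
            wins_a += 1
        else:
            wins_b += 1
        if wins_a == m:
            return "A", games_played
        if wins_b == m:
            return "B", games_played
    raise ValueError("need at least 2m-1 outcomes")

def majority(outcomes, m):
    """Winner under the 'most wins out of 2m-1 games' rule."""
    return "A" if sum(outcomes[:2 * m - 1]) >= m else "B"

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
m, p = 4, 0.6
rules_agree = True
lengths = []
for _ in range(10000):
    outcomes = [1 if rng.random() < p else 0 for _ in range(2 * m - 1)]
    winner, length = first_to_m(outcomes, m)
    rules_agree = rules_agree and (winner == majority(outcomes, m))
    lengths.append(length)

print("Rules always agree:", rules_agree)
print("Shortest and longest series:", min(lengths), max(lengths))
```

The two rules pick the same winner on every sequence because the games played after one team reaches $m$ wins cannot change which team holds the majority.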

#### Exercise 1.1

1. What is the probability that Team A wins the series in exactly m games? (Give the formula, and explain it.)

The only way Team A can win the series in exactly $m$ games is if it wins the first $m$ games.  This happens with probability $p^m$. $\square$

In [1]:
import numpy as np

In [75]:
# empirical check
m = 3
p = 0.6
num_experiments = 1000
data = np.random.binomial(m, p, size=num_experiments)
print("Empirical result = {0}".format(np.count_nonzero(data == m)/num_experiments))
print("Theoretical result = {0}".format(p**m))

Empirical result = 0.208
Theoretical result = 0.21599999999999997


#### Exercise 1.2

2. What is the probability that Team B wins the series in exactly m games?

Similarly, the only way Team B can win the series in exactly $m$ games is if it wins the first $m$ games.  This happens with probability $(1-p)^m$. $\square$

In [74]:
# empirical check
m = 3
p = 0.6
num_experiments = 1000
data = np.random.binomial(m, (1-p), size=num_experiments)
print("Empirical result = {0}".format(np.count_nonzero(data == m)/num_experiments))
print("Theoretical result = {0}".format((1-p)**m))

Empirical result = 0.067
Theoretical result = 0.06400000000000002


#### Exercise 1.3

3. What is the probability that the series is over in exactly m games?

There are only two ways the series is over in exactly $m$ games.  Either Team A wins the first $m$ games, or Team B wins the first $m$ games.  This happens with probability $p^m+(1-p)^m$.$\square$

In [50]:
# empirical check
m = 3
p = 0.6
num_experiments = 1000
data = np.random.binomial(m, p, size=num_experiments)
print("Empirical result = {0}".format(np.count_nonzero(data == m)/num_experiments+
                                      np.count_nonzero(data == 0)/num_experiments))
print("Theoretical result = {0}".format(p**m + (1-p)**m))

Empirical result = 0.262
Theoretical result = 0.27999999999999997


#### Exercise 1.4

4. What is the probability that Team A wins in exactly m + 1 games?

This happens whenever Team A loses exactly one of the first $m$ games and wins the other $m$ of the first $m+1$ games, including game $m+1$.  Each such outcome has probability $p^m(1-p)$, and there are $m$ such outcomes (Team A loses game $i$, $i\in\{1,\dots,m\}$).  Hence, the probability that Team A wins in exactly $m+1$ games is $mp^m(1-p)$. $\square$

In [73]:
# empirical check
m = 3
p = 0.6
num_experiments = 10000
# outcomes of first m games
first_m = np.random.binomial(m, p, size=num_experiments)
# outcome of m+1-th game
mPlus1 = np.random.binomial(1, p, size=num_experiments)
# Team A wins in exactly m+1 games precisely when it has m-1 wins in the
# first m games and also wins game m+1
results = [int(first_m[i]==m-1 and mPlus1[i]==1) for i in range(num_experiments)]
print("Empirical result = {0}".format(sum(results)/num_experiments))
print("Theoretical result = {0}".format(m*(p**m)*(1-p)))

Empirical result = 0.2648
Theoretical result = 0.2592


#### Exercise 1.5

5. What is the probability that the series lasts exactly 2m − 1 games?

In order for the series to last exactly $2m-1$ games, each team must win exactly $m-1$ of the first $2(m-1)$ games.  Each such outcome has probability $p^{m-1}(1-p)^{m-1}$.  There are ${2(m-1)\choose m-1}$ distinct ways such an outcome could occur.  Hence, the probability is ${2(m-1)\choose m-1}p^{m-1}(1-p)^{m-1}$. $\square$

In [None]:
import scipy.special

In [110]:
# empirical check
m = 4
p = 0.5
num_experiments = 10000
counter = 0
# run experiments
for experiment in range(num_experiments):
    # store first 2*(m-1) games
    data = []
    for game in range(2*(m-1)):
        outcome = np.random.binomial(1, p, size=1)[0]
        data.append(outcome)
    # sum(data)==m-1 if and only if series ends in 2m-1 games
    if sum(data)==m-1:
        counter+=1
print("Empirical result = {0}".format(counter/num_experiments))
print("Theoretical result = {0}".format(scipy.special.binom(2*(m-1), m-1)*p**(m-1)*(1-p)**(m-1)))

Empirical result = 0.3141
Theoretical result = 0.3125
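
As a complementary check that involves no random sampling, the sketch below enumerates all $2^{2m-1}$ win/loss sequences for a small $m$ and adds up, in exact rational arithmetic, the probability of the sequences in which the series ends at each length. The function name `length_distribution` is made up for this illustration, and the approach is only practical for small $m$.

```python
from fractions import Fraction
from itertools import product
from math import comb

def length_distribution(m, p):
    """Exact distribution of the series length under the first-to-m rule.

    Enumerates every sequence of 2m-1 game outcomes (1 = Team A wins) and
    credits its probability to the game at which a team first reaches m wins.
    """
    q = 1 - p
    dist = {n: Fraction(0) for n in range(m, 2 * m)}
    for outcomes in product((0, 1), repeat=2 * m - 1):
        wins_a_total = sum(outcomes)
        prob = p**wins_a_total * q**(2 * m - 1 - wins_a_total)
        wins_a = wins_b = 0
        for n, outcome in enumerate(outcomes, start=1):
            wins_a += outcome
            wins_b += 1 - outcome
            if wins_a == m or wins_b == m:
                dist[n] += prob
                break
    return dist

m, p = 4, Fraction(1, 2)
dist = length_distribution(m, p)
formula = comb(2 * (m - 1), m - 1) * p**(m - 1) * (1 - p)**(m - 1)
print("P(series lasts 2m-1 games), by enumeration:", dist[2 * m - 1])
print("P(series lasts 2m-1 games), by formula:    ", formula)
print("Probabilities over all lengths sum to:", sum(dist.values()))
```

The same dictionary also gives the probability that the series ends in exactly $m$ games, `dist[m]`, which can be compared with $p^m+(1-p)^m$ from Exercise 1.3.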


#### Exercise 1.6

What is the probability that Team A wins the series by being the first team to win m games?

Team A wins the series by being the first team to win $m$ games precisely when:

$0$) There are $m+0$ games total.  Team A wins the first $m$ games.

$1$)  There are $m+1$ games total.  Team A wins $m$ of these games, with one of those games necessarily being the last (i.e. the $(m+1)$-th) game.  Note that if this were not the case, then the series would have ended earlier.

\vdots

$m-1$) There are $m+m-1=2m-1$ games total.  Team A wins $m$ of these games, with one of those games necessarily being the last (i.e. the $(2m-1)$-th) game.  Again, note that if this were not the case, then the series would have ended earlier.

For each case $i$ above, each such outcome has probability $p^m(1-p)^i$, and there are ${m+i-1\choose i}$ such outcomes (we have to distribute $i$ losses among the first $m+i-1$ games).  Summing over the disjoint cases, the probability is
\begin{equation}
\sum_{i=0}^{m-1}{m+i-1\choose i}p^m(1-p)^i.\square
\end{equation}

In [143]:
# empirical check
# probability Team A wins
p = 0.6
# first to m games
m = 4
# case, integer in {0,1,...,m-2,m-1}
for case in range(0, m-1+1):
    # total games = m+case
    total_games = m + case
    # start experiments
    num_experiments = 10000
    counter = 0
    # run experiments
    for experiment in range(num_experiments):
        # store the outcomes of all m+case games
        data = []
        for game in range(total_games):
            outcome = np.random.binomial(1, p, size=1)[0]
            data.append(outcome)
        # Team A wins in exactly m+case games iff it wins the last game
        # and exactly m-1 of the games before it
        if data[-1]==1 and sum(data[:-1])==m-1:
            counter+=1
    print("Empirical result = {0}".format(counter/num_experiments))
    theoretical_p = scipy.special.binom(m+case-1, case)*p**m*(1-p)**case
    print("Theoretical result = {0}".format(theoretical_p))

Empirical result = 0.1295
Theoretical result = 0.1296
Empirical result = 0.2065
Theoretical result = 0.20736
Empirical result = 0.2041
Theoretical result = 0.20736000000000002
Empirical result = 0.1657
Theoretical result = 0.165888
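
The check above compares the individual cases. Summing the four theoretical values printed above gives $0.710208$ for the probability that Team A wins the series when $m=4$ and $p=0.6$. The sketch below evaluates that sum directly and compares it with a simulation in which each series is played until one team reaches $m$ wins. The function names `prob_a_wins` and `simulate_series` and the fixed seed are choices made for this illustration only.

```python
import random
from math import comb

def prob_a_wins(m, p):
    """Sum over i = 0..m-1 of P(Team A wins in exactly m+i games)."""
    return sum(comb(m + i - 1, i) * p**m * (1 - p)**i for i in range(m))

def simulate_series(m, p, num_experiments, seed=0):
    """Fraction of simulated first-to-m series that Team A wins."""
    rng = random.Random(seed)
    series_won_by_a = 0
    for _ in range(num_experiments):
        wins_a = wins_b = 0
        # play until one team has m wins
        while wins_a < m and wins_b < m:
            if rng.random() < p:
                wins_a += 1
            else:
                wins_b += 1
        if wins_a == m:
            series_won_by_a += 1
    return series_won_by_a / num_experiments

m, p = 4, 0.6
theoretical = prob_a_wins(m, p)
empirical = simulate_series(m, p, num_experiments=20000)
print("Empirical result = {0}".format(empirical))
print("Theoretical result = {0}".format(theoretical))
```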


#### Exercise 1.7

Suppose the rule was that the teams had to play all $2m−1$ games. What is the probability that Team A wins the series? What interesting algebraic fact does this prove?

Team A wins the series precisely if 

$0$) Team A wins $m+0$ games.

$1$)  Team A wins $m+1$ games total.

\vdots

$m-1$) Team A wins $m+m-1=2m-1$ games.

For each case $i$ above, the outcome occurs with probability $p^{m+i}(1-p)^{2m-1-m-i}=p^{m+i}(1-p)^{m-1-i}$.  There are ${2m-1\choose m+i}$ ways for this to occur (i.e., we have to distribute $m+i$ victories across $2m-1$ total games).  Hence, the probability is given by
\begin{equation}
\sum_{i=0}^{m-1} {2m-1\choose m+i}p^{m+i}(1-p)^{m-1-i}.
\end{equation}
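Games played after one team has already won $m$ games cannot change the winner, so this must equal the probability obtained in the previous exercise. This proves the identity
\begin{equation}
\sum_{i=0}^{m-1}{m+i-1\choose i}p^m(1-p)^i=\sum_{i=0}^{m-1} {2m-1\choose m+i}p^{m+i}(1-p)^{m-1-i}.
\end{equation}
$\square$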

In [None]:
# empirical check
# probability Team A wins
p = 0.6
# first to m games
m = 4
# all 2m-1 games are played
total_games = 2*m-1
# start experiments
num_experiments = 10000
counter = 0
# run experiments
for experiment in range(num_experiments):
    # store the outcomes of all 2m-1 games
    data = []
    for game in range(total_games):
        outcome = np.random.binomial(1, p, size=1)[0]
        data.append(outcome)
    # Team A wins the series iff it wins at least m of the 2m-1 games
    if sum(data)>=m:
        counter+=1
print("Empirical result = {0}".format(counter/num_experiments))
theoretical_p=0
for i in range(0,m-1+1):
    theoretical_p += scipy.special.binom(2*m-1, m+i)*p**(m+i)*(1-p)**(m-1-i)
print("Theoretical result = {0}".format(theoretical_p))
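
The claim that the sum from Exercise 1.6 and the sum from Exercise 1.7 agree can also be tested without simulation. The sketch below evaluates both in exact rational arithmetic over a grid of values of $m$ and $p$. The function names and the grid are arbitrary choices for this illustration, and a finite grid is a sanity check rather than a proof.

```python
from fractions import Fraction
from math import comb

def first_to_m_sum(m, p):
    """Exercise 1.6: Team A is the first team to win m games."""
    return sum(comb(m + i - 1, i) * p**m * (1 - p)**i for i in range(m))

def play_all_sum(m, p):
    """Exercise 1.7: Team A wins at least m of the 2m-1 games played."""
    return sum(comb(2 * m - 1, m + i) * p**(m + i) * (1 - p)**(m - 1 - i)
               for i in range(m))

# compare the two sums exactly for m = 1..7 and p = 1/20, 2/20, ..., 19/20
identity_holds = all(
    first_to_m_sum(m, Fraction(k, 20)) == play_all_sum(m, Fraction(k, 20))
    for m in range(1, 8)
    for k in range(1, 20)
)
print("Both sums agree on the whole grid:", identity_holds)
```

Using `Fraction` instead of floats means the comparison is an exact equality test, so rounding error cannot hide a discrepancy.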

#### Exercise 1.8

What is the probability that the Team A wins a best-of-7 series (m = 4) if p = 0.65? (Remember to give both a formula and a numeric answer.)

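Setting $m=4$ and $p=0.65$ in the formula from Exercise 1.6 gives $\sum_{i=0}^{3}{3+i\choose i}(0.65)^4(0.35)^i\approx 0.8002$.  The code below confirms this numerically, both by simulation and by evaluating the equivalent sum from Exercise 1.7. $\square$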

In [145]:
# empirical check
# probability Team A wins
p = 0.65
# first to m games
m = 4
# all 2m-1 games are played
total_games = 2*m-1
# start experiments
num_experiments = 100000
counter = 0
# run experiments
for experiment in range(num_experiments):
    # store the outcomes of all 2m-1 games
    data = []
    for game in range(total_games):
        outcome = np.random.binomial(1, p, size=1)[0]
        data.append(outcome)
    # Team A wins the series iff it wins at least m of the 2m-1 games
    if sum(data)>=m:
        counter+=1
print("Empirical result = {0}".format(counter/num_experiments))
theoretical_p=0
for i in range(0,m-1+1):
    theoretical_p += scipy.special.binom(2*m-1, m+i)*p**(m+i)*(1-p)**(m-1-i)
print("Theoretical result = {0}".format(theoretical_p))

Empirical result = 0.79912
Theoretical result = 0.8001542656250001
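
The trailing digits in the theoretical result are floating-point rounding. As a sketch, the same sum can be evaluated exactly by writing $p=0.65$ as the fraction $13/20$. The function name `prob_a_wins` is made up for this illustration. The second call uses the per-game estimate of $0.59$ mentioned in the introduction, purely for comparison.

```python
from fractions import Fraction
from math import comb

def prob_a_wins(m, p):
    """P(Team A is the first team to win m games), as derived in Exercise 1.6."""
    return sum(comb(m + i - 1, i) * p**m * (1 - p)**i for i in range(m))

# p = 0.65 written as an exact fraction
exact_065 = prob_a_wins(4, Fraction(13, 20))
print("p = 0.65: {0} = {1}".format(exact_065, float(exact_065)))

# p = 0.59, the later estimate quoted in the introduction
exact_059 = prob_a_wins(4, Fraction(59, 100))
print("p = 0.59: {0}".format(float(exact_059)))
```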
