# homework 06: Student's game night


In [None]:
# Importing all the packages we might need:

import numpy as np
import sys
import math
import seaborn as sns
import matplotlib.pyplot as plt
import numpy.random as rand
import scipy.stats as stats
from scipy.special import logsumexp # for log probabilities

**1. the beginner's game**

Write a script that:

- takes n observations x1..xn and one of the 20 possible values of σ (i.e. a known row, specified by Student) as input;
- calculates the posterior probability P(μ∣x1..xn,σ) for each of the 21 possible values of μ on that grid row. Remember that the prior P(μ) is uniform;
- plots that distribution on a semilog scale (so you can see differences in the small-probability tail more easily), using the semilogy plot of matplotlib for example;
- and plots the pub's calculated probability distribution on the same semilog plot, so you can compare.

You can use this script, which implements Student's game, to generate data (and σ) to try your analysis out, for varying numbers of samples, especially small n (3-6).

Have your script show the plots for the X = [ 11.50, -2.32, 9.18], true_sigma = 60. example.

Student samples μ and σ uniformly, which means every grid cell (every pair of μ and σ) has the same chance of being picked (he throws a dart with uniform probability).

For this first game, we are given n samples and a known σ, which is determined by the grid row that Student reveals. We begin by setting the parameters specified in the problem and creating the grid:

In [None]:
# Set up the x and y coords
gridrows = np.linspace( 100.0,   5.0, 20)   # rows of the board: std. dev., sigma
gridcols = np.linspace(-100.,  100.0, 21)   # cols of the board: mean (location), mu
nrows = len(gridrows)
ncols = len(gridcols)

# Set up a pseudo-random integer generator:
seed = np.random.randint(0,10000)
np.random.seed(seed)

# Student throws a uniformly distributed dart into the grid, and this
# chooses mu, sigma. These values are unknown to the customers.

true_row = np.random.randint(0, nrows)    # Note, randint(0,n) samples 0..n-1
true_col = np.random.randint(0, ncols)

true_sigma = gridrows[true_row]
true_mu    = gridcols[true_col]

# Student's tea distribution machine drops observed samples onto the
# line on the bar: nX of them, X[0..nX-1]

X  =  [11.50, -2.32, 9.18] 

sample_mean  = np.mean(X)
sample_stdev = np.std(X, ddof = 1)   ## ddof is "delta degrees of freedom": the divisor is N - ddof. 0 = population sd; 1 = sample sd.
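
To try the analysis on other sample sizes (especially small n, 3-6), new observations can be drawn instead of hard-coding X. The following is only a sketch: `draw_samples`, its `seed` argument, and the example values are illustrative names and choices, not part of Student's script.

```python
import numpy as np

# Hypothetical helper (not part of the original script): draw nX observations
# from a normal distribution with a chosen mu and sigma from the grid.
def draw_samples(mu, sigma, nX, seed=None):
    rng = np.random.default_rng(seed)   # seeded generator, so draws are reproducible
    return rng.normal(loc=mu, scale=sigma, size=nX)

# Example: 4 observations with mu = 10 and sigma = 60 (arbitrary illustrative values)
example_X = draw_samples(mu=10.0, sigma=60.0, nX=4, seed=1)
print(example_X)
print(np.mean(example_X), np.std(example_X, ddof=1))   # sample mean and sample sd
```

Fixing the seed makes a run repeatable, which helps when comparing plots across different n.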

With the samples X and the known σ in hand, we now calculate the posterior probability for each of the 21 possible values of μ on that grid row.

In [None]:
def probdist_beginner(X, sigma, mu_values):
    """ 
    Takes n observations x1..xn (array X) and one of the 20 possible values of σ 
    (i.e. a known row, specified by Student) as input.
    
    Returns an array of the inferred P(μ | X, σ) for each column (μ).
    """
    numerators = []                          # unnormalized log posteriors, one per column
    log_prior = np.log(1 / len(mu_values))   # uniform prior: log(1/21) for each column (μ)
    
    # Loop through the columns for each mu:
    for k in mu_values:
        numerator = log_prior
        for x in X:
            numerator = numerator + stats.norm.logpdf(x, k, sigma)  # add the log likelihood of each observation
        numerators.append(numerator)         # one log numerator per column: 21 terms in total
    denominator = logsumexp(numerators)      # log of the sum of exp(numerators): the log normalization constant
    posts = np.exp(np.array(numerators) - denominator)  # subtract in log space, then exponentiate
 
    return posts

Calling our function:

In [None]:
PrB = probdist_beginner([ 11.50, -2.32, 9.18], 60, gridcols)
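
As a sanity check on the loop-based function, the same posterior can be computed with NumPy broadcasting. This is a minimal sketch under the same assumptions (known σ, uniform prior over the 21 columns); the name `posterior_mu_known_sigma` is illustrative and not part of the assignment.

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

def posterior_mu_known_sigma(X, sigma, mu_values):
    """Vectorized sketch of P(mu | X, sigma) on a grid of mu values (uniform prior)."""
    X = np.asarray(X, dtype=float)
    mu_values = np.asarray(mu_values, dtype=float)
    # Log likelihood of every observation under every candidate mu:
    # the array has shape (len(mu_values), len(X)); sum over the observations.
    loglik = stats.norm.logpdf(X[None, :], loc=mu_values[:, None], scale=sigma).sum(axis=1)
    # A uniform prior adds the same constant to every column, so it cancels on normalization.
    return np.exp(loglik - logsumexp(loglik))

mu_grid = np.linspace(-100.0, 100.0, 21)
post = posterior_mu_known_sigma([11.50, -2.32, 9.18], 60.0, mu_grid)
print(post.sum())                  # total probability over the 21 columns
print(mu_grid[np.argmax(post)])    # most probable column
```

The most probable column should be the grid value closest to the sample mean (6.12 for this example), since with σ known the posterior depends on μ only through its distance from the sample mean.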

Now that we have our result for the beginner game we will want to compare this with the pub's calculation. In order to plot this we first need to calculate the pub's odds:

In [None]:
def probdist_beginner_pub(X, sigma, mu_values):
    """ 
    Given an ndarray X_1..X_n, and a known sigma;
    and a list of the mu values in each column;
    return a list of the inferred P(mu | X,sigma) for each column.
    """
    xbar = np.mean(X)
    N    = len(X)
    Pr   = [ stats.norm.pdf(x, loc=xbar, scale=sigma / np.sqrt(N)) for x in mu_values ]  # proportional to std error of the mean
    Z    = sum(Pr)                   # normalization constant
    Pr   = [ p / Z for p in Pr ]     # normalization to a discrete probability distribution
    return Pr

Pub's beginner odds function call:

In [None]:
PrB_pub = probdist_beginner_pub(X, 60, gridcols)

Now we plot our beginner distribution next to the pub's to see how they compare using the X = [11.50, -2.32, 9.18], true_sigma = 60 example.

In [None]:
# Set up our graphical display.
#
# We'll show the pub's supposedly "fair odds" probability distribution for the
# beginner version next to our own beginner estimate, as semilog plots.
#
f, (ax1, ax2) = plt.subplots(2,1, sharey=True)  # figure consists of 2 graphs, 2 rows x 1 col

ax1.semilogy(gridcols, PrB_pub, label="pub's estimate: beginner (sigma known)")
ax1.xaxis.set_ticks(gridcols)
ax1.set(xlabel=r'$\mu$', ylabel=r'$P(\mu \mid \sigma)$')   # raw strings keep the LaTeX backslashes intact
ax1.legend(loc="best")

ax2.semilogy(gridcols, PrB, label="our estimate: beginner (sigma known)")
ax2.xaxis.set_ticks(gridcols)
ax2.set(xlabel=r'$\mu$', ylabel=r'$P(\mu \mid \sigma)$')
ax2.legend(loc="best")

plt.show()

**2. the advanced game**

Now write a second script that:

- just takes n observations x1..xn;
- calculates the posterior probability P(μ,σ∣x1..xn) for each of the 420 (20x21) possible values of σ,μ on Student's grid;
- plots that 20x21 posterior distribution as a heat map;
- marginalizes (sum over the rows) to obtain P(μ∣x1..xn):

  $$P(\mu \mid x_1 \ldots x_n) = \sum_{\sigma} P(\mu, \sigma \mid x_1 \ldots x_n)$$

- plots that marginal distribution on a semilog scale;
- and plots the pub's calculated probability distribution, so you can compare.

Again you can use the Student's game script to generate data (and σ) to try your analysis out, for varying numbers of samples, especially small n (3-6).

Have your script show the plots for the X = [ 11.50, -2.32, 9.18] example.

We reuse the parameters we set up above for the beginner game; the main difference is that σ is now unknown.

Our first function for the advanced game calculates the posterior probability for each of the 420 (20x21) possible values of σ,μ on Student's grid.

The same function also marginalizes over σ and returns P(μ∣X) alongside the joint posterior.


In [None]:
def probdist_advanced(X, sigma_values, mu_values):
    """
    Returns the joint posterior P(μ, σ | X) on the grid (rows: σ, columns: μ)
    and the marginal P(μ | X) obtained by summing over σ.
    """
    posts = np.zeros((len(sigma_values), len(mu_values)))  # 20x21 array of zeros, filled in below
    log_prior = np.log(1 / len(mu_values)) + np.log(1 / len(sigma_values))  # uniform prior: log(1/21) + log(1/20) = log(1/420)
    
    for i in range(len(mu_values)):          # mus (columns)
        for j in range(len(sigma_values)):   # sigmas (rows)
            numerator = log_prior            # start from the log prior
            for x in X: 
                numerator = numerator + stats.norm.logpdf(x, mu_values[i], sigma_values[j])
            posts[j][i] = numerator
        
    denominator = logsumexp(posts)           # log normalization constant over all 420 cells
    posts = np.exp(posts - denominator)      # subtract in log space, then exponentiate
    
    marginals = []  # marginal P(μ | X), one entry per column
    
    for i in range(len(posts[0,:])):         # for each mu (column), sum over the sigmas (rows)
        marginals.append(np.sum(posts[:,i]))
        
    return posts, marginals

Calling the function for the specified X example:

In [None]:
posts, marginals = probdist_advanced(X, gridrows, gridcols)
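
The triple loop can also be written with broadcasting, which makes the marginalization over σ a single `sum(axis=0)`. This is a sketch under the same assumptions (uniform prior over all 420 cells); `joint_posterior`, `sigma_grid`, and `mu_grid` are illustrative names, not part of the assignment.

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

def joint_posterior(X, sigma_values, mu_values):
    """Vectorized sketch: joint P(mu, sigma | X) on the grid, plus the marginal P(mu | X)."""
    X = np.asarray(X, dtype=float)
    sig = np.asarray(sigma_values, dtype=float)[:, None, None]   # rows: sigma
    mu = np.asarray(mu_values, dtype=float)[None, :, None]       # columns: mu
    # Log likelihood of every observation in every grid cell, summed over observations.
    loglik = stats.norm.logpdf(X[None, None, :], loc=mu, scale=sig).sum(axis=2)
    # The uniform prior is a constant and cancels when normalizing over the whole grid.
    joint = np.exp(loglik - logsumexp(loglik))
    marginal_mu = joint.sum(axis=0)          # sum over the rows (sigma) for each column (mu)
    return joint, marginal_mu

sigma_grid = np.linspace(100.0, 5.0, 20)
mu_grid = np.linspace(-100.0, 100.0, 21)
joint, marginal_mu = joint_posterior([11.50, -2.32, 9.18], sigma_grid, mu_grid)

# Locate the most probable grid cell (row index, column index).
row, col = np.unravel_index(np.argmax(joint), joint.shape)
print(joint.shape, joint.sum(), marginal_mu.sum())
print("most probable cell: sigma =", sigma_grid[row], " mu =", mu_grid[col])
```

Both the joint posterior and the marginal should sum to 1, which is a quick check that the normalization was done over the whole grid rather than row by row.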

Now let's plot that posterior distribution as a heat map using seaborn:

In [None]:
# The data to plot: the 20x21 joint posterior P(mu, sigma | X)
data = posts            # rows: sigma, columns: mu
        
# Format axis labels as strings, with values as "10" not "10.0" for clarity, space
xlabels = [ "{0:.0f}".format(val) for val in gridcols ]
ylabels = [ "{0:.0f}".format(val) for val in gridrows ]

# the Seaborn "heatmap" plot
# with some examples of how it can be customized.
#
ax = sns.heatmap(data,                 # takes a 2D array of data
                 xticklabels=xlabels,  #   ... set custom x axis labels
                 yticklabels=ylabels,  #   ... set custom y axis labels
                 cbar=False,           #   ... turn off the default color scale bar
                 square=True,          #   ... force the plot to be square
                 linecolor='grey',     #   ... set grid line color
                 linewidth=0.5)        #   ... set grid line width

# now we have an Axes object that Seaborn returned to us,
# and we can do additional customization, like...

ax.set(xlabel=r'$\mu$',                # ...set X axis label, using LaTeX formatting (raw string keeps the backslash)
       ylabel=r'$\sigma$')             # ...and Y axis label
for label in ax.get_yticklabels():    
    label.set_size(10)                 # ... and font size on y-axis tick labels
for label in ax.get_xticklabels():    
    label.set_size(10)                 # ... and on x-axis tick labels


Next, we define the pub's calculation for the advanced game:

In [None]:
def probdist_advanced_pub(X, mu_values):
    """ 
    Given an ndarray X_1..X_n,
    and a list of the mu values in each column;
    return a list of the inferred P(mu | X) for each column.
    """
    xbar = np.mean(X)
    s    = np.std(X, ddof=1)     # note that np.std() by default calculates a population std dev; to get sample std. dev., set ddof=1
    N    = len(X)
    Pr   = [ stats.norm.pdf(x, loc=xbar, scale= s / np.sqrt(N)) for x in mu_values ]  # proportional to std error of the mean
    Z    = sum(Pr)                   # normalization constant
    Pr   = [ p / Z for p in Pr ]     # normalization to a discrete probability distribution
    return Pr

Function call for the pub's advanced game results, once again using the specified parameters:

In [None]:
PrA_pub = probdist_advanced_pub(X, gridcols)

Plotting our results for our advanced game implementation compared to the pub's supposedly "fair odds" distribution for the advanced game:

In [None]:
# Set up our graphical display.
#
# We'll show the pub's supposedly "fair odds" probability distribution plot for the
# advanced version compared to our own advanced game odds, as semilog plots.
#
f, (ax1, ax2) = plt.subplots(2,1, sharey=True)  # figure consists of 2 graphs, 2 rows x 1 col

ax1.semilogy(gridcols, PrA_pub, label="pub's estimate: advanced (sigma unknown)")
ax1.xaxis.set_ticks(gridcols)
ax1.set(xlabel=r'$\mu$', ylabel=r'$P(\mu)$')   # raw strings keep the LaTeX backslashes intact
ax1.legend(loc="best")

ax2.semilogy(gridcols, marginals, label="our estimate: advanced (sigma unknown)")
ax2.xaxis.set_ticks(gridcols)
ax2.set(xlabel=r'$\mu$', ylabel=r'$P(\mu)$')
ax2.legend(loc="best")

plt.show()


Additionally, we compare our Bayesian calculation above with the t-distribution calculation that the pub should have used:

In [None]:
def probdist_t(X, mu_values):
    """ 
    Given an ndarray X_1..X_n,
    and a list of the mu values in each column;
    return a list of the inferred P(mu | X) for each column,
    according to Student's t distribution with N-1 degrees of freedom.
    """
    N    = len(X)
    xbar = np.mean(X)
    s    = np.std(X, ddof=1)
    t    = [ (xbar - mu) / (s / np.sqrt(N)) for mu in mu_values ]    # t statistic, given sample mean, sample stddev, and N
    #t    = [ stats.ttest_1samp(X, mu)[0] for mu in mu_values ]       # ... (equivalently, python can calculate t statistic for you)
    Pr   = [ stats.t.pdf(val, N-1) for val in t ]
    Z    = sum(Pr)
    Pr   = [ p / Z for p in Pr ]    
    return Pr

Function call for the t-distribution odds:

In [None]:
PrT = probdist_t(X, gridcols)

Plotting our results for our advanced game implementation compared to the t-distribution odds:

In [None]:
f, (ax1, ax2) = plt.subplots(2,1, sharey=True)  # figure consists of 2 graphs, 2 rows x 1 col

ax1.semilogy(gridcols, PrT, label="Student's t distribution: advanced (sigma unknown)")
ax1.xaxis.set_ticks(gridcols)
ax1.set(xlabel=r'$\mu$', ylabel=r'$P(\mu)$')   # raw strings keep the LaTeX backslashes intact
ax1.legend(loc="best")

ax2.semilogy(gridcols, marginals, label="our estimate: advanced (sigma unknown)")
ax2.xaxis.set_ticks(gridcols)
ax2.set(xlabel=r'$\mu$', ylabel=r'$P(\mu)$')
ax2.legend(loc="best")

plt.show()

**3. where's the advantage?**
Is the pub calculating its odds correctly? Where do you see an advantage?

For the beginner's game, our graph matches the pub's graph, so we are calculating the odds the same way, and this holds as we increase the sample size. With σ known, the pub's normal distribution with the correct standard deviation (σ/√n) is the right calculation, so we see no advantage here.
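
The agreement in the beginner's game can also be seen algebraically. The short derivation below assumes, as in the problem, a uniform prior on μ and a known σ, and writes x̄ for the sample mean.

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \mu)^2
  &= \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2 \\
P(\mu \mid x_1 \ldots x_n, \sigma)
  &\propto P(\mu) \prod_{i=1}^{n} \exp\!\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right)
   \propto \exp\!\left( -\frac{n(\bar{x} - \mu)^2}{2\sigma^2} \right)
\end{align*}
```

The first term of the sum does not depend on μ, so it drops out on normalization. What remains is a normal density with mean x̄ and standard deviation σ/√n evaluated at μ, which is what the pub's beginner code computes before normalizing over the 21 columns.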

We do see a difference between our graph and the pub's graph in the advanced game, which shows where our advantage lies:

Our advantage over the pub comes because we consider every possible σ on the grid and then marginalize over all of them.
The pub, on the other hand, simply computes the sample standard deviation and plugs it in as if it were the true σ, ignoring how uncertain that estimate is for small n.
Our result looks like the t-distribution result, which is the correct one. The pub is not calculating the odds correctly because it treats the estimated σ as fixed when generating its pdf. This makes its tails too thin: it underestimates the probability that μ lies far from the sample mean, and that is where we can find favorable odds.
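
To put numbers on the difference in the tails, the pub's normal calculation and the t-distribution calculation can be printed side by side for the example data. This is a sketch that re-implements both calculations with NumPy arrays; the variable names (`pub`, `tdist`, `sem`) are illustrative.

```python
import numpy as np
from scipy import stats

X = np.array([11.50, -2.32, 9.18])
mu_grid = np.linspace(-100.0, 100.0, 21)

n = len(X)
xbar = X.mean()
sem = X.std(ddof=1) / np.sqrt(n)      # standard error of the mean from the sample sd

# Pub's advanced calculation: normal distribution with the sample sd plugged in.
pub = stats.norm.pdf(mu_grid, loc=xbar, scale=sem)
pub /= pub.sum()

# Student's t with n-1 degrees of freedom, evaluated at the t statistic for each mu.
tdist = stats.t.pdf((xbar - mu_grid) / sem, df=n - 1)
tdist /= tdist.sum()

for mu, p_pub, p_t in zip(mu_grid, pub, tdist):
    print(f"mu={mu:6.0f}   pub={p_pub:.3e}   t={p_t:.3e}")
```

In the columns far from the sample mean, the t-distribution probabilities are many orders of magnitude larger than the pub's, which is the thin-tail effect described above.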

Let's now plot the results for a large number of samples (nX = 50) and see how the two compare:

In [None]:
X = np.random.normal(loc=true_mu, scale=true_sigma, size=50) # nX = 50
PrA_pub = probdist_advanced_pub(X, gridcols)
posts, marginals = probdist_advanced(X, gridrows, gridcols)

f, (ax1, ax2) = plt.subplots(2,1, sharey=True)  # figure consists of 2 graphs, 2 rows x 1 col

ax1.semilogy(gridcols, PrA_pub, label="pub's estimate: advanced (sigma unknown)")
ax1.xaxis.set_ticks(gridcols)
ax1.set(xlabel=r'$\mu$', ylabel=r'$P(\mu)$')   # raw strings keep the LaTeX backslashes intact
ax1.legend(loc="best")

ax2.semilogy(gridcols, marginals, label="our estimate: advanced (sigma unknown)")
ax2.xaxis.set_ticks(gridcols)
ax2.set(xlabel=r'$\mu$', ylabel=r'$P(\mu)$')
ax2.legend(loc="best")

plt.show()

For this larger sample size the two graphs look much more similar, because the sample standard deviation becomes a more reliable estimate of σ as the sample size grows.
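
The same convergence shows up directly in the densities: Student's t distribution with n-1 degrees of freedom approaches the standard normal as n grows. The sketch below compares the two at a fixed t statistic; the value t = 3 and the sample sizes are arbitrary choices for illustration.

```python
from scipy import stats

t_value = 3.0     # a point in the tail, chosen only for illustration
ratios = {}
for n in (3, 6, 50):
    # Ratio of the t density (n-1 degrees of freedom) to the standard normal density.
    ratios[n] = stats.t.pdf(t_value, df=n - 1) / stats.norm.pdf(t_value)
    print(f"n={n:3d}   t / normal density ratio at t=3: {ratios[n]:.2f}")
```

The ratio shrinks toward 1 as n increases, so the pub's normal approximation costs it the most at the small sample sizes (3-6) the game is usually played with.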