Bayes Factor
======

This notebook experiments with approximating the Bayes factor (BF) for Gaussian mixture models (GMMs) using the Bayesian information criterion (BIC).

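"Estimating" is used very loosely here: the BIC approximation to the BF does not let us choose a prior. It implicitly fixes a weakly informative default prior (commonly identified as the unit-information prior) rather than a truly uniform one.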

We use the formulation from Wagenmakers (2007, p. 796): http://www.ejwagenmakers.com/2007/pValueProblems.pdf

$$BF_{01} = \frac{P(D|H_0)}{P(D|H_1)} = \exp\bigg(\frac{\text{BIC}(H_1) - \text{BIC}(H_0)}{2}\bigg) $$
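
As a quick sanity check on the formula above, here is a minimal sketch with made-up BIC values (the numbers and the helper name `bf01_from_bic` are illustrative assumptions, not results from this notebook). A lower BIC for $H_0$ gives $BF_{01} > 1$, and under equal prior odds the Bayes factor converts to a posterior probability for $H_0$ as $BF_{01} / (1 + BF_{01})$.

```python
import math


def bf01_from_bic(bic_h0, bic_h1):
    """BIC approximation to BF_01 = P(D|H0) / P(D|H1)."""
    return math.exp((bic_h1 - bic_h0) / 2)


# Hypothetical BIC values, chosen only for illustration.
bic_h0, bic_h1 = 100.0, 106.0

bf01 = bf01_from_bic(bic_h0, bic_h1)  # exp(3), about 20.09: the data favor H0

# Assuming equal prior odds, convert BF_01 into P(H0 | D).
p_h0 = bf01 / (1 + bf01)

print(bf01, p_h0)
```

Swapping the two BIC values gives the reciprocal, $BF_{01} = e^{-3}$, i.e. the same strength of evidence in favor of $H_1$.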

In [1]:
%matplotlib inline

import itertools

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
from scipy import linalg
from sklearn import mixture


In [2]:
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data  # all four features (150 samples x 4 columns)
Y = iris.target

In [3]:
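# X1 is X plus a fifth column that is feature 1 shifted by 1,
# so the new column is perfectly collinear with an existing one.
X1 = np.hstack([X.copy(), np.atleast_2d(X[:, 1]+1).T])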

In [14]:
def best_bic(X, gmm_config={'n_components_range': range(1, 7), 
                 'cv_types': ['spherical', 'tied', 'diag', 'full']}):
    """Return the lowest BIC over a grid of Gaussian mixture models fit to X."""
    # Based on the scikit-learn GMM model-selection example:
    # http://scikit-learn.org/stable/auto_examples/mixture/plot_gmm_selection.html
    n_components_range = gmm_config['n_components_range']
    cv_types = gmm_config['cv_types']
    lowest_bic = np.inf  # np.infty was removed in NumPy 2.0
    for cv_type in cv_types:
        for n_components in n_components_range:
            # Fit a Gaussian mixture with EM and keep the smallest BIC seen.
            gmm = mixture.GaussianMixture(n_components=n_components,
                                          covariance_type=cv_type)
            gmm.fit(X)
            lowest_bic = min(lowest_bic, gmm.bic(X))
    return lowest_bic

In [15]:
def bic_bayesfactor(bic0, bic1):
    # Computes BF_01 = P(D|H_0) / P(D|H_1) via the BIC approximation
    # exp((BIC_1 - BIC_0) / 2); see Wagenmakers (2007), p. 796:
    # http://www.ejwagenmakers.com/2007/pValueProblems.pdf
    # The same formula is discussed in this mailing-list post:
    # https://mailman.ucsd.edu/pipermail/ling-r-lang-l/2013-December/000581.html
    return np.exp((bic1 - bic0)/2)

In [16]:
def approx_bayesfactor(X0, X1, gmm_config={'n_components_range': range(1, 7), 
                 'cv_types': ['spherical', 'tied', 'diag', 'full']}):
    bic0 = best_bic(X0, gmm_config)
    bic1 = best_bic(X1, gmm_config)
    
    return bic_bayesfactor(bic0, bic1)

In [17]:
approx_bayesfactor(X, X1)

0.0

In [18]:
gmm_config={'n_components_range': range(1, 7), 
                 'cv_types': ['spherical', 'tied', 'diag', 'full']}
bic0 = best_bic(X, gmm_config)
bic1 = best_bic(X1, gmm_config)

In [19]:
bic0

575.64056273964275

In [20]:
bic1

-1056.9047604024274
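
With these two BIC values the exponent is about -816, which is below what a 64-bit float can represent after exponentiation, so the Bayes factor printed earlier is `0.0` because of underflow rather than being an exact zero. A minimal sketch of one way around this is to stay on the log scale; the helper name `log_bf01` is an assumption introduced here for illustration.

```python
import math


def log_bf01(bic0, bic1):
    """Natural log of the BIC-approximated BF_01, avoiding exp() underflow."""
    return (bic1 - bic0) / 2


# BIC values copied from the two cells above.
bic0 = 575.64056273964275
bic1 = -1056.9047604024274

log_bf = log_bf01(bic0, bic1)      # about -816.27
log10_bf = log_bf / math.log(10)   # the same quantity in base 10

print(log_bf, log10_bf)
print(math.exp(log_bf))            # underflows to 0.0 in float64
```

Reporting the log Bayes factor keeps the magnitude of the evidence visible even when the ratio itself is too small to represent.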