Bayes Factor
======

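This notebook plays around with "estimating" the Bayes factor for Gaussian mixture models (GMMs) using the BIC.

We use "estimating" very loosely: the BIC approximation to the Bayes factor implicitly fixes the prior on the parameters (Wagenmakers 2007 describes it as a unit-information prior) instead of using one chosen for the problem.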

We use the formulation from Wagenmakers (2007, p. 796): http://www.ejwagenmakers.com/2007/pValueProblems.pdf

$$BF_{01} = \frac{P(D|H_0)}{P(D|H_1)} = \exp\bigg(\frac{\text{BIC}(H_1) - \text{BIC}(H_0)}{2}\bigg) $$
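The formula can be evaluated directly from two BIC values. The sketch below uses hypothetical helper names (`log_bf01_from_bic`, `bf01_from_bic`) that are not part of the notebook. It keeps a log-space version because large BIC differences can overflow or underflow `exp`.

```python
import math


def log_bf01_from_bic(bic0: float, bic1: float) -> float:
    """log BF_01 = (BIC(H1) - BIC(H0)) / 2 (hypothetical helper)."""
    return (bic1 - bic0) / 2.0


def bf01_from_bic(bic0: float, bic1: float) -> float:
    """BF_01 = exp((BIC(H1) - BIC(H0)) / 2); values above 1 favour H0."""
    return math.exp(log_bf01_from_bic(bic0, bic1))


# Example: H1's BIC is 2 points higher (worse) than H0's, so BF_01 = e
print(bf01_from_bic(100.0, 102.0))  # about 2.718
```

A BIC difference of 2 therefore corresponds to odds of about 2.7 : 1 in favour of the model with the lower BIC.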

In [1]:
%matplotlib inline

import itertools

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
from scipy import linalg
from sklearn import mixture


In [2]:
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data    # all four features, shape (150, 4)
Y = iris.target  # class labels

In [3]:
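# Data for H_1: append a fifth column equal to feature 1 shifted by +1,
# i.e. a perfectly collinear (redundant) feature
X1 = np.hstack([X.copy(), np.atleast_2d(X[:, 1] + 1).T])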
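Because the new column is an exact affine function of feature 1, the centred columns of `X1` span only four dimensions, so any full-covariance Gaussian fitted to it is degenerate along one direction. A quick numerical check follows. It uses random data as a stand-in for `iris.data` (an assumption, so the snippet runs without scikit-learn); the construction of `X1` is the same as above.

```python
import numpy as np

# Stand-in for iris.data: 150 samples, 4 features (random data, not the real iris set)
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))

# Same construction as the notebook: append feature 1 shifted by +1
X1 = np.hstack([X.copy(), np.atleast_2d(X[:, 1] + 1).T])

# Centring removes the +1 offset, so the new column becomes a copy of column 1
centred = X1 - X1.mean(axis=0)
print(X1.shape[1], np.linalg.matrix_rank(centred))  # 5 columns, rank 4
```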

In [20]:
def lik_mc(X, gmm_config=None):
    # Fits one GMM per (covariance type, number of components) pair and returns
    # the mean, across all fits, of exp(average per-sample log-likelihood).
    # Based on code here:
    # http://scikit-learn.org/stable/auto_examples/mixture/plot_gmm_selection.html
    if gmm_config is None:
        gmm_config = {'n_components_range': range(1, 7),
                      'cv_types': ['spherical', 'tied', 'diag', 'full']}
    n_components_range = gmm_config['n_components_range']
    cv_types = gmm_config['cv_types']
    lik = []

    for cv_type in cv_types:
        for n_components in n_components_range:
            # Fit a Gaussian mixture with EM
            gmm = mixture.GaussianMixture(n_components=n_components,
                                          covariance_type=cv_type)
            gmm.fit(X)
            # score() returns the mean per-sample log-likelihood, so exp() of it
            # is the geometric mean of the per-sample densities
            likelihood = np.exp(gmm.score(X))
            lik.append(likelihood)
    return np.mean(lik)
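Since `GaussianMixture.score(X)` returns the mean per-sample log-likelihood, `np.exp(gmm.score(X))` is the geometric mean of the per-sample densities, not the joint likelihood of the data set (whose log is `X.shape[0] * gmm.score(X)`). A small NumPy check of that identity, with made-up density values chosen only for illustration:

```python
import numpy as np

# Hypothetical per-sample densities p(x_i | model), for illustration only
densities = np.array([0.5, 2.0, 4.0, 1.0])

# What exp(gmm.score(X)) computes: exp of the mean log-density
exp_mean_log = np.exp(np.mean(np.log(densities)))

# The geometric mean of the same values
geo_mean = np.prod(densities) ** (1.0 / len(densities))

# The joint likelihood of the whole data set is the product of the densities
joint = np.prod(densities)

print(exp_mean_log, geo_mean, joint)  # both means are about 1.414; joint is 4.0
```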

In [21]:
def lik_bayesfactor(lik0, lik1):
    # Returns lik0 / lik1 as a stand-in for P(D|H_0) / P(D|H_1).
    # Note that this is a plain ratio of the inputs; it does not apply the BIC formula quoted below.
    #
    # From: https://mailman.ucsd.edu/pipermail/ling-r-lang-l/2013-December/000581.html
    # "The BIC approximation to the Bayes Factor he advocates for BF_01 is given by
    # exp( (BIC_1 - BIC_0)/2 ) (see page 796)."
    # http://www.ejwagenmakers.com/2007/pValueProblems.pdf
    return lik0 / lik1
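The Wagenmakers formula needs the BIC of two competing models fitted to the *same* data. As a hedged sketch of that, the code below compares H_0, a single Gaussian with diagonal covariance, against H_1, a single Gaussian with full covariance, computing each BIC in closed form as BIC = k ln n − 2 ln L̂. The data, the two models, and the helper `gaussian_bic` are illustrative assumptions, not part of the notebook.

```python
import numpy as np


def gaussian_bic(X: np.ndarray, covariance: str = "full") -> float:
    """BIC = k*ln(n) - 2*ln(L_hat) for one Gaussian fitted by maximum likelihood.

    covariance: "full" (d(d+1)/2 covariance parameters) or "diag" (d parameters).
    """
    n, d = X.shape
    centred = X - X.mean(axis=0)
    cov = centred.T @ centred / n  # MLE covariance (divides by n, not n - 1)
    if covariance == "diag":
        cov = np.diag(np.diag(cov))
        n_cov_params = d
    elif covariance == "full":
        n_cov_params = d * (d + 1) // 2
    else:
        raise ValueError("covariance must be 'full' or 'diag'")
    _, logdet = np.linalg.slogdet(cov)
    # At the MLE the summed Mahalanobis terms equal n*d, giving this closed form
    max_loglik = -0.5 * n * (d * np.log(2 * np.pi) + logdet + d)
    k = d + n_cov_params  # mean parameters + covariance parameters
    return k * np.log(n) - 2.0 * max_loglik


# Illustrative data: two strongly correlated features
rng = np.random.default_rng(1)
x = rng.normal(size=500)
X = np.column_stack([x, x + 0.5 * rng.normal(size=500)])

bic0 = gaussian_bic(X, "diag")  # H_0: features independent
bic1 = gaussian_bic(X, "full")  # H_1: features correlated
log_bf01 = (bic1 - bic0) / 2.0  # log of exp((BIC(H1) - BIC(H0)) / 2)
print(bic0, bic1, log_bf01)     # log_bf01 < 0 favours H_1
```

For the mixture models used in `lik_mc`, scikit-learn's `GaussianMixture.bic(X)` returns the BIC of a fitted model, so the same comparison could be made between fitted mixtures instead of closed-form single Gaussians.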

In [22]:
def approx_bayesfactor(X0, X1, gmm_config={'n_components_range': range(1, 7),
                                           'cv_types': ['spherical', 'tied', 'diag', 'full']}):
    lik0 = lik_mc(X0, gmm_config)
    lik1 = lik_mc(X1, gmm_config)
    
    return lik_bayesfactor(lik0, lik1)

In [29]:
approx_bayesfactor(X, X1)

0.0051804078848482274

In [24]:
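# Note: despite the names, bic0 and bic1 hold the averaged likelihoods
# returned by lik_mc, not BIC values.
gmm_config = {'n_components_range': range(1, 7),
              'cv_types': ['spherical', 'tied', 'diag', 'full']}
bic0 = lik_mc(X, gmm_config)
bic1 = lik_mc(X1, gmm_config)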

In [25]:
bic0

0.16999715563203691

In [26]:
bic1

31.890812446795668
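`bic1` exceeds 1 because it is built from probability *densities*, which can be arbitrarily large when a fitted covariance is nearly singular. `X1` has an exactly redundant direction, and scikit-learn's `GaussianMixture` adds `reg_covar` (default `1e-6`) to the diagonal of each covariance to keep it positive definite; a variance of that size along the redundant direction is one plausible reason the averaged density for `X1` is so much larger than for `X`. Densities over a 4-D space (`X`) and a 5-D space (`X1`) are also in different units, so their ratio is not directly comparable as a Bayes factor. The one-dimensional effect, with variances chosen for illustration:

```python
import math


def normal_log_density_at_mean(variance: float) -> float:
    """log N(mu | mu, variance) = -0.5 * ln(2*pi*variance): the peak of a 1-D Gaussian."""
    return -0.5 * math.log(2 * math.pi * variance)


for variance in (1.0, 1e-2, 1e-6):
    # The peak density grows without bound as the variance shrinks
    print(variance, math.exp(normal_log_density_at_mean(variance)))
```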