%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

# Understanding large odds/Bayes factors

In a recent [paper](https://arxiv.org/abs/1610.03508) we found odds between two models of
$$ \mathcal{O} = \frac{P(\textrm{model 1})}{P(\textrm{model 2})} \approx 10^{72}. $$
In this post I will discuss how such large numbers arise.

First, the odds are equal to the Bayes factor, provided we set the prior odds to unity (i.e. no prior preference between the two models). The quantity that is actually calculated is then

$$
\mathcal{O} = \frac{P(\textrm{model 1}|\textrm{ data})}{P(\textrm{model 2}|\textrm{ data})}
$$

## Two Gaussian models

Let's take a simple example in which model 1 is a Gaussian distribution with mean $\mu_1$ and standard deviation $\sigma$, while model 2 is also a Gaussian with a different mean $\mu_2$ but the same standard deviation
$\sigma$. This describes any model where the likelihood and prior are both normal, since the normal distribution is self-conjugate. We then take the data to be $N$ observations $x_{i}$, with $i = 1, \dots, N$.

The log-odds between these two models can then be written as
$$
\log \mathcal{O} = \frac{1}{2\sigma^{2}}\sum_{i=1}^{N}
\left[(x_i - \mu_2)^{2} - (x_i - \mu_1)^2\right].
$$
Defining $\langle \cdot \rangle$ as the mean over the $N$ observations, we have
$$
\log \mathcal{O} = \frac{N}{2\sigma^{2}}
\left[\langle(x_i - \mu_2)^{2}\rangle - \langle(x_i - \mu_1)^2\rangle\right].
$$
The quantity in the square brackets is the difference between the mean-square deviations of the data from each model. Let us define
$$
\Delta = \left[\langle(x_i - \mu_2)^{2}\rangle - \langle(x_i - \mu_1)^2\rangle\right],
$$
such that, with $\log$ denoting the natural logarithm,
$$
\log_{10}\mathcal{O} = \frac{N\Delta}{2\sigma^{2}\log(10)}.
$$
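
As a quick numerical check of the expression above, here is a minimal sketch that computes $\log_{10}\mathcal{O}$ in two ways: from $\Delta$, and directly from the summed Gaussian log-likelihoods. The function names, the random seed, and the choice $\mu_1 = 0$, $\mu_2 = 1$, $\sigma = 1$, $N = 50$ are illustrative assumptions, not values from the paper. Each model is treated as a fixed Gaussian with no free parameters, as in the text.

```python
import numpy as np


def log10_odds_from_delta(x, mu1, mu2, sigma):
    """log10 odds via N * Delta / (2 sigma^2 ln 10)."""
    x = np.asarray(x, dtype=float)
    # Delta: difference between the mean-square deviations from each model
    delta = np.mean((x - mu2) ** 2) - np.mean((x - mu1) ** 2)
    return x.size * delta / (2 * sigma ** 2 * np.log(10))


def log10_odds_direct(x, mu1, mu2, sigma):
    """log10 odds from the summed Gaussian log-likelihoods of each model."""
    x = np.asarray(x, dtype=float)

    def log_likelihood(mu):
        return np.sum(
            -0.5 * ((x - mu) / sigma) ** 2
            - 0.5 * np.log(2 * np.pi * sigma ** 2)
        )

    # Convert the natural-log odds to base 10
    return (log_likelihood(mu1) - log_likelihood(mu2)) / np.log(10)


# Illustrative data (assumed): 50 draws from model 1
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=50)

via_delta = log10_odds_from_delta(x, mu1=0.0, mu2=1.0, sigma=1.0)
direct = log10_odds_direct(x, mu1=0.0, mu2=1.0, sigma=1.0)
print(via_delta, direct)
```

The two values should agree to floating-point precision, since the normalisation terms cancel when both models share the same $\sigma$.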

### Fixed observations

Let us now suppose that all $N$ observations $x_i$ take exactly the same value, $x_0$. This is obviously unlike real data, but allows us to gain some simple intuition. In such a case,
$$
\Delta = (x_0 - \mu_2)^{2} - (x_0 - \mu_1)^2
$$

Furthermore, we can set $\mu_1 = 0$ and $\mu_2 = n\sigma$, i.e. the two distributions are separated by a fixed number $n$ of standard deviations. Then, we have that
$$
\Delta = n^{2}\sigma^{2} - 2n\sigma x_0,
$$
such that
$$
\log_{10}\mathcal{O} = \frac{N n}{\log(10)}\left(\frac{n}{2} - \frac{x_0}{\sigma}\right).
$$
As a sanity check, one sees that when $x_0$ lies midway between the two models, $x_0 = n\sigma/2$, the odds are unity. Moreover, if we write $x_0 = n\sigma/2 - k \sigma$, so that $k$ is the number of standard deviations by which $x_0$ is displaced from the midpoint towards $\mu_1$, then
$$
\log_{10}\mathcal{O} = \frac{N n k}{\log(10)}.
$$
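
The fixed-observation result can be checked against the general expression $N\Delta / (2\sigma^{2}\log(10))$. The sketch below does this for one arbitrary, assumed choice of $N$, $n$, $k$ and $\sigma$; the function name `log10_odds_fixed` is hypothetical.

```python
import numpy as np


def log10_odds_fixed(N, n, k):
    """Closed form for identical observations: N * n * k / ln(10)."""
    return N * n * k / np.log(10)


# Assumed illustrative values
N, n, k, sigma = 100, 2.0, 1.0, 0.5

mu1, mu2 = 0.0, n * sigma          # models separated by n standard deviations
x0 = n * sigma / 2 - k * sigma     # k standard deviations from the midpoint

# General expression evaluated with every x_i equal to x0
delta = (x0 - mu2) ** 2 - (x0 - mu1) ** 2
general = N * delta / (2 * sigma ** 2 * np.log(10))

closed_form = log10_odds_fixed(N, n, k)
print(general, closed_form)
```

Both routes give the same number, and setting `k = 0` (an observation exactly at the midpoint) returns zero, i.e. odds of unity.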


This equation shows how large odds can easily arise. If the observations do not lie at the midpoint between the two models (i.e. $k\gtrsim1$, so that they favour one model over the other) and the two models are distinct (i.e. $n \gtrsim 1$), then the log-odds scale linearly with the number of observations. For $N\sim 100$ observations, $n = k = 1$ already gives $\log_{10}\mathcal{O} \approx 43$, and $nk \approx 2.3$ is enough to reach odds of order $10^{100}$!
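
To get a feel for the numbers, the following sketch tabulates $\log_{10}\mathcal{O} = N n k / \log(10)$ for $N = 100$ over a small grid of $n$ and $k$. The grid values are arbitrary choices made for illustration.

```python
import numpy as np

N = 100
# Assumed illustrative grids for the model separation n and the offset k
n_values = np.array([0.5, 1.0, 2.0, 3.0])
k_values = np.array([0.5, 1.0, 2.0, 3.0])

# log10 odds for every (n, k) pair: rows index n, columns index k
log10_odds = N * np.outer(n_values, k_values) / np.log(10)

print("n \\ k " + "".join(f"{k:>9.1f}" for k in k_values))
for n, row in zip(n_values, log10_odds):
    print(f"{n:>6.1f}" + "".join(f"{value:>9.1f}" for value in row))
```

Each entry is the exponent of the odds, so even modest values of $n$ and $k$ produce very large odds once $N$ reaches the hundreds.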




