This notebook tests three methods for calculating the probability to exceed (PTE) and the corresponding number of sigma for a given point and probability distribution:

- Gaussian approximation
- KDE
- Nearest neighbour

In [1]:
import numpy as np
from gaussian_approximation import get_pte_from_gaussian
from kde_method import get_pte_KDE
from nearest_neighbour_method import get_pte_nearest_neighbour
from stats import *  # provides get_nsigma, used below

# One-dimensional example

First, we test with a one-dimensional Gaussian distribution centered at zero with standard deviation 1. We use the point $x = 2$, so we expect to recover a PTE of ~0.0455 and a $2 \sigma$ tension.

For this example, we use only the Gaussian approximation. In this case it is not an approximation, so we should recover the expected result up to sampling noise.

In [2]:
# Number of Gaussian samples to generate
nsamples = int(1e6)

x0 = 2
mean = 0
cov = 1

In [3]:
pte = get_pte_from_gaussian(x0, mean, cov, nsamples)
nsigma = get_nsigma(pte)

print('PTE =', pte)
print('Number of sigma =', nsigma)

PTE = 0.04566300000000001
Number of sigma = 1.9984951982454928
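As a cross-check, the one-dimensional values can also be computed in closed form. The sketch below assumes that the PTE is the two-tailed probability $P(|x - \mu| > |x_0 - \mu|)$ and that the number of sigma is the two-tailed Gaussian equivalent of that PTE. Both are assumptions about what `get_pte_from_gaussian` and `get_nsigma` compute, although they are consistent with the output above. The helper names `exact_pte_1d` and `nsigma_from_pte` are hypothetical.

```python
import math
from statistics import NormalDist


def exact_pte_1d(x0, mean=0.0, sigma=1.0):
    """Two-tailed probability to exceed for a 1D Gaussian."""
    z = abs(x0 - mean) / sigma
    return math.erfc(z / math.sqrt(2.0))


def nsigma_from_pte(pte):
    """Two-tailed Gaussian-equivalent number of sigma for a given PTE."""
    return NormalDist().inv_cdf(1.0 - pte / 2.0)


pte_exact = exact_pte_1d(2.0)
print('Exact PTE =', pte_exact)
print('Exact number of sigma =', nsigma_from_pte(pte_exact))
```

The exact PTE is about 0.0455, so the sampled value of 0.0457 differs from it only at the level expected from a finite number of samples.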


# Two-dimensional example

Having checked that we recover the correct result in the one-dimensional case, we move to two dimensions. We take the point $x = (1,1)$ for a Gaussian centered at zero with identity covariance. We want to compare the Gaussian approximation with the KDE and nearest neighbour methods. Therefore, we need a chain, which we generate using PolyChord.

First, we repeat the Gaussian approximation, which again gives us the true result up to sampling noise.

In [4]:
# Number of Gaussian samples to generate
nsamples = int(1e6)

x0 = np.array([1,1])
mean = (0, 0)
cov = [[1, 0], [0, 1]]

In [5]:
pte = get_pte_from_gaussian(x0, mean, cov, nsamples)
nsigma = get_nsigma(pte)

print('PTE =', pte)
print('Number of sigma =', nsigma)

PTE = 0.367818
Number of sigma = 0.9005681036168847
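The value above can be checked against a closed form. For a two-dimensional Gaussian, $\chi^2 = (x_0 - \mu)^T C^{-1} (x_0 - \mu)$ follows a chi-squared distribution with two degrees of freedom, whose survival function is $e^{-\chi^2/2}$. For $x_0 = (1,1)$ this gives $\chi^2 = 2$ and a PTE of $e^{-1} \approx 0.368$. The Monte Carlo part of the sketch below is only one way a sample-based estimate could work (the fraction of samples lying on lower-density contours than the point); it is not necessarily how `get_pte_from_gaussian` is implemented, and the function names are hypothetical.

```python
import numpy as np


def mahalanobis_sq(points, mean, cov):
    """Squared Mahalanobis distance of each row of `points` from `mean`."""
    diff = np.atleast_2d(points) - np.asarray(mean, dtype=float)
    cov_inv = np.linalg.inv(np.atleast_2d(cov))
    return np.einsum('ij,jk,ik->i', diff, cov_inv, diff)


def mc_pte_gaussian(x0, mean, cov, nsamples, seed=0):
    """Fraction of Gaussian samples that are less probable than x0."""
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(mean, cov, size=nsamples)
    chi2_samples = mahalanobis_sq(samples, mean, cov)
    chi2_point = mahalanobis_sq(x0, mean, cov)[0]
    # For a Gaussian, lower density is equivalent to larger chi2
    return np.mean(chi2_samples > chi2_point)


check_x0 = np.array([1.0, 1.0])
check_mean = (0.0, 0.0)
check_cov = [[1.0, 0.0], [0.0, 1.0]]

chi2_check = mahalanobis_sq(check_x0, check_mean, check_cov)[0]
pte_exact_2d = np.exp(-chi2_check / 2.0)
pte_mc_2d = mc_pte_gaussian(check_x0, check_mean, check_cov, 200_000)

print('Exact PTE =', pte_exact_2d)
print('Monte Carlo PTE =', pte_mc_2d)
```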


We can now compare with the other two methods, using the PolyChord chain.

In [6]:
path_to_chains = 'chains/gaussian_2d'

In [7]:
# Bandwidth picked from a grid search
pte = get_pte_KDE(x0, path_to_chains, bandwidth = 2, rtol = 1e-8)
nsigma = get_nsigma(pte)

print('PTE =', pte)
print('Number of sigma =', nsigma)

PTE = 0.3506718263062343
Number of sigma = 0.9332869595627669
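To illustrate the idea behind a KDE-based PTE, here is a minimal NumPy-only sketch. It assumes equally weighted samples drawn directly from the Gaussian rather than a PolyChord chain (whose samples generally carry weights), uses an isotropic Gaussian kernel, and estimates the PTE as the fraction of samples whose estimated density is lower than the density at the point. The functions `gaussian_kde_density` and `kde_pte` and the bandwidth value are illustrative choices, not the implementation of `get_pte_KDE`.

```python
import numpy as np


def gaussian_kde_density(query, samples, bandwidth):
    """Isotropic Gaussian KDE evaluated at each row of `query`."""
    query = np.atleast_2d(query)
    n, d = samples.shape
    # Pairwise squared distances, shape (n_query, n_samples)
    q2 = (query ** 2).sum(axis=1)[:, None]
    s2 = (samples ** 2).sum(axis=1)[None, :]
    sq_dist = np.maximum(q2 + s2 - 2.0 * query @ samples.T, 0.0)
    norm = n * (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)
    return np.exp(-0.5 * sq_dist / bandwidth ** 2).sum(axis=1) / norm


def kde_pte(x0, samples, bandwidth):
    """Fraction of samples with lower KDE density than the point x0."""
    n, d = samples.shape
    dens_point = gaussian_kde_density(x0, samples, bandwidth)[0]
    dens_samples = gaussian_kde_density(samples, samples, bandwidth)
    # Remove each sample's own kernel so its density is not inflated
    self_term = 1.0 / (n * (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0))
    dens_samples = (dens_samples - self_term) * n / (n - 1)
    return np.mean(dens_samples < dens_point)


rng = np.random.default_rng(1)
kde_samples = rng.standard_normal(size=(2000, 2))
kde_pte_value = kde_pte(np.array([1.0, 1.0]), kde_samples, bandwidth=0.5)
print('KDE PTE (sketch) =', kde_pte_value)
```

With only 2000 samples the estimate is noisy, so it should land near, but not exactly on, $e^{-1} \approx 0.368$.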


In [8]:
pte = get_pte_nearest_neighbour(x0, path_to_chains)
nsigma = get_nsigma(pte)

print('PTE =', pte)
print('Number of sigma =', nsigma)

PTE = 0.358960909692977
Number of sigma = 0.9173468593387709
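One plausible reading of a nearest neighbour estimate is sketched below: the point is assigned the posterior value of the chain sample closest to it, and the PTE is the weighted fraction of samples that are less probable than that neighbour. This is an assumption about the approach, not the implementation of `get_pte_nearest_neighbour`. In the sketch the log-posterior is computed analytically from the known Gaussian, standing in for the value a chain would store, and the samples are equally weighted.

```python
import numpy as np


def nearest_neighbour_pte(x0, samples, log_post, weights=None):
    """PTE using the log-posterior of the sample nearest to x0."""
    samples = np.asarray(samples, dtype=float)
    log_post = np.asarray(log_post, dtype=float)
    if weights is None:
        weights = np.ones(len(samples))
    weights = np.asarray(weights, dtype=float)
    # Index of the sample closest to x0 in Euclidean distance
    sq_dist = ((samples - np.asarray(x0, dtype=float)) ** 2).sum(axis=1)
    idx = np.argmin(sq_dist)
    # Weighted fraction of samples less probable than that neighbour
    lower = log_post < log_post[idx]
    return np.sum(weights[lower]) / np.sum(weights)


rng = np.random.default_rng(2)
nn_samples = rng.standard_normal(size=(100_000, 2))
# Log-density of the unit Gaussian, up to an additive constant
nn_log_post = -0.5 * (nn_samples ** 2).sum(axis=1)
nn_pte = nearest_neighbour_pte(np.array([1.0, 1.0]), nn_samples, nn_log_post)
print('Nearest neighbour PTE (sketch) =', nn_pte)
```

The cost is a single pass over the samples, which is why an approach of this kind is fast.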


The KDE method works well here (PTE = 0.351 against the true value of 0.368), but the result depends heavily on the bandwidth, and I am not sure which value is best.

The nearest neighbour method is very quick and gives a decent result (PTE = 0.359), but it will quickly degrade with dimensionality.
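One way to see the effect of dimensionality, assuming the method relies on the chain sample closest to the point, is to measure how far away that closest sample is as the number of dimensions grows at fixed sample size. The sketch below uses a unit Gaussian and the point $(1, \dots, 1)$; the function name and the chosen dimensions are illustrative.

```python
import numpy as np


def nearest_sample_distance(dim, nsamples, seed=0):
    """Distance from the point (1, ..., 1) to its nearest Gaussian sample."""
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal(size=(nsamples, dim))
    point = np.ones(dim)
    return np.sqrt(((samples - point) ** 2).sum(axis=1).min())


dims = (2, 5, 10, 20)
nn_distances = [nearest_sample_distance(d, 20_000) for d in dims]
for d, dist in zip(dims, nn_distances):
    print(f'dim = {d:2d}: distance to nearest sample = {dist:.3f}')
```

The nearest sample moves further from the point as the dimension increases, so the value borrowed from it becomes a poorer proxy for the density at the point itself.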