<a href="https://colab.research.google.com/github/davidwhogg/FlatPriorFlatLikelihood/blob/main/notebooks/To_Gaby_From_Hogg.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# For Gaby: flat LF, flat priors, and yet peaked posterior

## Authors:
- **David W. Hogg** (NYU) (MPIA)

## Projects:
- Show that even in the simplest situation (everything linear, everything Gaussian, flat priors), the marginalized Bayesian posterior is peaked when the likelihood function is not.
  - That is, even integrating a flat LF over a flat prior can give you a spurious peak. It's sweet! The fundamental reason is that in more than a few dimensions, any misalignment between the model degeneracy and the box edges will give you structure.
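A minimal toy version of this effect, as a sketch under stated assumptions rather than the notebook's own setup: in three dimensions, take a likelihood that is a thin Gaussian slab around the plane x + y + z = 0 (so it is exactly flat along two directions) and a flat prior on the cube [-1, 1]^3. The names `normal`, `sigma`, `pts`, and `toy_hist` are illustrative and are not used elsewhere in this notebook. At fixed x, the slab crosses the (y, z) square along a segment whose length shrinks as |x| grows, so the likelihood-weighted histogram of x should come out higher in the middle than at the edges.

```python
import numpy as np

rng_toy = np.random.default_rng(0)

# Assumed toy setup: the degenerate plane is x + y + z = 0, with unit normal (1, 1, 1) / sqrt(3).
normal = np.ones(3) / np.sqrt(3.0)
sigma = 0.05  # Gaussian width perpendicular to the plane

# Flat prior: uniform samples in the cube [-1, 1]^3.
pts = rng_toy.uniform(-1.0, 1.0, size=(200_000, 3))

# The likelihood depends only on the distance from the plane, so it is flat along the plane.
weights = np.exp(-0.5 * (pts @ normal) ** 2 / sigma ** 2)

# Likelihood-weighted (that is, posterior) histogram of the first coordinate.
toy_hist, toy_edges = np.histogram(
    pts[:, 0], bins=5, range=(-1.0, 1.0), weights=weights, density=True
)
print(np.round(toy_hist, 2))
```

For this particular plane the segment length is proportional to 2 - |x|, so the central bin should sit well above the outermost bins even though the likelihood has no preferred point on the plane.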

## Bugs / To-do items:
- Show that as the truth (`CENTER`) moves, the peak in the LF moves accordingly.
- Explore the effect of dimensionality of the space.
- Explore the effect of the dimensionality of the degeneracy.
- Show plots that are aligned with the degeneracy.
- Show that the problem is easy to diagnose by changing the limits of the prior.
- Switch to a latin hypercube, maybe?
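For the Latin hypercube item, one possible sketch in plain NumPy is given here. The helper `latin_hypercube` and the names `rng_lhs`, `unit_points`, `box_center`, and `lhs_points` are hypothetical, and the sizes are chosen only for illustration. Each dimension is split into `n` equal strata and every stratum receives exactly one point.

```python
import numpy as np

def latin_hypercube(n, d, rng):
    """Return n points in [0, 1)^d with one point per 1/n-wide stratum in each dimension."""
    # The argsort of i.i.d. uniforms is an independent random permutation of 0..n-1 per column.
    strata = np.argsort(rng.uniform(size=(n, d)), axis=0)
    # Jitter each point uniformly inside its stratum.
    return (strata + rng.uniform(size=(n, d))) / n

rng_lhs = np.random.default_rng(17)
unit_points = latin_hypercube(1024, 6, rng_lhs)

# Map to a cube of half-width 1 around an assumed center, like the sampling box used below.
box_center = np.zeros(6)
lhs_points = 2.0 * unit_points - 1.0 + box_center[None, :]
print(lhs_points.shape)
```

SciPy's `scipy.stats.qmc.LatinHypercube` provides the same kind of design if adding a SciPy dependency is acceptable.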


In [None]:
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# hyperparameters / choices
D = 6 # dimensionality of the space
M = 2 # dimensionality of the null space (exact degeneracy space)
K = 12 # number of samplings to do
N = 2 ** (11 + D // 3) # number of samples (number of models to check inside the cube per sampling)
SIGMA = 0.35 # Gaussian width (standard deviation) in the non-null directions
print(D, M, K, N)

In [None]:
# make some random numbers to define our Universe
rng = np.random.default_rng(17)
CENTER = 10. * rng.uniform(size=D)
vecs = rng.normal(size=(D, D))

In [None]:
# now orthogonalize to make a randomly oriented orthonormal coordinate system
UNITVECS = np.zeros((D, D))
for i in range(D):
    veci = 1. * vecs[i]
    for j in range(i):
        uvecj = UNITVECS[j]
        veci -= (veci @ uvecj) * uvecj
    UNITVECS[i] = veci / np.linalg.norm(veci)
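As a cross-check on the Gram-Schmidt loop above, the same kind of randomly oriented orthonormal basis can be sketched with a QR factorization. This is an alternative, not what the notebook does. The names `rng_qr`, `D_qr`, `vecs_qr`, and `unitvecs_qr` are assumptions made so the block runs on its own, and the resulting vectors should match Gram-Schmidt only up to signs.

```python
import numpy as np

rng_qr = np.random.default_rng(17)
D_qr = 6
vecs_qr = rng_qr.normal(size=(D_qr, D_qr))

# QR factorization of the transpose: the columns of q form an orthonormal basis
# spanning the same nested subspaces as the rows of vecs_qr.
q, r = np.linalg.qr(vecs_qr.T)
unitvecs_qr = q.T  # rows are unit vectors, matching the layout of UNITVECS

# Orthonormality check: the Gram matrix should be the identity.
gram = unitvecs_qr @ unitvecs_qr.T
print(np.allclose(gram, np.eye(D_qr)))
```

The same Gram-matrix test, `np.allclose(UNITVECS @ UNITVECS.T, np.eye(D))`, is a cheap sanity check on the hand-rolled loop.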

In [None]:
# Make a LLF with an exact M-dimensional degeneracy
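def log_likelihood(points):
    """
    Gaussian log-likelihood (up to an arbitrary constant) that is exactly
    flat along the first M directions of UNITVECS.

    `points` has shape (n, D), or (D,) for a single point; returns shape (n,).
    """
    points = np.atleast_2d(points)
    # project the offsets from CENTER onto the D - M constrained directions
    deltas = (points - CENTER[None, :]) @ UNITVECS[M:].T
    return 10.0 - 0.5 * np.sum(deltas * deltas, axis=1) / (SIGMA * SIGMA)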
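The claim of an exact M-dimensional degeneracy can be checked numerically. The sketch below rebuilds a small stand-in for the setup so that it runs on its own. Every name ending in `_chk`, as well as `toy_log_likelihood`, is an assumption and not the notebook's own variable. Sliding a point along a null direction should leave the log-likelihood unchanged, while sliding it along a constrained direction should not.

```python
import numpy as np

rng_chk = np.random.default_rng(3)
D_chk, M_chk, sigma_chk = 6, 2, 0.35
center_chk = 10.0 * rng_chk.uniform(size=D_chk)

# Random orthonormal basis (rows); the first M_chk rows span the null space.
q_chk, _ = np.linalg.qr(rng_chk.normal(size=(D_chk, D_chk)))
basis_chk = q_chk.T

def toy_log_likelihood(points):
    """Same functional form as log_likelihood, using the stand-in variables."""
    deltas = (points - center_chk[None, :]) @ basis_chk[M_chk:].T
    return 10.0 - 0.5 * np.sum(deltas * deltas, axis=1) / sigma_chk ** 2

start = center_chk + rng_chk.normal(size=D_chk)
steps = (-3.0, 0.0, 3.0)

# Slide along a null direction, then along a constrained direction.
along_null = np.stack([start + t * basis_chk[0] for t in steps])
along_constrained = np.stack([start + t * basis_chk[M_chk] for t in steps])

llf_null = toy_log_likelihood(along_null)
llf_constrained = toy_log_likelihood(along_constrained)
print(np.ptp(llf_null), np.ptp(llf_constrained))
```

The first printed spread should be zero to rounding error and the second should be large.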

In [None]:
plot_center = np.round(CENTER)
axis_labels = list("abcdefghkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ")[-D:]
print(plot_center, axis_labels)

In [None]:
# Make a sampling in a cube
points = 2. * rng.uniform(size=(K * N, D)) - 1. + plot_center[None, :]
llfs = log_likelihood(points)
print(points.shape, llfs.shape)

In [None]:
FIGSIZE = 3. # inches per panel (matplotlib's figsize is in inches)
nx = np.ceil(np.sqrt(D)).astype(int)
ny = np.ceil(D / nx).astype(int)
fig, axs = plt.subplots(ny, nx, figsize=(FIGSIZE * nx, FIGSIZE * ny), sharey=True)
axs = axs.flatten()
for i in range(D):
    ax = axs[i]
    ax.axvline(CENTER[i], lw=1, color="k", alpha=0.5)
    # likelihood-weighted histogram over all K * N samples
    hist, edges = np.histogram(points[:, i], weights=np.exp(llfs), density=True)
    ax.stairs(hist, edges, color="k", alpha=1.0, fill=False, label=f"p({axis_labels[i]}|data)")
    # one fainter histogram per sampling, to show the sampling noise
    for k in range(K):
        hist, edges = np.histogram(points[k::K, i], weights=np.exp(llfs[k::K]), density=True)
        ax.stairs(hist, edges, color="k", alpha=0.25, fill=False)
    ax.legend(loc="lower center")
# hide any unused panels in the grid
for j in range(D, len(axs)):
    axs[j].set_visible(False)
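The to-do item about diagnosing the problem by changing the prior limits can be sketched on a three-dimensional toy problem. The helper `weighted_mean_x` and the plane x + y + z = 0 are assumptions chosen for illustration, not part of the notebook. The likelihood and the prior range on x are held fixed and only the y and z limits move. If the marginal posterior on x were driven by the data, its mean should not care.

```python
import numpy as np

def weighted_mean_x(lower, upper, rng, n=200_000, sigma=0.05):
    """Likelihood-weighted mean of x for a flat prior on the box [lower, upper]."""
    pts = rng.uniform(lower, upper, size=(n, 3))
    normal = np.ones(3) / np.sqrt(3.0)  # assumed toy degenerate plane x + y + z = 0
    weights = np.exp(-0.5 * (pts @ normal) ** 2 / sigma ** 2)
    return np.sum(weights * pts[:, 0]) / np.sum(weights)

rng_box = np.random.default_rng(1)

# Same likelihood, same prior range on x; only the y and z limits differ.
mean_centered = weighted_mean_x(
    np.array([-1.0, -1.0, -1.0]), np.array([1.0, 1.0, 1.0]), rng_box
)
mean_shifted = weighted_mean_x(
    np.array([-1.0, 0.0, 0.0]), np.array([1.0, 2.0, 2.0]), rng_box
)
print(mean_centered, mean_shifted)
```

In this toy the mean of x should sit near zero for the centered box and move to clearly negative values for the shifted box, which is the signature of a peak set by the prior box and not by the likelihood.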

In [None]:
Nplot = 2 ** 11
vmax = np.max(llfs)
vmin = vmax - 5.0
fig, axs = plt.subplots(D-1, D-1, figsize=(FIGSIZE * (D - 1), FIGSIZE * (D - 1)))
print(axs.shape)
for i in range(1, D):
    for j in range(i):
        ax = axs[i - 1, j]
        ax.scatter(points[:Nplot, j], points[:Nplot, i], s=4, c=llfs[:Nplot],
                   vmin=vmin, vmax=vmax, cmap="viridis_r", alpha=0.75)
        if i == (D - 1): ax.set_xlabel(axis_labels[j])
        if j == 0: ax.set_ylabel(axis_labels[i])
    for j in range(i, D - 1):
        axs[i - 1, j].set_visible(False)