# Numerical Langevin
### By Carlos A.C.C. Perello

## Preliminaries
The only preliminaries are importing the relevant packages (`LinearAlgebra.jl` and `GridInterpolations.jl`) and defining a function that creates a square grid centered at $(0,0)$ with side length $2x$, whose points are evenly spaced by $h$ (the `space` argument in the code). The `GridInterpolations.jl` package will later help us interpolate the Langevin map over a given grid.

In [1]:
using GridInterpolations, LinearAlgebra

In [2]:
function SquareGrid(x, space=0.5)
    RectangleGrid(-x:space:x, -x:space:x)
end

function UsableGrid(grid::RectangleGrid)
    [i for i in grid]
end

UsableGrid (generic function with 1 method)

### Test grid

In [3]:
sqr_grid = SquareGrid(5);
grid = UsableGrid(sqr_grid)

441-element Vector{Vector{Float64}}:
 [-5.0, -5.0]
 [-4.5, -5.0]
 [-4.0, -5.0]
 [-3.5, -5.0]
 [-3.0, -5.0]
 [-2.5, -5.0]
 [-2.0, -5.0]
 [-1.5, -5.0]
 [-1.0, -5.0]
 [-0.5, -5.0]
 [0.0, -5.0]
 [0.5, -5.0]
 [1.0, -5.0]
 ⋮
 [-0.5, 5.0]
 [0.0, 5.0]
 [0.5, 5.0]
 [1.0, 5.0]
 [1.5, 5.0]
 [2.0, 5.0]
 [2.5, 5.0]
 [3.0, 5.0]
 [3.5, 5.0]
 [4.0, 5.0]
 [4.5, 5.0]
 [5.0, 5.0]

## Computing the derivative of the Brenier map

We are interested in transporting the following distributions:

$$
\mathcal{N}(m_1,\Sigma_1)\to\mathcal{N}(m_2,\Sigma_2)
$$

where $\mathcal{N}(\mu, \Sigma)$ is a multivariate normal with mean $\mu\in \mathbb{R}^n$ and covariance matrix $\Sigma\in\mathbb{R}^{n\times n}$.

As we are interested in transporting one Gaussian distribution into another, computing the Brenier map between these two distributions simplifies to computing the map [1]:

$$
T_{\text{Brenier}}:x\mapsto m_2 + A(x-m_1) \quad \text{with } A = \Sigma_1^{-1/2}(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}\Sigma_1^{-1/2}
$$

Therefore, the derivative of the Brenier map is $DT_{\text{Brenier}} = A$, which is easy to compute; this is done below. We also compute the singular values of $DT_{\text{Brenier}}$, as they can be used to bound its operator norm (the largest singular value equals the operator 2-norm).

In [90]:
function OT_derivative(Σ₁, Σ₂)
    #=
    Returns a tuple consisting of
    the derivative of the Brenier map in the first entry
    and a vector containing the singular values
    of the derivative of the Brenier map in the second entry.
    =#
    A = inv(√(Σ₁))*√(√(Σ₁)*Σ₂*√(Σ₁))*inv(√(Σ₁))
    A, svdvals(A)
end

OT_derivative (generic function with 1 method)

### Testing the computation of the derivative of Brenier map

In [91]:
n = 2 # Dimension of multivariate Gaussian
X₁ = rand(n, n)
X₂ = rand(n, n) # Generate 2 random n×n matrices
C₁ = X₁'*X₁
C₂ = X₂'*X₂ # Use the random matrices to generate 2 random 
# n×n pos. def. matrices, as covariance matrices are pos. def.

OT_map = OT_derivative(C₁, C₂)[1]

2×2 Matrix{Float64}:
 1.10613    0.0904174
 0.0904174  1.28892
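
As a hedged sanity check on this kind of output: since $T_{\text{Brenier}}$ pushes $\mathcal{N}(m_1,\Sigma_1)$ forward to $\mathcal{N}(m_2,\Sigma_2)$, the matrix $A$ should satisfy $A\Sigma_1A^T = \Sigma_2$ and be symmetric. The sketch below checks this on two fixed covariance matrices. The helper `brenier_matrix` and the example matrices are introduced here for illustration only, and wrapping the square roots in `Symmetric` is an assumption made to keep them real under rounding.

```julia
using LinearAlgebra

# Hypothetical helper mirroring the formula for A above; not the notebook's OT_derivative.
function brenier_matrix(Σ₁, Σ₂)
    R = Matrix(sqrt(Symmetric(Σ₁)))          # Σ₁^{1/2}
    M = Matrix(sqrt(Symmetric(R * Σ₂ * R)))  # (Σ₁^{1/2} Σ₂ Σ₁^{1/2})^{1/2}
    Rinv = inv(R)                            # Σ₁^{-1/2}
    Rinv * M * Rinv
end

# Illustrative symmetric positive definite covariances (assumed values).
Σa = [2.0 0.5; 0.5 1.0]
Σb = [1.0 0.2; 0.2 3.0]

A_ot = brenier_matrix(Σa, Σb)

pushforward_ok = A_ot * Σa * A_ot' ≈ Σb            # A Σ₁ Aᵀ should recover Σ₂
symmetric_ok = A_ot ≈ A_ot'                         # A should be symmetric
opnorm_ok = opnorm(A_ot) ≈ maximum(svdvals(A_ot))   # operator 2-norm = largest singular value
```

All three flags should evaluate to `true` up to floating-point tolerance, which is why `≈` is used rather than `==`.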

## Computing the Langevin map
We want to numerically compute the Langevin map between two Gaussians. To do so, we first need to introduce $P_t^A(f)$ and $B_t$. We define $P_t^A(f)$ as [2]:
$$
P_t^A(f)(x) = \int_{\mathbb{R}^n}f\left(\exp(-tA)x + \sqrt{\text{Id}-\exp(-2tA)}\, y\right)d\mu(y)
$$

In the case of transport between two Gaussians, $B_t$ is then defined by [2]:
$$
P_t^A\left(c_0\exp\left(-\frac{1}{2}y^TBy\right)\right)(x) = c_t\exp\left(-\frac{1}{2}x^T B_t x\right)
$$
and $B_t$ can be computed by explicitly evaluating the left-hand side of the equation above and rearranging [2].
We introduce the notation $A=\Sigma_1$, $B=\Sigma_2$, $C = e^{-tA}$, $D=\sqrt{I-e^{-2tA}}$ and $W = D^TBD + A$ (so that $\sqrt{W}$ corresponds to `W_root` in the implementation below) to obtain that $B_t$ is:
$$
B_t = C^TBC - C^TBDW^{-1/2}(W^{-1/2})^TD^TB^TC
$$
and $c_t$ is given by:
$$
c_t = \frac{cc_0\sqrt{\pi}}{2\left|\det(\sqrt{W})\right|}
$$
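
As a sketch of that rearrangement, assume (this is an assumption, not something stated above) that $\mu$ has density proportional to $\exp\left(-\frac{1}{2}y^TAy\right)$ and that $B$ is symmetric. Write $M = D^TBD + A$ and $v = D^TBCx$, two symbols introduced here purely for illustration. Expanding the exponent and completing the square in $y$ gives, up to factors independent of $x$:

$$
\begin{aligned}
(Cx+Dy)^TB(Cx+Dy) + y^TAy &= x^TC^TBCx + 2y^Tv + y^TMy,\\
\int_{\mathbb{R}^n}\exp\left(-\tfrac{1}{2}y^TMy - y^Tv\right)dy &\propto \exp\left(\tfrac{1}{2}v^TM^{-1}v\right),
\end{aligned}
$$

so that $x^TB_tx = x^TC^TBCx - v^TM^{-1}v$, i.e. $B_t = C^TBC - C^TBDM^{-1}D^TB^TC$. For symmetric positive definite $M$ we have $M^{-1} = M^{-1/2}(M^{-1/2})^T$, which has the same shape as the correction term in the expression for $B_t$ above.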

We can implement $B_t$ as a function, done below.

In [92]:
function Bₜ(Σ₁, Σ₂, t)
    C = exp(-t.*Σ₁)
    D = √(I-exp(-(2*t).*Σ₁))
    W_root = √(D'*Σ₂*D + Σ₁)
    prod = C'*Σ₂*D*inv(W_root)
    ret = C'*Σ₂*C - prod*prod'
    ret
end

Bₜ (generic function with 1 method)

$B_t$ should satisfy $B_0 = B = \Sigma_2$ [2], which we can use as a sanity check. Additionally, letting $t\to\infty$ in the expression for $P_t^A(f)$ with our choice of $f$ yields $B_\infty = \lim\limits_{t\to\infty}B_t = \log(\text{Id}) = \mathbf{0}_{2\times 2}$.

### Testing $B_t$
We reuse the random positive definite matrices generated above:

In [99]:
Bₜ(C₁, C₂, 0) == C₂ && Bₜ(C₁, C₂, 10000000) == zeros(2, 2)

true
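
Exact `==` comparisons on floating-point matrices can be brittle, so a tolerance-based variant of the same check may be more robust. The sketch below is a self-contained re-implementation under assumptions: `Bt_sketch`, the example covariances, the choice of $t=50$ as "large", and the tolerance `1e-12` are all introduced here for illustration.

```julia
using LinearAlgebra

# Hypothetical re-implementation of the Bₜ formula above, kept self-contained.
function Bt_sketch(Σ₁, Σ₂, t)
    C = exp(-t .* Σ₁)                                   # C = e^{-tA}
    D = Matrix(sqrt(Symmetric(I - exp(-2t .* Σ₁))))     # D = sqrt(I - e^{-2tA})
    W_root = Matrix(sqrt(Symmetric(D' * Σ₂ * D + Σ₁)))  # square root of DᵀBD + A
    P = C' * Σ₂ * D * inv(W_root)
    C' * Σ₂ * C - P * P'
end

# Illustrative symmetric positive definite covariances (assumed values).
Σa = [2.0 0.5; 0.5 1.0]
Σb = [1.0 0.2; 0.2 3.0]

at_zero_ok = Bt_sketch(Σa, Σb, 0.0) ≈ Σb                 # B₀ should equal Σ₂
at_large_t_ok = norm(Bt_sketch(Σa, Σb, 50.0)) < 1e-12    # Bₜ should decay towards zero
symmetric_ok = Bt_sketch(Σa, Σb, 0.3) ≈ Bt_sketch(Σa, Σb, 0.3)'
```

The norm-based check is used for the large-$t$ limit because a relative comparison against an exactly zero matrix leaves no room for rounding error.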

Once we have $B_t$, the Langevin map is given by solving the following PDE:
$$
\frac{\partial S_t(x)}{\partial t} = B_t S_t(x)
$$

Fixing $x$, this becomes:

$$
\frac{d S_t(x)}{dt} = B_t S_t(x)
$$

which we can solve using a basic ODE solver, such as the Forward Euler method [3].

## Solving for the Langevin map using Forward Euler
We now fix $x$ and solve for $S_t(x)$. Since $x$ is fixed, it is easier to write $S_t(x) = S_x(t)$. We first discretise the time interval $[0, T]$ into $n$ equally spaced time points $t_1 = 0, \dots, t_n = T$.

In [9]:
T = 10
n = 10_000
t = range(0, T, length=n)
h = step(t)

0.001000100010001

Now, we use the Forward Euler method, given by the following equations, to solve for $S_x(t)$:
$$
S_x(t_1) = x,\, \text{ as } t_1 = 0\\
S_x(t_{k+1}) = \left(\text{Id} + hB_{t_k}\right)S_x(t_{k}),\, 1\leq k\leq n-1
$$
This is implemented below as a function that takes $x$ as its initial condition and outputs the time evolution of the Langevin map at the discrete time steps between $t=0$ and $t=T$ (or only the final value when `full=false`).

In [10]:
function LT_map(Σ₁, Σ₂, n, T, x, full=false)
    # full=true returns the array of the whole time evolution; full=false returns only the value at time T
    t = range(0, T, length=n)
    h = step(t)
    if full
        Sₜx = zeros(length(x), n)  # one column per time step, sized to match x
        Sₜx[:, 1] = x
        for k = 1:n-1
            Sₜx[:, k+1] = (I + h.*Bₜ(Σ₁, Σ₂, t[k]))*Sₜx[:, k]
        end
    else
        Sₜx = x
        for k = 1:n-1
            Sₜx = (I + h.*Bₜ(Σ₁, Σ₂, t[k]))*Sₜx
        end
    end
    Sₜx
end

LT_map (generic function with 2 methods)

We can also solve for the derivative, $DS$, by solving the following differential equation:
$$
\frac{d\,DS_x(t)}{dt} = B_t\,DS_x(t);\; DS_x(0) = \text{Id}
$$

We can do this by extending Forward Euler to solve matrix ODEs; we have:
$$
DS_x(t_1) = \text{Id},\, \text{ as } t_1 = 0\\
DS_x(t_{k+1}) = \left(\text{Id} + hB_{t_k}\right)DS_x(t_{k}),\, 1\leq k\leq n-1
$$

This is implemented below. This is a more elegant approach as our solution does not depend on $x$ and therefore we do not need to interpolate off a grid.

In [124]:
function LT_derivative(Σ₁, Σ₂, n, T)
    t = range(0, T, length=n)
    h = step(t)
    S∞ = Matrix{Float64}(I, size(Σ₁))  # DS₀ = Id, sized to match Σ₁
    for k = 1:n-1
        S∞ = (I + h.*Bₜ(Σ₁, Σ₂, t[k]))*S∞
    end
    S∞, svdvals(S∞)
end

LT_derivative(C₁, C₂, 10000, 1000)[1]

2×2 Matrix{Float64}:
 1.57361  0.073648
 0.14508  1.72818
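
Because the ODE is linear in $S$, the Forward Euler iterates started at a point $x$ are the matrix iterates applied to $x$, so a single propagator matrix gives the discrete map at every grid point at once. The sketch below illustrates this with a toy stand-in for $B_t$; `B_toy`, `euler_point` and `euler_propagator` are hypothetical names introduced here, not the notebook's functions.

```julia
using LinearAlgebra

# Toy stand-in for Bₜ (an assumption for illustration): a decaying symmetric matrix.
B_toy(t) = exp(-2t) .* [1.0 0.2; 0.2 3.0]

# Forward Euler for dS/dt = B(t) S started from a single point x.
function euler_point(B, n, T, x)
    t = range(0, T, length=n)
    h = step(t)
    S = float.(x)
    for k in 1:n-1
        S = (I + h .* B(t[k])) * S
    end
    S
end

# Forward Euler for the matrix ODE dDS/dt = B(t) DS with DS(0) = Id in dimension d.
function euler_propagator(B, n, T, d)
    t = range(0, T, length=n)
    h = step(t)
    DS = Matrix{Float64}(I, d, d)
    for k in 1:n-1
        DS = (I + h .* B(t[k])) * DS
    end
    DS
end

x = [0.5, -1.0]
DS = euler_propagator(B_toy, 1000, 10, 2)
same_result = DS * x ≈ euler_point(B_toy, 1000, 10, x)  # propagator applied to x matches the per-point solve
```

Under this reading, evaluating the map on a grid reduces to one matrix-vector product per point instead of one ODE solve per point.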

In [137]:
gen_helper = randn(2,2);

In [148]:
check_1 = [1 0 ; 0 1]
check_2 = gen_helper'*gen_helper
LT_derivative(check_1, check_2, 100000, 100)[1]


2×2 Matrix{Float64}:
  1.14269   -0.308269
 -0.308269   1.71125

In [152]:
norm(LT_derivative(check_1, check_2, 1000000, 10000)[1] - OT_derivative(check_1, check_2)[1]) 

0.93721003963539

It looks like we cannot choose $n<T$, because otherwise $DS_\infty$ blows up. I think this is caused by Forward Euler: in this case $h > 1$, so the pointwise multiplication gives $||h\odot DS_t||_{\text{operator}} \to \infty$. However, the code does not scale well with $n$ (it is unclear why, since the loop should be $O(n)$; perhaps computing $B_t$ at every step is the bottleneck), so it is hard to compute an accurate value of $DS_\infty$ to compare with $DT$. At low values (such as the one above) it looks like $DS_t \to DT$.
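
A toy scalar problem may help gauge how strongly the step size affects Forward Euler here. The sketch below uses $\frac{dy}{dt} = b\,e^{-2t}y$ with $y(0)=1$, whose exact solution is $y(T) = \exp\left(b(1-e^{-2T})/2\right)$. This scalar equation, the value of `b`, and the function `euler_error` are assumptions introduced for illustration; they are not the matrix problem above and do not by themselves explain its behaviour.

```julia
# Forward Euler for the toy problem dy/dt = b e^{-2t} y, y(0) = 1, with n time points on [0, T].
# Returns the absolute error at t = T against the closed-form solution.
function euler_error(b, T, n)
    exact = exp(b * (1 - exp(-2T)) / 2)
    t = range(0, T, length=n)
    h = step(t)
    y = 1.0
    for k in 1:n-1
        y = (1 + h * b * exp(-2 * t[k])) * y
    end
    abs(y - exact)
end

b = 1.0   # assumed growth rate at t = 0
T = 10.0

err_coarse = euler_error(b, T, 6)        # h = 2, larger than 1
err_medium = euler_error(b, T, 101)      # h = 0.1
err_fine = euler_error(b, T, 10_001)     # h = 0.001
```

In this toy case the error shrinks as $h$ decreases, consistent with Forward Euler being a first-order method, and the coarse run with $h>1$ is dominated by its very first step.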

## TODO


Fix this

## Langevin map interpolation

We reuse the square grid created in the preliminaries, so that we know at which points the Langevin map is evaluated, and broadcast the function over it to obtain the values of the Langevin map at every grid point:

In [103]:
LT_map_custom = x -> LT_map(C₁, C₂, 1000, 10, x)
A = LT_map_custom.(grid);
A_interpolated = x -> interpolate(sqr_grid, A, x)

#12 (generic function with 1 method)

In [104]:
interpolants(sqr_grid, [0,1])

([263], [1.0])

In [105]:
interpolate(sqr_grid, A, [0,1])

LoadError: DimensionMismatch("x and y are of different lengths!")

In [106]:
size(sqr_grid)
size(grid)

(441,)

In [107]:
A_interpolated([0, 0])

LoadError: DimensionMismatch("x and y are of different lengths!")
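
One possible way around this `DimensionMismatch`, offered as a sketch rather than a verified fix for the notebook's data: the package's documented examples pass one scalar per grid point as data, whereas `A` above holds one 2-vector per grid point. Interpolating each output component separately avoids that mismatch. The grid `demo_grid`, the toy map `f` and the helper `f_interp` are hypothetical names introduced here.

```julia
using GridInterpolations

# Small square grid, built the same way as SquareGrid above.
demo_grid = RectangleGrid(-1:0.5:1, -1:0.5:1)
points = [p for p in demo_grid]

# Toy vector-valued map standing in for the Langevin map (an assumption for illustration).
f(p) = [p[1] + 2p[2], 3p[1] - p[2]]
vals = f.(points)

# Split the vector of 2-vectors into one scalar data array per output component.
comp1 = [v[1] for v in vals]
comp2 = [v[2] for v in vals]

# Interpolate each component separately and reassemble the 2-vector.
f_interp(x) = [interpolate(demo_grid, comp1, x), interpolate(demo_grid, comp2, x)]

y = f_interp([0.25, 0.75])
```

Because the toy map is linear, multilinear interpolation should reproduce it up to rounding, which makes it a convenient check before switching to the actual Langevin map values.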

## TODO

Grid interpolation

# References
[1] - https://djalil.chafai.net/blog/2010/04/30/wasserstein-distance-between-two-gaussians/

[2] - Anastasiya Tanana (2020). Comparison of transport map generated by heat flow interpolation and the optimal transport Brenier map. Communications in Contemporary Mathematics, 23(06), 2050025.

[3] - https://github.com/Imperial-MATH50003/MATH50003NumericalAnalysis/blob/main/notes/MATH50003_numerical_analysis_lecture_notes.pdf