# Sampling Energy Models

**Author**: Chris Oswald

**Course**: CS676/ECE689 Advanced Topics in Deep Learning (Spring 2024)

## Question 2: Sampling from energy models with Langevin dynamics and Stein scores

Energy-based models learn an energy function $E_{\theta}:\mathcal{X}\rightarrow \mathbb{R}$, which defines the Gibbs distribution

$p_{\theta}(x) = \frac{1}{Z_{\theta}}e^{-E_{\theta}(x)}$, where $Z_{\theta} = \int_{\mathcal{X}}e^{-E_{\theta}(y)}dy$.

Directly sampling from $p_{\theta}$ is hard, but we can approximate samples using a Markov chain whose stationary distribution is $p_{\theta}$. Specifically, we use the discretized Langevin dynamics:

<!-- '''$\frac{d x_t}{dt} = \nabla_x\log p_{\theta}(x_t)dt+\sqrt{2}dW_t,$ -->

<!-- where $dW_t$ is a white noise process, given by the Brownian motion $W_t$. -->

<!-- (Diffusion following these dynamics converges asymptotically to samples $x_t\sim p_{\theta}$, in the sense that $D(x_t\|p_{\theta})→0$ as $t→∞$.) -->

<!-- Discretizing the Langevin dynamics, we have -->

$x_{t+1} = x_t-\eta\nabla_x\log p_{\theta}(x_t)+\sqrt{2\eta}\epsilon_t,$

where $\epsilon_t\sim\mathcal{N}(0,I)$ and $\eta$ is the step size.
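As a minimal sketch of the update rule above (not part of the original problem statement), one step can be written with NumPy as follows. The names `langevin_step`, `score_fn`, and `rng` are illustrative assumptions, and the sign in front of the score term follows the update exactly as written in this question.

```python
import numpy as np

def langevin_step(x, score_fn, eta, rng):
    """One discretized step, using the sign convention stated in this question.

    x        : current state, shape (d,)
    score_fn : hypothetical callable returning grad_x log p_theta(x), shape (d,)
    eta      : step size
    rng      : numpy.random.Generator used to draw the Gaussian noise
    """
    noise = rng.standard_normal(x.shape)  # epsilon_t ~ N(0, I)
    return x - eta * score_fn(x) + np.sqrt(2.0 * eta) * noise

# Usage: a score that is identically zero leaves only the noise term.
rng = np.random.default_rng(0)
x0 = rng.standard_normal(2)
x1 = langevin_step(x0, lambda x: np.zeros_like(x), eta=1e-2, rng=rng)
```

Passing the score as a callable keeps the step independent of any particular energy function, so the same helper could be reused with a different $E_{\theta}$.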

We consider a 2D case, where $x\in\mathbb{R}^2$. Say $E_{\theta}(x) = \theta\cdot x$, where $\theta\in\mathbb{R}^{2}$ is the parameter vector.

Calculate the expression for the distribution of $x_N$, where $x_0\sim \mathcal{N}(0,I)$ and $N$ is the number of steps, in terms of $\eta, \theta, N$.

(You can implement and see if your computational results match your analytical results. A helpful website: https://courses.cs.washington.edu/courses/cse599i/20au/resources/L16_ebm.pdf)

## My Solution

Consider the first step $\mathbf{x_1}$:

<br>

$$
\begin{align}
    \mathbf{x_1}
    &=
    \mathbf{x_0} - \eta \nabla_x \log(\frac{1}{Z_\theta} e^{-(\mathbf{\theta} \cdot \mathbf{x_0})} ) + \sqrt{2\eta} \epsilon_{t_0} \\
    &=
    \mathbf{x_0} - \eta \nabla_x
    \left[\log(e^{-(\mathbf{\theta} \cdot \mathbf{x_0})} )
    - \log(Z_\theta)\right]
    + \sqrt{2\eta} \epsilon_{t_0} \\
    &=
    \mathbf{x_0} - \eta \nabla_x
    \left[-(\mathbf{\theta} \cdot \mathbf{x_0})
    - \log(Z_\theta)\right]
    + \sqrt{2\eta} \epsilon_{t_0} \\
    &=
    \mathbf{x_0} + \eta \nabla_x
    \left[(\mathbf{\theta} \cdot \mathbf{x_0})
    \right]
    + \sqrt{2\eta} \epsilon_{t_0} \\
    &=
    \mathbf{x_0} + \eta \mathbf{\theta}
    + \sqrt{2\eta} \epsilon_{t_0} \\
\end{align}
$$
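The gradient step in this derivation can be sanity-checked numerically. The sketch below compares a central finite-difference gradient of $\log p_{\theta}(x)$ (up to the constant $-\log Z_\theta$, whose gradient is zero) with $-\theta$. The helper names and the values of `theta` and `x0` are illustrative assumptions.

```python
import numpy as np

def log_p_unnormalized(x, theta):
    # log p_theta(x) + log Z_theta = -E_theta(x) = -(theta . x)
    return -np.dot(theta, x)

def finite_diff_grad(f, x, h=1e-6):
    # Central-difference approximation of the gradient of f at x.
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * h)
    return grad

theta = np.array([0.8, 0.5])   # arbitrary illustrative parameters
x0 = np.array([0.3, -1.2])     # arbitrary evaluation point
score = finite_diff_grad(lambda x: log_p_unnormalized(x, theta), x0)

# The score equals -theta at every point, so x0 - eta * score = x0 + eta * theta.
print(np.allclose(score, -theta, atol=1e-6))
```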





The same update applies at every step, so unrolling the recursion for any $N \geq 1$ gives:

$$
\begin{align}
    \mathbf{x_N}
    &=
    \mathbf{x_{(N-1)}} + \eta \mathbf{\theta}
    + \sqrt{2\eta} \epsilon_{t_{(N-1)}} \\
    &=
    \left[\mathbf{x_{(N-2)}} + \eta \mathbf{\theta}
    + \sqrt{2\eta} \epsilon_{t_{(N-2)}} \right]
    + \eta \mathbf{\theta}
    + \sqrt{2\eta} \epsilon_{t_{(N-1)}}\\
    &
    \vdots \\
    &=
    \mathbf{x_{0}} + N \eta \mathbf{\theta}
    + \sqrt{2\eta} \sum_{i=0}^{N-1}\epsilon_{t_{i}} \\
\end{align}
$$

where

$$
\mathbf{\epsilon_{t_{i}}} \thicksim \mathcal{N}\big(0, I), \quad i=0, \dots, N-1
$$

and

$$
\mathbf{x_0} \thicksim \mathcal{N}\big(0, I)
$$
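The unrolled expression can be checked against the step-by-step recursion by feeding both the same noise draws. This is a small sketch under assumed values of `eta`, `N`, and `theta`; it checks the algebra of the unrolling only, not the resulting distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
eta, N = 1e-2, 50                      # illustrative step size and step count
theta = np.array([0.8, 0.5])           # illustrative parameters
x0 = rng.standard_normal(2)            # x_0 ~ N(0, I)
eps = rng.standard_normal((N, 2))      # epsilon_{t_0}, ..., epsilon_{t_{N-1}}

# Iterate x_{k+1} = x_k + eta * theta + sqrt(2 * eta) * epsilon_{t_k}.
x = x0.copy()
for k in range(N):
    x = x + eta * theta + np.sqrt(2.0 * eta) * eps[k]

# Closed form: x_N = x_0 + N * eta * theta + sqrt(2 * eta) * sum_i epsilon_{t_i}.
x_closed = x0 + N * eta * theta + np.sqrt(2.0 * eta) * eps.sum(axis=0)

print(np.allclose(x, x_closed))
```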

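Since $\mathbf{x_0}$ and the $\epsilon_{t_i}$ are independent Gaussian random vectors, their weighted sum is also Gaussian. The noise sum satisfies $\sum_{i=0}^{N-1}\epsilon_{t_i} \thicksim \mathcal{N}(0, NI)$, so the term $\sqrt{2\eta}\sum_{i=0}^{N-1}\epsilon_{t_i}$ has covariance $2\eta N I$. The initial point $\mathbf{x_0}$ is not scaled by $\sqrt{2\eta}$, so it contributes covariance $I$. Therefore

$$
\begin{align}
    \mathbf{x_N}
    &=
    N \eta \mathbf{\theta}
    + \mathcal{E} \\
\end{align}
$$

where

$$
\mathbf{\mathcal{E}} \thicksim \mathcal{N}\big(0, (1 + 2\eta N)I\big)
$$

That is, $\mathbf{x_N} \thicksim \mathcal{N}\big(N\eta\mathbf{\theta}, (1 + 2\eta N)I\big)$.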

### Computational Check

In [None]:
import numpy as np

In [None]:
# Define functions
def gen_gaus_noise(ndim, scale=1):
    return np.random.multivariate_normal([0]*ndim, scale * np.eye(ndim))

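def calc_analytic_solution(N, theta, ndim, lr):
    # Draw one sample from N(N*lr*theta, (1 + 2*lr*N) I):
    # x_0 contributes covariance I; the N noise terms contribute 2*lr*N*I.
    return N * lr * theta + gen_gaus_noise(ndim, scale=(1 + 2*lr*N))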

def gen_next_sample(x0, theta, ndim, lr):
    return x0 - lr * (-theta) + np.sqrt(2*lr) * gen_gaus_noise(ndim)

In [None]:
# Define parameters
np.random.seed(999)
n_samples = 100000  # number of Langevin steps N taken by the single chain
ndim = 2
lr = 1e-2
theta = np.array([np.random.rand(), np.random.rand()])
print(f'Theta: {theta}')

# Calculate analytical results
analytic_solution = calc_analytic_solution(n_samples, theta, ndim, lr)

# Calculate Markov Chain results
samples = np.zeros((n_samples, ndim))
current_sample = gen_gaus_noise(ndim)
for i in range(n_samples):
    next_sample = gen_next_sample(current_sample, theta, ndim, lr)
    samples[i, :] = next_sample
    current_sample = next_sample

print(f'Analytical solution: {analytic_solution}')
print(f'MC solution: {samples[-1, :]}')

Theta: [0.80342804 0.5275223 ]
Analytical solution: [817.50706469 489.13108999]
MC solution: [808.83338304 495.30474785]
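
Comparing one draw from each side only shows that both land in the same region. As a sketch of a distribution-level check (an addition, not part of the original notebook), the code below runs many independent chains in parallel and compares their empirical mean and covariance at step $N$ with the values implied by the unrolled recursion: mean $N\eta\theta$ and covariance $(1+2\eta N)I$, where the $1$ comes from $x_0$ and the $2\eta N$ from the $N$ noise terms. The chain count, step count, and `theta` values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_chains, N, lr = 20000, 200, 1e-2      # illustrative sizes
theta = np.array([0.8, 0.5])            # illustrative parameters

# Each row is one independent chain started at x_0 ~ N(0, I).
x = rng.standard_normal((n_chains, 2))
for _ in range(N):
    x = x + lr * theta + np.sqrt(2.0 * lr) * rng.standard_normal((n_chains, 2))

emp_mean = x.mean(axis=0)
emp_cov = np.cov(x, rowvar=False)       # rows are observations

print("empirical mean:", emp_mean)
print("expected mean: ", N * lr * theta)
print("empirical covariance:\n", emp_cov)
print("expected covariance:\n", (1 + 2 * lr * N) * np.eye(2))
```

With these settings the expected mean is $2\theta$ and the expected covariance is $5I$; the empirical values should agree up to Monte Carlo error, which shrinks as `n_chains` grows.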


### References:

- https://courses.cs.washington.edu/courses/cse599i/20au/resources/L16_ebm.pdf