In [3]:
import numpy as np

### Image data and discrete and continuous likelihoods

#### Dequantization

Pixels generally take a finite number of brightness values, e.g. $z_i\in\{0,1,\dots,255\}$. Modeling discretized data with a real-valued density $p(\mathbf{x})$ can lead to arbitrarily high density values, because the model can place a narrow, high-density spike on each of the possible discrete values. To avoid this ‘cheating’ solution, one should add noise uniformly distributed between 0 and 1 to the value of each pixel and then divide by 256, so that each pixel takes a value in the range [0, 1] and the image has a smooth distribution over pixel values [RNADE: The real-valued neural autoregressive density-estimator, 2013].
$$
x_i = \frac{z_i + u}{256}, \quad u\sim\mathcal{U}(0,1)
$$
This preprocessing was used in [NICE: Non-Linear Independent Components Estimation, 2015].
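
A minimal sketch of this dequantization step, assuming NumPy and integer-valued pixels. The function name `dequantize`, the seeded generator, and the random stand-in image are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np

def dequantize(z, num_values=256, rng=None):
    """Map integer pixels in {0, ..., num_values - 1} to reals in [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    z = np.asarray(z, dtype=np.float64)
    u = rng.uniform(0.0, 1.0, size=z.shape)  # one independent U(0, 1) draw per pixel
    return (z + u) / num_values

rng = np.random.default_rng(0)
z = rng.integers(0, 256, size=(28, 28))  # hypothetical stand-in for an 8-bit image
x = dequantize(z, rng=rng)
print(x.min(), x.max())
```

Because fresh noise is drawn on every call, the same image maps to a slightly different $\mathbf{x}$ each time it is seen during training.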

A somewhat common alternative preprocessing is to compute the log-likelihood in "logit-space" by transforming
$$
x_i = \text{logit}\left(\lambda+(1-2\lambda)\frac{z_i}{256}\right)
$$
where $\lambda$ is a small positive number, a bit larger than the smallest value of $z_i/256$, which keeps the argument of the logit strictly inside $(0, 1)$. This preprocessing was used in [Masked Autoregressive Flow for Density Estimation, 2018].
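
The logit-space transform and its inverse can be sketched as follows. This is an illustration under stated assumptions: the helper names are hypothetical, and the default `lam=1e-6` is only a placeholder value that should be chosen per dataset.

```python
import numpy as np

def to_logit_space(z, lam=1e-6, num_values=256):
    """x = logit(lam + (1 - 2 * lam) * z / num_values), element-wise."""
    p = lam + (1.0 - 2.0 * lam) * np.asarray(z, dtype=np.float64) / num_values
    return np.log(p) - np.log1p(-p)  # logit(p) = log(p) - log(1 - p)

def from_logit_space(x, lam=1e-6, num_values=256):
    """Inverse of to_logit_space: sigmoid, then undo the affine squeeze."""
    p = 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=np.float64)))
    return (p - lam) / (1.0 - 2.0 * lam) * num_values

z = np.arange(256)             # every possible 8-bit pixel value
x = to_logit_space(z)
z_back = from_logit_space(x)   # round trip recovers z up to float error
print(np.isfinite(x).all(), np.allclose(z_back, z))
```

Without the $\lambda$ offset, a pixel value of 0 would map to $\text{logit}(0) = -\infty$.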

#### Conversion of continuous log-likelihood to discrete log-likelihood

By the change of variables formula for probability density functions, we can compute the density $p_\mathbf{z}(\mathbf{z})$, where $\mathbf{z}=g^{-1}(\mathbf{x})$, from a known $p_\mathbf{x}(\mathbf{x})$.
$$
p_\mathbf{z}(\mathbf{z}) = p_\mathbf{x}(g(\mathbf{z})) \left| \frac{d\mathbf{x}}{d\mathbf{z}} \right|
$$
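In $D$ dimensions, the derivative is the Jacobian matrix $\mathbf{J}$ of $g$, and the factor is the absolute value of its determinant. For an element-wise transform such as the ones above, this Jacobian is diagonal, so its determinant is the product of the diagonal entries.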

For the first transformation
$$
\begin{aligned}
J_{g,ii} &= \frac{d}{dz_{i}} \left(\frac{z_i + u}{256}\right) = \frac{1}{256}\\
\text{det}\;\mathbf{J} &= 256^{-D}
\end{aligned}
$$
such that 
$$
\begin{aligned}
p_\mathbf{z}(\mathbf{z}) &= p_\mathbf{x}(\mathbf{x})\, 256^{-D}\\
\log p_\mathbf{z}(\mathbf{z}) &= \log p_\mathbf{x}(\mathbf{x}) - D\log(256)
\end{aligned}
$$
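
As a sanity check on the $256^{-D}$ factor, the sketch below uses a one-dimensional toy case ($D = 1$). The density $p_x(x) = 2x$ on $[0, 1]$ is a hypothetical choice made only for illustration, and the noise $u$ is treated as folded into a continuous $z \in [0, 256]$. If the Jacobian correction is right, the rescaled density should still integrate to about 1.

```python
import numpy as np

def log_px(x):
    # Hypothetical toy density on [0, 1]: p_x(x) = 2x (integrates to 1)
    return np.log(2.0 * x)

def log_pz(z, num_values=256):
    # Change of variables with x = z / num_values and D = 1:
    # log p_z(z) = log p_x(x) - D * log(num_values)
    return log_px(z / num_values) - np.log(num_values)

# Midpoint-rule integration of p_z over [0, 256]
edges = np.linspace(0.0, 256.0, 100_001)
mid = 0.5 * (edges[:-1] + edges[1:])
width = edges[1] - edges[0]
mass = np.sum(np.exp(log_pz(mid)) * width)
print(mass)  # close to 1
```

Dropping the `- np.log(num_values)` term would give a total mass of about 256 instead, which is why the correction has to be included when comparing likelihoods across preprocessing choices.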

#### Bits per dimension

$$
\text{nats}/\text{dim} = -\left( \dfrac{\log_e p(x)}{hwc} - \log_e q \right), \qquad \text{bits}/\text{dim} = \dfrac{\text{nats}/\text{dim}}{\log_e 2}
$$

where $\log_e p(x)$ is the data log-likelihood in nats, $h$, $w$ and $c$ are the height, width and number of channels of the data (colour image), and $q$ is the number of pixel values allowed in the original quantized data before each quantized pixel $p_q$ was transformed by
$$
p_c = \frac{p_q + u}{q}, \quad u \sim \mathcal{U}(0,1)
$$
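
The formula can be wrapped in a small helper. This is a sketch; the function name `bits_per_dim` and its signature are assumptions made for illustration. It applies the per-dimension correction $\log_e q$ and then divides by $\log_e 2$ to convert nats to bits.

```python
import numpy as np

def bits_per_dim(log_px_nats, shape, num_values=256):
    """Convert a log-likelihood in nats on [0, 1]-scaled data to bits per dimension."""
    d = int(np.prod(shape))  # D = h * w * c
    nats_per_dim = -(log_px_nats / d - np.log(num_values))
    return nats_per_dim / np.log(2.0)  # nats -> bits

bpd = bits_per_dim(-84.0, (28, 28, 1))
print(bpd)
```

With a log-likelihood of $-84$ nats on a $28\times28\times1$ image and $q = 256$, this gives roughly 8.15 bits per dimension, the same figure the step-by-step MNIST cells print.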



### MNIST

#### Binarized

In [11]:
D = 28**2  # number of dimensions (pixels) per image
V = 256    # number of quantization levels per pixel

In [12]:
log_e_px = -84

In [13]:
# change of variables: log p_z = log p_x - D * log(V)
log_e_pz = log_e_px - D * np.log(V)
print(log_e_pz)

-4431.419116471977


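In [15]:
# log_2_pz is not defined in the cells shown above; convert nats to bits first
log_2_pz = log_e_pz / np.log(2)
bpd = - log_2_pz / D
print(bpd)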

8.154574468666675


#### Continuous

In [None]:
D = 784
V = 256

In [16]:
log_e_px = 3400

In [None]:
log_e_pz = log_e_px - D * np.log(V)
print(log_e_pz)

In [None]:
log_2_pz = log_e_pz / np.log(2)
print(log_2_pz)

In [None]:
bpd = - log_2_pz / D
print(bpd)