# Intro
This notebook is practice for normalizing flows. All of the material comes from https://gebob19.github.io/normalizing-flows/

### Part 1: R-NVP Flows

We consider a single R-NVP function $f: \mathbb{R}^{d} \rightarrow \mathbb{R}^{d}$, with input $\mathbf{x} \in \mathbb{R}^{d}$ and output $\mathbf{z} \in \mathbb{R}^{d}$.

To quickly recap, in order to optimize our function $f$ to model our data distribution $p_{X}$, we need the forward pass $f$ and the absolute value of the Jacobian determinant $\left|\operatorname{det}\left(\frac{d f}{d x}\right)\right|$.

We also want the inverse $f^{-1}$, so that we can transform a sampled latent value $z \sim p_{Z}$ into a sample from our data distribution $p_{X}$, generating new samples!

### Key Equations
$$
\log \left(p_{X}(x)\right)=\log \left(\left|\operatorname{det}\left(\frac{d f}{d x}\right)\right|\right)+\log \left(p_{Z}(f(x))\right)
$$
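Writing $f=f_{n} \circ \cdots \circ f_{1}$ as a composition of $n$ flows with $z_{i}=f_{i}\left(z_{i-1}\right)$ and $z_{n}=f(x)$, the handy dandy chain rule gives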
$$
\log \left(p_{X}(x)\right)=\log \left(\prod_{i=1}^{n}\left|\operatorname{det}\left(\frac{d z_{i}}{d z_{i-1}}\right)\right|\right)+\log \left(p_{Z}(f(x))\right)
$$
where $x \triangleq z_{0}$ for conciseness. Since the log of a product is the sum of the logs, this becomes
$$
\log \left(p_{X}(x)\right)=\sum_{i=1}^{n} \log \left(\left|\operatorname{det}\left(\frac{d z_{i}}{d z_{i-1}}\right)\right|\right)+\log \left(p_{Z}(f(x))\right)
$$
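
As a sanity check on the change-of-variables formula above, here is a minimal sketch using a toy elementwise affine flow $f(x)=a \odot x+b$. The flow and the values of `a`, `b`, and `x` are assumptions chosen for illustration; they are not part of the original tutorial. For this flow the density of $x$ is known in closed form, so the two log-densities can be compared directly.

```python
import torch
from torch.distributions import Normal

# Hypothetical toy flow: elementwise affine map f(x) = a * x + b
a = torch.tensor([2.0, -0.5])
b = torch.tensor([1.0, 3.0])

def f(x):
    return a * x + b

# Latent distribution: independent standard normals in each dimension
p_Z = Normal(torch.zeros(2), torch.ones(2))

x = torch.tensor([[0.3, -1.2], [1.5, 0.7]])

# The Jacobian of f is diag(a), so log|det| = sum_i log|a_i|
log_det = torch.log(torch.abs(a)).sum()
log_px_flow = log_det + p_Z.log_prob(f(x)).sum(-1)

# Closed form: x = (z - b) / a with z ~ N(0, 1), so x ~ N(-b / a, 1 / |a|)
p_X = Normal(-b / a, 1.0 / torch.abs(a))
log_px_exact = p_X.log_prob(x).sum(-1)

# Both routes should agree up to floating-point error
print(torch.allclose(log_px_flow, log_px_exact, atol=1e-5))
```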

#### 1.1 Forward Pass
First, split $\mathbf{x}_{1: d}$ into two parts, $\mathbf{x}_{1: k}$ and $\mathbf{x}_{k+1: d}$.

The R-NVP forward pass is then the following
$$
\begin{array}{c}
\mathbf{z}_{1: k}=\mathbf{x}_{1: k} \\
\mathbf{z}_{k+1: d}=\mathbf{x}_{k+1: d} \odot \exp \left(\sigma\left(\mathbf{x}_{1: k}\right)\right)+\mu\left(\mathbf{x}_{1: k}\right)
\end{array}
$$

where $\sigma, \mu: \mathbb{R}^{k} \rightarrow \mathbb{R}^{d-k}$ can be arbitrary functions. Since nothing constrains them, we choose both $\sigma$ and $\mu$ to be deep neural networks.
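
To make the shapes concrete, here is a minimal sketch of $\sigma$ and $\mu$ as small MLPs applied to one batch. The helper `make_net`, the sizes `d` and `k`, and the hidden width are hypothetical choices for illustration only.

```python
import torch
import torch.nn as nn

d, k = 4, 2  # hypothetical sizes: input dimension and split point

def make_net(in_dim, out_dim, hidden=32):
    # Small MLP; the width and depth are arbitrary illustrative choices
    return nn.Sequential(
        nn.Linear(in_dim, hidden),
        nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

sig_net = make_net(k, d - k)  # sigma: R^k -> R^(d-k)
mu_net = make_net(k, d - k)   # mu:    R^k -> R^(d-k)

x = torch.randn(8, d)         # batch of 8 inputs
x1, x2 = x[:, :k], x[:, k:]

z1 = x1                                        # first k dims pass through
z2 = x2 * torch.exp(sig_net(x1)) + mu_net(x1)  # affine transform of the rest
z = torch.cat([z1, z2], dim=-1)

print(z.shape)
```

Note that only `x2` is transformed, and the scale and shift depend on `x1` alone. That dependency structure is what keeps the Jacobian triangular and the map easy to invert.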

In [1]:
import torch

In [2]:
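def forward(self, x):
    # split the input into the pass-through part and the part to transform
    x1, x2 = x[:, :self.k], x[:, self.k:]

    # scale and shift are computed from the pass-through part only
    sig = self.sig_net(x1)
    mu = self.mu_net(x1)

    z1 = x1
    z2 = x2 * torch.exp(sig) + mu
    z = torch.cat([z1, z2], dim=-1)

    # log(p_Z(f(x)))
    log_pz = self.p_Z.log_prob(z)

    #...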

#### 1.2 Log Jacobian
The Jacobian matrix $\frac{d f}{d \mathbf{x}}$ of this function will be
$$
\left[\begin{array}{cc}
I_{k} & 0 \\
\frac{d z_{k+1: d}}{d \mathbf{x}_{1: k}} & \operatorname{diag}\left(\exp \left[\sigma\left(\mathbf{x}_{1: k}\right)\right]\right)
\end{array}\right]
$$
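Because this matrix is lower triangular, its determinant is the product of its diagonal entries, so the log determinant of such a Jacobian matrix will be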
$$
\begin{array}{c}
\log \left(\left|\operatorname{det}\left(\frac{d f}{d \mathbf{x}}\right)\right|\right)=\log \left(\prod_{i=1}^{d-k}\left|\exp \left[\sigma_{i}\left(\mathbf{x}_{1: k}\right)\right]\right|\right) \\
\log \left(\left|\operatorname{det}\left(\frac{d f}{d \mathbf{x}}\right)\right|\right)=\sum_{i=1}^{d-k} \log \left(\exp \left[\sigma_{i}\left(\mathbf{x}_{1: k}\right)\right]\right) \\
\log \left(\left|\operatorname{det}\left(\frac{d f}{d \mathbf{x}}\right)\right|\right)=\sum_{i=1}^{d-k} \sigma_{i}\left(\mathbf{x}_{1: k}\right)
\end{array}
$$
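
The closed form above can be checked numerically against a full Jacobian computed by autograd. This is a sketch under stated assumptions: `sigma` and `mu` below are hypothetical stand-ins (a `tanh` of a random linear map and a random linear map), not the networks used in the tutorial.

```python
import torch
from torch.autograd.functional import jacobian

torch.manual_seed(0)
d, k = 4, 2

# Hypothetical stand-ins for the sigma and mu networks
W_sig = torch.randn(k, d - k)
W_mu = torch.randn(k, d - k)

def sigma(x1):
    return torch.tanh(x1 @ W_sig)

def mu(x1):
    return x1 @ W_mu

def f(x):
    # Coupling layer applied to a single unbatched vector of shape (d,)
    x1, x2 = x[:k], x[k:]
    z2 = x2 * torch.exp(sigma(x1)) + mu(x1)
    return torch.cat([x1, z2])

x = torch.randn(d)

J = jacobian(f, x)                                  # full d x d Jacobian
log_det_autograd = torch.linalg.slogdet(J).logabsdet  # log|det J|
log_det_formula = sigma(x[:k]).sum()                  # sum of sigma_i

# The upper-left block is the identity and the upper-right block is zero
print(J[:k, :k], J[:k, k:])
print(torch.allclose(log_det_autograd, log_det_formula, atol=1e-5))
```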

In [3]:
# single R-NVP calculation
def forward(self, x):
  #...
  log_jacob = sig.sum(-1)
  #...
  
  return z, log_pz, log_jacob

# multiple sequential R-NVP calculation
def forward(self, x):
  log_jacobs = []
  z = x
  
  for rvnp in self.rvnps:
      z, log_pz, log_j = rvnp(z)
      log_jacobs.append(log_j)

  return z, log_pz, sum(log_jacobs)
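
The summation over layers can also be verified numerically. The sketch below stacks three toy coupling layers and compares the summed per-layer log-determinants with the log-determinant of the composed map obtained from autograd. `make_layer` and its random weights are hypothetical, and each layer here returns only `(z, log_j)` rather than the three values used above.

```python
import torch
from torch.autograd.functional import jacobian

torch.manual_seed(0)
d, k = 4, 2

def make_layer():
    # Hypothetical toy coupling layer with fixed random weights
    W_sig, W_mu = torch.randn(k, d - k), torch.randn(k, d - k)

    def layer(x):
        x1, x2 = x[:k], x[k:]
        sig = torch.tanh(x1 @ W_sig)
        z = torch.cat([x1, x2 * torch.exp(sig) + x1 @ W_mu])
        return z, sig.sum()  # output and this layer's log|det J|

    return layer

layers = [make_layer() for _ in range(3)]

def flow(x):
    log_jacobs = []
    z = x
    for layer in layers:
        z, log_j = layer(z)
        log_jacobs.append(log_j)
    return z, sum(log_jacobs)

x = torch.randn(d)
z, log_det_sum = flow(x)

# Jacobian of the whole composition, computed directly by autograd
J = jacobian(lambda v: flow(v)[0], x)
log_det_full = torch.linalg.slogdet(J).logabsdet

print(torch.allclose(log_det_sum, log_det_full, atol=1e-5))
```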


#### 1.3 Inverse
$$
f^{-1}(\mathbf{z})=\mathbf{x}
$$
One of the benefits of R-NVPs compared to other flows is the ease of inverting $f$ into $f^{-1}$, which we formulate below:
$$
\begin{array}{c}
\mathbf{x}_{1: k}=\mathbf{z}_{1: k} \\
\mathbf{x}_{k+1: d}=\left(\mathbf{z}_{k+1: d}-\mu\left(\mathbf{x}_{1: k}\right)\right) \odot \exp \left(-\sigma\left(\mathbf{x}_{1: k}\right)\right) \\
\Leftrightarrow \mathbf{x}_{k+1: d}=\left(\mathbf{z}_{k+1: d}-\mu\left(\mathbf{z}_{1: k}\right)\right) \odot \exp \left(-\sigma\left(\mathbf{z}_{1: k}\right)\right)
\end{array}
$$

In [4]:
def inverse(self, z):
  z1, z2 = z[:, :self.k], z[:, self.k:] 
  
  sig = self.sig_net(z1)
  mu = self.mu_net(z1)
  
  x1 = z1
  x2 = (z2 - mu) * torch.exp(-sig)
  
  x = torch.cat([x1, x2], dim=-1)
  return x
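
A quick way to gain confidence in the inverse is a round-trip test: push a batch through the forward pass, invert it, and compare with the input. The class below is a minimal self-contained sketch. The name `RNVP`, the layer sizes, and the simplified `forward` that returns only `(z, log_jacob)` are assumptions made for this example.

```python
import torch
import torch.nn as nn

class RNVP(nn.Module):
    """Single coupling layer; layer sizes are illustrative assumptions."""

    def __init__(self, d, k, hidden=16):
        super().__init__()
        self.k = k
        self.sig_net = nn.Sequential(
            nn.Linear(k, hidden), nn.Tanh(), nn.Linear(hidden, d - k))
        self.mu_net = nn.Sequential(
            nn.Linear(k, hidden), nn.Tanh(), nn.Linear(hidden, d - k))

    def forward(self, x):
        x1, x2 = x[:, :self.k], x[:, self.k:]
        sig, mu = self.sig_net(x1), self.mu_net(x1)
        z = torch.cat([x1, x2 * torch.exp(sig) + mu], dim=-1)
        return z, sig.sum(-1)  # output and log|det J| per sample

    def inverse(self, z):
        z1, z2 = z[:, :self.k], z[:, self.k:]
        sig, mu = self.sig_net(z1), self.mu_net(z1)
        return torch.cat([z1, (z2 - mu) * torch.exp(-sig)], dim=-1)

torch.manual_seed(0)
layer = RNVP(d=4, k=2)
x = torch.randn(5, 4)

with torch.no_grad():
    z, log_j = layer(x)
    x_rec = layer.inverse(z)

# The reconstruction should match the input up to floating-point error
print(torch.allclose(x_rec, x, atol=1e-5))
```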

#### 1.4 Optimization

In [6]:
for _ in range(epochs):
  optim.zero_grad()
  
  # forward pass
  X = get_batch(data)
  z, log_pz, log_jacob = model(X)
  
  # maximize log p_X(x) == minimize -log p_X(x)
  loss = -(log_jacob + log_pz).mean()
  losses.append(loss.item())  # store a float, not the graph-holding tensor

  # backpropagate loss
  loss.backward()
  optim.step()
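
The loop above relies on `model`, `optim`, `epochs`, `get_batch`, `data`, and `losses` being defined beforehand. Below is one possible self-contained setup, offered as a sketch. The Adam optimizer, the learning rate, the toy Gaussian data, the batch size, and the body of `get_batch` are all assumptions, not taken from the original tutorial.

```python
import torch
import torch.nn as nn
from torch.distributions import MultivariateNormal

torch.manual_seed(0)

class RNVP(nn.Module):
    """Single coupling layer returning (z, log_pz, log_jacob)."""

    def __init__(self, d=2, k=1, hidden=32):
        super().__init__()
        self.k = k
        self.sig_net = nn.Sequential(
            nn.Linear(k, hidden), nn.Tanh(), nn.Linear(hidden, d - k))
        self.mu_net = nn.Sequential(
            nn.Linear(k, hidden), nn.Tanh(), nn.Linear(hidden, d - k))
        self.p_Z = MultivariateNormal(torch.zeros(d), torch.eye(d))

    def forward(self, x):
        x1, x2 = x[:, :self.k], x[:, self.k:]
        sig, mu = self.sig_net(x1), self.mu_net(x1)
        z = torch.cat([x1, x2 * torch.exp(sig) + mu], dim=-1)
        return z, self.p_Z.log_prob(z), sig.sum(-1)

# Hypothetical toy dataset: an axis-aligned, shifted 2-D Gaussian
data = torch.randn(1000, 2) * torch.tensor([1.0, 0.3]) + torch.tensor([0.0, 2.0])

def get_batch(data, batch_size=128):
    # Sample a random mini-batch with replacement (illustrative helper)
    idx = torch.randint(0, data.shape[0], (batch_size,))
    return data[idx]

model = RNVP()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed optimizer
epochs = 50
losses = []

for _ in range(epochs):
    optim.zero_grad()
    X = get_batch(data)
    z, log_pz, log_jacob = model(X)
    loss = -(log_jacob + log_pz).mean()  # negative log-likelihood
    losses.append(loss.item())
    loss.backward()
    optim.step()

print(losses[0], losses[-1])
```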


#### 1.5 Generate data from model

In [None]:
from torch.distributions import MultivariateNormal

# p_Z - gaussian
mu, cov = torch.zeros(2), torch.eye(2)
p_Z = MultivariateNormal(mu, cov)

# sample 3000 points (z ~ p_Z)
z = p_Z.rsample(sample_shape=(3000,))

# apply the inverse: x = f^-1(z)
x_gen = model.inverse(z)