<center>
    <h1>Adversarial Losses of GANs</h1>
</center>

|    Anish Shah    |    Deshana Desai    |Benjamin Ahlbrand|
|:----------------:|:-------------------:|:---------------:|
|shah.anish@nyu.edu|deshana.desai@nyu.edu| ba1404@nyu.edu  |

<center> 
    <h4>Abstract</h4>
We study a large class of loss functions used for training Generative Adversarial Networks (GANs), called “adversarial losses”. Besides the vanilla GAN, we review the loss functions in $f$-GAN, Maximum Mean Discrepancy (MMD) GAN, Wasserstein GAN and Energy-based GAN. We discuss the statistical properties of these distance measures that affect their behaviour and how they are employed in GANs. Further, we perform experiments and create simple visualizations to show how these distance measures affect the network's ability to cover all modes (or to generate better samples by covering fewer modes), whether they lead to vanishing gradients or produce disentangled latent spaces, and how the variance of the cost values depends on the discriminator outputs. We also review the effectiveness of the distance measures in producing samples using metrics such as visual quality, smoothness of interpolations and inception score on the LSUN dataset. We perform some of these experiments on smaller synthetic datasets due to hardware and computational time constraints. A natural extension of our study of the distance between the generator's distribution and the training data distribution, and separately between the discriminator's distribution and the training data distribution, is provided by optimal transport (OT) theory. Recently, GANs have been used in conjunction with techniques from OT by framing the problem as minimizing the cost of transporting one data distribution to another. We review and include some of these techniques in our discussion of distance measures.
</center>

## 1. Introduction

## 2. List of Notations

<table>
    <tr>
        <td>$D$</td>
        <td>The discriminator</td>
    </tr>
    <tr>
        <td>$\omega$</td>
        <td>parameter for our discriminator</td>
    </tr>
    <tr>
        <td>$f$</td>
        <td>A convex, lower-semicontinuous function satisfying $f(1) = 0$</td>
    </tr>
    <tr>
        <td>$f^{*}$</td>
        <td>Fenchel conjugate of $f$</td>
    </tr>
    <tr>
        <td>$V$</td>
        <td>$\mathcal{X} \mapsto \mathbb{R}$, output of discriminator without the activation function</td>
    </tr>
    <tr>
        <td>$g_f$</td>
        <td>$\mathbb{R} \mapsto \text{dom}_{f^{*}}$, output activation function which respects the domain $\text{dom}_{f^{*}}$</td>
    </tr>
    <tr>
        <td>$G$</td>
        <td>The generator</td>
    </tr>
    <tr>
        <td>$P$</td>
        <td>True or target distribution</td>
    </tr>
    <tr>
        <td>$Q$</td>
        <td>Model or generated distribution</td>
    </tr>
    <tr>
        <td>$p(x)$</td>
        <td>probability density function of $P$</td>
    </tr>
    <tr>
        <td>$q(x)$</td>
        <td>probability density function of $Q$</td>
    </tr>
    <tr>
        <td>$\theta$</td>
        <td>parameter for model distribution or generator</td>
    </tr>
    <tr>
        <td>$z$</td>
        <td>The latent vector</td>
    </tr>
    <tr>
        <td>$\mathcal{Z}$</td>
        <td>The latent space</td>
    </tr>
    <tr>
        <td>$\mathcal{X}$</td>
        <td>The sample space</td>
    </tr>
    <tr>
        <td>$L$</td>
        <td>loss function of GAN </td>
    </tr>
</table>

## 3. Statistical Divergence Measures

A divergence measure is a function that quantifies the dissimilarity between two probability distributions. Unlike a metric, a divergence need not be symmetric (that is, in general the divergence from $p$ to $q$ is not equal to the divergence from $q$ to $p$), and need not satisfy the triangle inequality \cite{wiki:xxx}.
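As a small illustration of the asymmetry, the sketch below evaluates the Kullback-Leibler divergence in both directions for two discrete distributions. The choice of KL as the example divergence, the helper name `kl_divergence`, and the probability vectors are assumptions made here for illustration only.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Two illustrative (made-up) distributions over three outcomes.
p = [0.5, 0.4, 0.1]
q = [0.2, 0.3, 0.5]

kl_pq = kl_divergence(p, q)  # divergence from p to q
kl_qp = kl_divergence(q, p)  # divergence from q to p

print("KL(p || q) =", kl_pq)
print("KL(q || p) =", kl_qp)
```

The two printed values differ, while `kl_divergence(p, p)` is zero, which is the behaviour expected of a divergence that is not a metric.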

### f-divergence

In statistics and probability theory, an $f$-divergence is a function $D_{f}\left( P \parallel Q \right)$ that measures the difference between two probability distributions $P$ and $Q$ \cite{csiszar2004information, liese2006divergences}. If $P$ and $Q$ are absolutely continuous with respect to a reference measure $dx$ on $\mathcal{X}$, with probability density functions $p$ and $q$ respectively, then we define the $f$-divergence as

\begin{align} \label{eq:fdiv}
    D_f(P \parallel Q) = \int_{\mathcal{X}} q(x) f \left( \frac{p(x)}{q(x)} \right) dx
\end{align}
    

where the *generator function* $f: \mathbb{R}_{+} \mapsto \mathbb{R}$ is a convex, lower-semicontinuous function satisfying $f(1) = 0$. Every convex, lower-semicontinuous function $f$ has a *convex conjugate* function $f^{*}$, also known as the *Fenchel conjugate* \cite{hiriart2012fundamentals}, defined as $f^{*}(t) = \sup\limits_{u \in \text{dom}_{f}} \{ut -  f(u)\}$.
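To make the definition of the conjugate concrete, the following sketch approximates $f^{*}(t)$ by taking the maximum of $ut - f(u)$ over a finite grid of $u$ values. It assumes the KL generator $f(u) = u \log u$, whose conjugate has the closed form $f^{*}(t) = e^{t-1}$. The function names and the grid are hypothetical choices, and a grid search is only a rough stand-in for the supremum.

```python
import math

def f_kl(u):
    """Generator function of the KL divergence, f(u) = u log u (f(0) taken as 0)."""
    return u * math.log(u) if u > 0 else 0.0

def conjugate_on_grid(f, t, u_grid):
    """Approximate f*(t) = sup_u {u t - f(u)} by a maximum over a finite grid."""
    return max(u * t - f(u) for u in u_grid)

# Grid over (0, 20] with step 1e-3; assumed wide enough to contain the
# maximiser u = exp(t - 1) for the values of t used below.
u_grid = [i * 1e-3 for i in range(1, 20001)]

for t in (-1.0, 0.0, 1.0, 2.0):
    approx = conjugate_on_grid(f_kl, t, u_grid)
    exact = math.exp(t - 1.0)  # closed-form conjugate of u log u
    print(f"t = {t:+.1f}  grid: {approx:.6f}  closed form: {exact:.6f}")
```

The grid values agree with the closed form up to the grid resolution, which is a quick sanity check when deriving $f^{*}$ for a new generator function.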

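Since $f$ is convex and lower-semicontinuous, it equals its own biconjugate, $f^{**} = f$, that is, $f(u) = \sup\limits_{t \in \text{dom}_{f^{*}}} \{tu - f^{*}(t)\}$. Substituting this into (\ref{eq:fdiv}) gives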

\begin{align*}
    D_f(P \parallel Q) &= \int_{\mathcal{X}} q(x) \sup\limits_{t \in \text{dom}_{f^{*}} } \left\{ t \frac{p(x)}{q(x)} - f^{*}(t) \right\}  dx \\
    \intertext{By Jensen's inequality (exchanging the integral and the supremum), and because $\mathcal{T}$ may contain only a subset of all possible functions,}
    &\geq \sup\limits_{T \in \mathcal{T}} \left( \int_{\mathcal{X}}p(x)T(x)dx - \int_{\mathcal{X}}q(x)f^{*}(T(x)) dx \right) \\
    &= \sup\limits_{T \in \mathcal{T}} \left( \mathbb{E}_{x \sim P} \left[T(x)\right] - \mathbb{E}_{x \sim Q} \left[f^{*}(T(x))\right] \right)
\end{align*}

where $\mathcal{T}$ is an arbitrary class of functions $T : \mathcal{X} \mapsto \mathbb{R}$.
The lower bound is tight for $T^{*}(x) = f' \left( \frac{p(x)}{q(x)} \right)$ \cite{nguyen2010estimating}, where $f'$ is the first-order derivative of $f$.
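The tightness of the bound can be checked numerically on a discrete example. The sketch below assumes the KL generator $f(u) = u \log u$, for which $f'(u) = 1 + \log u$ and $f^{*}(t) = e^{t-1}$, and two made-up distributions with strictly positive probabilities. It compares the divergence computed directly from its definition with the variational expression evaluated at $T^{*}$ and at a deliberately shifted, suboptimal $T$. All names are hypothetical.

```python
import math

def f(u):
    """KL generator function f(u) = u log u (u > 0 assumed)."""
    return u * math.log(u)

def f_prime(u):
    """First-order derivative f'(u) = 1 + log u."""
    return 1.0 + math.log(u)

def f_conj(t):
    """Fenchel conjugate f*(t) = exp(t - 1) of the KL generator."""
    return math.exp(t - 1.0)

def f_divergence(p, q):
    """D_f(P || Q) = sum_x q(x) f(p(x) / q(x)) for discrete distributions."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def variational_bound(p, q, T):
    """E_P[T] - E_Q[f*(T)], with T given as a list of values T(x)."""
    expect_p = sum(pi * ti for pi, ti in zip(p, T))
    expect_q = sum(qi * f_conj(ti) for qi, ti in zip(q, T))
    return expect_p - expect_q

# Illustrative distributions with strictly positive probabilities.
p = [0.5, 0.4, 0.1]
q = [0.2, 0.3, 0.5]

d_f = f_divergence(p, q)

# Optimal critic T*(x) = f'(p(x) / q(x)): the bound should match D_f.
T_star = [f_prime(pi / qi) for pi, qi in zip(p, q)]
tight = variational_bound(p, q, T_star)

# A shifted (suboptimal) critic: the bound should fall strictly below D_f.
T_other = [t + 0.5 for t in T_star]
loose = variational_bound(p, q, T_other)

print("D_f directly:       ", d_f)
print("bound at T*:        ", tight)
print("bound at shifted T: ", loose)
```

In this setting the bound at $T^{*}$ reproduces the divergence up to floating-point error, while any other choice of $T$ gives a smaller value. This is the quantity a discriminator is trained to maximize in the variational view.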

![KL](output_KL.gif)

## 4. Generative Adversarial Networks

## 5. Adversarial Losses

## 6. Experiments

## 7. Evaluations