### Discriminative Model 
Class of models used in machine learning for modeling the dependence of an unobserved variable $y$ on an observed variable $x$. Examples: Logistic Regression, SVM, Boosting, Linear Regression, Random Forests, Neural Networks.
- Models the posterior $p(y|x)$ directly, or learns a direct map from inputs $x$ to class labels $y$.
- Often performs better on classification and regression tasks where the joint distribution is not required.
- Samples cannot be generated from the joint distribution of $x$ and $y$.
- Inherently supervised and cannot be easily extended to unsupervised learning.
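
As a minimal sketch of modeling $p(y|x)$ directly, the following fits a one-dimensional logistic regression by gradient descent. The synthetic data, learning rate, and step count are illustrative assumptions. Note that nothing in the model describes $p(x)$, so it cannot generate new inputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(x, y, lr=0.1, steps=500):
    """Fit p(y=1|x) = sigmoid(w*x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        p = sigmoid(w * x + b)          # current estimate of p(y=1|x)
        w -= lr * np.mean((p - y) * x)  # gradient of the mean log loss w.r.t. w
        b -= lr * np.mean(p - y)        # gradient of the mean log loss w.r.t. b
    return w, b

# Toy data (assumed): class 0 centered at -2, class 1 centered at +2.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(2.0, 1.0, 200)])
y = np.concatenate([np.zeros(200), np.ones(200)])

w, b = fit_logistic(x, y)
p_pos = sigmoid(w * 3.0 + b)                         # p(y=1 | x=3)
accuracy = np.mean((sigmoid(w * x + b) > 0.5) == y)  # training accuracy
print(f"w={w:.2f}, b={b:.2f}, p(y=1|x=3)={p_pos:.3f}, accuracy={accuracy:.3f}")
```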

### Generative Model 
A class of models used in machine learning for modeling how the data was generated in order to categorize an input. Examples: GMM, HMM, Naive Bayes, LDA, RBM, GAN.
- Learns a model of the joint probability $p(x, y)$ of the inputs $x$ and class label $y$, and makes predictions by using **Bayes'** theorem to calculate $p(y|x)$ and then picking the most likely label $y$. *The model asks: based on my generation assumptions, which category is most likely to generate this input?*
- $p(x, y)$ can be used to generate synthetic data similar to the observed data.
- More flexible than a discriminative model in expressing dependencies in complex learning tasks.
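
The points above can be sketched with a tiny generative classifier. This assumes one-dimensional inputs and a Gaussian class-conditional $p(x|y)$ per class; the data and all names are hypothetical. It fits the joint, applies Bayes' theorem to obtain $p(y|x)$, and draws synthetic $(x, y)$ pairs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labeled data (assumed): two 1-D Gaussian classes.
x0 = rng.normal(-3.0, 1.0, 300)   # inputs with label y=0
x1 = rng.normal(3.0, 1.0, 100)    # inputs with label y=1

# Fit the joint p(x, y) = p(x|y) p(y): one Gaussian per class plus class priors.
priors = np.array([len(x0), len(x1)]) / (len(x0) + len(x1))
means = np.array([x0.mean(), x1.mean()])
stds = np.array([x0.std(), x1.std()])

def gaussian_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def posterior(x):
    """Bayes' theorem: p(y|x) = p(x|y) p(y) / sum over y' of p(x|y') p(y')."""
    joint = gaussian_pdf(x, means, stds) * priors   # p(x, y) for y = 0, 1
    return joint / joint.sum()

def sample_joint(n):
    """Generate synthetic (x, y) pairs: draw y from the prior, then x from p(x|y)."""
    y = rng.choice(2, size=n, p=priors)
    x = rng.normal(means[y], stds[y])
    return x, y

post = posterior(2.5)
label = int(np.argmax(post))      # most likely label for x = 2.5
xs, ys = sample_joint(5)
print(post, label, xs, ys)
```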

![](images/dng.png)

### Generative Adversarial Nets

**Authors:** Ian Goodfellow et al.

- Proposed the adversarial network framework, in which a generative model is pitted against an adversary: a discriminative model that learns to determine whether a sample is from the model distribution or the data distribution.

> Adversarial Process Analogy: **Currency Counterfeiters and Police Competition** - The goal of a team of counterfeiters is to produce fake currency and use it without detection (i.e. fool the police). The goal of the police is to detect counterfeit currency as well as possible. The competition in this game drives both counterfeiters and police to improve their methods `until counterfeit currency is indistinguishable from real money`. This scenario can be modeled as a *minimax* game in game theory.

- GAN Process
    - Generator network generates sample (pseudo-negative).
    - Discriminator network learns whether a sample is "fake" or "real". The output of the discriminator is a scalar that represents the probability that the sample is real data. The discriminator is not meant to perform a multi-class classification task.
    ![](images/gan.png)
    ![](images/gan-alg.png)
---
- GAN was motivated by the observation that adding small perturbations to an image leads to classification errors that are absurd to humans.
- GAN uses a generator and a discriminator with the objective of making use of the discriminator to help the generator generate "faithful" samples.
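
The minimax game can be made concrete with the value function from Goodfellow et al., $V(D, G) = \mathbb{E}_{x}[\log D(x)] + \mathbb{E}_{z}[\log(1 - D(G(z)))]$, which the discriminator tries to maximize and the generator tries to minimize. The sketch below only estimates $V$ from batches of discriminator outputs; the function name and the example values are hypothetical.

```python
import numpy as np

def gan_value(d_real, d_fake, eps=1e-12):
    """Monte Carlo estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on real samples, each in (0, 1)
    d_fake: discriminator outputs D(G(z)) on generated samples, each in (0, 1)
    """
    d_real = np.clip(d_real, eps, 1.0 - eps)   # guard against log(0)
    d_fake = np.clip(d_fake, eps, 1.0 - eps)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# The discriminator ascends V, the generator descends it.
confident = gan_value(np.array([0.9, 0.8]), np.array([0.1, 0.2]))  # D separates well
fooled = gan_value(np.full(4, 0.5), np.full(4, 0.5))               # D cannot tell
print(confident, fooled)
```

When the discriminator outputs 0.5 everywhere, the estimate equals $-\log 4$, the value the original paper derives for the case where generated and real data are indistinguishable.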

### Deep Dream Idea (Google)

Use the gradient of a neural network to amplify patterns in the input image.
- Derive the gradient for a given layer in the network with respect to the input image.
- Use the derived gradient to update the input image.
- Adding the gradient to the input image increases the mean value or $\ell^2$ norm of the activation.
![](images/deepdream-idea.png)


- To visualize the kinds of patterns that the network has learned to recognize, **generate images that maximize the sum of activations of a particular channel of a particular convolutional layer of the neural network**.
- Use gradient ascent to optimize an image so that it maximizes the $\ell^2$ norm or mean value of the given layer's activations.
---
**Google DeepDream Notebook**

*It is a gradient ascent process that tries to maximize the $\ell^2$ norm of activations of a particular deep neural network layer.*
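
A minimal sketch of that gradient ascent, assuming a hypothetical one-layer network $a(x) = \mathrm{relu}(Wx)$ in place of a real convolutional model and a two-element vector in place of an image. The normalized step size is also an assumption, not taken from the notebook.

```python
import numpy as np

# Hypothetical one-layer "network": activations a(x) = relu(W @ x).
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, 1.0]])

def activations(x):
    return np.maximum(W @ x, 0.0)

def objective(x):
    """Half the squared l2 norm of the layer activations."""
    return 0.5 * np.sum(activations(x) ** 2)

def grad_wrt_input(x):
    """Gradient of the objective w.r.t. the input: W^T relu(W x)."""
    return W.T @ activations(x)

x = np.array([0.5, 0.2])          # stands in for the input image
history = [objective(x)]
for _ in range(10):
    g = grad_wrt_input(x)
    x = x + 0.1 * g / (np.abs(g).mean() + 1e-8)   # normalized ascent step (assumed)
    history.append(objective(x))
print(history)
```

Only the input is updated; the weights `W` stay fixed, which is what distinguishes this from ordinary training.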

### Introspective Classifier Learning: Empower Generatively
**Authors:** Long Jin et al.
**[ArXiv Link](https://arxiv.org/pdf/1704.07816.pdf)**

---

**Contributions**
- Proposed Introspective Classifier Learning (ICL) Framework - A single model that is simultaneously discriminative and generative
- Studied how the generative aspect of the model benefits its own discriminative training
- Developed an efficient sampling procedure to synthesize new data (from scratch) from a discriminative classifier
- Developed *Reclassification-By-Synthesis* algorithm to iteratively augment negative samples and update the classifier.
- Proposed a formulation to train a *multi-class classifier* on training set and augmented samples.
---

**Improving Classifier Performance**
- Using more data (hard examples) to train the classifier (a common way)
- Bootstrapping, active learning, semi-supervised learning
- Data augmentation
- *The approaches above use data that is already present in the training set or that is created by humans or separate algorithms*, i.e. they utilize existing positive samples
---

**Generative-Discriminative Modeling Concept** (Tu, 2007)
- A generative model can be successfully modeled by learning a sequence of discriminative classifiers via self-generated samples called *pseudo-negatives*
- **New samples that pass the learned classifier are considered as a new set of pseudo-negatives in the next round**

---

**ICL Advantages compared to other approaches**
- Convolutional: automatic feature learning (vs. features pre-selected manually)
- More efficient learning process
- More efficient sampling process (vs. time-consuming MCMC)
- Simplicity of having a single classifier as opposed to having a sequence of boosting classifiers
---

**NOTES:**
- Reference distribution - Generates first batch of pseudo-negatives
- Discriminative classifier is trained to separate the given input data and pseudo-negatives
- The algorithm repeats until pseudo-negatives are no longer distinguishable from the input data

- Discriminative classifier computes the probability of $\mathbf x$ being positive or negative i.e. $p(y|\mathbf x)$
- Probabilities should sum to 1, i.e. $p(y=+1|\mathbf x) + p(y=-1|\mathbf x) = 1$
- Generative model models $p(y, \mathbf x) = p(\mathbf x|y)p(y)$ which captures the underlying generation process of $\mathbf x$ for class $y$
- Binary classification: Positive samples are of primary interest. Using Bayes' theorem and assuming equal priors $p(y=+1)=p(y=-1)$:
$$p(\mathbf x|y=+1) = \frac{p(y=+1|\mathbf x)}{p(y=-1|\mathbf x)}p(\mathbf x|y=-1)$$
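
The identity above follows from applying Bayes' theorem to each class and dividing one expression by the other, so that $p(\mathbf x)$ cancels. Written out as a worked step:

$$\begin{aligned}
p(\mathbf x|y=+1) &= \frac{p(y=+1|\mathbf x)\,p(\mathbf x)}{p(y=+1)}, \qquad
p(\mathbf x|y=-1) = \frac{p(y=-1|\mathbf x)\,p(\mathbf x)}{p(y=-1)} \\
\frac{p(\mathbf x|y=+1)}{p(\mathbf x|y=-1)} &= \frac{p(y=+1|\mathbf x)}{p(y=-1|\mathbf x)}\cdot\frac{p(y=-1)}{p(y=+1)} = \frac{p(y=+1|\mathbf x)}{p(y=-1|\mathbf x)} \quad \text{(equal priors)}
\end{aligned}$$

Multiplying both sides by $p(\mathbf x|y=-1)$ gives the stated result.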

**Input** - Input data has an underlying distribution we wish to learn
![](images/icl-data.png)

The goal is to gradually learn $p_t(\mathbf x|y=-1)$ such that samples drawn from it become indistinguishable from the given positive samples.
$$p_t(\mathbf x|y=-1)\overset{t\to\infty}{\longrightarrow}p(\mathbf x|y=+1)$$
![](images/icl-goal.png)

---
**Initialization**
![](images/icl-init.png)

---
**Synthesis-Reclassification Loop**

**For $t=0, 1, ..., T$**

- Update **Generative** Model: 
**_NOTE:_ Please ask the authors to clarify the process for updating the generative model $p_t(\mathbf x|y=-1)$ and how they calculate $Z_t$.**
$$p_t(\mathbf x|y=-1) = \frac{1}{Z_t}\frac{q_t(y=+1|\mathbf x)}{q_t(y=-1|\mathbf x)}p_r(\mathbf x|y=-1)$$
where $Z_t = \int \frac{q_t(y=+1|\mathbf x)}{q_t(y=-1|\mathbf x)}p_r(\mathbf x|y=-1)\,d\mathbf x$

    - NOTE: $Z_t$ is the partition function (also known as the normalizing constant). An unnormalized probability distribution is guaranteed to be non-negative everywhere, but it is not guaranteed to sum or integrate to 1. To obtain a valid probability distribution, the unnormalized distribution needs to be normalized: $$p(x) = \frac{1}{Z}\tilde{p}(x)$$
    $Z$ is the integral or sum over all possible joint assignments of the state $x$, i.e. $Z = \int \tilde{p}(x)\,dx$. It is often intractable to compute, so one resorts to approximation.
    
- **Synthesis Step**
    - Sample $l$ pseudo-negative samples ![](images/l-samples.png) $\mathbf x_i \sim p_t(\mathbf x|y=-1), i=n+tl+1, ..., n+tl+l$ from the current model $p_t(\mathbf x|y=-1)$ using **Variational Sampling Procedure (Stochastic Gradient on Input)**
    - NOTE: Authors update a random sample $\mathbf x$ drawn from $p_r(\mathbf x|y=-1)$ by increasing $\frac{q_t(y=+1|\mathbf x; W_t)}{q_t(y=-1|\mathbf x; W_t)}$ using backpropagation.
    
- **Augment Pseudo-negative Set**
![](images/pn-samples-t.png)
    - $\mathbf S_{pn}^{t+1} = \mathbf S_{pn}^{t} \cup \{(\mathbf x_i, -1), i=n+tl+1, ..., n+tl+l\}$
![](images/pn-samples-tt.png)

- **Reclassification Step**
    - Expand training set: $\mathbf S_{e}^{t+1} = \mathbf S \cup \mathbf S_{pn}^{t+1}$
![](images/e-set.png)
    - Update CNN classifier to $\mathbf C^{t+1}$ on expanded set $\mathbf S_{e}^{t+1}$ to get $q_{t+1}(y=+1|\mathbf x)$
    ![](images/c-t.png)
- **$t \leftarrow t+1$ until convergence *(e.g. no improvement on validation set)***
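
The model update at the top of the loop can be checked numerically in one dimension, where $Z_t$ is a simple integral over a grid. This is only a sketch: it assumes a scalar input, a standard Gaussian reference distribution, and a hypothetical logistic classifier with hand-picked weights in place of the CNN. For images the integral is not tractable this way.

```python
import numpy as np

# 1-D grid standing in for the input space (assumption: x is a scalar).
xs = np.linspace(-10.0, 10.0, 4001)
dx = xs[1] - xs[0]

# Reference distribution p_r(x|y=-1): a standard Gaussian (assumed).
p_r = np.exp(-0.5 * xs ** 2) / np.sqrt(2.0 * np.pi)

# Hypothetical classifier q_t(y=+1|x) = sigmoid(w*x + b); its odds are exp(w*x + b).
w, b = 2.0, 0.0
odds = np.exp(w * xs + b)                 # q_t(y=+1|x) / q_t(y=-1|x)

unnormalized = odds * p_r                 # numerator of p_t(x|y=-1)
Z_t = np.sum(unnormalized) * dx           # Riemann-sum estimate of the partition function
p_t = unnormalized / Z_t                  # normalized model density on the grid

mean_t = np.sum(xs * p_t) * dx            # mean of p_t
rng = np.random.default_rng(0)
pseudo_negatives = rng.choice(xs, size=5, p=p_t * dx)   # grid-based draw from p_t
print(Z_t, mean_t, pseudo_negatives)
```

With these choices the product of odds and reference density is $e^{2}$ times a unit-variance Gaussian centered at 2, so `Z_t` should come out close to $e^{2}$ and `mean_t` close to 2. The classifier's odds have shifted the reference distribution toward the region it scores as positive.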

### Synthesis Process
- Goal is to draw fair samples from $$p_t(\mathbf x|y=-1) = \frac{1}{Z_t}\frac{q_t(y=+1|\mathbf x; W_t)}{q_t(y=-1|\mathbf x; W_t)}p_r(\mathbf x|y=-1)$$
- Update a random sample $\mathbf x$ drawn from $p_r(\mathbf x|y=-1)$ by increasing $\frac{q_t(y=+1|\mathbf x; W_t)}{q_t(y=-1|\mathbf x; W_t)}$ using backpropagation.
- The partition function $Z_t$ is a constant and does not depend on the sample $\mathbf x$
- Let
$$g_t(\mathbf x) = \frac{q_t(y=+1|\mathbf x; W_t)}{q_t(y=-1|\mathbf x; W_t)} = \exp\{\mathbf w_t^{(1)T}\phi(\mathbf x; \mathbf w_t^{(0)})\}$$
which is essentially the odds of the positive class. Taking the *natural log* of both sides:
$$\ln g_t(\mathbf x) = \mathbf w_t^{(1)T}\phi(\mathbf x; \mathbf w_t^{(0)})$$

- **Please ask the authors to clarify this sentence:**
Starting from $\mathbf x$ drawn from $p_r(\mathbf x|y=-1)$, we directly increase $\mathbf w_t^{(1)T}\phi(\mathbf x; \mathbf w_t^{(0)})$ using stochastic gradient **ascent** on $\mathbf x$ via backpropagation, which allows us to obtain fair samples subject to $$p_t(\mathbf x|y=-1) = \frac{1}{Z_t}\frac{q_t(y=+1|\mathbf x; W_t)}{q_t(y=-1|\mathbf x; W_t)}p_r(\mathbf x|y=-1)$$
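
The ascent step itself can be sketched as follows, assuming a hypothetical feature map $\phi(\mathbf x) = \tanh(W_0 \mathbf x)$ with hand-picked weights in place of the CNN, and a Gaussian reference distribution. It illustrates only how the log-odds are increased by updating the input; whether and how this yields fair samples from $p_t$ is the open question noted above.

```python
import numpy as np

# Hypothetical two-layer scorer: phi(x; W0) = tanh(W0 @ x), log-odds = w1 . phi(x).
W0 = np.array([[1.0, -0.5],
               [0.3, 0.8],
               [-0.6, 0.2]])
w1 = np.array([0.7, -0.4, 0.5])

def log_odds(x):
    """ln g_t(x) = w1^T phi(x; W0)."""
    return w1 @ np.tanh(W0 @ x)

def grad_log_odds(x):
    """Backpropagate the log-odds to the input: W0^T (w1 * (1 - tanh^2(W0 x)))."""
    h = np.tanh(W0 @ x)
    return W0.T @ (w1 * (1.0 - h ** 2))

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=2)      # start from the reference distribution (assumed Gaussian)
scores = [log_odds(x)]
for _ in range(50):
    x = x + 0.1 * grad_log_odds(x)    # gradient ascent on the input; weights stay fixed
    scores.append(log_odds(x))
print(scores[0], scores[-1])
```

Because $Z_t$ does not depend on $\mathbf x$, it never appears in the gradient, so the ascent needs only the classifier's log-odds.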

---

### Synthesis process

- Sample **3** samples $\{x_1, x_2, x_3\}$ from $p_t(\mathbf x|y=-1)$
![](images/samples-3.png)
- Update each of the **3** samples $\mathbf x_i$ by increasing $\mathbf w_t^{(1)T}\phi(\mathbf x_i; \mathbf w_t^{(0)})$ using stochastic gradient **ascent** on $\mathbf x_i$ via backpropagation as follows:
![](images/synth-input.png)

![](images/synth-grad.png)

![](images/synth-update.png)


### Introspective Generative Modeling: Decide Discriminatively
**Authors:** Justin Lazarow et al.

- Developed Introspective Generative Modeling (IGM), which attains a generator using progressively learned deep convolutional neural networks. The generator is able to self-evaluate the difference between its generated samples and the given training data (i.e. the generator is also a discriminator).

- Unsupervised models are often *generative*, while supervised classifiers are often *discriminative*. Unsupervised learning is the much harder task because of its learning complexity and assumptions.

- Introspective Generative Modeling (IGM) is simultaneously a generator and a discriminator. Modeling consists of two stages during training:
    1. A pseudo-negative sampling stage (synthesis) for self-generation
    2. A CNN classifier learning stage (classification) for self-evaluation and model updating.
    
- Some properties about IGM
    - Existing CNN classifiers can be directly made into generators (if trained properly)
    - Can train on images of one size and generate an image of a larger size while maintaining coherence across the entire image.
- The general pipeline of IGM is similar to the Generative Modeling via Discriminative approach (GDL) method, with the boosting algorithm replaced by a CNN in IGM (which demonstrated significant improvement in modeling and computational power).
- GDL learns a generative model through a sequence of discriminative classifiers (boosting) using repeatedly self-generated samples, called *pseudo-negatives*.

- Differences between GDL and IGM
    - CNN in IGM results in a significant boost to feature learning
    - GDL: Markov Chain Monte Carlo based sampling process (a computational bottleneck). IGM: backpropagation-based synthesis/sampling process.
- Differences between GAN and IGM
    - IGM maintains a single model that is simultaneously a generator and a discriminator. GAN uses two CNNs, a generator and a discriminator.
    - GANs are hard to train. IGM carries out a straightforward use of backpropagation in both the sampling and the classifier training stage, making the learning process direct.
    - The GAN generator is a mapping from features to images. IGM directly models the underlying statistics of an image with an efficient sampling/inference process, which makes IGM flexible.
    - GAN performs a forward pass to reconstruct an image. In IGM, image synthesis is carried out using backpropagation, so it is slower (but feasible).
    - IGM has larger model complexity (a cascade of roughly 60 to 200 CNN classifiers is included) than GAN.
    
----

### A Neural Algorithm of Artistic Style

**Authors:** Leon A. Gatys et al.

- Developed an artificial system based on a deep neural network that creates artistic images of high perceptual quality. The system uses neural representations to separate and recombine content and style of arbitrary images, providing a neural algorithm for the creation of artistic images.

- [GAN in TF](http://blog.evjang.com/2016/06/generative-adversarial-nets-in.html)
- [GAN in PyTorch](https://medium.com/@devnag/generative-adversarial-networks-gans-in-50-lines-of-code-pytorch-e81b79659e3f)
- [GAN intro in TF](http://blog.aylien.com/introduction-generative-adversarial-networks-code-tensorflow/)
- [DCGAN in TF](https://medium.com/@awjuliani/generative-adversarial-networks-explained-with-a-classic-spongebob-squarepants-episode-54deab2fce39)