# GANs

**What are they, what makes them work, and what is their future.**

Seth Weidman

Boston Machine Learning Meetup

February 1, 2018

# Agenda

* What are GANs
* What makes them work
* What is their future

# What are GANs?




## What are neural nets?

We've all seen diagrams like this when trying to understand neural nets:

![](img/neural_network_diagram.png)

But what are neural nets, mathematically?

They are:

* Universal function approximators
* Differentiable

If each layer is written as a function ($a$, $b$, $c$, and finally $p$, which produces the prediction), with weights $V$ and $W$, then the prediction can be written as:

$$ P = p(c(b(a(x, V)), W)) $$

And the loss can be written as:

$$ L = l(p(c(b(a(x, V)), W))) $$
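To make the nested notation concrete, here is a minimal NumPy sketch of a tiny network written literally as $p(c(b(a(x, V)), W))$. The layer choices are assumptions for illustration: $a$ and $c$ are linear maps, $b$ is a ReLU, $p$ is a sigmoid, and $l$ is binary cross-entropy. The loss also needs the true label $y$, which the formula above leaves implicit.

```python
import numpy as np

# Hypothetical layer choices; only the nesting mirrors P = p(c(b(a(x, V)), W))
def a(x, V):
    return x @ V                          # first layer: linear map with weights V

def b(h):
    return np.maximum(h, 0.0)             # ReLU nonlinearity (no weights)

def c(h, W):
    return h @ W                          # second linear layer with weights W

def p(h):
    return 1.0 / (1.0 + np.exp(-h))       # sigmoid: score -> probability

def l(P, y):
    # Binary cross-entropy between prediction P and true label y
    return float(-np.mean(y * np.log(P) + (1 - y) * np.log(1 - P)))

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))               # one input with 4 features
V = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 1))
y = np.array([[1.0]])                     # its true label

P = p(c(b(a(x, V)), W))                   # the prediction
L = l(P, y)                               # the loss
```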



What does differentiable mean? It means that if we have a loss $L$, we can compute:

$$ \frac{\partial L}{\partial W} $$

$$ \frac{\partial L}{\partial V} $$

and so on. This is exactly the information we need to update the weights, that is, to "train" the neural network.

**In addition**, it also means we can compute:

$$ \frac{\partial L}{\partial x} $$

In other words, how much the loss would change if the individual pixels of the _input_ changed.
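A minimal sketch of what "differentiable" buys us, using a hypothetical toy network (linear, ReLU, linear, sigmoid, cross-entropy; all assumptions). Backpropagation applies the chain rule from $L$ back through the layers: stopping at the weights gives $\frac{\partial L}{\partial V}$ and $\frac{\partial L}{\partial W}$, and going one step further gives the gradient with respect to the input. The function names and the 4-"pixel" input are illustrative only.

```python
import numpy as np

def forward(x, V, W, y):
    h1 = x @ V                                    # layer a
    h2 = np.maximum(h1, 0.0)                      # layer b (ReLU, assumed)
    P = 1.0 / (1.0 + np.exp(-(h2 @ W)))           # layers c and p (sigmoid, assumed)
    L = -np.mean(y * np.log(P) + (1 - y) * np.log(1 - P))   # loss l (cross-entropy, assumed)
    return float(L), h1, h2, P

def backward(x, V, W, y):
    """Chain rule from L back to the weights V, W and to the input x."""
    L, h1, h2, P = forward(x, V, W, y)
    dL_ds = (P - y) / x.shape[0]                  # sigmoid + cross-entropy simplify to this
    dL_dW = h2.T @ dL_ds                          # dL/dW
    dL_dh1 = (dL_ds @ W.T) * (h1 > 0)             # ReLU passes gradient only where h1 > 0
    dL_dV = x.T @ dL_dh1                          # dL/dV
    dL_dx = dL_dh1 @ V.T                          # dL/dx: one more step back, to the pixels
    return L, dL_dV, dL_dW, dL_dx

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))                       # one input with 4 "pixels"
y = np.array([[1.0]])
V = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 1))

L, dV, dW, dx = backward(x, V, W, y)
V_new, W_new = V - 0.1 * dV, W - 0.1 * dW         # training: change the weights
x_new = x - 0.1 * dx                              # alternatively: change the input itself
```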

It was _this_ insight that led Ian Goodfellow to investigate GANs:

Could a machine learning algorithm use this information to learn how to "trick" another algorithm by producing examples that reduced this loss?

## Origin story

![](img/ian_goodfellow_beer.png)

In 2013, Ian Goodfellow and Yoshua Bengio were about to run a speech synthesis contest. They wanted a discriminator network that could listen to artificially generated speech and decide whether it was real.

They decided not to run the contest, concluding that people would just game the system by generating examples that fooled this particular discriminator.

Then, one night in a bar, Ian Goodfellow asked the question: **can this be fixed by having the _discriminator_ keep learning?**

## How could you do it?

### Part 1

First, randomly generate a noise vector $z$ (here, 100 numbers) and feed it through a randomly initialized neural network, the _generator_, to produce an output image.

$$ z = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_{100} \end{bmatrix} $$

![](img/gan_1.png)

Let's denote the matrix of pixels in this image $X$.
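A minimal sketch of this first step, assuming a generator that is just one randomly initialized linear layer followed by `tanh` (real GAN generators are much deeper networks) and an 8x8 output image. Drawing the noise from a standard normal distribution is also an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
Z_DIM, SIDE = 100, 8            # 100-dim noise as above; 8x8 output image (assumption)

# Hypothetical generator: a single randomly initialized linear layer, then tanh
W_g = rng.normal(scale=0.1, size=(Z_DIM, SIDE * SIDE))

def generator(z, W_g):
    """Turn a noise vector z into an image X with pixel values in (-1, 1)."""
    return np.tanh(z @ W_g).reshape(SIDE, SIDE)

z = rng.normal(size=Z_DIM)      # z_1, ..., z_100
X = generator(z, W_g)           # the matrix of pixels X
```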

Then, feed this image (matrix of pixels $X$) into a second network, the _discriminator_, and get a prediction of whether the image is real:

![](img/gan_2.png)

Compare this prediction with the correct label, "fake", to get a loss $L$, and use this loss to train the discriminator.

Critically, also compute $\frac{\partial L}{\partial X}$: how much each of the _generated pixels_ affects the loss.
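A minimal sketch of this step. The discriminator here is a hypothetical stand-in (plain logistic regression on the pixels), and a random `tanh` image stands in for the generator's output. For this model the chain rule is short: with $s$ the score before the sigmoid and $L = -\log(1 - D(X))$, we get $\frac{\partial L}{\partial s} = D(X)$, so $\frac{\partial L}{\partial X} = D(X)\,W_d$.

```python
import numpy as np

rng = np.random.default_rng(1)
SIDE = 8
X = np.tanh(rng.normal(size=(SIDE, SIDE)))     # stand-in for a generated image

# Hypothetical discriminator: logistic regression on the pixels
W_d = rng.normal(scale=0.1, size=(SIDE, SIDE))
b_d = 0.0

def discriminator(X, W_d, b_d):
    """Probability that the image X is real."""
    return 1.0 / (1.0 + np.exp(-(np.sum(W_d * X) + b_d)))

D = discriminator(X, W_d, b_d)
L = -np.log(1.0 - D)            # cross-entropy loss for the label "fake" (0)

# With s the pre-sigmoid score, dL/ds = D, and the chain rule gives:
dL_dWd = D * X                  # for training the discriminator
dL_db = D
dL_dX = D * W_d                 # how much each generated pixel affects the loss

lr = 0.1
W_d_new = W_d - lr * dL_dWd     # discriminator step: push D(X) toward "fake"
b_d_new = b_d - lr * dL_db
```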

Then, update the generator by backpropagating $-\frac{\partial L}{\partial X}$ through it. The sign is negative because we want the generator to keep making the discriminator _more_ likely to say that the images it generates are real.

![](img/gan_3.png)

Generate a _new_ random noise vector $z$ and repeat the process, so that the generator learns to turn _any_ random noise vector into an image that the discriminator thinks is real.
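Putting Part 1 together: a minimal training loop, assuming a hypothetical toy linear-plus-`tanh` generator and a logistic-regression discriminator. Each iteration draws a new $z$, computes $\frac{\partial L}{\partial X}$ once, takes an ordinary gradient step for the discriminator, and backpropagates $-\frac{\partial L}{\partial X}$ through the generator. Note that this discriminator only ever sees fake images.

```python
import numpy as np

rng = np.random.default_rng(2)
Z_DIM, PIX, LR = 100, 64, 1e-3           # 8x8 images, flattened (assumptions)

# Hypothetical toy models: linear + tanh generator, logistic-regression discriminator
W_g = rng.normal(scale=0.1, size=(Z_DIM, PIX))
w_d = rng.normal(scale=0.1, size=PIX)

def generate(z, W_g):
    return np.tanh(z @ W_g)               # flattened image X

def discriminate(x, w_d):
    return 1.0 / (1.0 + np.exp(-(x @ w_d)))   # probability that x is real

def train_step(z, W_g, w_d, lr=LR):
    x = generate(z, W_g)
    D = discriminate(x, w_d)
    # L = -log(1 - D) is the discriminator's loss for labeling x "fake"
    dL_dx = D * w_d                       # dL/dX: effect of each generated pixel on L
    dL_dwd = D * x
    # Generator: backpropagate -dL/dX through tanh and the linear layer
    g = -dL_dx * (1.0 - x ** 2)
    W_g_new = W_g - lr * np.outer(z, g)   # a descent step on -L, i.e. an ascent step on L
    # Discriminator: an ordinary descent step on L
    w_d_new = w_d - lr * dL_dwd
    return W_g_new, w_d_new, D

for step in range(100):
    z = rng.normal(size=Z_DIM)            # a new noise vector every iteration
    W_g, w_d, D = train_step(z, W_g, w_d)
```

Both gradients are computed from the same forward pass before either model is updated, so neither update sees the other's new weights within a step.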

### What's missing?

This trains the generator to produce images that the discriminator labels as real, but the discriminator will likely not be a very smart classifier, since we have only shown it one of the two classes it is trying to distinguish. So we also have to give it images from the real class.
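A minimal sketch of a discriminator step that sees both classes, assuming hypothetical toy "real" images (a fixed diagonal pattern plus noise) and the same style of toy generator and logistic discriminator as before. Each batch mixes real images labeled 1 with generated images labeled 0, and the cross-entropy loss is averaged over both.

```python
import numpy as np

rng = np.random.default_rng(3)
Z_DIM, PIX, N = 100, 64, 16               # 8x8 images, 16 real + 16 fake per batch (assumptions)

# Hypothetical "real" data: a fixed diagonal pattern plus noise, squashed into (-1, 1)
pattern = np.where(np.eye(8).ravel() > 0, 1.0, -1.0)

def sample_real(n):
    return np.tanh(pattern + 0.3 * rng.normal(size=(n, PIX)))

W_g = rng.normal(scale=0.1, size=(Z_DIM, PIX))   # toy generator weights
w_d, b_d = np.zeros(PIX), 0.0                    # toy logistic discriminator

def discriminate(X, w_d, b_d):
    return 1.0 / (1.0 + np.exp(-(X @ w_d + b_d)))

def d_loss(X, y, w_d, b_d):
    """Binary cross-entropy, averaged over the whole mixed batch."""
    P = discriminate(X, w_d, b_d)
    return -np.mean(y * np.log(P) + (1 - y) * np.log(1 - P))

# The discriminator's batch mixes real images (label 1) and fakes (label 0)
X_real = sample_real(N)
X_fake = np.tanh(rng.normal(size=(N, Z_DIM)) @ W_g)
X = np.vstack([X_real, X_fake])
y = np.concatenate([np.ones(N), np.zeros(N)])

err = (discriminate(X, w_d, b_d) - y) / len(y)   # dL/ds for each image
lr = 0.05
w_d_new = w_d - lr * (X.T @ err)
b_d_new = b_d - lr * err.sum()
```

The generator's update is unchanged from Part 1: it still only learns through the discriminator's judgment of its fakes.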

### Part 2

![](img/gans_4.png)

Cool aside: here is [Ian Goodfellow's original code](https://github.com/goodfeli/galatea/commit/d960968919b0856ba6753198a0e035228d7c03e6), a commit in his `galatea` GitHub repo, which he used to generate MNIST digits.

# Let's code one up

See the notebook [here](GAN_example/dlnd_face_generation.ipynb).

# What makes GANs work?

* Deep Convolutional Architecture
* Batch Normalization

Let's first cover the Deep Convolutional architecture. We'll review:

* What convolutions are
* What deconvolutions are

## Convolutions

We've all seen diagrams like this in the context of convolutional neural nets:

![](img/AlexNet_0.jpg)

What's really going on here?

Let's say we have an input layer of size $224 \times 224 \times 3$, as for the ImageNet images fed to AlexNet. The next layer appears to be $96$ deep. What does that mean?

## Review of convolutions

See the visual [here](http://cs231n.github.io/convolutional-networks/).

"_Filters_" are slid over images using the convolution operation. 

In theory, these filters can act as _feature detectors_, and the images that result from convolving these filters with the image can be thought of as versions of the original image in which the detected features are highlighted.
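To make "sliding a filter" concrete, here is a naive single-channel sketch (stride 1, no padding). Like PyTorch and TensorFlow, it does not flip the filter, so strictly speaking it computes cross-correlation. The vertical-edge filter is hand-made for illustration, whereas a trained network learns its filters.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding), taking a weighted sum at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A hand-made vertical-edge detector: responds where brightness rises left to right
edge_filter = np.array([[-1.0, 0.0, 1.0]] * 3)
image = np.zeros((5, 5))
image[:, 2:] = 1.0                         # dark left half, bright right half

feature_map = conv2d(image, edge_filter)   # every row is [3, 3, 0]: high near the edge
```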

In practice, the neural network _learns_ filters that are useful for solving the particular problem it has been given.

Once the network has been trained, we can visualize these learned filters.

Let's return to the concrete example of the AlexNet architecture:

For each of the 96 _filters_, the following happens:

For each of the 3 _input channels_, an $11 \times 11$ slice of the filter is slid over that channel, "detecting the presence of different features" at each location. The results for the three channels are then added together.

So there are a total of $96 \times 3$ single-channel convolutions, resulting in 96 output channels: the 96-deep layer.
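The same idea with several input channels and many filters, as a naive sketch: each filter has shape $11 \times 11 \times 3$, and each output channel is the sum of the three per-channel convolutions. The filter count and size mirror AlexNet's first layer, but the input is shrunk to $32 \times 32 \times 3$, and stride 1 with no padding plus the `(F, k, k, C)` filter layout are assumptions chosen to keep the example small.

```python
import numpy as np

def conv_layer(image, filters, stride=1):
    """image: (H, W, C); filters: (F, k, k, C) -> output of shape (H_out, W_out, F).

    Output channel f sums, over the C input channels, the result of sliding
    filter f's k x k slice for that channel over that channel.
    """
    H, W, C = image.shape
    F, k, _, _ = filters.shape
    out_h = (H - k) // stride + 1
    out_w = (W - k) // stride + 1
    out = np.zeros((out_h, out_w, F))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + k, j * stride:j * stride + k, :]  # (k, k, C)
            # Multiply every filter by the patch and sum over k, k and C at once
            out[i, j, :] = np.tensordot(filters, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
image = rng.normal(size=(32, 32, 3))          # small RGB input
filters = rng.normal(size=(96, 11, 11, 3))    # 96 filters of size 11x11x3
out = conv_layer(image, filters)              # shape (22, 22, 96): one channel per filter
```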

For each filter, the three $11 \times 11$ slices (one each for the red, green, and blue channels) can be combined and visualized as if they were a mini $11 \times 11$ color image:

### The 96 AlexNet filters:

![](img/AlexNet_filt1.png)

## More on convolutions

That's what convolutions are. The exact size of the output, relative to the input, depends on how we do the convolution, for example:

1. What filter size do we use?
2. How big is our "stride" - how much do we move the filter by as we convolve it with the image?  
3. How much "padding" or space around the image should we use?
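These choices determine the output size. A small helper using the standard formula $\lfloor (i + 2p - k)/s \rfloor + 1$ for input size $i$, filter size $k$, stride $s$, and padding $p$ on each side. The AlexNet paper reports a stride of 4 for its $11 \times 11$ filters; the padding values below are hypothetical.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension.

    Padding adds `padding` pixels on each side; the filter then fits
    floor((input + 2*padding - filter) / stride) + 1 times.
    """
    return (input_size + 2 * padding - filter_size) // stride + 1

print(conv_output_size(5, 3))                          # 3: 3x3 filter on a 5x5 image
print(conv_output_size(5, 3, padding=1))               # 5: padding of 1 keeps the size
print(conv_output_size(224, 11, stride=4))             # 54
print(conv_output_size(224, 11, stride=4, padding=2))  # 55
```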

[Here](http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html) is a very in-depth look at convolution arithmetic; let's look at a couple of examples from it.