# Self2Face Presentation

By: Alexander Comerford (alexanderjcomerford@gmail.com)

This markdown notebook is a presentation describing the contents of the self2face repo.

## Table of contents:
* [Background on GANs](#first-bullet)
* [Training GANs](#second-bullet)
* [pix2pix](#pix2pix)
* [self2face & kubeflow](#self2face)

### Background on GANs

GAN - Generative Adversarial Network

https://arxiv.org/abs/1406.2661

#### Idea

* Train a generative model G(z) to generate data with random noise z as input
* Adversary is discriminator D(x) trained to distinguish generated and true data
* Represent both G(z) and D(x) by multilayer perceptrons for differentiability
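
As a rough sketch of the shapes involved, the block below wires up a generator and a discriminator as tiny multilayer perceptrons with random, untrained weights. The layer sizes, helper names, and activations are illustrative assumptions, not taken from the paper or the self2face repo.

```python
import math
import random


def make_layer(rng, n_in, n_out):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(layer, inputs, activation):
    """Apply one fully connected layer followed by an activation."""
    weights, biases = layer
    return [activation(sum(w * v for w, v in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


rng = random.Random(0)
NOISE_DIM, HIDDEN_DIM, DATA_DIM = 4, 8, 2  # illustrative sizes (assumption)

g_hidden = make_layer(rng, NOISE_DIM, HIDDEN_DIM)
g_out = make_layer(rng, HIDDEN_DIM, DATA_DIM)
d_hidden = make_layer(rng, DATA_DIM, HIDDEN_DIM)
d_out = make_layer(rng, HIDDEN_DIM, 1)


def generator(z):
    """G(z): map a noise vector to a point in data space."""
    return dense(g_out, dense(g_hidden, z, math.tanh), lambda t: t)


def discriminator(x):
    """D(x): estimated probability that x is real data."""
    return dense(d_out, dense(d_hidden, x, math.tanh), sigmoid)[0]


z = [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
fake = generator(z)
score = discriminator(fake)
print(f"G(z) has {len(fake)} values, D(G(z)) = {score:.3f}")
```

Because every operation here is differentiable, gradients from D's output can flow back through G, which is what makes joint training possible.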

#### Analogy
* Generative: team of counterfeiters, trying to fool police with fake currency
* Discriminative: police, trying to detect the counterfeit currency
* Competition drives both to improve, until counterfeits are indistinguishable from genuine currency
* Now counterfeiters have as a side-effect learned something about real currency

### Training GANs

GANs are hard to train.

Without the right hyperparameters, network architecture, and training procedure, there is a high chance that either the generator or discriminator will overpower the other.

A common case of this is mode collapse: the generator finds a flaw in the discriminator and repeatedly outputs a single image that fits the data distribution the discriminator is looking for, but is nowhere close to being a readable output. The generator has collapsed onto a single point, so it will not output a variety of samples (for example, a variety of digits when training on handwritten digits).

There are also cases where the discriminator becomes too powerful and is able to easily make the distinction between real and fake images.

The mathematical intuition behind this phenomenon is that GANs are typically trained with gradient descent techniques, which are designed to find the minimum value of a cost function rather than the Nash equilibrium of a game. When used to seek a Nash equilibrium, these algorithms may fail to converge. Further research into game theory and stable optimization techniques may result in GANs that are as easy to train as ConvNets!
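
A minimal sketch of this non-convergence, using the textbook zero-sum game V(x, y) = xy rather than an actual GAN (the function name and step size are illustrative assumptions): one player minimizes V over x, the other maximizes it over y, and the only equilibrium is (0, 0). With simultaneous gradient updates the iterates spiral away from that equilibrium instead of settling on it.

```python
import math


def simultaneous_gradient_steps(x, y, lr=0.1, steps=100):
    """Play V(x, y) = x * y with simultaneous gradient updates.

    Player one controls x and minimizes V; player two controls y and
    maximizes V. Returns the distance from the equilibrium (0, 0) after
    each step.
    """
    distances = [math.hypot(x, y)]
    for _ in range(steps):
        grad_x = y  # dV/dx
        grad_y = x  # dV/dy
        # Both players move at the same time from the current point:
        # descent for x, ascent for y.
        x, y = x - lr * grad_x, y + lr * grad_y
        distances.append(math.hypot(x, y))
    return distances


distances = simultaneous_gradient_steps(1.0, 1.0)
print(f"start: {distances[0]:.3f}, end: {distances[-1]:.3f}")
```

Each step multiplies the distance from the equilibrium by sqrt(1 + lr^2), so a smaller learning rate slows the drift but does not remove it.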

![](https://ai2-s2-public.s3.amazonaws.com/figures/2016-11-08/42f6f5454dda99d8989f9814989efd50fe807ee8/3-Figure1-1.png)

--------------

![](https://cdn-images-1.medium.com/max/800/1*M_YipQF_oC6owsU1VVrfhg.jpeg)

GANs are based on a zero-sum, non-cooperative game: if one player wins, the other loses. Such a game is solved with a minimax strategy, in which one player tries to maximize a value function while the other tries to minimize it. Here the discriminator maximizes and the generator minimizes. In game theory terms, the GAN model converges when the discriminator and the generator reach a Nash equilibrium. This is the optimal point for the minimax equation below.

![](./pics/Selection_0004.png)

1. Log probability of D predicting that real-world data is genuine

2. Log probability of D predicting that G’s generated data is not genuine
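
Assuming the equation pictured above is the value function from the original GAN paper (arXiv:1406.2661), it can be written out with the two numbered terms in order:

$$
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
$$

The first expectation is taken over real data and the second over generated data. D pushes the sum up, while G can only influence the second term and pushes it down.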



Here we set up our losses and optimizers.

* ∇ (the upside-down capital delta, "nabla") denotes the gradient with respect to the parameters of the network being updated: the discriminator in the first expression below, the generator in the second.
* m is the number of samples in the minibatch.
* Σ (capital sigma) denotes a sum: the expression is evaluated at each index between the small numbers below and above the sigma, and the results are added together.


![Gradient ascent update for the discriminator](https://cdn-images-1.medium.com/max/1600/1*V4nu4ThcHFQXmWKxx9lhbA.png)

The gradient ascent expression for the discriminator. The first term corresponds to optimizing the probability that the real data (x) is rated highly. The second term corresponds to optimizing the probability that the generated data G(z) is rated poorly. Notice we apply the gradient to the discriminator, not the generator.

Gradient methods generally work better optimizing log p(x) than p(x) because the gradient of log p(x) is generally better scaled. That is, its size consistently and helpfully reflects the objective function's geometry, making it easier to select an appropriate step size and reach the optimum in fewer steps. There is also a numerical reason: computers represent fractions with a limited number of floating-point digits, so the product of many probabilities quickly becomes so close to zero that it underflows. Summing logs avoids this issue.
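
The numerical point can be checked directly. The sketch below uses a made-up batch of 2,000 probabilities of 0.5 each (an assumption chosen only to force underflow) and compares the raw product with the sum of logs.

```python
import math

# Hypothetical data: 2,000 independent events, each with probability 0.5.
probabilities = [0.5] * 2000

# Multiplying directly: the true value 2**-2000 is far below the smallest
# positive double, so the running product underflows to exactly 0.0.
product = 1.0
for p in probabilities:
    product *= p

# Summing logs keeps the same information in a comfortably sized number.
log_sum = sum(math.log(p) for p in probabilities)

print(f"product = {product}")
print(f"sum of logs = {log_sum:.2f}")
```

Once the product has collapsed to zero its gradient carries no information, whereas the log-sum still distinguishes a likely batch from an unlikely one.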

The generator is then optimized in order to increase the probability of the generated data being rated highly.

![Gradient descent update for the generator](https://cdn-images-1.medium.com/max/1600/1*njOrO3uNzUH1dZb7-i-dhw.png)

The gradient descent expression for the generator. Its single term corresponds to optimizing the probability that the generated data G(z) is rated highly. Notice we apply the gradient to the generator network, not the discriminator.

By alternating gradient optimization between the two networks using these expressions on new batches of real and generated data each time, the GAN will slowly converge to producing data that is as realistic as the network is capable of modeling.
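
The alternating procedure can be sketched on a one-dimensional toy problem. Everything in this block is an assumption made for illustration: real data is drawn from N(4, 1), the generator is the linear map G(z) = a*z + b, the discriminator is the logistic model D(x) = sigmoid(w*x + c), and the gradients are derived by hand instead of using TensorFlow. It is not the training code from the repo, and convergence is not guaranteed.

```python
import math
import random


def sigmoid(t):
    t = max(-60.0, min(60.0, t))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-t))


def train_toy_gan(steps=200, m=32, lr=0.05, seed=0):
    rng = random.Random(seed)
    w, c = 0.1, 0.0  # discriminator D(x) = sigmoid(w * x + c)
    a, b = 1.0, 0.0  # generator G(z) = a * z + b
    for _ in range(steps):
        # Discriminator step: gradient ASCENT on
        # (1/m) * sum(log D(x_i) + log(1 - D(G(z_i)))).
        real = [rng.gauss(4.0, 1.0) for _ in range(m)]
        fake = [a * rng.gauss(0.0, 1.0) + b for _ in range(m)]
        grad_w = grad_c = 0.0
        for x, g in zip(real, fake):
            d_real = sigmoid(w * x + c)
            d_fake = sigmoid(w * g + c)
            grad_w += (1.0 - d_real) * x - d_fake * g
            grad_c += (1.0 - d_real) - d_fake
        w += lr * grad_w / m
        c += lr * grad_c / m

        # Generator step: gradient DESCENT on
        # (1/m) * sum(log(1 - D(G(z_i)))), using a fresh noise batch.
        grad_a = grad_b = 0.0
        for _ in range(m):
            z = rng.gauss(0.0, 1.0)
            d_fake = sigmoid(w * (a * z + b) + c)
            grad_a += -d_fake * w * z
            grad_b += -d_fake * w
        a -= lr * grad_a / m
        b -= lr * grad_b / m
    return (w, c), (a, b)


(w, c), (a, b) = train_toy_gan()
print(f"discriminator: w={w:.3f}, c={c:.3f}; generator: a={a:.3f}, b={b:.3f}")
```

Only the discriminator's parameters change in the first half of each iteration and only the generator's in the second, mirroring the two update expressions above.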

## Proof of optimality

https://arxiv.org/pdf/1406.2661.pdf#page=4&zoom=100,-111,470

# Pix2pix

https://arxiv.org/pdf/1611.07004.pdf

![](https://phillipi.github.io/pix2pix/images/teaser_v3.png)

### Interactive papers???

https://affinelayer.com/pixsrv/

## Self2face

* Personal project aimed at creating a workflow for building your personal virtual avatar
* Built on Kubeflow + TensorFlow + GPUs
* 10 minutes of video == ~10,000 images
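
A quick check of the arithmetic behind that last figure, assuming frames are extracted at a uniform rate (the helper names are hypothetical and not from the repo):

```python
def implied_sampling_rate(duration_seconds, frame_count):
    """Frames per second needed to get frame_count images from the clip."""
    return frame_count / duration_seconds


def frames_extracted(duration_seconds, frames_per_second):
    """Number of whole frames obtained at a uniform sampling rate."""
    return int(duration_seconds * frames_per_second)


rate = implied_sampling_rate(10 * 60, 10_000)
print(f"~{rate:.1f} frames per second")
```

So roughly 17 frames per second of footage are kept, which is below the frame rate of typical video and suggests that not every frame needs to be used.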

#### Kubeflow

The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable. Our goal is not to recreate other services, but to provide a straightforward way to deploy best-of-breed open-source systems for ML to diverse infrastructures. Anywhere you are running Kubernetes, you should be able to run Kubeflow.

![](https://camo.githubusercontent.com/bd4adfc06b0e349c47f2bae3319544a2e547c796/68747470733a2f2f7777772e6b756265666c6f772e6f72672f696d616765732f6c6f676f2e737667)

![](https://opensource.com/sites/default/files/uploads/kubeflow_ml-model-training.png)

Logical components that make up Kubeflow:

* Chainer Training
* <b>Hyperparameter Tuning (Katib)</b>
* Istio Integration (for TF Serving)
* <b>Jupyter Notebooks</b>
* ModelDB
* ksonnet
* MPI Training
* MXNet Training
* Pipelines
* PyTorch Training
* Seldon Serving
* NVIDIA TensorRT Inference Server
* TensorFlow Serving
* TensorFlow Batch Predict
* TensorFlow Training (TFJob)
* PyTorch Serving

# DEMOOOOO!