<a href="https://colab.research.google.com/github/RubeRad/StyleGAN2-TensorFlow-2.x/blob/master/StyleGAN2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Intro
StyleGAN2 is an AI for generating photorealistic synthetic faces. It was trained on huge numbers of photos of faces from the internet. "GAN" means "Generative Adversarial Network". 

**Network** because it's a neural network like we've seen with the MNIST example: data fed into an initial layer of nodes, which feeds through arcs and weights and many more layers of nodes to an output layer.

**Generative** because it is a **Generator** of images. Basically you feed a list of random numbers into the input layer, and the output layer is pixels of an image (kind of backwards from MNIST, where pixels are fed into the input, and the output layer is numbers).

**Adversarial** is the interesting part. They actually started with a **Discriminator** network, which was designed (more like MNIST) to accept pixels in the input layer, and have an output layer with just two nodes: Real or Fake. Then the baby, untrained Discriminator, and baby, untrained Generator were put in an MMA Cage to train each other: 
* The Generator would try to generate images that would be realistic enough to fool the Discriminator
* ...which trained the Discriminator to be more discriminating in order to defeat the Generator.
* ...which trained the Generator to make more realistic images that were better able to fool the Discriminator.
* ...which trained the Discriminator to be more discriminating.
* ...which trained the Generator to make more realistic images.
* ...you get the picture.

Eventually the Generator and the Discriminator reach a balanced detente where neither is improving anymore (the Adversarial training has converged to its optimum). Now that the Generator is in amazing shape, the Discriminator doesn't have much use (or maybe I'm just not creative enough to imagine its uses), and is left behind as a worn-out sparring partner, while the Generator goes and gets famous on the internet.

Go check out the generator's work at [thispersondoesnotexist.com](http://thispersondoesnotexist.com)

StyleGAN2 builds on the original StyleGAN, a research project from the GPU manufacturer NVIDIA that used TensorFlow v1:

* Paper: https://arxiv.org/pdf/1812.04948.pdf
* Video: https://youtu.be/kSLJriaOumA
* Code: https://github.com/NVlabs/stylegan
* FFHQ: https://github.com/NVlabs/ffhq-dataset

I used to have a StyleGAN2 notebook based on [this tutorial by Mikael Christensen](https://colab.research.google.com/drive/1ShgW6wohEFQtqs_znMna3dzrcVoABKIH). Unfortunately, Google Colab advanced to TensorFlow v2, and I couldn't get that notebook to work anymore. The capability in this notebook is thanks to the work of Alberto Rosas Garcia, who upgraded StyleGAN2 to work with TensorFlow v2, and [posted his version on GitHub](https://github.com/rosasalberto/StyleGAN2-TensorFlow-2.x).

# Setup
This is a group of cells to get all the code and data in place, because it's not just standard stuff like numpy and pandas that can be brought in with `import`. We also need a large data file with all the weights.

This group of cells will need to be run once at the beginning of each session. Then this section can be collapsed in Colab, so it's out of the way.

In [None]:
# import the normal stuff
import os
import numpy             as np
import matplotlib.pyplot as plt

In [None]:
# DO THIS IF YOU ARE IN COLAB
# Clone the repository with the TensorFlow2 update to StyleGAN2
# Thanks Alberto Rosas!!
%cd /content
!git clone https://github.com/ruberad/StyleGAN2-TensorFlow-2.x.git stylegan2
%cd /content/stylegan2

In [None]:
import tensorflow as tf # common neural network module
# import StyleGAN2-specific stuff from 
# that python we just pulled down from github
from utils.utils_stylegan2 import convert_images_to_uint8
from stylegan2_generator   import StyleGan2Generator

In [None]:
# Check if you have access to a GPU in this (virtual) machine
!nvidia-smi -L
gpuname = tf.test.gpu_device_name()
print('GPU Identified at: "{}"'.format(gpuname))

# set the impl and gpu variables appropriately
# these are needed so the Generator knows how to run
if gpuname:
  impl = 'cuda'
  gpu = True
  print('yay fast!')
else:
  impl = 'ref'
  gpu = False
  print('aww, slow.')

In [None]:
import gdown
# This is the public URL of the online file with weights that StyleGan2 needs for generating faces
url = 'https://drive.google.com/uc?export=download&confirm=pbef&id=1afMN3e_6UuTTPDL63WHaA0Fb9EQrZceE'
# This is where we want the file to go (path on the virtual machine)
out = 'weights/ffhq.npy'

if os.path.exists(out):
  print('ffhq.npy weights file is present')
else:
  gdown.download(url, out, quiet=False)

# After this cell is run, there should be a file ffhq.npy in the weights/ subdirectory,
# check the colab folder sidebar to make sure it's there.
# The full file is about 250MB 
# That's alotta weights! This is a BIG neural network!
# The paper says the network has 26.2 million trainable parameters

# FFHQ stands for Flickr Faces, High Quality

In [None]:
# Now that we have all the weights, we can create the Generator object
# that can use those weights. We'll call it sg2
sg2 = StyleGan2Generator(weights='ffhq', impl=impl, gpu=gpu)

# If this says "Loaded ffhq generator weights!", we're ready to start generating faces!

# First run
These commands will seem mysterious at first, but we will understand them more as we go along.

Random Number Generators (rngs) are technically only *Pseudo*random. They are in fact quite deterministic. If you give them the same seed, they will repeatably generate the same stream of 'random' numbers.
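
Here is a small sketch of that repeatability, using only numpy. The variable names (`rng_a`, `rng_b`, `rng_c`) are just made up for this illustration.

```python
import numpy as np

# Two generators built from the same seed produce the same stream.
rng_a = np.random.RandomState(1)
rng_b = np.random.RandomState(1)
first_a = rng_a.randn(5)
first_b = rng_b.randn(5)
print('same seed, same numbers:', np.array_equal(first_a, first_b))

# Drawing again from rng_a continues further down its stream,
# so the second batch is different from the first.
second_a = rng_a.randn(5)
print('second draw repeats the first:', np.array_equal(first_a, second_a))

# A different seed starts a different stream.
rng_c = np.random.RandomState(2)
first_c = rng_c.randn(5)
print('different seed, same numbers:', np.array_equal(first_a, first_c))
```

The first comparison should come out `True` and the other two `False`.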

In [None]:
seed = 1
rng = np.random.RandomState(seed)

Let's use the RNG to generate a pile of random numbers. Each one is from the Standard Normal distribution (mean $\mu=0$, standard deviation $\sigma=1$).

In [None]:
z = rng.randn(1, 512).astype('float32')
z

That's a lot of numbers, mostly in the $\pm 1$ range, some a little bigger. We can slap the values into a histogram and see the typical 'bell curve' of a Normal distribution. If you go back and rerun just the `z = rng.randn(...)` cell, you will see it sample new numbers from further down the rng stream. But if you go further back and rerun the `seed = 1` cell first, you will see that it samples the exact same 'random' numbers every time.
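
If you'd rather check the 'mostly in the $\pm 1$ range' claim with numbers than by eyeballing a plot, something like this sketch should do it. It resamples `z` from `seed = 1` so that it runs on its own; the names `values` and `within_one` are just for this example.

```python
import numpy as np

rng = np.random.RandomState(1)
z = rng.randn(1, 512).astype('float32')

values = z.ravel()                            # flatten (1, 512) into 512 plain numbers
within_one = np.mean(np.abs(values) <= 1.0)   # fraction of values inside +/- 1

print('mean       :', values.mean())   # should land near 0
print('std dev    :', values.std())    # should land near 1
print('within +/-1:', within_one)      # roughly 0.68 for a Normal distribution
```

With only 512 samples the numbers won't be exactly 0, 1, and 0.68, but they should be close.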

In [None]:
fig=plt.figure( figsize=(8,3) )
ax=plt.gca()
ax.hist(z.ravel(), bins=40)  # flatten the (1, 512) array into 512 values
plt.show()

Now we use our generator object to push our pile of seeded random numbers through the 'Mapping Network' to get something that for now we'll just call `w`. More on this later.

In [None]:
w = sg2.mapping_network(z)
w

That `w` is numbers in the right ranges, and in the right 'shape', to feed into the input layer of the Generator. So let's push it on through!

This may take a bunch of seconds. Even for a computer it takes a while to propagate 18x512 input numbers through a network of 26.2 million weights!

In [None]:
out = sg2.synthesis_network(w)
out

What is that `out` that we got out? It's a pile of 1024x1024 numbers, which need to be reinterpreted as pixels of an image. Fortunately, the StyleGAN2 Python code we cloned includes a function for that.
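
To get a feel for what 'reinterpreting numbers as pixels' involves, here is a rough numpy sketch. It is not the code inside `convert_images_to_uint8`; the helper `to_uint8` is hypothetical, and it assumes the raw generator output sits roughly in the range $[-1, 1]$. The idea is to stretch that range onto 0..255, clamp anything that falls outside, and store the result as 8-bit integers.

```python
import numpy as np

def to_uint8(x, lo=-1.0, hi=1.0):
    """Hypothetical helper: map floats in [lo, hi] to pixel values 0..255."""
    scale = 255.0 / (hi - lo)
    pixels = x * scale + (0.5 - lo * scale)          # lo -> 0.5, hi -> 255.5
    return np.clip(pixels, 0, 255).astype(np.uint8)  # clamp, then drop the fraction

# Made-up stand-in for a few generator outputs, including two out-of-range values
fake_out = np.array([-1.5, -1.0, 0.0, 1.0, 1.5], dtype=np.float32)
print(to_uint8(fake_out))   # [  0   0 128 255 255]
```

Values below the assumed range get clamped to black (0) and values above it to full brightness (255).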

In [None]:
img = convert_images_to_uint8(out)

And now that we have an image, this is how we use matplotlib to display it:

In [None]:
fig = plt.figure( figsize=(12,12) )  # like normal
ax  = plt.gca()                         
ax.axis('off')     # turn off the axis lines, we're not plotting a graph
ax.imshow(img)     # bet you can guess what 'imshow' does
plt.show()    

OK, there's a dude, he looks like a grumpy late-middle-aged white guy. (Note I can 'predict' what face you will see even before you run this notebook, because we have taken control of the seeding of the RNG, and the same grumpy white guy will get generated from `seed=1` every time.)

I'm pretty sure this grumpy guy's name is ***Gus***.

Believe it or not, even though this looks like a real photo taken of a real guy, there is no Gus. This image is completely synthetic. Take a moment to bask in how amazing that is.

## Exercise
Use the code cell below to copy all the important python commands from above (leaving out the ones that show intermediate stuff we don't want to look at every time), and run it a bunch of times with different seeds to see a bunch of different faces! 

Use this markdown cell to accumulate a list of seeds that yield faces that interest you:
* put a seed here
* and here
* etc

In [None]:
# Use this code cell to gather all the important python commands from above
# So an image can be generated and displayed by running just this one code cell
seed=2 # or seed = np.random.randint(1000000)
#...

# Mean Girl
Who is 'Mean Girl'? And why is she so mean? Let's get a look at her...

The Generator object has an attribute (a data element which is part of the object) called `dlatent_avg`. This is a collection of numbers in the right ranges, and of the right `shape` to input to the input layer of the Generator network. This is called a `latent space vector`, and the shorthand is typically to call it `w` or a vector in $w$-space.

In [None]:
w_avg = sg2.dlatent_avg
w_avg

Let's feed `w_avg` into the Generator and see what we get!

In [None]:
out = sg2.synthesis_network(w_avg)
out

As before, we need to take this pile of 1024x1024 numbers, reinterpret them as pixels of an image, and display it with matplotlib.

In [None]:
img = convert_images_to_uint8(out) # same as before
fig = plt.figure( figsize=(12,12) )
ax  = plt.gca() 
ax.axis('off')
ax.imshow(img) 
plt.show()

She doesn't look so mean after all! 

But she is *average* (aka *mean*, get it?). This is the image that is generated from the input vector which is called `dlatent_avg`, which is an input vector in the `middle` of the (highly multidimensional) $w$-space of input vectors, and a face which is in some sense in the `middle` of all the images in the training set. From the paper:

> In case of FFHQ this point represents a sort of an average face ... the "mean" face of FFHQ. This face is similar for all trained networks.

[Open up the paper](https://arxiv.org/pdf/1812.04948.pdf) and look closely at Page 8, Figure 8, the face at $\psi=0$. It is very similar to this face, maybe the same woman on a different day, with a different haircut, and a different(?) gray shirt. That figure in the paper was likely generated with an earlier training of StyleGAN2, but "similar for all trained networks" means each time they tweaked their network design and retrained, the "mean girl" ended up looking pretty much the same.

What does it mean that this "mean girl" is a woman? A white woman?

# Deviations from the mean
Note that `w_avg=sg2.dlatent_avg` and `w=sg2.mapping_network(z)` are both, when you boil it down, piles of numbers, of the same shape, the right shape to feed into the input layer of the Generator. They are **vectors**, and because they are vectors, they can be manipulated as vectors, i.e. added, subtracted, scaled, etc.

In [None]:
# in case we don't have Mean Girl's w_avg defined from above
w_avg = sg2.dlatent_avg

In [None]:
# Let's go back to Gus
seed=1

In [None]:
# Generate a z from the seed and map a w from the z
rng = np.random.RandomState(seed)
z = rng.randn(1, 512).astype('float32')
w = sg2.mapping_network(z)

In [None]:
# This is the vector between Mean Girl and Gus
# (or whatever face comes from your chosen seed)
dw = w - w_avg

Simple algebra tells us that since `dw = w - w_avg`, to get from Mean Girl to Gus we just need to compute `w = w_avg + dw`. Which is also to say `dw` is the *direction* from Mean Girl to Gus. But remember what we learned from *Despicable Me*: vectors have not only direction, but also ***magnitude***.
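
Here is a little numpy sketch of splitting a difference vector into direction and magnitude. The vectors are made-up stand-ins (plain random numbers of length 512), not real StyleGAN2 latents, so it runs without the generator.

```python
import numpy as np

# Stand-in vectors (made up for illustration, not real StyleGAN2 latents)
rng = np.random.RandomState(0)
w_avg = rng.randn(512)
w     = rng.randn(512)

dw = w - w_avg                     # the vector from w_avg to w

magnitude = np.linalg.norm(dw)     # how far apart the two points are
direction = dw / magnitude         # unit vector: which way to go

print('magnitude:', magnitude)
print('length of direction:', np.linalg.norm(direction))  # a unit vector has length 1
print('round trip ok:', np.allclose(w_avg + dw, w))
print('rebuilt from parts:', np.allclose(w_avg + magnitude * direction, w))
```

Going the full `magnitude` along `direction` from `w_avg` lands exactly on `w`, which is the same thing as adding `dw`.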

<img src="https://i.kym-cdn.com/photos/images/original/001/755/239/9da.jpg" width="400">

To get from Mean Girl's `w_avg` to Gus' `w` means going the full magnitude of `dw` away from `w_avg`
> `w = w_avg + 1.0*dw`

But what if we don't want to go the whole distance? What if we only want to go halfway?

In [None]:
# This is the 'truncation factor' t
t = 0.5
w_trunc = w_avg + t*dw

In [None]:
# now do the stuff from before to show what face we get
out = sg2.synthesis_network(w_trunc)
img = convert_images_to_uint8(out)
fig = plt.figure( figsize=(12,12) )
ax = plt.gca()
ax.axis('off')
img_plot = ax.imshow(img)

Hey wow, Gus isn't so grumpy. Or old. Does this face seem 'halfway between' Mean Girl and original Gus?

Play around with different truncation factors `t`. What happens when you slide `t` from 1.0 down to 0.0? What happens if you keep going and let `t` go ***negative***?!?!
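
If it helps to see the arithmetic without waiting for the generator, this sketch slides `t` from 1.0 down through 0.0 to -1.0 using made-up stand-in vectors (again, not real StyleGAN2 latents) and reports how far each `w_trunc` ends up from the mean.

```python
import numpy as np

# Stand-in vectors (made up for illustration, not real StyleGAN2 latents)
rng = np.random.RandomState(0)
w_avg = rng.randn(512)
w     = rng.randn(512)
dw    = w - w_avg
full  = np.linalg.norm(dw)         # the full distance from w_avg to w

for t in [1.0, 0.5, 0.0, -0.5, -1.0]:
    w_trunc = w_avg + t * dw
    # distance from the mean, as a fraction of the full distance
    dist = np.linalg.norm(w_trunc - w_avg) / full
    print('t = {:+.1f}  distance from mean = {:.2f} x |dw|'.format(t, dist))

# +t and -t sit on opposite sides of the mean,
# so their midpoint is the mean itself.
t = 1.0
midpoint = 0.5 * ((w_avg + t * dw) + (w_avg - t * dw))
print('midpoint is w_avg:', np.allclose(midpoint, w_avg))
```

The distance only depends on the size of `t`, so `t = 0.5` and `t = -0.5` are equally far from the mean, just in opposite directions.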

From the paper, the face generated by `w_avg - t*dw` is the "anti-face" of the face generated by `w_avg + t*dw`:
> By applying negative scaling to styles, we get the corresponding opposite or "anti-face". It is interesting that various high-level attributes often flip between the opposites, including viewpoint, glasses, age, coloring, hair length, and often gender.

So the direction vector `dw` is called a ***style***. Mean Girl is the bland center of all faces (she has no style!), but applying randomly generated `dw` moves in the direction of applying styles such as the list in the quote. And moving in the opposite direction `-dw` applies the opposites of those styles. (This is why it's called ***Style***GAN2.)

## Exercise
Refer to the list of interesting face seeds from the previous exercise. Go back through the cells above to re-generate some of those faces, and then modify `t` down through 0 and to negative to see how your face transforms to its anti-face. 

## Exercise
It's time to stop copy&pasting piles of the same commands over and over again. In programming, copying & pasting is always a sure sign that code needs to be modularized into functions for convenient reuse.

In the code cells below, write the functions as described by the comments

In [None]:
# given a seed s, return the z-vector of standard normal random numbers
def sample_z(s):
    

In [None]:
# Test the sample_z() function, make sure it behaves as expected
z=sample_z(1)
z

Don't move on to defining the next function until `sample_z()` is done and working. If you go through and implement all the functions and don't test until you get to the end, it is ***much*** harder to find the bugs, because they could be anywhere (will be everywhere)!

In [None]:
# given a z-vector, push it through generator g's mapping network to get a w-vector
def map_z_to_w(g, z):
    

In [None]:
# Test
w = map_z_to_w(sg2, z) # note we pass sg2 in as generator g
w

In [None]:
# Input a given w-vector to generator g and return the generated image
def generate_image_from_w(g, w):
    

In [None]:
# Test
img = generate_image_from_w(sg2, w)
img

This next function should use all of the functions defined above. Make sure those little, simple ones are all working before starting on this big, complex one!

In [None]:
# Starting from a seed s, compute the corresponding z and w
# Then compute the style direction dw from generator g's mean girl
# Then compute two different truncated w vectors, one that is +t away
# from the mean, and one that is -t away from the mean
# Use g to generate 3 images: the face, mean girl, and the anti-face
# Use matplotlib to plot all three
# (Refer to MatplotlibIntro.ipynb for a row of subplots 131 132 133)
def plot_face_mean_anti(g, s, t=1.0):
    

In [None]:
# Test
plot_face_mean_anti(sg2, 1) # try different seeds, different values for the +/-t

In [None]:
# Similar to plot_face_mean_anti, but this time we want a 3x3 grid
# Start by finding the z, w, and dw.
# Also start a matplotlib figure going.
# Use tvec=np.arange(start, limit, step) (with the right inputs) to
# get 9 values equally spaced from t to -t (with 0 in the middle)
# Matplotlib should have a 3x3 grid of subplots 331, 332, ... 339
# Loop through tvec, and for each t:
#   compute w_trunc = w_avg + t*dw
#   generate the image for w_trunc
#   create the next subplot
#   show the image on it
def plot_anti_face_grid(g, s, t=1.0):
    