# Paper Notes

This notebook summarizes some important questions about embedding an image into the latent space of StyleGAN.

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [None]:
%cd /content/gdrive/MyDrive/internship/style-gan/

/content/gdrive/MyDrive/internship/style-gan


In [None]:
!pip install click requests tqdm pyspng ninja imageio-ffmpeg==0.4.3
!pip install lpips
!pip install pytorch-ignite
!pip install pytorch-msssim

Collecting pyspng
  Downloading https://files.pythonhosted.org/packages/cf/b2/c18f96ccc62153631fdb3122cb1e13fa0c89303f4e64388a49a04bfad9f2/pyspng-0.1.0-cp37-cp37m-manylinux2010_x86_64.whl (195kB)
Collecting ninja
  Downloading https://files.pythonhosted.org/packages/1d/de/393468f2a37fc2c1dc3a06afc37775e27fde2d16845424141d4da62c686d/ninja-1.10.0.post2-py3-none-manylinux1_x86_64.whl (107kB)
Collecting imageio-ffmpeg==0.4.3
  Downloading https://files.pythonhosted.org/packages/89/0f/4b49476d185a273163fa648eaf1e7d4190661d1bbf37ec2975b84df9de02/imageio_ffmpeg-0.4.3-py3-none-manylinux2010_x86_64.whl (26.9MB)
Installing collected packages: pyspng, ninja, imageio-ffmpeg
Successfully installed imageio-ffmpeg-0.4.3 ninja-1.10.0.post2 pyspng-0.1.0
Collecting lpips
  Downloading https://fi

In [None]:
# Import the needed libraries
import matplotlib.pyplot as plt
import pickle
import torch
import torch.nn as nn
import os
import IPython
from PIL import Image
import glob
from sklearn.decomposition import PCA
import numpy as np 
from torchvision.utils import save_image
from torchsummary import summary
from torchvision import models, transforms
from ignite.metrics import PSNR
from ignite.engine import Engine
from pytorch_msssim import ssim

import torch.optim as optim
import torch.nn.functional as F
import lpips
import warnings
import dnnlib
import pandas as pd
import PIL


In [None]:
# Setting global attributes
RESOLUTION = 1024
#DEVICE = 'cuda:0' if torch.cuda.is_available() else 'cpu'
DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

ITERATIONS = 1300
SAVE_STEP = 100

# OPTIMIZER
LEARNING_RATE = 0.01
BETA_1 = 0.9
BETA_2 = 0.999
EPSILON = 1e-8
regularizer_lambda = 0.001

# IMAGE TO EMBED
#PATH_IMAGE = "stuff/data/expression02.png"
PATH_DIR = "stuff/data/input/"
SAVING_DIR = 'stuff/results/paper_notes/'

In [None]:
DEVICE

device(type='cuda', index=0)

## Loading Pretrained Model
Load the pretrained model from the pickle file. The libraries `dnnlib` and `torch_utils` must be importable for the pickle to load.

The source code for the networks themselves is not needed: their class definitions are loaded from the pickle via `torch_utils.persistence`.


In [None]:
PRETRAINED_MODEL = "stuff/pretrained_models/ffhq.pkl"

with open(PRETRAINED_MODEL, 'rb') as f:
    G = pickle.load(f)['G_ema'].to(DEVICE)  # G is a torch.nn.Module


The pickle contains three networks. `'G'` and `'D'` are instantaneous snapshots taken during training, and `'G_ema'` represents a moving average of the generator weights over several training steps. The networks are regular instances of `torch.nn.Module`, with all of their parameters and buffers placed on the CPU at import and gradient computation disabled by default.
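
To make the moving-average idea concrete, here is a minimal sketch that assumes an exponential moving average over a toy weight vector. The helper `ema_update`, the decay `beta`, and the random "training" updates are illustrative assumptions, not code or values taken from the StyleGAN training loop.

```python
import numpy as np

def ema_update(ema_weights, weights, beta=0.999):
    """Blend the current weights into the running average (beta is hypothetical)."""
    return beta * ema_weights + (1.0 - beta) * weights

rng = np.random.default_rng(0)
weights = rng.standard_normal(4)   # toy stand-in for the generator weights ('G')
ema_weights = weights.copy()       # the average starts at the first snapshot ('G_ema')

for _ in range(1000):
    # Noisy toy update standing in for one training step.
    weights = weights + 0.01 * rng.standard_normal(4)
    ema_weights = ema_update(ema_weights, weights)

print("instantaneous:", weights)
print("moving average:", ema_weights)
```

The averaged copy changes more smoothly than the instantaneous snapshot, which is one reason to load `'G_ema'` rather than `'G'` for generation.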

The generator consists of two submodules, `G.mapping` and `G.synthesis`, that can be executed separately. They also support various additional options:

```.python
w = G.mapping(z, c, truncation_psi=0.5, truncation_cutoff=8)
img = G.synthesis(w, noise_mode='const', force_fp32=True)
```
where

```.python
z: latent_code
c: class label
w: intermediate latent_code
```

Please refer to [`generate.py`](./generate.py), [`style_mixing.py`](https://github.com/NVlabs/stylegan2-ada-pytorch/blob/main/style_mixing.py), and [`projector.py`](./projector.py) for further examples.

From G we need to extract the `mapping` and the `synthesis` modules.

## Sampling


1. Sample (generate) 50 face images (in z-space) using StyleGAN and look at them carefully. What artifacts can you observe? How could you tell a StyleGAN image apart from a real image?

In [None]:
z_latents = torch.randn([50, G.z_dim]).to(DEVICE)

In [None]:
truncation_value = 1
# No leading slash in the second argument: an absolute path would discard SAVING_DIR.
output_dir = os.path.join(SAVING_DIR, f'trunc_{truncation_value}')
os.makedirs(output_dir, exist_ok=True)

for latent_idx, z in enumerate(z_latents):
    print('Generating image for latent (%d/%d) ...' % (latent_idx, len(z_latents)))
    img = G(z.unsqueeze(0), None, truncation_psi=truncation_value, noise_mode='const')
    img = (img.permute(0, 2, 3, 1) * 127.5 + 128).clamp(0, 255).to(torch.uint8)
    filename = os.path.join(output_dir, f'image_{latent_idx:03d}.png')
    PIL.Image.fromarray(img[0].cpu().numpy(), 'RGB').save(filename)

2. Sample 50 images without truncation (truncation factor t = 1), 50 images with truncation factor t = 0.7, and 50 images with truncation factor t = 0.5. What do you observe? Can you verbally describe the differences between the truncation factors? (In the paper the truncation factor has another name, not t.)
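
The effect of the truncation factor can be illustrated outside the generator. The sketch below assumes the usual truncation trick, in which each intermediate latent code is pulled toward an average code by the factor passed as `truncation_psi`. The function `truncate` and the random arrays are hypothetical stand-ins, not StyleGAN internals.

```python
import numpy as np

def truncate(w, w_avg, psi):
    """Interpolate w toward w_avg: psi=1 keeps w, psi=0 returns w_avg."""
    return w_avg + psi * (w - w_avg)

rng = np.random.default_rng(0)
w = rng.standard_normal((50, 512))   # stand-in for 50 intermediate latent codes
w_avg = w.mean(axis=0)               # stand-in for the generator's average code

for psi in (1.0, 0.7, 0.5):
    # Mean distance of the truncated codes from the average code.
    spread = np.linalg.norm(truncate(w, w_avg, psi) - w_avg, axis=1).mean()
    print(f"psi={psi}: mean distance to w_avg = {spread:.2f}")
```

The distance to the average code shrinks in proportion to the factor, so smaller values trade variation between samples for codes closer to the average.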

In [None]:
truncation_value = 0.7
# No leading slash in the second argument: an absolute path would discard SAVING_DIR.
output_dir = os.path.join(SAVING_DIR, f'trunc_{truncation_value}')
os.makedirs(output_dir, exist_ok=True)

for latent_idx, z in enumerate(z_latents):
    print('Generating image for latent (%d/%d) ...' % (latent_idx, len(z_latents)))
    img = G(z.unsqueeze(0), None, truncation_psi=truncation_value, noise_mode='const')
    img = (img.permute(0, 2, 3, 1) * 127.5 + 128).clamp(0, 255).to(torch.uint8)
    filename = os.path.join(output_dir, f'image_{latent_idx:03d}.png')
    PIL.Image.fromarray(img[0].cpu().numpy(), 'RGB').save(filename)

In [None]:
truncation_value = 0.5
# No leading slash in the second argument: an absolute path would discard SAVING_DIR.
output_dir = os.path.join(SAVING_DIR, f'trunc_{truncation_value}')
os.makedirs(output_dir, exist_ok=True)

for latent_idx, z in enumerate(z_latents):
    print('Generating image for latent (%d/%d) ...' % (latent_idx, len(z_latents)))
    img = G(z.unsqueeze(0), None, truncation_psi=truncation_value, noise_mode='const')
    img = (img.permute(0, 2, 3, 1) * 127.5 + 128).clamp(0, 255).to(torch.uint8)
    filename = os.path.join(output_dir, f'image_{latent_idx:03d}.png')
    PIL.Image.fromarray(img[0].cpu().numpy(), 'RGB').save(filename)

3. What is the difference between the w / w+ or z / z+ spaces? In w there is a single latent code of 512 floating-point values, and in w+ there are 18 latent codes of 512 floating-point values each. There are also intermediate spaces such as w2, w3, w6, and w9, in which the 18 latent codes are split into groups that may differ from one another. For example, in w2 there are two groups of latent codes (one for the first 9 layers and one for the last 9 layers). Sample k images from z, z2, z3, z6, z9, and z18 = z+. What do you observe? Which images look realistic and which do not?
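
One way to read the grouped spaces is sketched below: draw k independent codes and repeat each over a contiguous block of 18/k layers, so k = 1 gives the same code for every layer and k = 18 gives z+. The helper `sample_grouped_latents` and the constants are hypothetical, and how the grouped codes are passed through `G.mapping` and `G.synthesis` is left open here.

```python
import numpy as np

NUM_LAYERS = 18    # number of per-layer latent codes stated in the text
LATENT_DIM = 512   # size of each latent code

def sample_grouped_latents(num_groups, rng):
    """Return an array of shape (NUM_LAYERS, LATENT_DIM) built from num_groups codes."""
    if NUM_LAYERS % num_groups != 0:
        raise ValueError("num_groups must divide the number of layers")
    codes = rng.standard_normal((num_groups, LATENT_DIM))
    # Repeat each code over a contiguous block of layers.
    return np.repeat(codes, NUM_LAYERS // num_groups, axis=0)

rng = np.random.default_rng(0)
for k in (1, 2, 3, 6, 9, 18):
    latents = sample_grouped_latents(k, rng)
    distinct = len(np.unique(latents, axis=0))
    print(f"z{k}: shape={latents.shape}, distinct codes={distinct}")
```

With k = 2, for example, rows 0 to 8 share one code and rows 9 to 17 share another, matching the two-group description above.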