
faceswap-GAN

Adding Adversarial loss and perceptual loss (VGGface) to deepfakes' auto-encoder architecture.
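
As a rough illustration of this objective (a minimal sketch, not the repo's actual code; model and function names are placeholders), the generator loss can be composed as follows, using the perceptual-loss weighting mentioned in the notes below:

```python
import tensorflow as tf

def reconstruction_loss(real, fake):
    # L1 pixel loss, as in the original deepfakes autoencoder
    return tf.reduce_mean(tf.abs(real - fake))

def adversarial_loss(d_fake):
    # least-squares GAN loss on the discriminator's score for generated faces
    return tf.reduce_mean(tf.square(d_fake - 1.0))

def perceptual_loss(real_feats, fake_feats, weights=(0.01, 0.1, 0.1)):
    # distance between VGGFace activations of real and generated faces,
    # one term per chosen layer (the weighting is the one quoted in the notes)
    terms = [w * tf.reduce_mean(tf.abs(r - f))
             for w, r, f in zip(weights, real_feats, fake_feats)]
    return tf.add_n(terms)

def generator_loss(real, fake, d_fake, real_feats, fake_feats, w_adv=1.0):
    # total objective: reconstruction + adversarial + perceptual terms
    return (reconstruction_loss(real, fake)
            + w_adv * adversarial_loss(d_fake)
            + perceptual_loss(real_feats, fake_feats))
```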

Descriptions

GAN-v1

  • FaceSwap_GAN_github.ipynb

    1. Build a GAN model.
    2. Train the GAN from scratch.
    3. Use the GAN to swap a single face image to the target face.
    4. Detect faces in an image using dlib's CNN model (a minimal detection sketch follows this list).
    5. Use the GAN to transform each detected face into the target face.
    6. Use the moviepy module to output a video clip with the swapped face.
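
A minimal sketch of the detection stage in steps 4-5, assuming dlib's pretrained CNN detector weights (mmod_human_face_detector.dat) are available locally; the crop handling is illustrative, not the notebook's exact code:

```python
import dlib

# dlib's CNN face detector, loaded from its pretrained weights file
detector = dlib.cnn_face_detection_model_v1("mmod_human_face_detector.dat")

def detect_and_crop(image_path, upsample=1):
    img = dlib.load_rgb_image(image_path)
    faces = []
    for det in detector(img, upsample):
        r = det.rect  # the CNN detector returns a rectangle plus a confidence score
        x0, y0 = max(r.left(), 0), max(r.top(), 0)
        x1, y1 = min(r.right(), img.shape[1]), min(r.bottom(), img.shape[0])
        faces.append(img[y0:y1, x0:x1])
    return faces  # each crop would then be resized and fed to the generator
```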

GAN-v2

  • FaceSwap_GAN_v2_train.ipynb: Detailed training procedures can be found in this notebook.

    1. Build a GAN model.
    2. Train the GAN from scratch.
    3. (Optional) Detect faces in an image using dlib's CNN model.
    4. (Optional) Use the GAN to transform each detected face into the target face.
    5. (Optional) Use the moviepy module to output a video clip with the swapped face.
  • FaceSwap_GAN_v2_test_img.ipynb: Provides a swap_face() function that requires less VRAM.

    1. Load trained model.
    2. Swap a single face image to the target face.
  • FaceSwap_GAN_v2_test_video.ipynb

    1. Load the trained model.
    2. Detect faces in an image using dlib's CNN model.
    3. Use the GAN to transform each detected face into the target face.
    4. Use the moviepy module to output a video clip with the swapped face.
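
For the video stage, a minimal moviepy skeleton looks roughly like the following; process_frame is a placeholder for the notebook's per-frame detection and swapping logic, and the file paths are assumptions:

```python
from moviepy.editor import VideoFileClip

def process_frame(frame):
    # frame is an RGB numpy array; detect faces, swap them, paste them back
    # (detection and swapping omitted here -- see the notebooks for details)
    return frame

clip = VideoFileClip("input_video.mp4")        # hypothetical input path
swapped = clip.fl_image(process_frame)         # apply the per-frame transform
swapped.write_videofile("swapped_video.mp4", audio=False)
```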

Others

  • dlib_video_face_detection.ipynb

    1. Detect and crop faces in a video using dlib's CNN model.
    2. Pack the cropped face images into a zip file.
  • Training data: Face images should be placed in the ./faceA/ and ./faceB/ folders, one for each target. Face images can be of any size (a minimal loading sketch follows). (Updated Jan. 3, 2018)
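
A minimal loading sketch for the layout above; the 64x64 crop size and the OpenCV-based loading are assumptions, not the repo's exact pipeline:

```python
import glob
import cv2
import numpy as np

def load_faces(folder, size=64):
    # read every file in the folder, skip non-images, and resize to a fixed shape
    images = []
    for path in sorted(glob.glob(f"{folder}/*")):
        img = cv2.imread(path)
        if img is None:          # skip non-image files such as .DS_Store
            continue
        images.append(cv2.resize(img, (size, size)))
    return np.asarray(images)

faces_A = load_faces("./faceA")
faces_B = load_faces("./faceB")
```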

Results

Below are results showing trained models transforming Hinako Sano (佐野ひなこ, left) into Emi Takei (武井咲, right).

1. Autoencoder

Autoencoder based on deepfakes' script. It should be mentioned that the autoencoder (AE) results could be much better if it were trained longer.

(GIF: autoencoder results)

2. Generative Adversarial Network, GAN (version 1)

Improved output resolution: Adversarial loss improves the resolution of generated images. In addition, when perceptual loss is applied, the eyeball movement becomes more realistic and consistent with the input face.

(GIF: GAN with perceptual loss results)

VGGFace (GitHub repo) perceptual loss (PL): The following figure shows the nuanced eyeball direction in outputs of models trained with/without PL.

(Figure: comparison of outputs trained with/without PL)

Smoothed bounding box (smoothed bbox): An exponential moving average of the bounding box position over frames is introduced to eliminate jitter on the swapped face. See the gif below for comparison; a minimal smoothing sketch follows the legend.

(GIF: bounding box smoothing comparison)

  • A. Source face.
  • B. Swapped face, using a smoothing mask (smooths the edges of the output image when pasting it back onto the input image).
  • C. Swapped face, using smoothing mask and face alignment.
  • D. Swapped face, using smoothing mask and smoothed bounding box.
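
A minimal sketch of the smoothing idea: keep an exponential moving average of the detected box coordinates across frames. The smoothing factor here is an assumption, not the repo's exact value:

```python
import numpy as np

class SmoothedBBox:
    def __init__(self, alpha=0.2):
        self.alpha = alpha   # weight given to the newest detection (assumed value)
        self.bbox = None     # running average of (x0, y0, x1, y1)

    def update(self, new_bbox):
        # blend the new detection with the running average to damp jitter
        new_bbox = np.asarray(new_bbox, dtype=np.float64)
        if self.bbox is None:
            self.bbox = new_bbox
        else:
            self.bbox = self.alpha * new_bbox + (1 - self.alpha) * self.bbox
        return self.bbox.astype(int)

# usage per frame: box = smoother.update(detected_bbox)
```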

3. Generative Adversarial Network, GAN (version 2)

Version 1 features: Most of the features in version 1 are inherited, including perceptual loss and the smoothed bbox.

Segmentation mask prediction: The model learns a proper mask that helps handle occlusion, eliminate artifacts at bbox edges, and produce a natural skin tone. A minimal blending sketch follows the figures below.

(Figures: masking results)

  • Left: Source face.
  • Middle: Swapped face, before masking.
  • Right: Swapped face, after masking.
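
A minimal sketch of how such a predicted mask is typically applied, alpha-blending the generator output with the input face; variable names are illustrative, not the repo's:

```python
import numpy as np

def blend_with_mask(input_face, generated_face, mask):
    # mask values in [0, 1]; 1 = take the generated pixel, 0 = keep the input pixel
    mask = np.clip(mask, 0.0, 1.0)
    if mask.ndim == 2:                 # broadcast a single-channel mask over RGB
        mask = mask[..., None]
    return mask * generated_face + (1.0 - mask) * input_face
```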

Mask visualization: The following gif shows the output mask & face bounding box.

(GIF: mask heatmap and face bounding box visualization)

  • Left: Source face.
  • Middle: Swapped face, after masking.
  • Right: Mask heatmap & face bounding box.

Requirements

The notebooks rely on dlib (CNN face detection) and the moviepy module (video output), as referenced in the descriptions above.

Notes:

  1. BatchNorm/InstanceNorm: Caused input/output skin color inconsistency when the two training datasets had different skin color distributions (lighting conditions, shadows, etc.).
  2. Increasing the perceptual loss weighting factor (to 1) destabilized training, but the weighting [.01, .1, .1] I used is not optimal either.
  3. In the encoder architecture, flattening the Conv2D output and shrinking it to Dense(1024) is crucial for the model to learn semantic features, or face representation (a minimal encoder sketch follows these notes). If we used Conv layers only (i.e., a larger dimension), would it learn features like visual descriptors? (source paper, last paragraph of sec. 3.1)
  4. Transforming Emi Takei into Hinako Sano gave suboptimal results, due to imbalanced training data: over 65% of the images of Hinako Sano came from the same video series.
  5. The mixup technique (arXiv) and the least squares loss function (arXiv) are adopted for GAN training. However, I did not run any ablation experiments on them, so I don't know how much impact they had on the outputs.
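
A minimal sketch of the bottleneck described in note 3, written with tensorflow.keras for illustration; the layer counts and filter sizes are assumptions rather than the repo's exact architecture:

```python
from tensorflow.keras import layers, Model

def build_encoder(input_shape=(64, 64, 3)):
    inp = layers.Input(shape=input_shape)
    x = inp
    for filters in (128, 256, 512, 1024):            # downsampling conv stack
        x = layers.Conv2D(filters, 5, strides=2, padding="same")(x)
        x = layers.LeakyReLU(0.1)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(1024)(x)                        # the semantic bottleneck
    x = layers.Dense(4 * 4 * 1024)(x)                # expand back for the decoder
    x = layers.Reshape((4, 4, 1024))(x)
    return Model(inp, x, name="encoder")
```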

Acknowledgments

Code borrows from tjwei and deepfakes. The generative network is adopted from CycleGAN.
