
faceswap-GAN

Adding Adversarial loss and perceptual loss (VGGface) to deepfakes' auto-encoder architecture.
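
As a rough illustration of this objective (a minimal sketch, not the repo's actual code; model and function names are placeholders), the generator loss can be composed as follows, using the perceptual-loss weighting mentioned in the notes below:

```python
import tensorflow as tf

def reconstruction_loss(real, fake):
    # L1 pixel loss, as in the original deepfakes autoencoder
    return tf.reduce_mean(tf.abs(real - fake))

def adversarial_loss(d_fake):
    # least-squares GAN loss on the discriminator's score for generated faces
    return tf.reduce_mean(tf.square(d_fake - 1.0))

def perceptual_loss(real_feats, fake_feats, weights=(0.01, 0.1, 0.1)):
    # distance between VGGFace activations of real and generated faces,
    # one term per chosen layer (the weighting is the one quoted in the notes)
    terms = [w * tf.reduce_mean(tf.abs(r - f))
             for w, r, f in zip(weights, real_feats, fake_feats)]
    return tf.add_n(terms)

def generator_loss(real, fake, d_fake, real_feats, fake_feats, w_adv=1.0):
    # total objective: reconstruction + adversarial + perceptual terms
    return (reconstruction_loss(real, fake)
            + w_adv * adversarial_loss(d_fake)
            + perceptual_loss(real_feats, fake_feats))
```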

Descriptions

GAN-v1

  • FaceSwap_GAN_github.ipynb

    1. Build a GAN model.
    2. Train the GAN from scratch.
    3. Use the GAN to swap a single face image to the target face.
    4. Detect faces in an image using dlib's CNN model (a minimal detection sketch follows this list).
    5. Use the GAN to transform each detected face into the target face.
    6. Use the moviepy module to output a video clip with the swapped face.
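
A minimal sketch of the detection stage in steps 4-5, assuming dlib's pretrained CNN detector weights (mmod_human_face_detector.dat) are available locally; the crop handling is illustrative, not the notebook's exact code:

```python
import dlib

# dlib's CNN face detector, loaded from its pretrained weights file
detector = dlib.cnn_face_detection_model_v1("mmod_human_face_detector.dat")

def detect_and_crop(image_path, upsample=1):
    img = dlib.load_rgb_image(image_path)
    faces = []
    for det in detector(img, upsample):
        r = det.rect  # the CNN detector returns a rectangle plus a confidence score
        x0, y0 = max(r.left(), 0), max(r.top(), 0)
        x1, y1 = min(r.right(), img.shape[1]), min(r.bottom(), img.shape[0])
        faces.append(img[y0:y1, x0:x1])
    return faces  # each crop would then be resized and fed to the generator
```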

GAN-v2

  • FaceSwap_GAN_v2_train.ipynb: Detailed training procedures can be found in this notebook.

    1. Build a GAN model.
    2. Train the GAN from scratch.
    3. (Optional) Detect faces in an image using dlib's CNN model.
    4. (Optional) Use the GAN to transform each detected face into the target face.
    5. (Optional) Use the moviepy module to output a video clip with the swapped face.
  • FaceSwap_GAN_v2_test_img.ipynb: Provides a swap_face() function that requires less VRAM.

    1. Load trained model.
    2. Swap a single face image to the target face.
  • FaceSwap_GAN_v2_test_video.ipynb

    1. Load the trained model.
    2. Detect faces in an image using dlib's CNN model.
    3. Use the GAN to transform each detected face into the target face.
    4. Use the moviepy module to output a video clip with the swapped face.
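
For the video stage, a minimal moviepy skeleton looks roughly like the following; process_frame is a placeholder for the notebook's per-frame detection and swapping logic, and the file paths are assumptions:

```python
from moviepy.editor import VideoFileClip

def process_frame(frame):
    # frame is an RGB numpy array; detect faces, swap them, paste them back
    # (detection and swapping omitted here -- see the notebooks for details)
    return frame

clip = VideoFileClip("input_video.mp4")        # hypothetical input path
swapped = clip.fl_image(process_frame)         # apply the per-frame transform
swapped.write_videofile("swapped_video.mp4", audio=False)
```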

Others

  • dlib_video_face_detection.ipynb

    1. Detect and crop faces in a video using dlib's CNN model.
    2. Pack the cropped face images into a zip file.
  • Training data: Face images should be placed in the ./faceA/ and ./faceB/ folders, one for each target. Face images can be of any size (a minimal loading sketch follows). (Updated Jan. 3, 2018)
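
A minimal loading sketch for the layout above; the 64x64 crop size and the OpenCV-based loading are assumptions, not the repo's exact pipeline:

```python
import glob
import cv2
import numpy as np

def load_faces(folder, size=64):
    # read every file in the folder, skip non-images, and resize to a fixed shape
    images = []
    for path in sorted(glob.glob(f"{folder}/*")):
        img = cv2.imread(path)
        if img is None:          # skip non-image files such as .DS_Store
            continue
        images.append(cv2.resize(img, (size, size)))
    return np.asarray(images)

faces_A = load_faces("./faceA")
faces_B = load_faces("./faceB")
```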

Results

Below are results showing trained models transforming Hinako Sano (佐野ひなこ, left) into Emi Takei (武井咲, right).

1. Autoencoder

Autoencoder based on deepfakes' script. It should be mentioned that the autoencoder (AE) results could be much better if it were trained longer.

(GIF: autoencoder results)

2. Generative Adversarial Network, GAN (version 1)

Improved output resolution: Adversarial loss improves the resolution of generated images. In addition, when perceptual loss is applied, the eyeball movement becomes more realistic and consistent with the input face.

(GIF: GAN with perceptual loss results)

VGGFace (GitHub repo) perceptual loss (PL): The following figure shows the nuanced eyeball direction in outputs of models trained with/without PL.

(Figure: comparison of outputs trained with/without PL)

Smoothed bounding box (smoothed bbox): An exponential moving average of the bounding box position over frames is introduced to eliminate jitter on the swapped face. See the gif below for comparison; a minimal smoothing sketch follows the legend.

(GIF: bounding box smoothing comparison)

  • A. Source face.
  • B. Swapped face, using a smoothing mask (smooths the edges of the output image when pasting it back onto the input image).
  • C. Swapped face, using smoothing mask and face alignment.
  • D. Swapped face, using smoothing mask and smoothed bounding box.
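
A minimal sketch of the smoothing idea: keep an exponential moving average of the detected box coordinates across frames. The smoothing factor here is an assumption, not the repo's exact value:

```python
import numpy as np

class SmoothedBBox:
    def __init__(self, alpha=0.2):
        self.alpha = alpha   # weight given to the newest detection (assumed value)
        self.bbox = None     # running average of (x0, y0, x1, y1)

    def update(self, new_bbox):
        # blend the new detection with the running average to damp jitter
        new_bbox = np.asarray(new_bbox, dtype=np.float64)
        if self.bbox is None:
            self.bbox = new_bbox
        else:
            self.bbox = self.alpha * new_bbox + (1 - self.alpha) * self.bbox
        return self.bbox.astype(int)

# usage per frame: box = smoother.update(detected_bbox)
```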

3. Generative Adversarial Network, GAN (version 2)

Version 1 features: Most of the features in version 1 are inherited, including perceptual loss and the smoothed bbox.

Segmentation mask prediction: The model learns a proper mask that helps handle occlusion, eliminate artifacts at bbox edges, and produce a natural skin tone. A minimal blending sketch follows the figures below.

(Figures: masking results)

  • Left: Source face.
  • Middle: Swapped face, before masking.
  • Right: Swapped face, after masking.
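
A minimal sketch of how such a predicted mask is typically applied, alpha-blending the generator output with the input face; variable names are illustrative, not the repo's:

```python
import numpy as np

def blend_with_mask(input_face, generated_face, mask):
    # mask values in [0, 1]; 1 = take the generated pixel, 0 = keep the input pixel
    mask = np.clip(mask, 0.0, 1.0)
    if mask.ndim == 2:                 # broadcast a single-channel mask over RGB
        mask = mask[..., None]
    return mask * generated_face + (1.0 - mask) * input_face
```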

Mask visualization: The following gif shows the output mask & face bounding box.

(GIF: mask heatmap and face bounding box visualization)

  • Left: Source face.
  • Middle: Swapped face, after masking.
  • Right: Mask heatmap & face bounding box.

Requirements

The notebooks rely on dlib (CNN face detection) and the moviepy module (video output), as referenced in the descriptions above.

Notes:

  1. BatchNorm/InstanceNorm: Caused input/output skin color inconsistency when the two training datasets had different skin color distributions (lighting conditions, shadows, etc.).
  2. Increasing the perceptual loss weighting factor (to 1) destabilized training, but the weighting [.01, .1, .1] I used is not optimal either.
  3. In the encoder architecture, flattening the Conv2D output and shrinking it to Dense(1024) is crucial for the model to learn semantic features, or face representation (a minimal encoder sketch follows these notes). If we used Conv layers only (i.e., a larger dimension), would it learn features like visual descriptors? (source paper, last paragraph of sec. 3.1)
  4. Transforming Emi Takei into Hinako Sano gave suboptimal results, due to imbalanced training data: over 65% of the images of Hinako Sano came from the same video series.
  5. The mixup technique (arXiv) and the least squares loss function (arXiv) are adopted for GAN training. However, I did not run any ablation experiments on them, so I don't know how much impact they had on the outputs.
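
A minimal sketch of the bottleneck described in note 3, written with tensorflow.keras for illustration; the layer counts and filter sizes are assumptions rather than the repo's exact architecture:

```python
from tensorflow.keras import layers, Model

def build_encoder(input_shape=(64, 64, 3)):
    inp = layers.Input(shape=input_shape)
    x = inp
    for filters in (128, 256, 512, 1024):            # downsampling conv stack
        x = layers.Conv2D(filters, 5, strides=2, padding="same")(x)
        x = layers.LeakyReLU(0.1)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(1024)(x)                        # the semantic bottleneck
    x = layers.Dense(4 * 4 * 1024)(x)                # expand back for the decoder
    x = layers.Reshape((4, 4, 1024))(x)
    return Model(inp, x, name="encoder")
```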

Acknowledgments

Code borrows from tjwei and deepfakes. The generative network is adopted from CycleGAN.
