Follow this YouTube tutorial for a walkthrough of the installation process. If you have any questions, feel free to join my Discord and ask there.
Stitch it in Time: GAN-Based Facial Editing of Real Videos
Rotem Tzaban, Ron Mokady, Rinon Gal, Amit Bermano, Daniel Cohen-Or
Abstract:
The ability of Generative Adversarial Networks to encode rich semantics within their latent space has been widely adopted for facial image editing. However, replicating their success with videos has proven challenging. Sets of high-quality facial videos are lacking, and working with videos introduces a fundamental barrier to overcome - temporal coherency. We propose that this barrier is largely artificial. The source video is already temporally coherent, and deviations from this state arise in part due to careless treatment of individual components in the editing pipeline. We leverage the natural alignment of StyleGAN and the tendency of neural networks to learn low frequency functions, and demonstrate that they provide a strongly consistent prior. We draw on these insights and propose a framework for semantic editing of faces in videos, demonstrating significant improvements over the current state-of-the-art. Our method produces meaningful face manipulations, maintains a higher degree of temporal consistency, and can be applied to challenging, high quality, talking head videos which current methods struggle with.
You only need to complete the steps in this setup section once.
- NVIDIA GPU
- Anaconda3 Prompt
- VSBuildTools
Download the code from this repository, then cd into its folder in your Anaconda prompt.
conda create -n STIT python=3.8
conda activate STIT
pip install -r requirements.txt
pip install cmake
pip install dlib
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
pip install git+https://github.com/openai/CLIP.git
conda install -c conda-forge ffmpeg
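After the commands above finish, a quick check like the one below can confirm that the main pieces are visible from the activated environment. This is a minimal sketch and not part of the repository: the helper name check_environment is hypothetical, and it only tests whether the packages can be located and whether ffmpeg is on PATH. It does not verify that CUDA actually works.

```python
import importlib.util
import shutil


def check_environment(packages=("torch", "torchvision", "dlib", "clip")):
    """Report which packages can be located and whether ffmpeg is on PATH.

    Hypothetical helper for illustration; the package list is an assumption
    based on the install commands in this README.
    """
    status = {name: importlib.util.find_spec(name) is not None for name in packages}
    status["ffmpeg"] = shutil.which("ffmpeg") is not None
    return status


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name:12s} {'found' if found else 'MISSING'}")
```

Run it inside the STIT environment. Anything reported as MISSING points at the install step to repeat.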
To use this project, download the pretrained models from the following link.
Unzip the archive inside the project's main directory. The file structure should look like this:
STIT/ # this is root
├── pretrained_models/
│   ├── 79999_iter.pth
│   ├── e4e_ffhq_encode.pt
│   ├── ffhq.pkl
│   ├── shape_predictor_68_face_landmarks.dat
│...
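To confirm the archive was unzipped in the right place, a small check along these lines can help. It is only a sketch: the function name missing_pretrained_files is hypothetical, and it looks only for the four files named in the listing above, not for anything else the archive may contain.

```python
from pathlib import Path

# File names taken from the directory listing in this README.
EXPECTED_FILES = (
    "79999_iter.pth",
    "e4e_ffhq_encode.pt",
    "ffhq.pkl",
    "shape_predictor_68_face_landmarks.dat",
)


def missing_pretrained_files(project_root="."):
    """Return the expected model files that are absent from pretrained_models/."""
    model_dir = Path(project_root) / "pretrained_models"
    return [name for name in EXPECTED_FILES if not (model_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_pretrained_files()
    print("All listed files found." if not missing else f"Missing: {missing}")
```

Run it from the project root. An empty result means the four listed files are where the code expects them.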
Run the AI! This is where you start running the code. Once the environment has been set up, you can start from here every time. Just remember to activate the virtual environment with conda activate STIT and cd into the project directory.
Our code expects videos in the form of a directory with individual frame images. To produce such a directory from an existing video, we recommend using ffmpeg:
ffmpeg -i "video.mp4" "video_frames/out%04d.png"
Create a folder named after your video before running the command, since ffmpeg does not create the output directory for you.
For example:
ffmpeg -i "elons.mp4" "elons/out%04d.png"
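The two steps above (create the folder, then split the video into frames) can be wrapped in a short script. This is a sketch under stated assumptions: the function names build_extract_command and extract_frames are hypothetical, the frames folder is simply the video name without its extension, and ffmpeg is assumed to be on PATH.

```python
import subprocess
from pathlib import Path


def build_extract_command(video_path, pattern="out%04d.png"):
    """Return (frames_dir, ffmpeg argument list) for one video file."""
    video = Path(video_path)
    frames_dir = video.with_suffix("")  # elons.mp4 -> elons
    return frames_dir, ["ffmpeg", "-i", str(video), str(frames_dir / pattern)]


def extract_frames(video_path):
    """Create the frames folder, then run ffmpeg to fill it."""
    frames_dir, command = build_extract_command(video_path)
    frames_dir.mkdir(parents=True, exist_ok=True)  # ffmpeg will not create it
    subprocess.run(command, check=True)
    return frames_dir
```

Calling extract_frames("elons.mp4") is intended to mirror the manual elons example above.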
The videos used to produce our results can be downloaded from the following link.
Inversion needs to be done for every new video. To invert a video, run:
python train.py --input_folder </path/to/images_dir> --output_folder </path/to/experiment_dir> --run_name <RUN_NAME> --num_pti_steps <NUM_STEPS>
This includes aligning, cropping, e4e encoding, and PTI.
For example:
python train.py --input_folder elons --output_folder edits_elons --run_name elons --num_pti_steps 80
Weights & Biases logging is disabled by default. To enable it, add --use_wandb.
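If you invert several videos, it can be convenient to assemble the train.py call programmatically. The sketch below only uses the flags documented above. The function name build_train_command is hypothetical, and the script prints the command rather than running it.

```python
import shlex


def build_train_command(input_folder, output_folder, run_name, num_pti_steps,
                        use_wandb=False):
    """Assemble the train.py invocation from the flags documented in this README."""
    command = [
        "python", "train.py",
        "--input_folder", str(input_folder),
        "--output_folder", str(output_folder),
        "--run_name", run_name,
        "--num_pti_steps", str(num_pti_steps),
    ]
    if use_wandb:
        command.append("--use_wandb")  # logging stays off unless requested
    return command


# Reproduces the elons example from above as a single shell line.
print(shlex.join(build_train_command("elons", "edits_elons", "elons", 80)))
```

The returned list can be passed to subprocess.run from the repository root once you are happy with the printed command.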
To run edits without stitching tuning:
python edit_video.py --input_folder </path/to/images_dir> --output_folder </path/to/experiment_dir> --run_name <RUN_NAME> --edit_name <EDIT_NAME> --edit_range <EDIT_RANGE>
edit_range determines the strength of the edits applied.
It should be in the format RANGE_START RANGE_END RANGE_STEPS.
For example, if we use --edit_range 1 5 2, we will apply edits with strength 1, 3 and 5.
For example, for young Obama use:
python edit_video.py --input_folder /data/obama --output_folder edits/obama/ --run_name obama --edit_name age --edit_range -8 -8 1
To run edits with stitching tuning:
python edit_video_stitching_tuning.py --input_folder </path/to/images_dir> --output_folder </path/to/experiment_dir> --run_name <RUN_NAME> --edit_name <EDIT_NAME> --edit_range <EDIT_RANGE> --outer_mask_dilation <MASK_DILATION>
We support breaking out of the stitching tuning process early, once the loss reaches a specified threshold.
This enables us to perform more iterations for difficult frames while maintaining a reasonable running time.
To use this feature, add --border_loss_threshold THRESHOLD to the command (shown in the Jim example below).
For videos with a simple background to reconstruct (e.g. Obama, Jim, Emma Watson, Kamala Harris), we use THRESHOLD=0.005.
For videos where a more exact reconstruction of the background is required (e.g. Michael Scott), we use THRESHOLD=0.002.
Early breaking is disabled by default.
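The idea behind early breaking can be illustrated with a generic loop. This is a toy sketch, not the repository's implementation: tune_with_early_break and step_fn are hypothetical names, and the halving loss simply stands in for a real optimisation step.

```python
def tune_with_early_break(step_fn, max_steps, border_loss_threshold=None):
    """Call step_fn up to max_steps times, stopping once the loss is small enough.

    step_fn is a hypothetical callable that performs one optimisation step and
    returns the current border loss as a float.
    """
    loss = float("inf")
    steps_done = 0
    for steps_done in range(1, max_steps + 1):
        loss = step_fn()
        # A threshold of None mirrors the default: early breaking is disabled.
        if border_loss_threshold is not None and loss <= border_loss_threshold:
            break
    return steps_done, loss


# Toy loss that halves on every step, standing in for a real optimiser.
state = {"loss": 1.0}


def toy_step():
    state["loss"] /= 2
    return state["loss"]


print(tune_with_early_break(toy_step, max_steps=100, border_loss_threshold=0.1))
```

Easy frames hit the threshold after a few steps, while hard frames keep the full step budget. That is the trade-off described above.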
For young Elon use:
python edit_video_stitching_tuning.py --input_folder elons --output_folder edits/elons/ --run_name elons --edit_name age --edit_range -8 -8 1 --outer_mask_dilation 50
For young Obama use:
python edit_video_stitching_tuning.py --input_folder /data/obama --output_folder edits/obama/ --run_name obama --edit_name age --edit_range -8 -8 1 --outer_mask_dilation 50
For gender editing on Obama use:
python edit_video_stitching_tuning.py --input_folder /data/obama --output_folder edits/obama/ --run_name obama --edit_name gender --edit_range -6 -6 1 --outer_mask_dilation 50
For young Emma Watson use:
python edit_video_stitching_tuning.py --input_folder /data/emma_watson --output_folder edits/emma_watson/ --run_name emma_watson --edit_name age --edit_range -8 -8 1 --outer_mask_dilation 50
For smile removal on Emma Watson use:
python edit_video_stitching_tuning.py --input_folder /data/emma_watson --output_folder edits/emma_watson/ --run_name emma_watson --edit_name smile --edit_range -3 -3 1 --outer_mask_dilation 50
For Emma Watson lipstick editing (done with a StyleCLIP global direction) use:
python edit_video_stitching_tuning.py --input_folder /data/emma_watson --output_folder edits/emma_watson/ --run_name emma_watson --edit_type styleclip_global --edit_name lipstick --neutral_class "Face" --target_class "Face with lipstick" --beta 0.2 --edit_range 10 10 1 --outer_mask_dilation 50
For Old + Young Jim use (with early breaking):
python edit_video_stitching_tuning.py --input_folder datasets/jim/ --output_folder edits/jim --run_name jim --edit_name age --edit_range -8 8 2 --outer_mask_dilation 50 --border_loss_threshold 0.005
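When several edits are applied to the same video, the commands above differ only in a few flags, so they can be generated from a small table. The sketch below is an illustration only: build_stitching_command and EDITS are hypothetical names, only flags documented in this README are used, and the script prints the commands instead of executing them.

```python
import shlex

# (edit_name, edit_range) pairs copied from the Emma Watson examples above.
EDITS = [
    ("age", (-8, -8, 1)),
    ("smile", (-3, -3, 1)),
]


def build_stitching_command(input_folder, output_folder, run_name, edit_name,
                            edit_range, outer_mask_dilation=50,
                            border_loss_threshold=None):
    """Assemble one edit_video_stitching_tuning.py call from documented flags."""
    command = [
        "python", "edit_video_stitching_tuning.py",
        "--input_folder", input_folder,
        "--output_folder", output_folder,
        "--run_name", run_name,
        "--edit_name", edit_name,
        "--edit_range", *(str(value) for value in edit_range),
        "--outer_mask_dilation", str(outer_mask_dilation),
    ]
    if border_loss_threshold is not None:  # early breaking is opt-in
        command += ["--border_loss_threshold", str(border_loss_threshold)]
    return command


for edit_name, edit_range in EDITS:
    print(shlex.join(build_stitching_command(
        "/data/emma_watson", "edits/emma_watson/", "emma_watson",
        edit_name, edit_range)))
```

Each printed line can be pasted into the Anaconda prompt, or the list can be handed to subprocess.run from the repository root.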
My fork edits end here.
Result videos:
- With stitching tuning
- Without stitching tuning
- Gender editing
- Young Emma Watson
- Emma Watson with lipstick
- Emma Watson smile removal
- Old Jim
- Young Jim
- Smiling Kamala Harris
For editing out-of-domain (OOD) videos, some different parameters are required during training.
First, dlib's face detector doesn't detect all animated faces, so we use a different face detector provided by the face_alignment package (enabled with --use_fa).
Second, we reduce the smoothing of the alignment parameters with --center_sigma 0.0.
Third, OOD videos require more training steps, as they are more difficult to invert.
To train, we use:
python train.py --input_folder datasets/ood_spiderverse_gwen/ \
--output_folder training_results/ood \
--run_name ood \
--num_pti_steps 240 \
--use_fa \
--center_sigma 0.0
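To give some intuition for what reducing the smoothing means, the sketch below smooths a per-frame sequence of crop-center coordinates with a Gaussian window. This rests on an assumption: that --center_sigma acts as the standard deviation of a Gaussian filter applied over time. The function smooth_centers is hypothetical and is not taken from the repository.

```python
import math


def smooth_centers(values, sigma):
    """Gaussian-smooth a per-frame sequence; sigma <= 0 returns it unchanged.

    Assumption for illustration: sigma plays the role of --center_sigma.
    """
    if sigma <= 0:
        return list(values)
    radius = max(1, int(round(3 * sigma)))
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-0.5 * (offset / sigma) ** 2) for offset in offsets]
    smoothed = []
    for index in range(len(values)):
        total = 0.0
        norm = 0.0
        for offset, weight in zip(offsets, weights):
            neighbour = index + offset
            if 0 <= neighbour < len(values):  # renormalise at the clip boundaries
                total += weight * values[neighbour]
                norm += weight
        smoothed.append(total / norm)
    return smoothed


centers = [100.0, 100.0, 140.0, 100.0, 100.0]  # a sudden jump in frame 3
print(smooth_centers(centers, 0.0))  # no smoothing: the jump is kept
print(smooth_centers(centers, 1.0))  # smoothing spreads the jump over neighbours
```

With a sigma of zero the crop follows each frame's detection exactly, which is the behaviour the --center_sigma 0.0 setting above asks for.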
Afterwards, editing is performed the same way:
python edit_video.py --input_folder datasets/ood_spiderverse_gwen/ \
--output_folder edits/ood \
--run_name ood \
--edit_name smile \
--edit_range 2 2 1
python edit_video.py --input_folder datasets/ood_spiderverse_gwen/ \
--output_folder edits/ood \
--run_name ood \
--edit_type styleclip_global \
--edit_range 10 10 1 \
--edit_name lipstick \
--target_class 'Face with lipstick'
StyleGAN2-ada model and implementation:
https://github.com/NVlabs/stylegan2-ada-pytorch
Copyright © 2021, NVIDIA Corporation.
Nvidia Source Code License https://nvlabs.github.io/stylegan2-ada-pytorch/license.html
PTI implementation:
https://github.com/danielroich/PTI
Copyright (c) 2021 Daniel Roich
License (MIT) https://github.com/danielroich/PTI/blob/main/LICENSE
LPIPS model and implementation:
https://github.com/richzhang/PerceptualSimilarity
Copyright (c) 2020, Sou Uchida
License (BSD 2-Clause) https://github.com/richzhang/PerceptualSimilarity/blob/master/LICENSE
e4e model and implementation:
https://github.com/omertov/encoder4editing
Copyright (c) 2021 omertov
License (MIT) https://github.com/omertov/encoder4editing/blob/main/LICENSE
StyleCLIP model and implementation:
https://github.com/orpatashnik/StyleCLIP
Copyright (c) 2021 orpatashnik
License (MIT) https://github.com/orpatashnik/StyleCLIP/blob/main/LICENSE
StyleGAN2 Distillation for Feed-forward Image Manipulation - for editing directions:
https://github.com/EvgenyKashin/stylegan2-distillation
Copyright (c) 2019, Yandex LLC
License (Creative Commons NonCommercial) https://github.com/EvgenyKashin/stylegan2-distillation/blob/master/LICENSE
face-alignment Library:
https://github.com/1adrianb/face-alignment
Copyright (c) 2017, Adrian Bulat
License (BSD 3-Clause License) https://github.com/1adrianb/face-alignment/blob/master/LICENSE
face-parsing.PyTorch:
https://github.com/zllrunning/face-parsing.PyTorch
Copyright (c) 2019 zll
License (MIT) https://github.com/zllrunning/face-parsing.PyTorch/blob/master/LICENSE
If you make use of our work, please cite our paper:
@misc{tzaban2022stitch,
title={Stitch it in Time: GAN-Based Facial Editing of Real Videos},
author={Rotem Tzaban and Ron Mokady and Rinon Gal and Amit H. Bermano and Daniel Cohen-Or},
year={2022},
eprint={2201.08361},
archivePrefix={arXiv},
primaryClass={cs.CV}
}