<a href="https://colab.research.google.com/github/happy-jihye/face-vid2vid-demo/blob/main/colab_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Face Vid2Vid Demo

- Paper: One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing (CVPR 2021): [project](https://nvlabs.github.io/face-vid2vid/), [arxiv](https://arxiv.org/abs/2011.15126)
- 👩🏻‍💻 Developer : [Jihye Back](https://github.com/happy-jihye)

This notebook is an unofficial demo web app for `face video2video`.

The code is heavily based on [this repository, created by `zhanglonghao1992`](https://github.com/zhanglonghao1992/One-Shot_Free-View_Neural_Talking_Head_Synthesis) (thank you!😊).

---

In [1]:
#@markdown ## 1. Git clone & download pretrained model (by zhanglonghao1992)
!git clone https://github.com/happy-jihye/One-Shot_Free-View_Neural_Talking_Head_Synthesis.git face-vid2vid-demo

!pip install -q gradio
!pip install face_alignment

%cd face-vid2vid-demo

# download pretrained model (by zhanglonghao1992)
!mkdir ckpt
!gdown https://drive.google.com/uc?id=1ghvzYXdmiCuX5I757id73jWuRLMCzXAX -O ckpt/00000189-checkpoint.pth.tar

Cloning into 'face-vid2vid-demo'...
remote: Enumerating objects: 303, done.
remote: Counting objects: 100% (303/303), done.
remote: Compressing objects: 100% (287/287), done.
remote: Total 303 (delta 160), reused 29 (delta 3), pack-reused 0
Receiving objects: 100% (303/303), 9.00 MiB | 28.90 MiB/s, done.
Resolving deltas: 100% (160/160), done.
  Building wheel for ffmpy (setup.py) ... done
  Building wheel for flask-cachebuster (setup.py) ... done
Collecting face_alignment
  Downloading face_alignment-1.3.5.tar.gz (27 kB)
Building wheels for collected packages: face-alignment
  Building w

In [3]:
#@markdown ## 2. import libraries & load checkpoints

import os
import yaml

import imageio, cv2
from moviepy.editor import *
from skimage.transform import resize
from skimage import img_as_ubyte
import PIL.Image
import face_alignment
from ffhq_align import image_align

from demo import load_checkpoints, make_animation, find_best_frame
import gradio as gr

config = 'config/vox-256-spade.yaml'
checkpoint = 'ckpt/00000189-checkpoint.pth.tar'
gen = 'spade'
cpu = False

generator, kp_detector, he_estimator = load_checkpoints(config_path=config, 
                                                        checkpoint_path=checkpoint, 
                                                        gen=gen, cpu=cpu)
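
`load_checkpoints` needs both the config file and the checkpoint downloaded in step 1, so a failed download only shows up here as a loading error. A minimal sketch of a pre-flight check follows; `missing_files` is a hypothetical helper for illustration and is not part of the repository.

```python
import os

def missing_files(paths):
    """Return the subset of `paths` that do not exist as regular files."""
    return [p for p in paths if not os.path.isfile(p)]

# The two paths used in the cell above.
required = ['config/vox-256-spade.yaml', 'ckpt/00000189-checkpoint.pth.tar']
missing = missing_files(required)
if missing:
    print('Missing files, re-run step 1:', missing)
```

Running this before `load_checkpoints` turns a stack trace into a one-line hint about which file is absent.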


## 3. Inference ✨


In [32]:
#@markdown ### 3.1 Upload Image & Videos
#@markdown - To run inference on your own image, the image must be aligned first.
#@markdown - For details on the alignment, refer to [this repo](https://github.com/happy-jihye/FFHQ-Alignment).

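import shutil
from google.colab import files

# images --------------------------------------------
uploaded_image = list(files.upload().keys())
os.makedirs('asset/raw_image', exist_ok=True)
os.makedirs('asset/aligned_image', exist_ok=True)  # aligned faces are saved here

for img in uploaded_image:
    # shutil.move copes with spaces and parentheses in file names, unlike an unquoted `mv`
    shutil.move(img, os.path.join('asset/raw_image', img))

# image align
landmarks_detector = face_alignment.FaceAlignment(face_alignment.LandmarksType._3D, flip_input=False)

for img_name in uploaded_image:
    if img_name == '.ipynb_checkpoints': continue
    raw_img_path = os.path.join('./asset/raw_image', img_name)

    # get_landmarks returns None when no face is detected
    all_landmarks = landmarks_detector.get_landmarks(raw_img_path) or []
    if not all_landmarks:
        print(f'No face detected in {img_name}; no aligned image was saved')
    # if several faces are detected, the last one overwrites the earlier ones
    for face_landmarks in all_landmarks:
        aligned_face_path = os.path.join('./asset/aligned_image', img_name)
        result_img = image_align(raw_img_path, face_landmarks)
        result_img.save(aligned_face_path, 'PNG')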

# video --------------------------------------------
uploaded_video = list(files.upload().keys())


Saving asdfasdf.jpg to asdfasdf.jpg
Saving art_15570315345037_b62896.jpg to art_15570315345037_b62896.jpg
Saving 430656_545497_2640.jpg to 430656_545497_2640.jpg


Saving 5.mp4 to 5 (2).mp4
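
The log above shows that Colab can rename a re-uploaded file to something like `5 (2).mp4`. A name with spaces and parentheses is split into several arguments (or rejected) by an unquoted shell `mv`. A minimal sketch of a shell-free move follows; `move_into` is a hypothetical helper for illustration, not part of the repository.

```python
import os
import shutil

def move_into(filename, target_dir):
    """Move `filename` into `target_dir` (created if needed) and return the new path."""
    os.makedirs(target_dir, exist_ok=True)
    destination = os.path.join(target_dir, os.path.basename(filename))
    shutil.move(filename, destination)
    return destination
```

For example, `move_into('5 (2).mp4', 'asset/raw_video')` would work regardless of the characters in the file name; the `asset/raw_video` directory is only an illustrative choice.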


In [71]:
#@markdown ### 3.2 Generated Video

source_list = [f'./asset/aligned_image/{img}' for img in uploaded_image]
driving = uploaded_video[0]

# saving path
os.makedirs('asset/output', exist_ok=True)

find_best_frame_ = True #@param {type:"boolean"}
free_view = False #@param {type:"boolean"}
yaw = 0 #@param {type:"slider", min:-90, max:90, step:1}
pitch = 0 #@param {type:"slider", min:-90, max:90, step:1}
roll = 0 #@param {type:"slider", min:-90, max:90, step:1}

cpu = False
best_frame = None
relative = True #@param {type:"boolean"}
adapt_scale = True #@param {type:"boolean"}
estimate_jacobian = False #@param {type:"boolean"}


# driving
reader = imageio.get_reader(driving)
fps = reader.get_meta_data()['fps']
driving_video = []

try:
  for im in reader:
    driving_video.append(im)
except RuntimeError:
  pass
reader.close()
driving_video = [resize(frame, (256, 256))[..., :3] for frame in driving_video]

source_images = []
final = []
for idx, source in enumerate(source_list):
    print(source)
    # source 
    source_image = cv2.imread(source)
    source_image = cv2.cvtColor(source_image,cv2.COLOR_BGR2RGB)
    source_image = resize(source_image, (256, 256))[..., :3]
    source_images.append(source_image)
    
    # inference
    if find_best_frame_ or best_frame is not None:
        i = best_frame if best_frame is not None else find_best_frame(source_image, driving_video, cpu=cpu)
        print ("Best frame: " + str(i))
        driving_forward = driving_video[i:]
        driving_backward = driving_video[:(i+1)][::-1]
        predictions_forward = make_animation(source_image, driving_forward, generator, kp_detector, he_estimator, relative=relative, adapt_movement_scale=adapt_scale, estimate_jacobian=estimate_jacobian, cpu=cpu, free_view=free_view, yaw=yaw, pitch=pitch, roll=roll)
        predictions_backward = make_animation(source_image, driving_backward, generator, kp_detector, he_estimator, relative=relative, adapt_movement_scale=adapt_scale, estimate_jacobian=estimate_jacobian, cpu=cpu, free_view=free_view, yaw=yaw, pitch=pitch, roll=roll)
        predictions = predictions_backward[::-1] + predictions_forward[1:]
    else:
        predictions = make_animation(source_image, driving_video, generator, kp_detector, he_estimator, relative=relative, adapt_movement_scale=adapt_scale, estimate_jacobian=estimate_jacobian, cpu=cpu, free_view=free_view, yaw=yaw, pitch=pitch, roll=roll)

    # pass fps so the output plays at the same speed as the driving video
    imageio.mimsave(f'asset/output/{idx}.mp4', [img_as_ubyte(frame) for frame in predictions], fps=fps)
    final.append(predictions)
    

./asset/aligned_image/asdfasdf.jpg


51it [00:19,  2.56it/s]


Best frame: 37


100%|██████████| 14/14 [00:14<00:00,  1.02s/it]
100%|██████████| 38/38 [00:38<00:00,  1.01s/it]


./asset/aligned_image/art_15570315345037_b62896.jpg


51it [00:19,  2.56it/s]


Best frame: 5


100%|██████████| 46/46 [00:46<00:00,  1.01s/it]
100%|██████████| 6/6 [00:06<00:00,  1.01s/it]


./asset/aligned_image/430656_545497_2640.jpg


51it [00:19,  2.56it/s]


Best frame: 36


100%|██████████| 15/15 [00:15<00:00,  1.01s/it]
100%|██████████| 37/37 [00:37<00:00,  1.01s/it]
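
When `find_best_frame_` is enabled, the cell animates outward from the best frame `i` in both directions and then stitches the two halves back together. The index arithmetic can be checked on plain integers. The sketch below uses a stand-in for `make_animation` that returns its input unchanged (an assumption for illustration; the real function returns generated frames), and `stitch_from_best_frame` is a hypothetical name.

```python
def stitch_from_best_frame(frames, i, animate=lambda seq: list(seq)):
    """Animate forward and backward from index `i`, then restore the original order."""
    forward = animate(frames[i:])             # frames i, i+1, ..., n-1
    backward = animate(frames[:i + 1][::-1])  # frames i, i-1, ..., 0
    # Reverse the backward half and drop the duplicated frame i from the forward half.
    return backward[::-1] + forward[1:]

frames = list(range(51))
stitched = stitch_from_best_frame(frames, 37)
print(len(stitched), stitched == frames)
```

With 51 driving frames and best frame 37, the forward pass covers 14 frames and the backward pass 38, which matches the two progress bars of the first run above.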


In [72]:
#@markdown ### 3.3 Make & Show Generated Video

# display
# https://github.com/tg-bomze/Face-Image-Motion-Model

import matplotlib.pyplot as plt
import base64
import numpy as np
import matplotlib.animation as animation

placeholder_bytes = base64.b64decode('iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mP8/x8AAwMCAO+ip1sAAAAASUVORK5CYII=')
placeholder_image = imageio.imread(placeholder_bytes, '.png')
placeholder_image = resize(placeholder_image, (256, 256))[..., :3]

def display(source, driving, generated=None):
    fig = plt.figure(figsize=(8 + 4 * (generated is not None), 6))
    ims = []
    for i in range(len(driving)):
        cols = [[placeholder_image], []]
        for sourceitem in source:
            cols[0].append(sourceitem)
        cols[1].append(driving[i])
        if generated is not None:
            for generateditem in generated:
                cols[1].append(generateditem[i])

        endcols = []
        for thiscol in cols:
            endcols.append(np.concatenate(thiscol, axis=1))

        im = plt.imshow(np.vstack(endcols), animated=True) # np.concatenate(cols[0], axis=1)
        plt.axis('off')
        ims.append([im])
    ani = animation.ArtistAnimation(fig, ims, interval=50, repeat_delay=1000)
    plt.close()
    return ani

final_video = display(source_images, driving_video, final)
final_video.save(f'asset/output/final.mp4', fps=fps)

from IPython.display import HTML
from base64 import b64encode
with open('asset/output/final.mp4', 'rb') as video_file:
    mp4 = video_file.read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML("""
<video width=600 controls>
      <source src="%s" type="video/mp4">
</video>
""" % data_url)

---
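
The display cell in section 3.3 embeds the rendered MP4 directly in the notebook by encoding the file as a base64 `data:` URL. A minimal sketch of that step in isolation follows; `video_tag` is a hypothetical helper, and the bytes in the usage line are dummy data rather than a real video.

```python
from base64 import b64encode

def video_tag(video_bytes, width=600):
    """Return an HTML <video> element with the bytes inlined as a base64 data URL."""
    data_url = 'data:video/mp4;base64,' + b64encode(video_bytes).decode('ascii')
    return (f'<video width={width} controls>'
            f'<source src="{data_url}" type="video/mp4"></video>')

# Dummy bytes, only to show the shape of the generated markup.
print(video_tag(b'\x00\x01\x02')[:60])
```

Because the whole file is inlined into the page, this approach suits short clips; base64 encoding makes the payload roughly a third larger than the file itself.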

In [6]:
#@markdown ## 4. Run Gradio App ✨


def inference(source,
              driving,
              find_best_frame_ = False,
              free_view = False,
              yaw = None,
              pitch = None,
              roll = None,
              
              # the seven Gradio inputs are passed positionally,
              # so the remaining options must come after them
              output_name = 'output.mp4',
              audio = True,
              cpu = False,
              best_frame = None,
              relative = True,
              adapt_scale = True,
              ):

    # source 
    source_image = resize(source, (256, 256))
    
    # driving
    reader = imageio.get_reader(driving)
    fps = reader.get_meta_data()['fps']
    driving_video = []
    try:
        for im in reader:
            driving_video.append(im)
    except RuntimeError:
        pass
    reader.close()

    driving_video = [resize(frame, (256, 256))[..., :3] for frame in driving_video]
    
    with open(config) as f:
        config_ = yaml.safe_load(f)  # yaml.load without a Loader is rejected by PyYAML 6
    estimate_jacobian = config_['model_params']['common_params']['estimate_jacobian']
    print(f'estimate jacobian: {estimate_jacobian}')

    if find_best_frame_ or best_frame is not None:
        i = best_frame if best_frame is not None else find_best_frame(source_image, driving_video, cpu=cpu)
        print ("Best frame: " + str(i))
        driving_forward = driving_video[i:]
        driving_backward = driving_video[:(i+1)][::-1]
        predictions_forward = make_animation(source_image, driving_forward, generator, kp_detector, he_estimator, relative=relative, adapt_movement_scale=adapt_scale, estimate_jacobian=estimate_jacobian, cpu=cpu, free_view=free_view, yaw=yaw, pitch=pitch, roll=roll)
        predictions_backward = make_animation(source_image, driving_backward, generator, kp_detector, he_estimator, relative=relative, adapt_movement_scale=adapt_scale, estimate_jacobian=estimate_jacobian, cpu=cpu, free_view=free_view, yaw=yaw, pitch=pitch, roll=roll)
        predictions = predictions_backward[::-1] + predictions_forward[1:]
    else:
        predictions = make_animation(source_image, driving_video, generator, kp_detector, he_estimator, relative=relative, adapt_movement_scale=adapt_scale, estimate_jacobian=estimate_jacobian, cpu=cpu, free_view=free_view, yaw=yaw, pitch=pitch, roll=roll)
    
    # save video
    output_path = 'asset/output'
    os.makedirs(output_path, exist_ok=True)
    
    imageio.mimsave(f'{output_path}/{output_name}', [img_as_ubyte(frame) for frame in predictions], fps=fps)
    
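    if audio:
        # attach the driving video's audio track to the generated clip
        driving_clip = VideoFileClip(driving)
        videoclip = VideoFileClip(f'{output_path}/{output_name}')
        videoclip.audio = driving_clip.audio
        # str.strip('.mp4') removes characters, not a suffix; splitext drops only the extension
        output_name = f'{os.path.splitext(output_name)[0]}_audio.mp4'
        videoclip.write_videofile(f'{output_path}/{output_name}')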
        
    return f'{output_path}/{output_name}'



import gradio as gr

iface = gr.Interface(
    inference, # main function
    inputs = [ 
        gr.inputs.Image(shape=(256, 256), label='Source Image'), # source image
        gr.inputs.Video(label='Driving Video', type='mp4'), # driving video
        
        gr.inputs.Checkbox(label="find best frame", default=False), 
        gr.inputs.Checkbox(label="free view", default=False), 
        gr.inputs.Slider(minimum=-90, maximum=90, default=0, label="yaw"),
        gr.inputs.Slider(minimum=-90, maximum=90, default=0, label="pitch"),
        gr.inputs.Slider(minimum=-90, maximum=90, default=0, label="roll"),
        
    ],
    outputs = [
        gr.outputs.Video(label='result') # generated video
    ], 
    
    title = 'Face Vid2Vid Demo',
    description = "This app is an unofficial demo web app for face video2video. The code is heavily based on the repo created by zhanglonghao1992.",
    )
iface.launch()

Colab notebook detected. To show errors in colab notebook, set `debug=True` in `launch()`
Running on public URL: https://29521.gradio.app

This share link will expire in 72 hours. To get longer links, send an email to: support@gradio.app


(<Flask 'gradio.networking'>,
 'http://127.0.0.1:7860/',
 'https://29521.gradio.app')
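
Gradio calls the wrapped function with the values of the `inputs` components as positional arguments, in the order they are listed. One way to check which parameter each component feeds is to bind dummy values with `inspect.signature`. In the sketch below, `demo_fn` is a hypothetical stand-in whose first seven parameters mirror the seven UI inputs (it is not the notebook's actual function), and `bind_inputs` is likewise a hypothetical helper.

```python
import inspect

def demo_fn(source, driving, find_best_frame_=False, free_view=False,
            yaw=None, pitch=None, roll=None, output_name='output.mp4'):
    """Stand-in with the same leading parameters as the seven UI inputs."""

def bind_inputs(fn, values):
    """Map positional `values` to the parameter names of `fn`."""
    bound = inspect.signature(fn).bind(*values)
    return dict(bound.arguments)

# Dummy values in the order: image, video, two checkboxes, three sliders.
ui_values = ['image', 'video.mp4', True, False, 10, 0, 0]
mapping = bind_inputs(demo_fn, ui_values)
print(mapping)
```

Any parameter that the UI does not supply, such as `output_name` here, has to come after the UI-fed parameters and carry a default; otherwise a checkbox or slider value ends up in the wrong argument.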