# Introduction to SadTalker on Google Colab

Welcome to the SadTalker interactive notebook on Google Colab! SadTalker is an AI tool that generates a talking-head video from a single still image and an audio clip. It predicts realistic 3D facial motion (expressions and head movement) from the audio and uses that motion to animate the image so that it matches the speech or singing in the clip. Here’s what you need to know to get started:

## How SadTalker Works

Creating a realistic talking head video is tricky. Previous attempts often resulted in stiff facial expressions, awkward movements, or the character looking slightly different from the original photo.

SadTalker tackles these challenges by learning how faces move in three dimensions (3D) when they talk. Instead of just copying the 2D movements seen in videos, it understands how the head tilts, turns, and how each part of the face moves while speaking.

- **3D Motion Coefficients**: SadTalker analyzes the audio and predicts the 3D motion coefficients (head pose and facial expressions) that match the speech.
- **ExpNet**: This component focuses on getting the expressions right by learning from both the audio and how faces move in 3D.
- **PoseVAE**: This component generates realistic head movements that match the speaker’s style and emotion in the audio.
- **3D-Aware Face Render**: After predicting all the movements, SadTalker maps these 3D motion coefficients to the 3D keypoints used by its face renderer, which turns them into a smooth, natural-looking video.
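
To make the data flow more concrete, here is a schematic sketch of how per-frame motion could be organized before rendering. It is not SadTalker's code. `exp_net_stub` and `pose_vae_stub` are hypothetical placeholders that return dummy values instead of predicting from audio. The sizes (25 frames per second, 64 expression coefficients, and 6 head-pose values for rotation and translation) are assumptions based on the 3DMM setup described in the paper.

```python
from dataclasses import dataclass
import random

FPS = 25        # assumption: output video frame rate
EXP_DIM = 64    # assumption: expression coefficients per frame
POSE_DIM = 6    # assumption: 3 rotation + 3 translation values per frame


@dataclass
class FrameMotion:
    expression: list  # EXP_DIM floats driving the facial expression
    pose: list        # POSE_DIM floats describing the head pose


def num_frames(audio_seconds: float, fps: int = FPS) -> int:
    # One set of motion coefficients per output video frame
    return round(audio_seconds * fps)


def exp_net_stub(n_frames: int) -> list:
    # Hypothetical stand-in for ExpNet: the real network predicts these from audio
    return [[0.0] * EXP_DIM for _ in range(n_frames)]


def pose_vae_stub(n_frames: int, style_seed: int = 0) -> list:
    # Hypothetical stand-in for PoseVAE: the seed mimics choosing a head-motion style
    rng = random.Random(style_seed)
    return [[rng.gauss(0.0, 0.01) for _ in range(POSE_DIM)] for _ in range(n_frames)]


def generate_motion(audio_seconds: float, style_seed: int = 0) -> list:
    n = num_frames(audio_seconds)
    expressions = exp_net_stub(n)
    poses = pose_vae_stub(n, style_seed)
    # A face renderer would turn each FrameMotion into one video frame
    return [FrameMotion(e, p) for e, p in zip(expressions, poses)]


motion = generate_motion(2.0)
print(len(motion), len(motion[0].expression), len(motion[0].pose))  # 50 64 6
```

The point of the sketch is the shape of the intermediate data: one expression vector and one pose vector per frame. This separation is what lets expression (ExpNet) and head motion (PoseVAE) be learned by different components.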

Read more about SadTalker here:
[arxiv](https://arxiv.org/abs/2211.12194) | [project](https://sadtalker.github.io) | [Github](https://github.com/Winfredy/SadTalker)


## What to Expect
In this notebook, you will:

1. Select an image: Use an example image provided by SadTalker or upload your own.
2. Choose your driven audio: Use one of the provided example audio clips or upload your own audio file. We recommend starting with about one minute of audio to keep processing time and resource use manageable.
3. Generate your SadTalker animation: Run the animation process to bring your image to life with your chosen audio.
4. Download your video: Once the animation is complete, you can download the resulting video to your device.
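
Before uploading your own audio, you can check how long it is and, if needed, trim it to about one minute. The following is a minimal sketch using Python's standard `wave` module. It assumes an uncompressed PCM `.wav` file. `wav_duration_seconds` and `trim_wav` are hypothetical helper names, not part of SadTalker.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    # Duration = number of sample frames / frames per second
    with wave.open(path, 'rb') as wf:
        return wf.getnframes() / wf.getframerate()


def trim_wav(src: str, dst: str, max_seconds: float = 60.0) -> float:
    # Copy at most max_seconds of audio from src to dst; returns the new duration
    with wave.open(src, 'rb') as reader:
        params = reader.getparams()
        n_keep = min(reader.getnframes(), int(max_seconds * reader.getframerate()))
        frames = reader.readframes(n_keep)
    with wave.open(dst, 'wb') as writer:
        writer.setparams(params)  # the frame count in the header is corrected on close
        writer.writeframes(frames)
    return n_keep / params.framerate


# Usage (file names are placeholders):
# if wav_duration_seconds('my_audio.wav') > 60:
#     trim_wav('my_audio.wav', 'my_audio_60s.wav')
```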

## Using the Notebook
Follow the step-by-step instructions provided within the notebook. Initially, you'll select an image and an audio file. The notebook interface is interactive, allowing you to choose from provided examples or upload your own. When you're ready, you'll execute the code cells to run the SadTalker animation process.

## Important Notes
- **GPU Time Limitation**: Google Colab offers free access to GPUs, but with usage limits. The GPU runtime can disconnect if you exceed these limits or stay inactive for too long. Files stored in the runtime, including generated videos, are lost when it disconnects, so download or back up your results frequently.
  
- **Consequences of Inactivity**: If you’re inactive in the Colab notebook (i.e., not running any cells or interacting with the notebook), you may be disconnected from the GPU runtime. To avoid losing progress, ensure you're actively using the notebook or save your work regularly.

- **Mindful Usage**: Given the limited GPU resources, please be considerate and use the resources judiciously. Lengthy or excessive use may prevent others from accessing these shared resources.
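
Files in the runtime disappear when it disconnects, so one option is to copy finished videos somewhere persistent, such as your Google Drive. The sketch below is hypothetical (`backup_videos` is not part of the notebook). It copies every `.mp4` from a results folder into a destination folder using `shutil`. In Colab you would first mount Drive with `google.colab.drive.mount('/content/gdrive')`. The destination folder in the usage comment is an assumption.

```python
import glob
import os
import shutil


def backup_videos(results_dir: str, dest_dir: str) -> list:
    # Copy every .mp4 in results_dir into dest_dir, creating dest_dir if needed
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for src in sorted(glob.glob(os.path.join(results_dir, '*.mp4'))):
        dst = os.path.join(dest_dir, os.path.basename(src))
        shutil.copy2(src, dst)  # copy2 also preserves file timestamps
        copied.append(dst)
    return copied


# Usage in Colab (destination folder name is an assumption):
# from google.colab import drive
# drive.mount('/content/gdrive')
# backup_videos('/content/SadTalker/results', '/content/gdrive/MyDrive/SadTalker_results')
```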

## Ethical Considerations

As you explore the capabilities of SadTalker, it's important to keep ethical considerations in mind:

- **Consent**: Always have permission to use someone’s likeness and voice when creating talking head videos. Never use someone's image or voice without their explicit consent.
- **Avoidance of Deception**: Be particularly careful to avoid using voice conversion to create deepfake audio or any form of media meant to deceive, manipulate, or mislead audiences.
- **Respect and Dignity**: Ensure that the content created with SadTalker respects the dignity of the individuals whose images and voices are being used. Avoid creating content that could be considered defamatory, discriminatory, or offensive.
- **Deepfakes and Manipulation**: Be aware of the implications of creating deepfakes, that is, videos that could be used to deceive viewers. Always clearly label the content created with SadTalker as synthetic media.
- **Legal Compliance**: Abide by all relevant laws and regulations regarding synthetic media, intellectual property rights, and personal privacy.
- **Responsible Sharing**: When sharing content created with SadTalker, consider the impact it may have on the subjects depicted and society at large. Share responsibly and with a clear indication of the video's synthetic nature.
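
One concrete way to label a result as synthetic is to write a note into the video file's metadata with ffmpeg, which the notebook installs during setup. The following is a minimal sketch, and `label_as_synthetic` is a hypothetical helper. Metadata is not visible while the video plays, so also keep a visible label or caption when you share the video.

```python
import subprocess


def build_label_command(src: str, dst: str, note: str) -> list:
    # -metadata sets a container-level tag; -c copy avoids re-encoding audio and video
    return ['ffmpeg', '-y', '-i', src, '-metadata', f'comment={note}', '-c', 'copy', dst]


def label_as_synthetic(src: str, dst: str,
                       note: str = 'Synthetic media generated with SadTalker') -> None:
    # Runs ffmpeg and raises CalledProcessError if it fails
    subprocess.run(build_label_command(src, dst, note), check=True)


# Usage (paths are placeholders):
# label_as_synthetic('results/example.mp4', 'results/example_labeled.mp4')
```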



## Installation (around 5 mins)

In [None]:
#@title Install Dependencies
# Required Libraries

import glob
import io
import os
import subprocess
import time
from base64 import b64encode
from datetime import datetime

import ipywidgets as widgets
import matplotlib.pyplot as plt
from IPython.display import HTML, display, clear_output

def start_time():
  start = time.time()
  current_time = datetime.now()
  print("Current time:", current_time.strftime("%Y-%m-%d %H:%M:%S"))
  return start

def elapsed_time(start):
  print("Time (mins) it took to run this cell: ", round((time.time()- start)/60,2))


def setup_environment():
    # Install Python 3.8 first, then register it with update-alternatives
    # (-y answers apt's confirmation prompt, which a subprocess cannot answer interactively)
    subprocess.run(['apt-get', 'update'])
    subprocess.run(['apt-get', 'install', '-y', 'software-properties-common'])
    subprocess.run(['sudo', 'apt-get', 'install', '-y', 'python3.8', 'python3.8-distutils'])
    subprocess.run(['sudo', 'update-alternatives', '--install', '/usr/local/bin/python3', 'python3', '/usr/bin/python3.8', '2'])
    subprocess.run(['sudo', 'update-alternatives', '--install', '/usr/local/bin/python3', 'python3', '/usr/bin/python3.9', '1'])
    python_version = subprocess.check_output(['python', '--version'], universal_newlines=True)
    print(python_version)
    subprocess.run(['sudo', 'dpkg', '--remove', '--force-remove-reinstreq', 'python3-pip', 'python3-setuptools', 'python3-wheel'])
    subprocess.run(['apt-get', 'install', '-y', 'python3-pip'])

    print('Git clone project and install requirements...')
    # Clone only once, so re-running the cell does not nest SadTalker/SadTalker
    if not os.path.isdir('/content/SadTalker'):
        subprocess.run(['git', 'clone', 'https://github.com/artificialnouveau/SadTalker', '/content/SadTalker'], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    os.chdir('/content/SadTalker')
    os.environ['PYTHONPATH'] = '/content/SadTalker:' + os.environ.get('PYTHONPATH', '')
    subprocess.run(['python3.8', '-m', 'pip', 'install', 'torch==1.12.1+cu113', 'torchvision==0.13.1+cu113', 'torchaudio==0.12.1', '--extra-index-url', 'https://download.pytorch.org/whl/cu113'])
    subprocess.run(['apt-get', 'install', '-y', 'ffmpeg'], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    subprocess.run(['python3.8', '-m', 'pip', 'install', '-r', 'requirements.txt'])
    print('Setup complete.')



# Function to display video with a download link
def display_video_with_download_link(video_path):
    # Read the video file; the with-block closes it afterwards
    with open(video_path, 'rb') as f:
        video_data = f.read()

    # Encode the video data in base64
    video_base64 = b64encode(video_data).decode()

    # Create a download link that keeps the original file name
    download_name = os.path.basename(video_path)
    download_link = f'<a href="data:video/mp4;base64,{video_base64}" download="{download_name}">Download Video</a>'

    # Display the video and download link
    video_html = f'''
    <video width="640" controls>
        <source src="data:video/mp4;base64,{video_base64}" type="video/mp4">
    </video>
    <br>{download_link}
    '''
    display(HTML(video_html))



def create_image_audio_selector(img_list, audio_list):
    # Define a function to handle image and audio selection.
    # It writes to notebook-level globals so later cells always see the
    # current dropdown values, not only the defaults from the first call.
    def select_image_audio(change):
        global selected_img, selected_audio
        selected_img = 'examples/source_image/{}.png'.format(example_images.value)
        selected_audio = 'examples/driven_audio/{}.wav'.format(example_audio.value)

        # Clear the output area, then show the image and the selected paths inside it
        output.clear_output(wait=True)
        with output:
            plt.imshow(plt.imread(selected_img))
            plt.axis('off')
            plt.show()
            print(f'Selected Image: {selected_img}')
            print(f'Selected Audio: {selected_audio}')

    # Create an output widget for displaying the image and audio information
    output = widgets.Output()

    # Create a dropdown for selecting example images
    example_images = widgets.Dropdown(
        options=img_list,
        value='full3',
        description='Select Image:',
        disabled=False,
    )

    # Create a dropdown for selecting example audio
    example_audio = widgets.Dropdown(
        options=audio_list,
        value='RD_Radio31_000',
        description='Select Audio:',
        disabled=False,
    )

    # Update the selection whenever either dropdown changes
    example_images.observe(select_image_audio, names='value')
    example_audio.observe(select_image_audio, names='value')

    # Display widgets
    display(example_images)
    display(example_audio)
    display(output)

    # Initial call to select_image_audio to display the default values
    select_image_audio(None)

    return selected_img, selected_audio


def create_upload_widgets():
    selected_img = None
    selected_audio = None
    output = widgets.Output()
    progress_bar_audio = widgets.IntProgress(description='Uploading Audio:', max=100, style={'bar_color': 'green'})  # Progress bar for audio only

    # Function to handle image upload
    def on_img_upload(change):
        nonlocal selected_img
        # Process the uploaded image file
        for name, file_info in uploader_img.value.items():
            selected_img = f"/content/{name}"
            with open(selected_img, "wb") as output_file:
                output_file.write(file_info['content'])
            display_img(selected_img)  # Display the uploaded image

    # Function to handle audio upload
    def on_audio_upload(change):
        nonlocal selected_audio
        # Display the progress bar for audio upload
        progress_bar_audio.value = 0  # Reset progress bar
        display(progress_bar_audio)

        # Process the uploaded audio file
        for name, file_info in uploader_audio.value.items():
            selected_audio = f"/content/{name}"
            with open(selected_audio, "wb") as output_file:
                file_size = len(file_info['content'])
                # Write in ~1% chunks; max(1, ...) avoids a zero step for files under 100 bytes
                chunk_size = max(1, file_size // 100)
                for i in range(0, file_size, chunk_size):
                    output_file.write(file_info['content'][i:i+chunk_size])
                    progress_bar_audio.value = int(i / file_size * 100)
                progress_bar_audio.value = 100  # Mark as complete
            with output:
                clear_output(wait=True)
                print(f'Selected Audio: {selected_audio}')  # Display the selected audio

    # Function to display the selected image
    def display_img(file_path):
        if file_path and os.path.isfile(file_path):
            img = plt.imread(file_path)
            plt.imshow(img)
            plt.axis('off')
            plt.show()
            print(f'Selected Image: {file_path}')
        else:
            print("No image selected or file path is invalid.")

    # Create file upload widget for images
    uploader_img = widgets.FileUpload(
        accept='.png',  # Accept only .png files
        multiple=False  # Allow only one file to be uploaded
    )
    uploader_img.observe(on_img_upload, names='value')

    # Create file upload widget for audio
    uploader_audio = widgets.FileUpload(
        accept='.wav',  # Accept only .wav files
        multiple=False  # Allow only one file to be uploaded
    )
    uploader_audio.observe(on_audio_upload, names='value')

    # Display widgets
    display(widgets.Label('Upload an image (.png):'))
    display(uploader_img)
    display(widgets.Label('Upload an audio file (.wav):'))
    display(uploader_audio)
    display(output)  # The output is displayed only once

    return uploader_img, uploader_audio
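
When widgets and callbacks are involved, keep in mind that a function's `return` value is fixed at the moment it returns. Changes that a callback makes later do not flow back into variables the caller already unpacked. The following widget-free sketch shows this pitfall and one workaround: a small mutable holder that later cells read from. The `Selection` class and `make_selector` function are hypothetical and only illustrate the idea.

```python
class Selection:
    # Mutable holder: callbacks update its attributes, later code reads them
    def __init__(self, img=None, audio=None):
        self.img = img
        self.audio = audio


def make_selector(default_img: str, default_audio: str):
    selection = Selection(default_img, default_audio)

    def on_change(new_img: str) -> None:
        # Stand-in for a widget callback such as a Dropdown observer
        selection.img = new_img

    # Pitfall: this tuple is a snapshot taken now and never changes later
    snapshot = (selection.img, selection.audio)
    return selection, snapshot, on_change


selection, snapshot, on_change = make_selector('full3', 'RD_Radio31_000')
on_change('full1')          # simulate the user picking another image
print(snapshot[0])          # full3  (stale copy)
print(selection.img)        # full1  (current choice)
```

With ipywidgets, `on_change` would be called from an observer registered with `.observe(..., names='value')`, passing `change['new']`.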

In [None]:
### Make sure a GPU is enabled: Edit -> Notebook settings -> Hardware accelerator -> GPU
!nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader
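
If you prefer to check the GPU from Python (for example, to stop early when no GPU is attached), you can run the same `nvidia-smi` query with `subprocess` and parse its CSV output. The following is a minimal sketch. `query_gpus` and `parse_gpu_line` are hypothetical helpers, and the sample line in the usage is illustrative rather than captured output.

```python
import subprocess


def parse_gpu_line(line: str) -> dict:
    # With the flags used above, each line looks like "<name>, <total> MiB, <free> MiB"
    name, total, free = (part.strip() for part in line.split(','))
    return {
        'name': name,
        'total_mib': int(total.split()[0]),
        'free_mib': int(free.split()[0]),
    }


def query_gpus() -> list:
    # Returns an empty list when nvidia-smi is missing or fails (no GPU attached)
    try:
        out = subprocess.run(
            ['nvidia-smi', '--query-gpu=name,memory.total,memory.free',
             '--format=csv,noheader'],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return []
    return [parse_gpu_line(row) for row in out.splitlines() if row.strip()]


print(parse_gpu_line('Tesla T4, 15360 MiB, 15101 MiB'))
# {'name': 'Tesla T4', 'total_mib': 15360, 'free_mib': 15101}
```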

In [None]:
# This step will take 4-5 minutes
start = start_time()
setup_environment()
print('Download pre-trained models...')
!rm -rf checkpoints
!bash scripts/download_models.sh
clear_output(wait=True)
elapsed_time(start)
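
After the download finishes, a quick check that the model folders exist and are not empty can save you from a failed run later. The following is a minimal sketch. The folder names `checkpoints` and `gfpgan/weights` are assumptions about where `scripts/download_models.sh` saves its files, so adjust them if your copy of the script differs. `check_model_dirs` is a hypothetical helper.

```python
import os


def check_model_dirs(dirs=('checkpoints', 'gfpgan/weights')) -> list:
    # Return the folders that are missing or contain no files
    problems = []
    for d in dirs:
        if not os.path.isdir(d) or not os.listdir(d):
            problems.append(d)
    return problems


missing = check_model_dirs()
if missing:
    print('Missing or empty model folders:', missing)
else:
    print('Model folders look OK.')
```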

## Select your source images and driven audio



1. **Source Image:** The source image is a single still image of a person's face (a PNG file in this notebook). It serves as the visual reference for the animation: SadTalker animates this image directly, so that the person in it appears to speak or sing. No separate target video is required.

2. **Driven Audio:** The driven audio is the audio you want the animated face to follow (a WAV file in this notebook). It can be a recording of someone speaking, singing, or any other vocal performance. SadTalker analyzes the audio and uses it to generate the mouth movements, facial expressions, and head motion of the animated face, so that it appears to speak or sing in sync with the audio.

Together, these inputs produce a video in which the person in the source image appears to deliver the content of the driven audio, whether a spoken message or a musical performance.
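
The animation cells expect a PNG source image and a WAV driven audio file, so it can help to validate both paths before starting a run. The following is a minimal sketch. `check_inputs` is a hypothetical helper, and its extension checks only mirror the file types this notebook's widgets and example lists use.

```python
import os


def check_inputs(image_path, audio_path) -> list:
    # Collect human-readable problems instead of failing on the first one
    problems = []
    for label, path, ext in (('Source image', image_path, '.png'),
                             ('Driven audio', audio_path, '.wav')):
        if not path:
            problems.append(f'{label}: nothing selected')
        elif not os.path.isfile(path):
            problems.append(f'{label}: file not found ({path})')
        elif not path.lower().endswith(ext):
            problems.append(f'{label}: expected a {ext} file ({path})')
    return problems


# Usage:
# for problem in check_inputs(selected_img, selected_audio):
#     print(problem)
```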

Use this code if you just want to use the example images and audio

In [None]:
# Define the directory containing audio files
audio_directory = 'examples/driven_audio'
audio_list = [os.path.splitext(file)[0] for file in os.listdir(audio_directory) if file.endswith('.wav')]
audio_list.sort()

# Define a list of image options
# img_list = ['full1', 'full2', 'full3']
img_directory = 'examples/source_image'
img_list = [os.path.splitext(file)[0] for file in os.listdir(img_directory) if file.endswith('.png')]
img_list.sort()

# Call the create_image_audio_selector function with img_list and audio_list
selected_img, selected_audio = create_image_audio_selector(img_list, audio_list)


Use this code if you want to upload your own images and audio

In [None]:
# Call the function to create upload widgets
uploader_img, uploader_audio = create_upload_widgets()
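
`create_upload_widgets` returns the two upload widgets rather than file paths, while it saves each upload as `/content/<file name>`. The following sketch recovers the saved path from a widget's `value`. `uploaded_path` is a hypothetical helper. The version handling assumes ipywidgets 7 (where `value` is a dict keyed by file name) or ipywidgets 8 (where `value` is a tuple of dicts with a `name` key).

```python
import os


def uploaded_filename(value):
    # ipywidgets 7: value is a dict {file name: {...}}
    # ipywidgets 8: value is a tuple of dicts, each with a 'name' key
    if not value:
        return None
    if isinstance(value, dict):
        return next(iter(value))
    return value[0]['name']


def uploaded_path(value, dest_dir='/content'):
    # The upload widgets save each file as <dest_dir>/<original file name>
    name = uploaded_filename(value)
    return os.path.join(dest_dir, name) if name else None


# Usage:
# print(uploaded_path(uploader_img.value), uploaded_path(uploader_audio.value))
```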

## Animation Time!

In [None]:
#@title Run this cell if you selected audio from example/driven_audio and if you only want to animate the face
start = start_time()
print(selected_img)
!python3.8 inference.py --driven_audio {selected_audio} --source_image {selected_img} --result_dir ./results --enhancer gfpgan
elapsed_time(start)

In [None]:
#@title Run this cell if you selected audio from example/driven_audio and if you want to animate the face with a natural full body video
start = start_time()
print(selected_img)
!python3.8 inference.py --driven_audio {selected_audio} --source_image {selected_img} --result_dir ./results --still --preprocess full --enhancer gfpgan
elapsed_time(start)

In [None]:
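#@title Run this cell if you uploaded your own image and audio and only want to animate the face
# The upload widgets save each file as /content/<original file name>; rebuild those paths from the widgets
uploaded_img_path = '/content/' + next(iter(uploader_img.value))
uploaded_audio_path = '/content/' + next(iter(uploader_audio.value))
start = start_time()
print(uploaded_img_path)
!python3.8 inference.py --driven_audio "{uploaded_audio_path}" --source_image "{uploaded_img_path}" --result_dir ./results --enhancer gfpgan
elapsed_time(start)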

In [None]:
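#@title Run this cell if you uploaded your own image and audio and want to animate the face with a natural full body video
# The upload widgets save each file as /content/<original file name>; rebuild those paths from the widgets
uploaded_img_path = '/content/' + next(iter(uploader_img.value))
uploaded_audio_path = '/content/' + next(iter(uploader_audio.value))
start = start_time()
print(uploaded_img_path)
!python3.8 inference.py --driven_audio "{uploaded_audio_path}" --source_image "{uploaded_img_path}" --result_dir ./results --still --preprocess full --enhancer gfpgan
elapsed_time(start)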
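
The four animation cells differ only in which files they pass and whether they add `--still --preprocess full`. The following is a minimal sketch that builds the same `inference.py` command as an argument list, ready to run with `subprocess`; this also keeps file names with spaces intact. `build_inference_cmd` is a hypothetical helper, and it uses only the flags that appear in the cells above.

```python
def build_inference_cmd(audio_path: str, image_path: str, full_body: bool = False,
                        result_dir: str = './results') -> list:
    cmd = ['python3.8', 'inference.py',
           '--driven_audio', audio_path,
           '--source_image', image_path,
           '--result_dir', result_dir]
    if full_body:
        # Same extra flags as the "natural full body video" cells
        cmd += ['--still', '--preprocess', 'full']
    cmd += ['--enhancer', 'gfpgan']
    return cmd


# Usage (run from the SadTalker folder):
# import subprocess
# subprocess.run(build_inference_cmd(selected_audio, selected_img, full_body=True), check=True)
```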

In [None]:
#@title Here are the filenames for all of your generated results
results = sorted(os.listdir('./results/'))
print(results)
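
When you have run several animations, the newest video is usually the one you want. The following is a small sketch that picks the most recently modified `.mp4` instead of relying on its position in the list. `newest_video` is a hypothetical helper.

```python
import glob
import os


def newest_video(results_dir: str = './results'):
    # Return the most recently modified .mp4 in results_dir, or None if there is none
    videos = glob.glob(os.path.join(results_dir, '*.mp4'))
    return max(videos, key=os.path.getmtime) if videos else None


# Usage:
# print(newest_video())
```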

In [None]:
#@title Choose a file to view and download. Set `file_index` to the zero-based position of the video in the list printed by this cell: 0 for the first video, 1 for the second, 2 for the third, and so on.
import glob

file_index = 0  #@param {type:"integer"}

# Only .mp4 files can be previewed, so index into the sorted list of videos
videos = sorted(glob.glob('./results/*.mp4'))
print(videos)
video_path = videos[file_index]

# Display the video with a download link
display_video_with_download_link(video_path)
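
`display_video_with_download_link` embeds the whole video in the page as base64, which makes the data about a third larger and can slow the browser down for long videos. For larger files, Colab's `google.colab.files.download(path)` starts a normal browser download instead. The sketch below chooses between the two approaches. The 50 MB threshold and the `show_or_download` helper are assumptions.

```python
import os

EMBED_LIMIT_MB = 50  # assumption: above this, skip the inline base64 preview


def should_embed(path: str, limit_mb: float = EMBED_LIMIT_MB) -> bool:
    # Base64 encoding makes the embedded data about 4/3 the size of the file
    return os.path.getsize(path) <= limit_mb * 1024 * 1024


def show_or_download(path: str) -> None:
    if should_embed(path):
        display_video_with_download_link(path)  # defined in the install cell
    else:
        from google.colab import files  # only available inside Colab
        files.download(path)
```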


## Credits

**SadTalker dev team** - Original SadTalker software developers and original colab authors: Wenxuan Zhang, Xiaodong Cun, Xuan Wang, Yong Zhang, Xi Shen, Yu Guo, Ying Shan, Fei Wang. Xi'an Jiaotong University, Tencent AI Lab, Ant Group. Click here for more info: <br>
**Artificial Nouveau** updated the notebook for workshops


---------------

Backup model archive (outdated): https://huggingface.co/QuickWick/Music-AI-Voices/tree/main