# OpenVoice: Advanced Voice Cloning and Synthesis

## Project Overview

OpenVoice is a project for voice cloning and speech synthesis. This repository hosts the tools and models needed to clone a voice from audio samples and to generate lifelike synthetic speech in that voice.

## Voice Translation Widgets

This application provides a user-friendly interface for voice translation. Below are the features available:

### Features

1. **Voice Upload**: Users can upload a voice recording in MP3 format to serve as the reference speaker.

2. **Language Selection**: A dropdown menu lets users choose the language in which the text will be spoken.

3. **Text Input Box**: Users can enter or edit the text that will be synthesized in the selected language using the uploaded voice.

### Usage

- **Uploading Voice**: Click on the 'Upload' button and select an MP3 file from your device.
- **Selecting Language**: Use the dropdown to choose the target language.
- **Entering Text**: Type the text you want spoken in the uploaded voice into the text box.

This application is designed to be intuitive and easy to use, ensuring that users can quickly translate voices with minimal effort.

## Setup and Installation

### How To Use

1. **Setup Environment**:
   - Clone the repository or download the specific project files.
   - Ensure Python 3.x is installed.

2. **Install Required Packages**:

   - The CTPO environment may not have every library this notebook needs pre-installed. Follow these steps to install the required libraries from the `requirements.txt` file:

   **2.1 Create and Activate the Virtual Environment:**
   
   Open a terminal from within Jupyter Notebook: `File -> New -> Terminal`.
   
   Navigate to the project directory where you want to set up the environment.
   
   Execute the following commands to create and activate the virtual environment:
   
   ```bash
   python3 -m venv --system-site-packages myvenv  # "myvenv" is the virtual environment name; change it if you like
   source myvenv/bin/activate
   pip3 install ipykernel
   python -m ipykernel install --user --name=myvenv --display-name="Python (myvenv)"
   ```

   **2.2 Install Required Libraries**

   Before running the `pip install` command below in the Jupyter notebook, make sure you are in the directory where the notebook and the virtual environment are located, and load the newly created "Python (myvenv)" kernel. This ensures that relative `./` paths resolve against the project directory. You can use the `cd` command to change to your project directory and `pwd` to verify your current directory.
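
The directory and kernel check described above can also be done from a notebook cell. The following is a minimal sketch using only the standard library; the helper name `describe_environment` is hypothetical and not part of this repository.

```python
import os
import sys


def describe_environment():
    """Return basic facts about the interpreter running this kernel."""
    return {
        "cwd": os.getcwd(),            # directory that './' resolves against
        "executable": sys.executable,  # Python binary used by the kernel
        # True when the interpreter runs inside a virtual environment
        "in_venv": sys.prefix != sys.base_prefix,
    }


info = describe_environment()
for key, value in info.items():
    print(f"{key}: {value}")
```

If `in_venv` is `False` or `executable` does not point inside `myvenv`, the notebook is probably still running on the default kernel rather than "Python (myvenv)".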

## Run the Notebook
- Open the `demo.ipynb` notebook in a Jupyter environment.
- Follow the instructions within the notebook, executing the code cells in sequence. Each cell includes comments explaining the purpose of the code, which will guide you through the demo process.
- Read any embedded instructions and comments carefully; they help you understand each step and troubleshoot any issues that arise.


In [None]:
!. ./myvenv/bin/activate; pip install -r requirements.txt

## Model Preparation
Download and prepare the model data:

1. Check whether the `checkpoints_v2` directory for model checkpoints already exists.
2. If it does not, download the model checkpoint archive and unzip it.

In [None]:
# Check if the directory 'checkpoints_v2' exists and download/unzip if it doesn't
import os
if not os.path.isdir("checkpoints_v2"):
    !wget https://myshell-public-repo-host.s3.amazonaws.com/openvoice/checkpoints_v2_0417.zip
    !unzip checkpoints_v2_0417.zip
    print("File downloaded and extracted.")
else:
    print("Directory already exists, no need to download again.")
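
On systems where `wget` or `unzip` is unavailable, the same check-then-download logic can be sketched with the Python standard library. This is an illustrative alternative, not the notebook's own method; the function name `fetch_and_extract` and its parameters are assumptions.

```python
import os
import urllib.request
import zipfile


def fetch_and_extract(url, archive_path, target_dir, extract_to="."):
    """Download and unzip an archive unless target_dir already exists.

    Returns True if the archive was extracted, False if the step was skipped.
    """
    if os.path.isdir(target_dir):
        return False  # already prepared, nothing to do
    if not os.path.isfile(archive_path):
        urllib.request.urlretrieve(url, archive_path)  # network access happens here
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(extract_to)
    return True


# Example call using the URL from the cell above (commented out to avoid a large download):
# fetch_and_extract(
#     "https://myshell-public-repo-host.s3.amazonaws.com/openvoice/checkpoints_v2_0417.zip",
#     "checkpoints_v2_0417.zip",
#     "checkpoints_v2",
# )
```

The sketch assumes, as the cell above does, that the archive unpacks into a top-level `checkpoints_v2` directory.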


Set `HF_HOME` so that Hugging Face downloads are cached in a local directory.

In [None]:
import os
os.makedirs('HF_HOME', exist_ok=True)
os.environ['HF_HOME'] = 'HF_HOME'
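
Because the cell above sets `HF_HOME` to a relative path, the cache location depends on the notebook's current working directory. A small variation, sketched below, resolves the path to an absolute one first. The helper name `set_hf_home` is hypothetical, and the sketch assumes the variable is set before any Hugging Face library is imported, since such libraries commonly read it at import time.

```python
import os


def set_hf_home(directory="HF_HOME"):
    """Create the cache directory and export its absolute path as HF_HOME."""
    cache_dir = os.path.abspath(directory)
    os.makedirs(cache_dir, exist_ok=True)
    os.environ["HF_HOME"] = cache_dir
    return cache_dir


# Example: print(set_hf_home())  # shows the absolute cache path in use
```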

### Download UniDic
UniDic is a dictionary tool required for Japanese text processing.

In [None]:
!. ./myvenv/bin/activate; test -d /iti/myvenv/lib/python3.10/site-packages/unidic/dicdir || python -m unidic download
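
The cell above tests a hard-coded `site-packages` path. As a hedged alternative, the location of the `dicdir` folder can be derived from wherever the `unidic` package is installed. The helper name `find_unidic_dicdir` is hypothetical; the sketch mirrors the `test -d` check and assumes the dictionary lives in a `dicdir` folder inside the package, as the path in the cell suggests.

```python
import importlib.util
from pathlib import Path


def find_unidic_dicdir():
    """Return the path of unidic's 'dicdir' folder, or None if it is unavailable."""
    spec = importlib.util.find_spec("unidic")
    if spec is None or not spec.submodule_search_locations:
        return None  # the unidic package is not installed in this environment
    for location in spec.submodule_search_locations:
        candidate = Path(location) / "dicdir"
        if candidate.is_dir():
            return candidate
    return None  # package found, but no dicdir folder present


dicdir = find_unidic_dicdir()
print("UniDic dictionary directory:", dicdir if dicdir else "not found")
```

When the function returns `None`, running `python -m unidic download` as in the cell above is the next step.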

## Troubleshooting
### Common Issues and Fixes
**CUDA Library Error**: If you encounter an error related to `libcublas.so.11`, create a symbolic link that points it to the newer version.

In [None]:
!ln -s /usr/local/cuda/lib64/libcublas.so.12 /usr/local/cuda/lib64/libcublas.so.11
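
Running `ln -s` a second time fails once the link exists. The sketch below shows one way to make the step repeatable from Python; `ensure_symlink` is a hypothetical helper, and the CUDA paths in the commented call are simply those from the cell above.

```python
import os


def ensure_symlink(target, link_path):
    """Create link_path -> target unless something already exists at link_path.

    Returns True if a new link was created, False otherwise.
    """
    if os.path.lexists(link_path):
        return False  # leave any existing file or link untouched
    if not os.path.exists(target):
        raise FileNotFoundError(f"Link target does not exist: {target}")
    os.symlink(target, link_path)
    return True


# Example call (requires write permission on the CUDA directory):
# ensure_symlink("/usr/local/cuda/lib64/libcublas.so.12",
#                "/usr/local/cuda/lib64/libcublas.so.11")
```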

**Install PortAudio**: PortAudio is required for handling audio input and output in many applications. If you experience issues related to audio operations, ensure that `portaudio19-dev` is installed.

In [None]:
! apt -q install -y portaudio19-dev

This ensures that all dependencies for audio processing are properly configured.


In [None]:
import os
import torch
from openvoice import se_extractor
from openvoice.api import ToneColorConverter
import ipywidgets as widgets
from IPython.display import display, Audio
import sounddevice as sd
import scipy.io.wavfile as wav
from pydub import AudioSegment
from io import BytesIO
import base64
from ipywidgets import FileUpload
from base64 import b64decode
from pathlib import Path

### Initialization

In this example, we will use the checkpoints from OpenVoiceV2. OpenVoiceV2 is trained with more aggressive augmentations and thus demonstrates better robustness in some cases.

In [None]:
ckpt_converter = 'checkpoints_v2/converter'
device = "cuda:0" if torch.cuda.is_available() else "cpu"
output_dir = 'outputs_v2'

tone_color_converter = ToneColorConverter(f'{ckpt_converter}/config.json', device=device)
tone_color_converter.load_ckpt(f'{ckpt_converter}/checkpoint.pth')

os.makedirs(output_dir, exist_ok=True)
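
If the checkpoint download was skipped or interrupted, the loading calls above fail. A quick pre-check, sketched below, can make the cause obvious. The helper name `missing_checkpoint_files` is hypothetical; the two file names are the ones the initialization cell loads.

```python
import os


def missing_checkpoint_files(ckpt_dir, required=("config.json", "checkpoint.pth")):
    """Return the required file names that are absent from ckpt_dir."""
    return [name for name in required
            if not os.path.isfile(os.path.join(ckpt_dir, name))]


missing = missing_checkpoint_files("checkpoints_v2/converter")
if missing:
    print(f"Missing converter files: {missing}. Re-run the Model Preparation step.")
else:
    print("Converter checkpoint files found.")
```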

### Obtain Tone Color Embedding
We only extract the tone color embedding for the target speaker. The source tone color embeddings can be loaded directly from the `checkpoints_v2/base_speakers/ses` folder.

In [None]:
target_directory = 'resources/'
reference_speaker = None

# Ensure the target directory exists, create it if it does not
if not os.path.exists(target_directory):
    os.makedirs(target_directory)  # This will create the directory and any necessary parent directories
    print(f"Directory {target_directory} created.")
else:
    print(f"Target directory: {target_directory}")

# Function to handle uploaded files
def handle_upload(change):
    global reference_speaker
    if change['new']:
        print("Upload started...")
        # Print the structure of 'change' to understand its content
        print(change)

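        # ipywidgets 7 stores uploads in a dict keyed by filename; ipywidgets 8 uses a tuple of dicts
        new_value = change['new']
        uploads = new_value.values() if isinstance(new_value, dict) else new_value
        for file_upload in uploads:
            # ipywidgets 7 nests the filename under 'metadata'; ipywidgets 8 exposes it at the top level
            name = file_upload['name'] if 'name' in file_upload else file_upload['metadata']['name']
            print(f"Handling file: {name}")
            filepath = os.path.join(target_directory, name)
            print(f"Saving to: {filepath}")
            reference_speaker = filepath
            with open(filepath, 'wb') as f:
                f.write(file_upload['content'])
            print(f'Saved {name} to {filepath}')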
        # List the files in the target directory after upload
        print(f'Files in target directory ({target_directory}): {list(Path(target_directory).glob("*"))}')
        print("Upload completed.")
        update_dropdown()  # Update dropdown after upload
        dropdown.disabled = True  # Disable dropdown after upload
        upload_widget.disabled = False  # Keep upload enabled if reset is needed

print("Please select an audio file to use as reference speaker. If uploading your own, it is recommended to upload in MP3 format, a 30 second audio sample of spoken text.")

# Create an output widget to capture print statements
output = widgets.Output()

# Create an upload widget
upload_widget = widgets.FileUpload(multiple=False)

# Function to handle the change event using output widget
def handle_upload_with_output(change):
    with output:
        handle_upload(change)

# Attach the observer to the upload widget
upload_widget.observe(handle_upload_with_output, names='value')

# Create a Dropdown widget for selecting existing files
def update_dropdown():
    files = [file.name for file in Path(target_directory).glob('*.mp3')]
    dropdown.options = files

dropdown = widgets.Dropdown(
    options=[],
    description='Select File:',
    disabled=False,
)

def dropdown_change(change):
    if change['new']:
        global reference_speaker
        reference_speaker = os.path.join(target_directory, change['new'])
        print(f"Selected file: {reference_speaker}")
        upload_widget.disabled = True  # Disable upload after selection
        dropdown.disabled = False  # Keep dropdown enabled if reset is needed

dropdown.observe(dropdown_change, names='value')

# Create a button to reset the selections
reset_button = widgets.Button(description="Reset Selections")

def on_reset_clicked(b):
    dropdown.disabled = False
    upload_widget.disabled = False
    update_dropdown()  # Update the dropdown list
    print("Selections reset. You can upload a file or select from the dropdown.")

reset_button.on_click(on_reset_clicked)

# Display the widgets and output widget
display(upload_widget, dropdown, output, reset_button)
update_dropdown()  # Initial update for the dropdown

In [None]:
target_se, audio_name = se_extractor.get_se(reference_speaker, tone_color_converter, vad=False)
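
If no file has been uploaded or selected, `reference_speaker` is still `None` when the extraction cell runs, and the resulting error can be hard to read. The sketch below shows a small guard that could be called first; `require_reference_audio` is a hypothetical helper, not part of OpenVoice.

```python
import os


def require_reference_audio(path):
    """Return path if it names an existing file, otherwise raise a clear error."""
    if path is None:
        raise ValueError(
            "No reference audio selected: upload a file or pick one from the dropdown first."
        )
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Reference audio not found: {path}")
    return path


# Example: validate before extracting the tone color embedding
# reference_speaker = require_reference_audio(reference_speaker)
```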

#### Use MeloTTS as Base Speakers

MeloTTS is a high-quality multilingual text-to-speech library by @MyShell.ai, supporting English (American, British, Indian, Australian, Default), Spanish, French, Chinese, Japanese, and Korean. In the following example, we will use the models in MeloTTS as the base speakers.

In [None]:
from melo.api import TTS
def run_tts(language, text):
    src_path = f'{output_dir}/tmp.wav'
    speed = 1.0  # Adjustable speed

    # Initialize the TTS model for the selected language
    model = TTS(language=language, device=device)
    speaker_ids = model.hps.data.spk2id

    for speaker_key in speaker_ids.keys():
        speaker_id = speaker_ids[speaker_key]
        speaker_key = speaker_key.lower().replace('_', '-')
        
        source_se = torch.load(f'checkpoints_v2/base_speakers/ses/{speaker_key}.pth', map_location=device)
        model.tts_to_file(text, speaker_id, src_path, speed=speed)
        save_path = f'{output_dir}/output_v2_{speaker_key}.wav'

        # Run the tone color converter
        encode_message = "@MyShell"
        tone_color_converter.convert(
            audio_src_path=src_path, 
            src_se=source_se, 
            tgt_se=target_se, 
            output_path=save_path,
            message=encode_message)

        # Play the generated audio file
        display(Audio(save_path, autoplay=True))
        print(f"Generated audio saved to {save_path}")
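
The loop in `run_tts` maps each MeloTTS speaker key to an embedding file by lowercasing it and replacing underscores with hyphens. The sketch below isolates that mapping so it can be checked on its own; `speaker_embedding_path` is a hypothetical helper, and the example keys are illustrative rather than a list of what any particular model exposes.

```python
def speaker_embedding_path(speaker_key, base_dir="checkpoints_v2/base_speakers/ses"):
    """Build the embedding path the same way the run_tts loop does."""
    normalized = speaker_key.lower().replace("_", "-")
    return f"{base_dir}/{normalized}.pth"


for key in ("EN-US", "EN_INDIA"):  # illustrative speaker keys
    print(key, "->", speaker_embedding_path(key))
```

Printing the resolved paths before loading makes it easier to spot a base speaker whose embedding file is missing from the checkpoint archive.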


## Text-To-Speech Interface Overview

This interactive section allows users to convert text into speech using a simple interface built with Jupyter widgets.

### Features

- **Language Selection**: Users can choose from multiple languages including English, Spanish, French, Chinese, Japanese, and Korean.
- **Text Input**: Provides a textarea for entering or modifying text for speech conversion.
- **Convert to Speech**: A button that initiates the conversion of text into speech based on the selected language.

### Usage

Upon selecting a language from the dropdown, the text area updates with a preloaded text specific to that language, which can be edited. Clicking the "Convert to Speech" button processes the entered text using the chosen language’s TTS capabilities.

This tool is designed for demonstrations and educational use, offering a straightforward way to interact with TTS technology.

In [None]:

# Predefined texts for demonstration; you might want to start with empty strings in practice.
texts = {
    'EN_NEWEST': "Did you ever hear a folk tale about a giant turtle?",  # The newest English base speaker model
    'EN': "Did you ever hear a folk tale about a giant turtle?",
    'ES': "El resplandor del sol acaricia las olas, pintando el cielo con una paleta deslumbrante.",
    'FR': "La lueur dorée du soleil caresse les vagues, peignant le ciel d'une palette éblouissante.",
    'ZH': "在这次vacation中，我们计划去Paris欣赏埃菲尔铁塔和卢浮宫的美景。",
    'JP': "彼は毎朝ジョギングをして体を健康に保っています。",
    'KR': "안녕하세요! 오늘은 날씨가 정말 좋네요.",
}

# Create widgets
language_dropdown = widgets.Dropdown(
    options=[('English - Newest', 'EN_NEWEST'), ('English', 'EN'), ('Spanish', 'ES'), 
             ('French', 'FR'), ('Chinese', 'ZH'), ('Japanese', 'JP'), ('Korean', 'KR')],
    value='EN_NEWEST',
    description='Language:',
)

text_input = widgets.Textarea(
    value=texts[language_dropdown.value],
    placeholder='Type something',
    description='Text:',
    disabled=False
)

def update_text_input(*args):
    text_input.value = texts[language_dropdown.value]

language_dropdown.observe(update_text_input, 'value')

button = widgets.Button(description="Convert to Speech")

output = widgets.Output()

def on_button_clicked(b):
    with output:
        language = language_dropdown.value
        text = text_input.value
        print(f"Processing TTS for {language}: '{text}'")
        run_tts(language, text)  # Call the TTS function with selected language and text

button.on_click(on_button_clicked)

# Display widgets
display(language_dropdown, text_input, button, output)
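
Each click on "Convert to Speech" calls `run_tts`, which constructs a new TTS model for the selected language. One possible refinement, sketched below under the assumption that keeping several models in memory is acceptable, is to cache models per language. `make_cached_loader` is a hypothetical helper; the commented line shows how it might wrap the `TTS` constructor used earlier.

```python
from functools import lru_cache


def make_cached_loader(factory, maxsize=8):
    """Wrap a model factory so each language is constructed at most once."""
    @lru_cache(maxsize=maxsize)
    def load(language):
        return factory(language)
    return load


# Possible use inside the notebook (assumes TTS and device from the cells above):
# get_model = make_cached_loader(lambda language: TTS(language=language, device=device))
# model = get_model('EN')  # later calls with 'EN' reuse the same instance
```

On a GPU with limited memory, a smaller `maxsize` limits how many models stay loaded at once.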