<a href="https://colab.research.google.com/github/Troyanovsky/awesome-TTS-Colab/blob/main/OpenVoice_V2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🗣️ OpenVoice V2 Google Colab

## 📄 Description  
This Colab notebook uses xTTS V2 as base TTS model and OpenVoice V2 for voice conversion (vocie cloning).

**Languages supported**: English (en), Spanish (es), French (fr), Chinese (zh-cn), Japanese (ja), Korean (ko)  

**Capabilities**: Text-to-speech, Multi-lingual, Voice Cloning

---

## How to use
- Run all cells
- Follow the instructions to input text to generate and upload a reference voice.
- Run all cells and output will be in `output.wav`

---

## 🔗 Resources

- **GitHub Repository:** [myshell-ai/OpenVoice](https://github.com/myshell-ai/OpenVoice) (Used for voice conversion based on reference voice), [coqui-tts](https://github.com/idiap/coqui-ai-TTS) (Use as base TTS model)
- **Model Availability:** [myshell-ai/OpenVoiceV2](https://huggingface.co/myshell-ai/OpenVoiceV2)

---

## 🎙️ Explore More TTS Models  
Want to try out additional TTS models? Check out the curated collection here:  
👉 [awesome-TTS-Colab](https://github.com/Troyanovsky/awesome-TTS-Colab)


In [1]:
# @title Setup and Imports
!pip install coqui-tts==0.26.1

import torch
from TTS.api import TTS
import os
from google.colab import files
from IPython.display import Audio, display
import ipywidgets as widgets

print("Setup complete. Libraries installed and imported.")

Collecting coqui-tts==0.26.1
  Downloading coqui_tts-0.26.1-py3-none-any.whl.metadata (19 kB)
Collecting anyascii>=0.3.0 (from coqui-tts==0.26.1)
  Downloading anyascii-0.3.2-py3-none-any.whl.metadata (1.5 kB)
Collecting coqpit-config<0.3.0,>=0.2.0 (from coqui-tts==0.26.1)
  Downloading coqpit_config-0.2.0-py3-none-any.whl.metadata (11 kB)
Collecting coqui-tts-trainer<0.3.0,>=0.2.0 (from coqui-tts==0.26.1)
  Downloading coqui_tts_trainer-0.2.3-py3-none-any.whl.metadata (8.1 kB)
Collecting encodec>=0.1.1 (from coqui-tts==0.26.1)
  Downloading encodec-0.1.1.tar.gz (3.7 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.7/3.7 MB[0m [31m56.2 MB/s[0m eta [36m0:00:00[0m
[?25h  Preparing metadata (setup.py) ... [?25l[?25hdone
Collecting gruut>=2.4.0 (from gruut[de,es,fr]>=2.4.0->coqui-tts==0.26.1)
  Downloading gruut-2.4.0.tar.gz (85 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m85.3/85.3 kB[0m [31m3.6 MB/s[0m eta [36m0:00:00[0m
[?25h  Pr

In [4]:
# @title Load xTTS and OpenVoice V2 Models
# Get device
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Initialize TTS for initial Text-to-Speech (using xTTS v2)
print("Loading xTTS v2 model for initial TTS...")
try:
    tts_xtts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)
    print("xTTS v2 model loaded successfully.")
    # Example of available speakers for xTTS (optional to list, but good to know)
    # print("Available xTTS speakers:", tts_xtts.speakers[:10]) # Print first 10
    default_speaker = "Ana Florence" # Using a default speaker provided by xTTS
    print(f"Using default xTTS speaker: {default_speaker}")

except Exception as e:
    print(f"Error loading xTTS v2 model: {e}")
    print("Please check the model path or try restarting the runtime.")
    tts_xtts = None

print("\nLoading OpenVoice V2 model for Voice Conversion...")
try:
    tts_vc = TTS("voice_conversion_models/multilingual/multi-dataset/openvoice_v2").to(device)
    print("OpenVoice V2 model loaded successfully.")
except Exception as e:
    print(f"Error loading OpenVoice V2 model: {e}")
    print("Please check the model path or try restarting the runtime.")
    tts_vc = None # Ensure tts_vc is None if loading fails

Using device: cuda
Loading xTTS v2 model for initial TTS...
 > You must confirm the following:
 | > "I have purchased a commercial license from Coqui: licensing@coqui.ai"
 | > "Otherwise, I agree to the terms of the non-commercial CPML: https://coqui.ai/cpml" - [y/n]
 | | > y


100%|██████████| 131M/131M [00:30<00:00, 4.34MiB/s]
 99%|█████████▉| 1.86G/1.87G [00:30<00:00, 97.3MiB/s]
100%|██████████| 1.87G/1.87G [00:30<00:00, 60.3MiB/s]
100%|██████████| 4.37k/4.37k [00:00<00:00, 34.4kiB/s]
 55%|█████▍    | 198k/361k [00:00<00:00, 1.57MiB/s]
100%|██████████| 361k/361k [00:01<00:00, 272kiB/s] 
100%|██████████| 32.0/32.0 [00:00<00:00, 187iB/s]
100%|██████████| 7.75M/7.75M [00:16<00:00, 71.3MiB/s]

xTTS v2 model loaded successfully.
Using default xTTS speaker: Ana Florence

Loading OpenVoice V2 model for Voice Conversion...
OpenVoice V2 model loaded successfully.


In [9]:
# @title Get Text and Upload Reference Audio
reference_filename = None
print("\nPlease upload the REFERENCE audio file (.wav, .mp3, etc.) whose voice you want to convert TO.")
print("This file provides the target voice style for conversion.")

uploaded_reference = files.upload()

if len(uploaded_reference) == 0:
    print("No reference file uploaded. Please try again.")
elif len(uploaded_reference) > 1:
    print("Warning: More than one reference file uploaded. Using the first one detected.")
    reference_filename = list(uploaded_reference.keys())[0]
else:
    reference_filename = list(uploaded_reference.keys())[0]

if reference_filename:
    print(f"Reference file '{reference_filename}' uploaded successfully.")
else:
    print("Reference file upload cancelled or failed.")


Please upload the REFERENCE audio file (.wav, .mp3, etc.) whose voice you want to convert TO.
This file provides the target voice style for conversion.


Saving trump_promptvn.wav to trump_promptvn (1).wav
Reference file 'trump_promptvn (1).wav' uploaded successfully.


In [10]:
text_input = "This is voice generated from xTTS and OpenVoice V2" # Change this text

In [6]:
# @title Generate Initial Speech with xTTS
intermediate_source_filename = "intermediate_xtts_output.wav"

# Check if xTTS model is loaded and text was provided
if tts_xtts and text_input.value:
    print(f"\nGenerating initial audio from text using xTTS v2...")
    print(f"Text: {text_input.value[:100]}...") # Print first 100 chars
    print(f"Using default speaker: {default_speaker}")

    try:
        # Perform TTS with xTTS using the default speaker
        tts_xtts.tts_to_file(
            text=text_input.value,
            speaker=default_speaker,
            language="en", # Assuming English based on default speaker examples
            file_path=intermediate_source_filename
        )
        print(f"Initial audio saved to '{intermediate_source_filename}'")

        if not os.path.exists(intermediate_source_filename):
             print(f"Error: Initial audio file '{intermediate_source_filename}' was not created.")
             intermediate_source_filename = None # Indicate failure

    except Exception as e:
        print(f"\nAn error occurred during initial TTS generation: {e}")
        print("Please check the text input or try again.")
        intermediate_source_filename = None # Indicate failure

elif not tts_xtts:
    print("\nInitial TTS skipped because the xTTS model failed to load.")
else:
    print("\nInitial TTS skipped because no text was entered.")


Generating initial audio from text using xTTS v2...
Text: Hello this is audio generated with xTTS and Open Voice V2....
Using default speaker: Ana Florence
Initial audio saved to 'intermediate_xtts_output.wav'


In [7]:
# @title Perform Voice Conversion with OpenVoice V2
output_filename = "output.wav"

# Check if VC model is loaded, intermediate source exists, and reference exists
if tts_vc and intermediate_source_filename and reference_filename and os.path.exists(intermediate_source_filename):
    print(f"\nPerforming voice conversion using OpenVoice V2...")
    print(f"Source Audio (from xTTS): {intermediate_source_filename}")
    print(f"Target Voice (Reference): {reference_filename}")
    print(f"Saving final output to: {output_filename}")

    try:
        # Perform the voice conversion
        tts_vc.voice_conversion_to_file(
          source_wav=intermediate_source_filename,
          target_wav=reference_filename,
          file_path=output_filename
        )
        print("\nVoice conversion complete!")

    except Exception as e:
        print(f"\nAn error occurred during voice conversion: {e}")
        print("Please check if your uploaded reference file is a valid audio format.")

elif not tts_vc:
    print("\nVoice conversion skipped because the OpenVoice V2 model failed to load.")
elif not intermediate_source_filename or not os.path.exists(intermediate_source_filename):
     print("\nVoice conversion skipped because initial audio generation failed or was skipped.")
else:
    print("\nVoice conversion skipped because the reference audio file was not uploaded.")


Performing voice conversion using OpenVoice V2...
Source Audio (from xTTS): intermediate_xtts_output.wav
Target Voice (Reference): trump_promptvn.wav
Saving final output to: output.wav

Voice conversion complete!


In [8]:
# @title Play Final Output
# Check if the output file was created
if os.path.exists(output_filename):
    print(f"Final output file '{output_filename}' created.")
    # Provide the play button
    print("\nHere is the converted audio:")
    display(Audio(output_filename))
else:
     print(f"\nOutput file '{output_filename}' was not found. Voice conversion might have failed or was skipped.")

# Clean up intermediate file (optional)
if os.path.exists(intermediate_source_filename):
    try:
        os.remove(intermediate_source_filename)
        print(f"\nCleaned up intermediate file: {intermediate_source_filename}")
    except Exception as e:
        print(f"Could not remove intermediate file {intermediate_source_filename}: {e}")

Final output file 'output.wav' created.

Here is the converted audio:



Cleaned up intermediate file: intermediate_xtts_output.wav
