## Multi-Accent and Multi-Lingual Voice Clone Demo with MeloTTS

In [1]:
import os
import torch
from openvoice import se_extractor
from openvoice.api import ToneColorConverter



Importing the dtw module. When using in academic works please cite:
  T. Giorgino. Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package.
  J. Stat. Soft., doi:10.18637/jss.v031.i07.



### Initialization

In this example, we use the checkpoints from OpenVoiceV2. OpenVoiceV2 is trained with more aggressive augmentations and therefore demonstrates better robustness in some cases.

In [2]:
ckpt_converter = '../assets/checkpoints_v2/converter'
device = "cuda" if torch.cuda.is_available() else "cpu"
output_dir = '../assets/output/convert_audio'

tone_color_converter = ToneColorConverter(f'{ckpt_converter}/config.json', device=device)
tone_color_converter.load_ckpt(f'{ckpt_converter}/checkpoint.pth')

os.makedirs(output_dir, exist_ok=True)



Loaded checkpoint '../assets/checkpoints_v2/converter/checkpoint.pth'
missing/unexpected keys: [] []
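Both files are read from `ckpt_converter`, so a wrong relative path only shows up as a load error. A minimal pre-flight check, sketched below with the hypothetical helper `check_converter_dir` (not part of OpenVoice), lists any required file that is missing before the converter is built:

```python
from pathlib import Path

# Files the cell above reads from the converter checkpoint directory.
REQUIRED_FILES = ("config.json", "checkpoint.pth")


def check_converter_dir(ckpt_dir: str) -> list[str]:
    """Hypothetical helper: return the required files missing from ckpt_dir.

    An empty list means every file in REQUIRED_FILES exists as a regular file.
    """
    root = Path(ckpt_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Since the checkpoint above loaded successfully, `check_converter_dir(ckpt_converter)` should return `[]` for that directory; a non-empty result points at the file to download or the path to fix.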


### Obtain Tone Color Embedding
We only extract the tone color embedding for the target speaker. The source tone color embeddings (one per MeloTTS base speaker) can be loaded directly from the `checkpoints_v2/base_speakers/ses` folder.
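Because each source embedding is looked up by file name, a speaker without a matching `.pth` file only fails inside the synthesis loop. The sketch below mirrors the key normalization that loop applies (`speaker_key.lower().replace('_', '-')`) and reports which speakers have no file; `se_filename` and `missing_source_ses` are hypothetical helpers, not OpenVoice or MeloTTS APIs.

```python
from pathlib import Path


def se_filename(speaker_key: str) -> str:
    """Normalize a speaker key as the synthesis loop does
    (lowercase, '_' replaced by '-') and append the .pth extension."""
    return speaker_key.lower().replace("_", "-") + ".pth"


def missing_source_ses(speaker_keys, ses_dir: str) -> list[str]:
    """Return the speaker keys whose embedding file is absent from ses_dir."""
    root = Path(ses_dir)
    return [key for key in speaker_keys if not (root / se_filename(key)).is_file()]
```

After a MeloTTS model is loaded, something like `missing_source_ses(model.hps.data.spk2id.keys(), '../assets/checkpoints_v2/base_speakers/ses')` would list the speakers to fix before any audio is generated.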

In [3]:
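# Check that ffmpeg and ffprobe are on PATH; pydub relies on them to decode
# non-WAV audio such as the MP3 reference clip. `which` returns None if not found.
from pydub.utils import which
print(which("ffmpeg"))
print(which("ffprobe"))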


C:\tools\ffmpeg.exe
C:\tools\ffprobe.exe
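If either line prints `None`, that executable is not on `PATH`. The check can be made to fail loudly instead of printing; this sketch uses the standard library's `shutil.which` in place of `pydub.utils.which` (assuming both resolve `PATH` the same way for this purpose), and `missing_tools` is a hypothetical helper:

```python
import shutil


def missing_tools(names=("ffmpeg", "ffprobe"), which=shutil.which) -> list[str]:
    """Hypothetical helper: return the executables in `names` that `which` cannot find.

    `which` is injectable so the check can be exercised without touching PATH.
    """
    return [name for name in names if which(name) is None]
```

Raising an error when `missing_tools()` returns a non-empty list stops the notebook before the reference audio is processed; on the machine above, where both tools were found, it should return `[]`.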


In [4]:

reference_speaker = 'resources/example_reference.mp3' # This is the voice you want to clone
target_se, audio_name = se_extractor.get_se(reference_speaker, tone_color_converter, vad=True)

OpenVoice version: v2
[(0.0, 58.8188125)]
after vad: dur = 58.81798185941043




#### Use MeloTTS as Base Speakers

MeloTTS is a high-quality multi-lingual text-to-speech library by @MyShell.ai that supports English (American, British, Indian, Australian, and Default accents), Spanish, French, Chinese, Japanese, and Korean. In the following example, we use the MeloTTS models as the base speakers.
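The loop below loads one MeloTTS model per language, so an exception for one language ends the whole cell and skips every language after it. One way to isolate failures, sketched here with a hypothetical `synthesize(language, text)` callable standing in for the body of that loop (loading the model, iterating over speakers, synthesizing, and converting), is to catch errors per language and report them at the end:

```python
from typing import Callable, Dict


def run_per_language(texts: Dict[str, str],
                     synthesize: Callable[[str, str], None]) -> Dict[str, Exception]:
    """Call synthesize(language, text) for each entry, continuing past errors.

    Returns a mapping from language code to the exception it raised, so one
    failing language does not prevent the remaining ones from running.
    """
    failures: Dict[str, Exception] = {}
    for language, text in texts.items():
        try:
            synthesize(language, text)
        except Exception as exc:  # record the failure and move on
            failures[language] = exc
    return failures
```

Catching `Exception` broadly suits a demo notebook where the goal is to get as many outputs as possible; library code would usually catch narrower exception types.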

In [5]:
import sys
import os

# Get the current working directory and add its parent directory to the module search path
current_dir = os.getcwd()
parent_dir = os.path.abspath(os.path.join(current_dir, '../'))  # parent directory (one level up)
sys.path.append(parent_dir)

from MeloTTS.melo.api import TTS

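texts = {
    # 'EN_NEWEST': "Did you ever hear a folk tale about a giant turtle?",  # The newest English base speaker model
    # 'EN': "Did you ever hear a folk tale about a giant turtle?",
    # 'ES': "El resplandor del sol acaricia las olas, pintando el cielo con una paleta deslumbrante.",  # "The sun's glow caresses the waves, painting the sky with a dazzling palette."
    # "The golden glow of the sun caresses the waves, painting the sky with a dazzling palette."
    'FR': "La lueur dorée du soleil caresse les vagues, peignant le ciel d'une palette éblouissante.",
    # "On this vacation, we plan to go to Paris to enjoy the views of the Eiffel Tower and the Louvre."
    'ZH': "在这次vacation中，我们计划去Paris欣赏埃菲尔铁塔和卢浮宫的美景。",
    # "He jogs every morning to keep his body healthy."
    'JP': "彼は毎朝ジョギングをして体を健康に保っています。",
    # "Hello! The weather is really nice today."
    'KR': "안녕하세요! 오늘은 날씨가 정말 좋네요.",
}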


src_path = f'{output_dir}/tmp.wav'

# Speed is adjustable
speed = 1.0

for language, text in texts.items():
    model = TTS(language=language, device=device)
    speaker_ids = model.hps.data.spk2id
    
    for speaker_key in speaker_ids.keys():
        speaker_id = speaker_ids[speaker_key]
        speaker_key = speaker_key.lower().replace('_', '-')
        
        source_se = torch.load(f'../assets/checkpoints_v2/base_speakers/ses/{speaker_key}.pth', map_location=device)
        model.tts_to_file(text, speaker_id, src_path, speed=speed)
        save_path = f'{output_dir}/output_v2_{speaker_key}.wav'

        # Run the tone color converter
        encode_message = "@MyShell"
        tone_color_converter.convert(
            audio_src_path=src_path, 
            src_se=source_se, 
            tgt_se=target_se, 
            output_path=save_path,
            message=encode_message)



 > Text split to sentences.
La lueur dorée du soleil caresse les vagues, peignant le ciel d'une palette éblouissante.


To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Some weights of the model checkpoint at dbmdz/bert-base-french-europeana-cased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
100%|██████████| 1/1 [01:38<

 > Text split to sentences.
在这次vacation中,
我们计划去Paris欣赏埃菲尔铁塔和卢浮宫的美景.


  0%|          | 0/2 [00:00<?, ?it/s]
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\jerry\AppData\Local\Temp\jieba.cache
Loading model cost 1.051 seconds.
Prefix dict has been built successfully.
Some weights of the model checkpoint at bert-base-multilingual-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
100%|██████████| 2/2 [01:03<00:00, 31.99s/it]


 > Text split to sentences.
彼は毎朝ジョギングをして体を健康に保っています.


100%|██████████| 1/1 [01:35<00:00, 95.00s/it]


 > Text split to sentences.
안녕하세요! 오늘은 날씨가 정말 좋네요.


  0%|          | 0/1 [00:00<?, ?it/s]

you have to install eunjeon. install it...


  0%|          | 0/1 [00:24<?, ?it/s]

you have to install eunjeon. "pip install eunjeon"





TypeError: exceptions must derive from BaseException
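The Korean run stops here: the text frontend asks for the `eunjeon` package (`pip install eunjeon`), and the cell ends with `TypeError: exceptions must derive from BaseException` rather than a clear import error. Installing the package as the message suggests is the direct fix. To skip such languages up front instead, the sketch below checks importability with `importlib.util.find_spec`. The `EXTRA_DEPS` mapping is an assumption drawn only from this log (Korean needs `eunjeon`), and `available_languages` is a hypothetical helper.

```python
import importlib.util

# Assumption based on the log above: only Korean needs an extra module here.
# Extend this mapping if other languages report missing packages.
EXTRA_DEPS = {"KR": "eunjeon"}


def available_languages(texts, extra_deps=EXTRA_DEPS, find_spec=importlib.util.find_spec):
    """Split `texts` into languages whose extra module is importable and those that are not.

    Returns (ready, skipped): `ready` keeps the original text for each runnable
    language, `skipped` lists the language codes whose module was not found.
    """
    ready, skipped = {}, []
    for language, text in texts.items():
        module = extra_deps.get(language)
        if module is None or find_spec(module) is not None:
            ready[language] = text
        else:
            skipped.append(language)
    return ready, skipped
```

With `ready, skipped = available_languages(texts)` and the synthesis loop iterating over `ready.items()`, a machine without `eunjeon` would process FR, ZH, and JP and report KR in `skipped` instead of aborting.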