<a href="https://colab.research.google.com/github/Spr-Aachen/Easy-Voice-Toolkit/blob/main/run.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Terms of Use

**Please resolve the authorization of the dataset on your own. You are solely responsible for any problems caused by training on unauthorized datasets and for all consequences thereof. The repository and its maintainer bear no responsibility for these consequences!**

1. This project was created solely for academic exchange and learning. It is not intended for production environments.
2. Any video based on Easy Voice Toolkit that is published on a video platform must clearly state in its description that it uses voice changing, and must specify the input source of the voice or audio. For example, if you use videos or audio published by others and separate the vocals as the input source for conversion, you must provide clear links to the original videos. If you use your own voice, or voices synthesized by other commercial vocal synthesis software, as the input source, you must also state this in the description.
3. You are solely responsible for any infringement caused by the input source. When using other commercial vocal synthesis software as the input source, make sure you comply with that software's terms of use. Note that many vocal synthesis engines explicitly state in their terms of use that they may not be used as input sources for conversion.
4. Continued use of this project constitutes agreement to the relevant provisions stated in this repository's README. This README only has an obligation to advise, and bears no responsibility for any problems that may arise afterwards.
5. If you distribute this repository's code or publicly publish any results produced by this project (including but not limited to on video sharing platforms), please credit the original author and the code source (this repository).
6. If you intend to use this project for any other purpose, please contact and inform the author of this repository in advance. Thank you very much.

## Configure Colab

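### Prevent Disconnection

Press Ctrl+Shift+I to open the browser's developer tools, then enter the following in the Console and press Enter:
```javascript
// Click Colab's "Connect" button every 60 seconds to keep the session alive
function ConnectButton() {
    console.log("Connect pushed");
    document.querySelector("#top-toolbar > colab-connect-button").shadowRoot.querySelector("#connect").click();
}
setInterval(ConnectButton, 60000);
```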

### Use GPU

In the menu bar at the top, go to "Runtime" → "Change runtime type" → "Hardware accelerator" and select GPU.
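To confirm that the GPU runtime is actually active before running the heavier cells, a quick check like the one below can help. It is a sketch that assumes the NVIDIA tool `nvidia-smi` is on `PATH` whenever a GPU is attached (typically the case on Colab GPU runtimes); the helper name `gpu_available` is hypothetical and not part of Easy Voice Toolkit.

```python
import shutil
import subprocess

def gpu_available() -> bool:
    """Return True if `nvidia-smi` exists and lists at least one GPU."""
    smi = shutil.which("nvidia-smi")
    if smi is None:
        return False  # no NVIDIA driver tools found, so this is almost certainly a CPU runtime
    try:
        # `nvidia-smi -L` prints one "GPU <n>: ..." line per attached GPU
        result = subprocess.run([smi, "-L"], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout

print("GPU runtime:", gpu_available())
```

Once PyTorch is installed, `torch.cuda.is_available()` is an equivalent check from inside Python.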

### Clone Repository

In [None]:
!git clone --recurse-submodules https://github.com/Spr-Aachen/Easy-Voice-Toolkit.git
%cd /content/Easy-Voice-Toolkit
!sed -i '10s/False/True/' ./EVT_Core/GPT_SoVITS/config.py
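The `sed -i '10s/False/True/'` command edits `config.py` in place: on line 10 only, it replaces the first occurrence of `False` with `True`. The notebook does not say which setting lives on that line, so it is worth checking before relying on the edit. A Python equivalent that makes the operation explicit might look like this; `replace_on_line` is a hypothetical helper, and it matches a plain substring rather than a regex (equivalent for `False`).

```python
from pathlib import Path

def replace_on_line(path: str, line_no: int, old: str, new: str) -> bool:
    """Replace the first `old` with `new` on 1-based line `line_no`, like sed 'Ns/old/new/'.

    Returns True if the file was changed.
    """
    p = Path(path)
    lines = p.read_text(encoding="utf-8").splitlines(keepends=True)
    if line_no < 1 or line_no > len(lines):
        return False  # sed silently does nothing when the line does not exist
    original = lines[line_no - 1]
    lines[line_no - 1] = original.replace(old, new, 1)  # count=1 mirrors sed without the /g flag
    if lines[line_no - 1] == original:
        return False
    p.write_text("".join(lines), encoding="utf-8")
    return True
```

Usage: `replace_on_line('./EVT_Core/GPT_SoVITS/config.py', 10, 'False', 'True')`.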

### Install Dependencies

In [None]:
!apt-get update
!apt-get install -y portaudio19-dev
!pip3 install -r requirements.txt
#!pip3 install --force-reinstall torch torchvision torchaudio
!/usr/local/bin/pip install ipykernel
# Optional: install Python 3.9 and copy the packages installed for Python 3.10 into it
#!apt-get install -y python3.9
#!cp -r /usr/local/lib/python3.10/dist-packages /usr/local/lib/python3.9/
#exit() # Uncomment only when you want to terminate the current runtime

### Download Models

In [None]:
# get UVR5 models
!mkdir -p /content/models/download/uvr5
%cd /content/models/download/uvr5
!git clone https://huggingface.co/Delik/uvr5_weights
!mv /content/models/download/uvr5/uvr5_weights/* /content/models/download/uvr5/
# get VPR models
!mkdir -p /content/models/download/VPR
%cd /content/models/download/VPR
!git clone https://huggingface.co/SprAachen/VPR
!mv /content/models/download/VPR/VPR/* /content/models/download/VPR
# get Whisper models
!mkdir -p /content/models/download/Whisper
%cd /content/models/download/Whisper
!git clone https://huggingface.co/SprAachen/Whisper
!mv /content/models/download/Whisper/Whisper/* /content/models/download/Whisper
# get GPT-SoVITS pretrains
!mkdir -p /content/models/download/GPT-SoVITS
%cd /content/models/download/GPT-SoVITS
!git clone https://huggingface.co/lj1995/GPT-SoVITS
!mv /content/models/download/GPT-SoVITS/GPT-SoVITS/* /content/models/download/GPT-SoVITS/
# get VITS pretrains
# **Not available yet, sorry**
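Each download step clones a Hugging Face repository into a subfolder and then moves its contents up one level with `mv`. The same flattening can be written in Python; this is a sketch with a hypothetical helper name. Like the shell glob `*`, it skips hidden entries such as `.git`, and unlike `mv` it refuses to overwrite an existing target.

```python
import shutil
from pathlib import Path

def flatten_clone(parent: str, repo_name: str) -> list:
    """Move the visible contents of parent/repo_name up into parent.

    Hidden entries (e.g. .git) are left behind, matching `mv parent/repo_name/* parent/`.
    Returns the names of the moved entries.
    """
    src = Path(parent) / repo_name
    moved = []
    for entry in sorted(src.iterdir()):
        if entry.name.startswith("."):
            continue  # the shell glob `*` does not match dotfiles
        target = Path(parent) / entry.name
        if target.exists():
            raise FileExistsError(f"{target} already exists")  # refuse instead of overwriting
        shutil.move(str(entry), str(target))
        moved.append(entry.name)
    return moved
```

Usage: `flatten_clone('/content/models/download/VPR', 'VPR')`.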

### Mount Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

### Prepare Files

Make sure the files you want to process have been uploaded to https://drive.google.com/drive/my-drive.
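After mounting Drive, you can confirm from Python that the input folder actually contains files. This is a minimal sketch; the helper name `list_media` and the extension list are illustrative assumptions, not part of the toolkit.

```python
from pathlib import Path

# Illustrative extension list (assumption): adjust it to the formats you actually upload
MEDIA_EXTS = {".wav", ".mp3", ".flac", ".ogg", ".m4a", ".aac", ".mp4", ".mkv"}

def list_media(input_dir: str) -> list:
    """Return the sorted paths of media files found directly inside input_dir."""
    d = Path(input_dir)
    if not d.is_dir():
        raise FileNotFoundError(f"{input_dir} is not a directory; is Drive mounted and the path correct?")
    return sorted(str(p) for p in d.iterdir() if p.is_file() and p.suffix.lower() in MEDIA_EXTS)
```

Usage: `print(len(list_media('/content/drive/MyDrive/%MediaInput%')))`, with the placeholder replaced by your folder.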

## Run Tools

### Audio Processing (AudioProcessor)
Batch-convert media files to audio files, then automatically cut out the silent parts.
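The silence cutting is governed by the RMS threshold and hop size in the form below. The audio is measured in frames every `Hop_Size` ms, each frame's RMS level is expressed in dB, and frames below `RMS_Threshold` count as silence. The sketch illustrates that idea on plain float samples. It is not the toolkit's implementation, and two details are assumptions: the frame length equals the hop, and 0 dB corresponds to a full-scale amplitude of 1.0.

```python
import math

def silent_frames(samples, sample_rate: int, hop_ms: int = 10, threshold_db: float = -40.0):
    """Flag each hop-sized frame as silent (True) when its RMS level in dB is below threshold_db.

    samples: a sequence of floats in [-1.0, 1.0]. Frame length equals the hop (a simplification).
    """
    hop = max(1, int(sample_rate * hop_ms / 1000))
    flags = []
    for start in range(0, len(samples), hop):
        frame = samples[start:start + hop]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        # 20*log10 converts an amplitude ratio to dB; digital silence maps to -inf
        db = 20 * math.log10(rms) if rms > 0 else float("-inf")
        flags.append(db < threshold_db)
    return flags
```

Raising `RMS_Threshold` (for example from -40 to -30) marks more frames as silent, which is why the form suggests increasing it for noisy audio.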

In [None]:
#@title Execute
%cd /content/Easy-Voice-Toolkit

from datetime import date
from pathlib import Path
from EVT_Core.AudioProcessor.process import Audio_Processing

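#@markdown **Media input directory**: directory of the media files to convert to audio
Media_Dir_Input: str = '/content/drive/MyDrive/%MediaInput%'   #@param {type:"string"}
#@markdown **Output format**: audio format the media files are converted to
Media_Format_Output: str = 'wav'   #@param ["flac", "wav", "mp3", "aac", "ogg", "m4a", "wma", "aiff", "au"]
#@markdown **Enable denoising**: noise in the audio will be reduced
Denoise_Audio: bool = True   #@param {type:"boolean"}
#@markdown **Denoising target**: the sound to keep when denoising ('人声' = vocals, '背景声' = background)
Denoise_Target: str = '人声'   #@param ["人声", "背景声"]
#@markdown **Enable silence slicing**: silent parts of the audio will be cut out
Slice_Audio: bool = True   #@param {type:"boolean"}
#@markdown **RMS threshold (dB)**: segments below this threshold are treated as silence; increase it if the audio is noisy
RMS_Threshold: float = -40.   #@param {type:"number"}
#@markdown **Hop size (ms)**: length of each RMS frame; increasing it improves slicing precision but slows down processing
Hop_Size: int = 10   #@param {type:"integer"}
#@markdown **Minimum silence interval (ms)**: minimum length a silent part must have to be cut; decrease it if the audio only contains short pauses (note: this value must be smaller than Audio Length Min and larger than Hop Size)
Silent_Interval_Min: int = 300   #@param {type:"integer"}
#@markdown **Maximum silence kept (ms)**: maximum length of silence kept around each sliced clip (tip: it need not match the silence in the clips exactly; the algorithm searches for the best cut position itself)
Silence_Kept_Max: int = 1000   #@param {type:"integer"}
#@markdown **Minimum audio length (ms)**: minimum length required for each sliced clip
Audio_Length_Min: int = 3000   #@param {type:"integer"}
#@markdown **Output sample rate**: sample rate of the output audio; leave as 'None' to keep it unchanged
SampleRate: int = None   #@param ["None", 44100, 48000, 96000, 192000]
#@markdown **Output sample width**: bit depth of the output audio; leave as 'None' to keep it unchanged
SampleWidth: int = None   #@param ["None", 8, 16, 24, 32]
#@markdown **Merge channels**: downmix the output audio to mono
ToMono: bool = False   #@param {type:"boolean"}
#@markdown **Output directory**: directory where the generated audio files are saved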
Media_Dir_Output: str = f'/content/drive/MyDrive/EVT/音频处理结果/{date.today()}'   #@param {type:"string"}

AudioConvertandSlice = Audio_Processing(
    Media_Dir_Input,
    Media_Format_Output,
    SampleRate if SampleRate != "None" else None,
    SampleWidth if SampleWidth != "None" else None,
    ToMono,
    Denoise_Audio,
    '/content/models/download/uvr5/HP5_only_main_vocal.pth',
    Denoise_Target,
    Slice_Audio,
    RMS_Threshold,
    Audio_Length_Min,
    Silent_Interval_Min,
    Hop_Size,
    Silence_Kept_Max,
    Path(Media_Dir_Output).parent.__str__(),
    Path(Media_Dir_Output).name
)
AudioConvertandSlice.processAudio()
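The slicing parameters have ordering constraints stated in the form: `Silent_Interval_Min` must be larger than `Hop_Size` and smaller than `Audio_Length_Min`. A small pre-flight check can catch a bad combination before a long batch job starts. `check_slicer_params` is a hypothetical helper, not part of `Audio_Processing`.

```python
def check_slicer_params(hop_size: int, silent_interval_min: int, audio_length_min: int) -> None:
    """Raise ValueError unless 0 < hop_size < silent_interval_min < audio_length_min (all in ms)."""
    if hop_size <= 0:
        raise ValueError("Hop_Size must be positive")
    if not hop_size < silent_interval_min:
        raise ValueError(f"Silent_Interval_Min ({silent_interval_min}) must be larger than Hop_Size ({hop_size})")
    if not silent_interval_min < audio_length_min:
        raise ValueError(f"Silent_Interval_Min ({silent_interval_min}) must be smaller than Audio_Length_Min ({audio_length_min})")

check_slicer_params(10, 300, 3000)  # the notebook defaults pass
```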

### Voice Identification (VoiceIdentifier)
From audio of multiple speakers, batch-select the clips that belong to the same speaker.

In [None]:
#@title Execute
%cd /content/Easy-Voice-Toolkit

from datetime import date
from pathlib import Path
from EVT_Core.VPR.infer import Voice_Contrasting

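#@markdown **Audio input directory**: directory of the audio files to filter by speaker
Audio_Dir_Input: str = '/content/drive/MyDrive/%...%'   #@param {type:"string"}
#@markdown **Target speakers and audio**: name of each target speaker and the path of their reference voice file
StdAudioSpeaker: dict = {'%SpeakerName%': '/content/drive/MyDrive/%StdAudio.wav%'}   #@param {type:"raw"}
#@markdown **Decision threshold**: threshold for judging whether two voices belong to the same person; increase it if the compared speakers sound very similar
DecisionThreshold: float = 0.75   #@param {type:"number"}
#@markdown **Audio duration**: length of the audio used for prediction
Duration_of_Audio: float = 3.00   #@param {type:"number"}
#@markdown **Output directory**: directory where the generated result files are saved
Output_Dir: str = f'/content/drive/MyDrive/EVT/语音识别结果/{date.today()}'   #@param {type:"string"}
#@markdown **Result file name**: name of the txt file that records each audio file and its matched speaker
AudioSpeakersDataName: str = 'Recognition'   #@param {type:"string"}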

import os, shutil

def ASRResult_Update(AudioSpeakersData_Path: str, MoveToDst: str):
    # Copy each recognized audio file into a per-speaker subfolder of MoveToDst,
    # then rewrite the result file so that its paths point at the copies
    os.makedirs(MoveToDst, exist_ok = True)
    # Read the "audio|speaker" lines first: opening with mode 'w' would truncate the file before reading
    with open(AudioSpeakersData_Path, mode = 'r', encoding = 'utf-8') as AudioSpeakersData:
        AudioSpeakers = AudioSpeakersData.readlines()
    Lines = []
    for AudioSpeaker in AudioSpeakers:
        if '|' not in AudioSpeaker:
            continue  # skip blank or malformed lines
        Audio, Speaker = AudioSpeaker.split('|', maxsplit = 1)
        Speaker = Speaker.strip()  # drop the trailing newline before using it as a folder name
        if Speaker != '':
            MoveToDst_Sub = Path(MoveToDst).joinpath(Speaker).as_posix()
            os.makedirs(MoveToDst_Sub, exist_ok = True)
            Audio_Dst = Path(MoveToDst_Sub).joinpath(Path(Audio).name).as_posix()
            if not Path(Audio_Dst).exists():
                shutil.copy(Audio, MoveToDst_Sub)
            Lines.append(f"{Audio_Dst}|{Speaker}\n")
    with open(AudioSpeakersData_Path, mode = 'w', encoding = 'utf-8') as AudioSpeakersData:
        AudioSpeakersData.writelines(Lines)

AudioContrastInference = Voice_Contrasting(
    StdAudioSpeaker,
    Audio_Dir_Input,
    '/content/models/download/VPR/Ecapa-Tdnn_spectrogram.pth',
    'Ecapa-Tdnn',
    'spectrogram',
    DecisionThreshold,
    Duration_of_Audio,
    Path(Output_Dir).parent.__str__(),
    Path(Output_Dir).name,
    AudioSpeakersDataName
)
AudioContrastInference.getModel()
AudioContrastInference.inference()
ASRResult_Update(
    Path(Output_Dir).joinpath(AudioSpeakersDataName + ".txt").as_posix(),
    Output_Dir
)
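The decision threshold is compared against a similarity score between the reference speaker's voice and each candidate clip. The notebook does not document how `Voice_Contrasting` computes that score. Speaker-verification models such as ECAPA-TDNN are commonly paired with cosine similarity between speaker embeddings, so the sketch below assumes that, using toy vectors rather than real model output; the helper names are hypothetical.

```python
import math

def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def same_speaker(ref, candidate, threshold: float = 0.75) -> bool:
    """Accept the candidate as the reference speaker when the similarity reaches the threshold (assumed >=)."""
    return cosine_similarity(ref, candidate) >= threshold
```

A higher threshold makes acceptance stricter, which matches the advice to raise `DecisionThreshold` when the compared speakers sound alike.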

### Voice Transcription (VoiceTranscriber)
Batch-transcribe the speech in audio files into timestamped text and save it as subtitle files.

In [None]:
#@title Execute
%cd /content/Easy-Voice-Toolkit

from datetime import date
from pathlib import Path
from EVT_Core.Whisper.transcribe import Voice_Transcribing

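#@markdown **Audio directory**: directory of the WAV files whose speech should be transcribed
Audio_Dir: str = '/content/drive/MyDrive/%EVT/语音识别结果/...%'   #@param {type:"string"}
#@markdown **Add language info**: label the language spoken in each audio file; recommended when building a VITS dataset
Add_LanguageInfo: bool = True   #@param {type:"boolean"}
#@markdown **Half precision**: compute mainly in float16; ignore or disable this if no GPU is available
fp16: bool = True   #@param {type:"boolean"}
#@markdown **Verbose**: whether to print debug logs
Verbose: bool = True   #@param {type:"boolean"}
#@markdown **Condition on previous text**: enable this for better results when the audio clips are related in content; disable it if the model gets stuck in a failure loop
Condition_on_Previous_Text: bool = False   #@param {type:"boolean"}
#@markdown **Output directory**: the generated subtitle files are saved to this directory
Output_Dir: str = f'/content/drive/MyDrive/EVT/语音转录结果/{date.today()}'   #@param {type:"string"}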

WAVtoSRT = Voice_Transcribing(
    '/content/models/download/Whisper/base.pt',
    Audio_Dir,
    Verbose,
    Add_LanguageInfo,
    Condition_on_Previous_Text,
    fp16,
    Path(Output_Dir).parent.__str__(),
    Path(Output_Dir).name
)
WAVtoSRT.transcribe()
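The transcriber saves its output as SRT subtitles. For reference, each SRT entry is a sequence number, an `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range and the text, followed by a blank line. The sketch below renders segments in that format; the `(start, end, text)` tuple shape is an assumption for illustration, not the structure `Voice_Transcribing` uses internally.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def to_srt(segments) -> str:
    """Render (start_sec, end_sec, text) tuples as an SRT document."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n")
    # Entries are separated by a blank line
    return "\n".join(blocks)
```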

### GPT-SoVITS Dataset Creation (DatasetCreator - GPT-SoVITS)
Generate a dataset suitable for training voice models.

In [None]:
#@title Execute
%cd /content/Easy-Voice-Toolkit

from datetime import date
from pathlib import Path
from EVT_Core.GPT_SoVITS.preprocess import Dataset_Creating

#@markdown **Audio directory / recognition result path**: directory of the audio files (organized by speaker), or the path of the text file produced by voice identification
AudioSpeakersData_Path: str = '/content/drive/MyDrive/%EVT/语音识别结果/GPT-SoVITS/...%'   #@param {type:"string"}
#@markdown **Subtitle input directory**: directory of the SRT files to convert into CSV files suitable for model training
SRT_Dir: str = '/content/drive/MyDrive/%EVT/语音转录结果/GPT-SoVITS/...%'   #@param {type:"string"}
#@markdown **Output directory**: directory where the generated dataset is saved
Output_Dir: str = f'/content/drive/MyDrive/EVT/数据集制作结果/GPT-SoVITS/{date.today()}'   #@param {type:"string"}
#@markdown **Training list name**: name of the generated training-set txt file
FileList_Name_Training: str = 'train'   #@param {type:"string"}

SRTtoCSVandSplitAudio = Dataset_Creating(
    SRT_Dir = SRT_Dir,
    AudioSpeakersData_Path = AudioSpeakersData_Path,
    Output_Root = Path(Output_Dir).parent.__str__(),
    Output_DirName = Path(Output_Dir).name,
    FileList_Name = FileList_Name_Training
)
SRTtoCSVandSplitAudio.run()
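Dataset creation pairs each subtitle cue with the matching span of audio. A minimal sketch of that pairing follows, assuming standard SRT input and converting cue times to sample indices at a given sample rate. The function names are hypothetical, and the toolkit's own parser may handle edge cases differently.

```python
import re

# Matches an SRT time range line such as "00:00:01,250 --> 00:00:02,000"
_TIME_RANGE = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})")

def _to_seconds(h, m, s, ms) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

def parse_srt(content: str):
    """Yield (start_sec, end_sec, text) for each cue in an SRT string."""
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = _TIME_RANGE.search(line)
            if match:
                g = match.groups()
                # Everything after the time line is the cue text; join multi-line cues with spaces
                text = " ".join(ln.strip() for ln in lines[i + 1:]).strip()
                yield _to_seconds(*g[:4]), _to_seconds(*g[4:]), text
                break

def cue_sample_range(start: float, end: float, sample_rate: int):
    """Convert a cue's start/end times in seconds to [first, last) sample indices."""
    return int(round(start * sample_rate)), int(round(end * sample_rate))
```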

### GPT-SoVITS Model Training (VoiceTrainer - GPT-SoVITS)
Train model files for speech synthesis. If errors occur, try `Disconnect and delete runtime` first, then rerun the Configure Colab section and this cell.

In [None]:
#@title Execute
%cd /content/Easy-Voice-Toolkit

from datetime import date
from pathlib import Path
from EVT_Core.GPT_SoVITS.train import train

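#@markdown **Training version**
version: str = 'v2'   #@param ["v2", "v3"]
#@markdown **Training list path**: path of the training-set txt file that lists the training audio paths and their speech content
fileList_path: str = '/content/drive/MyDrive/%EVT/数据集制作结果/GPT-SoVITS/train.txt%'   #@param {type:"string"}
#@markdown **Half-precision training**: mixed float16 training reduces VRAM usage so that larger batch sizes fit
half_precision: bool = True   #@param {type:"boolean"}
#@markdown **Output directory**: directory for the generated models and config files; if it already contains models, they are treated as checkpoints (note: when several models are present, the one with the largest number is chosen as the checkpoint)
Output_Dir: str = f'/content/drive/MyDrive/EVT/模型训练结果/GPT-SoVITS/{date.today()}'   #@param {type:"string"}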

PreprocessandTrain = train(
    version,
    fileList_path,
    modelDir_bert = '/content/models/download/GPT-SoVITS/chinese-roberta-wwm-ext-large',
    modelDir_hubert = '/content/models/download/GPT-SoVITS/chinese-hubert-base',
    modelPath_gpt = '/content/models/download/GPT-SoVITS/s1bert25hz-2kh-longer-epoch=68e-step=50232.ckpt',
    modelPath_sovitsG = '/content/models/download/GPT-SoVITS/s2G488k.pth',
    modelPath_sovitsD = '/content/models/download/GPT-SoVITS/s2D488k.pth',
    half_precision = half_precision,
    if_grad_ckpt = False,
    lora_rank = 32,
    Output_Root = Path(Output_Dir).parent.__str__(),
    Output_DirName = Path(Output_Dir).name,
    Output_LogDir = "/content/drive/MyDrive/EVT/log"
)
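The training cell relies on the pretrained files fetched in Download Models. A quick existence check gives a clearer error than a failure deep inside training when a clone was incomplete. `missing_paths` is a hypothetical helper; the paths are the ones passed to `train` above.

```python
from pathlib import Path

def missing_paths(paths) -> list:
    """Return the entries of `paths` that do not exist on disk."""
    return [p for p in paths if not Path(p).exists()]

pretrained = [
    '/content/models/download/GPT-SoVITS/chinese-roberta-wwm-ext-large',
    '/content/models/download/GPT-SoVITS/chinese-hubert-base',
    '/content/models/download/GPT-SoVITS/s1bert25hz-2kh-longer-epoch=68e-step=50232.ckpt',
    '/content/models/download/GPT-SoVITS/s2G488k.pth',
    '/content/models/download/GPT-SoVITS/s2D488k.pth',
]
missing = missing_paths(pretrained)
print("Missing pretrained files:", missing if missing else "none")
```

If the files exist but are only a few hundred bytes, the clone probably fetched Git LFS pointer files instead of the weights, which happens when Git LFS is not set up.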

### GPT-SoVITS Speech Synthesis (VoiceConverter - GPT-SoVITS)
Convert text to speech and generate audio files.

In [None]:
#@title Execute
%cd /content/Easy-Voice-Toolkit

from EVT_Core.GPT_SoVITS.infer_webui import infer

#@markdown **Inference version**
version: str = 'v2'   #@param ["v2", "v3"]
#@markdown **Half-precision inference**: mixed float16 inference reduces VRAM usage so that larger batch sizes fit
half_precision: bool = True   #@param {type:"boolean"}
# #@markdown **Enable batched inference**: batched inference reduces VRAM usage so that larger batch sizes fit
# batched_infer: bool = True   #@param {type:"boolean"}

VoiceConverting = infer(
    version,
    sovits_path = '/content/models/download/GPT-SoVITS/s2G488k.pth',
    path_sovits_v3 = '/content/models/download/GPT-SoVITS/s2Gv3.pth',
    gpt_path = '/content/models/download/GPT-SoVITS/s1bert25hz-2kh-longer-epoch=68e-step=50232.ckpt',
    cnhubert_base_path = '/content/models/download/GPT-SoVITS/chinese-hubert-base',
    bert_path = '/content/models/download/GPT-SoVITS/chinese-roberta-wwm-ext-large',
    bigvgan_path = '/content/models/download/GPT-SoVITS/models--nvidia--bigvgan_v2_24khz_100band_256x',
    half_precision = half_precision,
    # batched_infer = batched_infer,
)
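GPT-SoVITS has two stages: the GPT model (stage 1, shipped as `s1bert25hz-...ckpt`) and the SoVITS model (stage 2, shipped as `s2G488k.pth`). The training cell passes them as `modelPath_gpt` and `modelPath_sovitsG`. Handing the two files to the wrong parameters is an easy mistake, so a filename-based guard can help. The sketch assumes GPT weights end in `.ckpt` and SoVITS weights in `.pth`, as in the pretrained files used here; models you train yourself may be named differently, and `check_stage_paths` is hypothetical.

```python
from pathlib import PurePosixPath

def check_stage_paths(gpt_path: str, sovits_path: str) -> None:
    """Raise ValueError if the GPT and SoVITS paths look swapped (assumed .ckpt / .pth naming)."""
    if PurePosixPath(gpt_path).suffix != ".ckpt":
        raise ValueError(f"gpt_path does not look like a GPT (stage 1) checkpoint: {gpt_path}")
    if PurePosixPath(sovits_path).suffix != ".pth":
        raise ValueError(f"sovits_path does not look like a SoVITS (stage 2) model: {sovits_path}")

check_stage_paths(
    gpt_path='/content/models/download/GPT-SoVITS/s1bert25hz-2kh-longer-epoch=68e-step=50232.ckpt',
    sovits_path='/content/models/download/GPT-SoVITS/s2G488k.pth',
)
```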

### VITS2 Dataset Creation (DatasetCreator - VITS2)
Generate a dataset suitable for training voice models.

In [None]:
#@title Execute
%cd /content/Easy-Voice-Toolkit

from datetime import date
from pathlib import Path
from EVT_Core.VITS.preprocess import Dataset_Creating

#@markdown **Audio directory / recognition result path**: directory of the audio files (organized by speaker), or the path of the text file produced by voice identification
AudioSpeakersData_Path: str = '/content/drive/MyDrive/%EVT/语音识别结果/VITS/...%'   #@param {type:"string"}
#@markdown **Subtitle input directory**: directory of the SRT files to convert into CSV files suitable for model training
SRT_Dir: str = '/content/drive/MyDrive/%EVT/语音转录结果/VITS/...%'   #@param {type:"string"}
#@markdown **Add auxiliary data**: add a dataset that assists training; recommended when the current voice data is low in quality or quantity
Add_AuxiliaryData: bool = False   #@param {type:"boolean"}
#@markdown **Auxiliary data text path**: path of the auxiliary dataset's text file
AuxiliaryData_Path: str = '/content/drive/MyDrive/%EVT/AuxiliaryData/VITS/AuxiliaryData.txt%'   #@param {type:"string"}
#@markdown **Add auxiliary data in other languages**: allow auxiliary data whose language does not match the current dataset
Add_UnmatchedLanguage: bool = False   #@param {type:"boolean"}
#@markdown **Sample rate (Hz)**: audio sample rate required by the dataset; leave as 'None' to keep it unchanged
SampleRate: int = 22050   #@param ["None", 22050, 44100, 48000, 96000, 192000]
#@markdown **Sample width**: audio bit depth required by the dataset; leave as 'None' to keep it unchanged
SampleWidth: int = 16   #@param ["None", 8, 16, 24, 32]
#@markdown **Merge channels**: downmix the output audio to mono
ToMono: bool = True   #@param {type:"boolean"}
#@markdown **Training ratio**: proportion of the dataset assigned to the training set
TrainRatio: float = 0.7   #@param {type:"number"}
#@markdown **Output directory**: directory where the generated dataset is saved
Output_Dir: str = f'/content/drive/MyDrive/EVT/数据集制作结果/VITS/{date.today()}'   #@param {type:"string"}
#@markdown **Training list name**: name of the generated training-set txt file
FileList_Name_Training: str = 'train'   #@param {type:"string"}
#@markdown **Validation list name**: name of the generated validation-set txt file
FileList_Name_Validation: str = 'Val'   #@param {type:"string"}

SRTtoCSVandSplitAudio = Dataset_Creating(
    SRT_Dir,
    AudioSpeakersData_Path,
    SampleRate if SampleRate != "None" else None,
    SampleWidth if SampleWidth != "None" else None,
    ToMono,
    Add_AuxiliaryData,
    AuxiliaryData_Path,
    Add_UnmatchedLanguage,
    TrainRatio,
    Path(Output_Dir).parent.__str__(),
    Path(Output_Dir).name,
    FileList_Name_Training,
    FileList_Name_Validation
)
SRTtoCSVandSplitAudio.run()
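`TrainRatio` controls how the dataset is divided: with the default 0.7, about 70% of the entries go to the training list and the rest to the validation list. A sketch of such a split follows. Shuffling with a fixed seed is an assumption; the toolkit may shuffle differently or not at all, and `split_train_val` is hypothetical.

```python
import random

def split_train_val(entries, train_ratio: float = 0.7, seed: int = 0):
    """Shuffle a copy of `entries` and split it into (train, val) lists by train_ratio."""
    if not 0.0 < train_ratio <= 1.0:
        raise ValueError("train_ratio must be in (0, 1]")
    shuffled = list(entries)
    random.Random(seed).shuffle(shuffled)  # a local RNG leaves the global random state untouched
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```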

### VITS2 Model Training (VoiceTrainer - VITS2)
Train model files for speech synthesis. If errors occur, try `Disconnect and delete runtime` first, then rerun the Configure Colab section and this cell.

In [None]:
#@title Execute
%cd /content/Easy-Voice-Toolkit

from datetime import date
from pathlib import Path
from EVT_Core.VITS.train import train

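#@markdown **Training list path**: path of the training-set txt file that lists the training audio paths and their speech content
FileList_Path_Training: str = '/content/drive/MyDrive/%EVT/数据集制作结果/VITS/train.txt%'   #@param {type:"string"}
#@markdown **Validation list path**: path of the validation-set txt file that lists the validation audio paths and their speech content
FileList_Path_Validation: str = '/content/drive/MyDrive/%EVT/数据集制作结果/VITS/Val.txt%'   #@param {type:"string"}
#@markdown **Epochs**: number of complete passes over all samples
Epochs: int = 300   #@param {type:"integer"}
#@markdown **Batch size**: number of samples in each batch (note: a power of 2 is preferable)
Batch_Size: int = 16   #@param {type:"integer"}
#@markdown **Use pretrained models**: start from pretrained (base) models; note that they take priority over checkpoints when loading
Use_PretrainedModels: bool = True   #@param {type:"boolean"}
#@markdown **[Optional] Pretrained G model path**: path of the pretrained generator model
Model_Path_Pretrained_G: str = '/content/drive/MyDrive/%EVT/Pretrained Models/standard_G.pth%'   #@param {type:"string"}
#@markdown **[Optional] Pretrained D model path**: path of the pretrained discriminator model
Model_Path_Pretrained_D: str = '/content/drive/MyDrive/%EVT/Pretrained Models/standard_D.pth%'   #@param {type:"string"}
#@markdown **[Optional] Keep original speakers**: keep the speakers already in the base model; make sure each original speaker has at least one or two audio clips in the training data
Keep_Original_Speakers: bool = False   #@param {type:"boolean"}
#@markdown **[Optional] Config load path**: path of the config file from which the base model's speaker information is loaded
Config_Path_Load: str = '/content/drive/MyDrive/%EVT/Pretrained Models/standard_Config.json%'   #@param {type:"string"}
#@markdown **Number of workers**: number of processes that load data in parallel
Num_Workers: int = 8   #@param {type:"integer"}
#@markdown **Half-precision training**: mixed float16 training reduces VRAM usage so that larger batch sizes fit
FP16_Run: bool = True   #@param {type:"boolean"}
#@markdown **Evaluation interval**: number of steps between model saves
Eval_Interval: int = 1000   #@param {type:"integer"}
#@markdown **Output directory**: directory for the generated models and config files; if it already contains models, they are treated as checkpoints (note: when several models are present, the one with the largest number is chosen as the checkpoint)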
Output_Dir: str = f'/content/drive/MyDrive/EVT/模型训练结果/VITS/{date.today()}'   #@param {type:"string"}

# Load the TensorBoard notebook extension
%load_ext tensorboard
# Start TensorBoard
%tensorboard --logdir /content/drive/MyDrive/EVT/log

PreprocessandTrain = train(
    FileList_Path_Training,
    FileList_Path_Validation,
    Eval_Interval,
    Epochs,
    Batch_Size,
    FP16_Run,
    Keep_Original_Speakers,
    Config_Path_Load,
    Num_Workers,
    Use_PretrainedModels,
    Model_Path_Pretrained_G if Model_Path_Pretrained_G != "None" else None,
    Model_Path_Pretrained_D if Model_Path_Pretrained_D != "None" else None,
    Path(Output_Dir).parent.__str__(),
    Path(Output_Dir).name,
    "/content/drive/MyDrive/EVT/log"
)
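The output-directory note says that when several models are present, the one with the largest number is picked as the checkpoint. Assuming generator checkpoints are named like `G_1000.pth` (the pattern suggested by the `G_*.pth` placeholder in the VITS2 synthesis cell), the selection can be sketched as follows; `latest_checkpoint` is hypothetical, not the trainer's own code.

```python
import re
from pathlib import Path
from typing import Optional

def latest_checkpoint(model_dir: str, prefix: str = "G_") -> Optional[str]:
    """Return the highest-numbered '<prefix><number>.pth' file in model_dir, or None if there is none."""
    directory = Path(model_dir)
    if not directory.is_dir():
        return None
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)\.pth$")
    best_step, best_path = -1, None
    for path in directory.iterdir():
        match = pattern.match(path.name)
        # Compare the numbers as integers, so G_10000 beats G_9000 (string order would not)
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), str(path)
    return best_path
```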

### VITS2 Speech Synthesis (VoiceConverter - VITS2)
Convert text to speech and generate audio files.

In [None]:
#@title Execute
%cd /content/Easy-Voice-Toolkit

from datetime import date
from pathlib import Path
from EVT_Core.VITS.infer import infer

#@markdown **Config load path**: the config file at this path is used for inference
Config_Path_Load: str = '/content/drive/MyDrive/%EVT/模型训练结果/VITS/Config.json%'   #@param {type:"string"}
#@markdown **G model load path**: path of the generator model used for inference
Model_Path_Load: str = '/content/drive/MyDrive/%EVT/模型训练结果/VITS/G_*.pth%'   #@param {type:"string"}
#@markdown **Input text**: the text the speaker will say (the default '请输入语句' is a placeholder meaning "please enter a sentence")
Text: str = '请输入语句'   #@param {type:"string"}
#@markdown **Language**: language of the speaker/text; leave as 'None' to detect it automatically
Language: str = '[ZH]'   #@param ["None", "[ZH]", "[EN]", "[JA]"]
#@markdown **Speaker name**: name of the speaking character
Speaker: str = '%Name%'   #@param {type:"string"}
#@markdown **Emotion strength**: degree of emotional variation
EmotionStrength: float = .667   #@param {type:"number"}
#@markdown **Phoneme duration**: pronunciation length of each phoneme
PhonemeDuration: float = 0.8   #@param {type:"number"}
#@markdown **Speech rate**: overall speaking speed
SpeechRate: float = 1.0   #@param {type:"number"}
#@markdown **Audio save path**: path where the synthesized audio is saved
Audio_Path_Save: str = f'/content/drive/MyDrive/EVT/语音合成结果/VITS/{date.today()}.wav'   #@param {type:"string"}

VoiceConverting = infer(
    Config_Path_Load,
    Model_Path_Load,
    Text,
    Language,
    Speaker,
    EmotionStrength,
    PhonemeDuration,
    SpeechRate,
    Audio_Path_Save
)
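The defaults of the three numeric controls (0.667, 0.8, 1.0) match the usual VITS inference defaults for `noise_scale`, `noise_scale_w` and `length_scale`, so they likely map onto those. This is an inference from the numbers, not something the notebook states. Likewise, the `[ZH]`/`[EN]`/`[JA]` options resemble the convention of some multilingual VITS forks, where text is wrapped as `[ZH]...[ZH]`. The sketch shows both ideas under those assumptions; both helper names are hypothetical, and how `infer` actually uses these values is not documented here.

```python
def wrap_language(text: str, language_tag: str = "[ZH]") -> str:
    """Wrap text in a language tag, e.g. '[ZH]你好[ZH]' (assumed convention); 'None' leaves it as-is."""
    if language_tag in (None, "None"):
        return text
    return f"{language_tag}{text}{language_tag}"

def vits_scales(emotion_strength: float = 0.667, phoneme_duration: float = 0.8, speech_rate: float = 1.0) -> dict:
    """Map the notebook's controls onto VITS-style inference scales (assumed mapping).

    length_scale stretches durations, so a faster speech rate means a smaller length_scale.
    """
    if speech_rate <= 0:
        raise ValueError("speech_rate must be positive")
    return {
        "noise_scale": emotion_strength,
        "noise_scale_w": phoneme_duration,
        "length_scale": 1.0 / speech_rate,
    }
```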