https://github.com/PlayVoice/so-vits-svc-5.0/

↑Original repository

*Methods to keep the Colab connection alive*: https://zhuanlan.zhihu.com/p/144629818

Preview version: inference with the preset models is available.

# **Environment Setup & Necessary File Downloads**


In [None]:
#@title Let's see what card we got~~mostly T4~~
!nvidia-smi
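
The same check can be scripted so that later cells can react to the result. The sketch below is a minimal example, not part of the original notebook: it calls `nvidia-smi` with its CSV query options and returns one line per GPU, or an empty list when the tool is unavailable. The function name `gpu_summary` is hypothetical.

```python
import shutil
import subprocess

def gpu_summary():
    """Return one 'name, total memory' string per visible GPU, or [] if none."""
    if shutil.which("nvidia-smi") is None:
        return []  # no NVIDIA driver tools in this runtime
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]

gpus = gpu_summary()
print(gpus if gpus else "No GPU detected; switch the runtime type to GPU.")
```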

In [None]:
#@title Clone the GitHub repository
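# Clone into /content/so-vits-svc-5.0 so that the path matches the %cd used in the cells below
!git clone https://github.com/Darquedante/so-vits-svc-5.0-EN.git -b bigvgan-mix-v2 /content/so-vits-svc-5.0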

In [None]:
#@title Install Dependencies & Download Necessary Files
%cd /content/so-vits-svc-5.0

!pip install -r requirements.txt
!pip install --upgrade pip setuptools numpy numba

!wget -P hubert_pretrain/ https://github.com/bshall/hubert/releases/download/v0.1/hubert-soft-0d54a1f4.pt
!wget -P whisper_pretrain/ https://openaipublic.azureedge.net/main/whisper/models/81f7c96c852ee8fc832187b0132e569d6c3065a3252ed18e56effd0b6a73e524/large-v2.pt
!wget -P speaker_pretrain/ https://github.com/PlayVoice/so-vits-svc-5.0/releases/download/dependency/best_model.pth.tar
!wget -P crepe/assets https://github.com/PlayVoice/so-vits-svc-5.0/releases/download/dependency/full.pth
!wget -P vits_pretrain https://github.com/PlayVoice/so-vits-svc-5.0/releases/download/5.0/sovits5.0.pretrain.pth
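
A failed or interrupted download only shows up later as a confusing load error, so it may help to verify the files first. The sketch below assumes each file ends up at `<-P directory>/<file name from the URL>`, as the `wget` commands above imply; the helper name `missing_downloads` is hypothetical.

```python
from pathlib import Path

# Paths derived from the wget commands above (assumption: default wget file naming).
EXPECTED_FILES = [
    "hubert_pretrain/hubert-soft-0d54a1f4.pt",
    "whisper_pretrain/large-v2.pt",
    "speaker_pretrain/best_model.pth.tar",
    "crepe/assets/full.pth",
    "vits_pretrain/sovits5.0.pretrain.pth",
]

def missing_downloads(repo_root="."):
    """Return the expected files that are absent or empty under repo_root."""
    root = Path(repo_root)
    return [
        rel
        for rel in EXPECTED_FILES
        if not (root / rel).is_file() or (root / rel).stat().st_size == 0
    ]

for rel in missing_downloads("."):
    print("missing or empty:", rel)
```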

In [None]:
#@title Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Preview of Multi-Speaker Inference

In [None]:
#@title Extract Content Encoding

#@markdown **Upload the processed ".wav" source file to the root directory of Google Drive, then set the option below**

#@markdown **".wav" file name (without the extension)**
input_file = "ギターと孤独と蒼い惑星" #@param {type:"string"}
input_path = "/content/drive/MyDrive/"
input_name = input_path + input_file
# Quote the path so that file names containing spaces survive the shell
!PYTHONPATH=. python whisper/inference.py -w "{input_name}.wav" -p test.ppg.npy

In [None]:
#@title Inference

#@markdown **Upload the processed ".wav" source file to the root directory of Google Drive, then set the options below**

#@markdown **".wav" file name (without the extension)**
input_file = "ギターと孤独と蒼い惑星" #@param {type:"string"}
input_path = "/content/drive/MyDrive/"
input_name = input_path + input_file
#@markdown **Speaker ID, 0001 to 0056 (recommended: 0022, 0030, 0047, 0051)**
speaker = "0002" #@param {type:"string"}
# Quote the path so that file names containing spaces survive the shell
!PYTHONPATH=. python svc_inference.py --config configs/base.yaml --model vits_pretrain/sovits5.0.pretrain.pth --spk ./configs/singers/singer{speaker}.npy --wave "{input_name}.wav" --ppg test.ppg.npy
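
A mistyped speaker ID only fails once the inference script tries to load the embedding. The sketch below validates the ID up front, assuming the four-digit format and the 0001 to 0056 range stated in the form above; the helper name `singer_embedding_path` is hypothetical.

```python
import re

def singer_embedding_path(speaker, first=1, last=56):
    """Return the preset embedding path for a four-digit speaker ID such as "0022"."""
    if not re.fullmatch(r"[0-9]{4}", speaker):
        raise ValueError(f"speaker must be four digits, got {speaker!r}")
    if not first <= int(speaker) <= last:
        raise ValueError(f"speaker must be between {first:04d} and {last:04d}")
    # Same path pattern as the --spk argument in the inference cell
    return f"./configs/singers/singer{speaker}.npy"

print(singer_embedding_path("0022"))
```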

The inference result is saved as svc_out.wav in the root directory of the repository (/content/so-vits-svc-5.0).
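
Because the output file name is fixed, a later run presumably overwrites the previous result. A minimal sketch for keeping a copy on Google Drive follows; the helper name `keep_result` and the `_svc` suffix are assumptions made for illustration.

```python
import shutil
from pathlib import Path

def keep_result(source_name, result="svc_out.wav", out_dir="/content/drive/MyDrive"):
    """Copy the inference result to out_dir as '<source_name>_svc.wav'."""
    src = Path(result)
    if not src.is_file():
        raise FileNotFoundError(f"{src} not found; run the inference cell first")
    dst_dir = Path(out_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / f"{source_name}_svc.wav"
    shutil.copy2(src, dst)  # copy2 also preserves the modification time
    return dst
```

Calling it with the file name entered in the form above stores the converted audio next to the source file on Drive.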

# Training

Clip the audio into segments shorter than 30 seconds, match their loudness, and convert them to mono. Preprocessing resamples the audio, so no particular sampling rate is required. (However, lowering the sampling rate beforehand will reduce the quality of your data.)

**Use Adobe Audition™'s loudness matching feature to complete resampling, channel modification, and loudness matching in one go.**
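
The length and channel requirements can also be checked in code before uploading. The sketch below uses Python's standard `wave` module, which reads only uncompressed PCM WAV files, so it is an assumption that the clips are in that format; the helper name `clip_problems` is hypothetical.

```python
import wave

def clip_problems(path, max_seconds=30.0):
    """Return a list of reasons why a WAV clip does not meet the requirements."""
    with wave.open(str(path), "rb") as wav:
        channels = wav.getnchannels()
        seconds = wav.getnframes() / wav.getframerate()
    problems = []
    if channels != 1:
        problems.append(f"{channels} channels (expected mono)")
    if seconds >= max_seconds:
        problems.append(f"{seconds:.1f} s long (expected under {max_seconds:g} s)")
    return problems
```

An empty list means the clip passes both checks; loudness is not inspected here.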

Then save the audio files in the following file structure:
```
dataset_raw
├───speaker0
│   ├───xxx1-xxx1.wav
│   ├───...
│   └───Lxx-0xx8.wav
└───speaker1
    ├───xx2-0xxx2.wav
    ├───...
    └───xxx7-xxx007.wav
```

Pack the dataset_raw folder into a zip archive named data.zip and upload it to the root directory of Google Drive.
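
The archive can be built with Python's `zipfile` module instead of a desktop tool. The sketch below assumes the two-level `<speaker>/<file>.wav` layout shown above and always stores the files under a top-level `dataset_raw/` folder, so that extracting the archive into the repository yields `./dataset_raw`. The helper name `pack_dataset` is hypothetical.

```python
import zipfile
from pathlib import Path

def pack_dataset(dataset_dir="dataset_raw", archive="data.zip"):
    """Zip <dataset_dir>/<speaker>/*.wav under a top-level dataset_raw/ folder."""
    root = Path(dataset_dir)
    wavs = sorted(root.glob("*/*.wav"))  # one folder level per speaker
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for wav in wavs:
            zf.write(wav, f"dataset_raw/{wav.parent.name}/{wav.name}")
    return len(wavs)  # number of clips packed
```

Files that are not `.wav`, or that sit outside a speaker folder, are skipped.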

In [None]:
#@title Retrieve Dataset from Cloud Drive
!unzip -d /content/so-vits-svc-5.0/ /content/drive/MyDrive/data.zip #Modify the path and file name as needed

In [None]:
#@title Resampling
# Generate audio at 16000Hz sample rate, storage path: ./data_svc/waves-16k
!python prepare/preprocess_a.py -w ./dataset_raw -o ./data_svc/waves-16k -s 16000
# Generate audio at 32000Hz sample rate, storage path: ./data_svc/waves-32k
!python prepare/preprocess_a.py -w ./dataset_raw -o ./data_svc/waves-32k -s 32000

In [None]:
#@title Extract f0
!python prepare/preprocess_f0.py -w data_svc/waves-16k/ -p data_svc/pitch

In [None]:
#@title Use 16k audio to extract content encoding (Whisper PPG)
!PYTHONPATH=. python prepare/preprocess_ppg.py -w data_svc/waves-16k/ -p data_svc/whisper

In [None]:
#@title Use 16k audio to extract content encoding (HuBERT)
!PYTHONPATH=. python prepare/preprocess_hubert.py -w data_svc/waves-16k/ -v data_svc/hubert

In [None]:
#@title Extract Timbre Features
!PYTHONPATH=. python prepare/preprocess_speaker.py data_svc/waves-16k/ data_svc/speaker

In [None]:
# (Resolve '.ipynb_checkpoints' related errors)
# Delete every .ipynb_checkpoints directory below the current directory
!find . -type d -name .ipynb_checkpoints -prune -exec rm -rf {} +
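
The same cleanup can be written in plain Python, which avoids shell quoting and reports what was removed. This is a minimal sketch; the helper name `remove_checkpoint_dirs` is hypothetical.

```python
import shutil
from pathlib import Path

def remove_checkpoint_dirs(root="."):
    """Delete every .ipynb_checkpoints directory under root and return their paths."""
    # Collect first, then delete, so the tree is not modified while it is walked
    targets = [p for p in Path(root).rglob(".ipynb_checkpoints") if p.is_dir()]
    for path in targets:
        shutil.rmtree(path, ignore_errors=True)
    return [str(p) for p in targets]

print(remove_checkpoint_dirs("."))
```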

In [None]:
#@title Extract Average Timbre
!PYTHONPATH=. python prepare/preprocess_speaker_ave.py data_svc/speaker/ data_svc/singer

In [None]:
#@title Extract Spec
!PYTHONPATH=. python prepare/preprocess_spec.py -w data_svc/waves-32k/ -s data_svc/specs
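
Before building the index, it can be useful to confirm that every preprocessing step produced output. The sketch below counts the files in each output directory named in the cells above; an entry of 0 suggests that the corresponding step failed or was skipped. The helper name `count_outputs` is hypothetical.

```python
from pathlib import Path

# Output directories taken from the preprocessing commands above
FEATURE_DIRS = [
    "data_svc/waves-16k",
    "data_svc/waves-32k",
    "data_svc/pitch",
    "data_svc/whisper",
    "data_svc/hubert",
    "data_svc/speaker",
    "data_svc/singer",
    "data_svc/specs",
]

def count_outputs(repo_root="."):
    """Map each feature directory to the number of files found beneath it."""
    counts = {}
    for rel in FEATURE_DIRS:
        folder = Path(repo_root) / rel
        if folder.is_dir():
            counts[rel] = sum(1 for p in folder.rglob("*") if p.is_file())
        else:
            counts[rel] = 0
    return counts

for rel, n in count_outputs(".").items():
    print(f"{rel}: {n} files")
```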

In [None]:
#@title Generate Index
!python prepare/preprocess_train.py

In [None]:
#@title Training File Debugging
!PYTHONPATH=. python prepare/preprocess_zzz.py

In [None]:
#@title  Set Model Backup
#@markdown **Back up the model to Google Drive? This is recommended because the Colab session can crash at any time. By default, checkpoints are saved to the Sovits5.0 folder in the root directory of Google Drive.**
Save_to_drive = True #@param {type:"boolean"}
if Save_to_drive:
  # Replace the local chkpt directory with a symlink into Google Drive
  !rm -rf /content/so-vits-svc-5.0/chkpt
  !mkdir -p /content/drive/MyDrive/Sovits5.0
  !ln -s /content/drive/MyDrive/Sovits5.0 /content/so-vits-svc-5.0/chkpt
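
If the link is not created correctly, checkpoints stay on the Colab disk and are lost when the session ends. The sketch below checks that the checkpoint directory is a symlink resolving to the Drive folder, using the paths from the cell above as defaults; the helper name `backup_link_ok` is hypothetical.

```python
import os

def backup_link_ok(chkpt="/content/so-vits-svc-5.0/chkpt",
                   drive_dir="/content/drive/MyDrive/Sovits5.0"):
    """Return True if chkpt is a symlink that resolves to drive_dir."""
    return os.path.islink(chkpt) and os.path.realpath(chkpt) == os.path.realpath(drive_dir)

print("Drive backup active:", backup_link_ok())
```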

In [None]:
#@title  Start Training
%load_ext tensorboard
%tensorboard --logdir /content/so-vits-svc-5.0/logs/

!PYTHONPATH=. python svc_trainer.py -c configs/base.yaml -n sovits5.0