## Before training

This program saves the last 3 generations of models to Google Drive. Since one generation of models is larger than 1 GB, you should have at least 3 GB of free space in Google Drive. If you do not have that much free space, consider creating another Google Account.
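The free-space requirement above can be checked before training starts. The following is a minimal sketch, not part of the original notebook: the helper name `has_free_space` is hypothetical, and it assumes that the mounted Drive reports its remaining quota through the regular filesystem interface.

```python
import shutil

GIB = 1024 ** 3  # bytes per GiB


def has_free_space(path: str, required_gib: float = 3.0) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gib` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gib * GIB
```

On Colab you would pass the mount point of your Drive (for example the directory given to `drive.mount`); locally, pass the directory where the models are written.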

Training requires more than 10 GB of VRAM (a T4 should be enough). Inference requires far less VRAM.
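The VRAM requirement can also be checked programmatically. This is a rough sketch under stated assumptions: the function names are hypothetical, the 10 GiB threshold is only an approximation of the ">10GB" guideline, and it relies on the `nvidia-smi` query flags `--query-gpu=memory.total` and `--format=csv,noheader,nounits`, which print one total-memory value in MiB per GPU.

```python
import subprocess


def parse_total_vram_mib(query_output: str) -> list[int]:
    """Parse one integer (MiB) per GPU from nvidia-smi CSV output without header or units."""
    return [int(line.strip()) for line in query_output.splitlines() if line.strip()]


def query_total_vram_mib() -> list[int]:
    """Ask nvidia-smi for the total memory of every visible GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_total_vram_mib(result.stdout)


def enough_for_training(vram_mib: list[int], required_gib: float = 10.0) -> bool:
    """Return True if at least one GPU has `required_gib` GiB of memory or more."""
    return any(mib >= required_gib * 1024 for mib in vram_mib)
```

Calling `enough_for_training(query_total_vram_mib())` combines the two steps. A card that reports 6144 MiB, like the one in the `nvidia-smi` output in the Installation section, would not pass this check.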

## Installation

In [1]:
#@title Check GPU
!nvidia-smi

Fri Jun  9 01:39:13 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.116.04   Driver Version: 525.116.04   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
|  0%   48C    P8    18W / 125W |      6MiB /  6144MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [2]:
#@title Mount Google Drive
# from google.colab import drive
# drive.mount('/content/drive')

In [3]:
#@title Install dependencies
#@markdown pip may fail to resolve dependencies and report an ERROR, but this can be ignored.
# !python -m pip install -U pip wheel
# %pip install -U ipython 

# #@markdown Branch (for development)
# BRANCH = "none" #@param {"type": "string"}
# if BRANCH == "none":
#     %pip install -U so-vits-svc-fork
# else:
#     %pip install -U git+https://github.com/34j/so-vits-svc-fork.git@{BRANCH}

## Preprocessing

In [2]:
import os
import shutil

import librosa  # Optional. Use any library you like to read audio files.
import soundfile  # Optional. Use any library you like to write audio files.
from utils.audioslicer.slicer import Slicer
from utils.audiosplitter import audiosplitter
from utils.demucs.utils import separate

def preprocess(audiofile):
    """Separate the vocals with Demucs, slice them on silence, then split the slices into 10-second segments."""
    # audiofile = '../dataset_raw/ssk/speaking/ssk_podcast_20230825.mp3'
    filename, ext = os.path.splitext(os.path.basename(audiofile))

    # Output directories: one folder per input file
    output_dir = os.path.join('/home/arvin/so-vits-svc-fork/dataset_processed', filename)
    demucs_dir = os.path.join(output_dir, 'demucs')
    chunks_dir = os.path.join(output_dir, 'chunks_clean')

    # Create folders (exist_ok=True makes a separate existence check unnecessary)
    for path in [output_dir, demucs_dir, chunks_dir]:
        os.makedirs(path, exist_ok=True)

    # Copy raw file to output folder root
    shutil.copy(audiofile, output_dir)

    # DEMUCS: separate the vocals from the accompaniment
    separate(inp=output_dir, outp=demucs_dir)

    # SILENCE SLICER: cut the separated vocals at silent passages
    demucs_model = 'htdemucs'
    vocals_file = os.path.join(demucs_dir, demucs_model, filename, 'vocals.mp3')
    audio, sr = librosa.load(vocals_file, sr=None, mono=False)  # Load an audio file with librosa.
    slicer = Slicer(
        sr=sr,
        db_threshold=-40,
        min_length=5000,
        win_l=300,  # min_interval=300,
        win_s=10,  # hop_size=10,
        max_silence_kept=200
    )
    chunks = slicer.slice(audio)
    for i, chunk in enumerate(chunks):
        if len(chunk.shape) > 1:
            chunk = chunk.T  # Swap axes if the audio is stereo.
        soundfile.write(f'{chunks_dir}/{filename}_{i}.wav', chunk, sr)  # Save sliced audio files with soundfile.

    # 10SEC SPLITTER
    audiosplitter.run_audiosplitter(input_directory=chunks_dir)
    # split_audio_file(chunks_dir, segment_duration=10)
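The last step of `preprocess` cuts each silence-delimited chunk into segments of at most 10 seconds. The implementation of `audiosplitter.run_audiosplitter` is not shown here, so the following is only a sketch of the underlying arithmetic: `segment_bounds` is a hypothetical helper, and keeping the shorter final segment is an assumption.

```python
def segment_bounds(n_samples: int, sr: int, segment_seconds: float = 10.0) -> list[tuple[int, int]]:
    """Return (start, end) sample indices of consecutive fixed-length segments.

    The final segment may be shorter than `segment_seconds`.
    """
    step = int(sr * segment_seconds)  # samples per full segment
    if step < 1:
        raise ValueError("sr * segment_seconds must be at least one sample")
    return [(start, min(start + step, n_samples)) for start in range(0, n_samples, step)]
```

A 25-second chunk therefore yields three segments (10 s, 10 s, 5 s), which is consistent with log lines of the form `Segment 1/3 saved`, `Segment 2/3 saved`, `Segment 3/3 saved`.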

In [3]:
# audiofile = '../dataset_raw/ssk/speaking/ssk_podcast_20230825.mp3'
# audiofile = '../dataset_raw/ssk/singing/성시경_넌_감동이었어.mp3'
# audiofile = '../dataset_raw/ssk/singing/성시경_너의_모든_순간.mp3'
# audiofile = '../dataset_raw/ssk/singing/성시경_제주도의_푸른_밤.mp3'

#SSK
# files = ['../dataset_raw/ssk/singing/성시경_두_사람.mp3',
#         '../dataset_raw/ssk/singing/성시경_사랑하는일.mp3',
#         '../dataset_raw/ssk/singing/성시경_세_사람.mp3',
#         '../dataset_raw/ssk/singing/성시경_안녕_나의_사랑.mp3',
#         '../dataset_raw/ssk/singing/성시경_희재.mp3']

# EDSHEERAN
# !audioconvert convert /home/arvin/so-vits-svc-fork/dataset_source/edsheeran_x/ /home/arvin/so-vits-svc-fork/dataset_source/edsheeran_x/ --output-format .mp3

# files = ['/home/arvin/so-vits-svc-fork/dataset_source/edsheeran_x/06 Photograph.mp3']

# audiofile = '/home/arvin/so-vits-svc-fork/dataset_source/edsheeran_x/06 Photograph.mp3'
# preprocess(audiofile)
import glob
files = glob.glob('/home/arvin/so-vits-svc-fork/dataset_source/edsheeran_x/*.mp3')
for audiofile in files:
    print('='*120)
    print('FILE:', audiofile)
    print('='*120)    
    preprocess(audiofile)


FILE: /home/arvin/so-vits-svc-fork/dataset_source/edsheeran_x/16 I See Fire.mp3
Going to separate the files:
/home/arvin/so-vits-svc-fork/dataset_processed/16 I See Fire/16 I See Fire.mp3
With command:  python3 -m demucs.separate -o /home/arvin/so-vits-svc-fork/dataset_processed/16 I See Fire/demucs -n htdemucs --mp3 --mp3-bitrate=320
Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in /home/arvin/so-vits-svc-fork/dataset_processed/16 I See Fire/demucs/htdemucs
Separating track /home/arvin/so-vits-svc-fork/dataset_processed/16 I See Fire/16 I See Fire.mp3


100%|████████████████████████████████████████████████████████████████████████| 304.2/304.2 [00:12<00:00, 25.33seconds/s]


executing 'slice' cost 18.144s
Segment 1/1 saved: 16 I See Fire_2_1.wav
Segment 1/2 saved: 16 I See Fire_8_1.wav
Segment 2/2 saved: 16 I See Fire_8_2.wav
Segment 1/1 saved: 16 I See Fire_6_1.wav
Segment 1/2 saved: 16 I See Fire_13_1.wav
Segment 2/2 saved: 16 I See Fire_13_2.wav
Segment 1/2 saved: 16 I See Fire_7_1.wav
Segment 2/2 saved: 16 I See Fire_7_2.wav
Segment 1/2 saved: 16 I See Fire_17_1.wav
Segment 2/2 saved: 16 I See Fire_17_2.wav
Segment 1/1 saved: 16 I See Fire_0_1.wav
Segment 1/2 saved: 16 I See Fire_11_1.wav
Segment 2/2 saved: 16 I See Fire_11_2.wav
Segment 1/2 saved: 16 I See Fire_16_1.wav
Segment 2/2 saved: 16 I See Fire_16_2.wav
Segment 1/2 saved: 16 I See Fire_4_1.wav
Segment 2/2 saved: 16 I See Fire_4_2.wav
Segment 1/1 saved: 16 I See Fire_9_1.wav
Segment 1/2 saved: 16 I See Fire_12_1.wav
Segment 2/2 saved: 16 I See Fire_12_2.wav
Segment 1/1 saved: 16 I See Fire_3_1.wav
Segment 1/1 saved: 16 I See Fire_1_1.wav
Segment 1/3 saved: 16 I See Fire_18_1.wav
Segment 2/3 sav

100%|████████████████████████████████████████████████████████████████████████| 315.9/315.9 [00:12<00:00, 25.45seconds/s]


executing 'slice' cost 19.042s
Segment 1/2 saved: 12 Afire Love_7_1.wav
Segment 2/2 saved: 12 Afire Love_7_2.wav
Segment 1/1 saved: 12 Afire Love_0_1.wav
Segment 1/3 saved: 12 Afire Love_10_1.wav
Segment 2/3 saved: 12 Afire Love_10_2.wav
Segment 3/3 saved: 12 Afire Love_10_3.wav
Segment 1/3 saved: 12 Afire Love_4_1.wav
Segment 2/3 saved: 12 Afire Love_4_2.wav
Segment 3/3 saved: 12 Afire Love_4_3.wav
Segment 1/1 saved: 12 Afire Love_1_1.wav
Segment 1/1 saved: 12 Afire Love_9_1.wav
Segment 1/2 saved: 12 Afire Love_2_1.wav
Segment 2/2 saved: 12 Afire Love_2_2.wav
Segment 1/1 saved: 12 Afire Love_6_1.wav
Segment 1/1 saved: 12 Afire Love_8_1.wav
Segment 1/2 saved: 12 Afire Love_11_1.wav
Segment 2/2 saved: 12 Afire Love_11_2.wav
Segment 1/9 saved: 12 Afire Love_12_1.wav
Segment 2/9 saved: 12 Afire Love_12_2.wav
Segment 3/9 saved: 12 Afire Love_12_3.wav
Segment 4/9 saved: 12 Afire Love_12_4.wav
Segment 5/9 saved: 12 Afire Love_12_5.wav
Segment 6/9 saved: 12 Afire Love_12_6.wav
Segment 7/9 sav

100%|██████████████████████████████████████████████████████████████████████| 286.65/286.65 [00:11<00:00, 25.32seconds/s]


executing 'slice' cost 17.174s
Segment 1/3 saved: 11 Thinking Out Loud_18_1.wav
Segment 2/3 saved: 11 Thinking Out Loud_18_2.wav
Segment 3/3 saved: 11 Thinking Out Loud_18_3.wav
Segment 1/1 saved: 11 Thinking Out Loud_13_1.wav
Segment 1/1 saved: 11 Thinking Out Loud_14_1.wav
Segment 1/1 saved: 11 Thinking Out Loud_1_1.wav
Segment 1/2 saved: 11 Thinking Out Loud_7_1.wav
Segment 2/2 saved: 11 Thinking Out Loud_7_2.wav
Segment 1/3 saved: 11 Thinking Out Loud_17_1.wav
Segment 2/3 saved: 11 Thinking Out Loud_17_2.wav
Segment 3/3 saved: 11 Thinking Out Loud_17_3.wav
Segment 1/1 saved: 11 Thinking Out Loud_12_1.wav
Segment 1/1 saved: 11 Thinking Out Loud_10_1.wav
Segment 1/1 saved: 11 Thinking Out Loud_9_1.wav
Segment 1/2 saved: 11 Thinking Out Loud_16_1.wav
Segment 2/2 saved: 11 Thinking Out Loud_16_2.wav
Segment 1/1 saved: 11 Thinking Out Loud_11_1.wav
Segment 1/1 saved: 11 Thinking Out Loud_2_1.wav
Segment 1/2 saved: 11 Thinking Out Loud_8_1.wav
Segment 2/2 saved: 11 Thinking Out Loud_8_2.

100%|████████████████████████████████████████████████████████████████████████| 234.0/234.0 [00:09<00:00, 24.80seconds/s]


executing 'slice' cost 13.901s
Segment 1/1 saved: 15 Even My Dad Does Sometimes_13_1.wav
Segment 1/1 saved: 15 Even My Dad Does Sometimes_16_1.wav
Segment 1/1 saved: 15 Even My Dad Does Sometimes_10_1.wav
Segment 1/1 saved: 15 Even My Dad Does Sometimes_7_1.wav
Segment 1/1 saved: 15 Even My Dad Does Sometimes_15_1.wav
Segment 1/1 saved: 15 Even My Dad Does Sometimes_9_1.wav
Segment 1/1 saved: 15 Even My Dad Does Sometimes_11_1.wav
Segment 1/1 saved: 15 Even My Dad Does Sometimes_2_1.wav
Segment 1/1 saved: 15 Even My Dad Does Sometimes_17_1.wav
Segment 1/1 saved: 15 Even My Dad Does Sometimes_3_1.wav
Segment 1/5 saved: 15 Even My Dad Does Sometimes_14_1.wav
Segment 2/5 saved: 15 Even My Dad Does Sometimes_14_2.wav
Segment 3/5 saved: 15 Even My Dad Does Sometimes_14_3.wav
Segment 4/5 saved: 15 Even My Dad Does Sometimes_14_4.wav
Segment 5/5 saved: 15 Even My Dad Does Sometimes_14_5.wav
Segment 1/1 saved: 15 Even My Dad Does Sometimes_8_1.wav
Segment 1/1 saved: 15 Even My Dad Does Sometim

100%|████████████████████████████████████████████████████████████████████████| 245.7/245.7 [00:09<00:00, 24.89seconds/s]


executing 'slice' cost 15.089s
Segment 1/1 saved: 02 I'm a Mess_6_1.wav
Segment 1/1 saved: 02 I'm a Mess_1_1.wav
Segment 1/1 saved: 02 I'm a Mess_7_1.wav
Segment 1/9 saved: 02 I'm a Mess_16_1.wav
Segment 2/9 saved: 02 I'm a Mess_16_2.wav
Segment 3/9 saved: 02 I'm a Mess_16_3.wav
Segment 4/9 saved: 02 I'm a Mess_16_4.wav
Segment 5/9 saved: 02 I'm a Mess_16_5.wav
Segment 6/9 saved: 02 I'm a Mess_16_6.wav
Segment 7/9 saved: 02 I'm a Mess_16_7.wav
Segment 8/9 saved: 02 I'm a Mess_16_8.wav
Segment 9/9 saved: 02 I'm a Mess_16_9.wav
Segment 1/1 saved: 02 I'm a Mess_3_1.wav
Segment 1/1 saved: 02 I'm a Mess_15_1.wav
Segment 1/1 saved: 02 I'm a Mess_9_1.wav
Segment 1/2 saved: 02 I'm a Mess_4_1.wav
Segment 2/2 saved: 02 I'm a Mess_4_2.wav
Segment 1/1 saved: 02 I'm a Mess_11_1.wav
Segment 1/3 saved: 02 I'm a Mess_12_1.wav
Segment 2/3 saved: 02 I'm a Mess_12_2.wav
Segment 3/3 saved: 02 I'm a Mess_12_3.wav
Segment 1/1 saved: 02 I'm a Mess_10_1.wav
Segment 1/1 saved: 02 I'm a Mess_14_1.wav
Segment 1/

100%|██████████████████████████████████████████████| 228.14999999999998/228.14999999999998 [00:09<00:00, 24.72seconds/s]


executing 'slice' cost 13.701s
Segment 1/2 saved: 05 Nina_7_1.wav
Segment 2/2 saved: 05 Nina_7_2.wav
Segment 1/4 saved: 05 Nina_6_1.wav
Segment 2/4 saved: 05 Nina_6_2.wav
Segment 3/4 saved: 05 Nina_6_3.wav
Segment 4/4 saved: 05 Nina_6_4.wav
Segment 1/3 saved: 05 Nina_5_1.wav
Segment 2/3 saved: 05 Nina_5_2.wav
Segment 3/3 saved: 05 Nina_5_3.wav
Segment 1/2 saved: 05 Nina_0_1.wav
Segment 2/2 saved: 05 Nina_0_2.wav
Segment 1/5 saved: 05 Nina_1_1.wav
Segment 2/5 saved: 05 Nina_1_2.wav
Segment 3/5 saved: 05 Nina_1_3.wav
Segment 4/5 saved: 05 Nina_1_4.wav
Segment 5/5 saved: 05 Nina_1_5.wav
Segment 1/1 saved: 05 Nina_2_1.wav
Segment 1/6 saved: 05 Nina_3_1.wav
Segment 2/6 saved: 05 Nina_3_2.wav
Segment 3/6 saved: 05 Nina_3_3.wav
Segment 4/6 saved: 05 Nina_3_4.wav
Segment 5/6 saved: 05 Nina_3_5.wav
Segment 6/6 saved: 05 Nina_3_6.wav
Segment 1/1 saved: 05 Nina_4_1.wav
FILE: /home/arvin/so-vits-svc-fork/dataset_source/edsheeran_x/04 Don't.mp3
Going to separate the files:
/home/arvin/so-vits-svc-f

100%|██████████████████████████████████████████████| 222.29999999999998/222.29999999999998 [00:08<00:00, 24.71seconds/s]


executing 'slice' cost 13.266s
Segment 1/1 saved: 04 Don't_0_1.wav
Segment 1/21 saved: 04 Don't_1_1.wav
Segment 2/21 saved: 04 Don't_1_2.wav
Segment 3/21 saved: 04 Don't_1_3.wav
Segment 4/21 saved: 04 Don't_1_4.wav
Segment 5/21 saved: 04 Don't_1_5.wav
Segment 6/21 saved: 04 Don't_1_6.wav
Segment 7/21 saved: 04 Don't_1_7.wav
Segment 8/21 saved: 04 Don't_1_8.wav
Segment 9/21 saved: 04 Don't_1_9.wav
Segment 10/21 saved: 04 Don't_1_10.wav
Segment 11/21 saved: 04 Don't_1_11.wav
Segment 12/21 saved: 04 Don't_1_12.wav
Segment 13/21 saved: 04 Don't_1_13.wav
Segment 14/21 saved: 04 Don't_1_14.wav
Segment 15/21 saved: 04 Don't_1_15.wav
Segment 16/21 saved: 04 Don't_1_16.wav
Segment 17/21 saved: 04 Don't_1_17.wav
Segment 18/21 saved: 04 Don't_1_18.wav
Segment 19/21 saved: 04 Don't_1_19.wav
Segment 20/21 saved: 04 Don't_1_20.wav
Segment 21/21 saved: 04 Don't_1_21.wav
FILE: /home/arvin/so-vits-svc-fork/dataset_source/edsheeran_x/07 Bloodstream.mp3
Going to separate the files:
/home/arvin/so-vits-sv

100%|████████████████████████████████████████████████████████████████████████| 304.2/304.2 [00:12<00:00, 25.25seconds/s]


executing 'slice' cost 18.542s
Segment 1/1 saved: 07 Bloodstream_1_1.wav
Segment 1/1 saved: 07 Bloodstream_11_1.wav
Segment 1/1 saved: 07 Bloodstream_15_1.wav
Segment 1/1 saved: 07 Bloodstream_21_1.wav
Segment 1/3 saved: 07 Bloodstream_12_1.wav
Segment 2/3 saved: 07 Bloodstream_12_2.wav
Segment 3/3 saved: 07 Bloodstream_12_3.wav
Segment 1/1 saved: 07 Bloodstream_2_1.wav
Segment 1/1 saved: 07 Bloodstream_10_1.wav
Segment 1/1 saved: 07 Bloodstream_18_1.wav
Segment 1/1 saved: 07 Bloodstream_0_1.wav
Segment 1/1 saved: 07 Bloodstream_4_1.wav
Segment 1/2 saved: 07 Bloodstream_5_1.wav
Segment 2/2 saved: 07 Bloodstream_5_2.wav
Segment 1/1 saved: 07 Bloodstream_17_1.wav
Segment 1/1 saved: 07 Bloodstream_8_1.wav
Segment 1/8 saved: 07 Bloodstream_19_1.wav
Segment 2/8 saved: 07 Bloodstream_19_2.wav
Segment 3/8 saved: 07 Bloodstream_19_3.wav
Segment 4/8 saved: 07 Bloodstream_19_4.wav
Segment 5/8 saved: 07 Bloodstream_19_5.wav
Segment 6/8 saved: 07 Bloodstream_19_6.wav
Segment 7/8 saved: 07 Bloodstr

100%|██████████████████████████████████████████████| 251.54999999999998/251.54999999999998 [00:10<00:00, 25.00seconds/s]


executing 'slice' cost 15.502s
Segment 1/1 saved: 10 The Man_7_1.wav
Segment 1/1 saved: 10 The Man_4_1.wav
Segment 1/1 saved: 10 The Man_13_1.wav
Segment 1/1 saved: 10 The Man_14_1.wav
Segment 1/1 saved: 10 The Man_10_1.wav
Segment 1/1 saved: 10 The Man_11_1.wav
Segment 1/1 saved: 10 The Man_3_1.wav
Segment 1/1 saved: 10 The Man_8_1.wav
Segment 1/1 saved: 10 The Man_6_1.wav
Segment 1/1 saved: 10 The Man_2_1.wav
Segment 1/5 saved: 10 The Man_5_1.wav
Segment 2/5 saved: 10 The Man_5_2.wav
Segment 3/5 saved: 10 The Man_5_3.wav
Segment 4/5 saved: 10 The Man_5_4.wav
Segment 5/5 saved: 10 The Man_5_5.wav
Segment 1/3 saved: 10 The Man_0_1.wav
Segment 2/3 saved: 10 The Man_0_2.wav
Segment 3/3 saved: 10 The Man_0_3.wav
Segment 1/3 saved: 10 The Man_1_1.wav
Segment 2/3 saved: 10 The Man_1_2.wav
Segment 3/3 saved: 10 The Man_1_3.wav
Segment 1/5 saved: 10 The Man_9_1.wav
Segment 2/5 saved: 10 The Man_9_2.wav
Segment 3/5 saved: 10 The Man_9_3.wav
Segment 4/5 saved: 10 The Man_9_4.wav
Segment 5/5 sav

100%|████████████████████████████████████████████████████████████████████████| 210.6/210.6 [00:08<00:00, 24.70seconds/s]


executing 'slice' cost 12.732s
Segment 1/2 saved: 09 Runaway_7_1.wav
Segment 2/2 saved: 09 Runaway_7_2.wav
Segment 1/3 saved: 09 Runaway_13_1.wav
Segment 2/3 saved: 09 Runaway_13_2.wav
Segment 3/3 saved: 09 Runaway_13_3.wav
Segment 1/1 saved: 09 Runaway_0_1.wav
Segment 1/4 saved: 09 Runaway_4_1.wav
Segment 2/4 saved: 09 Runaway_4_2.wav
Segment 3/4 saved: 09 Runaway_4_3.wav
Segment 4/4 saved: 09 Runaway_4_4.wav
Segment 1/1 saved: 09 Runaway_1_1.wav
Segment 1/2 saved: 09 Runaway_3_1.wav
Segment 2/2 saved: 09 Runaway_3_2.wav
Segment 1/1 saved: 09 Runaway_12_1.wav
Segment 1/1 saved: 09 Runaway_2_1.wav
Segment 1/1 saved: 09 Runaway_9_1.wav
Segment 1/1 saved: 09 Runaway_14_1.wav
Segment 1/1 saved: 09 Runaway_5_1.wav
Segment 1/1 saved: 09 Runaway_10_1.wav
Segment 1/1 saved: 09 Runaway_6_1.wav
Segment 1/3 saved: 09 Runaway_8_1.wav
Segment 2/3 saved: 09 Runaway_8_2.wav
Segment 3/3 saved: 09 Runaway_8_3.wav
Segment 1/2 saved: 09 Runaway_11_1.wav
Segment 2/2 saved: 09 Runaway_11_2.wav
Segment 1/1

100%|████████████████████████████████████████████████████████████████████████| 257.4/257.4 [00:10<00:00, 25.15seconds/s]


executing 'slice' cost 15.452s
Segment 1/5 saved: 01 One_7_1.wav
Segment 2/5 saved: 01 One_7_2.wav
Segment 3/5 saved: 01 One_7_3.wav
Segment 4/5 saved: 01 One_7_4.wav
Segment 5/5 saved: 01 One_7_5.wav
Segment 1/1 saved: 01 One_0_1.wav
Segment 1/4 saved: 01 One_6_1.wav
Segment 2/4 saved: 01 One_6_2.wav
Segment 3/4 saved: 01 One_6_3.wav
Segment 4/4 saved: 01 One_6_4.wav
Segment 1/6 saved: 01 One_5_1.wav
Segment 2/6 saved: 01 One_5_2.wav
Segment 3/6 saved: 01 One_5_3.wav
Segment 4/6 saved: 01 One_5_4.wav
Segment 5/6 saved: 01 One_5_5.wav
Segment 6/6 saved: 01 One_5_6.wav
Segment 1/1 saved: 01 One_4_1.wav
Segment 1/1 saved: 01 One_3_1.wav
Segment 1/1 saved: 01 One_1_1.wav
Segment 1/6 saved: 01 One_2_1.wav
Segment 2/6 saved: 01 One_2_2.wav
Segment 3/6 saved: 01 One_2_3.wav
Segment 4/6 saved: 01 One_2_4.wav
Segment 5/6 saved: 01 One_2_5.wav
Segment 6/6 saved: 01 One_2_6.wav
FILE: /home/arvin/so-vits-svc-fork/dataset_source/edsheeran_x/03 Sing.mp3
Going to separate the files:
/home/arvin/so-v

100%|██████████████████████████████████████████████████████████████████████| 239.85/239.85 [00:09<00:00, 24.91seconds/s]


executing 'slice' cost 14.465s
Segment 1/6 saved: 03 Sing_6_1.wav
Segment 2/6 saved: 03 Sing_6_2.wav
Segment 3/6 saved: 03 Sing_6_3.wav
Segment 4/6 saved: 03 Sing_6_4.wav
Segment 5/6 saved: 03 Sing_6_5.wav
Segment 6/6 saved: 03 Sing_6_6.wav
Segment 1/1 saved: 03 Sing_10_1.wav
Segment 1/1 saved: 03 Sing_0_1.wav
Segment 1/2 saved: 03 Sing_4_1.wav
Segment 2/2 saved: 03 Sing_4_2.wav
Segment 1/2 saved: 03 Sing_7_1.wav
Segment 2/2 saved: 03 Sing_7_2.wav
Segment 1/3 saved: 03 Sing_3_1.wav
Segment 2/3 saved: 03 Sing_3_2.wav
Segment 3/3 saved: 03 Sing_3_3.wav
Segment 1/5 saved: 03 Sing_11_1.wav
Segment 2/5 saved: 03 Sing_11_2.wav
Segment 3/5 saved: 03 Sing_11_3.wav
Segment 4/5 saved: 03 Sing_11_4.wav
Segment 5/5 saved: 03 Sing_11_5.wav
Segment 1/1 saved: 03 Sing_1_1.wav
Segment 1/1 saved: 03 Sing_5_1.wav
Segment 1/1 saved: 03 Sing_12_1.wav
Segment 1/3 saved: 03 Sing_9_1.wav
Segment 2/3 saved: 03 Sing_9_2.wav
Segment 3/3 saved: 03 Sing_9_3.wav
Segment 1/1 saved: 03 Sing_2_1.wav
Segment 1/1 saved

100%|████████████████████████████████████████████████████████████████████████| 210.6/210.6 [00:08<00:00, 24.54seconds/s]


executing 'slice' cost 12.655s
Segment 1/1 saved: 13 Take It Back_7_1.wav
Segment 1/5 saved: 13 Take It Back_2_1.wav
Segment 2/5 saved: 13 Take It Back_2_2.wav
Segment 3/5 saved: 13 Take It Back_2_3.wav
Segment 4/5 saved: 13 Take It Back_2_4.wav
Segment 5/5 saved: 13 Take It Back_2_5.wav
Segment 1/1 saved: 13 Take It Back_6_1.wav
Segment 1/1 saved: 13 Take It Back_5_1.wav
Segment 1/1 saved: 13 Take It Back_3_1.wav
Segment 1/5 saved: 13 Take It Back_0_1.wav
Segment 2/5 saved: 13 Take It Back_0_2.wav
Segment 3/5 saved: 13 Take It Back_0_3.wav
Segment 4/5 saved: 13 Take It Back_0_4.wav
Segment 5/5 saved: 13 Take It Back_0_5.wav
Segment 1/1 saved: 13 Take It Back_8_1.wav
Segment 1/6 saved: 13 Take It Back_4_1.wav
Segment 2/6 saved: 13 Take It Back_4_2.wav
Segment 3/6 saved: 13 Take It Back_4_3.wav
Segment 4/6 saved: 13 Take It Back_4_4.wav
Segment 5/6 saved: 13 Take It Back_4_5.wav
Segment 6/6 saved: 13 Take It Back_4_6.wav
Segment 1/1 saved: 13 Take It Back_1_1.wav
FILE: /home/arvin/so-vi

100%|██████████████████████████████████████████████████████████████████████| 263.25/263.25 [00:10<00:00, 25.05seconds/s]


executing 'slice' cost 15.868s
Segment 1/2 saved: 06 Photograph_3_1.wav
Segment 2/2 saved: 06 Photograph_3_2.wav
Segment 1/2 saved: 06 Photograph_2_1.wav
Segment 2/2 saved: 06 Photograph_2_2.wav
Segment 1/1 saved: 06 Photograph_19_1.wav
Segment 1/1 saved: 06 Photograph_8_1.wav
Segment 1/5 saved: 06 Photograph_17_1.wav
Segment 2/5 saved: 06 Photograph_17_2.wav
Segment 3/5 saved: 06 Photograph_17_3.wav
Segment 4/5 saved: 06 Photograph_17_4.wav
Segment 5/5 saved: 06 Photograph_17_5.wav
Segment 1/2 saved: 06 Photograph_18_1.wav
Segment 2/2 saved: 06 Photograph_18_2.wav
Segment 1/1 saved: 06 Photograph_6_1.wav
Segment 1/2 saved: 06 Photograph_7_1.wav
Segment 2/2 saved: 06 Photograph_7_2.wav
Segment 1/1 saved: 06 Photograph_10_1.wav
Segment 1/1 saved: 06 Photograph_12_1.wav
Segment 1/1 saved: 06 Photograph_0_1.wav
Segment 1/2 saved: 06 Photograph_11_1.wav
Segment 2/2 saved: 06 Photograph_11_2.wav
Segment 1/1 saved: 06 Photograph_9_1.wav
Segment 1/1 saved: 06 Photograph_13_1.wav
Segment 1/1 s

100%|████████████████████████████████████████████████████████████████████████| 245.7/245.7 [00:09<00:00, 24.98seconds/s]


executing 'slice' cost 14.762s
Segment 1/1 saved: 08 Tenerife Sea_7_1.wav
Segment 1/1 saved: 08 Tenerife Sea_21_1.wav
Segment 1/1 saved: 08 Tenerife Sea_0_1.wav
Segment 1/1 saved: 08 Tenerife Sea_27_1.wav
Segment 1/1 saved: 08 Tenerife Sea_18_1.wav
Segment 1/1 saved: 08 Tenerife Sea_12_1.wav
Segment 1/1 saved: 08 Tenerife Sea_6_1.wav
Segment 1/1 saved: 08 Tenerife Sea_17_1.wav
Segment 1/1 saved: 08 Tenerife Sea_8_1.wav
Segment 1/1 saved: 08 Tenerife Sea_2_1.wav
Segment 1/1 saved: 08 Tenerife Sea_24_1.wav
Segment 1/1 saved: 08 Tenerife Sea_3_1.wav
Segment 1/1 saved: 08 Tenerife Sea_26_1.wav
Segment 1/1 saved: 08 Tenerife Sea_13_1.wav
Segment 1/1 saved: 08 Tenerife Sea_23_1.wav
Segment 1/1 saved: 08 Tenerife Sea_9_1.wav
Segment 1/1 saved: 08 Tenerife Sea_5_1.wav
Segment 1/1 saved: 08 Tenerife Sea_14_1.wav
Segment 1/1 saved: 08 Tenerife Sea_11_1.wav
Segment 1/1 saved: 08 Tenerife Sea_20_1.wav
Segment 1/1 saved: 08 Tenerife Sea_1_1.wav
Segment 1/1 saved: 08 Tenerife Sea_25_1.wav
Segment 1/

100%|██████████████████████████████████████████████| 193.04999999999998/193.04999999999998 [00:07<00:00, 24.49seconds/s]


executing 'slice' cost 11.503s
Segment 1/4 saved: 14 Shirtsleeves_8_1.wav
Segment 2/4 saved: 14 Shirtsleeves_8_2.wav
Segment 3/4 saved: 14 Shirtsleeves_8_3.wav
Segment 4/4 saved: 14 Shirtsleeves_8_4.wav
Segment 1/1 saved: 14 Shirtsleeves_9_1.wav
Segment 1/3 saved: 14 Shirtsleeves_4_1.wav
Segment 2/3 saved: 14 Shirtsleeves_4_2.wav
Segment 3/3 saved: 14 Shirtsleeves_4_3.wav
Segment 1/1 saved: 14 Shirtsleeves_5_1.wav
Segment 1/1 saved: 14 Shirtsleeves_0_1.wav
Segment 1/3 saved: 14 Shirtsleeves_10_1.wav
Segment 2/3 saved: 14 Shirtsleeves_10_2.wav
Segment 3/3 saved: 14 Shirtsleeves_10_3.wav
Segment 1/1 saved: 14 Shirtsleeves_6_1.wav
Segment 1/1 saved: 14 Shirtsleeves_2_1.wav
Segment 1/1 saved: 14 Shirtsleeves_1_1.wav
Segment 1/2 saved: 14 Shirtsleeves_7_1.wav
Segment 2/2 saved: 14 Shirtsleeves_7_2.wav
Segment 1/1 saved: 14 Shirtsleeves_3_1.wav


## Training

In [4]:
#@title Make dataset directory
!mkdir -p "../dataset_raw"

In [None]:
# !rm -r "dataset_raw"
# !rm -r "dataset/44k"

In [9]:
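# Copy the 10-second chunks into the dataset directory used for training
import glob, os, shutil
dest_dir = '/home/arvin/so-vits-svc-fork/dataset_raw/edsheeran/'
os.makedirs(dest_dir, exist_ok=True)  # shutil.copy does not create the destination directory
train_files = glob.glob('/home/arvin/so-vits-svc-fork/dataset_processed/*/chunks_clean_10/*.wav')
for file in train_files:
    shutil.copy(file, dest_dir)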
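The copy step above flattens files from many per-song folders into a single directory, so two files with the same base name would silently overwrite each other. A small check like the one below can be run on `train_files` first. It is a sketch, and `find_name_collisions` is a hypothetical helper, not part of so-vits-svc-fork.

```python
import os
from collections import defaultdict


def find_name_collisions(paths):
    """Map each base name that occurs more than once to the list of paths sharing it."""
    by_name = defaultdict(list)
    for path in paths:
        by_name[os.path.basename(path)].append(path)
    return {name: group for name, group in by_name.items() if len(group) > 1}
```

An empty result means every file keeps a unique name after flattening. Because the chunk names in this notebook are prefixed with the song title, collisions are unlikely here, but the check is cheap.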

In [None]:
#@title Copy your dataset
#@markdown **We assume that your dataset is in your Google Drive's `so-vits-svc-fork/dataset/(speaker_name)` directory.**
# DATASET_NAME = "edsheeran" #@param {type: "string"}
# !cp -R "/{DATASET_NAME}/" -t "dataset_raw/"

In [None]:
#@title Download dataset (Tsukuyomi-chan JVS)
#@markdown You can download this dataset if you don't have your own dataset.
#@markdown Make sure you agree to the license when using this dataset.
#@markdown https://tyc.rei-yumesaki.net/material/corpus/#toc6
# !wget https://tyc.rei-yumesaki.net/files/sozai-tyc-corpus1.zip
# !unzip sozai-tyc-corpus1.zip
# !mv "/content/つくよみちゃんコーパス Vol.1 声優統計コーパス（JVSコーパス準拠）/おまけ：WAV（+12dB増幅＆高音域削減）/WAV（+12dB増幅＆高音域削減）" "dataset_raw/tsukuyomi"

In [None]:
#@title Automatic preprocessing
!svc pre-resample

In [None]:
!svc pre-config

In [None]:
#@title Copy configs file
# !cp configs/44k/config.json drive/MyDrive/so-vits-svc-fork

In [None]:
F0_METHOD = "crepe" #@param ["crepe", "crepe-tiny", "parselmouth", "dio", "harvest"]
!svc pre-hubert -fm {F0_METHOD}

In [None]:
#@title Train
%load_ext tensorboard
%tensorboard --logdir drive/MyDrive/so-vits-svc-fork/logs/44k
!svc train --model-path drive/MyDrive/so-vits-svc-fork/logs/44k

## Training Cluster model

In [None]:
!svc train-cluster --output-path drive/MyDrive/so-vits-svc-fork/logs/44k/kmeans.pt

## Inference

In [None]:
#@title Get the author's voice as a source
import random
NAME = str(random.randint(1, 49))
TYPE = "fsd50k" #@param ["", "digit", "dog", "fsd50k"]
CUSTOM_FILEPATH = "" #@param {type: "string"}
if CUSTOM_FILEPATH != "":
    # Give the path without the ".wav" extension; it is appended wherever NAME is used.
    NAME = CUSTOM_FILEPATH
else:
    # It is extremely difficult to find a voice that can be downloaded directly from the internet.
    n = int(NAME)  # NAME is a string; use an integer for arithmetic and comparisons
    if TYPE == "dog":
        # Assumption: file names are zero-padded to four digits (the original format spec `.0000` was invalid).
        url = f"https://huggingface.co/datasets/437aewuh/dog-dataset/resolve/main/dogs/dogs_{n:04d}.wav"
    elif TYPE == "digit":
        # george, jackson, lucas, nicolas, ...
        url = f"https://github.com/Jakobovski/free-spoken-digit-dataset/raw/master/recordings/0_george_{n}.wav"
    elif TYPE == "fsd50k":
        # Use /resolve/ instead of /blob/ so that wget fetches the audio file rather than the HTML page.
        url = f"https://huggingface.co/datasets/Fhrozen/FSD50k/resolve/main/clips/dev/{10000 + n}.wav"
    else:
        url = f"https://zunko.jp/sozai/utau/voice_{'kiritan' if n < 25 else 'itako'}{n % 5 + 1}.wav"
    # Build the URL in Python: the shell does not understand f-strings, and -N has no effect together with -O.
    !wget "{url}" -O "{NAME}.wav"
from IPython.display import Audio, display
display(Audio(f"{NAME}.wav"))

In [None]:
#@title Use trained model
#@markdown **Put your .wav file in `so-vits-svc-fork/audio` directory**
from IPython.display import Audio, display
!svc infer drive/MyDrive/so-vits-svc-fork/audio/{NAME}.wav -m drive/MyDrive/so-vits-svc-fork/logs/44k/ -c drive/MyDrive/so-vits-svc-fork/logs/44k/config.json
display(Audio(f"drive/MyDrive/so-vits-svc-fork/audio/{NAME}.out.wav", autoplay=True))

In [None]:
#@title Use trained model (with cluster)
!svc infer {NAME}.wav -s speaker -r 0.1 -m drive/MyDrive/so-vits-svc-fork/logs/44k/ -c drive/MyDrive/so-vits-svc-fork/logs/44k/config.json -k drive/MyDrive/so-vits-svc-fork/logs/44k/kmeans.pt
display(Audio(f"{NAME}.out.wav", autoplay=True))

### Pretrained models

In [None]:
#@title https://huggingface.co/TachibanaKimika/so-vits-svc-4.0-models/tree/main
!wget -N "https://huggingface.co/TachibanaKimika/so-vits-svc-4.0-models/resolve/main/riri/G_riri_220.pth"
!wget -N "https://huggingface.co/TachibanaKimika/so-vits-svc-4.0-models/resolve/main/riri/config.json"

In [None]:
!svc infer {NAME}.wav -c config.json -m G_riri_220.pth
display(Audio(f"{NAME}.out.wav", autoplay=True))

In [None]:
#@title https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main
!wget -N "https://huggingface.co/therealvul/so-vits-svc-4.0/resolve/main/Pinkie%20(speaking%20sep)/G_166400.pth"
!wget -N "https://huggingface.co/therealvul/so-vits-svc-4.0/resolve/main/Pinkie%20(speaking%20sep)/config.json"

In [None]:
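# Braces are doubled so that IPython passes a literal "{neutral}" to the shell instead of trying to expand it.
!svc infer {NAME}.wav --speaker "Pinkie {{neutral}}" -c config.json -m G_166400.pth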
display(Audio(f"{NAME}.out.wav", autoplay=True))