## Before training

This notebook saves the last 3 generations of models to Google Drive. Because one generation of models is larger than 1 GB, you need at least 3 GB of free space in Google Drive. If you do not have that much free space, consider creating another Google account.
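
Once Drive is mounted, the space requirement above can be checked from Python. The following is a minimal sketch: `has_free_space` and `REQUIRED_BYTES` are illustrative names, not part of so-vits-svc-fork, and it assumes that `shutil.disk_usage` reports the actual Drive quota on the mounted path.

```python
import shutil

# 3 generations at roughly 1 GB each, as stated in the note above.
REQUIRED_BYTES = 3 * 1024**3


def has_free_space(path: str, required_bytes: int = REQUIRED_BYTES) -> bool:
    """Return True if the filesystem containing `path` has enough free space."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes


# In Colab, pass "/content/drive/MyDrive" after mounting Drive.
print(has_free_space("."))
```

If this prints `False` for the Drive path, free up space or switch accounts before starting a training run.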

## Installation

In [None]:
# @title Check GPU
!nvidia-smi

Tue Feb 10 19:47:28 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.82.07              Driver Version: 580.82.07      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   36C    P8              9W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+----------------------------------------------

In [None]:
# @title Mount Google Drive
from google.colab import drive

drive.mount("/content/drive")

In [1]:
# @title Install dependencies
# @markdown pip may fail to resolve some dependencies and print an ERROR; this can be ignored.
!python -m pip install -U pip wheel
%pip install -U ipython

# @markdown Branch (for development)
BRANCH = "none"  # @param {"type": "string"}
if BRANCH == "none":
    %pip install -U so-vits-svc-fork
else:
    %pip install -U git+https://github.com/34j/so-vits-svc-fork.git@{BRANCH}

Collecting pip
  Downloading pip-26.0.1-py3-none-any.whl.metadata (4.7 kB)
Downloading pip-26.0.1-py3-none-any.whl (1.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 37.5 MB/s eta 0:00:00
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 24.1.2
    Uninstalling pip-24.1.2:
      Successfully uninstalled pip-24.1.2
Successfully installed pip-26.0.1
Collecting ipython
  Downloading ipython-9.10.0-py3-none-any.whl.metadata (4.6 kB)
Collecting ipython-pygments-lexers>=1.0.0 (from ipython)
  Downloading ipython_pygments_lexers-1.1.1-py3-none-any.whl.metadata (1.1 kB)
Collecting jedi>=0.18.1 (from ipython)
  Downloading jedi-0.19.2-py2.py3-none-any.whl.metadata (22 kB)
Collecting stack_data>=0.6.0 (from ipython)
  Downloading stack_data-0.6.3-py3-none-any.whl.metadata (18 kB)
Collecting traitlets>=5.13.0 (from ipython)
  Downloading traitlets-5.14.3-py3-none-any.whl.metadata (10 kB)
C

Collecting so-vits-svc-fork
  Downloading so_vits_svc_fork-4.2.30-py3-none-any.whl.metadata (36 kB)
Collecting cm-time>=0.1.2 (from so-vits-svc-fork)
  Downloading cm_time-0.1.2-py3-none-any.whl.metadata (5.0 kB)
Collecting lightning>=2.5.5 (from so-vits-svc-fork)
  Downloading lightning-2.6.1-py3-none-any.whl.metadata (44 kB)
Collecting pebble>=5.1.3 (from so-vits-svc-fork)
  Downloading pebble-5.2.0-py3-none-any.whl.metadata (3.8 kB)
Collecting praat-parselmouth>=0.4.6 (from so-vits-svc-fork)
  Downloading praat_parselmouth-0.4.7-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.9 kB)
Collecting psutil>=7.1.2 (from so-vits-svc-fork)
  Downloading psutil-7.2.2-cp36-abi3-manylinux2010_x86_64.manylinux_2_12_x86_64.manylinux_2_28_x86_64.whl.metadata (22 kB)
Collecting pysimplegui-4-foss>=4.60.4.1 (from so-vits-svc-fork)
  Downloading PySimpleGUI_4_foss-4.60.4.1-py3-none-any.whl.metadata (1.1 kB)
Collecting pyworld>=0.3.5 (from so-vits-svc-fork)
  Downloading pyworld-
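
Because pip's resolver errors are ignored in this cell, it can help to confirm afterwards that the package really was installed. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, and the distribution name `so-vits-svc-fork` is taken from the install command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if the install succeeded, otherwise None.
print(installed_version("so-vits-svc-fork"))
```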

## Training

In [None]:
# @title Make dataset directory
!mkdir -p "dataset_raw"

In [None]:
# @title Copy your dataset
# @markdown **We assume that your dataset is in your Google Drive's `so-vits-svc-fork/dataset/(speaker_name)` directory.**
DATASET_NAME = "kiritan"  # @param {type: "string"}
!cp -R /content/drive/MyDrive/so-vits-svc-fork/dataset/{DATASET_NAME}/ -t "dataset_raw/"
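
After copying, the layout assumed above (one sub-directory per speaker under `dataset_raw/`) can be sanity-checked before preprocessing. This is a sketch only: `count_wavs_per_speaker` is a hypothetical helper, and counting only `.wav` files is an assumption made for illustration.

```python
from pathlib import Path


def count_wavs_per_speaker(dataset_root: str) -> dict:
    """Map each speaker directory under `dataset_root` to its number of .wav files."""
    root = Path(dataset_root)
    counts = {}
    if not root.is_dir():
        return counts  # nothing copied yet
    for speaker_dir in sorted(root.iterdir()):
        if speaker_dir.is_dir():
            # rglob also finds files in nested sub-directories
            counts[speaker_dir.name] = sum(1 for _ in speaker_dir.rglob("*.wav"))
    return counts


print(count_wavs_per_speaker("dataset_raw"))
```

An empty dictionary or a speaker with a count of 0 suggests that the copy step did not find the expected directory.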

In [None]:
# @title Download dataset (Tsukuyomi-chan JVS)
# @markdown You can download this dataset if you don't have your own dataset.
# @markdown Make sure you agree to the license when using this dataset.
# @markdown https://tyc.rei-yumesaki.net/material/corpus/#toc6
# !wget -N https://tyc.rei-yumesaki.net/files/voice/tyc-corpus1.zip
# !unzip -O sjis tyc-corpus1.zip
# !mv "/content/つくよみちゃんコーパス Vol.1 声優統計コーパス（JVSコーパス準拠）/おまけ：WAV（+12dB増幅＆高音域削減）/WAV（+12dB増幅＆高音域削減）" "dataset_raw/tsukuyomi"

In [None]:
# @title Automatic preprocessing
!svc pre-resample

In [None]:
!svc pre-config

In [None]:
F0_METHOD = "dio"  # @param ["crepe", "crepe-tiny", "parselmouth", "dio", "harvest"]
!svc pre-hubert -fm {F0_METHOD}

In [None]:
# @title Train
%load_ext tensorboard
%tensorboard --logdir drive/MyDrive/so-vits-svc-fork/logs/44k

In [None]:
!svc train --model-path drive/MyDrive/so-vits-svc-fork/logs/44k

## Training Cluster model

In [None]:
!svc train-cluster --output-path drive/MyDrive/so-vits-svc-fork/logs/44k/kmeans.pt

## Inference

In [None]:
# @title Get the author's voice as a source
import random

from IPython.display import Audio, display

INDEX = random.randint(1, 49)  # kept as an int for the arithmetic below
NAME = str(INDEX)
TYPE = "fsd50k"  # @param ["", "digit", "dog", "fsd50k"]
CUSTOM_FILEPATH = ""  # @param {type: "string"}
if CUSTOM_FILEPATH != "":
    # Path without the ".wav" extension; it is appended below
    NAME = CUSTOM_FILEPATH
else:
    # It is extremely difficult to find a voice that can be downloaded directly from the internet.
    if TYPE == "dog":
        # Zero-padding to 4 digits is an assumption about the file naming
        URL = f"https://huggingface.co/datasets/437aewuh/dog-dataset/resolve/main/dogs/dogs_{INDEX:04d}.wav"
    elif TYPE == "digit":
        # george, jackson, lucas, nicolas, ...
        URL = f"https://github.com/Jakobovski/free-spoken-digit-dataset/raw/master/recordings/0_george_{INDEX}.wav"
    elif TYPE == "fsd50k":
        # "resolve" serves the raw file; "blob" serves an HTML page
        URL = f"https://huggingface.co/datasets/Fhrozen/FSD50k/resolve/main/clips/dev/{10000 + INDEX}.wav"
    else:
        SPEAKER = "kiritan" if INDEX < 25 else "itako"
        URL = f"https://zunko.jp/sozai/utau/voice_{SPEAKER}{INDEX % 5 + 1}.wav"
    # -N has no effect when combined with -O, so only -O is used
    !wget "{URL}" -O "{NAME}.wav"

display(Audio(f"{NAME}.wav"))

In [None]:
# @title Use trained model
# @markdown **Put your .wav file in `so-vits-svc-fork/audio` directory**
from IPython.display import Audio, display

# !svc infer drive/MyDrive/so-vits-svc-fork/audio/{NAME}.wav -m drive/MyDrive/so-vits-svc-fork/logs/44k/ -c drive/MyDrive/so-vits-svc-fork/logs/44k/config.json
NAME = 'reference.mp3'
display(Audio(f"drive/MyDrive/so-vits-svc-fork/audio/{NAME}.out.wav", autoplay=True))

In [None]:
# @title Use trained model (with cluster)
!svc infer {NAME}.wav -s speaker -r 0.1 -m drive/MyDrive/so-vits-svc-fork/logs/44k/ -c drive/MyDrive/so-vits-svc-fork/logs/44k/config.json -k drive/MyDrive/so-vits-svc-fork/logs/44k/kmeans.pt
display(Audio(f"{NAME}.out.wav", autoplay=True))

### Pretrained models

In [None]:
# @title https://huggingface.co/TachibanaKimika/so-vits-svc-4.0-models/tree/main
!wget -N "https://huggingface.co/TachibanaKimika/so-vits-svc-4.0-models/resolve/main/riri/G_riri_220.pth"
!wget -N "https://huggingface.co/TachibanaKimika/so-vits-svc-4.0-models/resolve/main/riri/config.json"

In [None]:
!svc infer {NAME}.wav -c config.json -m G_riri_220.pth
display(Audio(f"{NAME}.out.wav", autoplay=True))

In [None]:
# @title https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main
!wget -N "https://huggingface.co/therealvul/so-vits-svc-4.0/resolve/main/Pinkie%20(speaking%20sep)/G_166400.pth"
!wget -N "https://huggingface.co/therealvul/so-vits-svc-4.0/resolve/main/Pinkie%20(speaking%20sep)/config.json"

In [None]:
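# Literal braces are doubled so that IPython does not try to expand {neutral} as a Python variable
!svc infer {NAME}.wav --speaker "Pinkie {{neutral}}" -c config.json -m G_166400.pth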
display(Audio(f"{NAME}.out.wav", autoplay=True))

In [None]:
!pip install -q so-vits-svc-fork torch torchaudio soundfile

In [None]:
# 1. Installation (run once): see the cell above


# 2. Upload your vocals
from google.colab import files
print("⬆️ Upload 2–3 seconds of CLEAN vocals (no music)")
uploaded = files.upload()
ref_path = list(uploaded.keys())[0]

# 3. Generate the word
# NOTE: so-vits-svc-fork is a voice-conversion tool (audio in, audio out).
# The arguments below are kept from the original cell and may not match
# the signatures of `Svc` and `Svc.infer` in the installed version.
from so_vits_svc_fork.inference.core import Svc
import soundfile as sf

svc = Svc(
    "https://huggingface.co/spaces/innnky/sovits_pretrained/resolve/main/G_10000.pth",
    "https://huggingface.co/spaces/innnky/sovits_pretrained/resolve/main/config.json"
)

audio, sr = svc.infer(
    text="напол",          # ← REPLACE WITH YOUR WORD
    reference_audio=ref_path,
    language="ru"
)

sf.write("/content/output_word.wav", audio, sr)
print("✅ Done! Download the file below:")

# 4. Listen to the result
from IPython.display import Audio, display
display(Audio("/content/output_word.wav"))