<a href="https://colab.research.google.com/github/MLo7Ghinsan/DiffSinger_colab_notebook_MLo7/blob/main/DiffSinger_colab_notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# _**[DiffSinger](https://github.com/openvpi/DiffSinger)**_
_Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS)_

____

Note:
- This notebook is updated semi-frequently based on feedback from users

\
____
\
#### **This notebook is an edited copy of Kei's Diffsinger [colab notebook](https://colab.research.google.com/drive/1kUg9dz8PPH92NfnLZwgq0_9B9an39t1J?usp=sharing)**
#### **This notebook is maintained by MLo7**
\
___


#### IMPORTANT NOTE:

- Your speaker folder's name is used as *spk_name*, so please name your folders carefully
- This notebook passes file paths to shell commands without quoting them, so spaces in file names or folder paths may break those commands
- For an in-depth guide to SVS training and/or labeling, please see [SVS Singing Voice Database - Tutorial](https://docs.google.com/document/d/1uMsepxbdUW65PfIWL1pt2OM6ZKa5ybTTJOpZ733Ht6s/edit?usp=sharing)
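
Paths from the form fields are substituted directly into shell commands such as `!unzip {data_zip_path} -d ...`, so a space splits one path into two arguments. Below is a minimal pre-check sketched with the standard library's `shlex.quote`. `check_path` is a hypothetical helper and is not defined anywhere in the notebook:

```python
import shlex

def check_path(path: str) -> str:
    """Warn if `path` contains whitespace and return a shell-quoted copy (hypothetical helper)."""
    if any(ch.isspace() for ch in path):
        print(f"warning: {path!r} contains whitespace; an unquoted shell command would split it")
    # shlex.quote returns safe paths unchanged and wraps anything else in single quotes
    return shlex.quote(path)

print(check_path("/content/drive/MyDrive/my data.zip"))  # '/content/drive/MyDrive/my data.zip'
print(check_path("/content/drive/MyDrive/my_data.zip"))  # /content/drive/MyDrive/my_data.zip
```

Quoting is one fix. Renaming files and folders so they contain no spaces is simpler and works with every cell as written.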

This notebook converts your data (lab + wav) to a compatible format via [nnsvs-db-converter](https://github.com/UtaUtaUtau/nnsvs-db-converter)

It is recommended to refine your data with [SlurCutter](https://github.com/openvpi/MakeDiffSinger/releases) to get better training data for your pitch model

Zip file format [example](https://github.com/MLo7Ghinsan/DiffSinger_colab_notebook_MLo7/releases/tag/ref):
<pre>
#single speaker (lab + wav | ds + wav)
your_zip.zip:
    |
    |
    your_speaker_folder:
        |
        |
        data_1.wav
        data_1.lab (or .ds)
        .
        data_2.wav
        data_2.lab (or .ds)
        .
        data_3.wav
        data_3.lab (or .ds)
        .
        ...
</pre>
<pre>
#single speaker (csv + wav)
your_zip.zip:
    |
    |
    your_speaker_folder:
        |
        |
        wavs (folder named "wavs" containing all the wavs)
        .
        transcriptions.csv
</pre>
<pre>
#multi speaker (lab + wav | ds + wav)
your_zip.zip:
    |
    |
    your_speaker_folder_1:
        |
        |
        data_1.wav
        data_1.lab (or .ds)
        .
        data_2.wav
        data_2.lab (or .ds)
        .
        data_3.wav
        data_3.lab (or .ds)
        .
        ...
    your_speaker_folder_2:
        |
        |
        data_1.wav
        data_1.lab (or .ds)
        .
        data_2.wav
        data_2.lab (or .ds)
        .
        data_3.wav
        data_3.lab (or .ds)
        .
        ...
</pre>
<pre>
#multi speaker (csv + wav)
your_zip.zip:
    |
    |
    your_speaker_folder_1:
        |
        |
        wavs (folder named "wavs" containing all the wavs)
        .
        transcriptions.csv
    your_speaker_folder_2:
        |
        |
        wavs (folder named "wavs" containing all the wavs)
        .
        transcriptions.csv

</pre>
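
Before uploading, you may want to confirm that every audio file in a lab + wav or ds + wav zip has a label file next to it. The sketch below uses Python's `zipfile` module and assumes the single- and multi-speaker layouts shown above, where files sit directly inside each speaker folder. The csv + wav layout, with its `wavs` subfolder, is not checked. `find_unpaired_files` is a hypothetical helper:

```python
import io
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath

def find_unpaired_files(zip_source) -> dict:
    """Map each speaker folder to the file stems missing a .wav or a .lab/.ds partner.

    `zip_source` may be a path or a binary file object (anything zipfile.ZipFile accepts).
    Only files directly inside a speaker folder are checked.
    """
    exts_by_stem = defaultdict(lambda: defaultdict(set))  # speaker -> stem -> {extensions}
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            parts = PurePosixPath(name).parts
            if name.endswith("/") or len(parts) != 2:
                continue  # skip directory entries and files at any other depth
            speaker, filename = parts
            p = PurePosixPath(filename)
            exts_by_stem[speaker][p.stem].add(p.suffix.lower())
    problems = {}
    for speaker, stems in exts_by_stem.items():
        unpaired = sorted(stem for stem, exts in stems.items()
                          if ".wav" not in exts or not exts & {".lab", ".ds"})
        if unpaired:
            problems[speaker] = unpaired
    return problems

# Example: an in-memory zip where data_2.wav has no label file
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("your_speaker_folder/data_1.wav", b"")
    zf.writestr("your_speaker_folder/data_1.lab", "0 1000000 pau\n")
    zf.writestr("your_speaker_folder/data_2.wav", b"")
buf.seek(0)
print(find_unpaired_files(buf))  # {'your_speaker_folder': ['data_2']}
```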

_**Credits:**_

  - [openvpi](https://openvpi.github.io/) for the DiffSinger fork and more

  - [UtaUtaUtau](https://utautautau.neocities.org/) for nnsvs-db-converter

  - [Kei](https://pronouns.page/@kei.wendt06) for the original notebook

  - [MLo7](https://github.com/MLo7Ghinsan) for the notebook edit

  - [PixPrucer](https://twitter.com/PixPrucer?s=20) for an in-depth SVS guide

# **Setup**

In [None]:
#@markdown Select this if you don't want to see warnings during training; most of the time they are nothing to worry about

#@markdown ****WARNING**** this also hides error messages from preprocessing and training
no_warn = False # @param {type:"boolean"}

#@markdown <font size="-1.5"> You can change this option and re-run this cell at any time without re-running the installation

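The `no_warn` option appends `2> /dev/null` to the binarize and train commands. That redirect discards the process's entire standard error stream. Python writes both warnings and uncaught tracebacks to stderr, which is why error messages disappear as well. Below is a minimal sketch of the same redirection done with `subprocess`. `run_quiet` is a hypothetical name:

```python
import subprocess
import sys

def run_quiet(cmd: list) -> int:
    """Run `cmd` with its stderr discarded, like appending `2> /dev/null` in a shell.

    Warnings and tracebacks both go to stderr, so both vanish; only the exit code is left.
    """
    completed = subprocess.run(cmd, stderr=subprocess.DEVNULL)
    return completed.returncode

# The child writes "boom" to stderr and exits with status 1; the message is hidden
exit_code = run_quiet([sys.executable, "-c", "raise SystemExit('boom')"])
print(exit_code)  # 1
```

A non-zero exit code is the only remaining sign of failure. If a run with `no_warn` enabled stops unexpectedly, disable it and re-run to see the actual error.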
In [None]:
#@title # Mount Google Drive and Setup


from IPython.display import clear_output
from IPython.display import Audio, display, HTML
import os
from google.colab import drive
drive.mount("/content/drive")

if not os.path.exists("/content/play_sound"):
    os.makedirs("/content/play_sound")
%cd /content/play_sound
!wget -O setup_complete.wav https://github.com/MLo7Ghinsan/MLo7_Diff-SVC_models/releases/download/audio/setup_complete.wav
%cd /content
!rm -rf /content/sample_data
!apt-get install -y aria2
clear_output()

!git clone https://github.com/UtaUtaUtau/nnsvs-db-converter
!git clone https://github.com/openvpi/DiffSinger.git --branch v2.1.0

clear_output()
!pip install torch==1.13.0 torchvision==0.14.0 torchaudio==0.13.0
clear_output()
!pip install -r /content/DiffSinger/requirements.txt
clear_output()
!aria2c https://github.com/openvpi/vocoders/releases/download/nsf-hifigan-v1/nsf_hifigan_20221211.zip
!aria2c https://github.com/openvpi/DiffSinger/releases/download/v2.1.0/rmvpe.zip
!unzip -q /content/nsf_hifigan_20221211.zip -d /content/DiffSinger/checkpoints
!unzip -q /content/rmvpe.zip -d /content/DiffSinger/checkpoints
!unzip -q /content/rmvpe.zip -d /content/MakeDiffSinger/variance-temp-solution/assets
!rm /content/nsf_hifigan_20221211.zip
!rm /content/rmvpe.zip
clear_output()
!pip install --upgrade tensorboard
clear_output()
!pip install protobuf #protobuf==3.20
clear_output()
!pip install onnxruntime
clear_output()
#shit tons of clear output cus i dont wanna see anything <3

print("setup complete!")
print("|")
print("|")
print("|")

chika_dance = '<img src="https://cdn.discordapp.com/attachments/816517150175920138/1090112497446563950/icegif-2013.gif"/>'
display(HTML(chika_dance))

with open("/content/play_sound/setup_complete.wav", "rb") as f:
    setup_complete_sound = f.read()
Audio(data=setup_complete_sound, autoplay=True)

# **Preprocess data for training**

In [None]:
#@title #Extract Data
#@markdown ___
%cd /content
#@markdown This cell creates a folder named [raw_data] in /content and extracts your data into it

#@markdown <font size="-1.5"> See [data type zip format](https://colab.research.google.com/github/MLo7Ghinsan/DiffSinger_colab_notebook_MLo7/blob/main/DiffSinger_colab_notebook.ipynb#scrollTo=ZxsTaNBJLd7Y) for examples (drop down the introduction cell)

data_type = "lab + wav (NNSVS format)" # @param ["lab + wav (NNSVS format)", "csv + wav (DiffSinger format)", "ds + wav (DiffSinger format)"]

#@markdown <font size="-1.5"> The path to your data zip file

data_zip_path = "" #@param {type:"string"}

#@markdown ___

#@markdown This lower section is for variance training (lab + wav ONLY)

#@markdown <font size="-1.5"> Enable this to estimate MIDI if you don't have a .csv prepared for a variance dataset (skip this for acoustic training)

estimate_midi = False # @param {type:"boolean"}


all_shits = "/content/raw_data"
all_shits_not_wav_n_lab = "/content/raw_data/diffsinger_db"

import os
import zipfile
import csv
import json

os.makedirs(all_shits_not_wav_n_lab, exist_ok=True)

# using 'if not' bc i edited the wrong section which im also too lazy to fix it <3
if not data_type == "lab + wav (NNSVS format)":
    #changed back to !unzip
    !unzip {data_zip_path} -d {all_shits_not_wav_n_lab}
    clear_output()
else:
    !unzip {data_zip_path} -d {all_shits_not_wav_n_lab}
    for root, dirs, files in os.walk(all_shits):
        for filename in files:
            if filename.endswith(".lab"):
                file_path = os.path.join(root, filename)
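                with open(file_path, "r") as file:
                    lab_lines = file.read().splitlines()
                # rename whole phoneme labels only (SP -> pau, br -> AP); labels that merely contain "br" or "SP" stay untouched
                renamed_lines = []
                for lab_line in lab_lines:
                    fields = lab_line.split()
                    if len(fields) == 3:
                        fields[2] = {"SP": "pau", "br": "AP"}.get(fields[2], fields[2])
                        lab_line = " ".join(fields)
                    renamed_lines.append(lab_line)
                with open(file_path, "w") as file:
                    file.write("\n".join(renamed_lines) + "\n")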

# for funny auto dict generator lmao
out = "/content/DiffSinger/dictionaries/custom_dict.txt"

phonemes = set()

def is_excluded(phoneme):
    return phoneme in ["pau", "AP", "SP"]

if data_type == "lab + wav (NNSVS format)":
    phoneme_folder_path = all_shits
    for root, dirs, files in os.walk(phoneme_folder_path):
        for file in files:
            if file.endswith(".lab"):
                fpath = os.path.join(root, file)
                with open(fpath, "r") as lab_file:
                    for line in lab_file:
                        line = line.strip()
                        if line:
                            phoneme = line.split()[2]
                            if not is_excluded(phoneme):
                                phonemes.add(phoneme)
elif data_type == "csv + wav (DiffSinger format)":
    phoneme_folder_path = all_shits_not_wav_n_lab
    for root, dirs, files in os.walk(phoneme_folder_path):
        for file in files:
            if file.endswith(".csv"):
                fpath = os.path.join(root, file)
                with open(fpath, "r", newline="") as csv_file:
                    csv_reader = csv.DictReader(csv_file)
                    for row in csv_reader:
                        if "ph_seq" in row:
                            ph_seq = row["ph_seq"].strip()
                            for phoneme in ph_seq.split():
                                if not is_excluded(phoneme):
                                    phonemes.add(phoneme)
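else:
    # ds + wav: a .ds file is JSON holding a list of segments (or a single segment), each with a "ph_seq"
    phoneme_folder_path = all_shits
    for root, dirs, files in os.walk(phoneme_folder_path):
        for file in files:
            if file.endswith(".ds"):
                fpath = os.path.join(root, file)
                with open(fpath, "r", encoding="utf-8") as ds_file:
                    ds_data = json.load(ds_file)
                segments = ds_data if isinstance(ds_data, list) else [ds_data]
                for segment in segments:
                    for phoneme in segment["ph_seq"].split():
                        if not is_excluded(phoneme):
                            phonemes.add(phoneme)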

with open(out, "w") as f:
    for phoneme in sorted(phonemes):
        f.write(phoneme + "\t" + phoneme + "\n")

# for vowels and consonants.txt... also adding a liquid type for uta's script
dict_path = out
vowel_types = {"a", "i", "u", "e", "o", "N", "M", "NG"}
liquid_types = {"y", "w", "n", "m", "ng", "l", "r"} # r for english labels, it should be fine with jp too
vowel_data = []
consonant_data = []
liquid_data = []

with open(dict_path, "r") as f:
    for line in f:
        phoneme, _ = line.strip().split("\t")
        if phoneme[0] in vowel_types:
            vowel_data.append(phoneme)
        elif phoneme[0] in liquid_types:
            liquid_data.append(phoneme)
        else:
            consonant_data.append(phoneme)

vowel_data.sort()
liquid_data.sort()
consonant_data.sort()
directory = os.path.dirname(dict_path)

# make txt for language json file
vowel_txt_path = os.path.join(directory, "vowels.txt")
with open(vowel_txt_path, "w") as f:
    f.write(" ".join(vowel_data))
liquid_txt_path = os.path.join(directory, "liquids.txt")
with open(liquid_txt_path, "w") as f:
    f.write(" ".join(liquid_data))
consonant_txt_path = os.path.join(directory, "consonants.txt")
with open(consonant_txt_path, "w") as f:
    f.write(" ".join(consonant_data))


# here's a funny json append
with open(vowel_txt_path, "r") as f:
    vowel_data = f.read().split()
with open(liquid_txt_path, "r") as f:
    liquid_data = f.read().split()
with open(consonant_txt_path, "r") as f:
    consonant_data = f.read().split()
phones4json = {"vowels": vowel_data, "liquids": liquid_data}
with open("/content/nnsvs-db-converter/lang.sample.json", "w") as rawr:
    json.dump(phones4json, rawr, indent=4)


if data_type == "lab + wav (NNSVS format)":
    db_converter_script = "/content/nnsvs-db-converter/db_converter.py"
    for raw_folder_name in os.listdir(all_shits_not_wav_n_lab):
        raw_folder_path = os.path.join(all_shits_not_wav_n_lab, raw_folder_name)
        if os.path.isdir(raw_folder_path):
            if estimate_midi:
                !python {db_converter_script} -s 50 -S 20 -l 30 -m -c -L "/content/nnsvs-db-converter/lang.sample.json" --folder {raw_folder_path} 2> /dev/null

            else:
                #!python {db_converter_script} -s 2 --folder {raw_folder_path} 2> /dev/null
                !python {db_converter_script} -s 50 -S 20 -l 30 -L "/content/nnsvs-db-converter/lang.sample.json" --folder {raw_folder_path} 2> /dev/null
    clear_output()
else:
    pass

for raw_folder_name in os.listdir(all_shits_not_wav_n_lab):
    raw_folder_path = os.path.join(all_shits_not_wav_n_lab, raw_folder_name)
    !rm -rf {raw_folder_path}/*.wav {raw_folder_path}/*.lab
    !mv {raw_folder_path}/diffsinger_db/* {raw_folder_path} 2> /dev/null
    !rm -rf {raw_folder_path}/diffsinger_db
    #!cp {raw_folder_path}/wavs/*.wav {raw_folder_path}

print("extraction complete!")
print("|")
print("|")
print("|")
print("I'm also nice enough to convert your data and also write your dict.txt lmao. You are welcome :)")

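The Extract Data cell above does three things. It renames special labels in .lab files, collects every phoneme into `custom_dict.txt` (one `phoneme<TAB>phoneme` line each), and sorts phonemes into vowels, liquids and consonants by their first character. The sketch below reproduces those steps as small standalone functions so you can try them on a single file.

Assumptions: .lab lines have the form `start end phoneme`, and a .ds file is JSON holding either a list of segments or a single segment, each with a space-separated `ph_seq`. Multi-letter entries such as `NG` in the cell's `vowel_types` can never equal a single first character, so they are left out here. All function names are hypothetical:

```python
import json

EXCLUDED = {"pau", "AP", "SP"}       # special symbols kept out of the dictionary, as in is_excluded()
RENAMES = {"SP": "pau", "br": "AP"}  # label renames applied to .lab files
VOWEL_STARTS = {"a", "i", "u", "e", "o", "N", "M"}
LIQUID_STARTS = {"y", "w", "n", "m", "l", "r"}

def normalize_lab(text: str) -> str:
    """Rename labels in 'start end phoneme' lines; each label is compared as a whole field."""
    out = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            fields[2] = RENAMES.get(fields[2], fields[2])
            line = " ".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"

def phonemes_from_lab(text: str) -> set:
    """Third field of every non-empty line."""
    return {line.split()[2] for line in text.splitlines() if line.strip()}

def phonemes_from_ds(text: str) -> set:
    """Phonemes from a .ds file: JSON with a list of segments or a single segment."""
    data = json.loads(text)
    segments = data if isinstance(data, list) else [data]
    return {ph for seg in segments for ph in seg["ph_seq"].split()}

def build_dictionary(phonemes: set) -> str:
    """Sorted 'phoneme<TAB>phoneme' lines, without the special symbols."""
    return "".join(f"{ph}\t{ph}\n" for ph in sorted(phonemes - EXCLUDED))

def classify(phoneme: str) -> str:
    """First-character rule used for vowels.txt / liquids.txt / consonants.txt."""
    if phoneme[0] in VOWEL_STARTS:
        return "vowel"
    if phoneme[0] in LIQUID_STARTS:
        return "liquid"
    return "consonant"

lab = "0 1000000 SP\n1000000 2000000 k\n2000000 3000000 a\n3000000 4000000 br\n"
phones = phonemes_from_lab(normalize_lab(lab))
print(build_dictionary(phones), end="")                        # a<TAB>a, then k<TAB>k
print({ph: classify(ph) for ph in sorted(phones - EXCLUDED)})  # {'a': 'vowel', 'k': 'consonant'}
```

The first-character rule is a heuristic. A phoneme such as `ry` lands in liquids because it starts with `r`, so check the generated .txt files if your phoneme set is unusual.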
In [None]:
#@title #Edit Config
#@markdown ___

import os
import yaml
import random #for the random test files lmaoz

%cd /content
clear_output()
#@markdown <font size="-1.5"> The training type you want to do
config_type = "variance" # @param ["acoustic", "variance"]
config_cap = config_type.upper()

spk_name = [folder_name for folder_name in os.listdir(all_shits_not_wav_n_lab) if os.path.isdir(os.path.join(all_shits_not_wav_n_lab, folder_name))]
# i used spk_name for something else cus i forgor now imma just copy and paste it
spk_names = [folder_name for folder_name in os.listdir(all_shits_not_wav_n_lab) if os.path.isdir(os.path.join(all_shits_not_wav_n_lab, folder_name))]
num_spk = len(spk_name)
raw_dir = []
for folder_name in spk_name:
    folder_path = os.path.join(all_shits_not_wav_n_lab, folder_name)
    raw_dir.append(folder_path)
if num_spk == 1:
    singer_type = "SINGLE-SPEAKER"
    diff_loss_type = "l2"
    f0_emb = "continuous"
    use_spk_id = False
    all_wav_files = []
    for root, dirs, files in os.walk("/content/raw_data/diffsinger_db"):
        for file in files:
            if file.endswith(".wav"):
                full_path = os.path.join(root, file)
                all_wav_files.append(full_path)
    random.shuffle(all_wav_files)
    random_ass_wavs = all_wav_files[:3]
    random_ass_test_files = [os.path.splitext(os.path.basename(file))[0] for file in random_ass_wavs]

else:
    singer_type = "MULTI-SPEAKER"
    diff_loss_type = "l1"
    f0_emb = "discrete"
    use_spk_id = True
    folder_to_id = {folder_name: i for i, folder_name in enumerate(spk_name)}
    random_ass_test_files = []
    for folder_path in raw_dir:
        audio_files = [f[:-4] for f in os.listdir(folder_path + "/wavs") if f.endswith(".wav")]
        folder_name = os.path.basename(folder_path)
        folder_id = folder_to_id.get(folder_name, -1)
        prefixed_audio_files = [f"{folder_id}:{audio_file}" for audio_file in audio_files]
        random_ass_test_files.extend(prefixed_audio_files[:3])
spk_id = []
for i, name in enumerate(spk_names):
    spk_id.append(f"{i}:{name}")

#@markdown <font size="-1.5"> Path to where you want to save your binary data for later use
binary_save_dir = "t1" #@param{type:"string"}

#@markdown <font size="-1.5"> Pitch extractor algorithm

f0_ext = "parselmouth" # @param ["parselmouth", "rmvpe"]
if f0_ext == "rmvpe":
    pe_ckpt_pth = "checkpoints/rmvpe/model.pt"
else:
    pe_ckpt_pth = None

#@markdown <font size="-1.5"> Select this if you want to use data augmentation (default pitch shift and time stretch values)
data_aug = False #@param {type:"boolean"}

#@markdown <font size="-1.5"> Step interval at which your model is validated and saved
save_interval = 100 #@param {type:"slider", min:100, max:10000, step:100}

#@markdown <font size="-1.5"> Your model save path
save_dir = "t2" #@param{type:"string"}

if config_type == "acoustic":
    with open("/content/DiffSinger/configs/acoustic.yaml", "r") as config:
        bitch_ass_config = yaml.safe_load(config)
    bitch_ass_config["speakers"] = spk_names
    bitch_ass_config["test_prefixes"] = random_ass_test_files
    bitch_ass_config["raw_data_dir"] = raw_dir
    bitch_ass_config["num_spk"] = num_spk
    bitch_ass_config["use_spk_id"] = use_spk_id
    #bitch_ass_config["spk_ids"] = spk_id
    bitch_ass_config["diff_loss_type"] = diff_loss_type
    bitch_ass_config["f0_embed_type"] = f0_emb
    bitch_ass_config["binary_data_dir"] = binary_save_dir
    bitch_ass_config["dictionary"] = "dictionaries/custom_dict.txt"
    bitch_ass_config["augmentation_args"]["random_pitch_shifting"]["enabled"] = data_aug
    bitch_ass_config["augmentation_args"]["random_time_stretching"]["enabled"] = data_aug
    bitch_ass_config["use_key_shift_embed"] = data_aug
    bitch_ass_config["use_speed_embed"] = data_aug
    bitch_ass_config["max_batch_size"] = 9 #ive never tried reaching the limit so ill trust kei's setting for this
    bitch_ass_config["val_check_interval"] = save_interval
    bitch_ass_config["pe"] = f0_ext
    bitch_ass_config["pe_ckpt"] = pe_ckpt_pth
    with open("/content/DiffSinger/configs/acoustic.yaml", "w") as config:
        yaml.dump(bitch_ass_config, config)
else:
    with open("/content/DiffSinger/configs/variance.yaml", "r") as config:
        bitch_ass_config = yaml.safe_load(config)
    bitch_ass_config["speakers"] = spk_names
    bitch_ass_config["test_prefixes"] = random_ass_test_files
    bitch_ass_config["raw_data_dir"] = raw_dir
    bitch_ass_config["num_spk"] = num_spk
    bitch_ass_config["use_spk_id"] = use_spk_id
    bitch_ass_config["diff_loss_type"] = diff_loss_type
    bitch_ass_config["binary_data_dir"] = binary_save_dir
    bitch_ass_config["dictionary"] = "dictionaries/custom_dict.txt"
    bitch_ass_config["max_batch_size"] = 9 #ive never tried reaching the limit so ill trust kei's setting for this
    bitch_ass_config["val_check_interval"] = save_interval
    bitch_ass_config["pe"] = f0_ext # i think variance uses it for pitch ref as ground-truth for pitch training soooo
    bitch_ass_config["pe_ckpt"] = pe_ckpt_pth #same goes to this one
    with open("/content/DiffSinger/configs/variance.yaml", "w") as config:
        yaml.dump(bitch_ass_config, config)

os.makedirs(save_dir, exist_ok=True)
search_text = "        args_work_dir = os.path.join("
replacement = f"        args_work_dir = '{save_dir}'"
with open("/content/DiffSinger/utils/hparams.py", "r") as file:
    lines = file.readlines()
for i, line in enumerate(lines):
    if search_text in line:
        lines[i] = replacement + "\n"
        break
with open("/content/DiffSinger/utils/hparams.py", "w") as file:
        file.writelines(lines)
#incase if anyone wanna change it lmao
search_text_alt = "        args_work_dir = '"
replacement_alt = f"        args_work_dir = '{save_dir}'"
with open("/content/DiffSinger/utils/hparams.py", "r") as file:
    lines = file.readlines()
for i, line in enumerate(lines):
    if search_text_alt in line:
        lines[i] = replacement_alt + "\n"
        break
with open("/content/DiffSinger/utils/hparams.py", "w") as file:
        file.writelines(lines)

relative_p = "            relative_path = filepath.relative_to(Path('.').resolve())"
relative_change = "            relative_path = filepath.relative_to(Path('/content').resolve())"
with open("/content/DiffSinger/utils/training_utils.py", "r") as file:
    lines = file.readlines()
for i, line in enumerate(lines):
    if relative_p in line:
        lines[i] = relative_change + "\n"
        break
with open("/content/DiffSinger/utils/training_utils.py", "w") as file:
        file.writelines(lines)
relative_p_2 = "        relative_path = filepath.relative_to(Path('.').resolve())"
relative_change_2 = "        relative_path = filepath.relative_to(Path('/content').resolve())"
with open("/content/DiffSinger/utils/training_utils.py", "r") as file:
    lines_2 = file.readlines()
for i, line in enumerate(lines_2):
    if relative_p_2 in line:
        lines_2[i] = relative_change_2 + "\n"
        break
with open("/content/DiffSinger/utils/training_utils.py", "w") as file:
        file.writelines(lines_2)

if not estimate_midi:
    !python /content/DiffSinger/scripts/migrate.py txt /content/raw_data/diffsinger_db/transcriptions.txt 2> /dev/null
else:
  pass

print("config updated! see below for config's information")
print("|")
print("|")
print("|")
print(f"+++---{config_cap} {singer_type} TRAINING---+++")
print("|")
print("|")
print("|")
print("+++---user's settings---+++")
print("\n")
print(f"speaker name: {spk_names}")
print("\n")
print(f"data augmentation: {data_aug}")
print("\n")
print(f"pitch extractor: {f0_ext}")
print("\n")
print(f"binary data save directory: {binary_save_dir}")
print("\n")
print(f"your model will be saved every: {save_interval} steps")
print("\n")
print(f"your model will be saved to: {save_dir}")
print("\n")
print("==========================================================================================")
print("\n")
print("+++---other auto-defined settings---+++")
print("\n")
print(f"test files (auto selected): {random_ass_test_files}")
print("\n")
print("dictionary (auto generated): custom_dict.txt")
print("\n")
print("max_batch_size: 9")
print("\n")
print("==========================================================================================")
print("\n")
print("if you don't like or disagree with any of these options,")
print(f"you can go and edit the config at [/content/DiffSinger/configs/{config_type}.yaml]")


config updated! see below for config's information
|
|
|
+++---VARIANCE SINGLE-SPEAKER TRAINING---+++
|
|
|
+++---user's settings---+++


speaker name: ['pj_dat']


data augmentation: False


pitch extractor: parselmouth


binary data save directory: t1


your model will be saved every: 100 steps


your model will be saved to: t2




+++---other auto-defined settings---+++


test files (auto selected): ['pjs001_speech_seg000', 'pjs001_singing_seg000']


dictionary (auto generated): custom_dict.txt


max_sentences: 9




if you don't like or disagree with any of these options,
you can go and edit the config at [/content/DiffSinger/configs/variance.yaml]


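The config cell picks validation files (`test_prefixes`) automatically. For a single speaker, it shuffles all wavs and takes the first three. For several speakers, it takes the first three names `os.listdir` returns from each `wavs` folder and prefixes each with the speaker's index.

Below is a minimal sketch of that selection that samples randomly in both cases and takes an optional seed so the choice is reproducible. `pick_test_prefixes` is hypothetical. The `index:name` format is copied from the cell, not checked against DiffSinger's documentation:

```python
import random

def pick_test_prefixes(stems_by_speaker: dict, per_speaker: int = 3, seed=None) -> list:
    """Randomly pick up to `per_speaker` validation items for each speaker.

    One speaker: bare file stems. Several speakers: 'index:stem', where index is the
    speaker's position in the dict (the format the config cell builds).
    """
    rng = random.Random(seed)
    multi = len(stems_by_speaker) > 1
    prefixes = []
    for idx, stems in enumerate(stems_by_speaker.values()):
        chosen = rng.sample(sorted(stems), min(per_speaker, len(stems)))
        prefixes.extend(f"{idx}:{stem}" if multi else stem for stem in chosen)
    return prefixes

print(pick_test_prefixes({"spk_a": ["s1", "s2", "s3", "s4"], "spk_b": ["t1"]}, seed=0))
```

Sorting before sampling makes the result depend only on the seed, not on the order in which the filesystem lists files.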
In [None]:
#@markdown # Preprocess data
import os

# idk i just feel like 800 is a lil low for some people part 2
new_f0_max = 1600
og_script = "/content/DiffSinger/utils/binarizer_utils.py"
with open(og_script, 'r') as file:
    mate = file.read()
up_f0_val = mate.replace("f0_max = 800", f"f0_max = {new_f0_max}")
with open(og_script, 'w') as file:
    file.write(up_f0_val)

training_config = f"/content/DiffSinger/configs/{config_type}.yaml"

%cd /content/DiffSinger
os.environ['PYTHONPATH']='.'
if no_warn:
    !CUDA_VISIBLE_DEVICES=0 python /content/DiffSinger/scripts/binarize.py --config {training_config} 2> /dev/null
else:
    !CUDA_VISIBLE_DEVICES=0 python /content/DiffSinger/scripts/binarize.py --config {training_config}

/content/DiffSinger
| Hparams chains:  ['configs/base.yaml', '/content/DiffSinger/configs/variance.yaml']
| Hparams: 
K_step: 1000, accumulate_grad_batches: 1, audio_num_mel_bins: 128, audio_sample_rate: 44100, base_config: ['configs/base.yaml'],
binarization_args: {'shuffle': True, 'num_workers': 0, 'prefer_ds': False}, binarizer_cls: preprocessing.variance_binarizer.VarianceBinarizer, binary_data_dir: t1, breathiness_db_max: -20.0, breathiness_db_min: -96.0,
breathiness_smooth_width: 0.12, clip_grad_norm: 1, dataloader_prefetch_factor: 2, ddp_backend: nccl, dictionary: dictionaries/custom_dict.txt,
diff_accelerator: ddim, diff_decoder_type: wavenet, diff_loss_type: l2, dropout: 0.1, ds_workers: 4,
dur_prediction_args: {'arch': 'fs2', 'dropout': 0.1, 'hi

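The preprocess cell raises `f0_max` in `utils/binarizer_utils.py` from 800 to 1600 Hz before binarizing. Converting both limits to note names with the standard relation $m = 69 + 12\log_2(f/440)$ (A4 = 440 Hz = MIDI note 69) shows the effect. Since 1600/800 = 2, the patch raises the ceiling by exactly one octave, from about G5 to about G6. Below is a small sketch of that conversion. `hz_to_note` is a hypothetical helper, and it says nothing about how DiffSinger uses `f0_max` internally:

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def hz_to_note(freq_hz: float) -> str:
    """Nearest equal-tempered note name, assuming A4 = 440 Hz = MIDI 69."""
    midi = round(69 + 12 * math.log2(freq_hz / 440.0))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"

print(hz_to_note(800))   # G5
print(hz_to_note(1600))  # G6
```
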
# **Training**

In [None]:
#@markdown # Tensorboard
#@markdown ___

#@markdown For monitoring training progress. Enter the directory to your model save location (save_dir)

#@markdown <font size="-1.5"> If you are resuming from the latest checkpoint, enter the folder where you saved your model; it should contain a [lightning_logs] folder

logs = "" #@param{type:"string"}
%reload_ext tensorboard
%tensorboard --logdir {logs}/lightning_logs

In [None]:
#@markdown #Train your model
%cd /content/DiffSinger

#@markdown ___

#@markdown ###**Only edit this section if you want to resume training**
resume_training = False #@param {type:"boolean"}

#@markdown <font size="-1.5"> path to the config you got from training
re_config_path = "" #@param {type:"string"}

#@markdown <font size="-1.5"> path to the resumed model's **FOLDER** (most likely the path you entered above minus [ /config.yaml ])

model_dir = "" #@param {type:"string"}

if resume_training:
    save_dir = model_dir
    config_path = re_config_path
    search_text = "        args_work_dir = os.path.join("
    replacement = f"        args_work_dir = '{model_dir}'"
    with open("/content/DiffSinger/utils/hparams.py", "r") as file:
        lines = file.readlines()
    for i, line in enumerate(lines):
        if search_text in line:
            lines[i] = replacement + "\n"
            break
    with open("/content/DiffSinger/utils/hparams.py", "w") as file:
            file.writelines(lines)
    #incase if anyone wanna change it lmao
    search_text_alt = "        args_work_dir = '"
    replacement_alt = f"        args_work_dir = '{model_dir}'"
    with open("/content/DiffSinger/utils/hparams.py", "r") as file:
        lines = file.readlines()
    for i, line in enumerate(lines):
        if search_text_alt in line:
            lines[i] = replacement_alt + "\n"
            break
    with open("/content/DiffSinger/utils/hparams.py", "w") as file:
            file.writelines(lines)

    relative_p = "            relative_path = filepath.relative_to(Path('.').resolve())"
    relative_change = "            relative_path = filepath.relative_to(Path('/content').resolve())"
    with open("/content/DiffSinger/utils/training_utils.py", "r") as file:
        lines = file.readlines()
    for i, line in enumerate(lines):
        if relative_p in line:
            lines[i] = relative_change + "\n"
            break
    with open("/content/DiffSinger/utils/training_utils.py", "w") as file:
            file.writelines(lines)
    relative_p_2 = "        relative_path = filepath.relative_to(Path('.').resolve())"
    relative_change_2 = "        relative_path = filepath.relative_to(Path('/content').resolve())"
    with open("/content/DiffSinger/utils/training_utils.py", "r") as file:
        lines_2 = file.readlines()
    for i, line in enumerate(lines_2):
        if relative_p_2 in line:
            lines_2[i] = relative_change_2 + "\n"
            break
    with open("/content/DiffSinger/utils/training_utils.py", "w") as file:
            file.writelines(lines_2)
    !cp {model_dir}/dictionary.txt /content/DiffSinger/dictionaries/custom_dict.txt

else:
    config_path = training_config

if no_warn:
    !CUDA_VISIBLE_DEVICES=0 python /content/DiffSinger/scripts/train.py --config {config_path} --exp_name {save_dir} --reset 2> /dev/null
else:
    !CUDA_VISIBLE_DEVICES=0 python /content/DiffSinger/scripts/train.py --config {config_path} --exp_name {save_dir} --reset

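The config and training cells patch `utils/hparams.py` and `utils/training_utils.py` several times with the same read / find / replace / write pattern. Below is a minimal sketch of that pattern as one reusable function. `replace_first_line` is a hypothetical name. Its return value lets you notice when a search string no longer matches, for example after DiffSinger's source changes:

```python
import tempfile
from pathlib import Path

def replace_first_line(path, needle: str, new_line: str) -> bool:
    """Replace the first line containing `needle` with `new_line`.

    Returns True if a line was replaced (and the file rewritten), False if nothing matched.
    """
    path = Path(path)
    lines = path.read_text().splitlines(keepends=True)
    for i, line in enumerate(lines):
        if needle in line:
            lines[i] = new_line + "\n"
            path.write_text("".join(lines))
            return True
    return False

# Example on a throwaway file standing in for utils/hparams.py (its contents are made up)
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "hparams.py"
    demo.write_text("import os\n        args_work_dir = os.path.join('checkpoints', exp_name)\n")
    changed = replace_first_line(demo, "        args_work_dir = os.path.join(", "        args_work_dir = 't2'")
    print(changed)                                    # True
    print(demo.read_text().splitlines()[1].strip())  # args_work_dir = 't2'

# In the notebook, each patch block could then become one call, e.g.:
# replace_first_line("/content/DiffSinger/utils/hparams.py",
#                    "        args_work_dir = os.path.join(", f"        args_work_dir = '{save_dir}'")
```

Unlike the cells, this sketch leaves the file untouched when nothing matches, so a failed patch is reported instead of silently rewriting the file unchanged.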
# **Convert model to ONNX format**

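The form below states that the latest checkpoint in the same folder is used automatically. If you want to see which file that is, here is a minimal sketch that assumes DiffSinger-style checkpoint names such as `model_ckpt_steps_12000.ckpt`. It compares step numbers as integers, because plain string sorting would rank `..._900` above `..._1200`. `latest_checkpoint` is a hypothetical helper:

```python
import re
import tempfile
from pathlib import Path

CKPT_NAME = re.compile(r"model_ckpt_steps_(\d+)\.ckpt$")  # assumed naming scheme

def latest_checkpoint(folder):
    """Return the checkpoint in `folder` with the highest step number, or None if there is none."""
    best_step, best_path = -1, None
    for path in Path(folder).glob("*.ckpt"):
        match = CKPT_NAME.search(path.name)
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path

# Example: string sorting would pick step 900; comparing integers picks 1200
with tempfile.TemporaryDirectory() as tmp:
    for step in (900, 1200, 300):
        (Path(tmp) / f"model_ckpt_steps_{step}.ckpt").touch()
    print(latest_checkpoint(tmp).name)  # model_ckpt_steps_1200.ckpt
```
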
In [None]:
#@markdown # Build OpenUtau compatible voicebank
#@markdown ___
%cd /content
from IPython.display import clear_output
clear_output()
import os
import zipfile
import shutil
# to counter IF the user is to re-run this cell <3
if os.path.exists("/content/OU_compatible_files"):
    shutil.rmtree("/content/OU_compatible_files")
if os.path.exists("/content/jpn_dict.txt"):
    os.remove("/content/jpn_dict.txt")

#@markdown <font size="-1.5"> select this if you don't want to see the onnx converter's output
no_output = True # @param {type:"boolean"}

#@markdown <font size="-1.5"> path to your **ACOUSTIC CHECKPOINT**: the latest checkpoint in the same folder is used automatically
acoustic_checkpoint_path = "" #@param{type:"string"}
acoustic_folder_name = os.path.basename(os.path.dirname(acoustic_checkpoint_path)) + "_acoustic"
acoustic_folder_path = os.path.dirname(acoustic_checkpoint_path)

#@markdown <font size="-1.5"> path to your **VARIANCE CHECKPOINT** (leave blank if you don't have one): the latest checkpoint in the same folder is used automatically
variance_checkpoint_path = "" #@param{type:"string"}
variance_folder_name = os.path.basename(os.path.dirname(variance_checkpoint_path)) + "_variance"
variance_folder_path = os.path.dirname(variance_checkpoint_path)

#@markdown <font size="-1.5"> path to your word to phoneme dict (leave blank to use default Japanese dict)
dictionary_path = "" #@param{type:"string"}

#@markdown <font size="-1.5"> path to where you want to save your OpenUtau bank
exp_folder = "" #@param{type:"string"}

acoustic_onnx_exp = exp_folder + "/onnx/acoustic"
variance_onnx_exp = exp_folder + "/onnx/variance"

print("\n")
print("getting base files...")

!mkdir -p /content/OU_compatible_files/enunux_base
!mkdir -p /content/OU_compatible_files/variance_base
!wget https://github.com/MLo7Ghinsan/DiffSinger_colab_notebook_MLo7/releases/download/OU_files/enunux_base.zip >/dev/null 2>&1
!wget https://github.com/MLo7Ghinsan/DiffSinger_colab_notebook_MLo7/releases/download/OU_files/variance_base.zip >/dev/null 2>&1
!wget https://github.com/MLo7Ghinsan/DiffSinger_colab_notebook_MLo7/releases/download/OU_files/jpn_dict.txt >/dev/null 2>&1
!unzip -q /content/enunux_base.zip -d /content/OU_compatible_files/enunux_base
!unzip -q /content/variance_base.zip -d /content/OU_compatible_files/variance_base
!rm /content/enunux_base.zip
!rm /content/variance_base.zip

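# create the target folder first; with a missing destination, cp would write the .ckpt as a file named after the folder
!mkdir -p /content/DiffSinger/checkpoints/{acoustic_folder_name}
!cp {acoustic_checkpoint_path} -r /content/DiffSinger/checkpoints/{acoustic_folder_name}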
!cp {acoustic_folder_path}/config.yaml -r /content/DiffSinger/checkpoints/{acoustic_folder_name}
!cp {acoustic_folder_path}/dictionary.txt -r /content/DiffSinger/checkpoints/{acoustic_folder_name} # i dont think this is needed but its only one file oh well
!cp {acoustic_folder_path}/spk_map.json -r /content/DiffSinger/checkpoints/{acoustic_folder_name}

print("\n")
print("converting acoustic to onnx...")
search_text = "        args_work_dir = os.path.join("
replacement = f"        args_work_dir = '{acoustic_folder_path}'"
with open("/content/DiffSinger/utils/hparams.py", "r") as file:
    lines = file.readlines()
for i, line in enumerate(lines):
    if search_text in line:
        lines[i] = replacement + "\n"
        break
with open("/content/DiffSinger/utils/hparams.py", "w") as file:
        file.writelines(lines)
#incase if anyone wanna change it lmao
search_text_alt = "        args_work_dir = '"
replacement_alt = f"        args_work_dir = '{acoustic_folder_path}'"
with open("/content/DiffSinger/utils/hparams.py", "r") as file:
    lines = file.readlines()
for i, line in enumerate(lines):
    if search_text_alt in line:
        lines[i] = replacement_alt + "\n"
        break
with open("/content/DiffSinger/utils/hparams.py", "w") as file:
        file.writelines(lines)

if no_output:
    !python /content/DiffSinger/scripts/export.py acoustic --exp {acoustic_folder_name} --out {exp_folder}/onnx/acoustic >/dev/null 2>&1
else:
    !python /content/DiffSinger/scripts/export.py acoustic --exp {acoustic_folder_name} --out {exp_folder}/onnx/acoustic


if not variance_checkpoint_path:
    print("\n")
    print("variance checkpoint path not specified, using enunux instead...")
else:
    print("\n")
    print("converting variance to onnx...")
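    # create the target folder first; with a missing destination, cp would write the .ckpt as a file named after the folder
    !mkdir -p /content/DiffSinger/checkpoints/{variance_folder_name}
    !cp {variance_checkpoint_path} -r /content/DiffSinger/checkpoints/{variance_folder_name}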
    !cp {variance_folder_path}/config.yaml -r /content/DiffSinger/checkpoints/{variance_folder_name}
    !cp {variance_folder_path}/dictionary.txt -r /content/DiffSinger/checkpoints/{variance_folder_name} # i dont think this is needed but its only one file oh well
    !cp {variance_folder_path}/spk_map.json -r /content/DiffSinger/checkpoints/{variance_folder_name}
    search_text = "        args_work_dir = os.path.join("
    replacement = f"        args_work_dir = '{variance_folder_path}'"
    with open("/content/DiffSinger/utils/hparams.py", "r") as file:
        lines = file.readlines()
    for i, line in enumerate(lines):
        if search_text in line:
            lines[i] = replacement + "\n"
            break
    with open("/content/DiffSinger/utils/hparams.py", "w") as file:
        file.writelines(lines)
    # in case args_work_dir was already patched (e.g. on an earlier run) and needs to point somewhere else
    search_text_alt = "        args_work_dir = '"
    replacement_alt = f"        args_work_dir = '{variance_folder_path}'"
    with open("/content/DiffSinger/utils/hparams.py", "r") as file:
        lines = file.readlines()
    for i, line in enumerate(lines):
        if search_text_alt in line:
            lines[i] = replacement_alt + "\n"
            break
    with open("/content/DiffSinger/utils/hparams.py", "w") as file:
        file.writelines(lines)
    if no_output:
        !python /content/DiffSinger/scripts/export.py variance --exp {variance_folder_name} --out {exp_folder}/onnx/variance >/dev/null 2>&1
    else:
        !python /content/DiffSinger/scripts/export.py variance --exp {variance_folder_name} --out {exp_folder}/onnx/variance

if not variance_checkpoint_path:
    folder_paths = [acoustic_onnx_exp]
else:
    folder_paths = [acoustic_onnx_exp, variance_onnx_exp]

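# rename the exported files to fixed names so the steps below can find them
# (any file whose name contains "acoustic.onnx" becomes "acoustic.onnx", and so on)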
patterns = {"acoustic.onnx": "acoustic.onnx", "dur.onnx": "dur.onnx", "linguistic.onnx": "linguistic.onnx", "pitch.onnx": "pitch.onnx", "variance.onnx": "variance.onnx", "phonemes.txt": "phonemes.txt"}


for folder_path in folder_paths:
    for filename in os.listdir(folder_path):
        for pattern, new_name in patterns.items():
            if pattern in filename:
                old_path = os.path.join(folder_path, filename)
                new_path = os.path.join(folder_path, new_name)
                if os.path.exists(old_path):
                    os.rename(old_path, new_path)

print("\n")
print("writing dsdict.yaml...")

if not dictionary_path:
    dict_path = "/content/jpn_dict.txt"
else:
    dict_path = dictionary_path

# for symbols list
phoneme_dict_path = f"{acoustic_folder_path}/dictionary.txt"

dsdict = "dsdict.yaml"

def parse_phonemes(phonemes_str):
    return phonemes_str.split()

entries = []
vowel_types = {"a", "i", "u", "e", "o", "N", "M", "NG", "cl", "vf"}
vowel_data = []
stop_data = []

# Process the specified dictionary
with open(dict_path, "r") as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines instead of failing to unpack them
        word, phonemes_str = line.strip().split("\t")
        entries.append({"grapheme": word, "phonemes": parse_phonemes(phonemes_str)})

with open(phoneme_dict_path, "r") as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines instead of failing to unpack them
        phoneme, _ = line.strip().split("\t")
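        # match whole multi-letter symbols ("cl", "vf", "NG") as well as the first letter
        phoneme_type = "vowel" if phoneme in vowel_types or phoneme[0] in vowel_types else "stop"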
        entry = {"symbol": phoneme, "type": phoneme_type}
        if phoneme_type == "vowel":
            vowel_data.append(entry)
        else:
            stop_data.append(entry)

vowel_data.sort(key=lambda x: x["symbol"])
stop_data.sort(key=lambda x: x["symbol"])

dsdict_path = os.path.join("/content/OU_compatible_files", dsdict)
with open(dsdict_path, "w") as f:
    f.write("entries:\n")
    for entry in entries:
        f.write(f"- grapheme: {entry['grapheme']}\n")
        f.write("  phonemes:\n")
        for phoneme in entry["phonemes"]:
            f.write(f"  - {phoneme}\n")

    f.write("\nsymbols:\n")
    for entry in vowel_data + stop_data:
        f.write(f"- symbol: {entry['symbol']}\n")
        f.write(f"  type: {entry['type']}\n")

print("\n")
print("putting your vb together...")

if not variance_checkpoint_path:
    acoustic_1 = f"{acoustic_onnx_exp}" + "/acoustic.onnx"
    !rm /content/OU_compatible_files/enunux_base/acoustic.onnx
    !cp {acoustic_1} /content/OU_compatible_files/enunux_base >/dev/null 2>&1
    !rm /content/OU_compatible_files/enunux_base/phonemes.txt
    !cp {exp_folder}/onnx/acoustic/phonemes.txt /content/OU_compatible_files/enunux_base >/dev/null 2>&1
    !cp {dsdict_path} /content/OU_compatible_files/enunux_base >/dev/null 2>&1 # enunux does not need this, but including it does not hurt
    !mv /content/OU_compatible_files/enunux_base /content/OU_compatible_files/OU_voicebank
    folder_to_zip = "/content/OU_compatible_files/OU_voicebank"
    output_zip_filename = exp_folder + "/OU_voicebank.zip"
    with zipfile.ZipFile(output_zip_filename, 'w', zipfile.ZIP_DEFLATED) as zipf:
        for root, dirs, files in os.walk(folder_to_zip):
            for file in files:
                file_path = os.path.join(root, file)
                relative_path = os.path.relpath(file_path, folder_to_zip)
                zipf.write(file_path, relative_path)
else:
    acoustic_1 = f"{acoustic_onnx_exp}" + "/acoustic.onnx"
    variance_1 = f"{variance_onnx_exp}" + "/variance.onnx"
    variance_2 = f"{variance_onnx_exp}" + "/pitch.onnx"
    variance_3 = f"{variance_onnx_exp}" + "/dur.onnx"
    variance_4 = f"{variance_onnx_exp}" + "/linguistic.onnx"
    !rm /content/OU_compatible_files/variance_base/acoustic.onnx
    !cp {acoustic_1} /content/OU_compatible_files/variance_base >/dev/null 2>&1
    !rm /content/OU_compatible_files/variance_base/linguistic.onnx
    !cp {variance_4} /content/OU_compatible_files/variance_base >/dev/null 2>&1
    !rm /content/OU_compatible_files/variance_base/dsvariance/variance.onnx
    !cp {variance_1} /content/OU_compatible_files/variance_base/dsvariance/variance.onnx >/dev/null 2>&1
    !rm /content/OU_compatible_files/variance_base/dspitch/pitch.onnx
    !cp {variance_2} /content/OU_compatible_files/variance_base/dspitch/pitch.onnx >/dev/null 2>&1
    !rm /content/OU_compatible_files/variance_base/dsdur/dur.onnx
    !cp {variance_3} /content/OU_compatible_files/variance_base/dsdur/dur.onnx >/dev/null 2>&1
    !rm /content/OU_compatible_files/variance_base/phonemes.txt
    !cp {exp_folder}/onnx/acoustic/phonemes.txt /content/OU_compatible_files/variance_base
    !rm /content/OU_compatible_files/variance_base/dsdict.yaml
    !cp {dsdict_path} /content/OU_compatible_files/variance_base
    !mv /content/OU_compatible_files/variance_base /content/OU_compatible_files/OU_voicebank
    folder_to_zip = "/content/OU_compatible_files/OU_voicebank"
    output_zip_filename = exp_folder + "/OU_voicebank.zip"
    with zipfile.ZipFile(output_zip_filename, 'w', zipfile.ZIP_DEFLATED) as zipf:
        for root, dirs, files in os.walk(folder_to_zip):
            for file in files:
                file_path = os.path.join(root, file)
                relative_path = os.path.relpath(file_path, folder_to_zip)
                zipf.write(file_path, relative_path)

print("\n")
print("Go extract and edit character.txt and character.yaml to your liking for OpenUtau <3")

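The cell above repeats the same two-pass edit of `utils/hparams.py` for the acoustic and variance exports: first it replaces the stock `args_work_dir = os.path.join(` line, then the first line already rewritten to `args_work_dir = '...'`, each time with a hard-coded path. Below is a minimal sketch of that logic as one function. `set_work_dir` and `MARKERS` are hypothetical names, and the 8-space indentation is taken from the search strings the notebook already uses, not from DiffSinger itself.

```python
from pathlib import Path

# Hypothetical constants: the two search strings the notebook uses, in the same order.
MARKERS = (
    "        args_work_dir = os.path.join(",  # stock DiffSinger line
    "        args_work_dir = '",              # line already patched by an earlier run
)


def set_work_dir(hparams_file: str, work_dir: str) -> int:
    """Point args_work_dir at work_dir and return how many lines were rewritten.

    Hypothetical helper (not part of DiffSinger). Like the notebook, each pass
    rewrites only the first line that contains its marker.
    """
    path = Path(hparams_file)
    lines = path.read_text().splitlines(keepends=True)
    # keep single quotes so the second marker still matches on the next run
    new_line = f"        args_work_dir = '{work_dir}'\n"
    replaced = 0
    for marker in MARKERS:
        for i, line in enumerate(lines):
            if marker in line:
                lines[i] = new_line
                replaced += 1
                break
    path.write_text("".join(lines))
    return replaced
```

With a helper like this, each of the two blocks reduces to one call, e.g. `set_work_dir("/content/DiffSinger/utils/hparams.py", acoustic_folder_path)`, and the same call with `variance_folder_path`.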

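The `dsdict.yaml` writer above emits graphemes and symbols as bare YAML scalars, so a grapheme such as `[` or `*`, or one containing `: ` or ` #`, would not parse back as the same string. Below is a minimal sketch of the same file layout as a pure function that quotes every scalar with `json.dumps`, relying on the fact that a JSON string literal is also a valid YAML double-quoted scalar. `classify`, `quote` and `render_dsdict` are hypothetical names, and treating a symbol as a vowel when either the whole symbol or its first letter is in the vowel set is an assumption about the intended behavior.

```python
import json

# Assumption: same vowel set as the notebook; a symbol is a vowel if the whole
# symbol (e.g. "cl", "vf") or its first letter (e.g. "a") is listed.
VOWEL_TYPES = {"a", "i", "u", "e", "o", "N", "M", "NG", "cl", "vf"}


def classify(symbol: str) -> str:
    return "vowel" if symbol in VOWEL_TYPES or symbol[0] in VOWEL_TYPES else "stop"


def quote(value: str) -> str:
    # A JSON string literal doubles as a YAML double-quoted scalar.
    return json.dumps(value, ensure_ascii=False)


def render_dsdict(entries, symbols) -> str:
    """entries: (grapheme, [phonemes]) pairs; symbols: phoneme symbols."""
    lines = ["entries:"]
    for grapheme, phonemes in entries:
        lines.append(f"- grapheme: {quote(grapheme)}")
        lines.append("  phonemes:")
        lines.extend(f"  - {quote(p)}" for p in phonemes)
    lines += ["", "symbols:"]
    # vowels first, then stops, each group sorted by symbol (as in the notebook)
    for symbol in sorted(symbols, key=lambda s: (classify(s) != "vowel", s)):
        lines.append(f"- symbol: {quote(symbol)}")
        lines.append(f"  type: {classify(symbol)}")
    return "\n".join(lines) + "\n"
```

Because `ensure_ascii=False` leaves non-ASCII graphemes unescaped, the result should be written with `open(dsdict_path, "w", encoding="utf-8")` so Japanese graphemes survive intact.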
# **Miscellaneous**
###### (This is an archive section; it will either be removed or replaced with something else)

#@title Generate enunux.yaml (not including graphemes)

#@markdown <font size="-2.5"> path to your dictionary.txt

import os

dict_path = "" #@param{type:"string"}
enunux = "enunux.yaml"
vowel_types = {"a", "i", "u", "e", "o", "N", "M", "NG"}
enunux_data = []
vowel_data = []
stop_data = []
with open(dict_path, "r") as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines instead of failing to unpack them
        phoneme, _ = line.strip().split("\t")
        phoneme_type = "vowel" if phoneme[0] in vowel_types else "stop"
        entry = f"- {{\"symbol\": \"{phoneme}\", \"type\": \"{phoneme_type}\"}}"
        if phoneme_type == "vowel":
            vowel_data.append(entry)
        else:
            stop_data.append(entry)
vowel_data.sort()
stop_data.sort()
enunux_data.extend(["# Vowel type symbols:", *vowel_data, "", "# Stop type symbols:", *stop_data])
directory = os.path.dirname(dict_path)
enunux_path = os.path.join(directory, enunux)
with open(enunux_path, "w") as f:
    f.write("symbols:\n")
    f.write("\n".join(enunux_data))
    f.write("\n")

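Both this cell and the `dsdict.yaml` step read a tab-separated `dictionary.txt` (one `word<TAB>phonemes` entry per line, phonemes separated by spaces). Below is a minimal sketch of that parsing as a reusable function. `read_dictionary` is a hypothetical name and the file is assumed to be UTF-8; it skips empty lines, keeps everything after the first tab as the phoneme list, and reports a line with no tab by its line number rather than with a bare unpacking error.

```python
from pathlib import Path


def read_dictionary(path):
    """Parse 'word<TAB>ph1 ph2 ...' lines into (word, [phonemes]) pairs.

    Hypothetical helper; assumes UTF-8 and skips empty lines.
    """
    entries = []
    text = Path(path).read_text(encoding="utf-8")
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line:
            continue
        word, sep, phonemes = line.partition("\t")  # split on the first tab only
        if not sep:
            raise ValueError(f"{path}:{lineno}: expected a tab between word and phonemes")
        entries.append((word, phonemes.split()))
    return entries
```

For the symbol lists, only the first column of each pair is used, e.g. `[word for word, _ in read_dictionary(dict_path)]`.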
# Last Section Note
Wow, you made it to the very bottom... why, though? lmao hahahahhshahhasdksajidhasjl

Anyway, now that you are here, I guess I'll tell you my plans/to-do list for this notebook. \
Feel free to make suggestions or ask questions via [Discord](https://discord.com/invite/wwbu2JUMjj). My display name is MLo7 and my username is ghin_mlo7.

todo list:
- add option to use pretrained model
- add enable/disable checks for breathiness and energy training [**THESE TWO OPTIONS ARE OFF BY DEFAULT**]
- add a link to a vocoder training notebook (not ready yet) or add a vocoder training section