# <font color="ffc800"> **[Piper](https://github.com/rhasspy/piper) training notebook.**
## ![Piper logo](https://contribute.rhasspy.org/img/logo.png)

---

- Notebook made by [rmcpantoja](http://github.com/rmcpantoja)
- Collaborator: [Xx_Nessu_xX](http://github.com/Xx_Nessu_xX)
- With some modifications by [KiON-GiON](https://github.com/KiON-GiON)

---

# Notes:

- <font color="orange">**Things in orange mean that they are important.**

# Credits:

* [Feanix-Fyre fork](https://github.com/Feanix-Fyre/piper) with some improvements.
* [Tacotron2 NVIDIA training notebook](https://github.com/justinjohn0306/FakeYou-Tacotron2-Notebook) - Dataset duration snippet.
* [🐸TTS](https://github.com/coqui-ai/TTS) - Resampler and XTTS formatter demo.

# <font color="ffc800">🔧 ***First steps.*** 🔧

In [None]:
#@markdown ## <font color="ffc800"> **Google Colab Anti-Disconnect.** 🔌
#@markdown ---
#@markdown #### Helps avoid automatic idle disconnection. Colab will still end the session after <font color="orange">**6 to 12 hours**</font>.

import IPython
js_code = '''
function ClickConnect(){
console.log("Working");
document.querySelector("colab-toolbar-button#connect").click()
}
setInterval(ClickConnect,60000)
'''
display(IPython.display.Javascript(js_code))

In [None]:
#@markdown ## <font color="ffc800"> **Check GPU type.** 👁️
#@markdown ---
#@markdown #### A more capable GPU trains faster. By default, you will get a <font color="orange">**Tesla T4**</font>.
!nvidia-smi
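
The GPU check above can also be done from Python, which is handy if you want the notebook to stop early when no GPU is attached. This is a minimal sketch using only the standard library; the helper name `query_gpus` is hypothetical, and it assumes that `nvidia-smi` is on the `PATH` whenever a GPU runtime is active.

```python
import shutil
import subprocess

def query_gpus():
    """Return one 'name, memory' string per visible NVIDIA GPU, or [] if none."""
    # Without the nvidia-smi binary there is no usable NVIDIA driver.
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]

gpus = query_gpus()
if gpus:
    print("Detected GPU(s):", "; ".join(gpus))
else:
    print("No GPU detected. Switch the runtime type to GPU before training.")
```

An empty list means the training cell, which requests a GPU accelerator, is unlikely to start.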

In [None]:
#@markdown # <font color="ffc800"> **Mount Google Drive.** 📂
#@markdown ---
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

In [None]:
#@markdown # <font color="ffc800"> **Install software.** 📦
#@markdown ---
#@markdown #### This cell installs the synthesizer and the dependencies needed for training. (This may take a while.)
import os
!apt-get -q update -y
!apt-get -q install -y build-essential cmake ninja-build espeak-ng aria2

# clone:
REPO_URL = "https://github.com/KiON-GiON/piper1-gpl.git"
REPO_DIR = "/content/piper1-gpl"

if not os.path.exists(REPO_DIR):
    !git clone -q -b fixes {REPO_URL} {REPO_DIR}
# Always enter the repo, even when it was already cloned in a previous run.
%cd {REPO_DIR}
!wget -q "https://raw.githubusercontent.com/coqui-ai/TTS/dev/TTS/bin/resample.py"
!python -m pip install -e .[train]
!bash build_monotonic_align.sh
!pip install -q --upgrade gdown scikit-build protobuf==3.20.3
!python setup.py build_ext --inplace
!pip install -q faster-whisper
# Useful vars:
use_whisper = True
print("Done!")

# <font color="ffc800"> 🤖 ***Training.*** 🤖

In [None]:
#@markdown # <font color="ffc800"> **1. Extract dataset.** 📥
#@markdown ---
#@markdown #### Important: the audio files must be in <font color="orange">**WAV format (16000 or 22050 Hz, 16-bit, mono) and, for convenience, numbered. Example:**

#@markdown * <font color="orange">**1.wav**</font>
#@markdown * <font color="orange">**2.wav**</font>
#@markdown * <font color="orange">**3.wav**</font>
#@markdown * <font color="orange">**.....**</font>

#@markdown ---
import os
import wave
import zipfile
import datetime

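def get_dataset_duration(wav_path):
    """Return the number of .wav files in wav_path and their total duration."""
    total_duration = 0.0
    wav_files = [
        name for name in os.listdir(wav_path)
        if name.lower().endswith(".wav") and os.path.isfile(os.path.join(wav_path, name))
    ]
    for file_name in wav_files:
        # Join with wav_path so the result does not depend on the current directory.
        with wave.open(os.path.join(wav_path, file_name), "rb") as wave_file:
            frames = wave_file.getnframes()
            rate = wave_file.getframerate()
            total_duration += frames / float(rate)
    # Count only the audio files, not stray transcripts or folders.
    wav_count = len(wav_files)
    duration_str = str(datetime.timedelta(seconds=round(total_duration, 0)))
    return wav_count, duration_str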

%cd /content
# Create the dataset folders if they do not exist yet.
os.makedirs("/content/dataset/wavs", exist_ok=True)
%cd /content/dataset
#@markdown ### Audio dataset path to unzip:
zip_path = "/content/drive/MyDrive/wavs.zip" #@param {type:"string"}
zip_path = zip_path.strip()
if zip_path:
    if os.path.exists(zip_path):
        if zipfile.is_zipfile(zip_path):
            print("Unzipping audio content...")
            !unzip -q -j "{zip_path}" -d /content/dataset/wavs
        else:
            print("Copying audio contents of this folder...")
            fp = zip_path + "/."
            !cp -a "$fp" "/content/dataset/wavs"
    else:
        raise Exception("The path provided to the wavs is not correct. Please set a valid path.")
else:
    raise Exception("You must provide a path to the wavs.")
if os.path.exists("/content/dataset/wavs/wavs"):
    for file in os.listdir("/content/dataset/wavs/wavs"):
        !mv /content/dataset/wavs/wavs/"$file"  /content/dataset/wavs/"$file"
    !rm -r /content/dataset/wavs/*.txt
    !rm -r /content/dataset/wavs/*.csv
%cd /content/dataset/wavs
audio_count, dataset_dur = get_dataset_duration("/content/dataset/wavs")
print(f"Opened dataset with {audio_count} wavs with duration {dataset_dur}.")
%cd /content
#@markdown ---
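
The format requirements stated at the top of this step (16000 or 22050 Hz, 16-bit, mono) can be checked before training. The following is a minimal sketch using the standard-library `wave` module; the helper name `find_bad_wavs` is hypothetical and not part of Piper.

```python
import os
import wave

def find_bad_wavs(wav_dir, allowed_rates=(16000, 22050)):
    """Return {file_name: reason} for every .wav that misses the expected format."""
    problems = {}
    for name in sorted(os.listdir(wav_dir)):
        if not name.lower().endswith(".wav"):
            continue
        path = os.path.join(wav_dir, name)
        try:
            with wave.open(path, "rb") as wav:
                rate = wav.getframerate()
                channels = wav.getnchannels()
                bits = wav.getsampwidth() * 8
        except (wave.Error, EOFError) as err:
            # Non-PCM or truncated files cannot be opened by the wave module.
            problems[name] = f"unreadable: {err}"
            continue
        reasons = []
        if rate not in allowed_rates:
            reasons.append(f"{rate} Hz")
        if channels != 1:
            reasons.append(f"{channels} channels")
        if bits != 16:
            reasons.append(f"{bits}-bit")
        if reasons:
            problems[name] = ", ".join(reasons)
    return problems

# Only runs when the dataset folder from this step exists.
wav_dir = "/content/dataset/wavs"
if os.path.isdir(wav_dir):
    for name, reason in find_bad_wavs(wav_dir).items():
        print(f"{name}: {reason}")
```

Files reported only for their sample rate can be fixed with the resampler option in step 3; stereo or non-16-bit files need converting before upload.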

In [None]:
#@markdown # <font color="ffc800"> **2. Upload the transcript file.** 📝
#@markdown ---
#@markdown ####<font color="orange">**Important: the transcription means writing what the character says in each of the audios, and it must have the following structure:**

#@markdown ##### <font color="orange">For a single-speaker dataset:
#@markdown * 1.wav|This is what my character says in audio 1.
#@markdown * 2.wav|This is the text that the character says in audio 2.
#@markdown * ...

#@markdown ##### <font color="orange">For a multi-speaker dataset:

#@markdown * speaker1audio1.wav|speaker1|This is what the first speaker says.
#@markdown * speaker1audio2.wav|speaker1|This is another audio of the first speaker.
#@markdown * speaker2audio1.wav|speaker2|This is what the second speaker says in the first audio.
#@markdown * speaker2audio2.wav|speaker2|This is another audio of the second speaker.
#@markdown * ...

#@markdown And so on. In addition, the transcript must be a <font color="orange">**.csv or .txt file (UTF-8 without BOM).**

#@markdown ## Auto-transcribe with Whisper if no transcription is provided.

#@markdown **Note: If you skip this cell and don't upload a transcription file, the wavs will be transcribed with Whisper when you run the next step, and the notebook will then continue with the rest of the preprocessing if there are no errors. Whisper gives good transcription results, but in my experience it is better to transcribe manually and upload the file from this cell, because a good TTS voice depends on accurate transcripts. When transcribing manually you can capture every detail of the speaker's delivery (punctuation, sounds, etc.) so that the text follows the speaker's intonation.**


#@markdown However, if you want to transcribe and review this transcription, you can use the individual notebooks:

#@markdown * [English](http://colab.research.google.com/github/rmcpantoja/My-Colab-Notebooks/blob/main/notebooks/OpenAI_Whisper_-_DotCSV_(Speech_dataset_multi-transcryption_support)en.ipynb)
#@markdown * [French](http://colab.research.google.com/github/rmcpantoja/My-Colab-Notebooks/blob/main/notebooks/OpenAI_Whisper_-_DotCSV_(Speech_dataset_multi-transcryption_support)fr.ipynb)
#@markdown * [Spanish](http://colab.research.google.com/github/rmcpantoja/My-Colab-Notebooks/blob/main/notebooks/OpenAI_Whisper_-_DotCSV_(Speech_dataset_multi-transcryption_support)es.ipynb)

#@markdown ---
%cd /content/dataset
from google.colab import files
!rm -f /content/dataset/metadata.csv

if os.path.exists("/content/dataset/wavs/_transcription.txt"):
  !mv "/content/dataset/wavs/_transcription.txt" metadata.csv
else:
  listfn, length = files.upload().popitem()
  if listfn != "metadata.csv":
    !mv "$listfn" metadata.csv

use_whisper = False
%cd /content
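
The transcript structure described in this step (two fields for single-speaker, three for multi-speaker, separated by `|`, UTF-8 without BOM) can be checked before training. This is a minimal sketch; the helper name `check_metadata` is hypothetical, and it assumes the first field is a file name relative to the wavs folder, as in the examples above.

```python
import csv
import os

def check_metadata(csv_path, wav_dir):
    """Return a list of human-readable problems found in a pipe-delimited transcript."""
    problems = []
    with open(csv_path, "r", encoding="utf-8", newline="") as f:
        # QUOTE_NONE keeps quotation marks in the text from being parsed as CSV quoting.
        reader = csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE)
        for line_no, row in enumerate(reader, start=1):
            if not row:
                continue  # ignore blank lines
            if line_no == 1 and row[0].startswith("\ufeff"):
                problems.append("line 1: file starts with a BOM")
                row[0] = row[0].lstrip("\ufeff")
            if len(row) not in (2, 3):
                problems.append(f"line {line_no}: expected 2 or 3 fields, got {len(row)}")
                continue
            if not row[-1].strip():
                problems.append(f"line {line_no}: empty transcription")
            audio_name = row[0].strip()
            if not os.path.isfile(os.path.join(wav_dir, audio_name)):
                problems.append(f"line {line_no}: missing audio file {audio_name}")
    return problems

# Only runs when the transcript from this step exists.
if os.path.isfile("/content/dataset/metadata.csv"):
    for problem in check_metadata("/content/dataset/metadata.csv", "/content/dataset/wavs"):
        print(problem)
```

An empty result means every line has the expected number of fields, a non-empty transcription, and a matching audio file.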

In [None]:
#@markdown # <font color="ffc800"> **3. Preprocess dataset (metadata & setup).** 🔄
#@markdown ---
import os
if use_whisper:
    import torch
    from faster_whisper import WhisperModel
    from tqdm import tqdm
    from google import colab

    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device}")

    def make_dataset(path, language):
        """Transcribe every .wav in path with Whisper and write ../metadata.csv."""
        files = [f for f in os.listdir(path) if f.endswith(".wav")]
        assert len(files) > 0, "You don't have wavs uploaded either! Please upload at least one zip with the wavs in step 1."
        metadata_path = os.path.join(path, "..", "metadata.csv")
        # float16 generally requires a GPU, so fall back to int8 on CPU.
        compute_type = "float16" if device == "cuda" else "int8"
        whisper = WhisperModel("large-v3", device=device, compute_type=compute_type)
        # The with-block closes (and flushes) the file before it is downloaded.
        with open(metadata_path, "w", encoding="utf-8") as metadata_file:
            for audio_file in tqdm(files):
                full_path = os.path.join(path, audio_file)
                segments, _ = whisper.transcribe(full_path, word_timestamps=False, language=language)
                # Join all segments of this file into a single-line transcription.
                text = "".join(segment.text for segment in segments)
                text = text.strip().replace('\n', ' ')
                metadata_file.write(f"{audio_file}|{text}\n")
        colab.files.download(metadata_path)
        del whisper
        return True

#@markdown ### First of all, select the language of your dataset.
language = "English (U.S.)" #@param ["ألعَرَبِي", "Català", "čeština", "Dansk", "Deutsch", "Ελληνικά", "English (British)", "English (U.S.)", "Español (Castellano)", "Español (Latinoamericano)", "Suomi", "Français", "Magyar", "Icelandic", "Italiano", "ქართული", "қазақша", "Lëtzebuergesch", "नेपाली", "Nederlands", "Norsk", "Polski", "Português (Brasil)", "Português (Portugal)", "Română", "Русский", "Српски", "Svenska", "Kiswahili", "Türkçe", "украї́нська", "Tiếng Việt", "简体中文"]
#@markdown ---
# language definition (keys must match the dropdown entries above exactly):
languages = {
    "ألعَرَبِي": "ar",
    "Català": "ca",
    "čeština": "cs",
    "Dansk": "da",
    "Deutsch": "de",
    "Ελληνικά": "el",
    "English (British)": "en",
    "English (U.S.)": "en-us",
    "Español (Castellano)": "es",
    "Español (Latinoamericano)": "es-419",
    "Suomi": "fi",
    "Français": "fr",
    "Magyar": "hu",
    "Icelandic": "is",
    "Italiano": "it",
    "ქართული": "ka",
    "қазақша": "kk",
    "Lëtzebuergesch": "lb",
    "नेपाली": "ne",
    "Nederlands": "nl",
    "Norsk": "nb",
    "Polski": "pl",
    "Português (Brasil)": "pt-br",
    "Português (Portugal)": "pt-pt",
    "Română": "ro",
    "Русский": "ru",
    "Српски": "sr",
    "Svenska": "sv",
    "Kiswahili": "sw",
    "Türkçe": "tr",
    "украї́нська": "uk",
    "Tiếng Việt": "vi",
    "简体中文": "zh"
}

def _get_language(code):
    return languages[code]

final_language = _get_language(language)
#@markdown ### Choose a name for your model:
model_name = "" #@param {type:"string"}
#@markdown ---
# output:
#@markdown ### Choose the working folder: (recommended to save to Drive)

#@markdown The working folder will be used in preprocessing, but also in training the model.
output_path = "/content/drive/MyDrive/colab/piper" #@param {type:"string"}
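# An empty name would make the working folder collide with output_path itself.
if not model_name.strip():
    raise Exception("Please choose a name for your model.")
output_dir = output_path+"/"+model_name
os.makedirs(output_dir, exist_ok=True)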

#@markdown ### Select the sample rate of the dataset:
sample_rate = "22050" #@param ["16000", "22050"]
#@markdown ---
# creating paths:
audio_cache_dir = "/content/audio_cache"
os.makedirs(audio_cache_dir, exist_ok=True)
#@markdown ### Do you want to train using this sample rate, but your audios don't have it?
#@markdown The resampler helps you do it quickly!
resample = False #@param {type:"boolean"}

%cd {REPO_DIR}

if resample:
  !python resample.py --input_dir "/content/dataset/wavs" --output_dir "/content/dataset/wavs_resampled" --output_sr {sample_rate} --file_ext "wav"
  !mv /content/dataset/wavs_resampled/* /content/dataset/wavs
#@markdown ---
# check transcription:
if use_whisper:
    print("Transcript file hasn't been uploaded. Transcribing these audios using Whisper...")
    make_dataset("/content/dataset/wavs", final_language[:2])
    print("Transcription done! Metadata ready!")
print("Metadata and basic settings ready. Actual preprocessing will be done during training.")
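
When Whisper transcribes the dataset, each audio file becomes one `file|text` line in `metadata.csv`. The sketch below shows how such a line can be assembled from segment texts, assuming the segments are plain strings. The helper name `build_metadata_line` is hypothetical, and replacing a literal `|` inside the text with a space is an added assumption (the cell above does not do this), made so the delimiter stays unambiguous.

```python
def build_metadata_line(audio_file, segment_texts):
    """Assemble one 'file|text' transcript line from a list of segment strings."""
    # Whisper segments usually start with a space, so plain concatenation is enough.
    text = "".join(segment_texts)
    # Assumption: a literal pipe in the text would break the field layout.
    text = text.replace("|", " ")
    # Collapse newlines and repeated spaces into single spaces.
    text = " ".join(text.split())
    return f"{audio_file}|{text}\n"

print(build_metadata_line("1.wav", [" Hello there.", " How are\nyou?"]), end="")
# prints: 1.wav|Hello there. How are you?
```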

In [None]:
#@markdown # <font color="ffc800"> **4. Settings.** 🧰
#@markdown ---
import json
import ipywidgets as widgets
from IPython.display import display
from google.colab import output
import os
import re
import glob
import csv

metadata_path = "/content/dataset/metadata.csv"
detected_num_speakers = 1

if os.path.exists(metadata_path):
    speakers = set()
    with open(metadata_path, "r", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="|")
        for row in reader:
            if len(row) >= 3:
                speakers.add(row[1].strip())
    if speakers:
        detected_num_speakers = len(speakers)

print(f"Detected {detected_num_speakers} speaker(s) from metadata.csv.")

#@markdown ### Override detected number of speakers (0 = use detected)
override_num_speakers = 0 #@param {type:"integer"}
if override_num_speakers > 0:
    model_num_speakers = override_num_speakers
else:
    model_num_speakers = detected_num_speakers

if model_num_speakers > 1:
    num_speakers_arg = f"--model.num_speakers {model_num_speakers} "
    print(f"Using multi-speaker model with {model_num_speakers} speakers.")
else:
    num_speakers_arg = ""
    print("Using single-speaker model.")
#@markdown ### <font color="orange">**Select the action to train this dataset: (READ CAREFULLY)**</font>

#@markdown * The option to <font color="orange">continue a training</font> resumes from the last saved checkpoint. If you previously trained a model on free Colab, ran out of time, and want to train it some more, this is ideal for you. Just use the same settings you chose when you first trained this model.
#@markdown * The option to <font color="orange">convert a single-speaker model to a multi-speaker model</font> starts a multi-speaker model from a single-speaker checkpoint. For this, it is important that your dataset contains text and audio from every speaker you want to train in your model.
#@markdown * The <font color="orange">finetune</font> option trains your dataset on top of a pretrained model. This option is ideal if you want to train a very small dataset (more than five minutes recommended).
#@markdown * The <font color="orange">train from scratch</font> option builds features such as the dictionary and speech form from scratch, so it may take longer to converge. For this, at least 8 hours of audio covering a large collection of phonemes are recommended.

action = "finetune" #@param ["Continue training", "convert single-speaker to multi-speaker model", "finetune", "train from scratch"]
#@markdown ---
train_resume_arg = ""
init_from_arg = ""

#@markdown ### Use external checkpoint instead of the default pretrained models selection?
use_external_checkpoint = False  #@param {type:"boolean"}
#@markdown ### Path to external checkpoint (if enabled above)
#@markdown Accepts:
#@markdown - Google drive links.
#@markdown - HTTP/HTTPS links.
#@markdown - Mounted Google Drive path (e.g. **/content/drive/MyDrive/pretrained.ckpt**).
external_checkpoint_path = ""  #@param {type:"string"}
if action == "convert single-speaker to multi-speaker model" and model_num_speakers <= 1:
    raise Exception("You selected 'convert single-speaker to multi-speaker model' but metadata.csv only shows 1 speaker. Please provide a multi-speaker metadata or adjust override_num_speakers.")

if action == "Continue training":
    pattern = os.path.join(output_dir, "lightning_logs", "**", "checkpoints", "last.ckpt")
    checkpoints = glob.glob(pattern, recursive=True)
    if len(checkpoints):
        def _version_num(path):
            m = re.search(r'version_(\d+)', path)
            return int(m.group(1)) if m else -1

        last_checkpoint = sorted(checkpoints, key=_version_num)[-1]
        train_resume_arg = f'--ckpt_path "{last_checkpoint}" '
        print(f"Continuing {model_name}'s training from: {last_checkpoint}")
    else:
        raise Exception("Training cannot be continued as there is no checkpoint to continue at.")

elif action in ("finetune", "convert single-speaker to multi-speaker model"):
    if os.path.exists(os.path.join(output_dir, "lightning_logs", "version_0", "checkpoints", "last.ckpt")):
        raise Exception("Oh no! You have already trained this model before, you cannot choose this option since your progress will be lost. Please select the option to continue a training.")
    else:
        if use_external_checkpoint:
            import re

            ext_ckpt = external_checkpoint_path.strip()
            if not ext_ckpt:
                raise Exception("You enabled 'use external checkpoint' but did not provide a path or URL.")

            ckpt_local = ext_ckpt

            is_url = ext_ckpt.startswith("http://") or ext_ckpt.startswith("https://")
            is_drive_url = "drive.google.com" in ext_ckpt
            is_drive_id = (not is_url) and ext_ckpt.startswith("1") and ("/" not in ext_ckpt)

            if is_url or is_drive_id:
                print("\033[93mDownloading external checkpoint...")

                if is_drive_url or is_drive_id:
                    if is_drive_url:
                        !gdown -q "{ext_ckpt}" -O "/content/pretrained.ckpt" --fuzzy
                    else:
                        !gdown -q "{ext_ckpt}" -O "/content/pretrained.ckpt"
                else:
                    !aria2c --quiet=true --file-allocation=none -d "/content" -o "pretrained.ckpt" "{ext_ckpt}"

                ckpt_local = "/content/pretrained.ckpt"

                if not os.path.exists(ckpt_local):
                    raise Exception("Couldn't download the external checkpoint!")
                else:
                    print("\033[93mExternal checkpoint downloaded!")
            else:
                if not os.path.exists(ext_ckpt):
                    print(f"Warning: external checkpoint path does not exist yet: {ext_ckpt}")
                    print("Make sure to upload/mount it before running the training cell.")

            init_from_arg = f'--model.init_from_checkpoint "{ckpt_local}" '
            print(f"Using external checkpoint for init_from_checkpoint: {ckpt_local}")

        else:
            pretrained_json = os.path.join(REPO_DIR, "notebooks", "pretrained_models.json")
            if not os.path.exists(pretrained_json):
                print("Warning: pretrained_models.json not found. You must provide your own checkpoint at /content/pretrained.ckpt (or enable 'use external checkpoint').")
                init_from_arg = '--model.init_from_checkpoint "/content/pretrained.ckpt" '
            else:
                try:
                    with open(pretrained_json, "r", encoding="utf-8") as f:
                        pretrained_models = json.load(f)

                    if final_language in pretrained_models:
                        models = pretrained_models[final_language]
                        model_options = [(model_name, model_name) for model_name, model_url in models.items()]
                        model_dropdown = widgets.Dropdown(description = "Choose pretrained model", options=model_options)
                        download_button = widgets.Button(description="Download")

                        def download_model(btn):
                            model_name_sel = model_dropdown.value
                            model_url = pretrained_models[final_language][model_name_sel]
                            print("\033[93mDownloading pretrained model...")
                            if model_url.startswith("1"):
                                !gdown -q "{model_url}" -O "/content/pretrained.ckpt"
                            elif model_url.startswith("https://drive.google.com/file/d/"):
                                !gdown -q "{model_url}" -O "/content/pretrained.ckpt" --fuzzy
                            else:
                                !aria2c --quiet=true --file-allocation=none -d "/content" -o "pretrained.ckpt" "{model_url}"
                            model_dropdown.close()
                            download_button.close()
                            output.clear()
                            if os.path.exists("/content/pretrained.ckpt"):
                                print("\033[93mModel downloaded!")
                            else:
                                raise Exception("Couldn't download the pretrained model!")

                        download_button.on_click(download_model)
                        display(model_dropdown, download_button)
                    else:
                        raise Exception(f"There are no pretrained models available for the language {final_language}")
                except FileNotFoundError:
                    raise Exception("The pretrained_models.json file was not found.")

                init_from_arg = '--model.init_from_checkpoint "/content/pretrained.ckpt" '

else:
    print("\033[93mWarning: this model will be trained from scratch. You need at least 8 hours of data to get decent results. Good luck!")

#@markdown ### Choose batch size based on this dataset:
batch_size = 12 #@param {type:"integer"}
#@markdown ---

#@markdown ### Choose the quality for this model:
#@markdown * x-low - 16 kHz audio, 5-7M params
#@markdown * medium - 22.05 kHz audio, 15-20M params
#@markdown * high - 22.05 kHz audio, 28-32M params
quality = "medium" #@param ["high", "x-low", "medium"]
if quality == "x-low":
    quality_args_cli = (
        "--model.hidden_channels 96 "
        "--model.inter_channels 96 "
        "--model.filter_channels 384 "
    )
elif quality == "high":
    quality_args_cli = (
        "--model.resblock 1 "
        '--model.resblock_kernel_sizes "(3, 7, 11)" '
        '--model.resblock_dilation_sizes "((1, 3, 5), (1, 3, 5), (1, 3, 5))" '
        '--model.upsample_rates "(8, 8, 2, 2)" '
        "--model.upsample_initial_channel 512 "
        '--model.upsample_kernel_sizes "(16, 16, 4, 4)" '
    )
else:
    quality_args_cli = ""
#@markdown ---
#@markdown ### Checkpoint settings (how your training is saved)

#@markdown **What is a checkpoint?**
#@markdown A *checkpoint* is a snapshot of the model during training.
#@markdown It lets you stop training and continue later, or go back to an earlier version.

#@markdown ---
#@markdown ### **`checkpoint_epochs` – How often to save checkpoints (in epochs)**
#@markdown - If **> 0**, this value is used as an interval (in epochs):
#@markdown   - to update `last.ckpt` (the most recent state), if `save_last` is enabled,
#@markdown   - and for Lightning's internal `ModelCheckpoint` callback (controlled by `num_ckpt`) to decide when it is allowed to write a checkpoint.
#@markdown - If **0**, the interval is disabled:
#@markdown   - `last.ckpt` is only written once at the very end of training (if `save_last` is enabled),
#@markdown   - `ModelCheckpoint` falls back to Lightning's default behavior (typically once per validation epoch when validation is enabled).
checkpoint_epochs = 5  #@param {type:"integer"}
#@markdown ---
#@markdown ### **`save_last` – Always keep a `last.ckpt` with the latest model**
#@markdown - If **`True`**:
#@markdown   - A custom callback (`LastCheckpoint`) writes **only** a file named **`last.ckpt`**.
#@markdown   - This file is overwritten:
#@markdown     - every `checkpoint_epochs` epochs (if `checkpoint_epochs > 0`),
#@markdown     - and **always** one more time at the very end of training (provided Colab does not interrupt it).
#@markdown   - It works with or without validation and **does not depend on any metric**.
#@markdown - If **`False`**:
#@markdown   - No `last.ckpt` is created; only the checkpoints managed by `num_ckpt` are saved (if any).
save_last = False       #@param {type:"boolean"}
#@markdown ---
#@markdown ### **`num_ckpt` – How many checkpoints to keep from `ModelCheckpoint`**
#@markdown This controls Lightning's standard `ModelCheckpoint` callback.
#@markdown
#@markdown - **`num_ckpt > 0`**
#@markdown   - With validation **enabled**:
#@markdown     - Up to `num_ckpt` "best" checkpoints according to `val_loss` (mode: `min`) are kept.
#@markdown     - The check to save is done every `checkpoint_epochs` epochs (if `checkpoint_epochs > 0`).
#@markdown   - With validation **disabled**:
#@markdown     - There is no validation metric, so this behaves like `0` (no "best" checkpoints are saved).
#@markdown
#@markdown - **`num_ckpt = 0`**
#@markdown   - Disables metric-based checkpoint saving.
#@markdown   - With validation enabled, you still see the `val_loss` curve and audio samples,
#@markdown     but **no** checkpoints are kept by `ModelCheckpoint`; only `last.ckpt` is written if `save_last` is `True`.
#@markdown
#@markdown - **`num_ckpt = -1`**
#@markdown   - Keeps **all** checkpoints that are triggered by `ModelCheckpoint`.
#@markdown   - With validation **enabled**:
#@markdown     - A new checkpoint is written every `checkpoint_epochs` epochs (if `checkpoint_epochs > 0`) based on the validation cycle.
#@markdown   - With validation **disabled**:
#@markdown     - A new checkpoint is written every `checkpoint_epochs` epochs (if `checkpoint_epochs > 0`) without using any metric.
#@markdown
#@markdown In practice:
#@markdown - Use a small positive `num_ckpt` (e.g. 1–3) if you only care about the best models by `val_loss`.
#@markdown - Use **`num_ckpt = -1`** if you want to keep a snapshot at every checkpoint interval.
#@markdown - Use **`num_ckpt = 0`** if you want to rely only on `last.ckpt` (plus any manual exports you do later).
num_ckpt = 0           #@param {type:"integer"}
#@markdown ---
#@markdown ### Step interval to generate model samples:
log_every_n_steps = 1000 #@param {type:"integer"}
#@markdown ---
#@markdown ### Training epochs:
max_epochs = 10000 #@param {type:"integer"}
#@markdown ---
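
To see how the checkpoint settings above interact, the sketch below maps them to command-line flags. It mirrors the branching used in the training cell of this notebook and reuses the flag names that cell passes to `piper.train`. The function name `checkpoint_args` is hypothetical and only illustrative.

```python
def checkpoint_args(num_ckpt, checkpoint_epochs, validation, save_last):
    """Return the checkpoint-related CLI flags for a combination of settings."""
    args = []
    interval = []
    if checkpoint_epochs > 0:
        interval = ["--checkpoint.every_n_epochs", str(checkpoint_epochs)]

    if num_ckpt == -1:
        # Keep every checkpoint written at the interval.
        args += interval + ["--checkpoint.save_top_k", "-1"]
        if validation:
            args += ["--checkpoint.monitor", "val_loss", "--checkpoint.mode", "min"]
        else:
            args += ["--checkpoint.monitor", "null"]
    elif validation and num_ckpt > 0:
        # Keep only the best num_ckpt checkpoints by val_loss.
        args += interval + ["--checkpoint.save_top_k", str(num_ckpt)]
        args += ["--checkpoint.monitor", "val_loss", "--checkpoint.mode", "min"]
    else:
        # num_ckpt == 0, or a positive num_ckpt without validation.
        args += ["--checkpoint.save_top_k", "0", "--checkpoint.monitor", "null"]

    if save_last:
        value = str(checkpoint_epochs) if checkpoint_epochs > 0 else "null"
        args += ["--last_checkpoint.every_n_epochs", value]
    return args

for settings in [(0, 5, True, False), (0, 5, True, True), (2, 5, True, True), (-1, 5, False, False)]:
    print(settings, "->", " ".join(checkpoint_args(*settings)))
```

With the default values in this cell (`num_ckpt = 0`, `save_last = False`) only the `save_top_k 0` flags are produced, so, going by the descriptions above, no checkpoint would be kept. Enabling `save_last` is what produces the `last.ckpt` that the "Continue training" action looks for.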

In [None]:
#@markdown # <font color="ffc800"> **5. Run the TensorBoard extension.** 📈
#@markdown ---
#@markdown TensorBoard visualizes training progress, such as losses and generated audio samples, while the model is being trained.

%load_ext tensorboard
%tensorboard --logdir {output_dir}

In [None]:
#@markdown # <font color="ffc800"> **6. Train.** 🏋️‍♂️
#@markdown ---
#@markdown ### Run this cell to train your final model!

#@markdown ---
#@markdown ### <font color="orange">**Disable validation?**
#@markdown By unchecking this checkbox, training will use the full dataset without
#@markdown holding out any audio files for validation. No validation audio will be
#@markdown generated in TensorBoard during training. This is only recommended for
#@markdown extremely small datasets.
validation = True #@param {type:"boolean"}

#@markdown **Validation split (fraction of examples used for validation)**
#@markdown Recommended values:
#@markdown * Very small datasets (< 100 utterances): `0.0–0.02` (or disable validation).
#@markdown * Small datasets (100–500 utterances): around `0.05`.
#@markdown * Medium / large datasets (> 500 utterances): `0.05–0.10`.
#@markdown 
#@markdown Try to keep **at least ~50 validation utterances**, but usually not more than **10% of the dataset**.
validation_fraction = 0.01 #@param {type:"number"}

if validation and validation_fraction > 0:
    validation_args = (
        f"--data.validation_split {validation_fraction} "
        "--data.num_test_examples 1 "
    )
else:
    validation_args = (
        "--data.validation_split 0 "
        "--data.num_test_examples 0 "
    )

import os

config_filename = "config.json"
config_path = os.path.join(output_dir, config_filename)


cmd = (
    f"cd {REPO_DIR} && "
    "python -m piper.train fit "
    f'--data.voice_name "{model_name}" '
    f'--data.csv_path "/content/dataset/metadata.csv" '
    f'--data.audio_dir "/content/dataset/wavs" '
    f'--data.espeak_voice "{final_language}" '
    f'--data.cache_dir "{audio_cache_dir}" '
    f'--data.config_path "{config_path}" '
    f"--data.batch_size {batch_size} "
    f"--model.sample_rate {sample_rate} "
)

if model_num_speakers > 1:
    cmd += f"--model.num_speakers {model_num_speakers} "

cmd += quality_args_cli + " "

cmd += (
    f'--trainer.default_root_dir "{output_dir}" '
    "--trainer.accelerator gpu "
    "--trainer.devices 1 "
    f"--trainer.max_epochs {max_epochs} "
    f"--trainer.log_every_n_steps {log_every_n_steps} "
    "--trainer.precision 16-mixed "
)

cmd += validation_args

if num_ckpt == -1:
    if checkpoint_epochs > 0:
        cmd += f"--checkpoint.every_n_epochs {checkpoint_epochs} "
    cmd += "--checkpoint.save_top_k -1 "

    if validation:
        cmd += "--checkpoint.monitor val_loss "
        cmd += "--checkpoint.mode min "
    else:
        cmd += "--checkpoint.monitor null "

elif validation and num_ckpt > 0:
    if checkpoint_epochs > 0:
        cmd += f"--checkpoint.every_n_epochs {checkpoint_epochs} "
    cmd += f"--checkpoint.save_top_k {num_ckpt} "
    cmd += "--checkpoint.monitor val_loss "
    cmd += "--checkpoint.mode min "

else:
    cmd += "--checkpoint.save_top_k 0 "
    cmd += "--checkpoint.monitor null "

if save_last:
    if checkpoint_epochs > 0:
        cmd += f"--last_checkpoint.every_n_epochs {checkpoint_epochs} "
    else:
        cmd += "--last_checkpoint.every_n_epochs null "

cmd += train_resume_arg
cmd += init_from_arg

print("Running command:\n", cmd)
get_ipython().system(cmd)
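
To get a feel for the validation split guidance in this cell, the sketch below estimates how many utterances a given fraction holds out and compares the result with the two rules of thumb stated above (at least ~50 validation utterances, at most 10% of the dataset). The helper name `validation_summary` is hypothetical, and rounding down is an assumption; the trainer may round differently.

```python
import math

def validation_summary(num_utterances, fraction):
    """Estimate the train/validation sizes for a dataset and split fraction."""
    # Assumption: the held-out count is rounded down.
    held_out = math.floor(num_utterances * fraction)
    return {
        "validation": held_out,
        "training": num_utterances - held_out,
        "meets_50_target": held_out >= 50,
        "within_10_percent": fraction <= 0.10,
    }

print(validation_summary(1000, 0.01))
print(validation_summary(2000, 0.05))
```

For example, a fraction of `0.01` on 1000 utterances holds out about 10 files, which is below the ~50 target, while `0.05` on 2000 utterances holds out about 100.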

#  <font color="orange">**Have you finished training and want to test the model?**

* If you want to run this model in any software that Piper integrates or the same Piper app, export your model using the [model exporter notebook](https://colab.research.google.com/github/KiON-GiON/piper1-gpl/blob/fixes/notebooks/Piper_ONNX_Export.ipynb)!
* Wait! I want to test this right now before exporting it to the supported format for Piper. Test your generated last.ckpt with [this notebook](https://colab.research.google.com/github/rmcpantoja/piper/blob/master/notebooks/piper_inference_(ckpt).ipynb)! (**Outdated**)