# Train your first 🐸 TTS model 💫

### 👋 Hello and welcome to Coqui (🐸) TTS

The goal of this notebook is to show you a **typical workflow** for **training** and **testing** a TTS model with 🐸.

Let's train a very small model on a very small amount of data so we can iterate quickly.

In this notebook, we will:

1. Download data and format it for 🐸 TTS.
2. Configure the training and testing runs.
3. Train a new model.
4. Test the model and display its performance.

So, let's jump right in!


In [1]:
## Install Coqui TTS
! pip install -U pip
! pip install TTS

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pip
  Downloading pip-22.3.1-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 21.1.3
    Uninstalling pip-21.1.3:
      Successfully uninstalled pip-21.1.3
Successfully installed pip-22.3.1
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting TTS
  Downloading TTS-0.10.1-cp38-cp38-manylinux1_x86_64.whl (590 kB)
Collecting coqpit>=0.0.16
  Downloading coqpit-0.0.17-py3-none-any.whl (13 kB)
Collecting gruut[de]==2.2.3
  Downloading gruut-2.2.3.tar.gz (73 kB)

## ✅ Data Preparation

### **First things first**: we need some data.

We're training a Text-to-Speech model, so we need some _text_ and we need some _speech_. Specifically, we want _transcribed speech_. The speech must be divided into audio clips, and each clip needs a transcription. More details about data requirements, such as recording characteristics, background noise, and vocabulary coverage, can be found in the [🐸TTS documentation](https://tts.readthedocs.io/en/latest/formatting_your_dataset.html).

If you have a single audio file, you need to **split** it into clips. It is also important to use a lossless audio file format to prevent compression artifacts. We recommend the **wav** file format.

The data format we will adopt for this tutorial is taken from the widely used **LJSpeech** dataset, where the **wav files** are collected under a folder:

<span style="color:purple;font-size:15px">
/wavs<br /> 
 &emsp;| - audio1.wav<br /> 
 &emsp;| - audio2.wav<br /> 
 &emsp;| - audio3.wav<br /> 
  ...<br /> 
</span>

and a **metadata.csv** file lists each audio file name (without the `.wav` extension) next to its transcript, delimited by `|`:
 
<span style="color:purple;font-size:15px">
# metadata.csv <br /> 
audio1|This is my sentence. <br /> 
audio2|This is maybe my sentence. <br /> 
audio3|This is certainly my sentence. <br /> 
audio4|Let this be your sentence. <br /> 
...
</span>

In the end, we should have the following **folder structure**:

<span style="color:purple;font-size:15px">
/MyTTSDataset <br /> 
&emsp;| <br /> 
&emsp;| -> metadata.csv<br /> 
&emsp;| -> /wavs<br /> 
&emsp;&emsp;| -> audio1.wav<br /> 
&emsp;&emsp;| -> audio2.wav<br /> 
&emsp;&emsp;| ...<br /> 
</span>
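Before training, it can help to confirm that the folder really matches this layout. The sketch below is one possible check, not part of 🐸TTS: the helper name `find_missing_clips` is hypothetical, and it assumes each `metadata.csv` row starts with the clip name without the `.wav` extension, as in the example above.

```python
import csv
from pathlib import Path


def find_missing_clips(dataset_dir):
    """Return clip names listed in metadata.csv that have no matching file in wavs/."""
    dataset_dir = Path(dataset_dir)
    missing = []
    with open(dataset_dir / "metadata.csv", encoding="utf-8", newline="") as f:
        # QUOTE_NONE: quotation marks inside a transcript are plain text, not CSV quoting.
        for row in csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE):
            if not row:
                continue  # skip blank lines
            clip_name = row[0].strip()
            if not (dataset_dir / "wavs" / f"{clip_name}.wav").is_file():
                missing.append(clip_name)
    return missing
```

Calling `find_missing_clips("MyTTSDataset")` should return an empty list when every transcript row has its audio clip.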

🐸TTS already provides tooling for the _LJSpeech_ format. If you use the same format, you can start training your models right away. <br /> 

After you collect and format your dataset, you need to check two things: whether you need a **_formatter_** and whether you need a **_text_cleaner_**. <br /> The **_formatter_** loads the text file (created above) as a list, and the **_text_cleaner_** performs a sequence of text normalization operations that convert the raw text into its spoken representation (e.g. converting numbers, acronyms, and symbols to their spoken form).

If your dataset format differs from LJSpeech and from the other public datasets that 🐸TTS supports, you need to write your own **_formatter_** and **_text_cleaner_**.
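To make the idea of a text cleaner concrete, here is a deliberately tiny sketch of the kind of normalization described above. It is not one of the cleaners shipped with 🐸TTS: the function name and both lookup tables are hypothetical, and real cleaners read whole numbers and cover far more abbreviations.

```python
import re

# Hypothetical, deliberately tiny lookup tables for illustration only.
ABBREVIATIONS = {"dr.": "doctor", "mr.": "mister"}
DIGITS = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]


def toy_text_cleaner(text):
    """Lowercase, expand a few abbreviations, spell out digits, tidy whitespace."""
    text = text.lower()
    for abbreviation, spoken in ABBREVIATIONS.items():
        # \b keeps the match at the start of a word, so "hdr." is left alone.
        text = re.sub(r"\b" + re.escape(abbreviation), spoken, text)
    # Spell out each digit separately ("42" becomes "four two").
    text = re.sub(r"[0-9]", lambda m: f" {DIGITS[int(m.group())]} ", text)
    return re.sub(r"\s+", " ", text).strip()


print(toy_text_cleaner("Dr. Smith has 2 cats."))
```

The point of the sketch is the order of operations: normalize case first, expand written shorthand into spoken words, then clean up the spacing that the substitutions introduced.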

## ⏳️ Loading your dataset
Load one of the datasets supported by 🐸TTS.

We will start by defining the dataset config, setting LJSpeech as our target dataset, and defining its path.


In [2]:
import os

# BaseDatasetConfig: defines name, formatter and path of the dataset.
from TTS.tts.configs.shared_configs import BaseDatasetConfig

output_path = "tts_train_dir"
os.makedirs(output_path, exist_ok=True)

In [3]:
# Download and extract LJSpeech dataset.

!wget -O $output_path/LJSpeech-1.1.tar.bz2 https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2 
!tar -xf $output_path/LJSpeech-1.1.tar.bz2 -C $output_path

--2022-12-26 17:39:22--  https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
Resolving data.keithito.com (data.keithito.com)... 174.138.79.61
Connecting to data.keithito.com (data.keithito.com)|174.138.79.61|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2748572632 (2.6G) [application/octet-stream]
Saving to: ‘tts_train_dir/LJSpeech-1.1.tar.bz2’


2022-12-26 17:40:58 (27.5 MB/s) - ‘tts_train_dir/LJSpeech-1.1.tar.bz2’ saved [2748572632/2748572632]

^C


In [15]:
dataset_config = BaseDatasetConfig(
    formatter="ljspeech", meta_file_train="metadata.csv", path=os.path.join(output_path, "LJSpeech-1.1/")
)

## ✅ Train a new model

Let's kick off a training run 🚀🚀🚀.

The choice of model architecture depends on your needs and available resources. Each architecture has its pros and cons, which determine run-time efficiency and voice quality.
We have many recipes under `TTS/recipes/` that provide a good starting point. For this tutorial, we will be using `GlowTTS`.

We will begin by initializing the model training configuration.

In [24]:
# GlowTTSConfig: all model related values for training, validating and testing.
from TTS.tts.configs.glow_tts_config import GlowTTSConfig
config = GlowTTSConfig(
    batch_size=32,
    eval_batch_size=16,
    num_loader_workers=4,
    num_eval_loader_workers=4,
    run_eval=True,
    test_delay_epochs=-1,
    epochs=100,
    text_cleaner="phoneme_cleaners",
    use_phonemes=True,
    phoneme_language="en-us",
    phoneme_cache_path=os.path.join(output_path, "phoneme_cache"),
    print_step=25,
    print_eval=False,
    mixed_precision=True,
    output_path=output_path,
    datasets=[dataset_config],
    save_step=1000,
)
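A quick back-of-the-envelope calculation shows how `batch_size` and `print_step` shape the training log. The sketch below assumes the 221 training clips that this run's log reports further down, and it assumes the data loader keeps the final, smaller batch; the helper names are hypothetical.

```python
import math

BATCH_SIZE = 32        # batch_size from the config above
PRINT_STEP = 25        # print_step from the config above
NUM_TRAIN_CLIPS = 221  # assumption: training-set size reported in this run's log


def steps_per_epoch(num_samples, batch_size):
    """Number of batches per epoch when the last, smaller batch is kept."""
    return math.ceil(num_samples / batch_size)


def locate_global_step(global_step, steps_in_epoch):
    """Map a global step counter to (epoch, step within that epoch)."""
    return divmod(global_step, steps_in_epoch)


steps = steps_per_epoch(NUM_TRAIN_CLIPS, BATCH_SIZE)
for global_step in range(0, 101, PRINT_STEP):
    epoch, step = locate_global_step(global_step, steps)
    print(f"global step {global_step}: epoch {epoch}, step {step}/{steps}")
```

Under these assumptions an epoch has 7 steps, so 100 epochs amount to 700 steps in total, which is below the `save_step=1000` interval set in the config.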

Next we will initialize the audio processor which is used for feature extraction and audio I/O.

In [30]:
from TTS.utils.audio import AudioProcessor
ap = AudioProcessor.init_from_config(config)
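# LJSpeech is sampled at 22050 Hz, which matches the config default, so nothing needs changing here.
# For a custom dataset with a different sample rate, set it explicitly, for example:
# ap.sample_rate = 48000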


 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:20
 | > fft_size:1024
 | > power:1.5
 | > preemphasis:0.0
 | > griffin_lim_iters:60
 | > signal_norm:True
 | > symmetric_norm:True
 | > mel_fmin:0
 | > mel_fmax:None
 | > pitch_fmin:1.0
 | > pitch_fmax:640.0
 | > spec_gain:20.0
 | > stft_pad_mode:reflect
 | > max_norm:4.0
 | > clip_norm:True
 | > do_trim_silence:True
 | > trim_db:45
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:10
 | > hop_length:256
 | > win_length:1024
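The `sample_rate` reported above should match the audio files in the dataset. For a custom dataset, one way to check is to read the WAV header directly. This is a minimal sketch using Python's standard `wave` module, which only handles uncompressed PCM files; the helper name `describe_wav` is hypothetical.

```python
import wave


def describe_wav(path):
    """Read sample rate, channel count, and duration from a WAV header."""
    with wave.open(str(path), "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        return {
            "sample_rate": sample_rate,
            "channels": wav_file.getnchannels(),
            "duration_s": wav_file.getnframes() / sample_rate,
        }
```

If the value under `"sample_rate"` differs from the one the audio processor prints, either resample the clips or adjust the audio settings before training.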


Next we will initialize the tokenizer, which converts text to sequences of token IDs. If the config does not define a character set, the default characters are used and written back into the returned config.

In [31]:
from TTS.tts.utils.text.tokenizer import TTSTokenizer
tokenizer, config = TTSTokenizer.init_from_config(config)
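To illustrate what "text to sequences of token IDs" means, here is a toy character-level tokenizer. It is a sketch for intuition only and not how `TTSTokenizer` is implemented: the class name, the padding convention, and the vocabulary are all assumptions. It does mimic one behavior visible in the training log of this run, namely that characters missing from the vocabulary are discarded with a warning.

```python
class ToyTokenizer:
    """Character-level tokenizer that drops characters missing from its vocabulary."""

    def __init__(self, characters):
        # ID 0 is reserved for padding in this sketch, so real characters start at 1.
        self.char_to_id = {ch: i for i, ch in enumerate(characters, start=1)}
        self.id_to_char = {i: ch for ch, i in self.char_to_id.items()}

    def text_to_ids(self, text):
        ids = []
        for ch in text:
            if ch in self.char_to_id:
                ids.append(self.char_to_id[ch])
            else:
                print(f" [!] Character {ch!r} not found in the vocabulary. Discarding it.")
        return ids

    def ids_to_text(self, ids):
        return "".join(self.id_to_char[i] for i in ids)


toy = ToyTokenizer("abc ")
print(toy.text_to_ids("a cab!"))
```

Round-tripping through `ids_to_text` returns the input minus any discarded characters, which is a handy sanity check when defining a custom character set.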

Next we will load the data samples. Each sample is a list of ```[text, audio_file_path, speaker_name]```. You can also define a custom sample loader that returns such a list of samples.

In [32]:
from TTS.tts.datasets import load_tts_samples
train_samples, eval_samples = load_tts_samples(
    dataset_config,
    eval_split=True,
    eval_split_max_size=config.eval_split_max_size,
    eval_split_size=config.eval_split_size,
)

 | > Found 223 files in /content/tts_train_dir/LJSpeech-1.1
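The evaluation split holds out a small part of the data that the model never trains on. The sketch below shows the general idea rather than the logic inside `load_tts_samples`: the function name, the seed, and the 1% hold-out fraction are assumptions, although a 1% fraction would be consistent with 223 files being split into the 221 training and 2 evaluation instances that appear in the training log.

```python
import random


def split_samples(samples, eval_fraction=0.01, seed=0):
    """Shuffle a copy of `samples` and hold out a fraction for evaluation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    num_eval = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[num_eval:], shuffled[:num_eval]


train_part, eval_part = split_samples(range(223))
print(len(train_part), len(eval_part))
```

Fixing the seed matters: if the split changed between runs, evaluation losses from different runs would not be comparable.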


Now we're ready to initialize the model.

The model takes the config, the audio processor, the tokenizer, and a speaker manager as input. The config defines the details of the model, like the number of layers and the size of the embedding. The speaker manager is only used by multi-speaker models, so we pass `None` here.

In [33]:
from TTS.tts.models.glow_tts import GlowTTS
model = GlowTTS(config, ap, tokenizer, speaker_manager=None)

Trainer provides a generic API to train all the 🐸TTS models with all its perks like mixed-precision training, distributed training, etc.

In [34]:
from trainer import Trainer, TrainerArgs
trainer = Trainer(
    TrainerArgs(), config, output_path, model=model, train_samples=train_samples, eval_samples=eval_samples
)

 > Training Environment:
 | > Current device: 0
 | > Num. of GPUs: 1
 | > Num. of CPUs: 2
 | > Num. of Torch Threads: 1
 | > Torch seed: 54321
 | > Torch CUDNN: True
 | > Torch CUDNN deterministic: False
 | > Torch CUDNN benchmark: False
 > Start Tensorboard: tensorboard --logdir=tts_train_dir/run-December-26-2022_06+35PM-0000000

 > Model has 28610257 parameters


### AND... 3,2,1... START TRAINING 🚀🚀🚀

In [35]:
trainer.fit()

 > EPOCH: 0/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:35:08)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
ðɛɹ wɚ taɪmz ɪn wɪt͡ʃ
 [!] Character '͡' not found in the vocabulary. Discarding it.
ænd ɪn d͡ʒɛnɚəlaʊt əv ðə fildz əv saɪkɑləd͡ʒi ɔn
 [!] Character '͡' not found in the vocabulary. Discarding it.

 [!] Character '͡' not found in the vocabulary. Discarding it.
ɑt d͡ʒʌst æz ə lɔŋ tɚm ɡoʊl ænd nɑt d͡ʒʌst æz
 [!] Character '͡' not found in the vocabulary. Discarding it.

   --> STEP: 0/7 -- GLOBAL_STEP: 0
     | > current_lr: 0.00000
     | > step_time: 12.12090  (12.12089)
     | > loader_time: 2.90110  (2.90113)

 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.

  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.31619 (+0.00000)
     | > avg_loss: 5.00088 (+0.00000)
     | > avg_log_mle: 0.34571 (+0.00000)
     | > avg_loss_dur: 4.65517 (+0.00000)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_7.pth

 > EPOCH: 1/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:35:53)

 > EVALUATION
 | > Synthesizing test sentences.

  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.27756 (-0.03863)
     | > avg_loss: 5.00116 (+0.00029)
     | > avg_log_mle: 0.34523 (-0.00048)
     | > avg_loss_dur: 4.65593 (+0.00076)

 > EPOCH: 2/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:36:08)

 > EVALUATION
 | > Synthesizing test sentences.

  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.27351 (-0.00406)
     | > avg_loss: 5.00075 (-0.00041)
     | > avg_log_mle: 0.34522 (-0.00001)
     | > avg_loss_dur: 4.65552 (-0.00041)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_21.pth

 > EPOCH: 3/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:36:23)

   --> STEP: 4/7 -- GLOBAL_STEP: 25
     | > loss: 4.89575  (5.09793)
     | > log_mle: 0.38731  (0.37647)
     | > loss_dur: 4.50844  (4.72146)
     | > amp_scaler: 8192.00000  (8192.00000)
     | > grad_norm: 12.91390  (12.87068)
     | > current_lr: 0.00000
     | > step_time: 0.46300  (0.66576)
     | > loader_time: 0.00370  (0.00535)

 > EVALUATION
 | > Synthesizing test sentences.

  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.28836 (+0.01485)
     | > avg_loss: 4.99877 (-0.00197)
     | > avg_log_mle: 0.34519 (-0.00003)
     | > avg_loss_dur: 4.65358 (-0.00194)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_28.pth

 > EPOCH: 4/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:36:38)

 > EVALUATION
 | > Synthesizing test sentences.

  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.28906 (+0.00071)
     | > avg_loss: 4.99524 (-0.00353)
     | > avg_log_mle: 0.34513 (-0.00006)
     | > avg_loss_dur: 4.65011 (-0.00347)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_35.pth

 > EPOCH: 5/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:36:54)

 > EVALUATION
 | > Synthesizing test sentences.

  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.27216 (-0.01690)
     | > avg_loss: 4.98990 (-0.00534)
     | > avg_log_mle: 0.34503 (-0.00010)
     | > avg_loss_dur: 4.64487 (-0.00525)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_42.pth

 > EPOCH: 6/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:37:09)

 > EVALUATION
 | > Synthesizing test sentences.

  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.27095 (-0.00121)
     | > avg_loss: 4.98225 (-0.00765)
     | > avg_log_mle: 0.34490 (-0.00013)
     | > avg_loss_dur: 4.63735 (-0.00752)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_49.pth

 > EPOCH: 7/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:37:24)

   --> STEP: 1/7 -- GLOBAL_STEP: 50
     | > loss: 5.58244  (5.58244)
     | > log_mle: 0.38335  (0.38335)
     | > loss_dur: 5.19909  (5.19909)
     | > amp_scaler: 8192.00000  (8192.00000)
     | > grad_norm: 13.26894  (13.26894)
     | > current_lr: 0.00000
     | > step_time: 1.24260  (1.24258)
     | > loader_time: 0.01960  (0.01964)

 > EVALUATION
 | > Synthesizing test sentences.

  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.28849 (+0.01754)
     | > avg_loss: 4.97069 (-0.01156)
     | > avg_log_mle: 0.34473 (-0.00017)
     | > avg_loss_dur: 4.62596 (-0.01139)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_56.pth

 > EPOCH: 8/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:37:39)

 > EVALUATION
 | > Synthesizing test sentences.

  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.28068 (-0.00781)
     | > avg_loss: 4.95422 (-0.01647)
     | > avg_log_mle: 0.34451 (-0.00022)
     | > avg_loss_dur: 4.60971 (-0.01625)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_63.pth

 > EPOCH: 9/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:37:54)

 > EVALUATION
 | > Synthesizing test sentences.

  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.26529 (-0.01539)
     | > avg_loss: 4.93652 (-0.01769)
     | > avg_log_mle: 0.34425 (-0.00026)
     | > avg_loss_dur: 4.59227 (-0.01743)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_70.pth

 > EPOCH: 10/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:38:10)

   --> STEP: 5/7 -- GLOBAL_STEP: 75
     | > loss: 4.79520  (5.02137)
     | > log_mle: 0.37751  (0.37565)
     | > loss_dur: 4.41769  (4.64572)
     | > amp_scaler: 8192.00000  (8192.00000)
     | > grad_norm: 12.82663  (12.81982)
     | > current_lr: 0.00000
     | > step_time: 0.73160  (0.65656)
     | > loader_time: 0.00470  (0.00849)

 > EVALUATION
 | > Synthesizing test sentences.

  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.27435 (+0.00905)
     | > avg_loss: 4.91849 (-0.01803)
     | > avg_log_mle: 0.34394 (-0.00031)
     | > avg_loss_dur: 4.57454 (-0.01773)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_77.pth

 > EPOCH: 11/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:38:26)

 > EVALUATION
 | > Synthesizing test sentences.

  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.26584 (-0.00851)
     | > avg_loss: 4.89731 (-0.02118)
     | > avg_log_mle: 0.34359 (-0.00035)
     | > avg_loss_dur: 4.55372 (-0.02082)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_84.pth

 > EPOCH: 12/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:38:41)

 > EVALUATION
 | > Synthesizing test sentences.

  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.28273 (+0.01689)
     | > avg_loss: 4.87310 (-0.02421)
     | > avg_log_mle: 0.34318 (-0.00041)
     | > avg_loss_dur: 4.52991 (-0.02381)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_91.pth

 > EPOCH: 13/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:38:56)

 > EVALUATION
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.25760 (-0.02512)
     | > avg_loss: 4.84552 (-0.02758)
     | > avg_log_mle: 0.34273 (-0.00046)
     | > avg_loss_dur: 4.50279 (-0.02712)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_98.pth

 > EPOCH: 14/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:39:11)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 2/7 -- GLOBAL_STEP: 100
     | > loss: 4.99279  (5.26712)
     | > log_mle: 0.36399  (0.37240)
     | > loss_dur: 4.62880  (4.89472)
     | > amp_scaler: 8192.00000  (8192.00000)
     | > grad_norm: 12.64304  (12.89275)
     | > current_lr: 0.00000
     | > step_time: 0.56060  (0.74878)
     | > loader_time: 0.00960  (0.00870)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.29664 (+0.03904)
     | > avg_loss: 4.81280 (-0.03272)
     | > avg_log_mle: 0.34221 (-0.00052)
     | > avg_loss_dur: 4.47059 (-0.03220)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_105.pth

 > EPOCH: 15/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:39:26)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.28700 (-0.00964)
     | > avg_loss: 4.77861 (-0.03419)
     | > avg_log_mle: 0.34163 (-0.00058)
     | > avg_loss_dur: 4.43697 (-0.03361)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_112.pth

 > EPOCH: 16/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:39:41)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.29886 (+0.01187)
     | > avg_loss: 4.74062 (-0.03799)
     | > avg_log_mle: 0.34099 (-0.00064)
     | > avg_loss_dur: 4.39962 (-0.03735)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_119.pth

 > EPOCH: 17/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:39:57)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 6/7 -- GLOBAL_STEP: 125
     | > loss: 4.65204  (4.80648)
     | > log_mle: 0.36828  (0.37120)
     | > loss_dur: 4.28376  (4.43527)
     | > amp_scaler: 8192.00000  (8192.00000)
     | > grad_norm: 12.50834  (12.45275)
     | > current_lr: 0.00000
     | > step_time: 1.17060  (0.71286)
     | > loader_time: 0.00640  (0.00608)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.30296 (+0.00410)
     | > avg_loss: 4.71124 (-0.02938)
     | > avg_log_mle: 0.34028 (-0.00071)
     | > avg_loss_dur: 4.37095 (-0.02867)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_126.pth

 > EPOCH: 18/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:40:12)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.27252 (-0.03044)
     | > avg_loss: 4.66460 (-0.04664)
     | > avg_log_mle: 0.33949 (-0.00079)
     | > avg_loss_dur: 4.32511 (-0.04585)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_133.pth

 > EPOCH: 19/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:40:28)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.30943 (+0.03691)
     | > avg_loss: 4.61987 (-0.04473)
     | > avg_log_mle: 0.33861 (-0.00088)
     | > avg_loss_dur: 4.28126 (-0.04385)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_140.pth

 > EPOCH: 20/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:40:43)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.28803 (-0.02140)
     | > avg_loss: 4.62317 (+0.00329)
     | > avg_log_mle: 0.33763 (-0.00098)
     | > avg_loss_dur: 4.28554 (+0.00428)


 > EPOCH: 21/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:40:58)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 3/7 -- GLOBAL_STEP: 150
     | > loss: 4.45901  (4.76936)
     | > log_mle: 0.35901  (0.36398)
     | > loss_dur: 4.10000  (4.40537)
     | > amp_scaler: 8192.00000  (8192.00000)
     | > grad_norm: 11.72823  (11.90354)
     | > current_lr: 0.00001
     | > step_time: 0.33250  (0.67573)
     | > loader_time: 0.00300  (0.00878)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.27796 (-0.01006)
     | > avg_loss: 4.56723 (-0.05593)
     | > avg_log_mle: 0.33652 (-0.00111)
     | > avg_loss_dur: 4.23071 (-0.05483)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_154.pth

 > EPOCH: 22/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:41:13)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.29276 (+0.01480)
     | > avg_loss: 4.50745 (-0.05978)
     | > avg_log_mle: 0.33527 (-0.00125)
     | > avg_loss_dur: 4.17218 (-0.05853)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_161.pth

 > EPOCH: 23/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:41:28)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.30117 (+0.00841)
     | > avg_loss: 4.46375 (-0.04371)
     | > avg_log_mle: 0.33387 (-0.00140)
     | > avg_loss_dur: 4.12987 (-0.04231)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_168.pth

 > EPOCH: 24/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:41:43)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.29600 (-0.00517)
     | > avg_loss: 4.40191 (-0.06184)
     | > avg_log_mle: 0.33229 (-0.00158)
     | > avg_loss_dur: 4.06961 (-0.06026)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_175.pth

 > EPOCH: 25/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:41:58)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 0/7 -- GLOBAL_STEP: 175
     | > loss: 5.50331  (5.50331)
     | > log_mle: 0.35667  (0.35667)
     | > loss_dur: 5.14664  (5.14664)
     | > amp_scaler: 8192.00000  (8192.00000)
     | > grad_norm: 11.45946  (11.45946)
     | > current_lr: 0.00001
     | > step_time: 2.10230  (2.10232)
     | > loader_time: 1.06840  (1.06839)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.29011 (-0.00589)
     | > avg_loss: 4.33712 (-0.06479)
     | > avg_log_mle: 0.33050 (-0.00180)
     | > avg_loss_dur: 4.00662 (-0.06299)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_182.pth

 > EPOCH: 26/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:42:14)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.31070 (+0.02059)
     | > avg_loss: 4.19877 (-0.13835)
     | > avg_log_mle: 0.32846 (-0.00204)
     | > avg_loss_dur: 3.87031 (-0.13631)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_189.pth

 > EPOCH: 27/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:42:29)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.29694 (-0.01376)
     | > avg_loss: 4.11008 (-0.08868)
     | > avg_log_mle: 0.32615 (-0.00231)
     | > avg_loss_dur: 3.78393 (-0.08637)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_196.pth

 > EPOCH: 28/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:42:44)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 4/7 -- GLOBAL_STEP: 200
     | > loss: 4.22884  (4.42819)
     | > log_mle: 0.36245  (0.35369)
     | > loss_dur: 3.86640  (4.07451)
     | > amp_scaler: 8192.00000  (8192.00000)
     | > grad_norm: 10.31739  (10.51375)
     | > current_lr: 0.00001
     | > step_time: 0.45730  (0.57866)
     | > loader_time: 0.00390  (0.00814)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.29816 (+0.00123)
     | > avg_loss: 4.04867 (-0.06142)
     | > avg_log_mle: 0.32354 (-0.00261)
     | > avg_loss_dur: 3.72513 (-0.05880)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_203.pth

 > EPOCH: 29/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:42:59)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.28184 (-0.01632)
     | > avg_loss: 3.94141 (-0.10726)
     | > avg_log_mle: 0.32059 (-0.00296)
     | > avg_loss_dur: 3.62082 (-0.10430)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_210.pth

 > EPOCH: 30/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:43:14)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.26410 (-0.01774)
     | > avg_loss: 3.91773 (-0.02368)
     | > avg_log_mle: 0.31726 (-0.00333)
     | > avg_loss_dur: 3.60047 (-0.02035)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_217.pth

 > EPOCH: 31/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:43:31)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.27267 (+0.00857)
     | > avg_loss: 3.85926 (-0.05847)
     | > avg_log_mle: 0.31350 (-0.00376)
     | > avg_loss_dur: 3.54575 (-0.05472)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_224.pth

 > EPOCH: 32/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:43:46)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 1/7 -- GLOBAL_STEP: 225
     | > loss: 4.56395  (4.56395)
     | > log_mle: 0.34481  (0.34481)
     | > loss_dur: 4.21914  (4.21914)
     | > amp_scaler: 8192.00000  (8192.00000)
     | > grad_norm: 9.80867  (9.80867)
     | > current_lr: 0.00001
     | > step_time: 0.85480  (0.85484)
     | > loader_time: 0.00950  (0.00949)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.28070 (+0.00802)
     | > avg_loss: 3.80741 (-0.05185)
     | > avg_log_mle: 0.30929 (-0.00421)
     | > avg_loss_dur: 3.49811 (-0.04764)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_231.pth

 > EPOCH: 33/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:44:01)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.26813 (-0.01257)
     | > avg_loss: 3.74880 (-0.05861)
     | > avg_log_mle: 0.30463 (-0.00467)
     | > avg_loss_dur: 3.44417 (-0.05394)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_238.pth

 > EPOCH: 34/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:44:16)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.27444 (+0.00631)
     | > avg_loss: 3.69362 (-0.05517)
     | > avg_log_mle: 0.29948 (-0.00515)
     | > avg_loss_dur: 3.39414 (-0.05003)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_245.pth

 > EPOCH: 35/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:44:31)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 5/7 -- GLOBAL_STEP: 250
     | > loss: 3.80810  (4.02510)
     | > log_mle: 0.32249  (0.32194)
     | > loss_dur: 3.48561  (3.70315)
     | > amp_scaler: 8192.00000  (8192.00000)
     | > grad_norm: 8.28729  (8.55617)
     | > current_lr: 0.00001
     | > step_time: 0.72630  (0.65874)
     | > loader_time: 0.00490  (0.00594)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.30665 (+0.03221)
     | > avg_loss: 3.68138 (-0.01225)
     | > avg_log_mle: 0.29384 (-0.00564)
     | > avg_loss_dur: 3.38754 (-0.00661)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_252.pth

 > EPOCH: 36/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:44:46)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.28673 (-0.01991)
     | > avg_loss: 3.62430 (-0.05707)
     | > avg_log_mle: 0.28770 (-0.00614)
     | > avg_loss_dur: 3.33660 (-0.05093)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_259.pth

 > EPOCH: 37/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:45:01)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.28193 (-0.00480)
     | > avg_loss: 3.58797 (-0.03633)
     | > avg_log_mle: 0.28108 (-0.00662)
     | > avg_loss_dur: 3.30690 (-0.02971)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_266.pth

 > EPOCH: 38/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:45:17)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.28785 (+0.00591)
     | > avg_loss: 3.59169 (+0.00372)
     | > avg_log_mle: 0.27402 (-0.00705)
     | > avg_loss_dur: 3.31767 (+0.01077)


 > EPOCH: 39/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:45:30)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 2/7 -- GLOBAL_STEP: 275
     | > loss: 3.91053  (4.09673)
     | > log_mle: 0.28923  (0.29307)
     | > loss_dur: 3.62130  (3.80366)
     | > amp_scaler: 8192.00000  (8192.00000)
     | > grad_norm: 7.86991  (8.02326)
     | > current_lr: 0.00001
     | > step_time: 0.39310  (0.69003)
     | > loader_time: 0.00330  (0.01024)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.31317 (+0.02533)
     | > avg_loss: 3.54979 (-0.04190)
     | > avg_log_mle: 0.26654 (-0.00748)
     | > avg_loss_dur: 3.28325 (-0.03442)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_280.pth

 > EPOCH: 40/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:45:45)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.35095 (+0.03778)
     | > avg_loss: 3.51599 (-0.03380)
     | > avg_log_mle: 0.25866 (-0.00788)
     | > avg_loss_dur: 3.25732 (-0.02592)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_287.pth

 > EPOCH: 41/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:46:02)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.32822 (-0.02273)
     | > avg_loss: 3.56668 (+0.05069)
     | > avg_log_mle: 0.25037 (-0.00829)
     | > avg_loss_dur: 3.31631 (+0.05899)


 > EPOCH: 42/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:46:16)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 6/7 -- GLOBAL_STEP: 300
     | > loss: 3.51183  (3.73748)
     | > log_mle: 0.25955  (0.26522)
     | > loss_dur: 3.25229  (3.47226)
     | > amp_scaler: 8192.00000  (8192.00000)
     | > grad_norm: 7.08180  (7.38283)
     | > current_lr: 0.00001
     | > step_time: 1.18470  (0.67812)
     | > loader_time: 0.00700  (0.00747)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.28848 (-0.03973)
     | > avg_loss: 3.59637 (+0.02969)
     | > avg_log_mle: 0.24173 (-0.00864)
     | > avg_loss_dur: 3.35464 (+0.03833)


 > EPOCH: 43/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:46:29)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.30460 (+0.01611)
     | > avg_loss: 3.51963 (-0.07674)
     | > avg_log_mle: 0.23270 (-0.00903)
     | > avg_loss_dur: 3.28693 (-0.06771)


 > EPOCH: 44/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:46:42)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.30928 (+0.00469)
     | > avg_loss: 3.46379 (-0.05584)
     | > avg_log_mle: 0.22324 (-0.00946)
     | > avg_loss_dur: 3.24055 (-0.04638)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_315.pth

 > EPOCH: 45/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:46:58)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.27056 (-0.03872)
     | > avg_loss: 3.39131 (-0.07248)
     | > avg_log_mle: 0.21323 (-0.01001)
     | > avg_loss_dur: 3.17808 (-0.06247)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_322.pth

 > EPOCH: 46/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:47:13)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 3/7 -- GLOBAL_STEP: 325
     | > loss: 3.49135  (3.73860)
     | > log_mle: 0.22246  (0.22659)
     | > loss_dur: 3.26890  (3.51201)
     | > amp_scaler: 8192.00000  (8192.00000)
     | > grad_norm: 7.07792  (7.34341)
     | > current_lr: 0.00001
     | > step_time: 0.33940  (0.70090)
     | > loader_time: 0.00260  (0.01014)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.31468 (+0.04411)
     | > avg_loss: 3.51799 (+0.12668)
     | > avg_log_mle: 0.20270 (-0.01053)
     | > avg_loss_dur: 3.31529 (+0.13721)


 > EPOCH: 47/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:47:26)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.28955 (-0.02512)
     | > avg_loss: 3.45520 (-0.06279)
     | > avg_log_mle: 0.19157 (-0.01112)
     | > avg_loss_dur: 3.26362 (-0.05167)


 > EPOCH: 48/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:47:38)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.30603 (+0.01648)
     | > avg_loss: 3.33565 (-0.11955)
     | > avg_log_mle: 0.18003 (-0.01154)
     | > avg_loss_dur: 3.15562 (-0.10801)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_343.pth

 > EPOCH: 49/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:47:53)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.29895 (-0.00708)
     | > avg_loss: 3.30748 (-0.02817)
     | > avg_log_mle: 0.16852 (-0.01151)
     | > avg_loss_dur: 3.13896 (-0.01666)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_350.pth

 > EPOCH: 50/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:48:09)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 0/7 -- GLOBAL_STEP: 350
     | > loss: 4.35563  (4.35563)
     | > log_mle: 0.18887  (0.18887)
     | > loss_dur: 4.16676  (4.16676)
     | > amp_scaler: 8192.00000  (8192.00000)
     | > grad_norm: 8.36543  (8.36543)
     | > current_lr: 0.00001
     | > step_time: 2.08320  (2.08318)
     | > loader_time: 0.69100  (0.69098)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.27371 (-0.02525)
     | > avg_loss: 3.15686 (-0.15062)
     | > avg_log_mle: 0.15781 (-0.01071)
     | > avg_loss_dur: 2.99905 (-0.13991)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_357.pth

 > EPOCH: 51/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:48:24)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.41907 (+0.14536)
     | > avg_loss: 2.94951 (-0.20735)
     | > avg_log_mle: 0.14812 (-0.00969)
     | > avg_loss_dur: 2.80139 (-0.19766)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_364.pth

 > EPOCH: 52/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:48:40)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.29809 (-0.12097)
     | > avg_loss: 2.87781 (-0.07170)
     | > avg_log_mle: 0.13892 (-0.00920)
     | > avg_loss_dur: 2.73889 (-0.06250)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_371.pth

 > EPOCH: 53/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:48:55)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 4/7 -- GLOBAL_STEP: 375
     | > loss: 2.92523  (3.12657)
     | > log_mle: 0.15332  (0.15157)
     | > loss_dur: 2.77191  (2.97500)
     | > amp_scaler: 8192.00000  (8192.00000)
     | > grad_norm: 6.18948  (6.45690)
     | > current_lr: 0.00001
     | > step_time: 0.46510  (0.39517)
     | > loader_time: 0.00390  (0.00462)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.32796 (+0.02987)
     | > avg_loss: 2.79474 (-0.08306)
     | > avg_log_mle: 0.13002 (-0.00890)
     | > avg_loss_dur: 2.66473 (-0.07416)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_378.pth

 > EPOCH: 54/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:49:12)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.30609 (-0.02187)
     | > avg_loss: 2.73856 (-0.05618)
     | > avg_log_mle: 0.12151 (-0.00851)
     | > avg_loss_dur: 2.61705 (-0.04767)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_385.pth

 > EPOCH: 55/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:49:26)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.29618 (-0.00991)
     | > avg_loss: 2.65643 (-0.08213)
     | > avg_log_mle: 0.11326 (-0.00824)
     | > avg_loss_dur: 2.54317 (-0.07389)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_392.pth

 > EPOCH: 56/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:49:42)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.29622 (+0.00004)
     | > avg_loss: 2.60602 (-0.05041)
     | > avg_log_mle: 0.10514 (-0.00812)
     | > avg_loss_dur: 2.50088 (-0.04229)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_399.pth

 > EPOCH: 57/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:49:57)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 1/7 -- GLOBAL_STEP: 400
     | > loss: 3.23478  (3.23478)
     | > log_mle: 0.11865  (0.11865)
     | > loss_dur: 3.11613  (3.11613)
     | > amp_scaler: 8192.00000  (8192.00000)
     | > grad_norm: 6.19126  (6.19126)
     | > current_lr: 0.00001
     | > step_time: 1.13430  (1.13434)
     | > loader_time: 0.01410  (0.01407)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.29960 (+0.00339)
     | > avg_loss: 2.57024 (-0.03578)
     | > avg_log_mle: 0.09709 (-0.00805)
     | > avg_loss_dur: 2.47315 (-0.02773)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_406.pth

 > EPOCH: 58/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:50:13)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.29594 (-0.00366)
     | > avg_loss: 2.51642 (-0.05381)
     | > avg_log_mle: 0.08923 (-0.00786)
     | > avg_loss_dur: 2.42720 (-0.04595)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_413.pth

 > EPOCH: 59/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:50:29)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.31783 (+0.02189)
     | > avg_loss: 2.46636 (-0.05007)
     | > avg_log_mle: 0.08126 (-0.00797)
     | > avg_loss_dur: 2.38510 (-0.04210)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_420.pth

 > EPOCH: 60/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:50:46)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 5/7 -- GLOBAL_STEP: 425
     | > loss: 2.47127  (2.67707)
     | > log_mle: 0.09540  (0.09337)
     | > loss_dur: 2.37587  (2.58369)
     | > amp_scaler: 8192.00000  (8192.00000)
     | > grad_norm: 4.93094  (5.25279)
     | > current_lr: 0.00001
     | > step_time: 0.71870  (0.58055)
     | > loader_time: 0.00480  (0.00655)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.29984 (-0.01799)
     | > avg_loss: 2.41663 (-0.04972)
     | > avg_log_mle: 0.07355 (-0.00770)
     | > avg_loss_dur: 2.34308 (-0.04202)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_427.pth

 > EPOCH: 61/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:51:02)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.28534 (-0.01449)
     | > avg_loss: 2.40576 (-0.01087)
     | > avg_log_mle: 0.06588 (-0.00768)
     | > avg_loss_dur: 2.33988 (-0.00320)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_434.pth

 > EPOCH: 62/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:51:21)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.30179 (+0.01644)
     | > avg_loss: 2.34884 (-0.05692)
     | > avg_log_mle: 0.05828 (-0.00760)
     | > avg_loss_dur: 2.29056 (-0.04932)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_441.pth

 > EPOCH: 63/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:51:38)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.31393 (+0.01214)
     | > avg_loss: 2.32373 (-0.02511)
     | > avg_log_mle: 0.05069 (-0.00759)
     | > avg_loss_dur: 2.27304 (-0.01752)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_448.pth

 > EPOCH: 64/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:51:55)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 2/7 -- GLOBAL_STEP: 450
     | > loss: 2.58172  (2.75982)
     | > log_mle: 0.06139  (0.06273)
     | > loss_dur: 2.52034  (2.69710)
     | > amp_scaler: 8192.00000  (8192.00000)
     | > grad_norm: 5.05181  (5.33042)
     | > current_lr: 0.00002
     | > step_time: 0.55610  (0.73561)
     | > loader_time: 0.00300  (0.00253)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.30540 (-0.00853)
     | > avg_loss: 2.28630 (-0.03742)
     | > avg_log_mle: 0.04296 (-0.00773)
     | > avg_loss_dur: 2.24334 (-0.02970)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_455.pth

 > EPOCH: 65/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:52:12)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.30883 (+0.00343)
     | > avg_loss: 2.24519 (-0.04111)
     | > avg_log_mle: 0.03520 (-0.00776)
     | > avg_loss_dur: 2.21000 (-0.03334)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_462.pth

 > EPOCH: 66/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:52:29)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.30768 (-0.00115)
     | > avg_loss: 2.21293 (-0.03226)
     | > avg_log_mle: 0.02717 (-0.00803)
     | > avg_loss_dur: 2.18576 (-0.02423)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_469.pth

 > EPOCH: 67/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:52:46)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 6/7 -- GLOBAL_STEP: 475
     | > loss: 2.21257  (2.38180)
     | > log_mle: 0.03284  (0.03706)
     | > loss_dur: 2.17972  (2.34474)
     | > amp_scaler: 8192.00000  (8192.00000)
     | > grad_norm: 4.42558  (4.68702)
     | > current_lr: 0.00002
     | > step_time: 1.17720  (0.63370)
     | > loader_time: 0.00690  (0.00443)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.28596 (-0.02172)
     | > avg_loss: 2.19871 (-0.01422)
     | > avg_log_mle: 0.01915 (-0.00802)
     | > avg_loss_dur: 2.17956 (-0.00620)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_476.pth

 > EPOCH: 68/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:53:02)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.29978 (+0.01382)
     | > avg_loss: 2.20515 (+0.00645)
     | > avg_log_mle: 0.01086 (-0.00829)
     | > avg_loss_dur: 2.19429 (+0.01473)


 > EPOCH: 69/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:53:16)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.31424 (+0.01446)
     | > avg_loss: 2.17270 (-0.03246)
     | > avg_log_mle: 0.00273 (-0.00813)
     | > avg_loss_dur: 2.16996 (-0.02433)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_490.pth

 > EPOCH: 70/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:53:33)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.29264 (-0.02161)
     | > avg_loss: 2.13606 (-0.03663)
     | > avg_log_mle: -0.00549 (-0.00822)
     | > avg_loss_dur: 2.14155 (-0.02841)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_497.pth

 > EPOCH: 71/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:53:51)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 3/7 -- GLOBAL_STEP: 500
     | > loss: 2.18220  (2.37000)
     | > log_mle: 0.00304  (0.00467)
     | > loss_dur: 2.17916  (2.36532)
     | > amp_scaler: 8192.00000  (8192.00000)
     | > grad_norm: 4.34402  (4.64825)
     | > current_lr: 0.00002
     | > step_time: 0.33350  (0.39283)
     | > loader_time: 0.00340  (0.00305)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.30073 (+0.00810)
     | > avg_loss: 2.12066 (-0.01540)
     | > avg_log_mle: -0.01342 (-0.00793)
     | > avg_loss_dur: 2.13408 (-0.00747)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_504.pth

 > EPOCH: 72/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:54:10)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.30696 (+0.00623)
     | > avg_loss: 2.08781 (-0.03285)
     | > avg_log_mle: -0.02135 (-0.00793)
     | > avg_loss_dur: 2.10916 (-0.02492)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_511.pth

 > EPOCH: 73/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:54:26)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.30092 (-0.00604)
     | > avg_loss: 2.06259 (-0.02522)
     | > avg_log_mle: -0.02896 (-0.00761)
     | > avg_loss_dur: 2.09155 (-0.01761)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_518.pth

 > EPOCH: 74/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:54:43)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.30313 (+0.00221)
     | > avg_loss: 2.07780 (+0.01521)
     | > avg_log_mle: -0.03684 (-0.00788)
     | > avg_loss_dur: 2.11464 (+0.02309)


 > EPOCH: 75/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:54:57)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 0/7 -- GLOBAL_STEP: 525
     | > loss: 2.91541  (2.91541)
     | > log_mle: -0.03146  (-0.03146)
     | > loss_dur: 2.94687  (2.94687)
     | > amp_scaler: 8192.00000  (8192.00000)
     | > grad_norm: 5.98590  (5.98590)
     | > current_lr: 0.00002
     | > step_time: 1.68160  (1.68164)
     | > loader_time: 0.97250  (0.97252)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.29136 (-0.01177)
     | > avg_loss: 2.04236 (-0.03544)
     | > avg_log_mle: -0.04433 (-0.00749)
     | > avg_loss_dur: 2.08669 (-0.02795)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_532.pth

 > EPOCH: 76/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:55:14)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.32043 (+0.02907)
     | > avg_loss: 1.98421 (-0.05815)
     | > avg_log_mle: -0.04877 (-0.00444)
     | > avg_loss_dur: 2.03298 (-0.05371)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_539.pth

 > EPOCH: 77/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:55:31)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.30571 (-0.01472)
     | > avg_loss: 1.95214 (-0.03207)
     | > avg_log_mle: -0.05670 (-0.00793)
     | > avg_loss_dur: 2.00884 (-0.02414)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_546.pth

 > EPOCH: 78/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:55:48)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 4/7 -- GLOBAL_STEP: 550
     | > loss: 1.95856  (2.11513)
     | > log_mle: -0.04656  (-0.04921)
     | > loss_dur: 2.00512  (2.16434)
     | > amp_scaler: 8192.00000  (8192.00000)
     | > grad_norm: 4.91969  (5.12270)
     | > current_lr: 0.00002
     | > step_time: 0.46910  (0.56864)
     | > loader_time: 0.00380  (0.01131)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.29463 (-0.01109)
     | > avg_loss: 1.94802 (-0.00412)
     | > avg_log_mle: -0.06314 (-0.00644)
     | > avg_loss_dur: 2.01116 (+0.00232)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_553.pth

 > EPOCH: 79/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:56:04)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.29364 (-0.00098)
     | > avg_loss: 1.96149 (+0.01347)
     | > avg_log_mle: -0.06590 (-0.00276)
     | > avg_loss_dur: 2.02740 (+0.01624)


 > EPOCH: 80/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:56:20)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.30935 (+0.01570)
     | > avg_loss: 1.89819 (-0.06331)
     | > avg_log_mle: -0.07404 (-0.00814)
     | > avg_loss_dur: 1.97223 (-0.05517)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_567.pth

 > EPOCH: 81/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:56:37)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.32213 (+0.01279)
     | > avg_loss: 1.92436 (+0.02617)
     | > avg_log_mle: -0.07185 (+0.00219)
     | > avg_loss_dur: 1.99621 (+0.02398)


 > EPOCH: 82/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:56:52)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 1/7 -- GLOBAL_STEP: 575
     | > loss: 2.30102  (2.30102)
     | > log_mle: -0.07244  (-0.07244)
     | > loss_dur: 2.37346  (2.37346)
     | > amp_scaler: 8192.00000  (8192.00000)
     | > grad_norm: 7.28833  (7.28833)
     | > current_lr: 0.00002
     | > step_time: 0.93720  (0.93718)
     | > loader_time: 0.00290  (0.00287)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.28935 (-0.03278)
     | > avg_loss: 1.88621 (-0.03815)
     | > avg_log_mle: -0.07907 (-0.00722)
     | > avg_loss_dur: 1.96528 (-0.03093)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_581.pth

 > EPOCH: 83/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:57:08)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.30425 (+0.01489)
     | > avg_loss: 1.86337 (-0.02284)
     | > avg_log_mle: -0.08705 (-0.00798)
     | > avg_loss_dur: 1.95042 (-0.01486)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_588.pth

 > EPOCH: 84/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:57:25)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.28465 (-0.01960)
     | > avg_loss: 1.85482 (-0.00856)
     | > avg_log_mle: -0.08395 (+0.00310)
     | > avg_loss_dur: 1.93876 (-0.01165)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_595.pth

 > EPOCH: 85/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:57:42)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 5/7 -- GLOBAL_STEP: 600
     | > loss: 1.77684  (1.93803)
     | > log_mle: -0.08943  (-0.08751)
     | > loss_dur: 1.86626  (2.02554)
     | > amp_scaler: 8192.00000  (8192.00000)
     | > grad_norm: 8.15892  (9.13555)
     | > current_lr: 0.00002
     | > step_time: 0.73440  (0.65199)
     | > loader_time: 0.00460  (0.00579)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.30904 (+0.02439)
     | > avg_loss: 1.82330 (-0.03151)
     | > avg_log_mle: -0.09264 (-0.00870)
     | > avg_loss_dur: 1.91594 (-0.02282)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_602.pth

 > EPOCH: 86/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:57:59)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.30158 (-0.00747)
     | > avg_loss: 1.84153 (+0.01822)
     | > avg_log_mle: -0.07498 (+0.01766)
     | > avg_loss_dur: 1.91651 (+0.00056)


 > EPOCH: 87/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:58:13)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.30403 (+0.00246)
     | > avg_loss: 1.79541 (-0.04612)
     | > avg_log_mle: -0.10490 (-0.02992)
     | > avg_loss_dur: 1.90031 (-0.01620)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_616.pth

 > EPOCH: 88/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:58:30)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.31102 (+0.00699)
     | > avg_loss: 1.79927 (+0.00387)
     | > avg_log_mle: -0.09271 (+0.01219)
     | > avg_loss_dur: 1.89198 (-0.00833)


 > EPOCH: 89/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:58:46)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 2/7 -- GLOBAL_STEP: 625
     | > loss: 1.93994  (2.04308)
     | > log_mle: -0.10785  (-0.10475)
     | > loss_dur: 2.04779  (2.14782)
     | > amp_scaler: 8192.00000  (8192.00000)
     | > grad_norm: 8.69015  (11.69132)
     | > current_lr: 0.00002
     | > step_time: 0.51320  (0.86890)
     | > loader_time: 0.00610  (0.00905)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.30853 (-0.00250)
     | > avg_loss: 1.78454 (-0.01473)
     | > avg_log_mle: -0.09950 (-0.00679)
     | > avg_loss_dur: 1.88404 (-0.00794)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_630.pth

 > EPOCH: 90/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:59:03)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.31342 (+0.00489)
     | > avg_loss: 1.81008 (+0.02554)
     | > avg_log_mle: -0.08968 (+0.00981)
     | > avg_loss_dur: 1.89977 (+0.01572)


 > EPOCH: 91/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:59:17)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.29683 (-0.01659)
     | > avg_loss: 1.79503 (-0.01505)
     | > avg_log_mle: -0.08501 (+0.00467)
     | > avg_loss_dur: 1.88004 (-0.01972)


 > EPOCH: 92/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:59:32)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



   --> STEP: 6/7 -- GLOBAL_STEP: 650
     | > loss: 1.69859  (1.82637)
     | > log_mle: -0.11320  (-0.11148)
     | > loss_dur: 1.81179  (1.93785)
     | > amp_scaler: 8192.00000  (8192.00000)
     | > grad_norm: 6.52406  (10.53759)
     | > current_lr: 0.00002
     | > step_time: 1.15080  (0.71649)
     | > loader_time: 0.00710  (0.00709)


 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.30044 (+0.00361)
     | > avg_loss: 1.76001 (-0.03503)
     | > avg_log_mle: -0.10377 (-0.01876)
     | > avg_loss_dur: 1.86378 (-0.01626)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_651.pth

 > EPOCH: 93/100
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

 > TRAINING (2022-12-26 18:59:49)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



 > EVALUATION





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[91m 0.31925 [0m(+0.01881)
     | > avg_loss:[92m 1.74919 [0m(-0.01082)
     | > avg_log_mle:[92m -0.11567 [0m(-0.01190)
     | > avg_loss_dur:[91m 1.86486 [0m(+0.00108)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_658.pth

[4m[1m > EPOCH: 94/100[0m
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

[1m > TRAINING (2022-12-26 19:00:05) [0m




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



[1m > EVALUATION [0m





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.28952 [0m(-0.02973)
     | > avg_loss:[91m 1.75189 [0m(+0.00270)
     | > avg_log_mle:[91m -0.09543 [0m(+0.02024)
     | > avg_loss_dur:[92m 1.84733 [0m(-0.01753)


[4m[1m > EPOCH: 95/100[0m
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

[1m > TRAINING (2022-12-26 19:00:20) [0m




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



[1m > EVALUATION [0m





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[91m 0.31914 [0m(+0.02962)
     | > avg_loss:[91m 1.76494 [0m(+0.01305)
     | > avg_log_mle:[91m -0.08579 [0m(+0.00965)
     | > avg_loss_dur:[91m 1.85073 [0m(+0.00340)


[4m[1m > EPOCH: 96/100[0m
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

[1m > TRAINING (2022-12-26 19:00:34) [0m




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



[1m   --> STEP: 3/7 -- GLOBAL_STEP: 675[0m
     | > loss: 1.71057  (1.88542)
     | > log_mle: -0.12124  (-0.12604)
     | > loss_dur: 1.83181  (2.01146)
     | > amp_scaler: 8192.00000  (8192.00000)
     | > grad_norm: 11.06725  (8.96603)
     | > current_lr: 0.00002 
     | > step_time: 0.32890  (0.55486)
     | > loader_time: 0.00340  (0.01119)


[1m > EVALUATION [0m





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[91m 0.32085 [0m(+0.00170)
     | > avg_loss:[92m 1.72476 [0m(-0.04018)
     | > avg_log_mle:[92m -0.11496 [0m(-0.02917)
     | > avg_loss_dur:[92m 1.83972 [0m(-0.01101)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_679.pth

[4m[1m > EPOCH: 97/100[0m
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

[1m > TRAINING (2022-12-26 19:00:51) [0m




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



[1m > EVALUATION [0m





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.31601 [0m(-0.00484)
     | > avg_loss:[92m 1.70124 [0m(-0.02352)
     | > avg_log_mle:[92m -0.12714 [0m(-0.01219)
     | > avg_loss_dur:[92m 1.82838 [0m(-0.01134)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_686.pth

[4m[1m > EPOCH: 98/100[0m
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

[1m > TRAINING (2022-12-26 19:01:09) [0m




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



[1m > EVALUATION [0m





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.30636 [0m(-0.00965)
     | > avg_loss:[91m 1.74776 [0m(+0.04652)
     | > avg_log_mle:[91m -0.07854 [0m(+0.04860)
     | > avg_loss_dur:[92m 1.82630 [0m(-0.00208)


[4m[1m > EPOCH: 99/100[0m
 --> tts_train_dir/run-December-26-2022_06+35PM-0000000

[1m > TRAINING (2022-12-26 19:01:24) [0m




> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 221
 | > Preprocessing samples
 | > Max text length: 176
 | > Min text length: 2
 | > Avg text length: 45.90045248868778
 | 
 | > Max audio length: 394187.0
 | > Min audio length: 4284.0
 | > Avg audio length: 110258.36199095023
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.



[1m > EVALUATION [0m





> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 2
 | > Preprocessing samples
 | > Max text length: 78
 | > Min text length: 61
 | > Avg text length: 69.5
 | 
 | > Max audio length: 206507.0
 | > Min audio length: 137387.0
 | > Avg audio length: 171947.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.30288 [0m(-0.00348)
     | > avg_loss:[92m 1.68916 [0m(-0.05860)
     | > avg_log_mle:[92m -0.11936 [0m(-0.04083)
     | > avg_loss_dur:[92m 1.80852 [0m(-0.01778)

 > BEST MODEL : tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model_700.pth


#### 🚀 Run TensorBoard 🚀
You can monitor your model's training progress both in the notebook output and in TensorBoard. TensorBoard also shows figures and sample outputs logged during training.

In [36]:
!pip install tensorboard
!tensorboard --logdir=tts_train_dir

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
NOTE: Using experimental fast data loading logic. To disable, pass
    "--load_fast=false" and report issues on GitHub. More details:
    https://github.com/tensorflow/tensorboard/issues/4784

Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.9.1 at http://localhost:6006/ (Press CTRL+C to quit)
Exception ignored in: <module 'threading' from '/usr/lib/python3.8/threading.py'>
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 1355, in _shutdown
    def _shutdown():
KeyboardInterrupt: 
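
If TensorBoard is not convenient, the evaluation numbers printed by the trainer can also be read straight from the captured log. The following is a minimal sketch, assuming the log text looks like the `EVAL PERFORMANCE` blocks earlier in this notebook. `parse_eval_losses` is a hypothetical helper and not part of 🐸 TTS.

```python
import re

# Hypothetical helper (not part of Coqui TTS): pull avg_loss values out of
# trainer output shaped like the EVAL PERFORMANCE blocks in this notebook.
# The pattern skips an optional color code such as "[92m" before the number,
# and does not match "avg_loss_dur:" because the colon must follow "avg_loss".
_AVG_LOSS = re.compile(r"avg_loss:(?:\x1b?\[\d+m)*\s*(-?\d+\.\d+)")


def parse_eval_losses(log_text):
    """Return the avg_loss values in the order they appear in the log."""
    return [float(m.group(1)) for m in _AVG_LOSS.finditer(log_text)]


sample_log = """
     | > avg_loss:[92m 1.76001 [0m(-0.03503)
     | > avg_loss_dur:[92m 1.86378 [0m(-0.01626)
     | > avg_loss:[92m 1.74919 [0m(-0.01082)
     | > avg_loss:[91m 1.75189 [0m(+0.00270)
"""
losses = parse_eval_losses(sample_log)
# Position of the lowest eval loss among the parsed evaluations
best_index = min(range(len(losses)), key=losses.__getitem__)
print(losses, "lowest eval loss at position", best_index)
```

A list like this can be plotted or compared across runs without starting a TensorBoard server.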


## ✅ Test the model

We made it! 🙌

Let's kick off the testing run, which displays performance metrics.

We're committing the cardinal sin of ML 😈 (aka testing on our training data), so you don't want to deploy this model to production. In this notebook we're focusing on the workflow itself, so it's forgivable 😇

You can see from the test output that our tiny model has overfit to the data, and basically memorized this one sentence.

When you start training your own models, make sure your testing data doesn't include your training data 😅
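
One way to keep the two sets apart is to split the sample list once, before training, and check that nothing appears on both sides. This is a minimal sketch, assuming samples are identified by their clip file names. `split_samples` and the file names are hypothetical and only illustrate the idea.

```python
import random


def split_samples(sample_ids, eval_fraction=0.1, seed=0):
    """Shuffle once with a fixed seed and cut into disjoint train/eval lists."""
    ids = sorted(set(sample_ids))  # de-duplicate so a clip cannot land in both sets
    random.Random(seed).shuffle(ids)
    n_eval = max(1, int(len(ids) * eval_fraction))
    return ids[n_eval:], ids[:n_eval]


# Hypothetical clip names, for illustration only
clips = [f"clip_{i:03d}.wav" for i in range(20)]
train_ids, eval_ids = split_samples(clips, eval_fraction=0.1)

# The whole point: no clip is shared between the two sets
assert not set(train_ids) & set(eval_ids)
print(len(train_ids), "train clips,", len(eval_ids), "eval clips")
```

Fixing the seed keeps the split stable between runs, so a clip never drifts from the evaluation set into the training set.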

Let's find the saved checkpoints and config files. After sorting, the first checkpoint in the list is `best_model.pth`.

In [37]:
import glob
output_path = "tts_train_dir"
# Collect every checkpoint and config under the run folders, in alphabetical order
ckpts = sorted(glob.glob(output_path + "/*/*.pth"))
configs = sorted(glob.glob(output_path + "/*/*.json"))

In [48]:
# Use a new name so the list is not overwritten if this cell is re-run
ckpt_path = ckpts[0]
ckpt_path

'tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model.pth'

In [49]:
# Use a new name so the list is not overwritten if this cell is re-run
config_path = configs[0]
config_path

'tts_train_dir/run-December-26-2022_06+34PM-0000000/config.json'
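
The two cells above rely on alphabetical order, and here the checkpoint and the config come from different run folders (`06+35PM` and `06+34PM`). A hedged alternative is to pick the newest checkpoint by modification time and take the `config.json` sitting next to it. `newest_checkpoint` is a hypothetical helper, and the sketch assumes that each run folder holds its own `config.json`.

```python
import glob
import os


def newest_checkpoint(output_path):
    """Return (checkpoint, config) paths for the most recently written .pth file."""
    ckpts = glob.glob(os.path.join(output_path, "*", "*.pth"))
    if not ckpts:
        raise FileNotFoundError(f"no .pth files under {output_path}")
    ckpt = max(ckpts, key=os.path.getmtime)
    # Assumption: the run folder that holds the checkpoint also holds its config.json
    config = os.path.join(os.path.dirname(ckpt), "config.json")
    return ckpt, config


# Only runs when the training folder from this notebook is present
if os.path.isdir("tts_train_dir"):
    print(newest_checkpoint("tts_train_dir"))
```

Taking both paths from the same folder avoids pairing a checkpoint with a config written by another run.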

In [55]:
!tts --text "Hello. I am Andrew Huberman, and I am a professor of neurobiology and ophthalmology" \
     --model_path tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model.pth \
     --config_path tts_train_dir/run-December-26-2022_06+34PM-0000000/config.json \
     --out_path out.wav

 > Using model: glow_tts
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:20
 | > fft_size:1024
 | > power:1.5
 | > preemphasis:0.0
 | > griffin_lim_iters:60
 | > signal_norm:True
 | > symmetric_norm:True
 | > mel_fmin:0
 | > mel_fmax:None
 | > pitch_fmin:1.0
 | > pitch_fmax:640.0
 | > spec_gain:20.0
 | > stft_pad_mode:reflect
 | > max_norm:4.0
 | > clip_norm:True
 | > do_trim_silence:True
 | > trim_db:45
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:10
 | > hop_length:256
 | > win_length:1024
 > Text: Hello. I am Andrew Huberman, and I am a professor of neurobiology and ophthalmology
 > Text splitted to sentences.
['Hello.', 'I am Andrew Huberman, and I am a professor of neurobiology and ophthalmology']
aɪ æm ændɹu hubɚmə
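
To avoid retyping the run folder names, the same command can be assembled from variables. This sketch only builds the argument list from the four flags used in the cell above. `build_tts_command` is a hypothetical helper, and actually running the result still requires the `tts` command that comes with `pip install TTS`.

```python
import shlex


def build_tts_command(text, model_path, config_path, out_path="out.wav"):
    """Return the argument list for the `tts` CLI call used in this notebook."""
    return [
        "tts",
        "--text", text,
        "--model_path", model_path,
        "--config_path", config_path,
        "--out_path", out_path,
    ]


cmd = build_tts_command(
    "Hello.",
    "tts_train_dir/run-December-26-2022_06+35PM-0000000/best_model.pth",
    "tts_train_dir/run-December-26-2022_06+34PM-0000000/config.json",
)
# Shell-quoted form, handy for pasting after `!` in a notebook cell
print(shlex.join(cmd))
# To run it from Python instead: subprocess.run(cmd, check=True)
```

Passing the arguments as a list means the text does not need manual quoting, even when it contains spaces or punctuation.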

## 📣 Listen to the synthesized wave 📣

In [56]:
import IPython.display


In [58]:
IPython.display.Audio(filename="out.wav")
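
Alongside listening, it can help to confirm that the file has a sensible length and sample rate. This is a minimal sketch using the standard-library `wave` module, assuming `out.wav` is an uncompressed PCM file. `wav_info` is a hypothetical helper. The audio processor log above reports `sample_rate:22050`, so that is the rate to expect.

```python
import os
import wave


def wav_info(path):
    """Return (sample_rate, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        return rate, wf.getnframes() / rate


# Only runs when the synthesized file from this notebook is present
if os.path.exists("out.wav"):
    rate, seconds = wav_info("out.wav")
    print(f"{rate} Hz, {seconds:.2f} s")
```

A duration near zero, or a rate other than the one in the config, usually points at a problem in synthesis rather than in playback.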

## 🎉 Congratulations! 🎉 You have now trained your first TTS model!
Follow up with the next tutorials to learn more advanced material.