# Voice Cloning

In this notebook I'll attempt to build a voice cloning model by training a multi-speaker VITS model with Coqui TTS on the TIMIT dataset. The Coqui TTS documentation is here: https://tts.readthedocs.io/en/latest/index.html.

In [1]:
import os

from trainer import Trainer, TrainerArgs

from TTS.tts.configs.shared_configs import BaseDatasetConfig
from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.datasets import load_tts_samples
from TTS.tts.models.vits import Vits, VitsArgs, VitsAudioConfig
from TTS.tts.utils.speakers import SpeakerManager
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.utils.audio import AudioProcessor



# Data Preparation

In [2]:
dataset_path = r'D:\data\TIMIT\TRAIN'  # raw string so the backslashes are not treated as escape sequences
output_path = './output'

I'll be using the VCTK dataset formatter, since it already handles a multi-speaker dataset. If you plan to use this formatter, your dataset needs to be laid out like this:

/MyTTSDataset\
&emsp;| -> /txt\
&emsp;&emsp;&emsp;| -> /speaker\
&emsp;&emsp;&emsp;&emsp;&emsp;| -> audio1.txt\
&emsp;&emsp;&emsp;&emsp;&emsp;| -> audio2.txt\
&emsp;&emsp;&emsp;&emsp;&emsp;| -> ...\
&emsp;| -> /wav48\
&emsp;&emsp;&emsp;| -> /speaker\
&emsp;&emsp;&emsp;&emsp;&emsp;| -> audio1.wav\
&emsp;&emsp;&emsp;&emsp;&emsp;| -> audio2.wav\
&emsp;&emsp;&emsp;&emsp;&emsp;| -> ...
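Before training, it can help to confirm that every transcript has a matching audio file in the layout above. The sketch below is a hypothetical helper (`pair_vctk_old_files` is not part of Coqui TTS) that assumes one transcript per `txt/<speaker>/<id>.txt` file and a `.wav` with the same stem in `wav48/<speaker>/`; Coqui's own formatter may differ in details.

```python
from pathlib import Path


def pair_vctk_old_files(root):
    """Pair txt/<speaker>/<id>.txt with wav48/<speaker>/<id>.wav (hypothetical helper).

    Returns (pairs, missing): pairs is a list of (speaker, txt_path, wav_path),
    missing lists transcripts whose matching .wav does not exist.
    """
    root = Path(root)
    pairs, missing = [], []
    for txt_path in sorted((root / "txt").glob("*/*.txt")):
        speaker = txt_path.parent.name
        wav_path = root / "wav48" / speaker / (txt_path.stem + ".wav")
        if wav_path.is_file():
            pairs.append((speaker, txt_path, wav_path))
        else:
            missing.append(txt_path)
    return pairs, missing
```

For example, `pairs, missing = pair_vctk_old_files(dataset_path)` should leave `missing` empty before you start a long training run.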

In [3]:
# define dataset config
# need to use vctk_old format if using wav files instead of flac

dataset_config = BaseDatasetConfig(
    name="vctk_old", meta_file_train="", language="en-us", path=dataset_path
)
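`dataset_path` points at TIMIT's TRAIN folder, which the `vctk_old` formatter can only read once it has been rearranged into the layout above. In the original TIMIT distribution each utterance lives at `<dialect region>/<speaker>/<utterance>.TXT`, and each .TXT line starts with the begin and end sample offsets before the sentence. A minimal sketch of the transcript half of that conversion, assuming upper-case `.TXT` names as in the original corpus (both function names are hypothetical):

```python
from pathlib import Path


def timit_txt_to_sentence(line: str) -> str:
    """Drop the leading begin/end sample offsets from a TIMIT .TXT line."""
    parts = line.strip().split(maxsplit=2)
    return parts[2] if len(parts) == 3 else ""


def convert_timit_transcripts(timit_split_dir, out_dir):
    """Write txt/<speaker>/<utt>.txt files in the layout vctk_old expects.

    Assumes TIMIT's <split>/<dialect>/<speaker>/<utt>.TXT structure. Audio is
    NOT converted here: TIMIT .WAV files use the NIST SPHERE format, so they
    still need converting to RIFF .wav (into wav48/<speaker>/) with another tool,
    keeping the same file stem so transcripts and audio line up.
    """
    src, out = Path(timit_split_dir), Path(out_dir)
    written = 0
    for txt in sorted(src.glob("*/*/*.TXT")):
        speaker = txt.parent.name
        sentence = timit_txt_to_sentence(txt.read_text())
        dest = out / "txt" / speaker / (txt.stem + ".txt")
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_text(sentence + "\n")
        written += 1
    return written
```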

In [4]:
audio_config = VitsAudioConfig(
    sample_rate=22050, win_length=1024, hop_length=256, num_mels=80, mel_fmin=0, mel_fmax=None
)
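`sample_rate=22050` has to match the audio on disk. TIMIT is distributed at 16 kHz, so unless the files were resampled during conversion, a quick check like the one below is worth running before training. It is a hypothetical helper built on Python's standard `wave` module, which only reads uncompressed PCM RIFF files; if it reports mismatches, either resample the files or change `sample_rate` to match them.

```python
import wave
from pathlib import Path


def find_sample_rate_mismatches(wav_root, expected_sr=22050):
    """Return (path, actual_sr) for every .wav under wav_root whose rate differs."""
    mismatches = []
    for path in sorted(Path(wav_root).rglob("*.wav")):
        with wave.open(str(path), "rb") as w:
            sr = w.getframerate()
        if sr != expected_sr:
            mismatches.append((path, sr))
    return mismatches
```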

In [5]:
vitsArgs = VitsArgs(
    use_speaker_embedding=True,
)

In [6]:
config = VitsConfig(
    model_args=vitsArgs,
    audio=audio_config,
    run_name="vits_vctk",
    batch_size=32,
    eval_batch_size=16,
    batch_group_size=5,
    num_loader_workers=4,
    num_eval_loader_workers=4,
    run_eval=True,
    test_delay_epochs=-1,
    epochs=1000,
    text_cleaner="english_cleaners",
    use_phonemes=True,
    phoneme_language="en",
    phoneme_cache_path=os.path.join(output_path, "phoneme_cache"),
    compute_input_seq_cache=True,
    print_step=25,
    print_eval=False,
    mixed_precision=True,
    max_text_len=325,  # raise this if your GPU has more than 16 GB of VRAM
    output_path=output_path,
    datasets=[dataset_config],
    cudnn_benchmark=False,
)
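A few numbers in the training log follow directly from this config. With 2555 training samples and `batch_size=32`, an epoch is ceil(2555 / 32) = 80 steps (the `STEP: x/80` lines), and the logged `Batch group size: 160` is consistent with `batch_group_size * batch_size` = 5 × 32. The sketch below only reproduces that arithmetic; the per-step time of roughly 3.7 s is read off the log and is an assumption for any other GPU.

```python
import math

train_samples = 2555      # "Number of instances" in the training log
batch_size = 32           # config.batch_size
batch_group_size = 5      # config.batch_group_size
epochs = 1000             # config.epochs
step_time_s = 3.7         # rough per-step time taken from the log (assumption)

steps_per_epoch = math.ceil(train_samples / batch_size)  # last partial batch is kept
group_size = batch_group_size * batch_size               # logged as "Batch group size: 160"
total_steps = steps_per_epoch * epochs
train_hours = total_steps * step_time_s / 3600           # excludes eval and data loading

print(steps_per_epoch, group_size, total_steps, round(train_hours, 1))
```

At that pace the full 1000 epochs would take on the order of 80+ hours of pure training steps, before evaluation and test-sentence synthesis are added.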

In [7]:
# INITIALIZE THE AUDIO PROCESSOR
# Audio processor is used for feature extraction and audio I/O.
# It mainly serves the dataloader and the training loggers.
ap = AudioProcessor.init_from_config(config)

 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:0
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:None
 | > fft_size:1024
 | > power:None
 | > preemphasis:0.0
 | > griffin_lim_iters:None
 | > signal_norm:None
 | > symmetric_norm:None
 | > mel_fmin:0
 | > mel_fmax:None
 | > pitch_fmin:None
 | > pitch_fmax:None
 | > spec_gain:20.0
 | > stft_pad_mode:reflect
 | > max_norm:1.0
 | > clip_norm:True
 | > do_trim_silence:False
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:10
 | > hop_length:256
 | > win_length:1024
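The STFT settings above fix the time resolution of the mel spectrograms: at 22050 Hz, `hop_length=256` is about 11.6 ms between frames and `win_length=1024` is about 46.4 ms of audio per frame. The sketch below converts the audio lengths reported later in the training log into seconds and approximate frame counts, assuming those lengths are sample counts at 22050 Hz (if your files are really 16 kHz, the durations are wrong by the same factor).

```python
sample_rate = 22050   # VitsAudioConfig.sample_rate
hop_length = 256      # samples between successive STFT frames
win_length = 1024     # samples per analysis window


def approx_frames(num_samples, hop=hop_length):
    """Approximate mel-frame count; the exact value can differ by one
    depending on how the STFT pads or centres the signal."""
    return num_samples // hop


# Audio lengths reported in the training log, read as sample counts.
for label, n in [("min", 21798), ("avg", 67219), ("max", 171808)]:
    print(f"{label}: {n / sample_rate:.2f} s, ~{approx_frames(n)} frames")

print(f"hop = {1000 * hop_length / sample_rate:.1f} ms, "
      f"window = {1000 * win_length / sample_rate:.1f} ms")
```

Under that assumption the longest clip is roughly 7.8 s, or about 670 mel frames.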


In [8]:
# INITIALIZE THE TOKENIZER
# Tokenizer is used to convert text to sequences of token IDs.
# config is updated with the default characters if not defined in the config.
tokenizer, config = TTSTokenizer.init_from_config(config)
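With `add_blank: True` (see the DataLoader summary in the log), a `<BLNK>` token is placed between symbols, which is why the phoneme list printed during pre-computation alternates `<BLNK>` with IPA symbols. The log also warns that '͡' (U+0361, the IPA tie bar joining the halves of an affricate such as t͡ʃ) is not in the vocabulary and is discarded, so the affricate ends up as separate 't' and 'ʃ' tokens. The sketch below illustrates both effects in plain Python; `encode_with_blanks` is a hypothetical helper, not Coqui's `TTSTokenizer`, and filtering before interspersing is my assumption.

```python
BLANK = "<BLNK>"


def encode_with_blanks(phonemes, vocab, blank=BLANK):
    """Drop symbols missing from vocab, then put a blank around every symbol.

    The result starts and ends with the blank token and alternates
    blank/symbol in between, like the list printed in the log.
    """
    kept = [p for p in phonemes if p in vocab]
    out = [blank]
    for p in kept:
        out.extend([p, blank])
    return out


print(encode_with_blanks(["t", "\u0361", "ʃ"], {"t", "ʃ"}))
```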

In [9]:
# LOAD DATA SAMPLES
# Each sample is a dict with keys such as "text", "audio_file" and "speaker_name"

train_samples, eval_samples = load_tts_samples(
    dataset_config,
    eval_split=True,
    eval_split_max_size=config.eval_split_max_size,
    eval_split_size=config.eval_split_size,
)
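The log shows 2580 files split into 2555 training and 25 evaluation samples. That is consistent with holding out 1% (int(2580 × 0.01) = 25); I'm assuming this comes from the default `eval_split_size`, since the config above does not set it. The sketch below is a hypothetical re-implementation of that kind of split, not the code inside `load_tts_samples`, which may differ in details such as how it treats speakers.

```python
import random


def split_eval(samples, eval_split_size=0.01, eval_split_max_size=None, seed=0):
    """Hold out part of the samples for evaluation (hypothetical helper).

    eval_split_size below 1 is treated as a fraction, otherwise as a count;
    eval_split_max_size, if set, caps the number of eval samples.
    """
    if eval_split_size < 1:
        n_eval = int(len(samples) * eval_split_size)
    else:
        n_eval = int(eval_split_size)
    if eval_split_max_size is not None:
        n_eval = min(n_eval, eval_split_max_size)
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_eval:], shuffled[:n_eval]  # (train, eval)
```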

 | > Found 2580 files in D:\data\TIMIT\TRAIN


In [10]:
# init speaker manager for multi-speaker training
# it maps speaker-id to speaker-name in the model and data-loader
speaker_manager = SpeakerManager()
speaker_manager.set_ids_from_data(train_samples + eval_samples, parse_key="speaker_name")
config.model_args.num_speakers = speaker_manager.num_speakers
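`set_ids_from_data` builds the name-to-id table that the speaker-embedding layer is indexed with, and `num_speakers` is simply the size of that table. A minimal sketch of such a mapping, assuming ids are assigned in sorted name order (treat the ordering as an assumption about Coqui's internals; `speaker_ids_from_samples` is hypothetical):

```python
def speaker_ids_from_samples(samples, parse_key="speaker_name"):
    """Map each distinct speaker name to an integer id, in sorted name order."""
    names = sorted({s[parse_key] for s in samples})
    return {name: idx for idx, name in enumerate(names)}


ids = speaker_ids_from_samples([{"speaker_name": "b"}, {"speaker_name": "a"}, {"speaker_name": "b"}])
print(ids, len(ids))  # len(ids) plays the role of num_speakers
```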

In [11]:
# init model
model = Vits(config, ap, tokenizer, speaker_manager)

 > initialization of speaker-embedding layers.


In [None]:
# init the trainer and 🚀
trainer = Trainer(
    TrainerArgs(),
    config,
    output_path,
    model=model,
    train_samples=train_samples,
    eval_samples=eval_samples,
)
trainer.fit()

 > Using CUDA: True
 > Number of GPUs: 1

 > Model has 86476460 parameters

[4m[1m > EPOCH: 0/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce


 > `speakers.pth` is saved to ./output\vits_vctk-September-23-2022_02+46AM-3c624ce\speakers.pth.
 > `speakers_file` is updated in the config.json.
[*] Pre-computing phonemes...


  0%|          | 1/2555 [00:00<24:54,  1.71it/s]

['<BLNK>', 'b', '<BLNK>', 'ɪ', '<BLNK>', 'f', '<BLNK>', 'ɔ', '<BLNK>', 'ɹ', '<BLNK>', ' ', '<BLNK>', 'm', '<BLNK>', 'ʌ', '<BLNK>', 't', '<BLNK>', '͡', '<BLNK>', 'ʃ', '<BLNK>', ' ', '<BLNK>', 'l', '<BLNK>', 'ɔ', '<BLNK>', 'ŋ', '<BLNK>', 'ɡ', '<BLNK>', 'ɚ', '<BLNK>', ' ', '<BLNK>', 'ð', '<BLNK>', 'ə', '<BLNK>', ' ', '<BLNK>', 'm', '<BLNK>', 'ɚ', '<BLNK>', 'i', '<BLNK>', 'n', '<BLNK>', ' ', '<BLNK>', 'k', '<BLNK>', 'w', '<BLNK>', 'a', '<BLNK>', 'ɪ', '<BLNK>', 'ə', '<BLNK>', 't', '<BLNK>', 'ɪ', '<BLNK>', 'd', '<BLNK>', ' ', '<BLNK>', 'd', '<BLNK>', 'a', '<BLNK>', 'ʊ', '<BLNK>', 'n', '<BLNK>', '.', '<BLNK>']
 [!] Character '͡' not found in the vocabulary. Discarding it.


100%|██████████| 2555/2555 [01:35<00:00, 26.83it/s]




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 02:48:32)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 0
     | > loss_disc: 6.11368  (6.11368)
     | > loss_disc_real_0: 1.05477  (1.05477)
     | > loss_disc_real_1: 0.98233  (0.98233)
     | > loss_disc_real_2: 1.00725  (1.00725)
     | > loss_disc_real_3: 1.03027  (1.03027)
     | > loss_disc_real_4: 1.02319  (1.02319)
     | > loss_disc_real_5: 1.01460  (1.01460)
     | > loss_0: 6.11368  (6.11368)
     | > grad_norm_0: 0.00000  (0.00000)
     | > loss_gen: 6.11202  (6.11202)
     | > loss_kl: 172.06483  (172.06483)
     | > loss_feat: 0.22016  (0.22016)
     | > loss_mel: 83.79940  (83.79940)
     | > loss_duration: 1.68916  (1.68916)
     | > amp_scaler: 32768.00000  (32768.00000)
     | > loss_1: 263.88559  (263.88559)
     | > grad_norm_1: 0.00000  (0.00000)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 19.65040  (19.65043)
     | > loader_time: 26.86980  (26.86977)


[1



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01101 (+0.00000)
     | > avg_loss_disc: 2.99067 (+0.00000)
     | > avg_loss_disc_real_0: 0.24615 (+0.00000)
     | > avg_loss_disc_real_1: 0.18960 (+0.00000)
     | > avg_loss_disc_real_2: 0.31744 (+0.00000)
     | > avg_loss_disc_real_3: 0.23929 (+0.00000)
     | > avg_loss_disc_real_4: 0.26313 (+0.00000)
     | > avg_loss_disc_real_5: 0.25094 (+0.00000)
     | > avg_loss_0: 2.99067 (+0.00000)
     | > avg_loss_gen: 1.53253 (+0.00000)
     | > avg_loss_kl: 2.42016 (+0.00000)
     | > avg_loss_feat: 0.17695 (+0.00000)
     | > avg_loss_mel: 44.02678 (+0.00000)
     | > avg_loss_duration: 2.08415 (+0.00000)
     | > avg_loss_1: 50.24057 (+0.00000)

 > BEST MODEL : ./output\vits_vctk-September-23-2022_02+46AM-3c624ce\best_model_80.pth

[4m[1m > EPOCH: 1/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 02:57:44)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 100
     | > loss_disc: 3.01141  (3.07165)
     | > loss_disc_real_0: 0.24578  (0.25090)
     | > loss_disc_real_1: 0.26513  (0.26752)
     | > loss_disc_real_2: 0.27293  (0.25913)
     | > loss_disc_real_3: 0.23149  (0.25820)
     | > loss_disc_real_4: 0.24993  (0.25593)
     | > loss_disc_real_5: 0.22031  (0.25331)
     | > loss_0: 3.01141  (3.07165)
     | > grad_norm_0: 0.92365  (2.30652)
     | > loss_gen: 1.47746  (1.51351)
     | > loss_kl: 2.29899  (2.90413)
     | > loss_feat: 0.40550  (0.35985)
     | > loss_mel: 43.20527  (42.79910)
     | > loss_duration: 1.75414  (1.71961)
     | > amp_scaler: 128.00000  (147.20000)
     | > loss_1: 49.14136  (49.29620)
     | > grad_norm_1: 130.37881  (225.38504)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.81250  (3.72100)
     | > loader_time: 0.01000  (0.00866)


   --> STEP: 45/80 -- GLOBAL_STEP: 125
     | > loss_disc: 2.94522  (3.01591)
    



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00200)
     | > avg_loss_disc: 2.81389 (-0.17677)
     | > avg_loss_disc_real_0: 0.11632 (-0.12983)
     | > avg_loss_disc_real_1: 0.30232 (+0.11273)
     | > avg_loss_disc_real_2: 0.25000 (-0.06744)
     | > avg_loss_disc_real_3: 0.25131 (+0.01202)
     | > avg_loss_disc_real_4: 0.18194 (-0.08119)
     | > avg_loss_disc_real_5: 0.23990 (-0.01104)
     | > avg_loss_0: 2.81389 (-0.17677)
     | > avg_loss_gen: 1.77927 (+0.24674)
     | > avg_loss_kl: 1.57916 (-0.84100)
     | > avg_loss_feat: 0.99965 (+0.82270)
     | > avg_loss_mel: 40.68757 (-3.33921)
     | > avg_loss_duration: 2.07579 (-0.00836)
     | > avg_loss_1: 47.12144 (-3.11913)

 > BEST MODEL : ./output\vits_vctk-September-23-2022_02+46AM-3c624ce\best_model_160.pth

[4m[1m > EPOCH: 2/1000[0m
 --> ./output\vits_vctk-Septe



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 03:03:43)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 175
     | > loss_disc: 2.70822  (2.74839)
     | > loss_disc_real_0: 0.08817  (0.11828)
     | > loss_disc_real_1: 0.24986  (0.24931)
     | > loss_disc_real_2: 0.25055  (0.24961)
     | > loss_disc_real_3: 0.24942  (0.24694)
     | > loss_disc_real_4: 0.25395  (0.25074)
     | > loss_disc_real_5: 0.25324  (0.23898)
     | > loss_0: 2.70822  (2.74839)
     | > grad_norm_0: 2.53157  (6.01052)
     | > loss_gen: 1.88771  (1.89132)
     | > loss_kl: 1.71137  (1.94142)
     | > loss_feat: 1.42440  (1.21705)
     | > loss_mel: 38.74998  (39.95965)
     | > loss_duration: 1.69472  (1.74811)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 45.46818  (46.75755)
     | > grad_norm_1: 108.47432  (162.58652)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.73160  (3.70142)
     | > loader_time: 0.01000  (0.00841)


   --> STEP: 40/80 -- GLOBAL_STEP: 200
     | > loss_disc: 2.71678  (2.73990)
    



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 3.07179 (+0.25789)
     | > avg_loss_disc_real_0: 0.27248 (+0.15616)
     | > avg_loss_disc_real_1: 0.25592 (-0.04640)
     | > avg_loss_disc_real_2: 0.24913 (-0.00087)
     | > avg_loss_disc_real_3: 0.28359 (+0.03228)
     | > avg_loss_disc_real_4: 0.25089 (+0.06895)
     | > avg_loss_disc_real_5: 0.26192 (+0.02202)
     | > avg_loss_0: 3.07179 (+0.25789)
     | > avg_loss_gen: 1.54680 (-0.23247)
     | > avg_loss_kl: 1.69688 (+0.11772)
     | > avg_loss_feat: 0.46630 (-0.53336)
     | > avg_loss_mel: 36.20634 (-4.48122)
     | > avg_loss_duration: 1.99865 (-0.07714)
     | > avg_loss_1: 41.91497 (-5.20647)

 > BEST MODEL : ./output\vits_vctk-September-23-2022_02+46AM-3c624ce\best_model_240.pth

[4m[1m > EPOCH: 3/1000[0m
 --> ./output\vits_vctk-Septe



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 03:09:36)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 250
     | > loss_disc: 2.84081  (2.95717)
     | > loss_disc_real_0: 0.15214  (0.25659)
     | > loss_disc_real_1: 0.24419  (0.24495)
     | > loss_disc_real_2: 0.25628  (0.24866)
     | > loss_disc_real_3: 0.27792  (0.24230)
     | > loss_disc_real_4: 0.24324  (0.24343)
     | > loss_disc_real_5: 0.23603  (0.23780)
     | > loss_0: 2.84081  (2.95717)
     | > grad_norm_0: 4.79401  (5.57308)
     | > loss_gen: 1.53450  (1.55451)
     | > loss_kl: 1.83119  (2.02393)
     | > loss_feat: 0.57709  (0.46294)
     | > loss_mel: 33.04788  (34.49731)
     | > loss_duration: 1.71335  (1.75774)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 38.70401  (40.29643)
     | > grad_norm_1: 141.54292  (162.55446)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.72220  (3.68935)
     | > loader_time: 0.01000  (0.00861)


   --> STEP: 35/80 -- GLOBAL_STEP: 275
     | > loss_disc: 2.87061  (2.92550)
    



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 3.18682 (+0.11503)
     | > avg_loss_disc_real_0: 0.42324 (+0.15076)
     | > avg_loss_disc_real_1: 0.34581 (+0.08989)
     | > avg_loss_disc_real_2: 0.24111 (-0.00802)
     | > avg_loss_disc_real_3: 0.27120 (-0.01239)
     | > avg_loss_disc_real_4: 0.16745 (-0.08343)
     | > avg_loss_disc_real_5: 0.26542 (+0.00350)
     | > avg_loss_0: 3.18682 (+0.11503)
     | > avg_loss_gen: 1.57898 (+0.03218)
     | > avg_loss_kl: 1.44568 (-0.25120)
     | > avg_loss_feat: 0.55991 (+0.09362)
     | > avg_loss_mel: 32.08495 (-4.12140)
     | > avg_loss_duration: 1.97502 (-0.02363)
     | > avg_loss_1: 37.64454 (-4.27043)

 > BEST MODEL : ./output\vits_vctk-September-23-2022_02+46AM-3c624ce\best_model_320.pth

[4m[1m > EPOCH: 4/1000[0m
 --> ./output\vits_vctk-Septe



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 03:15:44)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 325
     | > loss_disc: 2.91549  (2.96087)
     | > loss_disc_real_0: 0.23185  (0.22413)
     | > loss_disc_real_1: 0.28585  (0.26384)
     | > loss_disc_real_2: 0.23617  (0.25562)
     | > loss_disc_real_3: 0.28903  (0.24814)
     | > loss_disc_real_4: 0.20400  (0.27601)
     | > loss_disc_real_5: 0.18648  (0.23771)
     | > loss_0: 2.91549  (2.96087)
     | > grad_norm_0: 4.44181  (7.04678)
     | > loss_gen: 1.47178  (1.56879)
     | > loss_kl: 1.90361  (1.82830)
     | > loss_feat: 0.35855  (0.40895)
     | > loss_mel: 31.23132  (32.65712)
     | > loss_duration: 1.78754  (1.77980)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 36.75280  (38.24296)
     | > grad_norm_1: 283.18399  (184.72749)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.76960  (3.67794)
     | > loader_time: 0.00800  (0.00801)


   --> STEP: 30/80 -- GLOBAL_STEP: 350
     | > loss_disc: 2.66527  (2.85263)
     



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00100)
     | > avg_loss_disc: 2.94327 (-0.24355)
     | > avg_loss_disc_real_0: 0.20249 (-0.22075)
     | > avg_loss_disc_real_1: 0.24687 (-0.09893)
     | > avg_loss_disc_real_2: 0.26396 (+0.02285)
     | > avg_loss_disc_real_3: 0.26885 (-0.00235)
     | > avg_loss_disc_real_4: 0.18107 (+0.01361)
     | > avg_loss_disc_real_5: 0.30917 (+0.04375)
     | > avg_loss_0: 2.94327 (-0.24355)
     | > avg_loss_gen: 1.55228 (-0.02670)
     | > avg_loss_kl: 1.60917 (+0.16350)
     | > avg_loss_feat: 0.50594 (-0.05397)
     | > avg_loss_mel: 29.71990 (-2.36505)
     | > avg_loss_duration: 1.96706 (-0.00796)
     | > avg_loss_1: 35.35435 (-2.29018)

 > BEST MODEL : ./output\vits_vctk-September-23-2022_02+46AM-3c624ce\best_model_400.pth

[4m[1m > EPOCH: 5/1000[0m
 --> ./output\vits_vctk-Septe



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 03:21:44)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 400
     | > loss_disc: 2.93410  (2.93410)
     | > loss_disc_real_0: 0.20164  (0.20164)
     | > loss_disc_real_1: 0.24692  (0.24692)
     | > loss_disc_real_2: 0.26419  (0.26419)
     | > loss_disc_real_3: 0.26897  (0.26897)
     | > loss_disc_real_4: 0.17975  (0.17975)
     | > loss_disc_real_5: 0.31079  (0.31079)
     | > loss_0: 2.93410  (2.93410)
     | > grad_norm_0: 12.41704  (12.41704)
     | > loss_gen: 1.53848  (1.53848)
     | > loss_kl: 1.50734  (1.50734)
     | > loss_feat: 0.52665  (0.52665)
     | > loss_mel: 31.63449  (31.63449)
     | > loss_duration: 1.81281  (1.81281)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 37.01978  (37.01978)
     | > grad_norm_1: 124.51566  (124.51566)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.68040  (3.68035)
     | > loader_time: 23.72990  (23.72993)


   --> STEP: 25/80 -- GLOBAL_STEP: 425
     | > loss_disc: 2.89359  (2.87923)
 



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (-0.00000)
     | > avg_loss_disc: 2.88226 (-0.06101)
     | > avg_loss_disc_real_0: 0.16962 (-0.03287)
     | > avg_loss_disc_real_1: 0.27900 (+0.03213)
     | > avg_loss_disc_real_2: 0.22622 (-0.03774)
     | > avg_loss_disc_real_3: 0.22819 (-0.04066)
     | > avg_loss_disc_real_4: 0.28356 (+0.10250)
     | > avg_loss_disc_real_5: 0.26105 (-0.04812)
     | > avg_loss_0: 2.88226 (-0.06101)
     | > avg_loss_gen: 1.59592 (+0.04364)
     | > avg_loss_kl: 1.27087 (-0.33830)
     | > avg_loss_feat: 0.44323 (-0.06271)
     | > avg_loss_mel: 27.89806 (-1.82184)
     | > avg_loss_duration: 1.95886 (-0.00820)
     | > avg_loss_1: 33.16693 (-2.18742)

 > BEST MODEL : ./output\vits_vctk-September-23-2022_02+46AM-3c624ce\best_model_480.pth

[4m[1m > EPOCH: 6/1000[0m
 --> ./output\vits_vctk-Septe



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 03:27:43)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 500
     | > loss_disc: 2.89324  (2.77882)
     | > loss_disc_real_0: 0.22289  (0.17144)
     | > loss_disc_real_1: 0.26583  (0.24808)
     | > loss_disc_real_2: 0.25108  (0.24841)
     | > loss_disc_real_3: 0.32476  (0.25096)
     | > loss_disc_real_4: 0.28794  (0.24619)
     | > loss_disc_real_5: 0.31851  (0.24594)
     | > loss_0: 2.89324  (2.77882)
     | > grad_norm_0: 12.86797  (10.30196)
     | > loss_gen: 1.72059  (1.76612)
     | > loss_kl: 1.64256  (1.72538)
     | > loss_feat: 0.92375  (0.96654)
     | > loss_mel: 29.73773  (29.96533)
     | > loss_duration: 1.75782  (1.75281)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 35.78247  (36.17618)
     | > grad_norm_1: 159.38362  (236.48839)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.62730  (3.74319)
     | > loader_time: 0.01000  (0.00856)


   --> STEP: 45/80 -- GLOBAL_STEP: 525
     | > loss_disc: 3.04211  (2.92344)
  



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01802 (+0.00801)
     | > avg_loss_disc: 2.90947 (+0.02721)
     | > avg_loss_disc_real_0: 0.13241 (-0.03721)
     | > avg_loss_disc_real_1: 0.24035 (-0.03865)
     | > avg_loss_disc_real_2: 0.26150 (+0.03528)
     | > avg_loss_disc_real_3: 0.22592 (-0.00227)
     | > avg_loss_disc_real_4: 0.23155 (-0.05201)
     | > avg_loss_disc_real_5: 0.23095 (-0.03010)
     | > avg_loss_0: 2.90947 (+0.02721)
     | > avg_loss_gen: 1.44666 (-0.14926)
     | > avg_loss_kl: 1.52208 (+0.25121)
     | > avg_loss_feat: 0.50777 (+0.06454)
     | > avg_loss_mel: 28.99157 (+1.09352)
     | > avg_loss_duration: 1.93411 (-0.02475)
     | > avg_loss_1: 34.40219 (+1.23526)


[4m[1m > EPOCH: 7/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 03:33:46)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 575
     | > loss_disc: 2.75306  (2.88276)
     | > loss_disc_real_0: 0.07507  (0.21105)
     | > loss_disc_real_1: 0.25930  (0.25302)
     | > loss_disc_real_2: 0.20594  (0.25044)
     | > loss_disc_real_3: 0.24383  (0.24900)
     | > loss_disc_real_4: 0.24383  (0.24538)
     | > loss_disc_real_5: 0.27076  (0.25453)
     | > loss_0: 2.75306  (2.88276)
     | > grad_norm_0: 24.78207  (9.48525)
     | > loss_gen: 1.88194  (1.70840)
     | > loss_kl: 1.84662  (1.75102)
     | > loss_feat: 1.02333  (0.78417)
     | > loss_mel: 27.80295  (28.93932)
     | > loss_duration: 1.68762  (1.75477)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 34.24245  (34.93769)
     | > grad_norm_1: 285.24994  (290.26987)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.77560  (4.13891)
     | > loader_time: 0.01000  (0.00904)


   --> STEP: 40/80 -- GLOBAL_STEP: 600
     | > loss_disc: 2.84544  (2.85240)
   



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01201 (-0.00600)
     | > avg_loss_disc: 2.89020 (-0.01927)
     | > avg_loss_disc_real_0: 0.20231 (+0.06989)
     | > avg_loss_disc_real_1: 0.23714 (-0.00321)
     | > avg_loss_disc_real_2: 0.15276 (-0.10873)
     | > avg_loss_disc_real_3: 0.17746 (-0.04846)
     | > avg_loss_disc_real_4: 0.16875 (-0.06280)
     | > avg_loss_disc_real_5: 0.13469 (-0.09626)
     | > avg_loss_0: 2.89020 (-0.01927)
     | > avg_loss_gen: 1.24186 (-0.20480)
     | > avg_loss_kl: 1.47913 (-0.04295)
     | > avg_loss_feat: 0.53594 (+0.02817)
     | > avg_loss_mel: 25.85097 (-3.14061)
     | > avg_loss_duration: 1.97993 (+0.04583)
     | > avg_loss_1: 31.08783 (-3.31437)

 > BEST MODEL : ./output\vits_vctk-September-23-2022_02+46AM-3c624ce\best_model_640.pth

[4m[1m > EPOCH: 8/1000[0m
 --> ./output\vits_vctk-Septe



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 03:40:08)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 650
     | > loss_disc: 2.88375  (2.83719)
     | > loss_disc_real_0: 0.24939  (0.21920)
     | > loss_disc_real_1: 0.29363  (0.24729)
     | > loss_disc_real_2: 0.27808  (0.25801)
     | > loss_disc_real_3: 0.19382  (0.25378)
     | > loss_disc_real_4: 0.20492  (0.25203)
     | > loss_disc_real_5: 0.19402  (0.23157)
     | > loss_0: 2.88375  (2.83719)
     | > grad_norm_0: 15.00054  (12.11371)
     | > loss_gen: 1.68873  (1.71992)
     | > loss_kl: 1.60884  (1.70320)
     | > loss_feat: 0.79929  (0.76377)
     | > loss_mel: 28.56273  (28.57067)
     | > loss_duration: 1.69726  (1.76856)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 34.35686  (34.52612)
     | > grad_norm_1: 272.05252  (439.99576)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.73620  (3.70636)
     | > loader_time: 0.01000  (0.00851)


   --> STEP: 35/80 -- GLOBAL_STEP: 675
     | > loss_disc: 2.64242  (2.87036)
  



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (-0.00200)
     | > avg_loss_disc: 2.90152 (+0.01133)
     | > avg_loss_disc_real_0: 0.20969 (+0.00739)
     | > avg_loss_disc_real_1: 0.27112 (+0.03398)
     | > avg_loss_disc_real_2: 0.25838 (+0.10561)
     | > avg_loss_disc_real_3: 0.29743 (+0.11997)
     | > avg_loss_disc_real_4: 0.24999 (+0.08124)
     | > avg_loss_disc_real_5: 0.33683 (+0.20214)
     | > avg_loss_0: 2.90152 (+0.01133)
     | > avg_loss_gen: 1.77477 (+0.53291)
     | > avg_loss_kl: 1.72360 (+0.24447)
     | > avg_loss_feat: 0.60578 (+0.06984)
     | > avg_loss_mel: 26.27793 (+0.42696)
     | > avg_loss_duration: 1.93578 (-0.04415)
     | > avg_loss_1: 32.31786 (+1.23003)


 > EPOCH: 9/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 03:46:11)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 725
     | > loss_disc: 2.83291  (2.86697)
     | > loss_disc_real_0: 0.17815  (0.23548)
     | > loss_disc_real_1: 0.22700  (0.23706)
     | > loss_disc_real_2: 0.27813  (0.24939)
     | > loss_disc_real_3: 0.25312  (0.23349)
     | > loss_disc_real_4: 0.21067  (0.23470)
     | > loss_disc_real_5: 0.23793  (0.21841)
     | > loss_0: 2.83291  (2.86697)
     | > grad_norm_0: 21.49244  (19.95113)
     | > loss_gen: 1.78805  (1.63230)
     | > loss_kl: 1.49709  (1.62903)
     | > loss_feat: 0.92652  (0.76766)
     | > loss_mel: 27.93019  (28.57004)
     | > loss_duration: 1.82928  (1.79821)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 33.97114  (34.39724)
     | > grad_norm_1: 274.78519  (305.40323)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.72490  (3.67506)
     | > loader_time: 0.00900  (0.00781)


   --> STEP: 30/80 -- GLOBAL_STEP: 750
     | > loss_disc: 2.97954  (2.84505)
   



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00000)
     | > avg_loss_disc: 2.68038 (-0.22115)
     | > avg_loss_disc_real_0: 0.11516 (-0.09454)
     | > avg_loss_disc_real_1: 0.23646 (-0.03466)
     | > avg_loss_disc_real_2: 0.24377 (-0.01461)
     | > avg_loss_disc_real_3: 0.20973 (-0.08770)
     | > avg_loss_disc_real_4: 0.22102 (-0.02897)
     | > avg_loss_disc_real_5: 0.25027 (-0.08656)
     | > avg_loss_0: 2.68038 (-0.22115)
     | > avg_loss_gen: 1.78040 (+0.00563)
     | > avg_loss_kl: 1.56405 (-0.15954)
     | > avg_loss_feat: 1.30162 (+0.69584)
     | > avg_loss_mel: 27.10733 (+0.82940)
     | > avg_loss_duration: 1.92527 (-0.01051)
     | > avg_loss_1: 33.67868 (+1.36082)


 > EPOCH: 10/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 03:52:17)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 800
     | > loss_disc: 2.63484  (2.63484)
     | > loss_disc_real_0: 0.13646  (0.13646)
     | > loss_disc_real_1: 0.23127  (0.23127)
     | > loss_disc_real_2: 0.23769  (0.23769)
     | > loss_disc_real_3: 0.20275  (0.20275)
     | > loss_disc_real_4: 0.21213  (0.21213)
     | > loss_disc_real_5: 0.23528  (0.23528)
     | > loss_0: 2.63484  (2.63484)
     | > grad_norm_0: 16.90382  (16.90382)
     | > loss_gen: 2.01796  (2.01796)
     | > loss_kl: 1.56051  (1.56051)
     | > loss_feat: 1.63720  (1.63720)
     | > loss_mel: 28.98617  (28.98617)
     | > loss_duration: 1.75865  (1.75865)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 35.96048  (35.96048)
     | > grad_norm_1: 432.88168  (432.88168)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.71790  (3.71788)
     | > loader_time: 23.65760  (23.65759)


   --> STEP: 25/80 -- GLOBAL_STEP: 825
     | > loss_disc: 2.71803  (2.76545)
 



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (-0.00000)
     | > avg_loss_disc: 3.04045 (+0.36007)
     | > avg_loss_disc_real_0: 0.35437 (+0.23922)
     | > avg_loss_disc_real_1: 0.30299 (+0.06654)
     | > avg_loss_disc_real_2: 0.12958 (-0.11419)
     | > avg_loss_disc_real_3: 0.29197 (+0.08224)
     | > avg_loss_disc_real_4: 0.30429 (+0.08327)
     | > avg_loss_disc_real_5: 0.38129 (+0.13102)
     | > avg_loss_0: 3.04045 (+0.36007)
     | > avg_loss_gen: 1.93235 (+0.15195)
     | > avg_loss_kl: 1.72510 (+0.16105)
     | > avg_loss_feat: 0.57933 (-0.72229)
     | > avg_loss_mel: 26.88945 (-0.21789)
     | > avg_loss_duration: 1.93892 (+0.01365)
     | > avg_loss_1: 33.06515 (-0.61353)


 > EPOCH: 11/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 03:58:18)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 900
     | > loss_disc: 2.93844  (2.78947)
     | > loss_disc_real_0: 0.26238  (0.17401)
     | > loss_disc_real_1: 0.19596  (0.24970)
     | > loss_disc_real_2: 0.20490  (0.25092)
     | > loss_disc_real_3: 0.18537  (0.24487)
     | > loss_disc_real_4: 0.15214  (0.24217)
     | > loss_disc_real_5: 0.22738  (0.25042)
     | > loss_0: 2.93844  (2.78947)
     | > grad_norm_0: 52.88070  (45.70899)
     | > loss_gen: 1.61262  (1.77950)
     | > loss_kl: 1.50151  (1.56321)
     | > loss_feat: 0.61944  (1.04579)
     | > loss_mel: 27.16593  (27.35041)
     | > loss_duration: 1.79795  (1.75472)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 32.69746  (33.49363)
     | > grad_norm_1: 353.47534  (472.48340)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.74480  (3.71515)
     | > loader_time: 0.01000  (0.00886)


   --> STEP: 45/80 -- GLOBAL_STEP: 925
     | > loss_disc: 2.89584  (2.78150)
  



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00000)
     | > avg_loss_disc: 2.82881 (-0.21164)
     | > avg_loss_disc_real_0: 0.10761 (-0.24676)
     | > avg_loss_disc_real_1: 0.30210 (-0.00089)
     | > avg_loss_disc_real_2: 0.22489 (+0.09531)
     | > avg_loss_disc_real_3: 0.22174 (-0.07023)
     | > avg_loss_disc_real_4: 0.26237 (-0.04192)
     | > avg_loss_disc_real_5: 0.22749 (-0.15381)
     | > avg_loss_0: 2.82881 (-0.21164)
     | > avg_loss_gen: 1.69232 (-0.24003)
     | > avg_loss_kl: 1.44380 (-0.28130)
     | > avg_loss_feat: 1.27774 (+0.69841)
     | > avg_loss_mel: 28.45235 (+1.56290)
     | > avg_loss_duration: 1.96166 (+0.02274)
     | > avg_loss_1: 34.82787 (+1.76272)


 > EPOCH: 12/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 04:04:11)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 975
     | > loss_disc: 2.73996  (2.83610)
     | > loss_disc_real_0: 0.16750  (0.21364)
     | > loss_disc_real_1: 0.23806  (0.25212)
     | > loss_disc_real_2: 0.21800  (0.25181)
     | > loss_disc_real_3: 0.22285  (0.24484)
     | > loss_disc_real_4: 0.27005  (0.24139)
     | > loss_disc_real_5: 0.27344  (0.25761)
     | > loss_0: 2.73996  (2.83610)
     | > grad_norm_0: 39.14388  (53.51194)
     | > loss_gen: 1.98844  (1.81275)
     | > loss_kl: 1.67183  (1.52509)
     | > loss_feat: 1.16271  (1.05262)
     | > loss_mel: 29.10928  (27.41999)
     | > loss_duration: 1.71484  (1.76080)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 35.64709  (33.57126)
     | > grad_norm_1: 487.81717  (469.07669)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.74850  (3.70407)
     | > loader_time: 0.01000  (0.00834)


   --> STEP: 40/80 -- GLOBAL_STEP: 1000
     | > loss_disc: 2.86682  (2.77348)
 



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00100)
     | > avg_loss_disc: 2.74528 (-0.08353)
     | > avg_loss_disc_real_0: 0.23395 (+0.12634)
     | > avg_loss_disc_real_1: 0.20020 (-0.10189)
     | > avg_loss_disc_real_2: 0.19387 (-0.03102)
     | > avg_loss_disc_real_3: 0.23301 (+0.01127)
     | > avg_loss_disc_real_4: 0.21631 (-0.04607)
     | > avg_loss_disc_real_5: 0.21608 (-0.01141)
     | > avg_loss_0: 2.74528 (-0.08353)
     | > avg_loss_gen: 1.69733 (+0.00501)
     | > avg_loss_kl: 1.75070 (+0.30690)
     | > avg_loss_feat: 1.13935 (-0.13840)
     | > avg_loss_mel: 25.83222 (-2.62013)
     | > avg_loss_duration: 1.95220 (-0.00946)
     | > avg_loss_1: 32.37180 (-2.45608)


 > EPOCH: 13/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 04:10:13)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 1050
     | > loss_disc: 2.90858  (2.82913)
     | > loss_disc_real_0: 0.14012  (0.17158)
     | > loss_disc_real_1: 0.28778  (0.25747)
     | > loss_disc_real_2: 0.17008  (0.26417)
     | > loss_disc_real_3: 0.29353  (0.24724)
     | > loss_disc_real_4: 0.28219  (0.27244)
     | > loss_disc_real_5: 0.19931  (0.25799)
     | > loss_0: 2.90858  (2.82913)
     | > grad_norm_0: 44.39010  (63.41825)
     | > loss_gen: 1.75772  (1.91266)
     | > loss_kl: 1.53964  (1.53794)
     | > loss_feat: 0.92423  (1.28221)
     | > loss_mel: 25.64748  (26.95044)
     | > loss_duration: 1.77191  (1.76906)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 31.64098  (33.45232)
     | > grad_norm_1: 622.65839  (551.87018)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.73250  (3.68662)
     | > loader_time: 0.00700  (0.00791)


   --> STEP: 35/80 -- GLOBAL_STEP: 1075
     | > loss_disc: 2.75962  (2.79443)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00000)
     | > avg_loss_disc: 2.77834 (+0.03306)
     | > avg_loss_disc_real_0: 0.03159 (-0.20236)
     | > avg_loss_disc_real_1: 0.22622 (+0.02601)
     | > avg_loss_disc_real_2: 0.20476 (+0.01090)
     | > avg_loss_disc_real_3: 0.22418 (-0.00883)
     | > avg_loss_disc_real_4: 0.18052 (-0.03579)
     | > avg_loss_disc_real_5: 0.23861 (+0.02253)
     | > avg_loss_0: 2.77834 (+0.03306)
     | > avg_loss_gen: 1.40432 (-0.29301)
     | > avg_loss_kl: 1.74503 (-0.00568)
     | > avg_loss_feat: 1.27906 (+0.13972)
     | > avg_loss_mel: 26.53620 (+0.70398)
     | > avg_loss_duration: 1.98339 (+0.03119)
     | > avg_loss_1: 32.94800 (+0.57620)


 > EPOCH: 14/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 04:16:31)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 1125
     | > loss_disc: 2.69800  (2.71166)
     | > loss_disc_real_0: 0.15222  (0.19043)
     | > loss_disc_real_1: 0.20684  (0.23639)
     | > loss_disc_real_2: 0.19028  (0.23484)
     | > loss_disc_real_3: 0.21581  (0.24046)
     | > loss_disc_real_4: 0.20923  (0.24843)
     | > loss_disc_real_5: 0.14543  (0.22793)
     | > loss_0: 2.69800  (2.71166)
     | > grad_norm_0: 40.85512  (79.53668)
     | > loss_gen: 2.07893  (1.89321)
     | > loss_kl: 1.61686  (1.55692)
     | > loss_feat: 1.26955  (1.33364)
     | > loss_mel: 27.50124  (27.32279)
     | > loss_duration: 1.71024  (1.78131)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 34.17682  (33.88787)
     | > grad_norm_1: 388.09039  (503.70212)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.68750  (3.66446)
     | > loader_time: 0.00800  (0.00789)


   --> STEP: 30/80 -- GLOBAL_STEP: 1150
     | > loss_disc: 2.73445  (2.77034)
 



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.02803 (+0.01902)
     | > avg_loss_disc: 2.87250 (+0.09416)
     | > avg_loss_disc_real_0: 0.03763 (+0.00603)
     | > avg_loss_disc_real_1: 0.26452 (+0.03831)
     | > avg_loss_disc_real_2: 0.24095 (+0.03618)
     | > avg_loss_disc_real_3: 0.19310 (-0.03108)
     | > avg_loss_disc_real_4: 0.21645 (+0.03593)
     | > avg_loss_disc_real_5: 0.21332 (-0.02528)
     | > avg_loss_0: 2.87250 (+0.09416)
     | > avg_loss_gen: 1.41140 (+0.00708)
     | > avg_loss_kl: 1.18328 (-0.56175)
     | > avg_loss_feat: 1.09671 (-0.18236)
     | > avg_loss_mel: 27.29061 (+0.75441)
     | > avg_loss_duration: 2.04007 (+0.05669)
     | > avg_loss_1: 33.02206 (+0.07406)


 > EPOCH: 15/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 04:22:27)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 1200
     | > loss_disc: 2.85750  (2.85750)
     | > loss_disc_real_0: 0.04982  (0.04982)
     | > loss_disc_real_1: 0.26994  (0.26994)
     | > loss_disc_real_2: 0.24794  (0.24794)
     | > loss_disc_real_3: 0.19686  (0.19686)
     | > loss_disc_real_4: 0.23794  (0.23794)
     | > loss_disc_real_5: 0.22905  (0.22905)
     | > loss_0: 2.85750  (2.85750)
     | > grad_norm_0: 220.90564  (220.90564)
     | > loss_gen: 2.27551  (2.27551)
     | > loss_kl: 1.04209  (1.04209)
     | > loss_feat: 1.01844  (1.01844)
     | > loss_mel: 26.63858  (26.63858)
     | > loss_duration: 1.90073  (1.90073)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 32.87535  (32.87535)
     | > grad_norm_1: 539.09753  (539.09753)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.69220  (3.69219)
     | > loader_time: 23.52820  (23.52816)


   --> STEP: 25/80 -- GLOBAL_STEP: 1225
     | > loss_disc: 2.74492  (2.7482



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.01902)
     | > avg_loss_disc: 2.83044 (-0.04207)
     | > avg_loss_disc_real_0: 0.16549 (+0.12786)
     | > avg_loss_disc_real_1: 0.25725 (-0.00727)
     | > avg_loss_disc_real_2: 0.18882 (-0.05213)
     | > avg_loss_disc_real_3: 0.19498 (+0.00188)
     | > avg_loss_disc_real_4: 0.18932 (-0.02713)
     | > avg_loss_disc_real_5: 0.18245 (-0.03087)
     | > avg_loss_0: 2.83044 (-0.04207)
     | > avg_loss_gen: 1.39408 (-0.01732)
     | > avg_loss_kl: 1.40291 (+0.21963)
     | > avg_loss_feat: 0.77025 (-0.32646)
     | > avg_loss_mel: 23.84937 (-3.44124)
     | > avg_loss_duration: 2.02290 (-0.01718)
     | > avg_loss_1: 29.43950 (-3.58256)

 > BEST MODEL : ./output\vits_vctk-September-23-2022_02+46AM-3c624ce\best_model_1280.pth

 > EPOCH: 16/1000
 --> ./output\vits_vctk-Sep



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 04:28:35)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 1300
     | > loss_disc: 2.80967  (2.76613)
     | > loss_disc_real_0: 0.16705  (0.18505)
     | > loss_disc_real_1: 0.24721  (0.24637)
     | > loss_disc_real_2: 0.25926  (0.24838)
     | > loss_disc_real_3: 0.25704  (0.23816)
     | > loss_disc_real_4: 0.24368  (0.24681)
     | > loss_disc_real_5: 0.22122  (0.24623)
     | > loss_0: 2.80967  (2.76613)
     | > grad_norm_0: 36.23070  (56.61380)
     | > loss_gen: 2.02500  (1.81227)
     | > loss_kl: 1.55088  (1.46371)
     | > loss_feat: 1.27442  (1.18627)
     | > loss_mel: 26.83059  (25.54435)
     | > loss_duration: 1.78529  (1.77906)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 33.46618  (31.78565)
     | > grad_norm_1: 756.95923  (719.85760)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.76350  (3.71533)
     | > loader_time: 0.01000  (0.00876)


   --> STEP: 45/80 -- GLOBAL_STEP: 1325
     | > loss_disc: 2.79511  (2.76312)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00100)
     | > avg_loss_disc: 2.82656 (-0.00388)
     | > avg_loss_disc_real_0: 0.10992 (-0.05556)
     | > avg_loss_disc_real_1: 0.22467 (-0.03258)
     | > avg_loss_disc_real_2: 0.18146 (-0.00736)
     | > avg_loss_disc_real_3: 0.18560 (-0.00939)
     | > avg_loss_disc_real_4: 0.27101 (+0.08169)
     | > avg_loss_disc_real_5: 0.28549 (+0.10303)
     | > avg_loss_0: 2.82656 (-0.00388)
     | > avg_loss_gen: 1.49801 (+0.10393)
     | > avg_loss_kl: 1.06917 (-0.33374)
     | > avg_loss_feat: 0.84057 (+0.07032)
     | > avg_loss_mel: 24.11366 (+0.26430)
     | > avg_loss_duration: 2.04033 (+0.01743)
     | > avg_loss_1: 29.56175 (+0.12225)


 > EPOCH: 17/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 04:34:22)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 1375
     | > loss_disc: 2.70308  (2.75525)
     | > loss_disc_real_0: 0.18148  (0.19856)
     | > loss_disc_real_1: 0.24593  (0.24754)
     | > loss_disc_real_2: 0.24039  (0.24907)
     | > loss_disc_real_3: 0.19253  (0.24072)
     | > loss_disc_real_4: 0.22937  (0.24783)
     | > loss_disc_real_5: 0.19816  (0.24203)
     | > loss_0: 2.70308  (2.75525)
     | > grad_norm_0: 35.44003  (120.18858)
     | > loss_gen: 1.59424  (1.89556)
     | > loss_kl: 1.50305  (1.46158)
     | > loss_feat: 1.20256  (1.43611)
     | > loss_mel: 25.11116  (25.48549)
     | > loss_duration: 1.74513  (1.78167)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 31.15614  (32.06040)
     | > grad_norm_1: 895.00128  (811.69342)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.59130  (3.56751)
     | > loader_time: 0.00900  (0.00834)


   --> STEP: 40/80 -- GLOBAL_STEP: 1400
     | > loss_disc: 2.73150  (2.74380)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00100)
     | > avg_loss_disc: 2.60079 (-0.22577)
     | > avg_loss_disc_real_0: 0.13315 (+0.02323)
     | > avg_loss_disc_real_1: 0.21788 (-0.00679)
     | > avg_loss_disc_real_2: 0.20849 (+0.02703)
     | > avg_loss_disc_real_3: 0.23967 (+0.05407)
     | > avg_loss_disc_real_4: 0.24014 (-0.03087)
     | > avg_loss_disc_real_5: 0.21258 (-0.07291)
     | > avg_loss_0: 2.60079 (-0.22577)
     | > avg_loss_gen: 1.77152 (+0.27351)
     | > avg_loss_kl: 1.35282 (+0.28365)
     | > avg_loss_feat: 1.34818 (+0.50760)
     | > avg_loss_mel: 25.26597 (+1.15230)
     | > avg_loss_duration: 2.00851 (-0.03182)
     | > avg_loss_1: 31.74700 (+2.18525)


 > EPOCH: 18/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 04:39:57)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 1450
     | > loss_disc: 2.77647  (2.71233)
     | > loss_disc_real_0: 0.24340  (0.16054)
     | > loss_disc_real_1: 0.30407  (0.25293)
     | > loss_disc_real_2: 0.22130  (0.23898)
     | > loss_disc_real_3: 0.27500  (0.24485)
     | > loss_disc_real_4: 0.29812  (0.24697)
     | > loss_disc_real_5: 0.29029  (0.24162)
     | > loss_0: 2.77647  (2.71233)
     | > grad_norm_0: 103.50808  (99.10791)
     | > loss_gen: 1.97015  (1.88964)
     | > loss_kl: 1.52985  (1.45097)
     | > loss_feat: 1.15705  (1.43215)
     | > loss_mel: 23.51728  (24.71344)
     | > loss_duration: 1.77328  (1.79744)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 29.94761  (31.28364)
     | > grad_norm_1: 826.52887  (1105.52258)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.60130  (3.57155)
     | > loader_time: 0.00800  (0.00831)


   --> STEP: 35/80 -- GLOBAL_STEP: 1475
     | > loss_disc: 2.57617  (2.69829



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.00901 [0m(-0.00000)
     | > avg_loss_disc:[91m 2.76720 [0m(+0.16641)
     | > avg_loss_disc_real_0:[91m 0.48029 [0m(+0.34714)
     | > avg_loss_disc_real_1:[92m 0.20432 [0m(-0.01356)
     | > avg_loss_disc_real_2:[91m 0.21667 [0m(+0.00817)
     | > avg_loss_disc_real_3:[92m 0.22364 [0m(-0.01603)
     | > avg_loss_disc_real_4:[91m 0.24608 [0m(+0.00594)
     | > avg_loss_disc_real_5:[91m 0.25958 [0m(+0.04700)
     | > avg_loss_0:[91m 2.76720 [0m(+0.16641)
     | > avg_loss_gen:[91m 2.19520 [0m(+0.42368)
     | > avg_loss_kl:[91m 1.35444 [0m(+0.00162)
     | > avg_loss_feat:[92m 1.32977 [0m(-0.01841)
     | > avg_loss_mel:[92m 23.94368 [0m(-1.32229)
     | > avg_loss_duration:[92m 1.98663 [0m(-0.02188)
     | > avg_loss_1:[92m 30.80972 [0m(-0.93728)
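In each EVAL PERFORMANCE line, the first number is the average over the 25 eval samples. The parenthesised number is its change since the previous eval. For example, `avg_loss_mel` goes from 23.94368 here to 24.47887 at the next eval, reported as (+0.53520). To track these values across epochs, a small parser like the one below can turn a block of log text into a dictionary. It is a sketch that assumes the line layout seen in this log. It tolerates ANSI colour codes whether or not the escape byte survived the export.

```python
import re

# ANSI colour codes such as "\x1b[92m" or their escape-less residue "[92m".
ANSI = re.compile(r"\x1b?\[[0-9;]*m")
# Lines such as "| > avg_loss_mel: 23.94368 (-1.32229)" once colours are removed.
EVAL_LINE = re.compile(r"\|\s*>\s*(avg_\w+):\s*([-+]?\d+\.\d+)\s*\(([-+]\d+\.\d+)\)")

def parse_eval_block(text):
    """Return {metric: (value, change_since_previous_eval)}."""
    metrics = {}
    for line in text.splitlines():
        match = EVAL_LINE.search(ANSI.sub("", line))
        if match:
            name, value, delta = match.groups()
            metrics[name] = (float(value), float(delta))
    return metrics

sample = """     | > avg_loss_mel:[92m 23.94368 [0m(-1.32229)
     | > avg_loss_1: 30.80972 (-0.93728)"""
print(parse_eval_block(sample))
```

Subtracting the change from the value recovers the previous eval's average, so a single block is enough to see the direction of each loss.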


[4m[1m > EPOCH: 19/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 04:45:33) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



[1m   --> STEP: 5/80 -- GLOBAL_STEP: 1525[0m
     | > loss_disc: 2.64787  (2.82015)
     | > loss_disc_real_0: 0.13734  (0.21205)
     | > loss_disc_real_1: 0.23660  (0.24965)
     | > loss_disc_real_2: 0.22932  (0.24660)
     | > loss_disc_real_3: 0.26473  (0.24487)
     | > loss_disc_real_4: 0.17606  (0.25100)
     | > loss_disc_real_5: 0.18596  (0.24147)
     | > loss_0: 2.64787  (2.82015)
     | > grad_norm_0: 91.83904  (109.04144)
     | > loss_gen: 2.28856  (1.84920)
     | > loss_kl: 1.50842  (1.42157)
     | > loss_feat: 1.67267  (1.40656)
     | > loss_mel: 24.13403  (25.03430)
     | > loss_duration: 1.78070  (1.81690)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 31.38439  (31.52853)
     | > grad_norm_1: 1073.05212  (1016.89838)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.57120  (3.53061)
     | > loader_time: 0.00900  (0.00761)


[1m   --> STEP: 30/80 -- GLOBAL_STEP: 1550[0m
     | > loss_disc: 2.70549  (2.70735



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.00901 [0m(-0.00000)
     | > avg_loss_disc:[92m 2.50268 [0m(-0.26452)
     | > avg_loss_disc_real_0:[92m 0.14248 [0m(-0.33781)
     | > avg_loss_disc_real_1:[91m 0.29011 [0m(+0.08580)
     | > avg_loss_disc_real_2:[91m 0.23799 [0m(+0.02132)
     | > avg_loss_disc_real_3:[91m 0.28027 [0m(+0.05663)
     | > avg_loss_disc_real_4:[92m 0.22344 [0m(-0.02265)
     | > avg_loss_disc_real_5:[91m 0.28599 [0m(+0.02642)
     | > avg_loss_0:[92m 2.50268 [0m(-0.26452)
     | > avg_loss_gen:[91m 2.24991 [0m(+0.05471)
     | > avg_loss_kl:[92m 1.08490 [0m(-0.26954)
     | > avg_loss_feat:[91m 1.79932 [0m(+0.46955)
     | > avg_loss_mel:[91m 24.47887 [0m(+0.53520)
     | > avg_loss_duration:[91m 2.01944 [0m(+0.03280)
     | > avg_loss_1:[91m 31.63243 [0m(+0.82272)


[4m[1m > EPOCH: 20/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 04:51:08) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



[1m   --> STEP: 0/80 -- GLOBAL_STEP: 1600[0m
     | > loss_disc: 2.54612  (2.54612)
     | > loss_disc_real_0: 0.19341  (0.19341)
     | > loss_disc_real_1: 0.27807  (0.27807)
     | > loss_disc_real_2: 0.24841  (0.24841)
     | > loss_disc_real_3: 0.25764  (0.25764)
     | > loss_disc_real_4: 0.20838  (0.20838)
     | > loss_disc_real_5: 0.25352  (0.25352)
     | > loss_0: 2.54612  (2.54612)
     | > grad_norm_0: 152.74756  (152.74756)
     | > loss_gen: 2.00295  (2.00295)
     | > loss_kl: 1.12006  (1.12006)
     | > loss_feat: 1.78833  (1.78833)
     | > loss_mel: 25.34683  (25.34683)
     | > loss_duration: 1.84887  (1.84887)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 32.10705  (32.10705)
     | > grad_norm_1: 2057.43604  (2057.43604)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.57530  (3.57525)
     | > loader_time: 23.16080  (23.16076)


[1m   --> STEP: 25/80 -- GLOBAL_STEP: 1625[0m
     | > loss_disc: 2.62073  (2.64



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[91m 0.00901 [0m(+0.00000)
     | > avg_loss_disc:[91m 2.62111 [0m(+0.11843)
     | > avg_loss_disc_real_0:[92m 0.10926 [0m(-0.03322)
     | > avg_loss_disc_real_1:[92m 0.20706 [0m(-0.08305)
     | > avg_loss_disc_real_2:[92m 0.15680 [0m(-0.08119)
     | > avg_loss_disc_real_3:[92m 0.21192 [0m(-0.06835)
     | > avg_loss_disc_real_4:[92m 0.22117 [0m(-0.00227)
     | > avg_loss_disc_real_5:[92m 0.24258 [0m(-0.04342)
     | > avg_loss_0:[91m 2.62111 [0m(+0.11843)
     | > avg_loss_gen:[92m 1.63285 [0m(-0.61705)
     | > avg_loss_kl:[91m 1.32096 [0m(+0.23606)
     | > avg_loss_feat:[92m 1.68088 [0m(-0.11844)
     | > avg_loss_mel:[92m 24.33858 [0m(-0.14030)
     | > avg_loss_duration:[91m 2.02240 [0m(+0.00296)
     | > avg_loss_1:[92m 30.99567 [0m(-0.63677)


[4m[1m > EPOCH: 21/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 04:56:43) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



[1m   --> STEP: 20/80 -- GLOBAL_STEP: 1700[0m
     | > loss_disc: 2.78785  (2.63963)
     | > loss_disc_real_0: 0.03910  (0.14172)
     | > loss_disc_real_1: 0.23026  (0.24536)
     | > loss_disc_real_2: 0.24680  (0.23670)
     | > loss_disc_real_3: 0.22172  (0.23797)
     | > loss_disc_real_4: 0.26433  (0.24324)
     | > loss_disc_real_5: 0.24474  (0.24518)
     | > loss_0: 2.78785  (2.63963)
     | > grad_norm_0: 515.59625  (289.15140)
     | > loss_gen: 2.28990  (2.00062)
     | > loss_kl: 1.36966  (1.37685)
     | > loss_feat: 1.85231  (1.83446)
     | > loss_mel: 23.37904  (23.82031)
     | > loss_duration: 1.75867  (1.77794)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 30.64957  (30.81019)
     | > grad_norm_1: 2381.84814  (2497.19800)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.62730  (3.58095)
     | > loader_time: 0.00900  (0.00871)
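`amp_scaler` reads 128 through the epoch that starts at global step 1600 and 64 from the next epoch on. This value is probably the loss scale of PyTorch's mixed-precision `torch.cuda.amp.GradScaler`. If so, the halving means at least one step produced inf/NaN gradients. In that case the scaler skips that optimizer update and multiplies the scale by `backoff_factor` (0.5 by default). It grows the scale again by `growth_factor` (2.0) only after `growth_interval` (2000) consecutive clean steps. The pure-Python sketch below mimics that rule. It is a simplified model, not the PyTorch implementation, and the overflow pattern in the example is made up.

```python
def simulate_loss_scale(overflow_flags, init_scale=128.0, growth_factor=2.0,
                        backoff_factor=0.5, growth_interval=2000):
    """Return the loss scale after each step under a dynamic loss-scaling rule.

    init_scale=128.0 matches the value in this log; PyTorch's own default is 2**16.
    """
    scale = init_scale
    clean_steps = 0
    history = []
    for overflow in overflow_flags:
        if overflow:
            # Non-finite gradients: the update is skipped and the scale shrinks.
            scale *= backoff_factor
            clean_steps = 0
        else:
            clean_steps += 1
            if clean_steps == growth_interval:
                # A long enough run of finite gradients lets the scale grow again.
                scale *= growth_factor
                clean_steps = 0
        history.append(scale)
    return history

# One made-up overflow on the second step halves the scale from 128 to 64.
print(simulate_loss_scale([False, True, False]))  # [128.0, 64.0, 64.0]
```

Because a clean run of 2000 steps is needed before the scale doubles, a value that stays at 64 for several epochs of 80 steps is expected behaviour rather than a sign of a stuck run.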


[1m   --> STEP: 45/80 -- GLOBAL_STEP: 1725[0m
     | > loss_disc: 2.63618  (2.63774



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[91m 0.01001 [0m(+0.00100)
     | > avg_loss_disc:[91m 2.64821 [0m(+0.02710)
     | > avg_loss_disc_real_0:[91m 0.18924 [0m(+0.07999)
     | > avg_loss_disc_real_1:[91m 0.23814 [0m(+0.03108)
     | > avg_loss_disc_real_2:[91m 0.25396 [0m(+0.09715)
     | > avg_loss_disc_real_3:[91m 0.22844 [0m(+0.01652)
     | > avg_loss_disc_real_4:[92m 0.19902 [0m(-0.02215)
     | > avg_loss_disc_real_5:[92m 0.21861 [0m(-0.02396)
     | > avg_loss_0:[91m 2.64821 [0m(+0.02710)
     | > avg_loss_gen:[91m 1.80999 [0m(+0.17713)
     | > avg_loss_kl:[92m 1.06104 [0m(-0.25992)
     | > avg_loss_feat:[92m 1.42623 [0m(-0.25465)
     | > avg_loss_mel:[92m 21.98924 [0m(-2.34933)
     | > avg_loss_duration:[92m 2.00420 [0m(-0.01820)
     | > avg_loss_1:[92m 28.29070 [0m(-2.70496)

 > BEST MODEL : ./output\vits_vctk-September-23-2022_02+46AM-3c624ce\best_model_1760.pth
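The checkpoint name `best_model_1760.pth` carries the global step at the end of the epoch whose eval came out best. That epoch started at step 1680, and 80 steps later it ended at 1760. This log does not show which criterion the trainer uses to compare evals. The sketch below assumes a simple criterion: the mean of the two optimizer losses, `avg_loss_0` (discriminator) and `avg_loss_1` (generator). It applies that criterion to the four evals printed in this excerpt. Under this assumption, and also using `avg_loss_1` alone, step 1760 wins, which is consistent with the BEST MODEL line. The `best_step` helper is hypothetical.

```python
# Eval averages copied from the log above, keyed by the global step at the end
# of each epoch (epoch start step + 80 training steps).
eval_history = {
    1520: {"avg_loss_0": 2.76720, "avg_loss_1": 30.80972},
    1600: {"avg_loss_0": 2.50268, "avg_loss_1": 31.63243},
    1680: {"avg_loss_0": 2.62111, "avg_loss_1": 30.99567},
    1760: {"avg_loss_0": 2.64821, "avg_loss_1": 28.29070},
}

def best_step(history, keys=("avg_loss_0", "avg_loss_1")):
    """Return the step whose mean over the given eval losses is lowest (assumed criterion)."""
    return min(history, key=lambda step: sum(history[step][k] for k in keys) / len(keys))

print(best_step(eval_history))                        # 1760
print(best_step(eval_history, keys=("avg_loss_1",)))  # 1760
```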

[4m[1m > EPOCH: 22/1000[0m
 --> ./output\vits_vctk-Sep



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 05:02:24) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



[1m   --> STEP: 15/80 -- GLOBAL_STEP: 1775[0m
     | > loss_disc: 2.61714  (2.61393)
     | > loss_disc_real_0: 0.18990  (0.13243)
     | > loss_disc_real_1: 0.19375  (0.24591)
     | > loss_disc_real_2: 0.30699  (0.23696)
     | > loss_disc_real_3: 0.21590  (0.23756)
     | > loss_disc_real_4: 0.23422  (0.24450)
     | > loss_disc_real_5: 0.21912  (0.24584)
     | > loss_0: 2.61714  (2.61393)
     | > grad_norm_0: 480.34457  (411.98297)
     | > loss_gen: 1.99331  (2.03603)
     | > loss_kl: 1.29732  (1.40650)
     | > loss_feat: 1.81689  (1.94309)
     | > loss_mel: 24.98114  (23.86329)
     | > loss_duration: 1.75168  (1.77215)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 31.84034  (31.02106)
     | > grad_norm_1: 2447.10059  (2815.74268)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.61730  (3.57005)
     | > loader_time: 0.00900  (0.00841)


[1m   --> STEP: 40/80 -- GLOBAL_STEP: 1800[0m
     | > loss_disc: 2.50946  (2.59792



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.00901 [0m(-0.00100)
     | > avg_loss_disc:[92m 2.48060 [0m(-0.16762)
     | > avg_loss_disc_real_0:[92m 0.07753 [0m(-0.11171)
     | > avg_loss_disc_real_1:[92m 0.21225 [0m(-0.02588)
     | > avg_loss_disc_real_2:[92m 0.21666 [0m(-0.03730)
     | > avg_loss_disc_real_3:[92m 0.20740 [0m(-0.02104)
     | > avg_loss_disc_real_4:[92m 0.16209 [0m(-0.03693)
     | > avg_loss_disc_real_5:[92m 0.19419 [0m(-0.02442)
     | > avg_loss_0:[92m 2.48060 [0m(-0.16762)
     | > avg_loss_gen:[92m 1.73079 [0m(-0.07920)
     | > avg_loss_kl:[91m 1.06990 [0m(+0.00886)
     | > avg_loss_feat:[91m 1.98356 [0m(+0.55733)
     | > avg_loss_mel:[91m 23.70755 [0m(+1.71831)
     | > avg_loss_duration:[92m 1.99396 [0m(-0.01024)
     | > avg_loss_1:[91m 30.48576 [0m(+2.19506)


[4m[1m > EPOCH: 23/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 05:07:59) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



[1m   --> STEP: 10/80 -- GLOBAL_STEP: 1850[0m
     | > loss_disc: 2.66733  (2.57913)
     | > loss_disc_real_0: 0.11252  (0.10598)
     | > loss_disc_real_1: 0.30385  (0.25637)
     | > loss_disc_real_2: 0.26480  (0.24164)
     | > loss_disc_real_3: 0.26817  (0.24688)
     | > loss_disc_real_4: 0.27754  (0.24731)
     | > loss_disc_real_5: 0.27039  (0.25240)
     | > loss_0: 2.66733  (2.57913)
     | > grad_norm_0: 382.96725  (294.05136)
     | > loss_gen: 2.22851  (2.11403)
     | > loss_kl: 1.50789  (1.48542)
     | > loss_feat: 2.02437  (2.10374)
     | > loss_mel: 23.49702  (23.79457)
     | > loss_duration: 1.74617  (1.77986)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 31.00397  (31.27760)
     | > grad_norm_1: 3470.37012  (3062.27734)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.58930  (3.55484)
     | > loader_time: 0.01000  (0.00831)


[1m   --> STEP: 35/80 -- GLOBAL_STEP: 1875[0m
     | > loss_disc: 2.60277  (2.58601



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.00901 [0m(-0.00000)
     | > avg_loss_disc:[91m 2.62398 [0m(+0.14338)
     | > avg_loss_disc_real_0:[92m 0.05255 [0m(-0.02498)
     | > avg_loss_disc_real_1:[91m 0.28546 [0m(+0.07320)
     | > avg_loss_disc_real_2:[91m 0.23217 [0m(+0.01551)
     | > avg_loss_disc_real_3:[91m 0.25329 [0m(+0.04589)
     | > avg_loss_disc_real_4:[91m 0.27068 [0m(+0.10859)
     | > avg_loss_disc_real_5:[91m 0.30042 [0m(+0.10623)
     | > avg_loss_0:[91m 2.62398 [0m(+0.14338)
     | > avg_loss_gen:[91m 1.92646 [0m(+0.19566)
     | > avg_loss_kl:[91m 1.38839 [0m(+0.31849)
     | > avg_loss_feat:[92m 1.60810 [0m(-0.37546)
     | > avg_loss_mel:[92m 21.49783 [0m(-2.20973)
     | > avg_loss_duration:[92m 1.99055 [0m(-0.00341)
     | > avg_loss_1:[92m 28.41132 [0m(-2.07444)


[4m[1m > EPOCH: 24/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 05:13:34) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



[1m   --> STEP: 5/80 -- GLOBAL_STEP: 1925[0m
     | > loss_disc: 2.60786  (2.60672)
     | > loss_disc_real_0: 0.17851  (0.12222)
     | > loss_disc_real_1: 0.24931  (0.23776)
     | > loss_disc_real_2: 0.26769  (0.23408)
     | > loss_disc_real_3: 0.25377  (0.23535)
     | > loss_disc_real_4: 0.27634  (0.23623)
     | > loss_disc_real_5: 0.30812  (0.25429)
     | > loss_0: 2.60786  (2.60672)
     | > grad_norm_0: 391.89130  (372.66153)
     | > loss_gen: 2.26039  (2.08552)
     | > loss_kl: 1.42603  (1.43034)
     | > loss_feat: 2.12638  (2.10079)
     | > loss_mel: 24.06825  (24.09492)
     | > loss_duration: 1.74447  (1.76587)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 31.62552  (31.47744)
     | > grad_norm_1: 2922.48706  (3021.10376)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.55220  (3.53342)
     | > loader_time: 0.00900  (0.00840)


[1m   --> STEP: 30/80 -- GLOBAL_STEP: 1950[0m
     | > loss_disc: 2.51939  (2.54800)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[91m 0.00901 [0m(+0.00000)
     | > avg_loss_disc:[92m 2.58173 [0m(-0.04225)
     | > avg_loss_disc_real_0:[91m 0.12232 [0m(+0.06978)
     | > avg_loss_disc_real_1:[92m 0.22864 [0m(-0.05682)
     | > avg_loss_disc_real_2:[92m 0.21067 [0m(-0.02150)
     | > avg_loss_disc_real_3:[91m 0.26798 [0m(+0.01469)
     | > avg_loss_disc_real_4:[92m 0.21385 [0m(-0.05684)
     | > avg_loss_disc_real_5:[92m 0.21635 [0m(-0.08407)
     | > avg_loss_0:[92m 2.58173 [0m(-0.04225)
     | > avg_loss_gen:[91m 2.10820 [0m(+0.18174)
     | > avg_loss_kl:[92m 1.37949 [0m(-0.00890)
     | > avg_loss_feat:[91m 2.09999 [0m(+0.49189)
     | > avg_loss_mel:[91m 21.92873 [0m(+0.43091)
     | > avg_loss_duration:[92m 1.95667 [0m(-0.03387)
     | > avg_loss_1:[91m 29.47308 [0m(+1.06177)


[4m[1m > EPOCH: 25/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 05:19:09) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



[1m   --> STEP: 0/80 -- GLOBAL_STEP: 2000[0m
     | > loss_disc: 2.52133  (2.52133)
     | > loss_disc_real_0: 0.16035  (0.16035)
     | > loss_disc_real_1: 0.23564  (0.23564)
     | > loss_disc_real_2: 0.21323  (0.21323)
     | > loss_disc_real_3: 0.25442  (0.25442)
     | > loss_disc_real_4: 0.21383  (0.21383)
     | > loss_disc_real_5: 0.20543  (0.20543)
     | > loss_0: 2.52133  (2.52133)
     | > grad_norm_0: 323.28055  (323.28055)
     | > loss_gen: 2.20342  (2.20342)
     | > loss_kl: 1.29950  (1.29950)
     | > loss_feat: 2.32385  (2.32385)
     | > loss_mel: 23.16245  (23.16245)
     | > loss_duration: 1.71870  (1.71870)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 30.70792  (30.70792)
     | > grad_norm_1: 2782.44287  (2782.44287)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.60930  (3.60929)
     | > loader_time: 23.33210  (23.33215)


[1m   --> STEP: 25/80 -- GLOBAL_STEP: 2025[0m
     | > loss_disc: 2.55882  (2.5294



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[91m 0.00901 [0m(+0.00000)
     | > avg_loss_disc:[91m 2.60023 [0m(+0.01849)
     | > avg_loss_disc_real_0:[92m 0.08539 [0m(-0.03693)
     | > avg_loss_disc_real_1:[91m 0.28213 [0m(+0.05349)
     | > avg_loss_disc_real_2:[91m 0.26097 [0m(+0.05030)
     | > avg_loss_disc_real_3:[91m 0.32448 [0m(+0.05651)
     | > avg_loss_disc_real_4:[91m 0.21855 [0m(+0.00470)
     | > avg_loss_disc_real_5:[92m 0.19942 [0m(-0.01693)
     | > avg_loss_0:[91m 2.60023 [0m(+0.01849)
     | > avg_loss_gen:[92m 2.06515 [0m(-0.04305)
     | > avg_loss_kl:[92m 1.11058 [0m(-0.26890)
     | > avg_loss_feat:[92m 2.01129 [0m(-0.08869)
     | > avg_loss_mel:[91m 23.76898 [0m(+1.84024)
     | > avg_loss_duration:[91m 1.96190 [0m(+0.00523)
     | > avg_loss_1:[91m 30.91790 [0m(+1.44482)


[4m[1m > EPOCH: 26/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 05:24:44) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



[1m   --> STEP: 20/80 -- GLOBAL_STEP: 2100[0m
     | > loss_disc: 2.55323  (2.54380)
     | > loss_disc_real_0: 0.11856  (0.11177)
     | > loss_disc_real_1: 0.23268  (0.22919)
     | > loss_disc_real_2: 0.38857  (0.23371)
     | > loss_disc_real_3: 0.22252  (0.24261)
     | > loss_disc_real_4: 0.21545  (0.24341)
     | > loss_disc_real_5: 0.22292  (0.24660)
     | > loss_0: 2.55323  (2.54380)
     | > grad_norm_0: 154.26967  (344.68542)
     | > loss_gen: 1.92456  (2.14883)
     | > loss_kl: 1.48718  (1.42764)
     | > loss_feat: 2.22918  (2.33059)
     | > loss_mel: 23.12783  (23.53635)
     | > loss_duration: 1.75155  (1.73108)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 30.52030  (31.17448)
     | > grad_norm_1: 2740.01196  (2858.90234)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.63630  (3.58511)
     | > loader_time: 0.00900  (0.00871)


[1m   --> STEP: 45/80 -- GLOBAL_STEP: 2125[0m
     | > loss_disc: 2.57591  (2.55917



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.00901 [0m(-0.00000)
     | > avg_loss_disc:[92m 2.54980 [0m(-0.05043)
     | > avg_loss_disc_real_0:[91m 0.15077 [0m(+0.06538)
     | > avg_loss_disc_real_1:[92m 0.27852 [0m(-0.00361)
     | > avg_loss_disc_real_2:[92m 0.20550 [0m(-0.05547)
     | > avg_loss_disc_real_3:[92m 0.23433 [0m(-0.09016)
     | > avg_loss_disc_real_4:[92m 0.21244 [0m(-0.00610)
     | > avg_loss_disc_real_5:[91m 0.24583 [0m(+0.04641)
     | > avg_loss_0:[92m 2.54980 [0m(-0.05043)
     | > avg_loss_gen:[91m 2.16937 [0m(+0.10422)
     | > avg_loss_kl:[91m 1.32122 [0m(+0.21064)
     | > avg_loss_feat:[92m 1.91949 [0m(-0.09181)
     | > avg_loss_mel:[92m 22.65256 [0m(-1.11642)
     | > avg_loss_duration:[91m 1.96638 [0m(+0.00448)
     | > avg_loss_1:[92m 30.02903 [0m(-0.88888)


[4m[1m > EPOCH: 27/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 05:30:19) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



[1m   --> STEP: 15/80 -- GLOBAL_STEP: 2175[0m
     | > loss_disc: 2.45331  (2.53205)
     | > loss_disc_real_0: 0.08629  (0.10908)
     | > loss_disc_real_1: 0.19747  (0.22667)
     | > loss_disc_real_2: 0.16544  (0.23459)
     | > loss_disc_real_3: 0.18964  (0.23584)
     | > loss_disc_real_4: 0.22640  (0.23828)
     | > loss_disc_real_5: 0.21877  (0.24354)
     | > loss_0: 2.45331  (2.53205)
     | > grad_norm_0: 147.99045  (328.33557)
     | > loss_gen: 2.27636  (2.17885)
     | > loss_kl: 1.41281  (1.38009)
     | > loss_feat: 2.51579  (2.46654)
     | > loss_mel: 24.07610  (23.29811)
     | > loss_duration: 1.70579  (1.72608)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 31.98685  (31.04967)
     | > grad_norm_1: 2577.00635  (2935.77148)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.62030  (3.56572)
     | > loader_time: 0.00900  (0.00847)


[1m   --> STEP: 40/80 -- GLOBAL_STEP: 2200[0m
     | > loss_disc: 2.47529  (2.53324



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.00901 [0m(-0.00000)
     | > avg_loss_disc:[91m 2.59638 [0m(+0.04658)
     | > avg_loss_disc_real_0:[92m 0.04343 [0m(-0.10734)
     | > avg_loss_disc_real_1:[92m 0.23018 [0m(-0.04834)
     | > avg_loss_disc_real_2:[92m 0.17152 [0m(-0.03397)
     | > avg_loss_disc_real_3:[92m 0.22887 [0m(-0.00546)
     | > avg_loss_disc_real_4:[91m 0.27226 [0m(+0.05982)
     | > avg_loss_disc_real_5:[91m 0.26013 [0m(+0.01430)
     | > avg_loss_0:[91m 2.59638 [0m(+0.04658)
     | > avg_loss_gen:[92m 1.83440 [0m(-0.33497)
     | > avg_loss_kl:[92m 0.97892 [0m(-0.34231)
     | > avg_loss_feat:[91m 2.44217 [0m(+0.52268)
     | > avg_loss_mel:[92m 22.40774 [0m(-0.24482)
     | > avg_loss_duration:[92m 1.94756 [0m(-0.01882)
     | > avg_loss_1:[92m 29.61079 [0m(-0.41824)


[4m[1m > EPOCH: 28/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 05:35:54) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



[1m   --> STEP: 10/80 -- GLOBAL_STEP: 2250[0m
     | > loss_disc: 2.72982  (2.70837)
     | > loss_disc_real_0: 0.03994  (0.20694)
     | > loss_disc_real_1: 0.22638  (0.23058)
     | > loss_disc_real_2: 0.19737  (0.23409)
     | > loss_disc_real_3: 0.30543  (0.24549)
     | > loss_disc_real_4: 0.21899  (0.23771)
     | > loss_disc_real_5: 0.27932  (0.23922)
     | > loss_0: 2.72982  (2.70837)
     | > grad_norm_0: 279.56485  (330.93704)
     | > loss_gen: 2.00371  (2.11659)
     | > loss_kl: 1.62288  (1.46293)
     | > loss_feat: 1.75563  (2.12661)
     | > loss_mel: 22.47989  (23.72289)
     | > loss_duration: 1.73218  (1.73137)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 29.59428  (31.16038)
     | > grad_norm_1: 2112.23926  (2308.60425)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.59830  (3.55821)
     | > loader_time: 0.00800  (0.00801)


[1m   --> STEP: 35/80 -- GLOBAL_STEP: 2275[0m
     | > loss_disc: 2.40957  (2.61904



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[91m 0.00901 [0m(+0.00000)
     | > avg_loss_disc:[92m 2.48344 [0m(-0.11294)
     | > avg_loss_disc_real_0:[91m 0.06281 [0m(+0.01938)
     | > avg_loss_disc_real_1:[91m 0.23933 [0m(+0.00914)
     | > avg_loss_disc_real_2:[91m 0.25117 [0m(+0.07964)
     | > avg_loss_disc_real_3:[92m 0.18816 [0m(-0.04070)
     | > avg_loss_disc_real_4:[92m 0.25763 [0m(-0.01463)
     | > avg_loss_disc_real_5:[92m 0.21169 [0m(-0.04844)
     | > avg_loss_0:[92m 2.48344 [0m(-0.11294)
     | > avg_loss_gen:[91m 2.04536 [0m(+0.21096)
     | > avg_loss_kl:[91m 1.06148 [0m(+0.08257)
     | > avg_loss_feat:[92m 2.32944 [0m(-0.11273)
     | > avg_loss_mel:[91m 22.54546 [0m(+0.13772)
     | > avg_loss_duration:[92m 1.93614 [0m(-0.01142)
     | > avg_loss_1:[91m 29.91788 [0m(+0.30709)
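The TRAINING timestamps give a rough cost per epoch. From epoch 18 (04:39:57) to epoch 28 (05:35:54) is 3357 s over 10 epochs, about 335.7 s or 5.6 minutes each. The 80 training steps at roughly 3.57 s `step_time` account for about 286 s of that. The rest is presumably eval, test-sentence synthesis and loader start-up; some epochs show about 23 s of `loader_time` at step 0. The sketch below extrapolates to the configured 1000 epochs. It assumes the rate stays constant and that 972 epochs remain after the `EPOCH: 28/1000` header. Both are assumptions, not values reported by the trainer.

```python
from datetime import datetime

# "TRAINING (...)" timestamps copied from the log, epochs 18 through 28.
stamps = [
    "2022-09-23 04:39:57", "2022-09-23 04:45:33", "2022-09-23 04:51:08",
    "2022-09-23 04:56:43", "2022-09-23 05:02:24", "2022-09-23 05:07:59",
    "2022-09-23 05:13:34", "2022-09-23 05:19:09", "2022-09-23 05:24:44",
    "2022-09-23 05:30:19", "2022-09-23 05:35:54",
]

def mean_epoch_seconds(timestamps):
    """Average wall-clock seconds between consecutive epoch start times."""
    times = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps]
    return (times[-1] - times[0]).total_seconds() / (len(times) - 1)

per_epoch = mean_epoch_seconds(stamps)  # 335.7 s, about 5.6 minutes
epochs_left = 1000 - 28                 # assumption: counted from "EPOCH: 28/1000"
print(round(per_epoch * epochs_left / 3600, 1))  # 90.6 (hours)
```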


[4m[1m > EPOCH: 29/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 05:41:29)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 2325
     | > loss_disc: 2.51418  (2.53535)
     | > loss_disc_real_0: 0.06520  (0.09574)
     | > loss_disc_real_1: 0.26620  (0.23395)
     | > loss_disc_real_2: 0.29251  (0.24234)
     | > loss_disc_real_3: 0.24364  (0.23944)
     | > loss_disc_real_4: 0.27923  (0.24807)
     | > loss_disc_real_5: 0.23722  (0.24240)
     | > loss_0: 2.51418  (2.53535)
     | > grad_norm_0: 228.18578  (329.08392)
     | > loss_gen: 2.22973  (2.18101)
     | > loss_kl: 1.44677  (1.21951)
     | > loss_feat: 2.46840  (2.38582)
     | > loss_mel: 22.55832  (23.41446)
     | > loss_duration: 1.70841  (1.73842)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 30.41164  (30.93923)
     | > grad_norm_1: 2303.71753  (2727.63379)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.54720  (3.53662)
     | > loader_time: 0.00800  (0.00801)


   --> STEP: 30/80 -- GLOBAL_STEP: 2350
     | > loss_disc: 2.47135  (2.49758)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.48591 (+0.00247)
     | > avg_loss_disc_real_0: 0.15452 (+0.09170)
     | > avg_loss_disc_real_1: 0.21512 (-0.02420)
     | > avg_loss_disc_real_2: 0.19007 (-0.06109)
     | > avg_loss_disc_real_3: 0.21388 (+0.02571)
     | > avg_loss_disc_real_4: 0.23215 (-0.02548)
     | > avg_loss_disc_real_5: 0.23319 (+0.02150)
     | > avg_loss_0: 2.48591 (+0.00247)
     | > avg_loss_gen: 2.06006 (+0.01470)
     | > avg_loss_kl: 1.18612 (+0.12464)
     | > avg_loss_feat: 1.99601 (-0.33342)
     | > avg_loss_mel: 21.42615 (-1.11932)
     | > avg_loss_duration: 1.93460 (-0.00154)
     | > avg_loss_1: 28.60294 (-1.31494)


[4m[1m > EPOCH: 30/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 05:47:04)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 2400
     | > loss_disc: 2.42715  (2.42715)
     | > loss_disc_real_0: 0.16058  (0.16058)
     | > loss_disc_real_1: 0.17884  (0.17884)
     | > loss_disc_real_2: 0.17422  (0.17422)
     | > loss_disc_real_3: 0.18235  (0.18235)
     | > loss_disc_real_4: 0.20553  (0.20553)
     | > loss_disc_real_5: 0.21103  (0.21103)
     | > loss_0: 2.42715  (2.42715)
     | > grad_norm_0: 230.31281  (230.31281)
     | > loss_gen: 2.41425  (2.41425)
     | > loss_kl: 1.05259  (1.05259)
     | > loss_feat: 2.91030  (2.91030)
     | > loss_mel: 23.11695  (23.11695)
     | > loss_duration: 1.70047  (1.70047)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 31.19455  (31.19455)
     | > grad_norm_1: 2212.73657  (2212.73657)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.59030  (3.59027)
     | > loader_time: 23.22350  (23.22351)


   --> STEP: 25/80 -- GLOBAL_STEP: 2425
     | > loss_disc: 2.45772  (2.5060



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00000)
     | > avg_loss_disc: 2.54897 (+0.06305)
     | > avg_loss_disc_real_0: 0.04450 (-0.11002)
     | > avg_loss_disc_real_1: 0.23933 (+0.02421)
     | > avg_loss_disc_real_2: 0.17689 (-0.01319)
     | > avg_loss_disc_real_3: 0.24890 (+0.03502)
     | > avg_loss_disc_real_4: 0.22462 (-0.00753)
     | > avg_loss_disc_real_5: 0.21740 (-0.01579)
     | > avg_loss_0: 2.54897 (+0.06305)
     | > avg_loss_gen: 1.84657 (-0.21350)
     | > avg_loss_kl: 1.33026 (+0.14414)
     | > avg_loss_feat: 2.36806 (+0.37205)
     | > avg_loss_mel: 22.16548 (+0.73933)
     | > avg_loss_duration: 1.92632 (-0.00827)
     | > avg_loss_1: 29.63670 (+1.03376)


[4m[1m > EPOCH: 31/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 05:52:39)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 2500
     | > loss_disc: 2.50927  (2.56193)
     | > loss_disc_real_0: 0.09942  (0.14708)
     | > loss_disc_real_1: 0.17575  (0.22607)
     | > loss_disc_real_2: 0.16375  (0.23246)
     | > loss_disc_real_3: 0.21483  (0.23598)
     | > loss_disc_real_4: 0.18293  (0.23934)
     | > loss_disc_real_5: 0.32361  (0.25115)
     | > loss_0: 2.50927  (2.56193)
     | > grad_norm_0: 85.01346  (341.67584)
     | > loss_gen: 2.13645  (2.21897)
     | > loss_kl: 1.45814  (1.33099)
     | > loss_feat: 2.46786  (2.54216)
     | > loss_mel: 23.27518  (23.12946)
     | > loss_duration: 1.68354  (1.70438)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 31.02117  (30.92596)
     | > grad_norm_1: 2235.82031  (2519.67017)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.62230  (3.58106)
     | > loader_time: 0.00900  (0.00876)


   --> STEP: 45/80 -- GLOBAL_STEP: 2525
     | > loss_disc: 2.77104  (2.54524)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.88724 (+0.33827)
     | > avg_loss_disc_real_0: 0.09899 (+0.05449)
     | > avg_loss_disc_real_1: 0.27980 (+0.04047)
     | > avg_loss_disc_real_2: 0.29673 (+0.11985)
     | > avg_loss_disc_real_3: 0.18283 (-0.06607)
     | > avg_loss_disc_real_4: 0.29105 (+0.06642)
     | > avg_loss_disc_real_5: 0.23125 (+0.01386)
     | > avg_loss_0: 2.88724 (+0.33827)
     | > avg_loss_gen: 1.68144 (-0.16513)
     | > avg_loss_kl: 1.26870 (-0.06156)
     | > avg_loss_feat: 1.57952 (-0.78854)
     | > avg_loss_mel: 22.20332 (+0.03784)
     | > avg_loss_duration: 1.93268 (+0.00635)
     | > avg_loss_1: 28.66567 (-0.97103)


[4m[1m > EPOCH: 32/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 05:58:14)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 2575
     | > loss_disc: 2.56395  (2.68318)
     | > loss_disc_real_0: 0.09646  (0.25172)
     | > loss_disc_real_1: 0.21743  (0.23690)
     | > loss_disc_real_2: 0.20132  (0.23593)
     | > loss_disc_real_3: 0.20336  (0.22943)
     | > loss_disc_real_4: 0.21416  (0.23769)
     | > loss_disc_real_5: 0.20966  (0.24320)
     | > loss_0: 2.56395  (2.68318)
     | > grad_norm_0: 25.78421  (125.62117)
     | > loss_gen: 1.77863  (1.96011)
     | > loss_kl: 1.66627  (1.43767)
     | > loss_feat: 2.12704  (1.97666)
     | > loss_mel: 23.09465  (23.46389)
     | > loss_duration: 1.74265  (1.71198)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 30.40925  (30.55031)
     | > grad_norm_1: 1398.30139  (1036.22559)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.62430  (3.56514)
     | > loader_time: 0.00900  (0.00821)


   --> STEP: 40/80 -- GLOBAL_STEP: 2600
     | > loss_disc: 2.54708  (2.59582)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.48474 (-0.40250)
     | > avg_loss_disc_real_0: 0.20236 (+0.10337)
     | > avg_loss_disc_real_1: 0.21184 (-0.06795)
     | > avg_loss_disc_real_2: 0.18432 (-0.11241)
     | > avg_loss_disc_real_3: 0.17630 (-0.00653)
     | > avg_loss_disc_real_4: 0.24324 (-0.04780)
     | > avg_loss_disc_real_5: 0.22956 (-0.00169)
     | > avg_loss_0: 2.48474 (-0.40250)
     | > avg_loss_gen: 2.00154 (+0.32010)
     | > avg_loss_kl: 1.24674 (-0.02196)
     | > avg_loss_feat: 2.17620 (+0.59668)
     | > avg_loss_mel: 22.01457 (-0.18875)
     | > avg_loss_duration: 1.94190 (+0.00923)
     | > avg_loss_1: 29.38096 (+0.71530)


[4m[1m > EPOCH: 33/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 06:03:49)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 2650
     | > loss_disc: 2.54300  (2.61555)
     | > loss_disc_real_0: 0.23111  (0.19229)
     | > loss_disc_real_1: 0.26794  (0.23364)
     | > loss_disc_real_2: 0.31628  (0.24485)
     | > loss_disc_real_3: 0.30307  (0.24460)
     | > loss_disc_real_4: 0.27405  (0.23440)
     | > loss_disc_real_5: 0.32035  (0.24417)
     | > loss_0: 2.54300  (2.61555)
     | > grad_norm_0: 118.32683  (266.74948)
     | > loss_gen: 2.00910  (2.10684)
     | > loss_kl: 1.43197  (1.33224)
     | > loss_feat: 2.49671  (2.43939)
     | > loss_mel: 21.90722  (22.74988)
     | > loss_duration: 1.68083  (1.70231)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 29.52584  (30.33066)
     | > grad_norm_1: 3065.84985  (2193.55396)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.59630  (3.55856)
     | > loader_time: 0.00800  (0.00851)


   --> STEP: 35/80 -- GLOBAL_STEP: 2675
     | > loss_disc: 2.56488  (2.52791



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00100)
     | > avg_loss_disc: 2.41228 (-0.07246)
     | > avg_loss_disc_real_0: 0.08656 (-0.11580)
     | > avg_loss_disc_real_1: 0.25893 (+0.04709)
     | > avg_loss_disc_real_2: 0.24479 (+0.06047)
     | > avg_loss_disc_real_3: 0.24306 (+0.06675)
     | > avg_loss_disc_real_4: 0.29227 (+0.04903)
     | > avg_loss_disc_real_5: 0.22771 (-0.00185)
     | > avg_loss_0: 2.41228 (-0.07246)
     | > avg_loss_gen: 2.61602 (+0.61447)
     | > avg_loss_kl: 1.14557 (-0.10117)
     | > avg_loss_feat: 2.63935 (+0.46315)
     | > avg_loss_mel: 21.81218 (-0.20239)
     | > avg_loss_duration: 1.91669 (-0.02521)
     | > avg_loss_1: 30.12981 (+0.74885)


[4m[1m > EPOCH: 34/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 06:09:24)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 2725
     | > loss_disc: 2.53248  (2.45718)
     | > loss_disc_real_0: 0.11703  (0.09383)
     | > loss_disc_real_1: 0.21330  (0.22298)
     | > loss_disc_real_2: 0.14913  (0.22991)
     | > loss_disc_real_3: 0.18614  (0.23853)
     | > loss_disc_real_4: 0.17840  (0.21827)
     | > loss_disc_real_5: 0.21755  (0.24923)
     | > loss_0: 2.53248  (2.45718)
     | > grad_norm_0: 235.21416  (186.95293)
     | > loss_gen: 2.26273  (2.25416)
     | > loss_kl: 1.35920  (1.27679)
     | > loss_feat: 2.74208  (2.75532)
     | > loss_mel: 24.02898  (23.34395)
     | > loss_duration: 1.67535  (1.70849)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 32.06834  (31.33871)
     | > grad_norm_1: 2038.72400  (2332.27686)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.56520  (3.54082)
     | > loader_time: 0.00800  (0.00761)


   --> STEP: 30/80 -- GLOBAL_STEP: 2750
     | > loss_disc: 2.35183  (2.46306)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00801 (-0.00200)
     | > avg_loss_disc: 2.67989 (+0.26761)
     | > avg_loss_disc_real_0: 0.25537 (+0.16880)
     | > avg_loss_disc_real_1: 0.35333 (+0.09440)
     | > avg_loss_disc_real_2: 0.24332 (-0.00147)
     | > avg_loss_disc_real_3: 0.18093 (-0.06212)
     | > avg_loss_disc_real_4: 0.27601 (-0.01626)
     | > avg_loss_disc_real_5: 0.20229 (-0.02542)
     | > avg_loss_0: 2.67989 (+0.26761)
     | > avg_loss_gen: 2.02504 (-0.59097)
     | > avg_loss_kl: 1.36543 (+0.21986)
     | > avg_loss_feat: 1.71996 (-0.91939)
     | > avg_loss_mel: 23.09525 (+1.28307)
     | > avg_loss_duration: 1.90652 (-0.01017)
     | > avg_loss_1: 30.11220 (-0.01761)


[4m[1m > EPOCH: 35/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 06:14:58)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 2800
     | > loss_disc: 2.60840  (2.60840)
     | > loss_disc_real_0: 0.24103  (0.24103)
     | > loss_disc_real_1: 0.31527  (0.31527)
     | > loss_disc_real_2: 0.21814  (0.21814)
     | > loss_disc_real_3: 0.17060  (0.17060)
     | > loss_disc_real_4: 0.24003  (0.24003)
     | > loss_disc_real_5: 0.15898  (0.15898)
     | > loss_0: 2.60840  (2.60840)
     | > grad_norm_0: 31.08002  (31.08002)
     | > loss_gen: 2.08705  (2.08705)
     | > loss_kl: 1.32797  (1.32797)
     | > loss_feat: 2.14786  (2.14786)
     | > loss_mel: 24.88015  (24.88015)
     | > loss_duration: 1.72426  (1.72426)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 32.16728  (32.16728)
     | > grad_norm_1: 508.93417  (508.93417)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.59230  (3.59228)
     | > loader_time: 23.15120  (23.15116)


   --> STEP: 25/80 -- GLOBAL_STEP: 2825
     | > loss_disc: 2.32794  (2.56129)
 



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00701 (-0.00100)
     | > avg_loss_disc: 2.73879 (+0.05890)
     | > avg_loss_disc_real_0: 0.11446 (-0.14090)
     | > avg_loss_disc_real_1: 0.23101 (-0.12232)
     | > avg_loss_disc_real_2: 0.20300 (-0.04032)
     | > avg_loss_disc_real_3: 0.19439 (+0.01345)
     | > avg_loss_disc_real_4: 0.18038 (-0.09563)
     | > avg_loss_disc_real_5: 0.23688 (+0.03459)
     | > avg_loss_0: 2.73879 (+0.05890)
     | > avg_loss_gen: 1.63441 (-0.39063)
     | > avg_loss_kl: 1.21345 (-0.15197)
     | > avg_loss_feat: 2.11496 (+0.39500)
     | > avg_loss_mel: 24.19194 (+1.09669)
     | > avg_loss_duration: 1.91695 (+0.01043)
     | > avg_loss_1: 31.07171 (+0.95951)


[4m[1m > EPOCH: 36/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 06:20:33)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 2900
     | > loss_disc: 2.47886  (2.44863)
     | > loss_disc_real_0: 0.26319  (0.07990)
     | > loss_disc_real_1: 0.16531  (0.22483)
     | > loss_disc_real_2: 0.23066  (0.23733)
     | > loss_disc_real_3: 0.25250  (0.23389)
     | > loss_disc_real_4: 0.21235  (0.24287)
     | > loss_disc_real_5: 0.25638  (0.24565)
     | > loss_0: 2.47886  (2.44863)
     | > grad_norm_0: 627.84821  (275.96448)
     | > loss_gen: 2.50094  (2.36323)
     | > loss_kl: 1.24118  (1.24183)
     | > loss_feat: 3.10700  (3.06481)
     | > loss_mel: 23.00406  (23.10704)
     | > loss_duration: 1.69341  (1.68951)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 31.54659  (31.46641)
     | > grad_norm_1: 1035.95581  (1807.64783)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.63550  (3.58688)
     | > loader_time: 0.00900  (0.00916)


   --> STEP: 45/80 -- GLOBAL_STEP: 2925
     | > loss_disc: 2.42570  (2.42441



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00200)
     | > avg_loss_disc: 2.28374 (-0.45504)
     | > avg_loss_disc_real_0: 0.09630 (-0.01816)
     | > avg_loss_disc_real_1: 0.14848 (-0.08253)
     | > avg_loss_disc_real_2: 0.21848 (+0.01549)
     | > avg_loss_disc_real_3: 0.21251 (+0.01812)
     | > avg_loss_disc_real_4: 0.26116 (+0.08078)
     | > avg_loss_disc_real_5: 0.19720 (-0.03968)
     | > avg_loss_0: 2.28374 (-0.45504)
     | > avg_loss_gen: 2.56250 (+0.92808)
     | > avg_loss_kl: 1.00888 (-0.20458)
     | > avg_loss_feat: 3.12879 (+1.01384)
     | > avg_loss_mel: 22.98649 (-1.20545)
     | > avg_loss_duration: 1.91884 (+0.00189)
     | > avg_loss_1: 31.60550 (+0.53379)


[4m[1m > EPOCH: 37/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 06:26:08)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 2975
     | > loss_disc: 2.44655  (2.46534)
     | > loss_disc_real_0: 0.12785  (0.10166)
     | > loss_disc_real_1: 0.14085  (0.21241)
     | > loss_disc_real_2: 0.28703  (0.23010)
     | > loss_disc_real_3: 0.14152  (0.22681)
     | > loss_disc_real_4: 0.23776  (0.23749)
     | > loss_disc_real_5: 0.24169  (0.24480)
     | > loss_0: 2.44655  (2.46534)
     | > grad_norm_0: 33.93363  (266.49365)
     | > loss_gen: 2.14642  (2.25686)
     | > loss_kl: 1.50034  (1.32685)
     | > loss_feat: 2.69024  (2.88507)
     | > loss_mel: 23.40163  (22.77559)
     | > loss_duration: 1.67995  (1.69133)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 31.41858  (30.93569)
     | > grad_norm_1: 3871.46362  (2167.25928)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.60830  (3.57143)
     | > loader_time: 0.00900  (0.00841)


   --> STEP: 40/80 -- GLOBAL_STEP: 3000
     | > loss_disc: 2.51044  (2.52631)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.35621 (+0.07247)
     | > avg_loss_disc_real_0: 0.06493 (-0.03137)
     | > avg_loss_disc_real_1: 0.27948 (+0.13099)
     | > avg_loss_disc_real_2: 0.22526 (+0.00678)
     | > avg_loss_disc_real_3: 0.28129 (+0.06877)
     | > avg_loss_disc_real_4: 0.21003 (-0.05113)
     | > avg_loss_disc_real_5: 0.26015 (+0.06295)
     | > avg_loss_0: 2.35621 (+0.07247)
     | > avg_loss_gen: 2.40518 (-0.15731)
     | > avg_loss_kl: 1.14576 (+0.13688)
     | > avg_loss_feat: 2.61187 (-0.51692)
     | > avg_loss_mel: 21.25878 (-1.72772)
     | > avg_loss_duration: 1.91116 (-0.00768)
     | > avg_loss_1: 29.33275 (-2.27275)


[4m[1m > EPOCH: 38/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 06:31:43)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 3050
     | > loss_disc: 2.55869  (2.47784)
     | > loss_disc_real_0: 0.08086  (0.10191)
     | > loss_disc_real_1: 0.28104  (0.21620)
     | > loss_disc_real_2: 0.28994  (0.23744)
     | > loss_disc_real_3: 0.24598  (0.24236)
     | > loss_disc_real_4: 0.29057  (0.24652)
     | > loss_disc_real_5: 0.24798  (0.24203)
     | > loss_0: 2.55869  (2.47784)
     | > grad_norm_0: 553.46704  (367.00430)
     | > loss_gen: 2.27634  (2.30018)
     | > loss_kl: 1.19561  (1.28678)
     | > loss_feat: 2.47936  (2.95137)
     | > loss_mel: 21.89475  (22.34746)
     | > loss_duration: 1.65507  (1.67552)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 29.50113  (30.56131)
     | > grad_norm_1: 2993.97290  (2553.20142)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.61130  (3.55654)
     | > loader_time: 0.00900  (0.00790)


   --> STEP: 35/80 -- GLOBAL_STEP: 3075
     | > loss_disc: 2.47214  (2.46110



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00000)
     | > avg_loss_disc: 2.43871 (+0.08249)
     | > avg_loss_disc_real_0: 0.03020 (-0.03472)
     | > avg_loss_disc_real_1: 0.19186 (-0.08761)
     | > avg_loss_disc_real_2: 0.20024 (-0.02502)
     | > avg_loss_disc_real_3: 0.29933 (+0.01805)
     | > avg_loss_disc_real_4: 0.31002 (+0.09999)
     | > avg_loss_disc_real_5: 0.27641 (+0.01626)
     | > avg_loss_0: 2.43871 (+0.08249)
     | > avg_loss_gen: 2.16426 (-0.24092)
     | > avg_loss_kl: 0.97937 (-0.16639)
     | > avg_loss_feat: 2.59791 (-0.01395)
     | > avg_loss_mel: 20.27985 (-0.97893)
     | > avg_loss_duration: 1.91514 (+0.00398)
     | > avg_loss_1: 27.93654 (-1.39621)

 > BEST MODEL : ./output\vits_vctk-September-23-2022_02+46AM-3c624ce\best_model_3120.pth

[4m[1m > EPOCH: 39/1000[0m
 --> ./output\vits_vctk-Sep



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 06:37:22)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 3125
     | > loss_disc: 2.42701  (2.33856)
     | > loss_disc_real_0: 0.05040  (0.08982)
     | > loss_disc_real_1: 0.20831  (0.20982)
     | > loss_disc_real_2: 0.17330  (0.22482)
     | > loss_disc_real_3: 0.18638  (0.21270)
     | > loss_disc_real_4: 0.27101  (0.23667)
     | > loss_disc_real_5: 0.21021  (0.24345)
     | > loss_0: 2.42701  (2.33856)
     | > grad_norm_0: 457.88562  (176.75157)
     | > loss_gen: 2.38049  (2.36063)
     | > loss_kl: 1.22590  (1.28306)
     | > loss_feat: 3.24354  (3.26983)
     | > loss_mel: 22.30243  (22.76838)
     | > loss_duration: 1.66636  (1.68849)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 30.81872  (31.37038)
     | > grad_norm_1: 2238.26465  (2956.03003)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.57130  (3.54523)
     | > loader_time: 0.00800  (0.00821)


   --> STEP: 30/80 -- GLOBAL_STEP: 3150
     | > loss_disc: 2.44758  (2.40172)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.61339 (+0.17469)
     | > avg_loss_disc_real_0: 0.40640 (+0.37619)
     | > avg_loss_disc_real_1: 0.21052 (+0.01866)
     | > avg_loss_disc_real_2: 0.27237 (+0.07212)
     | > avg_loss_disc_real_3: 0.19506 (-0.10427)
     | > avg_loss_disc_real_4: 0.26931 (-0.04071)
     | > avg_loss_disc_real_5: 0.30694 (+0.03054)
     | > avg_loss_0: 2.61339 (+0.17469)
     | > avg_loss_gen: 2.83817 (+0.67391)
     | > avg_loss_kl: 1.14252 (+0.16315)
     | > avg_loss_feat: 2.36335 (-0.23456)
     | > avg_loss_mel: 21.99035 (+1.71050)
     | > avg_loss_duration: 1.90957 (-0.00556)
     | > avg_loss_1: 30.24397 (+2.30743)


[4m[1m > EPOCH: 40/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 06:42:57)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 3200
     | > loss_disc: 2.57000  (2.57000)
     | > loss_disc_real_0: 0.42205  (0.42205)
     | > loss_disc_real_1: 0.21245  (0.21245)
     | > loss_disc_real_2: 0.24841  (0.24841)
     | > loss_disc_real_3: 0.16419  (0.16419)
     | > loss_disc_real_4: 0.22882  (0.22882)
     | > loss_disc_real_5: 0.25304  (0.25304)
     | > loss_0: 2.57000  (2.57000)
     | > grad_norm_0: 763.46765  (763.46765)
     | > loss_gen: 2.60496  (2.60496)
     | > loss_kl: 1.19766  (1.19766)
     | > loss_feat: 3.41681  (3.41681)
     | > loss_mel: 23.08001  (23.08001)
     | > loss_duration: 1.68518  (1.68518)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 31.98461  (31.98461)
     | > grad_norm_1: 1501.25159  (1501.25159)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.58230  (3.58226)
     | > loader_time: 23.00350  (23.00348)


   --> STEP: 25/80 -- GLOBAL_STEP: 3225
     | > loss_disc: 2.50268  (2.4992



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00100)
     | > avg_loss_disc: 2.50969 (-0.10371)
     | > avg_loss_disc_real_0: 0.03939 (-0.36701)
     | > avg_loss_disc_real_1: 0.21098 (+0.00046)
     | > avg_loss_disc_real_2: 0.24203 (-0.03033)
     | > avg_loss_disc_real_3: 0.25757 (+0.06250)
     | > avg_loss_disc_real_4: 0.23118 (-0.03813)
     | > avg_loss_disc_real_5: 0.28733 (-0.01961)
     | > avg_loss_0: 2.50969 (-0.10371)
     | > avg_loss_gen: 2.04728 (-0.79089)
     | > avg_loss_kl: 1.18670 (+0.04417)
     | > avg_loss_feat: 2.59468 (+0.23133)
     | > avg_loss_mel: 21.36023 (-0.63012)
     | > avg_loss_duration: 1.90194 (-0.00763)
     | > avg_loss_1: 29.09084 (-1.15314)


[4m[1m > EPOCH: 41/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 06:48:31)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 3300
     | > loss_disc: 2.37099  (2.40321)
     | > loss_disc_real_0: 0.04297  (0.08054)
     | > loss_disc_real_1: 0.18386  (0.21228)
     | > loss_disc_real_2: 0.19628  (0.22549)
     | > loss_disc_real_3: 0.24810  (0.22869)
     | > loss_disc_real_4: 0.23716  (0.24185)
     | > loss_disc_real_5: 0.27977  (0.24075)
     | > loss_0: 2.37099  (2.40321)
     | > grad_norm_0: 111.45210  (241.87337)
     | > loss_gen: 2.43333  (2.33509)
     | > loss_kl: 1.14454  (1.21485)
     | > loss_feat: 3.11102  (3.11791)
     | > loss_mel: 23.28271  (22.29662)
     | > loss_duration: 1.66648  (1.66552)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 31.63807  (30.62998)
     | > grad_norm_1: 1646.06140  (2541.07202)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.62130  (3.57906)
     | > loader_time: 0.01100  (0.00876)


   --> STEP: 45/80 -- GLOBAL_STEP: 3325
     | > loss_disc: 2.55642  (2.40109



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00100)
     | > avg_loss_disc: 2.32606 (-0.18363)
     | > avg_loss_disc_real_0: 0.11325 (+0.07386)
     | > avg_loss_disc_real_1: 0.24789 (+0.03691)
     | > avg_loss_disc_real_2: 0.20631 (-0.03573)
     | > avg_loss_disc_real_3: 0.26997 (+0.01241)
     | > avg_loss_disc_real_4: 0.28217 (+0.05100)
     | > avg_loss_disc_real_5: 0.24069 (-0.04665)
     | > avg_loss_0: 2.32606 (-0.18363)
     | > avg_loss_gen: 2.69540 (+0.64812)
     | > avg_loss_kl: 1.33749 (+0.15079)
     | > avg_loss_feat: 2.91781 (+0.32312)
     | > avg_loss_mel: 22.09461 (+0.73438)
     | > avg_loss_duration: 1.93370 (+0.03175)
     | > avg_loss_1: 30.97900 (+1.88816)


[4m[1m > EPOCH: 42/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 06:54:06)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 3375
     | > loss_disc: 2.36587  (2.43660)
     | > loss_disc_real_0: 0.13178  (0.07350)
     | > loss_disc_real_1: 0.25243  (0.22449)
     | > loss_disc_real_2: 0.21694  (0.23269)
     | > loss_disc_real_3: 0.15404  (0.23415)
     | > loss_disc_real_4: 0.21851  (0.23683)
     | > loss_disc_real_5: 0.23297  (0.24512)
     | > loss_0: 2.36587  (2.43660)
     | > grad_norm_0: 488.99136  (196.71031)
     | > loss_gen: 2.60529  (2.37036)
     | > loss_kl: 1.33042  (1.36601)
     | > loss_feat: 3.90089  (3.21666)
     | > loss_mel: 23.96414  (22.83178)
     | > loss_duration: 1.70050  (1.68511)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 33.50124  (31.46991)
     | > grad_norm_1: 615.35431  (1550.28674)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.59830  (3.57219)
     | > loader_time: 0.01000  (0.00887)


   --> STEP: 40/80 -- GLOBAL_STEP: 3400
     | > loss_disc: 2.48176  (2.41394)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.46082 (+0.13475)
     | > avg_loss_disc_real_0: 0.23310 (+0.11985)
     | > avg_loss_disc_real_1: 0.21690 (-0.03099)
     | > avg_loss_disc_real_2: 0.20317 (-0.00314)
     | > avg_loss_disc_real_3: 0.19639 (-0.07358)
     | > avg_loss_disc_real_4: 0.17373 (-0.10844)
     | > avg_loss_disc_real_5: 0.25094 (+0.01025)
     | > avg_loss_0: 2.46082 (+0.13475)
     | > avg_loss_gen: 2.33779 (-0.35761)
     | > avg_loss_kl: 1.21357 (-0.12392)
     | > avg_loss_feat: 2.73702 (-0.18079)
     | > avg_loss_mel: 21.88770 (-0.20691)
     | > avg_loss_duration: 1.92825 (-0.00544)
     | > avg_loss_1: 30.10433 (-0.87467)


[4m[1m > EPOCH: 43/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 06:59:40)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 3450
     | > loss_disc: 2.38646  (2.44596)
     | > loss_disc_real_0: 0.11187  (0.09449)
     | > loss_disc_real_1: 0.21136  (0.21681)
     | > loss_disc_real_2: 0.23906  (0.23867)
     | > loss_disc_real_3: 0.21674  (0.23108)
     | > loss_disc_real_4: 0.26802  (0.24124)
     | > loss_disc_real_5: 0.25749  (0.24317)
     | > loss_0: 2.38646  (2.44596)
     | > grad_norm_0: 187.34799  (331.01575)
     | > loss_gen: 2.46321  (2.30438)
     | > loss_kl: 0.98714  (1.15672)
     | > loss_feat: 3.11322  (3.15830)
     | > loss_mel: 22.42708  (22.26156)
     | > loss_duration: 1.62522  (1.67667)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.61588  (30.55763)
     | > grad_norm_1: 2609.26538  (2462.69458)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.58230  (3.55800)
     | > loader_time: 0.01000  (0.00871)


   --> STEP: 35/80 -- GLOBAL_STEP: 3475
     | > loss_disc: 2.49257  (2.43625



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00000)
     | > avg_loss_disc: 2.51125 (+0.05043)
     | > avg_loss_disc_real_0: 0.01889 (-0.21420)
     | > avg_loss_disc_real_1: 0.22743 (+0.01053)
     | > avg_loss_disc_real_2: 0.18065 (-0.02252)
     | > avg_loss_disc_real_3: 0.28796 (+0.09157)
     | > avg_loss_disc_real_4: 0.29133 (+0.11760)
     | > avg_loss_disc_real_5: 0.28009 (+0.02915)
     | > avg_loss_0: 2.51125 (+0.05043)
     | > avg_loss_gen: 2.12157 (-0.21622)
     | > avg_loss_kl: 0.97052 (-0.24305)
     | > avg_loss_feat: 2.63870 (-0.09832)
     | > avg_loss_mel: 20.77654 (-1.11116)
     | > avg_loss_duration: 1.90857 (-0.01968)
     | > avg_loss_1: 28.41590 (-1.68843)


[4m[1m > EPOCH: 44/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 07:05:15)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 3525
     | > loss_disc: 2.60515  (2.47504)
     | > loss_disc_real_0: 0.05411  (0.08867)
     | > loss_disc_real_1: 0.27837  (0.23728)
     | > loss_disc_real_2: 0.28822  (0.24204)
     | > loss_disc_real_3: 0.24945  (0.24037)
     | > loss_disc_real_4: 0.24341  (0.23861)
     | > loss_disc_real_5: 0.25031  (0.25174)
     | > loss_0: 2.60515  (2.47504)
     | > grad_norm_0: 434.76343  (543.43384)
     | > loss_gen: 2.40931  (2.44005)
     | > loss_kl: 1.22683  (1.19824)
     | > loss_feat: 2.93648  (3.26212)
     | > loss_mel: 23.02372  (23.03615)
     | > loss_duration: 1.65656  (1.67777)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 31.25289  (31.61434)
     | > grad_norm_1: 3278.40503  (2835.49976)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.56230  (3.53182)
     | > loader_time: 0.01000  (0.00841)


   --> STEP: 30/80 -- GLOBAL_STEP: 3550
     | > loss_disc: 2.44560  (2.45742)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00000)
     | > avg_loss_disc: 2.38987 (-0.12138)
     | > avg_loss_disc_real_0: 0.06165 (+0.04276)
     | > avg_loss_disc_real_1: 0.29081 (+0.06338)
     | > avg_loss_disc_real_2: 0.25415 (+0.07350)
     | > avg_loss_disc_real_3: 0.27272 (-0.01523)
     | > avg_loss_disc_real_4: 0.22432 (-0.06701)
     | > avg_loss_disc_real_5: 0.23449 (-0.04560)
     | > avg_loss_0: 2.38987 (-0.12138)
     | > avg_loss_gen: 2.48152 (+0.35995)
     | > avg_loss_kl: 1.03634 (+0.06583)
     | > avg_loss_feat: 3.06943 (+0.43073)
     | > avg_loss_mel: 20.99604 (+0.21949)
     | > avg_loss_duration: 1.91312 (+0.00455)
     | > avg_loss_1: 29.49645 (+1.08055)


[4m[1m > EPOCH: 45/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 07:10:49)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 3600
     | > loss_disc: 2.33523  (2.33523)
     | > loss_disc_real_0: 0.08289  (0.08289)
     | > loss_disc_real_1: 0.25107  (0.25107)
     | > loss_disc_real_2: 0.22843  (0.22843)
     | > loss_disc_real_3: 0.24447  (0.24447)
     | > loss_disc_real_4: 0.18492  (0.18492)
     | > loss_disc_real_5: 0.20254  (0.20254)
     | > loss_0: 2.33523  (2.33523)
     | > grad_norm_0: 409.88718  (409.88718)
     | > loss_gen: 2.52001  (2.52001)
     | > loss_kl: 1.18252  (1.18252)
     | > loss_feat: 3.48158  (3.48158)
     | > loss_mel: 22.06741  (22.06741)
     | > loss_duration: 1.68689  (1.68689)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.93840  (30.93840)
     | > grad_norm_1: 2988.62622  (2988.62622)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.54920  (3.54923)
     | > loader_time: 23.10670  (23.10666)


   --> STEP: 25/80 -- GLOBAL_STEP: 3625
     | > loss_disc: 2.31808  (2.3904



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.54436 (+0.15449)
     | > avg_loss_disc_real_0: 0.03810 (-0.02355)
     | > avg_loss_disc_real_1: 0.16109 (-0.12972)
     | > avg_loss_disc_real_2: 0.34712 (+0.09297)
     | > avg_loss_disc_real_3: 0.30214 (+0.02941)
     | > avg_loss_disc_real_4: 0.22835 (+0.00403)
     | > avg_loss_disc_real_5: 0.23970 (+0.00521)
     | > avg_loss_0: 2.54436 (+0.15449)
     | > avg_loss_gen: 2.03099 (-0.45053)
     | > avg_loss_kl: 1.08954 (+0.05319)
     | > avg_loss_feat: 3.01634 (-0.05309)
     | > avg_loss_mel: 22.48553 (+1.48949)
     | > avg_loss_duration: 1.90278 (-0.01034)
     | > avg_loss_1: 30.52518 (+1.02873)


[4m[1m > EPOCH: 46/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 07:16:24)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 3700
     | > loss_disc: 2.51003  (2.48960)
     | > loss_disc_real_0: 0.05265  (0.14929)
     | > loss_disc_real_1: 0.22590  (0.22540)
     | > loss_disc_real_2: 0.24449  (0.22899)
     | > loss_disc_real_3: 0.25108  (0.23133)
     | > loss_disc_real_4: 0.20085  (0.24051)
     | > loss_disc_real_5: 0.26572  (0.24367)
     | > loss_0: 2.51003  (2.48960)
     | > grad_norm_0: 220.37347  (217.09854)
     | > loss_gen: 1.92572  (2.40359)
     | > loss_kl: 1.25840  (1.26653)
     | > loss_feat: 2.70918  (3.26157)
     | > loss_mel: 22.91559  (23.09098)
     | > loss_duration: 1.67597  (1.65956)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.48486  (31.68224)
     | > grad_norm_1: 1350.56689  (1411.53674)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.62230  (3.57783)
     | > loader_time: 0.00900  (0.00866)


   --> STEP: 45/80 -- GLOBAL_STEP: 3725
     | > loss_disc: 2.40927  (2.48446



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00000)
     | > avg_loss_disc: 2.66387 (+0.11951)
     | > avg_loss_disc_real_0: 0.06561 (+0.02751)
     | > avg_loss_disc_real_1: 0.08179 (-0.07930)
     | > avg_loss_disc_real_2: 0.16872 (-0.17839)
     | > avg_loss_disc_real_3: 0.20978 (-0.09236)
     | > avg_loss_disc_real_4: 0.21355 (-0.01481)
     | > avg_loss_disc_real_5: 0.23515 (-0.00455)
     | > avg_loss_0: 2.66387 (+0.11951)
     | > avg_loss_gen: 1.47440 (-0.55659)
     | > avg_loss_kl: 1.07327 (-0.01627)
     | > avg_loss_feat: 2.31577 (-0.70058)
     | > avg_loss_mel: 21.31578 (-1.16974)
     | > avg_loss_duration: 1.88441 (-0.01837)
     | > avg_loss_1: 28.06362 (-2.46155)


[4m[1m > EPOCH: 47/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 07:21:59)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 3775
     | > loss_disc: 2.30857  (2.51059)
     | > loss_disc_real_0: 0.14409  (0.12710)
     | > loss_disc_real_1: 0.22904  (0.23477)
     | > loss_disc_real_2: 0.23882  (0.23456)
     | > loss_disc_real_3: 0.18731  (0.22502)
     | > loss_disc_real_4: 0.22154  (0.23421)
     | > loss_disc_real_5: 0.25524  (0.24337)
     | > loss_0: 2.30857  (2.51059)
     | > grad_norm_0: 274.27753  (269.87509)
     | > loss_gen: 2.73126  (2.26820)
     | > loss_kl: 1.00359  (1.19471)
     | > loss_feat: 3.49944  (2.89480)
     | > loss_mel: 21.92923  (22.18277)
     | > loss_duration: 1.65703  (1.66110)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.82054  (30.20158)
     | > grad_norm_1: 3129.71265  (2766.15503)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.60030  (3.56838)
     | > loader_time: 0.00900  (0.00867)


   --> STEP: 40/80 -- GLOBAL_STEP: 3800
     | > loss_disc: 2.51530  (2.51251



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00000)
     | > avg_loss_disc: 2.43307 (-0.23080)
     | > avg_loss_disc_real_0: 0.04670 (-0.01891)
     | > avg_loss_disc_real_1: 0.23586 (+0.15407)
     | > avg_loss_disc_real_2: 0.21196 (+0.04324)
     | > avg_loss_disc_real_3: 0.18752 (-0.02226)
     | > avg_loss_disc_real_4: 0.19579 (-0.01776)
     | > avg_loss_disc_real_5: 0.21412 (-0.02103)
     | > avg_loss_0: 2.43307 (-0.23080)
     | > avg_loss_gen: 1.87774 (+0.40335)
     | > avg_loss_kl: 1.08566 (+0.01240)
     | > avg_loss_feat: 2.93343 (+0.61766)
     | > avg_loss_mel: 21.46587 (+0.15009)
     | > avg_loss_duration: 1.88975 (+0.00534)
     | > avg_loss_1: 29.25245 (+1.18883)


[4m[1m > EPOCH: 48/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 07:27:34)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 3850
     | > loss_disc: 2.70609  (2.55242)
     | > loss_disc_real_0: 0.16605  (0.14565)
     | > loss_disc_real_1: 0.22669  (0.21192)
     | > loss_disc_real_2: 0.32924  (0.23327)
     | > loss_disc_real_3: 0.26204  (0.23707)
     | > loss_disc_real_4: 0.25616  (0.25641)
     | > loss_disc_real_5: 0.22226  (0.24570)
     | > loss_0: 2.70609  (2.55242)
     | > grad_norm_0: 329.08133  (350.33163)
     | > loss_gen: 1.77455  (2.20539)
     | > loss_kl: 1.02669  (1.19058)
     | > loss_feat: 2.15585  (2.83694)
     | > loss_mel: 19.76677  (21.87911)
     | > loss_duration: 1.63559  (1.66077)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 26.35945  (29.77280)
     | > grad_norm_1: 2867.77466  (3003.79272)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.59030  (3.55514)
     | > loader_time: 0.00800  (0.00831)


   --> STEP: 35/80 -- GLOBAL_STEP: 3875
     | > loss_disc: 2.45774  (2.52510



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00100)
     | > avg_loss_disc: 2.38238 (-0.05068)
     | > avg_loss_disc_real_0: 0.04620 (-0.00049)
     | > avg_loss_disc_real_1: 0.16673 (-0.06913)
     | > avg_loss_disc_real_2: 0.23317 (+0.02121)
     | > avg_loss_disc_real_3: 0.24045 (+0.05293)
     | > avg_loss_disc_real_4: 0.19630 (+0.00052)
     | > avg_loss_disc_real_5: 0.21050 (-0.00362)
     | > avg_loss_0: 2.38238 (-0.05068)
     | > avg_loss_gen: 2.08114 (+0.20340)
     | > avg_loss_kl: 1.42114 (+0.33548)
     | > avg_loss_feat: 3.07057 (+0.13714)
     | > avg_loss_mel: 20.98225 (-0.48362)
     | > avg_loss_duration: 1.92167 (+0.03192)
     | > avg_loss_1: 29.47677 (+0.22432)


[4m[1m > EPOCH: 49/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 07:33:08)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 3925
     | > loss_disc: 2.48911  (2.42329)
     | > loss_disc_real_0: 0.17200  (0.08869)
     | > loss_disc_real_1: 0.26029  (0.25063)
     | > loss_disc_real_2: 0.27468  (0.25329)
     | > loss_disc_real_3: 0.25316  (0.23950)
     | > loss_disc_real_4: 0.23783  (0.24003)
     | > loss_disc_real_5: 0.27538  (0.24414)
     | > loss_0: 2.48911  (2.42329)
     | > grad_norm_0: 670.16858  (273.29736)
     | > loss_gen: 2.31768  (2.45122)
     | > loss_kl: 1.29617  (1.22596)
     | > loss_feat: 3.45053  (3.45431)
     | > loss_mel: 21.40338  (22.35527)
     | > loss_duration: 1.68919  (1.67132)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.15696  (31.15807)
     | > grad_norm_1: 2282.74683  (2290.69165)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.56020  (3.53101)
     | > loader_time: 0.00900  (0.00800)


   --> STEP: 30/80 -- GLOBAL_STEP: 3950
     | > loss_disc: 2.63804  (2.49896)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00100)
     | > avg_loss_disc: 2.50427 (+0.12189)
     | > avg_loss_disc_real_0: 0.13471 (+0.08850)
     | > avg_loss_disc_real_1: 0.15854 (-0.00819)
     | > avg_loss_disc_real_2: 0.27302 (+0.03985)
     | > avg_loss_disc_real_3: 0.20752 (-0.03294)
     | > avg_loss_disc_real_4: 0.24724 (+0.05093)
     | > avg_loss_disc_real_5: 0.25293 (+0.04243)
     | > avg_loss_0: 2.50427 (+0.12189)
     | > avg_loss_gen: 2.09459 (+0.01346)
     | > avg_loss_kl: 1.09255 (-0.32860)
     | > avg_loss_feat: 2.72701 (-0.34355)
     | > avg_loss_mel: 21.66758 (+0.68533)
     | > avg_loss_duration: 1.91682 (-0.00485)
     | > avg_loss_1: 29.49856 (+0.02179)


 > EPOCH: 50/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 07:38:43)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 4000
     | > loss_disc: 2.45059  (2.45059)
     | > loss_disc_real_0: 0.12028  (0.12028)
     | > loss_disc_real_1: 0.15601  (0.15601)
     | > loss_disc_real_2: 0.25997  (0.25997)
     | > loss_disc_real_3: 0.23200  (0.23200)
     | > loss_disc_real_4: 0.26410  (0.26410)
     | > loss_disc_real_5: 0.26304  (0.26304)
     | > loss_0: 2.45059  (2.45059)
     | > grad_norm_0: 468.61627  (468.61627)
     | > loss_gen: 2.33305  (2.33305)
     | > loss_kl: 1.01605  (1.01605)
     | > loss_feat: 3.25283  (3.25283)
     | > loss_mel: 21.70832  (21.70832)
     | > loss_duration: 1.69601  (1.69601)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.00627  (30.00627)
     | > grad_norm_1: 2702.96069  (2702.96069)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.59030  (3.59027)
     | > loader_time: 23.11660  (23.11658)


   --> STEP: 25/80 -- GLOBAL_STEP: 4025
     | > loss_disc: 2.55121  (2.4384



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00100)
     | > avg_loss_disc: 2.42390 (-0.08037)
     | > avg_loss_disc_real_0: 0.12203 (-0.01267)
     | > avg_loss_disc_real_1: 0.17571 (+0.01717)
     | > avg_loss_disc_real_2: 0.23577 (-0.03725)
     | > avg_loss_disc_real_3: 0.21955 (+0.01203)
     | > avg_loss_disc_real_4: 0.26533 (+0.01810)
     | > avg_loss_disc_real_5: 0.19029 (-0.06263)
     | > avg_loss_0: 2.42390 (-0.08037)
     | > avg_loss_gen: 2.16092 (+0.06633)
     | > avg_loss_kl: 1.22100 (+0.12846)
     | > avg_loss_feat: 3.03228 (+0.30526)
     | > avg_loss_mel: 22.04301 (+0.37543)
     | > avg_loss_duration: 1.88270 (-0.03412)
     | > avg_loss_1: 30.33992 (+0.84135)


 > EPOCH: 51/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 07:44:18)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 4100
     | > loss_disc: 2.45933  (2.45264)
     | > loss_disc_real_0: 0.13157  (0.09393)
     | > loss_disc_real_1: 0.26163  (0.21792)
     | > loss_disc_real_2: 0.20058  (0.22795)
     | > loss_disc_real_3: 0.28813  (0.23996)
     | > loss_disc_real_4: 0.28881  (0.24207)
     | > loss_disc_real_5: 0.23742  (0.24354)
     | > loss_0: 2.45933  (2.45264)
     | > grad_norm_0: 460.09213  (377.03857)
     | > loss_gen: 2.36096  (2.33609)
     | > loss_kl: 1.24699  (1.15039)
     | > loss_feat: 3.16432  (3.20139)
     | > loss_mel: 22.01972  (22.20677)
     | > loss_duration: 1.63434  (1.64996)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.42633  (30.54460)
     | > grad_norm_1: 2494.50244  (2812.35278)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.61430  (3.57956)
     | > loader_time: 0.00900  (0.00876)


   --> STEP: 45/80 -- GLOBAL_STEP: 4125
     | > loss_disc: 2.36956  (2.43683



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00100)
     | > avg_loss_disc: 2.31685 (-0.10705)
     | > avg_loss_disc_real_0: 0.06072 (-0.06131)
     | > avg_loss_disc_real_1: 0.20389 (+0.02818)
     | > avg_loss_disc_real_2: 0.23905 (+0.00328)
     | > avg_loss_disc_real_3: 0.29749 (+0.07794)
     | > avg_loss_disc_real_4: 0.19088 (-0.07445)
     | > avg_loss_disc_real_5: 0.22872 (+0.03843)
     | > avg_loss_0: 2.31685 (-0.10705)
     | > avg_loss_gen: 2.43223 (+0.27131)
     | > avg_loss_kl: 0.93035 (-0.29066)
     | > avg_loss_feat: 3.70985 (+0.67757)
     | > avg_loss_mel: 23.23039 (+1.18738)
     | > avg_loss_duration: 1.89906 (+0.01636)
     | > avg_loss_1: 32.20188 (+1.86197)


 > EPOCH: 52/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 07:49:52)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 4175
     | > loss_disc: 2.45455  (2.51608)
     | > loss_disc_real_0: 0.08678  (0.13650)
     | > loss_disc_real_1: 0.28410  (0.23154)
     | > loss_disc_real_2: 0.23601  (0.23646)
     | > loss_disc_real_3: 0.24628  (0.23554)
     | > loss_disc_real_4: 0.26238  (0.24133)
     | > loss_disc_real_5: 0.22651  (0.24317)
     | > loss_0: 2.45455  (2.51608)
     | > grad_norm_0: 215.38817  (326.32608)
     | > loss_gen: 2.18511  (2.24188)
     | > loss_kl: 1.43612  (1.22989)
     | > loss_feat: 2.78041  (2.91381)
     | > loss_mel: 22.81859  (22.37452)
     | > loss_duration: 1.68841  (1.66045)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.90864  (30.42055)
     | > grad_norm_1: 2624.38843  (2394.72119)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.60230  (3.56985)
     | > loader_time: 0.00900  (0.00847)


   --> STEP: 40/80 -- GLOBAL_STEP: 4200
     | > loss_disc: 2.58829  (2.51237



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.38407 (+0.06722)
     | > avg_loss_disc_real_0: 0.03085 (-0.02987)
     | > avg_loss_disc_real_1: 0.27061 (+0.06672)
     | > avg_loss_disc_real_2: 0.26831 (+0.02926)
     | > avg_loss_disc_real_3: 0.27890 (-0.01859)
     | > avg_loss_disc_real_4: 0.23346 (+0.04257)
     | > avg_loss_disc_real_5: 0.18943 (-0.03929)
     | > avg_loss_0: 2.38407 (+0.06722)
     | > avg_loss_gen: 2.25109 (-0.18114)
     | > avg_loss_kl: 1.08508 (+0.15473)
     | > avg_loss_feat: 3.21325 (-0.49660)
     | > avg_loss_mel: 24.34504 (+1.11465)
     | > avg_loss_duration: 1.91666 (+0.01760)
     | > avg_loss_1: 32.81113 (+0.60925)


 > EPOCH: 53/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 07:55:27)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 4250
     | > loss_disc: 2.84319  (2.49102)
     | > loss_disc_real_0: 0.46963  (0.13327)
     | > loss_disc_real_1: 0.25764  (0.22638)
     | > loss_disc_real_2: 0.24501  (0.24330)
     | > loss_disc_real_3: 0.30763  (0.23220)
     | > loss_disc_real_4: 0.30000  (0.24531)
     | > loss_disc_real_5: 0.31218  (0.25012)
     | > loss_0: 2.84319  (2.49102)
     | > grad_norm_0: 1015.82568  (468.43555)
     | > loss_gen: 2.52562  (2.44677)
     | > loss_kl: 1.19716  (1.24992)
     | > loss_feat: 2.71531  (3.26724)
     | > loss_mel: 22.27339  (22.76016)
     | > loss_duration: 1.66974  (1.67133)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.38121  (31.39541)
     | > grad_norm_1: 1841.10486  (1426.15613)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.59130  (3.56075)
     | > loader_time: 0.00800  (0.00810)


   --> STEP: 35/80 -- GLOBAL_STEP: 4275
     | > loss_disc: 2.58912  (2.5088



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00000)
     | > avg_loss_disc: 2.37240 (-0.01166)
     | > avg_loss_disc_real_0: 0.13036 (+0.09951)
     | > avg_loss_disc_real_1: 0.19880 (-0.07181)
     | > avg_loss_disc_real_2: 0.14449 (-0.12382)
     | > avg_loss_disc_real_3: 0.23806 (-0.04083)
     | > avg_loss_disc_real_4: 0.24690 (+0.01344)
     | > avg_loss_disc_real_5: 0.29123 (+0.10180)
     | > avg_loss_0: 2.37240 (-0.01166)
     | > avg_loss_gen: 2.48103 (+0.22994)
     | > avg_loss_kl: 0.99770 (-0.08739)
     | > avg_loss_feat: 2.78949 (-0.42377)
     | > avg_loss_mel: 20.26794 (-4.07710)
     | > avg_loss_duration: 1.88841 (-0.02825)
     | > avg_loss_1: 28.42457 (-4.38656)


 > EPOCH: 54/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 08:01:02)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 4325
     | > loss_disc: 2.50581  (2.45329)
     | > loss_disc_real_0: 0.13095  (0.11745)
     | > loss_disc_real_1: 0.15034  (0.20734)
     | > loss_disc_real_2: 0.25299  (0.24468)
     | > loss_disc_real_3: 0.25009  (0.23528)
     | > loss_disc_real_4: 0.24666  (0.23715)
     | > loss_disc_real_5: 0.25447  (0.23883)
     | > loss_0: 2.50581  (2.45329)
     | > grad_norm_0: 835.05170  (412.58292)
     | > loss_gen: 2.22807  (2.30986)
     | > loss_kl: 1.14049  (1.15566)
     | > loss_feat: 3.39057  (3.25352)
     | > loss_mel: 22.37683  (22.90374)
     | > loss_duration: 1.66571  (1.67004)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.80167  (31.29283)
     | > grad_norm_1: 3077.48804  (3064.49463)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.56720  (3.53813)
     | > loader_time: 0.00800  (0.00740)


   --> STEP: 30/80 -- GLOBAL_STEP: 4350
     | > loss_disc: 2.29991  (2.42642)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00000)
     | > avg_loss_disc: 2.31766 (-0.05474)
     | > avg_loss_disc_real_0: 0.12086 (-0.00950)
     | > avg_loss_disc_real_1: 0.23259 (+0.03380)
     | > avg_loss_disc_real_2: 0.20829 (+0.06380)
     | > avg_loss_disc_real_3: 0.19264 (-0.04542)
     | > avg_loss_disc_real_4: 0.24928 (+0.00238)
     | > avg_loss_disc_real_5: 0.24758 (-0.04366)
     | > avg_loss_0: 2.31766 (-0.05474)
     | > avg_loss_gen: 2.31694 (-0.16409)
     | > avg_loss_kl: 1.02639 (+0.02869)
     | > avg_loss_feat: 3.13204 (+0.34255)
     | > avg_loss_mel: 22.00600 (+1.73806)
     | > avg_loss_duration: 1.90444 (+0.01602)
     | > avg_loss_1: 30.38581 (+1.96124)


 > EPOCH: 55/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 08:06:37)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 4400
     | > loss_disc: 2.36164  (2.36164)
     | > loss_disc_real_0: 0.10564  (0.10564)
     | > loss_disc_real_1: 0.21340  (0.21340)
     | > loss_disc_real_2: 0.15956  (0.15956)
     | > loss_disc_real_3: 0.18840  (0.18840)
     | > loss_disc_real_4: 0.26685  (0.26685)
     | > loss_disc_real_5: 0.21839  (0.21839)
     | > loss_0: 2.36164  (2.36164)
     | > grad_norm_0: 570.24640  (570.24640)
     | > loss_gen: 2.47983  (2.47983)
     | > loss_kl: 1.25530  (1.25530)
     | > loss_feat: 3.67584  (3.67584)
     | > loss_mel: 22.29555  (22.29555)
     | > loss_duration: 1.69939  (1.69939)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 31.40591  (31.40591)
     | > grad_norm_1: 3284.15015  (3284.15015)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.55830  (3.55825)
     | > loader_time: 23.10380  (23.10379)


   --> STEP: 25/80 -- GLOBAL_STEP: 4425
     | > loss_disc: 2.38447  (2.4108



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.53345 (+0.21578)
     | > avg_loss_disc_real_0: 0.11537 (-0.00549)
     | > avg_loss_disc_real_1: 0.20004 (-0.03255)
     | > avg_loss_disc_real_2: 0.21500 (+0.00672)
     | > avg_loss_disc_real_3: 0.25727 (+0.06463)
     | > avg_loss_disc_real_4: 0.23903 (-0.01025)
     | > avg_loss_disc_real_5: 0.25984 (+0.01226)
     | > avg_loss_0: 2.53345 (+0.21578)
     | > avg_loss_gen: 2.01171 (-0.30524)
     | > avg_loss_kl: 1.26401 (+0.23762)
     | > avg_loss_feat: 2.47999 (-0.65205)
     | > avg_loss_mel: 21.37012 (-0.63589)
     | > avg_loss_duration: 1.89558 (-0.00886)
     | > avg_loss_1: 29.02140 (-1.36441)


 > EPOCH: 56/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 08:12:12)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 4500
     | > loss_disc: 2.67609  (2.42573)
     | > loss_disc_real_0: 0.10001  (0.08739)
     | > loss_disc_real_1: 0.23123  (0.22262)
     | > loss_disc_real_2: 0.23526  (0.22929)
     | > loss_disc_real_3: 0.27337  (0.23072)
     | > loss_disc_real_4: 0.22991  (0.23830)
     | > loss_disc_real_5: 0.20814  (0.24257)
     | > loss_0: 2.67609  (2.42573)
     | > grad_norm_0: 402.18387  (396.48032)
     | > loss_gen: 1.99625  (2.37035)
     | > loss_kl: 1.06524  (1.19573)
     | > loss_feat: 2.50087  (3.36152)
     | > loss_mel: 21.56965  (22.00625)
     | > loss_duration: 1.66849  (1.64794)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 28.80051  (30.58179)
     | > grad_norm_1: 438.93750  (2679.56812)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.61130  (3.57824)
     | > loader_time: 0.01000  (0.00886)


   --> STEP: 45/80 -- GLOBAL_STEP: 4525
     | > loss_disc: 3.10409  (2.50400)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.67446 (+0.14101)
     | > avg_loss_disc_real_0: 0.44469 (+0.32932)
     | > avg_loss_disc_real_1: 0.20645 (+0.00640)
     | > avg_loss_disc_real_2: 0.25738 (+0.04238)
     | > avg_loss_disc_real_3: 0.28710 (+0.02983)
     | > avg_loss_disc_real_4: 0.19382 (-0.04522)
     | > avg_loss_disc_real_5: 0.19674 (-0.06309)
     | > avg_loss_0: 2.67446 (+0.14101)
     | > avg_loss_gen: 2.35627 (+0.34457)
     | > avg_loss_kl: 1.40667 (+0.14266)
     | > avg_loss_feat: 2.42117 (-0.05882)
     | > avg_loss_mel: 22.26902 (+0.89891)
     | > avg_loss_duration: 1.90152 (+0.00594)
     | > avg_loss_1: 30.35465 (+1.33326)


 > EPOCH: 57/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 08:17:46)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 4575
     | > loss_disc: 2.39990  (2.55720)
     | > loss_disc_real_0: 0.08653  (0.15028)
     | > loss_disc_real_1: 0.18022  (0.22615)
     | > loss_disc_real_2: 0.15949  (0.22517)
     | > loss_disc_real_3: 0.18585  (0.22584)
     | > loss_disc_real_4: 0.21106  (0.24143)
     | > loss_disc_real_5: 0.26574  (0.24944)
     | > loss_0: 2.39990  (2.55720)
     | > grad_norm_0: 27.57095  (275.56882)
     | > loss_gen: 2.23904  (2.19325)
     | > loss_kl: 1.26946  (1.24128)
     | > loss_feat: 3.25738  (2.90134)
     | > loss_mel: 22.92831  (22.60694)
     | > loss_duration: 1.68500  (1.64968)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 31.37920  (30.59249)
     | > grad_norm_1: 1800.19214  (2211.10010)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.60930  (3.56695)
     | > loader_time: 0.01000  (0.00874)


   --> STEP: 40/80 -- GLOBAL_STEP: 4600
     | > loss_disc: 2.54922  (2.46867)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00100)
     | > avg_loss_disc: 2.27960 (-0.39486)
     | > avg_loss_disc_real_0: 0.06302 (-0.38167)
     | > avg_loss_disc_real_1: 0.23331 (+0.02686)
     | > avg_loss_disc_real_2: 0.22386 (-0.03352)
     | > avg_loss_disc_real_3: 0.23850 (-0.04860)
     | > avg_loss_disc_real_4: 0.24077 (+0.04695)
     | > avg_loss_disc_real_5: 0.21123 (+0.01448)
     | > avg_loss_0: 2.27960 (-0.39486)
     | > avg_loss_gen: 2.45377 (+0.09750)
     | > avg_loss_kl: 1.51783 (+0.11116)
     | > avg_loss_feat: 3.63224 (+1.21107)
     | > avg_loss_mel: 22.34510 (+0.07608)
     | > avg_loss_duration: 1.89042 (-0.01110)
     | > avg_loss_1: 31.83938 (+1.48472)


 > EPOCH: 58/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 08:23:21)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 4650
     | > loss_disc: 2.37654  (2.40995)
     | > loss_disc_real_0: 0.05138  (0.08588)
     | > loss_disc_real_1: 0.22450  (0.22001)
     | > loss_disc_real_2: 0.24690  (0.22901)
     | > loss_disc_real_3: 0.26651  (0.23219)
     | > loss_disc_real_4: 0.29007  (0.24774)
     | > loss_disc_real_5: 0.22881  (0.24901)
     | > loss_0: 2.37654  (2.40995)
     | > grad_norm_0: 148.24498  (304.49100)
     | > loss_gen: 2.21046  (2.38747)
     | > loss_kl: 1.17786  (1.12765)
     | > loss_feat: 3.20661  (3.38728)
     | > loss_mel: 20.71030  (22.04588)
     | > loss_duration: 1.61848  (1.66016)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 28.92370  (30.60844)
     | > grad_norm_1: 2798.57056  (2826.59058)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.59430  (3.55824)
     | > loader_time: 0.00800  (0.00831)


   --> STEP: 35/80 -- GLOBAL_STEP: 4675
     | > loss_disc: 2.47757  (2.41850



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00100)
     | > avg_loss_disc: 2.42040 (+0.14079)
     | > avg_loss_disc_real_0: 0.04952 (-0.01350)
     | > avg_loss_disc_real_1: 0.21053 (-0.02278)
     | > avg_loss_disc_real_2: 0.26040 (+0.03654)
     | > avg_loss_disc_real_3: 0.26060 (+0.02210)
     | > avg_loss_disc_real_4: 0.25788 (+0.01711)
     | > avg_loss_disc_real_5: 0.23302 (+0.02180)
     | > avg_loss_0: 2.42040 (+0.14079)
     | > avg_loss_gen: 2.20118 (-0.25259)
     | > avg_loss_kl: 1.06925 (-0.44859)
     | > avg_loss_feat: 2.90443 (-0.72781)
     | > avg_loss_mel: 20.57463 (-1.77047)
     | > avg_loss_duration: 1.87596 (-0.01446)
     | > avg_loss_1: 28.62545 (-3.21393)


 > EPOCH: 59/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 08:28:56)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 4725
     | > loss_disc: 2.52371  (2.40324)
     | > loss_disc_real_0: 0.23572  (0.10263)
     | > loss_disc_real_1: 0.28275  (0.22972)
     | > loss_disc_real_2: 0.23958  (0.22022)
     | > loss_disc_real_3: 0.19532  (0.21443)
     | > loss_disc_real_4: 0.23114  (0.22776)
     | > loss_disc_real_5: 0.24468  (0.23598)
     | > loss_0: 2.52371  (2.40324)
     | > grad_norm_0: 781.28351  (393.45041)
     | > loss_gen: 2.64972  (2.50454)
     | > loss_kl: 1.49031  (1.12784)
     | > loss_feat: 3.41757  (3.49659)
     | > loss_mel: 22.55595  (22.61033)
     | > loss_duration: 1.63326  (1.64395)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 31.74681  (31.38325)
     | > grad_norm_1: 2352.10620  (3134.51685)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.56030  (3.53446)
     | > loader_time: 0.00900  (0.00800)


   --> STEP: 30/80 -- GLOBAL_STEP: 4750
     | > loss_disc: 2.47427  (2.44344)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.34796 (-0.07244)
     | > avg_loss_disc_real_0: 0.04893 (-0.00059)
     | > avg_loss_disc_real_1: 0.33678 (+0.12625)
     | > avg_loss_disc_real_2: 0.23117 (-0.02922)
     | > avg_loss_disc_real_3: 0.19894 (-0.06166)
     | > avg_loss_disc_real_4: 0.25074 (-0.00714)
     | > avg_loss_disc_real_5: 0.22853 (-0.00449)
     | > avg_loss_0: 2.34796 (-0.07244)
     | > avg_loss_gen: 2.51604 (+0.31486)
     | > avg_loss_kl: 1.37851 (+0.30926)
     | > avg_loss_feat: 3.76702 (+0.86259)
     | > avg_loss_mel: 24.38262 (+3.80798)
     | > avg_loss_duration: 1.90343 (+0.02747)
     | > avg_loss_1: 33.94762 (+5.32217)


[4m[1m > EPOCH: 60/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 08:34:31)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 4800
     | > loss_disc: 2.49467  (2.49467)
     | > loss_disc_real_0: 0.05176  (0.05176)
     | > loss_disc_real_1: 0.29083  (0.29083)
     | > loss_disc_real_2: 0.24171  (0.24171)
     | > loss_disc_real_3: 0.22911  (0.22911)
     | > loss_disc_real_4: 0.27435  (0.27435)
     | > loss_disc_real_5: 0.26638  (0.26638)
     | > loss_0: 2.49467  (2.49467)
     | > grad_norm_0: 336.09592  (336.09592)
     | > loss_gen: 2.33709  (2.33709)
     | > loss_kl: 1.10549  (1.10549)
     | > loss_feat: 3.36186  (3.36186)
     | > loss_mel: 22.27252  (22.27252)
     | > loss_duration: 1.73743  (1.73743)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.81440  (30.81440)
     | > grad_norm_1: 1652.38306  (1652.38306)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.55220  (3.55223)
     | > loader_time: 23.15810  (23.15813)


   --> STEP: 25/80 -- GLOBAL_STEP: 4825
     | > loss_disc: 2.30602  (2.4026



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.36111 (+0.01315)
     | > avg_loss_disc_real_0: 0.01408 (-0.03485)
     | > avg_loss_disc_real_1: 0.25818 (-0.07860)
     | > avg_loss_disc_real_2: 0.24816 (+0.01699)
     | > avg_loss_disc_real_3: 0.24127 (+0.04232)
     | > avg_loss_disc_real_4: 0.26478 (+0.01403)
     | > avg_loss_disc_real_5: 0.23832 (+0.00979)
     | > avg_loss_0: 2.36111 (+0.01315)
     | > avg_loss_gen: 2.42256 (-0.09348)
     | > avg_loss_kl: 1.18440 (-0.19411)
     | > avg_loss_feat: 3.49052 (-0.27650)
     | > avg_loss_mel: 20.74347 (-3.63914)
     | > avg_loss_duration: 1.87364 (-0.02978)
     | > avg_loss_1: 29.71460 (-4.23302)


[4m[1m > EPOCH: 61/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 08:40:06)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 4900
     | > loss_disc: 2.52210  (2.46824)
     | > loss_disc_real_0: 0.07542  (0.11564)
     | > loss_disc_real_1: 0.28110  (0.21786)
     | > loss_disc_real_2: 0.26821  (0.23443)
     | > loss_disc_real_3: 0.21048  (0.22797)
     | > loss_disc_real_4: 0.24302  (0.23641)
     | > loss_disc_real_5: 0.26296  (0.24015)
     | > loss_0: 2.52210  (2.46824)
     | > grad_norm_0: 637.05890  (657.87067)
     | > loss_gen: 2.25017  (2.38115)
     | > loss_kl: 1.06856  (1.16817)
     | > loss_feat: 3.17028  (3.29776)
     | > loss_mel: 22.42452  (22.04692)
     | > loss_duration: 1.68600  (1.64307)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.59953  (30.53707)
     | > grad_norm_1: 3667.99023  (3194.32349)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.62030  (3.58256)
     | > loader_time: 0.01200  (0.00931)


   --> STEP: 45/80 -- GLOBAL_STEP: 4925
     | > loss_disc: 2.55387  (2.44536



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00801 (-0.00100)
     | > avg_loss_disc: 2.24373 (-0.11738)
     | > avg_loss_disc_real_0: 0.02759 (+0.01351)
     | > avg_loss_disc_real_1: 0.19506 (-0.06312)
     | > avg_loss_disc_real_2: 0.23742 (-0.01074)
     | > avg_loss_disc_real_3: 0.13369 (-0.10758)
     | > avg_loss_disc_real_4: 0.23283 (-0.03195)
     | > avg_loss_disc_real_5: 0.22491 (-0.01340)
     | > avg_loss_0: 2.24373 (-0.11738)
     | > avg_loss_gen: 2.22277 (-0.19979)
     | > avg_loss_kl: 1.13477 (-0.04963)
     | > avg_loss_feat: 3.43519 (-0.05533)
     | > avg_loss_mel: 21.82456 (+1.08108)
     | > avg_loss_duration: 1.93597 (+0.06233)
     | > avg_loss_1: 30.55326 (+0.83866)


[4m[1m > EPOCH: 62/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 08:45:40)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 4975
     | > loss_disc: 2.34075  (2.36975)
     | > loss_disc_real_0: 0.11191  (0.05254)
     | > loss_disc_real_1: 0.18903  (0.23227)
     | > loss_disc_real_2: 0.23560  (0.23362)
     | > loss_disc_real_3: 0.18710  (0.22962)
     | > loss_disc_real_4: 0.18554  (0.24421)
     | > loss_disc_real_5: 0.22880  (0.24122)
     | > loss_0: 2.34075  (2.36975)
     | > grad_norm_0: 224.53087  (135.96480)
     | > loss_gen: 2.81735  (2.48782)
     | > loss_kl: 1.17906  (1.26650)
     | > loss_feat: 4.02657  (3.65751)
     | > loss_mel: 23.76001  (22.63483)
     | > loss_duration: 1.63754  (1.65344)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 33.42052  (31.70010)
     | > grad_norm_1: 615.74872  (1503.28931)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.58730  (3.57181)
     | > loader_time: 0.00900  (0.00841)


   --> STEP: 40/80 -- GLOBAL_STEP: 5000
     | > loss_disc: 2.27848  (2.37823)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00100)
     | > avg_loss_disc: 2.53060 (+0.28688)
     | > avg_loss_disc_real_0: 0.21171 (+0.18413)
     | > avg_loss_disc_real_1: 0.16138 (-0.03368)
     | > avg_loss_disc_real_2: 0.17853 (-0.05889)
     | > avg_loss_disc_real_3: 0.27514 (+0.14145)
     | > avg_loss_disc_real_4: 0.22625 (-0.00658)
     | > avg_loss_disc_real_5: 0.22236 (-0.00256)
     | > avg_loss_0: 2.53060 (+0.28688)
     | > avg_loss_gen: 2.16924 (-0.05353)
     | > avg_loss_kl: 1.39162 (+0.25685)
     | > avg_loss_feat: 2.59307 (-0.84212)
     | > avg_loss_mel: 20.78308 (-1.04148)
     | > avg_loss_duration: 1.88703 (-0.04894)
     | > avg_loss_1: 28.82404 (-1.72921)


[4m[1m > EPOCH: 63/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 08:51:15)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 5050
     | > loss_disc: 2.46112  (2.50532)
     | > loss_disc_real_0: 0.16820  (0.13281)
     | > loss_disc_real_1: 0.20198  (0.21553)
     | > loss_disc_real_2: 0.28341  (0.23299)
     | > loss_disc_real_3: 0.17851  (0.22285)
     | > loss_disc_real_4: 0.27276  (0.24546)
     | > loss_disc_real_5: 0.29042  (0.24772)
     | > loss_0: 2.46112  (2.50532)
     | > grad_norm_0: 83.22794  (215.11534)
     | > loss_gen: 2.38523  (2.27566)
     | > loss_kl: 1.21820  (1.24220)
     | > loss_feat: 3.15276  (3.08458)
     | > loss_mel: 21.96815  (21.84355)
     | > loss_duration: 1.58994  (1.64828)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.31428  (30.09426)
     | > grad_norm_1: 4300.89502  (2282.67822)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.58330  (3.55624)
     | > loader_time: 0.01000  (0.00820)


   --> STEP: 35/80 -- GLOBAL_STEP: 5075
     | > loss_disc: 2.23407  (2.44978)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00801 (-0.00100)
     | > avg_loss_disc: 2.29482 (-0.23578)
     | > avg_loss_disc_real_0: 0.05039 (-0.16132)
     | > avg_loss_disc_real_1: 0.23970 (+0.07832)
     | > avg_loss_disc_real_2: 0.26217 (+0.08364)
     | > avg_loss_disc_real_3: 0.25361 (-0.02153)
     | > avg_loss_disc_real_4: 0.20491 (-0.02134)
     | > avg_loss_disc_real_5: 0.24487 (+0.02252)
     | > avg_loss_0: 2.29482 (-0.23578)
     | > avg_loss_gen: 2.42424 (+0.25500)
     | > avg_loss_kl: 1.16929 (-0.22234)
     | > avg_loss_feat: 3.26476 (+0.67170)
     | > avg_loss_mel: 20.41960 (-0.36348)
     | > avg_loss_duration: 1.90024 (+0.01321)
     | > avg_loss_1: 29.17813 (+0.35409)


[4m[1m > EPOCH: 64/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 08:56:50)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 5125
     | > loss_disc: 2.65455  (2.86014)
     | > loss_disc_real_0: 0.28865  (0.24036)
     | > loss_disc_real_1: 0.12445  (0.18843)
     | > loss_disc_real_2: 0.19278  (0.21273)
     | > loss_disc_real_3: 0.19795  (0.22144)
     | > loss_disc_real_4: 0.21193  (0.24475)
     | > loss_disc_real_5: 0.24478  (0.24337)
     | > loss_0: 2.65455  (2.86014)
     | > grad_norm_0: 462.15643  (577.28119)
     | > loss_gen: 2.15578  (2.03241)
     | > loss_kl: 1.19059  (1.24675)
     | > loss_feat: 2.79081  (2.59361)
     | > loss_mel: 21.95756  (22.12987)
     | > loss_duration: 1.69738  (1.64418)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 29.79212  (29.64682)
     | > grad_norm_1: 2184.93970  (1664.02795)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.56720  (3.53922)
     | > loader_time: 0.00800  (0.00821)


   --> STEP: 30/80 -- GLOBAL_STEP: 5150
     | > loss_disc: 2.75753  (2.60622)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00100)
     | > avg_loss_disc: 2.52364 (+0.22881)
     | > avg_loss_disc_real_0: 0.07441 (+0.02402)
     | > avg_loss_disc_real_1: 0.19394 (-0.04576)
     | > avg_loss_disc_real_2: 0.21499 (-0.04719)
     | > avg_loss_disc_real_3: 0.19645 (-0.05716)
     | > avg_loss_disc_real_4: 0.19320 (-0.01171)
     | > avg_loss_disc_real_5: 0.26034 (+0.01547)
     | > avg_loss_0: 2.52364 (+0.22881)
     | > avg_loss_gen: 1.79587 (-0.62837)
     | > avg_loss_kl: 1.11775 (-0.05154)
     | > avg_loss_feat: 2.27902 (-0.98575)
     | > avg_loss_mel: 19.95484 (-0.46476)
     | > avg_loss_duration: 1.90388 (+0.00364)
     | > avg_loss_1: 27.05137 (-2.12676)

 > BEST MODEL : ./output\vits_vctk-September-23-2022_02+46AM-3c624ce\best_model_5200.pth

[4m[1m > EPOCH: 65/1000[0m
 --> ./output\vits_vctk-Sep



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 09:02:29)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 5200
     | > loss_disc: 2.49661  (2.49661)
     | > loss_disc_real_0: 0.09346  (0.09346)
     | > loss_disc_real_1: 0.19405  (0.19405)
     | > loss_disc_real_2: 0.21351  (0.21351)
     | > loss_disc_real_3: 0.21105  (0.21105)
     | > loss_disc_real_4: 0.22537  (0.22537)
     | > loss_disc_real_5: 0.24954  (0.24954)
     | > loss_0: 2.49661  (2.49661)
     | > grad_norm_0: 139.47813  (139.47813)
     | > loss_gen: 2.29099  (2.29099)
     | > loss_kl: 1.03188  (1.03188)
     | > loss_feat: 2.61739  (2.61739)
     | > loss_mel: 22.03330  (22.03330)
     | > loss_duration: 1.63109  (1.63109)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 29.60465  (29.60465)
     | > grad_norm_1: 2584.36108  (2584.36108)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.54520  (3.54523)
     | > loader_time: 23.26100  (23.26101)


   --> STEP: 25/80 -- GLOBAL_STEP: 5225
     | > loss_disc: 2.49151  (2.4324



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.53982 (+0.01618)
     | > avg_loss_disc_real_0: 0.10224 (+0.02783)
     | > avg_loss_disc_real_1: 0.21900 (+0.02505)
     | > avg_loss_disc_real_2: 0.21721 (+0.00222)
     | > avg_loss_disc_real_3: 0.20113 (+0.00468)
     | > avg_loss_disc_real_4: 0.19743 (+0.00423)
     | > avg_loss_disc_real_5: 0.23023 (-0.03011)
     | > avg_loss_0: 2.53982 (+0.01618)
     | > avg_loss_gen: 1.97027 (+0.17440)
     | > avg_loss_kl: 1.04539 (-0.07236)
     | > avg_loss_feat: 2.49926 (+0.22024)
     | > avg_loss_mel: 19.90577 (-0.04908)
     | > avg_loss_duration: 1.86713 (-0.03675)
     | > avg_loss_1: 27.28782 (+0.23645)


[4m[1m > EPOCH: 66/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 09:08:04)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 5300
     | > loss_disc: 2.34196  (2.40669)
     | > loss_disc_real_0: 0.06109  (0.10186)
     | > loss_disc_real_1: 0.25146  (0.21659)
     | > loss_disc_real_2: 0.20658  (0.22223)
     | > loss_disc_real_3: 0.20319  (0.23030)
     | > loss_disc_real_4: 0.19778  (0.23992)
     | > loss_disc_real_5: 0.20614  (0.23882)
     | > loss_0: 2.34196  (2.40669)
     | > grad_norm_0: 94.10439  (401.00595)
     | > loss_gen: 2.46929  (2.34985)
     | > loss_kl: 1.31601  (1.18938)
     | > loss_feat: 3.92657  (3.41091)
     | > loss_mel: 20.90821  (21.55198)
     | > loss_duration: 1.67093  (1.63121)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.29101  (30.13332)
     | > grad_norm_1: 2275.27051  (2969.47437)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.63930  (3.58571)
     | > loader_time: 0.01000  (0.00891)


   --> STEP: 45/80 -- GLOBAL_STEP: 5325
     | > loss_disc: 2.42090  (2.38927)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.54686 (+0.00704)
     | > avg_loss_disc_real_0: 0.18789 (+0.08565)
     | > avg_loss_disc_real_1: 0.22908 (+0.01008)
     | > avg_loss_disc_real_2: 0.24928 (+0.03207)
     | > avg_loss_disc_real_3: 0.27972 (+0.07859)
     | > avg_loss_disc_real_4: 0.26197 (+0.06454)
     | > avg_loss_disc_real_5: 0.26342 (+0.03319)
     | > avg_loss_0: 2.54686 (+0.00704)
     | > avg_loss_gen: 2.29406 (+0.32379)
     | > avg_loss_kl: 1.02388 (-0.02151)
     | > avg_loss_feat: 2.57198 (+0.07273)
     | > avg_loss_mel: 19.90384 (-0.00192)
     | > avg_loss_duration: 1.88792 (+0.02079)
     | > avg_loss_1: 27.68169 (+0.39388)


[4m[1m > EPOCH: 67/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 09:13:39)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 5375
     | > loss_disc: 2.28469  (2.44836)
     | > loss_disc_real_0: 0.08764  (0.08137)
     | > loss_disc_real_1: 0.16948  (0.23889)
     | > loss_disc_real_2: 0.21197  (0.22777)
     | > loss_disc_real_3: 0.23745  (0.23589)
     | > loss_disc_real_4: 0.19826  (0.24220)
     | > loss_disc_real_5: 0.22906  (0.24789)
     | > loss_0: 2.28469  (2.44836)
     | > grad_norm_0: 71.74471  (282.23563)
     | > loss_gen: 2.45448  (2.45457)
     | > loss_kl: 1.48090  (1.37332)
     | > loss_feat: 3.79401  (3.57972)
     | > loss_mel: 21.66088  (22.31684)
     | > loss_duration: 1.67848  (1.63795)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 31.06875  (31.36240)
     | > grad_norm_1: 1762.99133  (1464.48657)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.59430  (3.56678)
     | > loader_time: 0.00900  (0.00847)


   --> STEP: 40/80 -- GLOBAL_STEP: 5400
     | > loss_disc: 2.38935  (2.40182)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00000)
     | > avg_loss_disc: 2.42790 (-0.11896)
     | > avg_loss_disc_real_0: 0.06117 (-0.12671)
     | > avg_loss_disc_real_1: 0.29783 (+0.06876)
     | > avg_loss_disc_real_2: 0.26617 (+0.01689)
     | > avg_loss_disc_real_3: 0.28365 (+0.00393)
     | > avg_loss_disc_real_4: 0.25660 (-0.00537)
     | > avg_loss_disc_real_5: 0.20514 (-0.05828)
     | > avg_loss_0: 2.42790 (-0.11896)
     | > avg_loss_gen: 2.39989 (+0.10583)
     | > avg_loss_kl: 0.99369 (-0.03019)
     | > avg_loss_feat: 3.20370 (+0.63172)
     | > avg_loss_mel: 20.10930 (+0.20546)
     | > avg_loss_duration: 1.87085 (-0.01707)
     | > avg_loss_1: 28.57743 (+0.89574)


[4m[1m > EPOCH: 68/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 09:19:13)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 5450
     | > loss_disc: 2.37357  (2.45273)
     | > loss_disc_real_0: 0.04402  (0.08689)
     | > loss_disc_real_1: 0.19948  (0.21927)
     | > loss_disc_real_2: 0.29977  (0.22998)
     | > loss_disc_real_3: 0.20530  (0.22613)
     | > loss_disc_real_4: 0.22140  (0.24185)
     | > loss_disc_real_5: 0.27555  (0.25222)
     | > loss_0: 2.37357  (2.45273)
     | > grad_norm_0: 138.45451  (542.41516)
     | > loss_gen: 2.45726  (2.33926)
     | > loss_kl: 1.17357  (1.22525)
     | > loss_feat: 3.34688  (3.40448)
     | > loss_mel: 21.29163  (21.67567)
     | > loss_duration: 1.66404  (1.63719)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 29.93338  (30.28185)
     | > grad_norm_1: 895.81805  (2578.79907)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.59330  (3.55794)
     | > loader_time: 0.00700  (0.00771)


   --> STEP: 35/80 -- GLOBAL_STEP: 5475
     | > loss_disc: 2.22839  (2.43720)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.28893 (-0.13897)
     | > avg_loss_disc_real_0: 0.07050 (+0.00933)
     | > avg_loss_disc_real_1: 0.21323 (-0.08460)
     | > avg_loss_disc_real_2: 0.27223 (+0.00606)
     | > avg_loss_disc_real_3: 0.25821 (-0.02544)
     | > avg_loss_disc_real_4: 0.20647 (-0.05012)
     | > avg_loss_disc_real_5: 0.22796 (+0.02282)
     | > avg_loss_0: 2.28893 (-0.13897)
     | > avg_loss_gen: 2.38059 (-0.01930)
     | > avg_loss_kl: 1.17463 (+0.18094)
     | > avg_loss_feat: 3.34852 (+0.14482)
     | > avg_loss_mel: 21.15932 (+1.05002)
     | > avg_loss_duration: 1.88659 (+0.01574)
     | > avg_loss_1: 29.94966 (+1.37223)


[4m[1m > EPOCH: 69/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 09:24:48)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 5525
     | > loss_disc: 2.46849  (2.33902)
     | > loss_disc_real_0: 0.09356  (0.06385)
     | > loss_disc_real_1: 0.23720  (0.20933)
     | > loss_disc_real_2: 0.18081  (0.22107)
     | > loss_disc_real_3: 0.16118  (0.21363)
     | > loss_disc_real_4: 0.22005  (0.24146)
     | > loss_disc_real_5: 0.27307  (0.23197)
     | > loss_0: 2.46849  (2.33902)
     | > grad_norm_0: 937.78778  (354.20483)
     | > loss_gen: 2.57093  (2.49481)
     | > loss_kl: 0.83563  (1.07657)
     | > loss_feat: 3.55565  (3.72136)
     | > loss_mel: 20.77104  (21.50784)
     | > loss_duration: 1.58599  (1.63888)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 29.31923  (30.43946)
     | > grad_norm_1: 3348.36865  (3287.72900)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.56830  (3.53222)
     | > loader_time: 0.00800  (0.00761)


   --> STEP: 30/80 -- GLOBAL_STEP: 5550
     | > loss_disc: 2.51787  (2.39412)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00701 (-0.00200)
     | > avg_loss_disc: 2.57642 (+0.28749)
     | > avg_loss_disc_real_0: 0.03329 (-0.03721)
     | > avg_loss_disc_real_1: 0.23675 (+0.02352)
     | > avg_loss_disc_real_2: 0.21534 (-0.05689)
     | > avg_loss_disc_real_3: 0.31908 (+0.06087)
     | > avg_loss_disc_real_4: 0.26501 (+0.05854)
     | > avg_loss_disc_real_5: 0.26137 (+0.03341)
     | > avg_loss_0: 2.57642 (+0.28749)
     | > avg_loss_gen: 2.13488 (-0.24571)
     | > avg_loss_kl: 1.45599 (+0.28136)
     | > avg_loss_feat: 2.62129 (-0.72723)
     | > avg_loss_mel: 21.46075 (+0.30143)
     | > avg_loss_duration: 1.88254 (-0.00406)
     | > avg_loss_1: 29.55545 (-0.39421)


[4m[1m > EPOCH: 70/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 09:30:23)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 5600
     | > loss_disc: 2.37634  (2.37634)
     | > loss_disc_real_0: 0.03384  (0.03384)
     | > loss_disc_real_1: 0.25167  (0.25167)
     | > loss_disc_real_2: 0.19180  (0.19180)
     | > loss_disc_real_3: 0.24751  (0.24751)
     | > loss_disc_real_4: 0.22999  (0.22999)
     | > loss_disc_real_5: 0.24854  (0.24854)
     | > loss_0: 2.37634  (2.37634)
     | > grad_norm_0: 460.27649  (460.27649)
     | > loss_gen: 2.28027  (2.28027)
     | > loss_kl: 0.89704  (0.89704)
     | > loss_feat: 3.78593  (3.78593)
     | > loss_mel: 23.92258  (23.92258)
     | > loss_duration: 1.66106  (1.66106)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 32.54688  (32.54688)
     | > grad_norm_1: 2506.35449  (2506.35449)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.57330  (3.57325)
     | > loader_time: 23.11170  (23.11170)


   --> STEP: 25/80 -- GLOBAL_STEP: 5625
     | > loss_disc: 2.30286  (2.4733



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00300)
     | > avg_loss_disc: 2.36391 (-0.21251)
     | > avg_loss_disc_real_0: 0.04008 (+0.00679)
     | > avg_loss_disc_real_1: 0.20363 (-0.03312)
     | > avg_loss_disc_real_2: 0.23325 (+0.01791)
     | > avg_loss_disc_real_3: 0.27312 (-0.04595)
     | > avg_loss_disc_real_4: 0.28199 (+0.01698)
     | > avg_loss_disc_real_5: 0.23172 (-0.02965)
     | > avg_loss_0: 2.36391 (-0.21251)
     | > avg_loss_gen: 2.40567 (+0.27079)
     | > avg_loss_kl: 1.60099 (+0.14501)
     | > avg_loss_feat: 3.08769 (+0.46640)
     | > avg_loss_mel: 19.47856 (-1.98219)
     | > avg_loss_duration: 1.89515 (+0.01261)
     | > avg_loss_1: 28.46807 (-1.08738)


[4m[1m > EPOCH: 71/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 09:35:57)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 5700
     | > loss_disc: 2.80886  (2.44133)
     | > loss_disc_real_0: 0.01658  (0.08581)
     | > loss_disc_real_1: 0.22278  (0.21951)
     | > loss_disc_real_2: 0.21263  (0.22862)
     | > loss_disc_real_3: 0.22141  (0.22516)
     | > loss_disc_real_4: 0.26056  (0.24082)
     | > loss_disc_real_5: 0.25466  (0.24517)
     | > loss_0: 2.80886  (2.44133)
     | > grad_norm_0: 417.33353  (377.72714)
     | > loss_gen: 1.98592  (2.42283)
     | > loss_kl: 1.27826  (1.16547)
     | > loss_feat: 2.75336  (3.55983)
     | > loss_mel: 21.77798  (21.99481)
     | > loss_duration: 1.65194  (1.62570)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 29.44747  (30.76864)
     | > grad_norm_1: 3566.74316  (1562.36780)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.61530  (3.57665)
     | > loader_time: 0.01000  (0.00886)


   --> STEP: 45/80 -- GLOBAL_STEP: 5725
     | > loss_disc: 2.47280  (2.44345



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00100)
     | > avg_loss_disc: 2.31376 (-0.05015)
     | > avg_loss_disc_real_0: 0.04544 (+0.00536)
     | > avg_loss_disc_real_1: 0.20338 (-0.00025)
     | > avg_loss_disc_real_2: 0.24107 (+0.00782)
     | > avg_loss_disc_real_3: 0.25110 (-0.02202)
     | > avg_loss_disc_real_4: 0.22737 (-0.05462)
     | > avg_loss_disc_real_5: 0.23115 (-0.00057)
     | > avg_loss_0: 2.31376 (-0.05015)
     | > avg_loss_gen: 2.66035 (+0.25468)
     | > avg_loss_kl: 1.40631 (-0.19469)
     | > avg_loss_feat: 3.53473 (+0.44704)
     | > avg_loss_mel: 21.03189 (+1.55333)
     | > avg_loss_duration: 1.87176 (-0.02339)
     | > avg_loss_1: 30.50504 (+2.03697)


[4m[1m > EPOCH: 72/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 09:41:31)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 5775
     | > loss_disc: 2.42212  (2.41178)
     | > loss_disc_real_0: 0.14809  (0.06753)
     | > loss_disc_real_1: 0.23898  (0.22956)
     | > loss_disc_real_2: 0.21021  (0.22729)
     | > loss_disc_real_3: 0.25668  (0.22941)
     | > loss_disc_real_4: 0.22325  (0.23863)
     | > loss_disc_real_5: 0.21613  (0.24108)
     | > loss_0: 2.42212  (2.41178)
     | > grad_norm_0: 241.28745  (165.52820)
     | > loss_gen: 2.72299  (2.37487)
     | > loss_kl: 1.23935  (1.21130)
     | > loss_feat: 3.85990  (3.43004)
     | > loss_mel: 22.13868  (21.97339)
     | > loss_duration: 1.58413  (1.62695)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 31.54506  (30.61655)
     | > grad_norm_1: 752.75305  (1679.74304)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.61430  (3.56876)
     | > loader_time: 0.01000  (0.00854)


   --> STEP: 40/80 -- GLOBAL_STEP: 5800
     | > loss_disc: 2.18644  (2.40590)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.52962 (+0.21585)
     | > avg_loss_disc_real_0: 0.07818 (+0.03274)
     | > avg_loss_disc_real_1: 0.22862 (+0.02524)
     | > avg_loss_disc_real_2: 0.21020 (-0.03088)
     | > avg_loss_disc_real_3: 0.26477 (+0.01367)
     | > avg_loss_disc_real_4: 0.24804 (+0.02067)
     | > avg_loss_disc_real_5: 0.23911 (+0.00796)
     | > avg_loss_0: 2.52962 (+0.21585)
     | > avg_loss_gen: 1.97388 (-0.68647)
     | > avg_loss_kl: 1.15717 (-0.24914)
     | > avg_loss_feat: 2.43930 (-1.09544)
     | > avg_loss_mel: 18.65263 (-2.37927)
     | > avg_loss_duration: 1.88231 (+0.01056)
     | > avg_loss_1: 26.10529 (-4.39976)

 > BEST MODEL : ./output\vits_vctk-September-23-2022_02+46AM-3c624ce\best_model_5840.pth

[4m[1m > EPOCH: 73/1000[0m
 --> ./output\vits_vctk-Sep



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 09:47:10)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 5850
     | > loss_disc: 2.56662  (2.43965)
     | > loss_disc_real_0: 0.08397  (0.08394)
     | > loss_disc_real_1: 0.21681  (0.21542)
     | > loss_disc_real_2: 0.30139  (0.22882)
     | > loss_disc_real_3: 0.33111  (0.24009)
     | > loss_disc_real_4: 0.28888  (0.23899)
     | > loss_disc_real_5: 0.24487  (0.24081)
     | > loss_0: 2.56662  (2.43965)
     | > grad_norm_0: 390.23163  (338.71445)
     | > loss_gen: 2.47192  (2.39476)
     | > loss_kl: 1.17839  (1.14142)
     | > loss_feat: 2.76717  (3.42492)
     | > loss_mel: 20.22251  (21.52075)
     | > loss_duration: 1.59102  (1.62420)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 28.23103  (30.10606)
     | > grad_norm_1: 3245.40186  (2794.50171)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.60030  (3.55424)
     | > loader_time: 0.00800  (0.00801)


   --> STEP: 35/80 -- GLOBAL_STEP: 5875
     | > loss_disc: 2.44485  (2.51306



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.82077 (+0.29115)
     | > avg_loss_disc_real_0: 0.03309 (-0.04509)
     | > avg_loss_disc_real_1: 0.25863 (+0.03001)
     | > avg_loss_disc_real_2: 0.33392 (+0.12372)
     | > avg_loss_disc_real_3: 0.30283 (+0.03806)
     | > avg_loss_disc_real_4: 0.28773 (+0.03969)
     | > avg_loss_disc_real_5: 0.31385 (+0.07474)
     | > avg_loss_0: 2.82077 (+0.29115)
     | > avg_loss_gen: 1.99153 (+0.01764)
     | > avg_loss_kl: 0.92890 (-0.22827)
     | > avg_loss_feat: 1.94063 (-0.49867)
     | > avg_loss_mel: 19.04882 (+0.39619)
     | > avg_loss_duration: 1.90557 (+0.02326)
     | > avg_loss_1: 25.81544 (-0.28984)


[4m[1m > EPOCH: 74/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 09:52:45)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 5925
     | > loss_disc: 2.54726  (2.48533)
     | > loss_disc_real_0: 0.10460  (0.11663)
     | > loss_disc_real_1: 0.23719  (0.20122)
     | > loss_disc_real_2: 0.20156  (0.21828)
     | > loss_disc_real_3: 0.17388  (0.21262)
     | > loss_disc_real_4: 0.22049  (0.22413)
     | > loss_disc_real_5: 0.22728  (0.24041)
     | > loss_0: 2.54726  (2.48533)
     | > grad_norm_0: 195.92967  (235.13257)
     | > loss_gen: 2.27721  (2.28463)
     | > loss_kl: 0.81764  (1.07955)
     | > loss_feat: 2.96911  (3.09721)
     | > loss_mel: 21.70242  (21.86278)
     | > loss_duration: 1.66652  (1.65086)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 29.43288  (29.97503)
     | > grad_norm_1: 3869.24731  (2530.55737)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.57730  (3.54363)
     | > loader_time: 0.00700  (0.00741)


   --> STEP: 30/80 -- GLOBAL_STEP: 5950
     | > loss_disc: 2.52242  (2.47814)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.27169 (-0.54908)
     | > avg_loss_disc_real_0: 0.07299 (+0.03990)
     | > avg_loss_disc_real_1: 0.22312 (-0.03551)
     | > avg_loss_disc_real_2: 0.20762 (-0.12630)
     | > avg_loss_disc_real_3: 0.25242 (-0.05041)
     | > avg_loss_disc_real_4: 0.25026 (-0.03748)
     | > avg_loss_disc_real_5: 0.24057 (-0.07328)
     | > avg_loss_0: 2.27169 (-0.54908)
     | > avg_loss_gen: 2.44864 (+0.45711)
     | > avg_loss_kl: 1.08348 (+0.15458)
     | > avg_loss_feat: 3.46683 (+1.52620)
     | > avg_loss_mel: 22.02998 (+2.98116)
     | > avg_loss_duration: 1.89081 (-0.01476)
     | > avg_loss_1: 30.91974 (+5.10430)


[4m[1m > EPOCH: 75/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 09:58:19)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 6000
     | > loss_disc: 2.23068  (2.23068)
     | > loss_disc_real_0: 0.05843  (0.05843)
     | > loss_disc_real_1: 0.23566  (0.23566)
     | > loss_disc_real_2: 0.23633  (0.23633)
     | > loss_disc_real_3: 0.22906  (0.22906)
     | > loss_disc_real_4: 0.24121  (0.24121)
     | > loss_disc_real_5: 0.25762  (0.25762)
     | > loss_0: 2.23068  (2.23068)
     | > grad_norm_0: 207.94154  (207.94154)
     | > loss_gen: 2.38914  (2.38914)
     | > loss_kl: 0.82330  (0.82330)
     | > loss_feat: 3.82250  (3.82250)
     | > loss_mel: 21.62091  (21.62091)
     | > loss_duration: 1.61750  (1.61750)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.27336  (30.27336)
     | > grad_norm_1: 1097.32471  (1097.32471)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.56420  (3.56424)
     | > loader_time: 23.24390  (23.24391)


   --> STEP: 25/80 -- GLOBAL_STEP: 6025
     | > loss_disc: 2.42590  (2.4350



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00000)
     | > avg_loss_disc: 2.53633 (+0.26463)
     | > avg_loss_disc_real_0: 0.01773 (-0.05526)
     | > avg_loss_disc_real_1: 0.29184 (+0.06872)
     | > avg_loss_disc_real_2: 0.18668 (-0.02095)
     | > avg_loss_disc_real_3: 0.19791 (-0.05451)
     | > avg_loss_disc_real_4: 0.20921 (-0.04105)
     | > avg_loss_disc_real_5: 0.22943 (-0.01113)
     | > avg_loss_0: 2.53633 (+0.26463)
     | > avg_loss_gen: 2.32770 (-0.12094)
     | > avg_loss_kl: 1.24021 (+0.15672)
     | > avg_loss_feat: 4.13835 (+0.67153)
     | > avg_loss_mel: 23.52814 (+1.49816)
     | > avg_loss_duration: 1.89934 (+0.00853)
     | > avg_loss_1: 33.13374 (+2.21400)


[4m[1m > EPOCH: 76/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 10:03:54)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 6100
     | > loss_disc: 2.30457  (2.41708)
     | > loss_disc_real_0: 0.02162  (0.05371)
     | > loss_disc_real_1: 0.21776  (0.24158)
     | > loss_disc_real_2: 0.28647  (0.23274)
     | > loss_disc_real_3: 0.19708  (0.23272)
     | > loss_disc_real_4: 0.22380  (0.23835)
     | > loss_disc_real_5: 0.24777  (0.24385)
     | > loss_0: 2.30457  (2.41708)
     | > grad_norm_0: 87.33273  (349.57114)
     | > loss_gen: 2.40009  (2.46943)
     | > loss_kl: 0.97244  (1.24817)
     | > loss_feat: 3.08953  (3.66587)
     | > loss_mel: 21.59224  (22.08010)
     | > loss_duration: 1.65505  (1.62496)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 29.70935  (31.08854)
     | > grad_norm_1: 2675.43457  (1784.57385)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.63030  (3.57820)
     | > loader_time: 0.01000  (0.00886)


   --> STEP: 45/80 -- GLOBAL_STEP: 6125
     | > loss_disc: 2.26836  (2.39990)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.67633 (+0.14001)
     | > avg_loss_disc_real_0: 0.30010 (+0.28237)
     | > avg_loss_disc_real_1: 0.25124 (-0.04060)
     | > avg_loss_disc_real_2: 0.25836 (+0.07168)
     | > avg_loss_disc_real_3: 0.28430 (+0.08639)
     | > avg_loss_disc_real_4: 0.22960 (+0.02039)
     | > avg_loss_disc_real_5: 0.23852 (+0.00909)
     | > avg_loss_0: 2.67633 (+0.14001)
     | > avg_loss_gen: 2.34278 (+0.01509)
     | > avg_loss_kl: 1.20546 (-0.03474)
     | > avg_loss_feat: 2.28721 (-1.85115)
     | > avg_loss_mel: 20.46918 (-3.05896)
     | > avg_loss_duration: 1.87941 (-0.01994)
     | > avg_loss_1: 28.18403 (-4.94971)


[4m[1m > EPOCH: 77/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 10:09:29)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 6175
     | > loss_disc: 2.37409  (2.51698)
     | > loss_disc_real_0: 0.07942  (0.10354)
     | > loss_disc_real_1: 0.23401  (0.23581)
     | > loss_disc_real_2: 0.29700  (0.24095)
     | > loss_disc_real_3: 0.20919  (0.22852)
     | > loss_disc_real_4: 0.27764  (0.24836)
     | > loss_disc_real_5: 0.25328  (0.24123)
     | > loss_0: 2.37409  (2.51698)
     | > grad_norm_0: 157.65472  (361.59286)
     | > loss_gen: 2.55675  (2.28684)
     | > loss_kl: 1.18757  (1.12273)
     | > loss_feat: 3.57571  (3.11077)
     | > loss_mel: 21.50920  (21.12822)
     | > loss_duration: 1.59527  (1.61518)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.42451  (29.26375)
     | > grad_norm_1: 965.55408  (2013.89209)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.60160  (3.56587)
     | > loader_time: 0.01000  (0.00874)


   --> STEP: 40/80 -- GLOBAL_STEP: 6200
     | > loss_disc: 2.57360  (2.50855)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00100)
     | > avg_loss_disc: 2.65737 (-0.01896)
     | > avg_loss_disc_real_0: 0.26963 (-0.03047)
     | > avg_loss_disc_real_1: 0.24762 (-0.00361)
     | > avg_loss_disc_real_2: 0.30037 (+0.04201)
     | > avg_loss_disc_real_3: 0.25252 (-0.03178)
     | > avg_loss_disc_real_4: 0.24934 (+0.01974)
     | > avg_loss_disc_real_5: 0.26110 (+0.02259)
     | > avg_loss_0: 2.65737 (-0.01896)
     | > avg_loss_gen: 2.09201 (-0.25077)
     | > avg_loss_kl: 1.34673 (+0.14127)
     | > avg_loss_feat: 1.97226 (-0.31494)
     | > avg_loss_mel: 20.92971 (+0.46053)
     | > avg_loss_duration: 1.89050 (+0.01109)
     | > avg_loss_1: 28.23120 (+0.04717)


[4m[1m > EPOCH: 78/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 10:15:03)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 6250
     | > loss_disc: 2.63727  (2.63381)
     | > loss_disc_real_0: 0.22688  (0.23189)
     | > loss_disc_real_1: 0.21420  (0.22420)
     | > loss_disc_real_2: 0.25323  (0.21932)
     | > loss_disc_real_3: 0.17721  (0.22347)
     | > loss_disc_real_4: 0.31175  (0.24239)
     | > loss_disc_real_5: 0.25785  (0.23958)
     | > loss_0: 2.63727  (2.63381)
     | > grad_norm_0: 8.67670  (20.24204)
     | > loss_gen: 1.90160  (1.94808)
     | > loss_kl: 1.16061  (1.14267)
     | > loss_feat: 2.14413  (2.44779)
     | > loss_mel: 20.34605  (21.62567)
     | > loss_duration: 1.63694  (1.63551)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 27.18933  (28.79973)
     | > grad_norm_1: 123.75573  (315.88269)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.59030  (3.55073)
     | > loader_time: 0.00800  (0.00821)


   --> STEP: 35/80 -- GLOBAL_STEP: 6275
     | > loss_disc: 2.65964  (2.60704)
   



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00801 (-0.00200)
     | > avg_loss_disc: 2.35261 (-0.30476)
     | > avg_loss_disc_real_0: 0.08833 (-0.18130)
     | > avg_loss_disc_real_1: 0.17242 (-0.07520)
     | > avg_loss_disc_real_2: 0.23422 (-0.06614)
     | > avg_loss_disc_real_3: 0.23889 (-0.01363)
     | > avg_loss_disc_real_4: 0.24697 (-0.00237)
     | > avg_loss_disc_real_5: 0.25200 (-0.00911)
     | > avg_loss_0: 2.35261 (-0.30476)
     | > avg_loss_gen: 2.23284 (+0.14083)
     | > avg_loss_kl: 1.26332 (-0.08341)
     | > avg_loss_feat: 3.01699 (+1.04472)
     | > avg_loss_mel: 21.01789 (+0.08818)
     | > avg_loss_duration: 1.87661 (-0.01389)
     | > avg_loss_1: 29.40765 (+1.17644)


[4m[1m > EPOCH: 79/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 10:20:38)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 6325
     | > loss_disc: 2.48872  (2.46973)
     | > loss_disc_real_0: 0.12855  (0.08907)
     | > loss_disc_real_1: 0.22134  (0.22797)
     | > loss_disc_real_2: 0.24017  (0.23210)
     | > loss_disc_real_3: 0.20044  (0.22794)
     | > loss_disc_real_4: 0.26721  (0.24132)
     | > loss_disc_real_5: 0.24113  (0.24549)
     | > loss_0: 2.48872  (2.46973)
     | > grad_norm_0: 517.18945  (428.85352)
     | > loss_gen: 2.19546  (2.29969)
     | > loss_kl: 0.99495  (1.11765)
     | > loss_feat: 3.24753  (3.21785)
     | > loss_mel: 21.28937  (21.42941)
     | > loss_duration: 1.57258  (1.61288)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 29.29987  (29.67748)
     | > grad_norm_1: 2683.25317  (2542.85327)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.57230  (3.53987)
     | > loader_time: 0.00800  (0.00801)


   --> STEP: 30/80 -- GLOBAL_STEP: 6350
     | > loss_disc: 2.45487  (2.48885)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00100)
     | > avg_loss_disc: 2.65914 (+0.30653)
     | > avg_loss_disc_real_0: 0.22866 (+0.14033)
     | > avg_loss_disc_real_1: 0.27559 (+0.10317)
     | > avg_loss_disc_real_2: 0.23522 (+0.00100)
     | > avg_loss_disc_real_3: 0.18682 (-0.05208)
     | > avg_loss_disc_real_4: 0.23311 (-0.01386)
     | > avg_loss_disc_real_5: 0.25371 (+0.00171)
     | > avg_loss_0: 2.65914 (+0.30653)
     | > avg_loss_gen: 2.21350 (-0.01934)
     | > avg_loss_kl: 1.15909 (-0.10423)
     | > avg_loss_feat: 2.52085 (-0.49613)
     | > avg_loss_mel: 19.78579 (-1.23210)
     | > avg_loss_duration: 1.89506 (+0.01845)
     | > avg_loss_1: 27.57429 (-1.83335)


[4m[1m > EPOCH: 80/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 10:26:13)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 6400
     | > loss_disc: 2.53231  (2.53231)
     | > loss_disc_real_0: 0.13503  (0.13503)
     | > loss_disc_real_1: 0.22176  (0.22176)
     | > loss_disc_real_2: 0.21074  (0.21074)
     | > loss_disc_real_3: 0.20138  (0.20138)
     | > loss_disc_real_4: 0.24249  (0.24249)
     | > loss_disc_real_5: 0.23780  (0.23780)
     | > loss_0: 2.53231  (2.53231)
     | > grad_norm_0: 662.60352  (662.60352)
     | > loss_gen: 2.31118  (2.31118)
     | > loss_kl: 1.20919  (1.20919)
     | > loss_feat: 3.21039  (3.21039)
     | > loss_mel: 20.94125  (20.94125)
     | > loss_duration: 1.63644  (1.63644)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 29.30845  (29.30845)
     | > grad_norm_1: 2803.49951  (2803.49951)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.56920  (3.56925)
     | > loader_time: 23.07730  (23.07731)


   --> STEP: 25/80 -- GLOBAL_STEP: 6425
     | > loss_disc: 2.57159  (2.4977



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00000)
     | > avg_loss_disc: 2.75768 (+0.09854)
     | > avg_loss_disc_real_0: 0.03998 (-0.18868)
     | > avg_loss_disc_real_1: 0.24599 (-0.02960)
     | > avg_loss_disc_real_2: 0.19633 (-0.03889)
     | > avg_loss_disc_real_3: 0.26653 (+0.07971)
     | > avg_loss_disc_real_4: 0.22375 (-0.00935)
     | > avg_loss_disc_real_5: 0.24231 (-0.01140)
     | > avg_loss_0: 2.75768 (+0.09854)
     | > avg_loss_gen: 1.65234 (-0.56116)
     | > avg_loss_kl: 0.92151 (-0.23758)
     | > avg_loss_feat: 2.14872 (-0.37214)
     | > avg_loss_mel: 20.45070 (+0.66491)
     | > avg_loss_duration: 1.88624 (-0.00882)
     | > avg_loss_1: 27.05951 (-0.51478)


[4m[1m > EPOCH: 81/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 10:31:47)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 6500
     | > loss_disc: 2.32711  (2.53127)
     | > loss_disc_real_0: 0.10646  (0.15020)
     | > loss_disc_real_1: 0.23823  (0.23628)
     | > loss_disc_real_2: 0.22866  (0.22910)
     | > loss_disc_real_3: 0.16722  (0.22635)
     | > loss_disc_real_4: 0.19658  (0.23603)
     | > loss_disc_real_5: 0.23852  (0.24310)
     | > loss_0: 2.32711  (2.53127)
     | > grad_norm_0: 407.66650  (596.06293)
     | > loss_gen: 2.74053  (2.35709)
     | > loss_kl: 1.31817  (1.20944)
     | > loss_feat: 4.15942  (3.19104)
     | > loss_mel: 21.74656  (21.58945)
     | > loss_duration: 1.68862  (1.62220)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 31.65330  (29.96921)
     | > grad_norm_1: 1357.43579  (2096.64966)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.62230  (3.58040)
     | > loader_time: 0.00900  (0.00871)


   --> STEP: 45/80 -- GLOBAL_STEP: 6525
     | > loss_disc: 2.50060  (2.52044



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.70634 (-0.05134)
     | > avg_loss_disc_real_0: 0.33382 (+0.29383)
     | > avg_loss_disc_real_1: 0.21031 (-0.03568)
     | > avg_loss_disc_real_2: 0.23066 (+0.03433)
     | > avg_loss_disc_real_3: 0.28012 (+0.01359)
     | > avg_loss_disc_real_4: 0.28395 (+0.06019)
     | > avg_loss_disc_real_5: 0.28549 (+0.04317)
     | > avg_loss_0: 2.70634 (-0.05134)
     | > avg_loss_gen: 2.32540 (+0.67306)
     | > avg_loss_kl: 1.02569 (+0.10419)
     | > avg_loss_feat: 2.02415 (-0.12456)
     | > avg_loss_mel: 17.40975 (-3.04095)
     | > avg_loss_duration: 1.89046 (+0.00421)
     | > avg_loss_1: 24.67545 (-2.38406)

 > BEST MODEL : ./output\vits_vctk-September-23-2022_02+46AM-3c624ce\best_model_6560.pth

[4m[1m > EPOCH: 82/1000[0m
 --> ./output\vits_vctk-Sep



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 10:37:26)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 6575
     | > loss_disc: 2.56496  (2.52698)
     | > loss_disc_real_0: 0.08992  (0.10258)
     | > loss_disc_real_1: 0.27566  (0.23970)
     | > loss_disc_real_2: 0.26239  (0.23620)
     | > loss_disc_real_3: 0.20473  (0.22778)
     | > loss_disc_real_4: 0.22725  (0.24093)
     | > loss_disc_real_5: 0.20203  (0.24039)
     | > loss_0: 2.56496  (2.52698)
     | > grad_norm_0: 507.29285  (690.22485)
     | > loss_gen: 2.29162  (2.32081)
     | > loss_kl: 1.34551  (1.17535)
     | > loss_feat: 3.44789  (3.26080)
     | > loss_mel: 21.68053  (21.16901)
     | > loss_duration: 1.61350  (1.61298)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.37904  (29.53894)
     | > grad_norm_1: 3165.73657  (3220.21167)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.60130  (3.56696)
     | > loader_time: 0.00900  (0.00821)


   --> STEP: 40/80 -- GLOBAL_STEP: 6600
     | > loss_disc: 2.38180  (2.49953



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00000)
     | > avg_loss_disc: 2.62931 (-0.07703)
     | > avg_loss_disc_real_0: 0.11502 (-0.21880)
     | > avg_loss_disc_real_1: 0.22523 (+0.01492)
     | > avg_loss_disc_real_2: 0.29549 (+0.06483)
     | > avg_loss_disc_real_3: 0.27469 (-0.00543)
     | > avg_loss_disc_real_4: 0.26666 (-0.01729)
     | > avg_loss_disc_real_5: 0.23731 (-0.04818)
     | > avg_loss_0: 2.62931 (-0.07703)
     | > avg_loss_gen: 2.07363 (-0.25177)
     | > avg_loss_kl: 1.08606 (+0.06037)
     | > avg_loss_feat: 2.39242 (+0.36827)
     | > avg_loss_mel: 20.27837 (+2.86863)
     | > avg_loss_duration: 1.88711 (-0.00334)
     | > avg_loss_1: 27.71760 (+3.04215)


[4m[1m > EPOCH: 83/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 10:43:01)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 6650
     | > loss_disc: 2.44986  (2.49327)
     | > loss_disc_real_0: 0.09059  (0.09804)
     | > loss_disc_real_1: 0.24362  (0.23705)
     | > loss_disc_real_2: 0.18102  (0.22561)
     | > loss_disc_real_3: 0.27096  (0.22957)
     | > loss_disc_real_4: 0.30257  (0.24532)
     | > loss_disc_real_5: 0.26617  (0.24215)
     | > loss_0: 2.44986  (2.49327)
     | > grad_norm_0: 147.71107  (385.96609)
     | > loss_gen: 2.51791  (2.33070)
     | > loss_kl: 1.24035  (1.19487)
     | > loss_feat: 3.18550  (3.30207)
     | > loss_mel: 21.33795  (21.69860)
     | > loss_duration: 1.60501  (1.62237)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 29.88673  (30.14861)
     | > grad_norm_1: 2868.36255  (2518.49780)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.57530  (3.55406)
     | > loader_time: 0.00800  (0.00851)


   --> STEP: 35/80 -- GLOBAL_STEP: 6675
     | > loss_disc: 2.86512  (2.57527



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00100)
     | > avg_loss_disc: 2.38417 (-0.24514)
     | > avg_loss_disc_real_0: 0.09517 (-0.01985)
     | > avg_loss_disc_real_1: 0.25246 (+0.02723)
     | > avg_loss_disc_real_2: 0.21687 (-0.07862)
     | > avg_loss_disc_real_3: 0.23338 (-0.04131)
     | > avg_loss_disc_real_4: 0.29675 (+0.03009)
     | > avg_loss_disc_real_5: 0.22463 (-0.01268)
     | > avg_loss_0: 2.38417 (-0.24514)
     | > avg_loss_gen: 2.26601 (+0.19238)
     | > avg_loss_kl: 1.23817 (+0.15210)
     | > avg_loss_feat: 2.78865 (+0.39623)
     | > avg_loss_mel: 20.79930 (+0.52093)
     | > avg_loss_duration: 1.87919 (-0.00792)
     | > avg_loss_1: 28.97132 (+1.25372)


[4m[1m > EPOCH: 84/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 10:48:35)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 6725
     | > loss_disc: 2.61033  (2.53119)
     | > loss_disc_real_0: 0.06182  (0.09918)
     | > loss_disc_real_1: 0.25954  (0.23650)
     | > loss_disc_real_2: 0.24844  (0.22673)
     | > loss_disc_real_3: 0.25944  (0.23545)
     | > loss_disc_real_4: 0.24428  (0.24508)
     | > loss_disc_real_5: 0.24472  (0.23714)
     | > loss_0: 2.61033  (2.53119)
     | > grad_norm_0: 650.79346  (730.80823)
     | > loss_gen: 2.49827  (2.36693)
     | > loss_kl: 1.31409  (1.25821)
     | > loss_feat: 3.32644  (3.34401)
     | > loss_mel: 20.09330  (20.92901)
     | > loss_duration: 1.58915  (1.61174)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 28.82125  (29.50990)
     | > grad_norm_1: 3913.90234  (3395.45117)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.57230  (3.53182)
     | > loader_time: 0.00700  (0.00740)


   --> STEP: 30/80 -- GLOBAL_STEP: 6750
     | > loss_disc: 2.33308  (2.47530)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00100)
     | > avg_loss_disc: 2.47397 (+0.08980)
     | > avg_loss_disc_real_0: 0.07070 (-0.02447)
     | > avg_loss_disc_real_1: 0.24144 (-0.01103)
     | > avg_loss_disc_real_2: 0.26658 (+0.04971)
     | > avg_loss_disc_real_3: 0.29135 (+0.05798)
     | > avg_loss_disc_real_4: 0.24409 (-0.05266)
     | > avg_loss_disc_real_5: 0.25376 (+0.02914)
     | > avg_loss_0: 2.47397 (+0.08980)
     | > avg_loss_gen: 2.31332 (+0.04732)
     | > avg_loss_kl: 1.01616 (-0.22201)
     | > avg_loss_feat: 2.64173 (-0.14692)
     | > avg_loss_mel: 18.81989 (-1.97941)
     | > avg_loss_duration: 1.90066 (+0.02147)
     | > avg_loss_1: 26.69177 (-2.27955)


[4m[1m > EPOCH: 85/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 10:54:10)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 6800
     | > loss_disc: 2.45458  (2.45458)
     | > loss_disc_real_0: 0.08552  (0.08552)
     | > loss_disc_real_1: 0.22304  (0.22304)
     | > loss_disc_real_2: 0.24172  (0.24172)
     | > loss_disc_real_3: 0.24200  (0.24200)
     | > loss_disc_real_4: 0.21498  (0.21498)
     | > loss_disc_real_5: 0.24350  (0.24350)
     | > loss_0: 2.45458  (2.45458)
     | > grad_norm_0: 437.49835  (437.49835)
     | > loss_gen: 2.25429  (2.25429)
     | > loss_kl: 1.40772  (1.40772)
     | > loss_feat: 3.11384  (3.11384)
     | > loss_mel: 21.32224  (21.32224)
     | > loss_duration: 1.61187  (1.61187)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 29.70996  (29.70996)
     | > grad_norm_1: 3589.96289  (3589.96289)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.62130  (3.62130)
     | > loader_time: 23.07780  (23.07781)


   --> STEP: 25/80 -- GLOBAL_STEP: 6825
     | > loss_disc: 2.43917  (2.4735



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.53849 (+0.06451)
     | > avg_loss_disc_real_0: 0.17932 (+0.10863)
     | > avg_loss_disc_real_1: 0.27602 (+0.03459)
     | > avg_loss_disc_real_2: 0.17850 (-0.08808)
     | > avg_loss_disc_real_3: 0.23397 (-0.05739)
     | > avg_loss_disc_real_4: 0.32758 (+0.08349)
     | > avg_loss_disc_real_5: 0.25542 (+0.00166)
     | > avg_loss_0: 2.53849 (+0.06451)
     | > avg_loss_gen: 2.27232 (-0.04100)
     | > avg_loss_kl: 1.41380 (+0.39765)
     | > avg_loss_feat: 2.31961 (-0.32212)
     | > avg_loss_mel: 20.91117 (+2.09128)
     | > avg_loss_duration: 1.89202 (-0.00864)
     | > avg_loss_1: 28.80893 (+2.11716)


[4m[1m > EPOCH: 86/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 10:59:46)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 6900
     | > loss_disc: 2.28037  (2.49389)
     | > loss_disc_real_0: 0.05411  (0.10235)
     | > loss_disc_real_1: 0.20643  (0.23552)
     | > loss_disc_real_2: 0.22197  (0.23037)
     | > loss_disc_real_3: 0.24377  (0.23426)
     | > loss_disc_real_4: 0.23405  (0.23973)
     | > loss_disc_real_5: 0.26436  (0.24743)
     | > loss_0: 2.28037  (2.49389)
     | > grad_norm_0: 162.04184  (424.30884)
     | > loss_gen: 2.39119  (2.41921)
     | > loss_kl: 1.32892  (1.32033)
     | > loss_feat: 3.42102  (3.45695)
     | > loss_mel: 22.15058  (22.14784)
     | > loss_duration: 1.63817  (1.61210)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.92988  (30.95642)
     | > grad_norm_1: 1420.68799  (1974.61853)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.62930  (3.57831)
     | > loader_time: 0.01000  (0.00876)


   --> STEP: 45/80 -- GLOBAL_STEP: 6925
     | > loss_disc: 2.40417  (2.49728



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.73733 (+0.19885)
     | > avg_loss_disc_real_0: 0.28034 (+0.10102)
     | > avg_loss_disc_real_1: 0.22104 (-0.05498)
     | > avg_loss_disc_real_2: 0.23985 (+0.06136)
     | > avg_loss_disc_real_3: 0.25345 (+0.01948)
     | > avg_loss_disc_real_4: 0.24898 (-0.07860)
     | > avg_loss_disc_real_5: 0.22445 (-0.03097)
     | > avg_loss_0: 2.73733 (+0.19885)
     | > avg_loss_gen: 2.03609 (-0.23623)
     | > avg_loss_kl: 1.11246 (-0.30134)
     | > avg_loss_feat: 2.24616 (-0.07345)
     | > avg_loss_mel: 21.67963 (+0.76846)
     | > avg_loss_duration: 1.89102 (-0.00099)
     | > avg_loss_1: 28.96537 (+0.15644)


[4m[1m > EPOCH: 87/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 11:05:20)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 6975
     | > loss_disc: 2.31591  (2.63585)
     | > loss_disc_real_0: 0.03482  (0.14249)
     | > loss_disc_real_1: 0.24821  (0.24652)
     | > loss_disc_real_2: 0.20012  (0.24431)
     | > loss_disc_real_3: 0.25583  (0.23518)
     | > loss_disc_real_4: 0.25941  (0.24123)
     | > loss_disc_real_5: 0.23776  (0.24265)
     | > loss_0: 2.31591  (2.63585)
     | > grad_norm_0: 71.68561  (262.01398)
     | > loss_gen: 2.44572  (2.11528)
     | > loss_kl: 1.18077  (1.24558)
     | > loss_feat: 3.22693  (2.75469)
     | > loss_mel: 21.92948  (21.77673)
     | > loss_duration: 1.61236  (1.60952)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.39527  (29.50180)
     | > grad_norm_1: 1494.94849  (2077.66797)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.60930  (3.56849)
     | > loader_time: 0.01100  (0.00861)


   --> STEP: 40/80 -- GLOBAL_STEP: 7000
     | > loss_disc: 2.33644  (2.53709)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00100)
     | > avg_loss_disc: 2.41612 (-0.32121)
     | > avg_loss_disc_real_0: 0.05953 (-0.22081)
     | > avg_loss_disc_real_1: 0.19824 (-0.02280)
     | > avg_loss_disc_real_2: 0.26714 (+0.02728)
     | > avg_loss_disc_real_3: 0.24387 (-0.00957)
     | > avg_loss_disc_real_4: 0.23396 (-0.01501)
     | > avg_loss_disc_real_5: 0.26555 (+0.04111)
     | > avg_loss_0: 2.41612 (-0.32121)
     | > avg_loss_gen: 2.31509 (+0.27900)
     | > avg_loss_kl: 0.90582 (-0.20664)
     | > avg_loss_feat: 2.90736 (+0.66120)
     | > avg_loss_mel: 19.59186 (-2.08777)
     | > avg_loss_duration: 1.89425 (+0.00322)
     | > avg_loss_1: 27.61438 (-1.35099)


[4m[1m > EPOCH: 88/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 11:10:55)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 7050
     | > loss_disc: 2.52456  (2.51926)
     | > loss_disc_real_0: 0.06537  (0.11977)
     | > loss_disc_real_1: 0.26429  (0.23553)
     | > loss_disc_real_2: 0.31988  (0.22993)
     | > loss_disc_real_3: 0.27979  (0.24101)
     | > loss_disc_real_4: 0.27035  (0.24030)
     | > loss_disc_real_5: 0.24603  (0.23964)
     | > loss_0: 2.52456  (2.51926)
     | > grad_norm_0: 190.05632  (359.79465)
     | > loss_gen: 2.24984  (2.23519)
     | > loss_kl: 1.27289  (1.19816)
     | > loss_feat: 3.61350  (3.08165)
     | > loss_mel: 21.61414  (21.31087)
     | > loss_duration: 1.59644  (1.61375)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.34682  (29.43962)
     | > grad_norm_1: 3864.22510  (2816.86670)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.58730  (3.55664)
     | > loader_time: 0.00800  (0.00831)


   --> STEP: 35/80 -- GLOBAL_STEP: 7075
     | > loss_disc: 2.36896  (2.54490



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00100)
     | > avg_loss_disc: 2.35304 (-0.06308)
     | > avg_loss_disc_real_0: 0.04472 (-0.01481)
     | > avg_loss_disc_real_1: 0.22807 (+0.02984)
     | > avg_loss_disc_real_2: 0.19276 (-0.07437)
     | > avg_loss_disc_real_3: 0.28533 (+0.04146)
     | > avg_loss_disc_real_4: 0.25895 (+0.02499)
     | > avg_loss_disc_real_5: 0.28183 (+0.01628)
     | > avg_loss_0: 2.35304 (-0.06308)
     | > avg_loss_gen: 2.26015 (-0.05494)
     | > avg_loss_kl: 1.26239 (+0.35657)
     | > avg_loss_feat: 2.80311 (-0.10425)
     | > avg_loss_mel: 18.82507 (-0.76679)
     | > avg_loss_duration: 1.92092 (+0.02667)
     | > avg_loss_1: 27.07163 (-0.54275)


[4m[1m > EPOCH: 89/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 11:16:29)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 7125
     | > loss_disc: 2.49117  (2.56184)
     | > loss_disc_real_0: 0.09187  (0.14749)
     | > loss_disc_real_1: 0.22806  (0.23102)
     | > loss_disc_real_2: 0.24730  (0.23967)
     | > loss_disc_real_3: 0.22758  (0.23534)
     | > loss_disc_real_4: 0.26234  (0.24304)
     | > loss_disc_real_5: 0.26561  (0.24550)
     | > loss_0: 2.49117  (2.56184)
     | > grad_norm_0: 576.47235  (939.07715)
     | > loss_gen: 1.75966  (2.22750)
     | > loss_kl: 1.39152  (1.15575)
     | > loss_feat: 2.63065  (2.99526)
     | > loss_mel: 19.88053  (20.90991)
     | > loss_duration: 1.58905  (1.61486)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 27.25140  (28.90327)
     | > grad_norm_1: 5174.23633  (3452.14844)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.56720  (3.53482)
     | > loader_time: 0.01000  (0.00841)


   --> STEP: 30/80 -- GLOBAL_STEP: 7150
     | > loss_disc: 2.47511  (2.50488)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00000)
     | > avg_loss_disc: 2.33529 (-0.01775)
     | > avg_loss_disc_real_0: 0.05710 (+0.01238)
     | > avg_loss_disc_real_1: 0.21830 (-0.00977)
     | > avg_loss_disc_real_2: 0.25438 (+0.06161)
     | > avg_loss_disc_real_3: 0.18728 (-0.09805)
     | > avg_loss_disc_real_4: 0.19730 (-0.06165)
     | > avg_loss_disc_real_5: 0.24157 (-0.04026)
     | > avg_loss_0: 2.33529 (-0.01775)
     | > avg_loss_gen: 2.18586 (-0.07429)
     | > avg_loss_kl: 1.15464 (-0.10775)
     | > avg_loss_feat: 3.25371 (+0.45060)
     | > avg_loss_mel: 21.51926 (+2.69419)
     | > avg_loss_duration: 1.88700 (-0.03391)
     | > avg_loss_1: 30.00048 (+2.92884)


[4m[1m > EPOCH: 90/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 11:22:04)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 7200
     | > loss_disc: 2.36441  (2.36441)
     | > loss_disc_real_0: 0.07432  (0.07432)
     | > loss_disc_real_1: 0.22838  (0.22838)
     | > loss_disc_real_2: 0.25258  (0.25258)
     | > loss_disc_real_3: 0.19423  (0.19423)
     | > loss_disc_real_4: 0.21191  (0.21191)
     | > loss_disc_real_5: 0.23687  (0.23687)
     | > loss_0: 2.36441  (2.36441)
     | > grad_norm_0: 256.96976  (256.96976)
     | > loss_gen: 2.43839  (2.43839)
     | > loss_kl: 0.99077  (0.99077)
     | > loss_feat: 3.37586  (3.37586)
     | > loss_mel: 21.00573  (21.00573)
     | > loss_duration: 1.62948  (1.62948)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 29.44022  (29.44022)
     | > grad_norm_1: 2965.72070  (2965.72070)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.57220  (3.57225)
     | > loader_time: 23.24510  (23.24513)


   --> STEP: 25/80 -- GLOBAL_STEP: 7225
     | > loss_disc: 2.32508  (2.4753



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.53046 (+0.19517)
     | > avg_loss_disc_real_0: 0.02200 (-0.03510)
     | > avg_loss_disc_real_1: 0.25206 (+0.03376)
     | > avg_loss_disc_real_2: 0.15802 (-0.09636)
     | > avg_loss_disc_real_3: 0.22322 (+0.03595)
     | > avg_loss_disc_real_4: 0.24190 (+0.04460)
     | > avg_loss_disc_real_5: 0.22908 (-0.01249)
     | > avg_loss_0: 2.53046 (+0.19517)
     | > avg_loss_gen: 1.82342 (-0.36243)
     | > avg_loss_kl: 1.12211 (-0.03253)
     | > avg_loss_feat: 2.67407 (-0.57965)
     | > avg_loss_mel: 18.97237 (-2.54689)
     | > avg_loss_duration: 1.91461 (+0.02761)
     | > avg_loss_1: 26.50658 (-3.49389)


[4m[1m > EPOCH: 91/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 11:27:39) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 7300
     | > loss_disc: 2.65288  (2.54547)
     | > loss_disc_real_0: 0.03492  (0.12762)
     | > loss_disc_real_1: 0.17969  (0.22470)
     | > loss_disc_real_2: 0.27639  (0.23577)
     | > loss_disc_real_3: 0.20754  (0.22752)
     | > loss_disc_real_4: 0.21831  (0.23840)
     | > loss_disc_real_5: 0.24181  (0.24184)
     | > loss_0: 2.65288  (2.54547)
     | > grad_norm_0: 508.24942  (589.63287)
     | > loss_gen: 1.92333  (2.28846)
     | > loss_kl: 1.16103  (1.22195)
     | > loss_feat: 2.80592  (3.20092)
     | > loss_mel: 21.27342  (21.28058)
     | > loss_duration: 1.59305  (1.59534)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 28.75676  (29.58726)
     | > grad_norm_1: 1946.97937  (2614.56934)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.62430  (3.58011)
     | > loader_time: 0.00800  (0.00886)


   --> STEP: 45/80 -- GLOBAL_STEP: 7325
     | > loss_disc: 2.43975  (2.57265



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.57954 (+0.04908)
     | > avg_loss_disc_real_0: 0.13397 (+0.11197)
     | > avg_loss_disc_real_1: 0.17097 (-0.08109)
     | > avg_loss_disc_real_2: 0.27711 (+0.11910)
     | > avg_loss_disc_real_3: 0.23130 (+0.00808)
     | > avg_loss_disc_real_4: 0.22772 (-0.01418)
     | > avg_loss_disc_real_5: 0.24960 (+0.02052)
     | > avg_loss_0: 2.57954 (+0.04908)
     | > avg_loss_gen: 1.99652 (+0.17310)
     | > avg_loss_kl: 0.98859 (-0.13352)
     | > avg_loss_feat: 2.43446 (-0.23961)
     | > avg_loss_mel: 20.71979 (+1.74741)
     | > avg_loss_duration: 1.90603 (-0.00858)
     | > avg_loss_1: 28.04539 (+1.53881)


 > EPOCH: 92/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 11:33:13) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 7375
     | > loss_disc: 2.48781  (2.49094)
     | > loss_disc_real_0: 0.09561  (0.09526)
     | > loss_disc_real_1: 0.24525  (0.22525)
     | > loss_disc_real_2: 0.15156  (0.22225)
     | > loss_disc_real_3: 0.20186  (0.23584)
     | > loss_disc_real_4: 0.24840  (0.24059)
     | > loss_disc_real_5: 0.22840  (0.24593)
     | > loss_0: 2.48781  (2.49094)
     | > grad_norm_0: 582.22223  (663.61475)
     | > loss_gen: 2.26732  (2.32245)
     | > loss_kl: 1.28732  (1.14978)
     | > loss_feat: 3.29649  (3.33060)
     | > loss_mel: 21.84418  (20.88324)
     | > loss_duration: 1.55485  (1.59053)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.25015  (29.27659)
     | > grad_norm_1: 2608.94556  (3094.18628)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.61030  (3.56992)
     | > loader_time: 0.01000  (0.00861)


   --> STEP: 40/80 -- GLOBAL_STEP: 7400
     | > loss_disc: 2.36414  (2.45548



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00000)
     | > avg_loss_disc: 2.37647 (-0.20306)
     | > avg_loss_disc_real_0: 0.09843 (-0.03555)
     | > avg_loss_disc_real_1: 0.27524 (+0.10427)
     | > avg_loss_disc_real_2: 0.24400 (-0.03311)
     | > avg_loss_disc_real_3: 0.22788 (-0.00342)
     | > avg_loss_disc_real_4: 0.26405 (+0.03633)
     | > avg_loss_disc_real_5: 0.25752 (+0.00791)
     | > avg_loss_0: 2.37647 (-0.20306)
     | > avg_loss_gen: 2.58055 (+0.58403)
     | > avg_loss_kl: 1.04220 (+0.05361)
     | > avg_loss_feat: 3.07249 (+0.63804)
     | > avg_loss_mel: 19.55892 (-1.16086)
     | > avg_loss_duration: 1.91081 (+0.00477)
     | > avg_loss_1: 28.16497 (+0.11958)


 > EPOCH: 93/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 11:38:48) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 7450
     | > loss_disc: 2.63837  (2.54962)
     | > loss_disc_real_0: 0.19259  (0.11742)
     | > loss_disc_real_1: 0.17186  (0.24837)
     | > loss_disc_real_2: 0.19297  (0.23277)
     | > loss_disc_real_3: 0.28293  (0.23364)
     | > loss_disc_real_4: 0.19499  (0.24711)
     | > loss_disc_real_5: 0.22441  (0.25239)
     | > loss_0: 2.63837  (2.54962)
     | > grad_norm_0: 945.89087  (463.36301)
     | > loss_gen: 2.22574  (2.40549)
     | > loss_kl: 1.33955  (1.41041)
     | > loss_feat: 3.25538  (3.64522)
     | > loss_mel: 21.77082  (21.80964)
     | > loss_duration: 1.58025  (1.60712)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.17175  (30.87788)
     | > grad_norm_1: 1533.20020  (1538.02063)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.60130  (3.55053)
     | > loader_time: 0.00900  (0.00831)


   --> STEP: 35/80 -- GLOBAL_STEP: 7475
     | > loss_disc: 2.35739  (2.59648



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.34836 (-0.02811)
     | > avg_loss_disc_real_0: 0.01102 (-0.08740)
     | > avg_loss_disc_real_1: 0.23307 (-0.04217)
     | > avg_loss_disc_real_2: 0.27555 (+0.03155)
     | > avg_loss_disc_real_3: 0.27751 (+0.04963)
     | > avg_loss_disc_real_4: 0.29040 (+0.02635)
     | > avg_loss_disc_real_5: 0.27190 (+0.01439)
     | > avg_loss_0: 2.34836 (-0.02811)
     | > avg_loss_gen: 2.57453 (-0.00602)
     | > avg_loss_kl: 1.17499 (+0.13279)
     | > avg_loss_feat: 3.46521 (+0.39271)
     | > avg_loss_mel: 20.57700 (+1.01808)
     | > avg_loss_duration: 1.87212 (-0.03869)
     | > avg_loss_1: 29.66385 (+1.49888)


 > EPOCH: 94/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 11:44:26) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 7525
     | > loss_disc: 2.35436  (2.47105)
     | > loss_disc_real_0: 0.08289  (0.08618)
     | > loss_disc_real_1: 0.22323  (0.22890)
     | > loss_disc_real_2: 0.28921  (0.23592)
     | > loss_disc_real_3: 0.28151  (0.23312)
     | > loss_disc_real_4: 0.26528  (0.23342)
     | > loss_disc_real_5: 0.24405  (0.25216)
     | > loss_0: 2.35436  (2.47105)
     | > grad_norm_0: 242.11557  (604.47247)
     | > loss_gen: 2.55361  (2.37290)
     | > loss_kl: 1.13183  (1.17471)
     | > loss_feat: 3.45623  (3.27606)
     | > loss_mel: 20.35890  (21.24859)
     | > loss_duration: 1.63012  (1.60166)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 29.13070  (29.67393)
     | > grad_norm_1: 774.82178  (1775.38147)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.57030  (3.84201)
     | > loader_time: 0.00800  (0.00761)


   --> STEP: 30/80 -- GLOBAL_STEP: 7550
     | > loss_disc: 2.68755  (2.50386)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00100)
     | > avg_loss_disc: 2.64986 (+0.30150)
     | > avg_loss_disc_real_0: 0.18984 (+0.17881)
     | > avg_loss_disc_real_1: 0.26574 (+0.03267)
     | > avg_loss_disc_real_2: 0.27247 (-0.00308)
     | > avg_loss_disc_real_3: 0.24098 (-0.03653)
     | > avg_loss_disc_real_4: 0.22550 (-0.06489)
     | > avg_loss_disc_real_5: 0.21668 (-0.05522)
     | > avg_loss_0: 2.64986 (+0.30150)
     | > avg_loss_gen: 1.96423 (-0.61030)
     | > avg_loss_kl: 1.17892 (+0.00393)
     | > avg_loss_feat: 2.45060 (-1.01461)
     | > avg_loss_mel: 21.86970 (+1.29270)
     | > avg_loss_duration: 1.88457 (+0.01244)
     | > avg_loss_1: 29.34802 (-0.31583)


 > EPOCH: 95/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 11:50:12) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 7600
     | > loss_disc: 2.64087  (2.64087)
     | > loss_disc_real_0: 0.15316  (0.15316)
     | > loss_disc_real_1: 0.26168  (0.26168)
     | > loss_disc_real_2: 0.27958  (0.27958)
     | > loss_disc_real_3: 0.27578  (0.27578)
     | > loss_disc_real_4: 0.24076  (0.24076)
     | > loss_disc_real_5: 0.24596  (0.24596)
     | > loss_0: 2.64087  (2.64087)
     | > grad_norm_0: 607.81525  (607.81525)
     | > loss_gen: 2.07169  (2.07169)
     | > loss_kl: 1.17381  (1.17381)
     | > loss_feat: 2.36987  (2.36987)
     | > loss_mel: 20.78014  (20.78014)
     | > loss_duration: 1.60436  (1.60436)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 27.99987  (27.99987)
     | > grad_norm_1: 3182.42969  (3182.42969)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.59630  (3.59627)
     | > loader_time: 22.97710  (22.97714)


   --> STEP: 25/80 -- GLOBAL_STEP: 7625
     | > loss_disc: 2.36065  (2.5851



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00100)
     | > avg_loss_disc: 2.34544 (-0.30442)
     | > avg_loss_disc_real_0: 0.02975 (-0.16008)
     | > avg_loss_disc_real_1: 0.27270 (+0.00696)
     | > avg_loss_disc_real_2: 0.23748 (-0.03499)
     | > avg_loss_disc_real_3: 0.23214 (-0.00884)
     | > avg_loss_disc_real_4: 0.21452 (-0.01099)
     | > avg_loss_disc_real_5: 0.26169 (+0.04501)
     | > avg_loss_0: 2.34544 (-0.30442)
     | > avg_loss_gen: 2.45605 (+0.49181)
     | > avg_loss_kl: 0.84580 (-0.33312)
     | > avg_loss_feat: 3.52869 (+1.07809)
     | > avg_loss_mel: 19.40484 (-2.46486)
     | > avg_loss_duration: 1.92026 (+0.03570)
     | > avg_loss_1: 28.15564 (-1.19238)


 > EPOCH: 96/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 11:56:13) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 7700
     | > loss_disc: 2.61723  (2.55984)
     | > loss_disc_real_0: 0.12856  (0.10007)
     | > loss_disc_real_1: 0.24600  (0.22726)
     | > loss_disc_real_2: 0.21571  (0.23357)
     | > loss_disc_real_3: 0.21265  (0.23304)
     | > loss_disc_real_4: 0.26419  (0.24133)
     | > loss_disc_real_5: 0.25511  (0.24466)
     | > loss_0: 2.61723  (2.55984)
     | > grad_norm_0: 1175.06262  (316.32758)
     | > loss_gen: 2.20272  (2.20100)
     | > loss_kl: 1.38848  (1.25659)
     | > loss_feat: 2.85632  (2.99414)
     | > loss_mel: 22.53976  (20.97560)
     | > loss_duration: 1.57057  (1.59205)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.55785  (29.01938)
     | > grad_norm_1: 3650.05396  (2071.77856)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.62830  (3.59407)
     | > loader_time: 0.01000  (0.00866)


   --> STEP: 45/80 -- GLOBAL_STEP: 7725
     | > loss_disc: 2.35395  (2.5607



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00100)
     | > avg_loss_disc: 2.52038 (+0.17494)
     | > avg_loss_disc_real_0: 0.09761 (+0.06785)
     | > avg_loss_disc_real_1: 0.18853 (-0.08417)
     | > avg_loss_disc_real_2: 0.21549 (-0.02199)
     | > avg_loss_disc_real_3: 0.23939 (+0.00725)
     | > avg_loss_disc_real_4: 0.22936 (+0.01484)
     | > avg_loss_disc_real_5: 0.22840 (-0.03329)
     | > avg_loss_0: 2.52038 (+0.17494)
     | > avg_loss_gen: 1.84119 (-0.61485)
     | > avg_loss_kl: 1.26053 (+0.41473)
     | > avg_loss_feat: 2.48023 (-1.04846)
     | > avg_loss_mel: 21.17866 (+1.77382)
     | > avg_loss_duration: 1.93117 (+0.01091)
     | > avg_loss_1: 28.69178 (+0.53614)


 > EPOCH: 97/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 12:01:50) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 7775
     | > loss_disc: 2.53680  (2.42335)
     | > loss_disc_real_0: 0.12866  (0.06180)
     | > loss_disc_real_1: 0.24613  (0.22477)
     | > loss_disc_real_2: 0.21372  (0.22846)
     | > loss_disc_real_3: 0.28589  (0.23511)
     | > loss_disc_real_4: 0.22549  (0.24014)
     | > loss_disc_real_5: 0.22865  (0.23943)
     | > loss_0: 2.53680  (2.42335)
     | > grad_norm_0: 519.88562  (226.10831)
     | > loss_gen: 2.07150  (2.37451)
     | > loss_kl: 1.34660  (1.25337)
     | > loss_feat: 2.87596  (3.54646)
     | > loss_mel: 20.56091  (21.12322)
     | > loss_duration: 1.62419  (1.59482)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 28.47915  (29.89238)
     | > grad_norm_1: 2094.86963  (1231.05676)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.63930  (3.58353)
     | > loader_time: 0.00900  (0.00801)


   --> STEP: 40/80 -- GLOBAL_STEP: 7800
     | > loss_disc: 2.11085  (2.42732



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00100)
     | > avg_loss_disc: 2.34957 (-0.17081)
     | > avg_loss_disc_real_0: 0.04566 (-0.05195)
     | > avg_loss_disc_real_1: 0.21479 (+0.02626)
     | > avg_loss_disc_real_2: 0.19653 (-0.01896)
     | > avg_loss_disc_real_3: 0.17056 (-0.06883)
     | > avg_loss_disc_real_4: 0.21468 (-0.01468)
     | > avg_loss_disc_real_5: 0.24020 (+0.01180)
     | > avg_loss_0: 2.34957 (-0.17081)
     | > avg_loss_gen: 1.87141 (+0.03021)
     | > avg_loss_kl: 1.43551 (+0.17498)
     | > avg_loss_feat: 2.49700 (+0.01677)
     | > avg_loss_mel: 19.91970 (-1.25896)
     | > avg_loss_duration: 1.88511 (-0.04606)
     | > avg_loss_1: 27.60872 (-1.08306)


 > EPOCH: 98/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 12:07:27) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 7850
     | > loss_disc: 2.47357  (2.58526)
     | > loss_disc_real_0: 0.17821  (0.14746)
     | > loss_disc_real_1: 0.24767  (0.23717)
     | > loss_disc_real_2: 0.22579  (0.22384)
     | > loss_disc_real_3: 0.28233  (0.23514)
     | > loss_disc_real_4: 0.20726  (0.24031)
     | > loss_disc_real_5: 0.27006  (0.24410)
     | > loss_0: 2.47357  (2.58526)
     | > grad_norm_0: 313.67465  (358.47330)
     | > loss_gen: 2.49382  (2.25797)
     | > loss_kl: 1.06737  (1.17052)
     | > loss_feat: 3.18215  (3.01315)
     | > loss_mel: 20.17956  (20.84050)
     | > loss_duration: 1.57042  (1.59901)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 28.49330  (28.88114)
     | > grad_norm_1: 2161.14111  (1901.79980)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.60030  (3.58116)
     | > loader_time: 0.01000  (0.00841)


   --> STEP: 35/80 -- GLOBAL_STEP: 7875
     | > loss_disc: 2.42568  (2.51545



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 3.07128 (+0.72171)
     | > avg_loss_disc_real_0: 0.70573 (+0.66007)
     | > avg_loss_disc_real_1: 0.19696 (-0.01782)
     | > avg_loss_disc_real_2: 0.24942 (+0.05289)
     | > avg_loss_disc_real_3: 0.21095 (+0.04039)
     | > avg_loss_disc_real_4: 0.19184 (-0.02283)
     | > avg_loss_disc_real_5: 0.24867 (+0.00847)
     | > avg_loss_0: 3.07128 (+0.72171)
     | > avg_loss_gen: 1.85431 (-0.01710)
     | > avg_loss_kl: 1.15846 (-0.27705)
     | > avg_loss_feat: 2.46439 (-0.03260)
     | > avg_loss_mel: 19.94390 (+0.02420)
     | > avg_loss_duration: 1.94008 (+0.05497)
     | > avg_loss_1: 27.36113 (-0.24759)


 > EPOCH: 99/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 12:13:34) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 7925
     | > loss_disc: 2.45822  (2.64856)
     | > loss_disc_real_0: 0.19701  (0.28061)
     | > loss_disc_real_1: 0.20660  (0.22045)
     | > loss_disc_real_2: 0.24078  (0.24190)
     | > loss_disc_real_3: 0.25914  (0.24226)
     | > loss_disc_real_4: 0.28695  (0.24648)
     | > loss_disc_real_5: 0.25012  (0.24082)
     | > loss_0: 2.45822  (2.64856)
     | > grad_norm_0: 57.19324  (349.22287)
     | > loss_gen: 2.02311  (2.01008)
     | > loss_kl: 1.32977  (1.28628)
     | > loss_feat: 2.78382  (2.55187)
     | > loss_mel: 20.33447  (20.90229)
     | > loss_duration: 1.65998  (1.59982)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 28.13115  (28.35033)
     | > grad_norm_1: 4480.45654  (1735.98755)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 5.08960  (4.06710)
     | > loader_time: 0.01100  (0.00861)


   --> STEP: 30/80 -- GLOBAL_STEP: 7950
     | > loss_disc: 2.27072  (2.52720)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.27747 (-0.79381)
     | > avg_loss_disc_real_0: 0.02988 (-0.67585)
     | > avg_loss_disc_real_1: 0.18762 (-0.00934)
     | > avg_loss_disc_real_2: 0.24195 (-0.00746)
     | > avg_loss_disc_real_3: 0.15600 (-0.05494)
     | > avg_loss_disc_real_4: 0.19817 (+0.00633)
     | > avg_loss_disc_real_5: 0.24656 (-0.00211)
     | > avg_loss_0: 2.27747 (-0.79381)
     | > avg_loss_gen: 2.18117 (+0.32686)
     | > avg_loss_kl: 1.17599 (+0.01753)
     | > avg_loss_feat: 3.46123 (+0.99683)
     | > avg_loss_mel: 21.11269 (+1.16880)
     | > avg_loss_duration: 1.90207 (-0.03800)
     | > avg_loss_1: 29.83315 (+2.47202)


 > EPOCH: 100/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 12:19:25) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 8000
     | > loss_disc: 2.40345  (2.40345)
     | > loss_disc_real_0: 0.03308  (0.03308)
     | > loss_disc_real_1: 0.22282  (0.22282)
     | > loss_disc_real_2: 0.24166  (0.24166)
     | > loss_disc_real_3: 0.19277  (0.19277)
     | > loss_disc_real_4: 0.20761  (0.20761)
     | > loss_disc_real_5: 0.25679  (0.25679)
     | > loss_0: 2.40345  (2.40345)
     | > grad_norm_0: 287.77753  (287.77753)
     | > loss_gen: 1.95454  (1.95454)
     | > loss_kl: 1.23080  (1.23080)
     | > loss_feat: 2.73113  (2.73113)
     | > loss_mel: 20.81566  (20.81566)
     | > loss_duration: 1.61154  (1.61154)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 28.34368  (28.34368)
     | > grad_norm_1: 4140.88916  (4140.88916)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.57130  (3.57125)
     | > loader_time: 22.77640  (22.77638)


   --> STEP: 25/80 -- GLOBAL_STEP: 8025
     | > loss_disc: 2.28818  (2.4255



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.34124 (+0.06378)
     | > avg_loss_disc_real_0: 0.08547 (+0.05559)
     | > avg_loss_disc_real_1: 0.28024 (+0.09261)
     | > avg_loss_disc_real_2: 0.20903 (-0.03293)
     | > avg_loss_disc_real_3: 0.18743 (+0.03142)
     | > avg_loss_disc_real_4: 0.26882 (+0.07065)
     | > avg_loss_disc_real_5: 0.24960 (+0.00305)
     | > avg_loss_0: 2.34124 (+0.06378)
     | > avg_loss_gen: 2.37497 (+0.19380)
     | > avg_loss_kl: 1.43587 (+0.25988)
     | > avg_loss_feat: 3.20540 (-0.25583)
     | > avg_loss_mel: 20.88300 (-0.22970)
     | > avg_loss_duration: 1.90969 (+0.00762)
     | > avg_loss_1: 29.80893 (-0.02422)


 > EPOCH: 101/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 12:25:01) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.
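The step counters in this log can be reproduced from the dataset size and the `VitsConfig` values: 2555 training instances with `batch_size=32` give the 80 steps per epoch shown as `STEP: n/80`, and the reported batch group size of 160 matches `batch_group_size * batch_size` (5 × 32). The `GLOBAL_STEP` values printed in this log also line up with `epoch * 80 + step`. This relation is inferred from the logged numbers rather than taken from the Trainer source, and it assumes the last partial batch is kept, so treat it as a sanity check under those assumptions.

```python
import math

# Values taken from the log and from the VitsConfig cell
num_train_instances = 2555  # "Number of instances : 2555"
batch_size = 32             # VitsConfig(batch_size=32)
batch_group_size = 5        # VitsConfig(batch_group_size=5)

# Assumes the last partial batch is kept: 2555 / 32 = 79.8 -> 80 steps
steps_per_epoch = math.ceil(num_train_instances / batch_size)

# The log reports the group size in samples, i.e. groups of 5 batches
group_size_in_samples = batch_group_size * batch_size


def global_step(epoch: int, step: int) -> int:
    """Assumed relation between the printed EPOCH index, the in-epoch STEP and GLOBAL_STEP."""
    return epoch * steps_per_epoch + step


print(steps_per_epoch)        # 80
print(group_size_in_samples)  # 160
print(global_step(101, 20))   # 8100, as in "STEP: 20/80 -- GLOBAL_STEP: 8100" under EPOCH 101
```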



[1m   --> STEP: 20/80 -- GLOBAL_STEP: 8100[0m
     | > loss_disc: 2.99701  (2.63981)
     | > loss_disc_real_0: 0.36277  (0.15070)
     | > loss_disc_real_1: 0.18812  (0.23567)
     | > loss_disc_real_2: 0.20912  (0.22468)
     | > loss_disc_real_3: 0.20968  (0.23832)
     | > loss_disc_real_4: 0.19080  (0.24351)
     | > loss_disc_real_5: 0.25237  (0.24584)
     | > loss_0: 2.99701  (2.63981)
     | > grad_norm_0: 1801.84070  (945.12091)
     | > loss_gen: 2.04102  (2.21573)
     | > loss_kl: 1.42612  (1.18225)
     | > loss_feat: 2.85371  (2.97414)
     | > loss_mel: 20.73573  (20.61594)
     | > loss_duration: 1.57909  (1.57900)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 28.63566  (28.56706)
     | > grad_norm_1: 3990.10229  (2895.39404)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.66730  (3.59953)
     | > loader_time: 0.01000  (0.00846)


[1m   --> STEP: 45/80 -- GLOBAL_STEP: 8125[0m
     | > loss_disc: 2.73142  (2.5898



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.00901 [0m(-0.00000)
     | > avg_loss_disc:[91m 2.46850 [0m(+0.12725)
     | > avg_loss_disc_real_0:[91m 0.09282 [0m(+0.00735)
     | > avg_loss_disc_real_1:[92m 0.18374 [0m(-0.09650)
     | > avg_loss_disc_real_2:[92m 0.18178 [0m(-0.02724)
     | > avg_loss_disc_real_3:[91m 0.21671 [0m(+0.02928)
     | > avg_loss_disc_real_4:[92m 0.24136 [0m(-0.02746)
     | > avg_loss_disc_real_5:[92m 0.22755 [0m(-0.02206)
     | > avg_loss_0:[91m 2.46850 [0m(+0.12725)
     | > avg_loss_gen:[92m 2.04271 [0m(-0.33226)
     | > avg_loss_kl:[92m 1.19603 [0m(-0.23984)
     | > avg_loss_feat:[92m 3.05443 [0m(-0.15097)
     | > avg_loss_mel:[92m 20.19852 [0m(-0.68447)
     | > avg_loss_duration:[92m 1.88279 [0m(-0.02690)
     | > avg_loss_1:[92m 28.37449 [0m(-1.43444)


[4m[1m > EPOCH: 102/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 12:30:37) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



[1m   --> STEP: 15/80 -- GLOBAL_STEP: 8175[0m
     | > loss_disc: 2.54555  (2.54805)
     | > loss_disc_real_0: 0.13749  (0.11799)
     | > loss_disc_real_1: 0.21106  (0.22889)
     | > loss_disc_real_2: 0.26868  (0.23046)
     | > loss_disc_real_3: 0.21281  (0.23326)
     | > loss_disc_real_4: 0.25113  (0.24574)
     | > loss_disc_real_5: 0.23502  (0.24639)
     | > loss_0: 2.54555  (2.54805)
     | > grad_norm_0: 931.67499  (816.44946)
     | > loss_gen: 2.14209  (2.29398)
     | > loss_kl: 1.27978  (1.16404)
     | > loss_feat: 3.20623  (3.17001)
     | > loss_mel: 20.31332  (20.40143)
     | > loss_duration: 1.60593  (1.57966)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 28.54734  (28.60912)
     | > grad_norm_1: 3401.67358  (3111.63452)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.60030  (3.57205)
     | > loader_time: 0.00900  (0.00847)


[1m   --> STEP: 40/80 -- GLOBAL_STEP: 8200[0m
     | > loss_disc: 2.56248  (2.51742



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[91m 0.00901 [0m(+0.00000)
     | > avg_loss_disc:[91m 2.55738 [0m(+0.08889)
     | > avg_loss_disc_real_0:[91m 0.10836 [0m(+0.01554)
     | > avg_loss_disc_real_1:[91m 0.27163 [0m(+0.08789)
     | > avg_loss_disc_real_2:[91m 0.23276 [0m(+0.05098)
     | > avg_loss_disc_real_3:[92m 0.21247 [0m(-0.00424)
     | > avg_loss_disc_real_4:[92m 0.23769 [0m(-0.00367)
     | > avg_loss_disc_real_5:[91m 0.24109 [0m(+0.01355)
     | > avg_loss_0:[91m 2.55738 [0m(+0.08889)
     | > avg_loss_gen:[91m 2.20871 [0m(+0.16600)
     | > avg_loss_kl:[92m 1.06549 [0m(-0.13054)
     | > avg_loss_feat:[92m 2.61916 [0m(-0.43527)
     | > avg_loss_mel:[92m 18.62792 [0m(-1.57060)
     | > avg_loss_duration:[91m 1.92855 [0m(+0.04576)
     | > avg_loss_1:[92m 26.44983 [0m(-1.92466)


[4m[1m > EPOCH: 103/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 12:36:12) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



[1m   --> STEP: 10/80 -- GLOBAL_STEP: 8250[0m
     | > loss_disc: 2.63750  (2.55803)
     | > loss_disc_real_0: 0.11479  (0.11486)
     | > loss_disc_real_1: 0.22215  (0.22777)
     | > loss_disc_real_2: 0.18982  (0.22655)
     | > loss_disc_real_3: 0.18270  (0.23091)
     | > loss_disc_real_4: 0.22962  (0.24862)
     | > loss_disc_real_5: 0.27392  (0.24841)
     | > loss_0: 2.63750  (2.55803)
     | > grad_norm_0: 970.05212  (726.40857)
     | > loss_gen: 2.48453  (2.28572)
     | > loss_kl: 1.24386  (1.24295)
     | > loss_feat: 2.91724  (3.17505)
     | > loss_mel: 20.41536  (20.68497)
     | > loss_duration: 1.55945  (1.59061)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 28.62044  (28.97929)
     | > grad_norm_1: 2459.63159  (2830.56738)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.60030  (3.56087)
     | > loader_time: 0.00800  (0.00791)


[1m   --> STEP: 35/80 -- GLOBAL_STEP: 8275[0m
     | > loss_disc: 2.50799  (2.53864



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.00901 [0m(-0.00000)
     | > avg_loss_disc:[92m 2.49973 [0m(-0.05765)
     | > avg_loss_disc_real_0:[91m 0.12388 [0m(+0.01552)
     | > avg_loss_disc_real_1:[92m 0.21098 [0m(-0.06065)
     | > avg_loss_disc_real_2:[92m 0.21138 [0m(-0.02138)
     | > avg_loss_disc_real_3:[91m 0.21906 [0m(+0.00660)
     | > avg_loss_disc_real_4:[91m 0.32172 [0m(+0.08403)
     | > avg_loss_disc_real_5:[91m 0.27122 [0m(+0.03013)
     | > avg_loss_0:[92m 2.49973 [0m(-0.05765)
     | > avg_loss_gen:[91m 2.38617 [0m(+0.17746)
     | > avg_loss_kl:[92m 1.02328 [0m(-0.04221)
     | > avg_loss_feat:[91m 2.66325 [0m(+0.04409)
     | > avg_loss_mel:[91m 20.10608 [0m(+1.47816)
     | > avg_loss_duration:[92m 1.91204 [0m(-0.01651)
     | > avg_loss_1:[91m 28.09082 [0m(+1.64100)


[4m[1m > EPOCH: 104/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 12:41:47) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



[1m   --> STEP: 5/80 -- GLOBAL_STEP: 8325[0m
     | > loss_disc: 2.42919  (2.62536)
     | > loss_disc_real_0: 0.10008  (0.16036)
     | > loss_disc_real_1: 0.24894  (0.23327)
     | > loss_disc_real_2: 0.19059  (0.23375)
     | > loss_disc_real_3: 0.28459  (0.24116)
     | > loss_disc_real_4: 0.27187  (0.24034)
     | > loss_disc_real_5: 0.25519  (0.23850)
     | > loss_0: 2.42919  (2.62536)
     | > grad_norm_0: 366.24463  (882.13666)
     | > loss_gen: 2.11573  (2.26265)
     | > loss_kl: 1.25575  (1.22446)
     | > loss_feat: 2.93302  (3.04686)
     | > loss_mel: 21.84008  (21.24362)
     | > loss_duration: 1.59439  (1.59307)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 29.73898  (29.37065)
     | > grad_norm_1: 2677.72095  (2663.45776)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.57530  (3.53462)
     | > loader_time: 0.00900  (0.00721)


[1m   --> STEP: 30/80 -- GLOBAL_STEP: 8350[0m
     | > loss_disc: 2.53471  (2.54101)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[91m 0.01001 [0m(+0.00100)
     | > avg_loss_disc:[91m 2.52310 [0m(+0.02337)
     | > avg_loss_disc_real_0:[92m 0.08574 [0m(-0.03814)
     | > avg_loss_disc_real_1:[91m 0.23006 [0m(+0.01908)
     | > avg_loss_disc_real_2:[91m 0.27294 [0m(+0.06156)
     | > avg_loss_disc_real_3:[91m 0.27573 [0m(+0.05667)
     | > avg_loss_disc_real_4:[92m 0.26484 [0m(-0.05688)
     | > avg_loss_disc_real_5:[92m 0.26369 [0m(-0.00753)
     | > avg_loss_0:[91m 2.52310 [0m(+0.02337)
     | > avg_loss_gen:[92m 2.31222 [0m(-0.07395)
     | > avg_loss_kl:[91m 1.70747 [0m(+0.68419)
     | > avg_loss_feat:[92m 2.55091 [0m(-0.11235)
     | > avg_loss_mel:[92m 18.08114 [0m(-2.02494)
     | > avg_loss_duration:[92m 1.90815 [0m(-0.00389)
     | > avg_loss_1:[92m 26.55989 [0m(-1.53093)


[4m[1m > EPOCH: 105/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 12:47:22) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



[1m   --> STEP: 0/80 -- GLOBAL_STEP: 8400[0m
     | > loss_disc: 2.38799  (2.38799)
     | > loss_disc_real_0: 0.07071  (0.07071)
     | > loss_disc_real_1: 0.21322  (0.21322)
     | > loss_disc_real_2: 0.22869  (0.22869)
     | > loss_disc_real_3: 0.24059  (0.24059)
     | > loss_disc_real_4: 0.24163  (0.24163)
     | > loss_disc_real_5: 0.21901  (0.21901)
     | > loss_0: 2.38799  (2.38799)
     | > grad_norm_0: 316.75766  (316.75766)
     | > loss_gen: 2.37695  (2.37695)
     | > loss_kl: 1.23352  (1.23352)
     | > loss_feat: 3.51642  (3.51642)
     | > loss_mel: 21.67625  (21.67625)
     | > loss_duration: 1.62401  (1.62401)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.42716  (30.42716)
     | > grad_norm_1: 3047.70142  (3047.70142)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.57830  (3.57826)
     | > loader_time: 23.39080  (23.39082)


[1m   --> STEP: 25/80 -- GLOBAL_STEP: 8425[0m
     | > loss_disc: 2.39039  (2.5531



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.00901 [0m(-0.00100)
     | > avg_loss_disc:[91m 2.65324 [0m(+0.13014)
     | > avg_loss_disc_real_0:[92m 0.03594 [0m(-0.04980)
     | > avg_loss_disc_real_1:[91m 0.25484 [0m(+0.02479)
     | > avg_loss_disc_real_2:[92m 0.26664 [0m(-0.00630)
     | > avg_loss_disc_real_3:[92m 0.20530 [0m(-0.07043)
     | > avg_loss_disc_real_4:[92m 0.23266 [0m(-0.03218)
     | > avg_loss_disc_real_5:[91m 0.28239 [0m(+0.01871)
     | > avg_loss_0:[91m 2.65324 [0m(+0.13014)
     | > avg_loss_gen:[92m 1.85142 [0m(-0.46080)
     | > avg_loss_kl:[92m 1.18575 [0m(-0.52172)
     | > avg_loss_feat:[92m 2.14386 [0m(-0.40705)
     | > avg_loss_mel:[91m 18.14157 [0m(+0.06042)
     | > avg_loss_duration:[92m 1.90129 [0m(-0.00686)
     | > avg_loss_1:[92m 25.22389 [0m(-1.33600)


[4m[1m > EPOCH: 106/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 12:52:56) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



[1m   --> STEP: 20/80 -- GLOBAL_STEP: 8500[0m
     | > loss_disc: 2.72661  (2.55327)
     | > loss_disc_real_0: 0.03077  (0.10542)
     | > loss_disc_real_1: 0.27205  (0.24323)
     | > loss_disc_real_2: 0.33617  (0.24537)
     | > loss_disc_real_3: 0.25178  (0.23048)
     | > loss_disc_real_4: 0.26156  (0.24397)
     | > loss_disc_real_5: 0.28194  (0.24361)
     | > loss_0: 2.72661  (2.55327)
     | > grad_norm_0: 1144.14722  (428.83966)
     | > loss_gen: 2.15178  (2.33563)
     | > loss_kl: 1.55079  (1.35099)
     | > loss_feat: 2.96888  (3.34882)
     | > loss_mel: 21.07399  (21.31730)
     | > loss_duration: 1.64655  (1.59612)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 29.39199  (29.94886)
     | > grad_norm_1: 3328.45093  (1410.54602)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.63330  (3.57694)
     | > loader_time: 0.01000  (0.00881)


[1m   --> STEP: 45/80 -- GLOBAL_STEP: 8525[0m
     | > loss_disc: 2.41502  (2.5616



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[91m 0.00901 [0m(+0.00000)
     | > avg_loss_disc:[92m 2.37286 [0m(-0.28038)
     | > avg_loss_disc_real_0:[91m 0.11283 [0m(+0.07690)
     | > avg_loss_disc_real_1:[92m 0.21914 [0m(-0.03570)
     | > avg_loss_disc_real_2:[92m 0.17085 [0m(-0.09580)
     | > avg_loss_disc_real_3:[92m 0.17199 [0m(-0.03330)
     | > avg_loss_disc_real_4:[92m 0.21343 [0m(-0.01922)
     | > avg_loss_disc_real_5:[92m 0.23031 [0m(-0.05209)
     | > avg_loss_0:[92m 2.37286 [0m(-0.28038)
     | > avg_loss_gen:[91m 2.34243 [0m(+0.49100)
     | > avg_loss_kl:[92m 1.15529 [0m(-0.03046)
     | > avg_loss_feat:[91m 3.13458 [0m(+0.99072)
     | > avg_loss_mel:[91m 21.71752 [0m(+3.57595)
     | > avg_loss_duration:[91m 1.90928 [0m(+0.00800)
     | > avg_loss_1:[91m 30.25910 [0m(+5.03521)


[4m[1m > EPOCH: 107/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 12:58:31) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



[1m   --> STEP: 15/80 -- GLOBAL_STEP: 8575[0m
     | > loss_disc: 2.81962  (2.48269)
     | > loss_disc_real_0: 0.22001  (0.10174)
     | > loss_disc_real_1: 0.21778  (0.22552)
     | > loss_disc_real_2: 0.22223  (0.22385)
     | > loss_disc_real_3: 0.32547  (0.22951)
     | > loss_disc_real_4: 0.21728  (0.23953)
     | > loss_disc_real_5: 0.24092  (0.24129)
     | > loss_0: 2.81962  (2.48269)
     | > grad_norm_0: 1998.82947  (614.74084)
     | > loss_gen: 1.85864  (2.36807)
     | > loss_kl: 1.21939  (1.24727)
     | > loss_feat: 2.45604  (3.41337)
     | > loss_mel: 21.88885  (20.75509)
     | > loss_duration: 1.60583  (1.58574)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 29.02874  (29.36953)
     | > grad_norm_1: 1761.85022  (1291.08813)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.61330  (3.56096)
     | > loader_time: 0.00800  (0.00867)


[1m   --> STEP: 40/80 -- GLOBAL_STEP: 8600[0m
     | > loss_disc: 2.23352  (2.4913



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[91m 0.01001 [0m(+0.00100)
     | > avg_loss_disc:[91m 2.63686 [0m(+0.26400)
     | > avg_loss_disc_real_0:[92m 0.09133 [0m(-0.02150)
     | > avg_loss_disc_real_1:[91m 0.22676 [0m(+0.00762)
     | > avg_loss_disc_real_2:[91m 0.23452 [0m(+0.06367)
     | > avg_loss_disc_real_3:[91m 0.22322 [0m(+0.05122)
     | > avg_loss_disc_real_4:[92m 0.20027 [0m(-0.01317)
     | > avg_loss_disc_real_5:[92m 0.20340 [0m(-0.02691)
     | > avg_loss_0:[91m 2.63686 [0m(+0.26400)
     | > avg_loss_gen:[92m 1.64379 [0m(-0.69864)
     | > avg_loss_kl:[91m 1.27013 [0m(+0.11484)
     | > avg_loss_feat:[92m 2.02492 [0m(-1.10966)
     | > avg_loss_mel:[92m 18.53194 [0m(-3.18558)
     | > avg_loss_duration:[92m 1.90535 [0m(-0.00394)
     | > avg_loss_1:[92m 25.37613 [0m(-4.88297)


[4m[1m > EPOCH: 108/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 13:04:06) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



[1m   --> STEP: 10/80 -- GLOBAL_STEP: 8650[0m
     | > loss_disc: 2.39361  (2.55853)
     | > loss_disc_real_0: 0.02890  (0.13669)
     | > loss_disc_real_1: 0.27440  (0.22165)
     | > loss_disc_real_2: 0.17480  (0.22601)
     | > loss_disc_real_3: 0.24106  (0.22870)
     | > loss_disc_real_4: 0.26063  (0.23532)
     | > loss_disc_real_5: 0.23954  (0.24729)
     | > loss_0: 2.39361  (2.55853)
     | > grad_norm_0: 56.13414  (692.98761)
     | > loss_gen: 2.67245  (2.35741)
     | > loss_kl: 1.34866  (1.18847)
     | > loss_feat: 3.97239  (3.28271)
     | > loss_mel: 20.59863  (20.54074)
     | > loss_duration: 1.59007  (1.58615)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.18219  (28.95547)
     | > grad_norm_1: 527.66913  (1710.80298)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.58330  (3.55223)
     | > loader_time: 0.00800  (0.00831)


[1m   --> STEP: 35/80 -- GLOBAL_STEP: 8675[0m
     | > loss_disc: 2.61893  (2.47531)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.01001 [0m(-0.00000)
     | > avg_loss_disc:[92m 2.50152 [0m(-0.13534)
     | > avg_loss_disc_real_0:[92m 0.07380 [0m(-0.01754)
     | > avg_loss_disc_real_1:[92m 0.18101 [0m(-0.04575)
     | > avg_loss_disc_real_2:[92m 0.21650 [0m(-0.01802)
     | > avg_loss_disc_real_3:[91m 0.24975 [0m(+0.02654)
     | > avg_loss_disc_real_4:[91m 0.23555 [0m(+0.03529)
     | > avg_loss_disc_real_5:[91m 0.23409 [0m(+0.03069)
     | > avg_loss_0:[92m 2.50152 [0m(-0.13534)
     | > avg_loss_gen:[91m 1.94198 [0m(+0.29819)
     | > avg_loss_kl:[91m 1.43400 [0m(+0.16387)
     | > avg_loss_feat:[91m 2.46703 [0m(+0.44211)
     | > avg_loss_mel:[91m 19.66534 [0m(+1.13340)
     | > avg_loss_duration:[91m 1.91378 [0m(+0.00844)
     | > avg_loss_1:[91m 27.42213 [0m(+2.04600)


[4m[1m > EPOCH: 109/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 13:09:40) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



[1m   --> STEP: 5/80 -- GLOBAL_STEP: 8725[0m
     | > loss_disc: 2.52537  (2.50249)
     | > loss_disc_real_0: 0.03914  (0.03761)
     | > loss_disc_real_1: 0.19844  (0.21299)
     | > loss_disc_real_2: 0.20278  (0.21647)
     | > loss_disc_real_3: 0.18757  (0.22186)
     | > loss_disc_real_4: 0.20851  (0.23950)
     | > loss_disc_real_5: 0.26175  (0.24696)
     | > loss_0: 2.52537  (2.50249)
     | > grad_norm_0: 551.87994  (474.68530)
     | > loss_gen: 2.08448  (2.20810)
     | > loss_kl: 1.43498  (1.21764)
     | > loss_feat: 3.14767  (3.42562)
     | > loss_mel: 21.29346  (21.11156)
     | > loss_duration: 1.57686  (1.58017)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 29.53744  (29.54309)
     | > grad_norm_1: 2155.24731  (2472.37524)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.56220  (3.53262)
     | > loader_time: 0.00800  (0.00780)


[1m   --> STEP: 30/80 -- GLOBAL_STEP: 8750[0m
     | > loss_disc: 2.81376  (2.51258)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[91m 0.01001 [0m(+0.00000)
     | > avg_loss_disc:[91m 2.58079 [0m(+0.07927)
     | > avg_loss_disc_real_0:[91m 0.20106 [0m(+0.12727)
     | > avg_loss_disc_real_1:[91m 0.21902 [0m(+0.03801)
     | > avg_loss_disc_real_2:[91m 0.21766 [0m(+0.00116)
     | > avg_loss_disc_real_3:[92m 0.18341 [0m(-0.06635)
     | > avg_loss_disc_real_4:[91m 0.25461 [0m(+0.01905)
     | > avg_loss_disc_real_5:[91m 0.25800 [0m(+0.02391)
     | > avg_loss_0:[91m 2.58079 [0m(+0.07927)
     | > avg_loss_gen:[91m 2.17676 [0m(+0.23478)
     | > avg_loss_kl:[92m 1.32005 [0m(-0.11395)
     | > avg_loss_feat:[91m 2.76535 [0m(+0.29832)
     | > avg_loss_mel:[91m 19.90773 [0m(+0.24240)
     | > avg_loss_duration:[91m 1.91803 [0m(+0.00425)
     | > avg_loss_1:[91m 28.08792 [0m(+0.66579)


[4m[1m > EPOCH: 110/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 13:15:15) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



[1m   --> STEP: 0/80 -- GLOBAL_STEP: 8800[0m
     | > loss_disc: 2.62173  (2.62173)
     | > loss_disc_real_0: 0.16342  (0.16342)
     | > loss_disc_real_1: 0.23651  (0.23651)
     | > loss_disc_real_2: 0.21715  (0.21715)
     | > loss_disc_real_3: 0.20133  (0.20133)
     | > loss_disc_real_4: 0.25023  (0.25023)
     | > loss_disc_real_5: 0.25743  (0.25743)
     | > loss_0: 2.62173  (2.62173)
     | > grad_norm_0: 1107.25171  (1107.25171)
     | > loss_gen: 2.09047  (2.09047)
     | > loss_kl: 0.86497  (0.86497)
     | > loss_feat: 2.74176  (2.74176)
     | > loss_mel: 20.19175  (20.19175)
     | > loss_duration: 1.62887  (1.62887)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 27.51781  (27.51781)
     | > grad_norm_1: 2806.39062  (2806.39062)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.56120  (3.56124)
     | > loader_time: 23.42430  (23.42434)


[1m   --> STEP: 25/80 -- GLOBAL_STEP: 8825[0m
     | > loss_disc: 2.52827  (2.53



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.00901 [0m(-0.00100)
     | > avg_loss_disc:[92m 2.36786 [0m(-0.21293)
     | > avg_loss_disc_real_0:[92m 0.08352 [0m(-0.11754)
     | > avg_loss_disc_real_1:[91m 0.21972 [0m(+0.00070)
     | > avg_loss_disc_real_2:[92m 0.21158 [0m(-0.00608)
     | > avg_loss_disc_real_3:[91m 0.20640 [0m(+0.02299)
     | > avg_loss_disc_real_4:[91m 0.26674 [0m(+0.01213)
     | > avg_loss_disc_real_5:[91m 0.29820 [0m(+0.04020)
     | > avg_loss_0:[92m 2.36786 [0m(-0.21293)
     | > avg_loss_gen:[91m 2.41971 [0m(+0.24295)
     | > avg_loss_kl:[92m 1.04905 [0m(-0.27100)
     | > avg_loss_feat:[91m 3.05174 [0m(+0.28639)
     | > avg_loss_mel:[91m 20.33238 [0m(+0.42465)
     | > avg_loss_duration:[92m 1.91639 [0m(-0.00164)
     | > avg_loss_1:[91m 28.76928 [0m(+0.68135)


[4m[1m > EPOCH: 111/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 13:20:50) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



[1m   --> STEP: 20/80 -- GLOBAL_STEP: 8900[0m
     | > loss_disc: 2.50091  (2.55836)
     | > loss_disc_real_0: 0.06857  (0.12768)
     | > loss_disc_real_1: 0.22444  (0.23035)
     | > loss_disc_real_2: 0.20389  (0.22805)
     | > loss_disc_real_3: 0.24650  (0.22954)
     | > loss_disc_real_4: 0.22577  (0.24294)
     | > loss_disc_real_5: 0.24490  (0.24478)
     | > loss_0: 2.50091  (2.55836)
     | > grad_norm_0: 1025.90051  (936.45605)
     | > loss_gen: 2.17397  (2.28420)
     | > loss_kl: 1.41228  (1.19251)
     | > loss_feat: 3.44196  (3.23166)
     | > loss_mel: 20.46673  (20.54331)
     | > loss_duration: 1.60397  (1.56967)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 29.09892  (28.82134)
     | > grad_norm_1: 3793.00000  (3215.22656)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.61830  (3.57786)
     | > loader_time: 0.01000  (0.00886)


   --> STEP: 45/80 -- GLOBAL_STEP: 8925
     | > loss_disc: 2.34131  (2.5993



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00000)
     | > avg_loss_disc: 2.39635 (+0.02849)
     | > avg_loss_disc_real_0: 0.19281 (+0.10929)
     | > avg_loss_disc_real_1: 0.30572 (+0.08600)
     | > avg_loss_disc_real_2: 0.26905 (+0.05747)
     | > avg_loss_disc_real_3: 0.20698 (+0.00058)
     | > avg_loss_disc_real_4: 0.22948 (-0.03726)
     | > avg_loss_disc_real_5: 0.23456 (-0.06364)
     | > avg_loss_0: 2.39635 (+0.02849)
     | > avg_loss_gen: 2.90548 (+0.48577)
     | > avg_loss_kl: 1.61206 (+0.56301)
     | > avg_loss_feat: 3.31541 (+0.26367)
     | > avg_loss_mel: 21.65552 (+1.32314)
     | > avg_loss_duration: 1.97210 (+0.05570)
     | > avg_loss_1: 31.46058 (+2.69130)


[4m[1m > EPOCH: 112/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 13:26:25)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 8975
     | > loss_disc: 2.65810  (2.54554)
     | > loss_disc_real_0: 0.17887  (0.09851)
     | > loss_disc_real_1: 0.24648  (0.23690)
     | > loss_disc_real_2: 0.28018  (0.22803)
     | > loss_disc_real_3: 0.19089  (0.22730)
     | > loss_disc_real_4: 0.24955  (0.23464)
     | > loss_disc_real_5: 0.23959  (0.24699)
     | > loss_0: 2.65810  (2.54554)
     | > grad_norm_0: 1517.40869  (499.28729)
     | > loss_gen: 2.40175  (2.26303)
     | > loss_kl: 1.22470  (1.34244)
     | > loss_feat: 3.49819  (3.29313)
     | > loss_mel: 21.35911  (21.15827)
     | > loss_duration: 1.59431  (1.58669)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.07806  (29.64356)
     | > grad_norm_1: 1977.88843  (2245.34595)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.59130  (3.56395)
     | > loader_time: 0.00900  (0.00847)


   --> STEP: 40/80 -- GLOBAL_STEP: 9000
     | > loss_disc: 2.46301  (2.5466



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00100)
     | > avg_loss_disc: 2.66949 (+0.27314)
     | > avg_loss_disc_real_0: 0.13992 (-0.05289)
     | > avg_loss_disc_real_1: 0.25787 (-0.04785)
     | > avg_loss_disc_real_2: 0.22974 (-0.03931)
     | > avg_loss_disc_real_3: 0.31532 (+0.10834)
     | > avg_loss_disc_real_4: 0.26882 (+0.03933)
     | > avg_loss_disc_real_5: 0.24905 (+0.01449)
     | > avg_loss_0: 2.66949 (+0.27314)
     | > avg_loss_gen: 2.07344 (-0.83204)
     | > avg_loss_kl: 1.20753 (-0.40453)
     | > avg_loss_feat: 2.51601 (-0.79940)
     | > avg_loss_mel: 19.33243 (-2.32310)
     | > avg_loss_duration: 1.93024 (-0.04185)
     | > avg_loss_1: 27.05965 (-4.40093)


[4m[1m > EPOCH: 113/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 13:31:59)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 9050
     | > loss_disc: 2.54259  (2.56528)
     | > loss_disc_real_0: 0.14803  (0.12263)
     | > loss_disc_real_1: 0.23631  (0.22582)
     | > loss_disc_real_2: 0.20166  (0.22671)
     | > loss_disc_real_3: 0.25708  (0.23417)
     | > loss_disc_real_4: 0.18731  (0.23725)
     | > loss_disc_real_5: 0.22567  (0.24346)
     | > loss_0: 2.54259  (2.56528)
     | > grad_norm_0: 907.33771  (778.58429)
     | > loss_gen: 2.39138  (2.25906)
     | > loss_kl: 1.29903  (1.17372)
     | > loss_feat: 3.28377  (3.21262)
     | > loss_mel: 20.36995  (20.76321)
     | > loss_duration: 1.55627  (1.57018)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 28.90041  (28.97879)
     | > grad_norm_1: 3117.54468  (2976.74585)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.59030  (3.55423)
     | > loader_time: 0.00800  (0.00821)


   --> STEP: 35/80 -- GLOBAL_STEP: 9075
     | > loss_disc: 2.57314  (2.56456



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00100)
     | > avg_loss_disc: 2.48024 (-0.18925)
     | > avg_loss_disc_real_0: 0.04882 (-0.09110)
     | > avg_loss_disc_real_1: 0.20015 (-0.05771)
     | > avg_loss_disc_real_2: 0.21529 (-0.01446)
     | > avg_loss_disc_real_3: 0.25735 (-0.05796)
     | > avg_loss_disc_real_4: 0.22673 (-0.04209)
     | > avg_loss_disc_real_5: 0.23613 (-0.01292)
     | > avg_loss_0: 2.48024 (-0.18925)
     | > avg_loss_gen: 2.06322 (-0.01022)
     | > avg_loss_kl: 1.39504 (+0.18751)
     | > avg_loss_feat: 2.92653 (+0.41052)
     | > avg_loss_mel: 19.65207 (+0.31964)
     | > avg_loss_duration: 1.90416 (-0.02609)
     | > avg_loss_1: 27.94102 (+0.88136)


[4m[1m > EPOCH: 114/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 13:37:34)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 9125
     | > loss_disc: 2.59930  (2.53770)
     | > loss_disc_real_0: 0.06669  (0.11334)
     | > loss_disc_real_1: 0.23839  (0.22210)
     | > loss_disc_real_2: 0.35563  (0.26048)
     | > loss_disc_real_3: 0.27650  (0.22699)
     | > loss_disc_real_4: 0.28430  (0.25537)
     | > loss_disc_real_5: 0.26926  (0.24521)
     | > loss_0: 2.59930  (2.53770)
     | > grad_norm_0: 711.55396  (569.64471)
     | > loss_gen: 2.39486  (2.36087)
     | > loss_kl: 1.46481  (1.19560)
     | > loss_feat: 3.21140  (3.31361)
     | > loss_mel: 20.68659  (20.42537)
     | > loss_duration: 1.50818  (1.56496)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 29.26585  (28.86040)
     | > grad_norm_1: 3326.70654  (3009.09448)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.55020  (3.52721)
     | > loader_time: 0.01100  (0.00841)


   --> STEP: 30/80 -- GLOBAL_STEP: 9150
     | > loss_disc: 2.60954  (2.57524)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.50203 (+0.02179)
     | > avg_loss_disc_real_0: 0.14998 (+0.10116)
     | > avg_loss_disc_real_1: 0.23197 (+0.03182)
     | > avg_loss_disc_real_2: 0.28563 (+0.07034)
     | > avg_loss_disc_real_3: 0.20316 (-0.05419)
     | > avg_loss_disc_real_4: 0.28029 (+0.05356)
     | > avg_loss_disc_real_5: 0.26452 (+0.02838)
     | > avg_loss_0: 2.50203 (+0.02179)
     | > avg_loss_gen: 2.37718 (+0.31396)
     | > avg_loss_kl: 1.37544 (-0.01960)
     | > avg_loss_feat: 2.75487 (-0.17165)
     | > avg_loss_mel: 19.50880 (-0.14327)
     | > avg_loss_duration: 1.94225 (+0.03810)
     | > avg_loss_1: 27.95855 (+0.01753)


[4m[1m > EPOCH: 115/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 13:43:09)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 9200
     | > loss_disc: 2.47620  (2.47620)
     | > loss_disc_real_0: 0.17847  (0.17847)
     | > loss_disc_real_1: 0.20711  (0.20711)
     | > loss_disc_real_2: 0.25187  (0.25187)
     | > loss_disc_real_3: 0.17995  (0.17995)
     | > loss_disc_real_4: 0.26387  (0.26387)
     | > loss_disc_real_5: 0.24846  (0.24846)
     | > loss_0: 2.47620  (2.47620)
     | > grad_norm_0: 1067.71277  (1067.71277)
     | > loss_gen: 2.46035  (2.46035)
     | > loss_kl: 1.30146  (1.30146)
     | > loss_feat: 3.38997  (3.38997)
     | > loss_mel: 21.00595  (21.00595)
     | > loss_duration: 1.59586  (1.59586)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 29.75359  (29.75359)
     | > grad_norm_1: 2768.21313  (2768.21313)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.58430  (3.58426)
     | > loader_time: 23.34460  (23.34460)


   --> STEP: 25/80 -- GLOBAL_STEP: 9225
     | > loss_disc: 2.52945  (2.54



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00000)
     | > avg_loss_disc: 2.39251 (-0.10952)
     | > avg_loss_disc_real_0: 0.04911 (-0.10086)
     | > avg_loss_disc_real_1: 0.20913 (-0.02284)
     | > avg_loss_disc_real_2: 0.22500 (-0.06063)
     | > avg_loss_disc_real_3: 0.16526 (-0.03790)
     | > avg_loss_disc_real_4: 0.20439 (-0.07589)
     | > avg_loss_disc_real_5: 0.21122 (-0.05329)
     | > avg_loss_0: 2.39251 (-0.10952)
     | > avg_loss_gen: 2.10082 (-0.27636)
     | > avg_loss_kl: 1.38811 (+0.01267)
     | > avg_loss_feat: 3.38478 (+0.62990)
     | > avg_loss_mel: 21.01553 (+1.50673)
     | > avg_loss_duration: 1.89332 (-0.04893)
     | > avg_loss_1: 29.78255 (+1.82400)


[4m[1m > EPOCH: 116/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 13:48:44)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 9300
     | > loss_disc: 2.63466  (2.52977)
     | > loss_disc_real_0: 0.34557  (0.12582)
     | > loss_disc_real_1: 0.19622  (0.22236)
     | > loss_disc_real_2: 0.24762  (0.22387)
     | > loss_disc_real_3: 0.22503  (0.22818)
     | > loss_disc_real_4: 0.25723  (0.24298)
     | > loss_disc_real_5: 0.24215  (0.24214)
     | > loss_0: 2.63466  (2.52977)
     | > grad_norm_0: 2007.63184  (891.63165)
     | > loss_gen: 2.31061  (2.34173)
     | > loss_kl: 1.25610  (1.21448)
     | > loss_feat: 3.00569  (3.36131)
     | > loss_mel: 20.28700  (20.69016)
     | > loss_duration: 1.60560  (1.56922)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 28.46499  (29.17691)
     | > grad_norm_1: 2907.24097  (2989.58643)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.62730  (3.58026)
     | > loader_time: 0.00900  (0.00916)


   --> STEP: 45/80 -- GLOBAL_STEP: 9325
     | > loss_disc: 2.42659  (2.5457



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.57262 (+0.18011)
     | > avg_loss_disc_real_0: 0.21155 (+0.16243)
     | > avg_loss_disc_real_1: 0.20044 (-0.00869)
     | > avg_loss_disc_real_2: 0.27335 (+0.04835)
     | > avg_loss_disc_real_3: 0.24891 (+0.08365)
     | > avg_loss_disc_real_4: 0.27302 (+0.06863)
     | > avg_loss_disc_real_5: 0.27079 (+0.05957)
     | > avg_loss_0: 2.57262 (+0.18011)
     | > avg_loss_gen: 2.29497 (+0.19415)
     | > avg_loss_kl: 1.17924 (-0.20887)
     | > avg_loss_feat: 2.24809 (-1.13669)
     | > avg_loss_mel: 18.56736 (-2.44817)
     | > avg_loss_duration: 1.92358 (+0.03026)
     | > avg_loss_1: 26.21324 (-3.56931)


[4m[1m > EPOCH: 117/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 13:54:19)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 9375
     | > loss_disc: 2.46876  (2.55221)
     | > loss_disc_real_0: 0.11997  (0.12276)
     | > loss_disc_real_1: 0.23555  (0.22544)
     | > loss_disc_real_2: 0.24420  (0.22899)
     | > loss_disc_real_3: 0.21913  (0.22755)
     | > loss_disc_real_4: 0.18962  (0.23988)
     | > loss_disc_real_5: 0.22813  (0.24536)
     | > loss_0: 2.46876  (2.55221)
     | > grad_norm_0: 712.21747  (756.43457)
     | > loss_gen: 2.50091  (2.28135)
     | > loss_kl: 1.41029  (1.22547)
     | > loss_feat: 3.48751  (3.22260)
     | > loss_mel: 21.38159  (20.35838)
     | > loss_duration: 1.59306  (1.56268)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.37336  (28.65048)
     | > grad_norm_1: 2791.52222  (2767.29736)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.60630  (3.56853)
     | > loader_time: 0.01000  (0.00867)


   --> STEP: 40/80 -- GLOBAL_STEP: 9400
     | > loss_disc: 2.33333  (2.55392



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00000)
     | > avg_loss_disc: 2.45923 (-0.11339)
     | > avg_loss_disc_real_0: 0.20469 (-0.00686)
     | > avg_loss_disc_real_1: 0.19001 (-0.01043)
     | > avg_loss_disc_real_2: 0.19617 (-0.07718)
     | > avg_loss_disc_real_3: 0.19840 (-0.05051)
     | > avg_loss_disc_real_4: 0.23532 (-0.03770)
     | > avg_loss_disc_real_5: 0.26448 (-0.00631)
     | > avg_loss_0: 2.45923 (-0.11339)
     | > avg_loss_gen: 2.29789 (+0.00292)
     | > avg_loss_kl: 1.18728 (+0.00804)
     | > avg_loss_feat: 2.75894 (+0.51086)
     | > avg_loss_mel: 20.24798 (+1.68062)
     | > avg_loss_duration: 1.94923 (+0.02564)
     | > avg_loss_1: 28.44132 (+2.22808)


[4m[1m > EPOCH: 118/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 13:59:54)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 9450
     | > loss_disc: 2.40292  (2.55994)
     | > loss_disc_real_0: 0.03586  (0.11197)
     | > loss_disc_real_1: 0.27193  (0.25718)
     | > loss_disc_real_2: 0.18174  (0.23668)
     | > loss_disc_real_3: 0.21209  (0.23652)
     | > loss_disc_real_4: 0.28095  (0.25042)
     | > loss_disc_real_5: 0.26499  (0.23878)
     | > loss_0: 2.40292  (2.55994)
     | > grad_norm_0: 158.44566  (598.75574)
     | > loss_gen: 2.60342  (2.41049)
     | > loss_kl: 1.31818  (1.34105)
     | > loss_feat: 4.17123  (3.42501)
     | > loss_mel: 21.86100  (21.38943)
     | > loss_duration: 1.58993  (1.58305)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 31.54376  (30.14902)
     | > grad_norm_1: 1008.16248  (2180.45825)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.57730  (3.55498)
     | > loader_time: 0.00900  (0.00821)


   --> STEP: 35/80 -- GLOBAL_STEP: 9475
     | > loss_disc: 2.49146  (2.52826



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.46205 (+0.00282)
     | > avg_loss_disc_real_0: 0.02951 (-0.17517)
     | > avg_loss_disc_real_1: 0.24381 (+0.05381)
     | > avg_loss_disc_real_2: 0.21902 (+0.02285)
     | > avg_loss_disc_real_3: 0.27383 (+0.07543)
     | > avg_loss_disc_real_4: 0.26080 (+0.02548)
     | > avg_loss_disc_real_5: 0.23356 (-0.03092)
     | > avg_loss_0: 2.46205 (+0.00282)
     | > avg_loss_gen: 2.10824 (-0.18965)
     | > avg_loss_kl: 1.21913 (+0.03185)
     | > avg_loss_feat: 3.14553 (+0.38658)
     | > avg_loss_mel: 21.23957 (+0.99160)
     | > avg_loss_duration: 1.92536 (-0.02387)
     | > avg_loss_1: 29.63783 (+1.19651)


[4m[1m > EPOCH: 119/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 14:05:28)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 9525
     | > loss_disc: 2.31517  (2.41762)
     | > loss_disc_real_0: 0.15611  (0.08426)
     | > loss_disc_real_1: 0.20251  (0.20695)
     | > loss_disc_real_2: 0.14650  (0.21080)
     | > loss_disc_real_3: 0.20537  (0.23081)
     | > loss_disc_real_4: 0.23539  (0.23056)
     | > loss_disc_real_5: 0.25091  (0.24373)
     | > loss_0: 2.31517  (2.41762)
     | > grad_norm_0: 916.84491  (711.48053)
     | > loss_gen: 2.68530  (2.47586)
     | > loss_kl: 1.06206  (1.12603)
     | > loss_feat: 4.26971  (3.74068)
     | > loss_mel: 21.73356  (21.30941)
     | > loss_duration: 1.66790  (1.59837)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 31.41853  (30.25034)
     | > grad_norm_1: 1470.69360  (2341.39502)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.60130  (3.54350)
     | > loader_time: 0.00700  (0.00800)


   --> STEP: 30/80 -- GLOBAL_STEP: 9550
     | > loss_disc: 2.51390  (2.46070)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.41988 (-0.04217)
     | > avg_loss_disc_real_0: 0.06848 (+0.03897)
     | > avg_loss_disc_real_1: 0.27369 (+0.02988)
     | > avg_loss_disc_real_2: 0.26776 (+0.04874)
     | > avg_loss_disc_real_3: 0.23177 (-0.04206)
     | > avg_loss_disc_real_4: 0.25200 (-0.00880)
     | > avg_loss_disc_real_5: 0.25208 (+0.01852)
     | > avg_loss_0: 2.41988 (-0.04217)
     | > avg_loss_gen: 2.57356 (+0.46531)
     | > avg_loss_kl: 0.93064 (-0.28849)
     | > avg_loss_feat: 3.36412 (+0.21859)
     | > avg_loss_mel: 19.63199 (-1.60759)
     | > avg_loss_duration: 1.94317 (+0.01781)
     | > avg_loss_1: 28.44347 (-1.19436)


[4m[1m > EPOCH: 120/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 14:11:04)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 9600
     | > loss_disc: 2.42359  (2.42359)
     | > loss_disc_real_0: 0.07972  (0.07972)
     | > loss_disc_real_1: 0.28531  (0.28531)
     | > loss_disc_real_2: 0.24848  (0.24848)
     | > loss_disc_real_3: 0.25644  (0.25644)
     | > loss_disc_real_4: 0.24353  (0.24353)
     | > loss_disc_real_5: 0.23966  (0.23966)
     | > loss_0: 2.42359  (2.42359)
     | > grad_norm_0: 244.33688  (244.33688)
     | > loss_gen: 2.51204  (2.51204)
     | > loss_kl: 0.83734  (0.83734)
     | > loss_feat: 3.90058  (3.90058)
     | > loss_mel: 21.46552  (21.46552)
     | > loss_duration: 1.54644  (1.54644)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.26192  (30.26192)
     | > grad_norm_1: 1192.92920  (1192.92920)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.57830  (3.57826)
     | > loader_time: 23.33300  (23.33295)


   --> STEP: 25/80 -- GLOBAL_STEP: 9625
     | > loss_disc: 2.68189  (2.4818



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00000)
     | > avg_loss_disc: 2.39568 (-0.02421)
     | > avg_loss_disc_real_0: 0.14357 (+0.07509)
     | > avg_loss_disc_real_1: 0.21568 (-0.05801)
     | > avg_loss_disc_real_2: 0.16383 (-0.10393)
     | > avg_loss_disc_real_3: 0.24055 (+0.00878)
     | > avg_loss_disc_real_4: 0.23284 (-0.01915)
     | > avg_loss_disc_real_5: 0.24012 (-0.01196)
     | > avg_loss_0: 2.39568 (-0.02421)
     | > avg_loss_gen: 2.45637 (-0.11719)
     | > avg_loss_kl: 1.38338 (+0.45274)
     | > avg_loss_feat: 3.33328 (-0.03084)
     | > avg_loss_mel: 21.14324 (+1.51125)
     | > avg_loss_duration: 1.92976 (-0.01341)
     | > avg_loss_1: 30.24602 (+1.80255)


[4m[1m > EPOCH: 121/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 14:16:39)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 9700
     | > loss_disc: 2.40159  (2.52702)
     | > loss_disc_real_0: 0.11488  (0.12127)
     | > loss_disc_real_1: 0.24972  (0.22237)
     | > loss_disc_real_2: 0.22016  (0.22417)
     | > loss_disc_real_3: 0.21996  (0.23060)
     | > loss_disc_real_4: 0.21264  (0.24617)
     | > loss_disc_real_5: 0.22538  (0.24477)
     | > loss_0: 2.40159  (2.52702)
     | > grad_norm_0: 828.55768  (720.43481)
     | > loss_gen: 2.47705  (2.36054)
     | > loss_kl: 1.39567  (1.29555)
     | > loss_feat: 3.95117  (3.46604)
     | > loss_mel: 21.12127  (20.55014)
     | > loss_duration: 1.57244  (1.56861)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.51760  (29.24087)
     | > grad_norm_1: 2256.27344  (2056.52759)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.62530  (3.58646)
     | > loader_time: 0.01000  (0.00851)


   --> STEP: 45/80 -- GLOBAL_STEP: 9725
     | > loss_disc: 2.93059  (2.50871



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00100)
     | > avg_loss_disc: 2.49075 (+0.09508)
     | > avg_loss_disc_real_0: 0.10545 (-0.03812)
     | > avg_loss_disc_real_1: 0.24754 (+0.03186)
     | > avg_loss_disc_real_2: 0.28623 (+0.12240)
     | > avg_loss_disc_real_3: 0.26273 (+0.02218)
     | > avg_loss_disc_real_4: 0.30026 (+0.06741)
     | > avg_loss_disc_real_5: 0.27991 (+0.03979)
     | > avg_loss_0: 2.49075 (+0.09508)
     | > avg_loss_gen: 2.29138 (-0.16499)
     | > avg_loss_kl: 1.24704 (-0.13634)
     | > avg_loss_feat: 2.64787 (-0.68540)
     | > avg_loss_mel: 19.82984 (-1.31340)
     | > avg_loss_duration: 1.94318 (+0.01342)
     | > avg_loss_1: 27.95931 (-2.28671)


[4m[1m > EPOCH: 122/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 14:22:14)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 9775
     | > loss_disc: 2.91259  (2.70029)
     | > loss_disc_real_0: 0.66370  (0.23430)
     | > loss_disc_real_1: 0.21981  (0.22497)
     | > loss_disc_real_2: 0.17091  (0.22241)
     | > loss_disc_real_3: 0.25828  (0.23687)
     | > loss_disc_real_4: 0.23887  (0.24264)
     | > loss_disc_real_5: 0.22782  (0.24825)
     | > loss_0: 2.91259  (2.70029)
     | > grad_norm_0: 1214.38904  (897.78052)
     | > loss_gen: 2.55701  (2.26556)
     | > loss_kl: 1.37213  (1.15765)
     | > loss_feat: 3.25991  (2.99218)
     | > loss_mel: 21.63958  (20.36992)
     | > loss_duration: 1.59848  (1.56022)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.42711  (28.34553)
     | > grad_norm_1: 1806.97180  (2553.64771)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.62430  (3.57439)
     | > loader_time: 0.00900  (0.00854)


   --> STEP: 40/80 -- GLOBAL_STEP: 9800
     | > loss_disc: 2.29436  (2.6062



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00100)
     | > avg_loss_disc: 2.34876 (-0.14200)
     | > avg_loss_disc_real_0: 0.08155 (-0.02390)
     | > avg_loss_disc_real_1: 0.28788 (+0.04034)
     | > avg_loss_disc_real_2: 0.24127 (-0.04496)
     | > avg_loss_disc_real_3: 0.25759 (-0.00514)
     | > avg_loss_disc_real_4: 0.26012 (-0.04014)
     | > avg_loss_disc_real_5: 0.24123 (-0.03868)
     | > avg_loss_0: 2.34876 (-0.14200)
     | > avg_loss_gen: 2.88586 (+0.59449)
     | > avg_loss_kl: 1.47832 (+0.23128)
     | > avg_loss_feat: 3.76541 (+1.11753)
     | > avg_loss_mel: 22.38333 (+2.55349)
     | > avg_loss_duration: 1.93872 (-0.00446)
     | > avg_loss_1: 32.45164 (+4.49233)


 > EPOCH: 123/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 14:27:49)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 9850
     | > loss_disc: 2.89921  (2.55186)
     | > loss_disc_real_0: 0.50209  (0.13023)
     | > loss_disc_real_1: 0.22659  (0.23515)
     | > loss_disc_real_2: 0.23037  (0.23029)
     | > loss_disc_real_3: 0.24485  (0.23448)
     | > loss_disc_real_4: 0.25248  (0.24174)
     | > loss_disc_real_5: 0.26969  (0.25019)
     | > loss_0: 2.89921  (2.55186)
     | > grad_norm_0: 1542.97729  (738.06219)
     | > loss_gen: 2.49622  (2.33509)
     | > loss_kl: 1.15526  (1.23323)
     | > loss_feat: 3.34814  (3.42690)
     | > loss_mel: 20.03047  (20.44324)
     | > loss_duration: 1.50882  (1.57159)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 28.53892  (29.01005)
     | > grad_norm_1: 973.37634  (1648.10376)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.57030  (3.55334)
     | > loader_time: 0.00800  (0.00810)


   --> STEP: 35/80 -- GLOBAL_STEP: 9875
     | > loss_disc: 2.43990  (2.53728



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00000)
     | > avg_loss_disc: 2.58463 (+0.23587)
     | > avg_loss_disc_real_0: 0.18569 (+0.10414)
     | > avg_loss_disc_real_1: 0.25025 (-0.03763)
     | > avg_loss_disc_real_2: 0.22637 (-0.01490)
     | > avg_loss_disc_real_3: 0.18947 (-0.06811)
     | > avg_loss_disc_real_4: 0.24087 (-0.01924)
     | > avg_loss_disc_real_5: 0.25897 (+0.01774)
     | > avg_loss_0: 2.58463 (+0.23587)
     | > avg_loss_gen: 1.95539 (-0.93048)
     | > avg_loss_kl: 1.26065 (-0.21767)
     | > avg_loss_feat: 2.30204 (-1.46337)
     | > avg_loss_mel: 20.88665 (-1.49668)
     | > avg_loss_duration: 1.90110 (-0.03762)
     | > avg_loss_1: 28.30582 (-4.14582)


 > EPOCH: 124/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 14:33:23)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 9925
     | > loss_disc: 2.73487  (2.43540)
     | > loss_disc_real_0: 0.04721  (0.07891)
     | > loss_disc_real_1: 0.26944  (0.21246)
     | > loss_disc_real_2: 0.31008  (0.22655)
     | > loss_disc_real_3: 0.30955  (0.23260)
     | > loss_disc_real_4: 0.33674  (0.24081)
     | > loss_disc_real_5: 0.26187  (0.23691)
     | > loss_0: 2.73487  (2.43540)
     | > grad_norm_0: 934.72681  (420.41290)
     | > loss_gen: 2.31472  (2.29579)
     | > loss_kl: 1.33015  (1.17819)
     | > loss_feat: 3.00404  (3.44653)
     | > loss_mel: 19.24158  (20.46658)
     | > loss_duration: 1.62051  (1.56624)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 27.51100  (28.95333)
     | > grad_norm_1: 4629.26270  (3129.01465)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.57130  (3.53582)
     | > loader_time: 0.01000  (0.00801)


   --> STEP: 30/80 -- GLOBAL_STEP: 9950
     | > loss_disc: 2.74627  (2.56720)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00801 (-0.00100)
     | > avg_loss_disc: 2.21471 (-0.36991)
     | > avg_loss_disc_real_0: 0.01643 (-0.16926)
     | > avg_loss_disc_real_1: 0.23806 (-0.01219)
     | > avg_loss_disc_real_2: 0.25889 (+0.03252)
     | > avg_loss_disc_real_3: 0.23282 (+0.04334)
     | > avg_loss_disc_real_4: 0.24137 (+0.00049)
     | > avg_loss_disc_real_5: 0.24782 (-0.01115)
     | > avg_loss_0: 2.21471 (-0.36991)
     | > avg_loss_gen: 2.65471 (+0.69932)
     | > avg_loss_kl: 1.26536 (+0.00471)
     | > avg_loss_feat: 4.22610 (+1.92407)
     | > avg_loss_mel: 20.94807 (+0.06142)
     | > avg_loss_duration: 1.95867 (+0.05757)
     | > avg_loss_1: 31.05292 (+2.74709)


 > EPOCH: 125/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 14:38:58)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 10000
     | > loss_disc: 2.22489  (2.22489)
     | > loss_disc_real_0: 0.01939  (0.01939)
     | > loss_disc_real_1: 0.23010  (0.23010)
     | > loss_disc_real_2: 0.24295  (0.24295)
     | > loss_disc_real_3: 0.24307  (0.24307)
     | > loss_disc_real_4: 0.24379  (0.24379)
     | > loss_disc_real_5: 0.27403  (0.27403)
     | > loss_0: 2.22489  (2.22489)
     | > grad_norm_0: 22.73176  (22.73176)
     | > loss_gen: 2.43913  (2.43913)
     | > loss_kl: 1.29736  (1.29736)
     | > loss_feat: 4.42230  (4.42230)
     | > loss_mel: 21.15056  (21.15056)
     | > loss_duration: 1.60740  (1.60740)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.91675  (30.91675)
     | > grad_norm_1: 332.03638  (332.03638)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.54520  (3.54523)
     | > loader_time: 23.43080  (23.43081)


 > CHECKPOINT : ./output\vits_vctk-September-23-2022_02+46AM-3c624ce\checkpoint_10000.p



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00100)
     | > avg_loss_disc: 2.66019 (+0.44548)
     | > avg_loss_disc_real_0: 0.07989 (+0.06346)
     | > avg_loss_disc_real_1: 0.22753 (-0.01053)
     | > avg_loss_disc_real_2: 0.22525 (-0.03364)
     | > avg_loss_disc_real_3: 0.26756 (+0.03474)
     | > avg_loss_disc_real_4: 0.26291 (+0.02154)
     | > avg_loss_disc_real_5: 0.26283 (+0.01501)
     | > avg_loss_0: 2.66019 (+0.44548)
     | > avg_loss_gen: 1.83380 (-0.82091)
     | > avg_loss_kl: 1.27111 (+0.00575)
     | > avg_loss_feat: 2.25071 (-1.97539)
     | > avg_loss_mel: 19.62565 (-1.32242)
     | > avg_loss_duration: 1.89935 (-0.05932)
     | > avg_loss_1: 26.88062 (-4.17229)


 > EPOCH: 126/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 14:44:38)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 10100
     | > loss_disc: 2.80539  (2.60030)
     | > loss_disc_real_0: 0.18755  (0.15005)
     | > loss_disc_real_1: 0.23580  (0.23046)
     | > loss_disc_real_2: 0.21328  (0.22255)
     | > loss_disc_real_3: 0.23114  (0.23009)
     | > loss_disc_real_4: 0.27993  (0.24638)
     | > loss_disc_real_5: 0.27432  (0.24553)
     | > loss_0: 2.80539  (2.60030)
     | > grad_norm_0: 1082.54480  (843.04944)
     | > loss_gen: 2.22980  (2.25659)
     | > loss_kl: 1.30132  (1.20586)
     | > loss_feat: 2.73377  (3.14453)
     | > loss_mel: 19.28878  (20.03682)
     | > loss_duration: 1.57739  (1.55514)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 27.13106  (28.19895)
     | > grad_norm_1: 4272.15820  (3099.92114)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.62030  (3.58601)
     | > loader_time: 0.01100  (0.00901)


   --> STEP: 45/80 -- GLOBAL_STEP: 10125
     | > loss_disc: 2.69034  (2.61



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.82987 (+0.16968)
     | > avg_loss_disc_real_0: 0.28614 (+0.20625)
     | > avg_loss_disc_real_1: 0.17595 (-0.05158)
     | > avg_loss_disc_real_2: 0.21975 (-0.00550)
     | > avg_loss_disc_real_3: 0.29798 (+0.03042)
     | > avg_loss_disc_real_4: 0.24981 (-0.01310)
     | > avg_loss_disc_real_5: 0.26699 (+0.00417)
     | > avg_loss_0: 2.82987 (+0.16968)
     | > avg_loss_gen: 1.88217 (+0.04836)
     | > avg_loss_kl: 1.23371 (-0.03740)
     | > avg_loss_feat: 1.80795 (-0.44276)
     | > avg_loss_mel: 18.08117 (-1.54449)
     | > avg_loss_duration: 1.90296 (+0.00361)
     | > avg_loss_1: 24.90795 (-1.97267)


 > EPOCH: 127/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 14:50:14)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 10175
     | > loss_disc: 2.63979  (2.58702)
     | > loss_disc_real_0: 0.25586  (0.13262)
     | > loss_disc_real_1: 0.20252  (0.22773)
     | > loss_disc_real_2: 0.25377  (0.22549)
     | > loss_disc_real_3: 0.23441  (0.23451)
     | > loss_disc_real_4: 0.22141  (0.24197)
     | > loss_disc_real_5: 0.23192  (0.24686)
     | > loss_0: 2.63979  (2.58702)
     | > grad_norm_0: 1597.18164  (970.98364)
     | > loss_gen: 2.11897  (2.20228)
     | > loss_kl: 1.38075  (1.26185)
     | > loss_feat: 2.67795  (3.04734)
     | > loss_mel: 19.38690  (19.81862)
     | > loss_duration: 1.50885  (1.54829)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 27.07343  (27.87837)
     | > grad_norm_1: 2815.45068  (3137.65820)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.64630  (3.57506)
     | > loader_time: 0.01000  (0.00834)


   --> STEP: 40/80 -- GLOBAL_STEP: 10200
     | > loss_disc: 2.70134  (2.58



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00000)
     | > avg_loss_disc: 2.41742 (-0.41245)
     | > avg_loss_disc_real_0: 0.17663 (-0.10951)
     | > avg_loss_disc_real_1: 0.23524 (+0.05929)
     | > avg_loss_disc_real_2: 0.14440 (-0.07534)
     | > avg_loss_disc_real_3: 0.23501 (-0.06297)
     | > avg_loss_disc_real_4: 0.22328 (-0.02652)
     | > avg_loss_disc_real_5: 0.26094 (-0.00606)
     | > avg_loss_0: 2.41742 (-0.41245)
     | > avg_loss_gen: 2.21172 (+0.32955)
     | > avg_loss_kl: 1.29392 (+0.06022)
     | > avg_loss_feat: 2.86512 (+1.05717)
     | > avg_loss_mel: 20.66462 (+2.58345)
     | > avg_loss_duration: 1.92295 (+0.01999)
     | > avg_loss_1: 28.95833 (+4.05038)


 > EPOCH: 128/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 14:55:49)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 10250
     | > loss_disc: 2.48382  (2.57190)
     | > loss_disc_real_0: 0.08086  (0.12672)
     | > loss_disc_real_1: 0.29277  (0.23124)
     | > loss_disc_real_2: 0.24142  (0.22598)
     | > loss_disc_real_3: 0.20757  (0.22695)
     | > loss_disc_real_4: 0.25971  (0.25229)
     | > loss_disc_real_5: 0.25385  (0.24897)
     | > loss_0: 2.48382  (2.57190)
     | > grad_norm_0: 229.75679  (749.65546)
     | > loss_gen: 2.32943  (2.17617)
     | > loss_kl: 1.15012  (1.23716)
     | > loss_feat: 2.94188  (3.09545)
     | > loss_mel: 19.06388  (19.88814)
     | > loss_duration: 1.54490  (1.55797)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 27.03021  (27.95489)
     | > grad_norm_1: 2379.08203  (2810.18945)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.59530  (3.55664)
     | > loader_time: 0.00900  (0.00841)


   --> STEP: 35/80 -- GLOBAL_STEP: 10275
     | > loss_disc: 2.81368  (2.654



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.67577 (+0.25834)
     | > avg_loss_disc_real_0: 0.13772 (-0.03892)
     | > avg_loss_disc_real_1: 0.21586 (-0.01938)
     | > avg_loss_disc_real_2: 0.19581 (+0.05141)
     | > avg_loss_disc_real_3: 0.15817 (-0.07684)
     | > avg_loss_disc_real_4: 0.20552 (-0.01776)
     | > avg_loss_disc_real_5: 0.25573 (-0.00521)
     | > avg_loss_0: 2.67577 (+0.25834)
     | > avg_loss_gen: 1.62722 (-0.58450)
     | > avg_loss_kl: 1.12470 (-0.16922)
     | > avg_loss_feat: 2.45417 (-0.41095)
     | > avg_loss_mel: 20.99953 (+0.33491)
     | > avg_loss_duration: 1.93678 (+0.01383)
     | > avg_loss_1: 28.14240 (-0.81593)


 > EPOCH: 129/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 15:01:24)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 10325
     | > loss_disc: 2.44708  (2.54449)
     | > loss_disc_real_0: 0.17268  (0.13484)
     | > loss_disc_real_1: 0.24599  (0.22868)
     | > loss_disc_real_2: 0.27229  (0.22198)
     | > loss_disc_real_3: 0.25548  (0.22913)
     | > loss_disc_real_4: 0.22777  (0.23444)
     | > loss_disc_real_5: 0.24022  (0.23684)
     | > loss_0: 2.44708  (2.54449)
     | > grad_norm_0: 203.14040  (542.43726)
     | > loss_gen: 2.77334  (2.38019)
     | > loss_kl: 1.21545  (1.15301)
     | > loss_feat: 3.94930  (3.27145)
     | > loss_mel: 21.11983  (21.10444)
     | > loss_duration: 1.56174  (1.57303)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.61966  (29.48212)
     | > grad_norm_1: 1165.31641  (2162.41772)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.59230  (3.54923)
     | > loader_time: 0.00800  (0.00780)


   --> STEP: 30/80 -- GLOBAL_STEP: 10350
     | > loss_disc: 2.41158  (2.4994



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.52638 (-0.14939)
     | > avg_loss_disc_real_0: 0.24813 (+0.11041)
     | > avg_loss_disc_real_1: 0.20858 (-0.00729)
     | > avg_loss_disc_real_2: 0.28121 (+0.08540)
     | > avg_loss_disc_real_3: 0.26114 (+0.10296)
     | > avg_loss_disc_real_4: 0.24826 (+0.04274)
     | > avg_loss_disc_real_5: 0.25380 (-0.00193)
     | > avg_loss_0: 2.52638 (-0.14939)
     | > avg_loss_gen: 2.60388 (+0.97667)
     | > avg_loss_kl: 1.07275 (-0.05195)
     | > avg_loss_feat: 2.65511 (+0.20094)
     | > avg_loss_mel: 19.57683 (-1.42270)
     | > avg_loss_duration: 1.90963 (-0.02715)
     | > avg_loss_1: 27.81820 (-0.32420)


 > EPOCH: 130/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 15:06:59)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 10400
     | > loss_disc: 2.43819  (2.43819)
     | > loss_disc_real_0: 0.20569  (0.20569)
     | > loss_disc_real_1: 0.19002  (0.19002)
     | > loss_disc_real_2: 0.24100  (0.24100)
     | > loss_disc_real_3: 0.23820  (0.23820)
     | > loss_disc_real_4: 0.22456  (0.22456)
     | > loss_disc_real_5: 0.21050  (0.21050)
     | > loss_0: 2.43819  (2.43819)
     | > grad_norm_0: 184.17020  (184.17020)
     | > loss_gen: 2.30066  (2.30066)
     | > loss_kl: 1.03302  (1.03302)
     | > loss_feat: 3.41323  (3.41323)
     | > loss_mel: 20.19871  (20.19871)
     | > loss_duration: 1.59993  (1.59993)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 28.54555  (28.54555)
     | > grad_norm_1: 2334.72266  (2334.72266)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.54620  (3.54623)
     | > loader_time: 23.08510  (23.08507)


   --> STEP: 25/80 -- GLOBAL_STEP: 10425
     | > loss_disc: 2.42124  (2.53



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00801 (-0.00100)
     | > avg_loss_disc: 2.40977 (-0.11661)
     | > avg_loss_disc_real_0: 0.11245 (-0.13567)
     | > avg_loss_disc_real_1: 0.22993 (+0.02136)
     | > avg_loss_disc_real_2: 0.26720 (-0.01400)
     | > avg_loss_disc_real_3: 0.25662 (-0.00451)
     | > avg_loss_disc_real_4: 0.23728 (-0.01098)
     | > avg_loss_disc_real_5: 0.23477 (-0.01903)
     | > avg_loss_0: 2.40977 (-0.11661)
     | > avg_loss_gen: 2.44035 (-0.16354)
     | > avg_loss_kl: 1.41515 (+0.34239)
     | > avg_loss_feat: 2.96381 (+0.30870)
     | > avg_loss_mel: 18.88664 (-0.69019)
     | > avg_loss_duration: 1.90693 (-0.00270)
     | > avg_loss_1: 27.61287 (-0.20533)


 > EPOCH: 131/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 15:12:34)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 10500
     | > loss_disc: 2.43155  (2.61321)
     | > loss_disc_real_0: 0.05479  (0.14165)
     | > loss_disc_real_1: 0.21117  (0.23056)
     | > loss_disc_real_2: 0.21341  (0.22784)
     | > loss_disc_real_3: 0.19262  (0.22844)
     | > loss_disc_real_4: 0.25298  (0.23950)
     | > loss_disc_real_5: 0.23306  (0.24296)
     | > loss_0: 2.43155  (2.61321)
     | > grad_norm_0: 219.75688  (658.12274)
     | > loss_gen: 2.33977  (2.22904)
     | > loss_kl: 1.28266  (1.35187)
     | > loss_feat: 4.12734  (3.32112)
     | > loss_mel: 22.19843  (20.38293)
     | > loss_duration: 1.58136  (1.55479)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 31.52956  (28.83975)
     | > grad_norm_1: 331.41656  (1115.56348)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.60730  (3.57945)
     | > loader_time: 0.01100  (0.00910)


   --> STEP: 45/80 -- GLOBAL_STEP: 10525
     | > loss_disc: 2.72741  (2.5699



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00100)
     | > avg_loss_disc: 2.86862 (+0.45885)
     | > avg_loss_disc_real_0: 0.01732 (-0.09513)
     | > avg_loss_disc_real_1: 0.31358 (+0.08365)
     | > avg_loss_disc_real_2: 0.24086 (-0.02634)
     | > avg_loss_disc_real_3: 0.19579 (-0.06084)
     | > avg_loss_disc_real_4: 0.25239 (+0.01511)
     | > avg_loss_disc_real_5: 0.26966 (+0.03489)
     | > avg_loss_0: 2.86862 (+0.45885)
     | > avg_loss_gen: 1.72309 (-0.71726)
     | > avg_loss_kl: 1.03103 (-0.38412)
     | > avg_loss_feat: 2.24251 (-0.72130)
     | > avg_loss_mel: 20.90259 (+2.01595)
     | > avg_loss_duration: 1.93769 (+0.03076)
     | > avg_loss_1: 27.83689 (+0.22402)


 > EPOCH: 132/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce
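Reading the eval trend by eye is hard because every epoch repeats the same loader header. In the eval blocks shown here, `avg_loss_1` is the sum of `avg_loss_gen`, `avg_loss_kl`, `avg_loss_feat`, `avg_loss_mel`, and `avg_loss_duration` (for example 1.83380 + 1.27111 + 2.25071 + 19.62565 + 1.89935 = 26.88062), and across these eleven evaluations it moves between about 24.91 and 32.45 rather than falling steadily. The sketch below pulls the `avg_*` values out of each `EVAL PERFORMANCE` block so the trend can be plotted. `parse_eval_blocks` is a hypothetical helper, not part of Coqui TTS or Trainer, and the regex assumes the color codes may or may not have kept their escape byte when the log was saved.

```python
import re

# Color codes, with or without the leading ESC byte (exported notebooks
# often drop the byte and leave residue such as "[92m").
ANSI = re.compile(r"\x1b?\[\d+(?:;\d+)*m")
# Matches lines like "| > avg_loss_mel: 19.62565 (-1.32242)".
METRIC = re.compile(r"\|\s*>\s*(avg_\w+):\s*([-+]?\d+(?:\.\d+)?)")

def parse_eval_blocks(log_text: str) -> list[dict[str, float]]:
    """Return one {metric: value} dict per EVAL PERFORMANCE block."""
    clean = ANSI.sub("", log_text)
    blocks = []
    for chunk in clean.split("--> EVAL PERFORMANCE")[1:]:
        metrics: dict[str, float] = {}
        for line in chunk.splitlines():
            match = METRIC.search(line)
            if match:
                metrics[match.group(1)] = float(match.group(2))
            elif metrics and not line.strip():
                break  # the first blank line after the metrics ends the block
        blocks.append(metrics)
    return blocks

sample = """  [1m--> EVAL PERFORMANCE[0m
     | > avg_loss_gen:[92m 1.83380 [0m(-0.82091)
     | > avg_loss_kl:[91m 1.27111 [0m(+0.00575)
     | > avg_loss_feat:[92m 2.25071 [0m(-1.97539)
     | > avg_loss_mel:[92m 19.62565 [0m(-1.32242)
     | > avg_loss_duration:[92m 1.89935 [0m(-0.05932)
     | > avg_loss_1:[92m 26.88062 [0m(-4.17229)

 > EPOCH: 126/1000
"""
block = parse_eval_blocks(sample)[0]
parts = ("avg_loss_gen", "avg_loss_kl", "avg_loss_feat", "avg_loss_mel", "avg_loss_duration")
print(round(sum(block[k] for k in parts), 5), block["avg_loss_1"])  # 26.88062 26.88062
```

Running it over the whole saved log and collecting `avg_loss_mel` per block is usually a clearer signal of progress than the adversarial terms, which oscillate by design.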




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 15:18:10)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 10575
     | > loss_disc: 2.73006  (2.60999)
     | > loss_disc_real_0: 0.22096  (0.13051)
     | > loss_disc_real_1: 0.21168  (0.22220)
     | > loss_disc_real_2: 0.31393  (0.22703)
     | > loss_disc_real_3: 0.25484  (0.23020)
     | > loss_disc_real_4: 0.21797  (0.23574)
     | > loss_disc_real_5: 0.22266  (0.24009)
     | > loss_0: 2.73006  (2.60999)
     | > grad_norm_0: 333.34866  (202.01791)
     | > loss_gen: 2.17887  (2.04028)
     | > loss_kl: 1.49387  (1.28598)
     | > loss_feat: 2.71468  (2.77333)
     | > loss_mel: 20.06975  (20.00489)
     | > loss_duration: 1.56103  (1.54835)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 28.01820  (27.65282)
     | > grad_norm_1: 3305.18115  (1721.18689)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.61530  (3.57345)
     | > loader_time: 0.00900  (0.00868)


   --> STEP: 40/80 -- GLOBAL_STEP: 10600
     | > loss_disc: 2.46150  (2.592



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.75492 (-0.11369)
     | > avg_loss_disc_real_0: 0.04081 (+0.02349)
     | > avg_loss_disc_real_1: 0.22504 (-0.08854)
     | > avg_loss_disc_real_2: 0.20522 (-0.03564)
     | > avg_loss_disc_real_3: 0.22187 (+0.02608)
     | > avg_loss_disc_real_4: 0.20774 (-0.04465)
     | > avg_loss_disc_real_5: 0.24816 (-0.02150)
     | > avg_loss_0: 2.75492 (-0.11369)
     | > avg_loss_gen: 1.61429 (-0.10879)
     | > avg_loss_kl: 1.30292 (+0.27190)
     | > avg_loss_feat: 2.30942 (+0.06691)
     | > avg_loss_mel: 19.52099 (-1.38160)
     | > avg_loss_duration: 1.94104 (+0.00335)
     | > avg_loss_1: 26.68867 (-1.14823)


 > EPOCH: 133/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 15:23:45)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 10650
     | > loss_disc: 2.61688  (2.50693)
     | > loss_disc_real_0: 0.07978  (0.08856)
     | > loss_disc_real_1: 0.23835  (0.22349)
     | > loss_disc_real_2: 0.26504  (0.23358)
     | > loss_disc_real_3: 0.25020  (0.22972)
     | > loss_disc_real_4: 0.22899  (0.25051)
     | > loss_disc_real_5: 0.23430  (0.24402)
     | > loss_0: 2.61688  (2.50693)
     | > grad_norm_0: 1084.44189  (510.14883)
     | > loss_gen: 2.32940  (2.40793)
     | > loss_kl: 1.13789  (1.18600)
     | > loss_feat: 3.37078  (3.53616)
     | > loss_mel: 19.66758  (20.28052)
     | > loss_duration: 1.57039  (1.56387)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 28.07605  (28.97448)
     | > grad_norm_1: 2588.90698  (1604.46472)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.59330  (3.56232)
     | > loader_time: 0.00900  (0.00821)


   --> STEP: 35/80 -- GLOBAL_STEP: 10675
     | > loss_disc: 2.72579  (2.54



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00000)
     | > avg_loss_disc: 2.28353 (-0.47139)
     | > avg_loss_disc_real_0: 0.06274 (+0.02193)
     | > avg_loss_disc_real_1: 0.18702 (-0.03802)
     | > avg_loss_disc_real_2: 0.20901 (+0.00379)
     | > avg_loss_disc_real_3: 0.29138 (+0.06951)
     | > avg_loss_disc_real_4: 0.22065 (+0.01292)
     | > avg_loss_disc_real_5: 0.24943 (+0.00127)
     | > avg_loss_0: 2.28353 (-0.47139)
     | > avg_loss_gen: 2.64793 (+1.03364)
     | > avg_loss_kl: 1.55301 (+0.25009)
     | > avg_loss_feat: 3.71684 (+1.40742)
     | > avg_loss_mel: 20.93158 (+1.41059)
     | > avg_loss_duration: 1.92974 (-0.01130)
     | > avg_loss_1: 30.77910 (+4.09043)


 > EPOCH: 134/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 15:29:20)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 10725
     | > loss_disc: 3.06264  (2.72299)
     | > loss_disc_real_0: 0.42796  (0.11725)
     | > loss_disc_real_1: 0.25020  (0.23632)
     | > loss_disc_real_2: 0.28606  (0.23923)
     | > loss_disc_real_3: 0.28400  (0.23768)
     | > loss_disc_real_4: 0.27214  (0.24641)
     | > loss_disc_real_5: 0.24700  (0.23241)
     | > loss_0: 3.06264  (2.72299)
     | > grad_norm_0: 1434.55798  (813.57190)
     | > loss_gen: 1.96642  (2.24640)
     | > loss_kl: 1.23821  (1.10500)
     | > loss_feat: 2.59281  (3.11281)
     | > loss_mel: 19.30255  (20.29918)
     | > loss_duration: 1.58577  (1.57442)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 26.68576  (28.33780)
     | > grad_norm_1: 523.71851  (1113.03259)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.56120  (3.53001)
     | > loader_time: 0.00800  (0.00801)


   --> STEP: 30/80 -- GLOBAL_STEP: 10750
     | > loss_disc: 2.40380  (2.6405



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.57437 (+0.29084)
     | > avg_loss_disc_real_0: 0.05280 (-0.00994)
     | > avg_loss_disc_real_1: 0.23637 (+0.04935)
     | > avg_loss_disc_real_2: 0.20476 (-0.00424)
     | > avg_loss_disc_real_3: 0.26359 (-0.02778)
     | > avg_loss_disc_real_4: 0.26497 (+0.04432)
     | > avg_loss_disc_real_5: 0.29356 (+0.04413)
     | > avg_loss_0: 2.57437 (+0.29084)
     | > avg_loss_gen: 2.05197 (-0.59596)
     | > avg_loss_kl: 1.50591 (-0.04710)
     | > avg_loss_feat: 2.40021 (-1.31663)
     | > avg_loss_mel: 17.98184 (-2.94974)
     | > avg_loss_duration: 1.93577 (+0.00602)
     | > avg_loss_1: 25.87569 (-4.90341)


 > EPOCH: 135/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 15:34:56)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 10800
     | > loss_disc: 2.40799  (2.40799)
     | > loss_disc_real_0: 0.05765  (0.05765)
     | > loss_disc_real_1: 0.20973  (0.20973)
     | > loss_disc_real_2: 0.15761  (0.15761)
     | > loss_disc_real_3: 0.22772  (0.22772)
     | > loss_disc_real_4: 0.23354  (0.23354)
     | > loss_disc_real_5: 0.26912  (0.26912)
     | > loss_0: 2.40799  (2.40799)
     | > grad_norm_0: 545.63263  (545.63263)
     | > loss_gen: 2.44977  (2.44977)
     | > loss_kl: 1.09524  (1.09524)
     | > loss_feat: 3.84894  (3.84894)
     | > loss_mel: 21.08972  (21.08972)
     | > loss_duration: 1.57707  (1.57707)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.06073  (30.06073)
     | > grad_norm_1: 1333.38330  (1333.38330)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.59430  (3.59427)
     | > loader_time: 23.44310  (23.44308)


   --> STEP: 25/80 -- GLOBAL_STEP: 10825
     | > loss_disc: 2.60944  (2.54



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00000)
     | > avg_loss_disc: 2.66984 (+0.09547)
     | > avg_loss_disc_real_0: 0.47297 (+0.42017)
     | > avg_loss_disc_real_1: 0.23630 (-0.00007)
     | > avg_loss_disc_real_2: 0.19716 (-0.00760)
     | > avg_loss_disc_real_3: 0.18726 (-0.07633)
     | > avg_loss_disc_real_4: 0.26589 (+0.00092)
     | > avg_loss_disc_real_5: 0.26327 (-0.03029)
     | > avg_loss_0: 2.66984 (+0.09547)
     | > avg_loss_gen: 2.54223 (+0.49026)
     | > avg_loss_kl: 1.43557 (-0.07033)
     | > avg_loss_feat: 2.89276 (+0.49255)
     | > avg_loss_mel: 21.20351 (+3.22167)
     | > avg_loss_duration: 1.91764 (-0.01812)
     | > avg_loss_1: 29.99171 (+4.11602)


 > EPOCH: 136/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 15:40:31)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 10900
     | > loss_disc: 2.70008  (2.60176)
     | > loss_disc_real_0: 0.10395  (0.11929)
     | > loss_disc_real_1: 0.27789  (0.23808)
     | > loss_disc_real_2: 0.26727  (0.23773)
     | > loss_disc_real_3: 0.24843  (0.23480)
     | > loss_disc_real_4: 0.26607  (0.23805)
     | > loss_disc_real_5: 0.23987  (0.24360)
     | > loss_0: 2.70008  (2.60176)
     | > grad_norm_0: 288.27060  (289.91919)
     | > loss_gen: 2.00892  (2.19126)
     | > loss_kl: 1.34851  (1.33290)
     | > loss_feat: 2.45926  (3.18523)
     | > loss_mel: 21.14204  (20.55380)
     | > loss_duration: 1.57359  (1.55448)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 28.53233  (28.81768)
     | > grad_norm_1: 1120.95520  (1283.31799)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.64130  (3.58152)
     | > loader_time: 0.01000  (0.00866)


   --> STEP: 45/80 -- GLOBAL_STEP: 10925
     | > loss_disc: 2.55965  (2.597



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.49909 (-0.17075)
     | > avg_loss_disc_real_0: 0.30364 (-0.16932)
     | > avg_loss_disc_real_1: 0.19157 (-0.04473)
     | > avg_loss_disc_real_2: 0.22590 (+0.02873)
     | > avg_loss_disc_real_3: 0.18151 (-0.00575)
     | > avg_loss_disc_real_4: 0.21490 (-0.05099)
     | > avg_loss_disc_real_5: 0.23451 (-0.02877)
     | > avg_loss_0: 2.49909 (-0.17075)
     | > avg_loss_gen: 2.40168 (-0.14054)
     | > avg_loss_kl: 1.32160 (-0.11398)
     | > avg_loss_feat: 3.04225 (+0.14949)
     | > avg_loss_mel: 20.69075 (-0.51276)
     | > avg_loss_duration: 1.95433 (+0.03668)
     | > avg_loss_1: 29.41061 (-0.58111)


 > EPOCH: 137/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 15:46:06)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 10975
     | > loss_disc: 2.61684  (2.50799)
     | > loss_disc_real_0: 0.07311  (0.05683)
     | > loss_disc_real_1: 0.20778  (0.22694)
     | > loss_disc_real_2: 0.21302  (0.21881)
     | > loss_disc_real_3: 0.19444  (0.23362)
     | > loss_disc_real_4: 0.23536  (0.24091)
     | > loss_disc_real_5: 0.22864  (0.24669)
     | > loss_0: 2.61684  (2.50799)
     | > grad_norm_0: 306.79562  (251.08568)
     | > loss_gen: 1.95369  (2.18272)
     | > loss_kl: 1.18248  (1.25856)
     | > loss_feat: 2.82788  (3.31977)
     | > loss_mel: 20.41282  (20.11076)
     | > loss_duration: 1.56629  (1.54988)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 27.94316  (28.42169)
     | > grad_norm_1: 1555.79688  (1416.67773)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.62430  (3.57346)
     | > loader_time: 0.01000  (0.00854)


   --> STEP: 40/80 -- GLOBAL_STEP: 11000
     | > loss_disc: 2.85783  (2.535



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00000)
     | > avg_loss_disc: 2.69686 (+0.19777)
     | > avg_loss_disc_real_0: 0.05929 (-0.24436)
     | > avg_loss_disc_real_1: 0.23021 (+0.03864)
     | > avg_loss_disc_real_2: 0.22220 (-0.00370)
     | > avg_loss_disc_real_3: 0.21469 (+0.03318)
     | > avg_loss_disc_real_4: 0.21574 (+0.00084)
     | > avg_loss_disc_real_5: 0.22908 (-0.00543)
     | > avg_loss_0: 2.69686 (+0.19777)
     | > avg_loss_gen: 1.62967 (-0.77201)
     | > avg_loss_kl: 0.96001 (-0.36159)
     | > avg_loss_feat: 2.56592 (-0.47633)
     | > avg_loss_mel: 20.40186 (-0.28889)
     | > avg_loss_duration: 1.92531 (-0.02902)
     | > avg_loss_1: 27.48277 (-1.92784)


 > EPOCH: 138/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 15:51:42)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 11050
     | > loss_disc: 2.38816  (2.47877)
     | > loss_disc_real_0: 0.09278  (0.09707)
     | > loss_disc_real_1: 0.21975  (0.22148)
     | > loss_disc_real_2: 0.24798  (0.22317)
     | > loss_disc_real_3: 0.25815  (0.23186)
     | > loss_disc_real_4: 0.23945  (0.24249)
     | > loss_disc_real_5: 0.24553  (0.24699)
     | > loss_0: 2.38816  (2.47877)
     | > grad_norm_0: 453.23877  (469.53305)
     | > loss_gen: 2.32240  (2.25126)
     | > loss_kl: 1.08859  (1.17601)
     | > loss_feat: 3.59856  (3.41829)
     | > loss_mel: 20.56258  (20.11592)
     | > loss_duration: 1.52352  (1.53736)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 29.09565  (28.49884)
     | > grad_norm_1: 2373.88916  (2692.64624)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.60430  (3.56495)
     | > loader_time: 0.00900  (0.00831)


   --> STEP: 35/80 -- GLOBAL_STEP: 11075
     | > loss_disc: 2.40381  (2.543



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00800 (-0.00100)
     | > avg_loss_disc: 2.59902 (-0.09784)
     | > avg_loss_disc_real_0: 0.17732 (+0.11804)
     | > avg_loss_disc_real_1: 0.23919 (+0.00899)
     | > avg_loss_disc_real_2: 0.19725 (-0.02495)
     | > avg_loss_disc_real_3: 0.27056 (+0.05587)
     | > avg_loss_disc_real_4: 0.24604 (+0.03029)
     | > avg_loss_disc_real_5: 0.25911 (+0.03003)
     | > avg_loss_0: 2.59902 (-0.09784)
     | > avg_loss_gen: 2.09722 (+0.46755)
     | > avg_loss_kl: 1.39877 (+0.43876)
     | > avg_loss_feat: 2.74673 (+0.18081)
     | > avg_loss_mel: 20.05467 (-0.34719)
     | > avg_loss_duration: 1.94836 (+0.02305)
     | > avg_loss_1: 28.24575 (+0.76298)


 > EPOCH: 139/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 15:57:18)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 11125
     | > loss_disc: 2.52401  (2.59607)
     | > loss_disc_real_0: 0.16435  (0.14104)
     | > loss_disc_real_1: 0.22148  (0.21974)
     | > loss_disc_real_2: 0.22017  (0.22808)
     | > loss_disc_real_3: 0.22414  (0.21593)
     | > loss_disc_real_4: 0.23704  (0.23006)
     | > loss_disc_real_5: 0.23718  (0.23792)
     | > loss_0: 2.52401  (2.59607)
     | > grad_norm_0: 749.42413  (944.98236)
     | > loss_gen: 2.26401  (2.26614)
     | > loss_kl: 1.42739  (1.23340)
     | > loss_feat: 3.00009  (3.21063)
     | > loss_mel: 20.96899  (20.43289)
     | > loss_duration: 1.59123  (1.54803)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 29.25171  (28.69109)
     | > grad_norm_1: 3511.90259  (2383.20630)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.56630  (3.54186)
     | > loader_time: 0.01000  (0.00801)


   --> STEP: 30/80 -- GLOBAL_STEP: 11150
     | > loss_disc: 2.61467  (2.5103



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00100)
     | > avg_loss_disc: 2.31344 (-0.28558)
     | > avg_loss_disc_real_0: 0.04719 (-0.13013)
     | > avg_loss_disc_real_1: 0.24661 (+0.00742)
     | > avg_loss_disc_real_2: 0.28080 (+0.08356)
     | > avg_loss_disc_real_3: 0.27098 (+0.00042)
     | > avg_loss_disc_real_4: 0.22060 (-0.02544)
     | > avg_loss_disc_real_5: 0.28821 (+0.02910)
     | > avg_loss_0: 2.31344 (-0.28558)
     | > avg_loss_gen: 2.55287 (+0.45565)
     | > avg_loss_kl: 1.26519 (-0.13358)
     | > avg_loss_feat: 3.12827 (+0.38154)
     | > avg_loss_mel: 19.07445 (-0.98022)
     | > avg_loss_duration: 1.90935 (-0.03900)
     | > avg_loss_1: 27.93013 (-0.31563)


 > EPOCH: 140/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 16:02:54)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 11200
     | > loss_disc: 2.38964  (2.38964)
     | > loss_disc_real_0: 0.05762  (0.05762)
     | > loss_disc_real_1: 0.23455  (0.23455)
     | > loss_disc_real_2: 0.26102  (0.26102)
     | > loss_disc_real_3: 0.25204  (0.25204)
     | > loss_disc_real_4: 0.21584  (0.21584)
     | > loss_disc_real_5: 0.27300  (0.27300)
     | > loss_0: 2.38964  (2.38964)
     | > grad_norm_0: 133.06857  (133.06857)
     | > loss_gen: 2.42721  (2.42721)
     | > loss_kl: 1.45258  (1.45258)
     | > loss_feat: 3.54794  (3.54794)
     | > loss_mel: 20.78600  (20.78600)
     | > loss_duration: 1.59632  (1.59632)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 29.81005  (29.81005)
     | > grad_norm_1: 1249.49744  (1249.49744)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.56720  (3.56725)
     | > loader_time: 23.31590  (23.31590)


   --> STEP: 25/80 -- GLOBAL_STEP: 11225
     | > loss_disc: 2.91949  (2.50



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.54440 (+0.23096)
     | > avg_loss_disc_real_0: 0.06987 (+0.02268)
     | > avg_loss_disc_real_1: 0.18986 (-0.05675)
     | > avg_loss_disc_real_2: 0.17872 (-0.10208)
     | > avg_loss_disc_real_3: 0.22796 (-0.04302)
     | > avg_loss_disc_real_4: 0.26180 (+0.04120)
     | > avg_loss_disc_real_5: 0.22960 (-0.05861)
     | > avg_loss_0: 2.54440 (+0.23096)
     | > avg_loss_gen: 1.79069 (-0.76218)
     | > avg_loss_kl: 0.88331 (-0.38188)
     | > avg_loss_feat: 2.94710 (-0.18117)
     | > avg_loss_mel: 21.09929 (+2.02484)
     | > avg_loss_duration: 1.93048 (+0.02113)
     | > avg_loss_1: 28.65087 (+0.72075)


 > EPOCH: 141/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 16:08:29)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 11300
     | > loss_disc: 2.36705  (2.58991)
     | > loss_disc_real_0: 0.09875  (0.13354)
     | > loss_disc_real_1: 0.20455  (0.22439)
     | > loss_disc_real_2: 0.21046  (0.22498)
     | > loss_disc_real_3: 0.21119  (0.23016)
     | > loss_disc_real_4: 0.23338  (0.24029)
     | > loss_disc_real_5: 0.26511  (0.24469)
     | > loss_0: 2.36705  (2.58991)
     | > grad_norm_0: 232.24733  (650.67010)
     | > loss_gen: 2.21727  (2.20251)
     | > loss_kl: 1.18835  (1.29013)
     | > loss_feat: 3.48689  (3.13398)
     | > loss_mel: 20.38089  (19.85629)
     | > loss_duration: 1.55561  (1.53501)
     | > amp_scaler: 32.00000  (56.00000)
     | > loss_1: 28.82901  (28.01792)
     | > grad_norm_1: 3178.34570  (2251.47192)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.63730  (3.58755)
     | > loader_time: 0.01000  (0.00876)


   --> STEP: 45/80 -- GLOBAL_STEP: 11325
     | > loss_disc: 2.61678  (2.577



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00000)
     | > avg_loss_disc: 2.41226 (-0.13214)
     | > avg_loss_disc_real_0: 0.18791 (+0.11804)
     | > avg_loss_disc_real_1: 0.19004 (+0.00017)
     | > avg_loss_disc_real_2: 0.19118 (+0.01245)
     | > avg_loss_disc_real_3: 0.20947 (-0.01849)
     | > avg_loss_disc_real_4: 0.23724 (-0.02456)
     | > avg_loss_disc_real_5: 0.24517 (+0.01557)
     | > avg_loss_0: 2.41226 (-0.13214)
     | > avg_loss_gen: 2.36753 (+0.57684)
     | > avg_loss_kl: 1.26297 (+0.37966)
     | > avg_loss_feat: 3.31943 (+0.37233)
     | > avg_loss_mel: 19.86861 (-1.23068)
     | > avg_loss_duration: 1.92334 (-0.00715)
     | > avg_loss_1: 28.74187 (+0.09100)


 > EPOCH: 142/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 16:14:05)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 11375
     | > loss_disc: 2.36997  (2.61296)
     | > loss_disc_real_0: 0.11111  (0.15155)
     | > loss_disc_real_1: 0.19852  (0.23269)
     | > loss_disc_real_2: 0.22017  (0.24764)
     | > loss_disc_real_3: 0.21485  (0.23262)
     | > loss_disc_real_4: 0.23633  (0.24284)
     | > loss_disc_real_5: 0.23300  (0.24488)
     | > loss_0: 2.36997  (2.61296)
     | > grad_norm_0: 502.06079  (784.02576)
     | > loss_gen: 2.47128  (2.35277)
     | > loss_kl: 1.38016  (1.29406)
     | > loss_feat: 4.30905  (3.49197)
     | > loss_mel: 20.62923  (20.31581)
     | > loss_duration: 1.58959  (1.54610)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 30.37931  (29.00071)
     | > grad_norm_1: 663.80634  (1433.49548)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.60590  (3.57837)
     | > loader_time: 0.01100  (0.00874)


   --> STEP: 40/80 -- GLOBAL_STEP: 11400
     | > loss_disc: 2.37995  (2.5424



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00100)
     | > avg_loss_disc: 2.56967 (+0.15741)
     | > avg_loss_disc_real_0: 0.13223 (-0.05568)
     | > avg_loss_disc_real_1: 0.24681 (+0.05677)
     | > avg_loss_disc_real_2: 0.23289 (+0.04171)
     | > avg_loss_disc_real_3: 0.25409 (+0.04462)
     | > avg_loss_disc_real_4: 0.23976 (+0.00253)
     | > avg_loss_disc_real_5: 0.24708 (+0.00191)
     | > avg_loss_0: 2.56967 (+0.15741)
     | > avg_loss_gen: 2.23702 (-0.13051)
     | > avg_loss_kl: 1.41588 (+0.15291)
     | > avg_loss_feat: 2.91501 (-0.40442)
     | > avg_loss_mel: 21.07438 (+1.20577)
     | > avg_loss_duration: 1.92769 (+0.00435)
     | > avg_loss_1: 29.56997 (+0.82809)


 > EPOCH: 143/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 16:19:40)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 11450
     | > loss_disc: 2.50113  (2.57317)
     | > loss_disc_real_0: 0.19293  (0.13336)
     | > loss_disc_real_1: 0.20624  (0.23867)
     | > loss_disc_real_2: 0.18038  (0.21607)
     | > loss_disc_real_3: 0.22514  (0.22749)
     | > loss_disc_real_4: 0.23744  (0.24317)
     | > loss_disc_real_5: 0.26416  (0.24773)
     | > loss_0: 2.50113  (2.57317)
     | > grad_norm_0: 62.13655  (389.40182)
     | > loss_gen: 2.42583  (2.26376)
     | > loss_kl: 1.33249  (1.23069)
     | > loss_feat: 3.07265  (3.27590)
     | > loss_mel: 19.99495  (20.20502)
     | > loss_duration: 1.53586  (1.55803)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 28.36178  (28.53340)
     | > grad_norm_1: 2871.87036  (1890.87073)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.60130  (3.55604)
     | > loader_time: 0.00800  (0.00841)


   --> STEP: 35/80 -- GLOBAL_STEP: 11475
     | > loss_disc: 2.42979  (2.5698



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00100)
     | > avg_loss_disc: 3.43133 (+0.86166)
     | > avg_loss_disc_real_0: 0.29214 (+0.15991)
     | > avg_loss_disc_real_1: 0.19555 (-0.05127)
     | > avg_loss_disc_real_2: 0.20674 (-0.02615)
     | > avg_loss_disc_real_3: 0.26429 (+0.01019)
     | > avg_loss_disc_real_4: 0.25647 (+0.01671)
     | > avg_loss_disc_real_5: 0.20918 (-0.03790)
     | > avg_loss_0: 3.43133 (+0.86166)
     | > avg_loss_gen: 1.49033 (-0.74668)
     | > avg_loss_kl: 1.24869 (-0.16718)
     | > avg_loss_feat: 2.83756 (-0.07745)
     | > avg_loss_mel: 20.24978 (-0.82459)
     | > avg_loss_duration: 1.93729 (+0.00961)
     | > avg_loss_1: 27.76367 (-1.80630)


 > EPOCH: 144/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 16:25:15)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 11525
     | > loss_disc: 2.71534  (2.79365)
     | > loss_disc_real_0: 0.23044  (0.26761)
     | > loss_disc_real_1: 0.20402  (0.21973)
     | > loss_disc_real_2: 0.24021  (0.23233)
     | > loss_disc_real_3: 0.25680  (0.23197)
     | > loss_disc_real_4: 0.25689  (0.23520)
     | > loss_disc_real_5: 0.26485  (0.24768)
     | > loss_0: 2.71534  (2.79365)
     | > grad_norm_0: 336.36157  (901.55157)
     | > loss_gen: 1.83869  (2.04471)
     | > loss_kl: 1.38454  (1.30700)
     | > loss_feat: 2.30215  (2.65045)
     | > loss_mel: 20.56721  (20.32024)
     | > loss_duration: 1.56889  (1.55387)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 27.66148  (27.87626)
     | > grad_norm_1: 407.17548  (1656.91333)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.56820  (3.54498)
     | > loader_time: 0.01100  (0.00820)


   --> STEP: 30/80 -- GLOBAL_STEP: 11550
     | > loss_disc: 2.47909  (2.61962



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.59983 (-0.83150)
     | > avg_loss_disc_real_0: 0.04019 (-0.25195)
     | > avg_loss_disc_real_1: 0.18842 (-0.00712)
     | > avg_loss_disc_real_2: 0.25142 (+0.04468)
     | > avg_loss_disc_real_3: 0.20510 (-0.05919)
     | > avg_loss_disc_real_4: 0.22522 (-0.03125)
     | > avg_loss_disc_real_5: 0.24661 (+0.03743)
     | > avg_loss_0: 2.59983 (-0.83150)
     | > avg_loss_gen: 1.77407 (+0.28374)
     | > avg_loss_kl: 1.46880 (+0.22010)
     | > avg_loss_feat: 2.85526 (+0.01770)
     | > avg_loss_mel: 20.06270 (-0.18708)
     | > avg_loss_duration: 1.90092 (-0.03638)
     | > avg_loss_1: 28.06175 (+0.29808)


 > EPOCH: 145/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 16:30:50)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 11600
     | > loss_disc: 2.67325  (2.67325)
     | > loss_disc_real_0: 0.04476  (0.04476)
     | > loss_disc_real_1: 0.20120  (0.20120)
     | > loss_disc_real_2: 0.25560  (0.25560)
     | > loss_disc_real_3: 0.21945  (0.21945)
     | > loss_disc_real_4: 0.23437  (0.23437)
     | > loss_disc_real_5: 0.26173  (0.26173)
     | > loss_0: 2.67325  (2.67325)
     | > grad_norm_0: 995.28149  (995.28149)
     | > loss_gen: 2.15937  (2.15937)
     | > loss_kl: 1.22407  (1.22407)
     | > loss_feat: 3.41756  (3.41756)
     | > loss_mel: 19.47083  (19.47083)
     | > loss_duration: 1.57834  (1.57834)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 27.85017  (27.85017)
     | > grad_norm_1: 3033.25513  (3033.25513)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.55620  (3.55625)
     | > loader_time: 23.14630  (23.14632)


   --> STEP: 25/80 -- GLOBAL_STEP: 11625
     | > loss_disc: 2.34625  (2.52



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00100)
     | > avg_loss_disc: 2.53626 (-0.06357)
     | > avg_loss_disc_real_0: 0.27248 (+0.23229)
     | > avg_loss_disc_real_1: 0.20346 (+0.01504)
     | > avg_loss_disc_real_2: 0.19272 (-0.05870)
     | > avg_loss_disc_real_3: 0.17126 (-0.03383)
     | > avg_loss_disc_real_4: 0.24237 (+0.01716)
     | > avg_loss_disc_real_5: 0.25159 (+0.00498)
     | > avg_loss_0: 2.53626 (-0.06357)
     | > avg_loss_gen: 2.27685 (+0.50277)
     | > avg_loss_kl: 1.29570 (-0.17309)
     | > avg_loss_feat: 3.05496 (+0.19969)
     | > avg_loss_mel: 20.40235 (+0.33965)
     | > avg_loss_duration: 1.92893 (+0.02801)
     | > avg_loss_1: 28.95878 (+0.89703)


 > EPOCH: 146/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 16:36:25)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 11700
     | > loss_disc: 2.94637  (2.51592)
     | > loss_disc_real_0: 0.03665  (0.08468)
     | > loss_disc_real_1: 0.18951  (0.22413)
     | > loss_disc_real_2: 0.39634  (0.23867)
     | > loss_disc_real_3: 0.22091  (0.22856)
     | > loss_disc_real_4: 0.24508  (0.23923)
     | > loss_disc_real_5: 0.25735  (0.24245)
     | > loss_0: 2.94637  (2.51592)
     | > grad_norm_0: 1238.68152  (334.94116)
     | > loss_gen: 2.05470  (2.27077)
     | > loss_kl: 1.09763  (1.25624)
     | > loss_feat: 3.17974  (3.48472)
     | > loss_mel: 19.71790  (20.07860)
     | > loss_duration: 1.55686  (1.54569)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 27.60683  (28.63601)
     | > grad_norm_1: 3154.90771  (1502.48206)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.61530  (3.58289)
     | > loader_time: 0.00900  (0.00896)


   --> STEP: 45/80 -- GLOBAL_STEP: 11725
     | > loss_disc: 2.70610  (2.55



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00100)
     | > avg_loss_disc: 2.78868 (+0.25241)
     | > avg_loss_disc_real_0: 0.28911 (+0.01663)
     | > avg_loss_disc_real_1: 0.25898 (+0.05551)
     | > avg_loss_disc_real_2: 0.22090 (+0.02818)
     | > avg_loss_disc_real_3: 0.21902 (+0.04776)
     | > avg_loss_disc_real_4: 0.20884 (-0.03353)
     | > avg_loss_disc_real_5: 0.26153 (+0.00994)
     | > avg_loss_0: 2.78868 (+0.25241)
     | > avg_loss_gen: 1.80482 (-0.47202)
     | > avg_loss_kl: 1.31850 (+0.02280)
     | > avg_loss_feat: 2.00260 (-1.05236)
     | > avg_loss_mel: 19.57473 (-0.82762)
     | > avg_loss_duration: 1.91301 (-0.01592)
     | > avg_loss_1: 26.61366 (-2.34512)


 > EPOCH: 147/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 16:42:00)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 11775
     | > loss_disc: 2.73707  (2.77119)
     | > loss_disc_real_0: 0.23997  (0.25177)
     | > loss_disc_real_1: 0.25569  (0.22471)
     | > loss_disc_real_2: 0.23353  (0.22755)
     | > loss_disc_real_3: 0.21417  (0.23066)
     | > loss_disc_real_4: 0.21918  (0.24225)
     | > loss_disc_real_5: 0.23882  (0.24480)
     | > loss_0: 2.73707  (2.77119)
     | > grad_norm_0: 10.14577  (17.56066)
     | > loss_gen: 1.86266  (1.86269)
     | > loss_kl: 1.32743  (1.33283)
     | > loss_feat: 2.40344  (2.33065)
     | > loss_mel: 20.06483  (19.63570)
     | > loss_duration: 1.55152  (1.54359)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 27.20988  (26.70546)
     | > grad_norm_1: 140.64047  (230.89478)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.59230  (3.56331)
     | > loader_time: 0.00900  (0.00841)


   --> STEP: 40/80 -- GLOBAL_STEP: 11800
     | > loss_disc: 2.69344  (2.71972)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00100)
     | > avg_loss_disc: 2.87724 (+0.08856)
     | > avg_loss_disc_real_0: 0.20711 (-0.08199)
     | > avg_loss_disc_real_1: 0.27428 (+0.01530)
     | > avg_loss_disc_real_2: 0.24206 (+0.02116)
     | > avg_loss_disc_real_3: 0.21809 (-0.00092)
     | > avg_loss_disc_real_4: 0.24059 (+0.03176)
     | > avg_loss_disc_real_5: 0.24327 (-0.01826)
     | > avg_loss_0: 2.87724 (+0.08856)
     | > avg_loss_gen: 1.67132 (-0.13351)
     | > avg_loss_kl: 1.07439 (-0.24411)
     | > avg_loss_feat: 1.78269 (-0.21990)
     | > avg_loss_mel: 17.46378 (-2.11095)
     | > avg_loss_duration: 1.92720 (+0.01419)
     | > avg_loss_1: 23.91939 (-2.69427)

 > BEST MODEL : ./output\vits_vctk-September-23-2022_02+46AM-3c624ce\best_model_11840.pth

 > EPOCH: 148/1000
 --> ./output\vits_vctk-S



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 16:47:39)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 11850
     | > loss_disc: 2.83436  (2.76452)
     | > loss_disc_real_0: 0.26420  (0.21497)
     | > loss_disc_real_1: 0.29610  (0.23449)
     | > loss_disc_real_2: 0.24085  (0.23505)
     | > loss_disc_real_3: 0.30934  (0.24626)
     | > loss_disc_real_4: 0.28559  (0.24947)
     | > loss_disc_real_5: 0.27504  (0.24794)
     | > loss_0: 2.83436  (2.76452)
     | > grad_norm_0: 83.61102  (43.94658)
     | > loss_gen: 1.80229  (1.87776)
     | > loss_kl: 1.28469  (1.23242)
     | > loss_feat: 2.16929  (2.41138)
     | > loss_mel: 19.25862  (19.80613)
     | > loss_duration: 1.50572  (1.53750)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 26.02061  (26.86517)
     | > grad_norm_1: 607.55011  (439.48788)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.57730  (3.54944)
     | > loader_time: 0.00900  (0.00841)


   --> STEP: 35/80 -- GLOBAL_STEP: 11875
     | > loss_disc: 2.52083  (2.68677)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00100)
     | > avg_loss_disc: 2.94193 (+0.06469)
     | > avg_loss_disc_real_0: 0.02142 (-0.18569)
     | > avg_loss_disc_real_1: 0.23781 (-0.03647)
     | > avg_loss_disc_real_2: 0.18925 (-0.05281)
     | > avg_loss_disc_real_3: 0.24186 (+0.02377)
     | > avg_loss_disc_real_4: 0.21734 (-0.02325)
     | > avg_loss_disc_real_5: 0.25056 (+0.00729)
     | > avg_loss_0: 2.94193 (+0.06469)
     | > avg_loss_gen: 1.47311 (-0.19820)
     | > avg_loss_kl: 1.28232 (+0.20793)
     | > avg_loss_feat: 2.30409 (+0.52140)
     | > avg_loss_mel: 18.38428 (+0.92050)
     | > avg_loss_duration: 1.92513 (-0.00207)
     | > avg_loss_1: 25.36894 (+1.44955)


 > EPOCH: 149/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 16:53:14)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 11925
     | > loss_disc: 2.44576  (2.54469)
     | > loss_disc_real_0: 0.07499  (0.08234)
     | > loss_disc_real_1: 0.24772  (0.22245)
     | > loss_disc_real_2: 0.16419  (0.22126)
     | > loss_disc_real_3: 0.23448  (0.23490)
     | > loss_disc_real_4: 0.24182  (0.25179)
     | > loss_disc_real_5: 0.23780  (0.24217)
     | > loss_0: 2.44576  (2.54469)
     | > grad_norm_0: 621.40320  (480.00430)
     | > loss_gen: 2.03087  (2.29657)
     | > loss_kl: 1.22888  (1.18418)
     | > loss_feat: 2.71725  (3.32091)
     | > loss_mel: 20.34392  (20.15423)
     | > loss_duration: 1.63172  (1.55152)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 27.95263  (28.50742)
     | > grad_norm_1: 1429.47351  (1443.44006)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.57020  (3.53743)
     | > loader_time: 0.00800  (0.00820)


   --> STEP: 30/80 -- GLOBAL_STEP: 11950
     | > loss_disc: 2.44760  (2.6113



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00100)
     | > avg_loss_disc: 2.66441 (-0.27752)
     | > avg_loss_disc_real_0: 0.13831 (+0.11689)
     | > avg_loss_disc_real_1: 0.27706 (+0.03924)
     | > avg_loss_disc_real_2: 0.15800 (-0.03124)
     | > avg_loss_disc_real_3: 0.29705 (+0.05519)
     | > avg_loss_disc_real_4: 0.22882 (+0.01148)
     | > avg_loss_disc_real_5: 0.21935 (-0.03121)
     | > avg_loss_0: 2.66441 (-0.27752)
     | > avg_loss_gen: 1.78193 (+0.30882)
     | > avg_loss_kl: 1.03517 (-0.24715)
     | > avg_loss_feat: 2.36939 (+0.06530)
     | > avg_loss_mel: 19.57422 (+1.18994)
     | > avg_loss_duration: 1.93047 (+0.00534)
     | > avg_loss_1: 26.69118 (+1.32224)


 > EPOCH: 150/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 16:58:49)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 12000
     | > loss_disc: 2.60126  (2.60126)
     | > loss_disc_real_0: 0.13553  (0.13553)
     | > loss_disc_real_1: 0.24568  (0.24568)
     | > loss_disc_real_2: 0.14941  (0.14941)
     | > loss_disc_real_3: 0.26826  (0.26826)
     | > loss_disc_real_4: 0.20333  (0.20333)
     | > loss_disc_real_5: 0.21039  (0.21039)
     | > loss_0: 2.60126  (2.60126)
     | > grad_norm_0: 204.11346  (204.11346)
     | > loss_gen: 1.85137  (1.85137)
     | > loss_kl: 1.15787  (1.15787)
     | > loss_feat: 2.87108  (2.87108)
     | > loss_mel: 20.24224  (20.24224)
     | > loss_duration: 1.55658  (1.55658)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 27.67915  (27.67915)
     | > grad_norm_1: 239.51865  (239.51865)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.56120  (3.56124)
     | > loader_time: 23.36100  (23.36103)


   --> STEP: 25/80 -- GLOBAL_STEP: 12025
     | > loss_disc: 2.82066  (2.6436



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00100)
     | > avg_loss_disc: 2.69331 (+0.02890)
     | > avg_loss_disc_real_0: 0.35798 (+0.21967)
     | > avg_loss_disc_real_1: 0.17785 (-0.09921)
     | > avg_loss_disc_real_2: 0.18824 (+0.03023)
     | > avg_loss_disc_real_3: 0.20734 (-0.08971)
     | > avg_loss_disc_real_4: 0.23671 (+0.00789)
     | > avg_loss_disc_real_5: 0.21188 (-0.00747)
     | > avg_loss_0: 2.69331 (+0.02890)
     | > avg_loss_gen: 1.88633 (+0.10441)
     | > avg_loss_kl: 1.37494 (+0.33977)
     | > avg_loss_feat: 2.93684 (+0.56745)
     | > avg_loss_mel: 20.46491 (+0.89068)
     | > avg_loss_duration: 1.95276 (+0.02229)
     | > avg_loss_1: 28.61578 (+1.92460)


 > EPOCH: 151/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 17:04:25)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 12100
     | > loss_disc: 2.57097  (2.64651)
     | > loss_disc_real_0: 0.17230  (0.15555)
     | > loss_disc_real_1: 0.22185  (0.22636)
     | > loss_disc_real_2: 0.26162  (0.23020)
     | > loss_disc_real_3: 0.22844  (0.23375)
     | > loss_disc_real_4: 0.24322  (0.24119)
     | > loss_disc_real_5: 0.24985  (0.24450)
     | > loss_0: 2.57097  (2.64651)
     | > grad_norm_0: 929.71735  (755.95337)
     | > loss_gen: 2.45148  (2.21787)
     | > loss_kl: 1.21998  (1.28741)
     | > loss_feat: 3.61145  (3.27035)
     | > loss_mel: 19.90593  (19.96306)
     | > loss_duration: 1.57163  (1.53305)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 28.76048  (28.27173)
     | > grad_norm_1: 1842.96570  (1817.22388)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.62630  (3.58957)
     | > loader_time: 0.01000  (0.00921)


   --> STEP: 45/80 -- GLOBAL_STEP: 12125
     | > loss_disc: 2.62459  (2.627



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00801 (-0.00100)
     | > avg_loss_disc: 2.50597 (-0.18734)
     | > avg_loss_disc_real_0: 0.23607 (-0.12191)
     | > avg_loss_disc_real_1: 0.23947 (+0.06162)
     | > avg_loss_disc_real_2: 0.15913 (-0.02911)
     | > avg_loss_disc_real_3: 0.18805 (-0.01929)
     | > avg_loss_disc_real_4: 0.27736 (+0.04065)
     | > avg_loss_disc_real_5: 0.25982 (+0.04794)
     | > avg_loss_0: 2.50597 (-0.18734)
     | > avg_loss_gen: 2.21274 (+0.32641)
     | > avg_loss_kl: 1.60133 (+0.22640)
     | > avg_loss_feat: 2.98562 (+0.04878)
     | > avg_loss_mel: 21.14722 (+0.68231)
     | > avg_loss_duration: 1.97204 (+0.01928)
     | > avg_loss_1: 29.91896 (+1.30318)


 > EPOCH: 152/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 17:10:01)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 12175
     | > loss_disc: 2.76844  (2.68104)
     | > loss_disc_real_0: 0.19071  (0.18117)
     | > loss_disc_real_1: 0.25414  (0.23235)
     | > loss_disc_real_2: 0.16925  (0.23092)
     | > loss_disc_real_3: 0.23268  (0.23415)
     | > loss_disc_real_4: 0.20638  (0.23972)
     | > loss_disc_real_5: 0.22542  (0.24510)
     | > loss_0: 2.76844  (2.68104)
     | > grad_norm_0: 383.33524  (449.87521)
     | > loss_gen: 1.80090  (2.02046)
     | > loss_kl: 1.43109  (1.33950)
     | > loss_feat: 2.76945  (2.77542)
     | > loss_mel: 20.74203  (19.98324)
     | > loss_duration: 1.55846  (1.53559)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 28.30193  (27.65421)
     | > grad_norm_1: 358.80179  (1899.23608)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.62830  (3.57579)
     | > loader_time: 0.00900  (0.00874)


   --> STEP: 40/80 -- GLOBAL_STEP: 12200
     | > loss_disc: 2.53468  (2.6363



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00200)
     | > avg_loss_disc: 2.32493 (-0.18104)
     | > avg_loss_disc_real_0: 0.11516 (-0.12090)
     | > avg_loss_disc_real_1: 0.17152 (-0.06794)
     | > avg_loss_disc_real_2: 0.20183 (+0.04271)
     | > avg_loss_disc_real_3: 0.20756 (+0.01950)
     | > avg_loss_disc_real_4: 0.22797 (-0.04939)
     | > avg_loss_disc_real_5: 0.23763 (-0.02218)
     | > avg_loss_0: 2.32493 (-0.18104)
     | > avg_loss_gen: 2.08644 (-0.12630)
     | > avg_loss_kl: 1.57564 (-0.02569)
     | > avg_loss_feat: 3.12847 (+0.14285)
     | > avg_loss_mel: 20.64741 (-0.49981)
     | > avg_loss_duration: 1.93156 (-0.04048)
     | > avg_loss_1: 29.36953 (-0.54943)


 > EPOCH: 153/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 17:15:36)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 12250
     | > loss_disc: 2.72028  (2.69628)
     | > loss_disc_real_0: 0.25635  (0.21193)
     | > loss_disc_real_1: 0.29871  (0.22385)
     | > loss_disc_real_2: 0.24950  (0.22471)
     | > loss_disc_real_3: 0.29881  (0.24242)
     | > loss_disc_real_4: 0.25100  (0.24150)
     | > loss_disc_real_5: 0.25789  (0.24275)
     | > loss_0: 2.72028  (2.69628)
     | > grad_norm_0: 224.88232  (658.48871)
     | > loss_gen: 1.97028  (2.04207)
     | > loss_kl: 1.30963  (1.32827)
     | > loss_feat: 2.49772  (2.86551)
     | > loss_mel: 19.36205  (19.73909)
     | > loss_duration: 1.46109  (1.54717)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 26.60076  (27.52211)
     | > grad_norm_1: 614.61700  (1473.83801)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.60430  (3.55854)
     | > loader_time: 0.00800  (0.00831)


   --> STEP: 35/80 -- GLOBAL_STEP: 12275
     | > loss_disc: 2.57523  (2.6533



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00100)
     | > avg_loss_disc: 2.43637 (+0.11144)
     | > avg_loss_disc_real_0: 0.16120 (+0.04604)
     | > avg_loss_disc_real_1: 0.19439 (+0.02286)
     | > avg_loss_disc_real_2: 0.18669 (-0.01514)
     | > avg_loss_disc_real_3: 0.22282 (+0.01526)
     | > avg_loss_disc_real_4: 0.24217 (+0.01419)
     | > avg_loss_disc_real_5: 0.22757 (-0.01006)
     | > avg_loss_0: 2.43637 (+0.11144)
     | > avg_loss_gen: 2.40259 (+0.31614)
     | > avg_loss_kl: 1.35731 (-0.21834)
     | > avg_loss_feat: 3.46053 (+0.33206)
     | > avg_loss_mel: 20.33491 (-0.31250)
     | > avg_loss_duration: 1.94759 (+0.01603)
     | > avg_loss_1: 29.50292 (+0.13340)


 > EPOCH: 154/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 17:21:12)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 12325
     | > loss_disc: 2.40003  (2.51642)
     | > loss_disc_real_0: 0.08132  (0.08698)
     | > loss_disc_real_1: 0.21471  (0.22935)
     | > loss_disc_real_2: 0.23088  (0.23285)
     | > loss_disc_real_3: 0.23915  (0.23515)
     | > loss_disc_real_4: 0.21432  (0.24890)
     | > loss_disc_real_5: 0.24280  (0.24278)
     | > loss_0: 2.40003  (2.51642)
     | > grad_norm_0: 535.67108  (632.00671)
     | > loss_gen: 2.41013  (2.22286)
     | > loss_kl: 1.41451  (1.35333)
     | > loss_feat: 3.73477  (3.31560)
     | > loss_mel: 19.69846  (19.95173)
     | > loss_duration: 1.53513  (1.52549)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 28.79300  (28.36901)
     | > grad_norm_1: 1702.73633  (1880.31921)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.58030  (3.54182)
     | > loader_time: 0.00800  (0.00781)


   --> STEP: 30/80 -- GLOBAL_STEP: 12350
     | > loss_disc: 2.88694  (2.5613



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.64533 (+0.20895)
     | > avg_loss_disc_real_0: 0.29310 (+0.13190)
     | > avg_loss_disc_real_1: 0.30205 (+0.10766)
     | > avg_loss_disc_real_2: 0.23219 (+0.04550)
     | > avg_loss_disc_real_3: 0.20668 (-0.01614)
     | > avg_loss_disc_real_4: 0.28578 (+0.04361)
     | > avg_loss_disc_real_5: 0.29274 (+0.06516)
     | > avg_loss_0: 2.64533 (+0.20895)
     | > avg_loss_gen: 2.43316 (+0.03057)
     | > avg_loss_kl: 1.53043 (+0.17312)
     | > avg_loss_feat: 2.64345 (-0.81709)
     | > avg_loss_mel: 19.98997 (-0.34493)
     | > avg_loss_duration: 1.96111 (+0.01352)
     | > avg_loss_1: 28.55812 (-0.94480)


 > EPOCH: 155/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 17:26:46)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 12400
     | > loss_disc: 2.63983  (2.63983)
     | > loss_disc_real_0: 0.28965  (0.28965)
     | > loss_disc_real_1: 0.26945  (0.26945)
     | > loss_disc_real_2: 0.21522  (0.21522)
     | > loss_disc_real_3: 0.25558  (0.25558)
     | > loss_disc_real_4: 0.28670  (0.28670)
     | > loss_disc_real_5: 0.27187  (0.27187)
     | > loss_0: 2.63983  (2.63983)
     | > grad_norm_0: 1155.84180  (1155.84180)
     | > loss_gen: 2.56477  (2.56477)
     | > loss_kl: 1.23119  (1.23119)
     | > loss_feat: 3.33027  (3.33027)
     | > loss_mel: 20.03887  (20.03887)
     | > loss_duration: 1.57423  (1.57423)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 28.73934  (28.73934)
     | > grad_norm_1: 1147.62207  (1147.62207)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.59730  (3.59728)
     | > loader_time: 23.40470  (23.40467)


   --> STEP: 25/80 -- GLOBAL_STEP: 12425
     | > loss_disc: 2.45627  (2.



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00100)
     | > avg_loss_disc: 2.57625 (-0.06908)
     | > avg_loss_disc_real_0: 0.14189 (-0.15121)
     | > avg_loss_disc_real_1: 0.21871 (-0.08334)
     | > avg_loss_disc_real_2: 0.20599 (-0.02620)
     | > avg_loss_disc_real_3: 0.25909 (+0.05241)
     | > avg_loss_disc_real_4: 0.23354 (-0.05225)
     | > avg_loss_disc_real_5: 0.26956 (-0.02318)
     | > avg_loss_0: 2.57625 (-0.06908)
     | > avg_loss_gen: 1.97999 (-0.45317)
     | > avg_loss_kl: 1.54217 (+0.01174)
     | > avg_loss_feat: 2.24347 (-0.39998)
     | > avg_loss_mel: 20.97072 (+0.98075)
     | > avg_loss_duration: 1.94644 (-0.01467)
     | > avg_loss_1: 28.68279 (+0.12467)


 > EPOCH: 156/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 17:32:21)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 12500
     | > loss_disc: 2.60297  (2.69869)
     | > loss_disc_real_0: 0.20292  (0.16887)
     | > loss_disc_real_1: 0.22293  (0.22801)
     | > loss_disc_real_2: 0.22627  (0.23864)
     | > loss_disc_real_3: 0.21409  (0.23615)
     | > loss_disc_real_4: 0.22686  (0.24441)
     | > loss_disc_real_5: 0.23291  (0.24209)
     | > loss_0: 2.60297  (2.69869)
     | > grad_norm_0: 389.74017  (646.04132)
     | > loss_gen: 2.16483  (2.13734)
     | > loss_kl: 1.31258  (1.25856)
     | > loss_feat: 2.98087  (3.04208)
     | > loss_mel: 20.51507  (20.12318)
     | > loss_duration: 1.52292  (1.52548)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 28.49627  (28.08664)
     | > grad_norm_1: 1769.47815  (1599.37537)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.63630  (3.58157)
     | > loader_time: 0.01000  (0.00906)


   --> STEP: 45/80 -- GLOBAL_STEP: 12525
     | > loss_disc: 2.61056  (2.685



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00100)
     | > avg_loss_disc: 2.70935 (+0.13310)
     | > avg_loss_disc_real_0: 0.05550 (-0.08639)
     | > avg_loss_disc_real_1: 0.23457 (+0.01586)
     | > avg_loss_disc_real_2: 0.20172 (-0.00427)
     | > avg_loss_disc_real_3: 0.19883 (-0.06025)
     | > avg_loss_disc_real_4: 0.26170 (+0.02817)
     | > avg_loss_disc_real_5: 0.24749 (-0.02207)
     | > avg_loss_0: 2.70935 (+0.13310)
     | > avg_loss_gen: 1.63765 (-0.34234)
     | > avg_loss_kl: 1.25140 (-0.29077)
     | > avg_loss_feat: 2.29013 (+0.04666)
     | > avg_loss_mel: 18.74554 (-2.22518)
     | > avg_loss_duration: 1.97846 (+0.03201)
     | > avg_loss_1: 25.90318 (-2.77962)


 > EPOCH: 157/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 17:37:56)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 12575
     | > loss_disc: 2.49100  (2.64371)
     | > loss_disc_real_0: 0.03466  (0.17032)
     | > loss_disc_real_1: 0.21115  (0.23205)
     | > loss_disc_real_2: 0.26755  (0.22355)
     | > loss_disc_real_3: 0.20236  (0.23799)
     | > loss_disc_real_4: 0.23069  (0.24056)
     | > loss_disc_real_5: 0.22910  (0.24599)
     | > loss_0: 2.49100  (2.64371)
     | > grad_norm_0: 562.31177  (803.40228)
     | > loss_gen: 2.38970  (2.23991)
     | > loss_kl: 1.22605  (1.24828)
     | > loss_feat: 3.41797  (3.17119)
     | > loss_mel: 18.95669  (19.63064)
     | > loss_duration: 1.53576  (1.52744)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 27.52617  (27.81745)
     | > grad_norm_1: 1847.16211  (1526.43604)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.62530  (3.56878)
     | > loader_time: 0.00900  (0.00827)


   --> STEP: 40/80 -- GLOBAL_STEP: 12600
     | > loss_disc: 2.63095  (2.623



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00100)
     | > avg_loss_disc: 2.39411 (-0.31524)
     | > avg_loss_disc_real_0: 0.09162 (+0.03612)
     | > avg_loss_disc_real_1: 0.20820 (-0.02637)
     | > avg_loss_disc_real_2: 0.27771 (+0.07599)
     | > avg_loss_disc_real_3: 0.19737 (-0.00147)
     | > avg_loss_disc_real_4: 0.27259 (+0.01089)
     | > avg_loss_disc_real_5: 0.27290 (+0.02541)
     | > avg_loss_0: 2.39411 (-0.31524)
     | > avg_loss_gen: 2.27084 (+0.63319)
     | > avg_loss_kl: 1.62277 (+0.37137)
     | > avg_loss_feat: 2.99607 (+0.70594)
     | > avg_loss_mel: 19.07410 (+0.32856)
     | > avg_loss_duration: 1.95107 (-0.02739)
     | > avg_loss_1: 27.91485 (+2.01167)


 > EPOCH: 158/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 17:43:30)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 12650
     | > loss_disc: 2.66084  (2.68938)
     | > loss_disc_real_0: 0.29888  (0.18281)
     | > loss_disc_real_1: 0.24196  (0.23842)
     | > loss_disc_real_2: 0.20068  (0.23481)
     | > loss_disc_real_3: 0.27402  (0.23974)
     | > loss_disc_real_4: 0.24080  (0.24406)
     | > loss_disc_real_5: 0.25033  (0.25460)
     | > loss_0: 2.66084  (2.68938)
     | > grad_norm_0: 710.32593  (595.79266)
     | > loss_gen: 2.41659  (2.22281)
     | > loss_kl: 1.55893  (1.24484)
     | > loss_feat: 3.49565  (2.98934)
     | > loss_mel: 19.74158  (20.09852)
     | > loss_duration: 1.49977  (1.53073)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 28.71251  (28.08624)
     | > grad_norm_1: 1146.57361  (1768.33728)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.58630  (3.55343)
     | > loader_time: 0.00800  (0.00861)


   --> STEP: 35/80 -- GLOBAL_STEP: 12675
     | > loss_disc: 2.57271  (2.653



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00100)
     | > avg_loss_disc: 2.96785 (+0.57374)
     | > avg_loss_disc_real_0: 0.32631 (+0.23469)
     | > avg_loss_disc_real_1: 0.16894 (-0.03926)
     | > avg_loss_disc_real_2: 0.27248 (-0.00523)
     | > avg_loss_disc_real_3: 0.24475 (+0.04738)
     | > avg_loss_disc_real_4: 0.22789 (-0.04470)
     | > avg_loss_disc_real_5: 0.23705 (-0.03585)
     | > avg_loss_0: 2.96785 (+0.57374)
     | > avg_loss_gen: 1.68674 (-0.58409)
     | > avg_loss_kl: 1.93382 (+0.31105)
     | > avg_loss_feat: 2.41827 (-0.57780)
     | > avg_loss_mel: 20.32370 (+1.24960)
     | > avg_loss_duration: 1.92136 (-0.02971)
     | > avg_loss_1: 28.28389 (+0.36904)


 > EPOCH: 159/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 17:49:05)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 12725
     | > loss_disc: 2.62452  (2.57421)
     | > loss_disc_real_0: 0.06093  (0.15088)
     | > loss_disc_real_1: 0.28111  (0.26003)
     | > loss_disc_real_2: 0.27331  (0.23235)
     | > loss_disc_real_3: 0.27387  (0.23962)
     | > loss_disc_real_4: 0.25877  (0.24767)
     | > loss_disc_real_5: 0.26058  (0.25637)
     | > loss_0: 2.62452  (2.57421)
     | > grad_norm_0: 201.25468  (437.63321)
     | > loss_gen: 1.98999  (2.44596)
     | > loss_kl: 1.25716  (1.27538)
     | > loss_feat: 2.98682  (3.63515)
     | > loss_mel: 19.81484  (20.33910)
     | > loss_duration: 1.54473  (1.53921)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 27.59355  (29.23480)
     | > grad_norm_1: 445.16339  (785.60638)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.55520  (3.52801)
     | > loader_time: 0.00900  (0.00761)


   --> STEP: 30/80 -- GLOBAL_STEP: 12750
     | > loss_disc: 3.15612  (2.63091)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00100)
     | > avg_loss_disc: 2.45739 (-0.51046)
     | > avg_loss_disc_real_0: 0.05848 (-0.26783)
     | > avg_loss_disc_real_1: 0.19110 (+0.02217)
     | > avg_loss_disc_real_2: 0.29794 (+0.02547)
     | > avg_loss_disc_real_3: 0.26138 (+0.01663)
     | > avg_loss_disc_real_4: 0.26791 (+0.04002)
     | > avg_loss_disc_real_5: 0.24591 (+0.00886)
     | > avg_loss_0: 2.45739 (-0.51046)
     | > avg_loss_gen: 2.33606 (+0.64932)
     | > avg_loss_kl: 1.08914 (-0.84468)
     | > avg_loss_feat: 3.30403 (+0.88576)
     | > avg_loss_mel: 19.06199 (-1.26171)
     | > avg_loss_duration: 1.95189 (+0.03053)
     | > avg_loss_1: 27.74310 (-0.54078)


 > EPOCH: 160/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 17:54:40)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 12800
     | > loss_disc: 2.54339  (2.54339)
     | > loss_disc_real_0: 0.03681  (0.03681)
     | > loss_disc_real_1: 0.19077  (0.19077)
     | > loss_disc_real_2: 0.26699  (0.26699)
     | > loss_disc_real_3: 0.25272  (0.25272)
     | > loss_disc_real_4: 0.24299  (0.24299)
     | > loss_disc_real_5: 0.22991  (0.22991)
     | > loss_0: 2.54339  (2.54339)
     | > grad_norm_0: 160.96548  (160.96548)
     | > loss_gen: 2.25536  (2.25536)
     | > loss_kl: 1.16441  (1.16441)
     | > loss_feat: 3.36428  (3.36428)
     | > loss_mel: 19.85988  (19.85988)
     | > loss_duration: 1.59012  (1.59012)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 28.23405  (28.23405)
     | > grad_norm_1: 1255.10999  (1255.10999)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.55520  (3.55525)
     | > loader_time: 23.22370  (23.22369)


   --> STEP: 25/80 -- GLOBAL_STEP: 12825
     | > loss_disc: 2.45687  (2.72



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00100)
     | > avg_loss_disc: 2.76263 (+0.30524)
     | > avg_loss_disc_real_0: 0.13040 (+0.07191)
     | > avg_loss_disc_real_1: 0.26069 (+0.06959)
     | > avg_loss_disc_real_2: 0.21797 (-0.07998)
     | > avg_loss_disc_real_3: 0.26298 (+0.00159)
     | > avg_loss_disc_real_4: 0.28207 (+0.01416)
     | > avg_loss_disc_real_5: 0.24099 (-0.00492)
     | > avg_loss_0: 2.76263 (+0.30524)
     | > avg_loss_gen: 1.78654 (-0.54952)
     | > avg_loss_kl: 1.26608 (+0.17694)
     | > avg_loss_feat: 2.17646 (-1.12756)
     | > avg_loss_mel: 20.00957 (+0.94759)
     | > avg_loss_duration: 1.95957 (+0.00768)
     | > avg_loss_1: 27.19823 (-0.54488)


 > EPOCH: 161/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 18:00:14)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 12900
     | > loss_disc: 2.66616  (2.59854)
     | > loss_disc_real_0: 0.25338  (0.14667)
     | > loss_disc_real_1: 0.22919  (0.21576)
     | > loss_disc_real_2: 0.27593  (0.22966)
     | > loss_disc_real_3: 0.27848  (0.23432)
     | > loss_disc_real_4: 0.26463  (0.24087)
     | > loss_disc_real_5: 0.27455  (0.24548)
     | > loss_0: 2.66616  (2.59854)
     | > grad_norm_0: 847.77155  (437.40445)
     | > loss_gen: 2.22059  (2.11534)
     | > loss_kl: 1.28738  (1.25280)
     | > loss_feat: 3.01782  (3.05307)
     | > loss_mel: 19.74248  (19.89113)
     | > loss_duration: 1.54094  (1.53087)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 27.80921  (27.84321)
     | > grad_norm_1: 1230.90857  (1422.36462)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.62830  (3.58044)
     | > loader_time: 0.01000  (0.00861)


   --> STEP: 45/80 -- GLOBAL_STEP: 12925
     | > loss_disc: 2.57502  (2.644



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00100)
     | > avg_loss_disc: 2.77265 (+0.01002)
     | > avg_loss_disc_real_0: 0.46026 (+0.32986)
     | > avg_loss_disc_real_1: 0.18300 (-0.07769)
     | > avg_loss_disc_real_2: 0.20294 (-0.01502)
     | > avg_loss_disc_real_3: 0.21678 (-0.04619)
     | > avg_loss_disc_real_4: 0.25330 (-0.02878)
     | > avg_loss_disc_real_5: 0.24929 (+0.00830)
     | > avg_loss_0: 2.77265 (+0.01002)
     | > avg_loss_gen: 2.18607 (+0.39953)
     | > avg_loss_kl: 1.63142 (+0.36534)
     | > avg_loss_feat: 2.50575 (+0.32929)
     | > avg_loss_mel: 19.07885 (-0.93073)
     | > avg_loss_duration: 1.93859 (-0.02098)
     | > avg_loss_1: 27.34069 (+0.14246)


 > EPOCH: 162/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 18:05:49)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 12975
     | > loss_disc: 2.90820  (2.71715)
     | > loss_disc_real_0: 0.02348  (0.20102)
     | > loss_disc_real_1: 0.16826  (0.22428)
     | > loss_disc_real_2: 0.17445  (0.22353)
     | > loss_disc_real_3: 0.16379  (0.23094)
     | > loss_disc_real_4: 0.18707  (0.23684)
     | > loss_disc_real_5: 0.22053  (0.24704)
     | > loss_0: 2.90820  (2.71715)
     | > grad_norm_0: 196.18909  (477.08029)
     | > loss_gen: 1.75619  (2.11950)
     | > loss_kl: 1.29655  (1.30196)
     | > loss_feat: 2.84470  (3.18723)
     | > loss_mel: 20.10475  (19.78285)
     | > loss_duration: 1.54214  (1.52581)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 27.54433  (27.91735)
     | > grad_norm_1: 205.95564  (879.88959)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.58430  (3.56795)
     | > loader_time: 0.00900  (0.00867)


   --> STEP: 40/80 -- GLOBAL_STEP: 13000
     | > loss_disc: 2.61241  (2.68259



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00000)
     | > avg_loss_disc: 2.54220 (-0.23045)
     | > avg_loss_disc_real_0: 0.11875 (-0.34151)
     | > avg_loss_disc_real_1: 0.24627 (+0.06328)
     | > avg_loss_disc_real_2: 0.23192 (+0.02898)
     | > avg_loss_disc_real_3: 0.23812 (+0.02133)
     | > avg_loss_disc_real_4: 0.23865 (-0.01464)
     | > avg_loss_disc_real_5: 0.29587 (+0.04658)
     | > avg_loss_0: 2.54220 (-0.23045)
     | > avg_loss_gen: 2.20580 (+0.01973)
     | > avg_loss_kl: 1.51641 (-0.11501)
     | > avg_loss_feat: 2.68407 (+0.17831)
     | > avg_loss_mel: 19.57215 (+0.49331)
     | > avg_loss_duration: 1.95166 (+0.01306)
     | > avg_loss_1: 27.93009 (+0.58940)


 > EPOCH: 163/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 18:11:24)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 13050
     | > loss_disc: 2.50005  (2.61980)
     | > loss_disc_real_0: 0.16338  (0.16590)
     | > loss_disc_real_1: 0.28229  (0.23950)
     | > loss_disc_real_2: 0.20201  (0.22463)
     | > loss_disc_real_3: 0.25077  (0.23537)
     | > loss_disc_real_4: 0.20941  (0.24272)
     | > loss_disc_real_5: 0.26066  (0.23941)
     | > loss_0: 2.50005  (2.61980)
     | > grad_norm_0: 454.19110  (377.74295)
     | > loss_gen: 2.20859  (2.08565)
     | > loss_kl: 1.20690  (1.30642)
     | > loss_feat: 3.61409  (2.98374)
     | > loss_mel: 19.83286  (19.90539)
     | > loss_duration: 1.55214  (1.52734)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 28.41458  (27.80855)
     | > grad_norm_1: 1680.56958  (1307.98499)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.61130  (3.55260)
     | > loader_time: 0.00900  (0.00881)


   --> STEP: 35/80 -- GLOBAL_STEP: 13075
     | > loss_disc: 2.65058  (2.668



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00801 (-0.00200)
     | > avg_loss_disc: 2.87358 (+0.33138)
     | > avg_loss_disc_real_0: 0.21120 (+0.09245)
     | > avg_loss_disc_real_1: 0.27816 (+0.03189)
     | > avg_loss_disc_real_2: 0.23689 (+0.00496)
     | > avg_loss_disc_real_3: 0.26465 (+0.02653)
     | > avg_loss_disc_real_4: 0.26994 (+0.03129)
     | > avg_loss_disc_real_5: 0.26207 (-0.03380)
     | > avg_loss_0: 2.87358 (+0.33138)
     | > avg_loss_gen: 1.79138 (-0.41443)
     | > avg_loss_kl: 1.52785 (+0.01144)
     | > avg_loss_feat: 2.13789 (-0.54618)
     | > avg_loss_mel: 18.81915 (-0.75300)
     | > avg_loss_duration: 1.96487 (+0.01322)
     | > avg_loss_1: 26.24113 (-1.68895)


 > EPOCH: 164/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 18:16:58)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 13125
     | > loss_disc: 2.62718  (2.68919)
     | > loss_disc_real_0: 0.33847  (0.28801)
     | > loss_disc_real_1: 0.24868  (0.22479)
     | > loss_disc_real_2: 0.24868  (0.22488)
     | > loss_disc_real_3: 0.26428  (0.23686)
     | > loss_disc_real_4: 0.24945  (0.24379)
     | > loss_disc_real_5: 0.25160  (0.25370)
     | > loss_0: 2.62718  (2.68919)
     | > grad_norm_0: 530.75885  (526.37183)
     | > loss_gen: 2.29372  (2.23665)
     | > loss_kl: 1.08888  (1.18334)
     | > loss_feat: 3.12306  (2.88119)
     | > loss_mel: 19.51920  (19.77356)
     | > loss_duration: 1.51924  (1.52983)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 27.54409  (27.60458)
     | > grad_norm_1: 1453.43555  (1301.05469)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.55420  (3.52681)
     | > loader_time: 0.00800  (0.00820)


   --> STEP: 30/80 -- GLOBAL_STEP: 13150
     | > loss_disc: 3.07787  (2.6581



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00601 (-0.00200)
     | > avg_loss_disc: 2.52685 (-0.34673)
     | > avg_loss_disc_real_0: 0.11072 (-0.10048)
     | > avg_loss_disc_real_1: 0.26796 (-0.01021)
     | > avg_loss_disc_real_2: 0.23598 (-0.00091)
     | > avg_loss_disc_real_3: 0.20580 (-0.05885)
     | > avg_loss_disc_real_4: 0.26147 (-0.00847)
     | > avg_loss_disc_real_5: 0.26934 (+0.00727)
     | > avg_loss_0: 2.52685 (-0.34673)
     | > avg_loss_gen: 2.15771 (+0.36633)
     | > avg_loss_kl: 1.34628 (-0.18156)
     | > avg_loss_feat: 2.84327 (+0.70538)
     | > avg_loss_mel: 19.47390 (+0.65475)
     | > avg_loss_duration: 1.94041 (-0.02446)
     | > avg_loss_1: 27.76157 (+1.52044)


 > EPOCH: 165/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 18:22:33)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 13200
     | > loss_disc: 2.46956  (2.46956)
     | > loss_disc_real_0: 0.09869  (0.09869)
     | > loss_disc_real_1: 0.25310  (0.25310)
     | > loss_disc_real_2: 0.22257  (0.22257)
     | > loss_disc_real_3: 0.20347  (0.20347)
     | > loss_disc_real_4: 0.24583  (0.24583)
     | > loss_disc_real_5: 0.26433  (0.26433)
     | > loss_0: 2.46956  (2.46956)
     | > grad_norm_0: 387.37064  (387.37064)
     | > loss_gen: 2.21416  (2.21416)
     | > loss_kl: 1.10764  (1.10764)
     | > loss_feat: 3.57624  (3.57624)
     | > loss_mel: 20.44652  (20.44652)
     | > loss_duration: 1.58283  (1.58283)
     | > amp_scaler: 32.00000  (32.00000)
     | > loss_1: 28.92739  (28.92739)
     | > grad_norm_1: 1374.72461  (1374.72461)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.56620  (3.56625)
     | > loader_time: 23.45240  (23.45242)


   --> STEP: 25/80 -- GLOBAL_STEP: 13225
     | > loss_disc: 2.57597  (2.65



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00801 (+0.00200)
     | > avg_loss_disc: 2.79279 (+0.26594)
     | > avg_loss_disc_real_0: 0.20585 (+0.09513)
     | > avg_loss_disc_real_1: 0.24424 (-0.02372)
     | > avg_loss_disc_real_2: 0.27272 (+0.03674)
     | > avg_loss_disc_real_3: 0.22308 (+0.01728)
     | > avg_loss_disc_real_4: 0.22706 (-0.03441)
     | > avg_loss_disc_real_5: 0.24431 (-0.02503)
     | > avg_loss_0: 2.79279 (+0.26594)
     | > avg_loss_gen: 1.79628 (-0.36143)
     | > avg_loss_kl: 1.48746 (+0.14118)
     | > avg_loss_feat: 2.28407 (-0.55920)
     | > avg_loss_mel: 20.03964 (+0.56574)
     | > avg_loss_duration: 1.96778 (+0.02737)
     | > avg_loss_1: 27.57524 (-0.18633)


 > EPOCH: 166/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 18:28:07)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 13300
     | > loss_disc: 2.57417  (2.63436)
     | > loss_disc_real_0: 0.16722  (0.16648)
     | > loss_disc_real_1: 0.25360  (0.23155)
     | > loss_disc_real_2: 0.16850  (0.22647)
     | > loss_disc_real_3: 0.19706  (0.23358)
     | > loss_disc_real_4: 0.23274  (0.24137)
     | > loss_disc_real_5: 0.22399  (0.24717)
     | > loss_0: 2.57417  (2.63436)
     | > grad_norm_0: 197.58849  (293.04303)
     | > loss_gen: 1.95763  (2.05153)
     | > loss_kl: 1.41552  (1.27775)
     | > loss_feat: 3.00103  (2.91299)
     | > loss_mel: 19.85134  (19.90130)
     | > loss_duration: 1.54770  (1.51780)
     | > amp_scaler: 64.00000  (40.00000)
     | > loss_1: 27.77321  (27.66136)
     | > grad_norm_1: 1355.88513  (1037.50720)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.61230  (3.57282)
     | > loader_time: 0.01000  (0.00891)


   --> STEP: 45/80 -- GLOBAL_STEP: 13325
     | > loss_disc: 2.55895  (2.651



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00100)
     | > avg_loss_disc: 2.57661 (-0.21618)
     | > avg_loss_disc_real_0: 0.20285 (-0.00300)
     | > avg_loss_disc_real_1: 0.22275 (-0.02149)
     | > avg_loss_disc_real_2: 0.23329 (-0.03943)
     | > avg_loss_disc_real_3: 0.20612 (-0.01695)
     | > avg_loss_disc_real_4: 0.25669 (+0.02963)
     | > avg_loss_disc_real_5: 0.25121 (+0.00690)
     | > avg_loss_0: 2.57661 (-0.21618)
     | > avg_loss_gen: 2.01013 (+0.21385)
     | > avg_loss_kl: 1.84377 (+0.35631)
     | > avg_loss_feat: 2.62636 (+0.34228)
     | > avg_loss_mel: 20.49228 (+0.45264)
     | > avg_loss_duration: 1.95545 (-0.01233)
     | > avg_loss_1: 28.92800 (+1.35275)


 > EPOCH: 167/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 18:33:41)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 13375
     | > loss_disc: 2.62428  (2.69574)
     | > loss_disc_real_0: 0.27496  (0.20361)
     | > loss_disc_real_1: 0.23576  (0.23617)
     | > loss_disc_real_2: 0.26952  (0.22864)
     | > loss_disc_real_3: 0.26877  (0.23508)
     | > loss_disc_real_4: 0.21973  (0.24018)
     | > loss_disc_real_5: 0.22397  (0.24994)
     | > loss_0: 2.62428  (2.69574)
     | > grad_norm_0: 665.27380  (472.56631)
     | > loss_gen: 2.33971  (2.08917)
     | > loss_kl: 1.24405  (1.22220)
     | > loss_feat: 4.02487  (2.92732)
     | > loss_mel: 20.02913  (19.52882)
     | > loss_duration: 1.54251  (1.51535)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 29.18027  (27.28286)
     | > grad_norm_1: 1117.34253  (1443.09460)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.60030  (3.56954)
     | > loader_time: 0.01500  (0.00941)


   --> STEP: 40/80 -- GLOBAL_STEP: 13400
     | > loss_disc: 2.70278  (2.786



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00000)
     | > avg_loss_disc: 2.53059 (-0.04602)
     | > avg_loss_disc_real_0: 0.15820 (-0.04464)
     | > avg_loss_disc_real_1: 0.19505 (-0.02770)
     | > avg_loss_disc_real_2: 0.26330 (+0.03001)
     | > avg_loss_disc_real_3: 0.24032 (+0.03419)
     | > avg_loss_disc_real_4: 0.25561 (-0.00108)
     | > avg_loss_disc_real_5: 0.25050 (-0.00071)
     | > avg_loss_0: 2.53059 (-0.04602)
     | > avg_loss_gen: 2.18254 (+0.17241)
     | > avg_loss_kl: 1.67414 (-0.16963)
     | > avg_loss_feat: 2.98857 (+0.36221)
     | > avg_loss_mel: 19.79168 (-0.70060)
     | > avg_loss_duration: 1.91740 (-0.03806)
     | > avg_loss_1: 28.55433 (-0.37367)


 > EPOCH: 168/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 18:39:15)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 13450
     | > loss_disc: 3.01553  (2.84351)
     | > loss_disc_real_0: 0.60682  (0.27507)
     | > loss_disc_real_1: 0.17407  (0.22987)
     | > loss_disc_real_2: 0.22205  (0.23041)
     | > loss_disc_real_3: 0.20113  (0.23019)
     | > loss_disc_real_4: 0.24015  (0.24138)
     | > loss_disc_real_5: 0.23791  (0.24301)
     | > loss_0: 3.01553  (2.84351)
     | > grad_norm_0: 783.30054  (586.50586)
     | > loss_gen: 2.10949  (2.16603)
     | > loss_kl: 1.30797  (1.27390)
     | > loss_feat: 2.58917  (3.15532)
     | > loss_mel: 19.03418  (19.48238)
     | > loss_duration: 1.46111  (1.50854)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 26.50191  (27.58617)
     | > grad_norm_1: 906.76056  (905.00079)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.58430  (3.55023)
     | > loader_time: 0.00900  (0.00861)


   --> STEP: 35/80 -- GLOBAL_STEP: 13475
     | > loss_disc: 2.68873  (2.70931



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.76173 (+0.23113)
     | > avg_loss_disc_real_0: 0.15744 (-0.00077)
     | > avg_loss_disc_real_1: 0.25654 (+0.06149)
     | > avg_loss_disc_real_2: 0.21710 (-0.04620)
     | > avg_loss_disc_real_3: 0.26026 (+0.01994)
     | > avg_loss_disc_real_4: 0.20830 (-0.04731)
     | > avg_loss_disc_real_5: 0.25540 (+0.00490)
     | > avg_loss_0: 2.76173 (+0.23113)
     | > avg_loss_gen: 1.71191 (-0.47063)
     | > avg_loss_kl: 1.38451 (-0.28964)
     | > avg_loss_feat: 2.19907 (-0.78950)
     | > avg_loss_mel: 19.53390 (-0.25778)
     | > avg_loss_duration: 1.92702 (+0.00962)
     | > avg_loss_1: 26.75640 (-1.79793)


 > EPOCH: 169/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 18:44:50)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 13525
     | > loss_disc: 2.69181  (2.65481)
     | > loss_disc_real_0: 0.20357  (0.18247)
     | > loss_disc_real_1: 0.21048  (0.22105)
     | > loss_disc_real_2: 0.26284  (0.22859)
     | > loss_disc_real_3: 0.23157  (0.24872)
     | > loss_disc_real_4: 0.25431  (0.25081)
     | > loss_disc_real_5: 0.24878  (0.24716)
     | > loss_0: 2.69181  (2.65481)
     | > grad_norm_0: 510.02264  (332.92502)
     | > loss_gen: 2.11294  (2.06441)
     | > loss_kl: 1.24102  (1.18029)
     | > loss_feat: 3.25568  (3.00384)
     | > loss_mel: 19.58856  (19.53948)
     | > loss_duration: 1.55075  (1.53289)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 27.74895  (27.32092)
     | > grad_norm_1: 1388.95154  (1205.16516)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.55420  (3.52921)
     | > loader_time: 0.00800  (0.00740)


   --> STEP: 30/80 -- GLOBAL_STEP: 13550
     | > loss_disc: 2.70672  (2.6813



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.84806 (+0.08633)
     | > avg_loss_disc_real_0: 0.43955 (+0.28211)
     | > avg_loss_disc_real_1: 0.19176 (-0.06478)
     | > avg_loss_disc_real_2: 0.20163 (-0.01547)
     | > avg_loss_disc_real_3: 0.28025 (+0.01999)
     | > avg_loss_disc_real_4: 0.21837 (+0.01007)
     | > avg_loss_disc_real_5: 0.27975 (+0.02435)
     | > avg_loss_0: 2.84806 (+0.08633)
     | > avg_loss_gen: 2.04680 (+0.33489)
     | > avg_loss_kl: 1.67056 (+0.28605)
     | > avg_loss_feat: 2.12513 (-0.07393)
     | > avg_loss_mel: 18.02927 (-1.50463)
     | > avg_loss_duration: 1.94053 (+0.01352)
     | > avg_loss_1: 25.81229 (-0.94411)


 > EPOCH: 170/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 18:51:46)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 13600
     | > loss_disc: 2.62467  (2.62467)
     | > loss_disc_real_0: 0.38098  (0.38098)
     | > loss_disc_real_1: 0.18608  (0.18608)
     | > loss_disc_real_2: 0.21081  (0.21081)
     | > loss_disc_real_3: 0.27975  (0.27975)
     | > loss_disc_real_4: 0.19700  (0.19700)
     | > loss_disc_real_5: 0.24425  (0.24425)
     | > loss_0: 2.62467  (2.62467)
     | > grad_norm_0: 545.39301  (545.39301)
     | > loss_gen: 2.27906  (2.27906)
     | > loss_kl: 1.11650  (1.11650)
     | > loss_feat: 3.27442  (3.27442)
     | > loss_mel: 21.21093  (21.21093)
     | > loss_duration: 1.55050  (1.55050)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 29.43141  (29.43141)
     | > grad_norm_1: 1070.11023  (1070.11023)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 4.10570  (4.10570)
     | > loader_time: 25.81780  (25.81783)


   --> STEP: 25/80 -- GLOBAL_STEP: 13625
     | > loss_disc: 2.61423  (2.65



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01401 (+0.00500)
     | > avg_loss_disc: 2.70471 (-0.14335)
     | > avg_loss_disc_real_0: 0.09328 (-0.34627)
     | > avg_loss_disc_real_1: 0.21394 (+0.02219)
     | > avg_loss_disc_real_2: 0.24042 (+0.03879)
     | > avg_loss_disc_real_3: 0.21982 (-0.06044)
     | > avg_loss_disc_real_4: 0.21007 (-0.00830)
     | > avg_loss_disc_real_5: 0.26615 (-0.01360)
     | > avg_loss_0: 2.70471 (-0.14335)
     | > avg_loss_gen: 1.66925 (-0.37755)
     | > avg_loss_kl: 1.41736 (-0.25320)
     | > avg_loss_feat: 2.36874 (+0.24360)
     | > avg_loss_mel: 19.41351 (+1.38424)
     | > avg_loss_duration: 1.94212 (+0.00159)
     | > avg_loss_1: 26.81097 (+0.99868)


 > EPOCH: 171/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 19:16:13)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 13700
     | > loss_disc: 2.65333  (2.69668)
     | > loss_disc_real_0: 0.28059  (0.21076)
     | > loss_disc_real_1: 0.22331  (0.22336)
     | > loss_disc_real_2: 0.20494  (0.23201)
     | > loss_disc_real_3: 0.28581  (0.23398)
     | > loss_disc_real_4: 0.26321  (0.24200)
     | > loss_disc_real_5: 0.26756  (0.24594)
     | > loss_0: 2.65333  (2.69668)
     | > grad_norm_0: 474.28638  (474.39127)
     | > loss_gen: 2.19421  (2.08061)
     | > loss_kl: 1.40854  (1.20493)
     | > loss_feat: 2.99341  (2.96067)
     | > loss_mel: 20.62832  (19.76017)
     | > loss_duration: 1.58307  (1.51381)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 28.80757  (27.52019)
     | > grad_norm_1: 1277.43005  (1064.79187)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.65530  (3.58762)
     | > loader_time: 0.00900  (0.00856)


   --> STEP: 45/80 -- GLOBAL_STEP: 13725
     | > loss_disc: 2.56432  (2.675



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00500)
     | > avg_loss_disc: 2.42954 (-0.27516)
     | > avg_loss_disc_real_0: 0.07882 (-0.01446)
     | > avg_loss_disc_real_1: 0.22480 (+0.01086)
     | > avg_loss_disc_real_2: 0.17627 (-0.06415)
     | > avg_loss_disc_real_3: 0.18645 (-0.03337)
     | > avg_loss_disc_real_4: 0.27296 (+0.06289)
     | > avg_loss_disc_real_5: 0.23192 (-0.03423)
     | > avg_loss_0: 2.42954 (-0.27516)
     | > avg_loss_gen: 2.00305 (+0.33380)
     | > avg_loss_kl: 1.59902 (+0.18166)
     | > avg_loss_feat: 3.32106 (+0.95232)
     | > avg_loss_mel: 20.22744 (+0.81393)
     | > avg_loss_duration: 1.93262 (-0.00950)
     | > avg_loss_1: 29.08318 (+2.27221)


 > EPOCH: 172/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 19:21:49)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 13775
     | > loss_disc: 2.63713  (2.73670)
     | > loss_disc_real_0: 0.19149  (0.22277)
     | > loss_disc_real_1: 0.25195  (0.23306)
     | > loss_disc_real_2: 0.27344  (0.23448)
     | > loss_disc_real_3: 0.20710  (0.23323)
     | > loss_disc_real_4: 0.23781  (0.23927)
     | > loss_disc_real_5: 0.24381  (0.24159)
     | > loss_0: 2.63713  (2.73670)
     | > grad_norm_0: 229.15674  (325.46915)
     | > loss_gen: 1.93344  (2.00615)
     | > loss_kl: 1.30824  (1.23895)
     | > loss_feat: 2.83947  (2.86620)
     | > loss_mel: 19.16578  (19.60402)
     | > loss_duration: 1.51289  (1.51205)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 26.75982  (27.22738)
     | > grad_norm_1: 1098.11072  (890.90692)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.61130  (3.57399)
     | > loader_time: 0.00900  (0.00807)


   --> STEP: 40/80 -- GLOBAL_STEP: 13800
     | > loss_disc: 2.74293  (2.7138



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00100)
     | > avg_loss_disc: 2.56751 (+0.13796)
     | > avg_loss_disc_real_0: 0.22127 (+0.14245)
     | > avg_loss_disc_real_1: 0.18504 (-0.03976)
     | > avg_loss_disc_real_2: 0.21657 (+0.04030)
     | > avg_loss_disc_real_3: 0.21347 (+0.02702)
     | > avg_loss_disc_real_4: 0.20758 (-0.06538)
     | > avg_loss_disc_real_5: 0.27684 (+0.04492)
     | > avg_loss_0: 2.56751 (+0.13796)
     | > avg_loss_gen: 2.00146 (-0.00159)
     | > avg_loss_kl: 1.28265 (-0.31637)
     | > avg_loss_feat: 2.35110 (-0.96996)
     | > avg_loss_mel: 18.74514 (-1.48230)
     | > avg_loss_duration: 1.94352 (+0.01090)
     | > avg_loss_1: 26.32386 (-2.75932)


 > EPOCH: 173/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 19:27:24)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 13850
     | > loss_disc: 2.72853  (2.66464)
     | > loss_disc_real_0: 0.16402  (0.18205)
     | > loss_disc_real_1: 0.29676  (0.23515)
     | > loss_disc_real_2: 0.23329  (0.22792)
     | > loss_disc_real_3: 0.21915  (0.23165)
     | > loss_disc_real_4: 0.27877  (0.24261)
     | > loss_disc_real_5: 0.24769  (0.24868)
     | > loss_0: 2.72853  (2.66464)
     | > grad_norm_0: 37.13935  (384.19370)
     | > loss_gen: 1.84286  (2.11407)
     | > loss_kl: 1.28685  (1.21173)
     | > loss_feat: 2.46218  (3.06916)
     | > loss_mel: 18.99150  (19.64290)
     | > loss_duration: 1.45811  (1.51179)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 26.04150  (27.54963)
     | > grad_norm_1: 894.23413  (1203.57935)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.56930  (3.55125)
     | > loader_time: 0.00800  (0.00771)


   --> STEP: 35/80 -- GLOBAL_STEP: 13875
     | > loss_disc: 2.60584  (2.67931



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00801 (-0.00200)
     | > avg_loss_disc: 2.47465 (-0.09286)
     | > avg_loss_disc_real_0: 0.14882 (-0.07244)
     | > avg_loss_disc_real_1: 0.19530 (+0.01026)
     | > avg_loss_disc_real_2: 0.21029 (-0.00628)
     | > avg_loss_disc_real_3: 0.25180 (+0.03833)
     | > avg_loss_disc_real_4: 0.26094 (+0.05335)
     | > avg_loss_disc_real_5: 0.25463 (-0.02221)
     | > avg_loss_0: 2.47465 (-0.09286)
     | > avg_loss_gen: 2.06040 (+0.05895)
     | > avg_loss_kl: 1.66271 (+0.38006)
     | > avg_loss_feat: 2.85851 (+0.50740)
     | > avg_loss_mel: 20.21615 (+1.47101)
     | > avg_loss_duration: 1.95001 (+0.00650)
     | > avg_loss_1: 28.74778 (+2.42392)


 > EPOCH: 174/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 19:32:58)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 13925
     | > loss_disc: 2.59255  (2.61527)
     | > loss_disc_real_0: 0.17447  (0.15360)
     | > loss_disc_real_1: 0.23036  (0.22839)
     | > loss_disc_real_2: 0.19021  (0.23253)
     | > loss_disc_real_3: 0.25094  (0.23127)
     | > loss_disc_real_4: 0.23073  (0.24733)
     | > loss_disc_real_5: 0.24590  (0.26197)
     | > loss_0: 2.59255  (2.61527)
     | > grad_norm_0: 287.03162  (315.20895)
     | > loss_gen: 2.14616  (2.09348)
     | > loss_kl: 1.37099  (1.22574)
     | > loss_feat: 3.12624  (3.04404)
     | > loss_mel: 19.73019  (19.76208)
     | > loss_duration: 1.49921  (1.50311)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 27.87279  (27.62846)
     | > grad_norm_1: 1389.54700  (1113.90894)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.56320  (3.53161)
     | > loader_time: 0.00800  (0.00780)


   --> STEP: 30/80 -- GLOBAL_STEP: 13950
     | > loss_disc: 2.80447  (2.6868



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00100)
     | > avg_loss_disc: 2.59769 (+0.12304)
     | > avg_loss_disc_real_0: 0.05492 (-0.09391)
     | > avg_loss_disc_real_1: 0.23212 (+0.03682)
     | > avg_loss_disc_real_2: 0.16915 (-0.04115)
     | > avg_loss_disc_real_3: 0.32104 (+0.06924)
     | > avg_loss_disc_real_4: 0.22488 (-0.03606)
     | > avg_loss_disc_real_5: 0.22984 (-0.02479)
     | > avg_loss_0: 2.59769 (+0.12304)
     | > avg_loss_gen: 1.77584 (-0.28457)
     | > avg_loss_kl: 1.40959 (-0.25311)
     | > avg_loss_feat: 2.57808 (-0.28042)
     | > avg_loss_mel: 19.95786 (-0.25829)
     | > avg_loss_duration: 1.94262 (-0.00739)
     | > avg_loss_1: 27.66400 (-1.08378)


 > EPOCH: 175/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 19:38:33)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 14000
     | > loss_disc: 2.53530  (2.53530)
     | > loss_disc_real_0: 0.04880  (0.04880)
     | > loss_disc_real_1: 0.22081  (0.22081)
     | > loss_disc_real_2: 0.16830  (0.16830)
     | > loss_disc_real_3: 0.27737  (0.27737)
     | > loss_disc_real_4: 0.22120  (0.22120)
     | > loss_disc_real_5: 0.23780  (0.23780)
     | > loss_0: 2.53530  (2.53530)
     | > grad_norm_0: 266.58020  (266.58020)
     | > loss_gen: 1.92086  (1.92086)
     | > loss_kl: 1.47269  (1.47269)
     | > loss_feat: 3.45022  (3.45022)
     | > loss_mel: 20.88435  (20.88435)
     | > loss_duration: 1.53848  (1.53848)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 29.26661  (29.26661)
     | > grad_norm_1: 985.74304  (985.74304)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.55020  (3.55023)
     | > loader_time: 23.06350  (23.06349)


   --> STEP: 25/80 -- GLOBAL_STEP: 14025
     | > loss_disc: 2.53077  (2.6799



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00801 (-0.00100)
     | > avg_loss_disc: 2.61166 (+0.01397)
     | > avg_loss_disc_real_0: 0.16481 (+0.10990)
     | > avg_loss_disc_real_1: 0.21656 (-0.01556)
     | > avg_loss_disc_real_2: 0.19312 (+0.02397)
     | > avg_loss_disc_real_3: 0.19930 (-0.12174)
     | > avg_loss_disc_real_4: 0.22478 (-0.00009)
     | > avg_loss_disc_real_5: 0.24584 (+0.01600)
     | > avg_loss_0: 2.61166 (+0.01397)
     | > avg_loss_gen: 1.89283 (+0.11700)
     | > avg_loss_kl: 1.24988 (-0.15971)
     | > avg_loss_feat: 3.10432 (+0.52623)
     | > avg_loss_mel: 19.86826 (-0.08960)
     | > avg_loss_duration: 1.94279 (+0.00016)
     | > avg_loss_1: 28.05807 (+0.39408)


 > EPOCH: 176/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 19:44:07)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 14100
     | > loss_disc: 2.71990  (2.72589)
     | > loss_disc_real_0: 0.41553  (0.21319)
     | > loss_disc_real_1: 0.21360  (0.22420)
     | > loss_disc_real_2: 0.19758  (0.22275)
     | > loss_disc_real_3: 0.21621  (0.23910)
     | > loss_disc_real_4: 0.20257  (0.24683)
     | > loss_disc_real_5: 0.25075  (0.24869)
     | > loss_0: 2.71990  (2.72589)
     | > grad_norm_0: 548.52582  (349.45462)
     | > loss_gen: 2.17278  (2.03514)
     | > loss_kl: 1.31544  (1.29095)
     | > loss_feat: 3.00300  (2.89898)
     | > loss_mel: 19.80513  (19.42148)
     | > loss_duration: 1.50686  (1.50722)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 27.80322  (27.15376)
     | > grad_norm_1: 1052.53394  (1131.49500)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.61830  (3.58076)
     | > loader_time: 0.01000  (0.00891)


   --> STEP: 45/80 -- GLOBAL_STEP: 14125
     | > loss_disc: 2.71790  (2.707



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00200)
     | > avg_loss_disc: 2.69711 (+0.08546)
     | > avg_loss_disc_real_0: 0.28832 (+0.12351)
     | > avg_loss_disc_real_1: 0.25820 (+0.04163)
     | > avg_loss_disc_real_2: 0.23164 (+0.03852)
     | > avg_loss_disc_real_3: 0.23707 (+0.03777)
     | > avg_loss_disc_real_4: 0.27402 (+0.04924)
     | > avg_loss_disc_real_5: 0.28431 (+0.03848)
     | > avg_loss_0: 2.69711 (+0.08546)
     | > avg_loss_gen: 2.07060 (+0.17776)
     | > avg_loss_kl: 1.22066 (-0.02922)
     | > avg_loss_feat: 1.98564 (-1.11868)
     | > avg_loss_mel: 18.05844 (-1.80981)
     | > avg_loss_duration: 1.95539 (+0.01261)
     | > avg_loss_1: 25.29073 (-2.76734)


 > EPOCH: 177/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 19:49:42)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 14175
     | > loss_disc: 2.63158  (2.67203)
     | > loss_disc_real_0: 0.15513  (0.17216)
     | > loss_disc_real_1: 0.20493  (0.22050)
     | > loss_disc_real_2: 0.27268  (0.23068)
     | > loss_disc_real_3: 0.27424  (0.23191)
     | > loss_disc_real_4: 0.22410  (0.23647)
     | > loss_disc_real_5: 0.21415  (0.24291)
     | > loss_0: 2.63158  (2.67203)
     | > grad_norm_0: 332.36679  (249.26709)
     | > loss_gen: 1.91583  (1.97549)
     | > loss_kl: 1.44695  (1.22814)
     | > loss_feat: 2.98834  (2.86242)
     | > loss_mel: 19.58938  (19.30267)
     | > loss_duration: 1.55964  (1.50635)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 27.50014  (26.87506)
     | > grad_norm_1: 1392.25037  (981.76569)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.59530  (3.56685)
     | > loader_time: 0.00900  (0.00834)


   --> STEP: 40/80 -- GLOBAL_STEP: 14200
     | > loss_disc: 2.78450  (2.6904



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00100)
     | > avg_loss_disc: 2.75975 (+0.06264)
     | > avg_loss_disc_real_0: 0.26235 (-0.02597)
     | > avg_loss_disc_real_1: 0.26688 (+0.00869)
     | > avg_loss_disc_real_2: 0.20786 (-0.02378)
     | > avg_loss_disc_real_3: 0.23598 (-0.00109)
     | > avg_loss_disc_real_4: 0.25294 (-0.02108)
     | > avg_loss_disc_real_5: 0.25595 (-0.02837)
     | > avg_loss_0: 2.75975 (+0.06264)
     | > avg_loss_gen: 1.91035 (-0.16025)
     | > avg_loss_kl: 1.26216 (+0.04150)
     | > avg_loss_feat: 2.55142 (+0.56578)
     | > avg_loss_mel: 18.88828 (+0.82984)
     | > avg_loss_duration: 1.93739 (-0.01800)
     | > avg_loss_1: 26.54961 (+1.25887)


 > EPOCH: 178/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 19:55:17)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 14250
     | > loss_disc: 2.86607  (2.73779)
     | > loss_disc_real_0: 0.43994  (0.22578)
     | > loss_disc_real_1: 0.21387  (0.22236)
     | > loss_disc_real_2: 0.18737  (0.23133)
     | > loss_disc_real_3: 0.23813  (0.23424)
     | > loss_disc_real_4: 0.19019  (0.23117)
     | > loss_disc_real_5: 0.21538  (0.23449)
     | > loss_0: 2.86607  (2.73779)
     | > grad_norm_0: 637.97119  (392.60767)
     | > loss_gen: 2.19934  (2.07466)
     | > loss_kl: 1.20940  (1.23225)
     | > loss_feat: 2.92547  (3.00456)
     | > loss_mel: 20.12399  (19.42639)
     | > loss_duration: 1.49611  (1.50943)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 27.95431  (27.24728)
     | > grad_norm_1: 909.43976  (849.75214)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.56130  (3.54703)
     | > loader_time: 0.00700  (0.00801)


   --> STEP: 35/80 -- GLOBAL_STEP: 14275
     | > loss_disc: 2.72178  (2.74472



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.54928 (-0.21047)
     | > avg_loss_disc_real_0: 0.21207 (-0.05028)
     | > avg_loss_disc_real_1: 0.24757 (-0.01932)
     | > avg_loss_disc_real_2: 0.27686 (+0.06900)
     | > avg_loss_disc_real_3: 0.22935 (-0.00663)
     | > avg_loss_disc_real_4: 0.23278 (-0.02016)
     | > avg_loss_disc_real_5: 0.26451 (+0.00856)
     | > avg_loss_0: 2.54928 (-0.21047)
     | > avg_loss_gen: 2.20529 (+0.29493)
     | > avg_loss_kl: 1.36718 (+0.10502)
     | > avg_loss_feat: 2.57489 (+0.02347)
     | > avg_loss_mel: 19.12486 (+0.23658)
     | > avg_loss_duration: 1.96085 (+0.02347)
     | > avg_loss_1: 27.23307 (+0.68346)


 > EPOCH: 179/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 20:00:52)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 14325
     | > loss_disc: 2.98726  (2.75259)
     | > loss_disc_real_0: 0.62222  (0.28753)
     | > loss_disc_real_1: 0.28206  (0.21341)
     | > loss_disc_real_2: 0.23411  (0.23421)
     | > loss_disc_real_3: 0.28818  (0.23712)
     | > loss_disc_real_4: 0.25499  (0.23076)
     | > loss_disc_real_5: 0.25924  (0.24411)
     | > loss_0: 2.98726  (2.75259)
     | > grad_norm_0: 334.84210  (276.74097)
     | > loss_gen: 2.21603  (2.15888)
     | > loss_kl: 1.23255  (1.20719)
     | > loss_feat: 2.85692  (3.04890)
     | > loss_mel: 19.40083  (19.43509)
     | > loss_duration: 1.48034  (1.49893)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 27.18668  (27.34899)
     | > grad_norm_1: 794.65491  (749.95087)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.57330  (3.53702)
     | > loader_time: 0.00800  (0.00801)


   --> STEP: 30/80 -- GLOBAL_STEP: 14350
     | > loss_disc: 2.69049  (2.77650)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00801 (-0.00100)
     | > avg_loss_disc: 2.88434 (+0.33506)
     | > avg_loss_disc_real_0: 0.48408 (+0.27201)
     | > avg_loss_disc_real_1: 0.23375 (-0.01381)
     | > avg_loss_disc_real_2: 0.18477 (-0.09209)
     | > avg_loss_disc_real_3: 0.24080 (+0.01145)
     | > avg_loss_disc_real_4: 0.26119 (+0.02841)
     | > avg_loss_disc_real_5: 0.27482 (+0.01031)
     | > avg_loss_0: 2.88434 (+0.33506)
     | > avg_loss_gen: 2.18113 (-0.02416)
     | > avg_loss_kl: 1.24785 (-0.11933)
     | > avg_loss_feat: 2.53634 (-0.03855)
     | > avg_loss_mel: 20.15451 (+1.02965)
     | > avg_loss_duration: 1.95331 (-0.00754)
     | > avg_loss_1: 28.07315 (+0.84008)


 > EPOCH: 180/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 20:06:26)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 14400
     | > loss_disc: 2.81316  (2.81316)
     | > loss_disc_real_0: 0.49810  (0.49810)
     | > loss_disc_real_1: 0.21239  (0.21239)
     | > loss_disc_real_2: 0.19070  (0.19070)
     | > loss_disc_real_3: 0.24487  (0.24487)
     | > loss_disc_real_4: 0.23684  (0.23684)
     | > loss_disc_real_5: 0.24418  (0.24418)
     | > loss_0: 2.81316  (2.81316)
     | > grad_norm_0: 367.91580  (367.91580)
     | > loss_gen: 2.21195  (2.21195)
     | > loss_kl: 1.37117  (1.37117)
     | > loss_feat: 2.94122  (2.94122)
     | > loss_mel: 19.14312  (19.14312)
     | > loss_duration: 1.50763  (1.50763)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 27.17509  (27.17509)
     | > grad_norm_1: 1093.19604  (1093.19604)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.57230  (3.57225)
     | > loader_time: 23.33150  (23.33152)


   --> STEP: 25/80 -- GLOBAL_STEP: 14425
     | > loss_disc: 2.60053  (2.68



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00100)
     | > avg_loss_disc: 2.66955 (-0.21478)
     | > avg_loss_disc_real_0: 0.17650 (-0.30758)
     | > avg_loss_disc_real_1: 0.26206 (+0.02831)
     | > avg_loss_disc_real_2: 0.20433 (+0.01955)
     | > avg_loss_disc_real_3: 0.24725 (+0.00645)
     | > avg_loss_disc_real_4: 0.22484 (-0.03635)
     | > avg_loss_disc_real_5: 0.23658 (-0.03825)
     | > avg_loss_0: 2.66955 (-0.21478)
     | > avg_loss_gen: 1.83048 (-0.35065)
     | > avg_loss_kl: 1.18836 (-0.05949)
     | > avg_loss_feat: 2.25382 (-0.28252)
     | > avg_loss_mel: 19.26466 (-0.88985)
     | > avg_loss_duration: 1.93455 (-0.01876)
     | > avg_loss_1: 26.47188 (-1.60127)


 > EPOCH: 181/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 20:12:01)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 14500
     | > loss_disc: 2.76749  (2.67877)
     | > loss_disc_real_0: 0.16195  (0.18780)
     | > loss_disc_real_1: 0.28486  (0.23252)
     | > loss_disc_real_2: 0.21476  (0.22884)
     | > loss_disc_real_3: 0.17427  (0.23597)
     | > loss_disc_real_4: 0.25235  (0.23871)
     | > loss_disc_real_5: 0.24673  (0.24314)
     | > loss_0: 2.76749  (2.67877)
     | > grad_norm_0: 40.65845  (201.83429)
     | > loss_gen: 1.87852  (1.98038)
     | > loss_kl: 1.34071  (1.24246)
     | > loss_feat: 2.61382  (2.81976)
     | > loss_mel: 19.32935  (19.33987)
     | > loss_duration: 1.51261  (1.50547)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 26.67502  (26.88793)
     | > grad_norm_1: 235.69434  (839.64728)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.60930  (3.57736)
     | > loader_time: 0.01000  (0.00916)


   --> STEP: 45/80 -- GLOBAL_STEP: 14525
     | > loss_disc: 2.63373  (2.68444)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01201 (+0.00300)
     | > avg_loss_disc: 2.51109 (-0.15847)
     | > avg_loss_disc_real_0: 0.03353 (-0.14297)
     | > avg_loss_disc_real_1: 0.15471 (-0.10735)
     | > avg_loss_disc_real_2: 0.28120 (+0.07687)
     | > avg_loss_disc_real_3: 0.21163 (-0.03561)
     | > avg_loss_disc_real_4: 0.21471 (-0.01013)
     | > avg_loss_disc_real_5: 0.24146 (+0.00488)
     | > avg_loss_0: 2.51109 (-0.15847)
     | > avg_loss_gen: 1.80547 (-0.02502)
     | > avg_loss_kl: 1.74682 (+0.55845)
     | > avg_loss_feat: 3.25186 (+0.99804)
     | > avg_loss_mel: 18.70563 (-0.55903)
     | > avg_loss_duration: 1.94973 (+0.01517)
     | > avg_loss_1: 27.45949 (+0.98761)


 > EPOCH: 182/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 20:17:35)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 14575
     | > loss_disc: 2.60663  (2.69749)
     | > loss_disc_real_0: 0.12556  (0.19984)
     | > loss_disc_real_1: 0.19234  (0.22872)
     | > loss_disc_real_2: 0.19655  (0.22593)
     | > loss_disc_real_3: 0.25298  (0.23265)
     | > loss_disc_real_4: 0.20512  (0.23742)
     | > loss_disc_real_5: 0.22857  (0.24392)
     | > loss_0: 2.60663  (2.69749)
     | > grad_norm_0: 196.43181  (227.12897)
     | > loss_gen: 2.10355  (2.02922)
     | > loss_kl: 1.37566  (1.31204)
     | > loss_feat: 3.24737  (2.89182)
     | > loss_mel: 19.70552  (19.34728)
     | > loss_duration: 1.54759  (1.51004)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 27.97970  (27.09040)
     | > grad_norm_1: 1258.27917  (931.18524)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.59430  (3.57165)
     | > loader_time: 0.00900  (0.00881)


   --> STEP: 40/80 -- GLOBAL_STEP: 14600
     | > loss_disc: 2.69017  (2.7070



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00300)
     | > avg_loss_disc: 2.69898 (+0.18789)
     | > avg_loss_disc_real_0: 0.14133 (+0.10780)
     | > avg_loss_disc_real_1: 0.20851 (+0.05381)
     | > avg_loss_disc_real_2: 0.20636 (-0.07483)
     | > avg_loss_disc_real_3: 0.20909 (-0.00254)
     | > avg_loss_disc_real_4: 0.25490 (+0.04019)
     | > avg_loss_disc_real_5: 0.27755 (+0.03609)
     | > avg_loss_0: 2.69898 (+0.18789)
     | > avg_loss_gen: 1.76539 (-0.04007)
     | > avg_loss_kl: 1.42676 (-0.32006)
     | > avg_loss_feat: 2.42863 (-0.82322)
     | > avg_loss_mel: 19.00046 (+0.29483)
     | > avg_loss_duration: 1.95227 (+0.00255)
     | > avg_loss_1: 26.57352 (-0.88597)


 > EPOCH: 183/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 20:23:10)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 14650
     | > loss_disc: 2.60370  (2.63955)
     | > loss_disc_real_0: 0.10343  (0.15984)
     | > loss_disc_real_1: 0.27921  (0.22962)
     | > loss_disc_real_2: 0.29683  (0.23354)
     | > loss_disc_real_3: 0.20555  (0.22953)
     | > loss_disc_real_4: 0.27051  (0.23097)
     | > loss_disc_real_5: 0.25697  (0.24556)
     | > loss_0: 2.60370  (2.63955)
     | > grad_norm_0: 190.94928  (229.83997)
     | > loss_gen: 2.05456  (2.00877)
     | > loss_kl: 1.42066  (1.22964)
     | > loss_feat: 2.91761  (3.00865)
     | > loss_mel: 18.85094  (19.60644)
     | > loss_duration: 1.49579  (1.50994)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 26.73956  (27.36343)
     | > grad_norm_1: 1040.63977  (924.40179)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.58330  (3.54753)
     | > loader_time: 0.00800  (0.00841)


   --> STEP: 35/80 -- GLOBAL_STEP: 14675
     | > loss_disc: 2.76363  (2.7188



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00601 (-0.00300)
     | > avg_loss_disc: 2.73902 (+0.04004)
     | > avg_loss_disc_real_0: 0.21128 (+0.06995)
     | > avg_loss_disc_real_1: 0.19946 (-0.00905)
     | > avg_loss_disc_real_2: 0.26988 (+0.06352)
     | > avg_loss_disc_real_3: 0.22706 (+0.01796)
     | > avg_loss_disc_real_4: 0.23700 (-0.01790)
     | > avg_loss_disc_real_5: 0.23665 (-0.04090)
     | > avg_loss_0: 2.73902 (+0.04004)
     | > avg_loss_gen: 1.77058 (+0.00518)
     | > avg_loss_kl: 1.45016 (+0.02340)
     | > avg_loss_feat: 2.15687 (-0.27176)
     | > avg_loss_mel: 17.36742 (-1.63304)
     | > avg_loss_duration: 1.95304 (+0.00077)
     | > avg_loss_1: 24.69806 (-1.87545)


 > EPOCH: 184/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce
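The counters in this log line up with the config cell: 2555 training instances at `batch_size=32` gives the 80 steps per epoch, the training loader's "Batch group size: 160" equals `batch_size * batch_group_size` = 32 × 5 (the eval loader reports 0), and each GLOBAL_STEP equals epoch × 80 + step (for example, STEP 10/80 at GLOBAL_STEP 14650 corresponds to 183 completed epochs). Below is a minimal sketch of that arithmetic. It assumes the final partial batch is kept and that epochs are counted from 0; both are inferred from the log rather than taken from the Coqui source.

```python
import math

# Taken from the config cell (batch_size=32, batch_group_size=5)
# and from the training DataLoader output (2555 instances).
NUM_TRAIN_SAMPLES = 2555
BATCH_SIZE = 32
BATCH_GROUP_SIZE = 5

# Assumption: the last, partial batch (27 samples) is kept rather than dropped.
steps_per_epoch = math.ceil(NUM_TRAIN_SAMPLES / BATCH_SIZE)

# The value the training loader prints as "Batch group size".
group_pool = BATCH_SIZE * BATCH_GROUP_SIZE


def global_step(epoch: int, step: int) -> int:
    """Global step for a 0-indexed epoch, assuming a fixed number of steps per epoch."""
    return epoch * steps_per_epoch + step


print(steps_per_epoch)      # 80
print(group_pool)           # 160
print(global_step(184, 5))  # 14725
```

The same formula reproduces the later headers too, e.g. epoch 185, step 0 gives 14800.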




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 20:28:44)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 14725
     | > loss_disc: 2.65693  (2.65919)
     | > loss_disc_real_0: 0.25290  (0.19095)
     | > loss_disc_real_1: 0.22266  (0.22161)
     | > loss_disc_real_2: 0.22463  (0.22764)
     | > loss_disc_real_3: 0.22280  (0.23074)
     | > loss_disc_real_4: 0.24051  (0.24338)
     | > loss_disc_real_5: 0.25548  (0.24252)
     | > loss_0: 2.65693  (2.65919)
     | > grad_norm_0: 244.22757  (178.59094)
     | > loss_gen: 2.10375  (1.98122)
     | > loss_kl: 1.20374  (1.26441)
     | > loss_feat: 2.99652  (2.89428)
     | > loss_mel: 18.87047  (19.21473)
     | > loss_duration: 1.52550  (1.51989)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 26.69999  (26.87453)
     | > grad_norm_1: 1330.96899  (917.49493)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.58430  (3.52964)
     | > loader_time: 0.01000  (0.00820)


   --> STEP: 30/80 -- GLOBAL_STEP: 14750
     | > loss_disc: 2.79019  (2.69183



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[91m 0.00901 [0m(+0.00300)
     | > avg_loss_disc:[92m 2.73201 [0m(-0.00701)
     | > avg_loss_disc_real_0:[91m 0.40335 [0m(+0.19207)
     | > avg_loss_disc_real_1:[92m 0.18389 [0m(-0.01557)
     | > avg_loss_disc_real_2:[92m 0.21631 [0m(-0.05357)
     | > avg_loss_disc_real_3:[92m 0.21519 [0m(-0.01186)
     | > avg_loss_disc_real_4:[91m 0.24879 [0m(+0.01179)
     | > avg_loss_disc_real_5:[91m 0.24315 [0m(+0.00650)
     | > avg_loss_0:[92m 2.73201 [0m(-0.00701)
     | > avg_loss_gen:[91m 1.97425 [0m(+0.20367)
     | > avg_loss_kl:[91m 1.60949 [0m(+0.15933)
     | > avg_loss_feat:[91m 2.36663 [0m(+0.20976)
     | > avg_loss_mel:[91m 18.54925 [0m(+1.18183)
     | > avg_loss_duration:[91m 1.99760 [0m(+0.04456)
     | > avg_loss_1:[91m 26.49722 [0m(+1.79916)


[4m[1m > EPOCH: 185/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce
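Each eval pass covers only 25 utterances, and the avg_* values appear to swing noticeably between epochs (avg_loss_1 moves from 24.69806 to 26.49722 across the two EVAL PERFORMANCE blocks so far), so a single epoch-to-epoch comparison says little. Below is a minimal sketch for pulling those values out of a saved copy of this log so they can be averaged or plotted. It assumes the cell output was saved as plain text. `parse_eval_metrics` and `trailing_mean` are hypothetical helpers, not part of Coqui TTS or Trainer.

```python
from __future__ import annotations

import re
from collections import defaultdict
from statistics import mean

# ANSI colour codes, with or without the leading ESC byte; the exported
# notebook lost the ESC characters and left fragments such as "[92m".
ANSI_RE = re.compile(r"\x1b?\[[0-9;]*m")
# Eval lines look like "| > avg_loss_mel: 17.36742 (-1.63304)" once cleaned.
METRIC_RE = re.compile(r"\|\s*>\s*(avg_\w+):\s*([-+]?\d+(?:\.\d+)?)")


def parse_eval_metrics(log_text: str) -> dict[str, list[float]]:
    """Return every avg_* eval metric, in the order the epochs were logged."""
    metrics: dict[str, list[float]] = defaultdict(list)
    for raw_line in log_text.splitlines():
        line = ANSI_RE.sub("", raw_line)
        match = METRIC_RE.search(line)
        if match:
            metrics[match.group(1)].append(float(match.group(2)))
    return dict(metrics)


def trailing_mean(values: list[float], window: int = 5) -> float:
    """Mean of the last `window` values, to smooth epoch-to-epoch noise."""
    return mean(values[-window:])


# Two eval blocks copied from this log (most metrics omitted).
sample = """
  [1m--> EVAL PERFORMANCE[0m
     | > avg_loss_mel:[92m 17.36742 [0m(-1.63304)
     | > avg_loss_1:[92m 24.69806 [0m(-1.87545)

  [1m--> EVAL PERFORMANCE[0m
     | > avg_loss_mel:[91m 18.54925 [0m(+1.18183)
     | > avg_loss_1:[91m 26.49722 [0m(+1.79916)
"""
metrics = parse_eval_metrics(sample)
print(metrics["avg_loss_mel"])                          # [17.36742, 18.54925]
print(round(trailing_mean(metrics["avg_loss_1"]), 5))  # 25.59764
```

Training-step lines such as `loss_disc:` and the capitalised "Avg text length" lines do not match the `avg_` pattern, so only the eval metrics are collected.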




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 20:34:19)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 14800
     | > loss_disc: 2.70093  (2.70093)
     | > loss_disc_real_0: 0.36282  (0.36282)
     | > loss_disc_real_1: 0.18689  (0.18689)
     | > loss_disc_real_2: 0.19651  (0.19651)
     | > loss_disc_real_3: 0.23289  (0.23289)
     | > loss_disc_real_4: 0.22963  (0.22963)
     | > loss_disc_real_5: 0.24479  (0.24479)
     | > loss_0: 2.70093  (2.70093)
     | > grad_norm_0: 113.17007  (113.17007)
     | > loss_gen: 1.84995  (1.84995)
     | > loss_kl: 0.92783  (0.92783)
     | > loss_feat: 2.84933  (2.84933)
     | > loss_mel: 19.67068  (19.67068)
     | > loss_duration: 1.51982  (1.51982)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 26.81760  (26.81760)
     | > grad_norm_1: 415.21344  (415.21344)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.55220  (3.55223)
     | > loader_time: 23.31060  (23.31056)


   --> STEP: 25/80 -- GLOBAL_STEP: 14825
     | > loss_disc: 2.69531  (2.7136



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.00901 [0m(-0.00000)
     | > avg_loss_disc:[92m 2.71248 [0m(-0.01953)
     | > avg_loss_disc_real_0:[92m 0.32183 [0m(-0.08153)
     | > avg_loss_disc_real_1:[92m 0.17415 [0m(-0.00975)
     | > avg_loss_disc_real_2:[92m 0.17210 [0m(-0.04420)
     | > avg_loss_disc_real_3:[91m 0.25241 [0m(+0.03722)
     | > avg_loss_disc_real_4:[92m 0.23816 [0m(-0.01062)
     | > avg_loss_disc_real_5:[92m 0.22871 [0m(-0.01444)
     | > avg_loss_0:[92m 2.71248 [0m(-0.01953)
     | > avg_loss_gen:[92m 1.92604 [0m(-0.04821)
     | > avg_loss_kl:[92m 1.39354 [0m(-0.21595)
     | > avg_loss_feat:[91m 2.55413 [0m(+0.18750)
     | > avg_loss_mel:[92m 18.35107 [0m(-0.19818)
     | > avg_loss_duration:[92m 1.96574 [0m(-0.03186)
     | > avg_loss_1:[92m 26.19052 [0m(-0.30670)


[4m[1m > EPOCH: 186/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 20:39:54)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 14900
     | > loss_disc: 2.84074  (2.70408)
     | > loss_disc_real_0: 0.39099  (0.21689)
     | > loss_disc_real_1: 0.24391  (0.22319)
     | > loss_disc_real_2: 0.22258  (0.23491)
     | > loss_disc_real_3: 0.25412  (0.23814)
     | > loss_disc_real_4: 0.24857  (0.24025)
     | > loss_disc_real_5: 0.23265  (0.24691)
     | > loss_0: 2.84074  (2.70408)
     | > grad_norm_0: 93.87022  (188.96323)
     | > loss_gen: 2.03379  (2.02809)
     | > loss_kl: 1.61909  (1.30892)
     | > loss_feat: 2.66665  (2.93970)
     | > loss_mel: 19.07314  (19.29278)
     | > loss_duration: 1.49068  (1.49988)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 26.88335  (27.06938)
     | > grad_norm_1: 693.87958  (714.87079)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.62130  (3.57349)
     | > loader_time: 0.01000  (0.00871)


   --> STEP: 45/80 -- GLOBAL_STEP: 14925
     | > loss_disc: 2.76506  (2.71940)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.00800 [0m(-0.00100)
     | > avg_loss_disc:[91m 2.74299 [0m(+0.03051)
     | > avg_loss_disc_real_0:[92m 0.14574 [0m(-0.17609)
     | > avg_loss_disc_real_1:[91m 0.26379 [0m(+0.08965)
     | > avg_loss_disc_real_2:[91m 0.27268 [0m(+0.10058)
     | > avg_loss_disc_real_3:[92m 0.23285 [0m(-0.01956)
     | > avg_loss_disc_real_4:[91m 0.25872 [0m(+0.02056)
     | > avg_loss_disc_real_5:[91m 0.23671 [0m(+0.00800)
     | > avg_loss_0:[91m 2.74299 [0m(+0.03051)
     | > avg_loss_gen:[92m 1.80273 [0m(-0.12331)
     | > avg_loss_kl:[91m 1.65865 [0m(+0.26511)
     | > avg_loss_feat:[92m 2.36568 [0m(-0.18845)
     | > avg_loss_mel:[91m 18.94407 [0m(+0.59300)
     | > avg_loss_duration:[91m 1.96612 [0m(+0.00038)
     | > avg_loss_1:[91m 26.73725 [0m(+0.54673)


[4m[1m > EPOCH: 187/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 20:45:28)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 14975
     | > loss_disc: 2.72725  (2.70195)
     | > loss_disc_real_0: 0.17828  (0.20459)
     | > loss_disc_real_1: 0.24660  (0.23038)
     | > loss_disc_real_2: 0.24784  (0.22612)
     | > loss_disc_real_3: 0.22371  (0.22848)
     | > loss_disc_real_4: 0.22829  (0.24144)
     | > loss_disc_real_5: 0.24383  (0.24016)
     | > loss_0: 2.72725  (2.70195)
     | > grad_norm_0: 40.30916  (92.49303)
     | > loss_gen: 1.94633  (1.93451)
     | > loss_kl: 1.28082  (1.31391)
     | > loss_feat: 2.88381  (2.76049)
     | > loss_mel: 19.80501  (19.00817)
     | > loss_duration: 1.48996  (1.49396)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 27.40593  (26.51103)
     | > grad_norm_1: 234.73546  (520.01062)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.59230  (3.56190)
     | > loader_time: 0.01000  (0.00847)


   --> STEP: 40/80 -- GLOBAL_STEP: 15000
     | > loss_disc: 2.73830  (2.74061)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[91m 0.00901 [0m(+0.00100)
     | > avg_loss_disc:[91m 2.77724 [0m(+0.03424)
     | > avg_loss_disc_real_0:[92m 0.12696 [0m(-0.01878)
     | > avg_loss_disc_real_1:[92m 0.19952 [0m(-0.06428)
     | > avg_loss_disc_real_2:[92m 0.26680 [0m(-0.00588)
     | > avg_loss_disc_real_3:[91m 0.23351 [0m(+0.00066)
     | > avg_loss_disc_real_4:[92m 0.21884 [0m(-0.03989)
     | > avg_loss_disc_real_5:[92m 0.23600 [0m(-0.00072)
     | > avg_loss_0:[91m 2.77724 [0m(+0.03424)
     | > avg_loss_gen:[92m 1.67516 [0m(-0.12758)
     | > avg_loss_kl:[92m 1.23097 [0m(-0.42767)
     | > avg_loss_feat:[91m 2.78061 [0m(+0.41493)
     | > avg_loss_mel:[91m 20.91693 [0m(+1.97285)
     | > avg_loss_duration:[91m 1.97616 [0m(+0.01004)
     | > avg_loss_1:[91m 28.57982 [0m(+1.84257)


[4m[1m > EPOCH: 188/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 20:51:03)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 15050
     | > loss_disc: 2.74115  (2.69038)
     | > loss_disc_real_0: 0.20335  (0.22659)
     | > loss_disc_real_1: 0.21823  (0.23494)
     | > loss_disc_real_2: 0.18747  (0.21722)
     | > loss_disc_real_3: 0.21665  (0.22747)
     | > loss_disc_real_4: 0.24569  (0.23505)
     | > loss_disc_real_5: 0.23585  (0.24086)
     | > loss_0: 2.74115  (2.69038)
     | > grad_norm_0: 34.02720  (95.53484)
     | > loss_gen: 2.04836  (1.99140)
     | > loss_kl: 1.15844  (1.24527)
     | > loss_feat: 2.75979  (2.84760)
     | > loss_mel: 19.24913  (19.28163)
     | > loss_duration: 1.50647  (1.49786)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 26.72218  (26.86375)
     | > grad_norm_1: 537.65607  (617.74756)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.59030  (3.55514)
     | > loader_time: 0.00900  (0.00861)


   --> STEP: 35/80 -- GLOBAL_STEP: 15075
     | > loss_disc: 2.67916  (2.71676)




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time: 0.00901 [0m(+0.00000)
     | > avg_loss_disc:[92m 2.58993 [0m(-0.18731)
     | > avg_loss_disc_real_0:[91m 0.22897 [0m(+0.10201)
     | > avg_loss_disc_real_1:[91m 0.27864 [0m(+0.07912)
     | > avg_loss_disc_real_2:[92m 0.26511 [0m(-0.00170)
     | > avg_loss_disc_real_3:[91m 0.24366 [0m(+0.01015)
     | > avg_loss_disc_real_4:[91m 0.26389 [0m(+0.04505)
     | > avg_loss_disc_real_5:[91m 0.24682 [0m(+0.01082)
     | > avg_loss_0:[92m 2.58993 [0m(-0.18731)
     | > avg_loss_gen:[91m 2.18068 [0m(+0.50552)
     | > avg_loss_kl:[91m 1.38962 [0m(+0.15865)
     | > avg_loss_feat:[92m 2.50307 [0m(-0.27753)
     | > avg_loss_mel:[92m 19.09043 [0m(-1.82649)
     | > avg_loss_duration:[92m 1.96256 [0m(-0.01359)
     | > avg_loss_1:[92m 27.12637 [0m(-1.45345)


[4m[1m > EPOCH: 189/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 20:56:37)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 15125
     | > loss_disc: 2.75237  (2.74124)
     | > loss_disc_real_0: 0.30151  (0.22468)
     | > loss_disc_real_1: 0.24120  (0.21781)
     | > loss_disc_real_2: 0.24085  (0.22451)
     | > loss_disc_real_3: 0.27189  (0.23340)
     | > loss_disc_real_4: 0.26188  (0.23915)
     | > loss_disc_real_5: 0.27474  (0.24929)
     | > loss_0: 2.75237  (2.74124)
     | > grad_norm_0: 39.77811  (62.63740)
     | > loss_gen: 1.94918  (1.91250)
     | > loss_kl: 1.15760  (1.15274)
     | > loss_feat: 2.48097  (2.62437)
     | > loss_mel: 19.19561  (19.03965)
     | > loss_duration: 1.45608  (1.49739)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 26.23944  (26.22666)
     | > grad_norm_1: 154.59396  (379.67001)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.58230  (3.54082)
     | > loader_time: 0.00800  (0.00821)


   --> STEP: 30/80 -- GLOBAL_STEP: 15150
     | > loss_disc: 2.76067  (2.71644)
 



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.00801 [0m(-0.00100)
     | > avg_loss_disc:[91m 2.70068 [0m(+0.11075)
     | > avg_loss_disc_real_0:[92m 0.20961 [0m(-0.01937)
     | > avg_loss_disc_real_1:[92m 0.25909 [0m(-0.01955)
     | > avg_loss_disc_real_2:[91m 0.31750 [0m(+0.05239)
     | > avg_loss_disc_real_3:[91m 0.24843 [0m(+0.00477)
     | > avg_loss_disc_real_4:[92m 0.25872 [0m(-0.00517)
     | > avg_loss_disc_real_5:[91m 0.27144 [0m(+0.02462)
     | > avg_loss_0:[91m 2.70068 [0m(+0.11075)
     | > avg_loss_gen:[92m 2.07640 [0m(-0.10428)
     | > avg_loss_kl:[91m 1.49739 [0m(+0.10777)
     | > avg_loss_feat:[92m 2.28154 [0m(-0.22153)
     | > avg_loss_mel:[92m 18.76768 [0m(-0.32275)
     | > avg_loss_duration:[91m 1.98045 [0m(+0.01788)
     | > avg_loss_1:[92m 26.60346 [0m(-0.52291)


[4m[1m > EPOCH: 190/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 21:02:12)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 15200
     | > loss_disc: 2.74587  (2.74587)
     | > loss_disc_real_0: 0.19589  (0.19589)
     | > loss_disc_real_1: 0.23627  (0.23627)
     | > loss_disc_real_2: 0.26014  (0.26014)
     | > loss_disc_real_3: 0.26537  (0.26537)
     | > loss_disc_real_4: 0.25986  (0.25986)
     | > loss_disc_real_5: 0.23223  (0.23223)
     | > loss_0: 2.74587  (2.74587)
     | > grad_norm_0: 39.93518  (39.93518)
     | > loss_gen: 1.80693  (1.80693)
     | > loss_kl: 1.17716  (1.17716)
     | > loss_feat: 2.69550  (2.69550)
     | > loss_mel: 19.02217  (19.02217)
     | > loss_duration: 1.54810  (1.54810)
     | > amp_scaler: 64.00000  (64.00000)
     | > loss_1: 26.24985  (26.24985)
     | > grad_norm_1: 298.16336  (298.16336)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.54120  (3.54122)
     | > loader_time: 23.53330  (23.53333)


   --> STEP: 25/80 -- GLOBAL_STEP: 15225
     | > loss_disc: 2.60630  (2.74256)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[91m 0.00901 [0m(+0.00100)
     | > avg_loss_disc:[92m 2.69073 [0m(-0.00995)
     | > avg_loss_disc_real_0:[91m 0.22594 [0m(+0.01633)
     | > avg_loss_disc_real_1:[92m 0.24162 [0m(-0.01747)
     | > avg_loss_disc_real_2:[92m 0.22308 [0m(-0.09443)
     | > avg_loss_disc_real_3:[92m 0.21130 [0m(-0.03713)
     | > avg_loss_disc_real_4:[92m 0.20535 [0m(-0.05336)
     | > avg_loss_disc_real_5:[92m 0.26348 [0m(-0.00795)
     | > avg_loss_0:[92m 2.69073 [0m(-0.00995)
     | > avg_loss_gen:[92m 1.80544 [0m(-0.27096)
     | > avg_loss_kl:[91m 1.58895 [0m(+0.09155)
     | > avg_loss_feat:[92m 2.24923 [0m(-0.03231)
     | > avg_loss_mel:[92m 17.81110 [0m(-0.95658)
     | > avg_loss_duration:[92m 1.97144 [0m(-0.00901)
     | > avg_loss_1:[92m 25.42616 [0m(-1.17731)


[4m[1m > EPOCH: 191/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 21:07:46)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 15300
     | > loss_disc: 2.75627  (2.73652)
     | > loss_disc_real_0: 0.04412  (0.20867)
     | > loss_disc_real_1: 0.19639  (0.23252)
     | > loss_disc_real_2: 0.23713  (0.22709)
     | > loss_disc_real_3: 0.25149  (0.23693)
     | > loss_disc_real_4: 0.24144  (0.24135)
     | > loss_disc_real_5: 0.24196  (0.24469)
     | > loss_0: 2.75627  (2.73652)
     | > grad_norm_0: 211.23465  (115.57748)
     | > loss_gen: 1.87600  (1.95406)
     | > loss_kl: 1.45910  (1.28453)
     | > loss_feat: 2.68280  (2.82678)
     | > loss_mel: 19.55712  (19.15668)
     | > loss_duration: 1.51537  (1.49063)
     | > amp_scaler: 128.00000  (80.00000)
     | > loss_1: 27.09039  (26.71268)
     | > grad_norm_1: 739.37097  (679.97284)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.59630  (3.57216)
     | > loader_time: 0.00800  (0.00866)


   --> STEP: 45/80 -- GLOBAL_STEP: 15325
     | > loss_disc: 2.81216  (2.7339



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[91m 0.01001 [0m(+0.00100)
     | > avg_loss_disc:[92m 2.64725 [0m(-0.04348)
     | > avg_loss_disc_real_0:[92m 0.19031 [0m(-0.03563)
     | > avg_loss_disc_real_1:[92m 0.20126 [0m(-0.04036)
     | > avg_loss_disc_real_2:[91m 0.22584 [0m(+0.00277)
     | > avg_loss_disc_real_3:[92m 0.14790 [0m(-0.06340)
     | > avg_loss_disc_real_4:[91m 0.24703 [0m(+0.04167)
     | > avg_loss_disc_real_5:[92m 0.22899 [0m(-0.03450)
     | > avg_loss_0:[92m 2.64725 [0m(-0.04348)
     | > avg_loss_gen:[92m 1.78091 [0m(-0.02452)
     | > avg_loss_kl:[92m 1.39964 [0m(-0.18931)
     | > avg_loss_feat:[91m 2.79044 [0m(+0.54121)
     | > avg_loss_mel:[91m 20.49236 [0m(+2.68126)
     | > avg_loss_duration:[92m 1.97021 [0m(-0.00123)
     | > avg_loss_1:[91m 28.43356 [0m(+3.00740)


[4m[1m > EPOCH: 192/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 21:13:20)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 15375
     | > loss_disc: 2.60947  (2.69404)
     | > loss_disc_real_0: 0.11003  (0.18419)
     | > loss_disc_real_1: 0.21722  (0.22995)
     | > loss_disc_real_2: 0.19706  (0.22237)
     | > loss_disc_real_3: 0.23713  (0.23889)
     | > loss_disc_real_4: 0.24329  (0.23958)
     | > loss_disc_real_5: 0.24311  (0.24136)
     | > loss_0: 2.60947  (2.69404)
     | > grad_norm_0: 161.70274  (139.64153)
     | > loss_gen: 2.07621  (1.94408)
     | > loss_kl: 1.52266  (1.24900)
     | > loss_feat: 3.06289  (2.80172)
     | > loss_mel: 18.75259  (18.97328)
     | > loss_duration: 1.48127  (1.49510)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 26.89563  (26.46318)
     | > grad_norm_1: 789.18481  (770.47357)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.59530  (3.56438)
     | > loader_time: 0.01000  (0.00874)


   --> STEP: 40/80 -- GLOBAL_STEP: 15400
     | > loss_disc: 2.57399  (2.692



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.01001 [0m(-0.00000)
     | > avg_loss_disc:[92m 2.61778 [0m(-0.02947)
     | > avg_loss_disc_real_0:[91m 0.19565 [0m(+0.00534)
     | > avg_loss_disc_real_1:[91m 0.20521 [0m(+0.00395)
     | > avg_loss_disc_real_2:[92m 0.19978 [0m(-0.02606)
     | > avg_loss_disc_real_3:[91m 0.21442 [0m(+0.06652)
     | > avg_loss_disc_real_4:[91m 0.26511 [0m(+0.01809)
     | > avg_loss_disc_real_5:[91m 0.24568 [0m(+0.01669)
     | > avg_loss_0:[92m 2.61778 [0m(-0.02947)
     | > avg_loss_gen:[91m 1.86072 [0m(+0.07980)
     | > avg_loss_kl:[91m 1.58100 [0m(+0.18137)
     | > avg_loss_feat:[92m 2.43526 [0m(-0.35518)
     | > avg_loss_mel:[92m 18.97066 [0m(-1.52169)
     | > avg_loss_duration:[91m 1.99144 [0m(+0.02123)
     | > avg_loss_1:[92m 26.83909 [0m(-1.59447)


[4m[1m > EPOCH: 193/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 21:18:55)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 15450
     | > loss_disc: 2.76414  (2.73274)
     | > loss_disc_real_0: 0.15522  (0.20152)
     | > loss_disc_real_1: 0.23673  (0.22751)
     | > loss_disc_real_2: 0.22052  (0.24228)
     | > loss_disc_real_3: 0.26058  (0.23822)
     | > loss_disc_real_4: 0.22357  (0.24438)
     | > loss_disc_real_5: 0.25080  (0.24645)
     | > loss_0: 2.76414  (2.73274)
     | > grad_norm_0: 68.55982  (137.06273)
     | > loss_gen: 1.81938  (1.92017)
     | > loss_kl: 1.67202  (1.35947)
     | > loss_feat: 2.56361  (2.69875)
     | > loss_mel: 19.32278  (19.21251)
     | > loss_duration: 1.52693  (1.49603)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 26.90473  (26.68691)
     | > grad_norm_1: 81.95640  (744.14764)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.59430  (3.55624)
     | > loader_time: 0.00900  (0.00851)


   --> STEP: 35/80 -- GLOBAL_STEP: 15475
     | > loss_disc: 2.73070  (2.69829



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.00700 [0m(-0.00300)
     | > avg_loss_disc:[91m 2.66905 [0m(+0.05127)
     | > avg_loss_disc_real_0:[92m 0.10688 [0m(-0.08877)
     | > avg_loss_disc_real_1:[91m 0.21142 [0m(+0.00621)
     | > avg_loss_disc_real_2:[91m 0.26168 [0m(+0.06190)
     | > avg_loss_disc_real_3:[91m 0.22808 [0m(+0.01366)
     | > avg_loss_disc_real_4:[92m 0.23192 [0m(-0.03319)
     | > avg_loss_disc_real_5:[92m 0.22950 [0m(-0.01618)
     | > avg_loss_0:[91m 2.66905 [0m(+0.05127)
     | > avg_loss_gen:[92m 1.76610 [0m(-0.09462)
     | > avg_loss_kl:[91m 1.68134 [0m(+0.10033)
     | > avg_loss_feat:[91m 2.89124 [0m(+0.45598)
     | > avg_loss_mel:[91m 19.43985 [0m(+0.46918)
     | > avg_loss_duration:[92m 1.95584 [0m(-0.03560)
     | > avg_loss_1:[91m 27.73437 [0m(+0.89528)


[4m[1m > EPOCH: 194/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 21:24:30)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 15525
     | > loss_disc: 2.69188  (2.59353)
     | > loss_disc_real_0: 0.20001  (0.16495)
     | > loss_disc_real_1: 0.22081  (0.20831)
     | > loss_disc_real_2: 0.24285  (0.22739)
     | > loss_disc_real_3: 0.24057  (0.23962)
     | > loss_disc_real_4: 0.24937  (0.23848)
     | > loss_disc_real_5: 0.25718  (0.23570)
     | > loss_0: 2.69188  (2.59353)
     | > grad_norm_0: 10.28247  (196.31792)
     | > loss_gen: 2.04396  (2.02477)
     | > loss_kl: 1.65715  (1.22573)
     | > loss_feat: 2.72192  (3.11919)
     | > loss_mel: 19.54135  (19.68121)
     | > loss_duration: 1.53393  (1.49908)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 27.49831  (27.54998)
     | > grad_norm_1: 290.13068  (667.83887)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.56020  (3.54182)
     | > loader_time: 0.00700  (0.00781)


   --> STEP: 30/80 -- GLOBAL_STEP: 15550
     | > loss_disc: 2.73328  (2.71062



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00200)
     | > avg_loss_disc: 2.51547 (-0.15358)
     | > avg_loss_disc_real_0: 0.28559 (+0.17872)
     | > avg_loss_disc_real_1: 0.15395 (-0.05748)
     | > avg_loss_disc_real_2: 0.17601 (-0.08567)
     | > avg_loss_disc_real_3: 0.20655 (-0.02154)
     | > avg_loss_disc_real_4: 0.19578 (-0.03614)
     | > avg_loss_disc_real_5: 0.24501 (+0.01550)
     | > avg_loss_0: 2.51547 (-0.15358)
     | > avg_loss_gen: 1.96591 (+0.19981)
     | > avg_loss_kl: 1.59141 (-0.08993)
     | > avg_loss_feat: 2.66605 (-0.22520)
     | > avg_loss_mel: 18.95652 (-0.48333)
     | > avg_loss_duration: 2.00216 (+0.04632)
     | > avg_loss_1: 27.18205 (-0.55232)


 > EPOCH: 195/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 21:30:05)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 15600
     | > loss_disc: 2.66153  (2.66153)
     | > loss_disc_real_0: 0.29582  (0.29582)
     | > loss_disc_real_1: 0.20897  (0.20897)
     | > loss_disc_real_2: 0.19387  (0.19387)
     | > loss_disc_real_3: 0.22051  (0.22051)
     | > loss_disc_real_4: 0.22597  (0.22597)
     | > loss_disc_real_5: 0.24645  (0.24645)
     | > loss_0: 2.66153  (2.66153)
     | > grad_norm_0: 185.54143  (185.54143)
     | > loss_gen: 1.99091  (1.99091)
     | > loss_kl: 1.36499  (1.36499)
     | > loss_feat: 2.95628  (2.95628)
     | > loss_mel: 19.33916  (19.33916)
     | > loss_duration: 1.48590  (1.48590)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 27.13725  (27.13725)
     | > grad_norm_1: 1303.43958  (1303.43958)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.59030  (3.59027)
     | > loader_time: 23.40520  (23.40517)


   --> STEP: 25/80 -- GLOBAL_STEP: 15625
     | > loss_disc: 2.72257  (2.



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00000)
     | > avg_loss_disc: 2.79484 (+0.27937)
     | > avg_loss_disc_real_0: 0.40116 (+0.11557)
     | > avg_loss_disc_real_1: 0.21367 (+0.05972)
     | > avg_loss_disc_real_2: 0.23221 (+0.05620)
     | > avg_loss_disc_real_3: 0.25320 (+0.04665)
     | > avg_loss_disc_real_4: 0.26073 (+0.06494)
     | > avg_loss_disc_real_5: 0.23185 (-0.01316)
     | > avg_loss_0: 2.79484 (+0.27937)
     | > avg_loss_gen: 2.06949 (+0.10358)
     | > avg_loss_kl: 1.98594 (+0.39453)
     | > avg_loss_feat: 2.08413 (-0.58192)
     | > avg_loss_mel: 18.85109 (-0.10543)
     | > avg_loss_duration: 1.96051 (-0.04165)
     | > avg_loss_1: 26.95117 (-0.23088)


 > EPOCH: 196/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 21:35:39)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 15700
     | > loss_disc: 2.73217  (2.73093)
     | > loss_disc_real_0: 0.21964  (0.20074)
     | > loss_disc_real_1: 0.22821  (0.22552)
     | > loss_disc_real_2: 0.28604  (0.23805)
     | > loss_disc_real_3: 0.24353  (0.23634)
     | > loss_disc_real_4: 0.24016  (0.24262)
     | > loss_disc_real_5: 0.26234  (0.24712)
     | > loss_0: 2.73217  (2.73093)
     | > grad_norm_0: 25.96554  (27.71134)
     | > loss_gen: 1.70405  (1.87409)
     | > loss_kl: 1.34176  (1.31672)
     | > loss_feat: 2.50659  (2.72603)
     | > loss_mel: 18.81503  (18.93194)
     | > loss_duration: 1.51747  (1.48673)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 25.88490  (26.33552)
     | > grad_norm_1: 136.83264  (269.60593)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.60830  (3.56855)
     | > loader_time: 0.00900  (0.00886)


   --> STEP: 45/80 -- GLOBAL_STEP: 15725
     | > loss_disc: 2.79652  (2.74063



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00100)
     | > avg_loss_disc: 2.67241 (-0.12243)
     | > avg_loss_disc_real_0: 0.18894 (-0.21222)
     | > avg_loss_disc_real_1: 0.21508 (+0.00142)
     | > avg_loss_disc_real_2: 0.20563 (-0.02659)
     | > avg_loss_disc_real_3: 0.22320 (-0.03000)
     | > avg_loss_disc_real_4: 0.25556 (-0.00516)
     | > avg_loss_disc_real_5: 0.24228 (+0.01043)
     | > avg_loss_0: 2.67241 (-0.12243)
     | > avg_loss_gen: 1.77617 (-0.29332)
     | > avg_loss_kl: 1.40981 (-0.57614)
     | > avg_loss_feat: 2.11926 (+0.03514)
     | > avg_loss_mel: 18.10757 (-0.74352)
     | > avg_loss_duration: 1.96902 (+0.00851)
     | > avg_loss_1: 25.38183 (-1.56933)


 > EPOCH: 197/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 21:41:14)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 15775
     | > loss_disc: 2.69681  (2.70751)
     | > loss_disc_real_0: 0.20711  (0.20019)
     | > loss_disc_real_1: 0.20646  (0.23421)
     | > loss_disc_real_2: 0.20257  (0.22496)
     | > loss_disc_real_3: 0.21149  (0.23437)
     | > loss_disc_real_4: 0.23139  (0.24305)
     | > loss_disc_real_5: 0.25008  (0.24519)
     | > loss_0: 2.69681  (2.70751)
     | > grad_norm_0: 31.08380  (51.28664)
     | > loss_gen: 1.90864  (1.92695)
     | > loss_kl: 1.33437  (1.27314)
     | > loss_feat: 2.76924  (2.82605)
     | > loss_mel: 18.88625  (19.02878)
     | > loss_duration: 1.52354  (1.48633)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 26.42204  (26.54125)
     | > grad_norm_1: 166.73303  (328.41653)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.58030  (3.55730)
     | > loader_time: 0.01000  (0.00834)


   --> STEP: 40/80 -- GLOBAL_STEP: 15800
     | > loss_disc: 2.75990  (2.71451



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00701 (-0.00300)
     | > avg_loss_disc: 2.72686 (+0.05445)
     | > avg_loss_disc_real_0: 0.32381 (+0.13487)
     | > avg_loss_disc_real_1: 0.23912 (+0.02404)
     | > avg_loss_disc_real_2: 0.21646 (+0.01083)
     | > avg_loss_disc_real_3: 0.22986 (+0.00666)
     | > avg_loss_disc_real_4: 0.26564 (+0.01008)
     | > avg_loss_disc_real_5: 0.26412 (+0.02184)
     | > avg_loss_0: 2.72686 (+0.05445)
     | > avg_loss_gen: 2.05294 (+0.27677)
     | > avg_loss_kl: 1.65920 (+0.24940)
     | > avg_loss_feat: 2.32454 (+0.20528)
     | > avg_loss_mel: 18.70850 (+0.60093)
     | > avg_loss_duration: 2.00076 (+0.03173)
     | > avg_loss_1: 26.74594 (+1.36411)


 > EPOCH: 198/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 21:46:48)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 15850
     | > loss_disc: 2.70554  (2.71316)
     | > loss_disc_real_0: 0.23382  (0.20578)
     | > loss_disc_real_1: 0.22904  (0.22024)
     | > loss_disc_real_2: 0.22303  (0.22715)
     | > loss_disc_real_3: 0.23017  (0.24209)
     | > loss_disc_real_4: 0.22836  (0.23944)
     | > loss_disc_real_5: 0.22855  (0.24404)
     | > loss_0: 2.70554  (2.71316)
     | > grad_norm_0: 191.63379  (93.91634)
     | > loss_gen: 1.87084  (1.92698)
     | > loss_kl: 1.48912  (1.30378)
     | > loss_feat: 2.95441  (2.85259)
     | > loss_mel: 18.36321  (19.21414)
     | > loss_duration: 1.50118  (1.49214)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 26.17876  (26.78963)
     | > grad_norm_1: 846.98615  (607.38409)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.58730  (3.54873)
     | > loader_time: 0.00900  (0.00821)


   --> STEP: 35/80 -- GLOBAL_STEP: 15875
     | > loss_disc: 2.70768  (2.7139



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00200)
     | > avg_loss_disc: 2.69805 (-0.02881)
     | > avg_loss_disc_real_0: 0.07248 (-0.25133)
     | > avg_loss_disc_real_1: 0.23640 (-0.00271)
     | > avg_loss_disc_real_2: 0.25701 (+0.04056)
     | > avg_loss_disc_real_3: 0.20028 (-0.02958)
     | > avg_loss_disc_real_4: 0.22831 (-0.03733)
     | > avg_loss_disc_real_5: 0.25191 (-0.01220)
     | > avg_loss_0: 2.69805 (-0.02881)
     | > avg_loss_gen: 1.66245 (-0.39049)
     | > avg_loss_kl: 1.69997 (+0.04076)
     | > avg_loss_feat: 2.45280 (+0.12825)
     | > avg_loss_mel: 18.46061 (-0.24789)
     | > avg_loss_duration: 1.96406 (-0.03669)
     | > avg_loss_1: 26.23988 (-0.50606)


 > EPOCH: 199/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 21:52:23)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 15925
     | > loss_disc: 2.64879  (2.74660)
     | > loss_disc_real_0: 0.22581  (0.22572)
     | > loss_disc_real_1: 0.20805  (0.22797)
     | > loss_disc_real_2: 0.22016  (0.23199)
     | > loss_disc_real_3: 0.22622  (0.23770)
     | > loss_disc_real_4: 0.26626  (0.25010)
     | > loss_disc_real_5: 0.25634  (0.23898)
     | > loss_0: 2.64879  (2.74660)
     | > grad_norm_0: 17.98140  (32.05959)
     | > loss_gen: 1.83829  (1.96278)
     | > loss_kl: 1.25069  (1.11100)
     | > loss_feat: 2.82420  (2.66922)
     | > loss_mel: 18.55053  (18.81945)
     | > loss_duration: 1.47633  (1.47993)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 25.94004  (26.04238)
     | > grad_norm_1: 427.75742  (409.39569)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.57730  (3.53242)
     | > loader_time: 0.00700  (0.00781)


   --> STEP: 30/80 -- GLOBAL_STEP: 15950
     | > loss_disc: 2.73533  (2.76945)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.63749 (-0.06056)
     | > avg_loss_disc_real_0: 0.11506 (+0.04258)
     | > avg_loss_disc_real_1: 0.21733 (-0.01908)
     | > avg_loss_disc_real_2: 0.20841 (-0.04861)
     | > avg_loss_disc_real_3: 0.25762 (+0.05735)
     | > avg_loss_disc_real_4: 0.22297 (-0.00535)
     | > avg_loss_disc_real_5: 0.25687 (+0.00496)
     | > avg_loss_0: 2.63749 (-0.06056)
     | > avg_loss_gen: 1.75196 (+0.08952)
     | > avg_loss_kl: 1.61378 (-0.08618)
     | > avg_loss_feat: 2.50335 (+0.05055)
     | > avg_loss_mel: 18.88204 (+0.42143)
     | > avg_loss_duration: 1.99369 (+0.02963)
     | > avg_loss_1: 26.74483 (+0.50494)


 > EPOCH: 200/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 21:57:57)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 16000
     | > loss_disc: 2.65631  (2.65631)
     | > loss_disc_real_0: 0.11877  (0.11877)
     | > loss_disc_real_1: 0.20645  (0.20645)
     | > loss_disc_real_2: 0.19396  (0.19396)
     | > loss_disc_real_3: 0.26222  (0.26222)
     | > loss_disc_real_4: 0.23336  (0.23336)
     | > loss_disc_real_5: 0.23943  (0.23943)
     | > loss_0: 2.65631  (2.65631)
     | > grad_norm_0: 71.87030  (71.87030)
     | > loss_gen: 1.86319  (1.86319)
     | > loss_kl: 1.41739  (1.41739)
     | > loss_feat: 2.71889  (2.71889)
     | > loss_mel: 18.66676  (18.66676)
     | > loss_duration: 1.56384  (1.56384)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 26.23007  (26.23007)
     | > grad_norm_1: 565.02942  (565.02942)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.54720  (3.54723)
     | > loader_time: 23.31230  (23.31233)


   --> STEP: 25/80 -- GLOBAL_STEP: 16025
     | > loss_disc: 2.77523  (2.7164



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00000)
     | > avg_loss_disc: 2.76182 (+0.12433)
     | > avg_loss_disc_real_0: 0.30997 (+0.19492)
     | > avg_loss_disc_real_1: 0.26715 (+0.04983)
     | > avg_loss_disc_real_2: 0.19574 (-0.01266)
     | > avg_loss_disc_real_3: 0.25849 (+0.00087)
     | > avg_loss_disc_real_4: 0.27540 (+0.05243)
     | > avg_loss_disc_real_5: 0.26312 (+0.00625)
     | > avg_loss_0: 2.76182 (+0.12433)
     | > avg_loss_gen: 2.02703 (+0.27507)
     | > avg_loss_kl: 1.47689 (-0.13689)
     | > avg_loss_feat: 2.43529 (-0.06806)
     | > avg_loss_mel: 18.96893 (+0.08689)
     | > avg_loss_duration: 1.99211 (-0.00158)
     | > avg_loss_1: 26.90025 (+0.15543)


 > EPOCH: 201/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 22:03:32)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 16100
     | > loss_disc: 2.83402  (2.73294)
     | > loss_disc_real_0: 0.28640  (0.19053)
     | > loss_disc_real_1: 0.25186  (0.23108)
     | > loss_disc_real_2: 0.24010  (0.23892)
     | > loss_disc_real_3: 0.25850  (0.23901)
     | > loss_disc_real_4: 0.22500  (0.24095)
     | > loss_disc_real_5: 0.25724  (0.24691)
     | > loss_0: 2.83402  (2.73294)
     | > grad_norm_0: 15.11092  (73.93007)
     | > loss_gen: 2.05503  (1.93260)
     | > loss_kl: 1.39298  (1.27484)
     | > loss_feat: 2.28374  (2.79609)
     | > loss_mel: 18.50921  (19.00702)
     | > loss_duration: 1.48300  (1.48534)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 25.72395  (26.49587)
     | > grad_norm_1: 280.25320  (567.20056)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.61730  (3.57541)
     | > loader_time: 0.00900  (0.00901)


   --> STEP: 45/80 -- GLOBAL_STEP: 16125
     | > loss_disc: 2.74863  (2.73331



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.67805 (-0.08377)
     | > avg_loss_disc_real_0: 0.20875 (-0.10122)
     | > avg_loss_disc_real_1: 0.24374 (-0.02342)
     | > avg_loss_disc_real_2: 0.18186 (-0.01388)
     | > avg_loss_disc_real_3: 0.22157 (-0.03692)
     | > avg_loss_disc_real_4: 0.19363 (-0.08176)
     | > avg_loss_disc_real_5: 0.24979 (-0.01333)
     | > avg_loss_0: 2.67805 (-0.08377)
     | > avg_loss_gen: 1.77410 (-0.25293)
     | > avg_loss_kl: 1.46232 (-0.01457)
     | > avg_loss_feat: 2.49478 (+0.05948)
     | > avg_loss_mel: 18.02480 (-0.94413)
     | > avg_loss_duration: 1.98192 (-0.01020)
     | > avg_loss_1: 25.73791 (-1.16234)


 > EPOCH: 202/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 22:09:06)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 16175
     | > loss_disc: 2.82816  (2.73079)
     | > loss_disc_real_0: 0.19944  (0.20497)
     | > loss_disc_real_1: 0.24318  (0.23032)
     | > loss_disc_real_2: 0.27497  (0.23169)
     | > loss_disc_real_3: 0.20527  (0.23734)
     | > loss_disc_real_4: 0.26702  (0.24506)
     | > loss_disc_real_5: 0.26544  (0.24770)
     | > loss_0: 2.82816  (2.73079)
     | > grad_norm_0: 38.26434  (51.54139)
     | > loss_gen: 1.89237  (1.90492)
     | > loss_kl: 1.38322  (1.36560)
     | > loss_feat: 2.54485  (2.69360)
     | > loss_mel: 18.21071  (18.61184)
     | > loss_duration: 1.46771  (1.48116)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 25.49886  (26.05712)
     | > grad_norm_1: 459.27805  (432.50613)
     | > current_lr_0: 0.00020 
     | > current_lr_1: 0.00020 
     | > step_time: 3.61130  (3.56478)
     | > loader_time: 0.00900  (0.00834)


   --> STEP: 40/80 -- GLOBAL_STEP: 16200
     | > loss_disc: 2.72021  (2.71051



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00100)
     | > avg_loss_disc: 2.70700 (+0.02894)
     | > avg_loss_disc_real_0: 0.12051 (-0.08824)
     | > avg_loss_disc_real_1: 0.18918 (-0.05456)
     | > avg_loss_disc_real_2: 0.18974 (+0.00788)
     | > avg_loss_disc_real_3: 0.26168 (+0.04011)
     | > avg_loss_disc_real_4: 0.21322 (+0.01958)
     | > avg_loss_disc_real_5: 0.23068 (-0.01911)
     | > avg_loss_0: 2.70700 (+0.02894)
     | > avg_loss_gen: 1.63376 (-0.14034)
     | > avg_loss_kl: 1.56156 (+0.09924)
     | > avg_loss_feat: 2.80841 (+0.31364)
     | > avg_loss_mel: 18.43812 (+0.41332)
     | > avg_loss_duration: 1.99726 (+0.01535)
     | > avg_loss_1: 26.43912 (+0.70121)


 > EPOCH: 203/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 22:14:41)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 16250
     | > loss_disc: 2.79663  (2.70761)
     | > loss_disc_real_0: 0.26994  (0.20653)
     | > loss_disc_real_1: 0.24129  (0.22154)
     | > loss_disc_real_2: 0.14145  (0.22684)
     | > loss_disc_real_3: 0.25900  (0.23714)
     | > loss_disc_real_4: 0.26028  (0.24492)
     | > loss_disc_real_5: 0.26742  (0.24570)
     | > loss_0: 2.79663  (2.70761)
     | > grad_norm_0: 132.95343  (63.49931)
     | > loss_gen: 1.97774  (1.97867)
     | > loss_kl: 1.16932  (1.26801)
     | > loss_feat: 2.72754  (2.98139)
     | > loss_mel: 19.08357  (19.15617)
     | > loss_duration: 1.47681  (1.48273)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 26.43498  (26.86698)
     | > grad_norm_1: 1155.08057  (610.83356)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.60530  (3.54833)
     | > loader_time: 0.00800  (0.00851)


   --> STEP: 35/80 -- GLOBAL_STEP: 16275
     | > loss_disc: 2.73435  (2.690



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (-0.00000)
     | > avg_loss_disc: 2.86619 (+0.15919)
     | > avg_loss_disc_real_0: 0.33670 (+0.21619)
     | > avg_loss_disc_real_1: 0.22160 (+0.03243)
     | > avg_loss_disc_real_2: 0.23320 (+0.04346)
     | > avg_loss_disc_real_3: 0.27694 (+0.01525)
     | > avg_loss_disc_real_4: 0.22854 (+0.01532)
     | > avg_loss_disc_real_5: 0.26172 (+0.03104)
     | > avg_loss_0: 2.86619 (+0.15919)
     | > avg_loss_gen: 1.83638 (+0.20262)
     | > avg_loss_kl: 1.46544 (-0.09612)
     | > avg_loss_feat: 1.86757 (-0.94084)
     | > avg_loss_mel: 17.17998 (-1.25814)
     | > avg_loss_duration: 1.96861 (-0.02865)
     | > avg_loss_1: 24.31800 (-2.12113)


 > EPOCH: 204/1000
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 22:20:16)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 16325
     | > loss_disc: 2.68889  (2.70250)
     | > loss_disc_real_0: 0.19245  (0.19540)
     | > loss_disc_real_1: 0.21980  (0.22689)
     | > loss_disc_real_2: 0.21612  (0.21672)
     | > loss_disc_real_3: 0.24256  (0.23656)
     | > loss_disc_real_4: 0.24653  (0.24932)
     | > loss_disc_real_5: 0.25843  (0.24950)
     | > loss_0: 2.68889  (2.70250)
     | > grad_norm_0: 66.28923  (79.08075)
     | > loss_gen: 1.93495  (1.93604)
     | > loss_kl: 1.33827  (1.23388)
     | > loss_feat: 3.03473  (2.88168)
     | > loss_mel: 18.79798  (18.84437)
     | > loss_duration: 1.56328  (1.49587)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 26.66922  (26.39184)
     | > grad_norm_1: 562.91144  (566.76306)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.53220  (3.53442)
     | > loader_time: 0.00900  (0.00821)


[1m   --> STEP: 30/80 -- GLOBAL_STEP: 16350[0m
     | > loss_disc: 2.80552  (2.72532)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.00901 [0m(-0.00100)
     | > avg_loss_disc:[92m 2.60445 [0m(-0.26173)
     | > avg_loss_disc_real_0:[92m 0.18952 [0m(-0.14718)
     | > avg_loss_disc_real_1:[91m 0.22276 [0m(+0.00115)
     | > avg_loss_disc_real_2:[92m 0.18976 [0m(-0.04344)
     | > avg_loss_disc_real_3:[92m 0.16933 [0m(-0.10761)
     | > avg_loss_disc_real_4:[91m 0.24152 [0m(+0.01299)
     | > avg_loss_disc_real_5:[91m 0.26210 [0m(+0.00037)
     | > avg_loss_0:[92m 2.60445 [0m(-0.26173)
     | > avg_loss_gen:[91m 1.84075 [0m(+0.00437)
     | > avg_loss_kl:[91m 1.46802 [0m(+0.00258)
     | > avg_loss_feat:[91m 2.86369 [0m(+0.99611)
     | > avg_loss_mel:[91m 19.35671 [0m(+2.17672)
     | > avg_loss_duration:[91m 2.00865 [0m(+0.04004)
     | > avg_loss_1:[91m 27.53782 [0m(+3.21982)


[4m[1m > EPOCH: 205/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 22:25:50) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



[1m   --> STEP: 0/80 -- GLOBAL_STEP: 16400[0m
     | > loss_disc: 2.65432  (2.65432)
     | > loss_disc_real_0: 0.20581  (0.20581)
     | > loss_disc_real_1: 0.19714  (0.19714)
     | > loss_disc_real_2: 0.20515  (0.20515)
     | > loss_disc_real_3: 0.20710  (0.20710)
     | > loss_disc_real_4: 0.24286  (0.24286)
     | > loss_disc_real_5: 0.25295  (0.25295)
     | > loss_0: 2.65432  (2.65432)
     | > grad_norm_0: 18.07575  (18.07575)
     | > loss_gen: 1.97204  (1.97204)
     | > loss_kl: 1.00029  (1.00029)
     | > loss_feat: 2.84981  (2.84981)
     | > loss_mel: 18.69586  (18.69586)
     | > loss_duration: 1.48302  (1.48302)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 26.00102  (26.00102)
     | > grad_norm_1: 215.65810  (215.65810)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.56320  (3.56324)
     | > loader_time: 23.33600  (23.33601)


[1m   --> STEP: 25/80 -- GLOBAL_STEP: 16425[0m
     | > loss_disc: 2.62769  (2.7251



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.00801 [0m(-0.00100)
     | > avg_loss_disc:[91m 2.75898 [0m(+0.15453)
     | > avg_loss_disc_real_0:[91m 0.19695 [0m(+0.00742)
     | > avg_loss_disc_real_1:[91m 0.23495 [0m(+0.01219)
     | > avg_loss_disc_real_2:[92m 0.18125 [0m(-0.00851)
     | > avg_loss_disc_real_3:[91m 0.23341 [0m(+0.06409)
     | > avg_loss_disc_real_4:[92m 0.23725 [0m(-0.00428)
     | > avg_loss_disc_real_5:[92m 0.25881 [0m(-0.00329)
     | > avg_loss_0:[91m 2.75898 [0m(+0.15453)
     | > avg_loss_gen:[92m 1.71050 [0m(-0.13025)
     | > avg_loss_kl:[92m 1.43402 [0m(-0.03400)
     | > avg_loss_feat:[92m 2.27586 [0m(-0.58783)
     | > avg_loss_mel:[92m 18.29629 [0m(-1.06042)
     | > avg_loss_duration:[92m 1.97777 [0m(-0.03088)
     | > avg_loss_1:[92m 25.69444 [0m(-1.84338)


[4m[1m > EPOCH: 206/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 22:31:25) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



[1m   --> STEP: 20/80 -- GLOBAL_STEP: 16500[0m
     | > loss_disc: 2.72137  (2.74284)
     | > loss_disc_real_0: 0.19711  (0.22246)
     | > loss_disc_real_1: 0.28178  (0.23319)
     | > loss_disc_real_2: 0.21527  (0.23221)
     | > loss_disc_real_3: 0.26177  (0.23570)
     | > loss_disc_real_4: 0.27046  (0.23941)
     | > loss_disc_real_5: 0.26663  (0.24642)
     | > loss_0: 2.72137  (2.74284)
     | > grad_norm_0: 14.47750  (34.90220)
     | > loss_gen: 1.77433  (1.91334)
     | > loss_kl: 1.58073  (1.29745)
     | > loss_feat: 2.69150  (2.73778)
     | > loss_mel: 18.55462  (18.81527)
     | > loss_duration: 1.50258  (1.47883)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 26.10375  (26.24268)
     | > grad_norm_1: 581.37341  (373.30865)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.62430  (3.57488)
     | > loader_time: 0.01000  (0.00876)


[1m   --> STEP: 45/80 -- GLOBAL_STEP: 16525[0m
     | > loss_disc: 2.70623  (2.72327



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[91m 0.00901 [0m(+0.00100)
     | > avg_loss_disc:[92m 2.74661 [0m(-0.01237)
     | > avg_loss_disc_real_0:[92m 0.10651 [0m(-0.09044)
     | > avg_loss_disc_real_1:[92m 0.21884 [0m(-0.01612)
     | > avg_loss_disc_real_2:[91m 0.25793 [0m(+0.07668)
     | > avg_loss_disc_real_3:[91m 0.23857 [0m(+0.00515)
     | > avg_loss_disc_real_4:[91m 0.25679 [0m(+0.01954)
     | > avg_loss_disc_real_5:[92m 0.25041 [0m(-0.00841)
     | > avg_loss_0:[92m 2.74661 [0m(-0.01237)
     | > avg_loss_gen:[91m 1.73014 [0m(+0.01964)
     | > avg_loss_kl:[91m 1.77846 [0m(+0.34443)
     | > avg_loss_feat:[92m 2.25652 [0m(-0.01934)
     | > avg_loss_mel:[92m 17.79986 [0m(-0.49643)
     | > avg_loss_duration:[92m 1.97586 [0m(-0.00191)
     | > avg_loss_1:[92m 25.54084 [0m(-0.15360)


[4m[1m > EPOCH: 207/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 22:36:59) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



[1m   --> STEP: 15/80 -- GLOBAL_STEP: 16575[0m
     | > loss_disc: 2.64855  (2.75425)
     | > loss_disc_real_0: 0.22637  (0.24167)
     | > loss_disc_real_1: 0.21666  (0.23501)
     | > loss_disc_real_2: 0.16453  (0.22410)
     | > loss_disc_real_3: 0.24034  (0.23597)
     | > loss_disc_real_4: 0.21518  (0.24410)
     | > loss_disc_real_5: 0.22930  (0.24563)
     | > loss_0: 2.64855  (2.75425)
     | > grad_norm_0: 22.31487  (69.98076)
     | > loss_gen: 2.01190  (1.93710)
     | > loss_kl: 1.46500  (1.27508)
     | > loss_feat: 2.99856  (2.75138)
     | > loss_mel: 18.97388  (18.79134)
     | > loss_duration: 1.51276  (1.47919)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 26.96210  (26.23408)
     | > grad_norm_1: 191.53387  (401.13025)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.60130  (3.56358)
     | > loader_time: 0.01000  (0.00894)


[1m   --> STEP: 40/80 -- GLOBAL_STEP: 16600[0m
     | > loss_disc: 2.70666  (2.73289



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[91m 0.00901 [0m(+0.00000)
     | > avg_loss_disc:[92m 2.71403 [0m(-0.03258)
     | > avg_loss_disc_real_0:[91m 0.17441 [0m(+0.06790)
     | > avg_loss_disc_real_1:[92m 0.21194 [0m(-0.00689)
     | > avg_loss_disc_real_2:[92m 0.18384 [0m(-0.07409)
     | > avg_loss_disc_real_3:[92m 0.21550 [0m(-0.02307)
     | > avg_loss_disc_real_4:[92m 0.24940 [0m(-0.00739)
     | > avg_loss_disc_real_5:[92m 0.24238 [0m(-0.00803)
     | > avg_loss_0:[92m 2.71403 [0m(-0.03258)
     | > avg_loss_gen:[92m 1.69428 [0m(-0.03586)
     | > avg_loss_kl:[92m 1.43215 [0m(-0.34630)
     | > avg_loss_feat:[91m 2.29613 [0m(+0.03961)
     | > avg_loss_mel:[91m 18.22194 [0m(+0.42208)
     | > avg_loss_duration:[91m 1.98091 [0m(+0.00504)
     | > avg_loss_1:[91m 25.62541 [0m(+0.08457)


[4m[1m > EPOCH: 208/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 22:42:34) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



[1m   --> STEP: 10/80 -- GLOBAL_STEP: 16650[0m
     | > loss_disc: 2.91827  (2.75131)
     | > loss_disc_real_0: 0.25955  (0.22006)
     | > loss_disc_real_1: 0.28319  (0.22947)
     | > loss_disc_real_2: 0.31364  (0.23718)
     | > loss_disc_real_3: 0.25186  (0.23332)
     | > loss_disc_real_4: 0.23418  (0.24337)
     | > loss_disc_real_5: 0.25030  (0.24726)
     | > loss_0: 2.91827  (2.75131)
     | > grad_norm_0: 11.84361  (36.10794)
     | > loss_gen: 1.71977  (1.91396)
     | > loss_kl: 1.11197  (1.23484)
     | > loss_feat: 2.53452  (2.79431)
     | > loss_mel: 18.66077  (18.88667)
     | > loss_duration: 1.46174  (1.47948)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 25.48877  (26.30926)
     | > grad_norm_1: 33.33768  (215.03667)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.58830  (3.54673)
     | > loader_time: 0.01000  (0.00871)


[1m   --> STEP: 35/80 -- GLOBAL_STEP: 16675[0m
     | > loss_disc: 2.71049  (2.74732)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.00901 [0m(-0.00000)
     | > avg_loss_disc:[91m 2.72656 [0m(+0.01253)
     | > avg_loss_disc_real_0:[92m 0.12625 [0m(-0.04816)
     | > avg_loss_disc_real_1:[91m 0.26337 [0m(+0.05143)
     | > avg_loss_disc_real_2:[91m 0.24419 [0m(+0.06035)
     | > avg_loss_disc_real_3:[91m 0.28260 [0m(+0.06710)
     | > avg_loss_disc_real_4:[92m 0.24214 [0m(-0.00726)
     | > avg_loss_disc_real_5:[91m 0.24945 [0m(+0.00707)
     | > avg_loss_0:[91m 2.72656 [0m(+0.01253)
     | > avg_loss_gen:[91m 1.85418 [0m(+0.15990)
     | > avg_loss_kl:[91m 1.81012 [0m(+0.37797)
     | > avg_loss_feat:[92m 2.21415 [0m(-0.08198)
     | > avg_loss_mel:[92m 18.15236 [0m(-0.06958)
     | > avg_loss_duration:[91m 1.99315 [0m(+0.01224)
     | > avg_loss_1:[91m 26.02396 [0m(+0.39855)


[4m[1m > EPOCH: 209/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 22:48:08) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



[1m   --> STEP: 5/80 -- GLOBAL_STEP: 16725[0m
     | > loss_disc: 2.71198  (2.74993)
     | > loss_disc_real_0: 0.24230  (0.25799)
     | > loss_disc_real_1: 0.21900  (0.22147)
     | > loss_disc_real_2: 0.27293  (0.23425)
     | > loss_disc_real_3: 0.28401  (0.23373)
     | > loss_disc_real_4: 0.25794  (0.24184)
     | > loss_disc_real_5: 0.25697  (0.24649)
     | > loss_0: 2.71198  (2.74993)
     | > grad_norm_0: 20.46173  (49.70550)
     | > loss_gen: 1.80146  (1.91515)
     | > loss_kl: 1.36301  (1.33516)
     | > loss_feat: 2.47932  (2.61864)
     | > loss_mel: 18.06073  (18.76569)
     | > loss_duration: 1.45493  (1.47539)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 25.15945  (26.11003)
     | > grad_norm_1: 177.71681  (406.79819)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.55620  (3.53442)
     | > loader_time: 0.00800  (0.00781)


[1m   --> STEP: 30/80 -- GLOBAL_STEP: 16750[0m
     | > loss_disc: 2.60796  (2.72074)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.00901 [0m(-0.00000)
     | > avg_loss_disc:[91m 2.75005 [0m(+0.02349)
     | > avg_loss_disc_real_0:[92m 0.09248 [0m(-0.03377)
     | > avg_loss_disc_real_1:[92m 0.21105 [0m(-0.05232)
     | > avg_loss_disc_real_2:[92m 0.17773 [0m(-0.06647)
     | > avg_loss_disc_real_3:[92m 0.20251 [0m(-0.08009)
     | > avg_loss_disc_real_4:[92m 0.22649 [0m(-0.01565)
     | > avg_loss_disc_real_5:[92m 0.23073 [0m(-0.01871)
     | > avg_loss_0:[91m 2.75005 [0m(+0.02349)
     | > avg_loss_gen:[92m 1.54010 [0m(-0.31408)
     | > avg_loss_kl:[91m 1.95867 [0m(+0.14855)
     | > avg_loss_feat:[91m 2.76063 [0m(+0.54648)
     | > avg_loss_mel:[91m 19.25485 [0m(+1.10249)
     | > avg_loss_duration:[91m 1.99765 [0m(+0.00450)
     | > avg_loss_1:[91m 27.51189 [0m(+1.48794)


[4m[1m > EPOCH: 210/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 22:53:43) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



[1m   --> STEP: 0/80 -- GLOBAL_STEP: 16800[0m
     | > loss_disc: 2.63307  (2.63307)
     | > loss_disc_real_0: 0.12384  (0.12384)
     | > loss_disc_real_1: 0.20470  (0.20470)
     | > loss_disc_real_2: 0.17865  (0.17865)
     | > loss_disc_real_3: 0.19153  (0.19153)
     | > loss_disc_real_4: 0.21490  (0.21490)
     | > loss_disc_real_5: 0.23260  (0.23260)
     | > loss_0: 2.63307  (2.63307)
     | > grad_norm_0: 16.94438  (16.94438)
     | > loss_gen: 1.93516  (1.93516)
     | > loss_kl: 1.48111  (1.48111)
     | > loss_feat: 2.93847  (2.93847)
     | > loss_mel: 18.64813  (18.64813)
     | > loss_duration: 1.46079  (1.46079)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 26.46366  (26.46366)
     | > grad_norm_1: 310.58984  (310.58984)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.56720  (3.56725)
     | > loader_time: 23.32170  (23.32171)


[1m   --> STEP: 25/80 -- GLOBAL_STEP: 16825[0m
     | > loss_disc: 2.78598  (2.7042



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[91m 0.01001 [0m(+0.00100)
     | > avg_loss_disc:[91m 2.80620 [0m(+0.05615)
     | > avg_loss_disc_real_0:[91m 0.18506 [0m(+0.09258)
     | > avg_loss_disc_real_1:[91m 0.22020 [0m(+0.00915)
     | > avg_loss_disc_real_2:[91m 0.23245 [0m(+0.05473)
     | > avg_loss_disc_real_3:[91m 0.25361 [0m(+0.05110)
     | > avg_loss_disc_real_4:[91m 0.26078 [0m(+0.03429)
     | > avg_loss_disc_real_5:[91m 0.25305 [0m(+0.02232)
     | > avg_loss_0:[91m 2.80620 [0m(+0.05615)
     | > avg_loss_gen:[91m 1.74394 [0m(+0.20384)
     | > avg_loss_kl:[92m 1.54773 [0m(-0.41093)
     | > avg_loss_feat:[92m 2.70664 [0m(-0.05399)
     | > avg_loss_mel:[91m 20.62900 [0m(+1.37415)
     | > avg_loss_duration:[91m 1.99836 [0m(+0.00072)
     | > avg_loss_1:[91m 28.62568 [0m(+1.11379)


[4m[1m > EPOCH: 211/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 22:59:17) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



[1m   --> STEP: 20/80 -- GLOBAL_STEP: 16900[0m
     | > loss_disc: 2.74015  (2.71729)
     | > loss_disc_real_0: 0.16620  (0.20204)
     | > loss_disc_real_1: 0.19715  (0.22865)
     | > loss_disc_real_2: 0.18647  (0.22849)
     | > loss_disc_real_3: 0.24332  (0.23363)
     | > loss_disc_real_4: 0.25043  (0.23996)
     | > loss_disc_real_5: 0.25450  (0.24599)
     | > loss_0: 2.74015  (2.71729)
     | > grad_norm_0: 18.17733  (29.60404)
     | > loss_gen: 1.90737  (1.88892)
     | > loss_kl: 1.48352  (1.30647)
     | > loss_feat: 2.83670  (2.74090)
     | > loss_mel: 19.77862  (18.89402)
     | > loss_duration: 1.51064  (1.47629)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 27.51685  (26.30659)
     | > grad_norm_1: 308.57434  (300.58102)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.60030  (3.57054)
     | > loader_time: 0.00900  (0.00911)


[1m   --> STEP: 45/80 -- GLOBAL_STEP: 16925[0m
     | > loss_disc: 2.77512  (2.72385



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[91m 0.01001 [0m(+0.00000)
     | > avg_loss_disc:[92m 2.73889 [0m(-0.06731)
     | > avg_loss_disc_real_0:[92m 0.08081 [0m(-0.10425)
     | > avg_loss_disc_real_1:[91m 0.22256 [0m(+0.00236)
     | > avg_loss_disc_real_2:[92m 0.21136 [0m(-0.02110)
     | > avg_loss_disc_real_3:[92m 0.19236 [0m(-0.06125)
     | > avg_loss_disc_real_4:[92m 0.21134 [0m(-0.04944)
     | > avg_loss_disc_real_5:[92m 0.24315 [0m(-0.00990)
     | > avg_loss_0:[92m 2.73889 [0m(-0.06731)
     | > avg_loss_gen:[92m 1.54600 [0m(-0.19794)
     | > avg_loss_kl:[92m 1.47803 [0m(-0.06970)
     | > avg_loss_feat:[92m 2.47246 [0m(-0.23418)
     | > avg_loss_mel:[92m 18.44438 [0m(-2.18462)
     | > avg_loss_duration:[92m 1.97167 [0m(-0.02670)
     | > avg_loss_1:[92m 25.91254 [0m(-2.71315)


[4m[1m > EPOCH: 212/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 23:04:52) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



[1m   --> STEP: 15/80 -- GLOBAL_STEP: 16975[0m
     | > loss_disc: 2.84554  (2.75289)
     | > loss_disc_real_0: 0.28003  (0.21853)
     | > loss_disc_real_1: 0.22721  (0.22875)
     | > loss_disc_real_2: 0.31711  (0.23295)
     | > loss_disc_real_3: 0.24955  (0.23693)
     | > loss_disc_real_4: 0.25622  (0.24539)
     | > loss_disc_real_5: 0.25426  (0.24704)
     | > loss_0: 2.84554  (2.75289)
     | > grad_norm_0: 48.87970  (54.35582)
     | > loss_gen: 1.84717  (1.90765)
     | > loss_kl: 1.20128  (1.28109)
     | > loss_feat: 2.53707  (2.75879)
     | > loss_mel: 18.77067  (18.75523)
     | > loss_duration: 1.48501  (1.46911)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 25.84121  (26.17186)
     | > grad_norm_1: 338.82501  (460.58957)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.64230  (3.57406)
     | > loader_time: 0.00900  (0.00840)


[1m   --> STEP: 40/80 -- GLOBAL_STEP: 17000[0m
     | > loss_disc: 2.64769  (2.73405



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.00901 [0m(-0.00100)
     | > avg_loss_disc:[92m 2.59414 [0m(-0.14475)
     | > avg_loss_disc_real_0:[91m 0.13665 [0m(+0.05584)
     | > avg_loss_disc_real_1:[92m 0.21038 [0m(-0.01218)
     | > avg_loss_disc_real_2:[91m 0.22749 [0m(+0.01613)
     | > avg_loss_disc_real_3:[91m 0.21618 [0m(+0.02383)
     | > avg_loss_disc_real_4:[91m 0.24662 [0m(+0.03527)
     | > avg_loss_disc_real_5:[92m 0.24015 [0m(-0.00300)
     | > avg_loss_0:[92m 2.59414 [0m(-0.14475)
     | > avg_loss_gen:[91m 1.79988 [0m(+0.25388)
     | > avg_loss_kl:[92m 1.17827 [0m(-0.29976)
     | > avg_loss_feat:[91m 2.82054 [0m(+0.34808)
     | > avg_loss_mel:[91m 18.92857 [0m(+0.48420)
     | > avg_loss_duration:[91m 1.99605 [0m(+0.02438)
     | > avg_loss_1:[91m 26.72332 [0m(+0.81078)


[4m[1m > EPOCH: 213/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 23:10:28) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



[1m   --> STEP: 10/80 -- GLOBAL_STEP: 17050[0m
     | > loss_disc: 2.77175  (2.70649)
     | > loss_disc_real_0: 0.14637  (0.17932)
     | > loss_disc_real_1: 0.29200  (0.24262)
     | > loss_disc_real_2: 0.21290  (0.23189)
     | > loss_disc_real_3: 0.28174  (0.23950)
     | > loss_disc_real_4: 0.28628  (0.24236)
     | > loss_disc_real_5: 0.27696  (0.24345)
     | > loss_0: 2.77175  (2.70649)
     | > grad_norm_0: 70.06712  (107.51037)
     | > loss_gen: 2.07500  (2.00417)
     | > loss_kl: 1.67695  (1.32851)
     | > loss_feat: 2.58844  (2.90762)
     | > loss_mel: 18.32917  (18.83027)
     | > loss_duration: 1.44940  (1.47396)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 26.11897  (26.54453)
     | > grad_norm_1: 150.82320  (627.95697)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.58430  (3.56154)
     | > loader_time: 0.00900  (0.00841)


[1m   --> STEP: 35/80 -- GLOBAL_STEP: 17075[0m
     | > loss_disc: 2.71989  (2.7177



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[91m 0.01001 [0m(+0.00100)
     | > avg_loss_disc:[91m 2.65660 [0m(+0.06246)
     | > avg_loss_disc_real_0:[91m 0.19624 [0m(+0.05959)
     | > avg_loss_disc_real_1:[91m 0.21314 [0m(+0.00276)
     | > avg_loss_disc_real_2:[91m 0.25226 [0m(+0.02477)
     | > avg_loss_disc_real_3:[91m 0.22564 [0m(+0.00945)
     | > avg_loss_disc_real_4:[91m 0.25434 [0m(+0.00772)
     | > avg_loss_disc_real_5:[92m 0.23875 [0m(-0.00140)
     | > avg_loss_0:[91m 2.65660 [0m(+0.06246)
     | > avg_loss_gen:[91m 1.91222 [0m(+0.11234)
     | > avg_loss_kl:[91m 1.26007 [0m(+0.08180)
     | > avg_loss_feat:[92m 2.62696 [0m(-0.19358)
     | > avg_loss_mel:[91m 19.19387 [0m(+0.26530)
     | > avg_loss_duration:[92m 1.98160 [0m(-0.01445)
     | > avg_loss_1:[91m 26.97473 [0m(+0.25142)


[4m[1m > EPOCH: 214/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



[1m > TRAINING (2022-09-23 23:16:04) [0m


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



[1m   --> STEP: 5/80 -- GLOBAL_STEP: 17125[0m
     | > loss_disc: 2.76595  (2.69738)
     | > loss_disc_real_0: 0.22762  (0.20082)
     | > loss_disc_real_1: 0.22297  (0.22042)
     | > loss_disc_real_2: 0.19807  (0.21284)
     | > loss_disc_real_3: 0.19939  (0.23588)
     | > loss_disc_real_4: 0.22236  (0.23944)
     | > loss_disc_real_5: 0.26454  (0.24692)
     | > loss_0: 2.76595  (2.69738)
     | > grad_norm_0: 145.61807  (85.92094)
     | > loss_gen: 2.01661  (1.94570)
     | > loss_kl: 1.41778  (1.33686)
     | > loss_feat: 2.83615  (2.88462)
     | > loss_mel: 18.30485  (18.73984)
     | > loss_duration: 1.46938  (1.47296)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 26.04477  (26.37999)
     | > grad_norm_1: 1069.41174  (643.57489)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.56620  (3.53882)
     | > loader_time: 0.01000  (0.00861)


[1m   --> STEP: 30/80 -- GLOBAL_STEP: 17150[0m
     | > loss_disc: 2.75555  (2.7144



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.00901 [0m(-0.00100)
     | > avg_loss_disc:[91m 2.79791 [0m(+0.14131)
     | > avg_loss_disc_real_0:[91m 0.28687 [0m(+0.09063)
     | > avg_loss_disc_real_1:[91m 0.29527 [0m(+0.08213)
     | > avg_loss_disc_real_2:[92m 0.21600 [0m(-0.03626)
     | > avg_loss_disc_real_3:[91m 0.26620 [0m(+0.04056)
     | > avg_loss_disc_real_4:[92m 0.24664 [0m(-0.00769)
     | > avg_loss_disc_real_5:[91m 0.26922 [0m(+0.03047)
     | > avg_loss_0:[91m 2.79791 [0m(+0.14131)
     | > avg_loss_gen:[91m 2.02342 [0m(+0.11120)
     | > avg_loss_kl:[91m 1.43635 [0m(+0.17628)
     | > avg_loss_feat:[92m 2.41682 [0m(-0.21014)
     | > avg_loss_mel:[92m 18.28302 [0m(-0.91085)
     | > avg_loss_duration:[91m 1.99254 [0m(+0.01094)
     | > avg_loss_1:[92m 26.15215 [0m(-0.82258)


[4m[1m > EPOCH: 215/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce
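The step counters in this log are consistent with the config cell: 2555 training instances at `batch_size=32` give 80 steps per epoch, and each epoch's STEP 0 lands on a global step that is a multiple of 80. A quick check of that arithmetic, as a sketch that assumes the training loader keeps the final partial batch (`drop_last=False`, which the `/80` in the step counter suggests) and assumes the "Batch group size: 160." line is `batch_size * batch_group_size`:

```python
import math

# Values from the config cell and the "Number of instances : 2555" line.
NUM_TRAIN_SAMPLES = 2555
BATCH_SIZE = 32        # config.batch_size
BATCH_GROUP_SIZE = 5   # config.batch_group_size

# Assumes the last partial batch still counts as a step (drop_last=False).
steps_per_epoch = math.ceil(NUM_TRAIN_SAMPLES / BATCH_SIZE)

# Assumed to be the value the log prints as "Batch group size: 160."
group_window = BATCH_SIZE * BATCH_GROUP_SIZE


def global_step_at_epoch_start(epochs_done: int) -> int:
    """Global step reported at STEP 0 of an epoch, given the completed-epoch count."""
    return epochs_done * steps_per_epoch


print(steps_per_epoch)                  # 80
print(group_window)                     # 160
print(global_step_at_epoch_start(215))  # 17200
print(global_step_at_epoch_start(225))  # 18000
```

Read this way, the number in an "EPOCH: N/1000" header appears to count completed epochs, and a checkpoint named `best_model_18000.pth` corresponds to the end of 225 completed epochs.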




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 23:21:40)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 17200
     | > loss_disc: 2.84844  (2.84844)
     | > loss_disc_real_0: 0.29390  (0.29390)
     | > loss_disc_real_1: 0.27067  (0.27067)
     | > loss_disc_real_2: 0.21602  (0.21602)
     | > loss_disc_real_3: 0.27637  (0.27637)
     | > loss_disc_real_4: 0.23589  (0.23589)
     | > loss_disc_real_5: 0.25366  (0.25366)
     | > loss_0: 2.84844  (2.84844)
     | > grad_norm_0: 32.60044  (32.60044)
     | > loss_gen: 1.73599  (1.73599)
     | > loss_kl: 1.44435  (1.44435)
     | > loss_feat: 2.36336  (2.36336)
     | > loss_mel: 18.16010  (18.16010)
     | > loss_duration: 1.53955  (1.53955)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 25.24336  (25.24336)
     | > grad_norm_1: 84.15977  (84.15977)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.59530  (3.59527)
     | > loader_time: 23.68110  (23.68113)


   --> STEP: 25/80 -- GLOBAL_STEP: 17225
     | > loss_disc: 2.65888  (2.74058)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.00801 [0m(-0.00100)
     | > avg_loss_disc:[92m 2.76164 [0m(-0.03628)
     | > avg_loss_disc_real_0:[92m 0.13310 [0m(-0.15377)
     | > avg_loss_disc_real_1:[92m 0.23799 [0m(-0.05728)
     | > avg_loss_disc_real_2:[92m 0.19660 [0m(-0.01941)
     | > avg_loss_disc_real_3:[92m 0.26482 [0m(-0.00137)
     | > avg_loss_disc_real_4:[91m 0.29164 [0m(+0.04500)
     | > avg_loss_disc_real_5:[92m 0.25339 [0m(-0.01583)
     | > avg_loss_0:[92m 2.76164 [0m(-0.03628)
     | > avg_loss_gen:[92m 1.74796 [0m(-0.27546)
     | > avg_loss_kl:[91m 1.71771 [0m(+0.28136)
     | > avg_loss_feat:[92m 2.16038 [0m(-0.25644)
     | > avg_loss_mel:[92m 18.01453 [0m(-0.26850)
     | > avg_loss_duration:[92m 1.96978 [0m(-0.02276)
     | > avg_loss_1:[92m 25.61036 [0m(-0.54180)


[4m[1m > EPOCH: 216/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 23:27:17)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 17300
     | > loss_disc: 2.87450  (2.71963)
     | > loss_disc_real_0: 0.45597  (0.22341)
     | > loss_disc_real_1: 0.25983  (0.22893)
     | > loss_disc_real_2: 0.25738  (0.22607)
     | > loss_disc_real_3: 0.25740  (0.23373)
     | > loss_disc_real_4: 0.24406  (0.23851)
     | > loss_disc_real_5: 0.24404  (0.24404)
     | > loss_0: 2.87450  (2.71963)
     | > grad_norm_0: 101.26129  (125.01051)
     | > loss_gen: 2.18623  (2.01086)
     | > loss_kl: 1.35853  (1.30648)
     | > loss_feat: 2.65560  (2.93121)
     | > loss_mel: 19.37148  (18.81184)
     | > loss_duration: 1.49802  (1.46714)
     | > amp_scaler: 256.00000  (160.00000)
     | > loss_1: 27.06986  (26.52754)
     | > grad_norm_1: 466.75516  (677.81372)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.63030  (3.58676)
     | > loader_time: 0.01000  (0.00876)


   --> STEP: 45/80 -- GLOBAL_STEP: 17325
     | > loss_disc: 2.74268  (2.720



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[91m 0.01001 [0m(+0.00200)
     | > avg_loss_disc:[92m 2.52686 [0m(-0.23478)
     | > avg_loss_disc_real_0:[92m 0.12247 [0m(-0.01063)
     | > avg_loss_disc_real_1:[92m 0.15031 [0m(-0.08768)
     | > avg_loss_disc_real_2:[92m 0.15811 [0m(-0.03848)
     | > avg_loss_disc_real_3:[92m 0.21276 [0m(-0.05207)
     | > avg_loss_disc_real_4:[92m 0.20437 [0m(-0.08727)
     | > avg_loss_disc_real_5:[92m 0.22971 [0m(-0.02368)
     | > avg_loss_0:[92m 2.52686 [0m(-0.23478)
     | > avg_loss_gen:[92m 1.69925 [0m(-0.04871)
     | > avg_loss_kl:[92m 1.53648 [0m(-0.18123)
     | > avg_loss_feat:[91m 3.29759 [0m(+1.13721)
     | > avg_loss_mel:[91m 19.93563 [0m(+1.92111)
     | > avg_loss_duration:[91m 1.99163 [0m(+0.02185)
     | > avg_loss_1:[91m 28.46059 [0m(+2.85023)


[4m[1m > EPOCH: 217/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 23:32:53)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 17375
     | > loss_disc: 2.68823  (2.70735)
     | > loss_disc_real_0: 0.27402  (0.20661)
     | > loss_disc_real_1: 0.20600  (0.22737)
     | > loss_disc_real_2: 0.22126  (0.22719)
     | > loss_disc_real_3: 0.22921  (0.23677)
     | > loss_disc_real_4: 0.22835  (0.24482)
     | > loss_disc_real_5: 0.22905  (0.24437)
     | > loss_0: 2.68823  (2.70735)
     | > grad_norm_0: 100.55657  (83.99478)
     | > loss_gen: 1.87183  (1.94251)
     | > loss_kl: 1.27457  (1.30232)
     | > loss_feat: 2.88539  (2.87755)
     | > loss_mel: 18.76277  (18.59605)
     | > loss_duration: 1.47130  (1.46730)
     | > amp_scaler: 256.00000  (256.00000)
     | > loss_1: 26.26585  (26.18573)
     | > grad_norm_1: 829.40741  (570.49622)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.62530  (3.58200)
     | > loader_time: 0.00900  (0.00801)


   --> STEP: 40/80 -- GLOBAL_STEP: 17400
     | > loss_disc: 2.85939  (2.7142



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.00901 [0m(-0.00100)
     | > avg_loss_disc:[91m 2.68311 [0m(+0.15626)
     | > avg_loss_disc_real_0:[91m 0.14323 [0m(+0.02076)
     | > avg_loss_disc_real_1:[91m 0.22209 [0m(+0.07178)
     | > avg_loss_disc_real_2:[91m 0.36484 [0m(+0.20673)
     | > avg_loss_disc_real_3:[91m 0.23712 [0m(+0.02436)
     | > avg_loss_disc_real_4:[91m 0.23349 [0m(+0.02912)
     | > avg_loss_disc_real_5:[91m 0.23200 [0m(+0.00229)
     | > avg_loss_0:[91m 2.68311 [0m(+0.15626)
     | > avg_loss_gen:[91m 1.94443 [0m(+0.24518)
     | > avg_loss_kl:[91m 1.85788 [0m(+0.32140)
     | > avg_loss_feat:[92m 2.55916 [0m(-0.73843)
     | > avg_loss_mel:[92m 17.47896 [0m(-2.45668)
     | > avg_loss_duration:[91m 1.99700 [0m(+0.00537)
     | > avg_loss_1:[92m 25.83743 [0m(-2.62316)


[4m[1m > EPOCH: 218/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 23:38:29)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 17450
     | > loss_disc: 2.72297  (2.67713)
     | > loss_disc_real_0: 0.17716  (0.18807)
     | > loss_disc_real_1: 0.19935  (0.23144)
     | > loss_disc_real_2: 0.16917  (0.22314)
     | > loss_disc_real_3: 0.23298  (0.22724)
     | > loss_disc_real_4: 0.24003  (0.24362)
     | > loss_disc_real_5: 0.25060  (0.24395)
     | > loss_0: 2.72297  (2.67713)
     | > grad_norm_0: 65.36622  (72.77594)
     | > loss_gen: 1.97874  (1.95768)
     | > loss_kl: 1.34199  (1.33751)
     | > loss_feat: 2.91849  (2.88019)
     | > loss_mel: 18.41291  (18.73384)
     | > loss_duration: 1.42471  (1.46055)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 26.07684  (26.36977)
     | > grad_norm_1: 908.16223  (624.81342)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.59930  (3.56144)
     | > loader_time: 0.00900  (0.00831)


   --> STEP: 35/80 -- GLOBAL_STEP: 17475
     | > loss_disc: 2.66227  (2.71792



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.00801 [0m(-0.00100)
     | > avg_loss_disc:[92m 2.66234 [0m(-0.02078)
     | > avg_loss_disc_real_0:[91m 0.22320 [0m(+0.07997)
     | > avg_loss_disc_real_1:[92m 0.16989 [0m(-0.05220)
     | > avg_loss_disc_real_2:[92m 0.22961 [0m(-0.13523)
     | > avg_loss_disc_real_3:[91m 0.23757 [0m(+0.00045)
     | > avg_loss_disc_real_4:[92m 0.21842 [0m(-0.01507)
     | > avg_loss_disc_real_5:[91m 0.25683 [0m(+0.02483)
     | > avg_loss_0:[92m 2.66234 [0m(-0.02078)
     | > avg_loss_gen:[92m 1.84287 [0m(-0.10156)
     | > avg_loss_kl:[92m 1.58440 [0m(-0.27348)
     | > avg_loss_feat:[92m 2.41176 [0m(-0.14741)
     | > avg_loss_mel:[91m 19.19585 [0m(+1.71690)
     | > avg_loss_duration:[91m 2.00898 [0m(+0.01197)
     | > avg_loss_1:[91m 27.04386 [0m(+1.20643)


[4m[1m > EPOCH: 219/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 23:44:05)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 17525
     | > loss_disc: 2.68971  (2.66132)
     | > loss_disc_real_0: 0.28150  (0.18902)
     | > loss_disc_real_1: 0.21468  (0.22209)
     | > loss_disc_real_2: 0.20509  (0.21611)
     | > loss_disc_real_3: 0.27280  (0.24145)
     | > loss_disc_real_4: 0.24266  (0.24037)
     | > loss_disc_real_5: 0.25136  (0.23723)
     | > loss_0: 2.68971  (2.66132)
     | > grad_norm_0: 118.34274  (70.53174)
     | > loss_gen: 2.07616  (1.96917)
     | > loss_kl: 1.36285  (1.30726)
     | > loss_feat: 3.30804  (2.92795)
     | > loss_mel: 19.03347  (18.78974)
     | > loss_duration: 1.48206  (1.46504)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 27.26257  (26.45916)
     | > grad_norm_1: 1177.82263  (551.68665)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.57630  (3.54143)
     | > loader_time: 0.00800  (0.00840)


   --> STEP: 30/80 -- GLOBAL_STEP: 17550
     | > loss_disc: 2.72403  (2.7139



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[91m 0.00901 [0m(+0.00100)
     | > avg_loss_disc:[91m 2.78906 [0m(+0.12672)
     | > avg_loss_disc_real_0:[91m 0.24382 [0m(+0.02061)
     | > avg_loss_disc_real_1:[91m 0.24910 [0m(+0.07921)
     | > avg_loss_disc_real_2:[92m 0.20506 [0m(-0.02455)
     | > avg_loss_disc_real_3:[91m 0.26443 [0m(+0.02686)
     | > avg_loss_disc_real_4:[91m 0.24107 [0m(+0.02266)
     | > avg_loss_disc_real_5:[92m 0.25525 [0m(-0.00158)
     | > avg_loss_0:[91m 2.78906 [0m(+0.12672)
     | > avg_loss_gen:[92m 1.82162 [0m(-0.02125)
     | > avg_loss_kl:[92m 1.38579 [0m(-0.19861)
     | > avg_loss_feat:[92m 2.21407 [0m(-0.19768)
     | > avg_loss_mel:[92m 18.76516 [0m(-0.43069)
     | > avg_loss_duration:[92m 1.98954 [0m(-0.01943)
     | > avg_loss_1:[92m 26.17619 [0m(-0.86767)


[4m[1m > EPOCH: 220/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 23:49:41)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 17600
     | > loss_disc: 2.72229  (2.72229)
     | > loss_disc_real_0: 0.18394  (0.18394)
     | > loss_disc_real_1: 0.23320  (0.23320)
     | > loss_disc_real_2: 0.19515  (0.19515)
     | > loss_disc_real_3: 0.22708  (0.22708)
     | > loss_disc_real_4: 0.22013  (0.22013)
     | > loss_disc_real_5: 0.23473  (0.23473)
     | > loss_0: 2.72229  (2.72229)
     | > grad_norm_0: 31.20443  (31.20443)
     | > loss_gen: 1.94304  (1.94304)
     | > loss_kl: 0.79764  (0.79764)
     | > loss_feat: 2.76374  (2.76374)
     | > loss_mel: 18.69302  (18.69302)
     | > loss_duration: 1.47492  (1.47492)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 25.67235  (25.67235)
     | > grad_norm_1: 459.92044  (459.92044)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.57430  (3.57425)
     | > loader_time: 23.66530  (23.66535)


   --> STEP: 25/80 -- GLOBAL_STEP: 17625
     | > loss_disc: 2.67743  (2.7248



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.00901 [0m(-0.00000)
     | > avg_loss_disc:[92m 2.66842 [0m(-0.12063)
     | > avg_loss_disc_real_0:[92m 0.16073 [0m(-0.08309)
     | > avg_loss_disc_real_1:[92m 0.22021 [0m(-0.02889)
     | > avg_loss_disc_real_2:[91m 0.23762 [0m(+0.03256)
     | > avg_loss_disc_real_3:[92m 0.24483 [0m(-0.01960)
     | > avg_loss_disc_real_4:[91m 0.27364 [0m(+0.03256)
     | > avg_loss_disc_real_5:[92m 0.24916 [0m(-0.00609)
     | > avg_loss_0:[92m 2.66842 [0m(-0.12063)
     | > avg_loss_gen:[91m 1.88966 [0m(+0.06804)
     | > avg_loss_kl:[91m 1.69431 [0m(+0.30852)
     | > avg_loss_feat:[91m 3.02398 [0m(+0.80990)
     | > avg_loss_mel:[91m 19.78124 [0m(+1.01608)
     | > avg_loss_duration:[91m 2.00422 [0m(+0.01467)
     | > avg_loss_1:[91m 28.39341 [0m(+2.21722)


[4m[1m > EPOCH: 221/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-23 23:55:17)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 17700
     | > loss_disc: 2.70510  (2.70946)
     | > loss_disc_real_0: 0.19685  (0.20166)
     | > loss_disc_real_1: 0.19573  (0.23314)
     | > loss_disc_real_2: 0.21031  (0.22800)
     | > loss_disc_real_3: 0.28220  (0.23807)
     | > loss_disc_real_4: 0.24690  (0.24281)
     | > loss_disc_real_5: 0.25711  (0.24586)
     | > loss_0: 2.70510  (2.70946)
     | > grad_norm_0: 22.04682  (33.38823)
     | > loss_gen: 1.91652  (1.91345)
     | > loss_kl: 1.27585  (1.24800)
     | > loss_feat: 2.48445  (2.82483)
     | > loss_mel: 17.61396  (18.44752)
     | > loss_duration: 1.51204  (1.46880)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 24.80282  (25.90260)
     | > grad_norm_1: 204.38506  (366.57468)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.62230  (3.58546)
     | > loader_time: 0.01100  (0.00896)


   --> STEP: 45/80 -- GLOBAL_STEP: 17725
     | > loss_disc: 2.60773  (2.70511



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.00901 [0m(-0.00000)
     | > avg_loss_disc:[91m 2.70911 [0m(+0.04069)
     | > avg_loss_disc_real_0:[92m 0.12865 [0m(-0.03208)
     | > avg_loss_disc_real_1:[91m 0.24122 [0m(+0.02101)
     | > avg_loss_disc_real_2:[91m 0.23856 [0m(+0.00094)
     | > avg_loss_disc_real_3:[92m 0.24345 [0m(-0.00138)
     | > avg_loss_disc_real_4:[92m 0.20695 [0m(-0.06668)
     | > avg_loss_disc_real_5:[91m 0.26144 [0m(+0.01228)
     | > avg_loss_0:[91m 2.70911 [0m(+0.04069)
     | > avg_loss_gen:[92m 1.75842 [0m(-0.13124)
     | > avg_loss_kl:[92m 1.56340 [0m(-0.13092)
     | > avg_loss_feat:[92m 2.59081 [0m(-0.43317)
     | > avg_loss_mel:[92m 18.66848 [0m(-1.11277)
     | > avg_loss_duration:[92m 2.00119 [0m(-0.00303)
     | > avg_loss_1:[92m 26.58229 [0m(-1.81111)


[4m[1m > EPOCH: 222/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-24 00:00:53)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 17775
     | > loss_disc: 2.65028  (2.71104)
     | > loss_disc_real_0: 0.17357  (0.20171)
     | > loss_disc_real_1: 0.22839  (0.22848)
     | > loss_disc_real_2: 0.19475  (0.22228)
     | > loss_disc_real_3: 0.25744  (0.23854)
     | > loss_disc_real_4: 0.21038  (0.24068)
     | > loss_disc_real_5: 0.22731  (0.24651)
     | > loss_0: 2.65028  (2.71104)
     | > grad_norm_0: 103.99232  (56.53892)
     | > loss_gen: 1.98581  (1.91518)
     | > loss_kl: 1.49585  (1.31717)
     | > loss_feat: 3.14374  (2.86517)
     | > loss_mel: 18.47952  (18.47087)
     | > loss_duration: 1.45613  (1.47253)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 26.56106  (26.04092)
     | > grad_norm_1: 563.26776  (452.50656)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.61030  (3.57345)
     | > loader_time: 0.01000  (0.00847)


   --> STEP: 40/80 -- GLOBAL_STEP: 17800
     | > loss_disc: 2.75892  (2.6848



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.00801 [0m(-0.00100)
     | > avg_loss_disc:[91m 2.80241 [0m(+0.09330)
     | > avg_loss_disc_real_0:[91m 0.26577 [0m(+0.13712)
     | > avg_loss_disc_real_1:[92m 0.21734 [0m(-0.02388)
     | > avg_loss_disc_real_2:[91m 0.30294 [0m(+0.06438)
     | > avg_loss_disc_real_3:[92m 0.23003 [0m(-0.01343)
     | > avg_loss_disc_real_4:[91m 0.27747 [0m(+0.07052)
     | > avg_loss_disc_real_5:[91m 0.26715 [0m(+0.00571)
     | > avg_loss_0:[91m 2.80241 [0m(+0.09330)
     | > avg_loss_gen:[91m 1.93846 [0m(+0.18003)
     | > avg_loss_kl:[91m 1.75632 [0m(+0.19292)
     | > avg_loss_feat:[92m 2.27095 [0m(-0.31986)
     | > avg_loss_mel:[91m 19.25285 [0m(+0.58438)
     | > avg_loss_duration:[91m 2.00534 [0m(+0.00415)
     | > avg_loss_1:[91m 27.22391 [0m(+0.64161)


[4m[1m > EPOCH: 223/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-24 00:06:30)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 17850
     | > loss_disc: 2.71315  (2.71482)
     | > loss_disc_real_0: 0.15665  (0.20619)
     | > loss_disc_real_1: 0.23023  (0.22966)
     | > loss_disc_real_2: 0.22551  (0.21907)
     | > loss_disc_real_3: 0.21242  (0.23818)
     | > loss_disc_real_4: 0.22155  (0.23897)
     | > loss_disc_real_5: 0.22995  (0.24259)
     | > loss_0: 2.71315  (2.71482)
     | > grad_norm_0: 71.07343  (81.35590)
     | > loss_gen: 1.89180  (1.94214)
     | > loss_kl: 1.43803  (1.24802)
     | > loss_feat: 2.95295  (2.97270)
     | > loss_mel: 18.26947  (18.63658)
     | > loss_duration: 1.42594  (1.46499)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 25.97819  (26.26443)
     | > grad_norm_1: 345.25766  (744.46527)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.60630  (3.56214)
     | > loader_time: 0.00800  (0.00821)


   --> STEP: 35/80 -- GLOBAL_STEP: 17875
     | > loss_disc: 2.72071  (2.70209



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[91m 0.01001 [0m(+0.00200)
     | > avg_loss_disc:[92m 2.78068 [0m(-0.02174)
     | > avg_loss_disc_real_0:[92m 0.24261 [0m(-0.02316)
     | > avg_loss_disc_real_1:[91m 0.24953 [0m(+0.03219)
     | > avg_loss_disc_real_2:[92m 0.23403 [0m(-0.06891)
     | > avg_loss_disc_real_3:[91m 0.24535 [0m(+0.01532)
     | > avg_loss_disc_real_4:[92m 0.23894 [0m(-0.03853)
     | > avg_loss_disc_real_5:[92m 0.23795 [0m(-0.02920)
     | > avg_loss_0:[92m 2.78068 [0m(-0.02174)
     | > avg_loss_gen:[92m 1.80807 [0m(-0.13039)
     | > avg_loss_kl:[92m 1.43196 [0m(-0.32436)
     | > avg_loss_feat:[91m 2.29682 [0m(+0.02588)
     | > avg_loss_mel:[92m 18.20638 [0m(-1.04647)
     | > avg_loss_duration:[92m 1.96461 [0m(-0.04072)
     | > avg_loss_1:[92m 25.70784 [0m(-1.51607)


[4m[1m > EPOCH: 224/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-24 00:12:06)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 17925
     | > loss_disc: 2.69186  (2.68683)
     | > loss_disc_real_0: 0.18860  (0.20214)
     | > loss_disc_real_1: 0.21532  (0.23389)
     | > loss_disc_real_2: 0.25277  (0.22330)
     | > loss_disc_real_3: 0.23330  (0.22701)
     | > loss_disc_real_4: 0.22356  (0.22986)
     | > loss_disc_real_5: 0.25561  (0.25302)
     | > loss_0: 2.69186  (2.68683)
     | > grad_norm_0: 156.88326  (71.05019)
     | > loss_gen: 2.05302  (2.00239)
     | > loss_kl: 1.54719  (1.36485)
     | > loss_feat: 3.22762  (3.04610)
     | > loss_mel: 19.82690  (19.22335)
     | > loss_duration: 1.51277  (1.48230)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 28.16750  (27.11900)
     | > grad_norm_1: 1428.81396  (613.16632)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.58230  (3.54262)
     | > loader_time: 0.00800  (0.00821)


   --> STEP: 30/80 -- GLOBAL_STEP: 17950
     | > loss_disc: 2.74874  (2.7170



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  [1m--> EVAL PERFORMANCE[0m
     | > avg_loader_time:[92m 0.00901 [0m(-0.00100)
     | > avg_loss_disc:[92m 2.77633 [0m(-0.00435)
     | > avg_loss_disc_real_0:[91m 0.35632 [0m(+0.11371)
     | > avg_loss_disc_real_1:[91m 0.25211 [0m(+0.00257)
     | > avg_loss_disc_real_2:[91m 0.26170 [0m(+0.02766)
     | > avg_loss_disc_real_3:[91m 0.25584 [0m(+0.01049)
     | > avg_loss_disc_real_4:[91m 0.25665 [0m(+0.01771)
     | > avg_loss_disc_real_5:[91m 0.27161 [0m(+0.03366)
     | > avg_loss_0:[92m 2.77633 [0m(-0.00435)
     | > avg_loss_gen:[91m 2.12927 [0m(+0.32120)
     | > avg_loss_kl:[91m 1.59793 [0m(+0.16597)
     | > avg_loss_feat:[92m 1.84016 [0m(-0.45666)
     | > avg_loss_mel:[92m 15.42675 [0m(-2.77963)
     | > avg_loss_duration:[91m 2.03254 [0m(+0.06793)
     | > avg_loss_1:[92m 23.02665 [0m(-2.68119)

 > BEST MODEL : ./output\vits_vctk-September-23-2022_02+46AM-3c624ce\best_model_18000.pth

[4m[1m > EPOCH: 225/1000[0m
 --> ./output\vits_vctk-S



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-24 00:17:46)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 18000
     | > loss_disc: 2.62910  (2.62910)
     | > loss_disc_real_0: 0.25564  (0.25564)
     | > loss_disc_real_1: 0.19357  (0.19357)
     | > loss_disc_real_2: 0.21678  (0.21678)
     | > loss_disc_real_3: 0.22937  (0.22937)
     | > loss_disc_real_4: 0.25505  (0.25505)
     | > loss_disc_real_5: 0.24126  (0.24126)
     | > loss_0: 2.62910  (2.62910)
     | > grad_norm_0: 118.69462  (118.69462)
     | > loss_gen: 2.07887  (2.07887)
     | > loss_kl: 1.10175  (1.10175)
     | > loss_feat: 3.06786  (3.06786)
     | > loss_mel: 18.93177  (18.93177)
     | > loss_duration: 1.56222  (1.56222)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 26.74248  (26.74248)
     | > grad_norm_1: 996.03790  (996.03790)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.56820  (3.56825)
     | > loader_time: 24.13310  (24.13307)


   --> STEP: 25/80 -- GLOBAL_STEP: 18025
     | > loss_disc: 2.70882  (2.70



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00100)
     | > avg_loss_disc: 2.93662 (+0.16030)
     | > avg_loss_disc_real_0: 0.38258 (+0.02627)
     | > avg_loss_disc_real_1: 0.21776 (-0.03434)
     | > avg_loss_disc_real_2: 0.21217 (-0.04952)
     | > avg_loss_disc_real_3: 0.26189 (+0.00604)
     | > avg_loss_disc_real_4: 0.29252 (+0.03588)
     | > avg_loss_disc_real_5: 0.27867 (+0.00705)
     | > avg_loss_0: 2.93662 (+0.16030)
     | > avg_loss_gen: 1.94436 (-0.18491)
     | > avg_loss_kl: 1.29551 (-0.30242)
     | > avg_loss_feat: 2.19202 (+0.35186)
     | > avg_loss_mel: 17.48458 (+2.05783)
     | > avg_loss_duration: 1.98558 (-0.04696)
     | > avg_loss_1: 24.90204 (+1.87539)


[4m[1m > EPOCH: 226/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-24 00:23:23)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 18100
     | > loss_disc: 2.67411  (2.73005)
     | > loss_disc_real_0: 0.24549  (0.19630)
     | > loss_disc_real_1: 0.22996  (0.24068)
     | > loss_disc_real_2: 0.18293  (0.22656)
     | > loss_disc_real_3: 0.22553  (0.23567)
     | > loss_disc_real_4: 0.27576  (0.24164)
     | > loss_disc_real_5: 0.24706  (0.24589)
     | > loss_0: 2.67411  (2.73005)
     | > grad_norm_0: 32.41865  (24.42898)
     | > loss_gen: 1.85953  (1.90577)
     | > loss_kl: 1.32637  (1.34401)
     | > loss_feat: 2.76546  (2.88423)
     | > loss_mel: 18.46625  (18.52309)
     | > loss_duration: 1.47299  (1.46156)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 25.89060  (26.11867)
     | > grad_norm_1: 190.61325  (257.54974)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.62530  (3.58396)
     | > loader_time: 0.00900  (0.00891)


   --> STEP: 45/80 -- GLOBAL_STEP: 18125
     | > loss_disc: 2.74716  (2.70770



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00801 (-0.00200)
     | > avg_loss_disc: 2.82725 (-0.10938)
     | > avg_loss_disc_real_0: 0.32301 (-0.05957)
     | > avg_loss_disc_real_1: 0.23762 (+0.01986)
     | > avg_loss_disc_real_2: 0.21073 (-0.00145)
     | > avg_loss_disc_real_3: 0.26251 (+0.00062)
     | > avg_loss_disc_real_4: 0.25510 (-0.03743)
     | > avg_loss_disc_real_5: 0.25485 (-0.02381)
     | > avg_loss_0: 2.82725 (-0.10938)
     | > avg_loss_gen: 1.94121 (-0.00315)
     | > avg_loss_kl: 1.63307 (+0.33756)
     | > avg_loss_feat: 2.72971 (+0.53769)
     | > avg_loss_mel: 19.49526 (+2.01068)
     | > avg_loss_duration: 2.05033 (+0.06475)
     | > avg_loss_1: 27.84958 (+2.94753)


[4m[1m > EPOCH: 227/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-24 00:28:59)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 18175
     | > loss_disc: 2.69216  (2.72138)
     | > loss_disc_real_0: 0.29600  (0.20652)
     | > loss_disc_real_1: 0.22067  (0.23226)
     | > loss_disc_real_2: 0.20569  (0.22437)
     | > loss_disc_real_3: 0.21798  (0.23760)
     | > loss_disc_real_4: 0.21701  (0.24465)
     | > loss_disc_real_5: 0.24062  (0.24581)
     | > loss_0: 2.69216  (2.72138)
     | > grad_norm_0: 116.50609  (55.99032)
     | > loss_gen: 1.95444  (1.90444)
     | > loss_kl: 1.41229  (1.25978)
     | > loss_feat: 2.93804  (2.86268)
     | > loss_mel: 18.13528  (18.45464)
     | > loss_duration: 1.47927  (1.46500)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 25.91932  (25.94653)
     | > grad_norm_1: 1200.53528  (520.97192)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.62730  (3.57679)
     | > loader_time: 0.00900  (0.00854)


   --> STEP: 40/80 -- GLOBAL_STEP: 18200
     | > loss_disc: 2.69081  (2.704



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00200)
     | > avg_loss_disc: 2.86766 (+0.04041)
     | > avg_loss_disc_real_0: 0.19799 (-0.12502)
     | > avg_loss_disc_real_1: 0.29841 (+0.06079)
     | > avg_loss_disc_real_2: 0.27711 (+0.06638)
     | > avg_loss_disc_real_3: 0.25633 (-0.00618)
     | > avg_loss_disc_real_4: 0.25048 (-0.00462)
     | > avg_loss_disc_real_5: 0.26016 (+0.00531)
     | > avg_loss_0: 2.86766 (+0.04041)
     | > avg_loss_gen: 1.84611 (-0.09509)
     | > avg_loss_kl: 1.37284 (-0.26023)
     | > avg_loss_feat: 2.31239 (-0.41732)
     | > avg_loss_mel: 18.42904 (-1.06622)
     | > avg_loss_duration: 1.98741 (-0.06292)
     | > avg_loss_1: 25.94779 (-1.90179)


[4m[1m > EPOCH: 228/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-24 00:34:35)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 18250
     | > loss_disc: 2.77686  (2.70367)
     | > loss_disc_real_0: 0.30713  (0.21251)
     | > loss_disc_real_1: 0.25519  (0.22312)
     | > loss_disc_real_2: 0.28126  (0.23079)
     | > loss_disc_real_3: 0.24221  (0.23910)
     | > loss_disc_real_4: 0.24898  (0.23972)
     | > loss_disc_real_5: 0.26228  (0.24444)
     | > loss_0: 2.77686  (2.70367)
     | > grad_norm_0: 72.18379  (70.86509)
     | > loss_gen: 1.92581  (1.92730)
     | > loss_kl: 1.30684  (1.26308)
     | > loss_feat: 2.77701  (2.86895)
     | > loss_mel: 17.96102  (18.48755)
     | > loss_duration: 1.48340  (1.47191)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 25.45408  (26.01879)
     | > grad_norm_1: 506.16187  (521.97327)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.57130  (3.56315)
     | > loader_time: 0.00900  (0.00841)


   --> STEP: 35/80 -- GLOBAL_STEP: 18275
     | > loss_disc: 2.70256  (2.72847



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.01001 (+0.00000)
     | > avg_loss_disc: 2.90807 (+0.04041)
     | > avg_loss_disc_real_0: 0.18370 (-0.01428)
     | > avg_loss_disc_real_1: 0.20484 (-0.09357)
     | > avg_loss_disc_real_2: 0.28957 (+0.01247)
     | > avg_loss_disc_real_3: 0.23381 (-0.02252)
     | > avg_loss_disc_real_4: 0.28401 (+0.03353)
     | > avg_loss_disc_real_5: 0.25369 (-0.00647)
     | > avg_loss_0: 2.90807 (+0.04041)
     | > avg_loss_gen: 1.65709 (-0.18902)
     | > avg_loss_kl: 1.42770 (+0.05485)
     | > avg_loss_feat: 1.95511 (-0.35727)
     | > avg_loss_mel: 17.54964 (-0.87940)
     | > avg_loss_duration: 2.03173 (+0.04432)
     | > avg_loss_1: 24.62128 (-1.32651)


[4m[1m > EPOCH: 229/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-24 00:40:11)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 18325
     | > loss_disc: 2.68822  (2.67727)
     | > loss_disc_real_0: 0.22090  (0.18962)
     | > loss_disc_real_1: 0.26716  (0.24244)
     | > loss_disc_real_2: 0.26618  (0.24373)
     | > loss_disc_real_3: 0.24794  (0.23705)
     | > loss_disc_real_4: 0.25344  (0.23556)
     | > loss_disc_real_5: 0.24574  (0.24582)
     | > loss_0: 2.68822  (2.67727)
     | > grad_norm_0: 66.55827  (44.41604)
     | > loss_gen: 1.80202  (1.95848)
     | > loss_kl: 1.54629  (1.21866)
     | > loss_feat: 2.94782  (2.96447)
     | > loss_mel: 18.79216  (18.70093)
     | > loss_duration: 1.47173  (1.47327)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 26.56001  (26.31579)
     | > grad_norm_1: 657.56580  (609.68378)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.56230  (3.53793)
     | > loader_time: 0.00900  (0.00820)


   --> STEP: 30/80 -- GLOBAL_STEP: 18350
     | > loss_disc: 2.70330  (2.72861)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00100)
     | > avg_loss_disc: 2.76677 (-0.14129)
     | > avg_loss_disc_real_0: 0.33387 (+0.15017)
     | > avg_loss_disc_real_1: 0.19075 (-0.01409)
     | > avg_loss_disc_real_2: 0.19268 (-0.09689)
     | > avg_loss_disc_real_3: 0.24653 (+0.01272)
     | > avg_loss_disc_real_4: 0.23374 (-0.05027)
     | > avg_loss_disc_real_5: 0.20863 (-0.04506)
     | > avg_loss_0: 2.76677 (-0.14129)
     | > avg_loss_gen: 1.90766 (+0.25057)
     | > avg_loss_kl: 1.71758 (+0.28988)
     | > avg_loss_feat: 2.39912 (+0.44401)
     | > avg_loss_mel: 18.72949 (+1.17985)
     | > avg_loss_duration: 2.04269 (+0.01095)
     | > avg_loss_1: 26.79654 (+2.17527)


[4m[1m > EPOCH: 230/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-24 00:45:47)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 18400
     | > loss_disc: 2.85038  (2.85038)
     | > loss_disc_real_0: 0.35880  (0.35880)
     | > loss_disc_real_1: 0.19333  (0.19333)
     | > loss_disc_real_2: 0.20011  (0.20011)
     | > loss_disc_real_3: 0.25443  (0.25443)
     | > loss_disc_real_4: 0.22954  (0.22954)
     | > loss_disc_real_5: 0.21231  (0.21231)
     | > loss_0: 2.85038  (2.85038)
     | > grad_norm_0: 41.72511  (41.72511)
     | > loss_gen: 1.96151  (1.96151)
     | > loss_kl: 1.02410  (1.02410)
     | > loss_feat: 2.77996  (2.77996)
     | > loss_mel: 18.51449  (18.51449)
     | > loss_duration: 1.48175  (1.48175)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 25.76180  (25.76180)
     | > grad_norm_1: 535.27728  (535.27728)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.58930  (3.58927)
     | > loader_time: 23.65420  (23.65425)


   --> STEP: 25/80 -- GLOBAL_STEP: 18425
     | > loss_disc: 2.70703  (2.7425



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00900 (-0.00000)
     | > avg_loss_disc: 2.80941 (+0.04264)
     | > avg_loss_disc_real_0: 0.18323 (-0.15064)
     | > avg_loss_disc_real_1: 0.22008 (+0.02933)
     | > avg_loss_disc_real_2: 0.22305 (+0.03037)
     | > avg_loss_disc_real_3: 0.21182 (-0.03471)
     | > avg_loss_disc_real_4: 0.23559 (+0.00185)
     | > avg_loss_disc_real_5: 0.23871 (+0.03007)
     | > avg_loss_0: 2.80941 (+0.04264)
     | > avg_loss_gen: 1.68997 (-0.21769)
     | > avg_loss_kl: 1.46107 (-0.25650)
     | > avg_loss_feat: 2.91622 (+0.51710)
     | > avg_loss_mel: 19.23245 (+0.50295)
     | > avg_loss_duration: 2.03197 (-0.01072)
     | > avg_loss_1: 27.33169 (+0.53514)


[4m[1m > EPOCH: 231/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-24 00:51:23)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 20/80 -- GLOBAL_STEP: 18500
     | > loss_disc: 2.67548  (2.70139)
     | > loss_disc_real_0: 0.16397  (0.19790)
     | > loss_disc_real_1: 0.24345  (0.22990)
     | > loss_disc_real_2: 0.16742  (0.22688)
     | > loss_disc_real_3: 0.22216  (0.23428)
     | > loss_disc_real_4: 0.25193  (0.24657)
     | > loss_disc_real_5: 0.25512  (0.24593)
     | > loss_0: 2.67548  (2.70139)
     | > grad_norm_0: 66.77646  (69.46798)
     | > loss_gen: 1.98780  (1.96341)
     | > loss_kl: 1.26885  (1.34329)
     | > loss_feat: 2.97098  (2.99813)
     | > loss_mel: 17.63737  (18.59503)
     | > loss_duration: 1.50106  (1.46474)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 25.36606  (26.36460)
     | > grad_norm_1: 666.02686  (614.37579)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.63530  (3.58517)
     | > loader_time: 0.01000  (0.00881)


   --> STEP: 45/80 -- GLOBAL_STEP: 18525
     | > loss_disc: 2.77125  (2.71418



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.69265 (-0.11676)
     | > avg_loss_disc_real_0: 0.12710 (-0.05613)
     | > avg_loss_disc_real_1: 0.21485 (-0.00523)
     | > avg_loss_disc_real_2: 0.27512 (+0.05207)
     | > avg_loss_disc_real_3: 0.25821 (+0.04639)
     | > avg_loss_disc_real_4: 0.23493 (-0.00066)
     | > avg_loss_disc_real_5: 0.26283 (+0.02413)
     | > avg_loss_0: 2.69265 (-0.11676)
     | > avg_loss_gen: 1.83062 (+0.14065)
     | > avg_loss_kl: 1.45659 (-0.00448)
     | > avg_loss_feat: 2.65001 (-0.26621)
     | > avg_loss_mel: 18.64543 (-0.58702)
     | > avg_loss_duration: 1.98035 (-0.05162)
     | > avg_loss_1: 26.56301 (-0.76868)


[4m[1m > EPOCH: 232/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-24 00:56:59)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 15/80 -- GLOBAL_STEP: 18575
     | > loss_disc: 2.78870  (2.72439)
     | > loss_disc_real_0: 0.18974  (0.21277)
     | > loss_disc_real_1: 0.25278  (0.22316)
     | > loss_disc_real_2: 0.22439  (0.22826)
     | > loss_disc_real_3: 0.25110  (0.23407)
     | > loss_disc_real_4: 0.23936  (0.24071)
     | > loss_disc_real_5: 0.23484  (0.24662)
     | > loss_0: 2.78870  (2.72439)
     | > grad_norm_0: 55.10986  (69.18109)
     | > loss_gen: 1.99575  (1.93392)
     | > loss_kl: 1.59911  (1.36126)
     | > loss_feat: 2.85352  (2.89298)
     | > loss_mel: 18.40900  (18.52712)
     | > loss_duration: 1.40483  (1.44820)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 26.26220  (26.16348)
     | > grad_norm_1: 484.19925  (602.07788)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.62530  (3.57812)
     | > loader_time: 0.00900  (0.00874)


   --> STEP: 40/80 -- GLOBAL_STEP: 18600
     | > loss_disc: 2.66896  (2.71980



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (+0.00000)
     | > avg_loss_disc: 2.64393 (-0.04872)
     | > avg_loss_disc_real_0: 0.18222 (+0.05512)
     | > avg_loss_disc_real_1: 0.27478 (+0.05993)
     | > avg_loss_disc_real_2: 0.23552 (-0.03960)
     | > avg_loss_disc_real_3: 0.27780 (+0.01960)
     | > avg_loss_disc_real_4: 0.26524 (+0.03031)
     | > avg_loss_disc_real_5: 0.25489 (-0.00795)
     | > avg_loss_0: 2.64393 (-0.04872)
     | > avg_loss_gen: 2.10566 (+0.27504)
     | > avg_loss_kl: 1.80176 (+0.34517)
     | > avg_loss_feat: 2.68259 (+0.03258)
     | > avg_loss_mel: 18.03886 (-0.60657)
     | > avg_loss_duration: 2.01306 (+0.03271)
     | > avg_loss_1: 26.64193 (+0.07893)


[4m[1m > EPOCH: 233/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-24 01:02:36)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 10/80 -- GLOBAL_STEP: 18650
     | > loss_disc: 2.67437  (2.69621)
     | > loss_disc_real_0: 0.21890  (0.19087)
     | > loss_disc_real_1: 0.21556  (0.22250)
     | > loss_disc_real_2: 0.18917  (0.22638)
     | > loss_disc_real_3: 0.27813  (0.23996)
     | > loss_disc_real_4: 0.23162  (0.23975)
     | > loss_disc_real_5: 0.24241  (0.24966)
     | > loss_0: 2.67437  (2.69621)
     | > grad_norm_0: 45.30297  (47.00296)
     | > loss_gen: 1.96418  (1.93877)
     | > loss_kl: 1.24874  (1.35418)
     | > loss_feat: 2.92401  (2.97949)
     | > loss_mel: 18.65106  (18.56644)
     | > loss_duration: 1.42147  (1.45633)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 26.20946  (26.29520)
     | > grad_norm_1: 517.35101  (483.70612)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.60230  (3.56274)
     | > loader_time: 0.00900  (0.00871)


   --> STEP: 35/80 -- GLOBAL_STEP: 18675
     | > loss_disc: 2.77236  (2.71994



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00000)
     | > avg_loss_disc: 2.65026 (+0.00633)
     | > avg_loss_disc_real_0: 0.11065 (-0.07158)
     | > avg_loss_disc_real_1: 0.20804 (-0.06674)
     | > avg_loss_disc_real_2: 0.25725 (+0.02173)
     | > avg_loss_disc_real_3: 0.20025 (-0.07755)
     | > avg_loss_disc_real_4: 0.25074 (-0.01450)
     | > avg_loss_disc_real_5: 0.25019 (-0.00470)
     | > avg_loss_0: 2.65026 (+0.00633)
     | > avg_loss_gen: 1.81285 (-0.29281)
     | > avg_loss_kl: 1.57697 (-0.22479)
     | > avg_loss_feat: 3.02818 (+0.34559)
     | > avg_loss_mel: 18.81209 (+0.77323)
     | > avg_loss_duration: 2.02575 (+0.01269)
     | > avg_loss_1: 27.25584 (+0.61390)


[4m[1m > EPOCH: 234/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-24 01:08:11)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 5/80 -- GLOBAL_STEP: 18725
     | > loss_disc: 2.69117  (2.68112)
     | > loss_disc_real_0: 0.19809  (0.21334)
     | > loss_disc_real_1: 0.21381  (0.22162)
     | > loss_disc_real_2: 0.16317  (0.21700)
     | > loss_disc_real_3: 0.25766  (0.24384)
     | > loss_disc_real_4: 0.25317  (0.23998)
     | > loss_disc_real_5: 0.25501  (0.24867)
     | > loss_0: 2.69117  (2.68112)
     | > grad_norm_0: 41.09038  (52.91206)
     | > loss_gen: 1.87524  (1.92238)
     | > loss_kl: 1.34350  (1.28926)
     | > loss_feat: 2.82058  (2.98611)
     | > loss_mel: 18.10690  (18.42022)
     | > loss_duration: 1.47239  (1.46035)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 25.61861  (26.07831)
     | > grad_norm_1: 583.46613  (711.34644)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.57530  (3.54223)
     | > loader_time: 0.00800  (0.00800)


   --> STEP: 30/80 -- GLOBAL_STEP: 18750
     | > loss_disc: 2.72111  (2.72025)



> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 25
 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 36
 | > Avg text length: 49.64
 | 
 | > Max audio length: 146689.0
 | > Min audio length: 45363.0
 | > Avg audio length: 68484.4
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.
 | > Synthesizing test sentences.



  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00901 (-0.00000)
     | > avg_loss_disc: 2.66726 (+0.01700)
     | > avg_loss_disc_real_0: 0.20332 (+0.09268)
     | > avg_loss_disc_real_1: 0.22393 (+0.01589)
     | > avg_loss_disc_real_2: 0.27154 (+0.01429)
     | > avg_loss_disc_real_3: 0.24568 (+0.04543)
     | > avg_loss_disc_real_4: 0.26410 (+0.01336)
     | > avg_loss_disc_real_5: 0.26303 (+0.01284)
     | > avg_loss_0: 2.66726 (+0.01700)
     | > avg_loss_gen: 2.00895 (+0.19610)
     | > avg_loss_kl: 1.65556 (+0.07859)
     | > avg_loss_feat: 2.61618 (-0.41200)
     | > avg_loss_mel: 17.88013 (-0.93196)
     | > avg_loss_duration: 2.00462 (-0.02112)
     | > avg_loss_1: 26.16546 (-1.09038)


[4m[1m > EPOCH: 235/1000[0m
 --> ./output\vits_vctk-September-23-2022_02+46AM-3c624ce




> DataLoader initialization
| > Tokenizer:
	| > add_blank: True
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en
		| > phoneme backend: gruut
	| > 1 not found characters:
	| > ͡
| > Number of instances : 2555



 > TRAINING (2022-09-24 01:13:47)


 | > Preprocessing samples
 | > Max text length: 81
 | > Min text length: 16
 | > Avg text length: 49.37025440313111
 | 
 | > Max audio length: 171808.0
 | > Min audio length: 21798.0
 | > Avg audio length: 67219.45401174168
 | > Num. instances discarded samples: 0
 | > Batch group size: 160.



   --> STEP: 0/80 -- GLOBAL_STEP: 18800
     | > loss_disc: 2.70787  (2.70787)
     | > loss_disc_real_0: 0.18303  (0.18303)
     | > loss_disc_real_1: 0.20511  (0.20511)
     | > loss_disc_real_2: 0.25571  (0.25571)
     | > loss_disc_real_3: 0.23978  (0.23978)
     | > loss_disc_real_4: 0.23067  (0.23067)
     | > loss_disc_real_5: 0.23985  (0.23985)
     | > loss_0: 2.70787  (2.70787)
     | > grad_norm_0: 159.85724  (159.85724)
     | > loss_gen: 1.83592  (1.83592)
     | > loss_kl: 1.19972  (1.19972)
     | > loss_feat: 3.00049  (3.00049)
     | > loss_mel: 18.62750  (18.62750)
     | > loss_duration: 1.45286  (1.45286)
     | > amp_scaler: 128.00000  (128.00000)
     | > loss_1: 26.11650  (26.11650)
     | > grad_norm_1: 910.97003  (910.97003)
     | > current_lr_0: 0.00019 
     | > current_lr_1: 0.00019 
     | > step_time: 3.54620  (3.54623)
     | > loader_time: 23.70830  (23.70827)


   --> STEP: 25/80 -- GLOBAL_STEP: 18825
     | > loss_disc: 2.64383  (2.70
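
The logged totals can be cross-checked by hand. In both the step logs and the EVAL PERFORMANCE blocks above, `loss_1` (the generator objective) appears to be the plain sum of the five generator terms: `loss_gen`, `loss_kl`, `loss_feat`, `loss_mel` and `loss_duration`. Meanwhile, `loss_0` matches `loss_disc`. The sketch below checks this using numbers copied from the log. It only confirms what the printed values imply and is not taken from the Coqui TTS source, so it assumes any per-term loss weights are already folded into the logged values.

```python
# Generator-loss terms copied from the training log above.
# GLOBAL_STEP 18000 (epoch 225, step 0), training batch:
step_18000 = {
    "loss_gen": 2.07887,
    "loss_kl": 1.10175,
    "loss_feat": 3.06786,
    "loss_mel": 18.93177,
    "loss_duration": 1.56222,
}
logged_step_loss_1 = 26.74248

# First EVAL PERFORMANCE block in this excerpt (the one followed by best_model_18000.pth):
eval_block = {
    "avg_loss_gen": 2.12927,
    "avg_loss_kl": 1.59793,
    "avg_loss_feat": 1.84016,
    "avg_loss_mel": 15.42675,
    "avg_loss_duration": 2.03254,
}
logged_eval_loss_1 = 23.02665

step_sum = sum(step_18000.values())
eval_sum = sum(eval_block.values())

# Any difference comes only from the 5-decimal rounding of the printed log.
print(f"step 18000: sum={step_sum:.5f}  logged loss_1={logged_step_loss_1:.5f}")
print(f"eval:       sum={eval_sum:.5f}  logged avg_loss_1={logged_eval_loss_1:.5f}")
```

`loss_mel` is by far the largest term (about 15 to 19 in these epochs), so it dominates the total. In every eval block shown above except the one for epoch 232, `avg_loss_1` moves in the same direction as `avg_loss_mel`.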

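The step counters are also consistent with the configuration. The config cell sets `batch_size=32`, and the DataLoader reports 2,555 training instances. One pass over the data therefore takes ceil(2555 / 32) = 80 steps, with a final partial batch of 27 samples. This matches `STEP: x/80`, and GLOBAL_STEP equals `epoch * 80 + step` (epoch 225 starts at GLOBAL_STEP 18000).

The `TRAINING` timestamps also let you estimate how long the 1000-epoch run still needs. A small sketch follows. It assumes 0-based epoch numbering (epochs 0 to 999) and that later epochs take about as long as these ten.

```python
import math
from datetime import datetime

# From this notebook: batch_size=32 in the config cell, 2,555 training
# instances reported by the DataLoader, and 1000 epochs from "EPOCH: n/1000".
NUM_TRAIN_SAMPLES = 2555
BATCH_SIZE = 32
TOTAL_EPOCHS = 1000

# The last batch is partial (2555 = 79 * 32 + 27), so round up.
steps_per_epoch = math.ceil(NUM_TRAIN_SAMPLES / BATCH_SIZE)


def global_step(epoch: int, step: int) -> int:
    """GLOBAL_STEP for a 0-based epoch index and in-epoch step, as seen in the log."""
    return epoch * steps_per_epoch + step


# "TRAINING (...)" timestamps printed at the start of epochs 225 and 235.
start_225 = datetime(2022, 9, 24, 0, 17, 46)
start_235 = datetime(2022, 9, 24, 1, 13, 47)
# Wall-clock time per epoch, including the evaluation pass.
seconds_per_epoch = (start_235 - start_225).total_seconds() / (235 - 225)

# Assumes 0-based epochs, so epochs 235..999 are still to run.
remaining_epochs = TOTAL_EPOCHS - 235
eta_hours = remaining_epochs * seconds_per_epoch / 3600

print("steps per epoch:", steps_per_epoch)
print("GLOBAL_STEP at epoch 229, step 5:", global_step(229, 5))
print(f"{seconds_per_epoch:.1f} s per epoch, about {eta_hours:.1f} h remaining")
```

At about 336 s (5.6 minutes) per epoch, the remaining 765 epochs work out to roughly 71 hours, or about three more days of training on this machine.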
In [13]:
%load_ext tensorboard

In [14]:
%tensorboard --logdir="./output"
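
TensorBoard reads the event files the trainer writes under `./output`. If all you have is the captured console text, such as the log printed by the training cell above, a short standard-library parser can recover the per-epoch EVAL PERFORMANCE numbers.

The sketch below is hypothetical: `parse_eval_metrics` and `best_epoch` are not part of Coqui TTS. It strips ANSI color codes, including the bare `[92m`-style fragments left behind when the escape byte is lost in export. It then ranks epochs by eval `avg_loss_1`. The trainer's own BEST MODEL selection may use a different criterion, so treat this as a rough cross-check rather than a replacement.

```python
from __future__ import annotations

import re

# ANSI SGR color codes such as "\x1b[92m". The ESC byte is optional because
# exported notebook logs often keep only the "[92m" part.
ANSI_RE = re.compile(r"\x1b?\[[0-9;]*m")
EPOCH_RE = re.compile(r"> EPOCH: (\d+)/\d+")
METRIC_RE = re.compile(r"\| > (avg_\w+):\s*(-?[\d.]+)")


def parse_eval_metrics(log_text: str) -> dict[int, dict[str, float]]:
    """Return {epoch: {metric_name: value}} for every EVAL PERFORMANCE block.

    Metrics are attributed to the most recent "> EPOCH: n/N" header, because
    the eval results are printed at the end of each epoch in this log.
    """
    results: dict[int, dict[str, float]] = {}
    epoch = None
    for raw_line in log_text.splitlines():
        line = ANSI_RE.sub("", raw_line)
        epoch_match = EPOCH_RE.search(line)
        if epoch_match:
            epoch = int(epoch_match.group(1))
            continue
        metric_match = METRIC_RE.search(line)
        if metric_match and epoch is not None:
            name, value = metric_match.group(1), float(metric_match.group(2))
            results.setdefault(epoch, {})[name] = value
    return results


def best_epoch(metrics: dict[int, dict[str, float]], key: str = "avg_loss_1") -> int:
    """Epoch with the lowest value of `key` among epochs that logged it."""
    candidates = {e: m[key] for e, m in metrics.items() if key in m}
    return min(candidates, key=candidates.get)


# Two eval blocks copied from the log above, ANSI debris included.
sample_log = """\
[4m[1m > EPOCH: 227/1000[0m
  [1m--> EVAL PERFORMANCE[0m
     | > avg_loss_mel:[92m 18.42904 [0m(-1.06622)
     | > avg_loss_1:[92m 25.94779 [0m(-1.90179)

[4m[1m > EPOCH: 228/1000[0m
  [1m--> EVAL PERFORMANCE[0m
     | > avg_loss_mel:[92m 17.54964 [0m(-0.87940)
     | > avg_loss_1:[92m 24.62128 [0m(-1.32651)
"""

metrics = parse_eval_metrics(sample_log)
for epoch, values in sorted(metrics.items()):
    print(epoch, values)
print("lowest eval avg_loss_1:", best_epoch(metrics))
```

To use it on a full run, save the training cell's output to a text file and pass in its contents, for example `parse_eval_metrics(open(path, encoding="utf-8").read())`, where `path` is whatever file you saved it to.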