# VITS Training

This notebook is a guide to training VITS as part of the TTS pipeline. It contains the following sections:

  1. VITS and NeMo - An introduction to the VITS model
  2. LJSpeech - How to train VITS on LJSpeech

# License

> Copyright 2023, NVIDIA CORPORATION & AFFILIATES. All Rights Reserved.
> 
> Licensed under the Apache License, Version 2.0 (the "License");
> you may not use this file except in compliance with the License.
> You may obtain a copy of the License at
> 
>     http://www.apache.org/licenses/LICENSE-2.0
> 
> Unless required by applicable law or agreed to in writing, software
> distributed under the License is distributed on an "AS IS" BASIS,
> WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> See the License for the specific language governing permissions and
> limitations under the License.

# VITS and NeMo

VITS is a neural network that converts text into an audio waveform. For more details on the model, please refer to NVIDIA's [VITS Model Card](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/tts_en_lj_vits), or the original [paper](https://arxiv.org/abs/2106.06103).

Like most NeMo models, VITS is defined as a LightningModule, which allows easy training via PyTorch Lightning. It is parameterized by a configuration, currently defined in a YAML file and loaded using Hydra.

Let's take a look at NeMo's pretrained model and how to use it to generate audio.

In [None]:
# Load the VITSModel
from nemo.collections.tts.models import VitsModel
from nemo.collections.tts.models.base import TextToWaveform

# Let's see what pretrained models are available
print(VitsModel.list_available_models())

In [None]:
# We can load the pre-trained model as follows
model = VitsModel.from_pretrained("tts_en_lj_vits")

In [None]:
# VITS is a TextToWaveform
assert isinstance(model, TextToWaveform)

TextToWaveform models in NeMo have two helper functions:
  1. `parse(self, str_input: str)`, which takes an English string and produces a token tensor
  2. `convert_text_to_waveform(self, *, tokens)`, which takes the token tensor and generates an audio sample

Let's try it out
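As a minimal sketch of the two helpers working together, the function below assumes only an object exposing `parse` and `convert_text_to_waveform` as listed above, so it can be read (and tested) without downloading a checkpoint. `synthesize_to_wav` is a hypothetical helper, and writing 16-bit PCM with Python's standard `wave` module is our own choice rather than a NeMo API.

```python
import wave

import numpy as np


def synthesize_to_wav(model, text, out_path, sample_rate):
    """Hypothetical helper: text -> tokens -> waveform -> mono 16-bit WAV file.

    Assumes `model` exposes `parse(text)` and `convert_text_to_waveform(tokens=...)`.
    Returns the number of samples written.
    """
    tokens = model.parse(text)
    audio = model.convert_text_to_waveform(tokens=tokens)
    # Accept a torch tensor or a NumPy array; move tensors to the CPU first.
    if hasattr(audio, "detach"):
        audio = audio.detach().cpu().numpy()
    # Flatten a [1, T] batch to [T]; only batch size 1 is handled here.
    audio = np.asarray(audio, dtype=np.float32).reshape(-1)
    # Clip to [-1, 1] and scale to the 16-bit integer range (little-endian, as WAV expects).
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(out_path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes = 16 bits per sample
        f.setframerate(sample_rate)
        f.writeframes(pcm.tobytes())
    return len(pcm)
```

With NeMo installed, it could be called as `synthesize_to_wav(model, "Hello world.", "hello.wav", 22050)` on the `model` loaded above, ideally after `model.eval()` and inside `torch.no_grad()`. The 22050 Hz rate is an assumption based on `tts_en_lj_vits` being trained on LJSpeech; use your own model's sample rate otherwise.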

# Training

Now that we have looked at the VITS model, let's see how to train one.



In [None]:
!wget https://raw.githubusercontent.com/NVIDIA/NeMo/main/examples/tts/vits.py
!(mkdir -p conf \
  && cd conf \
  && wget https://raw.githubusercontent.com/NVIDIA/NeMo/main/examples/tts/conf/vits.yaml \
  && cd ..)

Now that we have the training script and its configuration, we can try training VITS!

Note that a small dataset is not enough to fully train a VITS model. The following commands illustrate how the training pipeline works; replace the manifest paths with your own.

In [None]:
!(CUDA_VISIBLE_DEVICES=0 python vits.py \
    model.sample_rate=16000 \
    train_dataset=/content/drive/MyDrive/datasetTTS_malavoglia/train_manifest_vits.json \
    validation_datasets=/content/drive/MyDrive/datasetTTS_malavoglia/val_manifest_vits.json \
    trainer.strategy='ddp_find_unused_parameters_true' \
    trainer.check_val_every_n_epoch=10 \
    +init_from_ptl_ckpt=/content/drive/MyDrive/VITS22.1797.ckpt)

In [None]:
!(CUDA_VISIBLE_DEVICES=0 python vits.py \
    sample_rate=16000 \
    train_dataset=/home/giacomo/dataset/MAILABS/it_IT/by_book/female/lisa_caputo/malavoglia/train_manifest_vits.json \
    validation_datasets=/home/giacomo/dataset/MAILABS/it_IT/by_book/female/lisa_caputo/malavoglia/val_manifest_vits.json \
    trainer.strategy='ddp_find_unused_parameters_true' \
    trainer.check_val_every_n_epoch=5)
# To initialize from existing weights, append one of these overrides to the command above:
#   +init_from_nemo_model=./mymodel.nemo
#   +init_from_pretrained_model="tts_en_lj_vits"
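Before launching a long run, it can be worth sanity-checking the manifests passed as `train_dataset` and `validation_datasets`. The sketch below is a hypothetical pre-flight check, not part of NeMo: it assumes the one-JSON-object-per-line format described in the Training Data section below, and its `min_duration` default simply mirrors the dataset config's `min_duration: 0.1`.

```python
import json
import os


def check_manifest(manifest_path, min_duration=0.1, max_duration=None):
    """Hypothetical pre-flight check of a JSON-lines TTS manifest.

    Returns a list of (line_number, problem) tuples; an empty list means
    nothing was flagged.
    """
    problems = []
    with open(manifest_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            try:
                entry = json.loads(line)
            except json.JSONDecodeError as err:
                problems.append((line_no, f"invalid JSON: {err.msg}"))
                continue
            if not isinstance(entry, dict):
                problems.append((line_no, "entry is not a JSON object"))
                continue
            # Every entry needs these three keys.
            for key in ("audio_filepath", "text", "duration"):
                if key not in entry:
                    problems.append((line_no, f"missing key '{key}'"))
            path = entry.get("audio_filepath")
            if path is not None and not os.path.isfile(path):
                problems.append((line_no, f"audio file not found: {path}"))
            duration = entry.get("duration")
            if isinstance(duration, (int, float)):
                if duration < min_duration:
                    problems.append((line_no, f"duration {duration}s < {min_duration}s"))
                if max_duration is not None and duration > max_duration:
                    problems.append((line_no, f"duration {duration}s > {max_duration}s"))
    return problems
```

For example, `for line_no, msg in check_manifest("train_manifest_vits.json"): print(line_no, msg)` lists anything suspicious before training starts.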

# Training Data

In order to train VITS, it is highly recommended to obtain high-quality speech data with the following properties:
  - Sampling rate of 22050 Hz or higher
  - Speech should contain a wide variety of phonemes
  - Audio split into segments of 1 to 10 seconds
  - Audio segments should not have silence at the beginning or end
  - Audio segments should not contain long silences inside
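Several of the properties above can be checked heuristically per file. The sketch below assumes 16-bit PCM WAV input and uses only `wave` and NumPy; `check_wav` is a hypothetical helper, and the amplitude threshold and maximum edge-silence length are illustrative guesses, not NeMo requirements. It does not detect long pauses inside a segment.

```python
import wave

import numpy as np


def check_wav(path, min_sr=22050, min_sec=1.0, max_sec=10.0,
              silence_threshold=0.01, max_edge_silence=0.3):
    """Heuristic check of one 16-bit PCM WAV file against the list above.

    Returns a list of warnings (empty if nothing was flagged). The thresholds
    are illustrative defaults, not values required by NeMo.
    """
    with wave.open(path, "rb") as f:
        sr = f.getframerate()
        n_channels = f.getnchannels()
        sample_width = f.getsampwidth()
        frames = f.readframes(f.getnframes())
    if sample_width != 2:
        return [f"unsupported sample width: {8 * sample_width} bits (expected 16)"]

    # Interleaved little-endian int16 -> float in [-1, 1), averaged to mono.
    audio = np.frombuffer(frames, dtype="<i2").astype(np.float32) / 32768.0
    audio = audio.reshape(-1, n_channels).mean(axis=1)
    duration = len(audio) / sr

    warnings = []
    if sr < min_sr:
        warnings.append(f"sample rate {sr} Hz is below {min_sr} Hz")
    if not min_sec <= duration <= max_sec:
        warnings.append(f"duration {duration:.2f}s is outside [{min_sec}, {max_sec}] s")

    # Samples louder than the threshold count as non-silent.
    loud = np.flatnonzero(np.abs(audio) > silence_threshold)
    if loud.size == 0:
        warnings.append("file appears to be silent")
    else:
        leading = loud[0] / sr
        trailing = (len(audio) - 1 - loud[-1]) / sr
        if leading > max_edge_silence:
            warnings.append(f"{leading:.2f}s of leading silence")
        if trailing > max_edge_silence:
            warnings.append(f"{trailing:.2f}s of trailing silence")
    return warnings
```

The training commands in this notebook use `sample_rate=16000`, so pass `min_sr=16000` if that is your target rate; otherwise every file will be flagged.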

After obtaining the speech data and splitting it into training, validation, and test sets, you need to construct .json manifest files that tell NeMo where to find these audio files.

The .json files should adhere to the format required by the `nemo.collections.tts.data.dataset.TTSDataset` class, with one JSON object per line and one line per audio file. For example, here is a sample .json file:

```json
{"audio_filepath": "/path/to/audio1.wav", "text": "the transcription", "duration": 0.82}
{"audio_filepath": "/path/to/audio2.wav", "text": "the other transcription", "duration": 2.1}
...
```
Please note that the duration is in seconds.
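One way to produce such files is sketched below. It assumes you already have a list of `(audio_filepath, text)` pairs (for example, parsed from your dataset's metadata) and that the audio is WAV, so the duration can be read from the header with the standard `wave` module. `write_manifest` and `split_entries` are hypothetical helpers, and the 5% validation split is an arbitrary default.

```python
import json
import random
import wave


def wav_duration(path):
    """Duration in seconds, computed from the WAV header."""
    with wave.open(path, "rb") as f:
        return f.getnframes() / f.getframerate()


def write_manifest(entries, manifest_path):
    """Write (audio_filepath, text) pairs as one JSON object per line."""
    with open(manifest_path, "w", encoding="utf-8") as f:
        for audio_filepath, text in entries:
            record = {
                "audio_filepath": audio_filepath,
                "text": text,
                "duration": round(wav_duration(audio_filepath), 3),
            }
            # ensure_ascii=False keeps accented characters (e.g. "attività") readable.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def split_entries(entries, val_fraction=0.05, seed=0):
    """Deterministically shuffle the entries and split them into (train, val)."""
    entries = list(entries)
    random.Random(seed).shuffle(entries)
    n_val = max(1, int(len(entries) * val_fraction))
    return entries[n_val:], entries[:n_val]
```

Typical usage would be `train, val = split_entries(pairs)` followed by `write_manifest(train, "train_manifest_vits.json")` and `write_manifest(val, "val_manifest_vits.json")`. The fixed seed keeps the split reproducible across runs.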


## Evaluating VITS

Let's evaluate the quality of the VITS model.

VITS is an end-to-end model, so we don't need any additional models (such as a separate vocoder) to generate audio.

In [None]:
from matplotlib.pyplot import imshow
from matplotlib import pyplot as plt
import IPython.display as ipd
import numpy as np
import torch
import librosa
import soundfile as sf

target_sr = 16000

audio_path = "/home/giacomo/dataset/MAILABS/it_IT/by_book/female/lisa_caputo/malavoglia/wavs/imalavoglia_00_verga_f000006.wav"
text_raw = "Il movente dell’attività umana che produce la fiumana del progresso è preso qui alle sue sorgenti"

#audio_path = "/home/giacomo/dataset/MAILABS/it_IT/by_book/female/lisa_caputo/malavoglia/wavs/imalavoglia_00_verga_f000008.wav"
#text_raw = "Il meccanismo delle passioni che la determinano in quelle basse sfere è meno complicato e potrà quindi osservarsi con maggior precisione."


audio_data, orig_sr = sf.read(audio_path)
if orig_sr != target_sr:
    audio_data = librosa.core.resample(audio_data, orig_sr=orig_sr, target_sr=target_sr)

# Let's double-check that everything matches up!
print(f"Duration (s): {len(audio_data)/target_sr}")
print("Transcript:", text_raw)
ipd.Audio(audio_data, rate=target_sr)
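The next cell loads a plain PyTorch `.pth` checkpoint into a NeMo `VitsModel`, and every NeMo parameter name (`net_g.enc_p...`, `net_g.dec...`) is reported missing. One plausible cause, which we assume here rather than verify, is that the file was saved by another VITS implementation that wraps the weights under a `"model"` key and omits the `net_g.` prefix NeMo uses for the generator. The hypothetical helper below only unwraps and re-prefixes keys; it does not handle renamed layers, different shapes, or missing discriminator weights.

```python
def remap_generator_keys(checkpoint, prefix="net_g."):
    """Hypothetical helper: rename foreign VITS generator weights to NeMo-style keys.

    Assumes the generator's submodule names (enc_p, dec, enc_q, flow, dp) already
    match and only the outer wrapping/prefix differs.
    """
    # Some training scripts save {"model": state_dict, "optimizer": ..., ...}.
    if isinstance(checkpoint.get("model"), dict):
        checkpoint = checkpoint["model"]
    remapped = {}
    for key, value in checkpoint.items():
        # Checkpoints saved from DataParallel/DDP wrappers prefix keys with "module.".
        if key.startswith("module."):
            key = key[len("module."):]
        if not key.startswith(prefix):
            key = prefix + key
        remapped[key] = value
    return remapped
```

It could be used as `result = model.load_state_dict(remap_generator_keys(state_dict), strict=False)`, after which `result.missing_keys` and `result.unexpected_keys` show what still does not line up. Computed buffers such as `audio_to_melspec_processor.window` would be expected to remain missing. Note that PyTorch still raises an error on tensor shape mismatches even with `strict=False`.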

In [10]:
import torch
from nemo.collections.tts.models import VitsModel

# Load the plain PyTorch checkpoint (.pth) whose weights we want to reuse
pytorch_checkpoint_path = "/home/giacomo/italian-tts/checkpoints/vits/vits.pth"
state_dict = torch.load(pytorch_checkpoint_path, map_location='cpu')

# Build a NeMo TTS model (architecture + config) from a Lightning checkpoint (.ckpt)
model = VitsModel.load_from_checkpoint("/home/giacomo/win-home/Downloads/VITS21.ckpt")

# Copy the .pth weights into the NeMo model. This fails with
# "Missing key(s) in state_dict" (see the output below) when the parameter
# names in the .pth file do not match NeMo's (e.g. "net_g.enc_p.emb.weight").
model.load_state_dict(state_dict)

# Alternative: use a Lightning checkpoint directly for inference
#model = VitsModel.load_from_checkpoint("/home/giacomo/win-home/Downloads/VITS--loss_gen_all=21.6121-epoch=279.ckpt").cpu().eval()

# Save the NeMo model as a .nemo archive
#nemo_checkpoint_path = "your_model_checkpoint.nemo"
#model.save_to(nemo_checkpoint_path)


[NeMo W 2023-10-12 19:16:20 modelPT:161] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    dataset:
      _target_: nemo.collections.tts.data.dataset.TTSDataset
      manifest_filepath: /content/drive/MyDrive/datasetTTS_malavoglia/train_manifest_vits.json
      sample_rate: 16000
      sup_data_path: null
      sup_data_types: null
      n_fft: 1024
      win_length: 1024
      hop_length: 256
      window: hann
      n_mels: 80
      lowfreq: 0
      highfreq: null
      max_duration: null
      min_duration: 0.1
      ignore_file: null
      trim: false
      pitch_fmin: 65.40639132514966
      pitch_fmax: 2093.004522404789
    dataloader_params:
      num_workers: 8
      pin_memory: false
    batch_sampler:
      batch_size: 32
      boundaries:
      - 32
      - 300
      - 400
      - 500
      - 600
      - 700
      - 800
      - 900
  

[NeMo I 2023-10-12 19:16:20 features:289] PADDING: 1
[NeMo I 2023-10-12 19:16:20 features:297] STFT using exact pad


RuntimeError: Error(s) in loading state_dict for VitsModel:
	Missing key(s) in state_dict: "audio_to_melspec_processor.window", "audio_to_melspec_processor.fb", "net_g.enc_p.emb.weight", "net_g.enc_p.encoder.attn_layers.0.emb_rel_k", "net_g.enc_p.encoder.attn_layers.0.emb_rel_v", "net_g.enc_p.encoder.attn_layers.0.conv_q.weight", "net_g.enc_p.encoder.attn_layers.0.conv_q.bias", "net_g.enc_p.encoder.attn_layers.0.conv_k.weight", "net_g.enc_p.encoder.attn_layers.0.conv_k.bias", "net_g.enc_p.encoder.attn_layers.0.conv_v.weight", "net_g.enc_p.encoder.attn_layers.0.conv_v.bias", "net_g.enc_p.encoder.attn_layers.0.conv_o.weight", "net_g.enc_p.encoder.attn_layers.0.conv_o.bias", "net_g.enc_p.encoder.attn_layers.1.emb_rel_k", "net_g.enc_p.encoder.attn_layers.1.emb_rel_v", "net_g.enc_p.encoder.attn_layers.1.conv_q.weight", "net_g.enc_p.encoder.attn_layers.1.conv_q.bias", "net_g.enc_p.encoder.attn_layers.1.conv_k.weight", "net_g.enc_p.encoder.attn_layers.1.conv_k.bias", "net_g.enc_p.encoder.attn_layers.1.conv_v.weight", "net_g.enc_p.encoder.attn_layers.1.conv_v.bias", "net_g.enc_p.encoder.attn_layers.1.conv_o.weight", "net_g.enc_p.encoder.attn_layers.1.conv_o.bias", "net_g.enc_p.encoder.attn_layers.2.emb_rel_k", "net_g.enc_p.encoder.attn_layers.2.emb_rel_v", "net_g.enc_p.encoder.attn_layers.2.conv_q.weight", "net_g.enc_p.encoder.attn_layers.2.conv_q.bias", "net_g.enc_p.encoder.attn_layers.2.conv_k.weight", "net_g.enc_p.encoder.attn_layers.2.conv_k.bias", "net_g.enc_p.encoder.attn_layers.2.conv_v.weight", "net_g.enc_p.encoder.attn_layers.2.conv_v.bias", "net_g.enc_p.encoder.attn_layers.2.conv_o.weight", "net_g.enc_p.encoder.attn_layers.2.conv_o.bias", "net_g.enc_p.encoder.attn_layers.3.emb_rel_k", "net_g.enc_p.encoder.attn_layers.3.emb_rel_v", "net_g.enc_p.encoder.attn_layers.3.conv_q.weight", "net_g.enc_p.encoder.attn_layers.3.conv_q.bias", "net_g.enc_p.encoder.attn_layers.3.conv_k.weight", "net_g.enc_p.encoder.attn_layers.3.conv_k.bias", "net_g.enc_p.encoder.attn_layers.3.conv_v.weight", 
"net_g.enc_p.encoder.attn_layers.3.conv_v.bias", "net_g.enc_p.encoder.attn_layers.3.conv_o.weight", "net_g.enc_p.encoder.attn_layers.3.conv_o.bias", "net_g.enc_p.encoder.attn_layers.4.emb_rel_k", "net_g.enc_p.encoder.attn_layers.4.emb_rel_v", "net_g.enc_p.encoder.attn_layers.4.conv_q.weight", "net_g.enc_p.encoder.attn_layers.4.conv_q.bias", "net_g.enc_p.encoder.attn_layers.4.conv_k.weight", "net_g.enc_p.encoder.attn_layers.4.conv_k.bias", "net_g.enc_p.encoder.attn_layers.4.conv_v.weight", "net_g.enc_p.encoder.attn_layers.4.conv_v.bias", "net_g.enc_p.encoder.attn_layers.4.conv_o.weight", "net_g.enc_p.encoder.attn_layers.4.conv_o.bias", "net_g.enc_p.encoder.attn_layers.5.emb_rel_k", "net_g.enc_p.encoder.attn_layers.5.emb_rel_v", "net_g.enc_p.encoder.attn_layers.5.conv_q.weight", "net_g.enc_p.encoder.attn_layers.5.conv_q.bias", "net_g.enc_p.encoder.attn_layers.5.conv_k.weight", "net_g.enc_p.encoder.attn_layers.5.conv_k.bias", "net_g.enc_p.encoder.attn_layers.5.conv_v.weight", "net_g.enc_p.encoder.attn_layers.5.conv_v.bias", "net_g.enc_p.encoder.attn_layers.5.conv_o.weight", "net_g.enc_p.encoder.attn_layers.5.conv_o.bias", "net_g.enc_p.encoder.norm_layers_1.0.gamma", "net_g.enc_p.encoder.norm_layers_1.0.beta", "net_g.enc_p.encoder.norm_layers_1.1.gamma", "net_g.enc_p.encoder.norm_layers_1.1.beta", "net_g.enc_p.encoder.norm_layers_1.2.gamma", "net_g.enc_p.encoder.norm_layers_1.2.beta", "net_g.enc_p.encoder.norm_layers_1.3.gamma", "net_g.enc_p.encoder.norm_layers_1.3.beta", "net_g.enc_p.encoder.norm_layers_1.4.gamma", "net_g.enc_p.encoder.norm_layers_1.4.beta", "net_g.enc_p.encoder.norm_layers_1.5.gamma", "net_g.enc_p.encoder.norm_layers_1.5.beta", "net_g.enc_p.encoder.ffn_layers.0.conv_1.weight", "net_g.enc_p.encoder.ffn_layers.0.conv_1.bias", "net_g.enc_p.encoder.ffn_layers.0.conv_2.weight", "net_g.enc_p.encoder.ffn_layers.0.conv_2.bias", "net_g.enc_p.encoder.ffn_layers.1.conv_1.weight", "net_g.enc_p.encoder.ffn_layers.1.conv_1.bias", 
"net_g.enc_p.encoder.ffn_layers.1.conv_2.weight", "net_g.enc_p.encoder.ffn_layers.1.conv_2.bias", "net_g.enc_p.encoder.ffn_layers.2.conv_1.weight", "net_g.enc_p.encoder.ffn_layers.2.conv_1.bias", "net_g.enc_p.encoder.ffn_layers.2.conv_2.weight", "net_g.enc_p.encoder.ffn_layers.2.conv_2.bias", "net_g.enc_p.encoder.ffn_layers.3.conv_1.weight", "net_g.enc_p.encoder.ffn_layers.3.conv_1.bias", "net_g.enc_p.encoder.ffn_layers.3.conv_2.weight", "net_g.enc_p.encoder.ffn_layers.3.conv_2.bias", "net_g.enc_p.encoder.ffn_layers.4.conv_1.weight", "net_g.enc_p.encoder.ffn_layers.4.conv_1.bias", "net_g.enc_p.encoder.ffn_layers.4.conv_2.weight", "net_g.enc_p.encoder.ffn_layers.4.conv_2.bias", "net_g.enc_p.encoder.ffn_layers.5.conv_1.weight", "net_g.enc_p.encoder.ffn_layers.5.conv_1.bias", "net_g.enc_p.encoder.ffn_layers.5.conv_2.weight", "net_g.enc_p.encoder.ffn_layers.5.conv_2.bias", "net_g.enc_p.encoder.norm_layers_2.0.gamma", "net_g.enc_p.encoder.norm_layers_2.0.beta", "net_g.enc_p.encoder.norm_layers_2.1.gamma", "net_g.enc_p.encoder.norm_layers_2.1.beta", "net_g.enc_p.encoder.norm_layers_2.2.gamma", "net_g.enc_p.encoder.norm_layers_2.2.beta", "net_g.enc_p.encoder.norm_layers_2.3.gamma", "net_g.enc_p.encoder.norm_layers_2.3.beta", "net_g.enc_p.encoder.norm_layers_2.4.gamma", "net_g.enc_p.encoder.norm_layers_2.4.beta", "net_g.enc_p.encoder.norm_layers_2.5.gamma", "net_g.enc_p.encoder.norm_layers_2.5.beta", "net_g.enc_p.proj.weight", "net_g.enc_p.proj.bias", "net_g.dec.conv_pre.weight", "net_g.dec.conv_pre.bias", "net_g.dec.ups.0.bias", "net_g.dec.ups.0.weight_g", "net_g.dec.ups.0.weight_v", "net_g.dec.ups.1.bias", "net_g.dec.ups.1.weight_g", "net_g.dec.ups.1.weight_v", "net_g.dec.ups.2.bias", "net_g.dec.ups.2.weight_g", "net_g.dec.ups.2.weight_v", "net_g.dec.ups.3.bias", "net_g.dec.ups.3.weight_g", "net_g.dec.ups.3.weight_v", "net_g.dec.resblocks.0.convs1.0.bias", "net_g.dec.resblocks.0.convs1.0.weight_g", "net_g.dec.resblocks.0.convs1.0.weight_v", 
"net_g.dec.resblocks.0.convs1.1.bias", "net_g.dec.resblocks.0.convs1.1.weight_g", "net_g.dec.resblocks.0.convs1.1.weight_v", "net_g.dec.resblocks.0.convs1.2.bias", "net_g.dec.resblocks.0.convs1.2.weight_g", "net_g.dec.resblocks.0.convs1.2.weight_v", "net_g.dec.resblocks.0.convs2.0.bias", "net_g.dec.resblocks.0.convs2.0.weight_g", "net_g.dec.resblocks.0.convs2.0.weight_v", "net_g.dec.resblocks.0.convs2.1.bias", "net_g.dec.resblocks.0.convs2.1.weight_g", "net_g.dec.resblocks.0.convs2.1.weight_v", "net_g.dec.resblocks.0.convs2.2.bias", "net_g.dec.resblocks.0.convs2.2.weight_g", "net_g.dec.resblocks.0.convs2.2.weight_v", "net_g.dec.resblocks.1.convs1.0.bias", "net_g.dec.resblocks.1.convs1.0.weight_g", "net_g.dec.resblocks.1.convs1.0.weight_v", "net_g.dec.resblocks.1.convs1.1.bias", "net_g.dec.resblocks.1.convs1.1.weight_g", "net_g.dec.resblocks.1.convs1.1.weight_v", "net_g.dec.resblocks.1.convs1.2.bias", "net_g.dec.resblocks.1.convs1.2.weight_g", "net_g.dec.resblocks.1.convs1.2.weight_v", "net_g.dec.resblocks.1.convs2.0.bias", "net_g.dec.resblocks.1.convs2.0.weight_g", "net_g.dec.resblocks.1.convs2.0.weight_v", "net_g.dec.resblocks.1.convs2.1.bias", "net_g.dec.resblocks.1.convs2.1.weight_g", "net_g.dec.resblocks.1.convs2.1.weight_v", "net_g.dec.resblocks.1.convs2.2.bias", "net_g.dec.resblocks.1.convs2.2.weight_g", "net_g.dec.resblocks.1.convs2.2.weight_v", "net_g.dec.resblocks.2.convs1.0.bias", "net_g.dec.resblocks.2.convs1.0.weight_g", "net_g.dec.resblocks.2.convs1.0.weight_v", "net_g.dec.resblocks.2.convs1.1.bias", "net_g.dec.resblocks.2.convs1.1.weight_g", "net_g.dec.resblocks.2.convs1.1.weight_v", "net_g.dec.resblocks.2.convs1.2.bias", "net_g.dec.resblocks.2.convs1.2.weight_g", "net_g.dec.resblocks.2.convs1.2.weight_v", "net_g.dec.resblocks.2.convs2.0.bias", "net_g.dec.resblocks.2.convs2.0.weight_g", "net_g.dec.resblocks.2.convs2.0.weight_v", "net_g.dec.resblocks.2.convs2.1.bias", "net_g.dec.resblocks.2.convs2.1.weight_g", "net_g.dec.resblocks.2.convs2.1.weight_v", 
"net_g.dec.resblocks.2.convs2.2.bias", "net_g.dec.resblocks.2.convs2.2.weight_g", "net_g.dec.resblocks.2.convs2.2.weight_v", "net_g.dec.resblocks.3.convs1.0.bias", "net_g.dec.resblocks.3.convs1.0.weight_g", "net_g.dec.resblocks.3.convs1.0.weight_v", "net_g.dec.resblocks.3.convs1.1.bias", "net_g.dec.resblocks.3.convs1.1.weight_g", "net_g.dec.resblocks.3.convs1.1.weight_v", "net_g.dec.resblocks.3.convs1.2.bias", "net_g.dec.resblocks.3.convs1.2.weight_g", "net_g.dec.resblocks.3.convs1.2.weight_v", "net_g.dec.resblocks.3.convs2.0.bias", "net_g.dec.resblocks.3.convs2.0.weight_g", "net_g.dec.resblocks.3.convs2.0.weight_v", "net_g.dec.resblocks.3.convs2.1.bias", "net_g.dec.resblocks.3.convs2.1.weight_g", "net_g.dec.resblocks.3.convs2.1.weight_v", "net_g.dec.resblocks.3.convs2.2.bias", "net_g.dec.resblocks.3.convs2.2.weight_g", "net_g.dec.resblocks.3.convs2.2.weight_v", "net_g.dec.resblocks.4.convs1.0.bias", "net_g.dec.resblocks.4.convs1.0.weight_g", "net_g.dec.resblocks.4.convs1.0.weight_v", "net_g.dec.resblocks.4.convs1.1.bias", "net_g.dec.resblocks.4.convs1.1.weight_g", "net_g.dec.resblocks.4.convs1.1.weight_v", "net_g.dec.resblocks.4.convs1.2.bias", "net_g.dec.resblocks.4.convs1.2.weight_g", "net_g.dec.resblocks.4.convs1.2.weight_v", "net_g.dec.resblocks.4.convs2.0.bias", "net_g.dec.resblocks.4.convs2.0.weight_g", "net_g.dec.resblocks.4.convs2.0.weight_v", "net_g.dec.resblocks.4.convs2.1.bias", "net_g.dec.resblocks.4.convs2.1.weight_g", "net_g.dec.resblocks.4.convs2.1.weight_v", "net_g.dec.resblocks.4.convs2.2.bias", "net_g.dec.resblocks.4.convs2.2.weight_g", "net_g.dec.resblocks.4.convs2.2.weight_v", "net_g.dec.resblocks.5.convs1.0.bias", "net_g.dec.resblocks.5.convs1.0.weight_g", "net_g.dec.resblocks.5.convs1.0.weight_v", "net_g.dec.resblocks.5.convs1.1.bias", "net_g.dec.resblocks.5.convs1.1.weight_g", "net_g.dec.resblocks.5.convs1.1.weight_v", "net_g.dec.resblocks.5.convs1.2.bias", "net_g.dec.resblocks.5.convs1.2.weight_g", "net_g.dec.resblocks.5.convs1.2.weight_v", 
"net_g.dec.resblocks.5.convs2.0.bias", "net_g.dec.resblocks.5.convs2.0.weight_g", "net_g.dec.resblocks.5.convs2.0.weight_v", "net_g.dec.resblocks.5.convs2.1.bias", "net_g.dec.resblocks.5.convs2.1.weight_g", "net_g.dec.resblocks.5.convs2.1.weight_v", "net_g.dec.resblocks.5.convs2.2.bias", "net_g.dec.resblocks.5.convs2.2.weight_g", "net_g.dec.resblocks.5.convs2.2.weight_v", "net_g.dec.resblocks.6.convs1.0.bias", "net_g.dec.resblocks.6.convs1.0.weight_g", "net_g.dec.resblocks.6.convs1.0.weight_v", "net_g.dec.resblocks.6.convs1.1.bias", "net_g.dec.resblocks.6.convs1.1.weight_g", "net_g.dec.resblocks.6.convs1.1.weight_v", "net_g.dec.resblocks.6.convs1.2.bias", "net_g.dec.resblocks.6.convs1.2.weight_g", "net_g.dec.resblocks.6.convs1.2.weight_v", "net_g.dec.resblocks.6.convs2.0.bias", "net_g.dec.resblocks.6.convs2.0.weight_g", "net_g.dec.resblocks.6.convs2.0.weight_v", "net_g.dec.resblocks.6.convs2.1.bias", "net_g.dec.resblocks.6.convs2.1.weight_g", "net_g.dec.resblocks.6.convs2.1.weight_v", "net_g.dec.resblocks.6.convs2.2.bias", "net_g.dec.resblocks.6.convs2.2.weight_g", "net_g.dec.resblocks.6.convs2.2.weight_v", "net_g.dec.resblocks.7.convs1.0.bias", "net_g.dec.resblocks.7.convs1.0.weight_g", "net_g.dec.resblocks.7.convs1.0.weight_v", "net_g.dec.resblocks.7.convs1.1.bias", "net_g.dec.resblocks.7.convs1.1.weight_g", "net_g.dec.resblocks.7.convs1.1.weight_v", "net_g.dec.resblocks.7.convs1.2.bias", "net_g.dec.resblocks.7.convs1.2.weight_g", "net_g.dec.resblocks.7.convs1.2.weight_v", "net_g.dec.resblocks.7.convs2.0.bias", "net_g.dec.resblocks.7.convs2.0.weight_g", "net_g.dec.resblocks.7.convs2.0.weight_v", "net_g.dec.resblocks.7.convs2.1.bias", "net_g.dec.resblocks.7.convs2.1.weight_g", "net_g.dec.resblocks.7.convs2.1.weight_v", "net_g.dec.resblocks.7.convs2.2.bias", "net_g.dec.resblocks.7.convs2.2.weight_g", "net_g.dec.resblocks.7.convs2.2.weight_v", "net_g.dec.resblocks.8.convs1.0.bias", "net_g.dec.resblocks.8.convs1.0.weight_g", "net_g.dec.resblocks.8.convs1.0.weight_v", 
"net_g.dec.resblocks.8.convs1.1.bias", "net_g.dec.resblocks.8.convs1.1.weight_g", "net_g.dec.resblocks.8.convs1.1.weight_v", "net_g.dec.resblocks.8.convs1.2.bias", "net_g.dec.resblocks.8.convs1.2.weight_g", "net_g.dec.resblocks.8.convs1.2.weight_v", "net_g.dec.resblocks.8.convs2.0.bias", "net_g.dec.resblocks.8.convs2.0.weight_g", "net_g.dec.resblocks.8.convs2.0.weight_v", "net_g.dec.resblocks.8.convs2.1.bias", "net_g.dec.resblocks.8.convs2.1.weight_g", "net_g.dec.resblocks.8.convs2.1.weight_v", "net_g.dec.resblocks.8.convs2.2.bias", "net_g.dec.resblocks.8.convs2.2.weight_g", "net_g.dec.resblocks.8.convs2.2.weight_v", "net_g.dec.resblocks.9.convs1.0.bias", "net_g.dec.resblocks.9.convs1.0.weight_g", "net_g.dec.resblocks.9.convs1.0.weight_v", "net_g.dec.resblocks.9.convs1.1.bias", "net_g.dec.resblocks.9.convs1.1.weight_g", "net_g.dec.resblocks.9.convs1.1.weight_v", "net_g.dec.resblocks.9.convs1.2.bias", "net_g.dec.resblocks.9.convs1.2.weight_g", "net_g.dec.resblocks.9.convs1.2.weight_v", "net_g.dec.resblocks.9.convs2.0.bias", "net_g.dec.resblocks.9.convs2.0.weight_g", "net_g.dec.resblocks.9.convs2.0.weight_v", "net_g.dec.resblocks.9.convs2.1.bias", "net_g.dec.resblocks.9.convs2.1.weight_g", "net_g.dec.resblocks.9.convs2.1.weight_v", "net_g.dec.resblocks.9.convs2.2.bias", "net_g.dec.resblocks.9.convs2.2.weight_g", "net_g.dec.resblocks.9.convs2.2.weight_v", "net_g.dec.resblocks.10.convs1.0.bias", "net_g.dec.resblocks.10.convs1.0.weight_g", "net_g.dec.resblocks.10.convs1.0.weight_v", "net_g.dec.resblocks.10.convs1.1.bias", "net_g.dec.resblocks.10.convs1.1.weight_g", "net_g.dec.resblocks.10.convs1.1.weight_v", "net_g.dec.resblocks.10.convs1.2.bias", "net_g.dec.resblocks.10.convs1.2.weight_g", "net_g.dec.resblocks.10.convs1.2.weight_v", "net_g.dec.resblocks.10.convs2.0.bias", "net_g.dec.resblocks.10.convs2.0.weight_g", "net_g.dec.resblocks.10.convs2.0.weight_v", "net_g.dec.resblocks.10.convs2.1.bias", "net_g.dec.resblocks.10.convs2.1.weight_g", 
"net_g.dec.resblocks.10.convs2.1.weight_v", "net_g.dec.resblocks.10.convs2.2.bias", "net_g.dec.resblocks.10.convs2.2.weight_g", "net_g.dec.resblocks.10.convs2.2.weight_v", "net_g.dec.resblocks.11.convs1.0.bias", "net_g.dec.resblocks.11.convs1.0.weight_g", "net_g.dec.resblocks.11.convs1.0.weight_v", "net_g.dec.resblocks.11.convs1.1.bias", "net_g.dec.resblocks.11.convs1.1.weight_g", "net_g.dec.resblocks.11.convs1.1.weight_v", "net_g.dec.resblocks.11.convs1.2.bias", "net_g.dec.resblocks.11.convs1.2.weight_g", "net_g.dec.resblocks.11.convs1.2.weight_v", "net_g.dec.resblocks.11.convs2.0.bias", "net_g.dec.resblocks.11.convs2.0.weight_g", "net_g.dec.resblocks.11.convs2.0.weight_v", "net_g.dec.resblocks.11.convs2.1.bias", "net_g.dec.resblocks.11.convs2.1.weight_g", "net_g.dec.resblocks.11.convs2.1.weight_v", "net_g.dec.resblocks.11.convs2.2.bias", "net_g.dec.resblocks.11.convs2.2.weight_g", "net_g.dec.resblocks.11.convs2.2.weight_v", "net_g.dec.conv_post.weight", "net_g.dec.cond.weight", "net_g.dec.cond.bias", "net_g.enc_q.pre.weight", "net_g.enc_q.pre.bias", "net_g.enc_q.enc.in_layers.0.bias", "net_g.enc_q.enc.in_layers.0.weight_g", "net_g.enc_q.enc.in_layers.0.weight_v", "net_g.enc_q.enc.in_layers.1.bias", "net_g.enc_q.enc.in_layers.1.weight_g", "net_g.enc_q.enc.in_layers.1.weight_v", "net_g.enc_q.enc.in_layers.2.bias", "net_g.enc_q.enc.in_layers.2.weight_g", "net_g.enc_q.enc.in_layers.2.weight_v", "net_g.enc_q.enc.in_layers.3.bias", "net_g.enc_q.enc.in_layers.3.weight_g", "net_g.enc_q.enc.in_layers.3.weight_v", "net_g.enc_q.enc.in_layers.4.bias", "net_g.enc_q.enc.in_layers.4.weight_g", "net_g.enc_q.enc.in_layers.4.weight_v", "net_g.enc_q.enc.in_layers.5.bias", "net_g.enc_q.enc.in_layers.5.weight_g", "net_g.enc_q.enc.in_layers.5.weight_v", "net_g.enc_q.enc.in_layers.6.bias", "net_g.enc_q.enc.in_layers.6.weight_g", "net_g.enc_q.enc.in_layers.6.weight_v", "net_g.enc_q.enc.in_layers.7.bias", "net_g.enc_q.enc.in_layers.7.weight_g", "net_g.enc_q.enc.in_layers.7.weight_v", 
"net_g.enc_q.enc.in_layers.8.bias", "net_g.enc_q.enc.in_layers.8.weight_g", "net_g.enc_q.enc.in_layers.8.weight_v", "net_g.enc_q.enc.in_layers.9.bias", "net_g.enc_q.enc.in_layers.9.weight_g", "net_g.enc_q.enc.in_layers.9.weight_v", "net_g.enc_q.enc.in_layers.10.bias", "net_g.enc_q.enc.in_layers.10.weight_g", "net_g.enc_q.enc.in_layers.10.weight_v", "net_g.enc_q.enc.in_layers.11.bias", "net_g.enc_q.enc.in_layers.11.weight_g", "net_g.enc_q.enc.in_layers.11.weight_v", "net_g.enc_q.enc.in_layers.12.bias", "net_g.enc_q.enc.in_layers.12.weight_g", "net_g.enc_q.enc.in_layers.12.weight_v", "net_g.enc_q.enc.in_layers.13.bias", "net_g.enc_q.enc.in_layers.13.weight_g", "net_g.enc_q.enc.in_layers.13.weight_v", "net_g.enc_q.enc.in_layers.14.bias", "net_g.enc_q.enc.in_layers.14.weight_g", "net_g.enc_q.enc.in_layers.14.weight_v", "net_g.enc_q.enc.in_layers.15.bias", "net_g.enc_q.enc.in_layers.15.weight_g", "net_g.enc_q.enc.in_layers.15.weight_v", "net_g.enc_q.enc.res_skip_layers.0.bias", "net_g.enc_q.enc.res_skip_layers.0.weight_g", "net_g.enc_q.enc.res_skip_layers.0.weight_v", "net_g.enc_q.enc.res_skip_layers.1.bias", "net_g.enc_q.enc.res_skip_layers.1.weight_g", "net_g.enc_q.enc.res_skip_layers.1.weight_v", "net_g.enc_q.enc.res_skip_layers.2.bias", "net_g.enc_q.enc.res_skip_layers.2.weight_g", "net_g.enc_q.enc.res_skip_layers.2.weight_v", "net_g.enc_q.enc.res_skip_layers.3.bias", "net_g.enc_q.enc.res_skip_layers.3.weight_g", "net_g.enc_q.enc.res_skip_layers.3.weight_v", "net_g.enc_q.enc.res_skip_layers.4.bias", "net_g.enc_q.enc.res_skip_layers.4.weight_g", "net_g.enc_q.enc.res_skip_layers.4.weight_v", "net_g.enc_q.enc.res_skip_layers.5.bias", "net_g.enc_q.enc.res_skip_layers.5.weight_g", "net_g.enc_q.enc.res_skip_layers.5.weight_v", "net_g.enc_q.enc.res_skip_layers.6.bias", "net_g.enc_q.enc.res_skip_layers.6.weight_g", "net_g.enc_q.enc.res_skip_layers.6.weight_v", "net_g.enc_q.enc.res_skip_layers.7.bias", "net_g.enc_q.enc.res_skip_layers.7.weight_g", 
"net_g.enc_q.enc.res_skip_layers.7.weight_v", "net_g.enc_q.enc.res_skip_layers.8.bias", "net_g.enc_q.enc.res_skip_layers.8.weight_g", "net_g.enc_q.enc.res_skip_layers.8.weight_v", "net_g.enc_q.enc.res_skip_layers.9.bias", "net_g.enc_q.enc.res_skip_layers.9.weight_g", "net_g.enc_q.enc.res_skip_layers.9.weight_v", "net_g.enc_q.enc.res_skip_layers.10.bias", "net_g.enc_q.enc.res_skip_layers.10.weight_g", "net_g.enc_q.enc.res_skip_layers.10.weight_v", "net_g.enc_q.enc.res_skip_layers.11.bias", "net_g.enc_q.enc.res_skip_layers.11.weight_g", "net_g.enc_q.enc.res_skip_layers.11.weight_v", "net_g.enc_q.enc.res_skip_layers.12.bias", "net_g.enc_q.enc.res_skip_layers.12.weight_g", "net_g.enc_q.enc.res_skip_layers.12.weight_v", "net_g.enc_q.enc.res_skip_layers.13.bias", "net_g.enc_q.enc.res_skip_layers.13.weight_g", "net_g.enc_q.enc.res_skip_layers.13.weight_v", "net_g.enc_q.enc.res_skip_layers.14.bias", "net_g.enc_q.enc.res_skip_layers.14.weight_g", "net_g.enc_q.enc.res_skip_layers.14.weight_v", "net_g.enc_q.enc.res_skip_layers.15.bias", "net_g.enc_q.enc.res_skip_layers.15.weight_g", "net_g.enc_q.enc.res_skip_layers.15.weight_v", "net_g.enc_q.enc.cond_layer.bias", "net_g.enc_q.enc.cond_layer.weight_g", "net_g.enc_q.enc.cond_layer.weight_v", "net_g.enc_q.proj.weight", "net_g.enc_q.proj.bias", "net_g.flow.flows.0.pre.weight", "net_g.flow.flows.0.pre.bias", "net_g.flow.flows.0.enc.in_layers.0.bias", "net_g.flow.flows.0.enc.in_layers.0.weight_g", "net_g.flow.flows.0.enc.in_layers.0.weight_v", "net_g.flow.flows.0.enc.in_layers.1.bias", "net_g.flow.flows.0.enc.in_layers.1.weight_g", "net_g.flow.flows.0.enc.in_layers.1.weight_v", "net_g.flow.flows.0.enc.in_layers.2.bias", "net_g.flow.flows.0.enc.in_layers.2.weight_g", "net_g.flow.flows.0.enc.in_layers.2.weight_v", "net_g.flow.flows.0.enc.in_layers.3.bias", "net_g.flow.flows.0.enc.in_layers.3.weight_g", "net_g.flow.flows.0.enc.in_layers.3.weight_v", "net_g.flow.flows.0.enc.res_skip_layers.0.bias", 
"net_g.flow.flows.0.enc.res_skip_layers.0.weight_g", "net_g.flow.flows.0.enc.res_skip_layers.0.weight_v", "net_g.flow.flows.0.enc.res_skip_layers.1.bias", "net_g.flow.flows.0.enc.res_skip_layers.1.weight_g", "net_g.flow.flows.0.enc.res_skip_layers.1.weight_v", "net_g.flow.flows.0.enc.res_skip_layers.2.bias", "net_g.flow.flows.0.enc.res_skip_layers.2.weight_g", "net_g.flow.flows.0.enc.res_skip_layers.2.weight_v", "net_g.flow.flows.0.enc.res_skip_layers.3.bias", "net_g.flow.flows.0.enc.res_skip_layers.3.weight_g", "net_g.flow.flows.0.enc.res_skip_layers.3.weight_v", "net_g.flow.flows.0.enc.cond_layer.bias", "net_g.flow.flows.0.enc.cond_layer.weight_g", "net_g.flow.flows.0.enc.cond_layer.weight_v", "net_g.flow.flows.0.post.weight", "net_g.flow.flows.0.post.bias", "net_g.flow.flows.2.pre.weight", "net_g.flow.flows.2.pre.bias", "net_g.flow.flows.2.enc.in_layers.0.bias", "net_g.flow.flows.2.enc.in_layers.0.weight_g", "net_g.flow.flows.2.enc.in_layers.0.weight_v", "net_g.flow.flows.2.enc.in_layers.1.bias", "net_g.flow.flows.2.enc.in_layers.1.weight_g", "net_g.flow.flows.2.enc.in_layers.1.weight_v", "net_g.flow.flows.2.enc.in_layers.2.bias", "net_g.flow.flows.2.enc.in_layers.2.weight_g", "net_g.flow.flows.2.enc.in_layers.2.weight_v", "net_g.flow.flows.2.enc.in_layers.3.bias", "net_g.flow.flows.2.enc.in_layers.3.weight_g", "net_g.flow.flows.2.enc.in_layers.3.weight_v", "net_g.flow.flows.2.enc.res_skip_layers.0.bias", "net_g.flow.flows.2.enc.res_skip_layers.0.weight_g", "net_g.flow.flows.2.enc.res_skip_layers.0.weight_v", "net_g.flow.flows.2.enc.res_skip_layers.1.bias", "net_g.flow.flows.2.enc.res_skip_layers.1.weight_g", "net_g.flow.flows.2.enc.res_skip_layers.1.weight_v", "net_g.flow.flows.2.enc.res_skip_layers.2.bias", "net_g.flow.flows.2.enc.res_skip_layers.2.weight_g", "net_g.flow.flows.2.enc.res_skip_layers.2.weight_v", "net_g.flow.flows.2.enc.res_skip_layers.3.bias", "net_g.flow.flows.2.enc.res_skip_layers.3.weight_g", 
"net_g.flow.flows.2.enc.res_skip_layers.3.weight_v", "net_g.flow.flows.2.enc.cond_layer.bias", "net_g.flow.flows.2.enc.cond_layer.weight_g", "net_g.flow.flows.2.enc.cond_layer.weight_v", "net_g.flow.flows.2.post.weight", "net_g.flow.flows.2.post.bias", "net_g.flow.flows.4.pre.weight", "net_g.flow.flows.4.pre.bias", "net_g.flow.flows.4.enc.in_layers.0.bias", "net_g.flow.flows.4.enc.in_layers.0.weight_g", "net_g.flow.flows.4.enc.in_layers.0.weight_v", "net_g.flow.flows.4.enc.in_layers.1.bias", "net_g.flow.flows.4.enc.in_layers.1.weight_g", "net_g.flow.flows.4.enc.in_layers.1.weight_v", "net_g.flow.flows.4.enc.in_layers.2.bias", "net_g.flow.flows.4.enc.in_layers.2.weight_g", "net_g.flow.flows.4.enc.in_layers.2.weight_v", "net_g.flow.flows.4.enc.in_layers.3.bias", "net_g.flow.flows.4.enc.in_layers.3.weight_g", "net_g.flow.flows.4.enc.in_layers.3.weight_v", "net_g.flow.flows.4.enc.res_skip_layers.0.bias", "net_g.flow.flows.4.enc.res_skip_layers.0.weight_g", "net_g.flow.flows.4.enc.res_skip_layers.0.weight_v", "net_g.flow.flows.4.enc.res_skip_layers.1.bias", "net_g.flow.flows.4.enc.res_skip_layers.1.weight_g", "net_g.flow.flows.4.enc.res_skip_layers.1.weight_v", "net_g.flow.flows.4.enc.res_skip_layers.2.bias", "net_g.flow.flows.4.enc.res_skip_layers.2.weight_g", "net_g.flow.flows.4.enc.res_skip_layers.2.weight_v", "net_g.flow.flows.4.enc.res_skip_layers.3.bias", "net_g.flow.flows.4.enc.res_skip_layers.3.weight_g", "net_g.flow.flows.4.enc.res_skip_layers.3.weight_v", "net_g.flow.flows.4.enc.cond_layer.bias", "net_g.flow.flows.4.enc.cond_layer.weight_g", "net_g.flow.flows.4.enc.cond_layer.weight_v", "net_g.flow.flows.4.post.weight", "net_g.flow.flows.4.post.bias", "net_g.flow.flows.6.pre.weight", "net_g.flow.flows.6.pre.bias", "net_g.flow.flows.6.enc.in_layers.0.bias", "net_g.flow.flows.6.enc.in_layers.0.weight_g", "net_g.flow.flows.6.enc.in_layers.0.weight_v", "net_g.flow.flows.6.enc.in_layers.1.bias", "net_g.flow.flows.6.enc.in_layers.1.weight_g", 
"net_g.flow.flows.6.enc.in_layers.1.weight_v", "net_g.flow.flows.6.enc.in_layers.2.bias", "net_g.flow.flows.6.enc.in_layers.2.weight_g", "net_g.flow.flows.6.enc.in_layers.2.weight_v", "net_g.flow.flows.6.enc.in_layers.3.bias", "net_g.flow.flows.6.enc.in_layers.3.weight_g", "net_g.flow.flows.6.enc.in_layers.3.weight_v", "net_g.flow.flows.6.enc.res_skip_layers.0.bias", "net_g.flow.flows.6.enc.res_skip_layers.0.weight_g", "net_g.flow.flows.6.enc.res_skip_layers.0.weight_v", "net_g.flow.flows.6.enc.res_skip_layers.1.bias", "net_g.flow.flows.6.enc.res_skip_layers.1.weight_g", "net_g.flow.flows.6.enc.res_skip_layers.1.weight_v", "net_g.flow.flows.6.enc.res_skip_layers.2.bias", "net_g.flow.flows.6.enc.res_skip_layers.2.weight_g", "net_g.flow.flows.6.enc.res_skip_layers.2.weight_v", "net_g.flow.flows.6.enc.res_skip_layers.3.bias", "net_g.flow.flows.6.enc.res_skip_layers.3.weight_g", "net_g.flow.flows.6.enc.res_skip_layers.3.weight_v", "net_g.flow.flows.6.enc.cond_layer.bias", "net_g.flow.flows.6.enc.cond_layer.weight_g", "net_g.flow.flows.6.enc.cond_layer.weight_v", "net_g.flow.flows.6.post.weight", "net_g.flow.flows.6.post.bias", "net_g.dp.flows.0.m", "net_g.dp.flows.0.logs", "net_g.dp.flows.1.pre.weight", "net_g.dp.flows.1.pre.bias", "net_g.dp.flows.1.convs.convs_sep.0.weight", "net_g.dp.flows.1.convs.convs_sep.0.bias", "net_g.dp.flows.1.convs.convs_sep.1.weight", "net_g.dp.flows.1.convs.convs_sep.1.bias", "net_g.dp.flows.1.convs.convs_sep.2.weight", "net_g.dp.flows.1.convs.convs_sep.2.bias", "net_g.dp.flows.1.convs.convs_1x1.0.weight", "net_g.dp.flows.1.convs.convs_1x1.0.bias", "net_g.dp.flows.1.convs.convs_1x1.1.weight", "net_g.dp.flows.1.convs.convs_1x1.1.bias", "net_g.dp.flows.1.convs.convs_1x1.2.weight", "net_g.dp.flows.1.convs.convs_1x1.2.bias", "net_g.dp.flows.1.convs.norms_1.0.gamma", "net_g.dp.flows.1.convs.norms_1.0.beta", "net_g.dp.flows.1.convs.norms_1.1.gamma", "net_g.dp.flows.1.convs.norms_1.1.beta", "net_g.dp.flows.1.convs.norms_1.2.gamma", 
"net_g.dp.flows.1.convs.norms_1.2.beta", "net_g.dp.flows.1.convs.norms_2.0.gamma", "net_g.dp.flows.1.convs.norms_2.0.beta", "net_g.dp.flows.1.convs.norms_2.1.gamma", "net_g.dp.flows.1.convs.norms_2.1.beta", "net_g.dp.flows.1.convs.norms_2.2.gamma", "net_g.dp.flows.1.convs.norms_2.2.beta", "net_g.dp.flows.1.proj.weight", "net_g.dp.flows.1.proj.bias", "net_g.dp.flows.3.pre.weight", "net_g.dp.flows.3.pre.bias", "net_g.dp.flows.3.convs.convs_sep.0.weight", "net_g.dp.flows.3.convs.convs_sep.0.bias", "net_g.dp.flows.3.convs.convs_sep.1.weight", "net_g.dp.flows.3.convs.convs_sep.1.bias", "net_g.dp.flows.3.convs.convs_sep.2.weight", "net_g.dp.flows.3.convs.convs_sep.2.bias", "net_g.dp.flows.3.convs.convs_1x1.0.weight", "net_g.dp.flows.3.convs.convs_1x1.0.bias", "net_g.dp.flows.3.convs.convs_1x1.1.weight", "net_g.dp.flows.3.convs.convs_1x1.1.bias", "net_g.dp.flows.3.convs.convs_1x1.2.weight", "net_g.dp.flows.3.convs.convs_1x1.2.bias", "net_g.dp.flows.3.convs.norms_1.0.gamma", "net_g.dp.flows.3.convs.norms_1.0.beta", "net_g.dp.flows.3.convs.norms_1.1.gamma", "net_g.dp.flows.3.convs.norms_1.1.beta", "net_g.dp.flows.3.convs.norms_1.2.gamma", "net_g.dp.flows.3.convs.norms_1.2.beta", "net_g.dp.flows.3.convs.norms_2.0.gamma", "net_g.dp.flows.3.convs.norms_2.0.beta", "net_g.dp.flows.3.convs.norms_2.1.gamma", "net_g.dp.flows.3.convs.norms_2.1.beta", "net_g.dp.flows.3.convs.norms_2.2.gamma", "net_g.dp.flows.3.convs.norms_2.2.beta", "net_g.dp.flows.3.proj.weight", "net_g.dp.flows.3.proj.bias", "net_g.dp.flows.5.pre.weight", "net_g.dp.flows.5.pre.bias", "net_g.dp.flows.5.convs.convs_sep.0.weight", "net_g.dp.flows.5.convs.convs_sep.0.bias", "net_g.dp.flows.5.convs.convs_sep.1.weight", "net_g.dp.flows.5.convs.convs_sep.1.bias", "net_g.dp.flows.5.convs.convs_sep.2.weight", "net_g.dp.flows.5.convs.convs_sep.2.bias", "net_g.dp.flows.5.convs.convs_1x1.0.weight", "net_g.dp.flows.5.convs.convs_1x1.0.bias", "net_g.dp.flows.5.convs.convs_1x1.1.weight", "net_g.dp.flows.5.convs.convs_1x1.1.bias", 
"net_g.dp.flows.5.convs.convs_1x1.2.weight", "net_g.dp.flows.5.convs.convs_1x1.2.bias", "net_g.dp.flows.5.convs.norms_1.0.gamma", "net_g.dp.flows.5.convs.norms_1.0.beta", "net_g.dp.flows.5.convs.norms_1.1.gamma", "net_g.dp.flows.5.convs.norms_1.1.beta", "net_g.dp.flows.5.convs.norms_1.2.gamma", "net_g.dp.flows.5.convs.norms_1.2.beta", "net_g.dp.flows.5.convs.norms_2.0.gamma", "net_g.dp.flows.5.convs.norms_2.0.beta", "net_g.dp.flows.5.convs.norms_2.1.gamma", "net_g.dp.flows.5.convs.norms_2.1.beta", "net_g.dp.flows.5.convs.norms_2.2.gamma", "net_g.dp.flows.5.convs.norms_2.2.beta", "net_g.dp.flows.5.proj.weight", "net_g.dp.flows.5.proj.bias", "net_g.dp.flows.7.pre.weight", "net_g.dp.flows.7.pre.bias", "net_g.dp.flows.7.convs.convs_sep.0.weight", "net_g.dp.flows.7.convs.convs_sep.0.bias", "net_g.dp.flows.7.convs.convs_sep.1.weight", "net_g.dp.flows.7.convs.convs_sep.1.bias", "net_g.dp.flows.7.convs.convs_sep.2.weight", "net_g.dp.flows.7.convs.convs_sep.2.bias", "net_g.dp.flows.7.convs.convs_1x1.0.weight", "net_g.dp.flows.7.convs.convs_1x1.0.bias", "net_g.dp.flows.7.convs.convs_1x1.1.weight", "net_g.dp.flows.7.convs.convs_1x1.1.bias", "net_g.dp.flows.7.convs.convs_1x1.2.weight", "net_g.dp.flows.7.convs.convs_1x1.2.bias", "net_g.dp.flows.7.convs.norms_1.0.gamma", "net_g.dp.flows.7.convs.norms_1.0.beta", "net_g.dp.flows.7.convs.norms_1.1.gamma", "net_g.dp.flows.7.convs.norms_1.1.beta", "net_g.dp.flows.7.convs.norms_1.2.gamma", "net_g.dp.flows.7.convs.norms_1.2.beta", "net_g.dp.flows.7.convs.norms_2.0.gamma", "net_g.dp.flows.7.convs.norms_2.0.beta", "net_g.dp.flows.7.convs.norms_2.1.gamma", "net_g.dp.flows.7.convs.norms_2.1.beta", "net_g.dp.flows.7.convs.norms_2.2.gamma", "net_g.dp.flows.7.convs.norms_2.2.beta", "net_g.dp.flows.7.proj.weight", "net_g.dp.flows.7.proj.bias", "net_g.dp.post_pre.weight", "net_g.dp.post_pre.bias", "net_g.dp.post_proj.weight", "net_g.dp.post_proj.bias", "net_g.dp.post_convs.convs_sep.0.weight", "net_g.dp.post_convs.convs_sep.0.bias", 
"net_g.dp.post_convs.convs_sep.1.weight", "net_g.dp.post_convs.convs_sep.1.bias", "net_g.dp.post_convs.convs_sep.2.weight", "net_g.dp.post_convs.convs_sep.2.bias", "net_g.dp.post_convs.convs_1x1.0.weight", "net_g.dp.post_convs.convs_1x1.0.bias", "net_g.dp.post_convs.convs_1x1.1.weight", "net_g.dp.post_convs.convs_1x1.1.bias", "net_g.dp.post_convs.convs_1x1.2.weight", "net_g.dp.post_convs.convs_1x1.2.bias", "net_g.dp.post_convs.norms_1.0.gamma", "net_g.dp.post_convs.norms_1.0.beta", "net_g.dp.post_convs.norms_1.1.gamma", "net_g.dp.post_convs.norms_1.1.beta", "net_g.dp.post_convs.norms_1.2.gamma", "net_g.dp.post_convs.norms_1.2.beta", "net_g.dp.post_convs.norms_2.0.gamma", "net_g.dp.post_convs.norms_2.0.beta", "net_g.dp.post_convs.norms_2.1.gamma", "net_g.dp.post_convs.norms_2.1.beta", "net_g.dp.post_convs.norms_2.2.gamma", "net_g.dp.post_convs.norms_2.2.beta", "net_g.dp.post_flows.0.m", "net_g.dp.post_flows.0.logs", "net_g.dp.post_flows.1.pre.weight", "net_g.dp.post_flows.1.pre.bias", "net_g.dp.post_flows.1.convs.convs_sep.0.weight", "net_g.dp.post_flows.1.convs.convs_sep.0.bias", "net_g.dp.post_flows.1.convs.convs_sep.1.weight", "net_g.dp.post_flows.1.convs.convs_sep.1.bias", "net_g.dp.post_flows.1.convs.convs_sep.2.weight", "net_g.dp.post_flows.1.convs.convs_sep.2.bias", "net_g.dp.post_flows.1.convs.convs_1x1.0.weight", "net_g.dp.post_flows.1.convs.convs_1x1.0.bias", "net_g.dp.post_flows.1.convs.convs_1x1.1.weight", "net_g.dp.post_flows.1.convs.convs_1x1.1.bias", "net_g.dp.post_flows.1.convs.convs_1x1.2.weight", "net_g.dp.post_flows.1.convs.convs_1x1.2.bias", "net_g.dp.post_flows.1.convs.norms_1.0.gamma", "net_g.dp.post_flows.1.convs.norms_1.0.beta", "net_g.dp.post_flows.1.convs.norms_1.1.gamma", "net_g.dp.post_flows.1.convs.norms_1.1.beta", "net_g.dp.post_flows.1.convs.norms_1.2.gamma", "net_g.dp.post_flows.1.convs.norms_1.2.beta", "net_g.dp.post_flows.1.convs.norms_2.0.gamma", "net_g.dp.post_flows.1.convs.norms_2.0.beta", 
"net_g.dp.post_flows.1.convs.norms_2.1.gamma", "net_g.dp.post_flows.1.convs.norms_2.1.beta", "net_g.dp.post_flows.1.convs.norms_2.2.gamma", "net_g.dp.post_flows.1.convs.norms_2.2.beta", "net_g.dp.post_flows.1.proj.weight", "net_g.dp.post_flows.1.proj.bias", "net_g.dp.post_flows.3.pre.weight", "net_g.dp.post_flows.3.pre.bias", "net_g.dp.post_flows.3.convs.convs_sep.0.weight", "net_g.dp.post_flows.3.convs.convs_sep.0.bias", "net_g.dp.post_flows.3.convs.convs_sep.1.weight", "net_g.dp.post_flows.3.convs.convs_sep.1.bias", "net_g.dp.post_flows.3.convs.convs_sep.2.weight", "net_g.dp.post_flows.3.convs.convs_sep.2.bias", "net_g.dp.post_flows.3.convs.convs_1x1.0.weight", "net_g.dp.post_flows.3.convs.convs_1x1.0.bias", "net_g.dp.post_flows.3.convs.convs_1x1.1.weight", "net_g.dp.post_flows.3.convs.convs_1x1.1.bias", "net_g.dp.post_flows.3.convs.convs_1x1.2.weight", "net_g.dp.post_flows.3.convs.convs_1x1.2.bias", "net_g.dp.post_flows.3.convs.norms_1.0.gamma", "net_g.dp.post_flows.3.convs.norms_1.0.beta", "net_g.dp.post_flows.3.convs.norms_1.1.gamma", "net_g.dp.post_flows.3.convs.norms_1.1.beta", "net_g.dp.post_flows.3.convs.norms_1.2.gamma", "net_g.dp.post_flows.3.convs.norms_1.2.beta", "net_g.dp.post_flows.3.convs.norms_2.0.gamma", "net_g.dp.post_flows.3.convs.norms_2.0.beta", "net_g.dp.post_flows.3.convs.norms_2.1.gamma", "net_g.dp.post_flows.3.convs.norms_2.1.beta", "net_g.dp.post_flows.3.convs.norms_2.2.gamma", "net_g.dp.post_flows.3.convs.norms_2.2.beta", "net_g.dp.post_flows.3.proj.weight", "net_g.dp.post_flows.3.proj.bias", "net_g.dp.post_flows.5.pre.weight", "net_g.dp.post_flows.5.pre.bias", "net_g.dp.post_flows.5.convs.convs_sep.0.weight", "net_g.dp.post_flows.5.convs.convs_sep.0.bias", "net_g.dp.post_flows.5.convs.convs_sep.1.weight", "net_g.dp.post_flows.5.convs.convs_sep.1.bias", "net_g.dp.post_flows.5.convs.convs_sep.2.weight", "net_g.dp.post_flows.5.convs.convs_sep.2.bias", "net_g.dp.post_flows.5.convs.convs_1x1.0.weight", 
"net_g.dp.post_flows.5.convs.convs_1x1.0.bias", "net_g.dp.post_flows.5.convs.convs_1x1.1.weight", "net_g.dp.post_flows.5.convs.convs_1x1.1.bias", "net_g.dp.post_flows.5.convs.convs_1x1.2.weight", "net_g.dp.post_flows.5.convs.convs_1x1.2.bias", "net_g.dp.post_flows.5.convs.norms_1.0.gamma", "net_g.dp.post_flows.5.convs.norms_1.0.beta", "net_g.dp.post_flows.5.convs.norms_1.1.gamma", "net_g.dp.post_flows.5.convs.norms_1.1.beta", "net_g.dp.post_flows.5.convs.norms_1.2.gamma", "net_g.dp.post_flows.5.convs.norms_1.2.beta", "net_g.dp.post_flows.5.convs.norms_2.0.gamma", "net_g.dp.post_flows.5.convs.norms_2.0.beta", "net_g.dp.post_flows.5.convs.norms_2.1.gamma", "net_g.dp.post_flows.5.convs.norms_2.1.beta", "net_g.dp.post_flows.5.convs.norms_2.2.gamma", "net_g.dp.post_flows.5.convs.norms_2.2.beta", "net_g.dp.post_flows.5.proj.weight", "net_g.dp.post_flows.5.proj.bias", "net_g.dp.post_flows.7.pre.weight", "net_g.dp.post_flows.7.pre.bias", "net_g.dp.post_flows.7.convs.convs_sep.0.weight", "net_g.dp.post_flows.7.convs.convs_sep.0.bias", "net_g.dp.post_flows.7.convs.convs_sep.1.weight", "net_g.dp.post_flows.7.convs.convs_sep.1.bias", "net_g.dp.post_flows.7.convs.convs_sep.2.weight", "net_g.dp.post_flows.7.convs.convs_sep.2.bias", "net_g.dp.post_flows.7.convs.convs_1x1.0.weight", "net_g.dp.post_flows.7.convs.convs_1x1.0.bias", "net_g.dp.post_flows.7.convs.convs_1x1.1.weight", "net_g.dp.post_flows.7.convs.convs_1x1.1.bias", "net_g.dp.post_flows.7.convs.convs_1x1.2.weight", "net_g.dp.post_flows.7.convs.convs_1x1.2.bias", "net_g.dp.post_flows.7.convs.norms_1.0.gamma", "net_g.dp.post_flows.7.convs.norms_1.0.beta", "net_g.dp.post_flows.7.convs.norms_1.1.gamma", "net_g.dp.post_flows.7.convs.norms_1.1.beta", "net_g.dp.post_flows.7.convs.norms_1.2.gamma", "net_g.dp.post_flows.7.convs.norms_1.2.beta", "net_g.dp.post_flows.7.convs.norms_2.0.gamma", "net_g.dp.post_flows.7.convs.norms_2.0.beta", "net_g.dp.post_flows.7.convs.norms_2.1.gamma", "net_g.dp.post_flows.7.convs.norms_2.1.beta", 
"net_g.dp.post_flows.7.convs.norms_2.2.gamma", "net_g.dp.post_flows.7.convs.norms_2.2.beta", "net_g.dp.post_flows.7.proj.weight", "net_g.dp.post_flows.7.proj.bias", "net_g.dp.pre.weight", "net_g.dp.pre.bias", "net_g.dp.proj.weight", "net_g.dp.proj.bias", "net_g.dp.convs.convs_sep.0.weight", "net_g.dp.convs.convs_sep.0.bias", "net_g.dp.convs.convs_sep.1.weight", "net_g.dp.convs.convs_sep.1.bias", "net_g.dp.convs.convs_sep.2.weight", "net_g.dp.convs.convs_sep.2.bias", "net_g.dp.convs.convs_1x1.0.weight", "net_g.dp.convs.convs_1x1.0.bias", "net_g.dp.convs.convs_1x1.1.weight", "net_g.dp.convs.convs_1x1.1.bias", "net_g.dp.convs.convs_1x1.2.weight", "net_g.dp.convs.convs_1x1.2.bias", "net_g.dp.convs.norms_1.0.gamma", "net_g.dp.convs.norms_1.0.beta", "net_g.dp.convs.norms_1.1.gamma", "net_g.dp.convs.norms_1.1.beta", "net_g.dp.convs.norms_1.2.gamma", "net_g.dp.convs.norms_1.2.beta", "net_g.dp.convs.norms_2.0.gamma", "net_g.dp.convs.norms_2.0.beta", "net_g.dp.convs.norms_2.1.gamma", "net_g.dp.convs.norms_2.1.beta", "net_g.dp.convs.norms_2.2.gamma", "net_g.dp.convs.norms_2.2.beta", "net_g.dp.cond.weight", "net_g.dp.cond.bias", "net_d.discriminators.0.convs.0.bias", "net_d.discriminators.0.convs.0.weight_g", "net_d.discriminators.0.convs.0.weight_v", "net_d.discriminators.0.convs.1.bias", "net_d.discriminators.0.convs.1.weight_g", "net_d.discriminators.0.convs.1.weight_v", "net_d.discriminators.0.convs.2.bias", "net_d.discriminators.0.convs.2.weight_g", "net_d.discriminators.0.convs.2.weight_v", "net_d.discriminators.0.convs.3.bias", "net_d.discriminators.0.convs.3.weight_g", "net_d.discriminators.0.convs.3.weight_v", "net_d.discriminators.0.convs.4.bias", "net_d.discriminators.0.convs.4.weight_g", "net_d.discriminators.0.convs.4.weight_v", "net_d.discriminators.0.convs.5.bias", "net_d.discriminators.0.convs.5.weight_g", "net_d.discriminators.0.convs.5.weight_v", "net_d.discriminators.0.conv_post.bias", "net_d.discriminators.0.conv_post.weight_g", 
"net_d.discriminators.0.conv_post.weight_v", "net_d.discriminators.1.convs.0.bias", "net_d.discriminators.1.convs.0.weight_g", "net_d.discriminators.1.convs.0.weight_v", "net_d.discriminators.1.convs.1.bias", "net_d.discriminators.1.convs.1.weight_g", "net_d.discriminators.1.convs.1.weight_v", "net_d.discriminators.1.convs.2.bias", "net_d.discriminators.1.convs.2.weight_g", "net_d.discriminators.1.convs.2.weight_v", "net_d.discriminators.1.convs.3.bias", "net_d.discriminators.1.convs.3.weight_g", "net_d.discriminators.1.convs.3.weight_v", "net_d.discriminators.1.convs.4.bias", "net_d.discriminators.1.convs.4.weight_g", "net_d.discriminators.1.convs.4.weight_v", "net_d.discriminators.1.conv_post.bias", "net_d.discriminators.1.conv_post.weight_g", "net_d.discriminators.1.conv_post.weight_v", "net_d.discriminators.2.convs.0.bias", "net_d.discriminators.2.convs.0.weight_g", "net_d.discriminators.2.convs.0.weight_v", "net_d.discriminators.2.convs.1.bias", "net_d.discriminators.2.convs.1.weight_g", "net_d.discriminators.2.convs.1.weight_v", "net_d.discriminators.2.convs.2.bias", "net_d.discriminators.2.convs.2.weight_g", "net_d.discriminators.2.convs.2.weight_v", "net_d.discriminators.2.convs.3.bias", "net_d.discriminators.2.convs.3.weight_g", "net_d.discriminators.2.convs.3.weight_v", "net_d.discriminators.2.convs.4.bias", "net_d.discriminators.2.convs.4.weight_g", "net_d.discriminators.2.convs.4.weight_v", "net_d.discriminators.2.conv_post.bias", "net_d.discriminators.2.conv_post.weight_g", "net_d.discriminators.2.conv_post.weight_v", "net_d.discriminators.3.convs.0.bias", "net_d.discriminators.3.convs.0.weight_g", "net_d.discriminators.3.convs.0.weight_v", "net_d.discriminators.3.convs.1.bias", "net_d.discriminators.3.convs.1.weight_g", "net_d.discriminators.3.convs.1.weight_v", "net_d.discriminators.3.convs.2.bias", "net_d.discriminators.3.convs.2.weight_g", "net_d.discriminators.3.convs.2.weight_v", "net_d.discriminators.3.convs.3.bias", 
"net_d.discriminators.3.convs.3.weight_g", "net_d.discriminators.3.convs.3.weight_v", "net_d.discriminators.3.convs.4.bias", "net_d.discriminators.3.convs.4.weight_g", "net_d.discriminators.3.convs.4.weight_v", "net_d.discriminators.3.conv_post.bias", "net_d.discriminators.3.conv_post.weight_g", "net_d.discriminators.3.conv_post.weight_v", "net_d.discriminators.4.convs.0.bias", "net_d.discriminators.4.convs.0.weight_g", "net_d.discriminators.4.convs.0.weight_v", "net_d.discriminators.4.convs.1.bias", "net_d.discriminators.4.convs.1.weight_g", "net_d.discriminators.4.convs.1.weight_v", "net_d.discriminators.4.convs.2.bias", "net_d.discriminators.4.convs.2.weight_g", "net_d.discriminators.4.convs.2.weight_v", "net_d.discriminators.4.convs.3.bias", "net_d.discriminators.4.convs.3.weight_g", "net_d.discriminators.4.convs.3.weight_v", "net_d.discriminators.4.convs.4.bias", "net_d.discriminators.4.convs.4.weight_g", "net_d.discriminators.4.convs.4.weight_v", "net_d.discriminators.4.conv_post.bias", "net_d.discriminators.4.conv_post.weight_g", "net_d.discriminators.4.conv_post.weight_v", "net_d.discriminators.5.convs.0.bias", "net_d.discriminators.5.convs.0.weight_g", "net_d.discriminators.5.convs.0.weight_v", "net_d.discriminators.5.convs.1.bias", "net_d.discriminators.5.convs.1.weight_g", "net_d.discriminators.5.convs.1.weight_v", "net_d.discriminators.5.convs.2.bias", "net_d.discriminators.5.convs.2.weight_g", "net_d.discriminators.5.convs.2.weight_v", "net_d.discriminators.5.convs.3.bias", "net_d.discriminators.5.convs.3.weight_g", "net_d.discriminators.5.convs.3.weight_v", "net_d.discriminators.5.convs.4.bias", "net_d.discriminators.5.convs.4.weight_g", "net_d.discriminators.5.convs.4.weight_v", "net_d.discriminators.5.conv_post.bias", "net_d.discriminators.5.conv_post.weight_g", "net_d.discriminators.5.conv_post.weight_v". 
	Unexpected key(s) in state_dict: "model", "iteration", "optimizer", "learning_rate". 
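
The unexpected keys `model`, `iteration`, `optimizer` and `learning_rate` suggest the file is a training checkpoint in the style of the original VITS repository. In that layout the weights sit one level down under `model` and carry no `net_g.` prefix, while every key NeMo reports as missing starts with `net_g.` or `net_d.`. The sketch below is a pair of hypothetical helpers, not part of NeMo. `unwrap_original_vits` assumes that wrapper layout and re-prefixes the generator weights. `group_missing` condenses a long missing-key list like the one above into per-module counts so it is easier to read. It uses plain dicts so it runs without torch.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

# Wrapper keys reported as "Unexpected key(s)" in the error above.
WRAPPER_KEYS = {"model", "iteration", "optimizer", "learning_rate"}


def unwrap_original_vits(checkpoint: Mapping, prefix: str = "net_g.") -> Dict[str, object]:
    """Hypothetical helper: take the weights out of the wrapper and add NeMo's prefix.

    Assumes checkpoint["model"] holds generator weights named without a prefix,
    e.g. "dp.flows.0.m" becomes "net_g.dp.flows.0.m".
    """
    missing = WRAPPER_KEYS - set(checkpoint)
    if missing:
        raise KeyError(f"not an original-VITS-style checkpoint, missing {sorted(missing)}")
    return {prefix + name: value for name, value in checkpoint["model"].items()}


def group_missing(expected: Iterable[str], provided: Iterable[str], depth: int = 2) -> Dict[str, int]:
    """Count expected keys absent from `provided`, grouped by their first `depth` name parts."""
    provided = set(provided)
    counts = Counter(
        ".".join(key.split(".")[:depth]) for key in expected if key not in provided
    )
    return dict(counts)
```

With a real model, `expected` would be `model.state_dict().keys()` and `checkpoint` the result of `torch.load(..., map_location='cpu')`. Passing the remapped dict to `model.load_state_dict(..., strict=False)` tolerates the discriminator (`net_d.*`) keys being absent, but it still raises on tensor shape mismatches, for example if the checkpoint was trained with a different symbol vocabulary.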

In [None]:
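checkpoint = torch.load("/home/giacomo/italian-tts/checkpoints/vits/vits.pth", map_location='cpu')  # original-VITS layout: weights under "model", no "net_g." prefix
model.eval().load_state_dict({"net_g." + k: v for k, v in checkpoint["model"].items()}, strict=False)  # VitsModel(...) expects a config, not weights; missing net_d keys keep their current values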

In [None]:
model = VitsModel.load_from_checkpoint("/home/giacomo/win-home/Downloads/VITS--loss_gen_all=21.6121-epoch=279.ckpt").cpu().eval()
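
A name like `VITS--loss_gen_all=21.6121-epoch=279.ckpt` looks like it comes from a PyTorch Lightning `ModelCheckpoint` filename template that embeds the monitored metric and the epoch (the exact template is an assumption inferred from the name). When several such files pile up, a small stdlib sketch can pick the one with the lowest `loss_gen_all` before passing it to `VitsModel.load_from_checkpoint`. `parse_metrics` and `best_checkpoint` are hypothetical helper names.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, Optional

# Matches "name=number" pairs such as "loss_gen_all=21.6121" or "epoch=279".
_PAIR = re.compile(r"([A-Za-z_]+)=(\d+(?:\.\d+)?)")


def parse_metrics(path: str) -> Dict[str, float]:
    """Extract name=value pairs from a checkpoint file name (directories are ignored)."""
    return {name: float(value) for name, value in _PAIR.findall(Path(path).name)}


def best_checkpoint(paths: Iterable[str], metric: str = "loss_gen_all") -> Optional[str]:
    """Return the path with the smallest `metric`, skipping names that do not contain it."""
    scored = []
    for path in paths:
        metrics = parse_metrics(path)
        if metric in metrics:
            scored.append((metrics[metric], path))
    return min(scored)[1] if scored else None
```

For example, `best_checkpoint(glob.glob("/home/giacomo/win-home/Downloads/VITS*.ckpt"))` would skip `VITS22.1797.ckpt`, whose name has no `loss_gen_all=` field. In adversarial training a lower generator loss does not guarantee better-sounding audio, so it is still worth listening to the candidates.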

In [None]:
model.save_to(save_path="model.nemo")

In [None]:
mymodel = VitsModel.restore_from(restore_path="model.nemo")
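
`save_to` and `restore_from` work on `.nemo` files, which NeMo packs as tar archives holding the model config (`model_config.yaml`) and the weights (`model_weights.ckpt`). A wrong file name only shows up as an error at restore time, so a quick stdlib check can confirm the file exists and looks like a NeMo archive first. `looks_like_nemo` is a hypothetical helper, and the member names it looks for are an assumption that may differ between NeMo versions.

```python
import os
import tarfile

# Member names save_to is expected to write (assumption; may vary across NeMo versions).
EXPECTED_MEMBERS = {"model_config.yaml", "model_weights.ckpt"}


def looks_like_nemo(path: str) -> bool:
    """Return True if `path` is a tar archive (compressed or not) containing the expected members."""
    if not os.path.isfile(path) or not tarfile.is_tarfile(path):
        return False
    with tarfile.open(path, "r:*") as archive:
        # Members are often stored as "./model_config.yaml", so compare base names only.
        names = {os.path.basename(member.name) for member in archive.getmembers()}
    return EXPECTED_MEMBERS.issubset(names)
```

A check such as `assert looks_like_nemo("model.nemo")` placed before the `restore_from` call turns a missing or mistyped path into an immediate, readable failure.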

In [None]:
model = VitsModel.load_from_checkpoint("/home/giacomo/win-home/Downloads/VITS22.1797.ckpt").cpu().eval()

In [None]:
tokens = model.parse(text_raw)
audio_pred = model.convert_text_to_waveform(tokens=tokens).cpu().detach().numpy()

print("predicted audio")
ipd.Audio(audio_pred, rate=target_sr)
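
`ipd.Audio` only plays the result inside the notebook. To keep a sample on disk, the float waveform can be written as 16-bit PCM with Python's built-in `wave` module. This is a minimal sketch assuming `audio_pred` holds mono samples in [-1, 1] (values outside that range are clipped) and that `target_sr` is the model's sample rate. `write_wav` is a hypothetical helper; `soundfile.write` is a common alternative if that package is installed.

```python
import array
import sys
import wave
from typing import Iterable


def write_wav(path: str, samples: Iterable[float], sample_rate: int) -> int:
    """Write mono float samples in [-1, 1] as 16-bit PCM and return the number of frames written."""
    # Clip to [-1, 1], then scale to the signed 16-bit range.
    pcm = array.array("h", (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
    return len(pcm)
```

Since `audio_pred` may carry a batch dimension, flattening it first is the safe call: `write_wav("predicted.wav", audio_pred.reshape(-1), target_sr)`.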

In [None]:
tokens = model_.parse(text_raw)
audio_pred = model_.convert_text_to_waveform(tokens=tokens).cpu().detach().numpy()

print("predicted audio")
ipd.Audio(audio_pred, rate=target_sr)

In [None]:
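audio_to_mel = model.audio_to_melspec_processor

# Each waveform gets its own length: the predicted audio is usually not as long as the reference
wav_pred = torch.tensor(audio_pred).view(1, -1)
wav_orig = torch.tensor(audio_data).view(1, -1)

spec_pred, _ = audio_to_mel(wav_pred, torch.tensor(wav_pred.shape[1]).view(1, -1))
spec_orig, _ = audio_to_mel(wav_orig, torch.tensor(wav_orig.shape[1]).view(1, -1))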
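
The two spectrograms are generally not the same width, because the duration of the predicted audio is set by the model's duration predictor rather than by the reference recording. For a short-time Fourier transform with centered framing (as in `torch.stft(..., center=True)`), `n_fft // 2` samples are padded on each side before framing. The sketch below just evaluates that frame-count arithmetic. The hop length of 256 and FFT size of 1024 used as defaults and in the test are assumptions typical of 22.05 kHz configs; check `model.cfg` for the values this model actually uses.

```python
def num_frames(num_samples: int, hop_length: int = 256, n_fft: int = 1024, center: bool = True) -> int:
    """Number of STFT frames produced from `num_samples` samples.

    With center=True the signal is padded by n_fft // 2 on both sides first,
    which for an even n_fft works out to 1 + num_samples // hop_length.
    """
    if center:
        num_samples += 2 * (n_fft // 2)  # padding added by centered framing
    if num_samples < n_fft:
        raise ValueError("signal is shorter than one FFT window")
    return 1 + (num_samples - n_fft) // hop_length
```

Comparing `num_frames(len(audio_data))` with the predicted waveform's frame count shows how much the two panels differ in width; cropping both spectrograms to the smaller count is one simple way to line them up for a frame-by-frame comparison.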

In [None]:
fig, ax = plt.subplots(1, 2)

ax[0].imshow(spec_orig[0][0].cpu().detach(), origin='lower', aspect='auto')
ax[1].imshow(spec_pred[0][0].cpu().detach(), origin='lower', aspect='auto')

ax[0].set_title('Original spectrogram')
ax[1].set_title('Predicted spectrogram')
plt.show()