Cannot run Mixer TTS colab with Mixer-TTS-X model #4803

Closed
gedefet opened this issue Aug 24, 2022 · 27 comments


gedefet commented Aug 24, 2022

Hi all. I'm struggling to run training with the Mixer-TTS-X model. I'm using the tutorial for training both FastPitch and Mixer-TTS.

Modifications I've made:

pretrained_model = "tts_en_lj_mixerttsx"

Adding the 'raw_texts' argument when generating a spectrogram:

spectrogram = spec_gen.generate_spectrogram(tokens=tokens, raw_texts=["Hey, this produces speech!"])

Correcting this:

from nemo.collections.tts.torch.data import MixerTTSXDataset

Just in case:

from nemo.collections.tts.torch.tts_data_types import LMTokens
from transformers.models.albert.tokenization_albert import AlbertTokenizer

Adding the lm_tokenizer parameter here:

def pre_calculate_supplementary_data(sup_data_path, sup_data_types, text_tokenizer, text_normalizer, lm_tokenizer, text_normalizer_call_kwargs)

Getting the right config file:

&& wget https://raw.githubusercontent.com/NVIDIA/NeMo/$BRANCH/examples/tts/conf/mixer-tts-x.yaml

Creating the lm_tokenizer object:

lm_tokenizer = LMTokens()

And after running the command:

mixer_tts_sup_data_path = "mixer_tts_x_sup_data_folder"
sup_data_types = ["align_prior_matrix", "pitch", "lm_tokens"]

pitch_mean, pitch_std, pitch_min, pitch_max = pre_calculate_supplementary_data(
    mixer_tts_sup_data_path, sup_data_types, text_tokenizer, text_normalizer, lm_tokenizer, text_normalizer_call_kwargs
)

I get the following error:

[NeMo I 2022-08-24 22:00:27 data:216] Loading dataset from tests/data/asr/an4_train.json.
30it [00:00, 712.41it/s][NeMo I 2022-08-24 22:00:27 data:253] Loaded dataset with 30 files.
[NeMo I 2022-08-24 22:00:27 data:255] Dataset contains 0.02 hours.
[NeMo I 2022-08-24 22:00:27 data:357] Pruned 0 files. Final dataset contains 30 files
[NeMo I 2022-08-24 22:00:27 data:360] Pruned 0.00 hours. Final dataset contains 0.02 hours.

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
[<ipython-input-21-b87416bfd412>](https://localhost:8080/#) in <module>
      3 
      4 pitch_mean, pitch_std, pitch_min, pitch_max = pre_calculate_supplementary_data(
----> 5     mixer_tts_sup_data_path, sup_data_types, text_tokenizer, text_normalizer, lm_tokenizer, text_normalizer_call_kwargs
      6 )

3 frames
[<ipython-input-20-7893d3034131>](https://localhost:8080/#) in pre_calculate_supplementary_data(sup_data_path, sup_data_types, text_tokenizer, text_normalizer, lm_tokenizer, text_normalizer_call_kwargs)
     22             text_normalizer=text_normalizer,
     23             lm_tokenizer=lm_tokenizer,
---> 24             text_normalizer_call_kwargs=text_normalizer_call_kwargs
     25         ) 
     26         stage2dl[stage] = torch.utils.data.DataLoader(ds, batch_size=1, collate_fn=ds._collate_fn, num_workers=1)

[/usr/local/lib/python3.7/dist-packages/nemo/collections/tts/torch/data.py](https://localhost:8080/#) in __init__(self, **kwargs)
    759 class MixerTTSXDataset(TTSDataset):
    760     def __init__(self, **kwargs):
--> 761         super().__init__(**kwargs)
    762 
    763     def _albert(self):

[/usr/local/lib/python3.7/dist-packages/nemo/collections/tts/torch/data.py](https://localhost:8080/#) in __init__(self, manifest_filepath, sample_rate, text_tokenizer, tokens, text_normalizer, text_normalizer_call_kwargs, text_tokenizer_pad_id, sup_data_types, sup_data_path, max_duration, min_duration, ignore_file, trim, trim_ref, trim_top_db, trim_frame_length, trim_hop_length, n_fft, win_length, hop_length, window, n_mels, lowfreq, highfreq, **kwargs)
    323 
    324         for data_type in self.sup_data_types:
--> 325             getattr(self, f"add_{data_type.name}")(**kwargs)
    326 
    327     @staticmethod

[/usr/local/lib/python3.7/dist-packages/nemo/collections/tts/torch/data.py](https://localhost:8080/#) in add_lm_tokens(self, **kwargs)
    785 
    786     def add_lm_tokens(self, **kwargs):
--> 787         lm_model = kwargs.pop('lm_model')
    788 
    789         if lm_model == "albert":

KeyError: 'lm_model'

Any ideas are welcome.
Thanks,

gedefet added the bug label Aug 24, 2022

gedefet commented Aug 24, 2022

I've also added 'lm_tokens' to sup_data_types, as shown in the command above.


gedefet commented Aug 25, 2022

I also changed lm_tokens to tokens, since that is the name of the parameter, and changed the index to a number.

Same results.

redoctopus self-assigned this Aug 25, 2022
redoctopus (Collaborator) commented

Instead of passing in lm_tokenizer, please replace it with lm_model set to "albert". The MixerTTSXDataset will instantiate the appropriate tokenizer automatically once an lm_model is specified.

You can find the list of default arguments that the MixerTTSXDataset takes in this config file: https://github.com/NVIDIA/NeMo/blob/main/examples/tts/conf/mixer-tts-x.yaml#L80

Note that the pre_calculate_supplementary_data() function will now also return one extra item (lm_tokens), so you'll have to modify this line:

tokens, tokens_lengths, audios, audio_lengths, attn_prior, pitches, pitches_lengths = batch

to be this:

tokens, tokens_lengths, audios, audio_lengths, attn_prior, pitches, pitches_lengths, lm_tokens = batch

to make that part of the notebook run correctly.
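
Putting both changes together, a rough sketch of the affected pieces of the notebook cell (fragment only, using the same variable names as the tutorial; the other arguments stay as they are):

ds = MixerTTSXDataset(
    manifest_filepath=f"tests/data/asr/an4_{stage}.json",
    sample_rate=16000,
    sup_data_path=sup_data_path,
    sup_data_types=sup_data_types,
    # ... the other spectrogram/audio arguments stay unchanged ...
    text_tokenizer=text_tokenizer,
    text_normalizer=text_normalizer,
    lm_model="albert",  # replaces lm_tokenizer=...; the dataset instantiates the ALBERT tokenizer itself
    text_normalizer_call_kwargs=text_normalizer_call_kwargs,
)

# and in the iteration loop, unpack the extra item at the end of the batch:
tokens, tokens_lengths, audios, audio_lengths, attn_prior, pitches, pitches_lengths, lm_tokens = batch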


gedefet commented Aug 25, 2022

@redoctopus, great help. That error doesn't come up anymore.

I have another error now:

[NeMo I 2022-08-25 19:24:16 data:216] Loading dataset from tests/data/asr/an4_train.json.
30it [00:00, 399.34it/s][NeMo I 2022-08-25 19:24:16 data:253] Loaded dataset with 30 files.
[NeMo I 2022-08-25 19:24:16 data:255] Dataset contains 0.02 hours.
[NeMo I 2022-08-25 19:24:16 data:357] Pruned 0 files. Final dataset contains 30 files
[NeMo I 2022-08-25 19:24:16 data:360] Pruned 0.00 hours. Final dataset contains 0.02 hours.

[NeMo I 2022-08-25 19:24:18 data:216] Loading dataset from tests/data/asr/an4_val.json.
10it [00:00, 671.31it/s][NeMo I 2022-08-25 19:24:18 data:253] Loaded dataset with 10 files.
[NeMo I 2022-08-25 19:24:18 data:255] Dataset contains 0.01 hours.
[NeMo I 2022-08-25 19:24:18 data:357] Pruned 0 files. Final dataset contains 10 files
[NeMo I 2022-08-25 19:24:18 data:360] Pruned 0.00 hours. Final dataset contains 0.01 hours.

0%
0/30 [00:02<?, ?it/s]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
[<ipython-input-11-28a44ab27a50>](https://localhost:8080/#) in <module>
      3 
      4 pitch_mean, pitch_std, pitch_min, pitch_max = pre_calculate_supplementary_data(
----> 5     mixer_tts_sup_data_path, sup_data_types, text_tokenizer, text_normalizer, tokens, text_normalizer_call_kwargs
      6 )

6 frames
[<ipython-input-10-e8cb46cc0a78>](https://localhost:8080/#) in pre_calculate_supplementary_data(sup_data_path, sup_data_types, text_tokenizer, text_normalizer, lm_model, text_normalizer_call_kwargs)
     30     for stage, dl in stage2dl.items():
     31         pitch_list = []
---> 32         for batch in tqdm(dl, total=len(dl)):
     33             tokens, tokens_lengths, audios, audio_lengths, attn_prior, pitches, pitches_lengths, lm_tokens = batch
     34             pitch = pitches.squeeze(0)

[/usr/local/lib/python3.7/dist-packages/tqdm/notebook.py](https://localhost:8080/#) in __iter__(self)
    256         try:
    257             it = super(tqdm_notebook, self).__iter__()
--> 258             for obj in it:
    259                 # return super(tqdm...) will not catch exception
    260                 yield obj

[/usr/local/lib/python3.7/dist-packages/tqdm/std.py](https://localhost:8080/#) in __iter__(self)
   1193 
   1194         try:
-> 1195             for obj in iterable:
   1196                 yield obj
   1197                 # Update and possibly print the progressbar.

[/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py](https://localhost:8080/#) in __next__(self)
    679                 # TODO(https://github.com/pytorch/pytorch/issues/76750)
    680                 self._reset()  # type: ignore[call-arg]
--> 681             data = self._next_data()
    682             self._num_yielded += 1
    683             if self._dataset_kind == _DatasetKind.Iterable and \

[/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py](https://localhost:8080/#) in _next_data(self)
   1374             else:
   1375                 del self._task_info[idx]
-> 1376                 return self._process_data(data)
   1377 
   1378     def _try_put_index(self):

[/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py](https://localhost:8080/#) in _process_data(self, data)
   1400         self._try_put_index()
   1401         if isinstance(data, ExceptionWrapper):
-> 1402             data.reraise()
   1403         return data
   1404 

[/usr/local/lib/python3.7/dist-packages/torch/_utils.py](https://localhost:8080/#) in reraise(self)
    459             # instantiate since we don't know how to
    460             raise RuntimeError(msg) from None
--> 461         raise exception
    462 
    463 

ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "/usr/local/lib/python3.7/dist-packages/nemo/collections/tts/torch/data.py", line 840, in _collate_fn
    data_dict = self.general_collate_fn(list(zip(*batch[:13])))
  File "/usr/local/lib/python3.7/dist-packages/nemo/collections/tts/torch/data.py", line 649, in general_collate_fn
    ) = zip(*batch)
ValueError: not enough values to unpack (expected 15, got 13)

Command:

from nemo.collections.tts.torch.tts_data_types import LMTokens
from transformers.models.albert.tokenization_albert import AlbertTokenizer

def pre_calculate_supplementary_data(sup_data_path, sup_data_types, text_tokenizer, text_normalizer, lm_model, text_normalizer_call_kwargs):
    # init train and val dataloaders
    stages = ["train", "val"]
    stage2dl = {}
    for stage in stages:
        ds = MixerTTSXDataset(
            manifest_filepath=f"tests/data/asr/an4_{stage}.json",
            sample_rate=16000,
            sup_data_path=sup_data_path,
            sup_data_types=sup_data_types,
            n_fft=1024,
            win_length=1024,
            hop_length=256,
            window="hann",
            n_mels=80,
            lowfreq=0,
            highfreq=8000,
            text_tokenizer=text_tokenizer,
            text_normalizer=text_normalizer,
            lm_model="albert",
            text_normalizer_call_kwargs=text_normalizer_call_kwargs
        ) 
        stage2dl[stage] = torch.utils.data.DataLoader(ds, batch_size=1, collate_fn=ds._collate_fn, num_workers=1)

    # iteration over dataloaders
    pitch_mean, pitch_std, pitch_min, pitch_max = None, None, None, None
    for stage, dl in stage2dl.items():
        pitch_list = []
        for batch in tqdm(dl, total=len(dl)):
            tokens, tokens_lengths, audios, audio_lengths, attn_prior, pitches, pitches_lengths, lm_tokens = batch
            pitch = pitches.squeeze(0)
            pitch_list.append(pitch[pitch != 0])

        if stage == "train":
            pitch_tensor = torch.cat(pitch_list)
            pitch_mean, pitch_std = pitch_tensor.mean().item(), pitch_tensor.std().item()
            pitch_min, pitch_max = pitch_tensor.min().item(), pitch_tensor.max().item()
            
    return pitch_mean, pitch_std, pitch_min, pitch_max

I only added one parameter and also added the extra item in the unpacking line after it, so I don't know where the mistake is.

Thanks!


gedefet commented Aug 25, 2022

Now I've added 'lm_model' here, but it doesn't seem to recognize it:

mixer_tts_sup_data_path = "mixer_tts_x_sup_data_folder"
sup_data_types = ["align_prior_matrix", "pitch", "lm_tokens"]

pitch_mean, pitch_std, pitch_min, pitch_max = pre_calculate_supplementary_data(
    mixer_tts_sup_data_path, sup_data_types, text_tokenizer, text_normalizer, lm_model, text_normalizer_call_kwargs
)

Error:

NameError                                 Traceback (most recent call last)
[<ipython-input-12-8bd796d3cfa9>](https://localhost:8080/#) in <module>
      3 
      4 pitch_mean, pitch_std, pitch_min, pitch_max = pre_calculate_supplementary_data(
----> 5     mixer_tts_sup_data_path, sup_data_types, text_tokenizer, text_normalizer, lm_model, text_normalizer_call_kwargs
      6 )

NameError: name 'lm_model' is not defined

Thanks.

redoctopus (Collaborator) commented

Ah, yep. I ran into that too when reproducing--older versions of the dataset should be fine, but a fix is in the works in #4811 that updates the count to include voiced_mask and p_voiced. It's an error in the Dataset, so once this fix is merged you shouldn't have to edit anything in the script.
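
For reference, the message itself is just Python's tuple-unpacking check; a toy illustration (not NeMo code) of the same failure:

sample = tuple(range(13))   # a sample with 13 fields, as the older dataset version produces
a, b, c, d, e, f, g, h, i, j, k, l, m, n, o = zip(*[sample])
# ValueError: not enough values to unpack (expected 15, got 13)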

redoctopus (Collaborator) commented

Re lm_model, please pass in "albert" as the value.
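
For example, with the other notebook variables defined as in the earlier cells, the call would be:

pitch_mean, pitch_std, pitch_min, pitch_max = pre_calculate_supplementary_data(
    mixer_tts_sup_data_path, sup_data_types, text_tokenizer, text_normalizer, "albert", text_normalizer_call_kwargs
)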


gedefet commented Aug 25, 2022

Thanks. The unpacking error remains the same, but thanks for the clarification on the pending fix.

Just a question: can I fine-tune this model with my own dataset, just as with FastPitch? That is, starting from the LJSpeech or HiFiTTS checkpoint, adding my own audio, and selecting the speaker at inference time.

Thanks!

redoctopus (Collaborator) commented

No problem!

I don't think anyone on our team has tried it yet, but yes, you should be able to fine-tune as usual.


gedefet commented Aug 27, 2022

Hey, I saw the changes were merged into main, but I'm still getting the error:

mixer_tts_sup_data_path = "mixer_tts_x_sup_data_folder"
sup_data_types = ["align_prior_matrix", "pitch", "lm_tokens"]

pitch_mean, pitch_std, pitch_min, pitch_max = pre_calculate_supplementary_data(
    mixer_tts_sup_data_path, sup_data_types, text_tokenizer, text_normalizer, "albert", text_normalizer_call_kwargs
)

The error:

[NeMo I 2022-08-27 01:24:55 data:216] Loading dataset from tests/data/asr/an4_train.json.
30it [00:00, 381.68it/s][NeMo I 2022-08-27 01:24:55 data:253] Loaded dataset with 30 files.
[NeMo I 2022-08-27 01:24:55 data:255] Dataset contains 0.02 hours.
[NeMo I 2022-08-27 01:24:55 data:357] Pruned 0 files. Final dataset contains 30 files
[NeMo I 2022-08-27 01:24:55 data:360] Pruned 0.00 hours. Final dataset contains 0.02 hours.

[NeMo I 2022-08-27 01:24:57 data:216] Loading dataset from tests/data/asr/an4_val.json.
10it [00:00, 594.56it/s][NeMo I 2022-08-27 01:24:57 data:253] Loaded dataset with 10 files.
[NeMo I 2022-08-27 01:24:57 data:255] Dataset contains 0.01 hours.
[NeMo I 2022-08-27 01:24:57 data:357] Pruned 0 files. Final dataset contains 10 files
[NeMo I 2022-08-27 01:24:57 data:360] Pruned 0.00 hours. Final dataset contains 0.01 hours.

0%
0/30 [00:02<?, ?it/s]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
[<ipython-input-9-f4870cf5cca3>](https://localhost:8080/#) in <module>
      3 
      4 pitch_mean, pitch_std, pitch_min, pitch_max = pre_calculate_supplementary_data(
----> 5     mixer_tts_sup_data_path, sup_data_types, text_tokenizer, text_normalizer, "albert", text_normalizer_call_kwargs
      6 )

6 frames
[<ipython-input-6-e8cb46cc0a78>](https://localhost:8080/#) in pre_calculate_supplementary_data(sup_data_path, sup_data_types, text_tokenizer, text_normalizer, lm_model, text_normalizer_call_kwargs)
     30     for stage, dl in stage2dl.items():
     31         pitch_list = []
---> 32         for batch in tqdm(dl, total=len(dl)):
     33             tokens, tokens_lengths, audios, audio_lengths, attn_prior, pitches, pitches_lengths, lm_tokens = batch
     34             pitch = pitches.squeeze(0)

[/usr/local/lib/python3.7/dist-packages/tqdm/notebook.py](https://localhost:8080/#) in __iter__(self)
    256         try:
    257             it = super(tqdm_notebook, self).__iter__()
--> 258             for obj in it:
    259                 # return super(tqdm...) will not catch exception
    260                 yield obj

[/usr/local/lib/python3.7/dist-packages/tqdm/std.py](https://localhost:8080/#) in __iter__(self)
   1193 
   1194         try:
-> 1195             for obj in iterable:
   1196                 yield obj
   1197                 # Update and possibly print the progressbar.

[/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py](https://localhost:8080/#) in __next__(self)
    679                 # TODO(https://github.com/pytorch/pytorch/issues/76750)
    680                 self._reset()  # type: ignore[call-arg]
--> 681             data = self._next_data()
    682             self._num_yielded += 1
    683             if self._dataset_kind == _DatasetKind.Iterable and \

[/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py](https://localhost:8080/#) in _next_data(self)
   1374             else:
   1375                 del self._task_info[idx]
-> 1376                 return self._process_data(data)
   1377 
   1378     def _try_put_index(self):

[/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py](https://localhost:8080/#) in _process_data(self, data)
   1400         self._try_put_index()
   1401         if isinstance(data, ExceptionWrapper):
-> 1402             data.reraise()
   1403         return data
   1404 

[/usr/local/lib/python3.7/dist-packages/torch/_utils.py](https://localhost:8080/#) in reraise(self)
    459             # instantiate since we don't know how to
    460             raise RuntimeError(msg) from None
--> 461         raise exception
    462 
    463 

ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "/usr/local/lib/python3.7/dist-packages/nemo/collections/tts/torch/data.py", line 840, in _collate_fn
    data_dict = self.general_collate_fn(list(zip(*batch[:13])))
  File "/usr/local/lib/python3.7/dist-packages/nemo/collections/tts/torch/data.py", line 649, in general_collate_fn
    ) = zip(*batch)
ValueError: not enough values to unpack (expected 15, got 13)

Thanks!

redoctopus (Collaborator) commented

The fix was merged into the r1.11.0 branch (which is merged with main regularly). So it should work if you try that branch, or whenever the fix is merged to main as well!


gedefet commented Aug 29, 2022

@redoctopus I think there is a problem with that. If I run the r1.11.0 branch instead of main, I get an error even earlier:

from nemo.collections.tts.torch.data import MixerTTSXDataset
from nemo_text_processing.text_normalization.normalize import Normalizer
from nemo_text_processing.g2p.modules import EnglishG2p
from nemo.collections.common.tokenizers.text_to_speech.tts_tokenizers import (
    EnglishPhonemesTokenizer,
    EnglishCharsTokenizer,
)

Error:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
[<ipython-input-5-46afc0da47a7>](https://localhost:8080/#) in <module>
      1 from nemo.collections.tts.torch.data import MixerTTSXDataset
      2 from nemo_text_processing.text_normalization.normalize import Normalizer
----> 3 from nemo_text_processing.g2p.modules import EnglishG2p
      4 from nemo.collections.common.tokenizers.text_to_speech.tts_tokenizers import (
      5     EnglishPhonemesTokenizer,

ModuleNotFoundError: No module named 'nemo_text_processing.g2p.modules'

redoctopus (Collaborator) commented

Have you tried reinstalling after switching branches? As of #4690, some preprocessing modules have been moved from the TTS collection to more general text processing, and if you haven't reinstalled, the program can no longer find those classes.
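
In Colab that generally means re-running the NeMo install cell pinned to the branch you want, then restarting the runtime before importing anything, along these lines (a sketch; the exact install cell in your notebook may differ):

BRANCH = 'r1.11.0'
!python -m pip install git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[all]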


gedefet commented Aug 29, 2022

I'm using Google Colab Pro+, erasing and restarting the kernel each time, but the error is still there.

Each time I reconnect to it, I install NeMo again.

redoctopus (Collaborator) commented

Ahh, I see the problem now. #4690 was merged to main, while #4811 was merged as a bugfix to the r1.11.0 branch. My apologies! In this case you will have to wait until the r1.11.0 branch is merged again with main, which should be within the next few days.

Alternatively, you could try cherry-picking the fix onto another branch, if it is urgent.

redoctopus (Collaborator) commented

Oh, you should also be able to load the r1.11.0 version of the tutorial, which has the correct path for that version of the repository.


gedefet commented Aug 29, 2022

No problem! Thank you very much. I was able to run it by loading the r1.11.0 branch directly.

I have a question. After running:

mixer_tts_sup_data_path = "mixer_tts_x_sup_data_folder"
sup_data_types = ["align_prior_matrix", "pitch", "lm_tokens"]

pitch_mean, pitch_std, pitch_min, pitch_max = pre_calculate_supplementary_data(
    mixer_tts_sup_data_path, sup_data_types, text_tokenizer, text_normalizer, "albert", text_normalizer_call_kwargs
)

The folders created under mixer_tts_x_sup_data_folder are align_prior_matrix and pitch, but there is no lm_tokens folder.

Why does this happen? Is this correct?


gedefet commented Aug 29, 2022

I also get this error from the following command (I know I have to change sup_data_types, but given the previous post...):

!python mixer_tts.py --config-name=mixer-tts-x.yaml \
sample_rate=22050 \
train_dataset=train.json \
validation_datasets=val.json \
sup_data_types="['align_prior_matrix', 'pitch' ]" \
sup_data_path={mixer_tts_x_sup_data_path} \
+phoneme_dict_path=tts_dataset_files/cmudict-0.7b_nv22.08 \
+heteronyms_path=tts_dataset_files/heteronyms-052722 \
whitelist_path=tts_dataset_files/lj_speech.tsv \
exp_manager.exp_dir=$OUTPUT_CHEKPOINTS \
pitch_mean={pitch_mean} \
pitch_std={pitch_std} \
model.train_ds.dataloader_params.batch_size=6 \
model.train_ds.dataloader_params.num_workers=0 \
model.validation_ds.dataloader_params.num_workers=0 \
trainer.max_epochs=3 \
trainer.strategy=null \
trainer.check_val_every_n_epoch=1

The error:

[NeMo W 2022-08-29 20:24:59 optimizers:55] Apex was not found. Using the lamb or fused_adam optimizer will error out.
[NeMo W 2022-08-29 20:24:59 experimental:28] Module <class 'nemo.collections.tts.torch.tts_tokenizers.IPATokenizer'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2022-08-29 20:24:59 experimental:28] Module <class 'nemo.collections.tts.models.radtts.RadTTSModel'> is experimental, not ready for production and is not fully supported. Use at your own risk.
no viable alternative at input '{mixer_tts_x_sup_data_path}'
See https://hydra.cc/docs/next/advanced/override_grammar/basic for details

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

Thanks Jocelyn

redoctopus (Collaborator) commented

The folders created under mixer_tts_x_sup_data_folder are align_prior_matrix and pitch, but no lm_tokens.

Yes, this should be fine--it will compute those on the fly during training.

Regarding the error, it looks like it can't find {mixer_tts_x_sup_data_path}; did you also set the variable name to match? e.g. mixer_tts_x_sup_data_path = "mixer_tts_x_sup_data_folder"?


gedefet commented Aug 30, 2022

Hi @redoctopus. Same error.

Previous command:

mixer_tts_x_sup_data_path = "mixer_tts_x_sup_data_folder"
sup_data_types = ["align_prior_matrix", "pitch", "lm_tokens"]

pitch_mean, pitch_std, pitch_min, pitch_max = pre_calculate_supplementary_data(
    mixer_tts_x_sup_data_path, sup_data_types, text_tokenizer, text_normalizer, "albert", text_normalizer_call_kwargs
)

Thank you

redoctopus (Collaborator) commented

I am not able to reproduce the error; this is the run command I have:

!python mixer_tts.py \
--config-name=mixer-tts-x.yaml \
sample_rate=16000 \
train_dataset=tests/data/asr/an4_train.json \
validation_datasets=tests/data/asr/an4_val.json \
sup_data_types="['align_prior_matrix', 'pitch', 'lm_tokens']" \
sup_data_path={mixer_tts_x_sup_data_path} \
...

It looks like a parsing error, possibly due to the whitespace (see facebookresearch/hydra#836). Can you use the same sup_data_types line as mine and try again?

Once it does start running, I also found that you may run into a dim mismatch error--this is due to the fact that mixer-tts-x.yaml has a different default tokenizer than was used in the notebook's MixerTTS preprocessing step: the EnglishCharsTokenizer rather than the phoneme tokenizer.

This can be resolved by replacing the cell that creates the normalizer/tokenizer with this:

# Text normalizer
text_normalizer = Normalizer(
    lang="en", 
    input_case="cased", 
    whitelist="tts_dataset_files/lj_speech.tsv"
)

text_normalizer_call_kwargs = {
    "punct_pre_process": True,
    "punct_post_process": True
}

# Text tokenizer
text_tokenizer = EnglishCharsTokenizer(
    punct=True,
    apostrophe=True,
    pad_with_space=True,
)


gedefet commented Aug 30, 2022

Well, I had to set sup_data_path to "mixer_tts_x_sup_data_folder" directly. That avoids the error. Now I have another one during training. It seems it cannot find the .json files, but they are there:

!python mixer_tts.py \
--config-name=mixer-tts-x.yaml \
sample_rate=16000 \
#train_dataset=train.json \
#validation_datasets=val.json \
train_dataset=tests/data/asr/an4_train.json \
validation_datasets=tests/data/asr/an4_val.json \
sup_data_types="['align_prior_matrix', 'pitch', 'lm_tokens']" \
sup_data_path="mixer_tts_x_sup_data_folder" \
+phoneme_dict_path=tts_dataset_files/cmudict-0.7b_nv22.08 \
+heteronyms_path=tts_dataset_files/heteronyms-052722 \
whitelist_path=tts_dataset_files/lj_speech.tsv \
#exp_manager.exp_dir=$OUTPUT_CHEKPOINTS \
pitch_mean={pitch_mean} \
pitch_std={pitch_std} \
model.train_ds.dataloader_params.batch_size=6 \
model.train_ds.dataloader_params.num_workers=0 \
model.validation_ds.dataloader_params.num_workers=0 \
trainer.max_epochs=3 \
trainer.strategy=null \
trainer.check_val_every_n_epoch=1
[NeMo W 2022-08-30 17:38:44 optimizers:55] Apex was not found. Using the lamb or fused_adam optimizer will error out.
[NeMo W 2022-08-30 17:38:45 experimental:28] Module <class 'nemo.collections.tts.torch.tts_tokenizers.IPATokenizer'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2022-08-30 17:38:45 experimental:28] Module <class 'nemo.collections.tts.models.radtts.RadTTSModel'> is experimental, not ready for production and is not fully supported. Use at your own risk.
Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[NeMo I 2022-08-30 17:38:45 exp_manager:286] Experiments will be logged at /content/nemo_experiments/MixerTTS-X/2022-08-30_17-38-45
[NeMo I 2022-08-30 17:38:45 exp_manager:660] TensorboardLogger has been set up
[NeMo W 2022-08-30 17:38:45 nemo_logging:349] /usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py:2319: LightningDeprecationWarning: `Trainer.weights_save_path` has been deprecated in v1.6 and will be removed in v1.8.
      rank_zero_deprecation("`Trainer.weights_save_path` has been deprecated in v1.6 and will be removed in v1.8.")
    
[NeMo W 2022-08-30 17:38:45 exp_manager:900] The checkpoint callback was told to monitor a validation value and trainer's max_steps was set to -1. Please ensure that max_steps will run for at least 1 epochs to ensure that checkpointing will not error out.
Error executing job with overrides: ['sample_rate=16000']
Traceback (most recent call last):
  File "mixer_tts.py", line 27, in main
    model = MixerTTSModel(cfg=cfg.model, trainer=trainer)
  File "/usr/local/lib/python3.7/dist-packages/nemo/collections/tts/models/mixer_tts.py", line 61, in __init__
    cfg = model_utils.convert_model_config_to_dict_config(cfg)
  File "/usr/local/lib/python3.7/dist-packages/nemo/utils/model_utils.py", line 395, in convert_model_config_to_dict_config
    config = OmegaConf.to_container(cfg, resolve=True)
omegaconf.errors.InterpolationToMissingValueError: MissingMandatoryValue while resolving interpolation: Missing mandatory value: train_dataset
    full_key: model.train_ds.dataset.manifest_filepath
    object_type=dict

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

BTW, does the char tokenizer make a difference in terms of the final audio?

Thanks Jocelyn!

redoctopus (Collaborator) commented

No problem! In this case I think the commented-out lines are throwing it off; it seems to think that the command ends at that point. If you remove those lines, it should see them again.
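
Something like this, with the commented-out lines dropped entirely rather than left inside the multi-line command (only the start is shown; the remaining overrides stay as they were):

!python mixer_tts.py \
--config-name=mixer-tts-x.yaml \
sample_rate=16000 \
train_dataset=tests/data/asr/an4_train.json \
validation_datasets=tests/data/asr/an4_val.json \
sup_data_types="['align_prior_matrix', 'pitch', 'lm_tokens']" \
sup_data_path="mixer_tts_x_sup_data_folder" \
...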

BTW, does the char tokenizer make a difference in terms of the final audio?

I believe Mixer-TTS-X uses the char tokenizer since it uses an external LM to get token embeddings. If you switch to a phoneme tokenizer, the LM model probably won't handle it well.


gedefet commented Aug 30, 2022

Well, it is progressing :) It seems like it is about to start training.

I get the following now (I changed the tokenizer to the char-based one and the error is the same):

[NeMo W 2022-08-30 19:27:28 optimizers:55] Apex was not found. Using the lamb or fused_adam optimizer will error out.
[NeMo W 2022-08-30 19:27:29 experimental:28] Module <class 'nemo.collections.tts.torch.tts_tokenizers.IPATokenizer'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2022-08-30 19:27:29 experimental:28] Module <class 'nemo.collections.tts.models.radtts.RadTTSModel'> is experimental, not ready for production and is not fully supported. Use at your own risk.
Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[NeMo I 2022-08-30 19:27:30 exp_manager:286] Experiments will be logged at /content/nemo_experiments/MixerTTS-X/2022-08-30_19-27-30
[NeMo I 2022-08-30 19:27:30 exp_manager:660] TensorboardLogger has been set up
[NeMo W 2022-08-30 19:27:30 nemo_logging:349] /usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py:2319: LightningDeprecationWarning: `Trainer.weights_save_path` has been deprecated in v1.6 and will be removed in v1.8.
      rank_zero_deprecation("`Trainer.weights_save_path` has been deprecated in v1.6 and will be removed in v1.8.")
    
[NeMo W 2022-08-30 19:27:30 exp_manager:900] The checkpoint callback was told to monitor a validation value and trainer's max_steps was set to -1. Please ensure that max_steps will run for at least 1 epochs to ensure that checkpointing will not error out.
[NeMo I 2022-08-30 19:27:32 tokenize_and_classify:87] Creating ClassifyFst grammars.
Created a temporary directory at /tmp/tmpuetiw1ga
Writing /tmp/tmpuetiw1ga/_remote_module_non_scriptable.py
[NeMo I 2022-08-30 19:27:55 data:205] Loading dataset from tests/data/asr/an4_train.json.
30it [00:00, 397.86it/s]
[NeMo I 2022-08-30 19:27:55 data:242] Loaded dataset with 30 files.
[NeMo I 2022-08-30 19:27:55 data:244] Dataset contains 0.02 hours.
[NeMo I 2022-08-30 19:27:55 data:346] Pruned 0 files. Final dataset contains 30 files
[NeMo I 2022-08-30 19:27:55 data:349] Pruned 0.00 hours. Final dataset contains 0.02 hours.
[NeMo I 2022-08-30 19:27:57 data:205] Loading dataset from tests/data/asr/an4_val.json.
10it [00:00, 621.50it/s]
[NeMo I 2022-08-30 19:27:57 data:242] Loaded dataset with 10 files.
[NeMo I 2022-08-30 19:27:57 data:244] Dataset contains 0.01 hours.
[NeMo I 2022-08-30 19:27:57 data:346] Pruned 0 files. Final dataset contains 10 files
[NeMo I 2022-08-30 19:27:57 data:349] Pruned 0.00 hours. Final dataset contains 0.01 hours.
Some weights of the model checkpoint at albert-base-v2 were not used when initializing AlbertModel: ['predictions.dense.bias', 'predictions.decoder.weight', 'predictions.decoder.bias', 'predictions.LayerNorm.weight', 'predictions.LayerNorm.bias', 'predictions.dense.weight', 'predictions.bias']
- This IS expected if you are initializing AlbertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing AlbertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[NeMo I 2022-08-30 19:28:00 features:223] PADDING: 1
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[NeMo I 2022-08-30 19:28:02 modelPT:587] Optimizer config = AdamW (
    Parameter Group 0
        amsgrad: False
        betas: [0.9, 0.999]
        capturable: False
        eps: 1e-08
        foreach: None
        lr: 0.001
        maximize: False
        weight_decay: 1e-06
    )
[NeMo I 2022-08-30 19:28:02 lr_scheduler:914] Scheduler "<nemo.core.optim.lr_scheduler.NoamAnnealing object at 0x7fdaa8fa6d50>" 
    will be used during training (effective maximum steps = 15) - 
    Parameters : 
    (warmup_steps: 1000
    last_epoch: -1
    d_model: 1
    max_steps: 15
    )

   | Name                  | Type                              | Params
-----------------------------------------------------------------------------
0  | aligner               | AlignmentEncoder                  | 1.0 M 
1  | forward_sum_loss      | ForwardSumLoss                    | 0     
2  | bin_loss              | BinLoss                           | 0     
3  | lm_embeddings         | Embedding                         | 3.8 M 
4  | self_attention_module | SelfAttentionModule               | 1.2 M 
5  | encoder               | MixerTTSModule                    | 7.2 M 
6  | symbol_emb            | Embedding                         | 17.3 K
7  | duration_predictor    | TemporalPredictor                 | 493 K 
8  | pitch_predictor       | TemporalPredictor                 | 493 K 
9  | pitch_emb             | Conv1d                            | 1.5 K 
10 | preprocessor          | AudioToMelSpectrogramPreprocessor | 0     
11 | decoder               | MixerTTSModule                    | 10.8 M
12 | proj                  | Linear                            | 30.8 K
-----------------------------------------------------------------------------
21.2 M    Trainable params
3.8 M     Non-trainable params
25.1 M    Total params
50.111    Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s][NeMo W 2022-08-30 19:28:05 nemo_logging:349] /usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/connectors/data_connector.py:245: PossibleUserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 8 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
      category=PossibleUserWarning,
    
[NeMo W 2022-08-30 19:28:09 nemo_logging:349] /usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/connectors/data_connector.py:245: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 8 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
      category=PossibleUserWarning,
    
[NeMo W 2022-08-30 19:28:09 nemo_logging:349] /usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py:1937: PossibleUserWarning: The number of training batches (5) is smaller than the logging interval Trainer(log_every_n_steps=200). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
      category=PossibleUserWarning,
    
Epoch 0:   0% 0/6 [00:00<?, ?it/s] Error executing job with overrides: ['sample_rate=16000', 'train_dataset=tests/data/asr/an4_train.json', 'validation_datasets=tests/data/asr/an4_val.json', "sup_data_types=['align_prior_matrix', 'pitch', 'lm_tokens']", 'sup_data_path=mixer_tts_x_sup_data_folder', '+phoneme_dict_path=tts_dataset_files/cmudict-0.7b_nv22.08', '+heteronyms_path=tts_dataset_files/heteronyms-052722', 'whitelist_path=tts_dataset_files/lj_speech.tsv', 'pitch_mean=195.07655334472656', 'pitch_std=273.9862976074219', 'model.train_ds.dataloader_params.batch_size=6', 'model.train_ds.dataloader_params.num_workers=0', 'model.validation_ds.dataloader_params.num_workers=0', 'trainer.max_epochs=3', 'trainer.strategy=null', 'trainer.check_val_every_n_epoch=1']
Traceback (most recent call last):
  File "mixer_tts.py", line 29, in main
    trainer.fit(model)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 771, in fit
    self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 723, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 1236, in _run
    results = self._run_stage()
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 1323, in _run_stage
    return self._run_train()
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 1353, in _run_train
    self.fit_loop.run()
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/fit_loop.py", line 266, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 208, in advance
    batch_output = self.batch_loop.run(batch, batch_idx)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 88, in advance
    outputs = self.optimizer_loop.run(split_batch, optimizers, batch_idx)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 207, in advance
    self.optimizer_idx,
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 256, in _run_optimization
    self._optimizer_step(optimizer, opt_idx, batch_idx, closure)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 378, in _optimizer_step
    using_lbfgs=is_lbfgs,
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 1595, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/core/lightning.py", line 1646, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/core/optimizer.py", line 168, in step
    step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/strategies/strategy.py", line 193, in optimizer_step
    return self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/plugins/precision/native_amp.py", line 85, in optimizer_step
    closure_result = closure()
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 148, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 134, in closure
    step_output = self._step_fn()
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 427, in _training_step
    training_step_output = self.trainer._call_strategy_hook("training_step", *step_kwargs.values())
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 1765, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/strategies/strategy.py", line 333, in training_step
    return self.model.training_step(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/nemo/utils/model_utils.py", line 364, in wrap_training_step
    output_dict = wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/nemo/collections/tts/models/mixer_tts.py", line 438, in training_step
    lm_tokens=lm_tokens,
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/nemo/core/classes/common.py", line 1084, in __call__
    outputs = wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/nemo/collections/tts/models/mixer_tts.py", line 296, in forward
    text, text_len, text_mask, spect, spect_len, attn_prior
  File "/usr/local/lib/python3.7/dist-packages/nemo/collections/tts/models/mixer_tts.py", line 256, in run_aligner
    spect, text_emb.permute(0, 2, 1), mask=text_mask == 0, attn_prior=attn_prior,
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/nemo/collections/tts/modules/aligner.py", line 166, in forward
    attn = self.log_softmax(attn) + torch.log(attn_prior[:, None] + 1e-8)
RuntimeError: The size of tensor a (14) must match the size of tensor b (10) at non-singleton dimension 3

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Epoch 0:   0% 0/6 [00:01<?, ?it/s]


redoctopus commented Aug 30, 2022

Ah yep, this is the error that occurs when it tries to load the old supplementary values. Can you remove that folder and re-run the supplementary data calculation with the new tokenizer? I think it's probably seeing the old data and still trying to load those values.
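
For example, something like this in a notebook cell would clear it out before re-running (hypothetical commands; adjust the path if yours differs):

!rm -rf mixer_tts_x_sup_data_folder
!mkdir -p mixer_tts_x_sup_data_folder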

I'd like to note that the pre_calculate_supplementary_data() step is optional, since if you give mixer_tts.py an empty supplementary data folder, it will populate it in the first epoch. So technically you could also just remove the directory's contents and then run the script, and it should work (though the first epoch will be slower than usual as it calculates and saves the supplementary data).

The pre_calculate_supplementary_data() function just does it explicitly in case you want to check over the data, get pitch statistics, run multiple experiments with the same data, etc.


gedefet commented Aug 30, 2022

Well, now it runs without errors. I did not have to run the pre_calculate_supplementary_data() function for it to work. I have not tested the checkpoints, though.

I suppose I now have to pass all the parameters directly in the training command.

One thing I'm seeing is that there is still no lm_tokens folder under mixer_tts_x_sup_data_folder. Don't know why.

Will test the checkpoints and get back.

Thank you very much!!

redoctopus (Collaborator) commented

Yes, that is expected--the Dataset does not save lm_tokens.

You're very welcome, good luck with training!

XuesongYang added the TTS label Sep 9, 2022