[Bug] RuntimeError: shape '[64, 31, -1]' is invalid for input of size 8064 #3798

lyjgo · 2024-06-24T02:39:02Z

Describe the bug

I hit an error when running train_tacotron_ddc.py in TTS/recipes/ljspeech/tacotron2-DDC with the default config. The error and the config are as follows:
ERROR

CONFIG
audio_config = BaseAudioConfig(
    sample_rate=22050,
    do_trim_silence=True,
    trim_db=60.0,
    signal_norm=False,
    mel_fmin=0.0,
    mel_fmax=8000,
    spec_gain=1.0,
    log_func="np.log",
    ref_level_db=20,
    preemphasis=0.0,
)

config = Tacotron2Config(  # This is the config that is saved for the future use
    audio=audio_config,
    batch_size=64,
    eval_batch_size=16,
    num_loader_workers=4,
    num_eval_loader_workers=4,
    run_eval=True,
    test_delay_epochs=-1,
    r=6,
    gradual_training=[[0, 6, 64], [10000, 4, 32], [50000, 3, 32], [100000, 2, 32]],
    double_decoder_consistency=True,
    epochs=1000,
    text_cleaner="phoneme_cleaners",
    use_phonemes=True,
    phoneme_language="en-us",
    phoneme_cache_path=os.path.join(output_path, "phoneme_cache"),
    precompute_num_workers=8,
    print_step=25,
    print_eval=True,
    mixed_precision=False,
    output_path=output_path,
    datasets=[dataset_config],
)
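
For reference, each `gradual_training` entry appears to follow the form `[start_step, r, batch_size]`, so the reduction factor `r` and the batch size change as training passes each start step. A minimal sketch of how such a schedule could be read, assuming that interpretation; `schedule_at` is a hypothetical helper written for illustration and is not part of TTS:

```python
# Schedule copied from the config above.
# Assumed entry layout: [start_step, r, batch_size].
gradual_training = [[0, 6, 64], [10000, 4, 32], [50000, 3, 32], [100000, 2, 32]]


def schedule_at(step, schedule):
    """Hypothetical helper: return (r, batch_size) active at a global step.

    Picks the last entry whose start_step is <= step.
    """
    r, batch_size = schedule[0][1], schedule[0][2]
    for start_step, entry_r, entry_batch_size in schedule:
        if step >= start_step:
            r, batch_size = entry_r, entry_batch_size
    return r, batch_size


for step in (0, 9999, 10000, 10150, 50000):
    print(step, schedule_at(step, gradual_training))
```

Under this reading, any step from 10000 up to 49999 would use r=4 with a batch size of 32 instead of the initial r=6 and 64.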

Is there anything I can do to solve this problem? Thanks

To Reproduce

Set up a Python environment locally
clone repo, and install locally via pip install -e .
download LJSpeech dataset
run TTS/recipes/ljspeech/tacotron2-DDC/train_tacotron_ddc.py

Expected behavior

No response

Logs

No response

Environment

"Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.2.0+cu118",
        "TTS": "0.22.0",
        "numpy": "1.22.0"
    }

Additional context

No response

lyjgo · 2024-06-24T02:42:15Z

full ERROR information:

stale · 2024-08-02T02:22:48Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look our discussion channels.

martinclauss · 2024-08-27T07:28:27Z

Hey! :)

Same problem here:

warning: audio amplitude out of range, auto clipped.
 | > Synthesizing test sentences.
   > Decoder stopped with `max_decoder_steps` 10000
   > Decoder stopped with `max_decoder_steps` 10000
   > Decoder stopped with `max_decoder_steps` 10000
   > Decoder stopped with `max_decoder_steps` 10000
   > Decoder stopped with `max_decoder_steps` 10000
warning: audio amplitude out of range, auto clipped.
warning: audio amplitude out of range, auto clipped.
warning: audio amplitude out of range, auto clipped.
warning: audio amplitude out of range, auto clipped.
warning: audio amplitude out of range, auto clipped.

  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.05446180701255798 (+0.0013189911842346191)
     | > avg_decoder_loss: 3.2692344784736633 (-0.004494816064834595)
     | > avg_postnet_loss: 4.729433000087738 (-0.21044373512268066)
     | > avg_stopnet_loss: 0.015973580186255276 (-3.169919364154339e-05)
     | > avg_decoder_coarse_loss: 3.1101098358631134 (-0.00574985146522522)
     | > avg_decoder_ddc_loss: 0.012658739928156137 (+0.000506733893416822)
     | > avg_ga_loss: 0.0026675525004975498 (+3.0617229640483856e-07)
     | > avg_decoder_diff_spec_loss: 0.44165946543216705 (-0.0007720962166786194)
     | > avg_postnet_diff_spec_loss: 0.43278444930911064 (-0.0015974976122379303)
     | > avg_decoder_ssim_loss: 0.7173683270812035 (-0.001061074435710907)
     | > avg_postnet_ssim_loss: 0.703313797712326 (-0.007824204862117767)
     | > avg_loss: 3.383451968431473 (-0.05788925290107727)
     | > avg_align_error: 0.9833826286485419 (-0.00012143643107265234)

 > BEST MODEL : /home/user/user_voice/TTS/recipes/ljspeech/tacotron2-DDC/run-August-26-2024_06+22PM-dbf1a08a/best_model_10150.pth

 > Number of output frames: 4

 > EPOCH: 50/1000
 --> /home/user/user_voice/TTS/recipes/ljspeech/tacotron2-DDC/run-August-26-2024_06+22PM-dbf1a08a

 > TRAINING (2024-08-27 06:44:20)
 ! Run is kept in /home/user/user_voice/TTS/recipes/ljspeech/tacotron2-DDC/run-August-26-2024_06+22PM-dbf1a08a
Traceback (most recent call last):
  File "/home/user/user_voice/tts_venv/lib64/python3.11/site-packages/trainer/trainer.py", line 1833, in fit
    self._fit()
  File "/home/user/user_voice/tts_venv/lib64/python3.11/site-packages/trainer/trainer.py", line 1785, in _fit
    self.train_epoch()
  File "/home/user/user_voice/tts_venv/lib64/python3.11/site-packages/trainer/trainer.py", line 1504, in train_epoch
    outputs, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/user_voice/tts_venv/lib64/python3.11/site-packages/trainer/trainer.py", line 1327, in train_step
    batch = self.format_batch(batch)
            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/user_voice/tts_venv/lib64/python3.11/site-packages/trainer/trainer.py", line 1058, in format_batch
    batch = self.model.format_batch(batch)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/user_voice/tts_venv/lib64/python3.11/site-packages/TTS/tts/models/base_tts.py", line 215, in format_batch
    stop_targets = stop_targets.view(text_input.shape[0], stop_targets.size(1) // self.config.r, -1)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape '[64, 31, -1]' is invalid for input of size 8064

I used this code to train the model: https://github.com/coqui-ai/TTS/blob/dev/recipes/ljspeech/tacotron2-DDC/train_tacotron_ddc.py

Downloaded the data with: https://github.com/coqui-ai/TTS/blob/dev/recipes/ljspeech/download_ljspeech.sh

Using the current dev version: commit dbf1a08a0d4e47fdad6172e433eeb34bc6b13b4e (HEAD -> dev, origin/dev, origin/HEAD)

What could the problem be?

Thanks!
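
For what it's worth, the numbers in the error message can be checked by hand. The sketch below is plain arithmetic, not TTS code. It assumes the second dimension, 31, comes from `stop_targets.size(1) // self.config.r` as shown in the traceback, and that the candidate values of `r` are the ones listed under `gradual_training` in the recipe config (6, 4, 3, 2):

```python
# Numbers taken from the error message:
#   shape '[64, 31, -1]' is invalid for input of size 8064
total_elements = 8064
batch_size = 64
requested_steps = 31

# Padded stop-target length per sample in the failing batch.
padded_len = total_elements // batch_size

# view(64, 31, -1) only works if 8064 is a multiple of 64 * 31.
view_is_valid = total_elements % (batch_size * requested_steps) == 0

# Assumption: r takes one of the values from gradual_training.
candidate_r = [6, 4, 3, 2]

# Values of r that reproduce the 31 in the requested shape.
matching_r = [r for r in candidate_r if padded_len // r == requested_steps]

# Values of r for which the padded length splits evenly.
divisible_r = [r for r in candidate_r if padded_len % r == 0]

print("padded_len:", padded_len)
print("view_is_valid:", view_is_valid)
print("matching_r:", matching_r)
print("divisible_r:", divisible_r)
```

This gives a padded length of 126 per sample, and r=4 is the only listed value with 126 // r == 31, yet 126 is not divisible by 4 (it is divisible by 6). That would be consistent with the failure appearing shortly after step 10000 (the best model above was saved at step 10150), where the schedule switches from r=6 to r=4. This is an inference from the numbers, not a confirmed root cause.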

stale · 2024-11-10T12:59:09Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look our discussion channels.

lyjgo added the bug Something isn't working label Jun 24, 2024

stale bot added the wontfix This will not be worked on but feel free to help. label Aug 2, 2024

stale bot removed the wontfix This will not be worked on but feel free to help. label Aug 27, 2024

stale bot added the wontfix This will not be worked on but feel free to help. label Nov 10, 2024
