[Bug] RuntimeError: shape '[64, 31, -1]' is invalid for input of size 8064 #3798

Open
lyjgo opened this issue Jun 24, 2024 · 4 comments
Labels: bug (Something isn't working); wontfix (This will not be worked on but feel free to help.)

Comments

lyjgo commented Jun 24, 2024

Describe the bug

I hit an error when running train_tacotron_ddc.py in TTS/recipes/ljspeech/tacotron2-DDC with the default config. The error and the config are as follows:
ERROR (attached as an image): 6208dece6097660c8fa1dc0b47c2daa

CONFIG
audio_config = BaseAudioConfig(
    sample_rate=22050,
    do_trim_silence=True,
    trim_db=60.0,
    signal_norm=False,
    mel_fmin=0.0,
    mel_fmax=8000,
    spec_gain=1.0,
    log_func="np.log",
    ref_level_db=20,
    preemphasis=0.0,
)

config = Tacotron2Config(  # This is the config that is saved for the future use
    audio=audio_config,
    batch_size=64,
    eval_batch_size=16,
    num_loader_workers=4,
    num_eval_loader_workers=4,
    run_eval=True,
    test_delay_epochs=-1,
    r=6,
    gradual_training=[[0, 6, 64], [10000, 4, 32], [50000, 3, 32], [100000, 2, 32]],
    double_decoder_consistency=True,
    epochs=1000,
    text_cleaner="phoneme_cleaners",
    use_phonemes=True,
    phoneme_language="en-us",
    phoneme_cache_path=os.path.join(output_path, "phoneme_cache"),
    precompute_num_workers=8,
    print_step=25,
    print_eval=True,
    mixed_precision=False,
    output_path=output_path,
    datasets=[dataset_config],
)

Is there anything I can do to solve this problem? Thanks
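
For readers following along, here is a minimal sketch of how the `gradual_training` list in the config above could be read. It assumes each entry means `[first_global_step, r, batch_size]` and that a stage stays active until the next entry's step is reached; that reading is an assumption, not something stated in this issue. `active_stage` is a hypothetical helper written for illustration and is not part of the TTS library.

```python
# Schedule copied from the config above.
# ASSUMPTION: each entry is [first_global_step, r, batch_size].
GRADUAL_TRAINING = [[0, 6, 64], [10000, 4, 32], [50000, 3, 32], [100000, 2, 32]]


def active_stage(schedule, global_step):
    """Return (r, batch_size) of the last entry whose start step has been reached.

    Hypothetical helper for illustration only, not a TTS library function.
    """
    r, batch_size = schedule[0][1], schedule[0][2]
    for first_step, stage_r, stage_batch in schedule:
        if global_step >= first_step:
            r, batch_size = stage_r, stage_batch
    return r, batch_size


if __name__ == "__main__":
    for step in (0, 9999, 10000, 10150, 50000):
        print(step, active_stage(GRADUAL_TRAINING, step))
```

Under that reading, the run starts with r=6 and batch size 64 and switches to r=4 and batch size 32 once step 10000 is reached, so any code path that still assumes r=6 after that point would be working with stale values.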

To Reproduce

  1. Set up a Python environment locally
  2. Clone the repo and install it locally via pip install -e .
  3. Download the LJSpeech dataset
  4. Run TTS/recipes/ljspeech/tacotron2-DDC/train_tacotron_ddc.py

Expected behavior

No response

Logs

No response

Environment

"Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.2.0+cu118",
        "TTS": "0.22.0",
        "numpy": "1.22.0"
    }

Additional context

No response

@lyjgo lyjgo added the bug Something isn't working label Jun 24, 2024
lyjgo commented Jun 24, 2024

Full ERROR information (attached as an image): fa6ca25a66ed35bd073cbea311a4046

stale bot commented Aug 2, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

@stale stale bot added the wontfix This will not be worked on but feel free to help. label Aug 2, 2024
@martinclauss
Hey! :)

Same problem here:

warning: audio amplitude out of range, auto clipped.
 | > Synthesizing test sentences.
   > Decoder stopped with `max_decoder_steps` 10000
   > Decoder stopped with `max_decoder_steps` 10000
   > Decoder stopped with `max_decoder_steps` 10000
   > Decoder stopped with `max_decoder_steps` 10000
   > Decoder stopped with `max_decoder_steps` 10000
warning: audio amplitude out of range, auto clipped.
warning: audio amplitude out of range, auto clipped.
warning: audio amplitude out of range, auto clipped.
warning: audio amplitude out of range, auto clipped.
warning: audio amplitude out of range, auto clipped.

  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.05446180701255798 (+0.0013189911842346191)
     | > avg_decoder_loss: 3.2692344784736633 (-0.004494816064834595)
     | > avg_postnet_loss: 4.729433000087738 (-0.21044373512268066)
     | > avg_stopnet_loss: 0.015973580186255276 (-3.169919364154339e-05)
     | > avg_decoder_coarse_loss: 3.1101098358631134 (-0.00574985146522522)
     | > avg_decoder_ddc_loss: 0.012658739928156137 (+0.000506733893416822)
     | > avg_ga_loss: 0.0026675525004975498 (+3.0617229640483856e-07)
     | > avg_decoder_diff_spec_loss: 0.44165946543216705 (-0.0007720962166786194)
     | > avg_postnet_diff_spec_loss: 0.43278444930911064 (-0.0015974976122379303)
     | > avg_decoder_ssim_loss: 0.7173683270812035 (-0.001061074435710907)
     | > avg_postnet_ssim_loss: 0.703313797712326 (-0.007824204862117767)
     | > avg_loss: 3.383451968431473 (-0.05788925290107727)
     | > avg_align_error: 0.9833826286485419 (-0.00012143643107265234)

 > BEST MODEL : /home/user/user_voice/TTS/recipes/ljspeech/tacotron2-DDC/run-August-26-2024_06+22PM-dbf1a08a/best_model_10150.pth

 > Number of output frames: 4

 > EPOCH: 50/1000
 --> /home/user/user_voice/TTS/recipes/ljspeech/tacotron2-DDC/run-August-26-2024_06+22PM-dbf1a08a

 > TRAINING (2024-08-27 06:44:20)
 ! Run is kept in /home/user/user_voice/TTS/recipes/ljspeech/tacotron2-DDC/run-August-26-2024_06+22PM-dbf1a08a
Traceback (most recent call last):
  File "/home/user/user_voice/tts_venv/lib64/python3.11/site-packages/trainer/trainer.py", line 1833, in fit
    self._fit()
  File "/home/user/user_voice/tts_venv/lib64/python3.11/site-packages/trainer/trainer.py", line 1785, in _fit
    self.train_epoch()
  File "/home/user/user_voice/tts_venv/lib64/python3.11/site-packages/trainer/trainer.py", line 1504, in train_epoch
    outputs, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/user_voice/tts_venv/lib64/python3.11/site-packages/trainer/trainer.py", line 1327, in train_step
    batch = self.format_batch(batch)
            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/user_voice/tts_venv/lib64/python3.11/site-packages/trainer/trainer.py", line 1058, in format_batch
    batch = self.model.format_batch(batch)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/user_voice/tts_venv/lib64/python3.11/site-packages/TTS/tts/models/base_tts.py", line 215, in format_batch
    stop_targets = stop_targets.view(text_input.shape[0], stop_targets.size(1) // self.config.r, -1)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape '[64, 31, -1]' is invalid for input of size 8064
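
The numbers in this RuntimeError can be checked by hand. The sketch below is plain arithmetic, not a confirmed diagnosis. It assumes `stop_targets` is a 2-D `[batch, frames]` tensor when `view` is called, so that 8064 elements over 64 items gives the padded frame count per item. The helper names are hypothetical and exist only for this illustration.

```python
NUMEL = 8064  # "input of size 8064" from the RuntimeError
BATCH = 64    # first dimension of the requested shape
STEPS = 31    # second dimension, i.e. stop_targets.size(1) // r


def frames_per_item(numel, batch):
    """Padded length per item. ASSUMPTION: stop_targets is [batch, frames]."""
    frames, remainder = divmod(numel, batch)
    if remainder:
        raise ValueError("numel is not a multiple of batch")
    return frames


def view_is_valid(numel, batch, steps):
    """view(batch, steps, -1) can only succeed if batch * steps divides numel."""
    return numel % (batch * steps) == 0


def candidate_rs(frames, steps, max_r=10):
    """All reduction factors r for which frames // r equals the observed steps."""
    return [r for r in range(1, max_r + 1) if frames // r == steps]


frames = frames_per_item(NUMEL, BATCH)

if __name__ == "__main__":
    print("frames per item:", frames)
    print("view valid:", view_is_valid(NUMEL, BATCH, STEPS))
    print("r consistent with 31 steps:", candidate_rs(frames, STEPS))
    print("frames % 6 =", frames % 6, "| frames % 4 =", frames % 4)
```

Under these assumptions the padded length is 126 frames, which is a multiple of 6 but not of 4, and r=4 is the only small reduction factor that yields 31 steps. That would be consistent with `self.config.r` already being 4 (the second `gradual_training` entry, reached at step 10000) while the batch was still padded for r=6. The batch dimension is also still 64 although that entry lists 32. This is an inference from the log, not something confirmed by the maintainers.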

I used this code to train the model: https://github.com/coqui-ai/TTS/blob/dev/recipes/ljspeech/tacotron2-DDC/train_tacotron_ddc.py

Downloaded the data with: https://github.com/coqui-ai/TTS/blob/dev/recipes/ljspeech/download_ljspeech.sh

Using the current dev version: commit dbf1a08a0d4e47fdad6172e433eeb34bc6b13b4e (HEAD -> dev, origin/dev, origin/HEAD)

What could the problem be?

Thanks!

@stale stale bot removed the wontfix This will not be worked on but feel free to help. label Aug 27, 2024
stale bot commented Nov 10, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

@stale stale bot added the wontfix This will not be worked on but feel free to help. label Nov 10, 2024

2 participants