[Bug] If sentence too long, some part will be missing during audio file generation #1680

Closed
hengway opened this issue Jun 22, 2022 · 4 comments
Labels: bug (Something isn't working)

hengway commented Jun 22, 2022

Describe the bug

If a sentence is too long (with its clauses separated by commas), part of it is missing from the generated audio.

Example:
On April 1, 1942, Desmond Doss joined the United States Army. Little did he realize that three and a half years later, he would be standing on the White House lawn, receiving the nations highest award for his bravery and courage under fire. Of the 16 million men in uniform during World War 2, only 431 received the Congressional Medal of Honor.

The missing part is: he would be standing on the White House lawn, receiving the nations highest award for his bravery and courage under fire.

As a workaround, shorten the sentence by replacing a comma with a full stop:
On April 1, 1942, Desmond Doss joined the United States Army. Little did he realize that three and a half years later. He would be standing on the White House lawn, receiving the nations highest award for his bravery and courage under fire. Of the 16 million men in uniform during World War 2, only 431 received the Congressional Medal of Honor.
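
The manual edit above could be automated before the text is passed to `tts`. This is only a minimal sketch, not part of the TTS package: `split_long_sentence` is a hypothetical helper, and `max_chars` is an arbitrary placeholder that would need tuning by ear, since the length at which audio starts to go missing is not established here.

```python
import re
from typing import List


def split_long_sentence(sentence: str, max_chars: int = 100) -> List[str]:
    """Break an over-long sentence at ", " boundaries into shorter sentences.

    max_chars is an assumed threshold, not a value taken from the library.
    """
    sentence = sentence.strip()
    if len(sentence) <= max_chars:
        return [sentence]

    # Split only on a comma followed by whitespace, so "16,000" stays intact.
    clauses = [c.strip() for c in re.split(r",\s+", sentence) if c.strip()]

    chunks: List[str] = []
    current = ""
    for clause in clauses:
        candidate = f"{current}, {clause}" if current else clause
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the chunk before it grows too long
            current = clause
        else:
            current = candidate
    if current:
        chunks.append(current)

    # Capitalize each chunk and end it with a full stop, as in the manual fix.
    result = []
    for chunk in chunks:
        chunk = chunk[0].upper() + chunk[1:]
        if chunk[-1] not in ".!?":
            chunk += "."
        result.append(chunk)
    return result
```

A clause that has no comma and is still longer than `max_chars` is returned unchanged, so this does not guarantee an upper bound on chunk length.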

To Reproduce

Run the command below:
tts --text "On April 1, 1942, Desmond Doss joined the United States Army. Little did he realize that three and a half years later, he would be standing on the White House lawn, receiving the nations highest award for his bravery and courage under fire. Of the 16 million men in uniform during World War 2, only 431 received the Congressional Medal of Honor. " --model_name "tts_models/en/ljspeech/tacotron2-DDC_ph" --out_path /var/data/The-unlikely-hero5.wav

Expected behavior

The whole text should be rendered in the generated audio file.

Logs

ubuntu@ubuntu:~$ tts --text "On April 1, 1942, Desmond Doss joined the United States Army. Little did he realize that three and a half years later, he would be standing on the White House lawn, receiving the nations highest award for his bravery and courage under fire. Of the 16 million men in uniform during World War 2, only 431 received the Congressional Medal of Honor. " --model_name "tts_models/en/ljspeech/tacotron2-DDC_ph" --out_path /opt/tts_output/The-unlikely-hero5.wav
 > tts_models/en/ljspeech/tacotron2-DDC_ph is already downloaded.
 > vocoder_models/en/ljspeech/univnet is already downloaded.
 > Using model: Tacotron2
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:20
 | > fft_size:1024
 | > power:1.5
 | > preemphasis:0.0
 | > griffin_lim_iters:60
 | > signal_norm:True
 | > symmetric_norm:True
 | > mel_fmin:50.0
 | > mel_fmax:7600.0
 | > pitch_fmin:0.0
 | > pitch_fmax:640.0
 | > spec_gain:1.0
 | > stft_pad_mode:reflect
 | > max_norm:4.0
 | > clip_norm:True
 | > do_trim_silence:True
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:/home/xstts/.local/share/tts/tts_models--en--ljspeech--tacotron2-DDC_ph/scale_stats.npy
 | > base:10
 | > hop_length:256
 | > win_length:1024
 > Model's reduction rate `r` is set to: 2
 > Vocoder Model: univnet
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:20
 | > fft_size:1024
 | > power:1.5
 | > preemphasis:0.0
 | > griffin_lim_iters:60
 | > signal_norm:True
 | > symmetric_norm:True
 | > mel_fmin:50.0
 | > mel_fmax:7600.0
 | > pitch_fmin:0.0
 | > pitch_fmax:640.0
 | > spec_gain:1.0
 | > stft_pad_mode:reflect
 | > max_norm:4.0
 | > clip_norm:True
 | > do_trim_silence:True
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:/home/xstts/.local/share/tts/vocoder_models--en--ljspeech--univnet/scale_stats.npy
 | > base:10
 | > hop_length:256
 | > win_length:1024
 > Generator Model: univnet_generator
 > Discriminator Model: univnet_discriminator
 > Text: On April 1, 1942, Desmond Doss joined the United States Army. Little did he realize that three and a half years later, he would be standing on the White House lawn, receiving the nations highest award for his bravery and courage under fire. Of the 16 million men in uniform during World War 2, only 431 received the Congressional Medal of Honor.
 > Text splitted to sentences.
['On April 1, 1942, Desmond Doss joined the United States Army.', 'Little did he realize that three and a half years later, he would be standing on the White House lawn, receiving the nations highest award for his bravery and courage under fire.', 'Of the 16 million men in uniform during World War 2, only 431 received the Congressional Medal of Honor.']
ɔn eɪpɹəl wʌn, naɪntin fɔɹti tu, dɛzmənd dɔs d͡ʒɔɪnd ðə junaɪtɪd steɪts ɑɹmi.
 [!] Character '͡' not found in the vocabulary. Discarding it.
[W NNPACK.cpp:51] Could not initialize NNPACK! Reason: Unsupported hardware.
 > Processing time: 18.15455675125122
 > Real-time factor: 0.9681247735486627
 > Saving output to /opt/tts_output/The-unlikely-hero5.wav

Environment

Package                Version              Location
---------------------- -------------------- --------
anyascii               0.3.1
appdirs                1.4.4
astroid                2.7.3
attrs                  19.3.0
audioread              2.1.9
Automat                0.8.0
Babel                  2.10.3
backports.zoneinfo     0.2.1
black                  22.3.0
blinker                1.4
bokeh                  1.4.0
certifi                2019.11.28
cffi                   1.15.0
chardet                3.0.4
click                  8.1.3
cloud-init             22.2
colorama               0.4.3
command-not-found      0.3
configobj              5.0.6
constantly             15.1.0
coqpit                 0.0.16
coverage               6.4.1
cryptography           2.8
cycler                 0.11.0
Cython                 0.29.28
dateparser             1.1.1
dbus-python            1.2.16
decorator              5.1.1
distro                 1.4.0
distro-info            0.23ubuntu1
docopt                 0.6.2
entrypoints            0.3
Flask                  2.1.2
fonttools              4.33.3
fsspec                 2022.5.0
gruut                  2.2.3
gruut-ipa              0.13.0
gruut-lang-cs          2.0.0
gruut-lang-de          2.0.0
gruut-lang-en          2.0.0
gruut-lang-es          2.0.0
gruut-lang-fr          2.0.2
gruut-lang-it          2.0.0
gruut-lang-nl          2.0.2
gruut-lang-pt          2.0.0
gruut-lang-ru          2.0.0
gruut-lang-sv          2.0.0
httplib2               0.14.0
hyperlink              19.0.0
idna                   2.8
importlib-metadata     4.11.4
importlib-resources    5.8.0
incremental            16.10.1
inflect                5.6.0
isort                  5.10.1
itsdangerous           2.1.2
jieba                  0.42.1
Jinja2                 3.1.2
joblib                 1.1.0
jsonlines              1.2.0
jsonpatch              1.22
jsonpointer            2.0
jsonschema             3.2.0
keyring                18.0.1
kiwisolver             1.4.3
language-selector      0.1
launchpadlib           1.10.13
lazr.restfulclient     0.14.2
lazr.uri               1.0.3
lazy-object-proxy      1.7.1
librosa                0.8.0
llvmlite               0.38.1
MarkupSafe             2.1.1
matplotlib             3.5.2
mccabe                 0.6.1
mecab-python3          1.0.5
more-itertools         4.2.0
mypy-extensions        0.4.3
netifaces              0.10.4
networkx               2.8.4
nose2                  0.11.0
num2words              0.5.10
numba                  0.55.1
numpy                  1.21.6
oauthlib               3.1.0
packaging              21.3
pandas                 1.4.2
pathspec               0.9.0
pexpect                4.6.0
Pillow                 9.1.1
pip                    20.0.2
platformdirs           2.5.2
pooch                  1.6.0
protobuf               3.19.4
pyasn1                 0.4.2
pyasn1-modules         0.2.1
pycparser              2.21
PyGObject              3.36.0
PyHamcrest             1.9.0
PyJWT                  1.7.1
pylint                 2.10.2
pymacaroons            0.13.0
PyNaCl                 1.3.0
pynndescent            0.5.7
pyOpenSSL              19.0.0
pyparsing              3.0.9
pypinyin               0.46.0
pyrsistent             0.15.5
pysbd                  0.3.4
pyserial               3.4
python-apt             2.0.0+ubuntu0.20.4.7
python-crfsuite        0.9.8
python-dateutil        2.8.2
python-debian          0.1.36ubuntu1
pytz                   2022.1
pytz-deprecation-shim  0.1.0.post0
pyworld                0.2.10
PyYAML                 5.3.1
regex                  2022.3.2
requests               2.22.0
requests-unixsocket    0.2.0
resampy                0.2.2
scikit-learn           1.1.1
scipy                  1.8.1
SecretStorage          2.3.1
service-identity       18.1.0
setuptools             45.2.0
simplejson             3.16.0
six                    1.14.0
sos                    4.3
SoundFile              0.10.3.post1
ssh-import-id          5.10
systemd-python         234
tensorboardX           2.5.1
threadpoolctl          3.1.0
toml                   0.10.2
tomli                  2.0.1
torch                  1.11.0
torchaudio             0.11.0
tornado                6.1
tqdm                   4.64.0
trainer                0.0.12
TTS                    0.7.0                /opt/TTS
Twisted                18.9.0
typing-extensions      4.2.0
tzdata                 2022.1
tzlocal                4.2
ubuntu-advantage-tools 27.8
ufw                    0.36
umap-learn             0.5.1
unattended-upgrades    0.1
unidic-lite            1.0.8
urllib3                1.25.8
wadllib                1.3.3
Werkzeug               2.1.2
wheel                  0.34.2
wrapt                  1.12.1
zipp                   3.8.0
zope.interface         4.7.1

Additional context

No response

hengway added the bug label Jun 22, 2022

erogol (Member) commented Jul 5, 2022

For Tacotron models there is a cap of 250 chars not to crash your memory. You need to set it manually if you wanna change it.
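
One way to see which sentences would run into such a cap is to measure them before synthesis. This is a rough sketch under stated assumptions: it treats the cap as a count of raw characters per sentence (the model may actually count phonemes or decoder steps), and it uses a naive regex splitter rather than the library's own sentence segmenter. `sentences_over_cap` and `CHAR_CAP` are hypothetical names.

```python
import re
from typing import List, Tuple

CHAR_CAP = 250  # figure quoted in this thread; how the model applies it is an assumption


def sentences_over_cap(text: str, cap: int = CHAR_CAP) -> List[Tuple[int, str]]:
    """Return (length, sentence) pairs for every sentence longer than cap."""
    # Naive split: ., ! or ? followed by whitespace ends a sentence.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [(len(s), s) for s in sentences if len(s) > cap]
```

An empty result means no sentence exceeds the assumed cap by raw character count, so any truncation would have to come from some other limit.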

erogol closed this as completed Jul 5, 2022
@genglinxiao

I'm also looking for methods to generate long sentences. What I've found is, the limit is actually in the tokenizer, and is hard coded:

    class VoiceBpeTokenizer:
        def __init__(self, vocab_file=None):
            self.tokenizer = None
            if vocab_file is not None:
                self.tokenizer = Tokenizer.from_file(vocab_file)
            self.char_limits = {
                "en": 250,
                "de": 253,
                "fr": 273,
                "es": 239,
                "it": 213,
                "pt": 203,
                "pl": 224,
                "zh-cn": 82,
                "ar": 166,
                "cs": 186,
                "ru": 182,
                "nl": 251,
                "tr": 226,
                "ja": 71,
                "hu": 224,
                "ko": 95,
            }

So you can simply modify the limit. However, I'm not sure about the downstream effect.
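
Rather than editing the installed library file, the same change could be applied to a tokenizer instance at runtime. This is a minimal sketch that assumes only what the snippet above shows: the object exposes a plain `char_limits` dict keyed by language code. `raise_char_limit` is a hypothetical helper, not part of the library, and `SimpleNamespace` stands in for a real tokenizer so the example runs on its own.

```python
from types import SimpleNamespace


def raise_char_limit(tokenizer, lang: str, new_limit: int) -> int:
    """Set tokenizer.char_limits[lang] to new_limit and return the old value."""
    if lang not in tokenizer.char_limits:
        raise KeyError(f"no character limit defined for language {lang!r}")
    old = tokenizer.char_limits[lang]
    tokenizer.char_limits[lang] = new_limit
    return old


# Stand-in object with the same attribute as the class quoted above.
tok = SimpleNamespace(char_limits={"en": 250, "de": 253})
previous = raise_char_limit(tok, "en", 400)  # 400 is an arbitrary example value
```

Returning the old value makes it easy to restore the default afterwards. The effect of a higher limit on output quality or memory use remains untested.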

@FurkanGozukara

> For Tacotron models there is a cap of 250 chars not to crash your memory. You need to set it manually if you wanna change it.

What is the limit for TTS V2? I saw 400 tokens in the code.

m000lie commented Mar 6, 2024

> For Tacotron models there is a cap of 250 chars not to crash your memory. You need to set it manually if you wanna change it.

How much memory is it expected to use per char? I have access to 1x H100 SCM 80GB. Surely memory shouldn't be a problem, right?

5 participants