
Unknown error message, just FYI #120

Open
BBC-Esq opened this issue Mar 24, 2024 · 4 comments

Comments

@BBC-Esq
Contributor

BBC-Esq commented Mar 24, 2024

I'm getting the following error (with slight variations, but it's basically the same):

Error processing text to audio: cannot reshape tensor of 0 elements into shape [1, 0, 12, -1] because the unspecified dimension size -1 can be any value and is ambiguous

It occurs when WhisperSpeech tries to play back certain text like the following, which are citations to Georgia statutes:

1 O.C.G.A. § 15-11-145(g).
2 O.C.G.A. § 15-11-145(h).
3 O.C.G.A. § 15-11-181(a).
4 O.C.G.A. § 15-11-181(b).
5 O.C.G.A. § 15-11-102.

Just FYI, I'm not sure how you'd handle unusual non-English characters, such as the section symbol (§) and a variety of other symbols. I could curate the text beforehand, but I thought you'd like to know anyway, in case there are precautions you could take internally.

My program, which interacts with an LLM and uses TTS, also uses Bark. Bark stumbles on this text as well (it says gibberish and skips a few words), but then it picks back up and manages to hobble to the end. Just FYI, it seems like they've done something to handle unusual characters.

@BBC-Esq changed the title from "Unknown error message" to "Unknown error message, just FYI" on Mar 25, 2024
@jpc
Contributor

jpc commented Apr 10, 2024

I am not getting the error you are seeing with these samples. They are not spoken correctly, but the model finishes generating successfully. Would you mind trying to find a short code snippet with text that consistently fails for you?

I've also noticed that we lack support for many special symbols. Since they were not in the training set, the model never learned anything sensible about them, so they end up as random sounds and also confuse the decoding of the subsequent text.

You could try using some regexes to strip them out. Also, the speaking speed we use (in characters per second) causes problems with the numbers here, since numbers cannot be spoken as quickly as normal words.
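A minimal sketch of the regex approach, assuming that symbols such as § are simply unsupported and that dotted single-letter abbreviations like "O.C.G.A." read better as spaced letters (as in the workaround below). `strip_unsupported` and the `UNSUPPORTED` character set are hypothetical, not part of WhisperSpeech; extend the set with whatever symbols show up in your text.

```python
import re

# Hypothetical set of symbols assumed to be absent from the training data.
UNSUPPORTED = re.compile(r"[§¶†‡©®™]")


def strip_unsupported(text: str) -> str:
    """Drop unsupported symbols and turn 'O.C.G.A.' into 'O C G A' (hypothetical helper)."""
    # A dot right after a standalone capital letter becomes a space: "O.C.G.A." -> "O C G A "
    text = re.sub(r"\b([A-Z])\.", r"\1 ", text)
    # Remove the symbols the model never learned to pronounce
    text = UNSUPPORTED.sub("", text)
    # Collapse the runs of whitespace introduced above
    return re.sub(r"\s+", " ", text).strip()


print(strip_unsupported("O.C.G.A. § 15-11-102."))
```

Note that the abbreviation pattern also rewrites a sentence-final single capital letter, so "Part I." becomes "Part I".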

For the samples you provided, this workaround worked quite well for me:

pipe.generate_to_notebook("1 O C G A  15 11 145 g", cps=6)

It seems you don't have to strip the hyphens (-). In longer text, I also noticed that replacing parentheses with commas improves the prosody, like this: "…replacing parentheses, with commas, improves…".
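The parentheses-to-commas substitution can be done with a few more regexes. This is a hypothetical helper (`parens_to_commas`, not a WhisperSpeech function) and assumes parentheses are not nested; in nested text only the innermost group is rewritten.

```python
import re


def parens_to_commas(text: str) -> str:
    """Rewrite '(aside)' as ', aside,' to smooth prosody (hypothetical helper)."""
    # Replace each innermost "( ... )" group, plus any space before it, with ", ... ,"
    text = re.sub(r"\s*\(\s*([^()]*?)\s*\)", r", \1,", text)
    # Merge commas that end up adjacent, e.g. from "(a)(b)"
    text = re.sub(r",\s*,", ",", text)
    # Drop a comma that now sits right before punctuation: ", g,." -> ", g."
    text = re.sub(r",\s*([.!?;:])", r"\1", text)
    # Drop a trailing comma at the very end of the text
    return re.sub(r",\s*$", "", text).strip()


print(parens_to_commas("replacing parenthesis (with commas) improves the prosody"))
```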

@sidharthrajaram

sidharthrajaram commented Apr 17, 2024

I get the same error as @BBC-Esq:

Error: cannot reshape tensor of 0 elements into shape [1, 0, 12, -1] because the unspecified dimension size -1 can be any value and is ambiguous

Inputs that triggered it:
"2."
"3."

@sidharthrajaram

It specifically occurs after performing inference repeatedly.
Running inference on "2." repeatedly works a number of times before the error appears.
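Because the failure only appears after many runs, a small loop can help produce the consistently failing snippet requested earlier in the thread. This is a generic, hypothetical harness (`first_failure` is not a WhisperSpeech function): it calls the supplied generation function on the same text until it raises, then reports the attempt number and the exception.

```python
from typing import Callable, Optional, Tuple


def first_failure(
    generate: Callable[[str], object], text: str, attempts: int = 50
) -> Optional[Tuple[int, Exception]]:
    """Call generate(text) up to `attempts` times; return (attempt, error) on the first failure."""
    for attempt in range(1, attempts + 1):
        try:
            generate(text)
        except Exception as exc:  # catch everything: we want whatever error surfaces
            return attempt, exc
    return None  # no failure within the given number of attempts
```

For example, `first_failure(lambda t: pipe.generate_to_notebook(t), "2.")`, assuming `pipe` is the WhisperSpeech pipeline object used in the workaround above.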

@sidharthrajaram

Specific error trace on Inference Colab:
[Screenshot: 2024-04-17 at 3:17:17 PM]
