xttsv2 model sometimes (almost 10% of the time) produces extra noise. [Bug] #3598

seetimee · 2024-02-21T01:43:43Z

Describe the bug

For example, if it generates 30 seconds of audio, the first 15 seconds are normal audio and the last 15 seconds are noise.

To Reproduce

tts_to_file(
    text=text,
    language=lang_code,
    speaker_wav=speaker_wav,
    speed=speed,
    file_path=temp_file,
    split_sentences=split_sentences,
)

Expected behavior

No response

Logs

No response

Environment

{
    "CUDA": {
        "GPU": [
            "NVIDIA GeForce RTX 3080"
        ],
        "available": true,
        "version": "11.8"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.1.2+cu118",
        "TTS": "0.22.0",
        "numpy": "1.23.0"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.10.8",
        "version": "#148-Ubuntu SMP Mon Oct 17 16:02:06 UTC 2022"
    }
}

Additional context

No response

TangHaitao1994 · 2024-02-24T01:22:29Z

The default length of the training data is about 15 seconds.
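
If the roughly 15-second training length is the cause, one thing to try is keeping each synthesis call short. A minimal sketch, assuming a character budget is a usable stand-in for audio length; `chunk_text` and the 200-character default are illustrative and not part of the TTS library:

```python
import re


def chunk_text(text, max_chars=200):
    """Group sentences into chunks of at most max_chars characters.

    max_chars is an illustrative budget, not a value from the TTS library.
    A single sentence longer than max_chars is kept whole.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be synthesized separately and the audio concatenated. Whether this actually reduces the noise is untested here, and the `split_sentences` flag in the report may already do something similar.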

seetimee · 2024-02-26T09:09:29Z

Is my training reference audio too long? (Would shorter than 5 minutes be better?)

kaveenkumar · 2024-02-29T15:40:11Z

It occurs more often in French (FR) than in other languages.

If you synthesize 10 samples of a "text" for a given "voice", 2-3 of those 10 samples contain the "text" plus "some other blabbering".
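
Since the bad samples contain the text plus extra audio, they should run longer than the good ones. A rough way to spot them, sketched under the assumption that you have already measured each sample's duration in seconds; `flag_long_samples` and the 1.3 tolerance are hypothetical, not from the TTS library:

```python
from statistics import median


def flag_long_samples(durations, tolerance=1.3):
    """Return indices of samples much longer than the median duration.

    durations: audio lengths in seconds for the same text and voice.
    tolerance: illustrative threshold; a sample is flagged when its
    duration exceeds tolerance * median.
    """
    if not durations:
        return []
    limit = tolerance * median(durations)
    return [i for i, d in enumerate(durations) if d > limit]
```

Flagged samples could be regenerated. This only works if most samples are clean, so that the median reflects a normal run.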

Sharrnah · 2024-03-04T23:48:05Z

Same with German.
English seems fine, so my guess is that the training for some languages resulted in these issues.

For German, I think I could sometimes hear phrases like "Punkt. Neues Kapitel" or similar.
So maybe the training data included some spoken asides that the readers added, and the AI "learned" this for these languages.

(Just some guessing here.)

stale · 2024-04-22T05:35:26Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

seetimee · 2024-04-24T09:39:13Z

keep watching

Sharrnah · 2024-05-21T15:16:46Z

I think I found something that confirms my guess.
In Italian, if you add a . (dot) at the end of a sentence, it very often speaks that dot as "punkto".

It should not pronounce punctuation marks, in my opinion.
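
If the trailing dot is what triggers this, one untested workaround is to strip it before synthesis. A minimal sketch; `strip_trailing_period` is a hypothetical helper, not part of the TTS library, and removing the period may change the intonation at the end of the sentence:

```python
def strip_trailing_period(text):
    """Remove trailing periods and surrounding whitespace from text.

    Other end punctuation such as "?" or "!" is left in place.
    """
    return text.rstrip().rstrip(".").rstrip()
```

The cleaned string would then be passed as `text` in place of the original.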

CRochaVox · 2024-06-05T15:44:32Z

I also have a problem with noise after punctuation, and in long sentences it commonly says "ponto" for the period in Portuguese.

Does anyone know if it is possible to reduce this by fine-tuning the model?

stale · 2024-07-07T18:20:19Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

seetimee added the bug (Something isn't working) label on Feb 21, 2024

stale bot added the wontfix (This will not be worked on but feel free to help.) label on Apr 22, 2024

stale bot removed the wontfix (This will not be worked on but feel free to help.) label on Apr 24, 2024

Sharrnah mentioned this issue on May 29, 2024: Text-to-Speech in Italian? Sharrnah/whispering-ui#22 (Closed)

stale bot added the wontfix (This will not be worked on but feel free to help.) label on Jul 7, 2024

stale bot closed this as completed on Jul 18, 2024
