Bug: Illogical "Avoid computing higher temperatures on no_speech" #621

Closed
Purfview opened this issue Dec 17, 2023 · 5 comments
Purfview commented Dec 17, 2023

Has anyone noticed that when no_speech_threshold is triggered, it outputs hallucinations instead of silence?

Processing segment at 00:00.000
DEBUG: Compression ratio threshold is not met with temperature 0.0 (0.846154 > 0.800000)
[00:29.500 --> 00:29.780]  Sottotitoli creati dalla comunità Amara.org
Processing segment at 00:30.000
DEBUG: Compression ratio threshold is not met with temperature 0.0 (0.846154 > 0.800000)
[00:30.000 --> 00:59.980]  Sottotitoli creati dalla comunità Amara.org

The corresponding values for these two segments in the JSON output:

            "temperature": 0.0,
            "avg_logprob": -0.37790222728953643,
            "compression_ratio": 0.8461538461538461,
            "no_speech_prob": 0.8777729868888855,
            "temperature": 0.0,
            "avg_logprob": -0.30678143220789295,
            "compression_ratio": 0.8461538461538461,
            "no_speech_prob": 0.9770684242248535,

I've used compression_ratio_threshold=0.8.
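
For context on why such a short line trips a 0.8 threshold, here is a minimal sketch. It assumes the ratio is the UTF-8 byte length divided by the zlib-compressed byte length, as vanilla Whisper's utility computes it; `compression_ratio` below is a local helper written for illustration, not an import from either library, and the repetitive string is made up for contrast.

```python
import zlib


def compression_ratio(text: str) -> float:
    """UTF-8 byte length divided by zlib-compressed byte length."""
    text_bytes = text.encode("utf-8")
    return len(text_bytes) / len(zlib.compress(text_bytes))


# The hallucinated line from the log above (the log reports 0.846154 for it).
short_text = "Sottotitoli creati dalla comunità Amara.org"

# A deliberately repetitive string, for contrast (illustrative only).
looping_text = "I'm a professional. " * 20

short_ratio = compression_ratio(short_text)
looping_ratio = compression_ratio(looping_text)

print(f"short text:   {short_ratio:.6f}")
print(f"looping text: {looping_ratio:.6f}")
```

Short, non-repetitive text barely compresses and zlib adds fixed overhead, so its ratio lands below 1. A threshold of 0.8 therefore flags this segment even though nothing in it repeats.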

Purfview commented Dec 17, 2023

False alarm. I thought that compression_ratio_threshold also matters for "silence", but I forgot that only logprob_threshold is considered.

EDIT:
It wasn't a "false alarm" after all... 😆

Purfview commented Dec 17, 2023

But wait, if it's not "silence", then why doesn't it go through all the fallbacks?

Is it the same in vanilla Whisper?

Purfview reopened this Dec 17, 2023
Purfview commented Dec 17, 2023

Here it stops the fallbacks because it's treated as "silence":

if (
    options.no_speech_threshold is not None
    and result.no_speech_prob > options.no_speech_threshold
):
    needs_fallback = False  # silence

But here it's not "silence" anymore:

# no voice activity check
should_skip = result.no_speech_prob > options.no_speech_threshold
if (
    options.log_prob_threshold is not None
    and avg_logprob > options.log_prob_threshold
):
    # don't skip if the logprob is high enough, despite the no_speech_prob
    should_skip = False

UPDATE:
Same logic in vanilla Whisper too.
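
To make the mismatch concrete, here is a minimal sketch that runs the two quoted checks side by side on the values reported in the first comment. The `Options` dataclass and both function names are hypothetical stand-ins, and the 0.6 and -1.0 thresholds are assumed values that the thread does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Options:
    # Hypothetical stand-in for the real options object.
    # Both threshold values below are assumptions for illustration.
    log_prob_threshold: Optional[float] = -1.0
    no_speech_threshold: Optional[float] = 0.6


def fallback_suppressed(no_speech_prob: float, options: Options) -> bool:
    """First quoted check: a high no_speech_prob cancels the fallback."""
    return (
        options.no_speech_threshold is not None
        and no_speech_prob > options.no_speech_threshold
    )


def should_skip(no_speech_prob: float, avg_logprob: float, options: Options) -> bool:
    """Second quoted check: skip the segment unless the logprob is high enough."""
    if options.no_speech_threshold is None:
        return False
    skip = no_speech_prob > options.no_speech_threshold
    if (
        options.log_prob_threshold is not None
        and avg_logprob > options.log_prob_threshold
    ):
        skip = False  # kept as speech despite the no_speech_prob
    return skip


options = Options()
# (no_speech_prob, avg_logprob) for the two segments in the first comment
reported = [
    (0.8777729868888855, -0.37790222728953643),
    (0.9770684242248535, -0.30678143220789295),
]
for no_speech_prob, avg_logprob in reported:
    suppressed = fallback_suppressed(no_speech_prob, options)
    skipped = should_skip(no_speech_prob, avg_logprob, options)
    print(f"fallback suppressed: {suppressed}, segment skipped: {skipped}")
```

Under these assumptions both segments come out as "fallback suppressed" but "not skipped": the first check treats them as silence, the second treats them as speech, so the temperature 0.0 text is emitted without any retry.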

Purfview changed the title from "Weirdness with no_speech_threshold" to "Bug: Illogical "Avoid computing higher temperatures on no_speech"" Dec 21, 2023
Purfview commented Dec 21, 2023

Example of the bug causing a bad hallucination loop:

Processing segment at 03:07.560
DEBUG: Compression ratio threshold is not met with temperature 0.0 (6.677966 > 2.400000)
[04:17.320 --> 04:29.020]  been doing it for a long time. I'm a professional. I'm a professional. I'm a
[04:29.020 --> 04:29.340]  professional. I'm a professional. I'm a professional. I'm a professional. I'm
[04:29.340 --> 04:34.560]  a professional. I'm a professional. I'm a professional. I'm a professional. I'm
[04:34.560 --> 04:38.360]  a professional. I'm a professional. I'm a professional. I'm a professional. I'm
[04:38.360 --> 05:03.750]  a professional. I'm a professional. I'm a professional. I'm a professional. I'm
Processing segment at 03:37.460

No hallucination loop with the bugfix:

Processing segment at 03:07.560
DEBUG: Compression ratio threshold is not met with temperature 0.0 (6.677966 > 2.400000)
DEBUG: Compression ratio threshold is not met with temperature 0.2 (8.533333 > 2.400000)
DEBUG: Compression ratio threshold is not met with temperature 0.4 (8.884615 > 2.400000)
[04:17.320 --> 04:22.640]  got me feeling natural. Finding a natural-seeming way to fail at any given task.
[04:23.700 --> 04:27.140]  In each of the commercials that I'm in, I'm the one who simply can't go on
[04:27.140 --> 04:33.340]  without the product. It's ridiculous that we don't have the product. Show them.
DEBUG: Reset prompt. prompt_reset_on_temperature threshold is met 0.600000 > 0.500000
Processing segment at 03:23.580
DEBUG: Log probability threshold is not met with temperature 0.0 (-1.344815 < -1.000000)
DEBUG: Log probability threshold is not met with temperature 0.2 (-1.150256 < -1.000000)
[04:33.340 --> 04:35.340]  No, you shouldn't.
[04:36.020 --> 04:36.300]  Please.
[04:36.560 --> 04:37.520]  You wanna see?
[04:38.020 --> 04:39.080]  Yeah, I wanna see.
[04:43.260 --> 04:44.120]  She's amazing.
[05:03.870 --> 05:05.110]  I just...
[05:05.110 --> 05:05.650]  I...
Processing segment at 03:39.560
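
The thread does not show the change itself, so the following is only a sketch of one way to make the two checks agree: cancel the fallback on no_speech only when the log probability is also below its threshold, which is the same condition under which the segment would be skipped. Both function names, the default thresholds, and the input values are hypothetical.

```python
from typing import Optional


def needs_fallback_original(
    needs_fallback: bool,
    no_speech_prob: float,
    avg_logprob: float,
    no_speech_threshold: Optional[float] = 0.6,  # assumed value
    log_prob_threshold: Optional[float] = -1.0,  # assumed value
) -> bool:
    """Behaviour quoted earlier in the thread: no_speech alone cancels fallback."""
    if no_speech_threshold is not None and no_speech_prob > no_speech_threshold:
        needs_fallback = False  # silence
    return needs_fallback


def needs_fallback_consistent(
    needs_fallback: bool,
    no_speech_prob: float,
    avg_logprob: float,
    no_speech_threshold: Optional[float] = 0.6,  # assumed value
    log_prob_threshold: Optional[float] = -1.0,  # assumed value
) -> bool:
    """Sketch: cancel fallback only if the segment would also be skipped."""
    if (
        no_speech_threshold is not None
        and no_speech_prob > no_speech_threshold
        and log_prob_threshold is not None
        and avg_logprob < log_prob_threshold
    ):
        needs_fallback = False  # silence
    return needs_fallback


# Hypothetical decoding result: the compression ratio check already asked for
# a fallback, no_speech_prob is high, but the logprob is still acceptable.
before = needs_fallback_original(True, no_speech_prob=0.9, avg_logprob=-0.3)
after = needs_fallback_consistent(True, no_speech_prob=0.9, avg_logprob=-0.3)
print(f"original: {before}, consistent: {after}")
```

With the consistent variant, a repetitive result like the one in the first log would still be retried at higher temperatures instead of being accepted at temperature 0.0.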

@Purfview
Bugfix is merged.
