Transcription is creating duplicate sentences #716

Open
greerviau opened this issue Feb 25, 2024 · 9 comments

@greerviau

Transcribe is returning text with repeating sentences.

Running with the tiny.en model on CPU with the int8 compute type gives the output below (the call that produced it follows):

[Segment(id=1, seek=240, start=0.0, end=2.4, text=' with times it. With times it. With times it. With times it. With times it. With 
times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times 
it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With 
times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times 
it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With 
times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times 
it. With times it. With times it. With times it', tokens=[50363, 351, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340], temperature=1.0, 
avg_logprob=-0.12661736170450846, compression_ratio=27.032258064516128, no_speech_prob=0.22830641269683838, 
words=None)]
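segments, _ = self.model.transcribe(
    file_path,
    vad_filter=True,
    vad_parameters=dict(min_silence_duration_ms=500),
)
# transcribe() returns a lazy generator, so list() is needed to get the Segment list shown above
print(list(segments))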

This wasn't happening before upgrading to 1.0.0.
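The segment above reports `compression_ratio=27.03`, far higher than ordinary speech would give. Here is a minimal sketch for flagging such looping segments after the fact. It assumes the ratio is computed the way the reference Whisper code does it (UTF-8 byte length divided by zlib-compressed length) and that 2.4 is the default threshold. The helper name `looks_repetitive` is hypothetical and not part of faster-whisper.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Ratio of raw UTF-8 size to zlib-compressed size; repetition drives it up."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def looks_repetitive(text: str, threshold: float = 2.4) -> bool:
    """Hypothetical helper: True when the text compresses suspiciously well.

    The 2.4 default is an assumption taken from Whisper's usual
    compression_ratio_threshold setting.
    """
    return compression_ratio(text) > threshold


if __name__ == "__main__":
    looped = " With times it." * 60  # stand-in for the looping segment text
    print(compression_ratio(looped), looks_repetitive(looped))
    print(looks_repetitive(" Turn on kitchen sink."))
```

A check like this only catches long loops. A short transcript with a single duplicated sentence may stay below the threshold, so it is not a substitute for the actual fix.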

@Purfview
Contributor

Can you share an audio sample to reproduce the issue?

@stu247

stu247 commented Feb 27, 2024

I am also seeing this issue. I used this code:

from faster_whisper import WhisperModel

model = WhisperModel("small.en", compute_type="int8")
segments, info = model.transcribe("turnOnKitchenSink.wav")
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

and got this output:

[0.00s -> 29.20s]  Turn on kitchen sink. Turn on kitchen sink.

I get the same output if I don't specify compute_type. The audio sample is: turnOnKitchenSink.zip
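Until a fixed version is installed, one possible stopgap is to collapse consecutive identical sentences in the returned text. This is only a post-processing sketch, not the fix discussed in this thread. The function name `collapse_repeated_sentences` is hypothetical, and the sentence splitting is deliberately naive (it splits on whitespace after `.`, `!` or `?`).

```python
import re


def collapse_repeated_sentences(text: str) -> str:
    """Drop a sentence when it repeats the one immediately before it."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = []
    previous_key = None
    for sentence in sentences:
        if not sentence:
            continue
        # Compare case-insensitively and ignore trailing punctuation,
        # so "With times it" matches "with times it.".
        key = sentence.rstrip(".!?").lower()
        if key == previous_key:
            continue
        kept.append(sentence)
        previous_key = key
    return " ".join(kept)


if __name__ == "__main__":
    print(collapse_repeated_sentences(" Turn on kitchen sink. Turn on kitchen sink."))
```

This also removes repeats that the speaker really said twice, and it does nothing about the wrong 29.20s end timestamp, so treat it as a temporary workaround.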

@greerviau
Author

I can't point to a specific audio sample; it was happening with every file I tried. I'm not sure whether it is also a problem on GPU, since I was using CPU for my testing. I've reverted my code to 0.10.1 until there's a fix.

@Sharrnah

I see the same issue when I set a language (tested with the medium and large-v3 models).

It happens on medium.en just as on the other models if a language is set.

If I set the language to autodetect, it's fine. The older version is fine too.
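To check the observation above on your own audio, here is a rough sketch that runs the same file once with an explicit language and once with autodetection, then compares the text. It relies on the `language` argument of `transcribe()` (with `None` meaning autodetect) and on `info.language`. The helper names `transcribe_text` and `compare_language_modes` are hypothetical.

```python
from typing import Any, Dict, Optional, Tuple


def transcribe_text(model: Any, audio_path: str, language: Optional[str]) -> Tuple[str, str]:
    """Return (joined segment text, language reported by the model)."""
    segments, info = model.transcribe(audio_path, language=language)
    # Segments are produced lazily, so joining them forces decoding to run.
    text = "".join(segment.text for segment in segments)
    return text, info.language


def compare_language_modes(model: Any, audio_path: str, language: str = "en") -> Dict[str, Any]:
    """Transcribe with a fixed language and with autodetect, then compare."""
    fixed_text, _ = transcribe_text(model, audio_path, language)
    auto_text, detected = transcribe_text(model, audio_path, None)
    return {
        "fixed": fixed_text,
        "auto": auto_text,
        "detected": detected,
        "match": fixed_text == auto_text,
    }


# Example usage (assumes faster-whisper is installed and the file exists):
#   from faster_whisper import WhisperModel
#   model = WhisperModel("medium", compute_type="int8")
#   print(compare_language_modes(model, "turnOnKitchenSink.wav", language="en"))
```

If `match` is false and the `fixed` text contains the repeated sentences, that is consistent with the behaviour described above.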

@Sharrnah

Update: it was introduced with this commit:
0920672

When I revert it, it's fine again.

Sharrnah added a commit to Sharrnah/faster-whisper that referenced this issue Feb 28, 2024
@trungkienbkhn
Collaborator

@Sharrnah, hello. Can you try again with this fix?

@Purfview
Contributor

> I am also seeing this issue.
>
> [0.00s -> 29.20s]  Turn on kitchen sink. Turn on kitchen sink.
>
> I get the same output if I don't specify compute_type. The audio sample is: turnOnKitchenSink.zip

Looks good with PR #705:

[00:00.000 --> 00:02.220]  Turn on kitchen sink.

@Sharrnah

> @Sharrnah, hello. Can you try again with this fix?

Thanks. Looks fine with the fix. :)

@James-Shared-Studios

Fully tested the fix, and it works well.
