Hi @guillaumekln,
I first mentioned this bug here: #71 (comment)
A short summary:
I'm running Whisper (faster-whisper) on a Windows x86 CPU inside my SEPIA STT Server test environment in a "streaming" mode: every time the VAD detects a reasonable speech sequence, I feed that chunk of audio (a numpy float32 array) to WhisperModel.transcribe.
With temperature=0 everything is stable, but when I leave temperature at its default value I can trigger a segmentation fault fairly reliably, for example by making a coughing sound. Normal speech input works fine.
The same error does not appear on my Linux Aarch64 system.
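For context, the streaming mode described above boils down to something like the following sketch. It assumes a faster-whisper-style model object whose `transcribe(audio, temperature=...)` returns `(segments, info)`, with each segment exposing `.text`. `transcribe_chunks` and the chunk iterable are hypothetical stand-ins for the SEPIA STT Server pipeline, not its actual code, and `DEFAULT_TEMPERATURES` is assumed to mirror faster-whisper's default fallback schedule.

```python
from typing import Iterable, Iterator, Sequence, Union

# Assumed to mirror faster-whisper's default temperature fallback schedule.
DEFAULT_TEMPERATURES = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)


def transcribe_chunks(
    model,
    chunks: Iterable[Sequence[float]],
    temperature: Union[float, Sequence[float]] = DEFAULT_TEMPERATURES,
) -> Iterator[str]:
    """Feed each VAD-delimited chunk to model.transcribe and yield its text.

    `model` is any object with a faster-whisper-like
    transcribe(audio, temperature=...) -> (segments, info) method
    (hypothetical helper, for illustration only).
    """
    for chunk in chunks:
        segments, _info = model.transcribe(chunk, temperature=temperature)
        # With faster-whisper, `segments` is a lazy generator, so the actual
        # decoding (and any temperature fallback) runs while we join here.
        yield "".join(segment.text for segment in segments).strip()
```

The stable configuration is the same loop called with a single temperature, e.g. `list(transcribe_chunks(model, vad_chunks, temperature=0.0))`, where `vad_chunks` is a hypothetical iterable of float32 arrays.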
Unfortunately, all my attempts to reproduce the error with pre-recorded audio have failed so far.
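Because pre-recorded audio has not reproduced the crash so far, one option is to dump the exact float32 samples of each chunk right before it is passed to `transcribe`, so a crashing input can later be replayed bit-for-bit. Below is a minimal stdlib sketch, assuming each chunk is a flat sequence of float32 samples; `dump_chunk`, `load_chunk` and the file naming are hypothetical. With numpy, `chunk.tofile(path)` and `numpy.fromfile(path, dtype=numpy.float32)` do the same job.

```python
import os
from array import array
from typing import Sequence


def dump_chunk(samples: Sequence[float], directory: str, index: int) -> str:
    """Write one audio chunk as raw float32 bytes (native byte order)."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, f"chunk_{index:05d}.f32")
    with open(path, "wb") as f:
        # array("f") stores C floats, i.e. 4-byte float32 on common platforms
        array("f", samples).tofile(f)
    return path


def load_chunk(path: str) -> array:
    """Read a chunk written by dump_chunk back as float32 samples."""
    data = array("f")
    with open(path, "rb") as f:
        data.frombytes(f.read())
    return data
```

To replay a dumped chunk, convert it with `numpy.array(samples, dtype=numpy.float32)` and pass it to `transcribe` with the same options as in the live session.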
Further investigation shows that the error happens inside generate_with_fallback, while it iterates over the temperature values and calls self.model.generate (ctranslate2.models.Whisper).
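To make that control flow concrete, here is a simplified sketch of a temperature-fallback loop of the kind generate_with_fallback implements. It is not the faster-whisper code: it assumes Whisper-style retry criteria (a zlib compression-ratio threshold of 2.4 and an average log-probability threshold of -1.0) and replaces self.model.generate with a hypothetical `decode(temperature)` callable returning `(text, avg_logprob)`.

```python
import zlib
from typing import Callable, Iterable, List, Tuple


def compression_ratio(text: str) -> float:
    """Raw size divided by zlib-compressed size; high values mean repetitive text."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def decode_with_fallback(
    decode: Callable[[float], Tuple[str, float]],
    temperatures: Iterable[float] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    compression_ratio_threshold: float = 2.4,
    log_prob_threshold: float = -1.0,
) -> Tuple[str, float, List[float]]:
    """Retry decoding at increasing temperatures until the output looks acceptable.

    Returns (text, final_temperature, temperatures_tried). If every attempt is
    rejected, this sketch keeps the result of the last attempt.
    """
    tried: List[float] = []
    text, temperature = "", 0.0
    for temperature in temperatures:
        tried.append(temperature)
        text, avg_logprob = decode(temperature)
        # Reject degenerate (repetitive) or low-confidence output and retry hotter
        too_repetitive = compression_ratio(text) > compression_ratio_threshold
        low_confidence = avg_logprob < log_prob_threshold
        if not (too_repetitive or low_confidence):
            break
    return text, temperature, tried
```

With `temperatures=(0.0,)`, which is effectively what temperature=0 requests, the loop runs exactly once and never reaches 0.2, which is consistent with that setting being stable.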
The exact sequence is:
- final_temperature = 0.0
  - tokens = [50364, 4064, 0, 50414] (example)
  - text = "Ha!" (example)
- final_temperature = 0.2
  - segmentation fault
If it crashes (with the right "cough" sound), it always crashes after final_temperature = 0.2.
The parameters at this step are:
encoder_output = 0.0445115 0.0431825 -0.0302493 ... 0.264346 -0.422786 -0.118524
[cpu:0 float32 storage viewed as 1x1500x384]
[prompt] = [[50258, 50259, 50359]]
length_penalty = 1
max_length = 448
return_scores = True
return_no_speech_prob = True
suppress_blank = True
suppress_tokens = [-1]
max_initial_timestamp_index = 50
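For reference, the keyword arguments at the crashing step can be collected as in the sketch below. The fixed values are the ones listed above. The temperature-dependent part (beam search at 0.0; `beam_size=1`, `num_hypotheses`, `sampling_topk` and `sampling_temperature` above 0.0) is an assumption about how faster-whisper configured `ctranslate2.models.Whisper.generate` at the time, and `beam_size=5`, `best_of=5`, `patience=1.0` are assumed defaults. `generate_kwargs` is a hypothetical helper.

```python
from typing import Any, Dict


def generate_kwargs(
    temperature: float, beam_size: int = 5, best_of: int = 5, patience: float = 1.0
) -> Dict[str, Any]:
    """Keyword arguments for Whisper.generate at one fallback step (sketch)."""
    kwargs: Dict[str, Any] = {
        # Values observed at the crashing step (taken from the report)
        "length_penalty": 1,
        "max_length": 448,
        "return_scores": True,
        "return_no_speech_prob": True,
        "suppress_blank": True,
        "suppress_tokens": [-1],
        "max_initial_timestamp_index": 50,
    }
    if temperature > 0:
        # Assumed sampling branch: a single beam, best_of random samples;
        # sampling_topk=0 is assumed to mean "no top-k restriction"
        kwargs.update(beam_size=1, num_hypotheses=best_of,
                      sampling_topk=0, sampling_temperature=temperature)
    else:
        # Assumed deterministic branch: plain beam search
        kwargs.update(beam_size=beam_size, patience=patience)
    return kwargs
```

Under these assumptions, the crashing call would look like `model.generate(encoder_output, [prompt], **generate_kwargs(0.2))` with `prompt = [50258, 50259, 50359]`, i.e. the first call of the sequence that takes the sampling path.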
This is as far as I could trace the error; I think the rest happens inside the CTranslate2 C++ code.
I hope this helps to understand the issue 🤔.
Cu,
Florian