VAD is relatively slow #364
Lowering the `window_size_samples` value could speed it up.
The VAD model also runs on a single CPU core. Can you try changing these values (the thread settings on the ONNX session) and see how they impact the performance?
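(For reference, a sketch of what adjusting those thread settings could look like; `inter_op_num_threads` and `intra_op_num_threads` are standard `onnxruntime.SessionOptions` fields, and the values and model path here are only illustrative:)

```python
import onnxruntime

opts = onnxruntime.SessionOptions()
# faster_whisper pins the VAD to a single CPU core by default;
# raising these lets ONNX Runtime use more cores.
opts.inter_op_num_threads = 4  # illustrative value
opts.intra_op_num_threads = 4  # illustrative value

session = onnxruntime.InferenceSession(
    "silero_vad.onnx",  # hypothetical path to the bundled VAD model
    providers=["CPUExecutionProvider"],
    sess_options=opts,
)
```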
You can make the VAD run on GPU. In faster_whisper's VAD session setup:

```python
opts = onnxruntime.SessionOptions()
opts.log_severity_level = 4
opts.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_BASIC
# https://github.com/microsoft/onnxruntime/issues/11548#issuecomment-1158314424
self.session = onnxruntime.InferenceSession(
    path,
    providers=["CUDAExecutionProvider"],
    sess_options=opts,
)
```
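(Note: `CUDAExecutionProvider` is only available with the `onnxruntime-gpu` package; the default CPU-only `onnxruntime` build does not include it.)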
I get faster speed with a higher value; is lower faster for you?
Not sure about precision either. 1024 included insignificantly more non-voice areas than 1536, but 1536 excluded one voice line in a music/song area.
No impact for me.
Could you benchmark VAD, CPU vs GPU?
Do you have any benchmark code & data?
No.
After seeing your results, I tested it too, and it took longer for lower values of `window_size_samples`.
I'm not sure about the precision; I'll check it later. Benchmark code:
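(A sketch of what such a VAD-only benchmark might look like, assuming faster_whisper's internal `decode_audio`, `VadOptions`, and `get_speech_timestamps` helpers as they existed around the time of this thread; the file path and window sizes are illustrative:)

```python
import time

from faster_whisper.audio import decode_audio
from faster_whisper.vad import VadOptions, get_speech_timestamps

audio = decode_audio("sample.wav", sampling_rate=16000)  # placeholder file

for window_size in (512, 1024, 1536):
    options = VadOptions(window_size_samples=window_size)
    start = time.time()
    chunks = get_speech_timestamps(audio, options)
    elapsed = time.time() - start
    print(f"window_size_samples={window_size}: "
          f"{len(chunks)} speech chunks in {elapsed:.3f}s")
```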
I did tests on various samples to see the effects of 1536 on transcriptions. I made it the default in r139.2.
Does your application use Demucs now? How can Demucs be used to preprocess audio?
No. And I won't include it, as it uses PyTorch; that's gigabytes of additional files...
Read and ask there: https://github.com/facebookresearch/demucs
I just checked Demucs; it can run on CPU. You could make it run on CPU by default.
Still, CPU-only Torch would increase the current 70 MB .exe about 6 times... Currently I'm not interested in bundling it in.
A couple of comments here from personal experience:

I think it's not very useful to measure the percentage of time used by the VAD. You should instead compare the total execution time with and without VAD. The VAD can remove non-speech sections that would otherwise trigger the slow temperature fallback in Whisper; in that case the total execution time is reduced even though the VAD took X% of it.
Hi all, thanks @guillaumekln!

But it's already lightweight and super fast. People reported that there is no significant performance increase when running it on GPU.
Hi @Purfview,

```python
import time

from faster_whisper import WhisperModel

files_list = [
    "/home/ec2-user/datasets/vad_debug/no_speech_1.wav",
    "/home/ec2-user/datasets/vad_debug/no_speech_2.wav",
    "/home/ec2-user/datasets/vad_debug/no_speech_3.wav",
    "/home/ec2-user/datasets/vad_debug/no_speech_4.wav",
]

model_size = "large-v2"
model = WhisperModel(model_size, device="cuda", compute_type="float16")

for f in files_list:
    # First run: no VAD filter.
    t_i = time.time()
    segments, _ = model.transcribe(f, beam_size=5, language="fr")
    t_i = time.time() - t_i

    time.sleep(20)

    # Second run: with the VAD filter enabled.
    t_j = time.time()
    segments_vad, _ = model.transcribe(
        f,
        beam_size=5,
        vad_filter=True,
        vad_parameters=dict(min_silence_duration_ms=2000),
        language="fr",
    )
    t_j = time.time() - t_j

    # Ratio of the two timings.
    print(t_j / t_i)
```

These are the prints of the above script:
Note that the first 3 files are ~1 second long and the 4th is ~38 seconds long. Any suggestions on how to make it faster for long files?
Obviously. Why would anyone expect it to be negligible?
@Purfview, let me clarify.

Given the two points above, how can we make it run faster? And if there is such a difference in the parameter counts, why does it add such an overhead to the runtime?
From the benchmarks posted in this thread you can see that VAD runs at 134 audio seconds per second, and that's on an ancient CPU. You can use...
But you don't measure the whole runtime in your code example; you don't measure the actual transcription there.
We want to measure the performance as a percentage, therefore...

What do you mean? Can you please suggest how to measure it correctly?
Now it shows something like a car's speed as a percentage of the coolant's flow speed. ;)

You were told there how to do it -> #271
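(Related sketch: `transcribe()` returns a lazy generator, so the decoding only runs while the segments are consumed; to time the whole runtime the segments must be exhausted. The file path here is a placeholder:)

```python
import time

from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cuda", compute_type="float16")

start = time.time()
segments, _ = model.transcribe("audio.wav", vad_filter=True)
# transcribe() only prepares a generator; the actual decoding runs
# while the segments are consumed, so iterate before stopping the clock.
segments = list(segments)
print(f"total runtime: {time.time() - start:.2f}s")
```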
I forgot about that. ;)
I didn't notice any impact when adjusting options related to threads.
The default VAD takes 55 s on my system for a 2-hour audio file with speech, before the actual transcription begins.
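(For what it's worth, that is consistent with the ~134 audio seconds per second figure quoted above: 7200 s / 134 ≈ 54 s.)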
Hello guys,

I am using the VAD of faster-whisper with the following commands. I found that on the TED-LIUM benchmark, the VAD takes 8% of the time and transcription takes 92%. I would prefer to decrease the VAD time so that it takes no more than 1%. Is it somehow possible to optimize the VAD procedure in terms of real time? Maybe it is possible to run the VAD on several CPUs? BTW, I see that the VAD runs on the CPU; is it possible to run it on the GPU somehow?
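(The exact commands didn't come through; a typical invocation with the VAD filter enabled looks roughly like this, with the model size and file path as placeholders:)

```python
from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cuda", compute_type="float16")
segments, info = model.transcribe(
    "audio.wav",      # placeholder path
    vad_filter=True,  # run the Silero VAD before transcription
)
for segment in segments:
    print(segment.start, segment.end, segment.text)
```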