
Run with onnxruntime-gpu not working for faster_whisper #493

Closed
guilhermehge opened this issue Sep 27, 2023 · 23 comments

@guilhermehge

I am trying to use faster_whisper with pyannote for speech overlap detection and speaker diarization, but pyannote's new 3.0.0 release requires onnxruntime-gpu to run the diarization pipeline with its new embedding model.

Installing both onnxruntime (from faster_whisper) and onnxruntime-gpu (from pyannote) causes a conflict, and ONNX Runtime falls back to CPU only.

I tried uninstalling onnxruntime and forcing a reinstall of onnxruntime-gpu, but then faster_whisper no longer works.

Is it possible to use onnxruntime-gpu for faster_whisper?

@phineas-pta

You should not have both onnxruntime and onnxruntime-gpu installed; with both present, ONNX Runtime always defaults to CPU.

Installing onnxruntime-gpu alone should be enough. faster_whisper uses it for Silero VAD, but always on CPU: https://github.com/guillaumekln/faster-whisper/blob/master/faster_whisper/vad.py#L260

The caveat with onnxruntime-gpu is that you must properly install CUDA + cuDNN at the system level.
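The "prefer GPU, fall back to CPU" provider choice described above can be sketched as a small helper. This is hypothetical, not part of faster-whisper; `available` stands in for the result of `onnxruntime.get_available_providers()` so the sketch runs without onnxruntime installed:

```python
def pick_providers(available):
    """Choose ONNX Runtime execution providers, preferring CUDA when present.

    `available` would normally come from onnxruntime.get_available_providers();
    it is a parameter here so this sketch has no onnxruntime dependency.
    """
    if "CUDAExecutionProvider" in available:
        # Keep the CPU provider as a fallback for ops the CUDA EP can't run.
        return ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]
```

With only the CPU wheel installed, `available` contains just `CPUExecutionProvider`, which is why mixing the two wheels silently pins everything to CPU.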

@guilhermehge
Author

guilhermehge commented Sep 27, 2023

But can Silero VAD run with onnxruntime-gpu? To do that, I believe I would need to change faster_whisper's requirements so it does not install onnxruntime, right?

I'm running the application on Docker with the image nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu20.04, so CUDA + cuDNN are properly installed.
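For reference, a setup fragment (untested sketch, not an official recipe) showing how an image build on that base could swap the CPU wheel for the GPU one, since having both wheels installed is what makes ONNX Runtime default to CPU:

```shell
# Inside an image based on nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu20.04:
pip install faster-whisper
pip uninstall -y onnxruntime     # CPU wheel pulled in by faster-whisper
pip install onnxruntime-gpu      # GPU wheel required by pyannote 3.0
```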

@phineas-pta

It's possible to run Silero VAD with onnxruntime-gpu; see my comment in #364 (comment).

I don't know which version of onnxruntime you're using, but for the latest version it's better to use CUDA 11.8: https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements

@guilhermehge
Author

Thanks for that, phineas!

Let me ask you something else. faster_whisper's transcribe is already taking up 99% of my GPU; if I run VAD on the GPU as well, would that be a problem, or would it take longer because of it? I read through transcribe.py and see that SileroVAD is only used within the transcribe function, and the segments are a generator, so it should not overload the GPU. Am I correct?

@guilhermehge
Author

guilhermehge commented Sep 27, 2023

I implemented this code of yours from #364 (comment),

and it actually increased the transcribe function's time, going from 2 to 7 seconds for an audio file I'm testing. Do you know why that happened?

Analyzing it further, I believe that happens because it creates a session every time we call the transcribe function, so, since it is using the GPU, session creation takes longer.
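One way to avoid paying that cost on every call would be to cache the session per model and device. This is a hypothetical sketch, not faster-whisper's actual code; `create_session` is injected so the example runs without onnxruntime installed, and in practice it would wrap `onnxruntime.InferenceSession(path, providers=providers)`:

```python
# Cache ONNX sessions so repeated transcribe() calls reuse one session
# instead of re-creating it (session creation is the expensive part on GPU).
_SESSION_CACHE = {}

def get_vad_session(model_path, use_gpu, create_session):
    """Create the session once per (model_path, use_gpu) pair, then reuse it.

    create_session(path, providers) is a stand-in for the real constructor,
    e.g. lambda p, prov: onnxruntime.InferenceSession(p, providers=prov).
    """
    key = (model_path, use_gpu)
    if key not in _SESSION_CACHE:
        providers = (["CUDAExecutionProvider", "CPUExecutionProvider"]
                     if use_gpu else ["CPUExecutionProvider"])
        _SESSION_CACHE[key] = create_session(model_path, providers)
    return _SESSION_CACHE[key]
```

With a cache like this, only the first transcribe call pays the GPU session-creation cost; subsequent calls reuse the existing session.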

@phineas-pta

Hmm, it seems I misread your previous comment. Silero VAD should work with onnxruntime-gpu installed (defaulting to CPU); my code is just a tweak to make it run on the GPU, not an absolute necessity.

It always creates a new ONNX session, whether on GPU or CPU, but I guess loading to the GPU takes more time (loading time > processing time). You may need a longer audio file to test for an actual speed-up.
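The "loading time > processing time" hypothesis is easy to check by timing session creation and inference separately; a tiny stdlib helper like this (nothing faster-whisper-specific) would do:

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds).

    Call it once around session creation and once around inference to see
    which dominates for a given audio length and execution provider.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

For short clips the creation term dominates on GPU, which matches the 2 s → 7 s regression reported above.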

@guilhermehge
Author

Yes, at first I wanted to run it with the onnxruntime-gpu library while keeping Silero VAD on CPU, but since you posted the code, I tried running it on the GPU. Session creation increases the total time too much for short audio files, so it's not worth it in most cases; it's better to use the CPU with more threads active.

I'm trying to run this code along with pyannote's 3.0 diarization pipeline, which requires onnxruntime-gpu, so faster_whisper's requirements were causing a conflict.

I'm using a Docker container in a GPU pod orchestrated by Kubernetes, where I'm building an image based on nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu20.04. I created this issue because, while testing onnxruntime-gpu in this environment inside a Jupyter notebook, the kernel kept dying when trying to run inference with Whisper, and I couldn't figure out why. Later, I ran the complete code as a .py script outside Jupyter and it worked fine. I still don't know why the Jupyter notebook kernel keeps dying with this library.

@phineas-pta

You should have shared the config info from the beginning to avoid talking past each other 😅

So the actual problem is the Jupyter kernel crash. Do you have logs?

@guilhermehge
Author

guilhermehge commented Sep 28, 2023

I'll run some tests and come back here with the results. For now, I don't have any logs; I killed the pod before accessing them.

Edit: I won't be able to get back to this issue until next week. When I have the results, I will post them here.

@thomasmol

Hi, I am having the same issue: I need to run onnxruntime-gpu, but I can't easily uninstall the CPU version since I am using Cog and pushing it to Replicate. This means I can't change the code as in #364 (comment), or at least I don't know how. Any ideas on how to force my build to use onnxruntime-gpu and remove onnxruntime?

@thomasmol

I created a pull request that fixes this issue: #499. You can try it by installing git+https://github.com/thomasmol/faster-whisper.git@master.

@phineas-pta

Your PR is very likely to be rejected: it only works with NVIDIA GPUs, while faster-whisper is cross-platform. That's why my code snippet stays a snippet instead of being sent as a PR.

@thomasmol

Thanks for the heads up

@guilhermehge
Author

guilhermehge commented Sep 30, 2023

I don't recommend running Silero VAD on GPU either, since instantiating a session takes longer than with the CPU version. For shorter audio files, it increases the overall time significantly. I've had 2 s on CPU versus 7 s on GPU for certain audio files.

Perhaps we could add an option for the user to select GPU or CPU for Silero VAD via the parameters class.
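Such an option might look like the following sketch. This is hypothetical: faster-whisper's real `VadOptions` has no `device` field, and the defaults shown here are illustrative rather than copied from the library:

```python
from dataclasses import dataclass

@dataclass
class VadOptions:
    """Sketch of a VAD parameters class with a proposed device option."""
    threshold: float = 0.5
    min_speech_duration_ms: int = 250
    device: str = "cpu"  # proposed field: "cpu" or "cuda"

    def providers(self):
        # Map the device choice to ONNX Runtime execution provider names.
        if self.device == "cuda":
            return ["CUDAExecutionProvider", "CPUExecutionProvider"]
        return ["CPUExecutionProvider"]
```

Defaulting to `"cpu"` keeps current behavior (and cross-platform support) intact while letting users with long audio opt into the GPU.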

@guilhermehge
Author

So, for this issue, @phineas-pta, I fixed it by installing only onnxruntime-gpu; the Jupyter notebook is working properly and everything is running as it should.

To do this, I cloned the faster-whisper repo, created a build with only the onnxruntime-gpu dependency, and installed it; now everything runs normally. Thanks for the help.
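That rebuild can be sketched roughly as follows (an untested setup fragment; the requirements file name and its exact contents are assumptions based on the repo's state around late 2023):

```shell
# Build faster-whisper against onnxruntime-gpu instead of onnxruntime.
git clone https://github.com/guillaumekln/faster-whisper.git
cd faster-whisper
# Swap the CPU dependency for the GPU one (file name assumed).
sed -i 's/^onnxruntime/onnxruntime-gpu/' requirements.txt
pip install .
```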

@thomasmol

thomasmol commented Oct 3, 2023

@guilhermehge yes, I did the same and it works! Maybe we could create a fork, faster-whisper-gpu, and have a GPU-only version?

@remic33

remic33 commented Oct 10, 2023

It seems that the current pyannote version (3.0.1) is not working with the current faster_whisper version. Any ideas for a solution?

@guilhermehge
Author

It is working. I am using it at the moment. How are you implementing it? Docker? Colab? Locally without Docker?

@remic33

remic33 commented Oct 10, 2023

Locally, which I guess should be the problem. I wanted to update whisperX on that matter.

@guilhermehge
Author

Did you create a virtual environment to do that?

Can you further explain your problem so we can debug it?

@remic33

remic33 commented Oct 11, 2023

It's a local env made with conda, on M2 Apple silicon.
The env had whisperX installed previously; trying to rebuild it with the new pyannote version gives me this error:

ERROR: Could not find a version that satisfies the requirement onnxruntime-gpu>=1.16.0 (from pyannote-audio) (from versions: none)
ERROR: No matching distribution found for onnxruntime-gpu>=1.16.0

@phineas-pta

@remic33 pyannote doesn't officially support macOS; there are already many issues on the pyannote repo about that.

@remic33

remic33 commented Oct 12, 2023

It worked previously; I know because I was using it and was part of those discussions. You just needed to add some packages. But maybe with onnxruntime-gpu it doesn't work anymore.
Thanks for your help!
