
Requested float16 compute type, but the target device or backend do not support efficient float16 computation. #42

Closed
stevevaius2015 opened this issue Mar 15, 2023 · 17 comments

Comments

@stevevaius2015

I recently tried this wonderful tool on the CPU of my Windows 10 machine and got quite good results. But when I tried it on the GPU via `model = WhisperModel(model_path, device="cuda", compute_type="float16")` I received the following error: `Requested float16 compute type, but the target device or backend do not support efficient float16 computation.`
I have a GTX 1050 Ti and the main driver is 31.0.15.1694. How can I fix this error and run on my GPU card?

@guillaumekln
Contributor

Your GPU does not support FP16 execution.

You can set compute_type to "float32" or "int8".
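For completeness, CTranslate2 can report which compute types a device supports via `ctranslate2.get_supported_compute_types("cuda")`, which returns a set of type names. The fallback logic can then be a tiny helper; `pick_compute_type` below is a hypothetical name for illustration, not part of the library:

```python
def pick_compute_type(supported, preferred=("float16", "int8_float16", "int8", "float32")):
    """Return the first compute type in `preferred` that the device supports."""
    for compute_type in preferred:
        if compute_type in supported:
            return compute_type
    return "float32"  # safe default on both CPU and GPU

# A Pascal GPU such as the GTX 1050 Ti typically supports only these:
print(pick_compute_type({"int8", "float32"}))  # → int8
```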

@stevevaius2015
Author

> Your GPU does not support FP16 execution.
>
> You can set compute_type to "float32" or "int8".

With `model = WhisperModel(model_path, device="cuda", compute_type="float32")` I now receive this error:

```
File "D:\faster-whisper\faster_whisper\transcribe.py", line 72, in __init__
    self.model = ctranslate2.models.Whisper(
RuntimeError: CUDA failed with error out of memory
```

I tried to download a float32 CTranslate2 model after this error, but I only found float16 and int8 options, no float32. Do I need to configure something else?
Btw, one of the most impressive works I have found on Whisper. Thank you again!

@guillaumekln
Contributor

Can you try with compute_type="int8"?

@stevevaius2015
Author

> Can you try with compute_type="int8"?

After changing the model to `model = WhisperModel(model_path, device="cuda", compute_type="int8_float16")` I now receive the following error:
`ValueError: Requested int8_float16 compute type, but the target device or backend do not support efficient int8_float16 computation.`
Any solution?

@guillaumekln
Contributor

guillaumekln commented Mar 15, 2023

Since your GPU does not support float16, you should set "int8" and not "int8_float16".

@stevevaius2015
Author

stevevaius2015 commented Mar 15, 2023

> Since your GPU does not support float16, you should set "int8" and not "int8_float16".

Now I receive the error `Process finished with exit code -1073740791 (0xC0000409)` after changing to `model = WhisperModel(model_path, device="cuda", compute_type="int8")`.
On Colab I can run it without problems. I updated my CUDA driver; here is the info:

```
NVIDIA System Information report created on: 03/15/2023 09:45:48
System name: DESKTOP-G3BPV4R

[Display]
Operating System: Windows 10 Pro, 64-bit
DirectX version: 12.0
GPU processor: NVIDIA GeForce GTX 1050 Ti
Driver version: 531.14
Driver Type: DCH
Direct3D feature level: 12_1
CUDA Cores: 768
Core clock: 1341 MHz
Memory data rate: 7.01 Gbps
Memory interface: 128-bit
Memory bandwidth: 112.13 GB/s
Total available graphics memory: 12259 MB
Dedicated video memory: 4096 MB GDDR5
System video memory: 0 MB
Shared system memory: 8163 MB
Video BIOS version: 86.07.42.00.52
IRQ: Not used
Bus: PCI Express x16 Gen3
Device Id: 10DE 1C82 86261043
Part Number: G210 0000
```

```
nvui.dll 8.17.15.3114 NVIDIA User Experience Driver Component
nvxdplcy.dll 8.17.15.3114 NVIDIA User Experience Driver Component
nvxdbat.dll 8.17.15.3114 NVIDIA User Experience Driver Component
nvxdapix.dll 8.17.15.3114 NVIDIA User Experience Driver Component
NVCPL.DLL 8.17.15.3114 NVIDIA User Experience Driver Component
nvCplUIR.dll 8.1.940.0 NVIDIA Control Panel
nvCplUI.exe 8.1.940.0 NVIDIA Control Panel
nvWSSR.dll 31.0.15.3114 NVIDIA Workstation Server
nvWSS.dll 31.0.15.3114 NVIDIA Workstation Server
nvViTvSR.dll 31.0.15.3114 NVIDIA Video Server
nvViTvS.dll 31.0.15.3114 NVIDIA Video Server
nvLicensingS.dll 6.14.15.3114 NVIDIA Licensing Server
nvDevToolSR.dll 31.0.15.3114 NVIDIA Licensing Server
nvDevToolS.dll 31.0.15.3114 NVIDIA 3D Settings Server
nvDispSR.dll 31.0.15.3114 NVIDIA Display Server
nvDispS.dll 31.0.15.3114 NVIDIA Display Server
PhysX 09.21.0713 NVIDIA PhysX
NVCUDA64.DLL 31.0.15.3114 NVIDIA CUDA 12.1.68 driver
nvGameSR.dll 31.0.15.3114 NVIDIA 3D Settings Server
nvGameS.dll 31.0.15.3114 NVIDIA 3D Settings Server
```

@guillaumekln
Contributor

Did you install the required NVIDIA libraries as indicated in the README?

@stevevaius2015
Author

> Did you install the required NVIDIA libraries as indicated in the README?

That did not work. I checked the cuDNN DLLs as suggested by NVIDIA, and my CUDA Toolkit is version 12.1. No luck. Maybe the CUDA version is the problem, I do not know.

@guillaumekln
Contributor

You need to install CUDA 11.x (not 12.x). Also make sure to configure the PATH environment variable accordingly.
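On Windows this usually means prepending the CUDA 11.x and cuDNN `bin` directories to `PATH`; a sketch for a Command Prompt session (the paths below are the default install locations and may differ on your machine):

```
rem Adjust the version and paths to your actual install locations
set PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin;%PATH%
set PATH=C:\Program Files\NVIDIA\CUDNN\bin;%PATH%
```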

@LukeTheHecker

LukeTheHecker commented Mar 23, 2023

> You need to install CUDA 11.x (not 12.x). Also make sure to configure the PATH environment variable accordingly.

I just want to mention that cuDNN 11.2 has a bug where int8 did not work correctly on my RTX 2070 GPU. I use 11.1, which works in my case.

@guillaumekln
Contributor

I'm closing this issue. The initial error has been explained and there are other useful threads about Windows installation. See for example #85.

@UmangRajpara13

I am facing the same issue. I am using an NVIDIA MX350 (Pascal). My understanding was that int8 computation only works with the latest architecture (Ada Lovelace), yet it seems to work with my graphics card as well. According to the information on Wikipedia about the Pascal and Ada Lovelace architectures, my card should only support up to FP16 ("float16" in CTranslate2 terms) and not int8, but the opposite is true.

@guillaumekln could you please explain how this is possible?

@guillaumekln
Contributor

INT8 computation works on GPUs with Compute Capability 6.1 and above. Your GPU probably has CC 6.1 so it is compatible with this mode.

Maybe you are thinking about FP8 which indeed requires the Ada architecture?

Regarding FP16, your GPU could support it but it does not have Tensor Cores. Currently we disable FP16 without Tensor Cores as it has worse performance than FP32 in my experience. You can override this behavior by setting the environment variable CT2_CUDA_ALLOW_FP16=1.
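That override can also be set from inside Python, as long as it happens before the model is constructed; a minimal sketch (the commented-out model call is illustrative):

```python
import os

# The FP16 check runs when the model is loaded, so the variable must be
# set before WhisperModel(...) is called.
os.environ["CT2_CUDA_ALLOW_FP16"] = "1"

# from faster_whisper import WhisperModel
# model = WhisperModel("large-v2", device="cuda", compute_type="float16")
print(os.environ["CT2_CUDA_ALLOW_FP16"])  # → 1
```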

@UmangRajpara13

Yes, I think I misread it in some documentation and was thinking that int8 was the addition for the new architecture, not fp8. That didn't make much sense to me, but I accepted it "as is", since this is not my active field of research.

It's making more sense now.

Thanks for the clarification!

@kyleboddy

kyleboddy commented Aug 16, 2023

int8 does not work on the Tesla P40 or the P100; I get errors thrown. Any thoughts on flags to set for that, if any?

Also not sure why the P40 is reported as not supporting FP16 when the datasheets for the GPU indicate that it definitely does; I needed to set the allow flag for it to use FP16. Benchmarks from FP32 vs. FP16 (with the forcing flag on) below.

FP32 test on a ~45 min file, Tesla P40, batch size 16:

```
real    8m37.289s
user    7m35.752s
sys     0m26.459s
```

FP16 test on the same file, Tesla P40, batch size 16, environment variable set:

```
real    8m24.999s
user    7m49.251s
sys     0m21.153s
```

Much lower memory pressure as well. Transcription was the same quality; speed/performance about the same.

@guillaumekln
Contributor

As explained above, FP16 is only enabled for GPUs with Tensor Cores (Compute Capability 7.0 and above). You can set the environment variable to bypass this check.

int8 should work on the P40 which has Compute Capability 6.1. What error do you get?
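The capability thresholds mentioned in this thread can be summarized as a small helper; this is illustrative only, not an exhaustive list of what CTranslate2 supports:

```python
def gpu_numeric_modes(major, minor):
    """Map a CUDA Compute Capability to the compute types discussed in this thread.

    Thresholds taken from the comments above:
    - int8 needs Compute Capability >= 6.1
    - float16 with Tensor Cores needs Compute Capability >= 7.0; below that,
      CTranslate2 falls back to float32 unless CT2_CUDA_ALLOW_FP16=1 is set.
    """
    cc = (major, minor)  # tuple comparison avoids floating-point issues
    modes = {"float32"}  # always available
    if cc >= (6, 1):
        modes.add("int8")
    if cc >= (7, 0):
        modes.update({"float16", "int8_float16"})
    return modes

# GTX 1050 Ti, Tesla P40, and MX350 are all Compute Capability 6.1:
print(sorted(gpu_numeric_modes(6, 1)))  # → ['float32', 'int8']
```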

@drohack

drohack commented Aug 25, 2023

I was getting a similar error when trying to run faster_whisper with my GPU and was able to figure out a solution, which I'll write up here. (I was able to run it just fine with my CPU, but that is so much slower.)
I'm running on Windows 11 with an RTX 4080, in the PyCharm IDE.

Error: whenever I would try to iterate over the segments using my GPU, I'd get the following error:
`Process finished with exit code -1073740791 (0xC0000409)`

Code:

```python
import os
import datetime
import torch
from faster_whisper import WhisperModel
from tqdm import tqdm  # tqdm is used below but was missing from the original imports

def transcribe_audio(audio_file_path):
    print_with_timestamp("Start transcribe_audio(" + audio_file_path + ")")

    # Transcribe with faster_whisper
    # Run on GPU with FP16
    model = WhisperModel("large-v2", device="cuda", compute_type="float16")
    # or run on GPU with INT8
    # model = WhisperModel("large-v2", device="cuda", compute_type="int8")
    # or run on CPU with INT8
    # model = WhisperModel("large-v2", device="cpu", compute_type="int8")
    segments, info = model.transcribe(audio=audio_file_path, beam_size=5, language='ja')
    print_with_timestamp("End whisper")

    print(torch.cuda.is_available())

    transcribed_str = ""
    with tqdm(total=None) as pbar:
        for segment in segments:
            start_time = str(0) + str(datetime.timedelta(seconds=int(segment.start))) + ',000'
            end_time = str(0) + str(datetime.timedelta(seconds=int(segment.end))) + ',000'
            #text = segment.text
            #segment_id = segment.id + 1
            #line = f"{segment_id}\n{segment.start} --> {segment.end}\n{text[1:] if text[0] == ' ' else text}\n\n"
            line = "%d\n%s --> %s\n%s\n\n" % (segment.id, start_time, end_time, segment.text)
            transcribed_str += line
            pbar.update()

    return transcribed_str
```

I downloaded the following CUDA Toolkit, cuDNN, and Zlib versions:

CUDA is just an .exe install.
I put cuDNN here: `C:\Program Files\NVIDIA\CUDNN`
And Zlib here: `C:\Program Files (x86)\zlib1.2.3`

And I updated my Windows PATH variable with the cuDNN `bin` folder and the `zlibwapi.dll`/`.lib` (I don't think I need the `.lib`, but I'm covering my bases there).

After all of this I was still getting the same error. Then I came across the `torch` package: when I ran `print(torch.cuda.is_available())` it would always return `False`, meaning it couldn't find/work with the CUDA cores in my GPU.
I also tried `torch.zeros(1).cuda()` and got back the error `AssertionError: Torch not compiled with CUDA enabled`. So my version of torch was incorrect and needed to be reinstalled.

I had to `pip uninstall torch torchvision torchaudio` (or in PyCharm, go to Packages and delete them) and reinstall with the following command from the official PyTorch site (https://pytorch.org/): `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118`.
To do this in PyCharm I actually had to use the Python Console, as I can't find where to put the `--index-url` in their package installer. So I ran the following:

```python
import pip
pip.main(['install', '--force-reinstall', 'torch', 'torchvision', 'torchaudio', '--index-url', 'https://download.pytorch.org/whl/cu118'])
```

Now `torch.cuda.is_available()` returns `True`.

And that solved my issue. I'm now able to run faster_whisper on my GPU.

I'm still getting an error when it finishes/unloads the model, but that is a different issue that's already open in another thread: #85.
And right now I'm getting around that error by setting `temperature=0`:
`segments, info = model.transcribe(audio=audio_file_path, beam_size=5, language='ja', temperature=0)`
