
Requested float16 compute type, but the target device or backend do not support efficient float16 computation. #42

Closed
stevevaius2015 opened this issue Mar 15, 2023 · 17 comments

Comments

@stevevaius2015

I recently tried this wonderful tool on the CPU of my Windows 10 machine and got quite good results. But when I tried it on the GPU via `model = WhisperModel(model_path, device="cuda", compute_type="float16")` I received the following error: `Requested float16 compute type, but the target device or backend do not support efficient float16 computation.`
I have a GTX 1050 Ti and the main driver is 31.0.15.1694. How can I fix this error and run on my GPU card?

@guillaumekln
Contributor

Your GPU does not support FP16 execution.

You can set compute_type to "float32" or "int8".
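For completeness, CTranslate2 can report which compute types a device supports via `ctranslate2.get_supported_compute_types("cuda")`, which returns a set of type names. The fallback logic can then be a tiny helper; `pick_compute_type` below is a hypothetical name for illustration, not part of the library:

```python
def pick_compute_type(supported, preferred=("float16", "int8_float16", "int8", "float32")):
    """Return the first compute type in `preferred` that the device supports."""
    for compute_type in preferred:
        if compute_type in supported:
            return compute_type
    return "float32"  # safe default on both CPU and GPU

# A Pascal GPU such as the GTX 1050 Ti typically supports only these:
print(pick_compute_type({"int8", "float32"}))  # → int8
```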

@stevevaius2015
Author

> Your GPU does not support FP16 execution.
>
> You can set compute_type to "float32" or "int8".

With `model = WhisperModel(model_path, device="cuda", compute_type="float32")` I now receive this error:

```
File "D:\faster-whisper\faster_whisper\transcribe.py", line 72, in __init__
    self.model = ctranslate2.models.Whisper(
RuntimeError: CUDA failed with error out of memory
```

I tried to download a float32 CTranslate2 model after this error, but I only found float16 and int8 options, no float32. Do I need to configure something else?
Btw, one of the most impressive works I have found on Whisper. Thank you again!

@guillaumekln
Contributor

Can you try with compute_type="int8"?

@stevevaius2015
Author

> Can you try with compute_type="int8"?

After changing the model to `model = WhisperModel(model_path, device="cuda", compute_type="int8_float16")` I now receive the following error:
`ValueError: Requested int8_float16 compute type, but the target device or backend do not support efficient int8_float16 computation.`
Any solution?

@guillaumekln
Contributor

guillaumekln commented Mar 15, 2023

Since your GPU does not support float16, you should set "int8" and not "int8_float16".

@stevevaius2015
Author

stevevaius2015 commented Mar 15, 2023

> Since your GPU does not support float16, you should set "int8" and not "int8_float16".

Now I receive the error `Process finished with exit code -1073740791 (0xC0000409)` after changing to `model = WhisperModel(model_path, device="cuda", compute_type="int8")`.
On Colab I can run it without problems. I updated my CUDA driver; here is the info:

```
NVIDIA System Information report created on: 03/15/2023 09:45:48
System name: DESKTOP-G3BPV4R

[Display]
Operating System: Windows 10 Pro, 64-bit
DirectX version: 12.0
GPU processor: NVIDIA GeForce GTX 1050 Ti
Driver version: 531.14
Driver Type: DCH
Direct3D feature level: 12_1
CUDA Cores: 768
Core clock: 1341 MHz
Memory data rate: 7.01 Gbps
Memory interface: 128-bit
Memory bandwidth: 112.13 GB/s
Total available graphics memory: 12259 MB
Dedicated video memory: 4096 MB GDDR5
System video memory: 0 MB
Shared system memory: 8163 MB
Video BIOS version: 86.07.42.00.52
IRQ: Not used
Bus: PCI Express x16 Gen3
Device Id: 10DE 1C82 86261043
Part Number: G210 0000
```

```
nvui.dll 8.17.15.3114 NVIDIA User Experience Driver Component
nvxdplcy.dll 8.17.15.3114 NVIDIA User Experience Driver Component
nvxdbat.dll 8.17.15.3114 NVIDIA User Experience Driver Component
nvxdapix.dll 8.17.15.3114 NVIDIA User Experience Driver Component
NVCPL.DLL 8.17.15.3114 NVIDIA User Experience Driver Component
nvCplUIR.dll 8.1.940.0 NVIDIA Control Panel
nvCplUI.exe 8.1.940.0 NVIDIA Control Panel
nvWSSR.dll 31.0.15.3114 NVIDIA Workstation Server
nvWSS.dll 31.0.15.3114 NVIDIA Workstation Server
nvViTvSR.dll 31.0.15.3114 NVIDIA Video Server
nvViTvS.dll 31.0.15.3114 NVIDIA Video Server
nvLicensingS.dll 6.14.15.3114 NVIDIA Licensing Server
nvDevToolSR.dll 31.0.15.3114 NVIDIA Licensing Server
nvDevToolS.dll 31.0.15.3114 NVIDIA 3D Settings Server
nvDispSR.dll 31.0.15.3114 NVIDIA Display Server
nvDispS.dll 31.0.15.3114 NVIDIA Display Server
PhysX 09.21.0713 NVIDIA PhysX
NVCUDA64.DLL 31.0.15.3114 NVIDIA CUDA 12.1.68 driver
nvGameSR.dll 31.0.15.3114 NVIDIA 3D Settings Server
nvGameS.dll 31.0.15.3114 NVIDIA 3D Settings Server
```

@guillaumekln
Contributor

Did you install the required NVIDIA libraries as indicated in the README?

@stevevaius2015
Author

> Did you install the required NVIDIA libraries as indicated in the README?

That did not work. I checked the cuDNN DLLs as suggested by NVIDIA, and my CUDA Toolkit is version 12.1. No luck. Maybe the CUDA version is the problem, I do not know.

@guillaumekln
Contributor

You need to install CUDA 11.x (not 12.x). Also make sure to configure the PATH environment variable accordingly.
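On Windows this usually means prepending the CUDA 11.x and cuDNN `bin` directories to `PATH`; a sketch for a Command Prompt session (the paths below are the default install locations and may differ on your machine):

```
rem Adjust the version and paths to your actual install locations
set PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin;%PATH%
set PATH=C:\Program Files\NVIDIA\CUDNN\bin;%PATH%
```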

@LukeTheHecker

LukeTheHecker commented Mar 23, 2023

> You need to install CUDA 11.x (not 12.x). Also make sure to configure the PATH environment variable accordingly.

I just want to mention that cuDNN 11.2 has a bug where int8 did not work correctly on my RTX 2070 GPU. I use 11.1, which works in my case.

@guillaumekln
Contributor

I'm closing this issue. The initial error has been explained and there are other useful threads about Windows installation. See for example #85.

@UmangRajpara13

I am facing the same issue. I am using an NVIDIA MX350 (Pascal). My understanding was that int8 computation only works with the latest architecture (Ada Lovelace), yet it seems to work with my graphics card as well. According to the information on Wikipedia about the Pascal and Ada Lovelace architectures, my card should only support up to FP16 ("float16" in CTranslate2 terms) and not int8, but the opposite is true.

@guillaumekln could you please explain how this is possible?

@guillaumekln
Contributor

INT8 computation works on GPUs with Compute Capability 6.1 and above. Your GPU probably has CC 6.1 so it is compatible with this mode.

Maybe you are thinking about FP8 which indeed requires the Ada architecture?

Regarding FP16, your GPU could support it but it does not have Tensor Cores. Currently we disable FP16 without Tensor Cores as it has worse performance than FP32 in my experience. You can override this behavior by setting the environment variable CT2_CUDA_ALLOW_FP16=1.
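That override can also be set from inside Python, as long as it happens before the model is constructed; a minimal sketch (the commented-out model call is illustrative):

```python
import os

# The FP16 check runs when the model is loaded, so the variable must be
# set before WhisperModel(...) is called.
os.environ["CT2_CUDA_ALLOW_FP16"] = "1"

# from faster_whisper import WhisperModel
# model = WhisperModel("large-v2", device="cuda", compute_type="float16")
print(os.environ["CT2_CUDA_ALLOW_FP16"])  # → 1
```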

@UmangRajpara13

Yes, I think I misread it in some documentation and was thinking that int8 was the addition for the new architecture, not fp8. That didn't make much sense to me, but I accepted it "as is", since this is not my active field of research.

It's making more sense now.

Thanks for the clarification!

@kyleboddy

kyleboddy commented Aug 16, 2023

int8 does not work on the Tesla P40 or the P100; I get errors thrown. Any thoughts on flags to set for that, if any?

Also not sure why the P40 is reported as not supporting FP16 when the datasheets for the GPU indicate that it definitely does; I needed to set the allow flag for it to use FP16. Benchmarks from FP32 vs. FP16 (with the forcing flag on) below.

FP32 test on a ~45 min file, Tesla P40, batch size 16:

```
real    8m37.289s
user    7m35.752s
sys     0m26.459s
```

FP16 test on the same file, Tesla P40, batch size 16, environment variable set:

```
real    8m24.999s
user    7m49.251s
sys     0m21.153s
```

Much lower memory pressure as well. Transcription was the same quality; speed/performance about the same.

@guillaumekln
Contributor

As explained above, FP16 is only enabled for GPUs with Tensor Cores (Compute Capability 7.0 and above). You can set the environment variable to bypass this check.

int8 should work on the P40 which has Compute Capability 6.1. What error do you get?
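The capability thresholds mentioned in this thread can be summarized as a small helper; this is illustrative only, not an exhaustive list of what CTranslate2 supports:

```python
def gpu_numeric_modes(major, minor):
    """Map a CUDA Compute Capability to the compute types discussed in this thread.

    Thresholds taken from the comments above:
    - int8 needs Compute Capability >= 6.1
    - float16 with Tensor Cores needs Compute Capability >= 7.0; below that,
      CTranslate2 falls back to float32 unless CT2_CUDA_ALLOW_FP16=1 is set.
    """
    cc = (major, minor)  # tuple comparison avoids floating-point issues
    modes = {"float32"}  # always available
    if cc >= (6, 1):
        modes.add("int8")
    if cc >= (7, 0):
        modes.update({"float16", "int8_float16"})
    return modes

# GTX 1050 Ti, Tesla P40, and MX350 are all Compute Capability 6.1:
print(sorted(gpu_numeric_modes(6, 1)))  # → ['float32', 'int8']
```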

@drohack

drohack commented Aug 25, 2023

I was getting a similar error when trying to run faster_whisper with my GPU and was able to figure out a solution, which I'll write up here. (I was able to run it just fine with my CPU, but that is so much slower.)
I'm running on Windows 11 with an RTX 4080, in the PyCharm IDE.

Error: whenever I would try to iterate over the segments using my GPU, I'd get the following error:
`Process finished with exit code -1073740791 (0xC0000409)`

Code:

```python
import os
import datetime
import torch
from faster_whisper import WhisperModel
from tqdm import tqdm  # tqdm is used below but was missing from the original imports

def transcribe_audio(audio_file_path):
    print_with_timestamp("Start transcribe_audio(" + audio_file_path + ")")

    # Transcribe with faster_whisper
    # Run on GPU with FP16
    model = WhisperModel("large-v2", device="cuda", compute_type="float16")
    # or run on GPU with INT8
    # model = WhisperModel("large-v2", device="cuda", compute_type="int8")
    # or run on CPU with INT8
    # model = WhisperModel("large-v2", device="cpu", compute_type="int8")
    segments, info = model.transcribe(audio=audio_file_path, beam_size=5, language='ja')
    print_with_timestamp("End whisper")

    print(torch.cuda.is_available())

    transcribed_str = ""
    with tqdm(total=None) as pbar:
        for segment in segments:
            start_time = str(0) + str(datetime.timedelta(seconds=int(segment.start))) + ',000'
            end_time = str(0) + str(datetime.timedelta(seconds=int(segment.end))) + ',000'
            #text = segment.text
            #segment_id = segment.id + 1
            #line = f"{segment_id}\n{segment.start} --> {segment.end}\n{text[1:] if text[0] == ' ' else text}\n\n"
            line = "%d\n%s --> %s\n%s\n\n" % (segment.id, start_time, end_time, segment.text)
            transcribed_str += line
            pbar.update()

    return transcribed_str
```

I downloaded the following CUDA Toolkit, cuDNN, and Zlib versions:

CUDA is just an .exe install.
I put cuDNN here: `C:\Program Files\NVIDIA\CUDNN`
And Zlib here: `C:\Program Files (x86)\zlib1.2.3`

And I updated my Windows PATH variable with the cuDNN `bin` folder and the `zlibwapi.dll`/`.lib` (I don't think I need the `.lib`, but I'm covering my bases there).

After all of this I was still getting the same error. Then I came across the `torch` package: when I ran `print(torch.cuda.is_available())` it would always return `False`, meaning it couldn't find/work with the CUDA cores in my GPU.
I also tried `torch.zeros(1).cuda()` and got back the error `AssertionError: Torch not compiled with CUDA enabled`. So my version of torch was incorrect and needed to be reinstalled.

I had to `pip uninstall torch torchvision torchaudio` (or in PyCharm, go to Packages and delete them) and reinstall with the following command from the official PyTorch site (https://pytorch.org/): `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118`.
To do this in PyCharm I actually had to use the Python Console, as I can't find where to put the `--index-url` in their package installer. So I ran the following:

```python
import pip
pip.main(['install', '--force-reinstall', 'torch', 'torchvision', 'torchaudio', '--index-url', 'https://download.pytorch.org/whl/cu118'])
```

Now `torch.cuda.is_available()` returns `True`.

And that solved my issue. I'm now able to run faster_whisper on my GPU.

I'm still getting an error when it finishes/unloads the model, but that is a different issue that's already open in another thread: #85.
And right now I'm getting around that error by setting `temperature=0`:
`segments, info = model.transcribe(audio=audio_file_path, beam_size=5, language='ja', temperature=0)`
