
ERROR: Your GPU does not support Int8 Matmul! (With example code) #100

Closed
ebolam opened this issue Dec 1, 2022 · 7 comments

Comments

@ebolam

ebolam commented Dec 1, 2022

I'm working on implementing 8-bit inference in KoboldAI and ran into the above error on my Tesla M40.

Steps to reproduce

import torch, transformers

# Load GPT-Neo 1.3B with 8-bit weights; this routes matmuls through bitsandbytes
model = transformers.AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B", load_in_8bit=True, device_map="auto", cache_dir="cache")
tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B", cache_dir="cache", use_fast=False)

# Generate from a short prompt; the crash happens inside this forward pass
model.generate(torch.tensor(tokenizer.encode("A man walks into a bar "), dtype=torch.long)[None].to(0), do_sample=True, max_length=50)

Expected Output

A list of generated tokens

Actual Output

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's 'attention_mask' to obtain reliable results.
Setting 'pad_token_id' to 'eos_token_id':50256 for open-end generation.

===========================================
ERROR: Your GPU does not support Int8 Matmul!
===========================================

python: /mmfs1/gscratch/zlab/timdettmers/git/bitsandbytes/csrc/ops.cu:379: int igemmlt(cublasLtHandle_t, int, int, int, const int8_t*, const int8_t*, void*, float*, int, int, int) [with int FORMATB = 3; int DTYPE_OUT = 32; int SCALE_ROWS = 0; cublasLtHandle_t = cublasLtContext*; int8_t = signed char]: Assertion `false' failed.
Aborted (core dumped)

Version information, etc

Python 3.8.15 | packaged by conda-forge | (default, Nov 22 2022, 08:49:35)
[GCC 10.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from importlib.metadata import version
>>> version('bitsandbytes')
'0.35.4'
>>> version('transformers')
'4.24.0'
>>> version('torch')
'1.11.0'

Using CUDA 11.1.
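As an aside, the attention-mask warning in the output is separate from the crash. A minimal sketch that silences it, assuming the model and tokenizer from the repro above:

# Tokenize with return_tensors so we get both input_ids and an attention_mask
inputs = tokenizer("A man walks into a bar ", return_tensors="pt").to(0)

model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    pad_token_id=tokenizer.eos_token_id,  # set explicitly to avoid the pad warning
    do_sample=True,
    max_length=50,
)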

@iliemihai

I am facing the same problem on an NVIDIA V100 GPU.

@gururise

Interestingly, I am getting the same error on the ROCm version of this library.

@ebolam

ebolam commented Dec 11, 2022

Ah, I see now that int8 requires a more recent card than the rest of the features do.
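A minimal sketch (assuming a CUDA build of PyTorch) to check whether a card clears that bar; bitsandbytes releases before 0.37.0 documented a Turing-or-newer (compute capability 7.5+) requirement for int8 matmul:

import torch

# (major, minor) compute capability of the first visible GPU;
# a Tesla M40 reports (5, 2) and a V100 (7, 0), so both fall below 7.5
major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: {major}.{minor}")
if (major, minor) < (7, 5):
    print("Pre-Turing GPU: upgrade to bitsandbytes >= 0.37.0 for int8 matmul")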

@ebolam ebolam closed this as completed Dec 11, 2022
@lolxdmainkaisemaanlu

lolxdmainkaisemaanlu commented Feb 3, 2023

Ah, I see now that int8 requires a more recent card than the rest of the features do.

@ebolam int8 is now supported on all GPUs with the latest release! Looking forward to Kobold with int8 support.

@chuckhope

My GPU is a V100, and I've been using bitsandbytes 0.37.1 with int8. Although this configuration is supposed to work, I'm encountering an issue where the loss remains at 0.

@miguelamendez

Solved by updating to the newest version, 0.39.1:
pip install -q -U bitsandbytes
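To confirm the upgrade took effect in the active environment, a quick check mirroring the version dump in the original report:

from importlib.metadata import version
print(version("bitsandbytes"))  # should print '0.39.1' or newer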

@ngthanhtin

Thanks @miguelamendez, that fixed it!
