Closed
Hi, I was testing LLM.int8() on the LongT5 model, but I consistently ran into the following error:
```
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.0
CUDA SETUP: Detected CUDA version 110
CUDA SETUP: Loading binary /opt/conda/envs/python38/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda110_nocublaslt.so...
python3: /mmfs1/gscratch/zlab/timdettmers/git/bitsandbytes/csrc/ops.cu:375: int igemmlt(cublasLtHandle_t, int, int, int, const int8_t*, const int8_t*, void*, float*, int, int, int) [with int FORMATB = 3; int DTYPE_OUT = 32; int SCALE_ROWS = 0; cublasLtHandle_t = cublasLtContext*; int8_t = signed char]: Assertion `false' failed.
Aborted
```
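For context, the log shows bitsandbytes loading the `_nocublaslt` fallback binary for a GPU with compute capability 7.0. A minimal sketch of that selection logic (the helper name and the 7.5 threshold are assumptions inferred from the log, not bitsandbytes' actual code):

```python
def pick_bnb_binary(cuda_version: str, cc_major: int, cc_minor: int) -> str:
    """Sketch: choose the bitsandbytes shared library name from the
    detected CUDA version and GPU compute capability.

    Assumption: the cublasLt int8 kernels require compute capability
    >= 7.5, so older GPUs (e.g. V100 at 7.0) get the no-cublasLt build.
    """
    suffix = "" if (cc_major, cc_minor) >= (7, 5) else "_nocublaslt"
    return f"libbitsandbytes_cuda{cuda_version}{suffix}.so"

# Reproduces the binary name seen in the log above (CUDA 110, CC 7.0):
print(pick_bnb_binary("110", 7, 0))
```

This would explain why the assertion fires inside `igemmlt`: the no-cublasLt path apparently has no implementation for this matmul configuration and aborts.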
Sample script to reproduce:
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained('google/t5-v1_1-large')
model_8bit = AutoModelForSeq2SeqLM.from_pretrained(
    'google/t5-v1_1-large', device_map="auto", load_in_8bit=True
)

sentences = ['hello world']
inputs = tokenizer(sentences, return_tensors="pt", padding=True)
output_sequences = model_8bit.generate(
    input_ids=inputs["input_ids"],
    max_new_tokens=256,
)
print(tokenizer.batch_decode(output_sequences, skip_special_tokens=True))
```
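If the crash is indeed tied to the compute capability 7.0 reported in the log, one defensive sketch is to gate 8-bit loading on the device capability before calling `from_pretrained`. The helper below is hypothetical, and the `(7, 5)` threshold is an assumption drawn from the no-cublasLt binary being loaded:

```python
def choose_load_kwargs(cc_major: int, cc_minor: int, min_cc=(7, 5)) -> dict:
    """Sketch: pick from_pretrained kwargs based on GPU compute capability.

    Assumption: 8-bit inference is only attempted when the cublasLt int8
    matmul path is expected to exist (CC >= 7.5); otherwise fall back to
    fp16 to avoid the igemmlt assertion seen above.
    """
    if (cc_major, cc_minor) >= min_cc:
        return {"load_in_8bit": True, "device_map": "auto"}
    # Older GPUs such as V100 (CC 7.0): load in fp16 instead.
    return {"torch_dtype": "float16", "device_map": "auto"}

# On the machine from the log (CC 7.0) this would skip 8-bit loading:
print(choose_load_kwargs(7, 0))
```

In a real script the capability would come from `torch.cuda.get_device_capability()`, and the kwargs would be splatted into `from_pretrained(model_id, **kwargs)`.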