Support 32dim #125
Conversation
Thank you very much! This update looks really neat and powerful, since it makes it possible to support more models. 🔥 🔥
@qwopqwop200 Hi, this is just a question: is it possible to implement a dynamic strategy to configure triton warmup based on the model type and the model's attributes? |
Can you give me an example? I didn't understand. |
For example: if one uses GPT2-large, use one group of configs to warm up triton; and when using GPT2-XL, use another group of configs to warm up triton. If this can be implemented, I think maybe we could also predefine a set of config groups based on GPU types and architectures? |
I think it's probably possible. |
Maybe a dynamic strategy needs to be implemented with a wrapper or context_manager. Anyway, I will look into it sometime when I have enough time. Thanks again for this PR! ❤️ |
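To make the idea above concrete, here is a minimal sketch of what per-model warmup selection could look like. Everything in it (WARMUP_CONFIG_GROUPS, select_warmup_shapes, warmup_kernel, and the shapes themselves) is hypothetical and only illustrates the approach, not AutoGPTQ's actual warmup code:

# Hypothetical sketch (not AutoGPTQ code): pick a group of Triton warmup
# shapes based on the model type, falling back to a default group.
WARMUP_CONFIG_GROUPS = {
    # model type -> list of (batch, in_features, out_features) shapes to warm up
    "gpt2-large": [(1, 1280, 1280), (8, 1280, 1280)],
    "gpt2-xl": [(1, 1600, 1600), (8, 1600, 1600)],
}
DEFAULT_WARMUP_SHAPES = [(1, 4096, 4096), (8, 4096, 4096)]

def select_warmup_shapes(model_type: str):
    """Return the warmup shapes configured for this model type."""
    return WARMUP_CONFIG_GROUPS.get(model_type, DEFAULT_WARMUP_SHAPES)

# Example usage (warmup_kernel stands in for the real Triton warmup call):
# for batch, in_features, out_features in select_warmup_shapes("gpt2-xl"):
#     warmup_kernel(batch, in_features, out_features)

A similar table keyed by GPU architecture as well as model type could cover the GPU-specific config groups mentioned above.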
Awesome work! I'd love to try this but am currently getting an error when using a basic test script, I think related to the HF hub download stuff:

Script:

from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

quantized_model_dir = "/workspace/models/TheBloke_falcon-40b-instruct-GPTQ"
use_triton = True

tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(quantized_model_dir,
    use_safetensors=True,
    trust_remote_code=True,
    device="cuda:0",
    use_triton=use_triton,
    quantize_config=None)

# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

prompt = "Tell me about AI"
prompt_template = f'''### Human: {prompt}
### Assistant:'''

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)
print(pipe(prompt_template)[0]['generated_text'])

print("\n\n*** Generate:")
input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

Errors:
I saw that the PR is relative to |
This seems to be because some changes in the main branch have not been merged into the peft branch. |
Changed the code to support triton when the dimension is divisible by 32. This enables the use of triton with Falcon and GPT2-XL.
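As a rough illustration of the relaxed constraint described above (not the PR's actual code; the function name and parameters are assumptions), the eligibility check could look like this:

# Hypothetical sketch of the "divisible by 32" eligibility check; names are
# illustrative, not AutoGPTQ's actual API.
def can_use_triton(infeatures: int, outfeatures: int, group_size: int) -> bool:
    # The Triton kernels in this sketch tile in multiples of 32, so the
    # quantized layer's dimensions must be divisible by 32 to take that path.
    return (
        infeatures % 32 == 0
        and outfeatures % 32 == 0
        and (group_size == -1 or group_size % 32 == 0)
    )

# Falcon-40B's hidden size (8192) and GPT2-XL's (1600) are both divisible
# by 32, so layers of those shapes would qualify for the triton path.
assert can_use_triton(8192, 8192, 128)
assert can_use_triton(1600, 1600, -1)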