
Support sharded quantized model files in from_quantized #319

Closed
shakealeg opened this issue Sep 3, 2023 · 6 comments · Fixed by #425
Labels: enhancement, help wanted

Comments

@shakealeg

shakealeg commented Sep 3, 2023

I've been using 0cc4m's GPTQ for a while and it's been smooth sailing, no errors. But when I try AutoGPTQ, it just doesn't work. I'm trying to load Pygmalion 7b 4bit 32g from TehVenom, and I get the following error:

Traceback (most recent call last):
  File "/home/XXX/Documents/Projects/Sapphire/main.py", line 63, in <module>
    main()
  File "/home/XXX/Documents/Projects/Sapphire/main.py", line 30, in main
    model = AutoGPTQForCausalLM.from_quantized(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/XXX/Documents/Projects/Sapphire/venv/lib/python3.11/site-packages/auto_gptq/modeling/auto.py", line 108, in from_quantized
    return quant_func(
           ^^^^^^^^^^^
  File "/home/XXX/Documents/Projects/Sapphire/venv/lib/python3.11/site-packages/auto_gptq/modeling/_base.py", line 791, in from_quantized
    raise FileNotFoundError(f"Could not find model in {model_name_or_path}")
FileNotFoundError: Could not find model in models/pygmalion-7b-4bit-32g

My file structure looks like this:

Project:
    models/
        pygmalion-7b-4bit-32g/:
            4bit-32g.safetensors
            config.json
            generation_config.json
            special_tokens_map.json
            tokenizer_config.json
            tokenizer.json
            tokenizer.model
    venv/
    main.py

My code is the following:

from transformers import (
    AutoTokenizer,
    pipeline,
    logging
)
from auto_gptq import (
    AutoGPTQForCausalLM,
    BaseQuantizeConfig
)

USERNAME = "Frantic"
VERSION = "1.0.0"
MODEL_DIR = "models/pygmalion-7b-4bit-32g"
MODEL_BASENAME = "Pygmalion-7b-4bit-GPTQ-Safetensors"
USE_TRITON = True

def main():
    print(f"Sapphire | Version {VERSION}")

    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR, use_fast=True)
    quantize_config = BaseQuantizeConfig(
        bits=4,
        group_size=32,
        desc_act=False
    )

    model = AutoGPTQForCausalLM.from_quantized(
        MODEL_DIR,
        use_safetensors=True,
        model_basename=MODEL_BASENAME,
        device="cuda:0",
        use_triton=USE_TRITON,
        quantize_config=quantize_config
    )

    logging.set_verbosity(logging.CRITICAL)

    prompt = "Hello, how are you today?"
    prompt_template = f"""### {USERNAME}: {prompt}
### Assistant:"""

    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_new_tokens=512,
        temperature=0.9,
        top_p=0.9,
        repetition_penalty=1.15
    )

    print(pipe(prompt_template)[0]["generated_text"])

    input_ids = tokenizer(prompt_template, return_tensors="pt").input_ids.cuda()
    output = model.generate(inputs=input_ids, temperature=0.9, max_new_tokens=512)

    print(tokenizer.decode(output[0]))

if __name__ == "__main__":
    main()

Overall, I'm confused about what I need to do to fix this and about how AutoGPTQ expects models to be laid out. If anyone could help, that would be appreciated.

@thunderamental

Try changing your MODEL_BASENAME to 4bit-32g (the weights file name without the .safetensors extension)? Sometimes these paths are fickle.
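
For example, applying that change to the code from the original post (this is just the suggested edit, not a verified fix):

MODEL_BASENAME = "4bit-32g"  # must match 4bit-32g.safetensors on disk, minus the extension

model = AutoGPTQForCausalLM.from_quantized(
    MODEL_DIR,
    use_safetensors=True,
    model_basename=MODEL_BASENAME,
    device="cuda:0",
    use_triton=USE_TRITON,
    quantize_config=quantize_config
)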

@CrazyBrick


I met the same problem when running inference with Qwen-VL-Chat-Int4 via AutoGPTQForCausalLM.from_quantized:

  File "/root/miniconda3/lib/python3.8/site-packages/auto_gptq/modeling/_base.py", line 802, in from_quantized
    raise FileNotFoundError(f"Could not find model in {model_name_or_path}")
FileNotFoundError: Could not find model in ../Qwen-VL-Chat-Int4

I checked the quantize_config.json file:

"model_name_or_path": "model"
"model_file_base_name": "model"

and the quantized model files are named model-00001-of-00005.safetensors, model-00002-of-00005.safetensors, ....
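
For illustration, a minimal check of what seems to go wrong here, assuming the loader builds a single file name from model_file_base_name (the path is the one from the traceback above):

import os

model_dir = "../Qwen-VL-Chat-Int4"
# The loader looks for one file named after the basename in quantize_config.json.
expected = os.path.join(model_dir, "model.safetensors")
print(os.path.isfile(expected))  # False: only model-00001-of-00005.safetensors ... model-00005-of-00005.safetensors exist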

How should I solve the problem?

@fxmarty
Collaborator

fxmarty commented Oct 27, 2023

Hi @shakealeg, model loading was improved in #383. Please pass the argument model_basename for custom model names; otherwise we greedily search quantize_config.model_file_base_name and then models with the basename f"gptq_model-{quantize_config.bits}bit-{quantize_config.group_size}g" or "model".

use_safetensors used to default to False; the above PR changed it to default to True.
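
In other words, roughly this lookup order (a sketch of the search described above, not the actual AutoGPTQ code):

def candidate_basenames(model_basename, quantize_config):
    # An explicit model_basename argument wins.
    if model_basename is not None:
        return [model_basename]
    # Then the basename recorded in quantize_config.json.
    if quantize_config.model_file_base_name:
        return [quantize_config.model_file_base_name]
    # Finally the default basenames.
    return [
        f"gptq_model-{quantize_config.bits}bit-{quantize_config.group_size}g",
        "model",
    ]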

@CrazyBrick There is a WIP PR to support sharded models in AutoGPTQ: #364

@fxmarty fxmarty changed the title Confused on why this isn't working, confused about quantization. Support sharded quantized model files in from_quantized Oct 27, 2023
@fxmarty fxmarty added enhancement New feature or request help wanted Extra attention is needed labels Oct 27, 2023
@CrazyBrick

CrazyBrick commented Oct 27, 2023

Hi @shakealeg, the model loading was improved in #383. Please pass the argument model_basename for custom model names, otherwise we greedily search quantize_config.model_file_base_name and models with basename f"gptq_model-{quantize_config.bits}bit-{quantize_config.group_size}g" or "model".

use_safetensors used to be False by default, which was turned to True by default in the above PR.

@CrazyBrick There is a WIP PR to support sharded models in AutoGPTQ: #364

Hi @fxmarty, thank you for your reply. I changed the code according to PR #364 (mainly in ``), but it doesn't work.
I printed some variables while debugging:

model_name_or_path:Qwen/Qwen-VL-Chat-Int4
isdir(model_name_or_path):True
model_save_name:Qwen/Qwen-VL-Chat-Int4/model

But what I actually have is Qwen/Qwen-VL-Chat-Int4/model-00001-of-00005.safetensors through Qwen/Qwen-VL-Chat-Int4/model-00005-of-00005.safetensors (00001~00005).

It won't find the whole model from a single shard name. When I change "model_file_base_name": "model" to "model_file_base_name": "model-00001-of-00005", I get:

NotImplementedError: Cannot copy out of meta tensor; no data!

The following is the code I am using, with the model from [Qwen-VL-Chat-Int4](https://huggingface.co/Qwen/Qwen-VL-Chat-Int4/tree/main):

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

quantized_model_dir = "Qwen/Qwen-VL-Chat-Int4"
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, trust_remote_code=True)

# use cuda device
model = AutoGPTQForCausalLM.from_quantized(
    quantized_model_dir,
    device_map="auto",
    use_safetensors=True,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).eval()
print(model.hf_device_map)

what should I do?

@fxmarty
Collaborator

fxmarty commented Oct 27, 2023

Hi @CrazyBrick, the model you are trying to use has sharded checkpoints. This is unfortunately not currently supported in AutoGPTQ; there is a WIP PR open for it: #364
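
A possible stop-gap while that PR is pending (not suggested in this thread, and assuming enough CPU RAM to hold the merged weights) is to merge the shards into the single model.safetensors file that the basename search described above can find:

import glob
from safetensors.torch import load_file, save_file

tensors = {}
for shard in sorted(glob.glob("Qwen/Qwen-VL-Chat-Int4/model-*-of-*.safetensors")):
    tensors.update(load_file(shard))  # each shard holds a disjoint set of tensors

save_file(tensors, "Qwen/Qwen-VL-Chat-Int4/model.safetensors")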

@xiayq1

xiayq1 commented Apr 25, 2024


@CrazyBrick did you ever solve this? I've run into the same situation...
