Issue with new wheels, people can't install 0.2.0 with CUDA 11.8 #124

Closed

TheBloke opened this issue Jun 2, 2023 · 34 comments
Labels
bug Something isn't working

Comments

@TheBloke
Contributor

TheBloke commented Jun 2, 2023

Awesome work on the 0.2.0 release and the wheels, PanQiWei! Thousands of new people are trying AutoGPTQ today and that is amazing.

Got an issue that's affecting some of them:

Describe the bug
People trying to run pip install auto-gptq or pip install auto-gptq==0.2.0 are getting the following errors:

Requested auto-gptq==0.2.0 from https://files.pythonhosted.org/packages/b1/f9/97153ae5cf926f96fd37e61424a1bb58e0c9991cc220b2e17390fb8bde97/auto_gptq-0.2.0.tar.gz has inconsistent version: expected '0.2.0', but metadata has '0.2.0+cu1180'
ERROR: Could not find a version that satisfies the requirement auto-gptq==0.2.0 (from versions: 0.0.4, 0.0.5, 0.1.0, 0.2.0)

Full log:

Found existing installation: auto-gptq 0.1.0
Uninstalling auto-gptq-0.1.0:
  Successfully uninstalled auto-gptq-0.1.0
Using pip 23.1.2 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting auto-gptq==0.2.0
  Using cached auto_gptq-0.2.0.tar.gz (47 kB)
  Running command python setup.py egg_info
  running egg_info
  creating /tmp/pip-pip-egg-info-d0sklosj/auto_gptq.egg-info
  writing /tmp/pip-pip-egg-info-d0sklosj/auto_gptq.egg-info/PKG-INFO
  writing dependency_links to /tmp/pip-pip-egg-info-d0sklosj/auto_gptq.egg-info/dependency_links.txt
  writing requirements to /tmp/pip-pip-egg-info-d0sklosj/auto_gptq.egg-info/requires.txt
  writing top-level names to /tmp/pip-pip-egg-info-d0sklosj/auto_gptq.egg-info/top_level.txt
  writing manifest file '/tmp/pip-pip-egg-info-d0sklosj/auto_gptq.egg-info/SOURCES.txt'
  /usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
    warnings.warn(msg.format('we could not find ninja.'))
  reading manifest file '/tmp/pip-pip-egg-info-d0sklosj/auto_gptq.egg-info/SOURCES.txt'
  adding license file 'LICENSE'
  writing manifest file '/tmp/pip-pip-egg-info-d0sklosj/auto_gptq.egg-info/SOURCES.txt'
  Preparing metadata (setup.py) ... done
Discarding https://files.pythonhosted.org/packages/b1/f9/97153ae5cf926f96fd37e61424a1bb58e0c9991cc220b2e17390fb8bde97/auto_gptq-0.2.0.tar.gz (from https://pypi.org/simple/auto-gptq/) (requires-python:>=3.8.0): Requested auto-gptq==0.2.0 from https://files.pythonhosted.org/packages/b1/f9/97153ae5cf926f96fd37e61424a1bb58e0c9991cc220b2e17390fb8bde97/auto_gptq-0.2.0.tar.gz has inconsistent version: expected '0.2.0', but metadata has '0.2.0+cu1180'
ERROR: Could not find a version that satisfies the requirement auto-gptq==0.2.0 (from versions: 0.0.4, 0.0.5, 0.1.0, 0.2.0)
ERROR: No matching distribution found for auto-gptq==0.2.0

Software version
Example of one user with the problem:

  • Ubuntu 22.04
  • nvidia/cuda:11.8.0-devel-ubuntu22.04 container, which includes CUDA 11.8.0:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

To Reproduce

pip install auto-gptq

Expected behavior
Installs auto-gptq 0.2.0+cu118

@TheBloke TheBloke added the bug Something isn't working label Jun 2, 2023
@TheBloke TheBloke changed the title Issue with new wheels, people can't install 0.2.0 Issue with new wheels, people can't install 0.2.0 with CUDA 11.8 Jun 2, 2023
@kumpulak

kumpulak commented Jun 2, 2023

Also, downloading the prebuilt wheels does not seem to work, at least with the syntax mentioned in the README:

> [3/5] RUN pip install auto_gptq-0.2.0+cu118-cp310-cp310-linux_x86_64.whl:
#0 0.448 WARNING: Requirement 'auto_gptq-0.2.0+cu118-cp310-cp310-linux_x86_64.whl' looks like a filename, but the file does not exist
#0 0.462 Processing /auto_gptq-0.2.0+cu118-cp310-cp310-linux_x86_64.whl
#0 0.472 ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: '/auto_gptq-0.2.0+cu118-cp310-cp310-linux_x86_64.whl'

It seems to expect an already downloaded package.

@PanQiWei
Collaborator

PanQiWei commented Jun 2, 2023

@TheBloke Hi, I can install successfully using pip install auto-gptq on both my local computer and a cloud server, but I could also reproduce your problem by setting the environment variable CUDA_VERSION=11.8 before the pip command. So if you have also set that environment variable, you can just remove it.

@PanQiWei
Collaborator

PanQiWei commented Jun 2, 2023

Also, downloading the prebuilt wheels does not seem to work, at least with the syntax mentioned in the README:

> [3/5] RUN pip install auto_gptq-0.2.0+cu118-cp310-cp310-linux_x86_64.whl:
#0 0.448 WARNING: Requirement 'auto_gptq-0.2.0+cu118-cp310-cp310-linux_x86_64.whl' looks like a filename, but the file does not exist
#0 0.462 Processing /auto_gptq-0.2.0+cu118-cp310-cp310-linux_x86_64.whl
#0 0.472 ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: '/auto_gptq-0.2.0+cu118-cp310-cp310-linux_x86_64.whl'

It seems to expect an already downloaded package.

You need to execute the command in the directory where the wheel was downloaded and saved.

@crazycoderF12

That user is me, by the way. I was using it in Colab. I read in @TheBloke's model card that for CUDA versions above 12.0 you should compile from the source code. I did, but the version it installed was 0.1.0. When I ran Tom's Falcon GPTQ it showed a "RefinedWeb model not supported" error. I made an issue on Tom's repo and he told me to install from pip. I did, but even though the pip project shows version 0.2.0, it only installs version 0.1.0. After some conversation with Tom, I compiled from the source code. It successfully compiled and installed version 0.2.0+cu1180.

But how did it compile for cu118 when my Colab CUDA version is 12?

@TheBloke
Contributor Author

TheBloke commented Jun 2, 2023

@TheBloke Hi, I can install successfully using pip install auto-gptq on both my local computer and a cloud server, but I could also reproduce your problem by setting the environment variable CUDA_VERSION=11.8 before the pip command. So if you have also set that environment variable, you can just remove it.

It's not a problem for me personally. But I have had several support requests about it this morning, from people trying to use AutoGPTQ from Google Colab and Docker containers - e.g. @kumpulak is using Docker and @TheFaheem is using Google Colab.

I can inform users to unset CUDA_VERSION, but is it possible to fix whatever is causing this issue so that isn't necessary going forward? Otherwise I expect it's going to generate a lot of support requests; I've already had four messages about it this morning.
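
In the meantime, a workaround sketch for notebook users (Colab etc.), assuming the culprit is just the inherited CUDA_VERSION variable - this is only an illustration, not an official fix:

# Drop CUDA_VERSION from the environment before invoking pip, so the sdist
# build doesn't pick it up and produce a '+cu1180'-style local version.
import os
import subprocess
import sys

os.environ.pop("CUDA_VERSION", None)
subprocess.check_call([sys.executable, "-m", "pip", "install", "auto-gptq==0.2.0"])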

@TheBloke
Contributor Author

TheBloke commented Jun 2, 2023

@TheFaheem you don't have CUDA toolkit 12.0 installed, otherwise it wouldn't work. You have 11.8. You can see your CUDA toolkit version by running:

nvcc --version

eg:

[pytorch2] tomj@a10:/workspace $ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

@crazycoderF12

@TheFaheem you don't have CUDA toolkit 12.0 installed, otherwise it wouldn't work. You have 11.8. You can see your CUDA toolkit version by running:

nvcc --version

eg:

[pytorch2] tomj@a10:/workspace $ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

But when running !nvidia-smi, it shows CUDA Version: 12.0.

@TheBloke
Contributor Author

TheBloke commented Jun 2, 2023

Yes, that is the version supported by your GPU driver. But you have CUDA toolkit 11.8 installed and that is fine. It is the same for me:

[pytorch2] tomj@a10:/workspace $ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
[pytorch2] tomj@a10:/workspace $ nvidia-smi
Fri Jun  2 10:28:46 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10          On   | 00000000:06:00.0 Off |                    0 |
|  0%   32C    P8    16W / 150W |      0MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

@PanQiWei
Collaborator

PanQiWei commented Jun 2, 2023

@TheBloke Hi, I can install successfully using pip install auto-gptq on both my local computer and a cloud server, but I could also reproduce your problem by setting the environment variable CUDA_VERSION=11.8 before the pip command. So if you have also set that environment variable, you can just remove it.

It's not a problem for me personally. But I have had several support requests about it this morning, from people trying to use AutoGPTQ from Google Colab and Docker containers - e.g. @kumpulak is using Docker and @TheFaheem is using Google Colab.

I can inform users to unset CUDA_VERSION, but is it possible to fix whatever is causing this issue so that isn't necessary going forward? Otherwise I expect it's going to generate a lot of support requests; I've already had four messages about it this morning.

I have just fixed the problem where users have CUDA_VERSION set when installing auto-gptq; I will release a patch fix later.
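
For reference, a minimal sketch of the kind of version-string handling that avoids the '0.2.0+cu1180' metadata mismatch - this is not AutoGPTQ's actual setup.py, and the BUILD_CUDA_EXT_WHEEL flag is purely hypothetical:

# Hypothetical setup.py version guard, for illustration only.
import os
import torch

BASE_VERSION = "0.2.0"

def local_cuda_tag():
    # Derive a PEP 440 local tag like '+cu118' from the CUDA version torch was
    # built with, instead of a raw CUDA_VERSION env var such as '11.8.0' that
    # would yield '+cu1180' and fail pip's consistency check against '0.2.0'.
    cuda = torch.version.cuda  # e.g. "11.8"; None for CPU-only torch builds
    if cuda is None:
        return ""
    major, minor = cuda.split(".")[:2]
    return f"+cu{major}{minor}"

# Only append the CUDA tag when explicitly building release wheels, so that a
# plain `pip install auto-gptq==0.2.0` from the sdist keeps the bare version.
if os.environ.get("BUILD_CUDA_EXT_WHEEL"):  # hypothetical flag, not a real option
    version = BASE_VERSION + local_cuda_tag()
else:
    version = BASE_VERSION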

@TheBloke
Contributor Author

TheBloke commented Jun 2, 2023

I have just fixed the problem where users have CUDA_VERSION set when installing auto-gptq; I will release a patch fix later.

Thank you so much!

@crazycoderF12

crazycoderF12 commented Jun 2, 2023

@PanQiWei Can you please explain what these are?

WARNING:accelerate.utils.modeling:The safetensors archive passed at /root/.cache/huggingface/hub/models--TheBloke--WizardLM-Uncensored-Falcon-7B-GPTQ/snapshots/83fd597a3332323e06fe883f680829a498d9fa9f/gptq_model-4bit-64g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
WARNING:auto_gptq.modeling._base:can't get model's sequence length from model config, will set to 4096.
WARNING:auto_gptq.modeling._base:RWGPTQForCausalLM hasn't fused attention module yet, will skip inject fused attention.
WARNING:auto_gptq.modeling._base:RWGPTQForCausalLM hasn't fused mlp module yet, will skip inject fused mlp.

And is there any parameter or option to stream the output? Did you implement any generator function?

@crazycoderF12

Yes, that is the version supported by your GPU driver. But you have CUDA toolkit 11.8 installed and that is fine. It is the same for me:

[pytorch2] tomj@a10:/workspace $ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
[pytorch2] tomj@a10:/workspace $ nvidia-smi
Fri Jun  2 10:28:46 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10          On   | 00000000:06:00.0 Off |                    0 |
|  0%   32C    P8    16W / 150W |      0MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Ah... thanks! I thought it was showing the installed CUDA version.

@TheBloke
Contributor Author

TheBloke commented Jun 2, 2023

@PanQiWei Can you please explain what these are?
WARNING:accelerate.utils.modeling:The safetensors archive passed at /root/.cache/huggingface/hub/models--TheBloke--WizardLM-Uncensored-Falcon-7B-GPTQ/snapshots/83fd597a3332323e06fe883f680829a498d9fa9f/gptq_model-4bit-64g.safetensors does not contain metadata. Make sure to save your model with the save_pretrained method. Defaulting to 'pt' metadata.
WARNING:auto_gptq.modeling._base:can't get model's sequence length from model config, will set to 4096.
WARNING:auto_gptq.modeling._base:RWGPTQForCausalLM hasn't fused attention module yet, will skip inject fused attention.
WARNING:auto_gptq.modeling._base:RWGPTQForCausalLM hasn't fused mlp module yet, will skip inject fused mlp.

Ah yeah, I was going to raise another issue about this @PanQiWei

People are quite confused by all the WARNINGs that get printed, which are actually just informational.

I think it would be a good idea to print these messages as logger.info instead. And to hide the accelerate message WARNING:accelerate.utils.modeling:

People think something is wrong, when it's actually all fine.

In the case of these messages:

WARNING:auto_gptq.modeling._base:RWGPTQForCausalLM hasn't fused attention module yet, will skip inject fused attention.
WARNING:auto_gptq.modeling._base:RWGPTQForCausalLM hasn't fused mlp module yet, will skip inject fused mlp.

Perhaps these messages should only be printed if the user actually passed inject_fused_attention=True or inject_fused_mlp=True, but not when they get set by default?

@PanQiWei
Collaborator

PanQiWei commented Jun 2, 2023

You are right. I should improve the warnings, set some arguments' default values to None, and reset them to proper values internally if users don't manually specify them.
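
A rough sketch of that pattern - not the actual AutoGPTQ code; the function and argument names here are only illustrative:

import logging

logger = logging.getLogger(__name__)

def load_model(model_name, inject_fused_attention=None):
    # Default to None so we can tell "user explicitly asked for it" apart from
    # "value chosen automatically", and pick the log level accordingly.
    user_requested = inject_fused_attention is not None
    if inject_fused_attention is None:
        inject_fused_attention = True  # internal default

    model_supports_fusion = False  # e.g. RW/Falcon has no fused attention module yet
    if inject_fused_attention and not model_supports_fusion:
        msg = "model has no fused attention module yet, skipping fused attention injection"
        if user_requested:
            logger.warning(msg)  # the user explicitly requested it, so warn
        else:
            logger.info(msg)     # default behaviour, purely informational
        inject_fused_attention = False

    return model_name, inject_fused_attention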

@TheBloke
Contributor Author

TheBloke commented Jun 2, 2023

And is there any parameter or option to stream the output? Did you implement any generator function?

There is no streaming code in AutoGPTQ at the moment, I think.

But you could use third party software like text-generation-webui. That can provide an API, and that API has a streaming option. See example API script here: https://github.com/oobabooga/text-generation-webui/blob/main/api-example-stream.py

@PanQiWei
Collaborator

PanQiWei commented Jun 2, 2023

And is there any parameter or option to stream the output? Did you implement any generator function?

@TheFaheem auto-gptq is compatible with HF Transformers' TextGenerationPipeline, so its streamer should also work with auto-gptq's models, but I haven't tried it yet.
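
For anyone who wants to try, an untested sketch of what that might look like, using HF's TextStreamer passed through to generate; the model name and loading arguments are placeholders, not something verified in this thread:

from transformers import AutoTokenizer, TextStreamer
from auto_gptq import AutoGPTQForCausalLM

model_dir = "TheBloke/WizardLM-7B-uncensored-GPTQ"  # placeholder model repo
tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_dir,
    device="cuda:0",
    use_safetensors=True,
)  # some repos may also need model_basename= pointing at their weight file name

# TextStreamer prints tokens to stdout as soon as they are generated.
streamer = TextStreamer(tokenizer, skip_prompt=True)

input_ids = tokenizer("Tell me about GPTQ quantization.", return_tensors="pt").input_ids.to("cuda:0")
model.generate(input_ids=input_ids, streamer=streamer, max_new_tokens=128)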

@kumpulak

kumpulak commented Jun 2, 2023

I can confirm unsetting the CUDA_VERSION works. My Dockerfile installs it like this now:

RUN unset CUDA_VERSION && pip install auto-gptq==0.2.0

The contents of the CUDA_VERSION variable were CUDA_VERSION=11.8.0, and it's set by default in the nvidia/cuda:11.8.0-devel-ubuntu22.04 container.

@crazycoderF12

crazycoderF12 commented Jun 2, 2023

@PanQiWei @TheBloke When I tried to compile AutoGPTQ in a Kaggle notebook I got the following:

Cloning into 'AutoGPTQ'...
remote: Enumerating objects: 2114, done.
remote: Counting objects: 100% (445/445), done.
remote: Compressing objects: 100% (240/240), done.
remote: Total 2114 (delta 278), reused 244 (delta 194), pack-reused 1669
Receiving objects: 100% (2114/2114), 7.41 MiB | 17.94 MiB/s, done.
Resolving deltas: 100% (1408/1408), done.
/kaggle/working/AutoGPTQ
Processing /kaggle/working/AutoGPTQ
  Preparing metadata (setup.py) ... done
Collecting accelerate>=0.19.0 (from auto-gptq==0.2.0)
  Downloading accelerate-0.19.0-py3-none-any.whl (219 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 219.1/219.1 kB 5.3 MB/s eta 0:00:0000:01
Requirement already satisfied: datasets in /opt/conda/lib/python3.10/site-packages (from auto-gptq==0.2.0) (2.1.0)
Requirement already satisfied: numpy in /opt/conda/lib/python3.10/site-packages (from auto-gptq==0.2.0) (1.23.5)
Collecting rouge (from auto-gptq==0.2.0)
  Downloading rouge-1.0.1-py3-none-any.whl (13 kB)
Requirement already satisfied: torch>=1.13.0 in /opt/conda/lib/python3.10/site-packages (from auto-gptq==0.2.0) (2.0.0)
Requirement already satisfied: safetensors in /opt/conda/lib/python3.10/site-packages (from auto-gptq==0.2.0) (0.3.1)
Requirement already satisfied: transformers>=4.26.1 in /opt/conda/lib/python3.10/site-packages (from auto-gptq==0.2.0) (4.30.0.dev0)
Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.10/site-packages (from accelerate>=0.19.0->auto-gptq==0.2.0) (21.3)
Requirement already satisfied: psutil in /opt/conda/lib/python3.10/site-packages (from accelerate>=0.19.0->auto-gptq==0.2.0) (5.9.3)
Requirement already satisfied: pyyaml in /opt/conda/lib/python3.10/site-packages (from accelerate>=0.19.0->auto-gptq==0.2.0) (5.4.1)
Requirement already satisfied: filelock in /opt/conda/lib/python3.10/site-packages (from torch>=1.13.0->auto-gptq==0.2.0) (3.12.0)
Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.10/site-packages (from torch>=1.13.0->auto-gptq==0.2.0) (4.5.0)
Requirement already satisfied: sympy in /opt/conda/lib/python3.10/site-packages (from torch>=1.13.0->auto-gptq==0.2.0) (1.12)
Requirement already satisfied: networkx in /opt/conda/lib/python3.10/site-packages (from torch>=1.13.0->auto-gptq==0.2.0) (3.1)
Requirement already satisfied: jinja2 in /opt/conda/lib/python3.10/site-packages (from torch>=1.13.0->auto-gptq==0.2.0) (3.1.2)
Requirement already satisfied: huggingface-hub<1.0,>=0.14.1 in /opt/conda/lib/python3.10/site-packages (from transformers>=4.26.1->auto-gptq==0.2.0) (0.14.1)
Requirement already satisfied: regex!=2019.12.17 in /opt/conda/lib/python3.10/site-packages (from transformers>=4.26.1->auto-gptq==0.2.0) (2023.5.5)
Requirement already satisfied: requests in /opt/conda/lib/python3.10/site-packages (from transformers>=4.26.1->auto-gptq==0.2.0) (2.28.2)
Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in /opt/conda/lib/python3.10/site-packages (from transformers>=4.26.1->auto-gptq==0.2.0) (0.13.3)
Requirement already satisfied: tqdm>=4.27 in /opt/conda/lib/python3.10/site-packages (from transformers>=4.26.1->auto-gptq==0.2.0) (4.64.1)
Requirement already satisfied: pyarrow>=5.0.0 in /opt/conda/lib/python3.10/site-packages (from datasets->auto-gptq==0.2.0) (10.0.1)
Requirement already satisfied: dill in /opt/conda/lib/python3.10/site-packages (from datasets->auto-gptq==0.2.0) (0.3.6)
Requirement already satisfied: pandas in /opt/conda/lib/python3.10/site-packages (from datasets->auto-gptq==0.2.0) (1.5.3)
Requirement already satisfied: xxhash in /opt/conda/lib/python3.10/site-packages (from datasets->auto-gptq==0.2.0) (3.2.0)
Requirement already satisfied: multiprocess in /opt/conda/lib/python3.10/site-packages (from datasets->auto-gptq==0.2.0) (0.70.14)
Requirement already satisfied: fsspec[http]>=2021.05.0 in /opt/conda/lib/python3.10/site-packages (from datasets->auto-gptq==0.2.0) (2023.5.0)
Requirement already satisfied: aiohttp in /opt/conda/lib/python3.10/site-packages (from datasets->auto-gptq==0.2.0) (3.8.4)
Requirement already satisfied: responses<0.19 in /opt/conda/lib/python3.10/site-packages (from datasets->auto-gptq==0.2.0) (0.18.0)
Requirement already satisfied: six in /opt/conda/lib/python3.10/site-packages (from rouge->auto-gptq==0.2.0) (1.16.0)
Requirement already satisfied: attrs>=17.3.0 in /opt/conda/lib/python3.10/site-packages (from aiohttp->datasets->auto-gptq==0.2.0) (23.1.0)
Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /opt/conda/lib/python3.10/site-packages (from aiohttp->datasets->auto-gptq==0.2.0) (2.1.1)
Requirement already satisfied: multidict<7.0,>=4.5 in /opt/conda/lib/python3.10/site-packages (from aiohttp->datasets->auto-gptq==0.2.0) (6.0.4)
Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /opt/conda/lib/python3.10/site-packages (from aiohttp->datasets->auto-gptq==0.2.0) (4.0.2)
Requirement already satisfied: yarl<2.0,>=1.0 in /opt/conda/lib/python3.10/site-packages (from aiohttp->datasets->auto-gptq==0.2.0) (1.9.1)
Requirement already satisfied: frozenlist>=1.1.1 in /opt/conda/lib/python3.10/site-packages (from aiohttp->datasets->auto-gptq==0.2.0) (1.3.3)
Requirement already satisfied: aiosignal>=1.1.2 in /opt/conda/lib/python3.10/site-packages (from aiohttp->datasets->auto-gptq==0.2.0) (1.3.1)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /opt/conda/lib/python3.10/site-packages (from packaging>=20.0->accelerate>=0.19.0->auto-gptq==0.2.0) (3.0.9)
Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.10/site-packages (from requests->transformers>=4.26.1->auto-gptq==0.2.0) (3.4)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.10/site-packages (from requests->transformers>=4.26.1->auto-gptq==0.2.0) (1.26.15)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.10/site-packages (from requests->transformers>=4.26.1->auto-gptq==0.2.0) (2023.5.7)
Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/lib/python3.10/site-packages (from jinja2->torch>=1.13.0->auto-gptq==0.2.0) (2.1.2)
Requirement already satisfied: python-dateutil>=2.8.1 in /opt/conda/lib/python3.10/site-packages (from pandas->datasets->auto-gptq==0.2.0) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /opt/conda/lib/python3.10/site-packages (from pandas->datasets->auto-gptq==0.2.0) (2023.3)
Requirement already satisfied: mpmath>=0.19 in /opt/conda/lib/python3.10/site-packages (from sympy->torch>=1.13.0->auto-gptq==0.2.0) (1.3.0)
Building wheels for collected packages: auto-gptq
  Building wheel for auto-gptq (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [95 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-3.10
      creating build/lib.linux-x86_64-3.10/auto_gptq
      copying auto_gptq/__init__.py -> build/lib.linux-x86_64-3.10/auto_gptq
      creating build/lib.linux-x86_64-3.10/auto_gptq/modeling
      copying auto_gptq/modeling/gpt_bigcode.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
      copying auto_gptq/modeling/llama.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
      copying auto_gptq/modeling/moss.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
      copying auto_gptq/modeling/gpt_neox.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
      copying auto_gptq/modeling/__init__.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
      copying auto_gptq/modeling/_const.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
      copying auto_gptq/modeling/bloom.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
      copying auto_gptq/modeling/_base.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
      copying auto_gptq/modeling/_utils.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
      copying auto_gptq/modeling/opt.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
      copying auto_gptq/modeling/rw.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
      copying auto_gptq/modeling/gpt2.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
      copying auto_gptq/modeling/codegen.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
      copying auto_gptq/modeling/auto.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
      copying auto_gptq/modeling/gptj.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
      creating build/lib.linux-x86_64-3.10/auto_gptq/eval_tasks
      copying auto_gptq/eval_tasks/sequence_classification_task.py -> build/lib.linux-x86_64-3.10/auto_gptq/eval_tasks
      copying auto_gptq/eval_tasks/text_summarization_task.py -> build/lib.linux-x86_64-3.10/auto_gptq/eval_tasks
      copying auto_gptq/eval_tasks/__init__.py -> build/lib.linux-x86_64-3.10/auto_gptq/eval_tasks
      copying auto_gptq/eval_tasks/_base.py -> build/lib.linux-x86_64-3.10/auto_gptq/eval_tasks
      copying auto_gptq/eval_tasks/language_modeling_task.py -> build/lib.linux-x86_64-3.10/auto_gptq/eval_tasks
      creating build/lib.linux-x86_64-3.10/auto_gptq/quantization
      copying auto_gptq/quantization/__init__.py -> build/lib.linux-x86_64-3.10/auto_gptq/quantization
      copying auto_gptq/quantization/quantizer.py -> build/lib.linux-x86_64-3.10/auto_gptq/quantization
      copying auto_gptq/quantization/gptq.py -> build/lib.linux-x86_64-3.10/auto_gptq/quantization
      creating build/lib.linux-x86_64-3.10/auto_gptq/utils
      copying auto_gptq/utils/__init__.py -> build/lib.linux-x86_64-3.10/auto_gptq/utils
      copying auto_gptq/utils/import_utils.py -> build/lib.linux-x86_64-3.10/auto_gptq/utils
      copying auto_gptq/utils/data_utils.py -> build/lib.linux-x86_64-3.10/auto_gptq/utils
      creating build/lib.linux-x86_64-3.10/auto_gptq/nn_modules
      copying auto_gptq/nn_modules/qlinear_triton.py -> build/lib.linux-x86_64-3.10/auto_gptq/nn_modules
      copying auto_gptq/nn_modules/qlinear.py -> build/lib.linux-x86_64-3.10/auto_gptq/nn_modules
      copying auto_gptq/nn_modules/__init__.py -> build/lib.linux-x86_64-3.10/auto_gptq/nn_modules
      copying auto_gptq/nn_modules/qlinear_old.py -> build/lib.linux-x86_64-3.10/auto_gptq/nn_modules
      copying auto_gptq/nn_modules/fused_llama_mlp.py -> build/lib.linux-x86_64-3.10/auto_gptq/nn_modules
      copying auto_gptq/nn_modules/fused_llama_attn.py -> build/lib.linux-x86_64-3.10/auto_gptq/nn_modules
      copying auto_gptq/nn_modules/fused_gptj_attn.py -> build/lib.linux-x86_64-3.10/auto_gptq/nn_modules
      copying auto_gptq/nn_modules/_fused_base.py -> build/lib.linux-x86_64-3.10/auto_gptq/nn_modules
      creating build/lib.linux-x86_64-3.10/auto_gptq/eval_tasks/_utils
      copying auto_gptq/eval_tasks/_utils/__init__.py -> build/lib.linux-x86_64-3.10/auto_gptq/eval_tasks/_utils
      copying auto_gptq/eval_tasks/_utils/generation_utils.py -> build/lib.linux-x86_64-3.10/auto_gptq/eval_tasks/_utils
      copying auto_gptq/eval_tasks/_utils/classification_utils.py -> build/lib.linux-x86_64-3.10/auto_gptq/eval_tasks/_utils
      creating build/lib.linux-x86_64-3.10/auto_gptq/nn_modules/triton_utils
      copying auto_gptq/nn_modules/triton_utils/mixin.py -> build/lib.linux-x86_64-3.10/auto_gptq/nn_modules/triton_utils
      copying auto_gptq/nn_modules/triton_utils/__init__.py -> build/lib.linux-x86_64-3.10/auto_gptq/nn_modules/triton_utils
      copying auto_gptq/nn_modules/triton_utils/kernels.py -> build/lib.linux-x86_64-3.10/auto_gptq/nn_modules/triton_utils
      copying auto_gptq/nn_modules/triton_utils/custom_autotune.py -> build/lib.linux-x86_64-3.10/auto_gptq/nn_modules/triton_utils
      running build_ext
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/kaggle/working/AutoGPTQ/setup.py", line 91, in <module>
          setup(
        File "/opt/conda/lib/python3.10/site-packages/setuptools/__init__.py", line 153, in setup
          return distutils.core.setup(**attrs)
        File "/opt/conda/lib/python3.10/distutils/core.py", line 148, in setup
          dist.run_commands()
        File "/opt/conda/lib/python3.10/distutils/dist.py", line 966, in run_commands
          self.run_command(cmd)
        File "/opt/conda/lib/python3.10/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/opt/conda/lib/python3.10/site-packages/wheel/bdist_wheel.py", line 343, in run
          self.run_command("build")
        File "/opt/conda/lib/python3.10/distutils/cmd.py", line 313, in run_command
          self.distribution.run_command(command)
        File "/opt/conda/lib/python3.10/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/opt/conda/lib/python3.10/distutils/command/build.py", line 135, in run
          self.run_command(cmd_name)
        File "/opt/conda/lib/python3.10/distutils/cmd.py", line 313, in run_command
          self.distribution.run_command(command)
        File "/opt/conda/lib/python3.10/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/opt/conda/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 79, in run
          _build_ext.run(self)
        File "/opt/conda/lib/python3.10/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
          _build_ext.build_ext.run(self)
        File "/opt/conda/lib/python3.10/distutils/command/build_ext.py", line 340, in run
          self.build_extensions()
        File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 499, in build_extensions
          _check_cuda_version(compiler_name, compiler_version)
        File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 387, in _check_cuda_version
          raise RuntimeError(CUDA_MISMATCH_MESSAGE.format(cuda_str_version, torch.version.cuda))
      RuntimeError:
      The detected CUDA version (12.1) mismatches the version that was used to compile
      PyTorch (11.8). Please make sure to use the same CUDA versions.
      
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for auto-gptq
  Running setup.py clean for auto-gptq
Failed to build auto-gptq
ERROR: Could not build wheels for auto-gptq, which is required to install pyproject.toml-based projects

I used !nvcc --version to check the CUDA version; it shows this:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

How should I get rid of this error and compile successfully?

It would be very helpful!

@PanQiWei
Collaborator

PanQiWei commented Jun 2, 2023

This is because the CUDA version PyTorch was built with isn't compatible with the one installed on the machine: the major versions are different. In this case you can only install using the cu118 pre-compiled wheel; installing from source will fail. Alternatively, you can first compile PyTorch from source, then install auto-gptq from source.

@crazycoderF12

This is because the CUDA version PyTorch was built with isn't compatible with the one installed on the machine: the major versions are different. In this case you can only install using the cu118 pre-compiled wheel; installing from source will fail. Alternatively, you can first compile PyTorch from source, then install auto-gptq from source.

Can you please elaborate and tell me more clearly what I should do?

@PanQiWei
Collaborator

PanQiWei commented Jun 2, 2023

When the CUDA major version PyTorch was built with differs from the one installed on the machine, you can't compile auto-gptq from source; instead, you can install using a pre-compiled wheel.
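
A quick way to check for that mismatch before attempting a source build - a small helper added here purely for illustration, not part of auto-gptq:

# Compare the CUDA version torch was built with against the one nvcc reports.
import re
import subprocess
import torch

torch_cuda = torch.version.cuda  # e.g. "11.8"; None for CPU-only builds
nvcc_out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
match = re.search(r"release (\d+)\.(\d+)", nvcc_out)
nvcc_cuda = f"{match.group(1)}.{match.group(2)}" if match else None

print(f"torch built with CUDA {torch_cuda}, nvcc reports CUDA {nvcc_cuda}")
if torch_cuda and nvcc_cuda and torch_cuda.split(".")[0] != nvcc_cuda.split(".")[0]:
    print("Major versions differ: building auto-gptq from source will fail; use a matching pre-compiled wheel instead.")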

@crazycoderF12

crazycoderF12 commented Jun 2, 2023

And is there any parameter or option to stream the output? Did you implement any generator function?

@TheFaheem auto-gptq is compatible with HF Transformers' TextGenerationPipeline, so its streamer should also work with auto-gptq's models, but I haven't tried it yet.

Are there any example scripts? Please!

@crazycoderF12

When the CUDA major version PyTorch was built with differs from the one installed on the machine, you can't compile auto-gptq from source; instead, you can install using a pre-compiled wheel.

Yes, I solved it using the pre-compiled wheel.

@Rishav-hub

When the CUDA major version PyTorch was built with differs from the one installed on the machine, you can't compile auto-gptq from source; instead, you can install using a pre-compiled wheel.

Yes, I solved it using the pre-compiled wheel.

@TheFaheem can you list the steps you followed? This would be helpful to conclude the issue.

@crazycoderF12

When the CUDA major version PyTorch was built with differs from the one installed on the machine, you can't compile auto-gptq from source; instead, you can install using a pre-compiled wheel.

Yes, I solved it using the pre-compiled wheel.

@TheFaheem can you list the steps you followed? This would be helpful to conclude the issue.

Here you go, my friend!

Get the link to the pre-compiled wheel from the latest release and install it like this:

!pip install {link to the wheel}

@crazycoderF12

@PanQiWei When I'm using the TextGenerationPipeline, I get the following warning:

The model 'RWGPTQForCausalLM' is not supported for . Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MvpForCausalLM', 'OpenLlamaForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'RwkvForCausalLM', 'Speech2Text2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM'].

But it runs fine. Why does this show up?

@crazycoderF12

Also, how does this pre-compiled wheel https://github.com/PanQiWei/AutoGPTQ/releases/download/v0.2.1/auto_gptq-0.2.1+cu118-cp310-cp310-linux_x86_64.whl work when my CUDA version is 12.1?

@TheBloke @PanQiWei

@PanQiWei
Collaborator

PanQiWei commented Jun 2, 2023

@PanQiWei When I'm using the TextGenerationPipeline, I get the following warning:

The model 'RWGPTQForCausalLM' is not supported for . Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MvpForCausalLM', 'OpenLlamaForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'RwkvForCausalLM', 'Speech2Text2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM'].

But it runs fine. Why does this show up?

This is a warning raised by HF Transformers for models that are not officially supported by them; you can just ignore it when using auto-gptq.

Also, how does this pre-compiled wheel https://github.com/PanQiWei/AutoGPTQ/releases/download/v0.2.1/auto_gptq-0.2.1+cu118-cp310-cp310-linux_x86_64.whl work when my CUDA version is 12.1?

This is because of the backward compatibility of CUDA.

@crazycoderF12

@PanQiWei When I'm using the TextGenerationPipeline, I get the following warning:

The model 'RWGPTQForCausalLM' is not supported for . Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MvpForCausalLM', 'OpenLlamaForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'RwkvForCausalLM', 'Speech2Text2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM'].

But it runs fine. Why does this show up?

This is a warning raised by HF Transformers for models that are not officially supported by them; you can just ignore it when using auto-gptq.

Also, how does this pre-compiled wheel https://github.com/PanQiWei/AutoGPTQ/releases/download/v0.2.1/auto_gptq-0.2.1+cu118-cp310-cp310-linux_x86_64.whl work when my CUDA version is 12.1?

This is because of the backward compatibility of CUDA.

Ahh... is that all? Thanks for clarifying this!

@crazycoderF12

And please, can you create an inference example script using the HF pipeline?

@PanQiWei
Collaborator

PanQiWei commented Jun 2, 2023

Here is a simple example of using an auto-gptq quantized model with the HF pipeline; for more advanced usage, for now you can turn to HF's tutorials and documentation.

Systematic tutorials and example scripts for using auto-gptq will continue to be added as development progresses.

@PanQiWei
Collaborator

PanQiWei commented Jun 2, 2023

Closing this issue, as the main problem here is solved. Anyone who has other questions or suggestions can raise them in a new issue. ❤️

@PanQiWei PanQiWei closed this as completed Jun 2, 2023
@lucasjinreal

ImportError: libcudart.so.12: cannot open shared object file: No such file or directory

@fxmarty
Collaborator

fxmarty commented Nov 24, 2023

@lucasjinreal See the issue you opened
