ConnectionError (network problem) after several hours of quantization #6

Open
chuangzhidan opened this issue May 20, 2024 · 20 comments

Comments

chuangzhidan commented May 20, 2024

time cost for block minimization: 98.45541977882385
quant layer 31 done! time cost 295.21963691711426

The quantization duration is 2.6511569974819817
Traceback (most recent call last):
File "/workspace/decoupleQ/copy_llama.py", line 427, in <module>
dataloader, testloader = get_loaders(
File "/workspace/decoupleQ/datautils.py", line 209, in get_loaders
return get_ptb_new(nsamples, seed, seqlen, model)
File "/workspace/decoupleQ/datautils.py", line 140, in get_ptb_new
traindata = load_dataset('ptb_text_only', 'penn_treebank', split='train')
File "/opt/conda/lib/python3.10/site-packages/datasets/load.py", line 1657, in load_dataset
builder_instance = load_dataset_builder(
File "/opt/conda/lib/python3.10/site-packages/datasets/load.py", line 1494, in load_dataset_builder
dataset_module = dataset_module_factory(
File "/opt/conda/lib/python3.10/site-packages/datasets/load.py", line 1204, in dataset_module_factory
raise e1 from None
File "/opt/conda/lib/python3.10/site-packages/datasets/load.py", line 1148, in dataset_module_factory
).get_module()
File "/opt/conda/lib/python3.10/site-packages/datasets/load.py", line 523, in get_module
local_path = self.download_loading_script(revision)
File "/opt/conda/lib/python3.10/site-packages/datasets/load.py", line 506, in download_loading_script
return cached_path(file_path, download_config=self.download_config)
File "/opt/conda/lib/python3.10/site-packages/datasets/utils/file_utils.py", line 298, in cached_path
output_path = get_from_cache(
File "/opt/conda/lib/python3.10/site-packages/datasets/utils/file_utils.py", line 615, in get_from_cache
raise ConnectionError(f"Couldn't reach {url} ({repr(head_error)})")
ConnectionError: Couldn't reach https://raw.githubusercontent.com/huggingface/datasets/1.17.0/datasets/ptb_text_only/ptb_text_only.py (ReadTimeout(ReadTimeoutError("HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Read timed out. (read timeout=10)")))

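1. How can I resolve this network problem?
2. Could the quantization be decoupled from this network access step? As it stands, one small bug anywhere in the pipeline throws away the half day of quantization that ran before it.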
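
One way to keep a late network failure from discarding finished work is to persist the quantized result before touching any dataset, and to treat evaluation as optional. The following is a minimal sketch under stated assumptions: quantize, save, load_eval_data, and evaluate are hypothetical callables standing in for the corresponding steps of the script (none of these names come from decoupleQ), and the download failure is assumed to surface as an OSError subclass such as the built-in ConnectionError seen in the traceback.

```python
from typing import Any, Callable, Optional


def quantize_then_evaluate(
    quantize: Callable[[], Any],
    save: Callable[[Any], None],
    load_eval_data: Callable[[], Any],
    evaluate: Callable[[Any, Any], float],
) -> Optional[float]:
    """Run quantization, persist the result, then attempt evaluation.

    All four callables are hypothetical placeholders for steps of the script.
    Returns the evaluation score, or None if the evaluation data is unavailable.
    """
    model = quantize()
    # Persist first, so nothing after this line can cost the quantization work.
    save(model)
    try:
        data = load_eval_data()  # may require network access
    except OSError as err:  # the built-in ConnectionError is a subclass of OSError
        print(f"Skipping evaluation, dataset unavailable: {err!r}")
        return None
    return evaluate(model, data)
```

With this ordering, a timeout while fetching the evaluation data only skips the score; the saved result can be evaluated later in a separate run.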

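GuoYi0 (Collaborator) commented May 21, 2024

The model should be saved once quantization finishes, so next time you can load it directly instead of quantizing from scratch, for example by writing a small load_state_dict routine yourself.
As for the dataset download, it is flaky on my side too, and you may need to sort out some proxy issues. When a download does succeed, the dataset is cached under .cache. I keep a backup copy of the .cache folder so that later runs skip the download entirely.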
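
The cache-backup approach can be paired with the offline switch of the Hugging Face datasets library, so that a restored cache is used without any network request. This is a minimal sketch, not the maintainers' setup: HF_DATASETS_OFFLINE is an environment variable read by the datasets library, enable_offline_datasets is a hypothetical helper name, and the cache location passed in is whatever directory the backup was restored to.

```python
import os
from pathlib import Path


def enable_offline_datasets(cache_root: Path) -> bool:
    """Switch the datasets library to offline mode if a cache directory exists.

    Returns True when offline mode was enabled, False when cache_root is missing.
    """
    if not cache_root.is_dir():
        return False
    # The datasets library reads this when it is imported, so set it first.
    os.environ["HF_DATASETS_OFFLINE"] = "1"
    return True
```

A caller would invoke this before importing datasets, for example with Path.home() / ".cache" / "huggingface", which is assumed here to be where the restored backup lives.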

chuangzhidan (Author) commented

Thanks for the reply. Did my run above actually reach the step where the model is saved, and if so, what is the path? I did not see it.
As for the data, is there a public link where people can download it and then place it manually in the cache or in another specified path?

GuoYi0 (Collaborator) commented May 22, 2024

This is the step that saves the model after quantization: https://github.com/bytedance/decoupleQ/blob/main/llama.py#L496

chuangzhidan (Author) commented May 22, 2024

Thanks for taking the time to reply.
1. Then it has probably been saved already: I can see two files, true_quant.pth and fake_quant.pth. How do I run inference next?

2. Is there a download link for the three datasets in datasets = ['wikitext2', 'ptb', 'c4']? If there really is none, does that matter? From the code, they seem to be used only for evaluation, to compute the ppl metric.

GuoYi0 (Collaborator) commented May 28, 2024

For inference, take a look at run_inference_llama.sh.
On my side the three datasets were downloaded automatically when the code ran. They are only used for evaluation afterwards, so skipping the download is fine and has little impact.
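
For context, perplexity (ppl) is the exponential of the average negative log-likelihood per token, which is why the datasets affect only the reported score and not the saved weights. Below is a minimal sketch of the formula, assuming per-token natural-log probabilities are already available; the function name perplexity is illustrative and is not taken from the repository.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Return exp of the mean negative log-likelihood over all tokens."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)
```

As a sanity check, a model that assigns probability 0.25 to every token has a perplexity of 4.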

chuangzhidan (Author) commented

Traceback (most recent call last):
File "/workspace/decoupleQ/llama.py", line 21, in <module>
from decoupleQ.linear_w2a16 import LinearW2A16, LinearA16
File "/workspace/decoupleQ/decoupleQ/linear_w2a16.py", line 7, in <module>
from .decoupleQ_kernels import dQ_preprocess_weights_int2_for_weight_only, dQ_asymm_qw2_gemm
ModuleNotFoundError: No module named 'decoupleQ.decoupleQ_kernels'

linear_w2a16.py
from .decoupleQ_kernels import dQ_preprocess_weights_int2_for_weight_only, dQ_asymm_qw2_gemm
OK, but this import fails at runtime.

hsb1995 commented May 30, 2024

Did you manage to solve this? I am hitting the same problem:
from .decoupleQ_kernels import dQ_preprocess_weights_int2_for_weight_only, dQ_asymm_qw2_gemm
ModuleNotFoundError: No module named 'decoupleQ.decoupleQ_kernels'

hsb1995 commented May 30, 2024

@chuangzhidan

chuangzhidan (Author) commented

No. I really cannot find this module anywhere, so I can only ask the expert to take a look. @GuoYi0

GuoYi0 (Collaborator) commented Jun 4, 2024

You need to edit build.sh so that every path points to the corresponding location in your actual working environment, and then run bash build.sh.
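
After running the build, it can help to confirm that the compiled extension is actually visible to Python before launching a long job. This is a minimal sketch using the standard library; module_available is a hypothetical helper name, and the expectation that decoupleQ.decoupleQ_kernels becomes importable after a successful build is an assumption based on the import error above.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if Python can locate the module called `name`.

    For a dotted name, find_spec imports the parent package first.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is itself missing.
        return False
```

Run from the repository root, module_available("decoupleQ.decoupleQ_kernels") returning False would indicate that the shared library was not built or not copied into the package directory.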

chuangzhidan (Author) commented

That does not sound very user-friendly. There is a cp libdecoupleQ_kernels.so step, but I cannot find libdecoupleQ_kernels.so anywhere. Is it generated automatically when the script runs?

GuoYi0 (Collaborator) commented Jun 4, 2024

Yes, it is generated when you run cmake.

hsb1995 commented Jun 4, 2024

My environment is Python 3.10, while yours is 3.9. Will that work?

GuoYi0 (Collaborator) commented Jun 4, 2024

It should work, as long as all the paths are updated correctly.

chuangzhidan (Author) commented Jun 4, 2024

  • cmake -DCMAKE_PREFIX_PATH=/usr/local/lib/python3.8/dist-packages/torch/share/cmake -DDECOUPLEQ_TORCH_HOME=/usr/local/lib/python3.8/dist-packages/torch
    CMake Warning:
    No source or binary directory provided. Both will be assumed to be the
    same as the current working directory, but note that this warning will
    become a fatal error in future CMake releases.

CMake Error: The source directory "/media/data/xgp/decoupleQ/csrc/build" does not appear to contain CMakeLists.txt.
It failed with the error above.
What I changed:
cmake -DCMAKE_PREFIX_PATH=$(python3 -c "import torch; print(torch.utils.cmake_prefix_path)")
-DDECOUPLEQ_TORCH_HOME=$(python3 -c "import os; import torch; print(os.path.dirname(torch.__file__))")
Everything else is unchanged.

hsb1995 commented Jun 4, 2024

Same here. It feels like something is off with the code when trying to reproduce it.

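hsb1995 commented Jun 4, 2024

Could you open-source the decoupleQ_kernels Python file directly? The compilation keeps failing. Or could you please provide a solution? We all run into problems trying to reproduce this; as far as I can tell, everyone else only uses your inference and has not reproduced it.
Thank you, ByteDance experts. This is clearly a substantial piece of work!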

MyPandaShaoxiang (Collaborator) commented

@chuangzhidan @hsb1995
Are you using build.sh? cmake should be run from inside the build directory, and the command must end with .. so that it points at the parent directory. For example, your command
cmake -DCMAKE_PREFIX_PATH=/usr/local/lib/python3.8/dist-packages/torch/share/cmake -DDECOUPLEQ_TORCH_HOME=/usr/local/lib/python3.8/dist-packages/torch
is missing the trailing .. and should instead be
cmake -DCMAKE_PREFIX_PATH=/usr/local/lib/python3.8/dist-packages/torch/share/cmake -DDECOUPLEQ_TORCH_HOME=/usr/local/lib/python3.8/dist-packages/torch ..

MyPandaShaoxiang (Collaborator) commented

Ideally, just edit the corresponding paths in the build script and run it directly.
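
To avoid hand-editing paths, the cmake invocation discussed in this thread can be assembled from the location of the installed torch package. The following is a sketch under stated assumptions: cmake_command is a hypothetical helper, the two -D options are the ones quoted in the thread, and torch's CMake files are assumed to live under share/cmake inside the torch package, as in the paths shown above.

```python
import importlib.util
import os
import shlex
from typing import List


def cmake_command(torch_home: str, source_dir: str = "..") -> List[str]:
    """Build the cmake argument list for a given torch installation directory."""
    prefix_path = f"{torch_home}/share/cmake"
    return [
        "cmake",
        f"-DCMAKE_PREFIX_PATH={prefix_path}",
        f"-DDECOUPLEQ_TORCH_HOME={torch_home}",
        source_dir,  # the trailing ".." that points cmake at the parent directory
    ]


if __name__ == "__main__":
    # Locate torch without importing it; origin is the path to torch/__init__.py.
    spec = importlib.util.find_spec("torch")
    if spec is not None and spec.origin is not None:
        print(shlex.join(cmake_command(os.path.dirname(spec.origin))))
    else:
        print("torch is not installed in this interpreter")
```

The printed command is meant to be run from inside the build directory, so that .. resolves to the directory containing CMakeLists.txt.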
