
Running 360Zhinao-7B-Chat-32K: 'NoneType' object is not callable #8

Open
choshiho opened this issue Apr 17, 2024 · 10 comments

@choshiho

Environment:
python = 3.11.7
pytorch = 2.2.2
transformers = 4.38.2
CUDA = 12.1

Following the official GitHub repo: https://github.com/Qihoo360/360zhinao

  1. Install the dependencies in order:
    pip install -r requirements.txt
    pip install flash_attn-2.5.6+cu118torch2.2cxx11abiTRUE-cp311-cp311-linux_x86_64.whl

  2. Download 360Zhinao-7B-Chat-32K from the ModelScope community:
    from modelscope import snapshot_download
    model_dir_360Zhinao_7B_Chat_32K = snapshot_download("qihoo360/360Zhinao-7B-Chat-32K", revision = "master")

  3. Point the model path at:
    MODEL_NAME_OR_PATH = "/home/zhifeng.zhao/.cache/modelscope/hub/qihoo360/360Zhinao-7B-Chat-32K"

  4. Run streamlit run web_demo.py, which fails with 'NoneType' object is not callable; the full traceback is below.

(360zhinao) xxx@xxx:~/360zhinao$ streamlit run web_demo.py

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.

2024-04-17 17:58:25.122 Did not auto detect external IP.
Please go to https://docs.streamlit.io/ for debugging hints.

You can now view your Streamlit app in your browser.

Network URL: http://192.168.50.126:8501

Please install FlashAttention first, e.g., with pip install flash-attn
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:02<00:00, 2.71it/s]
ic| self.eos_token_id: 158326
    self.pad_token_id: 158323
    self.im_start_id: 158332
    self.im_end_id: 158333
generation_config: GenerationConfig {
  "do_sample": true,
  "eos_token_id": [
    158326,
    158332,
    158333
  ],
  "max_new_tokens": 512,
  "pad_token_id": 158326,
  "top_p": 0.8
}

Exception in thread Thread-7 (generate):
Traceback (most recent call last):
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
self.run()
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/threading.py", line 982, in run
self._target(*self._args, **self._kwargs)
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 918, in generate
response = super().generate(
^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/transformers/generation/utils.py", line 1592, in generate
return self.sample(
^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/transformers/generation/utils.py", line 2696, in sample
outputs = self(
^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 816, in forward
outputs = self.model(
^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 711, in forward
layer_outputs = decoder_layer(
^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 513, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 416, in forward
attn_output = self.flash_attention(query_states, key_states, value_states, attention_mask)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 345, in flash_attention
query_states, key_states, value_states, indices_q, cu_seq_lens, max_seq_lens = self._upad_input(
^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 442, in _upad_input
key_layer = index_first_axis(key_layer.reshape(batch_size * kv_seq_len, num_heads, head_dim), indices_k)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'NoneType' object is not callable

@zhaicunqi
Collaborator

This error comes from a flash-attn installation problem; first check whether it was actually installed successfully.
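A quick way to act on this advice (a minimal sketch; `package_status` is a name I made up): verify that the `flash_attn` module actually imports. Note that the PyPI distribution is `flash-attn` (imported as `flash_attn`); the unrelated `flash-attention` package does not provide it. When the import fails, modeling_zhinao.py apparently falls back with `index_first_axis` left as `None`, which would match the `'NoneType' object is not callable` traceback above.

```python
import importlib
import importlib.metadata


def package_status(module: str, dist: str) -> str:
    """Return the installed version of a distribution, 'missing' when the
    module cannot be imported, or 'unknown' when it imports but has no
    distribution metadata."""
    try:
        importlib.import_module(module)
    except ImportError:
        return "missing"
    try:
        return importlib.metadata.version(dist)
    except importlib.metadata.PackageNotFoundError:
        return "unknown"


if __name__ == "__main__":
    # 'missing' here means the model will later crash with
    # "'NoneType' object is not callable" instead of failing at import time.
    print("flash_attn:", package_status("flash_attn", "flash-attn"))
```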

@choshiho
Author

> This error comes from a flash-attn installation problem; first check whether it was actually installed successfully.

(360zhinao) xxx@xxx:~/360zhinao$ pip install flash-attention
Collecting flash-attention
Using cached flash_attention-1.0.0-py3-none-any.whl.metadata (274 bytes)
Using cached flash_attention-1.0.0-py3-none-any.whl (31 kB)
Installing collected packages: flash-attention
Successfully installed flash-attention-1.0.0

Is the pinned version (flash-attn==2.3.6) required?

@jsoncode

Hitting the same problem here; pip install flash-attn also fails to compile for me. #9

@zhaicunqi
Collaborator

zhaicunqi commented Apr 22, 2024

You can try building it from source: FLASH_ATTENTION_FORCE_BUILD=TRUE pip install flash-attn==2.3.6

@zhaicunqi
Collaborator

> Is the pinned version (flash-attn==2.3.6) required?

Any version at or above 2.3.6 works.

@choshiho
Author

> Any version at or above 2.3.6 works.

pip install flash-attn==2.5.7
After it installed successfully, running python cli_demo.py fails as shown below. My GPU is an RTX 6000.


Welcome to the 360Zhinao large model. Type to chat; vim for multi-line input, clear to clear the history, stream to toggle streaming generation, exit to quit.

User: hi

Assistant: Exception in thread Thread-2 (generate):
Traceback (most recent call last):
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
self.run()
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/threading.py", line 982, in run
self._target(*self._args, **self._kwargs)
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 918, in generate
response = super().generate(
^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/transformers/generation/utils.py", line 1592, in generate
return self.sample(
^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/transformers/generation/utils.py", line 2696, in sample
outputs = self(
^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 816, in forward
outputs = self.model(
^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 711, in forward
layer_outputs = decoder_layer(
^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 513, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 416, in forward
attn_output = self.flash_attention(query_states, key_states, value_states, attention_mask)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 352, in flash_attention
attn_output_unpad = flash_attn_varlen_func(
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/flash_attn/flash_attn_interface.py", line 1066, in flash_attn_varlen_func
return FlashAttnVarlenFunc.apply(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/autograd/function.py", line 553, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/flash_attn/flash_attn_interface.py", line 581, in forward
out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = _flash_attn_varlen_forward(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/flash_attn/flash_attn_interface.py", line 86, in _flash_attn_varlen_forward
out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = flash_attn_cuda.varlen_fwd(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: FlashAttention only supports Ampere GPUs or newer.
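For context (my own sketch, not from the thread): FlashAttention 2 requires CUDA compute capability 8.0 (Ampere) or newer, and if this is the Turing-generation Quadro RTX 6000 it is sm_75, which would explain the RuntimeError. The check can be done before enabling the feature:

```python
def supports_flash_attn(capability: tuple[int, int]) -> bool:
    """FlashAttention 2 needs Ampere (compute capability 8.0) or newer."""
    major, _minor = capability
    return major >= 8


# In practice, feed it the live value:
#   import torch
#   supports_flash_attn(torch.cuda.get_device_capability())
print(supports_flash_attn((7, 5)))  # Turing, e.g. Quadro RTX 6000
print(supports_flash_attn((8, 0)))  # Ampere, e.g. A100
```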

@zhaicunqi
Collaborator

> pip install flash-attn==2.5.7
> After it installed successfully, running python cli_demo.py fails with RuntimeError: FlashAttention only supports Ampere GPUs or newer. My GPU is an RTX 6000.

It looks like your GPU does not support flash-attn. You can set use_flash_attn to false in config.json.

@zhaicunqi
Collaborator

> > It looks like your GPU does not support flash-attn. You can set use_flash_attn to false in config.json.
>
> Could you tell me where config.json is?

It is in the model directory you downloaded.

@choshiho
Author

> It looks like your GPU does not support flash-attn. You can set use_flash_attn to false in config.json.

After editing 360Zhinao-7B-Chat-32K/config.json, it still fails with RuntimeError: FlashAttention only supports Ampere GPUs or newer.

{
  "architectures": [ "ZhinaoForCausalLM" ],
  "auto_map": {
    "AutoConfig": "configuration_zhinao.ZhinaoConfig",
    "AutoModelForCausalLM": "modeling_zhinao.ZhinaoForCausalLM"
  },
  "bf16": true,
  "fp16": false,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.01,
  "intermediate_size": 11008,
  "max_position_embeddings": 32768,
  "model_max_length": 32768,
  "model_type": "zhinao",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 50000000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.38.2",
  "use_cache": false,
  "use_flash_attn": "false",
  "vocab_size": 158464
}

@choshiho
Author

> > It looks like your GPU does not support flash-attn. You can set use_flash_attn to false in config.json.
> >
> > Could you tell me where config.json is?
>
> It is in the model directory you downloaded.

After changing use_flash_attn to false, it still errors. Is there any other way to run the model?
