
Running 360Zhinao-7B-Chat-32K: 'NoneType' object is not callable #8

Open
choshiho opened this issue Apr 17, 2024 · 10 comments

@choshiho

Environment:
python = 3.11.7
pytorch = 2.2.2
transformers = 4.38.2
CUDA = 12.1

Following the official GitHub repo: https://github.com/Qihoo360/360zhinao

  1. Install the dependencies in order:
    pip install -r requirements.txt
    pip install flash_attn-2.5.6+cu118torch2.2cxx11abiTRUE-cp311-cp311-linux_x86_64.whl

  2. Download 360Zhinao-7B-Chat-32K from the ModelScope community:
    from modelscope import snapshot_download
    model_dir_360Zhinao_7B_Chat_32K = snapshot_download("qihoo360/360Zhinao-7B-Chat-32K", revision = "master")

  3. Point the model path at:
    MODEL_NAME_OR_PATH = "/home/zhifeng.zhao/.cache/modelscope/hub/qihoo360/360Zhinao-7B-Chat-32K"

  4. Run streamlit run web_demo.py, which fails with 'NoneType' object is not callable; the full traceback is below.

(360zhinao) xxx@xxx:~/360zhinao$ streamlit run web_demo.py

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.

2024-04-17 17:58:25.122 Did not auto detect external IP.
Please go to https://docs.streamlit.io/ for debugging hints.

You can now view your Streamlit app in your browser.

Network URL: http://192.168.50.126:8501

Please install FlashAttention first, e.g., with pip install flash-attn
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:02<00:00, 2.71it/s]
ic| self.eos_token_id: 158326
    self.pad_token_id: 158323
    self.im_start_id: 158332
    self.im_end_id: 158333
generation_config: GenerationConfig {
  "do_sample": true,
  "eos_token_id": [
    158326,
    158332,
    158333
  ],
  "max_new_tokens": 512,
  "pad_token_id": 158326,
  "top_p": 0.8
}

Exception in thread Thread-7 (generate):
Traceback (most recent call last):
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
self.run()
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/threading.py", line 982, in run
self._target(*self._args, **self._kwargs)
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 918, in generate
response = super().generate(
^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/transformers/generation/utils.py", line 1592, in generate
return self.sample(
^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/transformers/generation/utils.py", line 2696, in sample
outputs = self(
^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 816, in forward
outputs = self.model(
^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 711, in forward
layer_outputs = decoder_layer(
^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 513, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 416, in forward
attn_output = self.flash_attention(query_states, key_states, value_states, attention_mask)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 345, in flash_attention
query_states, key_states, value_states, indices_q, cu_seq_lens, max_seq_lens = self._upad_input(
^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 442, in _upad_input
key_layer = index_first_axis(key_layer.reshape(batch_size * kv_seq_len, num_heads, head_dim), indices_k)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'NoneType' object is not callable

@zhaicunqi
Collaborator

This error comes from a flash-attn installation problem; first check whether it was actually installed successfully.
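A quick way to act on this advice (a minimal sketch; `package_status` is a name I made up): verify that the `flash_attn` module actually imports. Note that the PyPI distribution is `flash-attn` (imported as `flash_attn`); the unrelated `flash-attention` package does not provide it. When the import fails, modeling_zhinao.py apparently falls back with `index_first_axis` left as `None`, which would match the `'NoneType' object is not callable` traceback above.

```python
import importlib
import importlib.metadata


def package_status(module: str, dist: str) -> str:
    """Return the installed version of a distribution, 'missing' when the
    module cannot be imported, or 'unknown' when it imports but has no
    distribution metadata."""
    try:
        importlib.import_module(module)
    except ImportError:
        return "missing"
    try:
        return importlib.metadata.version(dist)
    except importlib.metadata.PackageNotFoundError:
        return "unknown"


if __name__ == "__main__":
    # 'missing' here means the model will later crash with
    # "'NoneType' object is not callable" instead of failing at import time.
    print("flash_attn:", package_status("flash_attn", "flash-attn"))
```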

@choshiho
Author

> This error comes from a flash-attn installation problem; first check whether it was actually installed successfully.

(360zhinao) xxx@xxx:~/360zhinao$ pip install flash-attention
Collecting flash-attention
Using cached flash_attention-1.0.0-py3-none-any.whl.metadata (274 bytes)
Using cached flash_attention-1.0.0-py3-none-any.whl (31 kB)
Installing collected packages: flash-attention
Successfully installed flash-attention-1.0.0

Is the pinned version (flash-attn==2.3.6) required?

@jsoncode

Hitting the same problem here; pip install flash-attn also fails to compile for me. #9

@zhaicunqi
Collaborator

zhaicunqi commented Apr 22, 2024

You can try building it from source: FLASH_ATTENTION_FORCE_BUILD=TRUE pip install flash-attn==2.3.6

@zhaicunqi
Collaborator

> Is the pinned version (flash-attn==2.3.6) required?

Any version at or above 2.3.6 works.

@choshiho
Author

> Any version at or above 2.3.6 works.

pip install flash-attn==2.5.7
After it installed successfully, running python cli_demo.py fails as shown below. My GPU is an RTX 6000.


Welcome to the 360Zhinao large model. Type to chat; vim for multi-line input, clear to clear the history, stream to toggle streaming generation, exit to quit.

User: hi

Assistant: Exception in thread Thread-2 (generate):
Traceback (most recent call last):
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
self.run()
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/threading.py", line 982, in run
self._target(*self._args, **self._kwargs)
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 918, in generate
response = super().generate(
^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/transformers/generation/utils.py", line 1592, in generate
return self.sample(
^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/transformers/generation/utils.py", line 2696, in sample
outputs = self(
^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 816, in forward
outputs = self.model(
^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 711, in forward
layer_outputs = decoder_layer(
^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 513, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 416, in forward
attn_output = self.flash_attention(query_states, key_states, value_states, attention_mask)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 352, in flash_attention
attn_output_unpad = flash_attn_varlen_func(
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/flash_attn/flash_attn_interface.py", line 1066, in flash_attn_varlen_func
return FlashAttnVarlenFunc.apply(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/autograd/function.py", line 553, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/flash_attn/flash_attn_interface.py", line 581, in forward
out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = _flash_attn_varlen_forward(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/flash_attn/flash_attn_interface.py", line 86, in _flash_attn_varlen_forward
out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = flash_attn_cuda.varlen_fwd(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: FlashAttention only supports Ampere GPUs or newer.
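For context (my own sketch, not from the thread): FlashAttention 2 requires CUDA compute capability 8.0 (Ampere) or newer, and if this is the Turing-generation Quadro RTX 6000 it is sm_75, which would explain the RuntimeError. The check can be done before enabling the feature:

```python
def supports_flash_attn(capability: tuple[int, int]) -> bool:
    """FlashAttention 2 needs Ampere (compute capability 8.0) or newer."""
    major, _minor = capability
    return major >= 8


# In practice, feed it the live value:
#   import torch
#   supports_flash_attn(torch.cuda.get_device_capability())
print(supports_flash_attn((7, 5)))  # Turing, e.g. Quadro RTX 6000
print(supports_flash_attn((8, 0)))  # Ampere, e.g. A100
```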

@zhaicunqi
Collaborator

> pip install flash-attn==2.5.7
> After it installed successfully, running python cli_demo.py fails with RuntimeError: FlashAttention only supports Ampere GPUs or newer. My GPU is an RTX 6000.

It looks like your GPU does not support flash-attn. You can set use_flash_attn to false in config.json.

@zhaicunqi
Collaborator

> > It looks like your GPU does not support flash-attn. You can set use_flash_attn to false in config.json.
>
> Could you tell me where config.json is?

It is in the model directory you downloaded.

@choshiho
Author

> It looks like your GPU does not support flash-attn. You can set use_flash_attn to false in config.json.

After editing 360Zhinao-7B-Chat-32K/config.json, it still fails with RuntimeError: FlashAttention only supports Ampere GPUs or newer.

{
  "architectures": [ "ZhinaoForCausalLM" ],
  "auto_map": {
    "AutoConfig": "configuration_zhinao.ZhinaoConfig",
    "AutoModelForCausalLM": "modeling_zhinao.ZhinaoForCausalLM"
  },
  "bf16": true,
  "fp16": false,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.01,
  "intermediate_size": 11008,
  "max_position_embeddings": 32768,
  "model_max_length": 32768,
  "model_type": "zhinao",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 50000000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.38.2",
  "use_cache": false,
  "use_flash_attn": "false",
  "vocab_size": 158464
}

@choshiho
Author

> > It looks like your GPU does not support flash-attn. You can set use_flash_attn to false in config.json.
> >
> > Could you tell me where config.json is?
>
> It is in the model directory you downloaded.

After changing use_flash_attn to false, it still errors. Is there any other way to run the model?
