Error when running SFT with the provided model #13

cq1316 · 2024-01-17T11:37:29Z

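A few problems:
1. In section 3.2.3 (download the pretrained model and model configuration files), the downloaded folder is named ChatLM-mini-Chinese, but the command is mv ChatLM-Chinese-0.2B model_save, so the folder name does not match.
2. After placing the model folder under model_save and running python sft_train.py, I get an error saying the model_save/pretrain directory does not exist.
3. After renaming the folder to pretrain and running python sft_train.py, I get: Error while deserializing header: HeaderTooLarge

Some posts online suggest changing the safetensors extension to ckpt. I tried that and got: OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory.

How can I get SFT running?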

charent · 2024-01-17T12:09:36Z

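First of all, thanks for the feedback.

For the mv command, use the name of the folder you actually downloaded, ChatLM-mini-Chinese; I will update the README shortly. The Hugging Face repository was originally named ChatLM-Chinese-0.2B, but calling AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True) raised an error and could not download the model files. It worked after I renamed the repository, so it was probably a Hugging Face issue; I don't know whether it has been fixed since.
You can edit config.py: find the finetune_from_ckp_file variable at line 64 (under the SFTconfig class) and change the default PROJECT_ROOT + '/model_save/pretrain' to the path of your downloaded model files.
Lines 46-55 of sft_train.py load the pretrained model. If the finetune_from_ckp_file you set in step 2 is a folder, TextToTextModel.from_pretrained is called, and that method loads safetensors correctly. So you may have set finetune_from_ckp_file to the safetensors file itself; point it at the folder instead. load_state_dict is for loading a native PyTorch .bin model file.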

# step 2. Load the pretrained model
model = None
if os.path.isdir(config.finetune_from_ckp_file):
    # A directory was passed: load it with from_pretrained
    model = TextToTextModel.from_pretrained(config.finetune_from_ckp_file)
else:
    # A single file was passed: build the model, then load_state_dict
    t5_config = get_T5_config(
        T5ModelConfig(),
        vocab_size=len(tokenizer),
        decoder_start_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )
    model = TextToTextModel(t5_config)
    # map_location='cpu' avoids device-related exceptions while loading
    model.load_state_dict(torch.load(config.finetune_from_ckp_file, map_location='cpu'))

cq1316 · 2024-01-17T12:46:51Z

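For step 3, the error is raised exactly on the line model = TextToTextModel.from_pretrained(config.finetune_from_ckp_file). I have already put the model files directly under model_save/pretrain, with no intermediate folder, but it still fails with Error while deserializing header: HeaderTooLarge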

cq1316 · 2024-01-17T12:48:36Z

charent · 2024-01-17T12:50:03Z

What files are in your model_save/pretrain folder? It needs all of the following files; a lone model.safetensors is not enough.

├─model_save
|  ├─config.json
|  ├─configuration_chat_model.py
|  ├─generation_config.json
|  ├─model.safetensors
|  ├─modeling_chat_model.py
|  ├─special_tokens_map.json
|  ├─tokenizer.json
|  └─tokenizer_config.json
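
A quick way to compare a local folder against the listing above is sketched below. The file names come from that listing; the helper name `missing_model_files` and the example path are hypothetical and not part of the repository.

```python
from pathlib import Path
from typing import List

# File names copied from the directory listing above.
REQUIRED_FILES = [
    "config.json",
    "configuration_chat_model.py",
    "generation_config.json",
    "model.safetensors",
    "modeling_chat_model.py",
    "special_tokens_map.json",
    "tokenizer.json",
    "tokenizer_config.json",
]


def missing_model_files(model_dir: str) -> List[str]:
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

For example, `missing_model_files("model_save/pretrain")` returns an empty list when every listed file is present. This only checks that the files exist; it says nothing about whether their contents are complete.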

cq1316 · 2024-01-17T12:53:10Z

I think I found the problem. Here finetune_from_ckp_file points to model_save, but the original error said there was no pretrain under model_save. Your model_save directory ships with a tokenizer folder.

charent · 2024-01-17T13:02:09Z

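The tokenizer folder that ships in the model_save directory is a legacy leftover, lol; I didn't plan the layout well at the time. I've wanted to delete it, but I'm worried that someone is using it or that something else would then break.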

cq1316 · 2024-01-18T02:25:15Z

Still not working. The file locations and contents now all match the code, but running SFT still raises safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge

charent · 2024-01-18T02:58:24Z

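Do your PyTorch and transformers versions match the ones required in requirements.txt? Can you run the following code directly? If you can, your environment is fine. Do not change model_id to a local path; let it download straight from Hugging Face.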

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

model_id = 'charent/ChatLM-mini-Chinese'
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True).to(device)

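Check whether the model files you downloaded are complete. You can set model_dir under InferConfig in config.py to the directory of the model you want to fine-tune, then run python cli_demo.py to see whether it loads. If it does not load, the downloaded model files are incomplete; download them again.
I tried it on my side and SFT runs normally, as shown in the screenshot below: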

cq1316 · 2024-01-18T03:13:20Z

I am using the model you provided, and cli_demo fails with the same error.

charent · 2024-01-18T03:17:20Z

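> Still not working. The file locations and contents now all match the code, but running SFT still raises safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge

I looked this error up: it is a problem with the model file. The file is incomplete, so download it again. Here are the MD5 values I get from the md5sum command for you to compare against; the one that matters most is model.safetensors.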

1de0ba231817fcdaf97e025aa0dfcd00  config.json
e7356676de6c8bad26d2c7ceedc92fad  generation_config.json
655bcc42640baefba8b188a0aa65d339  model.safetensors
adeee419c31a613d7dd281b736e3873a  modeling_chat_model.py
ba22587440fe5ff64aab2cb552cb8654  special_tokens_map.json
0b65eef22c7fb9e1c16a4e51f359134a  tokenizer.json
9fc5ebbabcf9eb5ad752e16649938afc  tokenizer_config.json
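
On systems without md5sum, the same comparison can be sketched in Python with the standard hashlib module. The checksums are copied from the list above; the helper names `md5_of` and `mismatched_files` are hypothetical.

```python
import hashlib
from pathlib import Path

# Checksums copied from the md5sum output above.
EXPECTED_MD5 = {
    "config.json": "1de0ba231817fcdaf97e025aa0dfcd00",
    "generation_config.json": "e7356676de6c8bad26d2c7ceedc92fad",
    "model.safetensors": "655bcc42640baefba8b188a0aa65d339",
    "modeling_chat_model.py": "adeee419c31a613d7dd281b736e3873a",
    "special_tokens_map.json": "ba22587440fe5ff64aab2cb552cb8654",
    "tokenizer.json": "0b65eef22c7fb9e1c16a4e51f359134a",
    "tokenizer_config.json": "9fc5ebbabcf9eb5ad752e16649938afc",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_files(model_dir, expected=EXPECTED_MD5):
    """Return names that are missing from model_dir or whose MD5 differs."""
    root = Path(model_dir)
    bad = []
    for name, want in expected.items():
        path = root / name
        if not path.is_file() or md5_of(path) != want:
            bad.append(name)
    return bad
```

Calling `mismatched_files("model_save/pretrain")` would list the files to download again. The values reflect the repository at the time of this thread, so later model updates may produce different checksums.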

cq1316 · 2024-01-18T05:49:00Z

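Problem solved. Git LFS was not installed, so the model files were not fully downloaded. I suggest adding a step to the README for checking whether Git LFS is installed.
One more issue: after SFT training finishes, two files are missing from the model directory and have to be moved over manually from the original model folder. I suggest copying the missing files over automatically when SFT finishes.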
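
On the Git LFS point: when a repository is cloned without Git LFS, large files are left as small text pointer files that begin with `version https://git-lfs.github.com/spec/v1`. A safetensors file instead begins with an 8-byte little-endian integer giving the length of its JSON header, so reading a pointer's first bytes as that integer yields an enormous value, which is the likely source of HeaderTooLarge here. A minimal sketch for telling the two apart follows; the helper names are hypothetical.

```python
import struct

# First line of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file starts like a Git LFS pointer file."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def declared_header_size(path):
    """Return the header length a safetensors reader would take from the
    first 8 bytes (unsigned 64-bit little-endian)."""
    with open(path, "rb") as f:
        first = f.read(8)
    if len(first) < 8:
        raise ValueError("file is shorter than 8 bytes")
    return struct.unpack("<Q", first)[0]
```

If `is_lfs_pointer("model_save/pretrain/model.safetensors")` returns True, the weights were never downloaded: install Git LFS and fetch the files again, or download model.safetensors directly.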

charent · 2024-01-18T06:14:42Z

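The README already says that Git LFS must be installed first if you download the files with git. I've also seen people download them manually in a browser and move them into place, so I didn't highlight it. I'll change that the next time I update the README.

On the second issue: I assume you mean the two .py files. TextToTextModel is a custom class (it simply subclasses T5 and implements its own generate method). The files are only needed because the model is uploaded to the Hugging Face repository for others to use: running AutoModelForSeq2SeqLM.from_pretrained('charent/ChatLM-mini-Chinese', trust_remote_code=True) downloads these two .py files in order to load the TextToTextModel model.
For local use:
1. If you load the model with TextToTextModel.from_pretrained(...), the two .py files are not needed, because the model folder is already part of the cloned repository; from model.chat_model import TextToTextModel is enough.
2. The two .py files are needed only if you load the model with AutoModelForSeq2SeqLM.from_pretrained(...), which uses them to load TextToTextModel.

Either approach works. My code always loads local models through TextToTextModel, so you can ignore these two .py files. To see how the two .py files map to the concrete model class, take a look at config.json.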
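
For anyone who does want to load an SFT output directory through AutoModelForSeq2SeqLM, the manual copy step mentioned earlier could be scripted roughly as follows. This is only a sketch: the two file names come from the directory listing earlier in this thread, while the helper name `copy_custom_code` and the example paths are hypothetical and not part of the repository.

```python
import shutil
from pathlib import Path

# Custom-code files from the directory listing earlier in this thread.
CUSTOM_CODE_FILES = ["configuration_chat_model.py", "modeling_chat_model.py"]


def copy_custom_code(src_dir, dst_dir, names=CUSTOM_CODE_FILES):
    """Copy files that exist in src_dir but not in dst_dir; return their names."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in names:
        if (src / name).is_file() and not (dst / name).exists():
            shutil.copy2(src / name, dst / name)
            copied.append(name)
    return copied
```

A call such as `copy_custom_code("model_save/pretrain", "model_save/sft")` after training would fill in the missing files without overwriting anything already in the output directory.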

charent mentioned this issue Jan 19, 2024

Can this run on a server? #14

Closed

charent mentioned this issue Jan 30, 2024

Error during SFT fine-tuning #23

Closed

charent closed this as completed Feb 2, 2024
