发票关键信息抽取.md (invoice key information extraction): FileNotFoundError: configuration file<config.json> or <model_config.json> not found #12092

Closed
greatliu opened this issue May 10, 2024 · 6 comments
@greatliu

Please provide the following information to quickly locate the problem

  • System Environment: win11 python3.9 cuda11.8
  • Version:
    paddlenlp 2.6.1
    paddleocr 2.7.3
    paddlepaddle-gpu 2.6.1
  • Related components: Not sure
  • Command Code:
    python tools/infer_kie_token_ser.py -c ./configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh_udml.yml -o Architecture.Models.Student.Backbone.checkpoints="./fapiao/ser_vi_layoutxlm_fapiao_trained/best_accuracy/" Global.infer_img=./train_data/zzsfp/val.json Global.infer_mode=False
  • Complete Error Message:
    [2024-05-10 16:38:43,480] [ WARNING] - Some weights of LayoutXLMForTokenClassification were not initialized from the model checkpoint at vi-layoutxlm-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
    You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
    Traceback (most recent call last):
    File "E:\Workspace\GitHub\PaddleOCR\tools\infer_kie_token_ser.py", line 119, in <module>
    ser_engine = SerPredictor(config)
    File "E:\Workspace\GitHub\PaddleOCR\tools\infer_kie_token_ser.py", line 68, in __init__
    self.model = build_model(config['Architecture'])
    File "E:\Workspace\GitHub\PaddleOCR\ppocr\modeling\architectures\__init__.py", line 34, in build_model
    arch = getattr(mod, name)(config)
    File "E:\Workspace\GitHub\PaddleOCR\ppocr\modeling\architectures\distillation_model.py", line 47, in __init__
    model = BaseModel(model_config)
    File "E:\Workspace\GitHub\PaddleOCR\ppocr\modeling\architectures\base_model.py", line 55, in __init__
    self.backbone = build_backbone(config["Backbone"], model_type)
    File "E:\Workspace\GitHub\PaddleOCR\ppocr\modeling\backbones\__init__.py", line 82, in build_backbone
    module_class = eval(module_name)(**config)
    File "E:\Workspace\GitHub\PaddleOCR\ppocr\modeling\backbones\vqa_layoutlm.py", line 142, in __init__
    super(LayoutXLMForSer, self).__init__(
    File "E:\Workspace\GitHub\PaddleOCR\ppocr\modeling\backbones\vqa_layoutlm.py", line 55, in __init__
    self.model = model_class.from_pretrained(checkpoints)
    File "C:\Users\great\miniconda3\envs\Paddle\lib\site-packages\paddlenlp\transformers\model_utils.py", line 1894, in from_pretrained
    config, model_kwargs = cls.config_class.from_pretrained(
    File "C:\Users\great\miniconda3\envs\Paddle\lib\site-packages\paddlenlp\transformers\configuration_utils.py", line 749, in from_pretrained
    config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
    File "C:\Users\great\miniconda3\envs\Paddle\lib\site-packages\paddlenlp\transformers\configuration_utils.py", line 775, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(
    File "C:\Users\great\miniconda3\envs\Paddle\lib\site-packages\paddlenlp\transformers\configuration_utils.py", line 850, in _get_config_dict
    raise FileNotFoundError(f"configuration file<{CONFIG_NAME}> or <{LEGACY_CONFIG_NAME}> not found")
    FileNotFoundError: configuration file<config.json> or <model_config.json> not found

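I am using the pretrained invoice model provided by Baidu and want to run inference for invoice key information extraction. The yml I used is the training yml as modified in section "4.3.2 开始训练" (Start training) of https://github.com/PaddlePaddle/PaddleOCR/blob/main/applications/%E5%8F%91%E7%A5%A8%E5%85%B3%E9%94%AE%E4%BF%A1%E6%81%AF%E6%8A%BD%E5%8F%96.md
A model download progress bar appears, and then the error above is raised.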
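The final line of the traceback says that neither config.json nor model_config.json was found at the path handed to from_pretrained. A minimal sketch for checking this before running the inference script is shown here; the helper name find_config_files is hypothetical, and the two file names are copied from the error message rather than from any PaddleNLP documentation.

```python
from pathlib import Path

# File names copied from the FileNotFoundError message in this issue.
CONFIG_NAMES = ("config.json", "model_config.json")


def find_config_files(checkpoint_dir):
    """Return the names in CONFIG_NAMES that exist directly under checkpoint_dir.

    Hypothetical helper for diagnosis only; it is not part of PaddleOCR or PaddleNLP.
    """
    root = Path(checkpoint_dir)
    if not root.is_dir():
        # A missing directory cannot contain a configuration file.
        return []
    return [name for name in CONFIG_NAMES if (root / name).is_file()]
```

Calling find_config_files("./fapiao/ser_vi_layoutxlm_fapiao_trained/best_accuracy/") and getting an empty list would be consistent with the error reported above, whether the directory is missing or simply has no configuration file in it.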

@zhangyubo0722
Collaborator

The checkpoint weights passed in Architecture.Models.Student.Backbone.checkpoints="./fapiao/ser_vi_layoutxlm_fapiao_trained/best_accuracy/" are wrong; you need to give the full path.

@greatliu
Author

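> The checkpoint weights passed in Architecture.Models.Student.Backbone.checkpoints="./fapiao/ser_vi_layoutxlm_fapiao_trained/best_accuracy/" are wrong; you need to give the full path.

Sorry, I am new to this and do not really understand it yet.
I checked, and the model I downloaded indeed has no /best_accuracy/ folder.
So presumably a JSON file is needed here? What should an example look like?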

@zhangyubo0722
Collaborator

Try passing in the correct path first.

@greatliu
Author

> Try passing in the correct path first.

Under my fapiao folder there are only two model files, ser_vi_layoutxlm_fapiao_trained and re_vi_layoutxlm_fapiao_trained; there are no folders below that.
I ran the command pointing at the fapiao folder:
(Paddle) E:\Workspace\GitHub\PaddleOCR>python tools/infer_kie_token_ser.py -c ./configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh_udml.yml -o Architecture.Models.Student.Backbone.checkpoints="./fapiao/" Global.infer_img=./train_data/zzsfp/val.json Global.infer_mode=False

The error is:
[2024-05-10 19:30:51,225] [ WARNING] - Some weights of LayoutXLMForTokenClassification were not initialized from the model checkpoint at vi-layoutxlm-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
File "E:\Workspace\GitHub\PaddleOCR\tools\infer_kie_token_ser.py", line 119, in <module>
ser_engine = SerPredictor(config)
File "E:\Workspace\GitHub\PaddleOCR\tools\infer_kie_token_ser.py", line 68, in __init__
self.model = build_model(config['Architecture'])
File "E:\Workspace\GitHub\PaddleOCR\ppocr\modeling\architectures\__init__.py", line 34, in build_model
arch = getattr(mod, name)(config)
File "E:\Workspace\GitHub\PaddleOCR\ppocr\modeling\architectures\distillation_model.py", line 47, in __init__
model = BaseModel(model_config)
File "E:\Workspace\GitHub\PaddleOCR\ppocr\modeling\architectures\base_model.py", line 55, in __init__
self.backbone = build_backbone(config["Backbone"], model_type)
File "E:\Workspace\GitHub\PaddleOCR\ppocr\modeling\backbones\__init__.py", line 82, in build_backbone
module_class = eval(module_name)(**config)
File "E:\Workspace\GitHub\PaddleOCR\ppocr\modeling\backbones\vqa_layoutlm.py", line 142, in __init__
super(LayoutXLMForSer, self).__init__(
File "E:\Workspace\GitHub\PaddleOCR\ppocr\modeling\backbones\vqa_layoutlm.py", line 55, in __init__
self.model = model_class.from_pretrained(checkpoints)
File "C:\Users\great\miniconda3\envs\Paddle\lib\site-packages\paddlenlp\transformers\model_utils.py", line 1894, in from_pretrained
config, model_kwargs = cls.config_class.from_pretrained(
File "C:\Users\great\miniconda3\envs\Paddle\lib\site-packages\paddlenlp\transformers\configuration_utils.py", line 749, in from_pretrained
config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "C:\Users\great\miniconda3\envs\Paddle\lib\site-packages\paddlenlp\transformers\configuration_utils.py", line 775, in get_config_dict
config_dict, kwargs = cls._get_config_dict(
File "C:\Users\great\miniconda3\envs\Paddle\lib\site-packages\paddlenlp\transformers\configuration_utils.py", line 831, in _get_config_dict
raise FileNotFoundError(
FileNotFoundError: please make sure there is model_config.json under the dir, or you can pass the _configuration_file param into from_pretarined method to specific the configuration file name

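This seems to bring me back to the problem in my original question. My guess is that either I am passing the wrong arguments or there is a bug somewhere, because the model_config.json file cannot be found.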

@zhangyubo0722
Collaborator

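The error above is still fundamentally caused by passing the wrong checkpoint weights. With Architecture.Models.Student.Backbone.checkpoints="./fapiao/", the checkpoint path has to point to a specific model, for example ***.pdparams.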
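To see which weight files a downloaded model folder actually contains, a small sketch like the following may help. It assumes only that the weights use the .pdparams extension mentioned above; the helper name list_pdparams is hypothetical, and the sketch does not settle whether the checkpoints option should be given the file itself or its parent directory.

```python
from pathlib import Path


def list_pdparams(root_dir):
    """Return the sorted paths of all *.pdparams files found anywhere under root_dir.

    Hypothetical helper for inspection only; it is not part of PaddleOCR.
    """
    root = Path(root_dir)
    if not root.is_dir():
        return []
    # rglob searches root_dir and all of its subdirectories.
    return sorted(str(path) for path in root.rglob("*.pdparams"))
```

Running list_pdparams("./fapiao/") would show the candidate paths to try for Architecture.Models.Student.Backbone.checkpoints.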

@UserWangZz
Collaborator

This issue has not been updated for a long time. This issue is temporarily closed and can be reopened if necessary.
