
[Question]: msra dataset cannot be loaded #9842

@zgrennn

Please ask your question

I ran into a problem while fine-tuning the Chinese named entity recognition model in slm/examples/information_extraction/msra_ner.
Fine-tuning command:
!python -u /home/aistudio/PaddleNLP/examples/information_extraction/msra_ner/train.py \
    --model_type bert \
    --model_name_or_path bert-base-multilingual-uncased \
    --dataset msra_ner \
    --max_seq_length 128 \
    --batch_size 32 \
    --learning_rate 2e-5 \
    --num_train_epochs 3 \
    --logging_steps 1 \
    --save_steps 500 \
    --output_dir ./tmp/msra_ner/ \
    --device gpu
Error message:
[2025-02-11 16:58:21,671] [ WARNING] - Detected that datasets module was imported before paddlenlp. This may cause PaddleNLP datasets to be unavalible in intranet. Please import paddlenlp before datasets module to avoid download issues
[2025-02-11 16:58:21,937] [ WARNING] - if you run ring_flash_attention.py, please ensure you install the paddlenlp_ops by following the instructions provided at https://github.com/PaddlePaddle/PaddleNLP/blob/develop/csrc/README.md
[2025-02-11 16:58:23,287] [ INFO] - model_type :bert
[2025-02-11 16:58:23,288] [ INFO] - model_name_or_path :bert-base-multilingual-uncased
[2025-02-11 16:58:23,288] [ INFO] - dataset :msra_ner
[2025-02-11 16:58:23,288] [ INFO] - output_dir :./tmp/msra_ner/
[2025-02-11 16:58:23,288] [ INFO] - max_seq_length :128
[2025-02-11 16:58:23,288] [ INFO] - batch_size :32
[2025-02-11 16:58:23,288] [ INFO] - learning_rate :2e-05
[2025-02-11 16:58:23,288] [ INFO] - weight_decay :0.0
[2025-02-11 16:58:23,288] [ INFO] - adam_epsilon :1e-08
[2025-02-11 16:58:23,288] [ INFO] - max_grad_norm :1.0
[2025-02-11 16:58:23,288] [ INFO] - num_train_epochs :3
[2025-02-11 16:58:23,288] [ INFO] - max_steps :-1
[2025-02-11 16:58:23,288] [ INFO] - warmup_steps :0
[2025-02-11 16:58:23,288] [ INFO] - logging_steps :1
[2025-02-11 16:58:23,289] [ INFO] - save_steps :500
[2025-02-11 16:58:23,289] [ INFO] - seed :42
[2025-02-11 16:58:23,289] [ INFO] - device :gpu
Traceback (most recent call last):
  File "/home/aistudio/PaddleNLP/examples/information_extraction/msra_ner/train.py", line 216, in <module>
    do_train(args)
  File "/home/aistudio/PaddleNLP/examples/information_extraction/msra_ner/train.py", line 93, in do_train
    raw_datasets = load_dataset(args.dataset)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/datasets/load.py", line 2129, in load_dataset
    builder_instance = load_dataset_builder(
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/datasets/load.py", line 1849, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/datasets/load.py", line 1731, in dataset_module_factory
    raise e1 from None
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/datasets/load.py", line 1618, in dataset_module_factory
    raise ConnectionError(f"Couldn't reach '{path}' on the Hub ({e.__class__.__name__})") from e
ConnectionError: Couldn't reach 'msra_ner' on the Hub (LocalEntryNotFoundError)
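
To narrow down whether the failure comes from the dataset download itself rather than from the rest of train.py, the call that fails at line 93 can be run on its own. The following is only a minimal sketch: `probe_dataset` is a hypothetical helper written for this illustration, not part of PaddleNLP or `datasets`, and it assumes the loader takes the dataset name as its only argument, as in the traceback.

```python
from typing import Any, Callable, Optional, Tuple


def probe_dataset(
    loader: Callable[[str], Any], name: str
) -> Tuple[Optional[Any], Optional[str]]:
    """Call loader(name) once and summarize the outcome.

    Returns (dataset, None) on success, or (None, "<ErrorClass>: <message>")
    if the loader raises, so the failure can be inspected without a full
    training run. `probe_dataset` is a hypothetical name for this sketch.
    """
    try:
        return loader(name), None
    except Exception as exc:  # report any failure instead of crashing
        return None, f"{type(exc).__name__}: {exc}"
```

In the notebook, one way to use it is to import `paddlenlp` first and `datasets` second, as the first warning in the log suggests, and then call `probe_dataset(load_dataset, "msra_ner")`. If the returned error is still a `ConnectionError`, the problem most likely lies in reaching the Hub from this environment rather than in the training script.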
