Closed
Labels
bug: Something isn't working
Description
Software environment
- paddlepaddle: -
- paddlepaddle-gpu: 3.0.0
- paddlenlp: 3.0.0b4
Duplicate issues
- I have searched the existing issues
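Error description
While working on https://github.com/PaddlePaddle/PaddleNLP/issues/9763 [No. 17], I tried to run NPTag model training under /examples/text_to_knowledge/nptag. The run failed with an error indicating that the online model file cannot be found.
Based on the source file /transformers/ernie_ctm/configuration.py, these are the available models: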
ERNIE_CTM_PRETRAINED_RESOURCE_FILES_MAP = {
"model_state": {
"ernie-ctm": "https://bj.bcebos.com/paddlenlp/models/transformers/ernie_ctm/ernie_ctm_v3.pdparams",
"wordtag": "https://bj.bcebos.com/paddlenlp/models/transformers/ernie_ctm/wordtag_v3.pdparams",
"nptag": "https://bj.bcebos.com/paddlenlp/models/transformers/ernie_ctm/nptag_v3.pdparams",
}
}
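A quick way to check whether the weight files in this map are still hosted is to probe each URL and record the HTTP status. The following is a minimal sketch, not part of PaddleNLP: the helper names `probe` and `probe_all` are hypothetical, and it assumes the storage server answers a HEAD request with the same status code it would return for a GET.

```python
import urllib.error
import urllib.request

# URLs copied from ERNIE_CTM_PRETRAINED_RESOURCE_FILES_MAP["model_state"].
MODEL_URLS = {
    "ernie-ctm": "https://bj.bcebos.com/paddlenlp/models/transformers/ernie_ctm/ernie_ctm_v3.pdparams",
    "wordtag": "https://bj.bcebos.com/paddlenlp/models/transformers/ernie_ctm/wordtag_v3.pdparams",
    "nptag": "https://bj.bcebos.com/paddlenlp/models/transformers/ernie_ctm/nptag_v3.pdparams",
}


def probe(url, opener=urllib.request.urlopen, timeout=10):
    """Hypothetical helper: return the HTTP status code for a HEAD request.

    `opener` is injectable so the function can be exercised without
    network access. Non-HTTP failures (DNS, timeouts) are not caught.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the code is what we want.
        return err.code


def probe_all(urls, opener=urllib.request.urlopen, timeout=10):
    """Hypothetical helper: map each model name to its HTTP status code."""
    return {name: probe(url, opener, timeout) for name, url in urls.items()}
```

Calling `probe_all(MODEL_URLS)` on a machine with network access returns a dictionary of status codes keyed by model name. A 404 for an entry would match the `NoSuchKey` response reported in this issue.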
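Accessing each URL directly returns a similar result in every case: the model file is not found.
{"code":"NoSuchKey","message":"The specified key does not exist.","requestId":"10410205-3348-437c-80ee-d775719c8dfd"}
Steps to reproduce & code
- Change to the specified directory
cd ~/PaddleNLP/slm/examples/text_to_knowledge/nptag
- Run the model training command from the README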
python -m paddle.distributed.launch --gpus "0" train.py \
--batch_size 64 \
--learning_rate 1e-6 \
--num_train_epochs 3 \
--logging_steps 10 \
--save_steps 100 \
--output_dir ./output \
--device "gpu"
- The program fails, reporting that the online model is not available
aistudio@jupyter-227232-8957468:~/PaddleNLP/slm/examples/text_to_knowledge/nptag$ python -m paddle.distributed.launch --gpus "0" train.py \
> --batch_size 64 \
> --learning_rate 1e-6 \
> --num_train_epochs 3 \
> --logging_steps 10 \
> --save_steps 100 \
> --output_dir ./output \
> --device "gpu"
LAUNCH INFO 2025-04-17 13:48:06,465 ----------- Configuration ----------------------
LAUNCH INFO 2025-04-17 13:48:06,465 auto_cluster_config: 0
LAUNCH INFO 2025-04-17 13:48:06,465 auto_parallel_config: None
LAUNCH INFO 2025-04-17 13:48:06,465 auto_tuner_json: None
LAUNCH INFO 2025-04-17 13:48:06,465 devices: 0
LAUNCH INFO 2025-04-17 13:48:06,465 elastic_level: -1
LAUNCH INFO 2025-04-17 13:48:06,465 elastic_timeout: 30
LAUNCH INFO 2025-04-17 13:48:06,465 enable_gpu_log: True
LAUNCH INFO 2025-04-17 13:48:06,465 gloo_port: 6767
LAUNCH INFO 2025-04-17 13:48:06,465 host: None
LAUNCH INFO 2025-04-17 13:48:06,465 ips: None
LAUNCH INFO 2025-04-17 13:48:06,465 job_id: default
LAUNCH INFO 2025-04-17 13:48:06,465 legacy: False
LAUNCH INFO 2025-04-17 13:48:06,465 log_dir: log
LAUNCH INFO 2025-04-17 13:48:06,465 log_level: INFO
LAUNCH INFO 2025-04-17 13:48:06,465 log_overwrite: False
LAUNCH INFO 2025-04-17 13:48:06,465 master: None
LAUNCH INFO 2025-04-17 13:48:06,465 max_restart: 3
LAUNCH INFO 2025-04-17 13:48:06,465 nnodes: 1
LAUNCH INFO 2025-04-17 13:48:06,465 nproc_per_node: None
LAUNCH INFO 2025-04-17 13:48:06,465 rank: -1
LAUNCH INFO 2025-04-17 13:48:06,465 run_mode: collective
LAUNCH INFO 2025-04-17 13:48:06,465 server_num: None
LAUNCH INFO 2025-04-17 13:48:06,465 servers:
LAUNCH INFO 2025-04-17 13:48:06,466 sort_ip: False
LAUNCH INFO 2025-04-17 13:48:06,466 start_port: 6070
LAUNCH INFO 2025-04-17 13:48:06,466 trainer_num: None
LAUNCH INFO 2025-04-17 13:48:06,466 trainers:
LAUNCH INFO 2025-04-17 13:48:06,466 training_script: train.py
LAUNCH INFO 2025-04-17 13:48:06,466 training_script_args: ['--batch_size', '64', '--learning_rate', '1e-6', '--num_train_epochs', '3', '--logging_steps', '10', '--save_steps', '100', '--output_dir', './output', '--device', 'gpu']
LAUNCH INFO 2025-04-17 13:48:06,466 with_gloo: 1
LAUNCH INFO 2025-04-17 13:48:06,466 --------------------------------------------------
LAUNCH INFO 2025-04-17 13:48:06,467 Job: default, mode collective, replicas 1[1:1], elastic False
LAUNCH INFO 2025-04-17 13:48:06,494 Run Pod: lrmnfn, replicas 1, status ready
LAUNCH INFO 2025-04-17 13:48:06,551 Watching Pod: lrmnfn, replicas 1, status running
/home/aistudio/.local/lib/python3.8/site-packages/_distutils_hack/__init__.py:26: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")
----------- Configuration Arguments -----------
adam_epsilon: 1e-08
batch_size: 64
data_dir: ./data
device: gpu
init_from_ckpt: None
learning_rate: 1e-06
logging_steps: 10
max_seq_len: 64
num_train_epochs: 3
output_dir: ./output
save_steps: 100
seed: 1000
warmup_proportion: 0.0
weight_decay: 0.0
------------------------------------------------
(…)/models/transformers/ernie_ctm/vocab.txt: 0%| | 0.00/91.7k [00:00<?, ?B/s]
(…)/models/transformers/ernie_ctm/vocab.txt: 100%|██████████| 91.7k/91.7k [00:00<00:00, 2.52MB/s]
[2025-04-17 13:48:17,942] [ INFO] - tokenizer config file saved in /home/aistudio/.paddlenlp/models/nptag/tokenizer_config.json
[2025-04-17 13:48:17,943] [ INFO] - Special tokens file saved in /home/aistudio/.paddlenlp/models/nptag/special_tokens_map.json
Traceback (most recent call last):
File "/home/aistudio/.local/lib/python3.8/site-packages/paddlenlp/utils/download/common.py", line 597, in raise_for_status
response.raise_for_status()
File "/home/aistudio/.local/lib/python3.8/site-packages/requests/models.py", line 1024, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://bj.bcebos.com/paddlenlp/models/transformers/ernie_ctm/nptag_v3.pdparams
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/aistudio/.local/lib/python3.8/site-packages/paddlenlp/utils/download/__init__.py", line 169, in resolve_file_path
cached_file = bos_download(
File "/home/aistudio/.local/lib/python3.8/site-packages/paddlenlp/utils/download/bos_download.py", line 241, in bos_download
http_get(
File "/home/aistudio/.local/lib/python3.8/site-packages/paddlenlp/utils/download/common.py", line 138, in http_get
r = _request_wrapper(
File "/home/aistudio/.local/lib/python3.8/site-packages/paddlenlp/utils/download/common.py", line 369, in _request_wrapper
raise_for_status(response)
File "/home/aistudio/.local/lib/python3.8/site-packages/paddlenlp/utils/download/common.py", line 601, in raise_for_status
raise EntryNotFoundError(message, None) from e
huggingface_hub.errors.EntryNotFoundError: 404 Client Error.
Entry Not Found for url: https://bj.bcebos.com/paddlenlp/models/transformers/ernie_ctm/nptag_v3.pdparams.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 191, in <module>
do_train(args)
File "train.py", line 103, in do_train
model = ErnieCtmNptagModel.from_pretrained("nptag")
File "/home/aistudio/.local/lib/python3.8/site-packages/paddlenlp/transformers/model_utils.py", line 2462, in from_pretrained
resolved_archive_file, resolved_sharded_files, sharded_metadata, is_sharded = cls._resolve_model_file_path(
File "/home/aistudio/.local/lib/python3.8/site-packages/paddlenlp/transformers/model_utils.py", line 1827, in _resolve_model_file_path
resolved_archive_file = resolve_file_path(
File "/home/aistudio/.local/lib/python3.8/site-packages/paddlenlp/utils/download/__init__.py", line 275, in resolve_file_path
raise EnvironmentError(f"Does not appear one of the {filenames} in {repo_id}.")
OSError: Does not appear one of the ['https://bj.bcebos.com/paddlenlp/models/transformers/ernie_ctm/nptag_v3.pdparams'] in nptag.
LAUNCH INFO 2025-04-17 13:48:19,566 Pod failed
LAUNCH ERROR 2025-04-17 13:48:19,566 Container failed !!!
Container rank 0 status failed cmd ['/usr/bin/python', '-u', 'train.py', '--batch_size', '64', '--learning_rate', '1e-6', '--num_train_epochs', '3', '--logging_steps', '10', '--save_steps', '100', '--output_dir', './output', '--device', 'gpu'] code 1 log log/workerlog.0
LAUNCH INFO 2025-04-17 13:48:19,566 ------------------------- ERROR LOG DETAIL -------------------------
:00<?, ?B/s]
(…)/models/transformers/ernie_ctm/vocab.txt: 100%|██████████| 91.7k/91.7k [00:00<00:00, 2.52MB/s]
[2025-04-17 13:48:17,942] [ INFO] - tokenizer config file saved in /home/aistudio/.paddlenlp/models/nptag/tokenizer_config.json
[2025-04-17 13:48:17,943] [ INFO] - Special tokens file saved in /home/aistudio/.paddlenlp/models/nptag/special_tokens_map.json
Traceback (most recent call last):
File "/home/aistudio/.local/lib/python3.8/site-packages/paddlenlp/utils/download/common.py", line 597, in raise_for_status
response.raise_for_status()
File "/home/aistudio/.local/lib/python3.8/site-packages/requests/models.py", line 1024, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://bj.bcebos.com/paddlenlp/models/transformers/ernie_ctm/nptag_v3.pdparams
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/aistudio/.local/lib/python3.8/site-packages/paddlenlp/utils/download/__init__.py", line 169, in resolve_file_path
cached_file = bos_download(
File "/home/aistudio/.local/lib/python3.8/site-packages/paddlenlp/utils/download/bos_download.py", line 241, in bos_download
http_get(
File "/home/aistudio/.local/lib/python3.8/site-packages/paddlenlp/utils/download/common.py", line 138, in http_get
r = _request_wrapper(
File "/home/aistudio/.local/lib/python3.8/site-packages/paddlenlp/utils/download/common.py", line 369, in _request_wrapper
raise_for_status(response)
File "/home/aistudio/.local/lib/python3.8/site-packages/paddlenlp/utils/download/common.py", line 601, in raise_for_status
raise EntryNotFoundError(message, None) from e
huggingface_hub.errors.EntryNotFoundError: 404 Client Error.
Entry Not Found for url: https://bj.bcebos.com/paddlenlp/models/transformers/ernie_ctm/nptag_v3.pdparams.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 191, in <module>
do_train(args)
File "train.py", line 103, in do_train
model = ErnieCtmNptagModel.from_pretrained("nptag")
File "/home/aistudio/.local/lib/python3.8/site-packages/paddlenlp/transformers/model_utils.py", line 2462, in from_pretrained
resolved_archive_file, resolved_sharded_files, sharded_metadata, is_sharded = cls._resolve_model_file_path(
File "/home/aistudio/.local/lib/python3.8/site-packages/paddlenlp/transformers/model_utils.py", line 1827, in _resolve_model_file_path
resolved_archive_file = resolve_file_path(
File "/home/aistudio/.local/lib/python3.8/site-packages/paddlenlp/utils/download/__init__.py", line 275, in resolve_file_path
raise EnvironmentError(f"Does not appear one of the {filenames} in {repo_id}.")
OSError: Does not appear one of the ['https://bj.bcebos.com/paddlenlp/models/transformers/ernie_ctm/nptag_v3.pdparams'] in nptag.
LAUNCH INFO 2025-04-17 13:48:19,567 Exit code 1