Description
Hi,
I briefly described the problem in the following issue comment, and I am submitting this as a new issue for a more complete description.
#431 (comment)
While initiating the SFT, I see a connection error, which shows that it fails to fetch the models from Hugging Face.
I have tried a few things, such as using a VPN and not using one, but it still fails.
It is quite weird, because I managed to fetch models from Hugging Face a few days ago in another project, which uses a similar approach, shown below:
tokenizer = transformers.AutoTokenizer.from_pretrained(
    model_name_or_path,
    cache_dir=output_dir,
    model_max_length=per_device_train_batch_size,
    padding_side="right",
    use_fast=False,
)
I believe this problem could be temporary, since local internet access may have been blocked for a while. However, as you mentioned in the issue above (#431),
if we can manually download the model and place it in the right format, the error could be bypassed.
I am wondering what the proper way is to place the files?
For example, I am trying to fine-tune bloom-560m:
https://huggingface.co/bigscience/bloom-560m/tree/main
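For reference, here is a minimal sketch of what I would try, assuming the standard Hugging Face tooling (huggingface_hub's snapshot_download plus the TRANSFORMERS_OFFLINE / HF_HUB_OFFLINE environment variables) and a hypothetical local directory ./models/bloom-560m; I am not sure whether LMFlow expects a different layout:

# On a machine that can reach huggingface.co, download the full repo once.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="bigscience/bloom-560m",
    local_dir="./models/bloom-560m",   # hypothetical target directory
    local_dir_use_symlinks=False,      # copy real files so the folder is self-contained
)

# Then point the training script at the local path instead of the hub name, e.g.
#   --model_name_or_path ./models/bloom-560m
# and optionally force offline mode so transformers never tries to connect:
#   export TRANSFORMERS_OFFLINE=1
#   export HF_HUB_OFFLINE=1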
Traceback (most recent call last):
File "/venv/lib/python3.9/site-packages/transformers/utils/hub.py", line 417, in cached_file
resolved_file = hf_hub_download(
File "/venv/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "/venv/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1291, in hf_hub_download
raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: Connection error, and we cannot find the requested files in the disk cache. Please try again or make sure your Internet connection is on.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/LMFlow/examples/finetune.py", line 70, in <module>
main()
File "/LMFlow/examples/finetune.py", line 55, in main
model = AutoModel.get_model(model_args)
File "/LMFlow/src/lmflow/models/auto_model.py", line 14, in get_model
return HFDecoderModel(model_args, *args, **kwargs)
File "/LMFlow/src/lmflow/models/hf_decoder_model.py", line 113, in __init__
config = AutoConfig.from_pretrained(model_args.model_name_or_path, **config_kwargs)
File "/venv/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 944, in from_pretrained
config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/venv/lib/python3.9/site-packages/transformers/configuration_utils.py", line 574, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/venv/lib/python3.9/site-packages/transformers/configuration_utils.py", line 629, in _get_config_dict
resolved_config_file = cached_file(
File "/venv/lib/python3.9/site-packages/transformers/utils/hub.py", line 452, in cached_file
raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like bigscience/bloom-560m is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
[2023-06-20 03:42:23,283] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 606506
[2023-06-20 03:42:23,417] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 606507
[2023-06-20 03:42:23,419] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 606508
[2023-06-20 03:42:23,421] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 606509
[2023-06-20 03:42:23,423] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 606573
[2023-06-20 03:42:23,424] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 606574
[2023-06-20 03:42:23,426] [ERROR] [launch.py:324:sigkill_handler] ['/venv/bin/python3.9', '-u', 'examples/finetune.py', '--local_rank=5', '--deepspeed', 'configs/ds_config_zero3.json', '--bf16', '--run_name', 'finetune_with_lora', '--model_name_or_path', 'bigscience/bloom-560m', '--num_train_epochs', '0.01', '--learning_rate', '2e-5', '--dataset_path', '/LMFlow/data/alpaca/train', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--validation_split_percentage', '0', '--logging_steps', '20', '--block_size', '512', '--do_train', '--output_dir', 'output_models/finetune', '--overwrite_output_dir', '--ddp_timeout', '72000', '--save_steps', '5000', '--dataloader_num_workers', '1'] exits with return code = 1