AttributeError: 'BiDecoderOnlyEmbedderICLModel' object has no attribute 'config', when tune bge-en-icl with deepspeed zero3 #1446

@leyi-123

Because my computing resources are limited, I am fine-tuning bge-en-icl with DeepSpeed ZeRO stage 3, using the following config:

{
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": true
        },
        "offload_param": {
            "device": "cpu",
            "pin_memory": true
        },
        "overlap_comm": true,
        "contiguous_gradients": true,
        "sub_group_size": 1e9,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
        "stage3_max_live_parameters": 1e9,
        "stage3_max_reuse_distance": 1e9,
        "stage3_gather_fp16_weights_on_model_save": true
    },
    "fp16": {
        "enabled": "auto",
        "loss_scale": 0,
        "initial_scale_power": 10,
        "loss_scale_window": 1000,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "bf16": {
        "enabled": "auto",
        "loss_scale": 0,
        "initial_scale_power": 10,
        "loss_scale_window": 1000,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": "auto",
            "betas": "auto",
            "eps": "auto",
            "weight_decay": "auto",
            "torch_adam": true
        }
    },
    "scheduler": {
        "type": "WarmupDecayLR",
        "params": {
            "warmup_min_lr": "auto",
            "warmup_max_lr": "auto",
            "warmup_num_steps": "auto",
            "total_num_steps": "auto"
        }
    },

    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "steps_per_print": 1000,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "wall_clock_breakdown": false
}

However, right at the start of fine-tuning, I get the following error:

[rank0]: Traceback (most recent call last): 
[rank0]:   File "/data//miniconda3/envs/bge/lib/python3.10/runpy.py", line 196, in _run_module_as_main
[rank0]:     return _run_code(code, main_globals, None,
[rank0]:   File "/data//miniconda3/envs/bge/lib/python3.10/runpy.py", line 86, in _run_code
[rank0]:     exec(code, run_globals)
[rank0]:   File "/data//miniconda3/envs/bge/lib/python3.10/site-packages/FlagEmbedding/finetune/embedder/decoder_only/icl/__main__.py", line 26, in <module>
[rank0]:     runner.run()
[rank0]:   File "/data//miniconda3/envs/bge/lib/python3.10/site-packages/FlagEmbedding/finetune/embedder/decoder_only/icl/runner.py", line 157, in run
[rank0]:     self.trainer.train(resume_from_checkpoint=self.training_args.resume_from_checkpoint)
[rank0]:   File "/data//miniconda3/envs/bge/lib/python3.10/site-packages/transformers/trainer.py", line 2164, in train
[rank0]:     return inner_training_loop(
[rank0]:   File "/data//miniconda3/envs/bge/lib/python3.10/site-packages/transformers/trainer.py", line 2262, in _inner_training_loop
[rank0]:     self.optimizer, self.lr_scheduler = deepspeed_init(self, num_training_steps=max_steps)
[rank0]:   File "/data//miniconda3/envs/bge/lib/python3.10/site-packages/transformers/integrations/deepspeed.py", line 398, in deepspeed_init
[rank0]:     hf_deepspeed_config.trainer_config_finalize(args, model, num_training_steps)
[rank0]:   File "/data//miniconda3/envs/bge/lib/python3.10/site-packages/transformers/integrations/deepspeed.py", line 226, in trainer_config_finalize
[rank0]:     if hasattr(model.config, "hidden_size"):
[rank0]:   File "/data//miniconda3/envs/bge/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1729, in __getattr__ 
[rank0]:     raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'") 
[rank0]: AttributeError: 'BiDecoderOnlyEmbedderICLModel' object has no attribute 'config' 
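
A minimal sketch of what the traceback suggests, not a verified fix: the failing line in `trainer_config_finalize` reads `model.config`, and the model object passed to the trainer does not expose that attribute. The sketch assumes the wrapper stores the underlying Hugging Face model in an attribute named `model`. The classes `InnerModel` and `Wrapper`, the function `finalize`, and the `hidden_size` value are all hypothetical stand-ins written in plain Python, so the snippet runs without torch or DeepSpeed.

```python
from types import SimpleNamespace


class InnerModel:
    """Hypothetical stand-in for the wrapped model, which carries a `config`."""

    def __init__(self):
        # Illustrative value only; the real config comes from the checkpoint.
        self.config = SimpleNamespace(hidden_size=4096)


class Wrapper:
    """Hypothetical stand-in for an embedder wrapper that keeps the model
    in `self.model` but does not expose `config` itself."""

    def __init__(self, model):
        self.model = model


def finalize(model):
    # Mirrors the check shown on the failing line of the traceback.
    if hasattr(model.config, "hidden_size"):
        return model.config.hidden_size
    return None


wrapper = Wrapper(InnerModel())

# Without `config` on the wrapper, the attribute lookup itself fails.
try:
    finalize(wrapper)
    failed = False
except AttributeError:
    failed = True

# Assumed workaround: forward the inner model's config to the wrapper
# before the trainer is started.
wrapper.config = wrapper.model.config
hidden_size = finalize(wrapper)

print(failed, hidden_size)
```

If the real wrapper does follow this layout, setting `config` on it before calling `trainer.train()` might be enough to get past this check, but I have not confirmed that against the FlagEmbedding source.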

My versions are torch 2.4.0+cu118, transformers 4.47.1, and deepspeed 0.16.7. How can I fine-tune bge-en-icl (based on Mistral-7B) with DeepSpeed ZeRO stage 3? Thanks!
