Description
Describe the bug
When I tried to use DeepSpeed to accelerate my T5-XL pretraining with Hugging Face models, but without Hugging Face's Trainer, the following error occurred:
[2022-03-16 09:38:47,187] [INFO] [stage3.py:2553:_overflow_clean_up] [deepscale] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4294967296, reducing to 4294967296
Traceback (most recent call last):
File "main.py", line 140, in
train(args, model, train_dataset, ds_config)
File "main.py", line 64, in train
model_engine.step()
File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1855, in step
self._take_model_step(lr_kwargs)
File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1767, in _take_model_step
if self.quantizer:
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1185, in getattr
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'DeepSpeedEngine' object has no attribute 'quantizer'
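The AttributeError itself comes from `torch.nn.Module.__getattr__`, which raises whenever an attribute was never assigned on the module (rather than returning `None`). A minimal pure-Python sketch mimicking that behavior, with hypothetical class names, reproduces the same failure mode as the `if self.quantizer:` check in `_take_model_step`:

```python
class ModuleLike:
    """Mimics torch.nn.Module.__getattr__: raises AttributeError for
    attributes that were never registered on the instance."""
    def __getattr__(self, name):
        raise AttributeError(
            "'{}' object has no attribute '{}'".format(type(self).__name__, name))

class Engine(ModuleLike):
    """Stand-in for an engine that only sets `quantizer` on some init paths."""
    def __init__(self, enable_quantizer=False):
        if enable_quantizer:
            self.quantizer = object()

engine = Engine()  # `quantizer` was never assigned
try:
    if engine.quantizer:  # same access pattern as in _take_model_step
        pass
except AttributeError as e:
    print(e)  # 'Engine' object has no attribute 'quantizer'
```

So any engine init path that skips assigning `quantizer` will crash on the first `model_engine.step()` that reads it.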
To Reproduce
The full code is too long, so I will only show the key segments; if you want to see the whole code, I can paste it.
First, in the main function:
if args.local_rank == -1:
    device = torch.device("cuda")
else:
    torch.cuda.set_device(args.local_rank)
    device = torch.device("cuda", args.local_rank)
    # torch.distributed.init_process_group(backend="nccl")
    deepspeed.init_distributed()
args.device = device
args.n_gpu = len(args.cuda.split(","))
set_seed(args)

# Model and dataset
with open('./ds_config.json') as f:
    ds_config = json.load(f)
dschf = HfDeepSpeedConfig(ds_config)

print('init model')
with deepspeed.zero.Init():
    config = AutoConfig.from_pretrained("./")
    model = T5ForConditionalGeneration(config)

print('init dataset')
train_dataset = FinT5_Dataset(args)

# Barrier to make sure all processes start training the model simultaneously.
if args.local_rank != -1:
    torch.distributed.barrier()

train(args, model, train_dataset, ds_config)
Secondly, ds_config =
{
  "fp16": {
    "enabled": true,
    "loss_scale": 0,
    "initial_scale_power": 32,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "reduce_bucket_size": model_hidden_size * model_hidden_size,
    "stage3_prefetch_bucket_size": 0.9 * model_hidden_size * model_hidden_size,
    "stage3_param_persistence_threshold": 10 * model_hidden_size
  },
  "optimizer": {"type": "AdamW", "params": {"lr": 0.0001, "betas": [0.8, 0.999], "eps": 1e-08, "weight_decay": 3e-07}},
  "scheduler": {"type": "WarmupLR", "params": {"warmup_min_lr": 0, "warmup_max_lr": 0.0001, "warmup_num_steps": 1000}},
  "steps_per_print": 200,
  "train_batch_size": 256,
  "train_micro_batch_size_per_gpu": 32,
  "gradient_accumulation_steps": 1,
  "wall_clock_breakdown": false
}
(model_hidden_size for T5-XL is 2048; these settings follow huggingface/transformers#15399 (comment); actually I don't know what they mean...)
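For context, the `model_hidden_size` placeholders above are filled in programmatically before the config is passed to DeepSpeed. A small sketch (assuming hidden size 2048 for T5-XL, as noted above) showing the arithmetic behind the ZeRO-3 bucket values:

```python
import json

model_hidden_size = 2048  # T5-XL hidden size, per the note above

zero_opt = {
    "stage": 3,
    "overlap_comm": True,
    "contiguous_gradients": True,
    # number of gradient elements reduced in one bucket: hidden^2
    "reduce_bucket_size": model_hidden_size * model_hidden_size,
    # parameter elements prefetched ahead of use: 0.9 * hidden^2
    "stage3_prefetch_bucket_size": int(0.9 * model_hidden_size * model_hidden_size),
    # parameters smaller than this many elements are kept resident on the GPU
    "stage3_param_persistence_threshold": 10 * model_hidden_size,
}
print(json.dumps(zero_opt, indent=2))
```

With hidden size 2048 this gives a reduce bucket of 4,194,304 elements, a prefetch bucket of 3,774,873 elements, and a persistence threshold of 20,480 elements.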
Thirdly, in the train function:
def train(args, model, train_dataset, ds_config):
    train_sampler = data.distributed.DistributedSampler(train_dataset) if args.local_rank != -1 else data.RandomSampler(train_dataset)
    params = {"batch_size": args.batch_size_per_gpu, "sampler": train_sampler}
    train_dataloader = data.DataLoader(train_dataset, **params)

    # DeepSpeed training
    model_engine, optimizer, _, _ = deepspeed.initialize(model=model, config_params=ds_config)

    print("Begin train...")
    global_step = 0
    start_time = time.time()
    for i in range(args.max_epoch):
        if args.local_rank != -1:
            train_sampler.set_epoch(i)
        for step, batch in enumerate(train_dataloader):
            global_step += 1
            # forward pass
            loss = model_engine(
                input_ids=batch[0].to('cuda'),
                attention_mask=batch[1].to('cuda'),
                labels=batch[2].to('cuda')).loss
            # print(loss)
            # backpropagation
            model_engine.backward(loss)
            # weight update
            model_engine.step()  # error occurs here
            if global_step % args.save_interval == 0:
                model_engine.save_checkpoint(args.save_dir, global_step)
            if global_step == args.max_step:
                return
ds_report output
JIT compiled ops requires ninja
ninja .................. [OKAY]
op name ................ installed .. compatible
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
[WARNING] please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
async_io ............... [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
DeepSpeed general environment info:
torch install path ............... ['/opt/conda/lib/python3.8/site-packages/torch']
torch version .................... 1.11.0a0+b6df043
torch cuda version ............... 11.5
torch hip version ................ None
nvcc version ..................... 11.5
deepspeed install path ........... ['/opt/conda/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.6.0, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.11, cuda 11.5, hip 0.0
System info (please complete the following information):
- OS: Ubuntu 20.04
- GPU count and types: one machine with 8x A100s
- Python version: 3.8
Launcher context
deepspeed --num_gpus=8 main.py
Docker context
nvcr.io/nvidia/pytorch:21.12-py3 (pip install deepspeed)