Describe the bug
(base) amd00@mz00:~/model$ git clone https://www.modelscope.cn/AI-ModelScope/gpt2.git
(ds_ex) amd00@mz00:~/repo/DeepSpeed$ deepspeed --autotuning tune --num_nodes=1 --num_gpus=4 /home/amd00/repo/transformers/examples/pytorch/language-modeling/run_clm.py --model_name_or_path /home/amd00/model/gpt2 \
    --deepspeed deepspeed/autotuning/config_templates/zero3.json \
    --do_train --do_eval --fp16 --per_device_train_batch_size 8 --gradient_accumulation_steps 1
/home/amd00/anaconda3/envs/ds_ex/bin/deepspeed:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
__import__('pkg_resources').require('deepspeed==0.16.8+ee492c30')
[2025-05-03 09:41:17,131] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-03 09:41:17,191] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-03 09:41:20,466] [WARNING] [runner.py:215:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2025-05-03 09:41:20,467] [INFO] [autotuner.py:71:__init__] Created autotuning experiments directory: autotuning_exps
[2025-05-03 09:41:20,467] [INFO] [autotuner.py:84:__init__] Created autotuning results directory: autotuning_results
[2025-05-03 09:41:20,467] [INFO] [autotuner.py:200:_get_resource_manager] active_resources = OrderedDict([('localhost', [0, 1, 2, 3])])
[2025-05-03 09:41:20,468] [INFO] [runner.py:392:run_autotuning] [Start] Running autotuning
[2025-05-03 09:41:20,468] [INFO] [autotuner.py:413:tune] Fast mode is enabled. Tuning micro batch size only.
[2025-05-03 09:41:20,468] [INFO] [autotuner.py:669:model_info_profile_run] Starting model info profile run.
0%| | 0/1 [00:00<?, ?it/s][2025-05-03 09:41:20,483] [INFO] [scheduler.py:346:run_experiment] Scheduler wrote ds_config to autotuning_results/profile_model_info/ds_config.json, /home/amd00/repo/DeepSpeed/autotuning_results/profile_model_info/ds_config.json
[2025-05-03 09:41:20,485] [INFO] [scheduler.py:353:run_experiment] Scheduler wrote exp to autotuning_results/profile_model_info/exp.json, /home/amd00/repo/DeepSpeed/autotuning_results/profile_model_info/exp.json
[2025-05-03 09:41:20,487] [INFO] [scheduler.py:380:run_experiment] Launching exp_id = 0, exp_name = profile_model_info, with resource = localhost:0,1,2,3, and ds_config = /home/amd00/repo/DeepSpeed/autotuning_results/profile_model_info/ds_config.json
[2025-05-03 09:41:47,809] [INFO] [scheduler.py:432:clean_up] Done cleaning up exp_id = 0 on the following workers: localhost
[2025-05-03 09:41:47,809] [INFO] [scheduler.py:395:run_experiment] Done running exp_id = 0, exp_name = profile_model_info, with resource = localhost:0,1,2,3
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:35<00:00, 35.02s/it]
[2025-05-03 09:41:55,501] [ERROR] [autotuner.py:700:model_info_profile_run] The model is not runnable with DeepSpeed with error = (
[2025-05-03 09:41:55,502] [INFO] [runner.py:397:run_autotuning] [End] Running autotuning
(ds_ex) amd00@mz00:~/repo/DeepSpeed$
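
The error string in the autotuner log above is truncated to "(", so the log alone does not show why the profiling run failed. The scheduler logged the per-experiment directory it wrote to (autotuning_results/profile_model_info), and the launcher output for that experiment normally lands in the same place; a minimal way to pull out the real traceback, using only standard shell tools (exact file names inside the directory vary by DeepSpeed version):

ls autotuning_results/profile_model_info/
grep -rn -i -e "error" -e "traceback" autotuning_results/profile_model_info/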
ds_report output
(ds_ex) amd00@mz00:~/repo/DeepSpeed$ ds_report
/home/amd00/anaconda3/envs/ds_ex/bin/ds_report:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
__import__('pkg_resources').require('deepspeed==0.16.8+ee492c30')
[2025-05-03 09:43:50,328] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-03 09:43:50,392] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
DeepSpeed C++/CUDA extension op report
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
JIT compiled ops requires ninja
ninja .................. [OKAY]
op name ................ installed .. compatible
async_io ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
dc ..................... [NO] ....... [OKAY]
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
[WARNING] NVIDIA Inference is only supported on Ampere and newer architectures
[WARNING] FP Quantizer is using an untested triton version (3.3.0), only 2.3.(0, 1) and 3.0.0 are known to be compatible with these kernels
fp_quantizer ........... [NO] ....... [NO]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
gds .................... [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.7
[WARNING] using untested triton version (3.3.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
DeepSpeed general environment info:
torch install path ............... ['/home/amd00/anaconda3/envs/ds_ex/lib/python3.10/site-packages/torch']
torch version .................... 2.7.0+cu126
deepspeed install path ........... ['/home/amd00/repo/DeepSpeed/deepspeed']
deepspeed info ................... 0.16.8+ee492c30, ee492c3, master
torch cuda version ............... 12.6
torch hip version ................ None
nvcc version ..................... 12.6
deepspeed wheel compiled w. ...... torch 2.7, cuda 12.6
shared memory (/dev/shm) size .... 497.78 GB
(ds_ex) amd00@mz00:~/repo/DeepSpeed$
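
The [WARNING] lines in the report are most likely unrelated to the autotuning failure: evoformer_attn, fp_quantizer, and sparse_attn are optional ops that this run does not use. For completeness, the CUTLASS warning can be cleared by pointing $CUTLASS_PATH at a checkout of NVIDIA's CUTLASS repository; a sketch, with an illustrative clone location:

# clone CUTLASS and tell DeepSpeed's op builder where it lives (path is illustrative)
git clone https://github.com/NVIDIA/cutlass.git /home/amd00/repo/cutlass
export CUTLASS_PATH=/home/amd00/repo/cutlass
ds_report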
Additional context
zero3.json:
{
  "autotuning": {
    "enabled": true,
    "arg_mappings": {
      "train_micro_batch_size_per_gpu": "--per_device_train_batch_size",
      "gradient_accumulation_steps": "--gradient_accumulation_steps"
    }
  },
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": 1,
  "fp16": {
    "enabled": "auto"
  },
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "betas": "auto",
      "eps": "auto",
      "weight_decay": "auto"
    }
  },
  "scheduler": {
    "type": "WarmupLR",
    "params": {
      "warmup_min_lr": "auto",
      "warmup_max_lr": "auto",
      "warmup_num_steps": "auto"
    }
  },
  "activation_checkpointing": {
    "partition_activations": true,
    "cpu_checkpointing": true,
    "contiguous_memory_optimization": false,
    "synchronize_checkpoint_boundary": false
  },
  "zero_optimization": {
    "stage": 3,
    "allgather_partitions": true,
    "allgather_bucket_size": 5e8,
    "overlap_comm": false,
    "reduce_scatter": true,
    "reduce_bucket_size": 5e8,
    "contiguous_gradients": false,
    "stage3_max_live_parameters": 1e9,
    "stage3_max_reuse_distance": 1e9,
    "stage3_prefetch_bucket_size": 5e8,
    "stage3_param_persistence_threshold": 1e6,
    "stage3_gather_16bit_weights_on_model_save": false,
    "sub_group_size": 1e12
  }
}
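
For reference, the arg_mappings block tells the autotuner which command-line flag of the training script corresponds to each DeepSpeed config key, so the scheduler can rewrite those flags when it launches each tuning experiment. A sketch of the kind of command one experiment effectively runs, with an illustrative micro-batch size of 16 and a placeholder experiment name:

deepspeed --num_nodes=1 --num_gpus=4 \
  /home/amd00/repo/transformers/examples/pytorch/language-modeling/run_clm.py \
  --model_name_or_path /home/amd00/model/gpt2 \
  --deepspeed autotuning_results/<exp_name>/ds_config.json \
  --do_train --do_eval --fp16 \
  --per_device_train_batch_size 16 --gradient_accumulation_steps 1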